Efficient Integration of MultiOrder Dynamics and Internal Dynamics in Stock Movement Prediction
Abstract.
Advances in deep neural network (DNN) architectures have enabled new prediction techniques for stock market data. Unlike other multivariate timeseries data, stock markets show two unique characteristics: (i) multiorder dynamics, as stock prices are affected by strong nonpairwise correlations (e.g., within the same industry); and (ii) internal dynamics, as each individual stock shows some particular behaviour. Recent DNNbased methods capture multiorder dynamics using hypergraphs, but rely on the Fourier basis in the convolution, which is both inefficient and ineffective. In addition, they largely ignore internal dynamics by adopting the same model for each stock, which implies a severe information loss.
In this paper, we propose a framework for stock movement prediction to overcome the above issues. Specifically, the framework includes temporal generative filters that implement a memorybased mechanism onto an LSTM network in an attempt to learn individual patterns per stock. Moreover, we employ hypergraph attentions to capture the nonpairwise correlations. Here, using the wavelet basis instead of the Fourier basis, enables us to simplify the message passing and focus on the localized convolution. Experiments with US market data over six years show that our framework outperforms stateoftheart methods in terms of profit and stability. Our source code and data are available at https://github.com/thanhtrunghuynh93/estimate.
1. Introduction
The stock market denotes a financial ecosystem where the stock shares that represents the ownership of businesses are held and traded among the investors, with a market capitalization of more than 93.7$ trillion globally at the end of 2020 (Wang et al., 2021). In recent years, approaches for automated trading emerged that are driven by artificial intelligence (AI) models. They continuously analyze the market behaviour and predict the shortterm trends in stock prices. While these methods struggle to understand the complex rationales behind such trends (e.g., macroeconomic factors, crowd behaviour, and companies’ intrinsic values), they have been shown to yield accurate predictions. Moreover, they track market changes in realtime, by observing massive volumes of trading data and indicators, and hence, enable quick responses to events, such as a market crash. Also, they are relatively robust against emotional effects (greed, fear) that tend to influence human traders (Nourbakhsh et al., 2020).
Stock market analysis has received much attention in the past. Early work relies on handcrafted features, a.k.a technical indicators, to model the stock movement. For example, RIMA (Piccolo, 1990), a popular timeseries statistics model, may be applied to moving averages of stock prices to derive price predictions (Ariyo et al., 2014). However, handcrafted features tend to lag behind the actual price movements. Therefore, recent approaches adopt deep learning to model the market based on historic data. Specifically, recurrent neural networks (RNN) (Chen et al., 2015) have been employed to learn temporal patterns from the historic data and, based thereon, efficiently derive shortterm price predictions using regression (Nelson et al., 2017) or classification (Zhang et al., 2017).
However, stock market analysis based on deep learning faces two important requirements. First, multiorder dynamics of stock movements need to be incorporated. Price movements are often correlated within a specific group of stocks, e.g., companies of the same industry sector that are affected by the same government policies, laws, and tax rates. For instance, as shown in Fig. 1, in early 2022, prices for US technology stocks (APPL (Apple), META (Facebook), GOOG (Google), NFLX (Netflix)) went down due to the general economic trend (inflation, increased interest rates), whereas stocks in the energy sector, like MPC, OKE, or OXY, experienced upward trends due to oil shortages caused by the RussiaUkraine war. Second, the internal dynamics per stock need to be incorporated. In practice, even when considering highly correlated stocks, there is commonly still some individual behaviour. For example, in Fig. 1, APPL and GOOG stocks decrease less severely than META and NFLX, as the former companies (Apple, Google) maintain a wider and more sustainable portfolio compared to the latter two (Facebook, Netflix) (New York Times, 2022).
Existing work provides only limited support for these requirements. First, to incorporate multiorder dynamics of stock markets, RNNs can be combined with graph neural networks (GNNs) (Kim et al., 2019). Here, stateoftheart solutions adopt hypergraphs, in which an edge captures the correlation of multiple stocks (Sawhney et al., 2020; Sawhney et al., 2021). Yet, these approaches rely on the Fourier basis in the convolution, which implies costly matrix operations and does not maintain the localization well. This raises the question of how to achieve an efficient and effective convolution process for hypergraphs (Challenge 1). Moreover, stateoftheart approaches apply a single RNN to all stocks, thereby ignoring their individual behaviour. The reason being that maintaining a separate model per stock would be intractable with existing techniques. This raises the question of how to model the internal dynamics of stocks efficiently (Challenge 2).
In this work, we address the above challenges by proposing Efficient Stock Integration with Temporal Generative Filters and Wavelet Hypergraph Attentions (ESTIMATE), a profitdriven framework for quantitative trading. Based on the aforementioned idea of adopting hypergraphs to capture nonpairwise correlations between stocks, the framework includes two main contributions:

•
We present temporal generative filters that implement a hybrid attentionbased LSTM architecture to capture the stocks’ individual behavioural patterns (Challenge 2). These patterns are then fed to hypergraph convolution layers to obtain spatiotemporal embeddings that are optimized with respect to the potential of the stocks for shortterm profit.

•
We propose a mechanism that combines the temporal patterns of stocks with spatial convolutions through hypergraph attention, thereby integrating the internal dynamics and the multiorder dynamics. Our convolution process uses the wavelet basis, which is efficient and also effective in terms of maintaining the localization (Challenge 1).
To evaluate our approach, we report on backtesting experiments for the US market. Here, we try to simulate the real trading actions with a strategy for portfolio management and risk control. The results demonstrate the robustness of our technique compared to existing approaches in terms of stability and return. Our source code and data are available (Github, 2022).
The remainder of the paper is organised as follows. § 2 introduces the problem statement and gives an overview of our approach. We present our new techniques, the temporal generative filters and wavelet hypergraph attentions, in § 3 and § 4. § 5 presents experiments, § 6 reviews related works, and § 7 concludes the paper.
2. Model and approach
2.1. Problem Formulation
In this section, we formulate the problem of predicting the trend of a stock in the short term. We start with some basic notions.
OHCLV data. At timestep $t$, the openhighlowclosevolume (OHLCV) record for a stock $s$ is a vector ${x}_{s}^{t}=[{o}_{s}^{t},{h}_{s}^{t},{l}_{s}^{t},{c}_{s}^{t},{v}_{s}^{t}]$. It denotes the open, high, low, and close price, and the volume of shares that have been traded within that timestep, respectively.
Relative price change. We denote the relative close price change between two timesteps $$ of stock $s$ by ${d}_{s}^{({t}_{1},{t}_{2})}=({c}_{s}^{{t}_{2}}{c}_{s}^{{t}_{1}})/{c}_{s}^{{t}_{1}}$. The relative price change normalizes the market price variety between different stocks in comparison to the absolute price change.
Following existing work on stock market analysis (Sawhney et al., 2020; Feng et al., 2019a), we focus on the prediction of the change in price rather than the absolute value. The reason being that the timeseries of stock prices are nonstationary, whereas their changes are stationary (Li et al., 2020). Also, this avoids the problem that forecasts often lag behind the actual value (Kim et al., 2019; Hu et al., 2018). We thus define the addressed problem as follows:
Problem 1 (Stock Movement Prediction).
Given a set $S$ of stocks and a lookback window of $k$ trading days of historic OHLCV records ${x}_{s}^{(tk1)\mathrm{\dots}t}$ for each stock $s\in S$, the problem of Stock Movement Prediction is to predict the relative price change ${d}_{s}^{(t,t+w)}$ for each stock in a shortterm lookahead window $w$.
We formulate the problem as a shortterm regression for several reasons. First, we consider a lookahead window over nextday prediction to be robust against random market fluctuations (Zhang et al., 2017). Second, we opt for shortterm prediction, as an estimation of the longterm trend is commonly considered infeasible without the integration of expert knowledge on the intrinsic value of companies and on macroeconomic effects. Third, we focus on a regression problem instead of a classification problem to incorporate the magnitude of a stock’s trend, which is important for interpretation (Gu et al., 2020).
2.2. Design Principles
We argue that any solution to the above problem shall satisfy the following requirements:

•
R1: Multidimensional data integration: Stock market data is multivariate, covering multiple stocks and multiple features per stock. A solution shall integrate these data dimensions and support the construction of additional indicators from basic OHCLV data.

•
R2: Nonstationary awareness: The stock market is driven by various factors, such as socioeconomic effects or supplydemand changes. Therefore, a solution shall be robust against nonpredictable behaviour of the market.

•
R3: Analysis of multiorder dynamics: The relations between stocks are complex (e.g., companies may both, cooperate and compete) and may evolve over time. A solution thus needs to analyse the multiorder dynamics in a market.

•
R4: Analysis of internal dynamics: Each stock also shows some individual behaviour, beyond the multiorder correlations induced by market segments. A solution therefore needs to analyse and integrate such behaviour for each stock.
2.3. Approach Overview
To address the problem of stock movement prediction in the light of the above design principles, we propose the framework shown in Fig. 2. It takes historic data in the form of OHCLV records and derives a model for shortterm prediction of price changes per stock.
Our framework incorporates requirement R1 by first extracting the historic patterns per stock using a temporal attention LSTM. Here, the attention mechanism is used along with a 1DCNN to assess the impact of the previous timesteps. In addition to the OHCLV data, we employ technical indicators to mitigate the impact of noisy market behaviour, thereby addressing requirement R2. Moreover, we go beyond the state of the art by associating the core LSTM parameters with a learnable vector for each stock. It serves as a memory that stores its individual information (requirement R4) and results in a system of temporal generative filters. We explain the details of these filters in § 3.
To handle multiorder dynamics (requirement R3), we model the market with an industrybased hypergraph, which naturally presents nonpairwise relationships. We then develop a wavelet convolution mechanism, which leverages the wavelet basis to achieve a simpler convolution process than existing approaches. We apply a regression loss to steer the model to predict the shortterm trend of each stock price. The details of our proposed hypergraph convolution process are given in § 4.
3. Temporal Generative Filters
This section describes our temporal generative filters used to capture the internal dynamics of stocks.
Technical indicators. We first compute various technical indicators from the input data in order to enrich the data and capture the historical context of each stock. These indicators, summarized in Table 1, are widely used in finance. For each stock, we concatenate these indicators to form a stock price feature vector ${x}_{t}$ on day $t$. This vector is then forwarded through a multilayer perceptron (MLP) layer to modulate the input size.
Type  Indicators 

Trend Indicators  Arithmetic ratio, Close Ratio, Close SMA, 
Volume SMA, Close EMA, Volume EMA, ADX  
Oscillator Indicators  RSI, MACD, Stochastics, MFI 
Volatility Indicators  ATR, Bollinger Band, OBV 
Local trends. To capture local trends in stock patterns, we employ convolutional neural networks (CNN). By compressing the length of the series of stock features, they help to mitigate the issue of longterm dependencies. As each feature is a onedimensional timeseries, we apply onedimensional filters (1DCNN) over all timesteps:
(1)  $${x}_{l}^{k}={b}_{k}^{l}+conv1D({w}_{ik}^{l1},{s}_{i}^{l1})$$ 
where ${x}_{l}^{k}$ represent the input feature at the ${k}^{th}$ neuron of layer $l$; ${b}_{k}^{l}$ is the corresponding bias; ${w}_{ik}^{l1}$ is the kernel from the ${i}^{th}$ neuron at layer $l1$ to the ${k}^{th}$ neuron at layer $l$; and ${s}_{i}^{l1}$ is the output of the ${i}^{th}$ neuron at layer $l1$.
Temporal LSTM extractor with Distinct Generative Filter. After forwarding the features through the CNNs, we use an LSTM to capture the temporal dependencies, exploiting its ability to memorize longterm information. Given the concatenated feature ${q}_{t}$ of the stocks at time $t$, we feed the feature through the LSTM layer:
(2)  $${h}_{k}=LSTM({x}_{k},{h}_{k1}),tT\le k\le t1$$ 
where ${h}_{k}\in {\mathbb{R}}^{d}$ is the hidden state for day $l$ and d is the hidden state dimension. The specific computation in each LSTM unit includes:
${i}_{t}=\sigma ({W}_{xi}{x}_{t}+{U}_{hi}{h}_{t1}+{b}_{i}),$  ${f}_{t}=\sigma ({W}_{xf}{x}_{t}+{U}_{hf}{h}_{t1}+{b}_{f}),$  
${g}_{t}=\text{tanh}({W}_{xg}{x}_{t}+{U}_{hg}{h}_{t1}+{b}_{g}),$  ${o}_{t}=\sigma ({W}_{xo}{x}_{t}+{U}_{ho}{h}_{t1}+{b}_{o}),$  
${c}_{t}={f}_{t}\odot {c}_{t1}+{i}_{t}\odot {g}_{t},$  ${h}_{t}={o}_{t}\odot \text{tanh}({c}_{t}).$ 
As mentioned, existing approaches apply the same LSTM to the historical data of different stocks, which results in the learned set of filters ($\mathbb{W}=\{{W}_{xi},{W}_{xf},{W}_{xg},{W}_{xo}\},\mathbb{U}=\{{U}_{xi},{U}_{xf},{U}_{xg},{U}_{xo}\}$) representing the average temporal dynamics. This is insufficient to capture each stock’s distinct behaviour (Challenge 2).
A straightforward solution would be to learn and store a set of LSTM filters, one for each stock. Yet, such an approach quickly becomes intractable, especially when the number of stocks is large.
In our model, we overcome this issue by proposing a memorybased mechanism onto the LSTM network to learn the individual patterns per stock, while not expanding the core LSTM. Specifically, we first assign to each stock $i$ a memory ${M}^{i}$ in the form of a learnable mdimensional vector, ${M}^{i}\in {\mathbb{R}}^{m}$. Then, for each entity, we feed the memory through a Distinct Generative Filter, denoted by DGF, to obtain the weights (${\mathbb{W}}^{i},{\mathbb{U}}^{i}$) of the LSTM network for each stock:
(3)  $${\mathbb{W}}^{i},{\mathbb{U}}^{i}=DGF({M}^{i})$$ 
Note that DGF can be any neural network architecture, such as a CNN or an MLP. In our work, we choose a 2layer MLP as DGF, as it is simple yet effective. As the DGF is required to generate a set of eight filters $\{{W}_{xi}^{i},{W}_{xf}^{i},{W}_{xg}^{i},{W}_{xo}^{i},{U}_{xi}^{i},{U}_{xf}^{i},{U}_{xg}^{i},{U}_{xo}^{i}\}$ from ${M}^{i}$, we generate a concatenation of the filters and then obtain the results by splitting. Finally, replacing the common filters by the specific ones for each stock in Eq. 2, we have:
(4)  $${h}_{k}^{i}=LSTM({x}_{k}^{i},{h}_{k1}^{i}\mid {\mathbb{W}}^{i},{\mathbb{U}}^{i}),tT\le k\le t1$$ 
where ${h}_{k}^{i}$ is the hidden feature of each stock $i$.
To increase the efficiency of the LSTM, we apply a temporal attention mechanism to guide the learning process towards important historical features. The attention mechanism attempts to aggregate temporal hidden states ${\widehat{h}}_{k}^{i}=[{h}_{tT}^{i},\mathrm{\dots},{h}_{t1}^{i}]$ from previous days into an overall representation using learned attention weights:
(5)  $$\mu ({\widehat{h}}_{t}^{i})=\sum _{k=(tT)}^{t1}{\alpha}_{k}{h}_{k}^{i}=\sum _{k=(tT)}^{t1}\frac{\mathrm{exp}({h}_{k}^{{i}^{T}})W{\widehat{h}}_{t}^{i}}{{\sum}_{k}\mathrm{exp}({h}_{k}^{{i}^{T}})W{\widehat{h}}_{t}^{i}}{h}_{k}^{i}$$ 
where $W$ is a linear transformation, ${\alpha}_{k}=\frac{\mathrm{exp}({h}_{k}^{{i}^{T}})W{\widehat{h}}_{t}^{i}}{{\sum}_{k}\mathrm{exp}({h}_{k}^{{i}^{T}})W{\widehat{h}}_{t}^{i}}{h}_{k}^{i}$ are the attention weights using softmax. To handle the nonstationary nature of the stock market, we leverage the Hawkes process (Bacry et al., 2015), as suggested for financial timeseries in (Sawhney et al., 2021), to enhance the temporal attention mechanism in Eq. 5. The Hawkes process is a “selfexciting” temporal point process, where some random event “excites” the process and increases the chance of a subsequent other random event (e.g., a crises or policy change). To realize the Hawke process, the attention mechanism also learns an excitation parameter ${\u03f5}_{k}$ of the day $k$ and a corresponding decay parameter $\gamma $:
(6)  $$\widehat{\mu}({\widehat{h}}_{t}^{i})=\sum _{k=(tT)}^{t1}{\alpha}_{k}{h}_{k}^{i}+{\u03f5}_{k}\mathrm{max}({\alpha}_{k}{h}_{k}^{i},0)\mathrm{exp}(\gamma {\alpha}_{k}{h}_{k}^{i})$$ 
Finally, we concatenate the extracted temporal feature ${z}_{i}=\widehat{\mu}({\widehat{h}}_{t}^{i})$ of each stock to form ${\mathbf{Z}}_{\mathbf{T}}\in {\mathbb{R}}^{n\times d}$, where $n$ is the number of stocks and $d$ is the embedding dimension.
4. Highorder market learning with wavelet hypergraph attentions
To model the groupwise relations between stocks, we aggregate the learned temporal patterns of each stock over a hypergraph that represents multiorder relations of the market.
Industry hypergraph. To model the interdependence between stocks, we first initialize a hypergraph based on the industry of the respective companies. Mathematically, the industry hypergraph is denoted as ${\mathbb{G}}_{i}=(S,{E}_{i},{w}_{i})$, where $S$ is the set of stocks and ${E}_{i}$ is the set of hyperedges; each hyperedge ${e}_{i}\in {E}_{i}$ connects the stocks that belong to the same industry. The hyperedge ${e}_{i}$ is also assigned a weight ${w}_{i}$ that reflects the importance of the industry, which we derive from the market capital of all related stocks.
Price correlation augmentation. Following the Efficient Market Hypothesis (Malkiel, 1989), fundamentally correlated stocks maintain similar price patterns, which can be used to reveal the missing endogenous relations in addition to the industry assignment. To this end, for the start of each training and testing period, we calculate the price correlation between the stocks using the historical price of the last 1year period. We employ the leadlag correlation and the clustering method proposed in (Bennett et al., 2021) to simulate the lag of the stock market, where a leading stock affects the trend of the rests. Then, we form hyperedges from the resulting clusters and add them to ${E}_{i}$. The hyperedge weight is, again, derived from the total market capital of the related stocks. We denote the augmented hypergraph by $\mathbb{G}=(\text{A},\text{W})$, with A and W being the hypergraph incidence matrix and the hyperedge weights, respectively.
Wavelet Hypergraph Convolution. To aggregate the extracted temporal information of the individual stocks, we develop a hypergraph convolution mechanism on the obtained hypergraph $\mathbb{G}$, which consists of multiple convolution layers. At each layer $l$, the latent representations of the stocks in the previous layer ${X}^{(l1)}$ are aggregated by a convolution operator HConv(·) using the topology of $\mathbb{G}=(\text{A},\text{W})$ to generate the current layer representations ${X}^{l}$:
(7)  $${\mathbf{Z}}^{(\mathbf{l})}=\mathrm{HConv}({\mathbf{Z}}^{(\mathbf{l}\mathrm{\U0001d7cf})},\mathbf{A},\mathbf{W},\mathbf{P})$$ 
where ${\mathbf{X}}^{\mathbf{l}}\in {\mathbb{R}}^{n\times {d}^{l}}$ and ${\mathbf{X}}^{\mathbf{l}\mathrm{\U0001d7cf}}\in {\mathbb{R}}^{n\times {d}^{l1}}$ with $n$ being the number of stocks and ${d}^{l1},{d}^{l}$ as the dimension of the layerwise latent feature; $\mathbf{P}$ is a learnable weight matrix for the layer. Following (Yadati et al., 2019), the convolution process requires the calculation of the hypergraph Laplacian $\mathbf{\Delta}$, which serves as a normalized presentation of $\mathbb{G}$:
(8)  $$\mathbf{\Delta}=\mathbf{I}\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{\frac{1}{2}}\mathrm{\mathbf{A}\mathbf{W}\mathbf{D}}_{\mathbf{e}}^{}{}_{}{}^{1}{\mathbf{A}}^{T}\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{1/2}$$ 
where ${\mathbf{D}}_{\mathbf{v}}$ and ${\mathbf{D}}_{\mathbf{s}}$ are the diagonal matrices containing the vertex and hyperedge degrees, respectively. For later usage, we denote $\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{\frac{1}{2}}\mathrm{\mathbf{A}\mathbf{W}\mathbf{D}}_{\mathbf{e}}^{}{}_{}{}^{1}{\mathbf{A}}^{T}\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{1/2}$ by $\mathbf{\Theta}$. As $\mathbf{\Delta}$ is a ${\mathbb{R}}^{n\times n}$ positive semidefinite matrix, it can be diagonalized as: $\mathbf{\Delta}=\mathbf{U}\mathbf{\Lambda}{\mathbf{U}}^{\mathbf{T}}$, where $\mathbf{\Lambda}=\mathrm{\mathbf{d}\mathbf{i}\mathbf{a}\mathbf{g}}({\lambda}_{0},\mathrm{\dots},{\lambda}_{k})$ is the diagonal matrix of nonnegative eigenvalues and $\mathbf{U}$ is the set of orthonormal eigenvectors.
Existing work leverages the Fourier basis (Yadati et al., 2019) for this factorization process. However, using the Fourier basis has two disadvantages: (i) the localization during convolution process is not wellmaintained (Xu et al., 2019), and (ii) it requires the direct eigendecomposition of the Laplacian matrix, which is costly for a complex hypergraph, such as those faced when modelling stock markets (Challenge 1). We thus opt to rely on the wavelet basis (Xu et al., 2019), for two reasons: (i) the wavelet basis represents the information diffusion process (Sun et al., 2021), which naturally implements localized convolutions of the vertex at each layer, and (ii) the wavelet basis is much sparser than the Fourier basis, which enables more efficient computation.
Applying the wavelet basis, let ${\mathbf{\Psi}}_{\mathbf{s}}={\mathbf{U}}_{\mathbf{s}}{\mathbf{\Lambda}}_{\mathbf{s}}{\mathbf{U}}_{\mathbf{s}}^{\mathbf{T}}$ be a set of wavelets with scaling parameter $s$. Then, we have ${\mathbf{\Lambda}}_{\mathbf{s}}=\mathrm{\mathbf{d}\mathbf{i}\mathbf{a}\mathbf{g}}({e}^{{\lambda}_{0}s},\mathrm{\dots},{e}^{{\lambda}_{k}s})$ as the heat kernel matrix. The hypergraph convolution process for each vertex $t$ is computed by:
(9)  $$HConv({x}_{t},y)=({\mathbf{\Psi}}_{\mathbf{s}}{({\mathbf{\Psi}}_{\mathbf{s}})}^{1})\odot {({\mathbf{\Psi}}_{\mathbf{s}})}^{1}y)=\mathbf{\Psi}{}_{\mathbf{s}}\mathbf{\Lambda}{}_{\mathbf{s}}({\mathbf{\Psi}}_{\mathbf{s}}){}^{1}x{}_{t}$$ 
where $y$ is the filter and ${({\mathbf{\Psi}}_{\mathbf{s}})}^{1}y$ is its corresponding spectral transformation. Based on the StoneWeierstrass theorem (Xu et al., 2019), the graph wavelet $({\mathbf{\Psi}}_{\mathbf{s}})$ can be polynomially approximated by:
(10)  $${\mathbf{\Psi}}_{\mathbf{s}}\approx \sum _{k=0}^{K}{\alpha}_{k}{(\mathbf{\Delta})}^{k}=\sum _{k=0}^{K}{\theta}_{k}{(\mathbf{\Theta})}^{k}$$ 
where $K$ is the polynomial order of the approximation.
The approximation facilitates the calculation of ${\mathbf{\Psi}}_{\mathbf{s}}$ without the eigendecomposition of $\mathbf{\Delta}$. Applying it to Eq. 10 and Eq. 7 and choosing LeakyReLU (Agarap, 2018) as the activation function, we have:
(11)  $${\mathbf{Z}}_{\mathbf{H}}^{(\mathbf{l})}=LReLU\sum _{k=0}^{K}({(\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{\frac{1}{2}}\mathrm{\mathbf{A}\mathbf{W}\mathbf{D}}_{\mathbf{e}}^{}{}_{}{}^{1}{\mathbf{A}}^{T}\mathbf{D}_{\mathbf{v}}^{}{}_{}{}^{1/2})}^{k}{\mathbf{Z}}_{\mathbf{H}}^{(\mathbf{l}\mathrm{\U0001d7cf})}\mathbf{P})$$ 
To capture the varying degree of influence each relation between stocks on the temporal price evolution of each stock, we also employ an attention mechanism (Sawhney et al., 2021). This mechanism learns to adaptively weight each hyperedge associated with a stock based on its temporal features. For each node ${v}_{i}\in S$ and its associated hyperedge ${e}_{j}\in E$, we compute an attention coefficient ${\widehat{\mathbf{A}}}_{ij}$ using the stock’s temporal feature ${x}_{i}$ and the aggregated hyperedge features ${x}_{j}$, quantifying how important the corresponding relation ${e}_{j}$ is to the stock ${v}_{i}$:
(12)  $${\widehat{\mathbf{A}}}_{ij}=\frac{\mathrm{exp}(LReLU(\widehat{a}[{\mathbf{P}}_{{x}_{i}}\parallel {\mathbf{P}}_{{x}_{j}}]))}{{\sum}_{k\in {N}_{i}}\mathrm{exp}(LReLU(\widehat{a}[{\mathbf{P}}_{{x}_{i}}\parallel {\mathbf{P}}_{{x}_{j}}]))}$$ 
where $\widehat{a}$ is a singlelayer feed forward network, $\parallel $ is concatenation operator and $\mathbf{P}$ represents a learned linear transform. ${N}_{i}$ is the neighbourhood set of the stock ${x}_{i}$, which is derived from the constructed hypergraph $\mathbb{G}$. The attentionbased learned hypergraph incidence matrix $\widehat{\mathbf{A}}$ is then used instead of the original $\mathbf{A}$ in Eq. 11 to learn intermediate representations of the stocks. The representation of the hypergraph is denoted by ${\mathbf{Z}}_{\mathbf{H}}$, which is concatenated with the temporal feature ${\mathbf{Z}}_{\mathbf{T}}$ to maintain the stock individual characteristic (Challenge 2), which then goes through the MLP for dimension reduction to obtain the final prediction:
(13)  $$\mathbf{Z}=MLP({\mathbf{Z}}_{\mathbf{T}}\parallel {\mathbf{Z}}_{\mathbf{H}})$$ 
Finally, we use the popular root mean squared error (RMSE) to directly encourage the output $\mathbf{X}$ to capture the actual relative price change in the short term ${d}_{s}^{(t,t+w)}$ of each stock $s$, with $w$ being the lookahead window size (with a default value of five).
5. Empirical Evaluation
Model  Phase #1  Phase #2  Phase #3  Phase #4  Phase #5  Phase #6  Phase #7  Phase #8  Phase #9  Phase #10  Phase #11  Phase #12  Mean  

Return  LSTM  0.064  0.057  0.028  0.058  0.036  0.032  0.059  0.139  0.125  0.100  0.062  0.008  0.036 
ALSTM  0.056  0.043  0.022  0.053  0.009  0.068  0.036  0.121  0.115  0.097  0.066  0.009  0.026  
HATS  0.102  0.031  0.003  0.042  0.062  0.067  0.092  0.074  0.188  0.132  0.063  0.059  0.054  
LSTMRGCN  0.089  0.051  0.005  0.006  0.077  0.019  0.088  0.121  0.155  0.107  0.038  0.032  0.039  
RSR  0.065  0.043  0.009  0.016  0.014  0.007  0.059  0.113  0.089  0.056  0.038  0.052  0.014  
STHANSR  0.108  0.074  0.024  0.016  0.052  0.085  0.090  0.105  0.158  0.107  0.058  0.008  0.055  
HIST  0.080  0.020  0.020  0.030  0.050  0.010  0.030  0.020  0.200  0.100  0.020  0.050  0.022  
ESTIMATE  0.109  0.080  0.025  0.105  0.051  0.135  0.149  0.124  0.173  0.065  0.147  0.057  0.102  
IC  LSTM  0.014  0.030  0.016  0.006  0.020  0.034  0.006  0.014  0.002  0.039  0.022  0.023  0.009 
ALSTM  0.024  0.025  0.025  0.009  0.029  0.018  0.033  0.024  0.045  0.046  0.016  0.015  0.007  
HATS  0.013  0.011  0.006  0.005  0.018  0.029  0.027  0.002  0.010  0.017  0.028  0.012  0.002  
LSTMRGCN  0.019  0.020  0.024  0.021  0.005  0.021  0.032  0.035  0.086  0.043  0.005  0.030  0.009  
RSR  0.008  0.009  0.003  0.017  0.009  0.018  0.011  0.005  0.036  0.018  0.058  0.003  0.007  
STHANSR  0.025  0.015  0.016  0.029  0.000  0.018  0.022  0.000  0.010  0.009  0.007  0.013  0.000  
HIST  0.003  0.000  0.005  0.010  0.006  0.008  0.005  0.017  0.006  0.009  0.011  0.006  0.003  
ESTIMATE  0.037  0.080  0.153  0.010  0.076  0.080  0.080  0.011  0.127  0.166  0.010  0.131  0.080  
Rank_IC  LSTM  0.151  0.356  0.289  0.089  0.186  1.091  0.151  0.201  0.019  0.496  0.259  0.397  0.185 
ALSTM  0.211  0.266  0.409  0.099  0.182  0.289  0.476  0.243  0.242  0.323  0.094  0.174  0.096  
HATS  0.169  0.156  0.139  0.063  0.408  0.517  0.333  0.032  0.085  0.344  0.547  0.135  0.060  
LSTMRGCN  0.271  0.210  0.223  0.152  0.035  0.279  0.261  0.273  0.416  0.329  0.036  0.354  0.110  
RSR  0.151  0.159  0.051  0.213  0.107  0.282  0.135  0.090  0.292  0.175  0.541  0.040  0.056  
STHANSR  0.690  0.357  0.365  0.714  0.008  0.369  0.523  0.005  0.265  0.169  0.141  0.276  0.006  
HIST  0.085  0.008  0.125  0.225  0.192  0.204  0.107  0.328  0.174  0.256  0.215  0.157  0.080  
ESTIMATE  0.386  0.507  1.613  0.059  0.284  0.585  0.412  0.062  0.704  0.936  0.054  0.595  0.516  
ICIR  LSTM  0.010  0.036  0.007  0.010  0.016  0.038  0.004  0.010  0.011  0.041  0.021  0.010  0.006 
ALSTM  0.057  0.041  0.030  0.012  0.033  0.015  0.028  0.034  0.057  0.053  0.009  0.002  0.009  
HATS  0.023  0.014  0.010  0.001  0.016  0.033  0.035  0.020  0.023  0.005  0.042  0.034  0.003  
LSTMRGCN  0.033  0.012  0.022  0.027  0.009  0.028  0.047  0.051  0.085  0.054  0.006  0.039  0.012  
RSR  0.031  0.018  0.005  0.033  0.009  0.029  0.001  0.007  0 .019  0.017  0.072  0.031  0.010  
STHANSR  0.018  0.011  0.016  0.021  0.005  0.016  0.023  0.008  0.003  0.004  0.009  0.007  0.002  
HIST  0.004  0.002  0.006  0.001  0.007  0.001  0.005  0.014  0.009  0.009  0.021  0.010  0.004  
ESTIMATE  0.033  0.081  0.148  0.032  0.076  0.103  0.064  0.058  0.103  0.142  0.020  0.098  0.080  
Rank_ICIR  LSTM  0.100  0.364  0.110  0.140  0.141  0.984  0.094  0.168  0.117  0.525  0.227  0.172  0.114 
ALSTM  0.423  0.344  0.415  0.134  0.202  0.192  0.313  0.343  0.306  0.377  0.047  0.019  0.098  
HATS  0.234  0.155  0.179  0.013  0.221  0.511  0.340  0.270  0.194  0.076  0.525  0.372  0.033  
LSTMRGCN  0.387  0.111  0.197  0.170  0.058  0.313  0.284  0.326  0.354  0.318  0.034  0.427  0.109  
RSR  0.378  0.353  0.053  0.285  0.065  0.326  0.010  0.081  0.114  0.131  0.428  0.288  0.068  
STHANSR  0.435  0.211  0.310  0.573  0.107  0.386  0.478  0.271  0.068  0.075  0.230  0.149  0.056  
HIST  0.079  0.044  0.144  0.020  0.205  0.018  0.122  0.289  0.236  0.229  0.356  0.209  0.080  
ESTIMATE  0.315  0.446  1.344  0.178  0.307  0.587  0.329  0.311  0.541  0.885  0.100  0.488  0.486  
Prec@N  LSTM  0.542  0.553  0.581  0.471  0.456  0.569  0.440  0.547  0.588  0.615  0.554  0.608  0.544 
ALSTM  0.583  0.585  0.550  0.471  0.514  0.575  0.431  0.556  0.650  0.518  0.497  0.627  0.546  
HATS  0.624  0.651  0.597  0.495  0.551  0.642  0.532  0.619  0.542  0.550  0.529  0.690  0.585  
LSTMRGCN  0.565  0.589  0.600  0.505  0.505  0.628  0.522  0.583  0.517  0.538  0.566  0.592  0.559  
RSR  0.587  0.531  0.608  0.455  0.465  0.619  0.473  0.608  0.553  0.618  0.590  0.676  0.565  
STHANSR  0.554  0.614  0.573  0.463  0.562  0.611  0.530  0.575  0.553  0.626  0.534  0.510  0.559  
HIST  0.691  0.625  0.476  0.512  0.452  0.561  0.548  0.634  0.463  0.615  0.395  0.605  0.548  
ESTIMATE  0.619  0.631  0.673  0.524  0.540  0.739  0.568  0.669  0.611  0.679  0.547  0.724  0.627 
In this section, we empirically evaluate our framework based on four research questions, as follows:

(RQ1)
Does our model outperform the baseline methods?

(RQ2)
What is the influence of each model component?

(RQ3)
Can our model be interpreted in a qualitative sense?

(RQ4)
Is our model sensitive to hyperparameters?
Below, we first describe the experimental setting (§ 5.1). We then present our empirical evaluations, including an endtoend comparison (§ 5.2), a qualitative study (§ 5.4), an ablation test (§ 5.3), and an examination of the hyperparameter sensitivity (§ 5.5).
5.1. Setting
Datasets. We evaluate our approach based on the US stock market. We gathered historic price data and the information about industries in the S&P 500 index from the Yahoo Finance database (Yahoo Finance, 2022), covering 2016/01/01 to 2022/05/01 (1593 trading days). Overall, while the market witnessed an upward trend in this period, it also experienced some considerable correction in 2018, 2020, and 2022. We split the data of this period into 12 phases with varying degrees of volatility, with the period between two consecutive phases being 163 days. Each phase contains 10 month of training data, 2 month of validation data, and 6 month of testing data (see Fig. 3).
Metrics. We adopt the following evaluation metrics: To thoroughly evaluate the performance of the techniques, we employ the following metrics:

•
Return: is the estimated profit/loss ratio that the portfolio achieves after a specific period, calculated by $N{V}_{e}/N{V}_{s}1$, with $N{V}_{s}$ and $N{V}_{e}$ being the net asset value of the portfolio before and after the period.

•
Information Coefficient (IC): is a coefficient that shows how close the prediction is to the actual result, computed by the average Pearson correlation coefficient.

•
Information ratio based IC (ICIR): The information ratio of the IC metric, calculated by $ICIR=mean(IC)/std(IC)$

•
Rank Information Coefficient (Rank_IC): is the coefficient based on the ranking of the stocks’ shortterm profit potential, computed by the average Spearman coefficient (Myers and Sirois, 2004).

•
Rank_ICIR: Information ratio based Rank_IC (ICIR): The information ratio of the Rank_IC metric, calculated by:
$rank\mathrm{\_}ICIR=mean(rank\mathrm{\_}IC)/std(rank\mathrm{\_}IC)$ 
•
Prec@N: evaluates the precision of the top N shortterm profit predictions from the model. This way, we assess the capability of the techniques to support investment decisions.
Baselines. We compared the performance of our technique with that of several stateoftheart baselines, as follows:

•
LSTM: (Hochreiter and Schmidhuber, 1997) is a traditional baseline which leverages a vanilla LSTM on temporal price data.

•
ALSTM: (Feng et al., 2019a) is a stock movement prediction framework that integrates the adversarial training and stochasticity simulation in an LSTM to better learn the market dynamics.

•
HATS: (Kim et al., 2019) is a stock prediction framework that models the market as a classic heterogeneous graph and propose a hierarchical graph attention network to learn a stock representation to classify nextday movements.

•
LSTMRGCN: (Li et al., 2020) is a graphbased prediction framework that constructs the connection among stocks with their price correlation matrix and learns the spatiotemporal relations using a GCNbased encoderdecoder architecture.

•
RSR: (Feng et al., 2019b) is a stock prediction framework that combines Temporal Graph Convolution with LSTM to learn the stocks’ relations in a timesensitive manner.

•
HIST: (Xu et al., 2021) is a graphbased stock trend forecasting framework that follows the encoderdecoder paradigm in attempt to capture the shared information between stocks from both predefined concepts as well as revealing hidden concepts.

•
STHANSR: (Sawhney et al., 2021) is a deep learningbased framework that also models the complex relation of the stock market as a hypergraph and employs vanilla hypergraph convolution to learn directly the stock shortterm profit ranking.
Trading simulation. We simulate a trading portfolio using the output prediction of the techniques. At each timestep, the portfolio allocates an equal portion of money for k stocks, as determined by the prediction. We simulate the risk control by applying a trailing stop level of 7% and profit taking level of 20% for all positions. We ran the simulation 1000 times per phase and report average results.
Reproducibility environment. All experiments were conducted on an AMD Ryzen ThreadRipper 3.8 GHz system with 128 GB of main memory and four RTX 3080 graphic cards. We used Pytorch for the implementation and Adam as gradient optimizer.
5.2. Endtoend comparisons
To answer research question RQ1, we report in Table 2 an endtoend comparison of our approach (ESTIMATE) against the baseline methods. We also visualize the average accumulated return of the baselines and the S&P 500 index during all 12 phases in Fig. 4.
In general, our model outperforms all baseline methods across all datasets in terms of Return, IC, Rank_IC and Prec@10. Our technique consistently achieves a positive return and an average Prec@10 of 0.627 over all 12 phases; and performs significant better than the S&P 500 index with higher overall return. STHANSR is the best method among the baselines, yielding a high Return in some phases (#1, #2, #5, #10). This is because STHANSR, similar to our approach, models the multiorder relations between the stocks using a hypergraph. However, our technique still outperforms STHANSR by a considerable margin for the other periods, including the ranking metric Rank_IC even though our technique does not aim to learn directly the stock rank, like STHANSR.
Among the other baseline, models with a relational basis like RSR, HATS, and LSTMRGCN outperform vanilla LSTM and ALSTM models. However, the gap between classic graphbased techniques like HATS and LSTMRGCN is small. This indicates that the complex relations in a stock market shall be modelled. An interesting finding is that all the performance of the techniques drops significantly during phase #3 and phase #4, even though the market moves sideways. This observation highlights issues of prediction algorithm when there is no clear trend for the market.
The training times for these techniques are shown in Fig. 5, where we consider the size of training set ranging from 40 to 200 days. As expected, the graphbased techniques (HATS, LSTMRGCN, RSR, STHANR, HIST and ESTIMATE) are slower than the rest, due to the tradeoff between accuracy and computation time. Among the graphbased techniques, there is no significant difference between the techniques using classic graphs (HATS, LSTMRGCN) and those using hypergraphs (ESTIMATE, STHANR). Our technique ESTIMATE is faster than STHANR by a considerable margin and is one of the fastest among the graphbased baselines, which highlights the efficiency of our wavelet convolution scheme compared to the traditional Fourier basis.
5.3. Ablation Study
To answer question RQ2, we evaluated the importance of individual components of our model by creating four variants: (EST1) This variant does not employ the hypergraph convolution, but directly uses the extracted temporal features to predict the shortterm trend of stock prices. (EST2) This variant does not employ the generative filters, but relies on a common attention LSTM by existing approaches. (EST3) This variant does not apply the pricecorrelation based augmentation, as described in § 4. It employs solely the industrybased hypergraph as input. (EST4) This variant does not employ the wavelet basis for hypergraph convolution, as introduced in § 4. Rather, the traditional Fourier basis is applied.
Metric  ESTIMATE  EST1  EST2  EST3  EST4 

Return  0.102  0.024  0.043  0.047  0.052 
IC  0.080  0.013  0.020  0.033  0.020 
RankIC  0.516  0.121  0.152  0.339  0.199 
Prec@N  0.627  0.526  0.583  0.603  0.556 
Table 3 presents the results for several evaluation metrics, averaged over all phases due to space constraints. We observe that our full model ESTIMATE outperforms the other variants, which provides evidence for the the positive impact of each of its components. In particular, it is unsurprising that the removal of the relations between stocks leads to a significant degradation of the final result (approximately 75% of the average return) in EST1. A similar drop of the average return can be seen for EST2 and EST3, which highlights the benefits of using generati +ve filters over a traditional single LSTM temporal extractor (EST2); and of the proper construction of the representative hypergraph (EST3). Also, the full model outperforms the variant EST4 by a large margin in every metric. This underlines the robustness of the convolution with the wavelet basis used in ESTIMATE over the traditional Fourier basis that is used in existing work.
5.4. Qualitative Study
We answer research question RQ3 by visualizing in Fig. 6 the prediction results of our technique ESTIMATE for the technology stocks APPL and META from 01/10/2021 to 01/05/2022. We also compare ESTIMATE’s performance to the variant that does not consider relations between stocks (EST1) and the variant that does not employ temporal generative filters (EST2). This way, we illustrate how our technique is able to handle Challenge 1 and Challenge 2.
The results indicate that modelling the complex multiorder dynamics of a stock market (Challenge 1) helps ESTIMATE and EST2 to correctly predict the downward trend of technology stocks around the start of 2022; while the prediction of EST1, which uses the temporal patterns of each stock, suffers from a significant delay. Also, the awareness of internal dynamics of ESTIMATE due to the usage of generative filters helps our technique to differentiate the trend observed for APPL from the one of META, especially at the start of the correction period in January 2022.
5.5. Hyperparameter sensitivity
This experiment addresses question RQ4 on the hyperparameter sensitivity. Due to space limitations, we focus on the most important hyperparameters. The backtesting period of this experiment is set from 01/07/2021 to 01/05/2022 for the same reason.
Lookback window length T. We analyze the prediction performance of ESTIMATE when varying the length T of the lookback window in Fig. 7. It can be observed that the best window length is 20, which coincides with an important length commonly used by professional analyses strategies (Adam et al., 2016). The performance drops quickly when the window length is less or equal than 10, due to the lack of information. On the other hand, the performance also degrades when the window length increases above 25. This shows that even when using an LSTM to mitigate the vanishing gradient issue, the model cannot handle very long sequences.
Lookahead window length w. We consider different lengths w of the lookahead window (Fig. 7) and observe that ESTIMATE achieves the best performance for a window of length 5. The results degrade significantly when w exceeds 10. This shows that our model performs well for shortterm prediction, it faces issues when considering a longterm view.
Number of selected topk stocks. We analyze the variance in the profitability depending on the number of selected topk stocks from the ranking in Fig. 7. We find that ESTIMATE performs generally well, while the best results are obtained for k = 10.
6. Related Work
Traditional Stock Modelling. Traditional techniques often focus on numerical features (Liou et al., 2021; Ruiz et al., 2012), referred to as technical indicators, such as the Moving Average (MA) or the Relative Strength Index (RSI). The features are combined with classical timeseries models, such as ARIMA (Piccolo, 1990), to model the stock movement (Ariyo et al., 2014). However, such techniques often require the careful engineering by experts to identify effective indicator combinations and thresholds. Yet, these configurations are often not robust against market changes.
For individual traders, they often engineer rules based on a specific set of technical indicators (which indicate the trading momentum) to find the buying signals. For instance, a popular strategy is to buy when the moving average of length 5 (MA5) cross above the moving average of length 20 (MA20), and the Moving Average Convergence Divergence (MACD) is positive. However, the engineering of the specific features requires the extensive expertise and experiment of the traders in the market. Also, the traders must continuously tune and backtest the strategy, as the optimized strategy is not only different from the markets but also keeps changing to the evolving of the market. Last but not least, it is exhaustive for the trader to keep track of a large number of indicators of a stock, as well as the movement of multiple stocks in real time. Due to these drawbacks, quantitative trading with the aid of the AI emerges in recent years, especially with the advances of deep neural network (DNN).
DNNbased Stock Modelling. Recent techniques leverage advances in deep learning to capture the nonlinear temporal dynamics of stock prices through highlevel latent features (Ren and Malik, 2019; Huynh et al., 2021; Bendre et al., 2021). Earlier techniques following this paradigm employ recurrent neural networks (RNN) (Nelson et al., 2017) or convolutional neural networks (CNN) (Tsantekidis et al., 2017) to model a single stock price and predict its shortterm trend. Other works employ deep reinforcement learning (DRL), the combination of deep learning with reinforcement learning (RL), a subfield of sequential decisionmaking. For instance, the quantitative trading problem can be formulated as a Markov decision process (Liu et al., 2021) and be addressed by wellknown DRL algorithms (e.g. DQN, DDPG (Lillicrap et al., 2016)). However, these techniques treat the stocks independently and lack a proper scheme to consider the complex relations between stocks in the market.
Graphbased Stock Modelling. Some stateoftheart techniques address the problem of correlations between stocks and propose graphbased solutions to capture the interstock relations. For instance, the market may be modelled as a heterogeneous graph with different type of pairwise relations (Kim et al., 2019), which is then used in an attentionbased graph convolution network (GCN) (Duong et al., 2023; Nguyen et al., 2022b, 2021b, 2020; Tam et al., 2021; Trung et al., 2020) to predict the stock price and market index movement. Similarly, a market graph may be constructed and an augmented GCN with temporal convolutions can be employed to learn, at the same time, the stock movement and stock relation evolution (Li et al., 2020). The most recent techniques (Feng et al., 2019b; Sawhney et al., 2021) are based on the argument that the stock market includes multiorder relations, so that the market should be modelled using hypergraphs. Specifically, external knowledge from knowledge graphs enables the construction of a market hypergraph (Sawhney et al., 2021), which is used in a spatiotemporal attention hypergraph network to learn interdependences of stocks and their evolution. Then, a ranking of stocks based on shortterm profit is derived.
Different from previous work, we propose a market analysis framework that learns the complex multiorder correlation of stocks derived from a hypergraph representation. We go beyond the state of the art ((Feng et al., 2019b; Sawhney et al., 2021)) by proposing temporal generative filters that implement a memorybased mechanism to recognize better the individual characteristics of each stock, while not overparameterizing the core LSTM model. Also, we propose a new hypergraph attention convolution scheme that leverages the wavelet basis to mitigate the high complexity and dispersed localization faced in previous hypergraphbased approaches.
7. Conclusion
In this paper, we address two unique characteristics of the stock market prediction problem: (i) multiorder dynamics which implies strong nonpairwise correlations between the price movement of different stocks, and (ii) internal dynamics where each stock maintains its own dedicated behaviour. We propose ESTIMATE, a stock recommendation framework that supports learning of the multiorder correlation of the stocks (i) and their individual temporal patterns (ii), which are then encoded in node embeddings derived from hypergraph representations. The framework provides two novel mechanisms: First, temporal generative filters are incorporated as a memorybased shared parameter LSTM network that facilitates learning of temporal patterns per stock. Second, we presented attention hypergraph convolutional layers using the wavelet basis, i.e., a convolution paradigm that relies on the polynomial wavelet basis to simplify the message passing and focus on the localized convolution.
Extensive experiments on realworld data illustrate the effectiveness of our techniques and highlight its applicability in trading recommendation. Yet, the experiments also illustrate the impact of concept drift, when the market characteristics change from the training to the testing period. In future work, we plan to tackle this issue by exploring timeevolving hypergraphs with the ability to memorize distinct periods of past data and by incorporating external data sources such as earning calls, fundamental indicators, news data (Nguyen et al., 2019b, a, 2015), social networks (Nguyen et al., 2022a; Tam et al., 2019; Tam et al., 2017; Nguyen et al., 2021a), and crowd signals (Nguyen et al., 2013; Hung et al., 2017, 2013).
References
 (1)
 Adam et al. (2016) Klaus Adam, Albert Marcet, and Juan Pablo Nicolini. 2016. Stock market volatility and learning. The Journal of finance 71, 1 (2016), 33–82.
 Agarap (2018) Abien Fred Agarap. 2018. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375 (2018).
 Ariyo et al. (2014) Adebiyi A Ariyo, Adewumi O Adewumi, and Charles K Ayo. 2014. Stock price prediction using the ARIMA model. In UKSIM. 106–112.
 Bacry et al. (2015) Emmanuel Bacry, Iacopo Mastromatteo, and JeanFrançois Muzy. 2015. Hawkes processes in finance. Market Microstructure and Liquidity 1, 01 (2015), 1550005.
 Bendre et al. (2021) Mangesh Bendre, Mahashweta Das, Fei Wang, and Hao Yang. 2021. GPR: Global Personalized Restaurant Recommender System Leveraging Billions of Financial Transactions. In WSDM. 914–917.
 Bennett et al. (2021) Stefanos Bennett, Mihai Cucuringu, and Gesine Reinert. 2021. Detection and clustering of leadlag networks for multivariate time series with an application to financial markets. In MiLeTS. 1–12.
 Chen et al. (2015) Kai Chen, Yi Zhou, and Fangyan Dai. 2015. A LSTMbased method for stock returns prediction: A case study of China stock market. In Big Data. 2823–2824.
 Duong et al. (2023) Chi Thang Duong, Thanh Tam Nguyen, TrungDung Hoang, Hongzhi Yin, Matthias Weidlich, and Quoc Viet Hung Nguyen. 2023. Deep MinCut: Learning Node Embeddings from Detecting Communities. Pattern Recognition 133 (2023), 1–12.
 Feng et al. (2019a) Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, and TatSeng Chua. 2019a. Enhancing Stock Movement Prediction with Adversarial Training. In IJCAI. 5843–5849.
 Feng et al. (2019b) Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and TatSeng Chua. 2019b. Temporal Relational Ranking for Stock Prediction. ACM Trans. Inf. Syst. 37, 2 (2019).
 Github (2022) Github. 2022. {https://github.com/thanhtrunghuynh93/estimate}
 Gu et al. (2020) Yuechun Gu, Da Yan, Sibo Yan, and Zhe Jiang. 2020. Price forecast with highfrequency finance data: An autoregressive recurrent neural network model with technical indicators. In CIKM. 2485–2492.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Shortterm Memory. Neural computation 9 (1997), 1735–80.
 Hu et al. (2018) Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and TieYan Liu. 2018. Listening to chaotic whispers: A deep learning framework for newsoriented stock trend prediction. In WSDM. 261–269.
 Hung et al. (2013) Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Lam Ngoc Tran, and Karl Aberer. 2013. An evaluation of aggregation techniques in crowdsourcing. In WISE. 1–15.
 Hung et al. (2017) Nguyen Quoc Viet Hung, Huynh Huu Viet, Nguyen Thanh Tam, Matthias Weidlich, Hongzhi Yin, and Xiaofang Zhou. 2017. Computing crowd consensus with partial agreement. IEEE Transactions on Knowledge and Data Engineering 30, 1 (2017), 1–14.
 Huynh et al. (2021) Thanh Trung Huynh, Chi Thang Duong, Tam Thanh Nguyen, Vinh Van Tong, Abdul Sattar, Hongzhi Yin, and Quoc Viet Hung Nguyen. 2021. Network alignment with holistic embeddings. TKDE (2021).
 Kim et al. (2019) Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, and Jaewoo Kang. 2019. Hats: A hierarchical graph attention network for stock movement prediction. arXiv preprint arXiv:1908.07999 (2019).
 Li et al. (2020) Wei Li, Ruihan Bao, Keiko Harimoto, Deli Chen, Jingjing Xu, and Qi Su. 2020. Modeling the Stock Relation with Graph Network for Overnight Stock Movement Prediction. In IJCAI. 4541–4547.
 Lillicrap et al. (2016) Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. ICLR (2016).
 Liou et al. (2021) YiTing Liou, ChungChi Chen, TsunHsien Tang, HenHsen Huang, and HsinHsi Chen. 2021. FinSense: an assistant system for financial journalists and investors. In WSDM. 882–885.
 Liu et al. (2021) XiaoYang Liu, Hongyang Yang, Jiechao Gao, and Christina Dan Wang. 2021. FinRL: deep reinforcement learning framework to automate trading in quantitative finance. In ICAIF. 1–9.
 Malkiel (1989) Burton G Malkiel. 1989. Efficient market hypothesis. In Finance. Springer, 127–134.
 Myers and Sirois (2004) Leann Myers and Maria J Sirois. 2004. Spearman correlation coefficients, differences between. Encyclopedia of statistical sciences 12 (2004).
 Nelson et al. (2017) David MQ Nelson, Adriano CM Pereira, and Renato A De Oliveira. 2017. Stock market’s price movement prediction with LSTM neural networks. In IJCNN. 1419–1426.
 New York Times (2022) New York Times. 2022. {https://www.nytimes.com/2022/04/26/business/stockmarkettoday.html}
 Nguyen et al. (2013) Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Ngoc Tran Lam, and Karl Aberer. 2013. Batc: a benchmark for aggregation techniques in crowdsourcing. In SIGIR. 1079–1080.
 Nguyen et al. (2015) Thanh Tam Nguyen, Quoc Viet Hung Nguyen, Matthias Weidlich, and Karl Aberer. 2015. Result selection and summarization for Web Table search. In 2015 IEEE 31st International Conference on Data Engineering. 231–242.
 Nguyen et al. (2020) Tam Thanh Nguyen, Thanh Trung Huynh, Hongzhi Yin, Vinh Van Tong, Darnbi Sakong, Bolong Zheng, and Quoc Viet Hung Nguyen. 2020. Entity alignment for knowledge graphs with multiorder convolutional networks. IEEE Transactions on Knowledge and Data Engineering (2020).
 Nguyen et al. (2022a) Thanh Tam Nguyen, Thanh Trung Huynh, Hongzhi Yin, Matthias Weidlich, Thanh Thi Nguyen, Thai Son Mai, and Quoc Viet Hung Nguyen. 2022a. Detecting rumours with latency guarantees using massive streaming data. The VLDB Journal (2022), 1–19.
 Nguyen et al. (2021a) Thanh Toan Nguyen, Thanh Tam Nguyen, Thanh Thi Nguyen, Bay Vo, Jun Jo, and Quoc Viet Hung Nguyen. 2021a. Judo: Justintime rumour detection in streaming social platforms. Information Sciences 570 (2021), 70–93.
 Nguyen et al. (2021b) Thanh Toan Nguyen, Minh Tam Pham, Thanh Tam Nguyen, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Thanh Tho Quan, et al. 2021b. Structural representation learning for network alignment with selfsupervised anchor links. Expert Systems with Applications 165 (2021), 113857.
 Nguyen et al. (2022b) Thanh Tam Nguyen, Thanh Cong Phan, Minh Hieu Nguyen, Matthias Weidlich, Hongzhi Yin, Jun Jo, and Quoc Viet Hung Nguyen. 2022b. Modelagnostic and diverse explanations for streaming rumour graphs. KnowledgeBased Systems 253 (2022), 109438.
 Nguyen et al. (2019a) Thanh Tam Nguyen, Thanh Cong Phan, Quoc Viet Hung Nguyen, Karl Aberer, and Bela Stantic. 2019a. Maximal fusion of facts on the web with credibility guarantee. Information Fusion 48 (2019), 55–66.
 Nguyen et al. (2019b) Thanh Tam Nguyen, Matthias Weidlich, Hongzhi Yin, Bolong Zheng, Quoc Viet Hung Nguyen, and Bela Stantic. 2019b. User guidance for efficient fact checking. PVLDB 12, 8 (2019), 850–863.
 Nourbakhsh et al. (2020) Armineh Nourbakhsh, Mohammad M Ghassemi, and Steven Pomerville. 2020. Spread: Automated financial metric extraction and spreading tool from earnings reports. In WSDM. 853–856.
 Piccolo (1990) Domenico Piccolo. 1990. A distance measure for classifying ARIMA models. Journal of time series analysis 11, 2 (1990), 153–164.
 Ren and Malik (2019) Ke Ren and Avinash Malik. 2019. Investment recommendation system for lowliquidity online peer to peer lending (P2PL) marketplaces. In WSDM. 510–518.
 Ruiz et al. (2012) Eduardo J Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis, and Alejandro Jaimes. 2012. Correlating financial time series with microblogging activity. In WSDM. 513–522.
 Sawhney et al. (2021) Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, Tyler Derr, and Rajiv Ratn Shah. 2021. Stock selection via spatiotemporal hypergraph attention network: A learning to rank approach. In AAAI. 497–504.
 Sawhney et al. (2020) Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Ratn Shah. 2020. Spatiotemporal hypergraph convolution network for stock movement forecasting. In ICDM. 482–491.
 Sun et al. (2021) Xiangguo Sun, Hongzhi Yin, Bo Liu, Hongxu Chen, Jiuxin Cao, Yingxia Shao, and Nguyen Quoc Viet Hung. 2021. Heterogeneous hypergraph embedding for graph classification. In WSDM. 725–733.
 Tam et al. (2021) Nguyen Thanh Tam, Huynh Thanh Trung, Hongzhi Yin, Tong Van Vinh, Darnbi Sakong, Bolong Zheng, and Nguyen Quoc Viet Hung. 2021. Entity alignment for knowledge graphs with multiorder convolutional networks. In ICDE. 2323–2324.
 Tam et al. (2017) Nguyen Thanh Tam, Matthias Weidlich, Duong Chi Thang, Hongzhi Yin, and Nguyen Quoc Viet Hung. 2017. Retaining Data from Streams of Social Platforms with Minimal Regret. In IJCAI. 2850–2856.
 Tam et al. (2019) Nguyen Thanh Tam, Matthias Weidlich, Bolong Zheng, Hongzhi Yin, Nguyen Quoc Viet Hung, and Bela Stantic. 2019. From anomaly detection to rumour detection using data streams of social platforms. PVLDB 12, 9 (2019), 1016–1029.
 Trung et al. (2020) Huynh Thanh Trung, Tong Van Vinh, Nguyen Thanh Tam, Hongzhi Yin, Matthias Weidlich, and Nguyen Quoc Viet Hung. 2020. Adaptive network alignment with unsupervised and multiorder convolutional networks. In ICDE. 85–96.
 Tsantekidis et al. (2017) Avraam Tsantekidis, Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. 2017. Forecasting stock prices from the limit order book using convolutional neural networks. In CBI. 7–12.
 Wang et al. (2021) Guifeng Wang, Longbing Cao, Hongke Zhao, Qi Liu, and Enhong Chen. 2021. Coupling macrosectormicro financial indicators for learning stock representations with less uncertainty. In AAAI. 4418–4426.
 Xu et al. (2019) Bingbing Xu, Huawei Shen, Qi Cao, Yunqi Qiu, and Xueqi Cheng. 2019. Graph wavelet neural network. ICLR (2019).
 Xu et al. (2021) Wentao Xu, Weiqing Liu, Lewen Wang, Yingce Xia, Jiang Bian, Jian Yin, and TieYan Liu. 2021. HIST: A Graphbased Framework for Stock Trend Forecasting via Mining ConceptOriented Shared Information. arXiv preprint arXiv:2110.13716 (2021).
 Yadati et al. (2019) Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, and Partha Talukdar. 2019. HyperGCN: A New Method For Training Graph Convolutional Networks on Hypergraphs. In NIPS. 1–12.
 Yahoo Finance (2022) Yahoo Finance. 2022. {https://finance.yahoo.com/}
 Zhang et al. (2017) Liheng Zhang, Charu Aggarwal, and GuoJun Qi. 2017. Stock price prediction via discovering multifrequency trading patterns. In KDD. 2141–2149.
Appendix A Technical indicators formulation
In this section, we express equations to formulate indicators described in Section 3: Temporal Generative Filters. Let $t$ denote the $t$th time step and ${O}_{t},{H}_{t},{L}_{t},{C}_{t},{V}_{t}$ represents the open price, high price, low price, close price, and trading volume at $t$th time step, respectively. Since most of the indicators are calculated within a certain period, we denote $n$ as that time window.

•
Arithmetic ratio (AR): The open, high, and low price ratio over the close price.
(14) $$A{R}_{O}=\frac{{O}_{t}}{{C}_{t}},A{R}_{H}=\frac{{H}_{t}}{{C}_{t}},A{R}_{L}=\frac{{L}_{t}}{{C}_{t}}$$ 
•
Close Price Ratio: The ratio of close price over the highest and the lowest close price within a time window.
(15) $$\begin{array}{cc}& R{C}_{mi{n}_{t}}=\frac{{C}_{t}}{\text{min}([{C}_{t},{C}_{t1},\mathrm{\dots},{C}_{tn}])}\hfill \\ & R{C}_{ma{x}_{t}}=\frac{{C}_{t}}{\text{max}([{C}_{t},{C}_{t1},\mathrm{\dots},{C}_{tn}])}\hfill \end{array}$$ 
•
Close SMA: The simple moving average of the close price over a time window
(16) $$SM{A}_{{C}_{t}}=\frac{{C}_{t}+{C}_{t1}+\mathrm{\dots}+{C}_{tn}}{n}$$ 
•
Close EMA: The exponential moving average of the close price over a time window
(17) $$EM{A}_{{C}_{t}}={C}_{t}\ast k+EM{A}_{{C}_{t1}}\ast (1k)$$ where: $k=\frac{2}{n+1}$

•
Volume SMA: The simple moving average of the volume over a time window
(18) $$SM{A}_{{V}_{t}}=\frac{{V}_{t}+{V}_{t1}+\mathrm{\dots}+{V}_{tn}}{n}$$ 
•
Volume EMA: The exponential moving average of the close price over a time window
(19) $$EM{A}_{{V}_{t}}={V}_{t}\ast k+EM{A}_{{V}_{t1}}\ast (1k)$$ where: $k=\frac{2}{n+1}$

•
Average Directional Index (ADX): ADX is used to quantify trend strength. ADX calculations are based on a moving average of price range expansion over a given period of time.
(20) $$\begin{array}{cc}& D{I}_{t}^{+}=\frac{\text{Smoothed}D{M}^{+}}{AT{R}_{t}}\ast 100\hfill \\ & D{I}_{t}^{}=\frac{\text{Smoothed}D{M}^{}}{AT{R}_{t}}\ast 100\hfill \\ & D{X}_{t}=\frac{D{I}_{t}^{+}D{I}_{t}^{}}{D{I}_{t}^{+}+D{I}_{t}^{}}\ast 100\hfill \\ & AD{X}_{t}=\frac{AD{X}_{t1}\ast (n1)+D{X}_{t}}{n}\hfill \end{array}$$ where:

–
$D{M}^{+}={H}_{t}{H}_{t1}$

–
$D{M}^{}={L}_{t1}{L}_{t}$

–
$\text{Smoothed}D{M}^{+/}=D{M}_{t1}\frac{D{M}_{t1}}{n}+D{M}_{t}$

–
ATR: Average True Range

–

•
Relative Strength Index (RSI): measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock or other asset. It is the normalized ration of the average gain over the average loss.
(21) $$\begin{array}{cc}& avgGai{n}_{t}=\frac{(n1)\ast avgGai{n}_{t1}+gai{n}_{t}}{n}\hfill \\ & avgLos{s}_{t}=\frac{(n1)\ast avgLos{s}_{t1}+los{s}_{t}}{n}\hfill \\ & RS{I}_{t}=100\frac{100}{1+\frac{avgGai{n}_{t}}{avgLos{s}_{t}}}\hfill \end{array}$$ where:

–
$gai{n}_{t}=\{\begin{array}{cc}{C}_{t}{C}_{t1},\hfill & \text{if}{C}_{t}>{C}_{t1}\hfill \\ 0,\hfill & \text{otherwise}\hfill \end{array}$

–
$los{s}_{t}=\{\begin{array}{cc}0,\hfill & \text{if}{C}_{t}>{C}_{t1}\hfill \\ {C}_{t1}{C}_{t},\hfill & \text{otherwise}\hfill \end{array}$

–

•
Moving average convergence divergence (MACD): shows the relationship between two moving averages of a stock’s price. It is calculated by the subtraction of the longterm EMA from the shortterm EMA.
(22) $$MAC{D}_{t}=EM{A}_{t}(n=12)EM{A}_{t}(n=26)$$ where:

–
$EM{A}_{t}(n=12)$: The exponential moving average at $t$th time step of the close price over 12time steps.

–
$EM{A}_{t}(n=26)$: The exponential moving average at $t$th time step of the close price over 26time steps.

–

•
Stochastics: an oscillator indicator that points to buying or selling opportunities based on momentum
(23) $$Stochasti{c}_{t}=100\ast \frac{Cu{r}_{t}{L}_{tn}}{{H}_{tn}{L}_{tn}}$$ where:

–
$T{P}_{t}=\frac{{H}_{t}+{L}_{t}+{C}_{t}}{3}$

–
$moneyFlo{w}_{t}=T{P}_{t}\ast Volum{e}_{t}$

–
$posM{F}_{t}=posM{F}_{t1}+moneyFlo{w}_{t}\text{if}T{P}_{t}>T{P}_{t1}$

–
$negM{F}_{t}=negM{F}_{t1}+moneyFlo{w}_{t}\text{if}T{P}_{t}\le T{P}_{t1}$

–

•
Money Flow Index (MFI): an oscillator measures the flow of money into and out over a specified period of time. The MFI is the normalized ratio of accumulating positive money flow (upticks) over negative money flow values (downticks).
(24) $$MF{I}_{t}=100\frac{100}{1+\frac{{\sum}_{tn}^{t}posMF}{{\sum}_{tn}^{t}negMF}}$$ where:

–
$T{P}_{t}=\frac{{H}_{t}+{L}_{t}+{C}_{t}}{3}$

–
$moneyFlo{w}_{t}=T{P}_{t}\ast Volum{e}_{t}$

–
$posM{F}_{t}=posM{F}_{t1}+moneyFlo{w}_{t}\text{if}T{P}_{t}>T{P}_{t1}$

–
$negM{F}_{t}=negM{F}_{t1}+moneyFlo{w}_{t}\text{if}T{P}_{t}\le T{P}_{t1}$

–

•
Average of True Ranges (ATR): The simple moving average of a series of true range indicators. True range indicators show the max range between (High  Low), (HighPrevious_Close), and (Previous_Close  Low).
(25) $$\begin{array}{cc}& T{R}_{t}=\text{Max}({H}_{t}{L}_{t},{H}_{t}{C}_{t1},{L}_{t}{C}_{t1})\hfill \\ & AT{R}_{t}=\frac{1}{n}\sum _{i=tn}^{t}T{R}_{i}\hfill \end{array}$$ 
•
Bollinger Band (BB): a set of trendlines plotted two standard deviations (positively and negatively) away from a simple moving average (SMA) of a stock’s price.
(26) $$\begin{array}{cc}& BOL{U}_{t}=M{A}_{t}(T{P}_{t},w)+m\ast \sigma [T{P}_{t},w]\hfill \\ & BOL{D}_{t}=M{A}_{t}(T{P}_{t},w)m\ast \sigma [T{P}_{t},w]\hfill \end{array}$$ where:

–
$BOL{U}_{t}$: Upper Bollinger Band at $t$th time step

–
$BOL{D}_{t}$: Lower Bollinger Band at $t$th time step

–
$M{A}_{t}$: Moving average at $t$th time step

–
$T{P}_{t}=\frac{{H}_{t}+{L}_{t}+{C}_{t}}{3}$

–
$m$: number of standard deviations (typically 2)

–
$\sigma [TP,n]$: Standard deviation over last w periods of $TP$

–

•
OnBalance Volume (OBV): measures buying and selling pressure as a cumulative indicator that adds volume on up days and subtracts volume on down days.
(27) $$