In this article we delve into the challenge of making an asset price series stationary (for reasons discussed below) while preserving as much of the memory/signal of the original series as possible. We take inspiration from Chapter 5 of Advances in Financial Machine Learning (AFML) by Dr. Marcos Lopez de Prado, in which he discusses fractionally differencing the time series (as opposed to integer differencing). A fractionally differentiated series is stationary but also highly correlated with the original series. Since a fractionally differenced series retains the memory of the original series (as indicated by the high correlation), it can be used as a feature in a machine learning algorithm. We provide the code for this functionality in our package MLFinLab, along with a Jupyter notebook illustrating the concept, which can be found under the heading Chapter5. If you have any suggestions or comments, please email us at research@hudsonthames.org
Why Fractional Differentiation?
Inferential analysis uses a sample of data to describe characteristics of a population, such as the mean and standard deviation of a feature. Consider studying the heights of men and women in North America, or stock prices. For such analysis and inference to be accurate, the underlying data-generation process must remain constant. In the context of finance, the mean return and the variance of those returns should be time-invariant (that is, not change with time). If the underlying process changes as a result of a shift in regime, it becomes hard to predict the expected risk and return of a stock at a future date.

A similar requirement exists in supervised machine learning (SML). SML is the process of learning a function that maps an input to an output based on known input-output examples. Each example is a pair consisting of an input object (often a vector of features) and an output (or label). The supervised learning algorithm analyzes the training examples and infers a transformation function that can be used to map new (unseen) inputs. If the data (features, in the case of SML) are not "stationary" (in other words, if their underlying data-generation process changes its characteristics), then the machine learning algorithm cannot correctly infer the label of a new observation. Stationarity is therefore a necessary condition for both inferential analysis and supervised machine learning.

But there is a problem: even though making a series stationary simplifies inferential analysis and SML, the series loses its memory in the process (it probably had a trend, and that trend is stripped away by integer differencing). This memory is helpful in predicting where the asset price series will be at the next point in time. This leads to a challenge: how can one make a time series stationary while retaining its predictive power (or memory)?
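To make the trade-off concrete, here is a minimal sketch (not from the book or our notebook) on a synthetic random-walk series: first-order (integer) differencing passes the augmented Dickey-Fuller stationarity test, but the differenced series is almost uncorrelated with the original, i.e., its memory is gone. The series, seed, and parameters are illustrative assumptions.

```python
# Sketch: integer differencing achieves stationarity but destroys memory.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
# Synthetic log-price series: a random walk with a small drift.
log_prices = pd.Series(np.cumsum(rng.normal(0.0005, 0.01, 2000)))
log_returns = log_prices.diff().dropna()  # integer differencing (d = 1)

print('ADF p-value, log prices:  %.3f' % adfuller(log_prices)[1])   # high -> non-stationary
print('ADF p-value, log returns: %.3f' % adfuller(log_returns)[1])  # ~0 -> stationary
print('Correlation with original: %.3f' % log_prices[1:].corr(log_returns))  # near zero -> memory lost
```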
Literature Review
Hosking [1981] appears to be the first to discuss an approach that aims to meet the aforementioned challenge. He showed that “fractionally differenced processes exhibit long-term persistence and anti-persistence; the dependence between observations a long time span apart decays much more slowly with time span than is the case with commonly used time series models”.
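For context, the fractional differencing operator (1 - B)^d, where B is the backshift operator, expands into a binomial series whose weights follow the recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k. For non-integer d these weights decay slowly, which is what preserves memory. Below is a minimal sketch of that weight computation (the function name get_weights is ours for illustration, not the MLFinLab API):

```python
# Sketch: weights of the fractional differencing operator (1 - B)^d,
# computed with the recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
def get_weights(d, size):
    """Return the first `size` binomial weights for differencing amount d."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights

print(get_weights(1.0, 5))  # [1.0, -1.0, 0.0, 0.0, 0.0]: integer differencing uses only two observations
print(get_weights(0.5, 5))  # [1.0, -0.5, -0.125, -0.0625, -0.039...]: slowly decaying weights -> long memory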
Implementation
We illustrate the concept in the figure below, where we difference the E-mini S&P 500 futures log prices using a range of differencing amounts (d, shown on the x-axis). The augmented Dickey-Fuller (ADF) statistic is plotted on the right y-axis, and the correlation between the original series and the fractionally differenced series on the left y-axis. The chart shows that the ADF statistic reaches the 95% critical value at a differencing amount of less than 0.2, at which point the correlation between the original series and the fractionally differenced series is still over 90%. This shows that the new series is not only stationary but also retains considerable memory of the original series.
The fractional differentiation code can be found in our package MLFinLab, and the accompanying Jupyter notebook can be found under the heading Chapter5. Please email us at research@hudsonthames.org if you have any comments or questions.
Hosking, J. R. M. (1981): "Fractional differencing." Biometrika, Vol. 68, No. 1, pp. 165-175.