In this post, we investigate and showcase a machine learning selection framework that helps traders find mean-reverting opportunities. The framework is based on the book “A Machine Learning based Pairs Trading Investment Strategy” by Sarmento and Horta.

A time series exhibits mean reversion when, over a certain period, it tends to revert to a constant mean. The long-run behaviour of stock prices is a topic of growing interest, with particular attention paid to whether stock prices are better characterized as random walks or as mean-reverting processes.
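To make the random-walk versus mean-reversion distinction concrete, here is a minimal sketch, assuming NumPy and evenly spaced observations. It uses one common diagnostic that is not taken from the source: regress the one-step change Δx_t on the lagged level x_{t-1}. A negative slope suggests mean reversion, and converting that slope into a half-life tells you roughly how many steps a deviation from the mean takes to shrink by half. The function name `mean_reversion_half_life` is our own.

```python
import numpy as np


def mean_reversion_half_life(series) -> float:
    """Estimate the half-life of mean reversion, in number of observations.

    Fits  delta_x[t] = alpha + beta * x[t-1] + noise  by ordinary least squares.
    The implied AR(1) coefficient is phi = 1 + beta, and a deviation from the
    mean shrinks by half after -ln(2) / ln(phi) steps.
    """
    x = np.asarray(series, dtype=float)
    lagged = x[:-1]
    delta = np.diff(x)
    # Design matrix: an intercept column plus the lagged level.
    design = np.column_stack([np.ones_like(lagged), lagged])
    (_alpha, beta), *_ = np.linalg.lstsq(design, delta, rcond=None)
    phi = 1.0 + beta
    if beta >= 0:
        return float("inf")  # no pull back toward the mean
    if phi <= 0:
        return float("nan")  # overshooting oscillation: half-life undefined
    return float(-np.log(2.0) / np.log(phi))


# Usage: simulate an AR(1) series with phi = 0.9 (true half-life about 6.6 steps).
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(mean_reversion_half_life(x))
```

A pure random walk would give a slope near zero and hence a very long (or infinite) half-life, which is the practical signature of "no mean reversion".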

Let’s Solve a Mystery.

Suppose you have found a promising pair of stocks that move closely together: their spread zig-zags around 0 like fine needle stitching, which looks like a nice candidate for mean-reversion bets. What’s more, you find that the two stocks’ prices over the past two years are both nicely normally distributed. Great! You can skip some hairy analysis for now. So you fit them with a bivariate normal distribution as a sanity check and immediately find that the pair no longer looks as promising.

Over the past two years there were several major market events, during which the stocks moved up or down together depending on whether the news was good or bad. Your bivariate Gaussian model, in contrast, says that such co-moves lie so far in the tails of the distribution that they are very unlikely and can safely be ignored. More annoyingly, the stocks tend to move down together more often than they move up together, whereas a bivariate Gaussian distribution treats both directions symmetrically.
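Both complaints can be checked numerically. The sketch below is our own illustration, not a method from the source; it assumes NumPy and uses made-up daily returns with a shared downside shock on about 3% of days. It counts how often both series land in their own lower 5% tail at the same time versus their upper 5% tail, then repeats the count on samples drawn from a bivariate Gaussian fitted to the same data. The helper names are hypothetical.

```python
import numpy as np


def joint_tail_frequencies(x, y, q=0.05):
    """Return (lower, upper): the fraction of observations where x and y are
    BOTH in their own lower q-tail, and BOTH in their own upper q-tail."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_lo, x_hi = np.quantile(x, [q, 1.0 - q])
    y_lo, y_hi = np.quantile(y, [q, 1.0 - q])
    lower = np.mean((x <= x_lo) & (y <= y_lo))
    upper = np.mean((x >= x_hi) & (y >= y_hi))
    return float(lower), float(upper)


def gaussian_tail_frequencies(x, y, q=0.05, n_sim=200_000, seed=1):
    """Fit a bivariate Gaussian to (x, y) via sample mean and covariance,
    simulate from it, and measure the same joint-tail frequencies."""
    data = np.column_stack([x, y])
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    sim = np.random.default_rng(seed).multivariate_normal(mean, cov, size=n_sim)
    return joint_tail_frequencies(sim[:, 0], sim[:, 1], q)


# Hypothetical returns: a common factor plus a shared downside "crash" shock.
rng = np.random.default_rng(0)
n = 20_000
common = rng.normal(size=n)
shock = np.where(rng.random(n) < 0.03, -4.0, 0.0)
x = 0.6 * common + 0.8 * rng.normal(size=n) + shock
y = 0.6 * common + 0.8 * rng.normal(size=n) + shock

empirical = joint_tail_frequencies(x, y)
gaussian = gaussian_tail_frequencies(x, y)
print("empirical (lower, upper):", empirical)
print("Gaussian  (lower, upper):", gaussian)
```

On data like this, the empirical joint lower-tail frequency should clearly exceed the upper-tail one, while the fitted Gaussian produces roughly equal frequencies in both tails and understates the joint crashes.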

So what went wrong? For this mini example, there are two major pitfalls present:

Cointegration, a concept that helped Clive W.J. Granger win the Nobel Prize in Economics in 2003 (see Footnote 1), is a cornerstone of pairs and multi-asset trading strategies. Curiously, forty years have passed since Granger coined the term “cointegration” in his seminal paper “Some properties of time series data and their use in econometric model specification” (Granger, 1981), yet the term still cannot be found in Merriam-Webster, and some spell checkers will, without hesitation, draw a wavy line beneath every occurrence of it.
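In practice, cointegration between two price series is often checked with the Engle-Granger two-step procedure: regress one price on the other to obtain a hedge ratio, then test the regression residuals (the spread) for a unit root. The following is a bare-bones NumPy sketch with no lag augmentation, written for illustration only; it is not a replacement for a library implementation such as `statsmodels.tsa.stattools.coint`, and the function name is our own. Because the hedge ratio is estimated, the resulting t-statistic must be compared with Engle-Granger critical values (roughly -3.34 at the 5% level for two series with a constant), not with the standard Dickey-Fuller ones.

```python
import numpy as np


def engle_granger_statistic(y, x):
    """Bare-bones Engle-Granger two-step test (no lagged differences).

    Step 1: OLS of y on a constant and x gives the hedge ratio and the spread.
    Step 2: Dickey-Fuller regression  delta_s[t] = gamma * s[t-1] + noise
            on the spread; a strongly negative t-statistic for gamma is
            evidence that the spread is stationary, i.e. y and x cointegrate.
    Returns (hedge_ratio, t_statistic).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    (intercept, hedge_ratio), *_ = np.linalg.lstsq(design, y, rcond=None)
    spread = y - intercept - hedge_ratio * x

    # No constant in step 2: OLS residuals from step 1 already have zero mean.
    lagged = spread[:-1]
    delta = np.diff(spread)
    sxx = np.dot(lagged, lagged)
    gamma = np.dot(lagged, delta) / sxx
    resid = delta - gamma * lagged
    sigma2 = np.dot(resid, resid) / (len(delta) - 1)
    t_stat = gamma / np.sqrt(sigma2 / sxx)
    return float(hedge_ratio), float(t_stat)


# Usage: x is a random walk and y = 1.5 * x + stationary noise, so they cointegrate.
rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=1000))
y = 1.5 * x + rng.normal(size=1000)
hedge, t_stat = engle_granger_statistic(y, x)
print(hedge, t_stat)
```

Each price on its own is non-stationary here, yet the combination y - 1.5x is stationary; that is the property the word "cointegration" describes.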

Indeed, the concept of cointegration is not immediately apparent from its name. Therefore, in this article, I will attempt to answer the following questions:

Following the work of Professor Tim Leung and Xin Li, we explore how the Ornstein-Uhlenbeck (OU) process, known for modelling mean-reverting interest rates, currency exchange rates, and commodity prices, can be used in pairs trading and statistical arbitrage. The approach consists of two steps:
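The approach relies on fitting an OU model, dX_t = θ(μ - X_t)dt + σ dW_t, to the spread of a pair. The sketch below shows one standard way to do that fit; it is not a reproduction of Leung and Li's full procedure. It assumes NumPy, evenly spaced observations with step `dt` (1/252 for daily data is our assumption), and a spread that actually mean-reverts. It regresses X_{t+1} on X_t using the exact AR(1) discretization of the OU process, which coincides with conditional Gaussian maximum likelihood; `fit_ou` is a hypothetical name.

```python
import numpy as np


def fit_ou(spread, dt=1.0 / 252):
    """Estimate (theta, mu, sigma) of dX = theta * (mu - X) dt + sigma dW.

    Exact discretization: X[t+1] = a + b * X[t] + eps with
        b = exp(-theta * dt),  a = mu * (1 - b),
        Var(eps) = sigma**2 * (1 - b**2) / (2 * theta).
    OLS estimates of a, b and Var(eps) are mapped back to the OU parameters.
    """
    x = np.asarray(spread, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (a, b), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    if not 0.0 < b < 1.0:
        raise ValueError("AR(1) coefficient outside (0, 1): no mean reversion")
    resid = x[1:] - (a + b * x[:-1])
    theta = -np.log(b) / dt
    mu = a / (1.0 - b)
    sigma = np.sqrt(resid.var() * 2.0 * theta / (1.0 - b * b))
    return float(theta), float(mu), float(sigma)


# Usage: simulate an exact OU path (theta=5, mu=0.5, sigma=0.3) and refit it.
rng = np.random.default_rng(7)
dt, theta, mu, sigma = 1.0 / 252, 5.0, 0.5, 0.3
b = np.exp(-theta * dt)
noise_sd = sigma * np.sqrt((1.0 - b * b) / (2.0 * theta))
path = np.empty(20_000)
path[0] = mu
for t in range(1, len(path)):
    path[t] = mu + b * (path[t - 1] - mu) + noise_sd * rng.normal()
print(fit_ou(path, dt=dt))
```

Of the three parameters, θ (the speed of mean reversion) is usually the noisiest to estimate, so long samples help; μ and σ are recovered much more precisely.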