Posts

We discussed the Basic Distance Approach in the previous blog post. In this post, we’ll look at one of the advanced methods in the Distance Approach and how it differs from the Basic Distance Approach. If you haven’t read the previous blog post, we recommend reading it before this one.

So, what is the Pearson Correlation Approach? It is a type of Distance Approach that applies Pearson correlation at the return level to identify pairs. The main concept is similar to the Basic Distance Approach: pairs are formed according to a particular rule, and a portfolio is constructed based on the trading signals of those pairs.
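As a rough illustration of return-level pair identification, the sketch below ranks pairs by the Pearson correlation of their simple returns. The function name `top_correlated_pairs`, the synthetic tickers A, B and C, and the choice to keep only the single most correlated pair are assumptions made for this example, not the exact procedure from the post.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(prices: pd.DataFrame, n_pairs: int = 5) -> pd.Series:
    """Rank unordered pairs of columns by Pearson correlation of returns."""
    returns = prices.pct_change().dropna()
    corr = returns.corr(method="pearson")
    # Keep each unordered pair once: upper triangle, excluding the diagonal.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n_pairs)


# Synthetic data (assumption): A and B share a common return factor, C does not.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)
rets = pd.DataFrame({
    "A": common + rng.normal(0, 0.002, 500),
    "B": common + rng.normal(0, 0.002, 500),
    "C": rng.normal(0, 0.01, 500),
})
prices = 100 * (1 + rets).cumprod()

best = top_correlated_pairs(prices, n_pairs=1)
print(best)
```

Working on returns rather than price levels is what distinguishes this from a price-distance ranking: two stocks can have highly correlated returns even when their normalized price paths drift apart.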

The concept of pairs trading is pretty straightforward. As described in [Gatev et al. (2006)], we first find two stocks that have moved together historically and then monitor the spread between these stocks. If the prices of the two stocks diverge, we short the winner and go long on the loser, hoping that these prices converge in the future. If the spread is mean reverting, it will revert to its historical mean. Then, the positions are reversed and a profit can be made.

There are various frameworks that could be used to identify a pair of stocks and build pairs trading strategies. In this article, we will discuss a couple of papers on stochastic-control-based approaches that had the highest impact in this domain. We will not discuss pairs selection techniques here; interested readers can refer to the Stock Selection Methods using Copula and Machine Learning for Pairs Selection articles. The objective of these methods is to identify the optimal portfolio holdings in the legs of a pairs trade relative to other available assets. Stochastic control theory is used to determine the value and optimal policy functions for this portfolio problem. It does sound a bit complicated, but I’ll try to keep things simple and explain the intuition behind how and why these methods work.

Systematic approaches to pairs trading have gained popularity since the mid-1980s. Gatev et al. (2006) examined the profitability of a distance-based strategy on normalized prices. Cointegration is another commonly used approach, as discussed in [Vidyamurthy (2004)]. Both methods are tied to the idea of a mean-reverting bet, and the trading signals are generated from the spread: when the spread widens, it is expected to narrow, and when that happens the trader pockets the profit.
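The idea of generating signals from a widening spread can be sketched with a rolling z-score. This is a deliberately simplified illustration: the function name `zscore_signals`, the 20-bar window and the 2-standard-deviation entry threshold are assumptions for the example, and it has no exit or position-holding logic, so it is neither the Gatev et al. rule nor a cointegration test.

```python
import pandas as pd


def zscore_signals(spread: pd.Series, window: int = 20, entry: float = 2.0) -> pd.Series:
    """Return -1 (short the spread), +1 (long the spread) or 0 for each bar."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()
    z = (spread - mean) / std
    signal = pd.Series(0, index=spread.index)
    signal[z > entry] = -1   # spread unusually wide: bet on it narrowing
    signal[z < -entry] = 1   # spread unusually negative: bet on it rising
    return signal


# Toy spread (assumption): small oscillation followed by one large divergence.
spread = pd.Series([0.1, -0.1] * 15 + [3.0])
signals = zscore_signals(spread)
print(signals.tail(3).tolist())
```

Bars inside the warm-up window have no defined z-score and therefore stay flat.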

We previously discussed several advantages of copula-based models in Copula for Pairs Trading: A Detailed, But Practical Introduction. As a tool, a copula analyzes the dependence structure among several random variables (for pairs trading, just two random variables). We quickly summarize it here:

ArbitrageLab is a Python library filled with algorithms from the best academic journals and graduate-level textbooks, focused on the branch of statistical arbitrage known as pairs trading.

This playlist is a series of lecture videos that explore advanced topics and highlight how your team can compete with the world’s best hedge funds!

Whether it is for pairs trading or risk management, two natural questions to ask before putting a copula to use are: How do we draw samples from a copula? How should we fit a copula to data? The need for fitting is obvious: without it, there is no way to calibrate our model for pairs trading or risk analysis using historical data.

Sampling is mostly used to make a Q-Q plot against the historical data as a sanity check. Note that a copula cannot natively generate future price time series, since it treats time series data as independent draws from two random variables and thus has no information about the sequence, which is vital in time series analysis. One way to think about sampling from a copula fitted to time series is that it gives the likelihood of where the next data point is going to be, regardless of the input sequence.
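To make the two questions concrete, here is a minimal sketch for one specific family, the bivariate Gaussian copula, chosen only because it is the simplest to write down. The function names are hypothetical and this is not ArbitrageLab's API; other copula families need different sampling and fitting procedures.

```python
import numpy as np
from scipy.stats import norm, rankdata


def sample_gaussian_copula(rho: float, n: int, seed: int = 0) -> np.ndarray:
    """Draw n points in [0, 1]^2 whose dependence is Gaussian with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    # The standard normal CDF maps each margin to a uniform on [0, 1].
    return norm.cdf(z)


def fit_gaussian_copula_rho(x: np.ndarray, y: np.ndarray) -> float:
    """Estimate rho from data via rank-based pseudo-observations."""
    n = len(x)
    # Ranks scaled by n + 1 stay strictly inside (0, 1), so norm.ppf is finite.
    u = rankdata(x) / (n + 1)
    v = rankdata(y) / (n + 1)
    return float(np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1])


u = sample_gaussian_copula(rho=0.8, n=5000)
rho_hat = fit_gaussian_copula_rho(u[:, 0], u[:, 1])
print(u.shape, round(rho_hat, 2))
```

Each row of the sample is treated as an independent draw, which mirrors the point above: the copula carries no information about the order of observations.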

Whilst backtesting architectures are a topic of their own, this article dives into how to correctly backtest a pairs trading investment strategy using a vectorized (quick) methodology rather than the more robust event-driven architecture. This technique is very common amongst analysts and is rather straightforward for long-only portfolios. However, when you start to construct long-short portfolios based on statistical arbitrage, strange little nuances start to pop up.
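As a hedged sketch of what a vectorized long-short backtest can look like, the example below computes the per-bar return of an equal-dollar pair position. The function name, the equal-dollar weighting and the one-bar position lag are assumptions made for illustration; the nuances the article itself covers may differ.

```python
import numpy as np
import pandas as pd


def vectorized_pair_backtest(prices: pd.DataFrame, positions: pd.Series) -> pd.Series:
    """Per-bar return of holding +1/-1/0 units of (long column 0, short column 1)."""
    rets = prices.pct_change().fillna(0.0)
    # Equal-dollar long-short: return of the long leg minus return of the short leg.
    pair_ret = rets.iloc[:, 0] - rets.iloc[:, 1]
    # Lag positions by one bar so a signal formed at bar t only earns from bar t + 1.
    held = positions.shift(1).fillna(0.0)
    return held * pair_ret


# Toy data (assumption): X rises 10% per bar, Y is flat, and we stay long the pair.
prices = pd.DataFrame({"X": [100.0, 110.0, 121.0], "Y": [100.0, 100.0, 100.0]})
positions = pd.Series([1, 1, 1])
pnl = vectorized_pair_backtest(prices, positions)
print(pnl.tolist())
```

Everything is computed with whole-column operations rather than a bar-by-bar loop, which is what makes the approach quick, and also what makes mistakes such as an unlagged position easy to overlook.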

In this post, we will investigate and showcase a machine learning selection framework that will aid traders in finding mean-reverting opportunities. This framework is based on the book: “A Machine Learning based Pairs Trading Investment Strategy” by Sarmento and Horta.

A time series is said to exhibit mean reversion when, over a certain period, it reverts to a constant mean. A topic of increasing interest is the long-run behaviour of stock prices, with particular attention paid to whether stock prices can be characterized as random walks or as mean-reverting processes.
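One simple way to quantify mean reversion is to fit an AR(1) model and convert its coefficient into a half-life. The sketch below is only an illustration under that AR(1) assumption; the function name `ar1_half_life` and the simulated series are hypothetical, and this is not a formal statistical test for a random walk.

```python
import numpy as np


def ar1_half_life(series) -> float:
    """Half-life of mean reversion implied by an AR(1) fit, in bars."""
    x = np.asarray(series, dtype=float)
    lagged, delta = x[:-1], np.diff(x)
    # OLS of delta_t = intercept + slope * x_{t-1}; the AR(1) coefficient is 1 + slope.
    slope, _intercept = np.polyfit(lagged, delta, 1)
    phi = 1.0 + slope
    if not 0.0 < phi < 1.0:
        # No well-defined half-life in this simple model (e.g. random-walk-like fit).
        return float("nan")
    return float(-np.log(2) / np.log(phi))


# Simulated mean-reverting series (assumption): AR(1) with coefficient 0.9.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

hl = ar1_half_life(x)
print(round(hl, 1))
```

A coefficient close to 1 gives a very long half-life, which is the regime where a mean-reverting process becomes hard to tell apart from a random walk.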

Let’s Solve a Mystery.

Suppose that you encounter a promising pair of stocks that move closely together. The spread zig-zags around 0 like some fine needle stitching, and it sure looks like a nice candidate for mean-reversion bets. What’s more, you find out that the two stocks’ prices for the past 2 years are all nicely normally distributed. Great! You can avoid some hairy analysis for now. So you fit them to a joint normal distribution as a sanity check and immediately find that it doesn’t look as promising anymore:

Over the past two years, there were some major market events during which the stocks moved upwards or downwards together, depending on whether the news was good or bad. Your bivariate Gaussian model, in contrast, says that such co-moves are very unlikely to happen, since they are so close to the tails of the distribution, and that you had better ignore them. What is more annoying is that the stocks tend to move downward together more often than upward, whereas the bivariate Gaussian distribution says the behaviour should be symmetric.

So what went wrong? For this mini example, there are two major pitfalls present:

Following the work of Professor Tim Leung and Xin Li, we explore how the Ornstein-Uhlenbeck process, known for modelling mean-reverting interest rates, currency exchange rates, and commodity prices, can be used in pairs trading and statistical arbitrage. The two-step process looks as follows: