Research articles for the Hudson and Thames home page.

We discussed the Basic Distance Approach in the previous blog post. In this post, we’ll look at one of the advanced methods in the Distance Approach and how it differs from the Basic Distance Approach. If you haven’t read the previous post, we recommend reading it first.

So, what is the Pearson Correlation Approach? It is a type of Distance Approach that applies Pearson correlation at the return level to identify pairs. The main concept is similar to the Basic Distance Approach: pairs are formed according to a particular rule, and a portfolio is constructed from the trading signals of those pairs.
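
As a rough illustration of the idea, the sketch below ranks candidate pairs by the Pearson correlation of their returns. It is a minimal example and not the procedure from the article itself: the use of simple returns, the helper names (`returns`, `pearson`, `rank_pairs_by_correlation`) and the toy prices are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def returns(prices):
    """Simple period-over-period returns (an assumed return definition)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_pairs_by_correlation(price_history):
    """Return (correlation, asset_a, asset_b) tuples, most correlated first."""
    rets = {name: returns(p) for name, p in price_history.items()}
    scored = [
        (pearson(rets[a], rets[b]), a, b)
        for a, b in combinations(sorted(rets), 2)
    ]
    return sorted(scored, reverse=True)


# Toy prices, invented for illustration only.
prices = {
    "A": [100.0, 102.0, 101.0, 104.0, 103.0, 106.0],
    "B": [50.0, 51.1, 50.4, 52.1, 51.5, 53.2],
    "C": [80.0, 79.0, 81.0, 80.5, 82.0, 81.0],
}
ranking = rank_pairs_by_correlation(prices)
best_corr, best_a, best_b = ranking[0]
```

Here `A` and `B` move together period by period, so they come out as the top-ranked pair even though their price levels are very different.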

Ordinary least squares (OLS) regression is probably the most commonly used statistical method in quantitative finance (and likely in other quantitative fields). It is very fast to compute, and the results are often quite interpretable. Due to its simplicity, it serves as the cornerstone for many more complex statistical or machine learning models. Because it has been studied so thoroughly, many of its limitations can be addressed by established techniques. For example, the original OLS model treats all instances in the training set as equally important, and one common remedy is to introduce weights on the instances to reflect our beliefs.
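
To make the weighting idea concrete, here is a minimal sketch of a single-variable weighted least squares fit in closed form. The function name `weighted_ols` and the toy data are hypothetical; the weights simply encode how much we trust each observation, and setting them all equal recovers plain OLS.

```python
def weighted_ols(x, y, w):
    """Fit y = intercept + slope * x, minimizing sum_i w_i * residual_i**2."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data: the first four points lie on y = 2x, the last one is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 20.0]

# Equal weights reproduce ordinary least squares.
ols_slope, ols_intercept = weighted_ols(x, y, [1.0] * 5)

# A zero weight expresses the belief that the last observation is irrelevant.
wls_slope, wls_intercept = weighted_ols(x, y, [1.0, 1.0, 1.0, 1.0, 0.0])
```

With equal weights the outlier drags the slope up to 4, while the weighted fit recovers the slope of 2 that the remaining points share.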

In this article, we aim to introduce a systematic and elegant approach to incorporating history’s relevance into the regression process.

Briefly speaking, this method ranks all historical instances based on their “relation” to the current input independent variables and selects those that are more informative and similar to regress on.

There are many approaches you can use in pairs trading, but the Distance Approach is one of the most widely used because of its simplicity. The basic concept is as follows: using the squared Euclidean distance between normalized price time series, the n closest pairs of assets are selected.

Then, for the selected pairs, if the prices of the two assets in a pair diverge by more than a threshold (e.g., 2 standard deviations), positions are opened: a long position in the stock with the lower price and a short position in the stock with the higher price.
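
The selection and signal rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: prices are normalized by dividing by the first price, the spread is the difference of normalized prices, the standard deviation is taken over the formation-period spread, and the function names and toy prices are hypothetical.

```python
from itertools import combinations
from statistics import pstdev


def normalize(prices):
    """Rescale a price series so that it starts at 1 (assumed normalization)."""
    return [p / prices[0] for p in prices]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pairs(price_history, n):
    """Return the n pairs with the smallest distance between normalized prices."""
    norm = {name: normalize(p) for name, p in price_history.items()}
    scored = sorted(
        (squared_distance(norm[a], norm[b]), a, b)
        for a, b in combinations(sorted(norm), 2)
    )
    return [(a, b) for _, a, b in scored[:n]]


def signal(spread_value, formation_spread, threshold=2.0):
    """+1: long first / short second, -1: short first / long second, 0: no trade."""
    sigma = pstdev(formation_spread)
    if spread_value > threshold * sigma:
        return -1  # first asset is relatively expensive
    if spread_value < -threshold * sigma:
        return 1  # first asset is relatively cheap
    return 0


# Toy formation-period prices, invented for illustration only.
prices = {
    "A": [10.0, 10.2, 10.1, 10.3, 10.4],
    "B": [20.0, 20.3, 20.3, 20.5, 20.9],
    "C": [5.0, 5.5, 4.8, 5.9, 6.2],
}
pairs = closest_pairs(prices, n=1)
first, second = pairs[0]
formation_spread = [
    x - y for x, y in zip(normalize(prices[first]), normalize(prices[second]))
]
```

Calling `signal(0.02, formation_spread)` on this data returns `-1`, meaning the first asset of the pair has become relatively expensive and would be shorted against a long position in the second.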

Pairs selection is the first crucial step in building a pairs trading strategy. It is no surprise that, to perform it correctly, one must diligently examine, compare and contrast numerous test results, graphs and characteristics. For example, cointegration analysis alone can be performed in one of two ways: with the Engle-Granger approach or with the Johansen approach. To get the complete picture of a pair’s suitability with the Engle-Granger approach, the researcher should perform the test (and further analysis) for both possible combinations, A/B and B/A, since the test is sensitive to which asset we choose to be the “dependent” one. The Johansen test, in turn, provides multiple cointegration vectors, which should also be examined separately and taken into account. On top of that, the analysis of residuals, autocorrelation tests and so on bring even more data to the table for you to base your judgement on.
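
The sensitivity to the choice of dependent asset can be seen in the regression step alone. The sketch below covers only that first step, not the full Engle-Granger test (no unit-root test on the residuals is performed), and the helper name and toy prices are assumptions. It relies on a standard fact: the product of the two regression slopes equals the squared correlation, so unless the series are perfectly correlated, regressing A on B and B on A imply different hedge ratios and therefore different spreads.

```python
def ols_slope(y, x):
    """Slope of the least squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy prices, invented for illustration only.
asset_a = [10.0, 10.4, 10.1, 10.9, 11.2, 10.8]
asset_b = [20.0, 20.5, 20.9, 21.4, 22.5, 21.3]

# Hedge ratio when A is the dependent variable: A ~ beta * B.
beta_a_on_b = ols_slope(asset_a, asset_b)

# Hedge ratio when B is the dependent variable: B ~ beta * A.
beta_b_on_a = ols_slope(asset_b, asset_a)

# If the ordering did not matter, these two numbers would coincide.
implied_a_on_b = 1.0 / beta_b_on_a
r_squared = beta_a_on_b * beta_b_on_a
```

Because `r_squared` is below 1 for this data, `beta_a_on_b` and `implied_a_on_b` differ, which is why both orderings deserve a separate look.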

And now, we have two options: memorize everything or constantly switch between numerous parameters and plots to check, contrast and compare. It results in loading your brain with tons of ‘noise’ that distracts from focusing on the evaluation itself. But it doesn’t have to be this way. Data analysis thrives when there is order, accessibility and clarity. And what embodies these three qualities better than combining everything into an interactive well-rounded tear sheet?

The concept of pairs trading is pretty straightforward. As described in [Gatev et al. (2006)], we first find two stocks that have moved together historically and then monitor the spread between these stocks. If the prices of the two stocks diverge, we short the winner and go long on the loser, hoping that these prices converge in the future. If the spread is mean reverting, it will revert to its historical mean. At that point, the positions are closed and a profit can be made.

There are various frameworks that could be used to identify a pair of stocks and build pairs trading strategies. In this article, we will be discussing a couple of papers on stochastic control-based approaches that have had the highest impact in this domain. We will not be discussing pairs selection techniques here; interested readers can refer to the Stock Selection Methods using Copula and Machine Learning for Pairs Selection articles. The objective of these methods is to identify the optimal portfolio holdings in the legs of a pairs trade compared to other available assets. Stochastic control theory is used to determine value and optimal policy functions for this portfolio problem. It does sound a bit complicated, but I’ll try to keep things simple and explain the intuition behind how and why these methods work.

In this article, we introduce a vine copula-based strategy for statistical arbitrage from [Stübinger et al., 2018] with some analysis, then we generalize their framework and suggest what can be modified. With the power of vine copula, we can directly model the relationships among multiple stocks. We want to trade based on the information generated from a vine copula model. As in traditional bivariate copula approaches to pairs trading, we will use the conditional (cumulative) probability to gauge whether a target stock is underpriced or overpriced against other stocks, and then generate trading signals from it as a mean-reversion bet.
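
To give a feel for the conditional probability signal, the sketch below uses the simplest possible setting: a bivariate Gaussian copula, for which the conditional distribution has a well-known closed form. This is an illustration only and not the vine copula model from the paper. The Gaussian family, the correlation value, the 0.05 and 0.95 thresholds and the function names are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_conditional_prob(u, v, rho):
    """P(U <= u | V = v) under a bivariate Gaussian copula, with -1 < rho < 1."""
    z_u = _STD_NORMAL.inv_cdf(u)
    z_v = _STD_NORMAL.inv_cdf(v)
    return _STD_NORMAL.cdf((z_u - rho * z_v) / sqrt(1.0 - rho ** 2))


def mispricing_signal(u, v, rho, lower=0.05, upper=0.95):
    """+1 if the target looks underpriced, -1 if overpriced, 0 otherwise.

    The thresholds are hypothetical and would need to be calibrated.
    """
    prob = gaussian_conditional_prob(u, v, rho)
    if prob < lower:
        return 1
    if prob > upper:
        return -1
    return 0


# Target stock at its 20% quantile while the other stock sits at its 80%
# quantile, with a strong assumed dependence between the two.
prob = gaussian_conditional_prob(0.2, 0.8, rho=0.8)
action = mispricing_signal(0.2, 0.8, rho=0.8)
```

A conditional probability close to 0 says the target is unusually low given where the other stock is, which the mean-reversion bet reads as underpriced. With no dependence (`rho=0`), the conditional probability collapses to `u` itself.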

We aim to cover the following topics:

Quick overview of copula-based trading strategies.
Idea and typical workflow of the C-vine copula approach.
Strategy assumptions and details.
Comments and some analysis for this strategy.

Copula is a very flexible tool for modeling dependencies among random variables. Long used in risk management, it is also a great statistical arbitrage method when coupled with a good execution rule, and it is not limited to mean-reversion strategies. Since 2010, multiple trading methods involving copula have been developed, from earlier simple bivariate copula models on price series to recent sophisticated self-adaptive models using low-latency data. It is a growing and dynamic field of research and practice. However, there is little literature reviewing criteria for selecting tradable stocks dedicated solely to copula-based methods.

[Rad et al (2016)] found that the copula pairs-trading method (the version that they implemented) has much better performance in drawdown risk compared to distance and cointegration. However, bad pairs that fail to converge significantly drove down its performance. It is a serious reminder to practitioners that building a suitable portfolio is just as important as (if not more important than) applying a great trading method, and a less desirable set of securities can quickly ruin a seemingly great strategy. The vine copula is designed to model dependence across multiple random variables and therefore poses a greater challenge in selecting stocks.

Copula is a great statistical tool to study the relation among multiple random variables: by focusing on the joint cumulative distribution of the quantiles of the marginals, we can bypass the idiosyncratic features of the marginal distributions and directly look at how they are “related”.
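
One common way to move from raw data to quantiles of the marginals is the rank transform, sketched below. The function name `pseudo_observations` and the `rank / (n + 1)` scaling are assumptions made for illustration, and ties are not handled. The point of the example is that any strictly increasing transformation of the data leaves the quantiles unchanged, which is exactly why the marginal's shape drops out.

```python
from math import exp


def pseudo_observations(x):
    """Map each value to rank / (n + 1), an empirical quantile in (0, 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, index in enumerate(order, start=1):
        u[index] = rank / (n + 1)
    return u


# Toy sample, invented for illustration only.
sample = [3.0, 1.0, 2.0, 10.0]
u_raw = pseudo_observations(sample)

# A strictly increasing transform changes the marginal, not the quantiles.
u_transformed = pseudo_observations([exp(value) for value in sample])
```

Both calls return the same quantiles, so a copula fitted on them would see identical dependence information regardless of how the marginal is distributed.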

Indeed, traders and analysts have been using copula to exploit statistical arbitrage under the pairs trading framework for some time, and we have implemented some of the most popular methods in ArbitrageLab. However, it is natural to expand beyond dealing with just a pair of stocks: there already exist a great number of competing stat arb methods alongside copula, thinning the potential alpha. It is also intuitive for humans to think about relative pricing between two stocks, whereas in higher dimensions it is not so easy, which leaves great opportunities for quantitative approaches.

It is time to get down to the nitty-gritty of the implementation of a mean-reversion strategy.

The crux of implementing a mean-reversion trading strategy is to pinpoint the trade location. Naturally, we want to initiate a trade when the spread value has deviated considerably from its long-term mean. However, “a considerable deviation” is a rather vague description and needs to be quantified when it comes to trade execution. For the sake of convenience and clarity, I will use “boundary” to refer to the trade location and “spread” to refer to both the spread of the long-short asset pairs and the value of the multi-asset portfolio in the remainder of this article.

“Buy low, sell high.” One cannot find a more succinct summary of a mean-reversion trading strategy; however, single assets that show stable mean-reversion over a significant period of time such that a mean-reversion trading strategy can readily become profitable are rare to find in markets today. Even if such gems were found, celebrating the discovery of the gateway to easy money could prove premature: