In this article we introduce a vine copula-based strategy for statistical arbitrage from [Stübinger et al., 2018] with some analysis, then generalize their framework and suggest what can be modified. With vine copulas, we can directly model the relationships among multiple stocks and trade on the information the fitted model produces. As in the traditional bivariate copula approaches to pairs trading, we use the conditional (cumulative) probability to gauge whether a target stock is underpriced or overpriced relative to the other stocks, and then generate trading signals from that probability as a mean-reversion bet.

We aim to cover the following topics:

Quick overview of copula-based trading strategies.
Idea and typical workflow of the C-vine copula approach.
Strategy assumptions and details.
Comments and some analysis for this strategy.

Copulas are a very flexible tool for modeling dependencies among random variables. Long used in risk management, they are also a great statistical arbitrage method when coupled with a good execution rule, one that need not be limited to mean-reversion strategies. Since 2010, multiple copula-based trading methods have been developed, from the earlier simple bivariate copulas on price series to recent sophisticated self-adaptive models using low-latency data. It is a growing and dynamic field of research and practice. However, there is little literature reviewing criteria for selecting tradable stocks that is dedicated solely to copula-based methods.

[Rad et al (2016)] found that the copula pairs-trading method (the version that they implemented) performs much better on drawdown risk than the distance and cointegration methods. However, bad pairs that failed to converge drove down its performance significantly. It is a serious reminder to practitioners that building a suitable portfolio is just as important as (if not more important than) applying a great trading method, and that a less desirable set of securities can quickly ruin a seemingly great strategy. A vine copula is designed to model many random variables jointly and therefore poses an even greater challenge in selecting stocks.

A copula is a great statistical tool for studying the relation among multiple random variables: by focusing on the joint cumulative distribution of the quantiles of the marginals, we can bypass the idiosyncratic features of the marginal distributions and look directly at how the variables are “related”.
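To make the "quantiles of marginals" idea concrete, here is a minimal sketch on synthetic data. The helper name `to_pseudo_observations` and the two toy series are assumptions for illustration only and are not taken from ArbitrageLab. Each series is mapped to its rank-based quantiles, which discards the shape of its marginal distribution while keeping the ordering that the dependence structure is built on.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def to_pseudo_observations(x):
    """Map a 1-D sample into (0, 1) using its rank-based empirical CDF."""
    x = np.asarray(x, dtype=float)
    # Dividing by n + 1 (not n) keeps every value strictly inside (0, 1).
    return rankdata(x) / (len(x) + 1)


rng = np.random.default_rng(0)
common = rng.standard_normal(1000)  # shared driver (synthetic)

# Two series with very different marginals but a shared driver.
a = np.exp(common + 0.3 * rng.standard_normal(1000))  # right-skewed
b = 5.0 * common + rng.standard_normal(1000)          # symmetric, wider scale

u = to_pseudo_observations(a)
v = to_pseudo_observations(b)

# Rank correlation is unchanged by the transform: the dependence survives
# even though both marginals are now (approximately) uniform.
rho_raw, _ = spearmanr(a, b)
rho_quantile, _ = spearmanr(u, v)
print(rho_raw, rho_quantile)
```

The pairs `(u, v)` are what a copula would then be fitted to. The marginals of `a` and `b` no longer matter at that stage.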

Indeed, traders and analysts have been using copulas to exploit statistical arbitrage under the pairs trading framework for some time, and we have implemented some of the most popular methods in ArbitrageLab. However, it is natural to expand beyond dealing with just a pair of stocks: a great number of competing stat arb methods already exist alongside copulas, thinning the potential alpha. It is also intuitive for humans to think about relative pricing between two stocks, whereas in higher dimensions it is not so easy, which leaves great opportunities for quantitative approaches.

Systematic approaches to pairs trading have been popular since the mid-1980s. Gatev et al (2006) examined the profitability of a distance-based strategy on normalized prices. Cointegration is another commonly used approach, as discussed in [Vidyamurthy (2004)]. Both methods are tied to the idea of a mean-reverting bet, and the trading signals are generated from the spread: when the spread widens, it is expected to narrow, and when it does the trader pockets the profit.

We have previously talked about several advantages of copula-based models in Copula for Pairs Trading: A Detailed, But Practical Introduction. As a tool, a copula analyzes the dependence structure among several random variables (for pairs trading, just 2 random variables). We quickly summarize them here:

Whether it is for pairs trading or risk management, two natural questions to ask before putting a copula to use are: How do we draw samples from a copula? How should one fit a copula to data? The necessity of fitting is quite obvious: without it, there is no way to calibrate our model for pairs trading or risk analysis using historical data.

Sampling is mostly used to make a Q-Q plot against the historical data as a sanity check. Note that a copula cannot natively generate a future price time series, since it treats time series data as independent draws from two random variables and thus has no information about the order of the observations, which is vital in time series analysis. One way to think about sampling from a copula fitted to time series data is that it gives the likelihood of where the next data point is going to be, regardless of the input sequence.
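The two questions above can be sketched for the simplest case, a bivariate Gaussian copula. This is an illustrative sketch on synthetic data, not the ArbitrageLab implementation. The function names are hypothetical, and the fit uses the correlation of normal scores as a simple stand-in for a full maximum-likelihood fit. The last lines shuffle the observations to show that the fit does not depend on their order.

```python
import numpy as np
from scipy.stats import norm, rankdata


def pseudo_obs(x):
    """Rank-based quantiles strictly inside (0, 1)."""
    return rankdata(x) / (len(x) + 1)


def fit_gaussian_copula(u, v):
    """Estimate the copula correlation from the normal scores of (u, v)."""
    z = np.column_stack([norm.ppf(u), norm.ppf(v)])
    return float(np.corrcoef(z, rowvar=False)[0, 1])


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs of quantiles from a Gaussian copula with correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return norm.cdf(z)  # map back to the unit square


rng = np.random.default_rng(42)
true_rho = 0.7  # assumed value for the synthetic "historical" data
raw = rng.multivariate_normal(
    np.zeros(2), [[1.0, true_rho], [true_rho, 1.0]], size=5000
)
u, v = pseudo_obs(raw[:, 0]), pseudo_obs(raw[:, 1])

rho_hat = fit_gaussian_copula(u, v)
samples = sample_gaussian_copula(rho_hat, 2000, rng)

# Shuffling the rows leaves the fit unchanged: the copula never sees the order.
perm = rng.permutation(len(u))
rho_shuffled = fit_gaussian_copula(u[perm], v[perm])
print(rho_hat, rho_shuffled)
```

The columns of `samples` could then be compared with the historical pseudo-observations `u` and `v` in a Q-Q plot. Plotting is omitted here.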

Let’s Solve a Mystery.

Suppose that you encounter a promising pair of stocks that move closely together. The spread zig-zags around 0 like some fine needle stitching, and it sure looks like a nice candidate for mean-reversion bets. What’s more, you find out that the two stocks’ prices for the past 2 years are both nicely normally distributed. Great! You can avoid some hairy analysis for now. So you fit a joint normal distribution to them as a sanity check and immediately find that the pair doesn’t look as promising anymore:

Over the past two years, there were some major market events, during which the stocks moved upward or downward together, depending on whether the news was good or bad. Your bivariate Gaussian model, in contrast, says that such co-moves are very unlikely to happen, since they sit so close to the tails of the distribution, and that you had better ignore them. What is more annoying is that the stocks tend to move downward together more often than upward, while the bivariate Gaussian distribution says the two cases should be symmetric.
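The mismatch in the tails can be reproduced with a small simulation. The sketch below uses purely synthetic data with assumed parameters (correlation 0.8, 3 degrees of freedom). It compares how often two series fall below their own 1% quantile at the same time, once under a bivariate Gaussian and once under a bivariate Student-t with the same correlation matrix. Because each series is compared against its own quantile, only the dependence structure matters here, not the marginals. The Student-t is itself symmetric, so it illustrates only the joint-tail issue. Capturing the downside asymmetry would need an asymmetric family such as Clayton.

```python
import numpy as np


def joint_lower_tail_prob(x, y, q=0.01):
    """Empirical probability that x and y are both below their own q-quantile."""
    return float(np.mean((x < np.quantile(x, q)) & (y < np.quantile(y, q))))


rng = np.random.default_rng(7)
n, rho, dof = 200_000, 0.8, 3  # assumed, illustrative parameters
cov = np.array([[1.0, rho], [rho, 1.0]])

# Bivariate Gaussian draws.
gauss = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Bivariate Student-t draws: a Gaussian vector divided by a shared
# sqrt(chi-square / dof) factor, which makes extreme moves arrive together.
z = rng.multivariate_normal(np.zeros(2), cov, size=n)
w = rng.chisquare(dof, size=n) / dof
student = z / np.sqrt(w)[:, None]

p_gauss = joint_lower_tail_prob(gauss[:, 0], gauss[:, 1])
p_student = joint_lower_tail_prob(student[:, 0], student[:, 1])
p_indep = 0.01 ** 2  # reference value if the two series were independent

print(p_indep, p_gauss, p_student)
```

With the same correlation, the Student-t draws show joint crashes noticeably more often than the Gaussian draws, which is the kind of co-movement the Gaussian model treats as nearly impossible.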

So what went wrong? For this mini example, there are two major pitfalls present: