3.4. Triple-Barrier Labeling
In the majority of the literature, authors use a labeling scheme that classifies the next period's directional move as 1 for a positive move or -1 for a negative move. Some authors add a threshold level: if the return does not exceed it in either direction, a 0 label is assigned.
This technique has a few flaws. First, the threshold level is usually static, whereas stock returns are known to be heteroskedastic: volatility changes over time, and a fixed threshold value fails to account for this. Second, the {-1, 0, 1} scheme fails to account for positions that would have been closed by stop-loss or profit-taking orders.
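To make the baseline concrete, here is a minimal sketch of the fixed-threshold scheme described above. The function name `fixed_threshold_labels` and the `threshold` parameter are illustrative assumptions, not taken from any particular paper or library.

```python
def fixed_threshold_labels(prices, threshold=0.0):
    """Label each bar by the direction of the next-period simple return.

    `threshold` is a hypothetical static cutoff: returns whose absolute
    value does not exceed it receive a 0 label.
    """
    labels = []
    for today, tomorrow in zip(prices[:-1], prices[1:]):
        ret = tomorrow / today - 1.0
        if ret > threshold:
            labels.append(1)
        elif ret < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


prices = [100.0, 101.0, 100.9, 99.0, 99.0]
print(fixed_threshold_labels(prices, threshold=0.005))  # [1, 0, -1, 0]
```

Note that the same 0.5% cutoff is applied to every bar regardless of how volatile the market is at that point, which is the first flaw discussed above.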
A more advanced technique, the Triple Barrier method (Lopez de Prado 2018), addresses these concerns, and I am sure that many of you will agree that it makes more sense.
In derivatives pricing, a series of stock prices can be modeled using Geometric Brownian Motion. Similarly, in the Triple Barrier method we assume that stock prices follow a random walk with some drift and variance, and we then label this path.
At a given timestamp, three barriers are set. An upper and a lower horizontal barrier represent the take-profit and stop-loss levels. A third, vertical barrier represents the end of the duration of the trade.
Should the path of a stock reach the upper barrier before the vertical barrier, a value of 1 is returned; conversely, if it reaches the lower barrier first, a -1 is returned. Should the stock price reach the vertical barrier first, a 0 is returned. This is still a {-1, 0, 1} scheme, but we are labeling a path of returns rather than the next directional move.
The horizontal barriers are determined by multiplying the daily standard deviation of the log returns by a user-defined multiple. For example, a [1, 1] tuple sets both barriers at a distance of 1 standard deviation.
The following figure provides an example:
Figure 3: Triple Barrier Labeling (Lopez de Prado 2018)
In chart (a) the lower horizontal barrier is reached first, so a -1 value is returned. In chart (b) the path never reaches the horizontal barriers, and a 0 label is triggered when the vertical barrier is reached.
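The labeling rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the reference implementation. The function names are hypothetical, volatility is estimated with a plain sample standard deviation (the text does not specify the estimator or lookback), and barriers are expressed as cumulative log returns from the event bar.

```python
import math
import statistics


def daily_volatility(prices):
    """Sample standard deviation of log returns over the given prices.

    Assumption: a plain (unweighted) estimate over the whole input.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]
    return statistics.stdev(log_returns)


def triple_barrier_label(path, vol, multiples=(1.0, 1.0)):
    """Label one price path that starts at path[0].

    path      : prices from the event bar up to and including the
                vertical barrier
    vol       : volatility estimate (std of log returns) at the event bar
    multiples : (profit-taking, stop-loss) widths in units of `vol`
    """
    upper = multiples[0] * vol    # take-profit level, in log-return terms
    lower = -multiples[1] * vol   # stop-loss level, in log-return terms
    for price in path[1:]:
        cumulative = math.log(price / path[0])
        if cumulative >= upper:
            return 1              # upper barrier touched first
        if cumulative <= lower:
            return -1             # lower barrier touched first
    return 0                      # vertical barrier reached first


vol = 0.01  # hypothetical 1% daily volatility
print(triple_barrier_label([100.0, 100.5, 101.2], vol))         # 1
print(triple_barrier_label([100.0, 99.5, 98.9], vol))           # -1
print(triple_barrier_label([100.0, 100.2, 99.9, 100.1], vol))   # 0
```

Because `vol` is re-estimated at each event, the barriers widen in volatile periods and tighten in quiet ones, which is how the method addresses the static-threshold problem.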
3.5. Fitting a Primary Model
The primary model is the component that determines which side of the trade to take. It generates a signal in {-1, 0, 1}, where -1 is a short position, 1 is a long position, and 0 means to close all positions.
This model could be, but is not limited to, one of the following:
- Statistical arbitrage model based on the spread between two assets.
- Machine learning model such as an SVM or Neural Network.
- Fundamental value or event-based strategy where the portfolio manager generates the signal.
- Rules-based technical trading strategy such as moving average crossovers.
The only requirement is that a signal is generated which is used to determine the side of the position. We look to meta labeling and bet sizing to determine the size of the position.
The following two sections discuss the technical analysis inspired strategies we used.
3.5.1. Trend Following
A simple moving average crossover strategy is employed. The idea behind this strategy is to use two moving averages to smooth out the noise in the data and then determine when a trend is in effect.
Traditionally, a slow 200-day and a fast 50-day moving average are used. When the fast moving average crosses above the slow one, a buy signal (1) is generated. Conversely, when the fast crosses below the slow, a sell signal (-1) is generated. Under this scheme there is always a long or a short side active, i.e. there are no 0 signals. The figure below shows an example of this.
Figure 4: SMA Crossover Strategy
The green upward arrows indicate when a long (buy) signal is in effect and the red downward arrows when a short (sell) signal is in effect.
For the primary trend following model we implemented a 20 and 50 bar SMA crossover strategy. Remember that we reduced the number of events by making use of the CUSUM filter. Because of this, and because the vertical barrier is set to a single day, we need much shorter SMA periods to capture the short-term trends that may be in effect and to provide more current information to the secondary model.
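As a rough illustration of how such a side signal could be computed, the sketch below uses plain Python lists. The helper names `sma` and `sma_crossover_side` are hypothetical, and the handling of ties and of the warm-up period is an assumption made for the example.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def sma_crossover_side(prices, fast=20, slow=50):
    """Side per bar: 1 while the fast SMA is above the slow SMA, else -1.

    Bars without enough history for the slow average get None.
    Ties are treated as short, which is an arbitrary choice.
    """
    if fast >= slow:
        raise ValueError("fast window must be shorter than slow window")
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    sides = []
    for f, s in zip(fast_ma, slow_ma):
        if s is None:
            sides.append(None)   # still warming up
        else:
            sides.append(1 if f > s else -1)
    return sides


rising = [float(p) for p in range(1, 11)]
print(sma_crossover_side(rising, fast=2, slow=4))
# [None, None, None, 1, 1, 1, 1, 1, 1, 1]
```

Once the warm-up period has passed, every bar carries either a 1 or a -1, matching the observation above that this scheme produces no 0 signals.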
3.5.2. Mean Reversion
The second primary model is based on mean reversion and makes use of Bollinger Bands. Bollinger Bands are a technical analysis indicator that places bands x standard deviations above and below a moving average of the price, where x is a user-defined multiple.
The principle is that stock prices are assumed to be log-normally distributed, so we can make use of the empirical rule, which states that for normally distributed data approximately 99.7% of observations lie within 3 standard deviations of the mean, 95% within 2, and 68% within 1. Should the closing price be more than, say, 2 standard deviations above the mean, we generate a short signal (-1) on the premise that prices should mean revert in the near term. The reverse also holds: if the price is more than 2 standard deviations below the mean, a buy signal (1) is generated.
The figure below shows an example of a traditional Bollinger band strategy.
Figure 5: Bollinger Band Mean Reversion Strategy
The green upward arrows indicate when a long (buy) signal is in effect and the red downward arrows when a short (sell) signal is in effect.
Typically a position is held until the price reaches the moving average, but in our case, because we are using the triple barrier method, a position is held until one of the three barriers is touched.
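The band logic can be sketched as follows. This is an illustrative sketch only. The function name `bollinger_side`, the 20-bar default window, the use of the population standard deviation, and the choice to emit 0 inside the bands are all assumptions, since the text does not specify them.

```python
import statistics


def bollinger_side(prices, window=20, num_std=2.0):
    """Mean-reversion side per bar from Bollinger Bands.

    -1 when the close is above the upper band, 1 when it is below the
    lower band, 0 when it is inside the bands, and None until `window`
    closes are available. The window includes the current close.
    """
    sides = []
    for i in range(len(prices)):
        if i + 1 < window:
            sides.append(None)
            continue
        recent = prices[i + 1 - window:i + 1]
        mid = statistics.mean(recent)                  # middle band (SMA)
        band = num_std * statistics.pstdev(recent)     # band half-width
        if prices[i] > mid + band:
            sides.append(-1)    # stretched upward: expect reversion down
        elif prices[i] < mid - band:
            sides.append(1)     # stretched downward: expect reversion up
        else:
            sides.append(0)
    return sides


spike_up = [100.0, 100.0, 100.0, 100.0, 110.0]
print(bollinger_side(spike_up, window=5, num_std=1.5)[-1])    # -1
spike_down = [100.0, 100.0, 100.0, 100.0, 90.0]
print(bollinger_side(spike_down, window=5, num_std=1.5)[-1])  # 1
```

In the setup described here, the resulting side would only open a position. The exit is governed by the triple barrier method rather than by a return to the middle band.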
3.6. Meta Labeling
The central idea is to create a secondary machine learning (ML) model that learns how to use the primary exogenous model. This leads to improved performance metrics, including accuracy, precision, recall, and F1-score. For those readers who are interested in building up a deeper intuition around meta-labeling, the following blog post illustrates a toy example. We would like to stress the importance of this concept and see it as a major contribution of Dr. Lopez de Prado's work.
Use in Financial Machine Learning
Meta labeling in finance follows the same principles as we outlined in the toy example on the MNIST dataset. First we make use of a primary model, in this case a simple trend following or mean reverting strategy, to determine the side of the trade. Then we fit a Random Forest meta-label model on top of the primary model to determine whether or not to trade.
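To illustrate what the secondary model is trained on, the sketch below builds binary meta-labels from the primary model's side and the realized return of each event. It assumes the common formulation in which a meta-label is 1 when acting on the primary signal would have been profitable and 0 otherwise. The function name and the sample numbers are hypothetical, and fitting the Random Forest itself is not shown.

```python
def meta_labels(sides, realized_returns):
    """Binary meta-labels for the secondary model.

    sides            : primary-model signal per event (-1, 0, or 1)
    realized_returns : return of the underlying over each event's holding
                       period (for example, until a barrier is touched)
    Returns 1 where following the primary signal made money, else 0.
    """
    labels = []
    for side, ret in zip(sides, realized_returns):
        labels.append(1 if side * ret > 0 else 0)
    return labels


sides = [1, -1, 1, -1]                      # made-up primary signals
returns = [0.012, 0.008, -0.010, -0.015]    # made-up realized returns
print(meta_labels(sides, returns))  # [1, 0, 0, 1]
```

A classifier trained on these labels predicts whether a given primary signal is worth acting on, so the primary model keeps control of the side while the secondary model filters the trades.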