Release Announcement: MLFinlab v2.1.0

We’re excited to announce the latest release of MLFinlab, our suite of composable components and production-ready algorithms for developing trading strategies that leverage machine learning.

Highlights of this release include Python 3.9 support, usability improvements to our volatility estimators, productivity improvements when generating bars from tick data, and a brand new API reference page. Smaller changes include updates to more examples, and the conversion of additional examples to doctest for increased reliability and testability.

We’ll break down some of the major changes below, and how they will impact users who are upgrading.

Python 3.9 support

This release officially adds Python 3.9 support to MLFinlab. This has been a sought-after improvement for some time, and we are happy to finally announce its general availability. It also lays the foundation for future Python 3.10 support, which is currently on the roadmap.

DatetimeIndex for creating bars

We’ve updated the behaviour of the data_structures module so that generated bars now use date_time as a DatetimeIndex, rather than returning it as a separate column:

>>> from mlfinlab.data_structures import time_data_structures
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> time_bars = time_data_structures.get_time_bars(
...     tick_data_url, resolution="D", verbose=False
... )
>>> time_bars  
            tick_num    open   high    low   close  volume  cum_buy_volume  cum_ticks  cum_dollar_value
date_time
2023-03-02     13642  804.25  808.0  803.0  807.00   49208           25816      13641      3.963169e+07
2023-03-03     66322  807.00  812.5  806.0  810.75  171378           94591      52680      1.388561e+08

Why have we made this change? Most of the downstream functions expect to receive input as a DataFrame, where the index is the date_time. Users would have to manually set the date_time column as the index each and every time after creating their bars. This is tedious, is easily forgotten, and can lead to cryptic downstream errors. As a result, we’ve updated the default behaviour to do this automatically.
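
As a rough illustration of what the index makes possible, the sketch below rebuilds a small DataFrame from the two rows shown above (a subset of the columns, typed in by hand rather than produced by MLFinlab) and applies plain pandas operations that rely on a DatetimeIndex. The variable names are ours and purely illustrative.

```python
import pandas as pd

# Two rows copied by hand from the output above, with date_time as the index.
bars = pd.DataFrame(
    {
        "open": [804.25, 807.00],
        "high": [808.0, 812.5],
        "low": [803.0, 806.0],
        "close": [807.00, 810.75],
        "volume": [49208, 171378],
    },
    index=pd.DatetimeIndex(["2023-03-02", "2023-03-03"], name="date_time"),
)

# Date-based slicing works directly because the index is a DatetimeIndex.
recent = bars.loc["2023-03-03":]

# Time-aware resampling also needs a DatetimeIndex (weekly volume here).
weekly_volume = bars["volume"].resample("W").sum()
```

Neither operation would work on bars that still carry date_time as an ordinary column without first calling set_index.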

We’ve also updated all of our examples to reflect this new behaviour.

If your current code manually sets date_time as the index, you’ll need to remove those lines to prevent an error from being raised, since date_time is no longer available as a column. Fortunately, removing the code shouldn’t be a difficult process.
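
If you need code that runs against both older and newer versions during a migration, one option is a small guard like the sketch below. The helper name ensure_date_time_index is hypothetical and not part of MLFinlab; it only assumes that bars arrive either with a date_time column (old layout) or with date_time already as the index (new layout).

```python
import pandas as pd


def ensure_date_time_index(bars: pd.DataFrame) -> pd.DataFrame:
    """Return bars indexed by date_time, whichever layout they arrive in.

    Hypothetical helper for illustration; not part of MLFinlab.
    """
    if "date_time" in bars.columns:
        # Old layout: date_time is still a regular column.
        bars = bars.set_index("date_time")
    else:
        # New layout: already indexed, so just avoid mutating the input.
        bars = bars.copy()
    # Make sure the index is a real DatetimeIndex in both cases.
    bars.index = pd.to_datetime(bars.index)
    return bars


# Old-style bars: date_time is a column of strings.
old_style = pd.DataFrame(
    {"date_time": ["2023-03-02", "2023-03-03"], "close": [807.00, 810.75]}
)
# New-style bars: date_time is already the index.
new_style = old_style.assign(
    date_time=pd.to_datetime(old_style["date_time"])
).set_index("date_time")

from_old = ensure_date_time_index(old_style)
from_new = ensure_date_time_index(new_style)
```

Once every environment is on the new release, the guard can simply be deleted along with the original set_index call.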

Improvements to volatility estimators

Previously, our volatility estimators relied on implicit column order when performing calculations. This was error-prone and wasn’t documented clearly, which unfortunately made it possible to get erroneous results without warning. As a result, we’ve updated the volatility estimators to use column names directly, ensuring that the correct columns are used in each calculation. Depending on the estimator in question, we now check for the following lowercase column names in order to perform the estimation: 'open', 'high', 'low', 'close'. This may require you to update your code, depending on how you’ve named your columns.
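
If your data uses different capitalisation, a small pre-processing step along these lines may help. This is a minimal sketch in plain pandas; the helper normalise_ohlc_columns and the REQUIRED_COLUMNS tuple are hypothetical names of ours, not part of MLFinlab, and the check simply mirrors the four column names listed above.

```python
import pandas as pd

# Column names listed in the release notes (assumed to cover all estimators).
REQUIRED_COLUMNS = ("open", "high", "low", "close")


def normalise_ohlc_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Lower-case column names and check the OHLC columns are present.

    Hypothetical helper for illustration; not part of MLFinlab.
    """
    renamed = frame.rename(columns=lambda name: str(name).strip().lower())
    missing = [name for name in REQUIRED_COLUMNS if name not in renamed.columns]
    if missing:
        raise KeyError(f"missing OHLC columns: {missing}")
    return renamed


# Example input using Yahoo Finance style capitalisation (made-up values).
raw = pd.DataFrame(
    {
        "Open": [100.0, 101.0],
        "High": [102.0, 103.0],
        "Low": [99.0, 100.5],
        "Close": [101.0, 102.5],
        "Adj Close": [101.0, 102.5],
        "Volume": [1000, 1200],
    }
)
ohlc_ready = normalise_ohlc_columns(raw)
```

Failing early with a clear message is the point here: a missing column is reported before any estimator is called.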

We’ve also updated all of our examples to reflect this new approach, so you can refer to them for additional context:

>>> import pandas as pd
>>> from mlfinlab.features.volatility_estimators import parkinson
>>> # Retrieve a DataFrame of OHLC price data for SPY
>>> ohlc = pd.read_csv(
...     "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/yahoo_finance_SPY_2012-03-26_to_2023-12-06.csv",
...     parse_dates=[0],
...     index_col=0,
... )
>>> ohlc.columns = [
...     col.lower() for col in ohlc.columns
... ]  # volatility estimators expect lower-case column names
>>> list(ohlc.columns)
['open', 'high', 'low', 'close', 'adj close', 'volume']
>>> # Calculate volatility with Parkinson estimator (Sinclair formula)
>>> parkinson_sinclair = parkinson(ohlc, 22, False)
>>> # Calculate volatility with Parkinson estimator (De Prado formula)
>>> parkinson_deprado = parkinson(ohlc, 22, True)

API Reference Docs

We’ve added an API reference to our documentation, which serves as a useful reference for every single module, class and function in MLFinlab. If you’re already familiar with a component, and just need to reference the API, this is a quick way of doing so without having to scroll through the entire user-guide. It’s also a new way to explore what’s on offer in MLFinlab. A demo of the API reference can be seen below:

We believe this is a very useful addition to the documentation.

Changes to ml_cross_val_score

We’ve slightly tweaked the ml_cross_val_score function so that the require_proba argument is now required. Previously, require_proba had a default value of False, which led to accidental errors and user confusion when passing in a scoring function that required probabilities as input rather than labels. Since the argument was optional, users were often unaware that they had to explicitly specify require_proba=True. This has now been rectified, and require_proba is no longer an optional argument.
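
To illustrate why the distinction matters, the sketch below uses scikit-learn directly rather than ml_cross_val_score itself. The helper score_predictions is hypothetical and its signature is not MLFinlab’s; it only mimics the idea that a scorer such as log_loss needs predicted probabilities, a scorer such as accuracy_score needs predicted labels, and the caller has to say which one applies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss


def score_predictions(clf, X, y, scoring, require_proba):
    """Score a fitted classifier with the given metric.

    Hypothetical helper for illustration; not MLFinlab's API.
    require_proba deliberately has no default, so the caller must choose.
    """
    if require_proba:
        # Probability-based metrics (e.g. log loss) need predict_proba output.
        probabilities = clf.predict_proba(X)
        return scoring(y, probabilities, labels=clf.classes_)
    # Label-based metrics (e.g. accuracy) need hard class predictions.
    return scoring(y, clf.predict(X))


# Small synthetic binary classification problem (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

loss = score_predictions(clf, X, y, log_loss, require_proba=True)
accuracy = score_predictions(clf, X, y, accuracy_score, require_proba=False)
```

Because require_proba has no default in this sketch, leaving it out raises a TypeError immediately instead of quietly scoring the wrong kind of prediction.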

Other improvements

We’ve updated a number of examples, and updated links to our notebooks, which are now available on a public repository. Not everything has been migrated across yet, given the immense size of our documentation, but we will continue to steadily do so as we continue to improve and update the various modules in MLFinlab.

Upgrading

As always, you can follow the installation instructions in the documentation for installing the latest version of MLFinlab.