We’re excited to announce the latest release of MLFinlab, our suite of composable components and production-ready algorithms for developing trading strategies that leverage machine learning.
Highlights of this release include Python 3.9 support, usability improvements to our volatility estimators, productivity improvements when generating bars from tick data, and a brand new API reference page. Smaller changes include updates to more examples and the conversion of additional examples to doctests for increased reliability and testability.
We’ll break down some of the major changes below, and how they will impact users who are upgrading.
Python 3.9 support
This release officially adds Python 3.9 support to MLFinlab. This has been a sought-after improvement for some time, and we are happy to finally announce its general availability. It also lays the foundation for future Python 3.10 support, which is currently on the roadmap.
DatetimeIndex for creating bars
We’ve updated the behaviour of the data_structures module to now set the date_time column as a DatetimeIndex, instead of a separate column:
>>> from mlfinlab.data_structures import time_data_structures
>>> # Get processed tick data csv from url
>>> tick_data_url = "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/processed_tick_data.csv"
>>> time_bars = time_data_structures.get_time_bars(
...     tick_data_url, resolution="D", verbose=False
... )
>>> time_bars
            tick_num    open   high    low   close  volume  cum_buy_volume  cum_ticks  cum_dollar_value
date_time
2023-03-02     13642  804.25  808.0  803.0  807.00   49208           25816      13641      3.963169e+07
2023-03-03     66322  807.00  812.5  806.0  810.75  171378           94591      52680      1.388561e+08
Why have we made this change? Most of the downstream functions expect to receive input as a DataFrame whose index is the date_time. Previously, users had to manually set the date_time column as the index every time they created bars. This was tedious, easily forgotten, and could lead to cryptic downstream errors. As a result, we’ve updated the default behaviour to do this automatically.
We’ve also updated all of our examples to reflect this new behaviour.
If your current code manually sets date_time as the index, you’ll need to remove that step, because the column no longer exists and the call will raise an error. Fortunately, removing it shouldn’t be a difficult process.
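For code that has to run against both older and newer releases, one option is to guard the indexing step instead of deleting it. The helper below is a minimal sketch under that assumption: the name ensure_datetime_index is hypothetical and not part of MLFinlab, and the sample bars are made-up stand-ins for real output.

```python
import pandas as pd


def ensure_datetime_index(bars: pd.DataFrame, column: str = "date_time") -> pd.DataFrame:
    """Return bars indexed by `column`, whether or not it is already the index.

    Hypothetical helper for illustration only; not part of MLFinlab.
    """
    if column in bars.columns:
        # Older behaviour: date_time is a regular column, so move it to the index.
        bars = bars.set_index(column)
    if not isinstance(bars.index, pd.DatetimeIndex):
        # Convert string dates to timestamps without mutating the caller's frame.
        bars = bars.copy()
        bars.index = pd.to_datetime(bars.index)
    return bars


# Older-style bars: date_time is a separate column (illustrative values).
old_style = pd.DataFrame(
    {"date_time": ["2023-03-02", "2023-03-03"], "close": [807.00, 810.75]}
)

# Newer-style bars: date_time is already a DatetimeIndex.
new_style = old_style.set_index("date_time")
new_style.index = pd.to_datetime(new_style.index)

converted_old = ensure_datetime_index(old_style)
converted_new = ensure_datetime_index(new_style)
```

Both inputs end up with the same DatetimeIndex, so downstream code does not need to know which release produced the bars.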
Improvements to volatility estimators
Previously, our volatility estimators relied on the implicit column order when performing calculations. This was error-prone and wasn’t documented clearly, which made it possible to get erroneous results without warning. As a result, we’ve updated the volatility estimators to use column names directly, thus ensuring correct calculations. Depending on the estimator in question, we now check for the following lowercase column names in order to perform the estimation: 'open', 'high', 'low', 'close'. This may require you to update your code, depending on how you’ve named your columns.
We’ve also updated all of our examples to reflect this new approach, so you can refer to them for additional context:
>>> import pandas as pd
>>> import yfinance as yf
>>> from mlfinlab.features.volatility_estimators import parkinson
>>> # Retrieve the DataFrame with time series of returns
>>> ohlc = pd.read_csv(
...     "https://raw.githubusercontent.com/hudson-and-thames/example-data/main/yahoo_finance_SPY_2012-03-26_to_2023-12-06.csv",
...     parse_dates=[0],
...     index_col=0,
... )
>>> ohlc.columns = [
...     col.lower() for col in ohlc.columns
... ]  # volatility estimators expect lower-case column names
>>> list(ohlc.columns)
['open', 'high', 'low', 'close', 'adj close', 'volume']
>>> # Calculate volatility with Parkinson estimator (Sinclair formula)
>>> parkinson_sinclair = parkinson(ohlc, 22, False)
>>> # Calculate volatility with Parkinson estimator (De Prado formula)
>>> parkinson_deprado = parkinson(ohlc, 22, True)
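If your columns differ by more than letter case, lowercasing alone will not be enough and an explicit rename is needed. The snippet below is a sketch of that step using only pandas. The vendor column names (such as "Last" for the closing price) and the sample values are assumptions made up for illustration, and the final check is our own suggestion rather than something MLFinlab performs.

```python
import pandas as pd

# Hypothetical vendor data with non-standard column names (illustrative values).
raw = pd.DataFrame(
    {
        "Open": [100.0, 101.5],
        "High": [102.0, 103.0],
        "Low": [99.5, 100.5],
        "Last": [101.5, 102.0],
    },
    index=pd.to_datetime(["2023-03-02", "2023-03-03"]),
)

# Map the vendor's names onto the lowercase names the estimators look for.
ohlc = raw.rename(
    columns={"Open": "open", "High": "high", "Low": "low", "Last": "close"}
)

# Fail early with a clear message if a required column is still missing.
required = ["open", "high", "low", "close"]
missing = [name for name in required if name not in ohlc.columns]
if missing:
    raise KeyError(f"Missing OHLC columns: {missing}")
```

Checking for the required names up front gives a readable error at the point where the data is prepared, rather than later inside an estimator call.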
API Reference Docs
We’ve added an API reference to our documentation, which serves as a useful reference for every single module, class and function in MLFinlab. If you’re already familiar with a component, and just need to reference the API, this is a quick way of doing so without having to scroll through the entire user-guide. It’s also a new way to explore what’s on offer in MLFinlab. A demo of the API reference can be seen below:
We believe this is a very useful addition to the documentation.
Changes to ml_cross_val_score
We’ve slightly tweaked the ml_cross_val_score function so that require_proba is now a required argument. Previously, require_proba had a default value of False, which led to accidental errors and user confusion when passing in a scoring function that required probabilities as input rather than labels. Since the argument was optional, users were unaware that they had to explicitly specify require_proba=True. This has now been rectified, and require_proba is no longer an optional argument.
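To see why the flag matters, here is a self-contained sketch in plain Python. None of these names are MLFinlab’s API: score_predictions is a hypothetical stand-in for the scoring step, and log_loss and accuracy are simplified binary versions written for illustration. The point is that a probability-based metric silently produces a misleading value when it is fed hard labels.

```python
import math


def log_loss(y_true, proba):
    """Mean negative log-likelihood; expects class-1 probabilities."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, proba):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, labels):
    """Fraction of correct predictions; expects hard 0/1 labels."""
    return sum(int(y == label) for y, label in zip(y_true, labels)) / len(y_true)


def score_predictions(y_true, proba, scoring, require_proba):
    """Hypothetical scoring step; require_proba deliberately has no default."""
    if require_proba:
        return scoring(y_true, proba)
    # Otherwise threshold the probabilities into hard labels first.
    labels = [1 if p >= 0.5 else 0 for p in proba]
    return scoring(y_true, labels)


y_true = [1, 0, 1, 1]
proba = [0.9, 0.2, 0.6, 0.4]

# Correct pairing: log loss with probabilities, accuracy with labels.
loss_from_proba = score_predictions(y_true, proba, log_loss, require_proba=True)
acc = score_predictions(y_true, proba, accuracy, require_proba=False)

# Incorrect pairing: log loss fed hard labels runs without error,
# but the single misclassified sample dominates the result.
loss_from_labels = score_predictions(y_true, proba, log_loss, require_proba=False)
```

In this sketch, leaving out require_proba raises a TypeError at the call site, so the choice between labels and probabilities has to be made explicitly.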
Other improvements
We’ve updated a number of examples, and updated links to our notebooks, which are now available in a public repository. Not everything has been migrated yet, given the immense size of our documentation, but we will keep doing so steadily as we improve and update the various modules in MLFinlab.
Upgrading
As always, you can follow the installation instructions in the documentation for installing the latest version of MLFinlab.