High Frequency Market Making & Trading with the help of Machine Learning

AI-HFT will be in backtesting and model training until December 2024.

Our HFT model is currently being backtested in crypto markets, but it can also pivot toward traditional markets once the required backtesting there is complete.

1. HFT Delayed Learning Curve

High Frequency Trading (HFT) is particularly susceptible to standardized expectations because of the sheer number of executed market orders that market makers and funds rely on as a source of cash flow. Nowadays HFT is often reduced to a secondary activity of funds; hardly any algorithmic-quantitative hedge funds specialize in it.

Due to the "high frequency" of trades, HFT also generates a large data set that lends itself to analysis and improvement with machine learning. At this point, however, an architectural question of high importance arises: how does one set up the machine learning so that A) the high frequency of trades, B) a high number of new data sets, and C) a high development rate (PAR) can all be guaranteed?

Taking an established HFT algorithm in standardized environments such as limit order markets, and measuring strategic trading, liquidity provision and consumption, order profit, market liquidity, price efficiency, price volatility, bid/ask spread, and market depth, one sees a clear difference between non-AI-supported trades (nonMLT) and AI-supported trades (MLT) over a medium-term period (1,000 executed trades). However, these established MLT-based HFT algorithms have a nonexistent learning curve: they do not learn from their mistakes but merely reach a higher expected profit. Their PAR is therefore zero, which renders the number of existing data sets on which the MLT could be trained pointless.

Instead of leaving the data sets unexploited, the Micro AI model of an MLT algorithm maximizes the number of data points. It does not only maximize the number: it creates up to 60 new case studies from a single trade, with parameters varied on the basis of several previous trades, which it then evaluates and repeats. In effect, a cluster of new information and new results is created. The varied parameters are rigorously ranked and improved by repeating each of them twenty times and executing the orders in parallel in a live demo market (Adjusted Parameter Demo Test). Every final parameter MLT model can thus be pitted against the others in the demo market. The parameter MLT model with the best performance finally competes head-to-head against the original HFT algorithm, which is adjusted if it is outperformed in more than 20 live examples. In this way, a data cluster of numerous, diverse case studies is created on which the model can be trained.
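The generate-then-rank loop described above can be sketched as follows. This is a simplified Python illustration, not the production model: the function names (`make_variants`, `rank_variants`), the ±5% perturbation band, and the toy demo-market scoring function are all assumptions made for the example.

```python
import random

def make_variants(trade, n=60, jitter=0.05, seed=0):
    """Derive up to n synthetic case studies from a single executed trade
    by perturbing each of its parameters within a small band (here +/-5%)."""
    rng = random.Random(seed)
    return [
        {key: value * (1 + rng.uniform(-jitter, jitter))
         for key, value in trade.items()}
        for _ in range(n)
    ]

def rank_variants(variants, demo_pnl, repeats=20):
    """Adjusted Parameter Demo Test (simplified): run each variant
    `repeats` times against a demo market and rank by mean P&L."""
    scored = []
    for v in variants:
        pnl = sum(demo_pnl(v) for _ in range(repeats)) / repeats
        scored.append((pnl, v))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Toy demo-market stand-in (hypothetical): reward tighter spreads.
trade = {"spread": 0.02, "order_size": 500.0}
best_pnl, best_params = rank_variants(
    make_variants(trade), lambda v: 1.0 - v["spread"]
)[0]
```

The winning parameter set (`best_params`) would then be the candidate that competes against the original algorithm in the head-to-head comparison.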

ShezheaNET's Delayed Learning Algorithm provides a detailed learning environment, but it does not produce a constant, continuous learning curve; instead, the expected P&L improves in discrete jumps.

However, if one generalizes the PAR in P&L terms and looks at the development over the long term, the model is clearly 20%-25% more profitable than traditional MLTs, even after the first 100,000 trades.

ShezheaNET's process of HFT machine learning via parameters is called the HFT Delayed Learning Curve.

2. Information Aggregation Protocol for MLT

During the first demo tests of the ShezheaNET HFT algorithm, the high number of data points caused an extremely high GPU load, so ShezheaNET could not run the MLT model in the background as a cash-flow source as planned but would have had to dedicate itself to it. We decided to take a detour: our Information Aggregation Protocol.

The program code is shared publicly below, owing to legal regulations that require public disclosure of the cryptographic encryption applied to client capital.

# encodeOrders2.jl

using JSON      # serializes the Dict to a JSON string
using Base64    # used only by the placeholder encoder below

function compressData(volume, liquidity, ticker_price, slippage, order_size)
    data = Dict(
        "Volume" => volume,
        "Liquidity" => liquidity,
        "TickerPrice" => ticker_price,
        "Slippage" => slippage,
        "OrderSize" => order_size
    )
    return JSON.json(data)
end

# NOTE: the actual encryption routine is not shown in this document;
# base64encode stands in here as a placeholder so the file runs end to end.
function encodeWithPassword(compressed_data)
    return base64encode(compressed_data)
end

function encodeAndSave(compressed_data)
    encoded_data = encodeWithPassword(compressed_data)
    # Each share file receives the same encoded payload, as in the original.
    for filename in ("VolumeShare.jl", "LiquidityShare.jl",
                     "TickerPriceShare.jl", "slippage.jl", "ordersize.jl")
        open(filename, "w") do file
            write(file, encoded_data)
        end
    end
end

data = compressData(100, 200, 300, 400, 500)
encodeAndSave(data)

Even though the encoding and cryptography in the code do not directly affect the performance required on the first execution of the ShezheaNET HFT algorithm, they massively reduce (-92%) the performance required when recalling past trades under similar conditions. This allows ShezheaNET to link similar past market conditions, and the parameters tuned for them, to live-occurring market conditions, making HFT much more efficient.
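One way to picture this recall mechanism is a lookup table keyed by coarsely aggregated market conditions, so tuned parameters are retrieved for "similar" conditions instead of being recomputed. The sketch below is a hypothetical Python illustration, not ShezheaNET's actual protocol; the bucket width, the `condition_key` scheme, and the `ParameterStore` class are assumptions made for the example.

```python
import json

def condition_key(volume, liquidity, ticker_price, bucket=50):
    """Quantize market conditions into coarse buckets so that similar
    conditions map to the same compact, JSON-encodable key."""
    return json.dumps({
        "Volume": round(volume / bucket) * bucket,
        "Liquidity": round(liquidity / bucket) * bucket,
        "TickerPrice": round(ticker_price / bucket) * bucket,
    }, sort_keys=True)

class ParameterStore:
    """Maps aggregated (bucketed) conditions to previously tuned parameters."""
    def __init__(self):
        self._table = {}

    def save(self, volume, liquidity, price, params):
        self._table[condition_key(volume, liquidity, price)] = params

    def lookup(self, volume, liquidity, price):
        return self._table.get(condition_key(volume, liquidity, price))

store = ParameterStore()
store.save(100, 200, 300, {"spread": 0.02})
# A nearby live condition lands in the same bucket and reuses the parameters.
match = store.lookup(110, 195, 310)
```

The design trade-off is the bucket width: wider buckets make matches cheaper and more frequent, at the cost of reusing parameters for conditions that are only loosely similar.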

3. Mitigating Overfitting with Cluster-Training

Overfitting is the most common problem with HFT algorithms: performance in live markets is worse than in demo backtesting, which is naturally based on historical market data. Overfitting is also a big issue for MLT-HFT algorithms. An AI trained on historical data suddenly receives live data that does not correlate with what it learned in the past, resulting in constantly worsening results. A lot of academic work in this field is heavily flawed because it references only historical data. Another key issue with turning research papers into practical algorithms is the number of parameters their models require: the more parameters the MLT HFT algorithm refers to when executing trades inside a backtesting environment, the higher the probability of overfitting.

We implemented an autonomous algorithm, called Cluster-Algo, that automatically adjusts parameters and takes several trades based on a single historical trade, with each trade having different market parameters. This results in efficient, in-depth backtesting that is not reliant on historical data while not losing the grounding of an extremely realistic market situation. The clusters usually split a single historical trade into 20-30 individual trades with adjusted parameters, which also multiplies the number of data points on which the neural net can be trained. While the number of parameters is slightly reduced compared to traditional academic models, we still include every parameter of significance.
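The splitting step can be sketched as a simple data-augmentation routine. This is a minimal Python illustration under stated assumptions: the name `cluster_split`, the Gaussian jitter, and its 3% width are hypothetical, since the actual Cluster-Algo and its parameter distributions are not described in detail here.

```python
import random

def cluster_split(historical_trade, n_min=20, n_max=30, seed=0):
    """Cluster-Algo sketch: split one historical trade into 20-30
    individual training trades, each with jittered market parameters,
    so the training set is not a verbatim copy of history."""
    rng = random.Random(seed)
    n = rng.randint(n_min, n_max)
    return [
        {k: v * (1 + rng.gauss(0, 0.03)) for k, v in historical_trade.items()}
        for _ in range(n)
    ]

# Two historical trades become a much larger, non-verbatim training set.
history = [
    {"price": 101.0, "volume": 1500.0, "spread": 0.015},
    {"price": 99.5, "volume": 900.0, "spread": 0.02},
]
training_set = [t for trade in history for t in cluster_split(trade)]
```

Because each synthetic trade stays close to a real one, the training data remains realistic, while the model never sees the exact historical values it would otherwise memorize.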

The cash flow coming from AI-HFT will primarily be used for fees and model training. 5% of the cash flow coming from AI-HFT in the blockchain sector will flow directly to Buy & Burns of $MAI. ShezheaNET AI-HFT can be rented as a market maker by exchanges (DEX & CEX).
