1. Simulacrum or Shenanigan

BQE Seminar

Deep Generative Models and Simulators for Financial Markets

Derek Snow, BQE, 2022 May


What happens when you combine (1) generative adversarial networks, (2) reinforcement learning, and (3) agent-based models to discover trading strategies?


Can we automatically calibrate a complex simulator using a model known for its strength in modelling complex distributions, and then use a model known for its explorative and exploitative nature to automatically find novel trading strategies using the simulator and historical data?
To understand how we do this, we will explore each of these technologies (1)-(3) separately and then together.

Synthetic Data

(1) Deep generative models use deep learning algorithms to generate synthetic data.
(2) Agent-based models use computational simulations of interacting agents to generate synthetic data.
Synthetic data generators seek to preserve the original or expected data’s statistical features while producing entirely new data points.

Research Progress

Established the ATI-OMI market simulators workshop.
Evaluated and developed datasets for the FCA digital sandbox.
Developed synthetic payments data for HSBC.
Developed MTSS-GAN and DataGene for time series generation.

Generating Models

Deep Generative Techniques

Generative Adversarial Network
Variational Autoencoder
Vanilla Autoencoders
Flow-based models
Energy-based models

Other Generating Techniques

Synthetic Graph Networks
Bayesian Networks
Autoregressive Models
Agent-based Models

Generating Process

Both deep generative models (DGMs) and agent-based models (ABMs) have shown special relevance as general tools to improve learning algorithms.
Distributional: this approach calculates or infers the statistical distributions of the real data and samples from them to generate synthetic data, e.g., deep generative methods.
Structural: this approach first tries to replicate the behavior of the system and then produces data as an output of the model, e.g., agent-based methods.

Data Augmentation

By and large, the most important reason for synthetic data generation is its use in data augmentation.
Data augmentation is any change to the data with the purpose of improving the model.
Synthesizing is a row-wise form of augmentation: it adds new rows (observations) to the data to improve the model.

Better Data; Better Model

Machine learning models are data-driven, hence the proliferation of data improvement research.
We are always looking to improve the robustness, accuracy, fairness, and even privacy of data-driven models.
Traditionally we would use task-specific algorithms to solve these problems, like SMOTE for imbalanced data and k-anonymity for privacy.
More recently, deep generative models have been used in place of these methods with considerable success.
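As an example of a task-specific privacy criterion, k-anonymity requires every combination of quasi-identifier values to be shared by at least k records. A minimal Python check, assuming records are dictionaries; the column names and records are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears at least k times."""
    # Project each record onto its quasi-identifier values and count each combination.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical customer records with generalised age bands and postcode prefixes.
records = [
    {"age_band": "30-39", "postcode": "EC1", "balance": 1200},
    {"age_band": "30-39", "postcode": "EC1", "balance": 560},
    {"age_band": "40-49", "postcode": "SW1", "balance": 3100},
    {"age_band": "40-49", "postcode": "SW1", "balance": 90},
]
print(is_k_anonymous(records, ["age_band", "postcode"], k=2))             # True
print(is_k_anonymous(records, ["age_band", "postcode", "balance"], k=2))  # False
```

Adding a near-unique column such as the balance breaks anonymity. A generative model sidesteps this by releasing new records rather than generalised real ones.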

Imbalanced Data

Many classifiers produce predictions that are biased towards the class with the largest number of samples.
When the classes have wildly different numbers of observations, this can cause the ML algorithm to learn a poor model.
Oversampling techniques try to balance a dataset by artificially increasing the number of observations in the minority class.

SMOTE (Task-Specific Solution)
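SMOTE creates each synthetic minority sample in three steps: pick a minority point, choose one of its k nearest minority-class neighbours, and place the new point at a random position on the segment between them. The following numpy sketch shows only that interpolation step. It is not the imbalanced-learn implementation, and the function name `smote` is illustrative.

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other minority points
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbours of each point
    base = rng.integers(0, n, size=n_new)  # minority point each new sample starts from
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X[base] + gap * (X[nn] - X[base])

rng = np.random.default_rng(1)
X_min = rng.normal(loc=2.0, scale=0.5, size=(20, 2))  # 20 minority-class points
X_syn = smote(X_min, n_new=80, k=5)
print(X_syn.shape)  # (80, 2)
```

Each new point lies between two existing minority points. SMOTE therefore fills in the minority region but cannot create genuinely new modes, which is one motivation for generative models.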

Privacy Improvement

Lately, researchers have noticed that generative models like GANs can also produce entirely new data points that help preserve the privacy of the original training samples.
We can compare the wide range of synthetic data generation methods available, such as hand-engineered methods, agent-based methods, and deep generative methods.

More Data

Okay, DGMs can help us fix biases and generate ‘private’ data, but can't they more generally be used as a method to generate more data?
Could simply having access to more data allow us to improve our models?
Is it useful for training and validation?
We can imagine a world where we can run multiple alternative histories to validate the quality of the models we developed.

Traditional Distributional Methods

Historically, most synthetic tabular data has been generated by treating each column in a table as a random variable, modelling a joint multivariate probability distribution, and then sampling from that distribution [1].
A set of discrete variables may have been modeled using decision trees [2] and Bayesian networks [3, 4].
Spatial data could be modeled with a spatial decomposition tree [5, 6].
A set of non-linearly correlated continuous variables could be modeled using copulas, as highlighted in the image on the right [7, 8].
The synthetic models above are heavily restricted by the type of distributions and by computational issues, severely limiting the synthetic data’s fidelity [1].
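The copula approach can be made concrete with a minimal Gaussian-copula sketch. Marginals and dependence are handled separately. The ranks of each column are mapped to normal scores and a correlation matrix is estimated. Correlated normals are then sampled and mapped back through each column's empirical quantiles. Empirical marginals are one common choice among several, and the function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def fit_gaussian_copula(data):
    """Estimate the correlation of the normal scores of each column's ranks."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Ranks 1..n per column, mapped into (0, 1), then to standard-normal scores.
    ranks = np.argsort(np.argsort(data, axis=0), axis=0) + 1
    z = np.vectorize(_STD_NORMAL.inv_cdf)(ranks / (n + 1))
    return np.corrcoef(z, rowvar=False)

def sample_gaussian_copula(data, corr, n_samples, seed=0):
    """Draw correlated normals, map them to uniforms, then through empirical marginals."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n_samples)
    u = np.vectorize(_STD_NORMAL.cdf)(z)
    # Each synthetic column is built from quantiles of the matching observed column.
    return np.column_stack([np.quantile(data[:, j], u[:, j]) for j in range(data.shape[1])])

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = np.exp(x) + 0.1 * rng.normal(size=1000)  # non-linear, positively dependent column
real = np.column_stack([x, y])
corr = fit_gaussian_copula(real)
synthetic = sample_gaussian_copula(real, corr, n_samples=500)
print(synthetic.shape)  # (500, 2)
```

The restriction noted above is visible here. All dependence is forced through a single Gaussian correlation matrix, and synthetic values can never leave the observed range of each column.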

Reference I

Deep Generative Models

Adding deep neural networks to the data generation process has been the largest step change in decades of synthetic data generation research.

Modelling Process

Generative Adversarial Networks

GANs are a type of generative model known to produce state-of-the-art generative outputs, especially for images and video.
This Person Does Not Exist
This Cat Does Not Exist

Training Objective

Given training data, you can generate new samples from some implicit distribution:
Learn $p_{\text{model}}(x)$ that approximates $p_{\text{data}}(x)$.
Sample new $x$ from $p_{\text{model}}(x)$.

GAN Architecture

There are two components to a GAN: (1) a generator and (2) a discriminator.
The generator $G_{\theta}$ is a model that generates samples $\mathbf{x}$ from $\mathbf{z}$.
The discriminator $D_{\phi}$ is a model that predicts whether $\mathbf{x}$ is real or fake.
The image on the right is a graphical model of $G_{\theta}$ and $D_{\phi}$.
$\mathbf{x}$ denotes samples (either from data or generator),
$\mathbf{z}$ denotes our noise vector, and
$\mathbf{y}$ denotes the discriminator's prediction.
You can think of the generative model a bit like a TV: the neural network produces static until a signal from the discriminator is used to adjust the generator's weights (parameters).
The discriminator $D_{\phi}$ performs binary classification: it attempts to assign probability $1$ to data points from the training set, $\mathbf{x} \sim p_{\text{data}}$, and probability $0$ to generated samples, $\mathbf{x} \sim p_{G}$. The generator $G_{\theta}$ tries to fool the discriminator by generating samples that look indistinguishable from $p_{\text{data}}$.

2-Player Game

GANs don't work with any explicit density function! Instead, they take a game-theoretic approach, learning to generate from the training distribution through a 2-player game between:
a discriminator network, which tries to distinguish between real and fake images, and
a generator network, which tries to fool the discriminator by generating real-looking images.
The discriminator ($\theta_{d}$) wants to maximize the objective such that $D(x)$ is close to 1 (real) and $D(G(z))$ is close to 0 (fake).
The generator ($\theta_{g}$) wants to minimize the objective such that $D(G(z))$ is close to 1 (the discriminator is fooled into thinking the generated $G(z)$ is real).
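Written out, the 2-player game above is the minimax objective of the original GAN formulation (Goodfellow et al., 2014), with $p(z)$ the noise distribution. The discriminator ascends this objective and the generator descends it. In practice, the generator is often trained to maximize $\log D_{\theta_d}(G_{\theta_g}(z))$ instead, which gives stronger gradients early in training.

```latex
\min_{\theta_g}\max_{\theta_d}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D_{\theta_d}(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]
```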

Adversarial Competition

The hope is that as the two networks face off, they'll both get better and better, with the end result being a generator network that produces realistic outputs.
The discriminator wants to predict that the real sample $x$ is real (1): $D(x) = 1$.
The discriminator wants to predict that the fake sample $G(z)$ is fake (0): $D(G(z)) = 0$.
The generator wants to generate a fake sample $G(z)$ that looks real (1): $D(G(z)) = 1$.

GAN Training Pseudocode
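The standard GAN training loop (Algorithm 1 in Goodfellow et al., 2014) alternates k discriminator updates with one generator update. Below is a runnable numpy sketch of that loop on a deliberately tiny problem. The data are 1-D samples from N(3, 1), the generator is linear ($G(z) = az + b$), and the discriminator is a logistic regression ($D(x) = \sigma(wx + c)$), with gradients derived by hand. These model choices are illustrative assumptions. A linear discriminator can only compare means, so this toy says nothing about matching whole distributions. The generator uses the non-saturating loss $-\log D(G(z))$, and the parameters may oscillate rather than settle.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30.0, 30.0)))

def train_toy_gan(steps=2000, batch=128, lr=0.05, k=1, seed=0):
    """Alternate k discriminator updates with one generator update on 1-D toy data."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator G(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w*x + c)
    eps = 1e-8
    history = []
    for _ in range(steps):
        for _ in range(k):
            x_real = rng.normal(3.0, 1.0, batch)        # minibatch from p_data
            x_fake = a * rng.normal(size=batch) + b      # minibatch from p_G
            d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
            # Gradient of -[log D(x_real) + log(1 - D(x_fake))] w.r.t. (w, c).
            grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
            grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
            loss_d = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
            w, c = w - lr * grad_w, c - lr * grad_c
        z = rng.normal(size=batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Non-saturating generator loss -log D(G(z)); chain rule through x = a*z + b.
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        loss_g = -np.mean(np.log(d_fake + eps))
        a, b = a - lr * grad_a, b - lr * grad_b
        history.append((loss_d, loss_g))
    return (a, b), history

(a, b), history = train_toy_gan()
print(b, history[-1])  # generator offset and last (discriminator, generator) losses
```

Tracking `b` over training shows the generator being pushed towards the data mean by the discriminator's feedback alone; the generator never sees `x_real`.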

Deep Generative Models

Variational Autoencoders

There is a wide range of different methods.
As a shorthand, we can refer to implicit and explicit generative models, alluding to how the distributions are modeled.
Explicit density estimation (e.g., VAEs): explicitly define $p_{\text{model}}(x)$ and solve for it (VAEs maximize a tractable lower bound on $\log p_{\text{model}}(x)$).
Implicit density estimation (e.g., GANs): learn a model that can sample from $p_{\text{model}}(x)$ without explicitly defining it.
Both VAE and GAN models sample $z$ from a fixed noise source distribution (uniform or Gaussian) to generate new data.
We take the fixed noise $z$ and pass it through a deep neural network to obtain samples $x$.
GANs tend to be much more finicky to train than VAEs, but when they do work they tend to yield nicer images. VAEs tend to produce blurrier images, which is commonly attributed to their explicit, likelihood-based modelling.
A VAE learns a distribution directly from the data; there is no generator and discriminator competing against one another.
GANs, on the other hand, learn from an adversarial feedback loop where only the discriminator has access to the original data, making them good candidates for differential privacy.
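To make the explicit/implicit contrast concrete: a VAE's encoder outputs a mean and log-variance for $z$. It samples $z$ with the reparameterisation trick and pays a KL penalty towards the N(0, I) prior as part of its training objective (the ELBO). Generation then uses the same "noise in, network out" step as a GAN. The following numpy sketch covers these pieces. A random, untrained linear decoder stands in for a real network, and the encoder outputs are hypothetical placeholders.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I), so z is a differentiable function of (mu, log_var)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * np.asarray(log_var)) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)): the regulariser in the VAE objective (ELBO)."""
    mu, log_var = np.asarray(mu, dtype=float), np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
latent_dim, data_dim = 2, 5
W_dec = rng.normal(size=(latent_dim, data_dim))  # hypothetical, untrained linear decoder

# Training-time sampling: hypothetical encoder outputs for a batch of 3 inputs.
mu, log_var = np.zeros((3, latent_dim)), np.zeros((3, latent_dim))
z_post = reparameterize(mu, log_var, rng)
print(kl_to_standard_normal(mu, log_var))  # [0. 0. 0.]

# Generation: sample z from the N(0, I) prior and push it through the decoder.
z = rng.standard_normal((4, latent_dim))
x = np.tanh(z @ W_dec)
print(x.shape)  # (4, 5)
```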


Multivariate Time Series Simulation Generative Adversarial Networks

Snow (2020)
MTSS-GAN is a new generative adversarial network (GAN) developed to simulate diverse multivariate time series (MTS) data with finance applications in mind.
The purpose of this synthesiser is two-fold: we want to generate data that accurately represents the original data, while also having the flexibility to generate data with novel and unique relationships that could help with model testing and robustness checks.
In practice, we can stack multiple layers of GANs, which lets us capture both high-level and lower-level feature representations that can be adjusted.
The method is inspired by stacked GANs, originally designed for image generation. Because stacked GANs have produced some of the best-quality images, MTSS-GAN is expected to be a leading contender in multivariate time series generation.

Reinforcement Learning for Trading

Training Process

The right-hand side of the graphic shows the process of developing a reinforcement learning strategy.
The quant’s job is to:
Preprocess market data
Build a training environment
Backtest trading performance
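The three steps can be sketched end to end. Market data is preprocessed into log returns and wrapped in a training environment with a Gym-style `reset()`/`step()` interface. A policy is then backtested on it. The environment below is hypothetical and does not import Gym. The reward design, with position times next-period return minus a proportional cost on position changes, is an assumption. The momentum rule is only a placeholder where an RL agent would act.

```python
import numpy as np

class ToyTradingEnv:
    """Hypothetical single-asset environment with a Gym-style reset()/step() interface.

    State: the last `window` log returns. Action: target position in {-1, 0, 1}.
    Reward: position times the next log return, minus a cost on position changes.
    """

    def __init__(self, prices, window=5, cost=1e-4):
        self.returns = np.diff(np.log(np.asarray(prices, dtype=float)))  # preprocessing
        self.window, self.cost = window, cost

    def reset(self):
        self.t, self.position = self.window, 0
        return self.returns[self.t - self.window:self.t]

    def step(self, action):
        reward = action * self.returns[self.t] - self.cost * abs(action - self.position)
        self.position, self.t = action, self.t + 1
        done = self.t >= len(self.returns)
        return self.returns[self.t - self.window:self.t], reward, done

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250)))
env = ToyTradingEnv(prices)
obs, done, pnl = env.reset(), False, 0.0
while not done:  # backtest loop
    action = int(np.sign(obs.sum()))  # placeholder momentum rule; an RL agent would go here
    obs, reward, done = env.step(action)
    pnl += reward
print(round(pnl, 4))
```

Whatever the agent does, the price path here is replayed unchanged. This is the market-replay setting whose limitations are discussed in the RL-with-historical-data slide.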


The reinforcement learning process is part of a larger move towards automation.
Reinforcement learning has only four steps if you include paper trading, compared to the seven of supervised learning.
Whereas supervised learning answers the question, “will the asset increase in price tomorrow?”, reinforcement learning answers the question, “should I buy the asset today?”. The reinforcement learning algorithm is therefore already packaged as a trading strategy.


Many professional Go players now look to RL-based AIs to identify new moves that humans have never played before, so we might also expect AI to uncover new trading strategies.
“But it also contained several moves made by both man and machine that were outlandish, brilliant, creative, foolish, and even beautiful.”

Towards Automation Survey

Discretionary → Rule-based → Supervised Learning → RL With Hist. Data → RL With Sim. Data

(A.1) Discretionary Trading Process

(A.2) Rule-based (heuristic programming/expert systems)

“A computer will never tell you to buy one stock and sell another… (there is) no substitute …for flair in judgement, and a sense of timing.” - Wall Street Journal, April 23, 1962, p. 4
Heuristic programming is not unlike the 20-person team that was said to translate Ray Dalio's unique worldview into algorithms.
One employee has referred to it as trying to turn Ray's brain into a computer.
Steven Cohen at Point72 is also testing models that mimic the trades of its portfolio managers, and Paul Tudor Jones has assigned a team of coders to create “Paul in a Box”.
In Alchemy and Artificial Intelligence (1965), Hubert L. Dreyfus writes, “In problem solving once the problem is structured and planned a machine could take over to work out the details…as in the case of machine shop allocation or investment banking”.

(B) Supervised learning (new regime)

(1) Imagine a situation in which a highly skilled human trader operating in a major financial market has a device installed on her trading station, a small black box.
(2) The box records all the data provided to the trader via her screen and audio/voice lines...
(3) The box learns purely by observation of the inputs to the trader (market data and other information) and her outputs (various order types).
(4) The box starts to automatically and autonomously issue a stream of orders to the market, trading in the style of the human trader whose activity it has been monitoring. And its trading performance matches or exceeds that of the human trader!! At this point the services of the human trader are no longer required.
Is this possible? Yes, of course; that is the purpose of this course. Here is a paper describing this process: Automated Creation of a High Performing Algorithmic Trader (ArXiv).
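The "black box" in steps (1)-(4) is supervised imitation learning, often called behavioural cloning. It fits a classifier from what the trader saw to what the trader did. Below is a minimal sketch on simulated data. The features and the trader's decision rule are hypothetical. Logistic regression stands in for whatever model a real system would use; it is not the method of the cited paper.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, iters=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([X, np.ones(len(X))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30.0, 30.0)))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict(w, X):
    """Predicted action: 1 = buy, 0 = sell."""
    return (np.column_stack([X, np.ones(len(X))]) @ w > 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # hypothetical inputs, e.g. order-book imbalance, recent return
y = (0.8 * X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)  # hypothetical hidden trader rule
w = fit_logistic(X[:800], y[:800])                    # "observe" the first 800 decisions
accuracy = np.mean(predict(w, X[800:]) == y[800:])    # imitate on unseen situations
print(accuracy)
```

The clone can only be as good as the behaviour it copies, which is the limitation the reinforcement learning regime below addresses.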

(C) Reinforcement learning (new-new regime)

The previous robot simply learned from an order book to decide what quote to issue, learning from input-output examples by copying a human.
This is like the first version of AlphaGo: DeepMind first trained neural networks with supervised learning on millions of positions from human-played games.
The AI was able to mimic an average human player to a certain degree (that's a bit boring; what if humans are bad??).
Next came the reinforcement learning stage. They played the AI against itself millions of times to give it more practice and time to explore which moves are the best.
In the later AlphaGo Zero, the only things input into the AI were the black and white stones and the rules of the game (no labelled examples were provided).
Because no supervised learning was done, the AI had to be creative, explore a wider action space, and come up with its own strategies and techniques.
For example, in a financial context, instead of being taught how to spoof, a computer could come up with its own market manipulation strategies.

(C.1) RL with Historical/Simulated Data

When you read a reinforcement learning paper in finance, it's 99% likely that a historical-data method is being used (as of 2021).
Reinforcement learning agents can be trained and tested on historical data. In this approach, real data is revealed to the agents as time advances, and online learning is used to develop strategies.
This approach has a weakness. The market data (environment state) does not respond to the agent's actions, either at the training stage or in any form of backtest (market replay), so performance can look overly optimistic.
The smart money knows this weakness, and institutional banks and hedge funds prefer to test their strategies in a market simulator. This is not just for trading strategies, but also for optimal execution and placement strategies.
For example, a pension fund might have to sell an asset. If it placed the entire position as a single sell order on the market, the price would fall and it would receive a lower average price than expected, whereas an optimal execution strategy would spread the order out over time.
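The pension-fund example can be worked through numerically under a simple, hypothetical impact model. Each child order of size q fills at the current price minus a temporary impact η·q. Every unit already sold permanently lowers the price by γ. There is no price drift. Equal slicing (TWAP-style) is not in general the optimal schedule; this only shows why spreading an order out can help. All parameter values are made up.

```python
def average_sell_price(s0, total_qty, n_slices, eta, gamma):
    """Average fill price when selling total_qty in n_slices equal child orders.

    Hypothetical impact model: temporary impact eta * q per child order of size q,
    permanent impact gamma per unit already sold, no drift.
    """
    q = total_qty / n_slices
    sold, proceeds = 0.0, 0.0
    for _ in range(n_slices):
        price_before = s0 - gamma * sold           # permanent impact of earlier slices
        proceeds += q * (price_before - eta * q)   # temporary impact of this slice
        sold += q
    return proceeds / total_qty

print(average_sell_price(100.0, 1000, 1, eta=0.001, gamma=0.0005))             # 99.0
print(round(average_sell_price(100.0, 1000, 10, eta=0.001, gamma=0.0005), 3))  # 99.675
```

Under this model the average price is $s_0 - \gamma Q (N-1)/(2N) - \eta Q / N$ for $N$ equal slices of a total $Q$. Slicing therefore helps whenever temporary impact dominates. A historical replay cannot capture either impact term, because the recorded prices never react to the fund's own orders.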

(C.2) RL with Market Simulator (i.e. Artificial Stock Market)

In 2017, Jane Street published a technical presentation of their own financial exchange, built to test new algorithms and models. It has been reported to handle messages at a rate of 500k/second with single-digit-microsecond latencies.
Similarly, some of the market simulators that we will investigate below, like ABIDES, ESL, and MAXE, have been sponsored by JPM and the Man Group.
Market simulators offer a range of benefits:
Environment feedback effects with impacts on the state of the environment (transaction costs & market impact)
Expose agents to situations that might not have occurred historically, but could occur in the future.
Have a larger repository of data (unlimited) to optimize the trading agents.
Run counterfactual in-silico trials for financial what-if questions → good for regulatory policy and business profits.
Understand the logic behind individual traders, adding to the interpretability of agent actions.
When Renaissance's Medallion fund reached $600mn, it was only trading futures. Henry B. Laufer, a former academic, modeled the fund's impact on the market and concluded that returns would wane if it managed more money, which led the fund into other asset classes. Trading effects are real and should therefore be accounted for in a hedging or trading strategy.
Why are we interested in the feedback (further exploration)?

Agent Based Models

The two main reasons to develop agent-based market (limit order book, LOB) simulators are reinforcement learning and controlled experimentation.
Multi-agent simulation presents a natural bottom-up approach to emulating agent interaction in financial markets.
It allows us to set up pools of traders with diverse strategies to mimic the population of financial market traders, and to test the performance of new experimental strategies.
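The bottom-up idea can be illustrated with a deliberately minimal sketch built on strong assumptions. The agent rules are hypothetical and agents submit only market orders. Prices move with linear impact in net demand rather than through a matching engine. It is not ABIDES or any of the simulators discussed later. Changing the agent pool and rerunning it is a small instance of the controlled, counterfactual experiments described in this section.

```python
import numpy as np

def simulate_market(n_noise=50, n_value=10, fundamental=100.0, impact=0.01,
                    steps=500, seed=0):
    """Hypothetical toy ABM: agents submit one-unit market orders, price moves with net demand.

    Noise agents buy or sell at random; value agents buy below the fundamental
    value and sell above it. This is a price-impact toy, not a limit order book.
    """
    rng = np.random.default_rng(seed)
    prices = [fundamental]
    for _ in range(steps):
        p = prices[-1]
        noise_demand = rng.choice([-1, 1], size=n_noise).sum()
        value_demand = n_value * np.sign(fundamental - p)
        prices.append(p + impact * (noise_demand + value_demand))
    return np.array(prices)

# Counterfactual: the same noise-trader flow, with and without value agents.
for n_value in (0, 10):
    path = simulate_market(n_value=n_value)
    print(n_value, round(float(np.mean(np.abs(path - 100.0))), 3))
```

With identical seeds, the noise flow is the same in both runs, so any difference in how far prices stray from fundamental value is attributable to the value agents.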

Reinforcement Learning

Agent-based market simulations have been re-energised for a number of reasons, but one would be mistaken not to acknowledge the growth of reinforcement learning as a primary reason for this resurgence.
Studying historical market data without interventions only allows one to identify correlations and associations, without being able to isolate cause-and-effect relationships.
Historical data also offers only a limited amount of data to train on, and, perhaps most importantly, historical data doesn't offer feedback effects (e.g., market impact).
Breakthroughs in the field of RL have been largely facilitated by the development of open source simulators such as OpenAI Gym and its Atari environments.

Controlled Experimentation

The difficulty in performing controlled experiments is one of the major obstacles for empirical finance to transition into an axiomatic theory (Focardi, 1997).
Earlier agent-based market simulators were instrumental in promoting ideas in evolutionary finance (Farmer & Lo, 1999).
The simulators can run millions of in-silico trials to test counterfactual theories, study emergent phenomena, and train and test algorithms (Miles & Cliff, 2019).
The combination of ABMs with machine learning could once more improve and aid our understanding of the functioning of modern day financial markets.

Open Source Software

Recent market simulators have undergone many changes to improve their engineering performance and realism.
There is a growing trend towards making simulation software publicly available, and in this paper we compare these simulators from an engineering and realism perspective.
In this section we compare modern open-source market simulator solutions. We only look at multi-agent simulators that can simulate multiple markets.
[Comparison table not recoverable from extraction; surviving labels: Execution Delay, Agent Library, Matching Library, Calibration Tools; surviving cell values: N/S, C++, C++.]

Research Waves

The first wave of market simulators in the 1990s was a deliberate move away from classical economic theories to advance financial market knowledge.
The second wave was a reaction to the failure of economic models to foresee the financial crisis of 2008.
The third wave was a call to understand high-frequency trading and the flash crashes of 2010 and 2013.
The fourth and current wave combines the concerns of the past, but emphasizes the use of simulators to train machine learning agents.

Research Streams

The research streams that have gone on to develop open-source market simulators come from a range of disciplines.
Those from robotics, like Tucker Balch, take a more applied and experimental approach to building simulators.
The computer scientists, like Michael Wooldridge, take a more formal mathematical approach.
The game theorists, like Michael Wellman, take a more empirical game-theoretic approach.
The complexity scientists, like Doyne Farmer, are flexible in their choice of framework.

Defining a realistic simulator

The development of realistic simulators is essential, as small environmental inconsistencies can invalidate the results (Rollins & Cliff, 2020).
The problem with most simulators is that they can only handle a small number of assets, a small number of agents, and a small number of diverging strategies.
A realistic environment is stochastic, sequential, dynamic, continuous, and has multiple agents.
Even more importantly, a realistic simulator has outputs that resemble those we would expect in the financial market.

Calibrating and testing a simulator

It is often the case that agent-specific data is unavailable, and only the time series that result from the interaction of multiple market agents (such as price and volume time series) are directly observable to the general public.
There are a number of ways to calibrate and test the quality of market simulators under these conditions:
Train machine learning models on computer-generated data and compare their performance when applied in the real world.
Make use of realism metrics, by comparing the real and synthetic market.

Realism Metrics

Currently, most ABMs calibrate their parameters to stylized facts, for example fat tails (excess kurtosis).
A recent research paper presents a review of these LOB characteristics which can be referred to as stylized facts (Vyetrenko et al., 2019).
Model power is expressed with reference to the model's ability to jointly produce as many stylized facts as possible.
Optimization with respect to stylized facts can be difficult as it is challenging to design an explicit optimization objective function due to the overlapping nature of stylized facts and lack of clarity over which stylized facts should be given more optimization weight.

Stylized Facts
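Below is a minimal numpy sketch of three commonly used stylized-fact statistics: fat tails (excess kurtosis of returns), near-zero autocorrelation of raw returns, and positive autocorrelation of absolute returns (volatility clustering). This selection is illustrative, not the full set reviewed by Vyetrenko et al. (2019), and the function names are hypothetical.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def stylized_facts(prices, lag=1):
    """A few common stylized-fact statistics computed from log returns."""
    r = np.diff(np.log(np.asarray(prices, dtype=float)))
    centred = r - r.mean()
    return {
        "excess_kurtosis": float(np.mean(centred**4) / np.var(r)**2 - 3.0),  # > 0: fat tails
        "return_autocorr": autocorr(r, lag),              # typically close to 0
        "abs_return_autocorr": autocorr(np.abs(r), lag),  # > 0: volatility clustering
    }

rng = np.random.default_rng(0)
gaussian = stylized_facts(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 100_000))))
fat_tailed = stylized_facts(100 * np.exp(np.cumsum(rng.laplace(0, 0.01, 100_000))))
print(gaussian["excess_kurtosis"], fat_tailed["excess_kurtosis"])  # roughly 0 and 3
```

Both toy series have i.i.d. returns, so neither shows volatility clustering. A calibration objective would have to weigh several such statistics at once, which is the difficulty noted above.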

GAN-based Realism

In order to create a realistic environment for training reinforcement learning agents, why don’t we lean on GAN models to tune a non-differentiable simulator?
This would be a two-step process:
We start by training a GAN on historical mid-price data, and receive back a discriminator that can ‘tell’ whether the data generated is real or fictitious.
Armed with the discriminator, we can evaluate the mid-price outputs of a simulator; the resulting probability (that the output is real) can be used to calibrate the simulator.
If GANs were used without agent-based models to generate realistic order book data, all explainability would be lost.
The proposed method develops simulators from a predefined agent universe in order to retain the explainability of the agent-based system.
This method does not simulate the market using GANs, but rather uses a GAN discriminator to calibrate a market simulator constructed with an agent-based approach. The GAN is used as a tool for simulation engineering!

MAS-GAN (2021)

Victor Storchan, Svitlana Vyetrenko, Tucker Balch
This is not science fiction: the AI team at JP Morgan has shown that it is possible, using both mid-price and volume as calibration signals.
The GAN discriminator is used to optimize a discrete-event model that simulates a limit order book via three distinct agent behaviors.
As such, the discriminator becomes the objective function for calibrating a multi-agent system.
The outputs of the generator and simulator are assessed for both mid-price and volume stylized facts.


Any sensible parameter can be exposed for the calibration process:
It could be at a high level, where we select the type and the number of agents of that type.
It could be at the low level, looking at agent arrival rates, and order sizes.
The MAS-GAN model mainly calibrates high-level parameters, such as the number of agents of each type (market makers, noise agents, and value agents), rather than tweaking the nitty-gritty knobs.
Mean discriminator score heatmap with respect to the noise agent number $N^{*}$ and value agent arrival rate $\lambda^{*}$. For each grid configuration, we run 20 simulations with different seeds for initializing the pseudo-random number generator.
The discriminator score is higher if the synthetic time series shares more features with the historical dataset. Grid-based optimization is used.
One can then use the discriminator score as an implicit optimization objective, and optimize simulated model parameters to determine the parameter set that produces the most realistic time series (see)
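The grid-based procedure can be sketched as follows. Everything here is a hypothetical stand-in. `toy_simulator` replaces the agent-based LOB simulator. `stand_in_discriminator` is a hand-written score in place of a discriminator trained on historical data. Only the loop structure follows the description above: score each grid point, average over 20 seeds, and keep the maximum.

```python
import itertools
import numpy as np

def toy_simulator(n_noise, value_rate, steps=300, seed=0):
    """Hypothetical stand-in simulator returning mid-price log returns."""
    rng = np.random.default_rng(seed)
    shocks = 0.001 * np.sqrt(n_noise) * rng.standard_normal(steps)  # noise-agent volatility
    dev, rets = 0.0, np.empty(steps)
    for t in range(steps):
        rets[t] = shocks[t] - value_rate * dev  # value agents pull the price back
        dev += rets[t]
    return rets

def stand_in_discriminator(returns, target_std=0.005):
    """Placeholder for a trained GAN discriminator: score in (0, 1], higher = 'more real'."""
    return float(np.exp(-abs(np.std(returns) - target_std) / target_std))

def calibrate(noise_grid, rate_grid, n_seeds=20):
    """Grid search: average the discriminator score over seeds, keep the best parameters."""
    scores = np.zeros((len(noise_grid), len(rate_grid)))
    for (i, n), (j, lam) in itertools.product(enumerate(noise_grid), enumerate(rate_grid)):
        scores[i, j] = np.mean([stand_in_discriminator(toy_simulator(n, lam, seed=s))
                                for s in range(n_seeds)])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return (noise_grid[i], rate_grid[j]), scores

best, scores = calibrate([5, 10, 25, 50], [0.01, 0.05, 0.1, 0.5])
print(best)  # the (N*, lambda*) pair with the highest mean score; scores is the heatmap
```

Because the simulator is only ever queried, never differentiated, the same loop works for a non-differentiable agent-based simulator. Its cost grows with the grid size times the number of seeds.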


As of now, we still need some explanatory method to measure the realism of the GAN-calibrated ABM, and currently stylized facts at different time resolutions are used.
A problem with this method is that there is still a disconnect from the ultimate goal, which is the discovery of trading strategies.
An argument can be made that the calibration should be tied in with a reinforcement learning model that trains on fake and tests on real.
We could tweak more low-level parameters to further improve the realism of the market.
The MAS-GAN method only uses three well-known agents, but many more exist like the heuristic belief learners and momentum traders (Gjerstad, 2007).
Although RL-enabled open-source simulators now exist, they are generally only useful for modelling a single security.

The Pipeline

The best libraries (like FinRL) support a range of RL agents and market environments in a plug-and-play manner.
Within the environment layer we can wrap historical data and live trading APIs of hundreds of markets into training environments.
It is also possible to define many different market environments for financial reinforcement learning.
When improving reinforcement learning models, we have to think of the changes we can perform across the entire pipeline. We could adjust and improve at the Data Layer, the Environment Layer, and the Agent Layer.
What models to use?
From the FinRL-Podracer paper on multiple constituents; see arXiv: https://arxiv.org/abs/2111.05188