
Álvaro Cartea

Likes
181
Date
2022/11/25
ML Score
3
Job
University of Oxford | Professor
Content
When using exploration (a.k.a. randomized controls) in reinforcement learning, how do you decide which distribution to draw controls from? One common answer is the Gibbs measure induced by the Q-function. In my recent paper with Ryan Donnelly, we instead show how to determine the optimal exploration distribution when exploration is rewarded with Tsallis entropy, in environments with latent factors, in both discrete and continuous time. We also develop a Q-learning paradigm and run some experiments in an algorithmic trading example. Happy to receive any feedback. https://lnkd.in/gNUQPwK6 #reinforcementlearning #machinelearning #datascience #algorithmictrading #entropy #mathematicalfinance #quantitativefinance #learning
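To make the contrast concrete, here is a minimal sketch for a finite action set. It is an illustration only and not the method from the paper: it ignores latent factors and time, and the function names, the `temperature` parameter, and the toy Q-values are all assumptions made for this example. The first function is the Gibbs (softmax) distribution induced by the Q-values. The second uses the well-known special case of Tsallis entropy with index 2, taken here with the normalization 0.5 * (1 - sum p^2), for which the regularized maximizer is the Euclidean projection of Q / temperature onto the probability simplex (often called sparsemax).

```python
import numpy as np


def gibbs_policy(q_values, temperature=1.0):
    """Gibbs (softmax) distribution: p(a) proportional to exp(Q(a) / temperature)."""
    z = np.asarray(q_values, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability; the distribution is unchanged
    w = np.exp(z)
    return w / w.sum()


def tsallis2_policy(q_values, temperature=1.0):
    """Maximizer over the simplex of <p, Q> + temperature * 0.5 * (1 - sum p^2).

    This equals the Euclidean projection of Q / temperature onto the simplex.
    The index-2 entropy and its 0.5 normalization are assumptions of this sketch.
    """
    z = np.asarray(q_values, dtype=float) / temperature
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum  # which sorted entries stay in the support
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # threshold that makes p sum to one
    return np.maximum(z - tau, 0.0)


# Hypothetical Q-values for three actions (made up for illustration).
q = [1.0, 0.9, -1.0]
p_gibbs = gibbs_policy(q)
p_tsallis = tsallis2_policy(q)
```

With these toy values the Gibbs distribution puts positive mass on every action, while the index-2 Tsallis distribution assigns exactly zero probability to the clearly inferior third action and splits the rest 0.55 / 0.45 between the first two.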
Comments
5