I am currently unemployed. I was previously an AI researcher specializing in deep reinforcement learning, where I wrote two works that improve the optimization stability of off-policy, gradient-based Q-learning algorithms.
-
Stabilizing Q-Learning for Continuous Control
David Yu-Tung Hui
MSc Thesis, University of Montreal, 2022
I described two principles for creating stable deep learning algorithms and applied them to deep reinforcement learning. The principles were 1) maximum entropy, from which many deep learning loss functions are derived, and 2) the neural tangent kernel, which provides a convergence analysis justifying normalization layers and the ReLU activation function. In RL, I used maximum entropy to justify the design of a family of Q-learning algorithms and showed that LayerNorm reduced their divergence, especially on high-dimensional continuous control problems (a sketch of such a LayerNorm critic follows this entry).
[.pdf] [Errata]
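The thesis's architectural prescription fits in a few lines. Below is a minimal sketch, assuming PyTorch: the LayerNorm-before-ReLU critic follows the recipe described above, but the class name, layer sizes, and defaults are illustrative choices of mine, not the thesis's code.

```python
# Illustrative sketch (not the thesis code): a continuous-control critic
# with LayerNorm before each ReLU, the design the thesis argues against
# divergence in Q-learning. Sizes and names are my own choices here.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.LayerNorm(hidden),   # normalization keeps the network's
            nn.ReLU(),              # neural tangent kernel well-behaved
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar Q(s, a)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```

For example, Critic(obs_dim=17, act_dim=6) matches the observation and action sizes of a HalfCheetah-style locomotion task.
-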
Double Gumbel Q-Learning
David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon
Spotlight at NeurIPS 2023
We showed that using deep neural networks in Q-learning introduces two sources of heteroscedastic Gumbel noise. An algorithm that models both noise sources attained just under twice the aggregate asymptotic performance of the popular SAC baseline (an illustration of why Gumbel noise arises follows this entry).
[.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Code (GitHub)] [Errata]
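The Gumbel connection comes from extreme-value theory: the maximum over many noisy estimates is approximately Gumbel-distributed. Below is a minimal NumPy illustration of that effect under Gaussian estimation noise; it is background for the paper's motivation, not the paper's algorithm, and all names are mine.

```python
# Background illustration (not the paper's algorithm): taking a max over
# noisy Q-value estimates produces a positively biased, positively skewed
# error, the signature of Gumbel-like noise in Q-learning targets.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 50, 100_000

# True Q-values are all zero; each estimate carries i.i.d. Gaussian noise.
noisy_q = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
max_error = noisy_q.max(axis=1)  # overestimation induced by the max operator

bias = max_error.mean()
skew = ((max_error - bias) ** 3).mean() / max_error.std() ** 3
print(f"mean bias: {bias:.3f}")  # positive, not zero
print(f"skewness:  {skew:.3f}")  # positive: Gumbel-like, not symmetric Gaussian
```

Running this shows a clearly positive bias and skew, which is why symmetric-noise assumptions break down for Q-learning targets.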
The best way to contact me is by email. My email address is listed in one of my written works.