Breaking the Sample Size Barrier in Reinforcement Learning: From Generative Models to Online RL

-
Yuxin Chen, Princeton University
Fine Hall 214

This event is in-person and open only to Princeton University ID holders

This talk is concerned with the sample efficiency of reinforcement learning (RL). Despite a large number of prior works, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined.  In particular, prior results in many basic RL settings suffer from an enormous sample size barrier, in the sense that their claimed statistical optimality holds only when the sample size exceeds a large threshold. This, however, presents a major roadblock to achieving efficient RL in sample-limited scenarios. 

In this talk, I will discuss how to overcome such sample size barriers in two basic settings. The first setting assumes access to an idealistic generative model (or simulator), and we show that a perturbed model-based approach is minimax optimal for this scenario. The second setting is concerned with online RL that requires balancing exploration and exploitation; we show that a model-free algorithm with early-settled variance reduction is provably sample- and memory-efficient. Our results cover two distinctive RL paradigms and might shed light on the efficacy of these algorithms in more complicated scenarios. 

This talk is based on joint work with Gen Li, Laixi Shi, Yuejie Chi, Yuantao Gu, and Yuting Wei (https://arxiv.org/abs/2005.12900 and https://arxiv.org/pdf/2110.04645). 

Yuxin Chen is currently an assistant professor in the Department of Electrical and Computer Engineering at Princeton University and is affiliated with Applied and Computational Mathematics, Computer Science, and Center for Statistics and Machine Learning. Prior to joining Princeton, he was a postdoctoral scholar in the Department of Statistics at Stanford University, and he completed his Ph.D. in Electrical Engineering at Stanford University. His research interests include high-dimensional statistics, mathematical optimization, and reinforcement learning. He has received the Princeton graduate mentoring award, Princeton SEAS junior faculty award, the AFOSR Young Investigator Award, the ARO Young Investigator Award, the ICCM best paper award (gold medal), and was selected as a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.