Optimal methods for reinforcement learning: Efficient algorithms with instance-dependent guarantees

To join via Zoom: please request connection details for this seminar from headsec@stat.ubc.ca

Abstract: Reinforcement learning (RL) is a pillar of modern artificial intelligence and data-driven decision making. Compared with classical statistical learning, RL problems give rise to several new statistical phenomena, leading to different trade-offs in the choice of estimators, the tuning of their parameters, and the design of computational algorithms. In many settings, asymptotic and/or worst-case theory fails to provide the relevant guidance.

In this talk, I present recent advances in optimal algorithms for reinforcement learning. The bulk of the talk focuses on function approximation methods for policy evaluation. I establish a novel class of optimal, instance-dependent oracle inequalities for projected Bellman equations, as well as efficient computational algorithms that achieve them in different settings. Among other results, I will highlight how the instance-dependent guarantees guide the selection of tuning parameters in temporal difference methods. Drawing on this perspective, I will also discuss a novel class of stochastic approximation methods that yield optimal statistical guarantees for solving the Bellman optimality equation. At the end of the talk, I will discuss additional work on optimal and instance-dependent guarantees for functional estimation with off-policy data.
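
For readers unfamiliar with the setting, the sketch below illustrates the kind of method the abstract refers to: TD(0) policy evaluation with linear function approximation on a toy random-walk chain. It is not taken from the talk; the chain, the feature map phi, and the step-size schedule alpha are all illustrative assumptions, with the step size being exactly the sort of tuning parameter the instance-dependent theory is meant to guide.

    import numpy as np

    # Illustrative sketch (not from the talk): TD(0) policy evaluation with
    # linear function approximation on a toy 5-state random-walk chain.
    # The chain, features, and step sizes are assumptions made for this example.

    rng = np.random.default_rng(0)
    n_states, n_features = 5, 3
    gamma = 0.9                                        # discount factor
    phi = rng.standard_normal((n_states, n_features))  # fixed feature map

    def step(s):
        # One random-walk transition; reward +1 on entering the rightmost state.
        s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        return s_next, reward

    theta = np.zeros(n_features)  # weights of the linear value estimate V(s) = phi[s] @ theta
    s = 2
    for t in range(1, 50_001):
        s_next, r = step(s)
        # TD error: r + gamma * V(s') - V(s)
        delta = r + gamma * phi[s_next] @ theta - phi[s] @ theta
        alpha = 1.0 / np.sqrt(t)  # step-size schedule: a key tuning choice
        theta += alpha * delta * phi[s]
        s = s_next

    print("learned weights:", theta)

The stochastic update converges to the solution of the projected Bellman equation for this chain; how fast, and how the answer depends on the step-size choice, is the kind of instance-dependent question the talk addresses.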

Location
ESB 4192 / Zoom
Speaker
Wenlong Mou, PhD student, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley