Actor-only Deterministic Policy Gradient via Zeroth-order Gradient Oracles in Action Space
Harshat Kumar, University of Pennsylvania, United States; Dionysios S. Kalogerias, Michigan State University, United States; George J. Pappas, Alejandro Ribeiro, University of Pennsylvania, United States
D4-S3-T3: Reinforcement Learning
Thursday, 15 July, 22:40 - 23:00
Thursday, 15 July, 23:00 - 23:20
Deterministic policies demonstrate substantial empirical success over their stochastic counterparts, as they remove a level of randomness in Policy Gradient (PG) methods applied to stochastic search problems involving Markov decision processes. However, current implementations require state-action value (Q-function) approximators, also known as critics, to obtain estimates of the associated policy-reward gradient. In this work, we propose the use of two-point stochastic evaluations to obtain gradient estimates of a smoothed Q-function surrogate, constructed by evaluating pairs of the Q-function at low-dimensional, randomized initial action perturbations. This procedure lifts the dependence on a critic and restores true model-free policy learning, with provable algorithmic stability. In fact, our finite complexity bounds improve upon existing results by up to 2 orders of magnitude in iteration complexity, and by up to 3/2 orders of magnitude in sample complexity. Simulation results on an agent navigation problem also showcase the effectiveness of the proposed algorithm in a practical setting.
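The core primitive described above is a two-point zeroth-order gradient oracle: the gradient of a smoothed Q-function surrogate is estimated from paired Q-function evaluations at random action perturbations, with no critic. The following is a minimal illustrative sketch of that estimator, not the paper's exact algorithm; the function names, the Gaussian-on-sphere direction sampling, and the smoothing details are assumptions for the example.

```python
import numpy as np

def two_point_zo_gradient(q_fn, action, delta=1e-3, num_samples=32, seed=None):
    """Two-point zeroth-order estimate of grad_a Q(a) in action space.

    Averages d * (Q(a + delta*u) - Q(a - delta*u)) / (2*delta) * u over
    random unit directions u, which (in expectation) is the gradient of a
    delta-smoothed surrogate of Q. Illustrative only; details differ from
    the paper's construction.
    """
    rng = np.random.default_rng(seed)
    d = action.shape[0]
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        # Paired (two-point) Q-function evaluations at perturbed actions
        diff = q_fn(action + delta * u) - q_fn(action - delta * u)
        grad += (d * diff / (2.0 * delta)) * u
    return grad / num_samples
```

Because only evaluations of Q along perturbed actions are used, the estimator needs no learned critic and no differentiability of Q, which is exactly what makes the scheme actor-only and model-free.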