All Dates/Times are Australian Eastern Standard Time (AEST)

Technical Program

Paper Detail

Paper ID D2-S5-T3.3
Paper Title Understanding Entropic Regularization in GANs
Authors Daria Reshetova, Stanford University, United States; Yikun Bai, Xiugang Wu, University of Delaware, United States; Ayfer Ozgur, Stanford University, United States
Session D2-S5-T3: Neural Networks I
Chaired Session: Tuesday, 13 July, 23:20 - 23:40
Engagement Session: Tuesday, 13 July, 23:40 - 00:00
Abstract Generative Adversarial Networks (GANs) are a popular method for learning distributions from data by modeling the target distribution as a function of a known distribution. The function, often referred to as the generator, is optimized to minimize a chosen distance measure between the generated and target distributions. One commonly used measure for this purpose is the Wasserstein distance. However, the Wasserstein distance is hard to compute and optimize, and in practice entropic regularization techniques are used to facilitate its computation and improve numerical convergence. The influence of this regularization on the learned solution, however, remains poorly understood. In this paper, we study how several popular entropic regularizations of the Wasserstein distance impact the solution learned by a Wasserstein GAN in a simple benchmark setting where the generator is linear and the target distribution is a high-dimensional Gaussian. We show that entropic regularization of the Wasserstein distance promotes sparsification of the solution, while replacing the Wasserstein distance with the Sinkhorn divergence recovers the unregularized solution. The significant benefit of both regularization techniques is that they remove the curse of dimensionality suffered by the Wasserstein distance. We show that in both cases the optimal generator can be learned to accuracy $\epsilon$ with $O(1/\epsilon^2)$ samples from the target distribution, without the need to constrain the discriminator. We thus conclude that these regularization techniques can improve the quality of the generator learned from empirical data in a way that is applicable to a large class of distributions.
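
As a rough illustration of the quantities named in the abstract (not code from the paper), the sketch below computes an entropy-regularized optimal transport cost between two empirical samples via Sinkhorn iterations, and the debiased Sinkhorn divergence built from it. The function names, the regularization strength eps, the uniform sample weights, and the squared-Euclidean cost are illustrative assumptions; the paper's exact formulation of the regularized objective may differ.

```python
import numpy as np

def sinkhorn_ot(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two point clouds with uniform
    weights, via Sinkhorn fixed-point iterations (illustrative sketch).
    x: (n, d) samples from the generated distribution
    y: (m, d) samples from the target distribution
    eps: entropic regularization strength (assumed parameter)
    """
    n, m = x.shape[0], y.shape[0]
    a = np.full(n, 1.0 / n)          # uniform source weights
    b = np.full(m, 1.0 / m)          # uniform target weights
    # squared Euclidean cost matrix between the two sample sets
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):         # alternating Sinkhorn scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # regularized transport plan
    return np.sum(P * C)             # transport cost under that plan

def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased Sinkhorn divergence: OT_eps(x,y) - (OT_eps(x,x) + OT_eps(y,y))/2."""
    return (sinkhorn_ot(x, y, eps, n_iters)
            - 0.5 * sinkhorn_ot(x, x, eps, n_iters)
            - 0.5 * sinkhorn_ot(y, y, eps, n_iters))
```

In a Wasserstein GAN setting such as the one described in the abstract, either quantity could serve as the training loss evaluated on samples from the generator and the target; the abstract's contrast is that minimizing the entropy-regularized cost biases the learned (linear) generator toward sparser solutions, whereas the debiased Sinkhorn divergence recovers the unregularized optimum.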