SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
Navjot Singh, Deepesh Data, University of California, Los Angeles, United States; Jemin George, US Army Research Lab, United States; Suhas Diggavi, University of California, Los Angeles, United States
D3-S4-T1: Distributed Computation II
||Wednesday, 14 July, 23:00 - 23:20
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors, regulated by a locally computable triggering criterion. We provide convergence guarantees for our algorithm on general smooth objectives, which, to the best of our knowledge, is the first theoretical analysis of compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate O(1/sqrt(nT)), matching that of vanilla distributed SGD. We empirically show that SQuARM-SGD saves significantly in total communicated bits over the state of the art without sacrificing much accuracy.
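The abstract's main ingredients — local Nesterov-momentum SGD steps, top-k sparsification plus quantization of the transmitted update, an error-feedback residual, a norm-based triggering criterion, and a gossip step over a ring — can be illustrated with a small NumPy sketch. This is not the authors' exact algorithm or analysis: the quadratic objectives, ring mixing matrix, scaled-sign quantizer, threshold value, and all function names below are illustrative assumptions.

```python
import numpy as np

def topk(v, k):
    # Keep the k largest-magnitude entries of v, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def qsign(v):
    # Scaled-sign quantizer (one common choice, not necessarily the paper's):
    # transmit sign(v) times the mean magnitude of the nonzero entries.
    nz = np.abs(v) > 0
    if not nz.any():
        return np.zeros_like(v)
    scale = np.abs(v[nz]).mean()
    return np.where(nz, scale * np.sign(v), 0.0)

def squarm_sketch(n=4, d=10, T=400, H=5, lr=0.05, beta=0.9,
                  k=3, thresh=1e-4, gamma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, d))   # node i minimizes f_i(x) = 0.5*||x - a_i||^2
    x = np.zeros((n, d))          # local models
    m = np.zeros((n, d))          # Nesterov momentum buffers
    e = np.zeros((n, d))          # error-feedback residuals (compression error)
    xhat = np.zeros((n, d))       # copies of each node's last-communicated model
    # Symmetric doubly stochastic mixing matrix for a ring topology.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    sends = 0
    for t in range(T):
        for i in range(n):        # local momentum SGD step (PyTorch-style Nesterov)
            g = x[i] - a[i]
            m[i] = beta * m[i] + g
            x[i] -= lr * (g + beta * m[i])
        if (t + 1) % H == 0:      # communication round every H local steps
            for i in range(n):
                delta = e[i] + x[i] - xhat[i]
                if np.linalg.norm(delta) > thresh:   # triggering criterion
                    q = qsign(topk(delta, k))        # sparsify, then quantize
                    e[i] = delta - q                 # retain the compression error
                    xhat[i] = xhat[i] + q            # neighbors apply the same q
                    sends += 1
            # Gossip/consensus step on the publicly known copies.
            x = x + gamma * (W @ xhat - xhat)
    return x, a, sends
```

Because W is doubly stochastic, the gossip term sums to zero across nodes, so the node-average of the models follows plain momentum SGD toward the average minimizer; the compression and triggering only affect how fast individual nodes agree, which mirrors why compressed decentralized methods can match the O(1/sqrt(nT)) rate of uncompressed SGD.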