2021 IEEE International Symposium on Information Theory

Technical Program

Paper ID	D4-S4-T1.3
Paper Title	Efficient and Robust Distributed Matrix Computations via Convolutional Coding
Authors	Anindya Bijoy Das, Aditya Ramamoorthy, Namrata Vaswani, Iowa State University, United States
Session	D4-S4-T1: Coded Distributed Matrix Multiplication II
Chaired Session:	Thursday, 15 July, 23:00 - 23:20
Engagement Session:	Thursday, 15 July, 23:20 - 23:40
Abstract	Distributed matrix computations are well known to suffer from stragglers (slow or failed worker nodes). Most prior work in this area has presented straggler mitigation strategies that either (i) are sub-optimal in terms of straggler resilience or (ii) suffer from numerical problems, i.e., round-off errors blow up in the decoded result owing to the high condition numbers of the corresponding decoding matrices. This work introduces a novel solution framework, based on embedding the computations into the structure of a convolutional code, that removes both limitations. Our approach is provably optimal in terms of straggler resilience and has excellent numerical robustness, which we quantify theoretically by deriving a computable upper bound on the worst-case condition number over all possible decoding matrices. All of the above claims are backed up by extensive experiments on the AWS cloud platform.
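
Illustrative sketch (not part of the paper): the general idea of coded matrix-vector multiplication with straggler tolerance can be sketched as follows. This is NOT the convolutional-code scheme of the paper. It assumes a simple Vandermonde-style code, and all names (encode, decode, solve, matvec) are hypothetical. The matrix A is split into k row blocks, each of n workers multiplies one coded block by x, and the results of any k workers suffice to recover A x by solving a k x k system with a "decoding matrix". Exact rational arithmetic is used here so the recovery is exact. In floating point, decoding matrices of this Vandermonde type are known to become ill-conditioned as they grow, which is the kind of numerical problem the abstract describes.

```python
from fractions import Fraction
from itertools import combinations


def matvec(A, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def encode(blocks, n):
    """Worker i receives sum_j (i+1)^j * blocks[j] (assumed Vandermonde-style code)."""
    k = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for i in range(n):
        coeffs = [Fraction(i + 1) ** j for j in range(k)]
        coded.append([[sum(coeffs[j] * blocks[j][r][c] for j in range(k))
                       for c in range(cols)] for r in range(rows)])
    return coded


def solve(M, rhs):
    """Gauss-Jordan elimination over Fractions; rhs[i] is the vector paired with row i of M."""
    k = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def decode(results, k):
    """Recover A x from the outputs of any k finished workers.

    results maps a worker index to that worker's output vector.
    """
    workers = sorted(results)[:k]
    # Decoding matrix: the rows of the encoding matrix for the finished workers.
    M = [[Fraction(i + 1) ** j for j in range(k)] for i in workers]
    parts = solve(M, [results[i] for i in workers])
    return [v for part in parts for v in part]


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [2, 1]
    k, n = 2, 4                      # 2 blocks, 4 workers: tolerates 2 stragglers
    coded = encode([A[0:2], A[2:4]], n)
    outputs = {i: matvec(coded[i], x) for i in range(n)}
    for finished in combinations(range(n), k):
        recovered = decode({i: outputs[i] for i in finished}, k)
        print(finished, recovered == matvec(A, x))
```

In this sketch, any 2 of the 4 workers may straggle without affecting the result, at the cost of each worker handling a coded block rather than a raw one.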