All Dates/Times are Australian Eastern Standard Time (AEST)

Technical Program

Paper Detail

Paper ID: D5-S5-T2.1
Paper Title: Bounds for Learning Lossless Source Coding
Authors: Anders Host-Madsen, University of Hawaii, United States
Session: D5-S5-T2: Lossless Compression
Chaired Session: Friday, 16 July, 23:20 - 23:40
Engagement Session: Friday, 16 July, 23:40 - 00:00

Abstract: This paper asks a basic question: how much training is required to beat a universal source coder? Traditionally, there have been two types of source coders: fixed, optimum coders, such as Huffman coders; and universal source coders, such as Lempel-Ziv. The paper considers a third type of source coder: learned coders. These are coders that are trained on data of a particular type and then used to encode new data of that type. This type of coder has recently become popular for (lossy) image and video coding. The paper evaluates two criteria for the performance of learned coders: the average performance over training data, and a performance guaranteed for all training data except for some error probability $P_e$, which corresponds to PAC learning. In both cases the coders are evaluated with respect to redundancy. The paper considers the independent identically distributed (IID) binary case and binary Markov chains. In both cases it is shown that the amount of training data required is very moderate: to code sequences of length $l$, the amount of training data required to beat a universal source coder is $m=K\frac{l}{\log l}$, where the constant $K$ depends on the case considered.
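The abstract's bound $m=K\frac{l}{\log l}$ can be illustrated numerically. The sketch below is purely illustrative: the constant $K$ is case-dependent in the paper (IID binary vs. binary Markov), and the value $K=1$ used here is an assumption, not a value from the paper.

```python
import math

def training_bound(l: int, K: float = 1.0) -> float:
    """Evaluate the training-data bound m = K * l / log(l).

    K is the case-dependent constant from the paper; K = 1.0 here
    is an illustrative placeholder, not a value from the paper.
    """
    if l <= 1:
        raise ValueError("sequence length l must exceed 1")
    return K * l / math.log(l)

# The bound grows sublinearly in l: the required training data per
# coded symbol, m / l = K / log(l), shrinks as sequences get longer.
for l in (100, 1_000, 10_000):
    print(f"l = {l:>6}  ->  m ~ {training_bound(l):,.0f}")
```

Note the sublinear growth: for longer sequences, proportionally less training data is needed per coded symbol, which is why the paper describes the required amount as "very moderate".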