|| Asymptotic Analysis of Data Deduplication with a Constant Number of Substitutions
||Hao Lou, Farzad Farnoud, University of Virginia, United States|
||D7-S6-T2: Applications of Source Coding
||Tuesday, 20 July, 23:40 - 00:00
||Wednesday, 21 July, 00:00 - 00:20
Data deduplication has gained attention in large-scale storage systems due to the explosive growth in digital data. Recently, the information-theoretic aspects of conventional deduplication algorithms have been studied and novel algorithms with better performance have been proposed. In this paper, we study the performances of variable-length deduplication and multi-chunk deduplication algorithms from the point of view of information theory. We consider a source model in which source strings are composed of repeated blocks with each data block containing a constant number of substitution edits. We show that over the proposed source model, the variable-length deduplication algorithm can achieve asymptotically arbitrarily large compression ratio and the multi-chunk deduplication algorithm is order optimal under mild conditions.