Paper ID | D7-S6-T2.1 |
Paper Title |
Asymptotic Analysis of Data Deduplication with a Constant Number of Substitutions |
Authors |
Hao Lou, Farzad Farnoud, University of Virginia, United States |
Session |
D7-S6-T2: Applications of Source Coding |
Chaired Session: |
Tuesday, 20 July, 23:40 - 00:00 |
Engagement Session: |
Wednesday, 21 July, 00:00 - 00:20 |
Abstract |
Data deduplication has gained attention in large-scale storage systems due to the explosive growth in digital data. Recently, the information-theoretic aspects of conventional deduplication algorithms have been studied and novel algorithms with better performance have been proposed. In this paper, we study the performances of variable-length deduplication and multi-chunk deduplication algorithms from the point of view of information theory. We consider a source model in which source strings are composed of repeated blocks with each data block containing a constant number of substitution edits. We show that over the proposed source model, the variable-length deduplication algorithm can achieve asymptotically arbitrarily large compression ratio and the multi-chunk deduplication algorithm is order optimal under mild conditions.
|