Paper ID | D7-S2-T2.3 |
Paper Title |
Universal Graph Compression: Stochastic Block Models |
Authors |
Alankrita Bhatt, University of California San Diego, United States; Ziao Wang, University of British Columbia, Canada; Chi Wang, Microsoft Research, Redmond, United States; Lele Wang, University of British Columbia, Canada |
Session |
D7-S2-T2: Compression & Graphs |
Chaired Session: |
Tuesday, 20 July, 22:20 - 22:40 |
Engagement Session: |
Tuesday, 20 July, 22:40 - 23:00 |
Abstract |
Motivated by the prevalent data science applications of processing large-scale graph data such as social networks, web graphs, and biological networks, as well as the high I/O and communication costs of storing and transmitting such data, this paper investigates universal compression of data appearing in the form of a labeled graph. In particular, we consider a widely used random graph model, stochastic block model (SBM), which captures the clustering effects in social networks. A universal graph compressor is proposed, which achieves the optimal compression rate for a wide family of SBMs with edge probabilities from $O(1)$ to $\Omega(1/n^{2-\e})$ for any $0<\e <1$. Existing universal compression techniques are developed mostly for stationary ergodic one-dimensional sequences with entropy linear in the number of variables. However, the adjacency matrix of SBM has complex two-dimensional correlations and sublinear entropy in the sparse regime. These challenges are alleviated through a carefully designed transform that converts two-dimensional correlated data into almost i.i.d. blocks. The blocks are then compressed by a Krichevsky-Trofimov compressor, whose length analysis is generalized to arbitrarily correlated processes with identical marginals.
|