Support Estimation with Sampling Artifacts and Errors
Eli Chien, Olgica Milenkovic, University of Illinois Urbana-Champaign, United States; Angelia Nedich, Arizona State University, United States
D1-S4-T3: Structures & Inference
Monday, 12 July, 23:00 - 23:20
Monday, 12 July, 23:20 - 23:40
The problem of estimating the support of a distribution is of great importance in many areas of machine learning, computer science and molecular biology. Almost all existing work in this area assumes perfectly accurate sampling, which is seldom the case in practice. Here we introduce the first known theoretical approach to support estimation in the presence of sampling artifacts, where each sample is assumed to be observed through a Poisson channel that simultaneously captures repetitions and deletions. The proposed estimator is based on regularized weighted Chebyshev approximations, with weights governed by evaluations of Touchard (Bell) polynomials. Support estimates in the presence of sampling artifacts are computed via discretized semi-infinite programming methods. The new estimation approach is tested on synthetic data and on GISAID data, with the goal of estimating the mutational diversity of genes in the SARS-CoV-2 viral genome. In all experiments, our integrated method showed significant improvements over appropriately modified versions of known noiseless support estimation methods.
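The Poisson sampling channel mentioned above can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' model specification: every clean sample is emitted independently M ~ Poisson(λ) times, with one rate λ shared by all samples. The names `poisson_draw` and `poisson_channel` and the toy uniform distribution are hypothetical.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw M ~ Poisson(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    m, p = 0, rng.random()
    while p > threshold:
        m += 1
        p *= rng.random()
    return m


def poisson_channel(samples, lam: float, rng: random.Random) -> list:
    """Hypothetical channel model: each input sample is emitted M ~ Poisson(lam) times.

    M == 0 deletes the sample; M >= 2 repeats it.
    """
    observed = []
    for s in samples:
        observed.extend([s] * poisson_draw(lam, rng))
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    k = 50  # true support size of a uniform toy distribution (assumption)
    clean = [rng.randrange(k) for _ in range(200)]
    noisy = poisson_channel(clean, lam=1.0, rng=rng)
    print("distinct symbols in clean samples:", len(set(clean)))
    print("distinct symbols after the channel:", len(set(noisy)))
    print("output length vs input length:", len(noisy), len(clean))
```

Because a draw of M = 0 removes a sample and M ≥ 2 duplicates it, the set of observed symbols can only shrink while the frequency counts are distorted, which is one reason a noiseless support estimator cannot be applied to the channel output unchanged.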
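The estimator's weights are said to be governed by evaluations of Touchard (Bell) polynomials, T_n(x) = Σ_k S(n, k) x^k, where S(n, k) are Stirling numbers of the second kind. The sketch below only shows how these polynomials can be evaluated and checks the standard identity that T_n(λ) equals the n-th moment of a Poisson(λ) random variable, one natural way such polynomials arise under a Poisson channel. It does not reproduce the authors' weighting scheme; the function names are hypothetical.

```python
import math


def stirling2_row(n: int) -> list:
    """Return [S(n, 0), ..., S(n, n)], Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_k = row[k] if k < len(row) else 0
            # Recurrence: S(m, k) = k * S(m-1, k) + S(m-1, k-1)
            new[k] = k * prev_k + row[k - 1]
        row = new
    return row


def touchard(n: int, x):
    """Touchard polynomial T_n(x) = sum_k S(n, k) x^k; T_n(1) is the Bell number B_n."""
    return sum(c * x**k for k, c in enumerate(stirling2_row(n)))


def poisson_moment(n: int, lam: float, terms: int = 200) -> float:
    """E[M^n] for M ~ Poisson(lam), from a truncated pmf series (intended for small n)."""
    total, pmf = 0.0, math.exp(-lam)
    for j in range(terms):
        total += j**n * pmf
        pmf *= lam / (j + 1)  # P(M = j + 1) obtained from P(M = j)
    return total


if __name__ == "__main__":
    for n in range(5):
        print(n, touchard(n, 2.0), poisson_moment(n, 2.0))
```

Computing the Stirling row by its recurrence keeps the coefficients as exact integers, so T_n(1) reproduces the Bell numbers exactly, while the Poisson series gives an independent floating-point check.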