Non-Bayesian Parametric Missing-Mass Estimation
Shir Cohen (Ben Gurion University of the Negev); Tirza S Routtenberg (Ben Gurion University of the Negev); Lang Tong
SPS
We consider the classical problem of missing-mass estimation, which deals with estimating the total probability of unseen elements in a sample. The missing-mass estimation problem has various applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem since it tends to overestimate the probability of the observed elements. Similarly, the constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared-error (MSE) of unbiased estimators of the entire probability mass function (pmf) vector, does not provide a relevant bound for missing-mass estimation. In this paper, we introduce a non-Bayesian parametric model of the missing-mass estimation problem. We introduce the concept of missing-mass unbiasedness by using the Lehmann unbiasedness definition. Based on this missing-mass unbiasedness, we derive a non-Bayesian CCRB-type lower bound on the missing-mass MSE (mmMSE), named the missing-mass CCRB (mmCCRB). The proposed mmCCRB can be used for system design and for the performance evaluation of existing estimators. Moreover, based on the mmCCRB, we propose a new method to improve estimators by an iterative missing-mass Fisher-scoring method. Finally, we demonstrate via numerical simulations that the biased mmCCRB is a valid and informative lower bound on the mmMSE of state-of-the-art estimators for this problem: the CML, asymptotic profile maximum likelihood (aPML), Good-Turing, and Laplace estimators. We also show that the mmMSE and missing-mass bias of the Laplace estimator are reduced by using the new missing-mass Fisher-scoring method.
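To make the quantities in the abstract concrete, the following minimal Python sketch computes the true missing mass of a sample together with three of the baseline estimates named above. It is an illustration under stated assumptions, not the authors' implementation: the function name `missing_mass_estimates` is hypothetical, the alphabet is assumed to be known and finite (indices 0 to M-1), the Good-Turing estimate is taken in its textbook form (number of singletons divided by the sample size), and the Laplace estimate is taken as the add-one rule summed over the unseen elements. The mmCCRB and the Fisher-scoring refinement are not reproduced here.

```python
from collections import Counter


def missing_mass_estimates(sample, pmf):
    """Return the true missing mass and three baseline estimates.

    sample: list of observed element indices in range(len(pmf))
    pmf:    true probabilities of the elements (assumed known here only
            so that the true missing mass can be computed for comparison)
    """
    n = len(sample)          # sample size
    m = len(pmf)             # alphabet size (assumed known and finite)
    counts = Counter(sample)

    # Elements of the alphabet that never appear in the sample.
    unseen = [i for i in range(m) if counts[i] == 0]

    # True missing mass: total probability of the unseen elements.
    true_mm = sum(pmf[i] for i in unseen)

    # CML assigns count/n to each element, so unseen elements get zero mass.
    cml = 0.0

    # Textbook Good-Turing: fraction of the sample made up of singletons.
    singletons = sum(1 for c in counts.values() if c == 1)
    good_turing = singletons / n

    # Add-one (Laplace) rule: each unseen element gets 1 / (n + m).
    laplace = len(unseen) / (n + m)

    return {"true": true_mm, "cml": cml,
            "good_turing": good_turing, "laplace": laplace}


if __name__ == "__main__":
    example = missing_mass_estimates([0, 0, 1, 2, 2, 2],
                                     [0.3, 0.2, 0.3, 0.1, 0.1])
    for name, value in example.items():
        print(f"{name}: {value:.4f}")
```

In this toy sample, elements 3 and 4 are never observed, so the CML estimate assigns them zero mass even though their true probabilities are positive, which is the overestimation of the observed elements that the abstract describes.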