Statistics and Its Interface
Volume 6 (2013)
Testing the statistical significance of an ultra-high-dimensional naïve Bayes classifier
Pages: 223–229
The naïve Bayes approach is one of the most popular classification methods. Nevertheless, how to test its statistical significance in an ultra-high-dimensional (UHD) setting is not well understood. To fill this theoretical gap, we propose a novel test statistic whose asymptotic null distribution is standard normal, even when the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies demonstrate its finite-sample performance, and a text classification example is described for illustration.
Keywords: binary predictor, hypothesis testing, naïve Bayes, supervised learning, text classification, ultra-high-dimensional data
2010 Mathematics Subject Classification
Published 10 May 2013