Statistics and Its Interface
Volume 11 (2018)
Sparse Bayesian variable selection for classifying high-dimensional data
Pages: 385 – 395
Identifying differentially expressed genes that discriminate between experimental classes is an important application of microarrays, and gene selection methods play a significant role in achieving accurate classification. Because the number of genes is large and many of them are irrelevant, insignificant or redundant, standard statistical methods do not work well, and existing methods must be modified to analyze microarray data effectively. We present a stochastic variable selection approach to gene selection that places several different two-level hierarchical prior distributions on the regression coefficients. These priors act as a sparsity-enforcing mechanism, so that gene selection is carried out as part of classification. Using Markov chain Monte Carlo (MCMC) methods to simulate parameters from the posterior distribution, we develop and implement an efficient algorithm. The algorithm is robust to the choice of initial values and produces posterior probabilities for the relevant genes, which support biological interpretation. To highlight potential applications of the proposed approach, we apply it to the well-known colon cancer and leukemia datasets from the microarray literature.
sparse priors, stochastic variable selection, classification, high-dimensional data
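The abstract describes the method only at a high level and does not specify the authors' priors or sampler. The following is therefore a minimal sketch under stated assumptions, not the paper's algorithm. It illustrates the general idea of MCMC-based stochastic variable selection for classification by swapping in a point-mass spike-and-slab prior, which is one common two-level sparsity-enforcing prior (a Bernoulli inclusion indicator over a normal slab). It combines this prior with Albert and Chib's latent-variable data augmentation for probit regression and reports posterior inclusion probabilities, which play the role of the gene-level posterior probabilities mentioned in the abstract. Several choices here are illustrative assumptions and are all hypothetical: the function names `ssvs_probit`, `sample_truncated_normal` and `simulate`, the hyperparameters `slab_var=10.0` and `prior_incl=0.5`, the omission of an intercept, and the simulated ten-feature data. Real microarray data, with thousands of genes, would call for vectorized code.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def sample_truncated_normal(rng, mean, positive):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0,
    by inverting the normal CDF over the allowed interval."""
    cut = _STD_NORMAL.cdf(-mean)  # P(z <= 0) when z ~ N(mean, 1)
    u = cut + (1.0 - cut) * rng.random() if positive else cut * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # crude tail clamp: keeps inv_cdf in (0, 1)
    return mean + _STD_NORMAL.inv_cdf(u)


def _sigmoid(t):
    """Numerically stable logistic function (turns log-odds into a probability)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def ssvs_probit(X, y, n_iter=2000, burn_in=500, slab_var=10.0,
                prior_incl=0.5, seed=0):
    """Gibbs sampler for probit classification with an ASSUMED point-mass
    spike-and-slab prior: gamma_j ~ Bernoulli(prior_incl); beta_j = 0 if
    gamma_j = 0, else beta_j ~ N(0, slab_var). Returns Monte Carlo estimates
    of the posterior inclusion probabilities P(gamma_j = 1 | y)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # column-major copy of X
    col_ss = [sum(v * v for v in c) for c in cols]            # x_j' x_j
    beta = [0.0] * p
    eta = [0.0] * n            # linear predictor X @ beta, kept in sync with beta
    incl_counts = [0] * p
    log_prior_odds = math.log(prior_incl / (1.0 - prior_incl))

    for it in range(n_iter):
        # Step 1 (Albert-Chib augmentation): latent z_i | beta, y_i is N(eta_i, 1)
        # truncated to z_i > 0 when y_i = 1 and to z_i <= 0 when y_i = 0.
        z = [sample_truncated_normal(rng, eta[i], y[i] == 1) for i in range(n)]

        # Step 2: update (gamma_j, beta_j) jointly, one coordinate at a time.
        for j in range(p):
            xj, old = cols[j], beta[j]
            # b = x_j' r, where r = z minus the fit of all features except j
            b = sum(xj[i] * (z[i] - eta[i] + xj[i] * old) for i in range(n))
            post_var = 1.0 / (col_ss[j] + 1.0 / slab_var)
            # log Bayes factor of gamma_j = 1 vs 0, with beta_j integrated out
            log_bf = 0.5 * math.log(post_var / slab_var) + 0.5 * b * b * post_var
            gamma_j = rng.random() < _sigmoid(log_prior_odds + log_bf)
            # If included, draw beta_j from its normal full conditional
            new = rng.gauss(post_var * b, math.sqrt(post_var)) if gamma_j else 0.0
            if new != old:
                delta = new - old
                for i in range(n):
                    eta[i] += xj[i] * delta
                beta[j] = new
            if it >= burn_in and gamma_j:
                incl_counts[j] += 1

    kept = n_iter - burn_in
    return [c / kept for c in incl_counts]


def simulate(n=100, p=10, seed=1):
    """Hypothetical toy data: only features 0 and 1 influence the class label."""
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0.0, 1.0) > 0 else 0
         for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    pip = ssvs_probit(X, y)
    print("posterior inclusion probabilities:", [round(q, 2) for q in pip])
```

On this toy data, the two informative features should receive inclusion probabilities near 1 and the noise features much lower ones. Sampling gamma_j with beta_j integrated out lets the chain move between beta_j = 0 and beta_j != 0 under a point-mass spike. Sampling gamma_j given a fixed beta_j = 0 would leave the chain stuck.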
Supported by grants from the Natural Science Foundation of China (11501294, 11501261), the China Postdoctoral Science Foundation (2015M580374, 2016T90398), the Natural Science Foundation of Guangdong (2016A030313856), the Jiangsu Qinglan Project (2017), the Open Project Program of the Key Laboratory of Statistical Information Technology and Data Mining (SDL201704) and the Project of Natural Science Research in Jiangsu Province (15KJB110007).
Received 1 March 2014
Published 7 March 2018