Statistics and Its Interface
Volume 12 (2019)
Network-incorporated integrative sparse linear discriminant analysis
Pages: 149 – 166
Linear discriminant analysis (LDA) has been extensively applied in classification. For high-dimensional data, results generated from a single dataset may be unsatisfactory because of the small sample size. Under the regression framework, integrative analysis, which pools and analyses raw data from multiple datasets, has outperformed both single-dataset analysis and meta-analysis. In this study, we conduct integrative analysis for LDA (iLDA). A network structure for variables is constructed to accommodate their interconnections, which have not been considered in many existing classification studies. We adopt the $1$-norm group MCP method for simultaneous estimation and discriminative variable selection, and a Laplacian penalty to incorporate the network. The proposed method has intuitive formulations and can be computed using an effective coordinate descent algorithm. A simulation study shows that iLDA outperforms benchmarks, with more accurate variable identification and classification. Analysis of three breast cancer datasets demonstrates that iLDA can improve prediction performance.
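To make the two penalties named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard MCP form with hypothetical tuning parameters `lam` and `a`, a coefficient matrix `B` with one row per variable and one column per dataset, and an unnormalized graph Laplacian built from a hypothetical adjacency matrix `A`. The paper's exact Laplacian (for example, any sign adjustment or normalization) may differ.

```python
import numpy as np

def mcp(t, lam, a):
    """Standard minimax concave penalty, evaluated elementwise at |t|."""
    t = np.abs(t)
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def group_mcp_1norm(B, lam, a):
    """MCP applied to the 1-norm of each variable's coefficients across datasets.

    B has shape (p, M): row j holds variable j's coefficients in the M datasets.
    """
    return float(np.sum(mcp(np.abs(B).sum(axis=1), lam, a)))

def laplacian_penalty(B, A):
    """Sum over datasets of b_m^T L b_m, with L = D - A (an assumed form)."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(B.T @ L @ B))
```

Under these assumptions, the group penalty is zero for a variable whose coefficients vanish in every dataset, and the Laplacian term is zero when connected variables share the same coefficient, which is the smoothing effect a network penalty is meant to encourage.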
integrative analysis, discriminant analysis, network
We would like to thank the editor and reviewers for their useful comments and suggestions, which have led to a significant improvement of this study. Wang’s work was supported by the National Natural Science Foundation of China (71601076), Humanity and Social Science Youth Foundation of Ministry of Education of China (16YJCZH104), and Social Science Foundation of Hunan Province (15YBA085). Zhang’s work was supported by the Fundamental Research Funds for the Central Universities (20720171064, 20720181003). Ma’s work was supported by the National Bureau of Statistics of China (2016LD01).
Received 3 February 2018
Published 26 October 2018