Statistics and Its Interface
Volume 11 (2018)
Dimension reduction for big data
Pages: 295 – 306
Dimension reduction aims to reduce the dimension of a high-dimensional vector-valued explanatory variable while preserving its relationship with a univariate or low-dimensional real-valued response. As one of the oldest and best-known dimension reduction approaches, principal component analysis (PCA) has been used extensively in applied high-dimensional data analysis. Classical PCA approaches cannot be applied to big data because of memory and storage barriers. Using a technique called scanning data by rows, the article proposes a new PCA approach and shows that it can provide exact solutions even when the size of the observed data exceeds the memory size of the computing system.
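The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a row-scanning PCA could work, not the article's own method. It assumes that the rows arrive one at a time from any iterable, and that only the running count, the running sum, and the running sum of outer products are kept in memory; the function name `pca_by_rows` and its parameters are hypothetical.

```python
import numpy as np


def pca_by_rows(row_iter, p):
    """PCA from a single pass over rows, storing only O(p^2) numbers.

    row_iter: any iterable yielding rows of length p (hypothetical interface).
    Returns the column means, the eigenvalues of the sample covariance matrix
    in descending order, and the matching eigenvectors as columns.
    """
    n = 0
    s = np.zeros(p)         # running sum of the rows
    ss = np.zeros((p, p))   # running sum of the outer products x x^T

    for row in row_iter:
        x = np.asarray(row, dtype=float)
        n += 1
        s += x
        ss += np.outer(x, x)

    mean = s / n
    # Sample covariance recovered from the accumulated sums.
    cov = (ss - n * np.outer(mean, mean)) / (n - 1)

    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]
```

Because the accumulated sums determine the sample covariance matrix, the result matches an in-memory PCA up to floating-point rounding, while memory use depends on the number of variables rather than the number of rows. The sum-of-products formula can lose precision when the column means are large relative to the spread, so a production version might prefer a numerically stabler update.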
Keywords
big data, dimension reduction, generalized linear models, parallel computation, principal component analysis, scanning data by rows
2010 Mathematics Subject Classification
Primary 62H25. Secondary 62J12.
Received 5 August 2016
Published 7 March 2018