Statistics and Its Interface

Volume 3 (2010)

Number 1

A weighted cluster kernel PCA prediction model for multi-subject brain imaging data

Pages: 103–111



Ying Guo (Department of Biostatistics and Bioinformatics, Rollins School of Public Health of Emory University, Atlanta, Georgia, U.S.A.)


Brain imaging data have shown great promise as predictors of psychiatric conditions, cognitive functions and many other neurally related outcomes. Developing prediction models from imaging data is challenging because of the high dimensionality of the data, noisy measurements, complex correlation structures among voxels, small sample sizes, and between-subject heterogeneity. Most existing prediction approaches apply a dimension reduction method, such as principal component analysis (PCA), to whole-brain images as a preprocessing step. These approaches usually do not take into account the cluster structure among voxels or between-subject differences. We propose a weighted cluster kernel PCA prediction model that addresses these challenges in brain imaging data. We first divide voxels into clusters based on neuroanatomic parcellation or data-driven methods, then extract cluster-specific principal features using kernel PCA, and define the prediction model based on these principal features. Finally, we propose a weighted estimation method for the prediction model in which each subject is weighted according to the percentage of variance explained by the principal features. The proposed method allows assessment of the relative importance of various brain regions in prediction, captures nonlinearity in the feature space, and helps guard against overfitting to outlying subjects during predictive model building. We evaluate the performance of our method through simulation studies, and a real fMRI data example illustrates the method.


Keywords: kernel PCA, prediction, multi-subject data, cluster, functional magnetic resonance imaging (fMRI), weighted estimation
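The three steps named in the abstract (voxel clusters, cluster-specific kernel PCA features, subject-weighted estimation) can be sketched in Python with NumPy as follows. This is not the author's implementation. The Gaussian (RBF) kernel, its bandwidth `gamma`, the number of components per cluster, the ridge penalty, and the specific weight formula are all assumptions made for illustration. For the weight, each subject's weight is taken to be the share of that subject's centered feature-space norm captured by the retained kernel principal components, pooled over clusters. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def cluster_kernel_pca(Xc, n_components, gamma):
    """Kernel PCA on one voxel cluster (rows = subjects).

    Returns the kernel principal scores Z (subjects x n_components) and each
    subject's squared norm in the centered feature space (diag of centered K).
    """
    n = Xc.shape[0]
    K = rbf_kernel(Xc, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # center in feature space
    vals, vecs = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]     # keep the largest ones
    vals = np.maximum(vals[idx], 1e-12)
    Z = vecs[:, idx] * np.sqrt(vals)                # score_i = sqrt(lambda) * v_i
    total = np.maximum(np.diag(Kc), 1e-12)
    return Z, total


def fit_weighted_cluster_kpca(X, clusters, y, n_components=2, gamma=None, ridge=1e-3):
    """Hypothetical sketch: cluster-wise kernel PCA + subject-weighted ridge regression."""
    feats = []
    captured = np.zeros(X.shape[0])
    total = np.zeros(X.shape[0])
    for cols in clusters:
        Xc = X[:, cols]
        g = gamma if gamma is not None else 1.0 / Xc.shape[1]
        Z, tot = cluster_kernel_pca(Xc, n_components, g)
        feats.append(Z)
        captured += np.sum(Z ** 2, axis=1)
        total += tot
    F = np.hstack(feats)
    # Assumed weight: fraction of each subject's feature-space variation
    # captured by the retained components, pooled across clusters.
    w = np.clip(captured / total, 0.0, 1.0)
    D = np.column_stack([np.ones(len(y)), F])       # intercept + principal features
    P = ridge * np.eye(D.shape[1])
    P[0, 0] = 0.0                                   # do not penalize the intercept
    DtW = D.T * w                                   # equals D.T @ diag(w)
    beta = np.linalg.solve(DtW @ D + P, DtW @ y)
    return beta, F, w


# Toy usage on random data: 30 subjects, 60 voxels split into 3 clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 60))
clusters = [np.arange(0, 20), np.arange(20, 40), np.arange(40, 60)]
y = X[:, :20].mean(axis=1) + 0.1 * rng.normal(size=30)
beta, F, w = fit_weighted_cluster_kpca(X, clusters, y, n_components=2)
fitted = np.column_stack([np.ones(len(y)), F]) @ beta
```

The toy data are random and only exercise the code; they say nothing about how the method performs. The coefficients attached to each cluster's block of features are what would support comparing the relative importance of regions. Subjects whose images are poorly summarized by the retained components receive smaller weights, which is one way to limit the influence of outlying subjects. Predicting for new subjects would additionally require centering their kernel rows against the training subjects, which this sketch omits.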
