Statistics and Its Interface

Volume 11 (2018)

Number 4

More accurate semiparametric regression in pharmacogenomics

Pages: 573 – 580



Yaohua Rong (College of Applied Sciences, Beijing University of Technology, Beijing, China)

Sihai Dave Zhao (Department of Statistics, University of Illinois at Urbana-Champaign, Il., U.S.A.)

Ji Zhu (Department of Statistics, University of Michigan, Ann Arbor, Mi., U.S.A.)

Wei Yuan (School of Statistics, Renmin University of China, Beijing, China)

Weihu Cheng (College of Applied Sciences, Beijing University of Technology, Beijing, China)

Yi Li (West China Hospital, Chengdu, China; and Department of Biostatistics, University of Michigan, Ann Arbor, Mi., U.S.A.)


A key step in pharmacogenomic studies is the development of accurate prediction models for drug response based on individuals’ genomic information. Recent interest has centered on semiparametric models based on kernel machine regression, which can flexibly model the complex relationships between gene expression and drug response. However, performance suffers if irrelevant covariates are unknowingly included when training the model. We propose a new semiparametric regression procedure, based on a novel penalized garrotized kernel machine (PGKM), which can better adapt to the presence of irrelevant covariates while still allowing for a complex nonlinear model and gene-gene interactions. We study the performance of our approach in simulations and in a pharmacogenomic study of the renal carcinoma drug temsirolimus. Our method predicts plasma concentration of temsirolimus as accurately as standard kernel machine regression when no irrelevant covariates are included in training, but has much higher prediction accuracy when the truly important covariates are not known in advance. Supplemental materials, including R code used in this manuscript, are available online.
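
The abstract does not state the form of the kernel. As a rough illustration of what "garrotizing" a kernel can mean, the sketch below assumes a Gaussian kernel in which each covariate carries its own nonnegative weight, so that a weight of zero drops that covariate from the model. The function names, the weight vector `delta`, and the toy data are hypothetical and are not taken from the authors' R code; the penalized estimation of the weights is not shown.

```python
import math

def garrotized_gaussian_kernel(x, z, delta):
    """Gaussian kernel with one nonnegative garrote weight per covariate.

    Assumed form: K(x, z) = exp(-sum_j delta_j * (x_j - z_j)^2).
    A weight of 0 removes that covariate from the kernel entirely.
    """
    if not (len(x) == len(z) == len(delta)):
        raise ValueError("x, z and delta must have the same length")
    if any(d < 0 for d in delta):
        raise ValueError("garrote weights must be nonnegative")
    weighted_sq_dist = sum(d * (a - b) ** 2 for a, b, d in zip(x, z, delta))
    return math.exp(-weighted_sq_dist)

def gram_matrix(rows, delta):
    """Kernel (Gram) matrix for a list of covariate vectors."""
    return [[garrotized_gaussian_kernel(r, s, delta) for s in rows] for r in rows]

# Hypothetical toy data: covariate 0 is relevant, covariate 1 is pure noise.
rows = [[0.0, 5.0], [0.0, -3.0], [1.0, 5.0]]

# All covariates weighted equally (an ordinary Gaussian kernel).
K_full = gram_matrix(rows, [1.0, 1.0])

# Noise covariate switched off by a zero garrote weight.
K_sparse = gram_matrix(rows, [1.0, 0.0])
```

With the noise covariate switched off, the first two rows are treated as identical because they agree on the relevant covariate, whereas the unweighted kernel treats them as almost unrelated. A penalty that drives some weights exactly to zero would yield this kind of covariate selection inside the kernel.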


Keywords

kernel machine, semiparametric regression, model selection

2010 Mathematics Subject Classification

Primary 62G05, 62H20, 62J07, 62P10. Secondary 62G08, 62H12, 62J02, 92B15.

Rong’s work was partially supported by the National Natural Science Foundation of China (No. 11701021), the National Statistical Science Research Project (No. 2017LZ35), the Fundamental Research Foundation of Beijing University of Technology, the Introduction of Talent Research Start-up Foundation, and the Beijing Outstanding Talent Foundation (No. 2014000020124G047); Zhao’s work was partially supported by NSF grant DMS-1613005; Li’s work was partially supported by NIH grant U01CA209414.

Received 4 September 2017

Published 19 September 2018