Statistics and Its Interface
Volume 15 (2022)
Sparsity-restricted estimation for the accelerated failure time model
Pages: 1 – 18
In many biomedical studies, such as high-throughput microarray or RNA-sequencing (RNA-seq) gene expression analyses, it is of practical interest to link gene expression profiles to censored survival phenotypes, for example, time to cancer recurrence or time to death. Because the number of genes greatly exceeds the sample size and survival data carry complications such as right censoring, regularized methods that combine a rank-based loss function with a penalty are often used to identify relevant prognostic biomarkers and yield parsimonious prediction models for event times. Existing penalization methods for survival data often use an $\ell_1$ penalty as a surrogate for the sparsity constraint, since its convexity makes the computation convenient. In practice, however, the $\ell_1$ approximation requires a larger model than the ideal sparsity-restricted method to achieve a desired cross-validated prediction error. In this paper, we consider sparsity-restricted estimation in the accelerated failure time (AFT) model for censored survival data. For its numerical implementation, we propose an efficient two-stage procedure that combines a convex regularized Gehan rank regression with a simple hard-thresholding estimation. The effectiveness of the proposed method is demonstrated by extensive simulation studies and real-data applications.
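The two ingredients named in the abstract, the Gehan rank loss and hard thresholding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gehan_loss` and `hard_threshold`, the $1/n^2$ normalization, and the rule of keeping the `k` largest coefficients in absolute value are assumptions made for illustration, and the convex regularized fitting stage is omitted entirely.

```python
def gehan_loss(log_times, events, X, beta):
    """Gehan rank loss for the AFT model (illustrative normalization by n^2).

    log_times: observed log follow-up times
    events:    1 if the event was observed, 0 if right-censored
    X:         list of covariate rows
    beta:      candidate coefficient vector
    """
    n = len(log_times)
    # Residuals e_i = log(T_i) - x_i' beta
    resid = [
        log_times[i] - sum(x * b for x, b in zip(X[i], beta))
        for i in range(n)
    ]
    total = 0.0
    for i in range(n):
        if events[i]:  # only uncensored subjects index the outer sum
            for j in range(n):
                # negative part of (e_i - e_j), i.e. max(e_j - e_i, 0)
                total += max(resid[j] - resid[i], 0.0)
    return total / (n * n)


def hard_threshold(beta, k):
    """Keep the k largest coefficients in absolute value; set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


# Toy usage with made-up numbers (not data from the paper)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
log_times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
loss_at_zero = gehan_loss(log_times, events, X, [0.0, 0.0])
sparse_beta = hard_threshold([0.1, -2.0, 0.5, 0.0], k=2)
```

In this sketch the sparsity level `k` plays the role of the model-size restriction; how the paper selects it and how the first-stage estimate is computed are not specified in the abstract.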
AFT model, LASSO, penalty, prediction, sparsity, survival data
The work of Jinfeng Xu was supported by the University of Hong Kong, Zhejiang Institute of Research and Innovation Seed Fund, and by the Hong Kong General Research Fund (17308820).
Received 21 August 2019
Accepted 13 March 2021
Published 11 August 2021