Annals of Mathematical Sciences and Applications

Volume 1 (2016)

Number 2

Signed support recovery for single index models in high-dimensions

Pages: 379 – 426



Matey Neykov (Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, U.S.A.)

Qian Lin (Center of Mathematical Sciences and Applications, Harvard University, Cambridge, Massachusetts, U.S.A.)

Jun S. Liu (Department of Statistics, Harvard University, Cambridge, Massachusetts, U.S.A.)


In this paper we study the support recovery problem for single index models $Y = f (\boldsymbol{X}^{\intercal} \beta , \varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X} \sim N_p (0, \mathbb{I}_p)$ and $\beta$ is an $s$-sparse unit vector such that $\beta_i \in \{ \pm \dfrac{1}{\sqrt{s}} , 0 \}$. In particular, we look into the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by [24]; and (b) a semi-definite programming (SDP) approach inspired by [1]. When $s = O (p^{1-\delta})$ for some $\delta \gt 0$, we demonstrate that both procedures can succeed in recovering the support of $\beta$ as long as the rescaled sample size $\Gamma = \dfrac{n}{s \log(p-s)}$ is larger than a certain critical threshold. On the other hand, when $\Gamma$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, we demonstrate that both DT-SIR and the SDP approach are optimal (up to a scalar) for recovering the support of $\beta$ in terms of sample size. We provide extensive simulations, as well as a real dataset application to help verify our theoretical observations.


single index models, sliced inverse regression, sparsity, support recovery, high-dimensional statistics, semidefinite programming

2010 Mathematics Subject Classification

Primary 62G99. Secondary 62H99.

Published 25 July 2016