Signed support recovery for single index models in high-dimensions

In this paper we study the support recovery problem for single index models $Y = f (\boldsymbol{X}^{\intercal} \beta , \varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X} \sim N_p (0, \mathbb{I}_p)$ and $\beta$ is an $s$-sparse unit vector such that $\beta_i \in \{ \pm \dfrac{1}{\sqrt{s}} , 0 \}$. In particular, we look into the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by [24]; and (b) a semi-definite programming (SDP) approach inspired by [1]. When $s = O (p^{1-\delta})$ for some $\delta \gt 0$, we demonstrate that both procedures can succeed in recovering the support of $\beta$ as long as the rescaled sample size $\Gamma = \dfrac{n}{s \log(p-s)}$ is larger than a certain critical threshold. On the other hand, when $\Gamma$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, we demonstrate that both DT-SIR and the SDP approach are optimal (up to a scalar) for recovering the support of $\beta$ in terms of sample size. We provide extensive simulations, as well as a real dataset application to help verify our theoretical observations.

Keywords

single index models, sliced inverse regression, sparsity, support recovery, high-dimensional statistics, semidefinite programming

2010 Mathematics Subject Classification

Primary 62G99. Secondary 62H99.

Full Text (PDF format)

Published 25 July 2016