Statistics and Its Interface

Volume 4 (2011)

Number 4

Optimal false discovery rate control for dependent data

Pages: 417 – 430

DOI: https://dx.doi.org/10.4310/SII.2011.v4.n4.a1

Authors

T. Tony Cai (Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Penn., U.S.A.)

Hongzhe Li (Department of Biostatistics and Epidemiology, School of Medicine, University of Pennsylvania, Philadelphia, Penn., U.S.A.)

John Maris (Department of Pediatrics, School of Medicine, University of Pennsylvania, Philadelphia, Penn., U.S.A.)

Jichun Xie (Department of Statistics, The Fox School of Business and Management, Temple University, Philadelphia, Pennsylvania, U.S.A.)

Abstract

This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate, is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls the false discovery rate and leads to a smaller false non-discovery rate than several commonly used $p$-value-based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma, where it identifies a few more genetic variants potentially associated with neuroblastoma than several $p$-value-based false discovery rate controlling procedures.
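The abstract does not spell out the marginal rule, so the following is only an illustrative sketch of how a marginal procedure of this general kind is often realized, not the authors' implementation. It assumes a two-component normal mixture for the z-scores, computes a local false discovery rate for each test from its own z-score alone, and rejects the hypotheses with the smallest values for as long as their running average stays at or below the target level. The function names and the parameters `p_alt` and `mu_alt` are hypothetical; here they are treated as known, whereas a data-driven plug-in version would estimate them from the data.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, p_alt, mu_alt):
    """Posterior probability that a z-score comes from the null N(0, 1).

    Assumed model (not taken from the paper):
    z ~ (1 - p_alt) * N(0, 1) + p_alt * N(mu_alt, 1).
    """
    null_part = (1.0 - p_alt) * normal_pdf(z)
    alt_part = p_alt * normal_pdf(z, mean=mu_alt)
    return null_part / (null_part + alt_part)


def marginal_step_up(lfdr, alpha):
    """Return sorted indices of rejected hypotheses.

    Hypotheses are ranked by increasing local fdr, and the largest k is
    chosen such that the average of the k smallest values is <= alpha.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    k = 0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


# Toy z-scores (made up for illustration only).
z_scores = [0.1, -0.4, 3.5, 4.2, 1.0, 2.9]
lfdr_values = [local_fdr(z, p_alt=0.2, mu_alt=3.0) for z in z_scores]
rejected = marginal_step_up(lfdr_values, alpha=0.05)
print(rejected)
```

The rule is "marginal" in the sense that each statistic is scored using only its own z-score, ignoring the correlation between tests; the abstract's claim is that, for multivariate normal data with short-range dependence, a rule of this marginal type loses nothing asymptotically relative to the joint oracle.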

Keywords

large scale multiple testing, marginal rule, optimal oracle rule, weighted classification

Published 17 November 2011