Statistics and Its Interface

Volume 10 (2017)

Number 2

Correcting length-bias in gene set analysis for DNA methylation data

Pages: 279 – 289

DOI: https://dx.doi.org/10.4310/SII.2017.v10.n2.a11

Authors

Shaoyu Li (Department of Mathematics and Statistics, University of North Carolina, Charlotte, N.C., U.S.A.)

Tao He (Department of Mathematics, San Francisco State University, San Francisco, California, U.S.A.)

Iwona Pawlikowska (Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tennessee, U.S.A.)

Tong Lin (Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tennessee, U.S.A.)

Abstract

The enrichment analysis of pre-defined gene sets is a widely used tool for extracting functional information in association studies. However, traditional methods give biased results on genome-wide DNA methylation data because the number of probes differs across genes. In this article, we present MethylSet, a novel two-step procedure that combines gene-based association analysis with a logistic regression model for enrichment analysis to correct the bias induced by gene size. Adjusting for the gene size effect is crucial because irrelevant gene sets may otherwise be identified. Our simulation studies showed that MethylSet has a well-controlled type I error rate and promising statistical power. When applied to a real DNA methylation data set, MethylSet identified meaningful gene sets associated with the studied disease outcome.
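
The length-bias problem and the idea behind the enrichment step can be illustrated with a minimal sketch. This is not the authors' MethylSet implementation. It assumes that step one has already produced a significant/non-significant call for each gene, and that gene size enters the enrichment model as the log of the probe count; the abstract does not specify either detail. All data are simulated, and every name (`fit_logistic`, `in_set`, `log_size`, and so on) is hypothetical. In the simulation, significance depends only on gene size, so the gene set has no true effect.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_logistic(X, y, n_iter=15):
    """Fit logistic regression by Newton-Raphson; return the coefficients."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Simulated (hypothetical) data: one row per gene.
rng = random.Random(0)
in_set, log_size, significant = [], [], []
for _ in range(6000):
    probes = rng.randint(1, 60)            # number of probes in the gene
    size = math.log(probes)
    # The gene set is enriched for large genes.
    member = 1 if rng.random() < (0.4 if probes >= 30 else 0.05) else 0
    # Stand-in for step one: the chance of a significant call grows with
    # gene size only; set membership has no effect.
    p_sig = 1.0 / (1.0 + math.exp(-(-3.0 + size)))
    in_set.append(member)
    log_size.append(size)
    significant.append(1 if rng.random() < p_sig else 0)

# Naive enrichment model: significance ~ set membership.
naive = fit_logistic([[1.0, s] for s in in_set], significant)
# Size-adjusted model: significance ~ set membership + log(probe count).
adjusted = fit_logistic(
    [[1.0, s, z] for s, z in zip(in_set, log_size)], significant
)

naive_coef = naive[1]        # log odds ratio for membership, unadjusted
adjusted_coef = adjusted[1]  # log odds ratio for membership, size-adjusted
print("naive membership coefficient:   ", round(naive_coef, 3))
print("adjusted membership coefficient:", round(adjusted_coef, 3))
```

In the naive model the membership coefficient absorbs the size effect, so the set looks enriched. Once gene size is included as a covariate, the coefficient should shrink toward zero, which is the kind of spurious finding the abstract warns about when no size adjustment is made.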

Keywords

epigenome-wide association study (EWAS), length bias, logistic kernel machine regression, gene set analysis

Published 31 October 2016