Statistics and Its Interface
Volume 2 (2009)
Robust genome-wide scans with genetic model selection using case-control design
Pages: 145 – 151
In a genome-wide association study with more than 100,000 (100K) to 1 million single nucleotide polymorphisms (SNPs), the first step is usually a genome-wide scan to identify candidate chromosome regions for further analyses. The goal of the genome-wide scan is to rank all the SNPs based on their association tests or p-values and select the top SNPs. A good ranking procedure ranks the SNPs with true associations as near to the top as possible, which increases the probability of selecting at least one SNP with a true association. However, if the disease-associated SNPs have moderate genetic effects, the probability that a large number of null SNPs will have extremely small p-values (or large test statistics) is high when screening more than 300K SNPs. Therefore, when selecting a small fraction of top SNPs (usually less than 5%), the probability of selecting at least one SNP with a true association is usually less than 80% unless the sample size is large. Robust statistics (e.g., MAX3 and MIN2) have been proposed to rank all the SNPs. In this article we consider genome-wide scans with genetic model selection and compare the proposed method to the existing approaches. Results from simulation studies are presented.
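As a rough illustration of ranking SNPs by a robust statistic, the sketch below computes MAX3 for case-control genotype counts and sorts SNPs by it. The abstract does not define MAX3; the code assumes the definition common in the literature, namely the maximum absolute Cochran-Armitage trend statistic over the recessive, additive, and dominant models (scores 0, 1/2, and 1 for the heterozygous genotype). The function names `catt`, `max3`, and `rank_snps` and the example counts are hypothetical, and this is not the genetic model selection procedure proposed in the article.

```python
import math

def catt(cases, controls, x):
    """Cochran-Armitage trend statistic with genotype scores (0, x, 1).

    cases and controls are counts for genotypes carrying 0, 1, and 2
    copies of the risk allele (assumed ordering).
    """
    scores = (0.0, x, 1.0)
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    num = sum(w * (S * r - R * s) for w, r, s in zip(scores, cases, controls))
    var = R * S * (N * sum(w * w * k for w, k in zip(scores, n))
                   - sum(w * k for w, k in zip(scores, n)) ** 2)
    if var <= 0:
        # Degenerate table (e.g. a genotype class is empty): no evidence.
        return 0.0
    return math.sqrt(N) * num / math.sqrt(var)

def max3(cases, controls):
    """Assumed MAX3: largest |trend statistic| over x = 0, 1/2, 1."""
    return max(abs(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))

def rank_snps(tables):
    """Return SNP names ordered from strongest to weakest MAX3."""
    stats = {name: max3(c, k) for name, (c, k) in tables.items()}
    return sorted(stats, key=stats.get, reverse=True)

# Hypothetical genotype counts: (cases, controls) per SNP.
tables = {
    "snp_null": ((50, 100, 50), (50, 100, 50)),
    "snp_assoc": ((30, 100, 70), (60, 100, 40)),
}
ranking = rank_snps(tables)
```

In a real scan the top fraction of `ranking` would be carried forward for further analysis; selecting, say, the top 5% amounts to slicing this list.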
case-control design, efficiency robustness, genetic model selection, genome-wide studies, MAX
2010 Mathematics Subject Classification
Primary 62G10, 62G35. Secondary 62G30, 62P10.