Statistics and Its Interface

Volume 8 (2015)

Number 4

An extended Tajima’s D neutrality test incorporating SNP calling and imputation uncertainties

Pages: 447 – 456

DOI: https://dx.doi.org/10.4310/SII.2015.v8.n4.a4

Authors

Qingrun Zhang (Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, N.Y., U.S.A.)

Chris Tyler-Smith (The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom)

Quan Long (Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, N.Y., U.S.A.)

Abstract

To identify evolutionary events from the footprints left in the patterns of genetic variation in a population, people use many statistical frameworks, including neutrality tests. In datasets from current high-throughput sequencing and genotyping platforms, it is common to have missing data and low-confidence SNP calls at many segregating sites. However, the traditional statistical framework for neutrality tests does not allow for these possibilities; therefore the usual way of treating missing data is to ignore segregating sites with missing/low-confidence calls, regardless of the good SNP calls at these sites in other individuals. In this work, we propose a modified neutrality test, Extended Tajima’s D, which incorporates missing data and SNP-calling uncertainties. Because we do not specify any particular error-generating mechanism, this approach is robust and widely applicable. Simulations show that in most cases the power of the new test is better than the original Tajima’s D, given the same type I error. Applications to real data show that it detects fewer outliers associated with low quality data. The downloadable executable as well as the documentation can be found at Google Code: https://code.google.com/p/robust-scan/.
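
For orientation, the conventional practice that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the classical Tajima's D (Tajima, 1989) computed after discarding every site that has a missing call; it is not the Extended Tajima's D proposed in the paper, whose formulas are not given on this page. The function names, the encoding of haplotypes as strings of "0"/"1", and the use of "N" for a missing or low-confidence call are assumptions made for the example.

```python
import math


def segregating_counts(haplotypes, missing="N"):
    """Derived-allele counts at segregating sites with no missing calls.

    Assumed encoding (hypothetical): each haplotype is a string of '0'
    (ancestral), '1' (derived), or `missing` for an unusable call.
    """
    n = len(haplotypes)
    counts = []
    for column in zip(*haplotypes):
        if missing in column:
            # Conventional treatment: drop the whole site, even though the
            # other individuals have good calls at it.
            continue
        k = column.count("1")
        if 0 < k < n:  # keep only sites that are polymorphic in the sample
            counts.append(k)
    return counts


def tajimas_d(haplotypes, missing="N"):
    """Classical Tajima's D on the sites that survive the missing-data filter."""
    n = len(haplotypes)
    counts = segregating_counts(haplotypes, missing)
    s = len(counts)
    if n < 4 or s == 0:
        # The variance term is zero for n < 4, and D is undefined when S = 0.
        return float("nan")

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)
```

In this sketch a single "N" in a column removes that site for all individuals, which is the loss of information the abstract describes as motivation for the extended test.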

Keywords

neutrality test, Tajima’s D, missing genotype, next generation sequencing

Published 19 October 2015