Statistics and Its Interface

Volume 13 (2020)

Number 2

Sampling High Dimensional Tables with Applications to Assessing Linkage Disequilibrium

Pages: 157 – 166

DOI: https://dx.doi.org/10.4310/SII.2020.v13.n2.a2

Authors

Robert D. Eisinger (Department of Statistical Science, Duke University, Durham, North Carolina, U.S.A.)

Xiao Su (Wells Fargo, Charlotte, North Carolina, U.S.A.)

Yuguo Chen (Department of Statistics, University of Illinois at Urbana-Champaign, Illinois, U.S.A.)

Abstract

We propose a sequential importance sampling strategy to sample high-dimensional tables with fixed one-way margins. The proposal distribution for the method is constructed by adapting an approximation, available in the literature, to the total number of tables. We apply the method to estimating the total number of tables and to assessing linkage disequilibrium in multimarker genetic data, where the table represents haplotype counts. We demonstrate efficient and accurate performance on these real-world examples. The method may be applied in any situation in which uniformly sampling high-dimensional tables with fixed one-way margins is of interest. Detailed derivations are provided in the appendix.
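As a rough illustration of the general sequential importance sampling idea, here is a minimal sketch for the simplest case d = 2, where the one-way margins of a two-way table are its row and column sums. Cells are filled one at a time, each drawn uniformly from the range of values that keeps the rest of the table completable, and the product of the range sizes is the importance weight 1/q(T). Averaging these weights gives an unbiased estimate of the number of tables. The uniform per-cell proposal and the function names `sample_table` and `estimate_count` are hypothetical choices made for simplicity. This is not the authors' proposal, which, per the abstract, is built from an approximation to the total number of tables.

```python
import math
import random
from typing import List, Tuple


def sample_table(row_sums: List[int], col_sums: List[int],
                 rng: random.Random) -> Tuple[List[List[int]], int]:
    """Draw one two-way table cell by cell; return it with its weight 1/q(T).

    Assumption: each cell is drawn uniformly over its feasible range
    (a simple illustrative proposal, not the paper's).
    """
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column totals must agree")
    cols = list(col_sums)  # column sums still to be filled
    table: List[List[int]] = []
    weight = 1  # exact integer product of the number of choices per cell
    for r in row_sums:
        row: List[int] = []
        remaining = r  # amount of this row's sum not yet placed
        for j in range(len(cols)):
            # The later cells in this row can absorb at most sum(cols[j+1:]).
            lo = max(0, remaining - sum(cols[j + 1:]))
            hi = min(remaining, cols[j])
            x = rng.randint(lo, hi)  # inclusive bounds; uniform proposal
            weight *= hi - lo + 1    # q(T) multiplies by 1/(hi - lo + 1)
            row.append(x)
            remaining -= x
            cols[j] -= x
        table.append(row)
    return table, weight


def estimate_count(row_sums: List[int], col_sums: List[int],
                   n_samples: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """SIS estimate of the number of tables and its standard error."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    weights = [sample_table(row_sums, col_sums, rng)[1] for _ in range(n_samples)]
    mean = sum(weights) / n_samples
    var = sum((w - mean) ** 2 for w in weights) / (n_samples - 1)
    return mean, math.sqrt(var / n_samples)
```

For example, `estimate_count([1, 1, 1], [1, 1, 1])` should return a value close to 6, the number of 3 × 3 permutation matrices. Because every feasible table is reachable by exactly one sampling path and no path dead-ends, the mean of the weights is unbiased. The same weights can reweight the draws toward the uniform distribution over tables, which is what an exact test needs. A better-matched proposal, such as one built from an approximate count as the abstract describes, mainly reduces the variance of the weights.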

Keywords

Counting problem, Exact test, High-dimensional table, Linkage disequilibrium, Monte Carlo method, Sequential importance sampling.

Yuguo Chen was partially supported by NSF grant DMS-1406455.

Received 12 April 2019

Received revised 27 July 2019

Accepted 18 September 2019

Published 30 January 2020