Statistics and Its Interface

Volume 1 (2008)

Number 1

LASSO-Patternsearch algorithm with application to ophthalmology and genomic data

Pages: 137 – 153

DOI: http://dx.doi.org/10.4310/SII.2008.v1.n1.a12

Authors

Barbara Klein (Department of Ophthalmology and Visual Science, University of Wisconsin, Madison, Wisc., U.S.A.)

Ronald Klein (Department of Ophthalmology and Visual Science, University of Wisconsin, Madison, Wisc., U.S.A.)

Kristine Lee (Department of Ophthalmology and Visual Science, University of Wisconsin, Madison, Wisc., U.S.A.)

Weiliang Shi (Department of Statistics, University of Wisconsin, Madison, Wisc., U.S.A.)

Grace Wahba (Department of Statistics, University of Wisconsin, Madison, Wisc., U.S.A.)

Stephen Wright (Department of Computer Science, University of Wisconsin, Madison, Wisc., U.S.A.)

Abstract

The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. The method is designed for the case where the number of candidate patterns may be very large but only a relatively small number are believed to be important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps. We applied the method to myopia data from the population-based Beaver Dam Eye Study, exposing physiologically interesting interacting risk factors. We then applied the method to data from a generative model of Rheumatoid Arthritis based on Problem 3 from the Genetic Analysis Workshop 15, successfully demonstrating its potential to efficiently recover higher-order patterns from attribute vectors of length typical of genomic studies.
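
The two-step idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the L1-penalized logistic fit uses plain proximal gradient descent (ISTA) rather than the paper's large-scale algorithm, the penalty weight lam is fixed by hand instead of being tuned by the modified GACV, and the second step only refits the surviving patterns without the further pruning the abstract mentions. A pattern is taken here to be a product of a subset of the binary risk factors, which is an assumption about the form of the log-linear expansion terms.

```python
import itertools
import math


def build_patterns(n_vars, max_order):
    """Index tuples of size 1..max_order; each tuple names one candidate pattern."""
    return [combo
            for order in range(1, max_order + 1)
            for combo in itertools.combinations(range(n_vars), order)]


def pattern_matrix(X, patterns):
    """Entry is 1 when every risk factor in the pattern is present in the row."""
    return [[int(all(row[j] for j in p)) for p in patterns] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; shrinks v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(B, y, lam, n_iter=500):
    """Minimize mean Bernoulli negative log-likelihood + lam * sum(|c_j|).

    The intercept mu is not penalized. The step size 4 / (m + 1) is a safe
    choice for 0/1 features (assumption of this sketch).
    """
    n, m = len(B), len(B[0])
    step = 4.0 / (m + 1)
    mu, c = 0.0, [0.0] * m
    for _ in range(n_iter):
        resid = [sigmoid(mu + sum(cj * bj for cj, bj in zip(c, row))) - yi
                 for row, yi in zip(B, y)]
        grad_mu = sum(resid) / n
        grad_c = [sum(r * row[j] for r, row in zip(resid, B)) / n
                  for j in range(m)]
        mu -= step * grad_mu
        c = [soft_threshold(cj - step * g, step * lam)
             for cj, g in zip(c, grad_c)]
    return mu, c


def lasso_patternsearch_sketch(X, y, max_order, lam):
    """Step 1: LASSO screen over all patterns. Step 2: unpenalized refit."""
    patterns = build_patterns(len(X[0]), max_order)
    _, c = fit_l1_logistic(pattern_matrix(X, patterns), y, lam)
    survivors = [p for p, cj in zip(patterns, c) if cj != 0.0]
    mu, c_s = fit_l1_logistic(pattern_matrix(X, survivors), y, lam=0.0)
    return mu, dict(zip(survivors, c_s))


if __name__ == "__main__":
    # Toy data (made up for illustration): the outcome mostly follows x0 AND x1.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
    y = [a * b for a, b in X]
    y[0], y[3] = 1, 0  # two noisy labels so the classes are not separable
    print(lasso_patternsearch_sketch(X, y, max_order=2, lam=0.05))
```

Enumerating every pattern up front, as this sketch does, is only feasible for a handful of variables; the abstract's emphasis on handling an extremely large number of unknowns is precisely what the toy version does not address.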
