Statistics and Its Interface

Volume 5 (2012)

Number 2

Bayesian areal wombling using false discovery rates

Pages: 149 – 158

DOI: https://dx.doi.org/10.4310/SII.2012.v5.n2.a1

Authors

Sudipto Banerjee (School of Public Health at the University of Minnesota, Minneapolis, Minnesota, U.S.A.)

Bradley P. Carlin (School of Public Health at the University of Minnesota, Minneapolis, Minnesota, U.S.A.)

Pei Li (Medtronic, Inc., Minneapolis, Minnesota, U.S.A.)

Alexander M. McBean (School of Public Health at the University of Minnesota, Minneapolis, Minnesota, U.S.A.)

Abstract

Spatial data arising in public health services are often reported as case counts or rates aggregated over areal regions (e.g., counties, census tracts or ZIP codes), rather than being referenced with respect to the geographical coordinates of individual residences. For such areal data, subsequent inferential interest often resides in the formal identification of “barriers”, or “difference boundaries”, on the map, where “boundary” refers to a border with sharp changes in outcome on either side. This boundary detection problem is often referred to as “wombling” or, more specifically, “areal wombling” for aggregated areal data, after a foundational article by Womble (1951). Existing statistical frameworks for areal wombling usually follow a two-stage procedure: (i) estimate the spatial effects from an appropriate spatial model, and (ii) detect boundaries by applying appropriate discrepancy metrics to those estimates. Lu and Carlin (2005), and several subsequent articles, explored areal wombling within this framework.
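As a rough illustration of stage (ii) of the two-stage procedure, the sketch below ranks shared borders by the absolute difference in estimated spatial effects between adjacent regions. The region labels, effect values, adjacency structure and the choice of absolute difference as the discrepancy metric are all assumptions made for illustration; they are not taken from the article.

```python
def boundary_discrepancies(effects, adjacency):
    """Absolute difference in estimated effects across each shared border.

    effects:   dict mapping region label -> estimated spatial effect
    adjacency: dict mapping region label -> list of neighbouring labels
    """
    edges = {}
    for i, neighbours in adjacency.items():
        for j in neighbours:
            if i < j:  # record each shared border only once
                edges[(i, j)] = abs(effects[i] - effects[j])
    return edges


# Hypothetical stage (i) output: one estimated effect per region.
effects = {"A": 0.10, "B": 0.15, "C": 0.90, "D": 0.85}
# Hypothetical map: four regions arranged in a 2 x 2 grid.
adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

disc = boundary_discrepancies(effects, adjacency)
# Borders sorted from largest to smallest discrepancy.
ranked = sorted(disc, key=disc.get, reverse=True)
```

In this toy map the borders A-C and B-D separate the low-valued regions from the high-valued ones, so they rank first as candidate difference boundaries.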

This article treats wombling as a hypothesis-testing problem, where we are testing a substantial number of hypotheses (one for each geographical boundary) and seek to provide policy-makers and analysts with a final set of difference boundaries. Here we must reckon with a lurking multiplicity problem arising from the large number of individual hypotheses we are testing. We proffer a computationally feasible framework to estimate hierarchical spatial models that account for dependence between adjacent regions and test for equality of spatial effects, while adjusting for multiplicities using false discovery rates (FDR); see, e.g., Benjamini and Hochberg (1995). A simulation study is conducted to first illustrate and assess the new approach, which is then applied to detect boundaries on a county map of Minnesota that records pneumonia and influenza hospitalization rates from the SEER-Medicare program.
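To make the multiplicity adjustment concrete, the following is a minimal sketch of the classical step-up procedure of Benjamini and Hochberg (1995), applied to one p-value per candidate boundary. It is the frequentist procedure from the cited reference, not the Bayesian approach developed in the article, whose details the abstract does not give. The function name, the p-values and the level `alpha` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate boundary segment.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_values, alpha=0.05)
n_boundaries = sum(flags)
```

With these made-up inputs only the two smallest p-values fall below their rank-dependent thresholds, so two boundaries would be flagged. Testing each border at level 0.05 without adjustment would instead flag five, which is the multiplicity problem the abstract refers to.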

Keywords

areal data, Bayesian inference, hierarchical models, false discovery rates, spatial moving averages

2010 Mathematics Subject Classification

Primary 62F15, 62H11. Secondary 62F03.

Published 15 May 2012