Statistics and Its Interface

Volume 9 (2016)

Number 4

Special Issue on Statistical and Computational Theory and Methodology for Big Data

Guest Editors: Ming-Hui Chen (University of Connecticut); Radu V. Craiu (University of Toronto); Faming Liang (University of Florida); and Chuanhai Liu (Purdue University)

Preprocessing solar images while preserving their latent structure

Pages: 535 – 551



Nathan M. Stein (Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Penn., U.S.A.)

David A. van Dyk (Statistics Section, Mathematics Department, Imperial College London, United Kingdom)

Vinay L. Kashyap (High Energy Astrophysics Division, Harvard–Smithsonian Center for Astrophysics, Cambridge, Massachusetts, U.S.A.)


Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high-resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of $\sim 10$ segments rather than on each of $\sim 10^7$ pixels, reducing computing time by a factor of $\sim 10^6$. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision-theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.
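The link between cosine dissimilarity and Hellinger distance mentioned in the abstract can be illustrated with a small sketch. It is not the article's parametric class, whose exact form is not given here. It only assumes the simplest case: each pixel's multi-passband vector is normalized to sum to one and then square-root transformed. Under that assumption the squared Hellinger distance between the normalized vectors equals the cosine dissimilarity between the transformed vectors. The function names and the pixel intensities below are hypothetical.

```python
import math

def normalize(intensities):
    """Scale a nonnegative passband vector so that its entries sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("need at least one positive intensity")
    return [x / total for x in intensities]

def cosine_dissimilarity(u, v):
    """One minus the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def squared_hellinger(p, q):
    """Squared Hellinger distance between two vectors that each sum to 1."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

# Hypothetical multi-passband intensities for two pixels (illustrative values only).
pixel_a = [120.0, 340.0, 80.0, 15.0, 60.0, 5.0]
pixel_b = [100.0, 300.0, 150.0, 30.0, 40.0, 10.0]

p, q = normalize(pixel_a), normalize(pixel_b)

# Assumed nonlinear transform: elementwise square root of the normalized vector.
sqrt_p = [math.sqrt(x) for x in p]
sqrt_q = [math.sqrt(x) for x in q]

h2 = squared_hellinger(p, q)
cos_d = cosine_dissimilarity(sqrt_p, sqrt_q)

# The two quantities agree up to floating-point rounding.
print(abs(h2 - cos_d) < 1e-12)
```

Because the square-root vectors have unit Euclidean length, both quantities reduce to one minus the sum of $\sqrt{p_i q_i}$. Both also depend only on the relative intensities across passbands, not on a pixel's overall brightness.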


clustering, decision theory, dissimilarity measure, Hellinger distance, image segmentation, latent structure, solar physics, space weather

Published 14 September 2016