The use of the risk percentile curve in the analysis of epidemiologic data

Economists and social scientists have used percentile-based curves, e.g., the Lorenz curve, to summarize data from positive random variables, especially skewed data such as income. Measures of interest, e.g., the Gini index of relative inequality, correspond to areas defined by the curves. In this paper we explore the usefulness of risk-percentile and related curves in epidemiology, especially when the exposure data are skewed. We define these curves and relate risk measures, e.g., the population attributable risk, to areas under them, for data from either a cohort or a case-control study. The curves are estimated with regression spline methods because these do not require a pre-specified risk model. The concepts are illustrated by analyzing data from a cohort study of dietary red meat consumption and all-cause mortality and a case-control study of serum homocysteine level and colorectal cancer. These examples show that the risk-percentile curve is often more useful than a plot of risk as a function of the raw exposure data, because the latter graph is often dominated by the tails when the data are skewed. Furthermore, the risk-percentile curve is more informative than the commonly used method of presenting the average risk in categories defined by several fixed percentiles, such as quartiles or quintiles. Indeed, the risk averages for these categories can be obtained from the risk-percentile curve.
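Two of the ideas above can be illustrated with a minimal sketch: the Gini index as an area defined by the Lorenz curve, and category averages (e.g., for quintiles) as averages of a curve over percentile bands. The sketch is not the paper's estimation method. It uses no regression splines, and the function names (lorenz_curve, gini_index, category_means), the trapezoid-rule integration, and the example curve are all assumptions made for illustration.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def lorenz_curve(values):
    """Return (p, L): cumulative population share p and the cumulative
    share L of the total held by the lowest-valued fraction p."""
    xs = sorted(values)
    if not xs or xs[0] < 0 or sum(xs) <= 0:
        raise ValueError("need non-negative values with a positive total")
    total = float(sum(xs))
    p, L, running = [0.0], [0.0], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        p.append(i / len(xs))
        L.append(running / total)
    return p, L


def gini_index(values):
    """Gini index as twice the area between the diagonal and the Lorenz curve,
    i.e. 1 - 2 * (area under the Lorenz curve)."""
    p, L = lorenz_curve(values)
    return 1.0 - 2.0 * trapezoid_area(p, L)


def category_means(curve, n_categories=5, steps=1000):
    """Average of curve(p) over equal-width percentile bands of [0, 1].

    `curve` is any function of the exposure percentile p (hypothetical here);
    n_categories=5 gives quintile averages, n_categories=4 quartile averages.
    """
    means = []
    for k in range(n_categories):
        lo, hi = k / n_categories, (k + 1) / n_categories
        grid = [lo + (hi - lo) * j / steps for j in range(steps + 1)]
        area = trapezoid_area(grid, [curve(p) for p in grid])
        means.append(area / (hi - lo))
    return means


if __name__ == "__main__":
    # Synthetic, skewed values used only to exercise the functions.
    sample = [1, 1, 2, 2, 3, 5, 8, 40]
    print("Gini index:", round(gini_index(sample), 3))

    # Hypothetical risk-percentile curve: risk rises with exposure percentile.
    def example_curve(p):
        return 0.05 + 0.10 * p ** 2

    print("Quintile averages:",
          [round(m, 4) for m in category_means(example_curve)])
```

The second function makes the closing point of the abstract concrete: once a curve over percentiles is available, quartile or quintile averages follow by integrating it over the corresponding bands, whereas the curve cannot be recovered from the category averages alone.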

Keywords

absolute risk, population attributable risk, logistic regression, Cox proportional hazard regression, case-control study, cohort study, attributable risk reduction curve, expectancy curve, survey data

Published 1 January 2009