Statistics and Its Interface

Volume 11 (2018)

Number 3

Interpretable selection and visualization of features and interactions using Bayesian forests

Pages: 503 – 513

DOI: https://dx.doi.org/10.4310/SII.2018.v11.n3.a12

Authors

Viktoriya Krakovna (DeepMind, London, England, United Kingdom)

Chenguang Dai (Department of Statistics, Harvard University, Cambridge, Massachusetts, U.S.A.)

Jun S. Liu (Department of Statistics, Harvard University, Cambridge, Massachusetts, U.S.A.)

Abstract

In the analysis of scientific data, it is often of interest to learn which features and feature interactions are relevant to the prediction task. We present the Selective Bayesian Forest Classifier, which strikes a balance between predictive power and interpretability by simultaneously performing classification, feature selection, feature interaction detection, and visualization. It builds parsimonious yet flexible models using tree-structured Bayesian networks, and samples an ensemble of such models using Markov chain Monte Carlo. Feature selection is built in by dividing the trees into two groups according to their relevance to the outcome of interest. On both simulated and real data sets, our method was competitive with top classification algorithms in classification accuracy, and often outperformed them in feature selection and interaction visualization.
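The idea of grouping model components by their relevance to the outcome can be illustrated with a much simpler stand-in. The sketch below is not the authors' algorithm: it scores single categorical features rather than trees, uses no Markov chain Monte Carlo, and assumes a symmetric Dirichlet prior with a hypothetical parameter `alpha`. The function names `log_marginal` and `relevance_log_bayes_factor` are introduced here for illustration only. It compares the marginal likelihood of a feature when its distribution depends on the class label against the marginal likelihood when it does not.

```python
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of a count vector
    under a symmetric Dirichlet(alpha) prior (alpha is an assumption)."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out


def relevance_log_bayes_factor(feature, labels, alpha=1.0):
    """Log Bayes factor: class-dependent model versus class-independent model.

    Positive values favor placing the feature in the 'relevant' group;
    negative values favor the 'irrelevant' group.
    """
    values = sorted(set(feature))
    classes = sorted(set(labels))

    # Class-independent model: one distribution shared by all classes.
    pooled = [sum(1 for x in feature if x == v) for v in values]
    log_independent = log_marginal(pooled, alpha)

    # Class-dependent model: a separate distribution for each class.
    log_dependent = 0.0
    for c in classes:
        per_class = [
            sum(1 for x, y in zip(feature, labels) if x == v and y == c)
            for v in values
        ]
        log_dependent += log_marginal(per_class, alpha)

    return log_dependent - log_independent


if __name__ == "__main__":
    labels = [0] * 20 + [1] * 20
    informative = list(labels)   # feature that copies the label
    noise = [0, 1] * 20          # feature balanced within each class
    print(relevance_log_bayes_factor(informative, labels))
    print(relevance_log_bayes_factor(noise, labels))
```

In this toy setting the feature that tracks the label receives a positive log Bayes factor and the balanced feature a negative one, which is the kind of evidence a two-group split could be based on. Scoring whole tree structures and sampling over them is beyond what this sketch attempts.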

Keywords

feature selection, interaction visualization, Bayesian forest

This research was partially supported by NSF grant DMS-1613035 and NIH grant R01GM122080.

Received 9 November 2017

Published 17 September 2018