Statistics and Its Interface
Volume 1 (2008)
Partially Bayesian variable selection in classification trees
Pages: 155 – 167
Tree-structured models for classification fall into two broad categories: those that are completely data-driven and those that allow some direct user interaction during model construction. Classifiers such as CART and QUEST belong to the first category. In these data-driven algorithms, all predictor variables compete equally for a particular classification task. In many cases, however, a subject-area expert is likely to have some qualitative notion of their relative importance. Interactive algorithms such as RTREE address this issue by allowing users to select variables at various stages of tree construction. In this paper, we introduce a more formal, partially Bayesian procedure for dynamically incorporating qualitative expert opinion into the construction of classification trees. An algorithm that incorporates expert opinion in this way has two potential advantages, each of which grows with the quality of the expert. First, by de-emphasizing certain subsets of variables during estimation, it can reduce machine-based computational effort. Second, by giving the expert's preferred variables priority, it reduces the chance that a spurious variable will appear in the model. Hence, the resulting models are potentially more interpretable and more stable than those generated by purely data-driven algorithms.
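The abstract does not give the paper's actual selection rule, so the sketch below is only a hypothetical illustration of the general idea, not the authors' procedure. It assumes expert opinion is elicited as a weight in [0, 1] per variable (the names `prior` and `min_prior` are invented here). Each variable's best Gini impurity decrease (the CART-style criterion) is multiplied by its weight. Variables below `min_prior` are skipped entirely, which mirrors the claimed saving in computation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gain(values, labels):
    """Largest Gini decrease over threshold splits 'value <= t' of one numeric variable.

    Naive O(n^2) scan, kept simple for illustration.
    """
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def choose_split_variable(columns, labels, prior, min_prior=0.05):
    """Pick the variable with the largest prior-weighted gain (hypothetical rule).

    columns: dict mapping variable name -> list of numeric values
    prior:   dict mapping variable name -> assumed expert weight in [0, 1]
    Variables weighted below min_prior are never searched, saving computation.
    Returns (chosen name or None, dict of scores for the variables searched).
    """
    scores = {}
    for name, values in columns.items():
        weight = prior.get(name, 0.0)
        if weight < min_prior:
            continue  # de-emphasized variable: skip its split search
        scores[name] = weight * best_gain(values, labels)
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores
```

In this toy case, a variable that perfectly separates the classes (`batch_id`, assumed spurious) wins under equal weights. An expert preference for `dose` reverses that choice, and `color_code` is never searched:

```python
y = [0, 0, 0, 1, 1, 1]
cols = {"dose": [1, 2, 4, 3, 5, 6],
        "batch_id": [1, 2, 3, 4, 5, 6],
        "color_code": [2, 5, 1, 6, 3, 4]}
best, scores = choose_split_variable(cols, y, {"dose": 0.9, "batch_id": 0.3, "color_code": 0.0})
# best == "dose"; scores has no entry for "color_code"
```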
Keywords: feature selection, expert opinion, supervised learning