Statistics and Its Interface
Volume 14 (2021)
A residual-based approach for robust random forest regression
Pages: 389 – 402
We introduce a novel robust approach for random forest regression that is useful when the conditional distribution of the response variable, given predictor values, is contaminated. Residual analysis is used to identify unusual response values in training data, and the contributions of these values are down-weighted accordingly. This approach is motivated by a robust fitting procedure first proposed in the context of locally weighted polynomial regression and scatterplot smoothing. We demonstrate that tuning the parameter in the robustness algorithm using a weighted cross-validation approach is advantageous when contamination is suspected in training data responses. We conduct extensive simulations, comparing our method to existing robust approaches, some of which have not been compared to one another in prior studies. Our approach outperforms existing techniques on noisy training datasets with response contamination. While no approach is uniformly optimal, ours is consistently competitive with the best existing approaches for robust random forest regression.
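The residual-based down-weighting described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the bisquare weight function used in robust locally weighted regression, with the residual scale taken as the median absolute residual. The names `bisquare_weights`, `weighted_mean`, and `alpha` are hypothetical; `alpha` stands in for the tuning parameter mentioned in the abstract, and its default of 6 is borrowed from the scatterplot-smoothing literature rather than from this paper.

```python
from statistics import median


def bisquare_weights(residuals, alpha=6.0):
    """Map residuals to robustness weights in [0, 1].

    Assumption: bisquare weights with scale = median(|residual|).
    `alpha` is a hypothetical stand-in for the tuning parameter.
    """
    scale = median(abs(r) for r in residuals)
    if scale == 0:
        # All residuals are (mostly) zero: nothing to down-weight.
        return [1.0] * len(residuals)
    weights = []
    for r in residuals:
        u = abs(r) / (alpha * scale)
        # Residuals beyond alpha * scale receive zero weight.
        weights.append((1.0 - u * u) ** 2 if u < 1.0 else 0.0)
    return weights


def weighted_mean(values, weights):
    """Weighted average of responses, e.g. those sharing a terminal node."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Illustrative data: the fourth response is contaminated.
responses = [1.0, 1.2, 0.9, 9.0, 1.1]
residuals = [0.1, -0.2, 0.15, 5.0, -0.1]
weights = bisquare_weights(residuals)
robust_prediction = weighted_mean(responses, weights)
```

In this sketch the contaminated response gets weight zero, so the weighted average stays close to the uncontaminated responses. A full procedure would recompute residuals from the re-weighted fit and iterate, and would choose `alpha` by cross-validation as the abstract suggests.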
Keywords
data contamination, robustness, random forest
2010 Mathematics Subject Classification
Primary 62G35. Secondary 62G08.
Received 27 January 2020
Accepted 18 December 2020
Published 8 July 2021