Empirical likelihood-based estimation and inference in randomized controlled trials with high-dimensional covariates

In this paper, we propose a data-adaptive empirical likelihood-based approach for treatment effect estimation and inference. It overcomes the difficulty that traditional empirical likelihood-based approaches face in the high-dimensional setting by using penalized regression and machine learning methods to model the covariate-outcome relationship. In particular, we show that our procedure recovers the true variance of Zhang’s treatment effect estimator [30] by means of a data-splitting technique. We prove that the proposed estimator is asymptotically normal and semiparametrically efficient under mild regularity conditions. Simulation studies indicate that our estimator is more efficient than the estimator proposed by Wager et al. [26] when a random forest is used to model the covariate-outcome relationship. Moreover, when multiple machine learning models are employed, our estimator is at least as efficient as any regular estimator based on a single machine learning model. We compare our method with existing ones on the ACTG175 data and the GSE118657 data, and the results confirm the strong performance of our approach.
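
As a rough illustration of the data-splitting idea mentioned in the abstract, the following Python sketch estimates an average treatment effect in a simulated randomized trial. It is not the empirical likelihood procedure proposed in the paper. It uses a standard cross-fitted augmented inverse-probability-weighted (AIPW) estimator with a known randomization probability, and a simple least-squares line stands in for the penalized regression or machine learning model. The function names, the simulated data, and all parameter values are assumptions made for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_fitted_ate(x, a, y, pi=0.5, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect.

    Outcome models are fitted on the training folds and evaluated only
    on the held-out fold, so no unit is scored by a model it helped fit.
    `pi` is the known randomization probability of the trial.
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    psi = [0.0] * n
    for k, held_out in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        treated = [i for i in train if a[i] == 1]
        control = [i for i in train if a[i] == 0]
        # Arm-specific outcome models (a stand-in for any ML learner).
        a1, b1 = fit_line([x[i] for i in treated], [y[i] for i in treated])
        a0, b0 = fit_line([x[i] for i in control], [y[i] for i in control])
        for i in held_out:
            m1, m0 = a1 + b1 * x[i], a0 + b0 * x[i]
            # Model-based contrast plus inverse-probability-weighted residuals.
            psi[i] = (m1 - m0
                      + a[i] * (y[i] - m1) / pi
                      - (1 - a[i]) * (y[i] - m0) / (1 - pi))
    est = statistics.fmean(psi)
    se = (statistics.variance(psi) / n) ** 0.5
    return est, se


def simulate(n=2000, tau=1.5, seed=1):
    """Simulated trial: one covariate, 1:1 randomization, true effect tau."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    a = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1 + 2 * x[i] + tau * a[i] + rng.gauss(0, 1) for i in range(n)]
    return x, a, y


x, a, y = simulate()
est, se = cross_fitted_ate(x, a, y)
print(f"ATE estimate: {est:.3f} (SE {se:.3f})")
```

In this sketch the standard error comes from the sample variance of the per-unit scores. Replacing `fit_line` with a more flexible learner would leave the data-splitting logic unchanged.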

Keywords

average treatment effect, data splitting, machine learning, multiple robustness, semiparametric efficiency bound

Ying Yan’s research is supported by the National Natural Science Foundation of China (NSFC) (Grant No. 11901599).

Received 5 October 2020

Accepted 16 June 2021

Published 14 February 2022