Communications in Mathematical Sciences
Volume 20 (2022)
Reinforced optimal control
Pages: 1951 – 1978
Least-squares Monte Carlo methods are a popular class of numerical approximation methods for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least-squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109–121, 2020] proposes to reinforce the basis functions in the case of optimal stopping problems with already computed value functions for later times, thereby considerably improving the accuracy at limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, including Markov decision processes, while considerably improving the method's efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.
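As a rough illustration of the idea summarized in the abstract, the following Python sketch prices a Bermudan put by least-squares Monte Carlo and optionally adds the already computed value function of the next exercise date to the regression basis. It is a minimal sketch under stated assumptions, not the authors' algorithm: the geometric Brownian motion model, the polynomial basis, the Longstaff-Schwartz style cash-flow update, and all names and parameter values (lsmc_bermudan_put, make_features, make_value_fn, degree, and so on) are hypothetical choices made for this example only.

```python
import numpy as np


def make_features(x, strike, degree, value_next):
    """Polynomial basis in x/strike, optionally reinforced by value_next(x)."""
    cols = [(x / strike) ** k for k in range(degree + 1)]
    if value_next is not None:
        # Reinforcement (assumed form): the value function of the next
        # exercise date, evaluated at the current state, is one more basis
        # function.
        cols.append(value_next(x))
    return np.column_stack(cols)


def make_value_fn(coef, strike, degree, value_next, payoff):
    """Return v_j(x) = max(payoff(x), estimated continuation value at x)."""
    def value(x):
        cont = make_features(x, strike, degree, value_next) @ coef
        return np.maximum(payoff(x), cont)
    return value


def lsmc_bermudan_put(n_paths=20000, n_steps=10, s0=100.0, strike=100.0,
                      r=0.05, sigma=0.2, T=1.0, degree=2,
                      reinforce=True, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)

    # Simulate geometric Brownian motion at the exercise dates t_1, ..., t_n.
    z = rng.standard_normal((n_paths, n_steps))
    log_incr = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    s = s0 * np.exp(np.cumsum(log_incr, axis=1))

    def payoff(x):
        return np.maximum(strike - x, 0.0)

    value_next = payoff          # value function at the terminal date
    cash = payoff(s[:, -1])      # realized cash flow along each path

    # Backward induction over the earlier exercise dates.
    for j in range(n_steps - 2, -1, -1):
        x = s[:, j]
        cash = disc * cash       # discount cash flows back to date j
        extra = value_next if reinforce else None
        A = make_features(x, strike, degree, extra)
        coef, *_ = np.linalg.lstsq(A, cash, rcond=None)
        cont = A @ coef          # regression estimate of continuation value
        exercise = (payoff(x) > 0.0) & (payoff(x) > cont)
        cash = np.where(exercise, payoff(x), cash)
        value_next = make_value_fn(coef, strike, degree, extra, payoff)

    # Value at time 0: exercise immediately or continue.
    return float(max(payoff(s0), disc * cash.mean()))


if __name__ == "__main__":
    print("plain basis:     ", lsmc_bermudan_put(reinforce=False))
    print("reinforced basis:", lsmc_bermudan_put(reinforce=True))
```

In this sketch each reinforced value function calls the one for the next date, so evaluating it at an early date costs a number of nested evaluations proportional to the remaining exercise dates. That is acceptable for a handful of dates but illustrates why the efficiency of the reinforced approach is a concern in its own right.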
Monte Carlo, optimal control, regression, reinforcement learning
2010 Mathematics Subject Classification
C.B., P.H., P.P., J.S., and V.S. were supported by the MATH+ project AA4-2 Optimal control in energy markets using rough analysis and deep networks. D.B. gratefully acknowledges the support of the German Science Foundation research grant (DFG Sachbeihilfe) 497300407. Results of Section 6 were obtained under the support of the RSF grant 19-71-30020 (HSE University).
Received 31 May 2021
Received revised 29 January 2022
Accepted 5 February 2022
Published 21 October 2022