Communications in Information and Systems

Volume 22 (2022)

Number 1

Sequential universal modeling for non-binary sequences with constrained distributions

Pages: 1 – 38

DOI: https://dx.doi.org/10.4310/CIS.2022.v22.n1.a1

Authors

Michael Drmota (Technische Universität (TU) Wien, Austria)

Gil I. Shamir (Google Inc., Mountain View, California, U.S.A.)

Wojciech Szpankowski (Purdue University, West Lafayette, Indiana, U.S.A.)

Abstract

Sequential probability assignment and universal compression go hand in hand. We propose sequential probability assignment for non-binary (and large alphabet) sequences with distributions whose parameters are known to be bounded within a limited interval. Sequential probability assignment algorithms are essential in many applications that require fast and accurate estimation of the maximizing sequence probability. These applications include learning, regression, channel estimation and decoding, prediction, and universal compression. On the other hand, constrained distributions introduce interesting theoretical twists that must be overcome in order to present efficient sequential algorithms. Here, we focus on universal compression for memoryless sources, and present a precise analysis for the maximal minimax and the (asymptotic) average minimax redundancy for constrained distributions. We show that our sequential algorithm based on modified Krichevsky–Trofimov (KT) estimator is asymptotically optimal up to $O(1)$ for both maximal and average redundancies. In addition, we provide precise asymptotics of the minimax redundancy for monotone distributions which is a special case of the constrained distribution. This paper follows and addresses some challenges presented in [17] that suggested ‘results for the binary case lay the foundation to studying larger alphabets”.

Keywords

information theory, minimax redundancy, constrained distribution, analytic combinatorics

This work was supported in part by the Austrian Science Foundation grant FWF Grant SFB F50-02, and in part by NSF Center on Science of Information Grant CCF-0939370 and NSF Grants CCF-2006440, CCF-2007238, and Google Research Award.

Received 7 June 2020

Published 7 February 2022