Optimal transport for generative models

Optimal transport plays a fundamental role in deep learning. Natural datasets have intrinsic patterns, which can be summarized as the manifold distribution principle: a natural class of data can be treated as a probability distribution on a low-dimensional manifold embedded in a high-dimensional ambient space. A deep learning system mainly accomplishes two tasks: manifold learning and probability distribution transformation.

Given a manifold $X$, the probability measures on $X$ form an infinite-dimensional manifold $\mathcal{P}(X)$. Optimal transport endows $\mathcal{P}(X)$ with a Riemannian metric, the so-called Wasserstein metric, and defines Otto’s calculus, so that variational optimization can be carried out in $\mathcal{P}(X)$. A deep learning system learns the data distribution by optimizing a functional on $\mathcal{P}(X)$; optimal transport therefore lays the theoretical foundation for deep learning.
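
For orientation, the distance induced by the Wasserstein metric is usually written as follows. The abstract does not state the definition; this is the standard quadratic-cost form, with notation assumed here: $d$ is the geodesic distance on $X$ and $\Pi(\mu,\nu)$ is the set of probability measures on $X \times X$ with marginals $\mu$ and $\nu$.

$$
W_2^2(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times X} d(x,y)^2 \, \mathrm{d}\pi(x,y), \qquad \mu,\nu \in \mathcal{P}(X).
$$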

This work introduces the theory of optimal transport and the profound relation between Brenier’s theorem and Alexandrov’s theorem in differential geometry via the Monge–Ampère equation. We give a variational proof of Alexandrov’s theorem and convert the proof into a computational algorithm for the optimal transport map. The algorithm is based on computational geometry and can be generalized to the general manifold setting.
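
To make the variational idea concrete, here is a minimal Python sketch of the semi-discrete case. It rests on assumptions not stated in the abstract: the cost is quadratic, the source measure is available only through samples, the target is a finite point set `targets` with weights `nu`, and cell masses are estimated by Monte Carlo rather than computed exactly from a power diagram, as a computational-geometry method would do. The names `fit_heights`, `cell_masses` and `sample_source` are hypothetical.

```python
import numpy as np

def cell_masses(targets, h, x):
    """Estimate the source mass of each cell W_i = {x : <x,y_i> + h_i is maximal}."""
    idx = np.argmax(x @ targets.T + h, axis=1)   # cell index of every sample
    return np.bincount(idx, minlength=len(targets)) / len(x)

def fit_heights(targets, nu, sample_source, n_iter=500, n_samples=4000,
                lr=0.1, seed=0):
    """Gradient descent on a convex energy whose gradient is w(h) - nu.

    targets: (k, d) array of target points y_i (assumed setup)
    nu: (k,) target weights summing to 1
    sample_source: callable (rng, n) -> (n, d) samples of the source measure
    """
    rng = np.random.default_rng(seed)
    h = np.zeros(len(targets))
    for _ in range(n_iter):
        w = cell_masses(targets, h, sample_source(rng, n_samples))
        h -= lr * (w - nu)   # shrink cells that hold too much mass, grow the rest
    return h
```

Under these assumptions the approximate transport map sends a point `x` to `targets[np.argmax(x @ targets.T + h)]`, that is, to the gradient of the piecewise-linear convex function $\max_i \{\langle x, y_i\rangle + h_i\}$. The step size and iteration count are illustrative and would need tuning for other data.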

Optimal transport theory and algorithms have been extensively applied to Generative Adversarial Networks (GANs). In a GAN model, the generator computes the optimal transport (OT) map, while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution. Optimal transport theory shows that the competition between the generator and the discriminator is completely unnecessary and should be replaced by collaboration. Furthermore, the regularity theory of optimal transport maps explains the intrinsic reason for mode collapse.
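
As a small illustration of the quantity the discriminator is said to compute, the sketch below evaluates the Wasserstein-1 distance between two one-dimensional empirical distributions with equally many samples, where it reduces to matching sorted samples. This is a toy closed-form case chosen for clarity, not how a GAN discriminator estimates the distance; the function name `wasserstein1_1d` is hypothetical.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two 1D samples of equal size.

    In one dimension the optimal coupling pairs the i-th smallest point of x
    with the i-th smallest point of y, so the distance is the mean gap.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))
```

Shifting every sample by a constant `c` changes the result by exactly `|c|`, which matches the intuition of mass being moved a distance `|c|`.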

A novel generative model is introduced, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE‑OT model improves theoretical rigor and transparency, as well as computational stability and efficiency; in particular, it eliminates mode collapse.

Published 16 August 2022