Communications in Mathematical Sciences

Volume 16 (2018)

Number 5

A semi-supervised heat kernel pagerank MBO algorithm for data classification

Pages: 1241 – 1265



Ekaterina Merkurjev (Department of Mathematics and CMSE, Michigan State University, East Lansing, Mi., U.S.A.)

Andrea L. Bertozzi (Department of Mathematics, University of California at Los Angeles)

Fan Chung (Department of Mathematics, University of California at San Diego)


We present an accurate and efficient graph-based algorithm for semi-supervised classification that is motivated by recent successful threshold dynamics approaches and derived using heat kernel pagerank. Two different techniques are proposed to compute the pagerank, one of which proceeds by simulating random walks of bounded length. The algorithm produces accurate results even when the number of labeled nodes is very small, and avoids computationally expensive linear algebra routines. Moreover, the accuracy of the procedure is comparable with or better than that of state-of-the-art methods and is demonstrated on benchmark data sets. In addition to the main algorithm, a simple novel procedure that uses heat kernel pagerank directly as a classifier is outlined. We also provide detailed analysis, including information about the time complexity, of all proposed methods.


heat kernel pagerank, graphs, graph Laplacian, threshold dynamics, random walk

2010 Mathematics Subject Classification

35K05, 35K08, 35R02, 60J75, 62-07, 62-09, 65S05, 94C15, 97K30

This work was supported by NSF grants DMS-1417674 and DMS-1118971, ONR grant N00014-16-1-2119 and AFSOR FA9550-09-1-0090. The first author was supported by the UC President’s Postdoctoral Fellowship.

Received 6 July 2016

Received revised 21 April 2018

Accepted 21 April 2018

Published 19 December 2018