Statistics and Its Interface

Volume 16 (2023)

Number 3

A semi-supervised density peaks clustering algorithm

Pages: 363 – 377

DOI: https://dx.doi.org/10.4310/22-SII725

Authors

Yuanyuan Wang (School of Mathematics and Statistics, Lanzhou University, Lanzhou, China)

Bingyi Jing (Department of Mathematics, Hong Kong University of Science and Technology)

Abstract

Density peaks clustering (DPC) is a density-based unsupervised clustering algorithm whose advantages include fast clustering of arbitrarily shaped data and easy, non-iterative implementation. In practice, however, a small amount of label information may be available that is insufficient for supervised learning. Semi-supervised clustering is often adopted to incorporate such partial information. In this paper, a novel semi-supervised density peaks clustering algorithm (SS-DPC) is proposed that extends the classical density peaks clustering algorithm to semi-supervised clustering. In contrast to DPC, SS-DPC uses prior information in the form of class labels to guide the learning process and improve clustering, and it can handle data in which only a small number of points are labeled. First, SS-DPC automatically identifies candidate cluster centers from both labeled and unlabeled data. Then, virtual labels are introduced to integrate the partial information with the identified centers in a unified framework. Moreover, the labeled data are used to initialize the semi-supervised clustering process so that the prior information remains correct throughout the clustering procedure. Subsequently, a nearest-point-based method is used to assign labels to the non-center unlabeled data. Finally, a step-by-step merging strategy is introduced to produce more reasonable results. Experiments on eight UCI datasets show that the proposed semi-supervised clustering algorithm yields promising clustering results.
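SS-DPC builds on the classical DPC pipeline of Rodriguez and Laio (2014): compute a local density for each point, compute its distance to the nearest point of higher density, pick points where both are large as centers, and give every other point the label of its nearest higher-density neighbor. The plain-Python sketch below illustrates only that classical pipeline with a cut-off density kernel. The `known` argument is a hypothetical simplification (not from the paper) that pins any given labels before propagation; it does not implement SS-DPC's virtual labels or its step-by-step merging strategy, whose details are not given in this abstract. All names are illustrative.

```python
import math


def density_peaks(points, d_c, n_centers, known=None):
    """Sketch of classical density peaks clustering with a cut-off kernel.

    `known` is a hypothetical extension: known[i] >= 0 pins point i's label,
    -1 marks it as unlabelled. This is NOT the SS-DPC procedure.
    """
    if n_centers < 1:
        raise ValueError("need at least one center")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Indices by decreasing density; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nneigh = [-1] * n  # nearest point of higher density
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point uses its largest distance as delta.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nneigh[i] = j
    # Decision value gamma = rho * delta; force the densest point to be a
    # center so every propagation chain ends at a labelled point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    labels = [-1] * n
    if known is not None:
        labels = [lab if lab >= 0 else -1 for lab in known]
    # Unlabelled centers get fresh ids after any pinned labels.
    next_id = max(labels) + 1
    for c in centers:
        if labels[c] < 0:
            labels[c] = next_id
            next_id += 1
    # Visit points in decreasing density: each unlabelled point inherits the
    # label of its nearest higher-density neighbour, which is already labelled.
    for i in order:
        if labels[i] < 0:
            labels[i] = labels[nneigh[i]]
    return labels, centers


X = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1), (0.05, 0.05),
     (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1)]
labels, centers = density_peaks(X, d_c=0.5, n_centers=2)
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [0, 5]
```

Because points are visited in decreasing density, each point's nearest higher-density neighbor has already been labeled when the point is reached, so one pass suffices; this non-iterative assignment is what the abstract refers to as easy implementation without iteration.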

Keywords

semi-supervised clustering, density peaks clustering, partial information, virtual label, mergence of clusters

The project was supported by the National Natural Science Foundation of China (No. 11971214, 81960309) and Cooperation Project of Chunhui Plan of the Ministry of Education of China-2018 and sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China, and Natural Science Foundation of Anhui Province (2108085QA14).

Received 19 February 2021

Accepted 19 January 2022

Published 14 April 2023