Communications in Information and Systems

Volume 7 (2007)

Number 4

A Gaussian Mixture Model to Detect Clusters Embedded in Feature Subspace

Pages: 337 – 352

DOI: https://dx.doi.org/10.4310/CIS.2007.v7.n4.a2

Authors

Ming Dong

Jing Hua

Yuanhong Li

Abstract

The goal of clustering, a form of unsupervised learning, is to determine the intrinsic structure of unlabeled data. Feature selection for clustering improves grouping performance by removing irrelevant features. Typical feature selection algorithms select a common feature subset for all clusters. Consequently, clusters embedded in different feature subspaces cannot be identified. In this paper, we introduce a probabilistic model based on a Gaussian mixture to solve this problem. In particular, the relevance of each feature to an individual cluster is treated as a probability, which is represented by a localized feature saliency and estimated with the Expectation Maximization (EM) algorithm during the clustering process. In addition, the number of clusters is determined simultaneously by integrating a Minimum Message Length (MML) criterion. Experiments carried out on both synthetic and real-world datasets illustrate the performance of the proposed approach in finding clusters embedded in feature subspaces.
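The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes a form in the style of earlier feature-saliency mixtures. Each cluster j carries a saliency rho[j][l] for every feature l. A relevant feature follows a cluster-specific Gaussian, and an irrelevant one follows a single Gaussian shared by all clusters. Under that assumption, EM yields both the cluster posteriors and the per-cluster probability that each feature is relevant. This is an assumed formulation, not necessarily the authors' model. The names `fit_local_saliency`, `var_floor` and `eps` are hypothetical. The number of clusters `k` is fixed, and the MML criterion for choosing it is left out.

```python
import math


def log_normal(x, mean, var):
    """Log-density of the univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    hi = max(values)
    return hi + math.log(sum(math.exp(v - hi) for v in values))


def fit_local_saliency(X, k, iters=50, var_floor=1e-3, eps=1e-6):
    """EM for a diagonal Gaussian mixture with per-cluster feature saliency.

    Assumed (hypothetical) model, k fixed, no MML term:
      p(x) = sum_j alpha[j] * prod_l ( rho[j][l] * N(x_l; mu[j][l], var[j][l])
                                       + (1 - rho[j][l]) * N(x_l; m[l], s[l]) )
    """
    n, d = len(X), len(X[0])
    # Shared "irrelevant feature" Gaussians start at the global per-feature moments.
    m = [sum(x[l] for x in X) / n for l in range(d)]
    s = [max(sum((x[l] - m[l]) ** 2 for x in X) / n, var_floor) for l in range(d)]
    # Deterministic farthest-point initialisation of the cluster means.
    mu = [list(X[0])]
    while len(mu) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in mu))
        mu.append(list(far))
    var = [list(s) for _ in range(k)]
    alpha = [1.0 / k] * k
    rho = [[0.5] * d for _ in range(k)]

    for _ in range(iters):
        # E-step: w[i][j] = P(cluster j | x_i); r[i][j][l] = P(feature l relevant | x_i, j).
        w, r = [], []
        for x in X:
            logw, rel = [], []
            for j in range(k):
                lw, rj = math.log(alpha[j]), []
                for l in range(d):
                    la = math.log(rho[j][l]) + log_normal(x[l], mu[j][l], var[j][l])
                    lb = math.log(1.0 - rho[j][l]) + log_normal(x[l], m[l], s[l])
                    lc = log_sum_exp([la, lb])
                    lw += lc
                    rj.append(math.exp(la - lc))
                logw.append(lw)
                rel.append(rj)
            z = log_sum_exp(logw)
            w.append([math.exp(v - z) for v in logw])
            r.append(rel)

        # M-step: mixing weights, cluster-specific Gaussians and localized saliencies.
        for j in range(k):
            nj = sum(w[i][j] for i in range(n))
            alpha[j] = max(nj / n, eps)
            for l in range(d):
                u = [w[i][j] * r[i][j][l] for i in range(n)]
                su = sum(u)
                if su > eps:
                    mu[j][l] = sum(ui * x[l] for ui, x in zip(u, X)) / su
                    var[j][l] = max(
                        sum(ui * (x[l] - mu[j][l]) ** 2 for ui, x in zip(u, X)) / su, var_floor
                    )
                # Saliency = expected fraction of cluster j's mass for which feature l is relevant.
                rho[j][l] = min(max(su / max(nj, eps), eps), 1.0 - eps)

        # M-step: shared irrelevant-feature Gaussians, weighted by w * (1 - r).
        for l in range(d):
            v = [sum(w[i][j] * (1.0 - r[i][j][l]) for j in range(k)) for i in range(n)]
            sv = sum(v)
            if sv > eps:
                m[l] = sum(vi * x[l] for vi, x in zip(v, X)) / sv
                s[l] = max(sum(vi * (x[l] - m[l]) ** 2 for vi, x in zip(v, X)) / sv, var_floor)

    return {"alpha": alpha, "mu": mu, "var": var, "rho": rho, "m": m, "s": s}
```

Because `rho` is indexed by cluster as well as feature, two clusters can weight the same feature differently, which a single global feature subset cannot express. Moving toward the setting described in the abstract would require an MML-type penalty to prune surplus clusters. That step is deliberately omitted here.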

Published 1 January 2007