## Abstract

Over the past decades, Gaussian Mixture Models (GMMs) have attracted considerable interest in data mining and pattern recognition. A GMM-based clustering algorithm models a dataset as a mixture of Gaussian components and estimates the model parameters with the Expectation-Maximization (EM) algorithm. A Locally Consistent GMM (LCGMM) was recently proposed to improve clustering performance by exploiting the local manifold structure of the data through a $p$-nearest-neighbor graph. Beyond the underlying manifold structure, many other forms of prior knowledge can guide the clustering process and improve performance. In this paper, we introduce a Semi-Supervised LCGMM (Semi-LCGMM), in which the prior knowledge is given as class labels for part of the data. Semi-LCGMM incorporates this label information into the likelihood function maximized by the original LCGMM, and the model parameters are estimated with the EM algorithm. Notably, in our algorithm each class may be modeled by multiple Gaussian components, whereas in the unsupervised setting each class is modeled by a single Gaussian component. The algorithm shows promising results in several applications, including clustering of breast cancer data, heart disease data, handwritten digit images, and human face images, as well as image segmentation.
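To make the idea of injecting partial labels into EM more concrete, here is a minimal NumPy sketch of a plain semi-supervised GMM. Each class owns one or more Gaussian components, and a labeled point may only be assigned to components of its own class. This is an illustration under stated assumptions, not the authors' Semi-LCGMM. It omits the LCGMM local-consistency term built from the $p$-nearest-neighbor graph. The function names, the `comp_class` component-to-class mapping, the farthest-point initialization, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)                 # cov = L @ L.T
    z = np.linalg.solve(L, (X - mean).T)        # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)               # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def semi_supervised_gmm(X, y, comp_class, n_iter=100, reg=1e-6, seed=0):
    """EM for a GMM where some points carry class labels (y == -1 means unlabeled).

    comp_class[k] is the (hypothetical) class owning component k, so a single
    class may be modeled by several Gaussian components.
    NOTE: no LCGMM graph-regularization term is included in this sketch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = len(comp_class)
    labeled = y >= 0
    assert set(y[labeled]) <= set(comp_class), "every labeled class needs a component"

    # Allowed-component mask: labeled points may only use their own class's components.
    mask = np.ones((n, K), dtype=bool)
    mask[labeled] = comp_class[None, :] == y[labeled][:, None]

    # Farthest-point initialization among the labeled points of each class.
    means = np.empty((K, d))
    for c in np.unique(comp_class):
        ks = np.flatnonzero(comp_class == c)
        pool = X[y == c] if np.any(y == c) else X
        chosen = [pool[rng.integers(len(pool))]]
        for _ in ks[1:]:
            dist = np.min([np.sum((pool - m) ** 2, axis=1) for m in chosen], axis=0)
            chosen.append(pool[np.argmax(dist)])
        means[ks] = np.array(chosen)
    covs = np.array([np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities, with disallowed components set to zero.
        log_r = np.column_stack([np.log(weights[k]) + gaussian_logpdf(X, means[k], covs[k])
                                 for k in range(K)])
        log_r = np.where(mask, log_r, -np.inf)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: standard weighted updates of mixing weights, means, covariances.
        Nk = r.sum(axis=0) + 1e-12
        weights = Nk / Nk.sum()
        means = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return weights, means, covs, r


# Toy usage: class 0 is bimodal (two blobs), class 1 is a single blob.
rng = np.random.default_rng(1)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
true = np.repeat([0, 0, 1], 100)
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])
y = np.full(len(X), -1)
for start in (0, 100, 200):                     # reveal 5 labels per blob
    y[start:start + 5] = true[start:start + 5]
comp_class = np.array([0, 0, 1])                # two components model class 0
weights, means, covs, r = semi_supervised_gmm(X, y, comp_class)
pred = comp_class[r.argmax(axis=1)]             # map components back to classes
print("accuracy:", (pred == true).mean())
```

Setting a labeled point's responsibilities to zero outside its class is one simple way to feed labels into EM. With no labeled points the mask is all ones, and the loop reduces to ordinary unsupervised GMM EM.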

© 2015 Optical Society of America
