Document Type

Conference Paper

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

Computer Sciences, Information Science

Publication Details

13th International Conference on Machine Learning and Data Mining MLDM 2017, July 15-20, 2017, New York, USA

Abstract

Silhouette is one of the most popular and effective internal measures for the evaluation of clustering validity. Simplified Silhouette is a computationally simplified version of Silhouette. However, to date Simplified Silhouette has not been systematically analysed in the context of a specific clustering algorithm. This paper analyses the application of Simplified Silhouette to the evaluation of k-means clustering validity and compares it with the k-means Cost Function and the original Silhouette from both theoretical and empirical perspectives. The theoretical analysis shows that Simplified Silhouette is mathematically related to both the k-means Cost Function and the original Silhouette, while the empirical analysis shows that its performance is comparable to that of the original Silhouette but that it is much faster to compute. Based on our analysis, we conclude that, for a given dataset, the k-means Cost Function is still the most valid and efficient measure for evaluating k-means clustering results that share the same k value, but that Simplified Silhouette is more suitable than the original Silhouette for selecting the best result from k-means clusterings with different k values.
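
The two centroid-based measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the commonly used definition of Simplified Silhouette, in which a(i) is the distance from point i to its own cluster centroid, b(i) is the distance to the nearest other centroid, and the score is the mean of (b(i) - a(i)) / max(a(i), b(i)). It also assumes Euclidean distance and at least two clusters. The function names kmeans_cost and simplified_silhouette and the toy data are hypothetical.

```python
import numpy as np

def kmeans_cost(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

def simplified_silhouette(X, labels, centroids):
    """Mean Simplified Silhouette, assuming the centroid-based definition and k >= 2."""
    # Distances from every point to every centroid, shape (n_points, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.arange(len(X))
    a = dists[idx, labels]            # distance to own centroid
    other = dists.copy()
    other[idx, labels] = np.inf       # mask own centroid before taking the minimum
    b = other.min(axis=1)             # distance to nearest other centroid
    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)
    # A point that coincides with both centroids gets a score of 0.
    s = np.where(denom > 0, (b - a) / safe, 0.0)
    return float(s.mean())

# Toy data (hypothetical): two well-separated clusters of two points each.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

cost = kmeans_cost(X, labels, centroids)
ss = simplified_silhouette(X, labels, centroids)
print(cost, ss)
```

Under these assumptions each point needs only k centroid distances, whereas the original Silhouette needs distances to every other point, which is consistent with the speed difference the abstract reports.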

DOI

10.1007/978-3-319-62416-7_21
