Wide Gaps and Kleinberg’s Clustering Axioms for –Means

The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density within/between clusters and is in conflict with Kleinberg’s axiomatic system for distance based clustering algorithms that formalizes those expectations. In particular, k-means violates the consistency axiom. We hypothesise that this clash is due to the unexplained expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering them to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, i.e., variational k-separability and residual k-separability, and show that then Kleinberg’s consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of the k-means algorithm that fit approximately Kleinberg’s richness axiom, which does not hold for k-means. In this way, we reconcile k-means with Kleinberg’s axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and to clusterability theory, the practical benefit is the possibility to construct datasets for testing purposes of algorithms optimizing the k-means cost function. This includes a method of construction of clusterable data with a global optimum known in advance.

eISSN:: 2083-8492
Lingua:: Inglese

Frequenza di pubblicazione:: 4 volte all'anno
Argomenti della rivista:: Mathematics, Applied Mathematics

Feed RSS della rivista

Wide Gaps and Kleinberg’s Clustering Axioms for k–Means

Pubblicato online: 26 mar 2024

Pagine: 135 - 147

Ricevuto: 18 ago 2023

Accettato: 28 nov 2023

DOI: https://doi.org/10.61822/amcs-2024-0010

Parole chiaveclustering theory, clustering axioms, clusterability

© 2024 Mieczysław A. Kłopotek, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Parole chiave
clustering theory, clustering axioms, clusterability