A feasible -means kernel trick under non-Euclidean feature space

This paper poses the question of whether or not the usage of the kernel trick is justified. We investigate it for the special case of its usage in the kernel k-means algorithm. Kernel-k-means is a clustering algorithm, allowing clustering data in a similar way to k-means when an embedding of data points into Euclidean space is not provided and instead a matrix of “distances” (dissimilarities) or similarities is available. The kernel trick allows us to by-pass the need of finding an embedding into Euclidean space. We show that the algorithm returns wrong results if the embedding actually does not exist. This means that the embedding must be found prior to the usage of the algorithm. If it is found, then the kernel trick is pointless. If it is not found, the distance matrix needs to be repaired. But the reparation methods require the construction of an embedding, which first makes the kernel trick pointless, because it is not needed, and second, the kernel-k-means may return different clusterings prior to repairing and after repairing so that the value of the clustering is questioned. In the paper, we identify a distance repairing method that produces the same clustering prior to its application and afterwards and does not need to be performed explicitly, so that the embedding does not need to be constructed explicitly. This renders the kernel trick applicable for kernel-k-means.

eISSN:: 2083-8492
Language:: English

Publication timeframe:: 4 times per year
Journal Subjects:: Mathematics, Applied Mathematics

Journal RSS Feed

A feasible k-means kernel trick under non-Euclidean feature space

Published Online: Dec 31, 2020

Page range: 703 - 715

Received: Feb 17, 2020

Accepted: Jul 17, 2020

DOI: https://doi.org/10.34768/amcs-2020-0052

Keywords
kernel methods, -means, clustering, non-Euclidean feature space, Gower/Legendre theorem

© 2020 Robert Kłopotek et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.