Speedup of the k-Means Algorithm for Partitioning Large Datasets of Flat Points by a Preliminary Partition and Selecting Initial Centroids

[1] W. Kaplan, “Maxima and minima with applications: Practical optimization and duality,” in Wiley Series in Discrete Mathematics and Optimization, vol. 51. John Wiley & Sons, 2011, p. 61. Search in Google Scholar

[2] K. A. Randolph and L. L. Myers, Basic Statistics in Multivariate Analysis, Pocket Guide to Social Work Research Methods. Oxford University Press, Oxford, England, 2013, p. 116. Search in Google Scholar

[3] S. Li, “A 1.488 approximation algorithm for the uncapacitated facility location problem,” in Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 6756. Springer, 2011, pp. 77–88. https://doi.org/10.1007/978-3-642-22012-8_5 Search in Google Scholar

[4] N. Megiddo and A. Tamir, “On the complexity of locating linear facilities in the plane,” Operations Research Letters, vol. 1, no. 5, pp. 194–197, 1982. https://doi.org/10.1016/0167-6377(82)90039-6 Search in Google Scholar

[5] T. F. Gonzalez, “Clustering to minimize the maximum intercluster distance,” Theoretical Computer Science, vol. 38, pp. 293–306, 1985. https://doi.org/10.1016/0304-3975(85)90224-5 Search in Google Scholar

[6] A. Ahmadi-Javid, P. Seyedi, and S. Syam, “A survey of healthcare facility location,” Computers & Operations Research, vol. 79, pp. 223–263, Mar. 2017. https://doi.org/10.1016/j.cor.2016.05.018 Search in Google Scholar

[7] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-Means clustering algorithm,” Journal of the Royal Statistical Society, Series C, vol. 28, no. 1, pp. 100–108, 1979. https://doi.org/10.2307/2346830 Search in Google Scholar

[8] M. E. Celebi, H. A. Kingravi, and P. A. Vela, “A comparative study of efficient initialization methods for the k-means clustering algorithm,” Expert Systems with Applications, vol. 40, no. 1, pp. 200–210, Jan. 2013. https://doi.org/10.1016/j.eswa.2012.07.021 Search in Google Scholar

[9] A. Vattani, “k-means requires exponentially many iterations even in the plane,” Discrete and Computational Geometry, vol. 45, no. 4, pp. 596–616, Mar. 2011. https://doi.org/10.1007/s00454-011-9340-1 Search in Google Scholar

[10] M. Mahajan, P. Nimbhorkar, and K. Varadarajan, “The planar k-means problem is NP-hard,” in Lecture Notes in Computer Science, vol. 5431. Springer, 2009, pp. 274–285. https://doi.org/10.1007/978-3-642-00202-1_24 Search in Google Scholar

[11] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, Jul. 2002. https://doi.org/10.1109/TPAMI.2002.1017616 Search in Google Scholar

[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, “Section 16.1. Gaussian mixture models and k-means clustering,” in Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge University Press, New York, NY, 2007. Search in Google Scholar

[13] V. V. Romanuke, “Optimization of a dataset for a machine learning task by clustering and selecting closest-to-the-centroid objects,” Herald of Khmelnytskyi National University. Technical Sciences, vol. 1, no. 6, pp. 263–265, 2018. Search in Google Scholar

[14] L. Bottou and Y. Bengio, “Convergence properties of the K-means algorithms,” in Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS’94), Jan. 1994, pp. 585–592. Search in Google Scholar

[15] V. V. Romanuke, “Evolution of expert competences in estimating a finite set of objects by a given comparison scale via pairwise comparison matrices within the space of positive inverse-symmetric matrices,” Herald of Khmelnytskyi National University. Technical Sciences, no. 2, pp. 25–29, 2016. Search in Google Scholar

[16] C. Darken and J. Moody, “Note on learning rate schedules for stochastic optimization,” in R. P. Lippman, R. Moody, and D. S. Touretzky (eds.), Advances in Neural Information Processing Systems 3 (NIPS 1990), Morgan Kaufmann, Palo Alto, Denver, CO, 1991, pp. 832–838. Search in Google Scholar

[17] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, vol. 1, 1967, pp. 281–296. Search in Google Scholar

[18] R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy, “The effectiveness of Lloyd-type methods for the k-means problem,” in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), Berkeley, CA, USA, Oct. 2006, pp. 165–174. https://doi.org/10.1109/FOCS.2006.75 Search in Google Scholar

[19] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “A local search approximation algorithm for k-means clustering,” Computational Geometry: Theory and Applications, vol. 28, no. 2–3, pp. 89–112, Jun. 2004. https://doi.org/10.1016/j.comgeo.2004.03.003 Search in Google Scholar

[20] A. Chakrabarty and D. Swagatam, “On strong consistency of kernel k-means: A Rademacher complexity approach,” Statistics & Probability Letters, vol. 182, 2022, Art. no. 109291. https://doi.org/10.1016/j.spl.2021.109291 Search in Google Scholar

[21] A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Heming, “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,” Information Sciences, vol. 622, pp. 178–210, Apr. 2023. https://doi.org/10.1016/j.ins.2022.11.139 Search in Google Scholar

[22] S. J. Phillips, “Acceleration of K-Means and related clustering algorithms,” in D. M. Mount and C. Stein (eds.), Lecture Notes in Computer Science, vol. 2409. Springer, 2002, pp. 166–177. https://doi.org/10.1007/3-540-45643-0_13 Search in Google Scholar

[23] G. Hamerly, “Making k-means even faster,” in Proceedings of the 2010 SIAM International Conference on Data Mining, 2010, pp. 130–140. https://doi.org/10.1137/1.9781611972801.12 Search in Google Scholar

[24] D. Arthur and S. Vassilvitskii, “How slow is the k-means method?,” in Proceedings of the Twenty-Second Annual Symposium on Computational Geometry (SCG’06), Jun. 2006, pp. 144–153. https://doi.org/10.1145/1137856.1137880 Search in Google Scholar

[25] P. Fränti and S. Sieranoja, “How much can k-means be improved by using better initialization and repeats?,” Pattern Recognition, vol. 93, pp. 95–112, Sep. 2019. https://doi.org/10.1016/j.patcog.2019.04.014 Search in Google Scholar

[26] V. V. Romanuke, “Fast-and-smoother uplink power control algorithm based on distance ratios for wireless data transfer systems,” Studies in Informatics and Control, vol. 28, no. 2, pp. 147–156, 2019. https://doi.org/10.24846/v28i2y201903 Search in Google Scholar

[27] B. Żak and S. Hożyń, “Local image features matching for real-time seabed tracking applications,” Journal of Marine Engineering & Technology, vol. 16, no. 4, pp. 273–282, Oct. 2017. https://doi.org/10.1080/20464177.2017.1386266 Search in Google Scholar

[28] S. Mukherjee, M. Cajić, D. Karličić, and S. Adhikari, “Enhancement of band-gap characteristics in hexagonal and re-entrant lattices via curved beams,” Composite Structures, vol. 306, Feb. 2023, Art. no. 116591. https://doi.org/10.1016/j.compstruct.2022.116591 Search in Google Scholar

[29] J. Dong and H. Fan, “Crushing behaviors of buckling-oriented hexagonal lattice structures,” Mechanics of Materials, vol. 165, Feb. 2022, Art. no. 104160. https://doi.org/10.1016/j.mechmat.2021.104160 Search in Google Scholar

[30] M. I. Español, D. Golovaty, and J. P. Wilber, “A discrete-to-continuum model of weakly interacting incommensurate two-dimensional lattices: The hexagonal case,” Journal of the Mechanics and Physics of Solids, vol. 173, Apr. 2023, Art. no. 105229. https://doi.org/10.1016/j.jmps.2023.105229 Search in Google Scholar

[31] Y. Nakata, M. Yoshida, K. Osawa, and N. Miyanaga, “Fabricating a regular hexagonal lattice structure by interference pattern of six femtosecond laser beams,” Applied Surface Science, vol. 417, pp. 69–72, Sep. 2017. https://doi.org/10.1016/j.apsusc.2017.03.236 Search in Google Scholar

[32] J. Cartensen, “About hexagons,” Mathematical Spectrum, vol. 33, no. 2, pp. 37–40, 2000–2001. Search in Google Scholar

[33] M. J. Wenninger, Polyhedron Models. Cambridge University Press, New York, NY, 1974, p. 9. Search in Google Scholar

[34] R. G. Gallager, Stochastic Processes Theory for Applications. Cambridge University Press, New York, NY, 2013. Search in Google Scholar

[35] A. Papoulis, Probability, Random Variables and Stochastic Processes. McGraw Hill, New York, NY, 1991. Search in Google Scholar

[36] A. El Korchi and Y. Ghanou, “2D geometric shapes dataset – for machine learning and pattern recognition,” Data in Brief, vol. 32, Oct. 2020, Art. no. 106090. https://doi.org/10.1016/j.dib.2020.106090 Search in Google Scholar

[37] O. N. Almasi and M. Rouhani, “A geometric-based data reduction approach for large low dimensional datasets: Delaunay triangulation in SVM algorithms,” Machine Learning with Applications, vol. 4, Jun. 2021, Art. no. 100025. https://doi.org/10.1016/j.mlwa.2021.100025 Search in Google Scholar

[38] N. Joorabloo, M. Jalili, and Y. Ren, “Improved recommender systems by denoising ratings in highly sparse datasets through individual rating confidence,” Information Sciences, vol. 601, pp. 242–254, Jul. 2022. https://doi.org/10.1016/j.ins.2022.03.068 Search in Google Scholar

[39] R. B. Arantes, G. Vogiatzis, and D. R. Faria, “Learning an augmentation strategy for sparse datasets,” Image and Vision Computing, vol. 117, Jan. 2022, Art. no. 104338. https://doi.org/10.1016/j.imavis.2021.104338 Search in Google Scholar

Idioma:: Inglés

Calendario de la edición:: 1 veces al año
Temas de la revista:: Informática, Inteligencia artificial, Tecnologías de la información, Gestión de proyectos, Desarrollo de software

RSS Feed de revista

Speedup of the k-Means Algorithm for Partitioning Large Datasets of Flat Points by a Preliminary Partition and Selecting Initial Centroids

Vadim Romanuke

Publicado en línea: 17 ago 2023

Páginas: 1 - 12

DOI: https://doi.org/10.2478/acss-2023-0001

Palabras claveFlat objects, initial centroids, k-means, large dataset, speedup

© 2023 Vadim Romanuke, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Palabras clave
Flat objects, initial centroids, k-means, large dataset, speedup