DBSCAN Speedup for Time-Serpentine Datasets

An approach to speeding up the DBSCAN algorithm is suggested. The planar clusters to be revealed are assumed to be tightly packed and correlated, thus constituting a serpentine dataset that develops rightwards or leftwards as time goes on. The dataset is initially divided into a few sub-datasets along the time axis. The best neighbourhood radius is then determined over the first sub-dataset, and the standard DBSCAN algorithm is run over all the sub-datasets with this radius. Finding the best neighbourhood radius requires knowing the ground-truth cluster labels of points within a region. The actual speedup registered in a series of 80 000 computational simulations on datasets ranges from 5.0365 to 724.7633 and tends to increase as the dataset size increases.
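The pipeline described above (split along the time axis, tune the radius on the first sub-dataset, reuse it everywhere) can be sketched in plain Python. This is a minimal illustration, not the author's implementation. Several of its details are assumptions that do not come from the paper: the first coordinate is treated as the time axis; sub-datasets hold (nearly) equal numbers of points; the "best" radius is the candidate with the highest Rand index against the ground-truth labels of the first sub-dataset; the minimum neighbour count `min_pts` is fixed; and clusters are not merged across sub-dataset boundaries. All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    # eps-neighbourhoods; every point counts as its own neighbour.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different).

    Assumption: noise (-1) is treated as one more label, for simplicity.
    """
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def split_by_time(points, k):
    """Split point indices into k (nearly) equal-size chunks ordered by x (time)."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = -(-len(order) // k)  # ceiling division
    return [order[s:s + size] for s in range(0, len(order), size)]


def serpentine_dbscan(points, truth, k, candidates, min_pts):
    """Tune eps on the first time chunk, then run DBSCAN on every chunk with it.

    `truth` is aligned with `points`; only entries of the first chunk are read.
    Returns (eps, [(chunk_indices, chunk_labels), ...]); labels are per chunk.
    """
    chunks = split_by_time(points, k)
    first = [points[i] for i in chunks[0]]
    first_truth = [truth[i] for i in chunks[0]]
    eps = max(candidates,
              key=lambda e: rand_index(dbscan(first, e, min_pts), first_truth))
    result = [(chunk, dbscan([points[i] for i in chunk], eps, min_pts))
              for chunk in chunks]
    return eps, result
```

For example, with two horizontal bands of unit-spaced points at y = 0 and y = 5, x = 0, ..., 19, `k = 2`, candidates `[0.5, 1.0, 6.0]` and `min_pts = 2`, the radius 1.0 is selected on the first chunk: 0.5 marks every point as noise, and 6.0 merges the two bands. The second chunk is then clustered with that radius without any further search. The speedup comes from running the radius search on one small chunk instead of the whole dataset, and from running DBSCAN's quadratic neighbour search on each chunk separately.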

Language: English

Publication frequency: once a year
Journal subjects: Computer Science, Artificial Intelligence, Information Technology, Project Management, Software Development

Vadim Romanuke

Published online: 15 Aug 2024

Pages: 14-23

Received: 12 Apr 2024

Accepted: 10 Jul 2024

DOI: https://doi.org/10.2478/acss-2024-0003

Keywords: Clustering, DBSCAN, large dataset, serpentine cluster, speedup

© 2024 Vadim Romanuke, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.