Research on Efficient Algorithms for Intelligent Computing in Big Data Analytics

In this paper, we first collect massive data and store it in a distributed manner on the Hadoop HDFS system. A secondary index and a parallel region query method based on Hilbert coding are then designed so that the stored data can be queried quickly. Experiments measure the storage and query efficiency of the Hadoop HDFS system and investigate its scalability. With efficient storage and querying in place, a Spark DBSCAN algorithm is proposed to mine the data efficiently, and its clustering performance and running speed are compared with those of the traditional DBSCAN algorithm. The results show that the query performance of Hadoop HDFS is superior to that of the Jena-HBase and SHARD systems. For fast queries, the query time of Hadoop HDFS stays the same, while for slow queries it scales nearly linearly with the data size. The clustering accuracy of the Spark DBSCAN algorithm is slightly better than that of the traditional parallel DBSCAN algorithm, although the overall difference is small. The Spark DBSCAN algorithm also runs significantly faster than the traditional parallel DBSCAN algorithm on different datasets.
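The abstract does not spell out how the Hilbert-coded index is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a two-dimensional grid of size 2^order per side; the function names `hilbert_index` and `rectangle_to_ranges` are hypothetical. The first function maps a grid cell to its position along a Hilbert curve, and the second turns a rectangular region query into contiguous key ranges that an index keyed on the Hilbert value could scan.

```python
def hilbert_index(order, x, y):
    """Return the Hilbert-curve position of cell (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def rectangle_to_ranges(order, x0, y0, x1, y1):
    """Cover the inclusive rectangle [x0, x1] x [y0, y1] with merged Hilbert key ranges.

    Hypothetical helper: it enumerates every cell, which is fine for a small
    illustration but not for very large query regions.
    """
    keys = sorted(
        hilbert_index(order, x, y)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k  # extend the current contiguous run
        else:
            ranges.append([k, k])  # start a new run
    return [tuple(r) for r in ranges]
```

Because cells that are close on the curve are also close in space, a region usually collapses into a small number of key ranges, and each range could be scanned independently, which is one plausible reading of a "parallel region query".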

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Xiguo Zhou

Ziping Zhao

Wentao Gao

Published Online: Feb 03, 2025

Received: Sep 15, 2024

Accepted: Jan 04, 2025

DOI: https://doi.org/10.2478/amns-2025-0020

Keywords: Hadoop HDFS, Spark DBSCAN, Efficient storage query, Data mining

© 2025 Xiguo Zhou et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
