Open Access

Research on Efficient Algorithms for Intelligent Computing in Big Data Analytics

Feb 03, 2025


In this paper, we first collect massive data and store it in distributed form on the Hadoop HDFS system. The stored data is then queried quickly by means of a secondary index and a parallel region query method based on Hilbert coding. Experiments measure the efficiency of the Hadoop HDFS system in data storage and querying and investigate its scalability. With efficient storage and querying in place, a Spark DBSCAN algorithm is proposed to mine the data efficiently. Its performance is examined by comparing its clustering performance and running speed with those of the traditional DBSCAN algorithm. The results show that the query performance of Hadoop HDFS is superior to that of the Jena-HBase and SHARD systems. For fast queries, the query time of Hadoop HDFS remains constant as the data size grows; for slow queries, it scales nearly linearly with the data size. The clustering accuracy of the Spark DBSCAN algorithm is slightly better than that of the traditional parallel DBSCAN algorithm, although the overall difference is small. The Spark DBSCAN algorithm also runs significantly faster than the traditional parallel DBSCAN algorithm across the datasets tested.
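To make the idea of Hilbert coding concrete, the following is a minimal sketch of how a two-dimensional grid cell can be mapped to a one-dimensional Hilbert code and how a rectangular region can be turned into contiguous code ranges that an index could scan. It is not the paper's implementation: the function names `hilbert_index` and `region_ranges`, the grid size, and the brute-force range construction are all assumptions made for illustration, and the abstract does not describe how the secondary index or the parallel query is actually built.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, using the standard iterative
    quadrant-by-quadrant conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def region_ranges(n, x0, y0, x1, y1):
    """Hypothetical helper: cover the rectangle [x0, x1] x [y0, y1] with
    inclusive ranges of Hilbert codes. Brute force over all cells, which
    is fine for illustration but not for large regions."""
    codes = sorted(
        hilbert_index(n, x, y)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    ranges = []
    for c in codes:
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1][1] = c  # extend the current contiguous run
        else:
            ranges.append([c, c])  # start a new run
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    # Each range could be handed to a separate worker as an index scan.
    for lo, hi in region_ranges(8, 2, 2, 5, 4):
        print(lo, hi)
```

Because cells that are close on the Hilbert curve are also close in space, a region usually decomposes into a small number of code ranges, and each range can in principle be scanned independently, which is what makes this kind of encoding a natural fit for parallel region queries.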

Language:
English