Open Access

Research on Efficient Algorithms for Intelligent Computing in Big Data Analytics

Feb 03, 2025


Figure 1. Framework of the Hadoop-based platform for processing massive network data

Figure 2. Framework for acquiring massive network data

Figure 3. Change in query time as the data volume is varied

Figure 4. Change in query time as the number of storage nodes is varied

Figure 5. Spark implementation process

Figure 6. Clustering results of the Spark DBSCAN algorithm
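The clustering experiments compare a "Naive" DBSCAN with a Spark-based DBSCAN. For readers unfamiliar with the algorithm, the following is a minimal sketch of textbook sequential DBSCAN in plain Python, assuming Euclidean distance and the usual eps/min_pts parameters (min_pts counts the point itself). It is an illustration only, not the authors' implementation; the function names dbscan and region_query are hypothetical, and the brute-force neighbourhood search is O(n²), which is exactly the cost a distributed version tries to spread across workers.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook sequential DBSCAN (hypothetical helper); one label per point, NOISE = -1."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels
```

For example, dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3) labels the first three points as cluster 0, the next three as cluster 1 and the isolated point as noise (-1).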

Comparison of query execution time

Data set  System       Run   Q1    Q2    Q3    Q4    Q5    Q6    Q7    Q8    Q9    Q10
LUBM-5    Hadoop HDFS  Cold  235   9445  241   369   425   1491  299   365   14K   277
LUBM-5    Hadoop HDFS  Hot   114   9188  159   152   194   513   109   142   14K   152
LUBM-5    Jena-Hbase   Cold  20K   11K   60K   4256  62K   2378  NA    NA    NA    18K
LUBM-5    Jena-Hbase   Hot   16K   10K   45K   4024  9345  864   NA    322K  NA    18K
LUBM-5    SHARD        Cold  156K  302K  184K  212K  287K  672K  65K   203K  856K  200K
LUBM-5    SHARD        Hot   101K  285K  112K  124K  169K  611K  42K   172K  432K  142K
LUBM-50   Hadoop HDFS  Cold  244   9051  303   314   415   2003  511   425   14K   363
LUBM-50   Hadoop HDFS  Hot   112   8879  115   164   185   1734  203   302   14K   122
LUBM-50   Jena-Hbase   -     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
LUBM-50   SHARD        Cold  188K  415K  224K  306K  179K  406K  206K  108K  425K  174K
LUBM-50   SHARD        Hot   116K  315K  189K  177K  133K  342K  166K  77K   348K  130K
LUBM-500  Hadoop HDFS  Cold  218   8974  266   273   231   18K   237   321   15K   227
LUBM-500  Hadoop HDFS  Hot   112   8546  105   130   121   17K   133   201   15K   102
LUBM-500  Jena-Hbase   -     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
LUBM-500  SHARD        Cold  306K  986K  426K  387K  462K  884K  506K  472K  926K  412K
LUBM-500  SHARD        Hot   245K  758K  285K  204K  306K  695K  330K  394K  734K  283K
Unit: ms (K = thousands of ms); NA = no result reported.
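The table mixes plain milliseconds with values abbreviated as K (thousands of ms). A small helper like the one below, a sketch with hypothetical function names, converts the cells and computes per-query ratios, here how many times longer SHARD takes than Hadoop HDFS on LUBM-500 (cold runs). Because the K values are already rounded in the table, the ratios are approximate.

```python
def parse_ms(cell):
    """Convert a table cell such as '235', '14K' or 'NA' to milliseconds (None if missing)."""
    if cell == "NA":
        return None
    if cell.endswith("K"):
        return float(cell[:-1]) * 1000  # 'K' = thousands of ms (rounded in the source)
    return float(cell)


def speedups(baseline_row, system_row):
    """Per-query ratio baseline / system; None where either value is missing."""
    out = []
    for b, s in zip(baseline_row.split(), system_row.split()):
        bv, sv = parse_ms(b), parse_ms(s)
        out.append(None if bv is None or sv is None else bv / sv)
    return out


# LUBM-500, cold runs, copied from the table (Q1..Q10)
hdfs_cold = "218 8974 266 273 231 18K 237 321 15K 227"
shard_cold = "306K 986K 426K 387K 462K 884K 506K 472K 926K 412K"

for q, ratio in enumerate(speedups(shard_cold, hdfs_cold), start=1):
    print(f"Q{q}: SHARD time / Hadoop HDFS time = {ratio:.0f}x")
```

On these numbers SHARD takes roughly 1,400 times longer than Hadoop HDFS on Q1 and about 49 times longer on Q6.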

Hadoop HDFS index storage usage

Statistic    LUBM-5           LUBM-50            LUBM-500
Total        195.4 MB         2.0 GB             17.9 GB
Avg. ± Std.  10.25 ± 1.68 MB  118.00 ± 19.48 MB  1.02 GB ± 203.45 MB
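The totals can be set against the growth in data set size. Assuming, as is conventional for the LUBM benchmark, that the suffix gives the number of generated universities (so each step is a tenfold increase), and assuming 1 GB = 1024 MB, the sketch below computes the growth factor of the total index size between consecutive data sets. The helper name growth_ratios is hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def growth_ratios(sizes_mb):
    """Ratio of each total index size to the previous one, in table order."""
    values = list(sizes_mb.values())
    return [large / small for small, large in zip(values, values[1:])]


# Total index sizes from the table, converted to MB
totals_mb = {"LUBM-5": 195.4, "LUBM-50": 2.0 * MB_PER_GB, "LUBM-500": 17.9 * MB_PER_GB}

names = list(totals_mb)
for (small, large), ratio in zip(zip(names, names[1:]), growth_ratios(totals_mb)):
    # assumption: each LUBM step multiplies the data volume by 10
    print(f"{small} -> {large}: total index size grows about {ratio:.2f}x")
```

Under these assumptions the index grows by about 10.5x and then about 8.95x, i.e. roughly in proportion to the data.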

Comparison of clustering time cost of different parallel DBSCAN algorithms

Data set     Algorithm      Clustering time (s)
R15          Naive DBSCAN   20.485
R15          Spark DBSCAN   17.065
Jain         Naive DBSCAN   18.746
Jain         Spark DBSCAN   15.062
Pathbased    Naive DBSCAN   17.223
Pathbased    Spark DBSCAN   16.012
Aggregation  Naive DBSCAN   15.462
Aggregation  Spark DBSCAN   4.726
D31          Naive DBSCAN   87.633
D31          Spark DBSCAN   40.745
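The reduction in clustering time can be expressed as a speedup, naive time divided by Spark time. The sketch below simply recomputes it from the table values; the names times and speedup are hypothetical.

```python
# Clustering times in seconds copied from the table: (naive, spark)
times = {
    "R15": (20.485, 17.065),
    "Jain": (18.746, 15.062),
    "Pathbased": (17.223, 16.012),
    "Aggregation": (15.462, 4.726),
    "D31": (87.633, 40.745),
}


def speedup(naive_s, spark_s):
    """How many times faster the Spark run is than the naive run."""
    return naive_s / spark_s


for name, (naive, spark) in times.items():
    reduction = 1 - spark / naive  # fraction of the naive time saved
    print(f"{name:12s} speedup {speedup(naive, spark):.2f}x, time reduced by {reduction:.1%}")
```

The speedup ranges from about 1.08x on Pathbased to about 3.27x on Aggregation, with D31 at about 2.15x.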

Comparison of clustering result indexes of different parallel DBSCAN algorithms

Data set     Algorithm      Silhouette coefficient  Purity  Rand index  Adjusted Rand index  F1-score
R15          Naive DBSCAN   0.7658                  0.9644  0.9685      0.9532               0.9412
R15          Spark DBSCAN   0.7346                  0.9416  0.9602      0.9263               0.9331
Jain         Naive DBSCAN   0.3015                  0.9745  0.4913      0.1026               0.2578
Jain         Spark DBSCAN   0.3015                  0.9745  0.4913      0.1026               0.2578
Pathbased    Naive DBSCAN   0.3562                  0.9278  0.7016      0.1152               0.1723
Pathbased    Spark DBSCAN   0.3562                  0.9278  0.7016      0.1152               0.1723
Aggregation  Naive DBSCAN   0.3325                  0.8244  0.8078      0.1605               0.2346
Aggregation  Spark DBSCAN   0.3325                  0.8244  0.8078      0.1605               0.2346
D31          Naive DBSCAN   0.5815                  0.9045  0.9952      0.8142               0.8156
D31          Spark DBSCAN   0.5685                  0.8712  0.9896      0.7724               0.7789
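Purity, Rand index and adjusted Rand index are external indices computed from the ground-truth classes and the predicted clusters. A minimal pure-Python sketch using the standard pair-counting formulas follows; the function names are hypothetical. The source does not say how DBSCAN noise points were scored, so here the noise label (-1) is simply treated as one more cluster. The silhouette coefficient (which needs the point coordinates) and the F1-score (which has several clustering variants) are left out.

```python
from collections import Counter
from math import comb


def contingency(labels_true, labels_pred):
    """Counts n_ij of points with true class i and predicted cluster j."""
    return Counter(zip(labels_true, labels_pred))


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    best = Counter()
    for (_, cluster), n in contingency(labels_true, labels_pred).items():
        best[cluster] = max(best[cluster], n)
    return sum(best.values()) / len(labels_true)


def rand_indices(labels_true, labels_pred):
    """Return (Rand index, adjusted Rand index); assumes at least two points."""
    total = comb(len(labels_true), 2)  # number of point pairs
    sum_ij = sum(comb(v, 2) for v in contingency(labels_true, labels_pred).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_pred).values())
    # pairs together in both labelings + pairs separated in both labelings
    agreements = total + 2 * sum_ij - sum_a - sum_b
    ri = agreements / total
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return ri, 1.0  # degenerate case (e.g. one cluster in both), treated as perfect agreement
    ari = (sum_ij - expected) / (max_index - expected)
    return ri, ari
```

For true labels [0, 0, 0, 1, 1, 1] and predicted labels [0, 0, 1, 1, 2, 2], this gives a purity of 5/6, a Rand index of 2/3 and an adjusted Rand index of about 0.242.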