Open Access

Hairball Buster: A Graph Triage Method for Viewing and Comparing Graphs



Figure 1:

Sample ‘Hairball’ showing jazz players that performed with each other.

Figure 2:

Visone backbone layout of jazz player data set.

Figure 3:

Sample HB curve for jazz players that performed with each other.

Figure 4:

Neighbors plot for jazz players that performed with each other.

Figure 5:

Questions addressed by location of neighbor nodes.

Figure 6:

Sample directed neighbors plot for jazz player data set (Green = In, Red = Out).

Figure 7:

Force-directed representation of the Toaster data set.

Figure 8:

Backbone layout representation of the Toaster data set.

Figure 9:

HB representation of the Toaster data set (directionality ignored).

Figure 10:

HB representation of the inverse of neighbor nodes (e.g. gaps).

Figure 11:

HB inverse representation of just the top 100 ranked nodes with each other in Toaster data set.

Figure 12:

Force Atlas 2 on top 20 nodes in Toaster data set.

Figure 13:

HB chart of first 3,500 connections in Toaster data set.

Figure 14:

HB chart of second 3,500 connections in Toaster data set.

Figure 15:

HB chart of third 3,500 connections in Toaster data set.

Figure 16:

HB chart of suspended Iranian Twitter™ accounts, user-id replies, and no retweets.

Figure 17:

HB chart of suspended Iranian Twitter™ accounts, user-id replies, no retweets, first 200 nodes showing gaps among the top 3 and the next 40 nodes.

Figure 18:

Sample chart of CodeDNA™ cluster outputs of malware binaries.

Figure 19:

Sample CodeDNA™ cluster outputs of Linux coreutils binaries.

Figure 20:

Sample CodeDNA™ cluster output in standard hairball buster (blue = nodes, gray dots = links).

Figure 21:

Sample CodeDNA™ cluster output in HB with vertical offset.

Figure 22:

Sample CodeDNA™ cluster output in HB with vertical offset and highlighting nodes with highest similarity scores.

Figure 23:

Displaying different measures of centrality in HB.

Figure 24:

Comparing different types of graphs and algorithms.

Figure A1:

Sample Log10–log10 plot of jazz player data set with no offset.

Figure A2:

Sample offset of origin to 10,10 for Log10–log10 plot of jazz player data set.

Figure A3:

Sample offset of origin to 10,10 for semi–log plot of Toaster data set.

Performance comparison of HB and the Visone backbone layout (run times in seconds).

| Filename | File size (B) | No. of nodes | No. of edges | HB run 1 | HB run 2 | HB run 3 | HB avg | Visone quad Sim run 1 | Visone quad Sim run 2 | Visone quad Sim run 3 | Visone quad Sim avg | Visone tri Sim run 1 | Visone tri Sim run 2 | Visone tri Sim run 3 | Visone tri Sim avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random-1000-nodes.graphml | 341,365 | 1,000 | 5,002 | 0.25 | 0.25 | 0.25 | 0.25 | 2.0 | 1.7 | 1.6 | 1.8 | 1.5 | 1.1 | 1.3 | 1.3 |
| random-10000-nodes.graphml | 3,555,915 | 10,000 | 49,826 | 0.67 | 0.69 | 0.70 | 0.69 | 7.3 | 6.9 | 6.8 | 7.0 | 7.0 | 7.1 | 6.8 | 7.0 |
| random-100000-nodes.graphml | 37,271,224 | 100,000 | 500,061 | 10.01 | 11.74 | 6.55 | 9.43 | 139.4 | 120.1 | 118.5 | 126.0 | 129.0 | 119.7 | 119.1 | 122.6 |
| random-250000-nodes.graphml | 95,452,841 | 250,000 | 1,250,487 | 16.84 | 15.36 | 15.24 | 15.81 | 349.3 | 357.3 | 361.3 | 356.0 | 356.8 | 352.7 | 334.5 | 348.0 |
| random-500000-nodes.graphml | 193,263,339 | 500,000 | 2,501,346 | 26.21 | 25.71 | 24.47 | 25.46 | >1,200 | | | | | | | |
| random-1000000-nodes.graphml | 388,461,043 | 1,000,000 | 4,997,089 | 44.25 | 43.75 | 45.19 | 44.40 | Visone could not load the GraphML file (insufficient memory) | | | | | | | |
| code-dna.graphml | 155,222 | 28 | 292 | <1 | <1 | <1 | | <1 | <1 | <1 | | <1 | <1 | <1 | |
| jazz-directed.graphml | 361,796 | 198 | 4,113 | <1 | <1 | <1 | | <1 | <1 | <1 | | <1 | <1 | <1 | |
| toster_CA_Edge.graphml | 5,349,861 | 23,916 | 75,050 | 1.02 | 0.96 | 0.96 | 0.98 | 20.6 | 19.8 | 20.1 | 20.2 | 17.1 | 18.9 | 17.8 | 17.9 |
| iran-tweet-replies.no-retweet.by-userid.graphml | 294,153,484 | 228,626 | 440,244 | 1.26 | 1.12 | 1.13 | 1.17 | >1,200 | | | | | | | |
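To make the gap in the performance table easier to read, the averages can be turned into ratios. The following is a minimal Python sketch, not part of the original study: the dictionary `avg_times` and the helper `speedup` are illustrative names, and only rows that report a numeric average for both tools are included.

```python
# Average run times in seconds, copied from the performance table above.
# Each tuple is (HB avg, Visone quad Sim avg, Visone tri Sim avg).
# Rows without numeric Visone averages (>1,200 s, load failure, <1 s) are omitted.
avg_times = {
    "random-1000-nodes.graphml": (0.25, 1.8, 1.3),
    "random-10000-nodes.graphml": (0.69, 7.0, 7.0),
    "random-100000-nodes.graphml": (9.43, 126.0, 122.6),
    "random-250000-nodes.graphml": (15.81, 356.0, 348.0),
    "toster_CA_Edge.graphml": (0.98, 20.2, 17.9),
}


def speedup(hb_avg, visone_avg):
    """Return how many times longer the Visone run took than the HB run."""
    return visone_avg / hb_avg


for name, (hb, quad, tri) in avg_times.items():
    print(f"{name}: quad Sim {speedup(hb, quad):.1f}x, tri Sim {speedup(hb, tri):.1f}x")
```

On these rows the ratio grows with graph size, from roughly 7x on the 1,000-node random graph to roughly 22x on the 250,000-node one for the quad Sim variant.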

Comparing HB features to other graph analytic and visualization algorithms.

| Feature | Hairball buster | Histogram/node-degree display | Force-directed | Visone backbone | Adjacency matrix | Block modeling |
|---|---|---|---|---|---|---|
| Understanding node relationships and graph characteristics | | | | | | |
| 1. Distribution of nodes by degree | Yes | Yes | No | No | No [f] | No [f] |
| 2. Quickly determine the number of high-degree nodes | Yes | Yes | No | No | Yes | No [f] |
| 3. Quickly identify which are the highest degree nodes | Yes | Yes [a] | No [b] | No | Yes | Yes |
| 4. Determine if the highest degree nodes are directly connected to other high-degree nodes | Yes | No | Yes [c] | No [b] | Yes | Yes |
| 5. Determine whether the highest degree nodes are connected to each other indirectly via two hops | Yes | No | Yes | Yes [c] | Yes | Yes |
| 6. Determine which lower-degree nodes are directly connected to the high-degree nodes | Yes | No | Yes | Yes | Yes | Yes |
| 7. Provide visual cue of how much difference exists between the degree of the nodes, especially high-degree nodes | Yes | Yes | No | No | No | Yes |
| 8. Determine if there is one central cluster or many clusters that contain the highest degree nodes | Yes | No | Yes | Yes | No | Yes |
| Representing large or directed networks, or with weighted links | | | | | | |
| 9. Provide log–log or semi–log representation for very large data sets | Yes | Yes | No | No | No | No |
| 10. Can visualize both directed and undirected graphs | Yes | No | Yes [e] | Yes [e] | Yes | Yes |
| 11. Determine which nodes connect to the highest weighted links | Yes | No | Yes [d] | Yes | Yes [g] | Yes [g] |
| Other centrality measures, standard format, low calculation cost | | | | | | |
| 12. Distribution of nodes by other centrality measures | Yes | Yes | No | No | No | No |
| 13. Provide a canonical representation of the graph | Yes | Yes | No | No | Yes | No |
| 14. Low calculation cost | Yes | Yes | No | No | Yes | No [h] |
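Features 3 and 4 in the table above (identifying the highest degree nodes and checking whether they are directly connected to each other) can be illustrated with a small sketch. This is an assumption-laden illustration rather than the HB implementation: it treats the graph as undirected, ranks nodes by descending degree with ties broken by node name, and the function names `rank_by_degree` and `top_k_links` as well as the toy edge list are hypothetical.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Build an undirected adjacency map and rank nodes by descending degree.

    Ties are broken alphabetically by node name (an arbitrary choice made
    here so the ordering is deterministic).
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), str(n)))
    return ranked, adjacency


def top_k_links(edges, k):
    """Return the direct links that exist among the k highest degree nodes."""
    ranked, adjacency = rank_by_degree(edges)
    top = ranked[:k]
    return [
        (a, b)
        for i, a in enumerate(top)
        for b in top[i + 1:]
        if b in adjacency[a]
    ]


# Toy edge list, invented for illustration only.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
ranked, adjacency = rank_by_degree(edges)
for rank, node in enumerate(ranked, start=1):
    print(rank, node, len(adjacency[node]))
print(top_k_links(edges, 3))
```

The (rank, degree) pairs printed by the loop are the kind of coordinates a degree-ranked curve would plot, and the list returned by `top_k_links` answers the question posed in feature 4 for the top k nodes.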
eISSN:
0226-1766
Language:
English
Publication timeframe:
Volume Open
Journal subjects:
Social Sciences, other