Open Access

Hairball Buster: A Graph Triage Method for Viewing and Comparing Graphs

Figure 1:

Sample ‘Hairball’ showing jazz players that performed with each other.

Figure 2:

Visone backbone layout of jazz player data set.

Figure 3:

Sample HB curve for jazz players that performed with each other.

Figure 4:

Neighbors plot for jazz players that performed with each other.

Figure 5:

Questions addressed by location of neighbor nodes.

Figure 6:

Sample directed neighbors plot for jazz player data set (Green = In, Red = Out).

Figure 7:

Force-directed representation of the Toaster data set.

Figure 8:

Backbone layout representation of the Toaster data set.

Figure 9:

HB representation of the Toaster data set (directionality ignored).

Figure 10:

HB representation of the inverse of neighbor nodes (e.g. gaps).

Figure 11:

HB inverse representation of just the top 100 ranked nodes with each other in Toaster data set.

Figure 12:

Force Atlas 2 on top 20 nodes in Toaster data set.

Figure 13:

HB chart of first 3,500 connections in Toaster data set.

Figure 14:

HB chart of second 3,500 connections in Toaster data set.

Figure 15:

HB chart of third 3,500 connections in Toaster data set.

Figure 16:

HB chart of suspended Iranian Twitter™ accounts, user-id replies, and no retweets.

Figure 17:

HB chart of suspended Iranian Twitter™ accounts, user-id replies, no retweets, first 200 nodes showing gaps among the top 3 and the next 40 nodes.

Figure 18:

Sample chart of CodeDNA™ cluster outputs of malware binaries.

Figure 19:

Sample CodeDNA™ cluster outputs of Linux coreutils binaries.

Figure 20:

Sample CodeDNA™ cluster output in standard hairball buster (blue = nodes, gray dots = links).

Figure 21:

Sample CodeDNA™ cluster output in HB with vertical offset.

Figure 22:

Sample CodeDNA™ cluster output in HB with vertical offset and highlighting nodes with highest similarity scores.

Figure 23:

Displaying different measures of centrality in HB.

Figure 24:

Comparing different types of graphs and algorithms.

Figure A1:

Sample Log10–log10 plot of jazz player data set with no offset.

Figure A2:

Sample offset of origin to 10,10 for Log10–log10 plot of jazz player data set.

Figure A3:

Sample offset of origin to 10,10 for semi–log plot of Toaster data set.

Performance comparison of HB vs. backbone layout run times.

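| Filename | File size (B) | No. of nodes | No. of edges | HB run 1 (s) | HB run 2 (s) | HB run 3 (s) | HB avg (s) | Visone quad Sim run 1 (s) | Visone quad Sim run 2 (s) | Visone quad Sim run 3 (s) | Visone quad Sim avg (s) | Visone tri Sim run 1 (s) | Visone tri Sim run 2 (s) | Visone tri Sim run 3 (s) | Visone tri Sim avg (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random-1000-nodes.graphml | 341,365 | 1,000 | 5,002 | 0.25 | 0.25 | 0.25 | 0.25 | 2.0 | 1.7 | 1.6 | 1.8 | 1.5 | 1.1 | 1.3 | 1.3 |
| random-10000-nodes.graphml | 3,555,915 | 10,000 | 49,826 | 0.67 | 0.69 | 0.70 | 0.69 | 7.3 | 6.9 | 6.8 | 7.0 | 7.0 | 7.1 | 6.8 | 7.0 |
| random-100000-nodes.graphml | 37,271,224 | 100,000 | 500,061 | 10.01 | 11.74 | 6.55 | 9.43 | 139.4 | 120.1 | 118.5 | 126.0 | 129.0 | 119.7 | 119.1 | 122.6 |
| random-250000-nodes.graphml | 95,452,841 | 250,000 | 1,250,487 | 16.84 | 15.36 | 15.24 | 15.81 | 349.3 | 357.3 | 361.3 | 356.0 | 356.8 | 352.7 | 334.5 | 348.0 |
| random-500000-nodes.graphml | 193,263,339 | 500,000 | 2,501,346 | 26.21 | 25.71 | 24.47 | 25.46 | >1,200 | | | | | | | |
| random-1000000-nodes.graphml | 388,461,043 | 1,000,000 | 4,997,089 | 44.25 | 43.75 | 45.19 | 44.40 | Visone could not load graphml file. Insufficient memory | | | | | | | |
| code-dna.graphml | 155,222 | 28292 (node and edge counts run together in source) | | <1 | <1 | <1 | | <1 | <1 | <1 | | <1 | <1 | <1 | |
| jazz-directed.graphml | 361,796 | 198 | 4,113 | <1 | <1 | <1 | | <1 | <1 | <1 | | <1 | <1 | <1 | |
| toster_CA_Edge.graphml | 5,349,861 | 23,916 | 75,050 | 1.02 | 0.96 | 0.96 | 0.98 | 20.6 | 19.8 | 20.1 | 20.2 | 17.1 | 18.9 | 17.8 | 17.9 |
| iran-tweet-replies.no-retweet.by-userid.graphml | 294,153,484 | 228,626 | 440,244 | 1.26 | 1.12 | 1.13 | 1.17 | >1,200 | | | | | | | |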
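The averages in the performance table are means over three runs, so the relative cost of the two tools can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the run times are copied by hand from three rows of the table; the dictionary layout and the `speedup` helper are illustrative names, not part of the article or of either tool.

```python
from statistics import mean

# Run times in seconds for three runs, copied from the performance table.
# The keys "hb", "visone_quad" and "visone_tri" are labels chosen for this sketch.
RUNS = {
    "random-1000-nodes.graphml": {
        "hb": [0.25, 0.25, 0.25],
        "visone_quad": [2.0, 1.7, 1.6],
        "visone_tri": [1.5, 1.1, 1.3],
    },
    "random-100000-nodes.graphml": {
        "hb": [10.01, 11.74, 6.55],
        "visone_quad": [139.4, 120.1, 118.5],
        "visone_tri": [129.0, 119.7, 119.1],
    },
    "toster_CA_Edge.graphml": {
        "hb": [1.02, 0.96, 0.96],
        "visone_quad": [20.6, 19.8, 20.1],
        "visone_tri": [17.1, 18.9, 17.8],
    },
}


def speedup(runs, baseline="hb"):
    """Return mean run time of each method divided by the baseline's mean."""
    base = mean(runs[baseline])
    return {
        name: mean(times) / base
        for name, times in runs.items()
        if name != baseline
    }


if __name__ == "__main__":
    for filename, runs in RUNS.items():
        ratios = {name: round(r, 1) for name, r in speedup(runs).items()}
        print(filename, ratios)
```

For the Toaster data set, this gives ratios of about 20.6 (quad Sim) and 18.3 (tri Sim) relative to HB. Rows reported only as ">1,200" or "<1 sec" are left out because they carry no exact value to average.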

Comparing HB features to other graph analytic and visualization algorithms.

| Feature | Hairball buster | Histogram/node-degree display | Force-directed | Visone backbone | Adjacency matrix | Block modeling |
|---|---|---|---|---|---|---|
| Understanding node relationships and graph characteristics | | | | | | |
| 1. Distribution of nodes by degree | Yes | Yes | No | No | No^f | No^f |
| 2. Quickly determine the number of high-degree nodes | Yes | Yes | No | No | Yes | No^f |
| 3. Quickly identify which are the highest degree nodes | Yes | Yes^a | No^b | No | Yes | Yes |
| 4. Determine if the highest degree nodes are directly connected to other high-degree nodes | Yes | No | Yes^c | No^b | Yes | Yes |
| 5. Determine whether the highest degree nodes are connected to each other indirectly via two hops | Yes | No | Yes | Yes^c | Yes | Yes |
| 6. Determine which lower-degree nodes are directly connected to the high-degree nodes | Yes | No | Yes | Yes | Yes | Yes |
| 7. Provide visual cue of how much difference exists between the degree of the nodes, especially high-degree nodes | Yes | Yes | No | No | No | Yes |
| 8. Determine if there is one central cluster or many clusters that contain the highest degree nodes | Yes | No | Yes | Yes | No | Yes |
| Representing large or directed networks, or with weighted links | | | | | | |
| 9. Provide log–log or semi–log representation for very large data sets | Yes | Yes | No | No | No | No |
| 10. Can visualize both directed and undirected graphs | Yes | No | Yes^e | Yes^e | Yes | Yes |
| 11. Determine which nodes connect to the highest weighted links | Yes | No | Yes^d | Yes | Yes^g | Yes^g |
| Other centrality measures, standard format, low calculation cost | | | | | | |
| 12. Distribution of nodes by other centrality measures | Yes | Yes | No | No | No | No |
| 13. Provide a canonical representation of the graph | Yes | Yes | No | No | Yes | No |
| 14. Low calculation cost | Yes | Yes | No | No | Yes | No^h |
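Features 3 and 4 in the comparison above (identifying the highest degree nodes and checking whether they are directly connected to each other) can be illustrated with a short, generic degree-ranking sketch. This is not the article's HB implementation; the function names, the tie-breaking rule, and the toy edge list are assumptions made for illustration only.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (nodes sorted by descending degree, adjacency sets).

    Edges are treated as undirected; self-loops are ignored.
    Ties are broken by node name so the ordering is deterministic
    (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), str(n)))
    return ranked, neighbors


def links_among_top(edges, k):
    """List the direct links between the k highest degree nodes."""
    ranked, neighbors = rank_by_degree(edges)
    top = ranked[:k]
    return [
        (a, b)
        for i, a in enumerate(top)
        for b in top[i + 1:]
        if b in neighbors[a]
    ]


# Hypothetical toy graph, not taken from any data set in the article.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]

if __name__ == "__main__":
    ranked, _ = rank_by_degree(EDGES)
    print(ranked)
    print(links_among_top(EDGES, 3))
```

On the toy graph, node `a` has the highest degree, and the three top-ranked nodes `a`, `b`, and `c` are all directly linked to one another.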
eISSN: 0226-1766
Language: English
Publication frequency: Volume Open
Journal subjects: Social Sciences, other