An important part of the appeal of social network analysis originates from its incorporation of mathematical descriptions of social relations. This has made it possible to provide clear and unambiguous definitions for concepts relating to relational structures. The clarity of communication that resulted from this intersection of mathematics and the social sciences has been credited with much of the field’s early growth (Freeman 1984).
The mathematical core of social network analysis has delivered the dual benefits of precision and flexibility. Equations are clear to the point that those interested in the topic could build upon one another’s work with minimal need for clarification. But mathematical definitions are also general enough to allow application in a variety of relational contexts. The structural measures that form the core of social network analysis have thereby proven compelling across a variety of contexts and interests. The exponential proliferation in where and how social network analysis has been applied (Otte and Rousseau 2002, Freeman 2011) is testament to the scalability of the field.
The diverse purposing of social network analysis has been mirrored by a proliferation in the number and variety of software packages available in the field. Although software developers and programmers have put a great deal of effort into producing network analytic software suited to a wide variety of needs and applications, no single piece of software is generally applicable to every situation. Software packages have been optimized for efficiency, analytic variety, analytic specificity, ease of use, specialized data handling, greater capacity for visualization, and for terminology and concepts tailored to a particular end-user. Software also differs in style of user interface, method of reporting, and even the default methods for scaling output.
As the available packages continue to diversify, their content is also converging. This raises the question of whether the analytic functions offered across programs are truly equivalent and interchangeable. Are the names used to identify each function explicit in what they identify, or do they refer only to a generalized class of functions?
Naming conventions are important. The developers of network analytic software face decisions about how a particular analytic function should be implemented. Some developers may choose to handle common topological features of social networks (e.g., loops, multiple components) by default, while others may choose a stricter interpretation of how the measure or algorithm should perform. Under such a paradigm, the terms in use within the social network analysis community become less precise over time and drift from the original strength of network analysis: clarity. A measure or algorithm may well differ by implementation in order to address some given scenario or feature of network topology, and it therefore bears unique attributes that constitute a trade-off at some level. It is therefore valuable to both analysts and the social network analysis community for any such differences to be made explicit and systematic.
The issue of whether two software implementations produce the same measures is especially important when using two programs in concert. In such situations, consistency of output indicates that the user is introducing a minimum of variability when moving from one program to another. The equivalency of network metrics matters because small variations in basic measures may translate into large differences in more complex algorithms. However, procedural dissimilarities in programs’ calculations of measures can be difficult to identify and frequently lack documentation.
Differences in how various software programs provide output constitute a barrier to assessing consistency of measures from program to program. The variety in default output styles (e.g., raw scores, normalized scores, scalar-multiplied output) makes it difficult to visually compare raw output. Even if there were no meaningful difference in the underlying calculations, differences in scaling alone would obscure the equivalence of raw output.
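For instance (an illustrative sketch in Python with invented values, not any program’s actual output), the same raw degree scores look quite different under three common reporting conventions:

```python
# Raw degree scores for a hypothetical 5-node graph (invented values).
raw = [3, 2, 2, 2, 1]
n = len(raw)

normalized = [d / (n - 1) for d in raw]   # divide by the maximum possible degree
max_scaled = [d / max(raw) for d in raw]  # rescale so the largest score is 1

print(raw)                                # [3, 2, 2, 2, 1]
print(normalized)                         # [0.75, 0.5, 0.5, 0.5, 0.25]
print([round(v, 3) for v in max_scaled])  # [1.0, 0.667, 0.667, 0.667, 0.333]
```

All three vectors rank the nodes identically and are perfectly correlated, which is why comparison by correlation is more informative than visual inspection of raw output.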
Our primary focus is an assessment of inter-program reliability, framed by three related questions. Are the various software packages producing consistently equivalent results? If not, how do they differ? And under what conditions do the centrality outputs diverge?
To assess inter-program reliability, we focused on the basic building block of network analysis: node centrality. Specifically, the investigation involved the four most commonly applied centrality measures: degree, betweenness, closeness, and eigenvector (Valente et al. 2008). Such measures are often fundamental to social network analysis. Here, we evaluate and report on the consistency of basic measures of node centrality across various software platforms in standardized simulations.
Six software packages for social network analysis were compared in terms of their calculations of four basic measures of node centrality in each of four networks. We selected popular network analytic software that is either self-contained (UCINET, Pajek, ORA, and Gephi) or available as R packages through CRAN (sna and igraph) (Table 1, below).
Table 1. Analytic interfaces used in this study

| Software | Version | Source |
|---|---|---|
| UCINET | 6.564 | |
| Pajek | 4.03 | |
| ORA-NetScenes | 3.0.9.9.20 | |
| Gephi | 0.8.2 | |
| sna | 2.3-2 | |
| igraph | 0.7.1 | |
The “big four” centrality measures (degree, betweenness, closeness, and eigenvector) were compared across programs. Note that Pajek does not include a measure titled eigenvector centrality; for undirected networks, Pajek’s “hubs and authorities” measure is analogous to eigenvector centrality and was used for the purpose of comparison.
Output scaling

| Program | Degree | Closeness | Betweenness | Eigenvector |
|---|---|---|---|---|
| | Raw | Normalized, Average | Raw | Scaled (max = 1) |
| | Raw | Normalized | Normalized | Normalized |
| | Raw | Average | Raw | Normalized |
| | Normalized | Normalized | Normalized | Normalized |
| | Raw | Normalized | Raw | Normalized |
| | Raw | Normalized | Raw | Scaled (max = 1) |
Centrality measures were deemed to be optimally consistent when the correlation between programs’ output met the threshold that we selected.
Scatterplots offer additional insight for comparing similar measures that employ different scales. If any nodes are of particular interest to the analyst, then potential variations in their measurement may become very important. Deviations in the measurement of a small number of nodes within a large network could still occur within the correlation threshold that we selected. Scatterplots were therefore used to identify and characterize variation in measurement, and to assess whether any such variation is a singular anomaly (e.g., differences in floating point) or a patterned deviation (i.e., errors that arise from differences in the assumptions behind how a measure should be calculated).
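The comparison logic can be sketched in a few lines (pure Python; the output vectors are invented for illustration and do not come from any of the tested programs):

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two output vectors for the same nodes."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical centrality output for the same four nodes from two programs.
prog_a = [0.10, 0.40, 0.25, 0.80]
prog_b = [0.20, 0.80, 0.50, 1.60]  # same structure, doubled scale

print(round(pearson(prog_a, prog_b), 4))  # scale differences alone leave r = 1
```

A near-perfect r therefore signals agreement up to scaling, while the scatterplot of one vector against the other reveals whether individual nodes deviate from the fitted line.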
Network data (graphs) of variable size and modality were generated to compare centrality measures across software packages. Undirected one-mode networks [small (n=35) and moderately large (n=2000)] and two-mode networks [small (n1=10, n2=25) and moderately large (n1=300, n2=1500)] were generated. Initially, both the one-mode and two-mode networks contained smaller disconnected components (e.g., isolates and/or other small components) in addition to a large main component (Table 2). One-mode networks also contained loops.
New networks were created by removing loops, removing smaller components, or both, from the initial network in order to model a variety of conditions. This resulted in twelve networks: both large and small networks that contain either loops, or disconnected components, or both, or neither; as appropriate for one- or two-mode networks.
Each dataset was designed to be well within the data handling limits of each of the software packages that we evaluated. Most of the programs tested were limited mainly by concerns such as network density and size, in addition to the processor speed and the amount of available memory in a given computer. All are capable of handling networks into the tens of thousands of nodes, with some capable of handling networks into the millions of nodes.
Table 2. Data used for reliability comparisons

| Data | Nodes in Main Component | Nodes in Smaller Components | Number of Loops | Max. Number of Nodes | Average Degree |
|---|---|---|---|---|---|
| Small one-mode | 29 | 6 | 6 | 35 | 3.5 |
| Large one-mode | 1876 | 112 | 60 | 2000 | 3.0 |
| Small two-mode | (10, 21) | 4 | NA | (10, 25) | 3.7 |
| Large two-mode | (300, 1815) | 185 | NA | (300, 1700) | 3.7 |
Centrality measures were calculated using all six programs under a variety of conditions on all four networks, where applicable. Loops are inconsistent with the definition of two-mode networks, in which ties occur between, but not within, the two node sets. The two-mode networks were therefore not evaluated for network data with loops.
Our findings, presented in brief form in Table 4, demonstrate that differences between analytic programs exist on each measure, with the notable exception of betweenness centrality. Results are presented below by measure, and within each measure, by network condition. Results are presented in a manner that highlights some of the most common or notable differences between programs. Consistency was considered to be “high” when no notable difference arose, “medium” when the output from one program diverged from that of the others, and “low” when output diverged across multiple programs.
Table 4. Consistency of output by centrality type and network conditions

| Measure | No disc. comp. & no loops (1-mode) | No disc. comp. & no loops (2-mode) | Disconnected components (1-mode) | Disconnected components (2-mode) | Loops (1-mode) | Loops (2-mode) | Disc. comp. & loops (1-mode) | Disc. comp. & loops (2-mode) |
|---|---|---|---|---|---|---|---|---|
| | High | High | High | High | High | NA | High | NA |
| | High | Medium | High | Medium | Low | NA | Low | NA |
| | Medium | Medium | Medium | Low | Medium | NA | Low | NA |
| | Medium | Medium | Medium | | | NA | | NA |
Closeness centrality measures showed the least amount of measurement variability in ideal networks (i.e., no loops or disconnected components) and in networks containing loops but no disconnected components. In networks containing disconnected components, however, closeness output varied widely across programs.
UCINET and Gephi also correspond when closeness measures in UCINET are reported as summed or averaged distances. In this condition, both UCINET and Gephi produce output for Freeman closeness in which smaller values indicate shorter average distances from a particular node to all others in the graph (hence the negative correlation coefficients [small graph r = -0.9903, large graph r = -0.9883]). UCINET also offers an “Average Reciprocal Distance” (ARD) measure that corresponds more closely with other programs (small graph r = 0.9850, large graph r = 0.9990).
In two-mode networks, neither UCINET nor Gephi produced results that corresponded with other programs.
Scatterplot matrix comparing closeness centrality output across programs.
A variety of solutions are possible when analyzing two-mode networks in UCINET. Top row: scatterplots of UCINET’s degree (r = 0.3713) and closeness (r = -0.0134) output using the two-mode centrality procedure, compared with other analytic packages; all other packages performed identically. Bottom row: when transformed into a bipartite network format, UCINET calculates centrality as for a one-mode network, and results are analogous to those of other packages. Closeness centrality for the bipartite aspect was calculated using Freeman normalization in UCINET.
Networks that contain disconnected components produced the greatest variation in closeness output across programs.
Scatterplot matrix comparing output for closeness centrality across programs.
In networks with loops, degree centrality output diverged across programs.
Scatterplot matrix comparing degree centrality output across programs.
The variations in output stem from how each program handles loops. Program defaults counted single loops as two edges (Pajek and igraph), loops as one edge (UCINET and ORA), or ignored loops entirely under the default commands (Gephi and sna). Note that the two R packages (sna and igraph) differ in their default treatment of loops, with igraph defaulting to include loops and sna defaulting to ignore them in calculations. When the sna package was modified to include loops (diag = TRUE) in the calculation of degree centrality, sna counted all loops as one edge and the output was consistent with that of UCINET and ORA. Gephi counted all loops as two arcs in a manner that was consistent with Pajek and igraph.
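As a minimal sketch of the conventions just described (pure Python on a hypothetical toy graph; the code is ours, not any package’s actual implementation), the three loop treatments can be reproduced directly:

```python
# Toy 3-node undirected graph with one loop on node 0 (hypothetical data).
edges = [(0, 0), (0, 1), (1, 2)]

def degree(edges, n, loop_weight):
    """Degree with a configurable loop convention:
    loop_weight=2 mimics a Pajek/igraph-style default (loop = two edge-ends),
    loop_weight=1 mimics a UCINET/ORA-style default (loop = one edge),
    loop_weight=0 ignores loops (Gephi/sna-style default)."""
    deg = [0] * n
    for u, v in edges:
        if u == v:
            deg[u] += loop_weight
        else:
            deg[u] += 1
            deg[v] += 1
    return deg

print(degree(edges, 3, 2))  # [3, 2, 1]: loop counted twice
print(degree(edges, 3, 1))  # [2, 2, 1]: loop counted once
print(degree(edges, 3, 0))  # [1, 2, 1]: loop ignored
```

The same node thus receives three different degree scores depending solely on the program’s default.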
Eigenvector centrality was inconsistent across software packages and network types.
In two-mode networks, igraph, ORA, Gephi, and Pajek’s “2-mode important vertices” function produced results that were largely consistent with UCINET’s two-mode eigenvector centrality.
Scatterplot matrix comparing eigenvector centrality output across programs.
Scatterplot matrix of eigenvector centrality output across programs.
The igraph package produced eigenvector output that differed slightly from that of other programs in networks that contain loops.
The above patterns of inconsistencies in calculating eigenvector centrality persisted in networks with disconnected components.
Scatterplot matrix comparing eigenvector centrality output across programs.
Measurement of betweenness centrality was highly consistent across programs under all network conditions.
Scatterplot matrix comparing betweenness centrality output across programs.
This study was designed to examine basic reliability concerns among a selection of popular tools in use within the social network analysis community. Specifically, we investigated whether some popular software packages were producing equivalent results. We found variability that brings to light disagreement, sometimes substantial, over how four concepts of node centrality should be measured. The programs under consideration were only able to produce the same output under a very narrow set of conditions. Disagreements over aspects of how these measures should be operationalized manifested as networks departed from the ideal reference graphs that contained no loops or disconnected components. Such variability precludes the ability to seamlessly port data and/or exchange measures between programs and makes it essential for the user to have access to evaluations that highlight differences between the default, and available, options for various measures when using two or more programs in concert. Within the social network analysis community, the differing assumptions behind the various measurement variations unnecessarily cloud communication between users of different programs and leave enough doubt in the minds of new entrants as to whether the community has unified its language. Below, we discuss in greater detail our interpretation of the results, the implications of our findings for the average user, and the implications for the social network analysis community.
By employing hierarchical subsets of network conditions, we isolated measure differences under specific conditions. The use of varying network conditions was intended to better reflect a range of network data that are likely to be encountered. Conditions in the undirected networks ranged from the “ideal” of reference data – no loops or disconnected components – to scenarios commonly encountered when analyzing social networks, namely, loops, disconnected components, and the combination of the two.
In general, centrality measures for reference graphs – those with no loops or disconnected components – were largely consistent. This may be taken to imply that programs are implementing the same – or very similar – algorithms for the offered measures, albeit mainly in the absence of issues that may complicate calculation, such as loops and disconnected components.
A notable inconsistency did, however, arise in the analysis of the two-mode reference networks. Of the tested software, only UCINET offered measures tailored specifically for two-mode networks. Correspondingly, analytic results reveal a bifurcated pattern when comparing calculations of degree and closeness in UCINET to those of other software packages, along with slight differences in betweenness measures in small two-mode networks. No other programs demonstrated a pattern that corresponded to that of UCINET when measuring degree, closeness, or betweenness. When measuring eigenvector centrality, however, three additional programs (igraph, ORA, and Gephi) evince a bifurcated pattern that corresponds very closely with UCINET.
Scatterplot matrices comparing degree and eigenvector output across programs.
There was a surprising amount of inconsistency in the most basic measure: degree centrality.
The problem of different measures residing under the same name is exemplified when considering eigenvector centrality.
Three programs – UCINET, ORA, and Pajek – were consistent with one another in measuring eigenvector centrality in all three variations of one-mode networks. This is notable because one of those three, Pajek, provides “hubs and authorities”, which generates two independently scaled measurement vectors, of which the authorities vector was consistent with eigenvector centrality in the other two programs. This is the one case where a centrality measurement that differed from the classic citation was identified under a different name.
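The scaling side of these eigenvector discrepancies is easy to illustrate. The sketch below (a generic power iteration in pure Python on a hypothetical graph; it is not any tested package’s implementation) computes one principal eigenvector and reports it under two of the scalings encountered above, “max = 1” and unit Euclidean norm:

```python
def eigenvector_centrality(A, iters=200):
    """Generic power iteration on an adjacency matrix (illustrative only)."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        m = max(x)
        x = [v / m for v in x]  # rescale each step so values stay bounded
    return x                    # largest entry is 1 ("max = 1" scaling)

# Hypothetical 4-node undirected graph (node 2 is the best connected).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

max_scaled = eigenvector_centrality(A)
norm = sum(v * v for v in max_scaled) ** 0.5
unit_norm = [v / norm for v in max_scaled]  # Euclidean ("normalized") scaling

print([round(v, 3) for v in max_scaled])
print([round(v, 3) for v in unit_norm])
```

The two vectors rank nodes identically, so a high correlation can coexist with raw values that look nothing alike, exactly the situation that motivates comparing programs via correlation rather than raw output.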
Perhaps the most conspicuous case of inconsistent calculations is closeness centrality in networks that contain disconnected components.
Only the R package sna produced an error message rather than closeness values, because it treats the distances between disconnected components as infinite, as stipulated in Freeman (1979). Its documentation also contains a stern admonishment against calculating closeness centrality in networks with disconnected components. All other software produced closeness calculations without requiring that smaller components first be removed.
The analysis of disconnected networks using closeness produced widely varied output. Correspondingly, all tested software provided some means for dealing with disconnected components. In most software, the method for defining the distance between disconnected components was incorporated into the measure. ORA and the R package igraph appear to default to substituting the number of nodes for undefined distances, whereas Gephi appears to report undefined distances as zero and omit disconnected nodes from calculation. Pajek sets undefined distances to zero and calculates closeness only within each component.
Of the software tested, UCINET offers the most options for applying closeness centrality to one-mode networks with disconnected components. The user may choose one of four options for dealing with the distances between disconnected components: (1) substitute the number of nodes in the graph for the undefined distance; (2) substitute the maximum distance, plus one (the default setting); (3) treat undefined distances as missing and assign no value to isolates; and (4) set undefined distances to zero and calculate closeness only within each component. Those four options, combined with three options for scaling output (summed distances, averaged distances, and Freeman normalization), present the user with 10 combinations of options (the option of treating undefined distances as missing is scaled in only one manner).
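To make the divergence concrete, the sketch below (pure Python on a hypothetical toy graph; the option names are ours and only approximate the behaviors described above, not any package’s actual code) computes closeness under three conventions for undefined distances:

```python
from collections import deque

def bfs_distances(adj, s):
    """Geodesic distances from s; unreachable nodes are None."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return [dist.get(v) for v in range(len(adj))]

def closeness(adj, undefined):
    """undefined selects the convention for unreachable pairs:
    'n'          substitute the number of nodes (roughly the ORA/igraph defaults)
    'max+1'      substitute the largest observed distance plus one (UCINET default)
    'component'  sum only within-component distances (roughly Pajek's approach)."""
    n = len(adj)
    rows = [bfs_distances(adj, s) for s in range(n)]
    gmax = max(x for row in rows for x in row if x is not None)
    scores = []
    for d in rows:
        if undefined == "n":
            total = sum(x if x is not None else n for x in d)
        elif undefined == "max+1":
            total = sum(x if x is not None else gmax + 1 for x in d)
        else:  # 'component'
            total = sum(x for x in d if x is not None)
        scores.append((n - 1) / total if total else 0.0)
    return scores

# Toy disconnected graph: a path 0-1-2 plus an isolate, node 3.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
for mode in ("n", "max+1", "component"):
    print(mode, [round(c, 3) for c in closeness(adj, mode)])
```

Even on this four-node example the isolate’s score ranges from 0.0 to roughly 0.333 depending on the convention, which is precisely the kind of patterned divergence the scatterplots revealed.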
Aside from differences in how output was scaled, there was essentially no variation in calculating betweenness centrality.
It bears repeating that the programs tested displayed relatively little variation when analyzing reference graphs – those with neither loops nor disconnected components. The variation that was present in centrality output appears more likely to have arisen from differences of opinion on preferred methods of calculation in situations that depart from the ideal of connected one-mode graphs with non-recursive edges.
The reference graphs make it clear that – with only a few exceptions – those responsible for developing and maintaining each of these programs have done an admirable job of benchmarking their programs against others and correcting unintentional software differences. However, network topology that diverges away from the “ideal” reference graph reveals that there is a great deal of disparity in the analytic assumptions that are built into software used to calculate such measures. The problem that arises from this lack of understanding and agreement within the social network analysis community is that it puts both analysts and the field itself at a disadvantage by introducing unnecessary noise into analyses and communication within the community.
Certainly, for those analysts whose data are similar to our reference graphs (i.e., no loops, no isolates or other disconnected components) the low variability in measurement definition and implementation is good news. The lack of variation in the output for the four reference graphs indicates that the programs used in this study agree in their standards for the calculation of basic centrality measures under the most basic and favorable conditions. The differences that resulted from other conditions, however, underscore the importance of an analyst’s familiarity with their choice of software, and the software used by those whose work they wish to use as an analytic benchmark. A good deal of care should be exercised to verify the precise method of calculation being applied and the settings – and defaults – that were employed for those calculations.
Centrality measures typically form the foundation of an analysis, and if their implementation varies, more complex algorithms that involve one or more of these centrality measures may produce results that magnify this variability. Unfortunately, the variation in the measurement of centrality values is largely hidden from the end user.
Clearly, analyzing the same network in the same way, using different software, can produce divergent results. If the implementation of a given centrality measure differs from one program to another, they are at best two variants of the same measure and, at worst, two different measures that happen to share a name.
The clarity in communication that has characterized the development of methods in the field of social network analysis is less evident in software operationalization. This presents a threat to the validity of how those measures are employed. Definitional differences between programs exist and are not readily apparent to the average user. Although variations in centrality calculations hold the potential to increase the validity of a particular measure when applied in the appropriate context, such differences are frequently masked from the general user, resulting in increased potential for the misapplication of measures.
The present research has highlighted the value of knowing what one is getting into when considering new analytic software, and the importance of thoroughly vetting the topology of the network being analyzed. Add to that the tendency for most software packages to have some provision for porting data and/or measures between packages. The detected disparity in measures available in the current analytic packages indicates that such practices should be undertaken with caution – especially in cases where graphs contain loops, isolates, or disconnected components. If the basic measures differ between packages then it may be inadvisable to use the two packages together in an analysis that involves those measures.
The lack of consensus over how to operationalize the most common node centrality measures suggests some ontological variability within the social network analysis community. Disagreements over how various centrality measures should be operationalized would not be troubling if they were apparent to the user. But the differences highlighted above are far from clear to the average user. Lack of agreement over how to operationalize a measure is masked when a variety of approaches share a single name. The situation is exacerbated when software documentation does not clarify precisely which approach has been implemented.
The debate over how various network measures should be calculated is rich, and as old as the field itself. The community’s openness to new variations on established methods provides flexibility and a healthy diversity of analytic options. However, the advantages of such wealth are substantially diminished when the same measure is operationalized differently in each analytic package. Although a shared lexicon of terms and concepts exists within the social network analysis community, those terms and concepts are only generally – and not explicitly – applied. The programs used to perform social network analysis are disparate enough to create idiosyncratic analytic results.
The interfaces of each analytic package vary greatly and do not always default to the most commonly used variants of each measure. Without easily accessible documentation of the measurement assumptions and nomenclature used by each program, the assumption that centrality measures are equivalent and portable across programs cannot be sustained. The resulting variability of centrality results may also affect more complex algorithms that incorporate these basic measures. These basic differences could be resolved with agreed-upon defaults and naming conventions for variants of a particular measure or algorithm.
It is important that the variants of each measure be identified as distinct variations of a centrality – or other measurement – theme. It is not enough to identify a measure generally as “closeness centrality” if it varies from the basic measure identified by Freeman (or Sabidussi) – which nearly all do. Instead, the measure should be explicitly identified as a particular variant in order to better emphasize its unique attributes and trade-offs.
Naming conventions are important. If a measure or algorithm differs from others in order to address some given scenario or feature of network topology, then it bears unique attributes that more often than not constitute a trade-off at some level. It is far better for both the analyst and the community for these differences to be made transparent. Of the tested programs, UCINET appears to have gone the furthest in giving attribution to the different variants of each measure, though both R packages benefit from requiring the user to specify a measurement explicitly. Such clarity aids the analyst by distinguishing analytic approaches, and it aids the community in establishing the reliability of measurements between programs by making it much easier to directly compare results from different programs.
It is not necessary for each program to offer every available option – though several have clearly taken steps in that direction. It is likely to be much more helpful to the social network analysis community at large if the measures and algorithms that a program offers are fully identified for appropriate application of their properties. Proper identification will simplify discourse and improve the communication of methods. Such improvements in communication within the field also translate into increases in measurement validity when a measure is identified as a specific variant, rather than just belonging to a general category or class of measures.
Lastly, it should be noted that a lack of agreement from within the community on something as fundamental as naming conventions hints at an arbitrariness that is surprising given the care and rigor of those who have established and expanded the field of social network analysis. Freeman (1984, 2004) has repeatedly made a compelling case for clarity and precision in communication – as facilitated by mathematical notation – being the factor that set social network analysis apart from similar fields that rely more on natural language for clarification. The benefits of such precision are, however, often frustratingly beyond the reach of those using social network analysis software.
Further, as researchers from other fields continue to adapt and adopt social network analytic methods, the use of standard, specific terms for the available analytic options provides a clarity that helps newcomers see how their field can benefit from a network analytic approach. The converse should not hold: imprecise definitions should not become an invitation for some within the “hard sciences” to forego established network analysis methods in favor of feigning to invent them for themselves. Although co-option will likely continue, there has been an increase in the number of new entrants to social network analysis who give proper attribution (Freeman 2011). Clarity of communication will reinforce social network analysis as a mature and growing field, and deemphasize the perception of it as a general perspective or a mere category of tools (Knoke and Yang 2008, Snowden 2005).
In most cases, it is possible to force the centrality output for a network containing loops or disconnected components to be relatively consistent across all six platforms employed above. But such actions frequently require transformations or other preprocessing, and those steps are seldom stipulated since there is no real agreed-upon standard.
The “correct” measure is the one that is best suited to handle the idiosyncrasies of the data an analyst holds. For the analyst to make this assessment, they first need to know the topology of the network they are analyzing; and next, specifically how a measure is meant to operate, and its underlying assumptions. A more complete approach includes asking which variation of the measure is available, the strengths and limitations of that version, and how reliably one or more programs produce accurate measures. We have identified program and inter-program reliability issues under varied conditions. Similar comparisons between other programs and under different conditions are strongly recommended when weighing whether to use two or more analytic programs in conjunction with one another. Further evaluations of inter-program reliability will benefit from adding more types of variation: e.g., directed graphs, density variations, clusterability variations. Ongoing research in this topic will continue to be important as new entrants continue to discover the scalability and utility of the tools and concepts of social network analysis for deciphering increasingly diverse networks with complex topological features.