In order to investigate the structure of ideas, or schemata, of social networks, Martin (2017) investigated a very unusual set of three social networks. These are delusional social networks of alternative personalities described by a patient undergoing therapy for multiple personality disorder (David et al. 1996), now known as dissociative identity disorder (Kihlstrom 2005). In order to do this, Martin (2017) uses random graphs, specifically the
Martin (2017) finds that one of the networks contains what he describes as a large “
He concludes that the logic of these networks is spatial, with long path lengths indicating a “large world”, and that the bias against large hollow rings is indicative of a belief that a chain of relationships that “goes away” in space is not going to return. In summary, Martin (2017) concludes with the hypothesis that the root schema for social networks is local and spatial.
In this work, I will re-examine these networks using a different random graph model, the exponential random graph model (ERGM), which allows some more flexibility in the structures it can model, and also allows use of additional data (such as nodal covariates) not available in the purely structural
In addition, I will repeat this exercise with several other empirical networks (human and animal social networks), and a fictional character network, and compare the results with those for the delusional social networks.
Martin (2017) describes a “hollow ring” of degree ten as “a cycle containing ten nodes, no pair of which have a distance in the graph lower than that in the cycle”, so that there are no “short cuts” between nodes in the ring Martin (2017:16). This definition makes intuitive sense, and seems precise enough for us to understand it unambiguously (and for Martin (2017) to count such structures), however, I believe it is useful to relate it to existing definitions in graph theory literature, in which this idea has already been defined precisely.
First, I prefer the term “cycle” rather than “ring”, as the former is well-known in graph theory, while “ring” is generally not used in graph theory, being better-known as an algebraic structure. In the following definitions, all graphs are undirected, and all graphs considered in this work are undirected.
A
An
A
The preceding definitions are all standard and well-known in graph theory, and can be found in any discrete mathematics or computer science textbook, e.g. Roman (1989), or Wolfram MathWorld, e.g. Weisstein (2020). The following definitions, however, are not so well-known.
A
where
Note that although finding the largest cycle and largest chordless cycle in a graph are both
1Unless
To illustrate these definitions, consider the graph shown in Figure 1. This graph has seven cycles. Four of them are chordless (1–4–5–9–3–2–1, 4–6–5–4, 5–6–7–8–9–5, and 1–4–6–7–8–9–3–2–1), however only three of the chordless cycles are geodesic: the largest chordless cycle is not geodesic as the paths 4–5–9 and 6–5–9 are “shortcuts” across the cycle.
This graph has seven cycles (one is Hamiltonian). Four are chordless and three of those are also geodesic (and, in addition, convex). The cycle in red (around the “outside” of the graph) is chordless, but not geodesic.
A cycle
i.e., the distance along the cycle between any pair of vertices in the cycle is less than the distance between them in the graph excluding the cycle. This is not such a well-known concept, but is described in Hellmuth, Leydold, and Stadler (2014). Geodetic (or geodesic) convexity in graphs (more generally, not necessarily of cycles) had earlier been discussed in Batten (1983), Farber and Jamison (1986), Farber (1987). Note that these papers on “isometric”, “geodesic”, or “convex” cycles do not cite each other or mention any equivalent definitions with different names, although Hellmuth et al. (2014) also discusses isometric subgraphs and isometric cycles:
A subgraph
Convex implies isometric (Hellmuth et al. 2014:125).
An
An
An
We can see that this coincides with the definition of a geodesic cycle, and also with the description of a “hollow ring” by Martin (2017). I prefer the term “geodesic cycle” as it makes sense mathematically given its definition, without confusing it with other properties such as atomicity, and it dates back at least as far as Negami and Xu (1986).
For an Erdős–Rényi random graph, the expected number of cycles of a given length can be calculated (Erdős and Rényi 1960; Takács 1988), as can the probability of a chordless cycle (Łuczak 1991), and the expected length of the largest chordless cycle (Łuczak 1993). Geodesic cycles in random graphs were studied in Benjamini et al. (2011), and the length of the longest geodesic cycles in random graphs in Li and Shi (2018).
For more complex families of random graphs, such as the
ERGMs provide a way of modeling network ties based on structure and attributes. Given an observed network, we estimate parameters for local effects, such as closure (clustering), activity (greater tendency to have ties), homophily, and so on. The sign (positive for the effect occurring more than by chance, negative for less than by chance) and significance tell us about these processes, taking dependency into account. That is, the parameter tells us about the process occurring significantly more or less than by chance, given all the other effects in the model occurring simultaneously. ERGMs are widely used in the social sciences to model social networks (Robins et al. 2007a; Lusher, Koskinen, and Robins 2013; Amati, Lomi, and Mira 2018).
An ERGM is a probability distribution with the form:
where
Which configurations
So given the observed network
Rather than estimating model parameters to fit an observed network and then, if an ensemble of random networks is required, simulating random networks from the model as required by ERGM,
Martin (2017) mentions that a difficulty with ERGMs, along with other techniques such as block modeling (White, Boorman, and Breiger 1976), is that they require strong structural assumptions. He further notes that, for more flexible modeling approaches, the difficulty in obtaining a converged model for theoretically interesting effects means that model convergence itself becomes the criterion for the model selected Martin (2017:6). For these reasons, Martin (2017) instead uses the
I contend, however, that rather than necessarily being a difficulty, the strong structural assumptions made by the use of ERGM is in some ways an advantage. The social circuit dependence assumption means that statistical significance of parameters tells us something about the corresponding local processes generating the observed global structure of the network. The configurations (and their corresponding parameters) in the model can therefore chosen not purely to fit local structure (such as degree homophily and local clustering) but to test theoretically relevant hypotheses about local processes.
A potential disadvantage of the
As for the potential difficulty in obtaining a converged ERGM model due to problems with degeneracy, for example, it is true that this can be a challenging task. However advances in model specifications such as the use of the “alternating” or “geometrically weighted” model terms (Hunter and Handcock 2006; Snijders et al. 2006; Robins et al. 2007b), curved ERGMs (Hunter and Handcock 2006; Hunter 2007), and alternative estimation algorithms (Hummel et al. 2012; Byshkin et al. 2018) mean these degeneracy problems can generally be overcome (Schweinberger et al. 2019).
As described in David et al. (1996) and used in Martin (2017), the patient “Patricia” drew maps in which she represented her personalities as nodes and relationships between them as edges. I manually coded the networks from Patricia’s hand drawings as reproduced in David et al. (1996). Three such drawings are available, for the years 1990, 1992, and 1993. Network visualizations of these three delusional social networks are shown in Figures 2, 3, and 4, respectively.
Patricia’s 1990 network. Visualization created using the network R package (Butts, 2008, 2015).
Patricia’s 1992 network. Nodes marked with a star are marked as “Christian Alters” in Patricia’s original diagram, and nodes colored yellow are included inside the “Sphere of the Blue Flame” in Patricia’s original diagram. Visualization created with Pajek (Batagelj and Mrvar, 2004; Mrvar and Batagelj. 2016).
Patricia’s 1993 network. Nodes marked with a star are from the box labelled “(Behind)” Patricia’s original diagram, and nodes colored yellow are included inside the “Sphere of the Blue Flame in Patricia’s original diagram. Visualization created with Paiek (Batagelj and Mrvar, 2004; Mrvar and Batageli, 2016)
The 1990 network has no nodal attributes other than names. However, the 1992 and 1993 networks have some nodal attributes. In the original drawings, shaded nodes are labeled as “Integrated alters” and I coded these as a binary attribute
Martin (2017) includes a detailed discussion of the structure and evolution of these networks, however due to their unusual nature, some more discussion of what they might mean is warranted. According to David et al. (1996:139–140):
It would be hard to conceive of a single mind capable of sustaining dozens of other minds, lives, and relationships, leaving aside the question of whether or not such a feat is intended.
and David et al. (1996:143):
The illusion of unity of consciousness is not exposed in MPD [multiple personality disorder] but is repeated over and over again. One illusory consciousness gives way (or joins) another. Each is a coherent autonomous homonculus [
Without referring to Freud, David et al. (1996) refers to MPD as “a means of dealing with social and interpersonal conflict” that “becomes fossilised and embellished…” David et al. (1996:143). Freud (1950:211–212), famously, had the view, that, in relation to paranoia:
In every instance the delusional idea is maintained with the same energy with which another, intolerably distressing, idea is fended off from the ego.
These theories of how such delusions may arise, however, give us no insight into what the lines in Patricia’s diagrams (network edges) might represent. According to Martin (2017:5):
In many ways, this returns us to core intuitions of early network analysts: that we are interested in some sort of intertwining of lives, but we might be unable to specify a single content to the nature of the ties involved. … Patricia’s maps presumably are indicating a more fundamental connection, one perhaps as obscure as our own strong feelings.
However, is not “a single mind capable of sustaining dozens of other minds, lives, and relationships” David et al. (1996:139–140) just (part of) what a novelist, or screenwriter, or any other fiction writer must have, to some degree? The characters (“coherent autonomous homunculi”) in a work of fiction are individual and coherent, but they perhaps have a common hallmark, from the author, so, leaving aside any consideration as to how “real” the alter are, it makes sense to consider Patricia’s alters just as we might consider fictional characters. They and their relationships are not completely arbitrary or random, but follow some schema which makes them meaningful.
In addition to the delusional social networks, I also use six other networks from a variety of domains.
The first is a fictional character network. The
The
The
The
The
The
Summary statistics of all the networks considered are shown in Table 1.
Summary statistics of the networks.
Network | N | Components | Mean degree | Density | Clustering coefficient | Assortativity coefficient |
---|---|---|---|---|---|---|
Patricia 1990 | 14 | 1 | 2.57 | 0.19780 | 0.40000 | -0.25000 |
Patricia 1992 | 85 | 2 | 2.21 | 0.02633 | 0.10345 | -0.46399 |
Patricia 1993 | 107 | 5 | 2.13 | 0.02010 | 0.10266 | -0.37400 |
Grey’s Anatomy | 44 | 4 | 2.09 | 0.04863 | 0.00000 | -0.22567 |
Dolphins | 62 | 1 | 5.13 | 0.08408 | 0.30878 | -0.04359 |
Zachary karate club | 34 | 1 | 4.59 | 0.13904 | 0.25568 | -0.47561 |
Kapferer tailor shop | 39 | 1 | 8.10 | 0.21323 | 0.38506 | -0.18269 |
Law firm friendship | 71 | 3 | 11.24 | 0.16056 | 0.44862 | 0.07948 |
High school friendship | 134 | 3 | 6.06 | 0.04556 | 0.47540 | 0.28718 |
Note: All networks are undirected. The statistics were computed using the igraph (Csárdi and Nepusz, 2006) package in R (R Core Team, 2016).
For each of the networks considered, ERGM models are estimated using the statnet software (Handcock et al. 2008; Hunter et al. 2008; Handcock et al. 2016a,b). Unless otherwise noted, the default MCMCMLE (Markov chain Monte Carlo maximum likelihood estimation) estimation method is used; sometimes the “Stepping” algorithm (Hummel et al. 2012) is used instead. Where a curved ERGM is used, this is noted by the estimation of the corresponding decay parameter
ERGM models of Patricia’s 1990 network.
Effect | Model 1 | Model 2 | Model 3 |
---|---|---|---|
Edges | -2.023 (0.507)*** | -2.030 (0.503)*** | -2.391 (0.669)*** |
GWESP | 0.697 (0.341)* | 0.710 (0.355)* | 0.861 (0.505) |
Degree 2 | 1.029 (0.573) | ||
Degree 2-3 | 1.006 (0.568) | ||
Degree 3 | 1.501 (0.584)* | ||
AIC | 90.62 | 90.61 | 84.12 |
BIG | 98.16 | 98.14 | 91.65 |
Note: Estimated with the statnet software (Handcock et al., 2008; Hunter et al., 2008; Handcock et al., 2016a,b). Estimated standard errors in parentheses. The GWESP
Only models that show acceptable convergence and goodness-of-fit according to statnet diagnostics are included in the results. For each network, the “best” model, according to AIC, BIC, and goodness-of-fit plots was selected, and 100 networks simulated from that model using statnet.
Random graphs from the
The geodesic cycle length distributions are then computed from these simulated networks, using the find_large_atomic_cycle algorithm (Gashler and Martinez 2012) as implemented in the Waffles machine learning toolkit (Gashler 2011) (software was written to call this toolkit routine in order to find geodesic cycles of all possible lengths). Cycle and chordless cycle distributions from the simulated networks are computed using the CYPATH program (Uno and Satoh 2014). Box plots were generated using the ggplot2 R package (Wickham 2009) and follow the default convention that the lower and upper hinges show the first and third quartiles, with the whiskers extending to the largest or smallest value no further than 1.5 times the interquartile range from the corresponding hinge.
Code and scripts written for this work are available from
Table 2 shows ERGM models for Patricia’s 1990 network. As is apparent from Figure 2 and discussed in Martin (2017), this network hardly resembles a typical social network at all, having only nodes of degree two or three, a large geodesic cycle (“hollow ring”) of length 10, and in fact resembles a connected caveman graph (Watts 1999). There is therefore little to be gained by attempting to interpret these model parameters, except to note that we can indeed obtain converged ERGM models which fit this network adequately, and that Model 3 indicates an over-representation of nodes of degree 3.
Table 3 shows ERGM models of Patricia’s 1992 network. As is clear from Figure 3, and discussed in Martin (2017), this network is far larger and very different from the 1990 network, and contains node attributes. We can conclude from these models that there is a significant tendency against preferential attachment on degree (the GWDEGREE parameter is positive and significant). This is not what we might have expected from the
ERGM models of Patricia’s 1992 network.
Effect | Model 1 | Model 2 | Model 3 |
---|---|---|---|
Edges | -4.588 (0.311)*** | -6.047 (0.561)*** | -8.392 (0.775)*** |
GWDEGREE | 1.397 (0.503)** | 1.908 (0.560)*** | 1.838 (0.572)** |
GWESP | 0.793 (0.173)*** | 0.764 (0.178)*** | 0.608 (0.177)*** |
Activity Christian | 0.734 (0.196)*** | 0.772 (0.217)*** | |
Homophil y Christian | 0.377 (0.205) | 0.305 (0.215) | |
Activity Integrated | 0.232 (0.310) | 0.150 (0.331) | |
Homophily Integrated | 0.427 (0.350) | 0.290 (0.380) | |
Activity Sphere | 0.646 (0.200)** | ||
Homophily Sphere | 2.744 (0.513)*** | ||
AIC | 854.40 | 843.20 | 782.60 |
BIC | 872.90 | 886.40 | 838.20 |
Note: Estimated with the statnet software (Handcock et al., 2008; Hunter et al., 2008; Handcock et al., 2016a,b). Estimated standard errors in parentheses. The GWDEGREE decay parameter is fixed at 0.5 and the GWESP
Models 2 and 3 incorporate nodal attributes as well as structural information, which is not possible with the
Table 4 shows ERGM models for Patricia’s 1993 network (Figure 4). As discussed in Martin (2017), this network is not very different from the 1992 network, and this is reflected in the ERGM models. One difference, however, is that the Christian and Integrated attributes, which had no statistically significant effects in the models for 1992 network (Table 3), now both have positive significant effects for activity and homophily.
ERGM models of Patricia’s 1993 network.
Effect | Model 1 | Model 2 | Model 3 | Model 4 |
---|---|---|---|---|
Edges | -4.941 (0.282)*** | -6.609 (0.470)*** | -9.914 (0.869)*** | -10.169 (0.901)*** |
GWDEGREE | 1.501 (0.443)*** | 2.384 (0.547)*** | 2.314 (0.550)*** | 2.249 (0.538)*** |
GWESP | 0.870 (0.163)*** | 0.835 (0.168)*** | 0.656 (0.169)*** | 0.643 (0.172)*** |
Activity Christian | 0.708 (0.190)*** | 0.747 (0.203)*** | 0.789 (0.203)*** | |
Activity Integrated | 0.540 (0.243)* | 0.507 (0.258)* | 0.580 (0.247)* | |
Homophily Christian | 0.473 (0.203)* | 0.453 (0.205)* | 0.512 (0.214)* | |
Homophily Integrated | 0.725 (0.252)** | 0.670 (0.260)** | 0.828 (0.271)** | |
Activity Sphere | 0.660 (0.192)*** | 0.729 (0.194)*** | ||
Homophily Sphere | 3.611 (0.730)*** | 3.600 (0.744)*** | ||
Activity Behind | 0.631 (0.377) | |||
AIC | 1094.00 | 1062.00 | 975,60 | 975.00 |
BIC | 1114.00 | 1108.00 | 1035.00 | 1041.00 |
Note: Estimated with the statnet software (Handcock et al., 2008: Hunter et al., 2008; Handcock et al., 2016a,b). Estimated standard errors in parentheses. The GWDEGREE decay parameter is fixed at 0.5 and the GWESP
ERGM models of the other networks are shown in Appendix B.
Figures 5–7 show the distribution of geodesic cycle lengths and of the largest geodesic cycle length for Patricia’s 1990, 1992, and 1993 networks, respectively. In agreement with the results described by Martin (2017) using the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for Patricia’s 1990 network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for Patricia’s 1992 network n the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for Patricia’s 1993 network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
For the 1990 and 1992 networks, the ERGM does not fit the maximum geodesic cycle length or geodesic cycle distribution any better (or worse) than the
Distribution of geodesic cycle sizes (bottom) for Patricia’s 1993 network. The points shown as red diamonds joined by the red line are the values in the observed network, with the box plots showing the values in 100 networks simulated from the
Figures 9–14 show the corresponding results for the other networks. Unlike Patricia’s networks, in all of these networks the ERGM shows a good fit to the largest geodesic cycle sizes in the observed networks, and the fit to the overall geodesic cycle length distributions is also all acceptable for all of these networks.
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the Grey’s Anatomy sexual contact network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the dolphin social network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the Lazega law firm friendship network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the Zachary karate club network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the Kapferer tailor shop network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
Largest geodesic cycle size (top), and distribution of geodesic cycle sizes (bottom) for the high school friendship network. In the top plot, the dashed red line is the value in the observed network, with the box plots showing the values in 100 networks simulated from the
With two exceptions, the
In the high school friendship network (Figure 14), the ERGM fits the maximum geodesic cycle length better than the
Neither ERGM nor
This work suggests that geodesic cycle length distributions could be useful goodness-of-fit diagnostics for social network models, in addition to the usual ones of degree distribution, maximum geodesic distance, edge-wise and dyad-wise shared partners, and triad census, as used in statnet. In the case of empirical networks, that ERGMs, explicitly based on their local (social circuit) dependence assumption, reproduce well the (non-local) geodesic cycle size distribution is an encouraging confirmation that purely local interactions really can produce the observed global structures.
The high school dating network from the Add Health research program described in Bearman, Moody, and Stovel (2004) also notably contains a large geodesic cycle (“hollow ring”). Bearman et al. (2004) propose a normative proscription against four-cycles (“don’t date your old partner’s current partner’s old partner”), based on low number of four-cycles in their data relative to simulated networks. However, Rolls et al. (2015) find in a more sophisticated ERGM model of this data with acceptable goodness-of-fit that small numbers of four-cycles are generated. The (much smaller) fictional Grey’s Anatomy sexual contact network contains seven four-cycles (of which all seven are chordless and five are geodesic) – violating this proposed proscription could be because it makes compelling entertainment (Lind 2012). I could not find a converged ERGM with a four-cycle term for this network to explicitly test for over- or under-representation (neither could (Lind 2012)). But note the ERGM (but not the
It could be informative to repeat the modeling and geodesic cycle size comparisons with other fictional networks, specifically those with a single author, to test the hypothesis that such cases would also produce anomalous geodesic cycle length distributions like Patricia’s. However, fitting ERGMs to these is difficult. There is usually a main character, who is linked to most of the other characters, creating a single very strong “hub” node, which causes problems for ERGM model fitting. This problem can potentially be overcome by fixing the ties for that character, but I have not been able to get good converged ERGMs for the publicly available data sets I have tried (Les Misérables, David Copperfield, Anna Karenina, Huckleberry Finn). Note that this is not the case with Patricia’s networks: “Patricia” is not, as it were, the star of her own story in relation to her alters. I was able to fit all three Patricia networks well with ERGM, as well as the other six networks described earlier. ERGMs also have the advantage of allowing node attributes, while the
This difficulty with ERGMs (having to only use data for which a converged model with good fit can be obtained) is one reason Martin (2017) gives for using