01 Oct. 2020

Introduction

I am delighted to see Stivala’s piece on geodesic cycle length, which responds to and goes considerably beyond my 2017 JOSS article. This article (1) regularizes the terminology I used; (2) replicates my analyses using exponential random graph models; and (3) applies these models to other data sets to examine the degree to which these models predict geodesic cycle length. All of these constitute a welcome (and impressively done) contribution. Yet, I also have a sense that some of the motivation of this paper is to establish the superiority of the ERGM approach, and to treat all others as, at best, fallbacks.

1 It should be noted that the version of this paper that I read as a reviewer, and on which I had suggested commenting, did not include the

Here, I will argue that the social networks community is increasingly moving towards an ill-considered ritualization of ERGMs, and in such a way as to undermine the distinctiveness of network analysis/mathematical sociology, which had been the great hold-out against the “saming-of-everything” associated with the ideational sink of mainstream sociology. If we do not change course, we will import into our own field the contradictions that have, in the past generation, been recognized, but not solved, in mainstream statistical practice. I first address the contribution made by Stivala to the substantive problem at hand, then discuss the ritualism in current usage. I propose that close consideration demonstrates that the attributes of the ERGM model that Stivala suggests make it superior to other techniques are more deleterious than advantageous. I try to demonstrate that the collapse of the ERGM into the general linear modeling paradigm tends to lead us, and in this case has led Stivala, to make the same interpretive errors that bedevil current social statistics more generally. I close by making a few suggestions as to where we can go in the future.

Patricia’s network in comparative perspective

Regarding the substantive claims, I think that Stivala here shows some of the advantages of using the same model on many networks. He confirms my arguments about the geodesic cycles being surprisingly large, and surprisingly small, in Patricia’s first two graphs (respectively). Stivala also notes that this is not quite true for the final graph, an analysis I had not done, as this graph had seemed to me to be the same as the second except for the addition of a heteroplanar component, which complicates things. (I think that analyzing the two planes together leads to problematic results, but I still should have done the analyses that Stivala does and reported these results.) Further, Stivala takes a set of both real world and fictional networks and shows that in none of these does a straightforward, out-of-the-box parameterization of an ERGM fail to reproduce the largest geodesic cycle, or indeed the distribution of geodesic lengths. As Stivala says, this supports the argument that Patricia’s mental model was such that the geodesic was an important consideration. Two different null models (the
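The central quantity here can be made concrete. A geodesic (isometric) cycle is one in which the distance between any two of its nodes along the cycle equals their distance in the whole graph. A minimal, stdlib-only sketch of a checker (the function names and the toy graph are my own illustration, not anything from Stivala’s code):

```python
from collections import deque

def bfs_distances(adj, source):
    """All shortest-path (geodesic) distances from source in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_geodesic_cycle(adj, cycle):
    """A cycle is geodesic iff, for every pair of its nodes, the distance in
    the graph equals the shorter-way-around distance along the cycle."""
    k = len(cycle)
    for i, u in enumerate(cycle):
        dist = bfs_distances(adj, u)
        for j in range(i + 1, k):
            cycle_dist = min(j - i, k - (j - i))
            if dist.get(cycle[j]) != cycle_dist:
                return False
    return True

# A 4-cycle 0-1-2-3 with a chord 0-2: the chord shortcuts the outer cycle,
# so the 4-cycle is not geodesic, but each triangle is.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(is_geodesic_cycle(adj, [0, 1, 2, 3]))  # → False
print(is_geodesic_cycle(adj, [0, 1, 2]))     # → True
```

Finding the longest such cycle is of course harder than checking one, but the check is what makes the statistic well-defined.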

Thus Stivala provides much stronger evidence than had I that there is something comparatively unusual in the networks made by Patricia. Whether this supports my argument as to the fundamentally

Ritualism

In my

Cases in which researchers change the model or drop data to achieve fit may be less common now than when I wrote

2 However, I do note that Stivala saw fit to specify that “Only models that show acceptable convergence and goodness-of-fit according to statnet diagnostics are included in the results.” This does seem to imply that not all models

Of course, sometimes science tells us that our first questions were unanswerable, and redirects us to ones that we

I also do not deny that ritualism can help a field move towards increased reliability, the value of which is not to be underestimated. The opposite pole of ritualism – frenetic individualist innovation, never doing the same things twice – is just as incompatible with scientific advance as is ritualism. It may sound glorious to call for an end to fundamentalist orthodoxy, and to “let a thousand flowers bloom.” But when every piece is both a substantive claim and a methodological innovation (which did tend to characterize the earlier period of social networks research), something is wrong, even if a good time is had by all. What we should be prizing, then, are robust techniques that allow for comparable, theoretically relevant, answers across a wide variety of data sets. Stivala is confident that the ERGM is just such a technique. I am not so sure, and I

3 We should remember the short life of “best practices” in this field: since the development of the first

This seems like a good time to consider the purported advantages. But before simply assuming that we know what to tally up as a plus and what a minus, it is worth being clear as to what we want our models to

History

In the late twentieth century, ideas coming from mathematical sociology had suggested some strong models for informal social structures, especially the notion of balance, still a topic of serious investigation today (Rawlings and Friedkin 2017). Since those structures were not observed in all their clarity in sociometric data, there was an attempt to make statistical tests for

Their next major effort (Holland and Leinhardt 1981), although mathematically related to the triad analysis, seemed to have the opposite problem. Individual attractiveness and expansiveness (tendency to make choices) were estimated, along with a tendency to reciprocity, but this required getting rid of almost all the structure, leading Breiger (1981) to use what was for him extremely strong language: he considered this “inappropriate.” The choice of the exponential function taken by Holland and Leinhardt (1981), despite the attendant difficulty of dealing with the normalizing constant, made a great deal of sense as a general approach. It seemed that there was an expectation among researchers that with grunt work, other probability distributions, presumably less individualistic, would be added to the p1, leading to a large family of interpretable models that generated probability distributions of graphs.

Shelby Haberman had noted the relation of the p1 model to loglinear models, and Stanley Wasserman jumped on this, trying to push a general loglinear modeling framework as our “go to” for social network analysis (the reason Wasserman and Faust (1994) devote so much space to this). But Frank and Strauss (1986) had recognized the significance of Besag’s (1974) work on the Hammersley–Clifford theorem for statistical graph theory. This theorem demonstrates that the probability distribution of a “Markov” graph – one where two ties are conditionally independent if they are between four distinct nodes – can be factored, as it is a Gibbs distribution. This is, I want to emphasize, a
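The exponential-family form this factorization yields – P(G) proportional to exp(θ·s(G)) for a vector of graph statistics s(G) – can be seen whole on a toy scale, where the normalizing constant can still be computed by brute force. The statistics and parameter values below are purely illustrative, not any fitted model; the point is only that the sum being computed explicitly here is exactly what becomes intractable (and what MCMC exists to avoid) for realistic n:

```python
import math
from itertools import combinations, product

n = 4
pairs = list(combinations(range(n), 2))

def stats(edge_set):
    """Sufficient statistics of a toy ERGM: edge count and triangle count."""
    edges = len(edge_set)
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return edges, triangles

theta_edge, theta_tri = -1.0, 0.5  # illustrative parameter values

# Enumerate all 2^(n choose 2) graphs and compute exp(theta . s(G)) for each.
weights = {}
for bits in product([0, 1], repeat=len(pairs)):
    edge_set = frozenset(p for p, b in zip(pairs, bits) if b)
    e, t = stats(edge_set)
    weights[edge_set] = math.exp(theta_edge * e + theta_tri * t)

Z = sum(weights.values())  # the normalizing constant, computed exactly
prob = {g: w / Z for g, w in weights.items()}
print(len(prob), "graphs; probabilities sum to", round(sum(prob.values()), 6))
```

On four nodes there are only 64 graphs; on forty nodes there are 2^780, which is why the elegance of the factorization and the tractability of estimation are entirely separate questions.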

Strauss and Ikeda (1990) then showed that Besag’s (1975) pseudo-likelihood approach could be used for such graphs. As Stivala points out, there had been concerns about degeneracy from the start – models that worked well when anchored to a spatial substrate, like those in physics and in geographical science, flapped about when transferred to networks like a poorly tied-down piece of plywood on a car roof. But the pseudo-likelihood estimates behaved better, and Wasserman and Pattison (1996; Pattison and Wasserman 1999) seized upon this as the way forward to a general approach. Indeed, they christened it

But just as the very success of Nelder and Wedderburn’s unification came at a great cost of allowing sociologists to have a single (and singularly false) vision of society in their minds (Abbott 1988), so the replacement of

The ELM

Let me back up for a moment, to one of the earliest models for random graphs, the Erdős–Rényi (1959) model. It is quite common to hear or read that the Erdős–Rényi model is a probabilistic random graph model characterized by the number of nodes
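For concreteness, the two models commonly discussed under this name can be put side by side: the 1959 Erdős–Rényi model draws uniformly among all graphs with a fixed number of edges, while the independent-edge model usually taught under the same name is Gilbert’s (1959) G(n, p). A stdlib-only sketch (function names mine):

```python
import random
from itertools import combinations

def gnm(n, m, rng):
    """Erdos-Renyi (1959): draw uniformly from all graphs on n nodes
    with exactly m edges."""
    return set(rng.sample(list(combinations(range(n), 2)), m))

def gnp(n, p, rng):
    """Gilbert (1959): include each possible edge independently with
    probability p."""
    return {e for e in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(0)
g1 = gnm(10, 15, rng)     # always exactly 15 edges
g2 = gnp(10, 1 / 3, rng)  # 15 edges only in expectation
print(len(g1), len(g2))
```

The two coincide asymptotically for many purposes, but they are different probability distributions, and only the first is a conditional uniform distribution of the kind the older combinatorial tradition worked with.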

4 And we want to attribute it to Erdős because he was such a trip!

And I believe that the parametric hegemony may have some unhappy side-effects. Harrison White, whose great mathematical vision, based on Lévi-Strauss’s structuralism, was really the inspiration for much of network analysis worthy of the name, once wrote a minor paper, “Parameterize!” (2000). He noted that the great work of his that energized social network analysis – his studies of kinship systems, and then his transposition of these to informal networks via the notions of structural equivalence and role algebras – had nary a parameter in it. He considered this a fault, and was excited that, in his work on markets, he was going to be able to reduce the variation into a single exponentiated parameter. He hoped to do what science does – to look for relations between invariants. What sociology does is something very different, and it is (unlike most physical sciences) based on the Great Divide between the left-hand side and the right-hand side, and the notion that all parameters are fundamentally of the same (intellectual) nature. I wish that he had warned his readers that not

In terms of programming convenience, there is much to be said for the capacity to reconceive any formal data analysis as an application of a linear model of some very general form. But in terms of the direction of our capacity to generate important arguments about the world, especially structural models, it can prove deleterious. I propose that we think not so much about the

The problem of covariates

Recall that when the excitement began for what became the ERGM, it did not have to do with the fitting algorithm used, nor with the notion that parameters were maximum likelihood estimates, but with the finding that a Markov graph could (assuming homogeneity of local effects) be factored in a

Stivala takes for granted that it is a point in favor of the ERGM that it can take various other covariates into account. Indeed it often

5 There is much to admire in the derivation of the geometrically weighted effects that have been used to damp the wild behavior that led to degeneracy. But even this is not always enough, and practitioners have found that for some data, they need to “ride the brakes” by including other parameters that introduce some heterogeneity to the edges. When refusing to get rid of my old car, I swear to my wife that the automatic choke on the carburetor works fine – you just have to stick a fork in it when it’s cold. These definitions of “work fine” display the admirable loyalty of the romantically attached, but are not the same “work fine” as others have a right to expect.

). But are we sure that we

6 And I do not even want to go into the many cases of incorrectly interpreted ERGM models; even when they are correctly specified (and it is easy for complex models with covariates to be incorrectly specified), nested terms are notoriously tricky for even experienced analysts to deal with, and many articles get published that rest on misinterpreted parameters.

This point was made quite clearly by Lieberson (1985). The factors that shape an occupational structure, he insisted, are not the same as those that explain who, conditional on the existence of this structure, gets which job. Yet, the ELM approach to mobility tended to lead people to blur these two and make implausible counterfactuals (if everyone goes to college, everyone will be a professional). So, too, if there are structural properties that we are attempting to isolate, we will mis-estimate these if we confuse individual covariates, that might help determine

Indeed, in some interesting cases, individual covariates are as likely as not to be endogenous to structure. If the actual structure of a high school is the ranked cliques model, the highest ranking cliques may control access to certain extracurriculars which, if “taken into account” in the model (say, by entering a dummy variable for SHARED_EXTRACURRICULAR), might lead us to reject the ranked cliques model. The point is not that it never makes sense to enter dyadic or individual covariates, but that we must beware of falling back into the strange sociological conviction that the best model excludes no significant predictors (only, perhaps, relying on a claimed causal order to refrain from including post-treatment confounders). This conviction is based on an untenable assumption – and this is one of the few planks on which both those dedicated to causal analysis and those opposed to it can agree – that partialed coefficients on the right side can be treated as “effects,” whose precise metaphysical nature is left unexplored. The use of the ELM reinforces this, and I think we can even see this in Stivala’s analyses.

Stivala argues that the ERGM has the advantage of being able to take nodal attributes into account, and does this in Models 2 and 3 of Table 3, the first suggesting that Christian alters have higher degree, and the second that (not surprisingly) those in the Sphere of the Blue Flame are more likely to be tied to one another than are random pairs of nodes. But if one examines Patricia’s maps, we see that this Sphere is only one of four large clumps of nodes (there are two components, the larger of which easily breaks into three pieces with one or two cuts to separate each). She has labeled this one, but what if she had labelled another (for example, the one to the left of the Sphere of the Blue Flame)? We of course would find homophily here as well. What if she had labeled this “Sphere of Ju-Ju” after the most central actor? What if she in fact had labelled every “wheel” (every structure consisting of a hub and its spokes), and we were to take this into account? As we added more and more of these seemingly nodal attributes, our structural parameters would of course change. But in a particular way – we can imagine that, at the end of the day, we would no longer have any idea as to the nature of the structure, because we had misparameterized it as nodal attributes! It is just the nightmare that would cause Harrison White to faint in horror – all structure had been turned back into seemingly individual variables. And yet it is difficult to prevent this sort of regress once one decides to envision one’s job as fitting an ELM. We are ineluctably drawn to add parameters, and the only way to keep track of what we are doing is to treat each parameter as “an effect,” thereby reifying it and projecting into our vision of the world that which is the most convenient interpretation for each that we can think of. And I think we see the way the ELMification of the ERGM pushes our interpretations in Stivala’s discussion of Patricia’s maps.

What do ERGMs tell us?

The sorts of interpretive slippages I will point to in Stivala’s arguments are, I think, characteristic of the way in which sociology has had to make use of the ELM, and how increasingly we see social network researchers interpreting the ERGM. Widespread or not, however, these issues get to the heart of the choice before us, and so I want to look carefully at how, after all the work done, the results of the ERGM are interpreted. Stivala writes, “Given an observed network, we estimate parameters for local effects, such as closure (clustering), activity (greater tendency to have ties), homophily, and so on. The sign (positive for the effect occurring more than by chance, negative for less than by chance) and significance tell us about these processes, taking dependency into account. That is, the parameter tells us about the process occurring significantly more or less than by chance, given all the other effects in the model occurring simultaneously.” This is the way we generally write about our models in sociology. We tend to take the ambiguity of the term “effects” (which can refer simply to a certain type of statistical predictor, but carries connotations of causality) as a cover for stretching a bit beyond what we really are doing.

7 We lead like a runner on first base, until that reviewing pitcher trained at Harvard snaps the ball to the first baseman, at which point we frantically dive back to safety, hitting the dirt on our bellies!

But in this case, I do not think that Stivala is correct to say that parameters in ERGMs should be interpreted as giving us

8 Snijders (2001) shows this to be true in some conditions of detailed balance, though see Leifeld and Cranmer (2019) for more.

Even in a TERGM, these parameters must be understood as

A possible example here is raised by Stivala, in noting that Bearman, Moody, and Stovel (2004) find structures of romantic attachments to approximate spanning trees, but also to form structures with a large geodesic cycle. As Stivala notes, while “Bearman et al. (2004) propose a normative proscription against four-cycles (‘don’t date your old partner’s current partner’s old partner’),” there could be other reasons for such a structure. Stivala makes reference to one such alternative, an extremely-hard-to-fit ERGM that was able to produce similar data. But such structures will also arise if boys and girls are distributed in a space of likeness (say, a two-dimensional one), they have a relatively low degree, and there are strong tendencies for them to have relationships to those close in space.

9 Demonstration of this is left as an enjoyable exercise to the reader.
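The exercise can at least be set up in a few lines: scatter boys and girls in the unit square, tie each boy to his few nearest girls, and count four-cycles. All parameter values below are my own illustrative choices; whether four-cycles turn out scarce depends on density and dimension, which is the point of the exercise, so the sketch asserts nothing about the outcome:

```python
import random
from itertools import combinations

def spatial_bipartite(n_each=50, k=2, seed=0):
    """Boys and girls placed uniformly in the unit square; each boy ties to
    his k nearest girls. Low degree plus spatial likeness, with no rule at
    all against four-cycles."""
    rng = random.Random(seed)
    boys = [(rng.random(), rng.random()) for _ in range(n_each)]
    girls = [(rng.random(), rng.random()) for _ in range(n_each)]
    edges = set()
    for i, (bx, by) in enumerate(boys):
        nearest = sorted(
            range(n_each),
            key=lambda j: (bx - girls[j][0]) ** 2 + (by - girls[j][1]) ** 2,
        )[:k]
        for j in nearest:
            edges.add((i, j))  # edge: boy i -- girl j
    return edges

def four_cycles(edges):
    """Count four-cycles: for each pair of boys, count pairs of girls
    they share."""
    girls_of = {}
    for i, j in edges:
        girls_of.setdefault(i, set()).add(j)
    count = 0
    for a, b in combinations(sorted(girls_of), 2):
        shared = len(girls_of[a] & girls_of[b])
        count += shared * (shared - 1) // 2
    return count

edges = spatial_bipartite()
print(len(edges), "edges,", four_cycles(edges), "four-cycles")
```

Varying k, the number of nodes, and the dimensionality of the space then shows how tree-like structure with long cycles can emerge from proximity alone.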

One might acknowledge the force of this critique but excuse such interpretive slippage (from parameter to effect, from effect to process) as a commonly cut corner in social statistics – we rarely explicitly remind the reader that our results are only interpretable if our assumed model is correct (one might argue), and so most of us bear this in mind as a mental reservation, making the omission of explicit mention innocuous. But such an omission cannot be seen as innocuous if it is made as part of an argument for the superiority of a certain model!

One might also accept my argument in principle, but demand that evidence be shown of a concrete misinterpretation. Such evidence may rarely be forthcoming, if all we have is a single model with parameters crying out for tendentious interpretation. This is where the virtues of comparing results across

10 I have corrected what I believe to be typographical errors in this sentence.

degree (the GWDEGREE parameter is positive and significant). This is not what we might have expected from the

One will note here the unwarranted leap from the GWDEGREE parameter, which really describes the ceteris paribus degree

I have the feeling that many social networkers go from the admirable properties of the

11 I know this because I once urged a student at a conference to use SIENA for his/her project, as it would be so elegant, pointing out that Snijders was literally quite close at hand if there were some issues in adapting the model to the data structure. “I already asked him,” the student reported sadly, “and he said he didn’t think the assumptions of the model were satisfied by my data, and so he wouldn’t do it.”

) We make assumptions because we

Even if the model is correct, its coefficients are not necessarily indicative of any particular

How should we interpret ERGMs?

What do we want from ERGMs? One common answer among non-network-researchers is that we want them to adjust our models for dyadic data to deal with the statistical non-independence of observations. For example, someone is interested in high school friendship formation, and has a model including observed covariates, but a conventional logistic regression on these covariates, even if the model is correct, will not reach the maximum likelihood estimates of the parameters because of the violation of conventional sampling axioms. Since, however, we invariably do

12 To the extent that such models are really oriented to a rigorous test of whether

One possibility would be to return to the notion that we are attempting to generate known distributions of graphs – the parameters are merely a flexible way of doing what was being done in, say, the U|MAN analysis via combinatorics. (This is the interpretation of Fuhse 2018.) We want to preserve a theoretical understanding of the nature of the predictions, but the precise value of the parameters is uninteresting. But if this is the case, then a method like that of the
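The U|MAN idea can be stated generatively: fix the dyad census – the counts of Mutual, Asymmetric, and Null dyads – and draw uniformly from all digraphs with that census. A stdlib-only sketch of such a draw (function name mine, not from any package):

```python
import random
from itertools import combinations

def sample_uman(n, m, a, rng):
    """Draw a digraph uniformly from the U|MAN distribution: m mutual dyads
    and a asymmetric dyads, with null dyads filling the remaining pairs."""
    pairs = list(combinations(range(n), 2))
    assert m + a <= len(pairs)
    rng.shuffle(pairs)
    arcs = set()
    for i, j in pairs[:m]:            # mutual dyads: both arcs
        arcs |= {(i, j), (j, i)}
    for i, j in pairs[m:m + a]:       # asymmetric dyads: one random direction
        arcs.add((i, j) if rng.random() < 0.5 else (j, i))
    return arcs                       # the rest of the pairs remain null

g = sample_uman(10, m=5, a=10, rng=random.Random(0))
print(len(g))  # → 20 arcs: 2*5 mutual + 10 asymmetric
```

Here the “parameters” m and a are not estimated effects at all; they are simply the observed census, and inference proceeds by comparing the observed graph to this reference distribution – which is the interpretation of ERGM parameters as generating devices, rather than effects, that is at issue.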

We see some residue of this way of thinking about ERGMs in the concern with fit. There was a time when models with a low R^2 were considered to be “bad models.” We no longer think this way – the degree of residual variance can be high in a model with important (and not merely statistically significant) predictors (and, as Mike Hout used to say, “Who wants to live in an R^2 = 0.9 world, anyway?”). But in ERGM modeling, “fit” is used (quite properly) in a different way – if we cannot reproduce the graph statistics, then we are not actually generating the family of graphs that we might be claiming is the class which includes the observed.

It is, therefore, quite understandable that fit becomes of great concern to those using ERGMs. Still, fit is only a means to an end of making meaningful statements about the world – it is not an end in itself. We know that bad models (e.g., those that condition on post-treatment confounders, for those doing causal modeling) can have better fits than good models. Yet I see with the current use of ERGMs a tendency to slip into a working consensus that one’s job is to fit data, and the better the fit, the better job has been done – even if this does not advance our understanding. I think there are two problems with this. One is that it can lead researchers to prefer ad hoc elaborated models that fit any particular case (or that

For this reason, I think that the out-of-the-box models that Stivala uses are the

The second problem with the emphasis on fit is that it actually flies in the face of what I at any rate hold to be the most successful procedure for using statistics to build social scientific knowledge, namely, falsification. The key evidence supporting my argument about the root social networks schema being spatial was not the capacity of a spatial network model to fit the data – this capacity should be obvious upon inspection of the raw data. Rather, it was the

Let us see whether fit is a good guide for determining when a model is helping us by considering the difference between the ERGM results for Patricia’s 1992 and 1993 networks. Stivala compares the results of the ERGM and the

The implication seems to be that we should be comparatively happier with the ERGM (compared to the

The model fits this statistic – but what does that tell us about the

Our knowledge, then, comes less from fit than from

Let me give an example of how we best learn from ERGMs, using a paper by Gondal and McLean (2013) studying data on loans among Renaissance Florentines. They begin with the simplest random graph models, which fit terribly, but they show that much of that comes because these ignore reciprocity. Already, without a good model in hand, we have learned something (and something non-trivial – we might imagine that data on loans could tend

Had Gondal and McLean

For the future

I am grateful to Stivala not only for taking seriously (as do I myself) the analysis of these somewhat strange data, but, by placing this analysis in comparative perspective, for strengthening the conclusion. And I am grateful to Stivala for connecting this idea to proper mathematical vocabulary. But I am also grateful to Stivala for giving us the opportunity to reflect on current practice, and on where we are going. It would be a shame if we were to commit ourselves to a monoculturalism that aligns us with the thoughtways of the ELM, precisely what structural sociology was trying to escape. But I do not mean to argue that the problem with the current use of the ERGM-as-ELM is that it is supporting a rather fundamentalist orientation among some adherents (and I certainly do not think that this characterizes all users, let alone the pivotal developers of the method). Rather, I think what we need to do is to re-awaken our interest in the ERGM-not-as-ELM.

Whether or not any particular model cast as an exponential function and fit using an MCMC method is an advance, we should recognize that the core approach that lies at the heart of the ERGM is a beautiful and generative idea. Indeed, it turns out that the same fundamental vision underlies the random graph model and the Gibbs sampling used to identify it, and is deeply connected to the pseudolikelihood as well. This is the Boltzmann distribution, and the notion that there are consistent analogies that can be made between graph configurations and physical systems with variable energy levels. It is this sort of interest in pursuing elegant and rigorous mathematical derivations that separated the field of mathematical sociology from statistics-as-generally-understood (which concerned itself largely with issues of inference).

13 If we are not going to try to rigorously make use of the potentials of the exponential distribution, we might do better to use a completely different approach. Twenty years ago, anyone who used an OLS for dichotomous data was considered to be criminally incompetent; today the rechristened “Linear Probability Model” is found superior in interpretability for many purposes (Angrist and Pischke 2009). It is quite possible that in 20 years, many researchers will have switched to OLS models with random effects for network questions that are not about structure and where the networks have density .2<

We should be more enthusiastic about the equivalent of an Ideal Gas Law that can set the direction for plausibly cumulative social science than Actual Gas Fits that only predict the past using parameters that we all agree to pretend are meaningful. There is nothing outlandish in proposing that we pursue such