Constructing legislative networks in R using incidentally and backbone


Political network data can often be challenging to collect and clean for analysis. This article illustrates how the incidentally (Neal, 2022b) and backbone (Neal, 2022a) packages for R can be used together to construct networks among legislators in the US Congress. These networks can be customized to focus on a specific chamber (Senate or House of Representatives), session (2003 to present), type of legislation (bills and resolutions), and policy area (32 topics).

The article is organized into four sections. The first section provides a brief overview of the legislative process in the US Congress, and discusses how information on legislators' sponsorship of legislation can be used to construct co-sponsorship networks. The second section introduces how data can be obtained using the incidentally package, and how networks can be constructed from these data using the backbone package. The third section presents a series of replicable examples that illustrate how these steps can be combined to yield custom political networks; the replication code is available at https://osf.io/kjgrz/. Finally, the fourth section highlights some limitations of these methods and identifies directions for future research and software development.

Background

Political networks come in many forms that can be distinguished by both their nodes (politicians, institutions, states, etc.) and their edges (alliance, opposition, collaboration, etc.; Victor et al., 2017; Knoke, 1994; Knoke et al., 2021). In this article, I focus on one type of political network: networks of legislators in the US Congress, connected by ties of ideological alignment, political alliance, and legislative collaboration inferred from their bill (co-) sponsorship activities. These types of networks have provided insight into a range of congressional phenomena, including polarization (e.g., Neal, 2020), bipartisanship (e.g., Rippere, 2016), legislative effectiveness (e.g., Tam et al., 2010), and gender roles (e.g., Neal et al., 2022). In this section, I provide a brief overview of the legislative process in the US Congress, and of the logic of legislative co-sponsorship networks.

The legislative process

The US Congress is composed of two chambers: the Senate, which contains 100 Senators with 2 representing each state, and the House of Representatives, which contains 435 Representatives with the size of each state's delegation depending on its population. For example, in 2020 Alaska had a population under 1 million and was represented by a single Representative, while California had a population of nearly 40 million and was represented by 53 Representatives. During each two-year session of Congress, these legislators meet to create new federal laws following a multi-step legislative process illustrated in Figure 1 (Smith and Riddick, 1948; Frishberg, 1976).

Figure 1

The legislative process.

The process begins when a legislator introduces a bill for consideration in their own chamber. This individual is known as the bill's sponsor, while other members of the same chamber can express support for the legislation by joining the bill as a co-sponsor. Upon introduction, the Congressional Research Service classifies the bill into one of 32 broad policy areas, such as “Education” or “Commerce”; a complete list with descriptions is available at https://www.congress.gov/help/field-values/policy-area. The newly introduced bill is debated, revised, and possibly voted on in the originating chamber. If the bill passes in the originating chamber, it is then debated, revised, and possibly voted on in the other chamber. If the bill passes both chambers, it is sent to the President, who may sign it into law, not sign it, or veto it. As Figure 1 illustrates, there are many ways for a bill to fail, and few ways for it to become a law.

This represents a simplified version of a complex process that can involve a wide range of political and procedural maneuvers. However, three features of this process are particularly important in the context of constructing legislative networks. First, a bill's sponsor is the first person named in the bill, but is not necessarily the bill's primary author or strongest supporter. Therefore, there may be little practical difference between a bill's sponsor and its co-sponsors. Second, legislators may introduce four distinct types of legislation; however, only bills and joint resolutions can become law, while simple resolutions and concurrent resolutions are used only for procedural or ceremonial matters. Therefore, bills and joint resolutions are more consequential. Finally, while all bills have a sponsor and possibly co-sponsors, most bills are never voted on, even in the originating chamber. Therefore, the (co-)sponsorship process provides substantial information, while the voting process provides relatively limited information.

Co-sponsorship networks

A co-sponsorship network can be constructed from information on legislators' bill sponsorship activities. In a co-sponsorship network, two legislators are connected when they have (co-)sponsored the same bills. Formally, bill sponsorship data can be represented as an incidence matrix $\mathbf{I}$ where $I_{ik} = 1$ if legislator $i$ (co-)sponsored bill $k$, and $I_{ik} = 0$ otherwise. Multiplying this matrix by its transpose (i.e., $\mathbf{I}\mathbf{I}'$; bipartite projection) yields a legislator network represented as an adjacency matrix $\mathbf{A}$, where $A_{ij}$ indicates the number of bills that both legislator $i$ and legislator $j$ (co-)sponsored. The political network literature contains numerous examples of, and theorizing about, co-sponsorship networks not only in the US Congress (Neal, 2014, 2020; Neal et al., 2022; Aref and Neal, 2020, 2021; Ringe et al., 2017; Fowler, 2006a, 2006b; Kirkland and Gross, 2014; Rippere, 2016; Tam et al., 2010; Zhang et al., 2008), but also in US state legislatures (Bratton and Rouse, 2011; Clark and Caro, 2013; Kirkland, 2011, 2014), and in legislative bodies around the world (Aleman and Calvo, 2013; Baller, 2017; Fischer et al., 2019; Micozzi, 2014; Briatte, 2016).
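The projection described above can be sketched in a few lines of base R. This is a minimal illustration on a made-up incidence matrix; the legislator and bill names are hypothetical and no package is required.

```r
# Toy incidence matrix: 3 legislators (rows) x 4 bills (columns).
# All names and values are made up for illustration.
I <- matrix(
  c(1, 1, 0, 1,
    1, 1, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Leg_A", "Leg_B", "Leg_C"),
                  c("Bill_1", "Bill_2", "Bill_3", "Bill_4"))
)

# Bipartite projection: A[i, j] counts the bills that i and j both (co-)sponsored.
A <- I %*% t(I)

# The diagonal holds each legislator's own bill count, which is not an edge.
diag(A) <- 0
A
```

In this toy case, Leg_A and Leg_B share two bills, while each of them shares one bill with Leg_C.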

Under most circumstances it would be impractical to collect network data directly from legislators because they are busy, and because they may have strategic motivations that lead them to misrepresent their true political relations. Therefore, most legislative networks are measured indirectly through secondary data. Many such indirect measurement approaches exist, but co-sponsorship networks offer advantages over many of the alternatives. First, legislators' political ties could be inferred from their shared committee memberships (e.g., Porter et al., 2005). However, committee assignments are often made by party leadership based on seniority and other strategic considerations, whereas legislators' decisions about which bills to sponsor are more independent. Second, legislators' ties could be inferred from their shared roll call votes (e.g., Andris et al., 2015). However, roll call votes are taken on only a small subset of bills, whereas information about sponsorship is available for all bills. Finally, legislators' ties could be inferred from their co-participation in press and other events (e.g., Desmarais et al., 2015). However, there is no comprehensive database of legislators' event participations, whereas bill sponsorship is an official act recorded by the legislative body.

Although co-sponsorship networks offer many practical advantages over alternative approaches to measuring legislators' political networks, it is important to be clear what they measure. The interpretation of a co-sponsorship network depends on the depth of inference a researcher is able to justify making from the non-network data on bill sponsorship. Directly (i.e., without making any inferences), edges in a co-sponsorship network measure whether or how often two legislators (co-)sponsor the same bills. By making an initial but relatively plausible inference, these edges might be interpreted as representing legislators' ideological or policy alignment because they identify cases where legislators supported common causes. A deeper inference might contend that the edges represent political alliances, while a still deeper inference might view them as representing active collaboration in the legislative process (Kirkland, 2011). These deeper inferences, while potentially plausible, are still inferences that go beyond the data. For example, two legislators with similar policy agendas might sponsor the same set of bills without either being aware of the other's sponsorships, and thus without any alliance or collaboration between them.

Constructing legislative networks
Obtaining data with incidentally

The incidentally package can be installed in R from CRAN with install.packages("incidentally") and loaded for use with library(incidentally) (Neal, 2022b). The incidentally package provides a range of functions for generating and analyzing incidence matrices and bipartite graphs. Of particular interest here, data on bill sponsorship in the US Congress can be obtained directly from the US Government Publishing Office using the incidence.from.congress() function. The basic format of the function is:

incidence.from.congress(
  session=NULL,
  types=NULL,
  areas="all",
  nonvoting=FALSE,
  weighted=FALSE,
  format="data",
  narrative=FALSE
)

The session parameter specifies the session of Congress for which data should be obtained. At the time of writing, data are available from the 108th (2003–2004) session through the current 117th (2021–2022) session. The data for the current session are updated regularly as new bills are introduced and co-sponsored.

The types parameter specifies which type(s) of legislation should be included. In the Senate this can include bills (s), simple resolutions (sres), joint resolutions (sjres), and concurrent resolutions (sconres). In the House it can include bills (hr), simple resolutions (hres), joint resolutions (hjres), and concurrent resolutions (hconres). Because only bills and joint resolutions can become laws, it will typically be useful to specify either types=c("s", "sjres") for the Senate, or types=c("hr", "hjres") for the House.

The areas parameter specifies the policy areas of bills to include. By default, the function includes all bills pertaining to any of 32 policy areas. However, the data can also be restricted to contain only bills focused on one or a subset of policy areas. The complete list of policy area classifications is available at https://www.congress.gov/help/field-values/policy-area. For example, specifying areas=c("education", "families", "health") would yield data only on bills pertaining to education, families, or health.

The nonvoting parameter specifies whether non-voting members of Congress should be included in the data. By default, non-voting Representatives from Washington DC, Puerto Rico, American Samoa, Guam, the U.S. Virgin Islands, and the Northern Mariana Islands are excluded. Because they do not vote, and therefore cannot play a role in the eventual passage of legislation, this may often be the most appropriate option. However, they can sponsor or co-sponsor bills, so there may be contexts where retaining these members in the data using nonvoting=TRUE will be useful.

The weighted parameter specifies whether a bill's sponsor should receive extra weight in the data. By default, sponsors and co-sponsors are treated equally, which will typically be appropriate because there is limited practical difference between these two roles. In contrast, specifying weighted=TRUE will yield an incidence matrix $\mathbf{I}$ in which $I_{ik} = 2$ if legislator $i$ sponsored bill $k$, $I_{ik} = 1$ if $i$ co-sponsored $k$, and $I_{ik} = 0$ otherwise. In the examples below, the construction of networks is illustrated using the sdsm() function. However, if the bill sponsorship data are weighted in this way, the osdsm() function, which implements an ordinally weighted variant, should be used instead.

The format parameter specifies the desired format of the output. By default, the function returns an object that contains (i) an incidence matrix of legislators and bills, (ii) a data frame containing legislator characteristics, and (iii) a data frame containing bill characteristics. Alternatively, specifying format="igraph" will return a bipartite graph as an igraph object with legislator and bill characteristics stored as vertex attributes (Csardi et al., 2006). In either case, the legislator characteristics include their Bioguide ID, last name, party affiliation, and state.

The Bioguide ID can be used to link legislators with additional information from the Biographical Directory of the US Congress (https://bioguide.congress.gov/), as well as with other data such as their DW-Nominate ideology scores (https://voteview.com/data; Poole and Rosenthal, 1985) and legislative effectiveness scores (https://thelawmakers.org/data-download; Volden and Wiseman, 2014). The bill characteristics include the bill ID, introduction date, title, policy area, status, party of its sponsor, and number of co-sponsors from each party. The sponsor's and co-sponsors' party affiliations can be used to classify bills as partisan or bipartisan.
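As a rough sketch of how bills might be classified as partisan or bipartisan, the base R snippet below works on a made-up data frame. The column names (sponsor_party, cosponsors_d, cosponsors_r) are hypothetical stand-ins and may not match the names in the package's output, and the definition of "bipartisan" used here is only one of several a researcher might adopt.

```r
# Toy bill data; the column names are hypothetical, not the package's own.
bills <- data.frame(
  bill          = c("HR1", "HR2", "HR3"),
  sponsor_party = c("D", "R", "R"),
  cosponsors_d  = c(4, 0, 2),
  cosponsors_r  = c(0, 7, 5),
  stringsAsFactors = FALSE
)

# Count supporters from each party, including the sponsor.
n_d <- bills$cosponsors_d + (bills$sponsor_party == "D")
n_r <- bills$cosponsors_r + (bills$sponsor_party == "R")

# One possible definition: a bill is bipartisan if both parties
# appear among its sponsor and co-sponsors.
bills$bipartisan <- n_d > 0 & n_r > 0
bills[, c("bill", "bipartisan")]
```

A stricter definition, for example requiring a minimum share of supporters from the minority side, could be substituted in the last step.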

Finally, the narrative parameter specifies whether the function should display suggested manuscript text and citations. By default, this information is not displayed to avoid cluttering the R console with unnecessary output. However, for new users or for a final analysis it can be useful to specify narrative=TRUE because the suggested text can be pasted directly into a manuscript, which facilitates complete and consistent reporting of the analysis.

Constructing networks with backbone

The backbone package can be installed in R from CRAN with install.packages("backbone") and loaded for use with library(backbone) (Neal, 2022a). The backbone package provides a range of functions for extracting the backbone of networks, including bipartite projections such as co-sponsorship networks. While many of these functions are potentially relevant for political networks, here I focus on the sdsm() function, which implements the stochastic degree sequence model, because this model will often be the most useful for bill sponsorship data (Neal, 2014; Neal et al., 2021).

The basic format of the function is:

sdsm(
  B,
  alpha=0.05,
  signed=FALSE,
  mtc="none",
  class="original",
  narrative=FALSE
)

The sdsm() function takes an incidence matrix or bipartite igraph object B as its input, constructs its weighted bipartite projection, then identifies and retains only the statistically significant edges. In the context of a co-sponsorship network, it takes data on which legislators sponsored which bills, constructs a weighted co-sponsorship network, then yields an unweighted network in which legislators are connected if they co-sponsored statistically significantly more bills together than expected at random. This model's statistical test controls for the fact that some legislators (co-)sponsor many bills while others (co-)sponsor few, and for the fact that some bills have many (co-)sponsors while others have few.
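The general logic of testing each edge against a degree-controlled null can be sketched in base R. This is not the package's implementation: the sponsorship probabilities in P below come from a crude stand-in formula chosen only for illustration, and the helper poisson_binomial_pmf() is a hypothetical name. The sketch only shows the idea that each observed co-sponsorship count is compared with the distribution expected when legislators' activity and bills' popularity are taken into account.

```r
# Toy incidence matrix: 4 legislators x 6 bills (values are made up).
I <- matrix(
  c(1, 1, 1, 1, 0, 0,
    1, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 1, 1),
  nrow = 4, byrow = TRUE
)

# Observed co-sponsorship counts (weighted bipartite projection).
obs <- I %*% t(I)

# Crude stand-in for the sponsorship probabilities (an assumption):
# P[i, k] grows with legislator i's activity and bill k's popularity.
P <- pmin(outer(rowSums(I), colSums(I)) / sum(I), 1)

# Exact distribution of a sum of independent Bernoulli trials with
# success probabilities q (a Poisson-binomial), built by convolution.
poisson_binomial_pmf <- function(q) {
  pmf <- 1
  for (qk in q) pmf <- c(pmf * (1 - qk), 0) + c(0, pmf * qk)
  pmf  # pmf[w + 1] is the probability of exactly w successes
}

# Upper-tail p-value for each pair: P(count >= observed count).
n <- nrow(I)
pval <- matrix(NA_real_, n, n)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    pmf <- poisson_binomial_pmf(P[i, ] * P[j, ])
    pval[i, j] <- pval[j, i] <- sum(pmf[(obs[i, j] + 1):length(pmf)])
  }
}

# Keep an edge only where the observed count is significantly large.
alpha <- 0.05
backbone_sketch <- (pval < alpha) * 1
diag(backbone_sketch) <- 0
```

With real data, the sdsm() function should be used instead, since it handles the estimation of probabilities and the statistical test in a principled way.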

The alpha parameter specifies the statistical significance level used to test each edge. By default, the function performs a one-tailed statistical test because it is evaluating whether a pair of legislators co-sponsored significantly more bills together than expected at random. The signed parameter can be used to modify this behavior. When signed=TRUE, the function instead returns a signed network in which legislators are connected by a positive edge if they co-sponsored more bills than expected at random, and are connected by a negative edge if they co-sponsored fewer bills than expected at random. When a signed network is returned, the function performs a two-tailed statistical test.

The mtc parameter specifies whether a multiple test correction should be applied to the edge-wise statistical tests. The function must conduct many independent statistical tests (one for each edge), which can inflate the Type-I error rate. By default, no correction is performed. However, any of the methods implemented in R's p.adjust() function can be specified. These methods offer options to control the familywise error rate (FWER) or false discovery rate (FDR).
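The effect of such corrections can be illustrated directly with p.adjust() from base R. The p-values below are made up, and the example is only meant to show how FWER and FDR corrections change the number of tests that remain significant.

```r
# Hypothetical edge-wise p-values, one per pair of legislators.
p <- c(0.001, 0.012, 0.025, 0.049, 0.200)

# Without correction, four of the five tests fall below alpha = 0.05.
n_raw <- sum(p < 0.05)

# Holm's method controls the familywise error rate (FWER).
p_holm <- p.adjust(p, method = "holm")
n_holm <- sum(p_holm < 0.05)

# The Benjamini-Hochberg method controls the false discovery rate (FDR).
p_bh <- p.adjust(p, method = "BH")
n_bh <- sum(p_bh < 0.05)

c(raw = n_raw, holm = n_holm, BH = n_bh)
```

Here the FWER correction is the most conservative, retaining two edges, while the FDR correction retains three.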

The class parameter specifies the desired format of the output. By default, the output will take the same form as the input. For example, if an incidence matrix is supplied then an adjacency matrix will be returned, and if a bipartite igraph object is supplied then a unipartite igraph object will be returned.

Finally, the narrative parameter specifies whether the function should display suggested manuscript text and citations. By default, this information is not displayed to avoid cluttering the R console with unnecessary output, but it can be useful for new users or for facilitating reporting of a final analysis.

Examples
The US House: then and now

This example illustrates the most basic features of the incidence.from.congress() and sdsm() functions by using them to construct networks of the US House of Representatives in the 108th session (2003–2004), and the in-progress 117th session (2021–2022).

We begin by obtaining data about the 108th session using:

I <- incidence.from.congress(
  session=108,
  types=c("hr", "hjres"),
  narrative=TRUE
)

This function takes some time to run because it requires downloading the bill data, then parsing information about the 5429 bills and 115 joint resolutions introduced in the session. Immediately after obtaining and parsing these data, because narrative=TRUE, the R console displays some suggested manuscript text and citations:

We used the incidentally package for R (v1.0.2; Neal, 2022) to generate an incidence matrix recording Representatives' bill sponsorships during the 108 session of the US Congress.

By default, the returned object “I” contains an incidence matrix, a data frame of legislator characteristics, and a data frame of bill characteristics. We can examine a portion of each of these:

> I$matrix[1:2, 1:2]
                        HR5143 HR3972
Rep. Millender-McDonald      1      0
Rep. Foley                   0      1

> I$legislator[1:2, c(1, 3:5)]
       id               last party state
1 M000714 Millender-Mcdonald     D    CA
2 F000238              Foley     R    FL

> I$bills[1:2, c(1, 2, 5)]
    bill introduced     status
1 HR5143 2004-09-23 Introduced
2 HR3972 2004-03-16 Introduced

The incidence matrix indicates that Rep. Millender-McDonald (co-)sponsored HR5143 but not HR3972, while Rep. Foley (co-)sponsored HR3972 but not HR5143. The legislator data indicate that Millender-McDonald (Bioguide ID=M000714) is a Democrat representing California, while Foley (Bioguide ID=F000238) is a Republican representing Florida. Finally, the bill data indicate that HR5143 was introduced in September 2004 and HR3972 was introduced in March 2004, but that neither ever left the House.

The “data” format is useful for inspecting the characteristics of specific legislators and bills. However, for constructing legislative networks, it is more useful to obtain the data in the form of a bipartite igraph object using:

B <- incidence.from.congress(
  session=108,
  types=c("hr", "hjres"),
  format="igraph"
)

We can construct a simple legislative network from these data using N <- sdsm(B, narrative=TRUE). Immediately after constructing the network, because narrative=TRUE, the R console displays some suggested manuscript text and citations:

We used the backbone package for R (v2.1.0; Neal, 2022) to extract the unweighted backbone of the weighted projection of an unweighted bipartite network containing 442 agents and 5497 artifacts. An edge was retained in the backbone if its weight was statistically significant (alpha=0.05) using the stochastic degree sequence model (SDSM; Neal, 2014). This reduced the number of edges by 68.9%, and reduced the number of connected nodes by 0.2%.

In this context, the “442 agents” are the 442 Representatives who served during this session, and the “5497 artifacts” are the 5497 bills introduced during this session that were used to infer their political ties.

Figure 2 shows the resulting network, with Republican Representatives colored red, Democratic Representatives colored blue, and Independent Representatives colored green (there is only one: Rep. Bernie Sanders of Vermont, who later became a Senator). In this network, two Representatives are connected if they (co-)sponsored more of the same bills or joint resolutions than would be expected at random. We can clearly see the effects of partisan polarization, with separate clusters of Republican and Democratic Representatives, but we can also see that some Representatives are more bipartisan than others. Given this network, there are a range of descriptive and inferential analyses we might perform. For example, we can characterize the level of partisanship by computing the network's assortativity (i.e., homophily) with respect to political party (r=0.893).

Figure 2

The 108th US House of Representatives.
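To make the assortativity measure concrete, the sketch below computes Newman's assortativity coefficient for a categorical attribute in base R, using a small made-up network rather than the House network above. The igraph package also provides assortativity functions that would normally be used on a real network; the manual calculation here is only meant to show what the coefficient captures.

```r
# Toy undirected network: 6 legislators, 3 per party (made-up data).
party <- c("D", "D", "D", "R", "R", "R")
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),   # ties among Democrats
               c(4, 5), c(4, 6), c(5, 6),   # ties among Republicans
               c(3, 4))                     # one cross-party tie
A[edges] <- 1
A <- A + t(A)   # make the adjacency matrix symmetric

# Legislator-by-party indicator matrix.
G <- outer(party, unique(party), "==") * 1

# Mixing matrix: share of edge ends running from party g to party h.
e <- t(G) %*% A %*% G / sum(A)

# Assortativity: within-party share of ties, relative to chance.
a <- rowSums(e)
r <- (sum(diag(e)) - sum(a^2)) / (1 - sum(a^2))
r
```

In this toy network six of the seven ties are within party, giving r = 5/7, or about 0.71. Values closer to 1, like those reported for the House, indicate that almost all ties fall within party lines.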

We can repeat this process to construct the legislative network of the in-progress House of Representatives during the 117th session:

B <- incidence.from.congress(
  session=117,
  types=c("hr", "hjres"),
  format="igraph"
)

N <- sdsm(B)

Figure 3 shows the resulting network. It is clear that by the 117th session the House of Representatives had become even more partisan. We can confirm this increase in partisanship by computing the new network's assortativity with respect to political party (r=0.992).

Figure 3

The 117th US House of Representatives.

The armed forces in the 116th Senate

The first example focuses on the House and includes all bills regardless of their content. However, we can also construct networks of legislators in the Senate, and we can do so focusing on the role of bills pertaining to specific issues.

We begin by obtaining the data using:

B <- incidence.from.congress(
  session=116,
  types=c("s", "sjres"),
  areas=c("Armed Forces and National Security"),
  format="igraph"
)

Here, we specify types=c("s", "sjres") to indicate that we are interested in bills and joint resolutions introduced in the Senate. We also specify areas=c("Armed Forces and National Security") to indicate that we are only interested in bills addressing the armed forces.

Next, we construct a legislative network from these data using N <- sdsm(B).

Figure 4 shows the resulting network. It is smaller than the House network because it contains only 100 Senators. Partisan polarization is still evident; however, we observe more bipartisan collaboration on military issues. Again, there are many ways we might analyze this network. For example, we could use betweenness to identify the Senators who are most responsible for bringing Republicans and Democrats together around military issues (e.g., Sen. James Risch, betweenness=968, highlighted in the plot).

Figure 4

The 116th US Senate, based on armed forces bills, highlighting Sen. Risch.

Alliances and antagonisms in the 116th Senate

The prior examples focus on constructing networks where the edges identify legislators who sponsor more bills together than expected at random, and thus might be interpreted as alignment, alliance, or collaboration. However, we can also construct signed networks that capture both alliances and antagonisms.

We begin by obtaining the data using:

B <- incidence.from.congress(
  session=116,
  types=c("s", "sjres"),
  format="igraph"
)

Here, we focus on the highly contentious 116th session of the Senate, which took place in the second half of Donald Trump's presidency. By default, we include bills addressing all policy areas.

Next, we construct the network from these data using N <- sdsm(B, signed=TRUE). Here, we specify signed=TRUE to indicate that we want a signed network where pairs of legislators who (co-)sponsor more bills together than expected at random are connected by a positive edge, but pairs of legislators who (co-)sponsor fewer bills together than expected at random are connected by a negative edge.

Figure 5 shows the resulting network. In this signed network, positive edges are green, while negative edges are red. We observe that the network is polarized into two distinct groups, which here closely match political party affiliations. The majority of positive “alliance” ties are within group, a pattern that Neal (2020) called “weak polarization.” However, because this is a signed network, we can also observe that many negative “antagonism” ties are located between the two groups, a pattern that Neal (2020) called “strong polarization.” The extent of strong polarization can be characterized by the signed network's degree of structural balance, which can be measured using the triangle index T (Aref and Wilson, 2018). Here T=0.937, indicating that 93.7% of all triangles are structurally balanced, and suggesting a very high level of strong polarization. In this context, the strong polarization visible in Figure 5 means the Senate is characterized by both within-party alliances and cross-party antagonisms.

Figure 5

Positive and negative links in the 116th US Senate.
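The share of balanced triangles in a signed network can be computed in base R from matrix traces. The sketch below uses a small made-up signed network, not the Senate network above, and the helper trace3() is a hypothetical name. It relies on the fact that a triangle is balanced when the product of its three edge signs is positive.

```r
# Toy signed network: 4 senators; +1 = positive tie, -1 = negative tie.
S <- matrix(0, 4, 4)
S[1, 2] <- 1;  S[3, 4] <- 1     # within-group positive ties
S[1, 3] <- -1; S[2, 3] <- -1    # between-group negative ties
S[1, 4] <- -1; S[2, 4] <- 1     # one tie that breaks the pattern
S <- S + t(S)

# tr(|S|^3) counts closed 3-walks over all triangles, while tr(S^3)
# adds them for balanced triangles and subtracts them for unbalanced ones.
trace3 <- function(M) sum(diag(M %*% M %*% M))
all_tri <- trace3(abs(S))
net_tri <- trace3(S)

# Share of triangles that are balanced.
T_index <- (all_tri + net_tri) / (2 * all_tri)
T_index
```

In this toy network two of the four triangles are balanced, so the index is 0.5. Changing the tie between senators 2 and 4 to negative would make every triangle balanced and raise the index to 1.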

Conclusion

The incidentally package offers tools for generating and analyzing incidence matrices and bipartite networks (Neal, 2022b), while the backbone package offers tools for extracting the backbone of networks (Neal, 2022a).

This article has demonstrated how these two packages can be used together to construct customized co-sponsorship networks of the US Congress. These networks can be tailored by session, chamber, bill type, and policy area, and can be either binary or signed.

To summarize the code required, a basic Senate igraph network can be constructed using:

senate <- sdsm(incidence.from.congress(session=<session number>, types=c("s", "sjres"), format="igraph"))

Similarly, a basic House of Representatives igraph network can be constructed using:

house <- sdsm(incidence.from.congress(session=<session number>, types=c("hr", "hjres"), format="igraph"))

The examples in this article illustrate ways that options can be used to modify these basic commands to construct more specialized networks, for example, that focus on bills pertaining to specific policies or that contain both positive and negative political ties.

These methods offer one practical option for researchers wishing to study legislative networks. However, they are subject to some important limitations. First, co-sponsorship networks are only one type of political network, and their interpretation as reflecting meaningful political relationships such as alliance or collaboration requires a careful theoretical rationale. Second, the incidentally package currently provides access only to data from the US Congress starting in 2003.

Some of these limitations identify directions for future software development. For example, future versions of incidentally may include functions to obtain bill sponsorship data from other legislative bodies. Functions of these packages that are not demonstrated in this article also highlight directions for future research. For example, while this article has focused on constructing networks among legislators, these packages can also be used to construct networks among bills. Following the approach used by Doreian and Mrvar (2019) to study the US Supreme Court, such bill networks may provide insight into legislators' logical consistency.

eISSN:
0226-1766
Language:
English
Publication frequency:
Volume Open
Journal subjects:
Social Sciences, other