Political network data can often be challenging to collect and clean for analysis. This article illustrates how the incidentally (Neal, 2022b) and backbone (Neal, 2022a) packages for R can be used together to construct networks among legislators in the US Congress. These networks can be customized to focus on a specific chamber (Senate or House of Representatives), session (2003 to present), type of legislation (bills and resolutions), and policy area (32 topics).
The article is organized into four sections. The first section provides a brief overview of the legislative process in the US Congress, and discusses how information on legislators' sponsorship of legislation can be used to construct co-sponsorship networks. The second section introduces how data can be obtained using the incidentally package, and how networks can be constructed from these data using the backbone package. The third section presents a series of replicable examples that illustrate how these steps can be combined to yield custom political networks; the replication code is available at
Political networks come in many forms that can be distinguished by both their nodes (politicians, institutions, states, etc.) and their edges (alliance, opposition, collaboration, etc.; Victor et al., 2017; Knoke, 1994; Knoke et al., 2021). In this article, I focus on one type of political network: networks of legislators in the US Congress, connected by ties of ideological alignment, political alliance, and legislative collaboration inferred from their bill (co-)sponsorship activities. These types of networks have provided insight into a range of congressional phenomena, including polarization (e.g., Neal, 2020), bipartisanship (e.g., Rippere, 2016), legislative effectiveness (e.g., Tam et al., 2010), and gender roles (e.g., Neal et al., 2022). In this section, I provide a brief overview of the legislative process in the US Congress, and of the logic of legislative co-sponsorship networks.
The US Congress is composed of two chambers: the Senate, which contains 100 Senators with 2 representing each state, and the House of Representatives, which contains 435 Representatives with the size of each state's delegation depending on its population. For example, following the 2020 census, Alaska (population under 1 million) is represented by a single Representative, while California (population nearly 40 million) is represented by 52 Representatives. During each two-year session of Congress, these legislators meet to create new federal laws following a multi-step legislative process illustrated in Figure 1 (Smith and Riddick, 1948; Frishberg, 1976).
The legislative process.
The process begins when a legislator introduces a bill for consideration in their own chamber. This individual is known as the bill's sponsor, while other members of the same chamber can express support for the legislation by joining the bill as a co-sponsor. Upon introduction, the Congressional Research Service classifies the bill into one of 32 broad policy areas, such as “Education” or “Commerce”; a complete list with descriptions is available at
This represents a simplified version of a complex process that can involve a wide range of political and procedural maneuvers. However, three features of this process are particularly important in the context of constructing legislative networks. First, a bill's sponsor is the first person named in the bill, but is not necessarily the bill's primary author or strongest supporter. Therefore, there may be little practical difference between a bill's sponsor and its co-sponsors. Second, legislators may introduce four distinct types of legislation; however, only bills and joint resolutions can become law, while simple resolutions and concurrent resolutions are used only for procedural or ceremonial matters. Therefore, bills and joint resolutions are more consequential. Finally, while all bills have a sponsor and possibly co-sponsors, most bills are never voted on, even in the originating chamber. Therefore, the (co-)sponsorship process provides substantial information, while the voting process provides relatively limited information.
A co-sponsorship network can be constructed from information on legislators' bill sponsorship activities. In a co-sponsorship network, two legislators are connected when they have (co-)sponsored the same bills. Formally, bill sponsorship data can be represented as an incidence matrix
Under most circumstances it would be impractical to collect network data directly from legislators because they are busy, and because they may have strategic motivations that lead them to misrepresent their true political relations. Therefore, most legislative networks are measured indirectly through secondary data. Many such indirect measurement approaches exist, but co-sponsorship networks offer advantages over many of the alternatives. First, legislators' political ties could be inferred from their shared committee memberships (e.g., Porter et al., 2005). However, committee assignments are often made by party leadership based on seniority and other strategic considerations, whereas legislators' decisions about which bills to sponsor are more independent. Second, legislators' ties could be inferred from their shared roll call votes (e.g., Andris et al., 2015). However, roll call votes are taken on only a small subset of bills, whereas information about sponsorship is available for all bills. Finally, legislators' ties could be inferred from their co-participation in press and other events (e.g., Desmarais et al., 2015). However, there is no comprehensive database of legislators' event participations, whereas bill sponsorship is an official act recorded by the legislative body.
Although co-sponsorship networks offer many practical advantages over alternative approaches to measuring legislators' political networks, it is important to be clear about what they measure. The interpretation of a co-sponsorship network depends on the depth of inference a researcher is able to justify making from the non-network data on bill sponsorship. Directly (i.e., without making any inferences), edges in a co-sponsorship network measure whether or how often two legislators (co-)sponsor the same bills. By making an initial but relatively plausible inference, these edges might be interpreted as representing legislators' ideological or policy alignment because they identify cases where legislators supported common causes. A deeper inference might contend that the edges represent political alliances, while a still deeper inference might view them as representing active collaboration in the legislative process (Kirkland, 2011). These deeper inferences, while potentially plausible, are still inferences that go beyond the data. For example, it is possible that two legislators with similar policy agendas would sponsor the same set of bills, but would do so with no knowledge of (and thus no alliance or collaboration with) the other.
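To make the direct (inference-free) interpretation concrete, the following minimal sketch in base R builds a small, made-up incidence matrix and projects it into a weighted co-sponsorship network. The four legislators and three bills are hypothetical, and the sketch shows only the raw projection, in which each weight counts the bills two legislators both (co-)sponsored; it does not perform any backbone extraction.

```r
# Hypothetical incidence matrix: rows are legislators, columns are bills.
# A 1 means the legislator sponsored or co-sponsored the bill.
I <- matrix(c(1, 1, 0,
              1, 0, 1,
              0, 1, 1,
              1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"),
                            c("bill1", "bill2", "bill3")))

# Project onto legislators: entry [i, j] counts the bills that
# legislators i and j both (co-)sponsored.
P <- I %*% t(I)

# The diagonal holds each legislator's own bill count, which is not an edge.
diag(P) <- 0

print(P)
```

In this toy projection, legislator D shares two bills with each of the others, while A, B, and C each share only one bill with one another.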
The
The session parameter specifies the session of Congress for which data should be obtained. At the time of writing, data are available from the 108th (2003–2004) session through the current 117th (2021–2022) session. The data for the current session are updated regularly as new bills are introduced and co-sponsored.
The types parameter specifies which type(s) of legislation should be included. In the Senate this can include bills (s), simple resolutions (sres), joint resolutions (sjres), and concurrent resolutions (sconres). In the House it can include bills (hr), simple resolutions (hres), joint resolutions (hjres), and concurrent resolutions (hconres). Because only bills and joint resolutions can become laws, it will typically be useful to specify either
The areas parameter specifies the policy areas of bills to include. By default, the function includes all bills pertaining to any of 32 policy areas. However, the data can also be restricted to contain only bills focused on one or a subset of policy areas. The complete list of policy area classifications is available at
The
The weighted parameter specifies whether a bill's sponsor should receive extra weight in the data. By default sponsors and co-sponsors are treated equally, which will typically be appropriate because there is limited practical difference between these two roles. In contrast, specifying
The format parameter specifies the desired format of the output. By default, the function returns an object that contains (i) an incidence matrix of legislators and bills, (ii) a data frame containing legislator characteristics, and (iii) a data frame containing bill characteristics. Alternatively, specifying
The Bioguide ID can be used to link legislators with additional information from the Biographical Directory of the US Congress (
Finally, the
The
The basic format of the function is:
The
The
The
The
Finally, the
This example illustrates the most basic features of the
We begin by obtaining data about the 108th session using:
This function takes some time to run because it requires downloading the bill data, then parsing information about the 5429 bills and 115 joint resolutions introduced in the session. Immediately after obtaining and parsing these data, because We used the incidentally package for R (v1.0.2; Neal, 2022) to generate an incidence matrix recording Representatives' bill sponsorships during the 108 session of the US Congress.
By default, the returned object “I” contains an incidence matrix, a data frame of legislator characteristics, and a data frame of bill characteristics. We can examine a portion of each of these:
The incidence matrix indicates that Rep. Millender-McDonald (co-)sponsored HR5143 but not HR3972, while Rep. Foley (co-)sponsored HR3972 but not HR5143. The legislator data indicate that Millender-McDonald (bioguide ID=M000714) is a Democrat representing California, while Foley (bioguide ID=F000238) is a Republican representing Florida. Finally, the bill data indicate that HR5143 was introduced in September 2004 and HR3972 was introduced in March 2004, but that neither ever left the House.
The “data” format is useful for inspecting the characteristics of specific legislators and bills. However, for constructing legislative networks, it is more useful to obtain the data in the form of a bipartite igraph object using:
We can construct a simple legislative network from these data using We used the backbone package for R (v2.1.0; Neal, 2022) to extract the un-weighted backbone of the weighted projection of an unweighted bipartite network containing 442 agents and 5497 artifacts. An edge was retained in the backbone if its weight was statistically significant (alpha=0.05) using the stochastic degree sequence model (SDSM; Neal, 2014). This reduced the number of edges by 68.9%, and reduced the number of connected nodes by 0.2%.
In this context, the “442 agents” are the 442 Representatives who served during this session, and the “5497 artifacts” are the 5497 bills introduced during this session that are used to infer their political ties.
Figure 2 shows the resulting network, with Republican Representatives colored red, Democratic Representatives colored blue, and Independent Representatives colored green (there is only one: Rep. Bernie Sanders of Vermont, who later became a Senator). In this network, two Representatives are connected if they (co-)sponsored more of the same bills or joint resolutions than would be expected at random. We can clearly see the effects of partisan polarization, with separate clusters of Republican and Democratic Representatives, but we can also see that some Representatives are more bipartisan than others. Given this network, there is a range of descriptive and inferential analyses we might perform. For example, we can characterize the level of partisanship by computing the network's assortativity (i.e., homophily) with respect to political party (
The 108th US House of Representatives.
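As a rough illustration of what assortativity with respect to party measures, the sketch below computes Newman's assortativity coefficient for a categorical attribute in base R. The six legislators, their party labels, and the edge list are hypothetical and are not taken from the congressional data; in practice a network library such as igraph (e.g., its `assortativity_nominal()` function) would normally be used on the real backbone.

```r
# Hypothetical party labels for six legislators.
party <- c(R1 = "R", R2 = "R", R3 = "R", D1 = "D", D2 = "D", D3 = "D")

# Hypothetical undirected edges: two within-party triangles and one
# cross-party tie (R3-D1).
edges <- rbind(c("R1", "R2"), c("R2", "R3"), c("R1", "R3"),
               c("D1", "D2"), c("D2", "D3"), c("D1", "D3"),
               c("R3", "D1"))

# Count every edge in both directions so the mixing matrix is symmetric.
ends <- rbind(edges, edges[, 2:1])
lv <- sort(unique(party))

# Mixing matrix: share of edge ends joining each pair of parties.
e <- table(factor(party[ends[, 1]], levels = lv),
           factor(party[ends[, 2]], levels = lv)) / nrow(ends)
a <- rowSums(e)
b <- colSums(e)

# Newman's coefficient: 1 means ties occur only within parties,
# 0 means ties are no more within-party than expected by chance.
r <- (sum(diag(e)) - sum(a * b)) / (1 - sum(a * b))
print(r)
```

Values closer to 1 indicate that ties fall almost entirely within parties, which is the pattern the figures in this article suggest for recent sessions.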
We can repeat this process to construct the legislative network of the in-progress House of Representatives during the 117th session:
Figure 3 shows the resulting network. It is clear that by the 117th session the House of Representatives had become even more partisan. We can confirm this increase in partisanship by computing the new network's assortativity with respect to political party (
The 117th US House of Representatives.
The first example focuses on the House and includes all bills regardless of their content. However, we can also construct networks of legislators in the Senate, and we can do so focusing on the role of bills pertaining to specific issues.
We begin by obtaining the data using:
Here, we specify
Next, we construct a legislative network from these data using
Figure 4 shows the resulting network. It is smaller than the House network because it contains only 100 Senators. Partisan polarization is still evident; however, we observe more bipartisan collaboration on military issues. Again, there are many ways we might analyze this network. For example, we could use betweenness to identify the Senators who are most responsible for bringing Republicans and Democrats together around military issues (e.g., Sen. James Risch, betweenness=968, highlighted in the plot).
The 116th US Senate, based on armed forces bills, highlighting Sen. Risch.
The prior examples focus on constructing networks where the edges identify legislators who sponsor more bills together than expected at random, and thus might be interpreted as alignment, alliance, or collaboration. However, we can also construct signed networks that capture both alliances and antagonisms.
We begin by obtaining the data using:
Here, we focus on the highly contentious 116th session of the Senate, which took place in the second half of Donald Trump's presidency. By default, we include bills addressing all policy areas.
Next, we construct the network from these data using
Figure 5 shows the resulting network. In this signed network, positive edges are green, while negative edges are red. We observe that the network is polarized into two distinct groups, which here closely match political party affiliations. The majority of positive “alliance” ties are within group, a pattern that Neal (2020) called “weak polarization.” However, because this is a signed network, we can also observe that many negative “antagonism” ties are located between the two groups, a pattern that Neal (2020) called “strong polarization.” The extent of strong polarization can be characterized by the signed network's degree of structural balance, which can be measured using the triangle index
Positive and negative links in the 116th US Senate.
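The idea behind a triangle-based measure of structural balance can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the four-node signed network is hypothetical, and the index is taken here to be the share of closed triangles whose three edge signs multiply to a positive value (i.e., balanced triangles). It is not the code used to produce the value reported for the Senate.

```r
# Hypothetical signed adjacency matrix: +1 alliance, -1 antagonism, 0 no tie.
nodes <- c("A", "B", "C", "D")
S <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
S["A", "B"] <-  1
S["C", "D"] <-  1
S["A", "C"] <- -1
S["A", "D"] <- -1
S["B", "C"] <- -1
S["B", "D"] <-  1
S <- S + t(S)  # make the matrix symmetric (undirected network)

# Enumerate all node triples and multiply the signs of their three edges.
triples <- combn(nrow(S), 3)
signs <- apply(triples, 2, function(v) {
  S[v[1], v[2]] * S[v[2], v[3]] * S[v[1], v[3]]
})

# A triple is a closed triangle only if all three edges exist (product != 0).
closed <- signs != 0

# Share of closed triangles that are balanced (positive sign product).
triangle_index <- sum(signs[closed] > 0) / sum(closed)
print(triangle_index)
```

A value of 1 would indicate a perfectly balanced network, in which every triangle has either three positive ties or one positive and two negative ties.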
The
This article has demonstrated how these two packages can be used together to construct customized co-sponsorship networks in the US Congress. These networks can be tailored by session, chamber, bill type, and policy area, and can be either binary or signed.
To summarize the code required, a basic Senate igraph network can be constructed using:
senate <- sdsm(incidence.from.congress(session=<session number>, types=c("s", "sjres"), format="igraph"))
Similarly, a basic House of Representatives igraph network can be constructed using:
house <- sdsm(incidence.from.congress(session=<session number>, types=c("hr", "hjres"), format="igraph"))
The examples in this article illustrate ways that options can be used to modify these basic commands to construct more specialized networks, for example, that focus on bills pertaining to specific policies or that contain both positive and negative political ties.
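One way these basic commands might be combined with the options discussed in the examples is sketched below. The helper names `legislation_types()` and `congress_network()` are hypothetical and are not part of either package. The sketch assumes that `incidence.from.congress()` accepts the `session`, `types`, `areas`, and `format` parameters described above, and that `sdsm()` accepts `alpha` and `signed` arguments as in the backbone version cited in this article; these should be checked against the installed package versions.

```r
# Map a chamber name to the bill and joint-resolution codes used in the article.
legislation_types <- function(chamber = c("senate", "house")) {
  chamber <- match.arg(chamber)
  switch(chamber,
         senate = c("s", "sjres"),
         house  = c("hr", "hjres"))
}

# Hypothetical wrapper: download sponsorship data, then extract an SDSM backbone.
# Calling it requires the incidentally and backbone packages and an
# internet connection; argument names are assumptions noted above.
congress_network <- function(session, chamber = c("senate", "house"),
                             areas = NULL, signed = FALSE, alpha = 0.05) {
  args <- list(session = session,
               types   = legislation_types(chamber),
               format  = "igraph")
  # Only restrict policy areas when the caller asks for it, so the
  # package's own default is used otherwise.
  if (!is.null(areas)) args$areas <- areas
  B <- do.call(incidentally::incidence.from.congress, args)
  backbone::sdsm(B, alpha = alpha, signed = signed)
}

# Example (not run here because it downloads data):
# senate116 <- congress_network(116, "senate", signed = TRUE)
```

Keeping the chamber-to-type mapping in its own small function makes it possible to check that only bills and joint resolutions are requested without downloading any data.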
These methods offer one practical option for researchers wishing to study legislative networks. However, they are subject to some important limitations. First, co-sponsorship networks are only one type of political network, and their interpretation as reflecting meaningful political relationships such as alliance or collaboration requires a careful theoretical rationale. Second, the
Some of these limitations identify directions for future software development. For example, future versions of