Open Access

Regression discontinuity design and its applications to Science of Science: A survey

Figure 1.

Illustrations of RDD. (a) The continuity framework, and (b) the local randomization framework. The figure depicts the expected potential outcomes conditional on the running variable Xi, denoted by E[Yi(1)│Xi=x] and E[Yi(0)│Xi=x]. τSRD is the causal effect at the cutoff c under the continuity framework, and τSLR is the causal effect within the window [c−Δ, c+Δ] under the local randomization framework. Adapted from Cattaneo & Titiunik (2022).
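The two estimands named in the caption can be written out explicitly. The block below is a sketch in the caption's notation, not a quotation from the paper: it assumes units are treated when Xi ≥ c, and the symbols W (the window) and N_W (the number of units whose running variable falls in W) are introduced here for illustration.

```latex
% Continuity framework: effect at the cutoff, identified by one-sided limits
\tau_{\mathrm{SRD}}
  = \mathbb{E}\left[ Y_i(1) - Y_i(0) \mid X_i = c \right]
  = \lim_{x \downarrow c} \mathbb{E}\left[ Y_i \mid X_i = x \right]
  - \lim_{x \uparrow c} \mathbb{E}\left[ Y_i \mid X_i = x \right]

% Local randomization framework: average effect inside the window W
\tau_{\mathrm{SLR}}
  = \frac{1}{N_W} \sum_{i \,:\, X_i \in W} \mathbb{E}\left[ Y_i(1) - Y_i(0) \right],
  \qquad W = [c - \Delta,\; c + \Delta]
```

The second equality for τSRD holds only if both conditional expectation functions are continuous at c, which is the identifying assumption of the continuity framework.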

Figure 2.

Data collection procedure. (a) Illustration of the data collection procedure. We manually collect 3,387 RDD papers from Web of Science through keyword searches, and we obtain 2,061 RDD papers in MAG by matching their DOIs with the Web of Science data. (b) The number of RDD papers in 19 MAG categories as a function of time. The main plot is smoothed using a three-year sliding window. The inset shows the total number of RDD papers from 1960 to 2021.
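The three-year smoothing mentioned in panel (b) can be sketched as a moving average. The caption does not say whether the window is centered or trailing, so the centered window with truncated edges used here is an assumption, and the function name `sliding_mean` and the sample counts are hypothetical.

```python
def sliding_mean(counts, window=3):
    """Centered moving average; windows are truncated at the series edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        chunk = counts[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly paper counts, not the data behind Figure 2.
yearly_counts = [1, 2, 3, 4, 5]
smoothed_counts = sliding_mean(yearly_counts)
```

A trailing window (the current year and the two before it) would shift the smoothed curve to the right, so the choice affects where peaks appear.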

Figure 3.

The RDD keyword network and emergent words in WOS. (a) The RDD keyword network, where nodes represent keywords and a link indicates that two keywords appear in the same paper. The modularity Q is 0.37, indicating a strong community structure. Only the eight largest clusters are displayed. (b) Top 15 emergent words of RDD papers, which indicate research frontiers. Year is the year in which the keyword first appeared, while Begin and End are the first and last years in which the keyword was a research frontier. The rightmost graph displays the research frontiers in different time periods. For example, air pollution is the research frontier of RDD between 2021 and 2023.
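To make the modularity value in panel (a) concrete, the sketch below builds an unweighted keyword co-occurrence network from per-paper keyword lists and computes the Newman-Girvan modularity Q for a given partition. It is an illustration under assumptions: the toy papers, the community labels, and the function names are hypothetical, and the caption does not state which tool or weighting produced Q = 0.37.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(papers):
    """Return the set of unweighted keyword pairs that share at least one paper."""
    edges = set()
    for keywords in papers:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges.add((a, b))
    return edges


def modularity(edges, community_of):
    """Newman-Girvan modularity: Q = sum_c [ L_c / m - (d_c / (2m))^2 ]."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy data: two keyword triangles joined by a single bridging paper.
papers = [["a", "b", "c"], ["d", "e", "f"], ["c", "d"]]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
edges = cooccurrence_edges(papers)
q = modularity(edges, community_of)
```

In this toy network Q equals 5/14, because each community holds 3 of the 7 edges and half of the total degree.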

Figure 4.

The citation behaviors between RDD and other academic domains over time. (a) The fraction of references made by RDD papers to certain scientific domains. (b) The fraction of references made to RDD papers by papers in various scientific domains. (c) Reference strength from RDD papers to papers in other academic fields. (d) Reference strength from other academic fields to RDD papers. Black dashed lines in (c) and (d) represent φ = 1, and the other dashed lines in (c) and (d) indicate that the strength of references from certain academic fields is lower than the average value across fields in 2016.

Figure 5.

The results of the analysis conducted in (Ludwig & Miller, 2007). (a) and (b) show the linear and quadratic fits, respectively, using rdplot for county mortality of children aged 5 to 9 in 1973-1983. (c) shows the quadratic fit using rdplot for county mortality of people aged 25 and older in 1973-1983. The data used in the analysis come from (Matias D. Cattaneo, 2021).

Regression discontinuity estimation of the effect of HS funding on mortality. Robust standard errors are in parentheses,

| Variable | Mean | (1) Nonparametric estimator | (2) Nonparametric estimator | (3) Nonparametric estimator | (4) Parametric, flexible linear | (5) Parametric, flexible quadratic |
|---|---|---|---|---|---|---|
| Bandwidth or poverty range | | 9 | 18 | 36 | 8 | 16 |
| Main results | | | | | | |
| Number of counties | | 524 | 954 | 2,161 | 482 | 858 |
| Mortality, Ages 5-9 (%) | 2.252 | −1.895* (0.984) | −1.198* (0.662) | −1.114** (0.501) | −2.201** (1.058) | −2.558** (1.096) |
| Mortality, Ages 25+ (%) | 132.626 | 2.204 (5.645) | 6.016 (4.025) | 5.872 (3.600) | 2.091 (5.872) | 2.574 (6.370) |
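The nonparametric columns report estimates at different bandwidths around the cutoff. As a rough illustration of that kind of estimator, the sketch below fits a kernel-weighted line on each side of the cutoff and takes the difference of the two fits at the cutoff. Several points are assumptions: the triangular kernel, the rule that units with x ≥ cutoff are treated, and all function names. It is not the rdrobust implementation and omits bias correction and robust standard errors. The data are synthetic, with a built-in jump of 2.

```python
def triangular_weight(x, cutoff, bandwidth):
    """Triangular kernel: weight falls linearly to zero at distance `bandwidth`."""
    return max(0.0, 1.0 - abs(x - cutoff) / bandwidth)


def weighted_intercept(xs, ys, ws):
    """Intercept of the weighted least-squares line y = a + b * x."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values with positive weight")
    return (swxx * swy - swx * swxy) / denom


def sharp_rd_estimate(x, y, cutoff, bandwidth):
    """Difference of the two one-sided local linear fits evaluated at the cutoff."""
    sides = {True: ([], [], []), False: ([], [], [])}
    for xi, yi in zip(x, y):
        w = triangular_weight(xi, cutoff, bandwidth)
        if w > 0:
            xs, ys, ws = sides[xi >= cutoff]
            xs.append(xi - cutoff)  # center so the intercept is the fit at the cutoff
            ys.append(yi)
            ws.append(w)
    return weighted_intercept(*sides[True]) - weighted_intercept(*sides[False])


# Synthetic, noise-free data with a jump of 2 at the cutoff (not the Head Start data).
x = [i / 10 for i in range(-50, 51)]
y = [1 + 0.5 * xi + (2 if xi >= 0 else 0) for xi in x]
estimate = sharp_rd_estimate(x, y, cutoff=0.0, bandwidth=2.0)
```

A narrower bandwidth uses fewer observations close to the cutoff, which typically lowers bias and raises variance. That trade-off is why the table reports several bandwidths side by side.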

A survey of studies that use RDD. Context describes the setting of the focal paper. Outcome(s) is the dependent variable of the focal paper. Treatment(s) is the treatment variable of the focal paper; in practice, it is binary. Running variable(s) is the forcing variable that determines treatment assignment.

| Study | Context | Outcome(s) | Treatment(s) | Running Variable(s) |
|---|---|---|---|---|
| Economics | | | | |
| Yi et al. (2022) | Great Famine in China | Risk tolerance and entrepreneurship in adulthood | Experiencing early-life hardship | Location |
| García-Jimeno et al. (2022) | Women’s Temperance Crusade in America | Collective action decisions | Affective information networks | Location |
| Akhtari et al. (2022) | The politically motivated replacement of personnel in schools in Brazil | The quality of public education provision by the government | Political turnover | Share of votes |
| Van Der Klaauw (2002) | East Coast college’s aid | College enrollment | Offering financial aid | Aid allocation decisions |
| Education | | | | |
| Davies et al. (2018) | Reform increasing the minimum school leaving age in England | Risk of diabetes and mortality | Remaining in school | Time |
| Huang & Zhou (2013) | Great Famine in China | Cognition estimated by episodic memory survey | Completion of primary school | Year of birth and entering primary schooling |
| Clark & Royer (2013) | Reform increasing the minimum school leaving age in England | Adult mortality and health | Remaining in school | Time |
| Science of Science or Innovation Studies | | | | |
| Seeber et al. (2019) | Scientists’ promotion in the Italian higher education system | Scientists’ number of self-citations | Undergoing the introduction of the habilitation procedure | Time |
| Y. Wang et al. (2019) | Early-career setback, NIH R01 grant applications | Future career outcomes | Receiving the R01 grant | Priority score |
| Bol et al. (2018) | Innovation Research Incentives Scheme for early career scientists, Netherlands | Winning a midcareer grant | Winning the early career award | Evaluation scores |
| Bronzini & Iachini (2014) | Firms’ R&D subsidy in northern Italy | Investment spending of firms | Receiving funding | Priority score |
| Jacob & Lefgren (2011b) | NIH R01 grant applications | Subsequent publications and citations | Receiving an NIH research grant | Priority score |
| Jacob & Lefgren (2011a) | NIH postdoctoral training grants | Subsequent publications and citations | Receiving an NIH postdoctoral training grant | Priority score |

County characteristics. Column 1 lists the county-level variables: the county poverty rate in 1960, and the mortality of children aged 5 to 9 and of people aged 25 and older in 1973-1983. Counties with a 1960 poverty rate of 49.198% to 59.198% are the control group, while counties with a 1960 poverty rate of 59.1984% to 69.1984% are the treatment group, i.e., the poorest counties funded by the HS funding program.

| County-level data | Counties with 1960 poverty 49.198% to 59.198%: Mean | Std. | Counties with 1960 poverty 59.1984% to 69.1984%: Mean | Std. |
|---|---|---|---|---|
| No. of observations (counties) | 347 | | 228 | |
| County Poverty Rate 1960 (%) | 54.08 | 2.861 | 63.40 | 2.644 |
| Mortality, Ages 5-9, 1973-1983 (%) | 3.044 | 5.897 | 2.316 | 4.566 |
| Mortality, Ages 25+, 1973-1983 (%) | 132.5 | 30.96 | 135.7 | 30.53 |
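The summary statistics above allow a back-of-the-envelope comparison of the two groups. The sketch below computes the raw difference in mean mortality for ages 5-9 between treatment and control counties, with an unequal-variance (Welch-style) standard error. This naive window comparison is an illustration only: it is not the regression discontinuity estimate reported in the paper, and the function name is hypothetical.

```python
import math


def diff_in_means(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Treatment-minus-control difference in means with an unequal-variance SE."""
    diff = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return diff, se


# Mortality, ages 5-9, 1973-1983: values copied from the table above.
diff, se = diff_in_means(
    mean_t=2.316, sd_t=4.566, n_t=228,  # counties with 1960 poverty 59.1984% to 69.1984%
    mean_c=3.044, sd_c=5.897, n_c=347,  # counties with 1960 poverty 49.198% to 59.198%
)
```

Because the window spans ten percentage points on each side of the cutoff, this comparison does not adjust for the trend in mortality along the poverty rate, which is what the local polynomial fits are designed to handle.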
eISSN:
2543-683X
Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, Project Management, Databases and Data Mining