Uncertain research country rankings. Should we continue producing uncertain rankings?
Article category: Research Papers
Published online: 13 May 2025
Page range: 161–182
Submitted: 21 March 2025
Accepted: 24 April 2025
DOI: https://doi.org/10.2478/jdis-2025-0030
© 2025 Alonso Rodríguez-Navarro, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Since the mid-20th century, science and technology have emerged as key drivers of economic growth (Godin, 2004). As a result, research has evolved from a tool for advancing knowledge to a vital factor in sustaining national wealth. Consequently, research assessment has become an essential instrument for managing systems of critical importance within nations. Peter Drucker, widely regarded as the father of modern management (Drucker, 1954), is credited with the statement: “You can’t manage what you can’t measure.” This opinion aligns with the more recent observation that “policymakers, corporate research managers, and university administrators need valid and reliable S&T indicators” (Garfield & Welljams-Dorof, 1992).
In response to political demands (Godin, 2006) and academic initiatives (Garfield, 1955; Merton, 1979; Narin, 1976; Price, 1965), a growing number of countries, institutions, and researchers have turned to bibliometric measures to assess the scientific and technological achievements of countries and organizations (Aksnes et al., 2019; Godin, 2003; Mingers & Leydesdorff, 2015; Waltman, 2016). Leading institutions, such as the European Commission (EC) (European Commission, 2018, 2020, 2022, 2024) or the US National Science Board (NSB) (National Science Board, 2018, 2020, 2022, 2024), regularly publish reports that assess the research performance of various countries.
Despite significant efforts by policymakers and academics to develop reliable research indicators, there is growing evidence that many commonly used indicators can be misleading. A notable example is Japan, which, despite its robust research output, is often ranked as a developing country in research assessments (Pendlebury, 2020). For instance, the latest EC report, using an indicator deemed representative of scientific excellence, places Japan at the level of underdeveloped countries (European Commission, 2024). Similarly, the NSB, while adopting more cautious language, has consistently ranked Japan and India similarly, behind Germany, the UK, and the USA, from 2003 to 2020 (National Science Board, 2024). While misclassifications may go unnoticed in other countries, Japan’s case is particularly evident. Over the 21st century, 17 Japanese researchers have been awarded Nobel Prizes, and a more detailed study places Japan at the level of the UK and Germany in terms of Nobel laureates (Schlagberger et al., 2016). Furthermore, Japan surpasses these countries in terms of triadic patents (Rodríguez-Navarro & Brito, 2022). Thus, evaluation methods that categorize Japan as a scientifically underdeveloped country are fundamentally unreliable.
Bibliometric assessments based on the number of publications and citations (Waltman, 2016; Wildgaard et al., 2014) are the simplest and most commonly used method to evaluate the scientific success of countries and institutions, although these methods are not suitable for assessing individual papers or small sets of publications (Aksnes et al., 2019). Citation metrics can be divided into three categories: citation-only metrics, citation-based rank metrics, and combined metrics. When applying citation metrics across different disciplines and publication years, it is essential to normalize them to account for variations in citation practices between fields. This principle has long been recognized: comparing two publications from different research fields requires normalizing their citation counts (Mcalister et al., 1983; Schubert & Braun, 1996).
Citation-only metrics rely on the total number of citations to assess the impact of papers. These metrics are used to evaluate the performance of countries and institutions by considering the citation counts of all their publications. This number of citations can also be used in combination with the number of papers. The most basic of these metrics is the mean number of citations per publication, similar to the journal impact factor (Garfield, 1999). However, more sophisticated formulas have also been proposed (Docampo & Besoule, 2019; Prathap, 2010; Waltman et al., 2011).
Citation-based rank metrics are non-parametric indicators (Conover & Iman, 1981) derived from citation counts. In these metrics, papers from specific years, research disciplines, or topics are ranked according to the number of citations they have received, starting with the most cited. A similar ranking is applied to the publications of institutions or countries. As a result, each publication receives two ranks—one in the list including all countries, the global rank, and one in the list corresponding to the country or institution, the local rank—and these two ranks are used for evaluation.
Widely used metrics in this category are the top percentile metrics, in which the global list is segmented into top percentiles (e.g., top 10% or 1%) (Bornmann et al., 2013; Waltman & Schreiber, 2013). The percentile indicator is then the number of papers in the local list that appear in the corresponding global percentile, which coincides with the local rank of the least cited paper in the segmented global list—papers with the same number of citations raise an issue that has technical solutions (Waltman & Schreiber, 2013). The usefulness of the percentile method is intuitive because the concept of success is associated with a high number of citations and, consequently, with a high number of papers in a narrow top percentile. Other metrics of this type are based on the two ranks, either using most of the papers in the list, as in the
The third type of metric combines both ranks and citation counts. A widely known indicator of this type is the
The primary aim of research evaluations is often to assess a country’s or institution’s contribution to advancing knowledge or its ability to produce scientific knowledge that can be translated into innovations within the production system. However, the concrete realization of these concepts remains somewhat ambiguous. It is likely that all publications in internationally recognized scientific journals contribute in some way to the progress of knowledge, and in technological fields, many papers may eventually lead to innovations, or at least contribute to them.
When focusing on evaluating disruptive research, identifying an indicator of performance becomes more manageable by recognizing that scientific progress alternates between large quantities of incremental (normal) science and much smaller amounts of revolutionary science or breakthrough discoveries (Kuhn, 1970). Under this model, research evaluations can be particularly focused on publications reporting breakthrough discoveries, which are often identified by their high citation counts (Bornmann et al., 2013; Min et al., 2021; Schneider & Costas, 2017; Wuestman et al., 2020). However, it is important to note that not all breakthrough papers are highly cited (Hu & Rousseau, 2019), and not all highly cited papers necessarily represent scientific breakthroughs (Garfield, 1973).
To define the appropriate citation-based metrics for breakthrough evaluations, it is essential to establish a citation threshold, and establishing this threshold presents a challenge because breakthrough papers typically fall within the top 0.01% or 0.02% citation percentile (Bornmann et al., 2018; Poege et al., 2019). Evaluating research at such a stringent level is challenging, particularly for smaller countries, research institutions, or even large countries with less competitive research systems, which may take years to produce a paper that reaches this citation level. To address this issue, research evaluations have often been based on the number of publications in the top 10% or 1% citation percentile in a given field or topic (Waltman & Schreiber, 2013).
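As a minimal sketch of how such a top-percentile count is obtained (my own illustration, not the exact procedure of the cited works; ties at the threshold are handled only crudely here, whereas Waltman & Schreiber (2013) give the rigorous treatment):

```python
import numpy as np

def top_percentile_count(global_citations, is_local, percentile=10.0):
    """Count the papers of a country (or institution) that fall in the
    global top `percentile`% most cited papers of a field and year.

    global_citations: citation counts of all global papers in the field.
    is_local: boolean mask marking the papers of the country/institution.
    Ties at the threshold are resolved simply by rank order in this sketch.
    """
    global_citations = np.asarray(global_citations)
    is_local = np.asarray(is_local, dtype=bool)
    order = np.argsort(-global_citations, kind="stable")  # most cited first
    n_top = int(round(global_citations.size * percentile / 100))
    return int(is_local[order[:n_top]].sum())

# e.g. p_top10 = top_percentile_count(citations, country_mask, 10)
#      p_top1  = top_percentile_count(citations, country_mask, 1)
```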
Given this context, the emerging question is whether evaluations based on the top 10% or 1% citation percentile can estimate the occurrence of papers in the top 0.01% or 0.02% citation percentile. To answer this, we must distinguish between cases where citations follow or deviate from the ideal model. According to this model, the rank distribution of a country’s or institution’s publications (local versus global rank plot; Section 1.1) follows a power law (Rodríguez-Navarro & Brito, 2018a). This relationship allows for the calculation of the number of publications in any top percentile based solely on the total number of papers and the number of publications in a single top percentile (Rodríguez-Navarro & Brito, 2021b), or alternatively, calculating the probability of having a publication in a narrow top percentile (Rodríguez-Navarro & Brito, 2019). However, just as the ideal gas law does not perfectly describe all gas behaviors, many countries and institutions deviate from this ideal model. Such deviations occur when the citation distribution of a country’s or institution’s publications differs significantly from the citation distribution of the global papers in the same field (Rodríguez-Navarro, 2024).
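To make the ideal model concrete (the following formulation is my own summary of the relation just described, not necessarily the notation of the cited works), the rank power law implies that the expected number of a country's papers in any top x% can be obtained from P and a single observed percentile count:

$$
P_{\mathrm{top}\,x\%} = P\left(\frac{x}{100}\right)^{\beta},
\qquad
\beta = \log_{10}\!\left(\frac{P}{P_{\mathrm{top}\,10\%}}\right),
$$

so that, for example, the expected Ptop 0.01% follows from P and Ptop 10% alone. The same relation implies that the ratios Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% are all equal to 10^(−β), which is the basis of the conformance test used later in this study.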
In research aimed at pushing the boundaries of knowledge, the lognormal citation distribution (Radicchi et al., 2008) is well supported (Golosovsky, 2021) and has the advantage of a relatively constant σ parameter, typically around 1.0–1.2 (Radicchi et al., 2008; Rodríguez-Navarro & Brito, 2018a; Thelwall, 2016; Viiu, 2018). However, research objectives can vary, and deviations from the lognormal distribution are common (Perianes-Rodriguez & Ruiz-Castillo, 2016; Rodríguez-Navarro & Brito, 2024a; Shahmandi et al., 2020; Waltman et al., 2012), often due to a large number of uncited publications (Rodríguez-Navarro & Brito, 2024a; Waltman et al., 2012). Considering the diversity of possible cases, a previous study suggests that describing a research system requires three indicators in addition to the total number of publications: one for the extreme of the upper tail, one for lowly cited papers, and one for the efficiency of the system at approximately the level of the top 10% of cited papers (Rodríguez-Navarro, 2024).
Most indicators used in research assessments are statistical data or are calculated from statistical data. In certain cases, their role as indicators of research performance is assumed, while in others, it is validated through peer reviews in university assessments (Rodríguez-Navarro & Brito, 2020; Thelwall et al., 2023; Traag & Waltman, 2019). Setting aside the accuracy of peer-review assessments (Abramo et al., 2013), the stringency of these evaluations cannot exceed the level of the top 10% or 1% of cited papers (Rodríguez-Navarro & Brito, 2020). This is the highest level of stringency attainable through peer review, and it requires that the number of reviewed papers be sufficiently high. Peer review cannot effectively assess the likelihood of a department or university producing breakthrough papers that rank within the top 0.01%; this level of stringency can only be reached through bibliometric analysis.
Accepting that reliable descriptions of research systems require three indicators in addition to the number of papers (Rodríguez-Navarro, 2024), the question that arises is: what kind of information do research assessments based on a single indicator provide? Focusing on the number of top 10% or 1% cited papers (henceforth, I will use the notation Ptop x% for the number of papers in the top x% percentile and P for the total number of papers), as well as on breakthrough contributions, the answer to this question is worrying. Using Ptop 0.01% as a reference, only one paper in every 1,000 within Ptop 10% and one in every 100 within Ptop 1% reaches this level of impact. If citations fit the ideal model, the contribution at the breakthrough level (Ptop 0.01%) can be computed using the ideal power law (Rodríguez-Navarro & Brito, 2019), but this is not possible in other cases. When considering research efficiency based on size-independent indicators, if the research system does not adhere to the ideal model, parameters such as Ptop 10%/P and Ptop 1%/P can be highly misleading, as demonstrated in the case of Japan described above (Section 1).
Testing whether citation patterns conform to the ideal model is straightforward because, in this case, the Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% ratios are equal (Rodríguez-Navarro & Brito, 2019). Using this test in the research topic of graphene, only 50% of the countries conform to the ideal model (Rodríguez-Navarro, 2024; deviations < 15%). This implies that, in this research topic, 50% of country-level research evaluations based on Ptop 10%/P or Ptop 1%/P are misleading.
This raises the issue that in common research rankings or evaluations, selecting a specific country provides no indication of whether the evaluation is accurate, approximate, or misleading. A simple calculation could resolve this uncertainty, but, to the best of my knowledge, such calculations are not currently included in research country rankings or evaluations.
Given these concerns, this study aims to complement a previous investigation into deviations from the lognormal citation distribution (Rodríguez-Navarro, 2024). The first part of the study examines countries across three research topics—solar cells, semiconductors, and immunity—by analyzing a limited sample of countries. These case studies highlight the diversity of deviations in both the upper and lower extremes of the citation distribution and their effect on the ideal rank power law. The second part leverages data from the Leiden Ranking to assess the alignment or deviation of university research from the ideal model.
To investigate the deviations described above in countries’ publications, global and country papers in the selected topic were ordered based on their number of citations, starting with the one with the highest number of citations. The global ranks of country papers were obtained after their identification in the global list. Scatterplots of ranks are presented in double logarithmic scales where a power law is transformed into a straight line. Deviations from the ideal power law are easily detected as deviations from a straight line. For deviations in the lower tail, I constructed histograms using logarithmic binning, including the numbers of papers with zero, one, and two citations. In some figures, the histogram of a synthetic series of lognormally distributed random numbers is included to guide the eye. These series were generated as described previously (Rodríguez-Navarro & Brito, 2018a).
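A minimal sketch of this binning procedure (my own illustration; the exact bin edges and the parameters of the synthetic lognormal series are assumptions, with σ taken from the 1.0–1.2 range cited above):

```python
import numpy as np

def log_binned_histogram(citations, n_bins=12):
    """Histogram of citation counts with logarithmic binning.

    Papers with 0, 1, and 2 citations get their own bins, as in the text;
    papers with 3 or more citations go into logarithmically spaced bins.
    Returns (bin_labels, counts).
    """
    citations = np.asarray(citations)
    labels = ["0", "1", "2"]
    counts = [int((citations == k).sum()) for k in (0, 1, 2)]
    rest = citations[citations >= 3]
    if rest.size:
        edges = np.unique(
            np.round(np.logspace(np.log10(3), np.log10(rest.max() + 1), n_bins)).astype(int)
        )
        hist, _ = np.histogram(rest, bins=edges)
        labels += [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]
        counts += hist.tolist()
    return labels, counts

# Synthetic lognormal series "to guide the eye"; sigma = 1.1 lies within the
# 1.0-1.2 range cited in the text, mu = 2.5 is an arbitrary scale choice.
rng = np.random.default_rng(0)
synthetic = np.floor(rng.lognormal(mean=2.5, sigma=1.1, size=50_000)).astype(int)
labels, counts = log_binned_histogram(synthetic)
```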
To investigate the deviation from the ideal power law in universities, I used the data reported in the CWTS Leiden Ranking (Olensky et al., 2015; Waltman, Calero-Medina, et al., 2012) for the year 2023 (
In some figures, the real ranks of papers are compared to ranks derived from ideal power laws, calculated from P, Ptop 10%, and Ptop 1%. For this purpose, I used the equation:
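(The equation itself is missing from the source text; the following is a plausible reconstruction from the definitions in Section 1.1, in my own notation rather than necessarily the author's.) For a country with P papers, of which Ptop 10% fall in the global top 10%, and a global list of N papers, the ideal power law relating the local rank r to the global rank R can be written as

$$
r(R) = \frac{P}{N^{\beta}}\,R^{\beta},
\qquad
\beta = \log_{10}\!\left(\frac{P}{P_{\mathrm{top}\,10\%}}\right),
$$

so that the calculated global rank of the paper with local rank r is R = N (r/P)^(1/β). When the law is instead anchored on the upper tail, β = log10(Ptop 10%/Ptop 1%) and the prefactor is chosen so that r(N/10) = Ptop 10%.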
Citations in the topics of solar cells/photovoltaics, semiconductors, and immunity were obtained from the Clarivate Web of Science. Searches were performed using the Advanced Search tool in the Web of Science Core Collection, edition: Science Citation Index Expanded. The searches were limited to articles (DT=) using a four-year publication window (2014–2017) and counting the citations in a four-year window (2019–2022). The global search was limited to the 75 most productive countries (Clarivate InCites). Domestic searches included one country and excluded the other 74. Focusing on domestic papers is advantageous in this study because, in general terms, deviations in the lower tail are more evident in domestic papers than in international collaborative papers (unpublished results).
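For illustration only (the paper does not reproduce its query strings, so the topic terms and country list below are assumptions), such a domestic search could be expressed in standard Advanced Search syntax roughly as `TS=("solar cell*" OR photovoltaic*) AND DT=(Article) AND PY=(2014-2017) AND CU=(Japan) NOT CU=(USA OR Germany OR ...)`, with the NOT clause extended to the remaining 74 countries.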
The measurement or estimation of the contribution or capacity of countries and institutions in pushing the boundaries of knowledge should be one of the most important aims of research assessment. In every topic or research field evaluated, this information can be obtained by studying the very highly cited papers that constitute the extreme upper tail of the citation distribution of global papers. To address the question of whether this extreme of the upper tail conforms to or deviates from the ideal model, double rank analysis is the most convenient approach.
To perform this analysis, I chose three important research topics: solar cells/photovoltaics, semiconductors, and immunity, which have experienced rapid evolution in recent years. In two of these topics, solar cells and semiconductors, incremental innovations have proven extremely important, whereas this type of innovation holds less significance in immunity. I then selected six countries based on their distinct research system models. Table 1 provides a basic description of these 18 cases in terms of P, Ptop 10%/P, and Ptop 1%/Ptop 10%. In the absence of deviations from the ideal rank power law, the ratios Ptop 10%/P and Ptop 1%/Ptop 10% should be equal (deviation below 15%). This equality holds in a few cases, such as the USA in solar cells, and Germany and South Korea in semiconductors. In other cases, differences are high in either direction, such as Germany and Japan in immunity. Overall, describing a general pattern for the comparison of these ratios proves challenging: each country and topic may be different. The existence of high deviations from equality in these ratios is noteworthy, as it indicates that the use of single indicators like Ptop 10%/P to assess contributions to pushing the boundaries of knowledge can be misleading.
Basic description of scientific publications in terms of P, Ptop 10%/P, and Ptop 1%/Ptop 10% in 18 selected cases.
Country | Solar cells: P | Solar cells: Ptop 10%/P | Solar cells: Ptop 1%/Ptop 10% | Semiconductors: P | Semiconductors: Ptop 10%/P | Semiconductors: Ptop 1%/Ptop 10% | Immunity: P | Immunity: Ptop 10%/P | Immunity: Ptop 1%/Ptop 10% |
---|---|---|---|---|---|---|---|---|---|
Global | 61,202 | | | 58,081 | | | 42,586 | | |
Germany | 1,834 | 0.077 | 0.063 | 1,748 | 0.065 | 0.061 | 1,177 | 0.091 | 0.056 |
India | 3,138 | 0.052 | N/C | 3,056 | 0.046 | 0.056 | 962 | 0.021 | N/C |
Japan | 2,742 | 0.059 | 0.092 | 3,167 | 0.058 | 0.087 | 1,491 | 0.057 | 0.106 |
South Korea | 4,193 | 0.068 | 0.102 | 2,668 | 0.066 | 0.068 | 1,088 | 0.053 | N/C |
UK | 892 | 0.182 | 0.204 | 810 | 0.115 | 0.204 | 893 | 0.103 | 0.076 |
USA | 5,474 | 0.146 | 0.141 | 6,004 | 0.149 | 0.111 | 7,946 | 0.145 | 0.122 |
Publication window: 2014–2017; citation window: 2019–2022. N/C, not calculated because Ptop 1% is too low.
For the double rank analysis, the country rank (ordinate axis) is plotted versus the global rank (abscissa axis; Rodríguez-Navarro & Brito, 2018a). Then, to study deviations, the double rank plot of the actual data is compared with the plot of mathematically generated data. Using as a model the power law defined by P and Ptop 10%, the comparison reveals three different patterns (Figure 1): (i) the reference and actual plots practically overlap (South Korea in semiconductors; also Table 1), ensuring the accuracy of evaluations based on P and Ptop 10%; (ii) the most cited papers have smaller global ranks than expected from P and Ptop 10% (Japan in immunity and South Korea in solar cells; also Table 1), implying that evaluations based on P and Ptop 10% undervalue the research capacity; (iii) the most cited papers have larger global ranks than expected from P and Ptop 10% (USA in immunity; also Table 1), implying that evaluations based on P and Ptop 10% overvalue the research capacity. The last two patterns imply that, in these cases, P and Ptop 10% should not be used to estimate the contribution to pushing the boundaries of knowledge.
Figure 1. Country versus global ranks of domestic papers. Comparison of actual global ranks (open circles) with those calculated for the power law defined by P and Ptop 10% (closed circles). Publication window: 2014–2017; citation window: 2019–2022.
The plots in Figure 1 demonstrate that, considering the total number of papers (P), the extreme upper tail cannot be predicted. This raises the question of whether it could be predicted using only the top 10% cited papers. Figure 2 shows that even with this restriction, the extreme upper tail is not predicted in some cases. This figure uses the power law defined by Ptop 10% and Ptop 1% instead of P and Ptop 10%, as in Figure 1, in two cases: South Korea in solar cells and the USA in immunity. In both cases, the deviations between the actual and calculated data persist, although they are reduced.
Figure 2. Country versus global ranks of domestic papers. Comparison of actual global ranks (open circles) with those calculated for the power law defined by Ptop 10% and Ptop 1% (closed circles). Publication window: 2014–2017; citation window: 2019–2022.
A simple conclusion can be drawn from the study of the extreme upper tail of the citation distribution. It implies that, as a general rule, the contribution of countries to pushing the boundaries of knowledge cannot be estimated from citation metrics based on papers that are not in the extreme upper tail. Restricting the rank analysis to the top 10% of most cited papers improves the estimation a little, but not sufficiently.
Another difficulty in describing the research systems of countries arises from deviations from the lognormal distribution among the less-cited papers (Rodríguez-Navarro, 2024; Rodríguez-Navarro & Brito, 2024a). Because the double rank approach has little sensitivity in the lower tail, I studied it using a logarithmic binning approach, with separate bins for papers with 0, 1, and 2 citations.
A first analysis of the global papers in the three topics studied here (Figure 3) shows that their citation distributions deviate notably from a lognormal distribution, which nevertheless fits the middle of the distribution and its upper tail reasonably well. However, the three topics differ. In immunity, the distribution can be described as lognormal with a minor excess of papers that are uncited or have one citation. In solar cells and semiconductors, the distributions are more complex, showing an excess of papers in all bins to the left of the mode of the histogram.
Figure 3. Histogram with logarithmic binning of citations to world publications in the scientific topics of solar cells, semiconductors, and immunity (from left to right). Blue bars correspond to a lognormal synthetic series to guide the eye. Publication window: 2014–2017; citation window: 2019–2022.
Next, I investigated the countries’ variability in the lower tail. Figure 4 shows the logarithmic binning histograms of three countries—UK, Japan, and South Korea—in the topic of solar cells. For easier comparison, the distributions of the UK and Japan have been scaled up to the level of South Korea in the 32–49 bin. While the UK distribution has a moderate excess of papers with zero and one citation, in Japan and South Korea, the deviations are significant and affect a large proportion of the papers. Notably, above the 13–19 bin, the three distributions follow similar patterns, except for the extreme upper tail, which is in contrast with the large deviations in the lower tail of the distribution.
Figure 4. Histogram with logarithmic binning of citations to publications from the UK, Japan, and South Korea on the scientific topic of solar cells. Publication window: 2014–2017; citation window: 2019–2022.
Considering the differences between topics presented in Figure 3 and the discrepancies between countries shown in Figure 4, the likelihood of identifying a universal research indicator that effectively captures the research differences between the UK, Japan, and South Korea appears extremely low. In Figure 4, the substantial disparity in the number of papers with zero or one citation between the UK (which has low counts) and Japan and South Korea (which have high counts) suggests a pronounced differential effect on size-independent indicators derived by dividing by P. For instance, the Ptop 10%/P ratios for two countries with identical upper tails could differ significantly if there is a large difference in the lower tail. Dividing by P implies the assumption that papers with zero or one citation hold no research value, which is a wrong assumption, as discussed below (Section 5.2).
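As a purely illustrative calculation (the numbers are invented for clarity): two countries that each place 100 papers in the global top 10% would show Ptop 10%/P = 0.10 if P = 1,000, but only 0.05 if an inflated lower tail raises P to 2,000, even though their upper tails, and thus their contributions to pushing the boundaries of knowledge, are identical.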
To further investigate the deviations of citation distributions from the ideal rank power law, I utilized the extensive data provided by the Leiden Ranking. As already mentioned, if the double rank power law model applies, the Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% ratios should be equal (Rodríguez-Navarro & Brito, 2019).
To test this requirement, I selected two fields,
Figure 5. Plots of Ptop 5%/Ptop 50% and Ptop 1%/Ptop 10% versus Ptop 10%/P in selected universities. Data from the CWTS Leiden Ranking 2023, means of four evaluation periods, and fractional counting. Panels a and b: “Physical and engineering sciences;” panels c and d: “Biomedical and health sciences.” Evaluation periods and selection of universities are described in the text.
Next, comparing the three ratios for each university allowed the universities to be classified into three types, A, B, and C, according to the stability of or differences between these ratios. Type A corresponds to stability (deviations of up to a maximum of 15% between the highest and lowest values); Types B and C correspond to a decrease or an increase of the ratios, respectively, from Ptop 10%/P to Ptop 1%/Ptop 10%. Tables 2 and 3 show some examples of universities in the fields of
Classification of universities according to the variation of the Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% ratios: stable (Type A), decreasing (Type B), or increasing (Type C). Field of Physical sciences and engineering.
University | Country | Ptop 10%/P | Ptop 5%/Ptop 50% | Ptop 1%/Ptop 10% |
---|---|---|---|---|
Type A | ||||
Pennsylvania State University | United States | 0.138 | 0.129 | 0.117 |
Florida State University | United States | 0.124 | 0.121 | 0.130 |
University of Virginia | United States | 0.121 | 0.118 | 0.118 |
University of Nottingham | United Kingdom | 0.141 | 0.125 | 0.120 |
University of Exeter | United Kingdom | 0.127 | 0.108 | 0.125 |
Chongqing University | China | 0.113 | 0.106 | 0.109 |
Hohai University | China | 0.103 | 0.113 | 0.089 |
University of Queensland | Australia | 0.140 | 0.131 | 0.151 |
University of Bordeaux | France | 0.112 | 0.109 | 0.118 |
Type B | | | |
Harvard University | United States | 0.254 | 0.221 | 0.170 |
Stanford University | United States | 0.249 | 0.201 | 0.160 |
Massachusetts Institute of Technology | United States | 0.235 | 0.196 | 0.155 |
Yale University | United States | 0.225 | 0.182 | 0.129 |
Princeton University | United States | 0.211 | 0.177 | 0.125 |
University of Colorado, Boulder | United States | 0.168 | 0.165 | 0.104 |
Imperial College London | United Kingdom | 0.161 | 0.135 | 0.120 |
University of Oxford | United Kingdom | 0.167 | 0.146 | 0.124 |
University of Cambridge | United Kingdom | 0.170 | 0.145 | 0.129 |
Lanzhou University | China | 0.098 | 0.074 | 0.052 |
Fuzhou University | China | 0.127 | 0.121 | 0.107 |
Katholieke Universiteit Leuven | Belgium | 0.149 | 0.136 | 0.113 |
Heidelberg University | Germany | 0.147 | 0.127 | 0.116 |
Type C | | | |
Nagoya University | Japan | 0.073 | 0.081 | 0.120 |
Tohoku University | Japan | 0.075 | 0.082 | 0.103 |
Drexel University | United States | 0.149 | 0.145 | 0.183 |
Data from the CWTS Leiden Ranking 2023, means of four evaluation periods.
Classification of universities according to the variation of the Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% ratios: stable (Type A), decreasing (Type B), or increasing (Type C). Field of Biomedical and health sciences.
University | Country | Ptop 10%/P | Ptop 5%/Ptop 50% | Ptop 1%/Ptop 10% |
---|---|---|---|---|
Type A | ||||
Northeastern University, USA | United States | 0.132 | 0.131 | 0.139 |
University of Arizona | United States | 0.100 | 0.090 | 0.091 |
Purdue University, West Lafayette | United States | 0.100 | 0.092 | 0.090 |
The University of Warwick | United Kingdom | 0.142 | 0.138 | 0.152 |
University of East Anglia | United Kingdom | 0.146 | 0.132 | 0.129 |
Peking University | China | 0.086 | 0.079 | 0.080 |
Xiamen University | China | 0.081 | 0.075 | 0.074 |
University of Milan | Italy | 0.104 | 0.096 | 0.100 |
The University of Tokyo | Japan | 0.081 | 0.084 | 0.089 |
Type B | | | |
Rockefeller University | United States | 0.326 | 0.282 | 0.182 |
Massachusetts Institute of Technology | United States | 0.298 | 0.260 | 0.192 |
Princeton University | United States | 0.267 | 0.236 | 0.173 |
Yale University | United States | 0.225 | 0.182 | 0.129 |
University of California, Berkeley | United States | 0.214 | 0.185 | 0.138 |
California Institute of Technology | United States | 0.213 | 0.185 | 0.150 |
University of Oxford | United Kingdom | 0.208 | 0.178 | 0.133 |
University of Cambridge | United Kingdom | 0.208 | 0.174 | 0.130 |
Tsinghua University | China | 0.137 | 0.126 | 0.115 |
Soochow University | China | 0.071 | 0.061 | 0.046 |
University of Toronto | Canada | 0.139 | 0.127 | 0.107 |
Universität Hamburg | Germany | 0.114 | 0.099 | 0.085 |
Flinders University | Australia | 0.111 | 0.107 | 0.082 |
Type C | | | |
Indiana University, Bloomington | United States | 0.109 | 0.103 | 0.128 |
Kiel University | Germany | 0.109 | 0.106 | 0.123 |
Tohoku University | Japan | 0.062 | 0.071 | 0.078 |
Data from the CWTS Leiden Ranking 2023 are means of four evaluation periods.
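A minimal sketch of the Type A/B/C classification applied in Tables 2 and 3 (the 15% criterion is taken from the text, but the way the spread is measured and the example counts are my own assumptions; the example values are chosen only to reproduce Harvard's reported ratios):

```python
def classify_university(p, p_top50, p_top10, p_top5, p_top1, tol=0.15):
    """Classify a university as Type A, B, or C from Leiden Ranking counts.

    Under the ideal model the three ratios below are equal. Type A allows at
    most `tol` relative spread between the highest and lowest ratio; Type B
    means the ratios decrease from Ptop10%/P to Ptop1%/Ptop10%; Type C means
    they increase.
    """
    ratios = [p_top10 / p, p_top5 / p_top50, p_top1 / p_top10]
    spread = (max(ratios) - min(ratios)) / max(ratios)
    if spread <= tol:
        return "A"
    return "B" if ratios[0] > ratios[2] else "C"

# Hypothetical counts reproducing Harvard's ratios in Table 2
# (0.254, 0.221, 0.170); the classification returned is "B".
print(classify_university(p=10_000, p_top50=5_000, p_top10=2_540,
                          p_top5=1_105, p_top1=431.8))
```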
Number of universities that belong to the three types described in Tables 2 and 3.
Country | Physical sciences and engineering: Type A | Physical sciences and engineering: Type B | Physical sciences and engineering: Type C | Biomedical and health sciences: Type A | Biomedical and health sciences: Type B | Biomedical and health sciences: Type C |
---|---|---|---|---|---|---|
Australia | 4 | 12 | 0 | 7 | 9 | 0 |
Belgium | 2 | 1 | 0 | 3 | 4 | 0 |
Brazil | 1 | 0 | 0 | 0 | 0 | 1 |
Canada | 7 | 0 | 2 | 6 | 5 | 0 |
China | 23 | 50 | 1 | 9 | 21 | 0 |
Denmark | 0 | 4 | 0 | 3 | 0 | 0 |
Finland | 0 | 2 | 0 | 1 | 1 | 0 |
France | 9 | 2 | 2 | 7 | 6 | 1 |
Germany | 14 | 9 | 2 | 20 | 3 | 2 |
Greece | 0 | 1 | 0 | 1 | 0 | 0 |
India | 0 | 0 | 2 | 0 | 0 | 0 |
Iran | 1 | 2 | 1 | 0 | 0 | 1 |
Ireland | 1 | 0 | 0 | 0 | 3 | 0 |
Israel | 0 | 3 | 1 | 3 | 1 | 0 |
Italy | 7 | 0 | 0 | 5 | 10 | 0 |
Japan | 2 | 0 | 6 | 4 | 0 | 1 |
Malaysia | 0 | 0 | 1 | 0 | 0 | 0 |
Netherlands | 3 | 4 | 0 | 2 | 7 | 0 |
New Zealand | 0 | 0 | 1 | 2 | 0 | 0 |
Norway | 0 | 1 | 0 | 2 | 1 | 0 |
Poland | 0 | 0 | 1 | 0 | 0 | 1 |
Portugal | 3 | 0 | 0 | 1 | 0 | 1 |
Singapore | 0 | 2 | 0 | 1 | 0 | 0 |
South Korea | 3 | 4 | 4 | 3 | 1 | 0 |
Spain | 3 | 1 | 2 | 5 | 2 | 0 |
Sweden | 5 | 1 | 0 | 2 | 3 | 0 |
Switzerland | 1 | 4 | 0 | 1 | 6 | 0 |
Taiwan | 0 | 0 | 2 | 2 | 1 | 0 |
United Kingdom | 7 | 15 | 1 | 8 | 19 | 0 |
United States | 17 | 48 | 3 | 36 | 61 | 4 |
Taking as a reference the citation distributions and double rank plots in countries and journals (Rodríguez-Navarro, 2024; also Figures 3 and 4), Type C universities can be explained by the large influence of a lower tail that contains proportionally many more papers than in the reference global distribution. In contrast, Type B universities, in which the Ptop 1%/Ptop 10% ratio is lower—in some cases much lower—than the Ptop 10%/P ratio, show a pattern similar to that of journals with a very high impact factor (Rodríguez-Navarro, 2024). To further investigate Type B universities, I studied the publications of four US universities, Harvard, Stanford, Yale, and the Massachusetts Institute of Technology, in the topic of immunity (as in Figure 1, incorporating these universities into the search). The Ptop 10%/P and Ptop 1%/Ptop 10% ratios corresponding to the retrieved papers were 0.27 and 0.16, respectively, as expected for Type B universities. Figure 6 compares the double rank plot of these publications with the power laws defined by P and Ptop 10%, and by Ptop 10% and Ptop 1%. The results displayed in Figure 6 demonstrate that when the ranks are calculated from P and Ptop 10% (using only the less-cited 90% of the papers), the 10% most cited papers have lower calculated ranks than their real ranks. In other words, the estimated research success is higher than it really is. In contrast, when the calculation is performed with Ptop 1% and Ptop 10%, the prediction of the extreme of the upper tail is almost correct.
Figure 6. Local versus global ranks of US domestic papers published by the universities of Harvard, Stanford, Yale, and the Massachusetts Institute of Technology in immunology. Comparison of the actual ranks of the papers of these universities with the ranks calculated for the power laws defined by P and Ptop 10% (left panel), and Ptop 10% and Ptop 1% (right panel).
Research not only pushes the boundaries of our knowledge; in current times, it is also a factor that, together with capital and labor, participates in the economic growth of countries (Godin, 2004). Consistent with this fact, developed countries make large investments in research, a significant proportion of which are public investments. Although a certain proportion of the product of this research is technical innovation, which is tangible and measurable in economic terms, a large proportion of its product is knowledge progress, which is intangible and cannot be easily measured. However, a correct research policy must be based on reliable research assessments, and the challenge is to identify the procedure for doing so. This study and a previous one (Rodríguez-Navarro, 2024) show that many current research assessments are not reliable, producing evaluations that do not reflect the scientific level of countries and institutions.
Most research country rankings reported by well-known institutions (European Commission, 2018, 2020, 2022, 2024; National Science Board, 2018, 2020, 2022, 2024) are based on two indicators: one for the size and one for the quality of the system. Currently, the size indicator, number of publications (P), can be considered flaw-free, but the second indicator, based on citations, may be misleading, generating country rankings that do not truly reflect the quality of the research system. Common indicators like Ptop 10%/P and Ptop 1%/P define the quality of a research system properly if citations to its publications adhere to the ideal model: a power-law relationship between the citation-based ranks in local and global lists; in other cases, they are misleading. This applies whether rankings aim to provide a general system description, as in most evaluations, or intend to assess the capacity to publish breakthroughs (Gaida et al., 2023).
Two types of deviations, in the upper and lower tails of citation distributions, lead to undervaluing the contribution to pushing the boundaries of knowledge. In the case of the upper tail, a country’s capacity is higher than expected from Ptop 10%/P and Ptop 1%/P if the country’s most-cited papers are more cited than expected from these indicators (e.g., South Korea in solar cells; Figures 1 and 2). This may occur if there is a small proportion of highly competitive researchers who have very little influence on bulk production but a strong influence on the most-cited papers. Currently, there is no information about the frequency of this type of deviation, but in advanced countries, it might not be uncommon.
The more common type of deviation is an inflated lower tail (Figures 3 and 4). Its most misleading effect is on the Ptop 10%/P and Ptop 1%/P ratios. The use of these ratios as indicators of research performance implies the assumption that papers with zero or one citation hold no research value. This is a wrong assumption (Rodríguez-Navarro & Brito, 2022) that has caused the inadequate research evaluations of Japan (Rodríguez-Navarro & Brito, 2024a).
Although this type of effect may be observed in the double rank analysis (Figure 3, p. 33 in Rodríguez-Navarro, 2024), it is more easily identified in citation histograms, such as those in Figures 3 and 4, where the proportion of uncited papers or those with only one citation is very different across countries. In this study, inflated lower tails are shown for solar cells and semiconductors (Figure 3) and probably occur in many technological fields. Therefore, the Ptop 10%/P and Ptop 1%/P ratios should not be used as indicators of research performance without investigating the lower tails of citation distributions.
Although the reason for inflated lower tails has not been definitively demonstrated, in technologically advanced countries a reasonable hypothesis is that they are due to the coexistence of two different objectives in the research system: pushing the boundaries of knowledge and producing incremental innovations. The papers serving the latter purpose are poorly cited, and many remain uncited even in wide citation windows. This infrequent citation contrasts sharply with research focused on pushing the boundaries of knowledge, whose publications rarely remain uncited (Rodríguez-Navarro & Brito, 2022).
In addition to the deviations in the upper and lower tails just described, other types of deviations with reference to global citation distributions lead to an overvaluing of contributions to pushing the boundaries of knowledge. In these cases, the Ptop 10%/P, Ptop 5%/Ptop 50%, and Ptop 1%/Ptop 10% ratios show decreasing values.
It is worth noting that global citation distributions in technological topics are composed of a lognormal distribution plus another distribution that produces an inflated lower tail (Figure 3; Figure 1, p. 31 in Rodríguez-Navarro, 2024). In some countries and institutions, the entire distribution may shift to the right because their lognormal citation distribution has a much higher μ parameter than the global distribution while the inflated lower tail does not disappear. However, in most cases, high μ parameters are associated with the disappearance of the inflated lower tail. This pattern is typically observed in top journals, whose citation distributions may closely fit a lognormal distribution (Table 4, p. 35 in Rodríguez-Navarro, 2024). In contrast, this disappearance is either absent or infrequent in countries and institutions. This is the case for most Type B universities, in which the inflated lower tail is diminished relative to the global citation distribution but does not disappear (unpublished observations).
Many well-known, research-intensive universities (e.g., Harvard, Stanford, Yale, Cambridge, or Oxford; Tables 2, 3, S1, and S2) fall into the Type B category. A reasonable explanation for this finding is that the number of lowly funded, less competitive researchers in these universities is much lower than in global research. Possibly, across the world, the number of Type B universities may not be very high. The high proportion of Type B universities in this study is probably the result of the selection of universities with Ptop 1% higher than nine in the four evaluation periods. This implies a stringent selection of universities and, consequently, an increase in the proportion of Type B universities in the selection. In any case, it seems that the proportion of these universities is notably different across countries (Table 4).
In these universities, the Ptop 10%/P and Ptop 1%/P ratios overvalue the capacity to push the boundaries of knowledge. This capacity is correctly valued by Ptop 1%/Ptop 10% (Figure 6). For example, in Physical sciences and engineering (Table 2), in the USA, the Ptop 1%/Ptop 10% ratio of Florida State University is similar to those of Yale University and Princeton University, and the same occurs in the UK with the universities of Nottingham and Exeter compared to those of Oxford or Cambridge. In the absence of deviations in the extreme upper tail (South Korea in Figure 2), the similar Ptop 1%/Ptop 10% ratios of these universities indicate similar contributions to pushing the boundaries of knowledge. In contrast, if the evaluation were based on the Ptop 10%/P ratio, we would conclude that Florida State University in the USA and the universities of Nottingham and Exeter in the UK are second-tier institutions compared to Yale University, Princeton University, and the universities of Oxford and Cambridge. Focusing on the contribution to pushing the boundaries of knowledge, this would be an inaccurate and unfair conclusion, which might extend to many apparently second-tier institutions and even countries.
Building upon a previous report (Rodríguez-Navarro, 2024), the present study highlights the complexity of research evaluations across various countries and institutions. An effective and informed research policy requires the consideration of four key indicators of research (Rodríguez-Navarro, 2024), in conjunction with an analysis of incremental innovations. However, it seems reasonable that large-scale assessments of national performance in science, engineering, and innovation—such as those conducted by the European Commission (e.g., European Commission, 2024), the National Science Board (e.g., National Science Board, 2024), Hcéres (Hcéres, 2019), and the National Institute of Science and Technology in Tokyo (National Institute of Science and Technology, 2022)—should adopt a single research efficiency indicator that reflects the capacity of the research system to advance the frontiers of knowledge.
In this context, it is crucial to move away from the widespread use of the Ptop 10%/P and Ptop 1%/P ratios for size-independent research assessments. First, it is important to note that not all research is aimed at advancing the frontier of knowledge, making the use of P as a normalizing factor inappropriate. While the top 10% and top 1% most cited papers belong exclusively to this type of research, P includes all types of research, meaning that the ratios are influenced by differences in research focus and citation patterns of other types of research. Additionally, as occurs in some top journals (Rodríguez-Navarro, 2024), in some leading research institutions, papers with fewer citations are disproportionately underrepresented compared to those from other countries and institutions. As a result, while the Ptop 10% and Ptop 1% have similar meaning in global and local citation distributions, P is sensitive to variations in the publication of lowly cited papers.
Even if the proportion of misleading evaluations based on Ptop 10%/P and Ptop 1%/P is low, their application—without transparency regarding deviations from the ideal model—generates uncertainty in all instances. The persistent classification of Japan as an underdeveloped country in terms of research, despite its substantial achievements, serves as a clear cautionary example of the misapplication of these metrics. A more appropriate alternative is the Ptop 1%/Ptop 10% ratio, which can be derived from the same datasets currently used, thus ensuring consistency while addressing the limitations of existing indicators. If Ptop 1% is too low, Ptop 5%/Ptop 25% or, in extreme cases, Ptop 10%/Ptop 50% may be reliable alternatives.
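A minimal sketch of this recommendation (my own illustration; the minimum count that makes Ptop 1% "too low" is not specified in the text, so the threshold below is an assumption):

```python
def size_independent_indicator(p_top50, p_top25, p_top10, p_top5, p_top1,
                               min_count=10):
    """Return the recommended size-independent indicator with fallbacks.

    Prefer Ptop1%/Ptop10%; if Ptop1% is too small to be statistically useful,
    fall back to Ptop5%/Ptop25%, and in extreme cases to Ptop10%/Ptop50%.
    The min_count threshold is illustrative only.
    """
    if p_top1 >= min_count:
        return "Ptop1%/Ptop10%", p_top1 / p_top10
    if p_top5 >= min_count:
        return "Ptop5%/Ptop25%", p_top5 / p_top25
    return "Ptop10%/Ptop50%", p_top10 / p_top50
```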
In size-dependent assessments of breakthroughs in specific research topics, the relatively low number of publications per topic means that the top 0.01% cited papers cannot be used due to statistical constraints, and the Ptop 10% is typically employed (Gaida et al., 2023; Leung et al., 2024). In such cases, calculating the probability of publishing in the top 0.01% based on the Ptop 5%/Ptop 25% ratio is not advisable, given the deviations observed at the extreme upper tail (Figures 1 and 2). A more tailored indicator is necessary to accurately capture the performance of research at the extreme of the upper tail of citation distribution (Rodríguez-Navarro & Brito, 2024b).
A significant number of current country and institution rankings, as well as evaluations conducted by well-known organizations, rely on research efficiency indicators, particularly the Ptop 10%/P or Ptop 1%/P ratios. Both prior and current studies demonstrate that these ratios often fail to provide a reliable measure of research efficiency or of the capacity to produce breakthroughs. Therefore, the continued use of such indicators is not recommended. The reliance on inappropriate evaluation metrics in research policy could lead to misguided decision-making, ultimately impairing the effectiveness of research strategies and their outcomes.
Tables S1 and S2 can be accessed at: https://doi.org/10.57760/sciencedb.24243