When the sample of 205 Chinese universities is merged with the 197 US universities included in Leiden Rankings 2020, the results similarly indicate three main groups: low, middle, and high. Using this data (Leiden Rankings and Web of Science), the
We show empirically that differences in ranking may be due to changes in the data, the models, or the modeling effects on the data. The scientometric groupings are not always stable when we use different methods.
Differences among universities can be tested for their statistical significance. The statistics relativize the values of decimals in the rankings. One can operate with a scheme of low/middle/high in policy debates and leave the more fine-grained rankings of individual universities to operational management and local settings.
In the discussion about the rankings of universities, the question of whether differences are statistically significant, has, in our opinion, insufficiently been addressed in research evaluations.