Minimum Representative Size in Comparing Research Performance of Universities: the Case of Medicine Faculties in Romania

Introduction

The evaluation of departments or universities has become commonplace in today's academia. Different indicators are used for such purposes, and research is emphasized in many cases (Hazelkorn, 2011). Scientific productivity can be measured using different indicators, such as the number of published articles, the number of received citations, the h-index (Egghe, 2008; Hirsch, 2005; Molinari & Molinari, 2008), or the g-index (Egghe, 2006; Tol, 2008). Whole departments, higher education institutions, or even cities and countries are assessed in these terms. In practice, data are often aggregated and compared using the mean value or the total value of the aggregated sets. For instance, the Academic Ranking of World Universities (ARWU, 2018), the QS World University Rankings (QS, 2016), and the CWTS Leiden Ranking (CWTS, 2018) all use either the mean or the total value of a specific indicator. Such approaches can be criticized for reliability, as the arithmetic mean may not be an appropriate gauge for the performance of a collective set when the sample values in the set display a highly skewed distribution. When the distribution of the data set deviates significantly from a normal function, the samples are no longer confined to a narrow region centered at the arithmetic mean. For a set of papers in a journal or a set of researchers in a university, it is often the case that the distribution is skewed (Lotka, 1926; Seglen, 1992). In such a case, the high variation within the populations to be compared obscures the inter-population variation, thus often rendering the comparison unreliable.
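As an aside, the two author-level indicators used alongside citation counts in this article, the h-index and the g-index, can be computed directly from a researcher's per-paper citation counts. The sketch below is our own minimal illustration, not code from the study; the function names and the toy citation list are hypothetical.

```python
# Minimal sketch (not from the original article): computing the h-index and
# g-index of a single researcher from a list of per-paper citation counts.

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the g most cited papers have at least g^2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

if __name__ == "__main__":
    papers = [25, 8, 5, 3, 3, 1, 0, 0]          # hypothetical citation counts
    print(h_index(papers), g_index(papers))     # -> 3 6
```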

In Shen et al. (2017), we asked when the arithmetic mean of a sample set, or some other kind of average score, can be used to represent the set, and under which conditions a comparison based on such measures of central tendency can be reliable. When does the comparison of arithmetic means indicate that a grouping of academics, such as a department or a university, performs better than the one it is compared to? In order to answer this question, we proposed a definition of the minimum representative size κ as a parameter which characterizes a pair of data sets that are to be compared. The method is described in detail, including the analytical demonstration, in Shen et al. (2017). In a nutshell, κ represents the size of the smallest sample of randomly extracted points within the data set whose variance of averages is less than or equal to the variance between the data sets to be compared.

Given, for example, two sets of data, (1) and (2), whose averages are in an ordinal relation, e.g. the average of data set (1) is higher than that of data set (2), what is the probability that a random sample from the first data set has a higher average than a random sample from the second? If the two data sets are skewed, this probability is often not high. For the very purpose of increasing this probability, when it is possible to increase it, we introduced in Shen et al. (2017) the definition of κ: instead of drawing one random sample, we draw κi random samples from the corresponding set i and compare the averages of those κ1 and κ2 samples. The value of κ depends on the variance of the data set: the smaller the variance, the smaller κ. If κ is comparable to the size of the data set, or even larger (which happens when the set has a very large variance compared to other sets), then the average is an entirely unreliable indicator for comparing the two data sets. Therefore, the minimum representative size κ serves as an indicator of the consistency of the set, which can be used as a supplementary indicator to be computed before using the average as a basis for comparing two data sets whose skewed distributions deviate significantly from normality.

An alternative approach is to truncate the distribution and to consider a particular segment when comparing a set of populations. That particular segment has the property of smaller variation and thus renders the comparison between populations more reliable. In this case, theoretical arguments for choosing a specific segment as being representative of the entire population are strongly required. We pursued this stream of research in Proteasa et al. (2017).

In this article, we apply the former analytical approach in an empirical context: we calculate the minimum representative size for six medicine departments in Romania in order to allow for reliable comparisons between them as collective units.

The Method

Let us denote a skewed data set j, which may consist of the total number of papers, received citations, h-index, or g-index of each researcher in a university, as {sj}. Due to the skewness of the distribution of {sj}, the mean $S_j = \left\langle s_j \right\rangle = \frac{\sum_p s_j^p}{N_j}$, where $N_j$ is the number of samples in the set, does not display the properties which are typical of a normal distribution. For example, an academic from a university with such a distribution has a higher probability of having an individual score sj which is at a significant distance from the average Sj of the data set. Such skewed data sets also display extreme values with higher frequency, hence very large moments of the distribution function, and a higher degree of overlap when compared in pairs. Thus:

$$\Pr \left( s_i > s_j \,|\, S_i > S_j \right) = \int_{-\infty}^{\infty} dx_j \left( \int_{x_j}^{\infty} dx_i\, \rho_i\left( x_i \right) \rho_j\left( x_j \right) \right) \ll 1, \tag{1}$$

where $\rho_i(x_i)$ is the distribution of data set i, with mean Si. This means that the probability that a random sample from the data set with the higher mean has a higher value than one from the data set with the lower mean can be very low. However, when moving from an individual sample to large enough numbers Ki and Kj of samples from data sets i and j respectively, the odds change significantly, and the probability that a random sample of Ki researchers from university i has a higher mean score than a random sample of Kj researchers from university j is high, as indicated in the equation below:

$$\Pr \left( G_i\left( K_i \right) > G_j\left( K_j \right) \,|\, S_i > S_j \right) \approx 1, \tag{2}$$

where Gi(Ki) is the mean of the Ki samples from data set i:

$$G_i\left( K_i \right) = \frac{1}{K_i}\sum_{p=1}^{K_i} x_i^p. \tag{3}$$

We also know that the average of G(K), μ(G(K)), is close to S (the mean of the data set), and that the variance of G(K) decreases with K.

Equation (2) can be solved by bootstrap sampling (Wasserman, 2004) (see Shen et al. (2017) for further details). For data sets i and j with Si > Sj, starting from Ki = 1 and Kj = 1, bootstrap sampling unfolds as follows:

1. For given values of Ki and Kj, L = 4000 sets of random samples are generated from data sets i and j, respectively. Following this, $G_i^l(K_i)$ and $G_j^l(K_j)$ are calculated for each l = 1, 2, …, L.

2. Pair matching is performed on $G_i^m(K_i)$ and $G_j^n(K_j)$, and the percentage of pairs with $G_i^m(K_i) > G_j^n(K_j)$ is calculated. The following conditions are imposed:

$$\left\{ \begin{aligned} & \Pr \left( G_i\left( K_i \right) > G_j\left( K_j \right) \,|\, S_i > S_j \right) \ge 0.9 \\ & \frac{\sigma_i^2}{\kappa_i} = \frac{\sigma_j^2}{\kappa_j} \end{aligned} \right. \tag{4}$$

If the conditions in (4) are satisfied, then the pair of values (κi, κj) = (Ki, Kj) represents what Shen et al. (2017) termed the minimum representative size of the pair of data sets i and j. If the conditions in (4) are not satisfied, then Ki or Kj is increased and the sequence is repeated starting with step 1.

If κi and κj are less than the sizes of their data sets, i.e. κi < Ni and κj < Nj, then we may say that the averages Gi(κi) and Gj(κj) can be reliably compared. The smaller κ, the more consistent the distribution within the data set. Values of κi greater than Ni indicate that the average is not a reliable measure of the central tendency of data set i.

We increase Ki and Kj according to the ratio between their variances, $\frac{K_i}{K_j} = \frac{\sigma_i^2}{\sigma_j^2}$, instead of setting Ki = Kj, since we consider that the set with the larger variance should reduce its variance more by using a larger sample size.
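A minimal sketch of this bootstrap procedure is given below. It is our own illustration, not the authors' code: the function name min_representative_size, the stopping guard max_kappa, and the use of NumPy are assumptions, while the 0.9 threshold, the L = 4000 resamples, and the variance-ratio rule for Ki/Kj follow the description above.

```python
# A sketch of the bootstrap estimation of the minimum representative size,
# assuming mean(x_i) > mean(x_j). Illustrative only; names are our own.
import numpy as np

def min_representative_size(x_i, x_j, threshold=0.9, L=4000, max_kappa=20000, rng=None):
    """Return (kappa_i, kappa_j) for data sets x_i, x_j with mean(x_i) > mean(x_j)."""
    rng = np.random.default_rng() if rng is None else rng
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    ratio = np.var(x_i) / np.var(x_j)              # K_i / K_j = sigma_i^2 / sigma_j^2
    for K_j in range(1, max_kappa + 1):
        K_i = max(1, int(round(ratio * K_j)))      # keep sigma_i^2/K_i ~ sigma_j^2/K_j
        # Step 1: L bootstrap means of sizes K_i and K_j from each data set.
        G_i = rng.choice(x_i, size=(L, K_i), replace=True).mean(axis=1)
        G_j = rng.choice(x_j, size=(L, K_j), replace=True).mean(axis=1)
        # Step 2: fraction of matched pairs with G_i > G_j; stop at the threshold.
        if np.mean(G_i > G_j) >= threshold:
            return K_i, K_j
    return None  # threshold not reached within max_kappa; the mean is unreliable here
```

Because the resampling is done with replacement, κ can exceed the size of the data set itself, which is exactly the situation flagged above as one in which the average is not a reliable basis for comparison.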

Results
Data

In this article, we study 3374 academics from the departments of medicine within the six health studies universities in Romania: Cluj, Bucharest, Timişoara, Iaşi, Craiova and Tg. Mureş.

The personnel lists were compiled from public sources: websites and reports from 2014. We collected publication data from the Scopus database for the population of academics we established. We collected information regarding publications, citations, and Hirsch's h-index. We included single- and co-authored papers, as well as their respective citations, for a five-year interval, from 2009 to 2014. We excluded authors' self-citations. The data were collected between November 2014 and May 2015. Full counting was used. We manually disambiguated authors' names and affiliations. We included all papers from authors with multiple affiliations, provided they were published in the 2009–2014 time window. We included in the population all the academics who were considered to belong to health disciplines, according to an official categorization of teaching personnel in health studies (MS, 2009). We excluded only a small minority of the population we compiled, e.g. English teachers employed in medicine departments. We identified a set of publications with an outstanding number of citations whose character is different from that of a standard research article: guidelines, medical procedure recommendations, and definitions. We excluded these articles as well, based on a systematic word search. We then recalculated the number of received citations, the h-index, and the g-index for each academic in the database. The database was previously used for the analyses presented in Proteasa et al. (2017). A description of the populations we study is presented in Table 1.

Table 1. Basic statistics. For each university, its name, number of academics (N), mean score (⟨•⟩), and standard deviation (σ) of the corresponding index are shown.

                          Citation             h-index            g-index
University       N        ⟨c⟩      σ           ⟨h⟩     σ          ⟨g⟩     σ
Cluj             596      24.75    124.66      1.69    2.23       2.53    4.03
Bucharest        1119     18.41    80.81       1.45    1.99       2.15    3.56
Timişoara        505      15.93    57.61       1.42    2.03       1.93    3.16
Iaşi             547      15.86    75.22       1.54    1.90       2.06    3.08
Craiova          280      14.19    40.95       1.50    1.91       1.98    2.90
Tg. Mureş        327      5.40     22.19       0.83    1.28       1.08    2.04

We illustrate the skewness of the six distributions in Fig. 1, where we plotted the distributions of the total citations of the academics affiliated to each of the six universities. A visual inspection of the plots reveals that the first quartile and the minimum citation count of the six universities are all close to zero and thus cannot be seen in the figure, since a considerable number of academics received no citations in the time window during which citations were collected. The six distributions are skewed and present a high degree of overlap, as indicated by their large standard deviations and by the large difference between medians and means.

Figure 1

Box-plot of the citation distributions. The citation distributions of the six universities are shown as box-plots. The five horizontal lines in each box represent the maximum, third quartile, median, first quartile, and minimum citation count, respectively (the last two lines cannot be seen in the figure since they are too close to zero). The star markers represent the mean of the citations. There is a huge difference between mean and median in each distribution.

In the following section we engage with two research questions, conceptualized in the section dedicated to outlining the method: (1) Is the mean a reliable measure of central tendency for the purpose of establishing a hierarchy of the six medical schools, one which can account for the quality of the academics affiliated with them? (2) How can the six medical schools be compared using the minimum representative size?

Pairwise comparison

A first statistical treatment we perform consists of the pairwise comparison of the six medicine faculties. In this respect, we used the distributions of citations (c), of the values of the h-index (h) and, respectively, of the g-index (g). Thus, we calculate the minimum representative size, i.e. the minimum size of a sample of representative academics, κ, for each pair of medicine faculties, according to Eq. (4). We use the notation $\kappa(I)_c^r$ to denote the minimum number of representative academics of faculty r when compared with faculty c, based on the index I. For example, κ for Tg. Mureş and Cluj on the g-index is $\kappa(g)_{CL}^{MU}=5$ and $\kappa(g)_{MU}^{CL}=22$, respectively. The considerable difference in mean g-index between these two universities may be the main reason for the small values of this pair of κ. $\kappa(g)_{MU}^{CL}=22$ is larger partially due to the relatively higher heterogeneity of the g-index distribution of Cluj. The value of $\kappa(I)_c^r$ is determined by the overlap between distributions r and c and also by the variances of r and c.

The distributions of the bootstrap averages G(κ) are shown in Fig. 2(b). As a comparison, the distributions of individual g-index values are shown in Fig. 2(a), where there is a larger overlap between the two universities, though we can clearly see the difference between their average scores. In Fig. 2(b) the overlap is visibly smaller: the standard deviations are much smaller for G(κ), while the mean values did not change.

Figure 2

Box-plot of the g-index distributions of Cluj and Tg. Mureş. (a) The original distributions for Cluj and Tg. Mureş. The five horizontal lines in each box represent the maximum, third quartile, median, first quartile, and minimum g-index, respectively. The star markers represent the mean g-index. (b) Distribution of the bootstrap κ-sample averages of the g-index, with the minimum representative sizes of the two sets when they are compared pairwise.

The complete results of the pairwise comparisons based on the three indexes, citations, h-index, and g-index, are shown in Fig. 3(a), (b), and (c), respectively. The size of each circle at position (r, c) is the ratio of the mean values of the corresponding index I for universities r and c, i.e. $\frac{s_r^I}{s_c^I}$. Their magnitude can be assessed against the gray circles along the diagonal, which serve as references and have size 1. The color of the other circles represents the value of $\kappa(I)_c^r$. The darker the color, the greater the value of κ. For example, the size of the circle at the upper right corner of Fig. 3(c) corresponds to $\frac{s_{CL}^g}{s_{MU}^g}=2.34$ and the color of the same circle corresponds to $\kappa(g)_{MU}^{CL}=22$. The size of the circle at the lower left corner of Fig. 3(c) corresponds to $\frac{s_{MU}^g}{s_{CL}^g}=0.43$, while the color of the same circle corresponds to $\kappa(g)_{CL}^{MU}=5$.

Figure 3

Heatmap of the pairwise comparisons. (a) Faculties are sorted in descending order by average citations. The darkness of each circle represents the value of $\kappa(I)_c^r$ for the pair formed by the faculty on row r and the one on column c. The size of the circles represents the ratio of the mean citations for the faculties corresponding to row r and column c. The gray circles on the diagonal are reference circles of size 1. (b) Results calculated based on the h-index. (c) Results calculated based on the g-index.

A visual inspection of Fig. 3 reveals that most of the circles corresponding to pairwise comparisons between faculties lean towards the dark end of the spectrum. We interpret this observation as proof of the incapacity of the mean scores to account for the differences between the six faculties when compared in pairs. Moreover, some of the values κ takes are greater than the size of the faculty, e.g. $\kappa(c)_{IA}^{BU}=3188$ and $\kappa(c)_{BU}^{IA}=2762$, but $N_{BU}=1119$ and $N_{IA}=547$. In this particular case, the comparison of the two faculties based on their averages is unreliable.
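To illustrate how such a pairwise matrix can be assembled, the sketch below loops over all ordered pairs of faculties and reuses the hypothetical min_representative_size function from the Method section; the faculty data are placeholders, not the Romanian data set.

```python
# Hypothetical pairwise comparison: for each pair (r, c) with mean(r) > mean(c),
# store kappa(I)_c^r and kappa(I)_r^c computed with the sketch above.
import numpy as np

def pairwise_kappa(scores):
    """scores: dict mapping faculty name -> array of per-academic index values."""
    kappa = {}
    for r, x_r in scores.items():
        for c, x_c in scores.items():
            if r != c and np.mean(x_r) > np.mean(x_c):
                result = min_representative_size(x_r, x_c)
                if result is not None:
                    kappa[(r, c)], kappa[(c, r)] = result
    return kappa

# Toy usage with two skewed (log-normal) populations:
rng = np.random.default_rng(0)
toy = {"A": rng.lognormal(1.0, 1.2, 500), "B": rng.lognormal(0.5, 1.2, 400)}
print(pairwise_kappa(toy))
```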

Group comparison

In a second statistical treatment, we compare one faculty u against the rest of the faculties taken as a whole, i.e. the union of their populations of academics. In other words, we compare the academics within one faculty with the rest of the academics as if the latter belonged to a single virtual faculty, rest. This approach allows us to perform pairwise comparisons, as we did in the previous sub-section, between faculty u and rest in order to compute the values of $\kappa(I)_{rest}^{u}$ for each index I. The results we obtained are presented in Table 2, below. The values of $\kappa(I)_{u}^{rest}$ are not included in the table, as they do not serve the description of the social phenomenon we study.

Table 2. Values of κrest. For each university, its name, the mean scores of the rest of the universities (⟨c⟩rest, ⟨h⟩rest, ⟨g⟩rest), and the minimum number of representative academics (κrest), calculated for the corresponding index, are shown.

                          Citation                h-index                g-index
University       N        ⟨c⟩rest   κrest         ⟨h⟩rest   κrest        ⟨g⟩rest   κrest
Cluj             596      15.50     480           1.40      183          1.95      145
Bucharest        1119     16.50     5283          1.44      > 10⁴        2.00      1664
Timişoara        505      17.35     4900          1.45      9066         2.08      1378
Iaşi             547      17.38     7187          1.43      893          2.06      > 10⁴
Craiova          280      17.40     477           1.42      3375         2.06      3634
Tg. Mureş        327      18.39     2             1.51      12           2.16      13

The faculty from Cluj has the highest mean score in all the comparisons we performed, regardless of the index. The corresponding values of the minimum representative size, $\kappa_{rest}^{CL}$, are not small compared to the size of the population, but remain lower than it. At the other extreme, the faculty based in Tg. Mureş has the lowest mean score and the smallest values of $\kappa_{rest}^{MU}$ in all the comparisons. For the other four universities, the values of κrest all exceed their numbers of academics. We interpret these results as proof that it is reliable to compare the faculty based in Cluj against the rest of the faculties, and that it, in fact, exhibits superior values on all three indexes we used. At the same time, it is also reliable to argue that the faculty from Tg. Mureş exhibits inferior performance compared to the rest of the faculties, under all three indexes we used as a basis of comparison. Last but not least, our results from both the pairwise comparison and the one-to-the-rest comparison indicate that it is not reliable to distinguish differences in the performance of the other four faculties (Bucharest, Timişoara, Iaşi, and Craiova), irrespective of the index which is used as a basis for calculation.
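For completeness, a similarly hedged sketch of the one-to-the-rest comparison is shown below; it pools all other faculties into a single virtual data set and again relies on the hypothetical min_representative_size function from the Method section (the data structures are placeholders).

```python
# Hypothetical one-to-the-rest comparison: kappa of faculty u against the
# pooled academics of all other faculties.
import numpy as np

def kappa_vs_rest(scores, faculty):
    """Return kappa_rest^u for `faculty` u, or None if the comparison is unreliable."""
    x_u = np.asarray(scores[faculty], float)
    x_rest = np.concatenate([np.asarray(v, float)
                             for name, v in scores.items() if name != faculty])
    # Order the arguments so that the higher-mean set plays the role of i.
    higher, lower = (x_u, x_rest) if x_u.mean() > x_rest.mean() else (x_rest, x_u)
    result = min_representative_size(higher, lower)
    if result is None:
        return None
    k_higher, k_lower = result
    return k_higher if higher is x_u else k_lower
```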

Summary

In a nutshell, this contribution consists of applying the minimum representative size, a methodology developed in Shen et al. (2017), to a new empirical context: that of the faculties of medicine in the health studies universities in Romania, previously studied by Proteasa et al. (2017). The "quality" of the academics affiliated to the six faculties located in Cluj, Bucharest, Timişoara, Iaşi, Craiova, and Tg. Mureş is measured by the total citations received by each academic and by the respective values of the h-index and g-index. We performed pairwise comparisons and one-to-the-rest comparisons. We found that, when the population of academics from Cluj is compared to the others, in either of the two methods of comparison, the minimum representative size is reasonably small. We interpret this finding as a reliable indication of superior performance, in relation to all three indexes used in this article. We also find that when the population of academics affiliated to Tg. Mureş is compared to the others, in either of the two methods of comparison, the minimum representative size is quite small, thus it is reliable to say that its performance is inferior to that of the other faculties, on the three measures used in this work. For the rest of the faculties we investigated in this work, we cannot reliably distinguish differences among them, since their minimum representative sizes are all quite large, sometimes even bigger than their own sizes.

One might think that these results, which substantiate that the faculties located in Cluj and Tg. Mureş are quite different from the rest while the others are rather similar, are trivial. It can be argued that a similar conclusion can be reached by a simple comparison of the mean scores in Table 1. We emphasize that the method we unfolded in this article, which builds on the concept of the minimum representative size (κ), and especially on the relation between κ and the size of the whole population N, represents a validation, in this case, of the falsifiable hypothesis which can be derived from a simple comparison of the averages. When κ << N the hypothesis is validated and such a comparison is reasonable, while when κ ~ N or even κ > N, the hypothesis is invalidated and the hierarchy of the averages represents a numerical artifact rather than a substantial property of the distributions, thus such a comparison is unreliable. This case is exemplified by the four faculties whose corresponding minimum representative sizes exceeded the sizes of their distributions. To conclude, the minimum representative size, relative to the size of the population, proved to be a useful and reliable ancillary indicator to the mean scores.

We consider our findings particularly relevant in situations where aggregate scores are computed for the purpose of ranking data sets associated with different collective units, such as faculties, universities, journals, etc. Whenever one wants to distinguish the performance of two collective units, the minimum representative size of the pairwise comparison should be calculated first, as an indication of the reliability of the comparison of the means. When κ is small compared to the size of the set, then the mean can be seen as a good representative value of a bootstrapped sample of κ elements from the set. Whenever one is interested in comparing one particular collective unit against a group of similar units, such as the medicine faculties in health studies universities in Romania, then κ calculated by the one-to-the-rest comparison is appropriate. In both cases, those groups whose κ is comparable to, or even bigger than, their group size should be discarded from such comparisons. A small κ is an indication of the consistency of a data set, an attribute that, we consider, should be assessed before ranking collective units consisting of individuals with different performance levels.
