Open Access

The Scientometric Measurement of Interdisciplinarity and Diversity in the Research Portfolios of Chinese Universities



Introduction

On July 29, 2020, the Academic Degrees Committee of the State Council, an advisory council of the Chinese government, announced that the category “interdiscipline” had been added to the list of national disciplines accessible for academic degrees. This initiative will not only result in a structural change to China's classification of academic degrees; it is also designed to promote the future development of interdisciplinarity in China. As a case in point, in late October 2020, three months after this announcement, the National Natural Science Foundation of China (NSFC) announced the launch of a new department for “interdisciplinary studies”. This will be the ninth department of the NSFC and will focus on funding interdisciplinary projects. As the first change to the NSFC funding scheme in 11 years, the decision has drawn much attention.

Interdisciplinarity is a hot topic in science and technology policy. However, the concept is both abstract and complex, which makes it difficult to represent or measure interdisciplinarity fully in terms of indicators that can be compared with one another. A variety of diversity measures have been proposed in the literature as proxies for interdisciplinarity, and such indicators have been used to measure the interdisciplinarity of sets of articles, patents, or journals. In this study, we ask: Can one rank institutions in terms of their disciplinary diversity? And, if so, what does this tell us about interdisciplinarity? Note that diversity is not necessarily a goal universities strive for; some aspire to be the best in a particular discipline.

During the last few years, we, the authors of this paper, have explored the scientometric measurement of interdisciplinarity and diversity in scholarly communications in collaboration with a number of colleagues. Contributions to this program of studies were made (in alphabetical order) by Lutz Bornmann, Wolfgang Glänzel, Inga Ivanova, Ronald Rousseau, Caroline S. Wagner, and Ping Zhou (Leydesdorff & Ivanova, 2020; Leydesdorff, Wagner, & Bornmann, 2018 and 2019; Zhang, Rousseau, & Glänzel, 2016; Zhang, Sun, Chinchilla-Rodríguez, Chen, & Huang, 2018; Zhang, Sun, Jiang, & Huang, 2021). One of our objectives has been to develop a non-commercial, public-domain application that allows researchers and policy analysts to measure the diversity of any document set or network structure using a range of indicators. To the best of our knowledge, no such tool has been developed before, at least not for public use.

A large number of indicators of “diversity” have been proposed in the literature, e.g. Rao-Stirling diversity (Stirling, 2007), the Gini coefficient, the Simpson (1949) index, and the Hirschman-Herfindahl index (Herfindahl, 1950; Hirschman, 1945). In this communication, we report on the facilities that we have created during the last two years. In particular, we introduce the freely available program interd_vb.exe (available at http://www.leydesdorff.net/software/interdisc.2020/) for this purpose. We document the various options and provide instructions for practitioners interested in measuring diversity and interdisciplinarity. By elaborating on the measurement of the disciplinary diversity of the research portfolios of the 42 top universities listed as the “Double First-Class” universities (Liu et al., 2018), we are able to show the options and choices to be made given the current state of the art.

Technical instructions are additionally available at http://www.leydesdorff.net/software/interdisc.2020/index.htm. The inputs and outputs are in .csv format. The same output is also stored in interdis.dbf. The subsequent analysis demonstrates the options and choices that can be made en route to a final comparison. As a disclaimer, note that we are in no way professional programmers. We cannot guarantee that our routines are error-free, and we acknowledge that the user interface could be improved. However, as a test, one of us programmed the application in two different computer languages, and the results were virtually the same. Additionally, we believe the functionality is unique and therefore state of the art for what it is.

One of the advantages of the application is its ability to handle large volumes of data. For example, the need to analyze an entire database, such as Web-of-Science (WoS), Scopus, or Google Scholar, is becoming increasingly common. Analyses of this magnitude can generate baselines for evaluating the disciplinary diversity of articles, journals, topics, etc. The Interdisc program can relieve the computational overhead of processing massive amounts of data. That said, although the equations used to calculate diversity indicators are often mathematically transparent, specifying the terms as computer code requires analysts to make precise decisions that would not arise in a manual calculation.

The relation of the indicators to bibliometrics

Interdisciplinarity can be operationalized as references to different literatures. Such co-citing is known in scientometrics as bibliographic coupling (Kessler, 1963). When a document, for example, cites both articles in physics journals and in sociology journals, this can be expected to indicate interdisciplinarity more than citing chemical physics and solid-state physics in the same document or in the same set. In other words, one couples literature from different disciplines in the references. This coupling can be at the level of articles, journals, or Web-of-Science Subject Categories (WCs).

Bibliographic coupling is an indicator on the citing side and thus the operation opposite to co-citation: co-citations across disciplinary borders indicate interdisciplinary diffusion, whereas the measurement of interdisciplinarity by bibliographic coupling focuses on aggregated citing behaviour.

Whereas “interdisciplinarity” by citing papers refers to documents, documents are often not the units of analysis in the case of research evaluation at the institutional level. The interdisciplinary operator of bibliographic coupling is defined in terms of disciplines and not in terms of institutions. Does the diversity of a university in terms of departments indicate interdisciplinarity or only comprehensiveness of a research portfolio? Since there is no coupling in terms of different fields, one may measure only comprehensiveness, and not interdisciplinarity.

Institutional units are primarily administratively and not disciplinarily organized. The diversity indicators apply to disciplinary differentiations; social differentiation in terms of departments, etc., may have a different meaning. For example, diversity may also indicate comprehensiveness. How does this work out empirically?

Indicators of diversity

In this section, we first discuss the following indicators of diversity and interdisciplinarity in terms of the basic equations:

Shannon's entropy

Using Shannon's (1948) information theory, one can measure diversity as the uncertainty in a distribution. The equation of the Shannon entropy can be stated as follows:

$$H = -\sum_{i} p_i \log(p_i) \tag{1}$$

where $p_i = x_i / X$ and $\sum_i p_i = 1$; $x_i$ denotes the number of cells belonging to subject category $i$. Based on information theory, the maximum capacity ($H_{max}$) of a system is composed of two parts, which are (1) the number of realized states and (2) the not-yet-realized but possible states ($H_{max} - H_{system}$); that is, the redundancy. Leydesdorff and Ivanova (2021) proposed to use redundancy as a measure of synergy.
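As a minimal sketch of the entropy equation above (not the code of interd_vb.exe), the following Python functions compute the Shannon entropy of a vector of category counts. The function names, the example counts, and the base-2 logarithm are assumptions made for illustration. The redundancy is computed as $H_{max} - H_{system}$ under the further assumption that $H_{max}$ is the logarithm of the number of categories in the vector.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Return H = -sum(p_i * log(p_i)) for a vector of category counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for x in counts:
        if x > 0:  # empty categories contribute nothing (0 * log 0 is taken as 0)
            p = x / total
            h -= p * math.log(p, base)
    return h


def redundancy(counts, base=2.0):
    """Return H_max - H, assuming H_max = log(number of categories in the vector)."""
    return math.log(len(counts), base) - shannon_entropy(counts, base)


# Hypothetical counts of publications in five subject categories.
example_counts = [40, 30, 20, 10, 0]
print("H =", shannon_entropy(example_counts))
print("redundancy =", redundancy(example_counts))
```

A perfectly even distribution reaches the maximum entropy, so its redundancy is zero; any concentration of the counts in fewer categories lowers H and raises the redundancy.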

The Simpson index

The Simpson index was originally developed to measure “concentration” (Rousseau, 2018; Simpson, 1949). Stirling (2007) introduced the concept into the field of scientometrics as a way to evaluate the variety of subject categories and the unevenness in the distribution of these categories. For this reason, Simpson diversity is often called a “dual concept” indicator of diversity. It combines variety with balance in a single number. The equation for Simpson's diversity index is

$$SI = 1 - \sum_{i} p_i^2 \tag{2}$$

where $p_i = x_i / X$, $X = \sum_i x_i$, and $x_i$ denotes the number of elements belonging to subject category $i$.
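The Simpson equation above translates almost directly into code. The sketch below is illustrative only; the function name and the two example count vectors are hypothetical.

```python
def simpson_diversity(counts):
    """Return SI = 1 - sum(p_i ** 2) for a vector of category counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((x / total) ** 2 for x in counts)


# Hypothetical portfolios over four subject categories.
concentrated = [97, 1, 1, 1]
balanced = [25, 25, 25, 25]
print(simpson_diversity(concentrated))
print(simpson_diversity(balanced))  # 0.75, the maximum for four categories
```

The balanced portfolio scores higher than the concentrated one although both cover four categories, which shows how the index mixes variety and balance.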

Rao-Stirling index

Stirling (2007) proposed Rao-Stirling (RS) diversity to measure interdisciplinarity, distinguishing variety, balance, and disparity as the three components of interdisciplinarity. Formally, the indicator is calculated as

$$RS = \sum_{i,j} (p_i p_j)^{\alpha} \, d_{ij}^{\beta} \tag{3}$$

where $d_{ij}$ (or equivalently $1 - s_{ij}$) denotes the distance between subjects $i$ and $j$, and $s_{ij}$ is the similarity between subjects $i$ and $j$; $p_i = x_i / X$, $X = \sum_i x_i$, and $x_i$ denotes the number of cells belonging to subject $i$. The exponents $\alpha$ and $\beta$ are two parameters for adjusting the relative weights of variety or balance ($p_i p_j$) and of distance ($d_{ij}$), respectively.

The novelty of RS lies in the disparity term (dij). The other part of Eq. 3 is the same as the Simpson index, which measures both variety and balance.

In most scientometric applications, α and β are set to 1 (Rafols & Meyer, 2010), which simplifies Eq. (3) to:

$$D = \sum_{i \ne j} d_{ij} \, p_i p_j \tag{4}$$
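To make the role of the disparity term concrete, the following sketch evaluates RS diversity for two hypothetical portfolios. It sums over ordered pairs with i ≠ j, as in the simplified form; for a proper distance measure with zero diagonal and β > 0, this gives the same value as summing over all i and j. The function name, the distance matrix, and the counts are assumptions made for illustration and are not taken from interd_vb.exe.

```python
def rao_stirling(counts, dist, alpha=1.0, beta=1.0):
    """Sum (p_i * p_j)**alpha * d_ij**beta over all ordered pairs i != j."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    p = [x / total for x in counts]
    n = len(p)
    rs = 0.0
    for i in range(n):
        for j in range(n):
            # Only categories that are actually present contribute.
            if i != j and p[i] > 0 and p[j] > 0:
                rs += (p[i] * p[j]) ** alpha * dist[i][j] ** beta
    return rs


# Hypothetical distances: categories 0 and 1 are close, category 2 is distant.
dist = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
    [0.9, 0.9, 0.0],
]
print(rao_stirling([50, 50, 0], dist))  # two closely related categories
print(rao_stirling([50, 0, 50], dist))  # two distant categories
```

Both portfolios have the same variety and balance, so their Simpson values are identical; only the disparity term separates them. If all off-diagonal distances are set to 1, the function returns the Simpson index.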

True RS diversity

True RS diversity has its origins in a variant of the Hill indicator proposed by Leinster and Cobbold (2012), which adds disparity into the Hill equation traditionally used in ecology. This indicator was subsequently modified by Zhang et al. (2016) as follows:

$${}^{2}D^{S} = \left( \sum_{i=1}^{n} p_i \left( \sum_{j=1}^{n} s_{ij} p_j \right) \right)^{-1} = \frac{1}{\sum_{i,j=1}^{n} s_{ij} \, p_i p_j} \tag{5}$$

where $s_{ij}$ denotes the similarity between subjects $i$ and $j$; $p_i = x_i / X$, $X = \sum_i x_i$, and $x_i$ is the number of cells belonging to subject $i$. Note that True RS is no longer bounded between zero and one, and it allows the parameters to be scaled such that one unit of study is, say, twice as interdisciplinary as another.
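The following sketch evaluates the True RS equation above for hypothetical data. Because the shares sum to one, setting $s_{ij} = 1 - d_{ij}$ gives $\sum_{i,j} s_{ij} p_i p_j = 1 - D$, so True RS equals $1 / (1 - D)$ with D the simplified RS diversity. The function name and the example matrices are assumptions made for illustration.

```python
def true_rs(counts, sim):
    """Return 1 / sum(s_ij * p_i * p_j) over all pairs (i, j), including i == j."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    p = [x / total for x in counts]
    n = len(p)
    denominator = sum(sim[i][j] * p[i] * p[j] for i in range(n) for j in range(n))
    if denominator <= 0:
        raise ValueError("similarity matrix must have a positive diagonal")
    return 1.0 / denominator


# Four equally sized categories that share nothing (identity similarity matrix).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(true_rs([25, 25, 25, 25], identity))  # effective number of categories: 4

# The same categories, but treated as fully similar to one another.
all_similar = [[1.0] * 4 for _ in range(4)]
print(true_rs([25, 25, 25, 25], all_similar))  # collapses to a single category
```

The two extremes illustrate why the measure can be read as an effective number of categories: the value ranges from 1 (everything alike) up to the number of categories (all equally sized and maximally distinct).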

DIV

Stirling (1998) stated that “any integration of variety and balance into dual concept diversity must necessarily involve the implicit or explicit prioritization of the subordinate properties”. From this, Leydesdorff et al. (2019) proposed a new diversity indicator, called DIV, that divides interdisciplinarity into its three components (variety, balance, and disparity) and recombines them by multiplication. An empirical comparison showed the advantages of this new indicator over RS diversity. Formally, DIV is expressed as follows:

$$DIV_c = \left[ \frac{n_c}{N} \right] \cdot \left[ 1 - G(c) \right] \cdot \left[ \sum_{\substack{i=1,\, j=1 \\ i \ne j}}^{i=n_c,\, j=n_c} \frac{d_{ij}}{n_c (n_c - 1)} \right] \tag{6}$$

where $n_c$ is the number of elements in the case under study; $N$ is the total number of elements in the set; $c$ is the sequence number of the column vector in the set; $G(c)$ is the Gini coefficient of $c$; and $d_{ij}$ is the level of disparity between elements $i$ and $j$.

Rousseau (2019) suggested some improvements to DIV. He showed that DIV can be turned into a measure of True Diversity by removing the term N (variety) in the denominator of Eq. 6. Rousseau argued that a better framework for diversity measurement would account for several requirements, not all of which are met by existing frameworks. Responding to the improvements made by Rousseau (2019), Leydesdorff, Wagner, and Bornmann (2019) provided an updated version of the improved DIV* as a True Diversity measure:

$$DIV_c^{*} = n_c \cdot \left[ 1 - G(c) \right] \cdot \left[ \sum_{\substack{i=1,\, j=1 \\ i \ne j}}^{i=n_c,\, j=n_c} \frac{d_{ij}}{n_c (n_c - 1)} \right] \tag{7}$$

where $n_c$ is the number of elements in subject $c$; $G(c)$ is the Gini coefficient of $c$; and $d_{ij}$ is the level of disparity between elements $i$ and $j$.
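The three factors of DIV and DIV* can be computed separately and then multiplied, as in the sketch below. This is an illustration under stated assumptions, not the implementation in interd_vb.exe: the Gini coefficient is computed here over the full count vector including empty categories, using the rank-based form given in the section on the Gini coefficient, and the disparity term averages the distances over the categories that are actually present. All function names and example data are hypothetical.

```python
def gini(values):
    """Rank-based Gini: sum((2i - n - 1) * x_i) / (n * sum(x)), x sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def div_factors(counts, dist):
    """Return (n_c, N, balance, disparity) for one column vector of counts."""
    present = [i for i, x in enumerate(counts) if x > 0]
    n_c = len(present)
    balance = 1.0 - gini(counts)  # assumption: Gini over all N categories
    if n_c < 2:
        disparity = 0.0  # no pairs of categories to compare
    else:
        pair_sum = sum(dist[i][j] for i in present for j in present if i != j)
        disparity = pair_sum / (n_c * (n_c - 1))
    return n_c, len(counts), balance, disparity


def div(counts, dist):
    """DIV = (n_c / N) * (1 - Gini) * disparity."""
    n_c, n_total, balance, disparity = div_factors(counts, dist)
    return (n_c / n_total) * balance * disparity


def div_star(counts, dist):
    """DIV* = n_c * (1 - Gini) * disparity."""
    n_c, _, balance, disparity = div_factors(counts, dist)
    return n_c * balance * disparity


# Hypothetical case: four categories, all pairwise distances equal to 1.
dist = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(div([5, 5, 5, 5], dist), div_star([5, 5, 5, 5], dist))
print(div([10, 10, 0, 0], dist), div_star([10, 10, 0, 0], dist))
```

DIV* differs from DIV only by the factor N, so the two indicators rank the columns of one matrix identically; the difference matters when sets with different numbers of categories are compared.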

Gini coefficient

The Gini coefficient is a well-known indicator for representing income inequality among people and wealth inequality among nations (Lorenz, 1905). When the Gini coefficient is used in measuring the diversity of interdisciplinary research, the research is treated as a system comprising three elements (variety, balance, and disparity; Porter & Rafols, 2009; Rafols & Meyer, 2010), and (1 – Gini) is used as the indicator of balance (Nijssen et al., 1998).

The theory of relative mean differences defines the Gini coefficient as (e.g. Buchan, 2002):

$$G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \left| x_i - x_j \right|}{2 n^2 \bar{x}} \tag{8}$$

where $x$ is an observed value, $n$ is the number of values observed, and $\bar{x}$ is the mean value.

Note, however, that there are several alternative definitions of the Gini coefficient. See, for example, those provided at https://en.wikipedia.org/wiki/Gini_coefficient (cf. Rousseau, 1992).

If the x values are first placed in ascending order such that each x has rank i, some of the comparisons above can be avoided and computation is therefore more efficient, i.e.:

$$G = \frac{2}{n^2 \bar{x}} \sum_{i=1}^{n} i \left( x_i - \bar{x} \right) \tag{9}$$

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1) \, x_i}{n \sum_{i=1}^{n} x_i} \tag{10}$$

where $x$ is an observed value, $n$ is the number of values observed, and $i$ is the rank of values in ascending order.

For G to be an unbiased estimate of the true population value, it should be multiplied by n/(n-1) (Dixon, 1987; Mills & Zandvakili, 1997). In the bibliometric literature, this index is also known as the Pratt index (Pratt, 1977). Both the Gini coefficient and the normalized G are provided by interd_vb.exe.
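The equivalence of the pairwise and the rank-based definitions, and the effect of the n/(n-1) correction, can be checked numerically. The sketch below is illustrative only; the function names and the example values are hypothetical.

```python
def gini_pairwise(values):
    """Relative mean difference: sum of |x_i - x_j| over all pairs / (2 * n^2 * mean)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    mean = total / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)


def gini_ranked(values):
    """Rank-based form: sum((2i - n - 1) * x_i) / (n * sum(x)), x sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def gini_normalized(values):
    """Rank-based Gini multiplied by n / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    return gini_ranked(values) * n / (n - 1)


values = [1, 2, 3, 4]
print(gini_pairwise(values), gini_ranked(values))  # both forms agree
print(gini_normalized(values))
print(gini_ranked([0, 0, 0, 10]), gini_normalized([0, 0, 0, 10]))
```

The last line shows why the correction matters for small n: with everything concentrated in one of four categories, the uncorrected coefficient is 0.75, whereas the normalized value reaches 1.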

Other indicators

The concept of coherence based on network analysis has attracted attention from researchers in scientometrics (e.g. Rafols, 2014). While the diversity indicators rely on a pre-defined category system, coherence can be generated via a bottom-up approach that describes the intensity of the relations between any elements in a network. From this perspective, comprehensive frameworks composed of diversity and coherence have been proposed to improve the depiction of interdisciplinary systems (Rafols & Meyer, 2010).

The computation of diversity and interdisciplinarity indicators

The program interd_vb.exe (http://www.leydesdorff.net/software/interdisc.2020/interd_vb.exe) is a rewrite of the routine Mode2Div.exe, which was previously programmed in the so-called xBase language. Unfortunately, computing cosine values for large matrices can be time-consuming with xBase, which imposes a soft limit on the size of the datasets that can be processed. Hence, we rewrote Mode2Div.exe in Visual Basic 6 to become interd_vb.exe, i.e. the online Interdisc application. Visual Basic 6 runs on Win10 (32/64 bits) and does not require a predetermined amount of memory to be allocated to processing. Therefore, the only limitation on the size of the dataset that can be processed is hardware. The two programs, interd_vb.exe and Mode2Div.exe, have similar objectives but a different organization and architecture, and the results they produce are exactly the same. Both programs are documented in Leydesdorff et al. (2018, 2019) and the software is available for download from https://www.leydesdorff.net/software/interdisc.2020/ and Figshare (https://figshare.com/account/articles/12871529).

One key difference between the two versions of the program is their input requirements. In the case of Mode2Div.exe, the input is stored listwise using the Pajek format, each line describing the row and column of a cell in a matrix of values. Thus, the input can be read as three fields without any system limitations. The data is assumed to be 2-mode so that an asymmetrical (citation) matrix can be processed. The program then computes the diversity measures along the column vectors of a data matrix saved in .csv format. As an example, to measure the interdisciplinarity of a set of documents, one could use jcitnetw.exe (https://www.leydesdorff.net/software/interdisc/jcitnetw.exe) to easily generate a co-occurrence matrix of cited journals in the Pajek format, using plain text downloaded from the Web of Science. More details on this can be found at https://www.leydesdorff.net/software/mode2div/.

The distance metric and the disparity measure

Stirling (2007) added a new element to diversity measurement: disparity. Disparity indicates the distance between two subjects in the sample(s) under study. For example, if the distances in a subset are small, this space can be considered a niche of related variety (Frenken et al., 2007). However, disparity as a factor in both RS and the DIV requires the choice of a distance metric. Following Salton and McGill (1983), Ahlgren, Jarneving, and Rousseau (2003) proposed cosine as a non-parametric measure of similarity for bibliometrics. From a comparison of a number of similarity/distance measures, Egghe and Leydesdorff (2009) concluded that the cosine fulfills a number of requirements.

Like Pearson correlations, cosine values are defined in a vector space and are therefore positional, whereas the very similar Jaccard index is relational. Unlike the Pearson correlation, however, the cosine does not normalize to a mean and, since bibliometric distributions are highly skewed, normalizations using the mean are to be avoided. Our routines use (1 – cosine), which can be considered a distance measure. Pragmatically, the cosine of two vectors x and y can be written with their co-occurrence (the sum of the products of their elements) in the numerator and the square root of the product of their sums of squares in the denominator. Note that, here, the matrix rows contain the disciplines and the columns contain the universities, so the cosine values are computed between the row vectors.
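The computation of (1 – cosine) between the row vectors of an occurrence matrix can be sketched as follows. The function names and the small matrix are hypothetical, and treating an all-zero row as maximally distant is an assumption made for this illustration; the handling of empty rows in interd_vb.exe is not specified here.

```python
import math


def cosine(x, y):
    """Sum of products divided by the square root of the product of sums of squares."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return numerator / denominator


def distance_matrix(rows):
    """Return the matrix of (1 - cosine) values between all pairs of row vectors."""
    return [[1.0 - cosine(r, s) for s in rows] for r in rows]


# Hypothetical occurrence matrix: rows are categories, columns are universities.
occurrences = [
    [10, 0, 5],
    [20, 0, 10],
    [0, 7, 0],
]
for row in distance_matrix(occurrences):
    print([round(d, 3) for d in row])
```

The first two rows are proportional, so their distance is zero regardless of the difference in size; the third row shares no university with the others and lies at distance one from both.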

One disadvantage of Mode2Div.exe is that data is often not readily available in Pajek format and converting the data into this format may generate other problems (Pfeffer, Mrvar, & Batagelj, 2013). The most generic format for data, however, is a matrix stored as a comma- or tab-separated plain ASCII file. There are no size limitations for this data, although Excel (depending on the Office version) may not allow for more than 255 variables. This data, however, can also be written using a text editor (e.g. the freeware Notepad++) or any other program. The size of the matrix is only limited by external factors such as free disk space.

The routine begins by asking for the name of the .csv file containing the variables and for the number of vectors to be compared, for the purposes of error correction. The file is then processed and the output is written to the files interdis.dbf and, equivalently, interdis.csv. The specific differences between these programs in terms of inputs, outputs, and other related items are summarized in Appendix Table S1.

For further details, see http://www.leydesdorff.net/software/interdisc.2020/

Data

As empirical data, we used the portfolio of research articles from the 42 Chinese universities listed as “Double First-Class universities” between 2017 (when the list was first released) and 2019. The Chinese government offers substantial support to this select group of universities through a series of special programs. Additionally, although this particular list has only been published since 2017, similar initiatives under different names have existed periodically since the 1990s, with the majority of universities considered to be elite remaining much the same this whole time. Thus, these 42 institutions were selected because this group is both clearly delineated and large enough to provide a large-scale sample. In addition, we included the portfolios of two well-known American universities, Harvard and Stanford, to provide a benchmark that readers in the West may find easier to interpret. In a subsequent article (Leydesdorff, Wagner, & Zhang, 2021), we further compare these results with 205 Chinese universities.

Each of the universities in the sample promotes itself as a comprehensive university. However, some note specific missions or strengths; for instance, the agricultural universities. The publications associated with each university were retrieved using the organization's name and/or its variants from the Preferred Organization Index in WoS.

The domains searched include the Science Citation Index Expanded (SCI-E), the Social Sciences Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI) in the Web of Science (WoS) Core Collection. We limited the document type to articles and reviews. The numbers of articles retrieved per university are listed in Table 1 in decreasing order.

Table 1. Number of publications associated with the 44 universities in our sample (2017–2019), in decreasing order.

| No. | University name | Papers | No. | University name | Papers |
|---|---|---|---|---|---|
| 1 | Harvard Univ | 76,144 | 23 | Northeastern Univ | 14,893 |
| 2 | Shanghai Jiao Tong Univ | 37,016 | 24 | Beihang Univ | 14,484 |
| 3 | Zhejiang Univ | 35,204 | 25 | Dalian Univ of Technology | 13,861 |
| 4 | Tsinghua Univ | 32,681 | 26 | Zhengzhou Univ | 12,993 |
| 5 | Stanford Univ | 32,428 | 27 | Northwestern Polytechnical Univ | 12,497 |
| 6 | Peking Univ | 30,160 | 28 | Chongqing Univ | 12,451 |
| 7 | Sun Yat-Sen Univ | 26,823 | 29 | Univ of Electronic S & T of China | 12,334 |
| 8 | Huazhong Univ of S & T | 24,822 | 30 | Xiamen Univ | 11,607 |
| 9 | Fudan Univ | 24,475 | 31 | Beijing Institute of Technology | 11,206 |
| 10 | Sichuan Univ | 23,259 | 32 | Beijing Normal Univ | 10,043 |
| 11 | Central South Univ | 22,870 | 33 | Nankai Univ | 9,970 |
| 12 | Xi’an Jiaotong Univ | 22,698 | 34 | Hunan Univ | 9,811 |
| 13 | Shandong Univ | 21,601 | 35 | Lanzhou Univ | 9,156 |
| 14 | Jilin Univ | 21,068 | 36 | China Agricultural Univ | 8,762 |
| 15 | Harbin Institute of Technology | 20,750 | 37 | Northwest A & F Univ | 7,817 |
| 16 | Univ of S & T of China | 20,747 | 38 | East China Normal Univ | 7,610 |
| 17 | Wuhan Univ | 19,748 | 39 | National Univ of Defense Technology | 6,601 |
| 18 | Nanjing Univ | 19,246 | 40 | Ocean Univ of China | 6,390 |
| 19 | Tianjin Univ | 17,778 | 41 | Renmin Univ of China | 2,946 |
| 20 | Tongji Univ | 17,226 | 42 | Yunnan Univ | 2,835 |
| 21 | Southeast Univ | 16,959 | 43 | Xinjiang Univ | 1,979 |
| 22 | South China Univ of Technology | 15,595 | 44 | Minzu Univ of China | 760 |

We first organized the data into an asymmetrical occurrence matrix of the 44 universities against 254 WoS categories. We then computed the six diversity measures using interd_vb.exe.

Results
Ranking of universities in terms of interdisciplinarity

The interdisciplinarity scores for each indicator and university are listed in Table 2. Additionally, we have provided a ranking against each indicator. For example, for the DIV* indicator, Stanford University is ranked No. 1, whereas, according to the True RS indicator, it is ranked No. 15. Tsinghua University, which is widely considered to be the top university in China, sits in 21st place on the list of DIV*. Keep in mind, however, that this is a ranking of comprehensiveness as measured by disciplinary diversity, not of impact. As mentioned in Section 2.6, the Gini coefficient is a measure of unbalance, and therefore (1 – Gini) is used in the computation of DIV* (Eq. 7; Table 2).

Table 2. Indicator scores generated by the interd_vb.exe routine.

| University | DIV* | Rank | True RS | Rank | Simpson | Rank | Shannon | Rank | Variety | Rank | Disparity | Rank | (1-Gini) | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stanford Univ | 40.260 | 1 | 1.503 | 15 | 0.986 | 1 | 6.831 | 1 | 0.988 | 1 | 0.472 | 23 | 0.340 | 1 |
| Sun Yat-Sen Univ | 35.754 | 2 | 1.549 | 6 | 0.983 | 5 | 6.663 | 2 | 0.945 | 5 | 0.474 | 12 | 0.314 | 2 |
| Peking Univ | 33.352 | 3 | 1.516 | 13 | 0.982 | 7 | 6.568 | 4 | 0.953 | 3 | 0.474 | 15 | 0.291 | 4 |
| Zhejiang Univ | 33.237 | 4 | 1.549 | 7 | 0.983 | 3 | 6.594 | 3 | 0.949 | 4 | 0.473 | 18 | 0.292 | 3 |
| Harvard Univ | 32.328 | 5 | 1.288 | 39 | 0.983 | 6 | 6.512 | 7 | 0.988 | 1 | 0.471 | 25 | 0.274 | 7 |
| Shanghai Jiao Tong Univ | 31.151 | 6 | 1.553 | 5 | 0.984 | 2 | 6.565 | 5 | 0.921 | 9 | 0.471 | 26 | 0.283 | 5 |
| Sichuan Univ | 30.092 | 7 | 1.527 | 12 | 0.983 | 4 | 6.517 | 6 | 0.913 | 11 | 0.473 | 19 | 0.274 | 6 |
| Wuhan Univ | 29.117 | 8 | 1.548 | 8 | 0.982 | 9 | 6.465 | 8 | 0.917 | 10 | 0.473 | 16 | 0.264 | 8 |
| Northeastern Univ | 28.892 | 9 | 1.485 | 18 | 0.975 | 24 | 6.335 | 15 | 0.945 | 5 | 0.466 | 37 | 0.258 | 10 |
| Fudan Univ | 28.102 | 10 | 1.457 | 22 | 0.979 | 17 | 6.361 | 12 | 0.929 | 7 | 0.468 | 35 | 0.254 | 12 |
| Shandong Univ | 27.683 | 11 | 1.492 | 17 | 0.981 | 11 | 6.416 | 9 | 0.898 | 13 | 0.472 | 24 | 0.257 | 11 |
| East China Normal Univ | 27.471 | 12 | 1.495 | 16 | 0.980 | 12 | 6.415 | 10 | 0.886 | 18 | 0.470 | 28 | 0.260 | 9 |
| Nanjing Univ | 26.735 | 13 | 1.444 | 24 | 0.977 | 20 | 6.256 | 21 | 0.929 | 7 | 0.475 | 9 | 0.238 | 19 |
| Beijing Normal Univ | 26.427 | 14 | 1.575 | 4 | 0.976 | 22 | 6.328 | 16 | 0.890 | 16 | 0.467 | 36 | 0.250 | 13 |
| Xiamen Univ | 26.392 | 15 | 1.439 | 25 | 0.977 | 18 | 6.301 | 19 | 0.894 | 14 | 0.474 | 13 | 0.245 | 16 |
| Tongji Univ | 26.286 | 16 | 1.538 | 11 | 0.979 | 13 | 6.341 | 13 | 0.894 | 14 | 0.471 | 27 | 0.246 | 15 |
| Huazhong Univ of S&T | 25.700 | 17 | 1.452 | 23 | 0.979 | 15 | 6.308 | 18 | 0.886 | 18 | 0.475 | 10 | 0.241 | 18 |
| Central South Univ | 25.298 | 18 | 1.545 | 9 | 0.979 | 16 | 6.336 | 14 | 0.843 | 23 | 0.479 | 2 | 0.247 | 14 |
| Lanzhou Univ | 23.959 | 19 | 1.513 | 14 | 0.981 | 10 | 6.362 | 11 | 0.815 | 26 | 0.472 | 22 | 0.245 | 17 |
| Jilin Univ | 23.323 | 20 | 1.431 | 26 | 0.975 | 26 | 6.177 | 22 | 0.866 | 21 | 0.474 | 14 | 0.224 | 22 |
| Tsinghua Univ | 23.253 | 21 | 1.371 | 29 | 0.975 | 27 | 6.120 | 25 | 0.913 | 11 | 0.470 | 29 | 0.213 | 25 |
| Xi’an Jiaotong Univ | 22.879 | 22 | 1.409 | 27 | 0.975 | 25 | 6.128 | 24 | 0.890 | 16 | 0.472 | 21 | 0.214 | 24 |
| Zhengzhou Univ | 21.553 | 23 | 1.463 | 20 | 0.976 | 23 | 6.152 | 23 | 0.811 | 27 | 0.475 | 7 | 0.220 | 23 |
| Southeast Univ | 20.325 | 24 | 1.385 | 28 | 0.970 | 33 | 5.971 | 29 | 0.870 | 20 | 0.470 | 30 | 0.196 | 28 |
| Renmin Univ | 20.323 | 25 | 1.458 | 21 | 0.979 | 14 | 6.277 | 20 | 0.748 | 35 | 0.457 | 41 | 0.234 | 20 |
| Yunnan Univ | 18.896 | 26 | 1.540 | 10 | 0.982 | 8 | 6.314 | 17 | 0.681 | 39 | 0.469 | 31 | 0.233 | 21 |
| Nankai Univ | 18.091 | 27 | 1.334 | 32 | 0.969 | 40 | 5.894 | 32 | 0.827 | 24 | 0.462 | 40 | 0.187 | 30 |
| Univ of S&T – China | 17.782 | 28 | 1.303 | 37 | 0.968 | 41 | 5.797 | 38 | 0.850 | 22 | 0.477 | 5 | 0.172 | 38 |
| Tianjin Univ | 17.466 | 29 | 1.296 | 38 | 0.970 | 35 | 5.852 | 37 | 0.819 | 25 | 0.475 | 11 | 0.177 | 35 |
| South China Univ of Technol | 17.286 | 30 | 1.349 | 31 | 0.970 | 36 | 5.882 | 33 | 0.795 | 28 | 0.473 | 20 | 0.181 | 31 |
| Chongqing Univ | 17.029 | 31 | 1.313 | 34 | 0.970 | 34 | 5.856 | 36 | 0.795 | 28 | 0.478 | 3 | 0.176 | 36 |
| Hunan Univ | 16.958 | 32 | 1.307 | 36 | 0.973 | 30 | 5.913 | 31 | 0.772 | 32 | 0.480 | 1 | 0.180 | 32 |
| Ocean Univ of China | 16.824 | 33 | 1.734 | 1 | 0.977 | 21 | 6.102 | 26 | 0.677 | 40 | 0.473 | 17 | 0.207 | 26 |
| Dalian Univ of Technol | 16.509 | 34 | 1.315 | 33 | 0.974 | 29 | 5.922 | 30 | 0.756 | 34 | 0.478 | 4 | 0.180 | 33 |
| Harbin Inst of Technology | 15.412 | 35 | 1.286 | 40 | 0.969 | 38 | 5.769 | 39 | 0.776 | 31 | 0.475 | 8 | 0.165 | 39 |
| Beihang Univ | 15.007 | 36 | 1.311 | 35 | 0.970 | 37 | 5.762 | 40 | 0.780 | 30 | 0.465 | 38 | 0.163 | 40 |
| China Agricultural Univ | 14.671 | 37 | 1.670 | 3 | 0.973 | 31 | 5.873 | 35 | 0.701 | 37 | 0.469 | 34 | 0.176 | 37 |
| Northwest A&F Univ | 14.040 | 38 | 1.681 | 2 | 0.972 | 32 | 5.881 | 34 | 0.665 | 41 | 0.469 | 33 | 0.177 | 34 |
| Beijing Inst of Technol | 13.944 | 39 | 1.269 | 44 | 0.969 | 39 | 5.728 | 41 | 0.724 | 36 | 0.475 | 6 | 0.159 | 41 |
| Xinjiang Univ | 12.921 | 40 | 1.369 | 30 | 0.975 | 28 | 5.997 | 28 | 0.571 | 42 | 0.463 | 39 | 0.193 | 29 |
| Univ of Electronic S&T of China | 12.847 | 41 | 1.281 | 42 | 0.950 | 44 | 5.428 | 43 | 0.768 | 33 | 0.448 | 43 | 0.147 | 42 |
| Minzu Univ of China | 12.104 | 42 | 1.464 | 19 | 0.977 | 19 | 6.049 | 27 | 0.535 | 44 | 0.448 | 42 | 0.199 | 27 |
| Northwestern Polytechnical Univ | 12.062 | 43 | 1.275 | 43 | 0.962 | 42 | 5.571 | 42 | 0.693 | 38 | 0.469 | 32 | 0.146 | 43 |
| National Univ of Defense Technol | 7.783 | 44 | 1.285 | 41 | 0.951 | 43 | 5.274 | 44 | 0.563 | 43 | 0.446 | 44 | 0.122 | 44 |

The Spearman rank-order correlations are provided in Table 3. The DIV* indicator correlates much more closely with the VARIETY and (1 – GINI) indicators, as is to be expected since (1 – GINI) is actually used to calculate DIV*. However, there is only a moderate correlation between the two true diversity indicators, True RS and DIV* (ρ = 0.50; p < 0.01). Further, the rankings of the top five universities according to these two indicators are inconsistent. These unexpected results raise further questions.

Table 3. Spearman's correlations for the rank orders generated by interd_vb.exe (N = 42).

| | DIV* | TRUE RS | VARIETY | DISPARITY | (1 – GINI) | SIMPSON | SHANNON |
|---|---|---|---|---|---|---|---|
| DIV* | | | | | | | |
| TRUE RS | .563** | | | | | | |
| VARIETY | .926** | .323* | | | | | |
| DISPARITY | .215 | −.092 | .230 | | | | |
| (1 – GINI) | .936 | .717** | .772** | .074 | | | |
| SIMPSON | .789** | .766** | .551** | .085 | .917** | | |
| SHANNON | .911** | .734** | .725** | .087 | .990** | .950** | |

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).
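For readers who wish to recompute rank-order correlations from indicator scores, a Spearman coefficient can be sketched as the Pearson correlation between rank vectors, with tied values sharing their mean rank. The function names are hypothetical. The example uses the DIV* and True RS scores of the first five universities in Table 2 only, so it illustrates the procedure and does not reproduce the coefficients reported for the full set.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(a, b):
    """Pearson correlation between the rank vectors of a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / math.sqrt(var_a * var_b)


# Scores of the first five rows of Table 2 (Stanford ... Harvard).
div_star_scores = [40.260, 35.754, 33.352, 33.237, 32.328]
true_rs_scores = [1.503, 1.549, 1.516, 1.549, 1.288]
print(spearman_rho(div_star_scores, true_rs_scores))
```

Ranking in ascending rather than descending order does not change the coefficient, as long as both lists are ranked in the same direction.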

The new element that Stirling (2007) added to the measurement of diversity and interdisciplinarity was disparity. In Table 3, disparity is indeed not significantly correlated with any of the other diversity indicators. Factor analysis of this data (Table 4) shows disparity (and variety) as a second component. Unlike True RS, DIV* captures both dimensions, as was Stirling's theoretical intention.

Table 4. Factor analysis of the interdisciplinarity and diversity indicators (N = 42): rotated component matrix.

| Indicator | Component 1 | Component 2 |
|---|---|---|
| True RS | .881 | −.133 |
| Shannon | .877 | .455 |
| (1-Gini) | .862 | .456 |
| Simpson | .830 | .390 |
| Div* | .703 | .657 |
| Variety | .329 | .853 |
| Disparity | | .792 |

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations; 85.1% of the variance explained.

As stated above, when applying interd_vb.exe, the cosine is pragmatically computed using the co-occurrences in the sample in the numerator and the square root of the product of the sums of squares of the two vectors x and y in the denominator. Disparity is then defined as the sum of local values of (1 – cosine) over the set. This matrix is a “sample-dependent” local matrix since it reflects the disparity within the data sample. Consequently, these values vary with the data sample used as input. It may often be convenient for analysts and developers to calculate the diversity values in this way (locally), particularly when one has no access to a global disparity matrix. However, the systems of reference for the cosine normalization then differ among samples.

Local versus global disparity

In contrast to local disparity, using a global matrix solves (almost by definition) the problem of comparability across samples. To demonstrate the difference between “local” and “global” matrices, we recalculated the diversity scores using a global cosine matrix based on the full set of JCR data for 2019. These data include 236 subject categories in the Science and Social Sciences Citation Indexes (but not the 25 in the Arts & Humanities Citation Index).

The results for both DIV* and True RS are shown in Table 5, and Table 6 shows the Spearman's correlations for the ranking order of the two indicators.

The local cosine matrix was generated with interd_vb.exe; the global one was retrieved from http://www.leydesdorff.net/software/wc19. The cosine similarity matrix for the WoS categories based on JCR 2019 data is also provided at http://www.leydesdorff.net/wc15/wc19.

As expected, the correlation between DIV* and True RS (or RS) increased (from 0.502 to 0.695), demonstrating that the consistency between different diversity indicator values can be improved by using a global matrix instead of a local matrix.

Table 5. Local vs. global disparity using JCR data for 2019.

| University | DIV* | Rank | TRUE RS | Rank |
|---|---|---|---|---|
| Stanford Univ | 72.956 | 1 | 5.488 | 2 |
| Sun Yat-Sen Univ | 68.429 | 2 | 4.741 | 5 |
| Zhejiang Univ | 63.343 | 3 | 4.300 | 12 |
| Peking Univ | 62.654 | 4 | 4.632 | 8 |
| Shanghai Jiao Tong Univ | 60.907 | 5 | 4.643 | 7 |
| Sichuan Univ | 58.367 | 6 | 4.033 | 19 |
| Harvard Univ | 58.301 | 7 | 4.461 | 9 |
| Wuhan Univ | 56.903 | 8 | 4.702 | 6 |
| Northeastern Univ | 54.887 | 9 | 4.161 | 15 |
| Fudan Univ | 54.196 | 10 | 3.897 | 22 |
| Shandong Univ | 53.921 | 11 | 4.162 | 14 |
| East China Normal Univ | 51.757 | 12 | 4.309 | 11 |
| Xiamen Univ | 51.354 | 13 | 3.705 | 24 |
| Tongji Univ | 51.348 | 14 | 4.823 | 4 |
| Beijing Normal Univ | 50.815 | 15 | 5.082 | 3 |
| Huazhong Univ of S&T | 50.632 | 16 | 3.963 | 20 |
| Central South Univ | 50.535 | 17 | 4.107 | 17 |
| Nanjing Univ | 50.285 | 18 | 3.851 | 23 |
| Lanzhou Univ | 47.622 | 19 | 4.087 | 18 |
| Jilin Univ | 46.049 | 20 | 3.292 | 34 |
| Xi’an Jiaotong Univ | 45.655 | 21 | 3.664 | 25 |
| Tsinghua Univ | 45.121 | 22 | 3.601 | 26 |
| Zhengzhou Univ | 43.389 | 23 | 3.442 | 29 |
| Southeast Univ | 39.662 | 24 | 3.902 | 21 |
| Renmin Univ | 37.896 | 25 | 5.563 | 1 |
| Nankai Univ | 36.427 | 26 | 2.950 | 43 |
| Yunnan Univ | 36.236 | 27 | 4.382 | 10 |
| Univ of S & T – China | 35.002 | 28 | 2.876 | 44 |
| Tianjin Univ | 34.613 | 29 | 2.995 | 41 |
| South China Univ of Technol | 34.388 | 30 | 2.978 | 42 |
| Ocean Univ of China | 32.747 | 31 | 4.202 | 13 |
| Chongqing Univ | 32.519 | 32 | 3.260 | 35 |
| Hunan Univ | 32.394 | 33 | 3.379 | 32 |
| Dalian Univ of Technol | 31.933 | 34 | 3.355 | 33 |
| Harbin Inst of Technol | 30.166 | 35 | 3.191 | 36 |
| Beihang Univ | 30.029 | 36 | 3.508 | 28 |
| China Agricultural Univ | 29.258 | 37 | 3.396 | 31 |
| Northwest A & F Univ | 27.904 | 38 | 3.402 | 30 |
| Beijing Inst of Technol | 27.102 | 39 | 3.184 | 37 |
| Univ of Electronic S&T of China | 26.892 | 40 | 3.073 | 39 |
| Xinjiang Univ | 25.828 | 41 | 3.531 | 27 |
| Northwestern Polytechnical Univ | 23.873 | 42 | 3.031 | 40 |
| Minzu Univ of China | 22.645 | 43 | 4.132 | 16 |
| National Univ of Defense Technol | 16.236 | 44 | 3.118 | 38 |

Spearman's correlations for consistency of rank order – local vs. global disparity.

| | DIV*_local | True RS_local | DIV*_global | True RS_global |
|---|---|---|---|---|
| DIV*_local | | | | |
| True RS_local | .502** | | | |
| DIV*_global | .996** | .516** | | |
| True RS_global | .697** | .707** | .695** | |

** Correlation is significant at the 0.01 level (2-tailed).
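As a worked check, the Spearman coefficient between the two rank columns listed in the table of DIV* and True RS values above can be recomputed by hand. The sketch below uses the textbook formula for rankings without ties, rho = 1 - 6 Σd² / (n(n² - 1)); the function name is ours, and the rank lists are copied from that table (the DIV* ranks are simply 1 to 44).

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's rho for two rankings without ties."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Rank columns from the table above, in row order (Stanford Univ first).
div_star_rank = list(range(1, 45))
true_rs_rank = [
    2, 5, 12, 8, 7, 19, 9, 6, 15, 22, 14,
    11, 24, 4, 3, 20, 17, 23, 18, 34, 25, 26,
    29, 21, 1, 43, 10, 44, 41, 42, 13, 35, 32,
    33, 36, 28, 31, 30, 37, 39, 27, 40, 16, 38,
]

rho = spearman_from_ranks(div_star_rank, true_rs_rank)
print(round(rho, 3))  # 0.695
```

The result agrees with the coefficient of .695 reported for DIV*_global versus True RS_global.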

With a correlation between the local and global values of DIV* at .996, DIV* is obviously not sensitive to the scaling. As Rousseau (2019) noted, the disparity in DIV* “is just a relative (normalized) sum.” With hindsight, this seems an advantage of DIV* when compared with True RS.

Differences among specific universities

The results for specific universities yield some interesting observations. Taking Stanford University and Tsinghua University as examples, Stanford ranks significantly higher than Tsinghua according to both DIV* and True RS, as shown in Table 4. The science overlay maps in Figures 1 and 2 illustrate this vividly (Carley et al., 2017; Leydesdorff et al., 2016; Rafols et al., 2010). In these maps, which were drawn with VOSviewer (Waltman et al., 2010), each node represents a WoS category, and the size of the node indicates the number of publications.

Figure 1

Science overlay map of the publications with an address at Tsinghua University. [Note: The base map of disciplines was developed from the matrix of 227 × 227 cells of WoS categories. This was generated on the basis of direct citation counting and normalized with the cosine function (Carley et al., 2017).]

Figure 2

The science overlay map of the publications associated with Stanford University. [Note: The base map of disciplines was developed from the matrix of 227 × 227 cells of WoS categories. This was generated on the basis of direct citation counting and normalized with the cosine function (Carley et al., 2017).]

Visual inspection of these two maps shows that the category distributions of the two universities are very different. Stanford University prioritizes research in Clinical Medicine, Biomedicine, and other medical disciplines, while Tsinghua University has a clear focus on Computer Science & Engineering, Materials Science, and other Engineering fields. However, although each university has strengths in particular disciplines, the distribution of disciplines across Stanford's portfolio is more balanced than that across Tsinghua's.

Discussion and conclusion

DIV* values were more in line with our intuition about the diversity of these universities than the RS or True RS values. The latter perform particularly poorly when the results are based on local disparity matrices. With a local matrix, some field-specific universities such as Ocean University of China and the Northwest Agriculture & Forestry University are found to have high diversity values on the True RS (and RS) indicators. These results raise further questions.

The results for RS/True RS are more sensitive than DIV* to the choice of similarity measures (Rafols & Leydesdorff, 2010). As Rousseau (2019) notes: “DIV, taking disparity into account as just a relative (normalized) sum” is not sensitive to scaling. In Eq. (8), disparity is defined only at the level of the sample: the interaction between the proportions of category i and category j (p_i and p_j, respectively) and the disparity d_ij is not taken into account at the cell level; only the total sum of all disparity values is.
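The contrast between the cell-level weighting in RS/True RS and the sample-level disparity term in DIV* can be sketched in code. The definitions below are our assumptions based on the cited literature rather than a transcription of the equations of this paper: RS = Σ_{i≠j} p_i p_j d_ij, True RS = 1 / (1 - RS), and DIV* = n_c × (1 - Gini) × Σ_{i≠j} d_ij / [n_c (n_c - 1)], where n_c is the number of categories with at least one publication. The toy counts and disparities are hypothetical.

```python
def rao_stirling(p, d):
    """RS: each disparity d[i][j] is weighted by the proportions p[i] * p[j]."""
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)

def true_rs(p, d):
    """True RS, assumed here to be 1 / (1 - RS)."""
    return 1.0 / (1.0 - rao_stirling(p, d))

def gini(values):
    """Gini coefficient of a list of non-negative values."""
    x = sorted(values)
    n = len(x)
    return sum((2 * (i + 1) - n - 1) * v for i, v in enumerate(x)) / (n * sum(x))

def div_star(counts, d):
    """DIV* (assumed form): variety x balance x mean disparity of active cells."""
    active = [i for i, c in enumerate(counts) if c > 0]
    nc = len(active)
    if nc < 2:
        return 0.0
    balance = 1.0 - gini([counts[i] for i in active])
    mean_disparity = sum(d[i][j] for i in active for j in active
                         if i != j) / (nc * (nc - 1))
    return nc * balance * mean_disparity

# Hypothetical portfolio of three categories.
counts = [60, 30, 10]
p = [c / sum(counts) for c in counts]
d = [[0.0, 0.2, 0.9],
     [0.2, 0.0, 0.8],
     [0.9, 0.8, 0.0]]
d_half = [[v / 2 for v in row] for row in d]  # uniformly rescaled disparities

print(div_star(counts, d), div_star(counts, d_half))
print(true_rs(p, d), true_rs(p, d_half))
```

Under these assumed definitions, a uniform rescaling of the disparity matrix multiplies DIV* by the same constant for every unit, so rank orders are preserved; True RS responds non-linearly because each d_ij interacts with p_i and p_j.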

Table 2 (above) showed that the Ocean University has the highest True RS diversity of all universities. However, when checking the specific distribution of Web of Science categories, we found that more papers are published within Oceanography (14.01%) than any other category. Yet, Oceanography is a relatively marginal category in our sample, with much lower cosine similarities than other categories. As a result, the disparity (1 - cosine) between Oceanography and other categories is much higher than the average (0.73 vs. 0.47). The extraordinarily high proportion of publications in Oceanography and the category's high disparity from other categories together lead to an unexpectedly high diversity value when measured with RS/True RS. However, when using a global similarity matrix (Table 4), the scores of RS/True RS in most field-specialized universities decreased. As noted, the rankings based on DIV* were not affected by this effect.
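The mechanism can be illustrated with a small numerical sketch. Only the share of 14.01% and the disparities of 0.73 and 0.47 are taken from the text; the number of remaining categories (20), their equal shares, and the uniform disparity among them are simplifying assumptions made for illustration, and RS is assumed to be Σ_{i≠j} p_i p_j d_ij.

```python
def rao_stirling(p, d):
    """RS, assumed as the sum over i != j of p[i] * p[j] * d[i][j]."""
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)

# Category 0 stands for the dominant category with a 14.01% share;
# the remaining share is spread evenly over 20 hypothetical categories.
n_other = 20
p = [0.1401] + [(1 - 0.1401) / n_other] * n_other

def disparity(outlier_value, typical=0.47):
    """Uniform disparity, except between category 0 and all others."""
    n = len(p)
    d = [[0.0 if i == j else typical for j in range(n)] for i in range(n)]
    for j in range(1, n):
        d[0][j] = d[j][0] = outlier_value
    return d

rs_typical = rao_stirling(p, disparity(0.47))  # category 0 as distant as average
rs_outlier = rao_stirling(p, disparity(0.73))  # category 0 far from all others

print(round(rs_typical, 3), round(rs_outlier, 3))
print(round(1 / (1 - rs_typical), 3), round(1 / (1 - rs_outlier), 3))  # True RS
```

Because the large proportion of the dominant category multiplies its large disparities in every cell, RS (and hence True RS) rises even though the portfolio has not become more varied or more balanced.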

The portfolio of papers with a Harvard address covers a wide range of categories and the distribution is relatively balanced. However, the cosine similarities among the categories with the most publications are relatively high, i.e., these categories tend to have low disparity values, which results in a lower value of RS/True RS when using a local similarity matrix. These empirical results suggest that RS diversity values based on a global disparity matrix provide results that are more in line with expectations. Therefore, users who have access to a global matrix are advised to use it instead of the values generated endogenously by our software.

When universities operate in similar markets with the same institutional imperatives, such as tasks specified in national legislation, one might expect them to develop isomorphism (Halffman & Leydesdorff, 2010; Powell & DiMaggio, 1991; Wagner, Bornmann, Cai, & Leydesdorff, in preparation). However, our results indicate that universities do not tend toward isomorphism when it comes to comprehensiveness, as they do with impact. We reason that this is because impact is measured and prioritized in the bureaucratic frameworks of the state, whereas comprehensiveness is influenced by local opportunities, such as emerging technologies in the companies geographically or intellectually nearby. Hence, developing a deeper understanding of institutional comprehensiveness demands consideration of a broader context and more aspects of society, such as missions of specific universities.

Our analysis clarifies further differences between impact and comprehensiveness. Competition for impact pertains to quality, while competition for diversity/specialty pertains to differentiation. For example, shielding intellectual property rights is specific to a university's relations with industry. When it comes to comprehensiveness, the specificity of the knowledge content matters more than the formal criteria of measuring and comparing output and impact. In our opinion, interdisciplinarity, diversity, or comprehensiveness should not be considered another type of impact. While impact can be formalized across units of operation, e.g. faculties, departments, etc., after proper normalization, diversity or comprehensiveness remains content-based.

In other words, the analytical distinction between intellectual and social organization does not mean that the two dimensions can be traded off at the level of a university. On the contrary, one can expect a correlation, whether positive or negative, between the different types of research efforts. However, the differences between the two make it urgent that we develop a set of indicators for measuring diversity comparable to those of impact. By making an application available that allows users to generate the various measures of diversity for any data matrix, we hope to have contributed to this objective of quantifying and measuring diversity.

Finally, we note that although diversity is often used as a proxy for measuring interdisciplinarity, one should not expect any simplistic index to produce an informative outcome on its own (Abramo et al., 2018). Indicator values should always be interpreted in light of the context, the purpose, and the specific object under study. The empirical analysis of the 42+ Chinese universities in terms of diversity measures not only relates to interdisciplinarity at the intellectual level, but also reflects comprehensiveness at the institutional level. Although comprehensiveness is not necessarily a goal of universities, it may reflect the status quo of disciplinary diversity within a university (or at least the structural feature of a disciplinary distribution). The measurement results of this study provide a knowledge base for understanding research portfolios. A better understanding may open new windows on potential policies and thus facilitate the development of interdisciplinarity within a university.

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining