There are huge differences in mission, emphasis, inherent capability, and targeted utilization of research among scientific institutions. Hence, when it comes to assessments, a one-size-fits-all approach cannot meet the goal(s) of these assessments. Probably even larger differences exist between individuals, research teams and departments.
It is up to the research community to come up with objective, sound, reliable, easy-to-use, easy-to-understand, scalable, and sustainable methodologies, techniques, and tools for all types of scientific assessments, taking into account the reality of data availability, quality, and computability. Meeting these needs requires more than just changing to another set of indicators. A better understanding of the contributions and impacts of each type of research is necessary. Multiple data sources and computational methods may be needed, not just as individual tools but often coherently integrated to reveal pertinent, insightful, and perhaps even unexpected results. Moreover, tools for interactive analysis, usable even by non-specialist decision-makers, may be called for in order to combine the power of human intuition and experience (peers) with that of data mining and computational analytics.
Besides persons, other science-related entities, such as journals, research programs, and research infrastructure, are also assessed. Here we provide some examples to illustrate the many contexts in which differentiated assessments are expected.
Entities of evaluation consisting of persons: researchers, research teams, research institutions. There may be differentiated evaluations for researchers or institutions at different career or development stages. Typical reasons for this type of evaluation include promotion and funding. Universities and scientific institutions may have a legal obligation to perform regular assessments of their research work.
Types of research or performance: basic or applied research in the natural sciences, basic or applied research in the social sciences and humanities, contributing to interdisciplinary research, advancing clinical medicine, patenting, technique & product development, policy research, social engineering, programming, statistical analyses.
Level of achievement: creating a new field, leading the field, parallel front runners, following but closing in, following at a distance, losing track, …
Organization and environment of research: research facilities; research programs & initiatives; research climate, including the existence of offices and policies against scientific misconduct and against gender and minority bias or harassment, as well as agreeable advisor-advisee relations.
Publication outlets as units of assessment: journals, textbooks, handbooks, …
Funding agencies themselves.
In the next section we provide some more details about some of these features without any attempt at completeness.
Evaluating research teams should take into account their composition in terms of size, gender, nationality, age, and sector, i.e. whether the team is uni-sectoral or mixed, as in company-university collaborations. Is there a clear team leader (not just on paper) who is accepted and supported by the whole team? How is team (and individual) authorship counted (Sivertsen et al., 2019)?
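To make the question of authorship counting concrete, the following minimal sketch contrasts two widely used conventions: whole counting (each author receives full credit for a paper) and fractional counting (each paper's single unit of credit is split equally among its authors). It is only an illustration and is not meant to reproduce the scheme discussed by Sivertsen et al. (2019); the function name `count_credit` and the author names are hypothetical.

```python
from collections import defaultdict


def count_credit(papers, fractional=True):
    """Return author -> credit for a list of papers (each a list of author names).

    fractional=False: whole counting, every author gets 1 per paper.
    fractional=True: each paper contributes 1 in total, split equally.
    """
    credit = defaultdict(float)
    for authors in papers:
        unique = set(authors)  # ignore duplicate names on one paper
        if not unique:
            continue  # a paper without authors contributes nothing
        share = 1.0 / len(unique) if fractional else 1.0
        for author in unique:
            credit[author] += share
    return dict(credit)


# Hypothetical team output: three papers with 2, 4, and 1 authors.
papers = [["Ana", "Bo"], ["Ana", "Bo", "Cy", "Di"], ["Ana"]]
whole = count_credit(papers, fractional=False)
frac = count_credit(papers, fractional=True)
print(whole)
print(frac)
```

Under whole counting the credits sum to the number of authorships, whereas under fractional counting they sum to the number of papers, which is why the two conventions can rank teams of different sizes quite differently.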
Concerning journal evaluation, we mention Wouters et al. (2019), who call for an expansion of journal indicators to cover all functions of scholarly journals. In their call they explicitly mention registering, curating, evaluating (peer review; issuing corrections if necessary), disseminating, and archiving. Evaluating submissions should include a balanced use of reviewers (in terms of gender, geographic distribution, and specialty). Indicators should be difficult to manipulate and should, moreover, be validated through empirical testing. Wouters et al. (2019) further write that
In recent years, especially in the context of performance-based funding systems, good progress has been made in evaluating the social sciences, arts, and humanities on an equal footing with the natural sciences, engineering, and medicine; see e.g. (Sivertsen, 2018; Engels & Guns, 2018).
Although this is done less frequently, funding agencies are also evaluated in terms of the success of their programs. An early example, a comparison between journal articles published by a selected group of grantees (351 in total) and the general literature on schistosomiasis, can be found in (Pao & Goffman, 1990). These colleagues found that this small core of sponsored researchers (those sponsored by the
This special section of the
Noyons (2019) proposes the ABC method to characterize journals. Here ABC stands for area-based connectedness to society. In this approach he captures signals connecting research output to society. For journal indicators he implements the following dimensions and corresponding signals: news (papers being mentioned in news items); policy (papers being mentioned in policy documents); industry R&D (industry authorship); technological or commercial application (papers cited in patents) and local scope (papers in local languages, including English language journals with a local interest).
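As a rough illustration of how such signals could be turned into a journal-level profile, the sketch below records, for each paper, which of the five signals were observed and reports the share of a journal's papers showing each signal. This is a simplifying assumption made for illustration only: the signal labels, the `Paper` class, the `journal_profile` function, and the aggregation by simple shares are all hypothetical and do not reproduce the actual computation of Noyons (2019).

```python
from dataclasses import dataclass

# Hypothetical labels, one per dimension listed in the text.
SIGNALS = ("news", "policy", "industry", "patent", "local")


@dataclass(frozen=True)
class Paper:
    journal: str
    signals: frozenset  # subset of SIGNALS observed for this paper


def journal_profile(papers, journal):
    """Share of the journal's papers that show each signal (0.0 to 1.0)."""
    own = [p for p in papers if p.journal == journal]
    if not own:
        return {s: 0.0 for s in SIGNALS}
    return {s: sum(s in p.signals for p in own) / len(own) for s in SIGNALS}


# Made-up data: four papers in journal "J1", one in "J2".
papers = [
    Paper("J1", frozenset({"news", "policy"})),
    Paper("J1", frozenset({"patent"})),
    Paper("J1", frozenset()),
    Paper("J1", frozenset({"news"})),
    Paper("J2", frozenset({"local"})),
]
profile = journal_profile(papers, "J1")
print(profile)
```

Keeping the five dimensions separate, rather than collapsing them into one score, preserves the differentiated picture of societal connectedness that the approach aims for.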
Vanlee and Ysebaert (2019) provide a concrete example of how the quality of artistic research output may be evaluated. Obviously, established evaluation models originating from academia (here understood as non-arts) are not suitable. The authors emphasize the importance of allowing an assessment culture to emerge from practitioners themselves, instead of imposing ill-suited methods borrowed from established scientific evaluation models.
Chang and Liu (2019) illustrate a university evaluation system, taking ShanghaiTech University as an example. ShanghaiTech is a research university established recently (2013) jointly by the Shanghai Municipal Government and the Chinese Academy of Sciences (CAS). It is purposely organized as a small-scale, internationalized, first-class research institute that aims to solve advanced and difficult scientific challenges of global relevance. Its research performance should manifest itself not in numbers of papers, average numbers of citations, or even its h-index, but in competitiveness, breakthroughs, breakaways, and power of leading. The common ranking schemes therefore do not serve its mission. For this reason the authors, working with the university administration, designed and tested a new benchmarking scheme based on the competitiveness and research subject distributions of ShanghaiTech compared with a select group of first-class international universities. At the moment this scheme relies on publications of research-oriented departments and has been accepted as a regular service for the university.
Finally, Fu and Li (2019) discuss the evaluation practices of the Centers of Excellence of the Chinese Academy of Sciences (CAS). CAS has been developing its more than 100 research institutions in four different categories: Centers of Excellence, aiming for internationally first-class basic research; Institutes of Innovation, striving for technology breakthroughs of global and national significance; Institutes for Specialized Research, focusing on special, more applied, or locally oriented areas; and finally, scientific facilities that support the whole research community. Clearly, these categories cannot be assessed by a one-size-fits-all evaluation scheme. The authors focus on the assessment design and practices for the Centers of Excellence, which rely on evaluation panels consisting of local, i.e. Chinese, and international experts.