Open Access

Can Crossref Citations Replace Web of Science for Research Evaluation? The Share of Open Citations

Introduction

The adoption of the Digital Object Identifiers (DOIs, see the DOI Handbook) by publishers of scholarly works is advancing. DOIs are persistent identifiers with a resolution service and a set of metadata about the referenced resources. Scholarly publishing DOI registration is almost exclusively operated by the Crossref DOI registration agency (Crossref). An important part of the metadata that is deposited with Crossref is the list of references, which can be aggregated as the network of citation links between scholarly works. The COCI project (OpenCitations, 2018) makes openly available the citation links from Crossref that are marked as open. This presents an open alternative to commercial citation databases such as Web of Science (WoS, by Clarivate Analytics) which only offer citation data limited by restrictive and fee-based licenses.

The ISSI Open Citations Letter (ISSI, 2017) calls for citation metadata to become openly available for scientometrics, both for research in the field and for its applications that support science policy and research evaluation, the latter having a large impact on the scientific community. The lack of transparency and reproducibility implied by the vendor paywalls around citation data inhibits sound practices in the field of scientometrics. Crossref, the only named candidate organization in the open letter, appears to be the best positioned to fulfil the role of an open citation infrastructure, as it (1) already exists and is operational, (2) already contains a sizeable proportion of the required metadata, and (3) makes its metadata openly available.

The proportion of open citations in Crossref is increasing. More than half of the citations in Crossref were classified as open (Shotton, 2017). Van Eck et al. (2018) show that while 77.1% of citations in the Web of Science (WoS) are present in Crossref, only 39.7% are classified as open. Efforts towards open scientometric data sources, documented by events such as the workshop reported on by Fraumann and Van Eck (2019), promise the advent of “open scientometrics” where citation data need not be sourced from commercial providers. The prerequisite is that Crossref covers and openly provides a sufficiently large part of the citations from the WoS, today's de facto standard citation database for most fields of science. We study whether this prerequisite is satisfied in the context of the Czech Technical University in Prague (CTU), Czech Republic. Specifically, we investigate how well the WoS citation database is covered by the openly available citation links from the COCI project (OpenCitations, 2018), on a sample in which the cited publications are those we track in our institution's Current Research Information System (CRIS). We provide a breakdown by individual faculties, fields and, where possible, subfields in two different discipline classifications: the OECD Fields of Research and Development classification and the Czech national discipline classification.

The Czech Technical University is the largest technical university in the country (and the oldest one as well, established in 1707) and is comparable to many technical universities in Central Europe. We expect our results to be relevant to other institutions of similar profiles in the region.

This article extends the work presented at the ISSI 2019 conference (Chudlarský & Dvořák, 2019).

Data sources and method

The Czech Technical University in Prague has a long tradition of running an institutional CRIS built in-house. The CRIS integrates our own records with those harvested from the WoS web service interface, including the citations of our authors’ works. This is one of the many integrations of the CRIS; for a detailed description see Dvořák, Chudlarský, and Špaček (2019).

We limit ourselves to publications from the period 2013–2017 which have both (1) a WoS accession number with a valid record in WoS, and (2) a DOI that is registered in Crossref. To check the second condition, we consult the DOIBoost dataset described by La Bruzzo, Manghi, and Mannocci (2019) or perform an API call to Crossref. We exclude publications whose DOI in the CRIS differs from the DOI in the WoS record. This gives a sample of 12,796 publications for which we look up the citations in both the WoS and Crossref. For every citation considered, both the citing and the cited publication are present in both WoS and Crossref.
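The selection rules above can be sketched as a simple filter. This is only an illustration: the `Publication` record, its field names and the `is_registered` callback (standing in for the DOIBoost lookup or the Crossref API call) are hypothetical, and the sketch assumes that a publication is excluded for a DOI mismatch only when both the CRIS and the WoS record state a DOI.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Publication:
    """Hypothetical CRIS record; the field names are illustrative only."""
    year: int
    wos_id: Optional[str]    # WoS accession number, if any
    cris_doi: Optional[str]  # DOI as stored in the CRIS
    wos_doi: Optional[str]   # DOI as stated in the WoS record


def normalize(doi: Optional[str]) -> Optional[str]:
    """DOIs are case-insensitive, so compare them trimmed and lower-cased."""
    return doi.strip().lower() if doi else None


def select_sample(pubs: Iterable[Publication],
                  is_registered: Callable[[str], bool]) -> List[Publication]:
    """Keep 2013-2017 publications with a WoS record and a Crossref DOI."""
    sample = []
    for p in pubs:
        doi = normalize(p.cris_doi)
        wos_doi = normalize(p.wos_doi)
        if not (2013 <= p.year <= 2017):
            continue  # outside the publication window
        if not p.wos_id or not doi:
            continue  # no WoS accession number or no DOI in the CRIS
        if wos_doi is not None and wos_doi != doi:
            continue  # assumption: exclude only when both DOIs exist and differ
        if not is_registered(doi):
            continue  # DOI not registered in Crossref
        sample.append(p)
    return sample
```

Passing the registration check in as a function keeps the filter independent of whether the answer comes from a local dataset or from a remote service.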

The November 2018 release of the Crossref Open Citations corpus (OpenCitations, 2018) was used. The quality of the “cited” side of the citation links varies widely. Some values span multiple lines and have to be joined. Some values seem to contain several DOIs concatenated, separated by spaces. We developed a script to rectify these most severe errors; applying it made the data load possible and even slightly raised the number of citations to 449,843,367 (2,864 more than the original 449,840,503). However, removing duplicate DOI pairs from the dataset leaves only 445,827,638 unique citation links (4,015,729 fewer). Some of the cited “DOIs” are still unsatisfactory: they contain internal spaces or illegal characters, end in an extra full stop, have superfluous parts in their contents or are incomplete. There is clearly room for further investigation and improvement, which we are undertaking in a separate line of work and plan to report on separately. Data quality problems on the side of Crossref citations clearly lower the recall of our study.
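The script itself is not part of this article. A minimal sketch of the two repairs and the deduplication step could look as follows, assuming that a line break falls inside a single DOI, that spaces separate concatenated DOIs, and that a candidate DOI starts with `10.` followed by a numeric registrant code. The function names and the pattern are illustrative only.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed minimal shape of a DOI: "10.", a numeric registrant code, "/", a suffix.
DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")


def split_cited(raw: str) -> List[str]:
    """Turn one raw 'cited' value into a list of candidate DOIs."""
    joined = raw.replace("\r", "").replace("\n", "")  # join multiline values
    tokens = joined.split()                           # split concatenated DOIs
    return [t.lower() for t in tokens if DOI_SHAPE.fullmatch(t)]


def clean_links(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield unique (citing DOI, cited DOI) pairs from raw rows."""
    seen = set()
    for citing, cited_raw in rows:
        for cited in split_cited(cited_raw):
            pair = (citing.strip().lower(), cited)
            if pair not in seen:  # drop duplicate DOI pairs
                seen.add(pair)
                yield pair
```

Splitting concatenated values produces more links than input rows, while deduplication removes links, which matches the direction of the two count changes reported above. An in-memory set is only practical for small samples; for a corpus of this size an external sort or a database would be the more realistic choice.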

Findings

We found that 53.7% of WoS citations are present in the COCI dump of the open citation network.

This is significantly more than the coverage of approximately 40% measured by Van Eck et al. (2018) for four out of five broad main fields (in the CWTS Leiden Ranking classification). Note that the remaining main field of Social Sciences and Humanities is marginal in our sample, given the research profile of a technical university.
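The coverage measure can be illustrated as the share of WoS citation links that also occur among the COCI links. The sketch below assumes that a citation link is identified by the pair of citing and cited DOIs, compared case-insensitively; the function name is hypothetical.

```python
from typing import Iterable, Tuple

Link = Tuple[str, str]  # (citing DOI, cited DOI)


def coverage(wos_links: Iterable[Link], coci_links: Iterable[Link]) -> float:
    """Share of WoS citation links that are also present in COCI."""
    wos = {(citing.lower(), cited.lower()) for citing, cited in wos_links}
    coci = {(citing.lower(), cited.lower()) for citing, cited in coci_links}
    if not wos:
        raise ValueError("no WoS citation links to compare against")
    return len(wos & coci) / len(wos)


# The university-wide figure follows from the reported counts:
# 48,707 of the 90,675 WoS citations were found in COCI.
overall_percent = round(100 * 48707 / 90675, 1)
```

COCI links that have no counterpart in WoS do not enter the ratio, so the measure describes the coverage of WoS by COCI and not the reverse.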

We found important differences in the coverage among faculties (ranging from 63% down to 28%) – see Figure 1 and the supporting Table 1.

Figure 1. Coverage of WoS citations in COCI by CTU unit. COCI_WOS_RATIO denotes the proportion of Web of Science citations that are found in Crossref as open citations.

Table 1. Coverage of WoS citations in COCI by the unit of the Czech Technical University.

| Faculty or University Institute | WoS publications | WoS citations | Of which in COCI | Coverage |
|---|---|---|---|---|
| Institute of Experimental and Applied Physics | 1,122 | 24,348 | 15,225 | 62.5% |
| Faculty of Nuclear Sciences and Physical Engineering | 4,225 | 54,470 | 32,398 | 59.5% |
| Faculty of Transportation Sciences | 567 | 15,830 | 9,329 | 58.9% |
| Faculty of Mechanical Engineering | 1,778 | 26,114 | 14,999 | 57.4% |
| Czech Technical University (whole) | 12,796 | 90,675 | 48,707 | 53.7% |
| Faculty of Electrical Engineering | 3,959 | 16,726 | 7,768 | 46.4% |
| Faculty of Biomedical Engineering | 478 | 2,050 | 950 | 46.3% |
| Czech Institute of Informatics, Robotics and Cybernetics | 219 | 459 | 191 | 41.6% |
| Faculty of Civil Engineering | 1,727 | 7,131 | 2,539 | 35.6% |
| University Centre of Energy Efficient Buildings | 114 | 232 | 72 | 31.0% |
| Klokner Institute | 126 | 255 | 78 | 30.6% |
| Faculty of Information Technology | 347 | 654 | 185 | 28.3% |
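As a consistency check, the coverage column of Table 1 can be recomputed from the two citation counts in each row. The following sketch copies the counts from the table; the variable and function names are illustrative.

```python
# (unit, WoS citations, of which in COCI, reported coverage in %), from Table 1
TABLE_1 = [
    ("Institute of Experimental and Applied Physics", 24348, 15225, 62.5),
    ("Faculty of Nuclear Sciences and Physical Engineering", 54470, 32398, 59.5),
    ("Faculty of Transportation Sciences", 15830, 9329, 58.9),
    ("Faculty of Mechanical Engineering", 26114, 14999, 57.4),
    ("Czech Technical University (whole)", 90675, 48707, 53.7),
    ("Faculty of Electrical Engineering", 16726, 7768, 46.4),
    ("Faculty of Biomedical Engineering", 2050, 950, 46.3),
    ("Czech Institute of Informatics, Robotics and Cybernetics", 459, 191, 41.6),
    ("Faculty of Civil Engineering", 7131, 2539, 35.6),
    ("University Centre of Energy Efficient Buildings", 232, 72, 31.0),
    ("Klokner Institute", 255, 78, 30.6),
    ("Faculty of Information Technology", 654, 185, 28.3),
]


def coverage_percent(wos_citations: int, in_coci: int) -> float:
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * in_coci / wos_citations, 1)


# Units whose recomputed coverage differs from the reported value.
mismatches = [unit for unit, wos, coci, reported in TABLE_1
              if coverage_percent(wos, coci) != reported]
```

The per-unit citation counts add up to more than the university total, so the units evidently overlap and the rows cannot simply be summed.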

Coverage also differs significantly among disciplines (ranging from 78% to 25%); see Figure 2 and the supporting Table 2. Only the disciplines with more than one hundred publications are listed. The field of Physical sciences is the most populous one and lends itself to a useful subdivision; the subfields of Astronomy (at 78% coverage) at one end and Optics (at 35%) at the other illustrate the variance even within a single field. The second most populous field of “Electrical engineering, Electronic engineering, Information engineering” is dominated by Electronic engineering in the context of the Czech Technical University, so no useful subdivision is possible there.

Figure 2. Coverage of WoS citations in COCI by discipline (the OECD FORD classification). COCI_WOS_RATIO denotes the proportion of Web of Science citations that are found in Crossref as open citations. The constant column Physical Sciences represents the average value for the equally named FORD field.

Table 2. Coverage of WoS citations in COCI by discipline (the OECD FORD classification).

| Field ( / Subfield) | WoS publications | WoS citations | Of which in COCI | Coverage |
|---|---|---|---|---|
| - Physical sciences / Astronomy (including astrophysics, space science) | 117 | 1,028 | 803 | 78.1% |
| - Physical sciences / Fluids and plasma physics (including surface physics) | 521 | 2,444 | 1,552 | 63.5% |
| - Physical sciences / Particles and field physics | 1,426 | 35,838 | 22,320 | 62.3% |
| Physical sciences (whole) | 4,307 | 57,877 | 35,152 | 60.7% |
| - Physical sciences / Nuclear physics | 868 | 12,604 | 7,585 | 60.2% |
| - Physical sciences / Other | 788 | 3,810 | 2,187 | 57.4% |
| Biological sciences | 114 | 991 | 545 | 55.0% |
| Czech Technical University (whole) | 12,796 | 90,675 | 48,707 | 53.7% |
| Clinical medicine | 131 | 652 | 316 | 48.5% |
| Chemical sciences | 200 | 1,083 | 524 | 48.4% |
| Earth and related environmental sciences | 252 | 1,468 | 711 | 48.4% |
| Electrical engineering, Electronic engineering, Information engineering | 2,834 | 10,523 | 4,951 | 47.0% |
| Mathematics | 820 | 2,303 | 942 | 40.9% |
| - Physical sciences / Optics (including laser optics and quantum optics) | 590 | 2,253 | 789 | 35.0% |
| Computer and information sciences | 1,000 | 3,097 | 1,071 | 34.6% |
| Materials engineering | 745 | 4,184 | 1,404 | 33.6% |
| Mechanical engineering | 542 | 1,562 | 516 | 33.0% |
| Environmental engineering | 223 | 617 | 200 | 32.4% |
| Civil engineering | 942 | 2,555 | 740 | 29.0% |
| Medical engineering | 103 | 177 | 38 | 21.5% |

Table 3 lists information similar to Table 2, aggregated in the original Czech national discipline classification. Similar fields in both classifications have very similar levels of coverage, e.g. Astronomy, Particle physics, Nuclear physics, Optics, Mathematics, Electrical and electronic engineering, and Civil engineering. The choice of discipline classification system does not seem to affect the end result much.
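The similarity can be quantified by pairing the named fields across the two classifications and comparing their coverage values. The pairing below is our reading of which rows of Table 2 and Table 3 correspond to each other and should be treated as an assumption; the coverage values are copied from the two tables.

```python
# (label, FORD coverage in % from Table 2, Czech-classification coverage in % from Table 3)
# The row pairing is an assumption made for illustration.
PAIRS = [
    ("Astronomy", 78.1, 78.3),
    ("Particle physics", 62.3, 62.3),
    ("Nuclear physics", 60.2, 60.0),
    ("Optics", 35.0, 35.0),
    ("Mathematics", 40.9, 40.2),
    ("Electrical and electronic engineering", 47.0, 45.0),
    ("Civil engineering", 29.0, 29.6),
]

# Absolute difference in percentage points for each paired field.
differences = {label: round(abs(ford - czech), 1) for label, ford, czech in PAIRS}
largest_gap = max(differences.values())
```

With this pairing, the largest gap is 2.0 percentage points (Electrical and electronic engineering), and every other pair differs by less than one point.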

Table 3. Coverage of WoS citations in COCI by discipline (the original Czech national discipline classification).

| Discipline | WoS publications | WoS citations | Of which in COCI | Coverage |
|---|---|---|---|---|
| Astronomy, Celestial Mechanics, Astrophysics | 114 | 1,025 | 803 | 78.3% |
| Plasma and Gas Discharge Physics | 376 | 1,986 | 1,389 | 69.9% |
| Theoretical Physics | 375 | 1,957 | 1,353 | 69.1% |
| Elementary Particles and High Energy Physics | 1,398 | 35,792 | 22,308 | 62.3% |
| Nuclear, Atomic and Molecular Physics, Colliders | 934 | 12,720 | 7,635 | 60.0% |
| Czech Technical University (whole) | 12,796 | 90,675 | 48,707 | 53.7% |
| Nuclear & Quantum Chemistry | 101 | 463 | 241 | 52.1% |
| Sensors, Measurement, Regulation | 377 | 1,139 | 572 | 50.2% |
| Computer Applications, Robotics | 530 | 3,807 | 1,868 | 49.1% |
| Solid Matter Physics & Magnetism | 294 | 1,466 | 670 | 45.7% |
| Electronics & Optoelectronics, Electrical Engineering | 1,385 | 3,149 | 1,416 | 45.0% |
| Computer Hardware & Software | 636 | 2,611 | 1,168 | 44.7% |
| General Mathematics | 730 | 1,993 | 802 | 40.2% |
| Other Materials | 152 | 977 | 355 | 36.3% |
| Fluid Dynamics | 161 | 489 | 172 | 35.2% |
| Optics, Masers, Lasers | 584 | 2,247 | 787 | 35.0% |
| Control Systems Theory | 324 | 1,528 | 528 | 34.6% |
| Informatics, Computer Science | 577 | 1,362 | 462 | 33.9% |
| Non-nuclear Energetics, Energy Consumption & Use | 212 | 535 | 175 | 32.7% |
| Composite Materials | 281 | 2,000 | 641 | 32.0% |
| Civil Engineering | 633 | 1,761 | 521 | 29.6% |
| Building Engineering | 256 | 682 | 195 | 28.6% |
| Metallurgy | 166 | 658 | 188 | 28.6% |
| Nuclear Energetics | 119 | 247 | 62 | 25.1% |

Discussion & conclusion

The significant difference between our results and those of Van Eck et al. (2018) may be caused by the specific discipline profile of our institution, by the specific publisher choice patterns of our authors, and by the fact that the 5-year window of our sample (2013–2017) is one year later than that of the referenced work (2012–2016). All of these factors deserve further research.

The open citations network in Crossref is not yet ready to replace the Web of Science citations. The observed levels of coverage of citations are not yet sufficient for Crossref to be used as the source for citation analyses in research evaluation at the university and/or faculty levels. Note also that while scholarly publications without a DOI are increasingly rare, they still exist.

eISSN: 2543-683X
Language: English
Publication frequency: 4 times a year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining