Open access

Scientific Value Weights more than Being Open or Toll Access: An analysis of the OA advantage in Nature and Science



Introduction with brief review

Open Science (OS) is one of the hottest topics in the scientific world. According to UNESCO, OS allows scientific information, data, and outputs to be more widely accessible (Open Access, OA) and more reliably harnessed (Open Data) with the active engagement of all stakeholders (Open to Society). Since open-source computer software started the open movement, OA, together with Open Data (OD) and Open Peer Review (OPR), has driven and benefited OS, and OA has attracted the most attention.

Open access means that accessing, downloading, and reading scientific literature is free to the entire population of Internet users (Craig et al., 2007).

There are two routes to open access: open-access journals and e-print repositories (i.e. pre-prints or post-prints) (Antelman, 2004). OA possesses advantages in scholarly communication (Wang et al., 2015), including for tracking and predicting growth areas in science (Small, 2006), and its free online availability substantially increases the impact of research (Lawrence, 2001). Through open practices, researchers can gain OA advantages such as more citations, media attention, potential collaborators, job opportunities, and funding opportunities (McKiernan et al., 2016).

Researchers have conducted many studies on the OA advantage. Some studies have shown that authors who made their work openly accessible received more downloads (Davis, 2011) or citations (Antelman, 2004; Eysenbach, 2006; Lawrence, 2001; McKiernan et al., 2016; Norris, Oppenheim, & Rowland, 2008; Wang et al., 2015) than authors whose articles remained behind a subscription wall. A comparison of 4,633 OA and TA (toll access) articles in ecology, applied mathematics, sociology, and economics reveals that the 2,280 (49%) OA articles achieved a mean citation count of 9.04, whereas the mean for TA articles was 5.76 (Eysenbach, 2006). Other researchers, however, argue that there is no OA citation advantage. Craig et al. (2007) attribute the observed citation differences between OA and TA articles mainly to three postulates: the open access postulate, the selection bias postulate, and the early view postulate. Davis (2011) states that OA articles receive more downloads from a broader group of readers but no more citations than TA articles. A study of working papers in economics shows that their impact is relatively low and provides no evidence of an OA advantage (Frandsen, 2009). OA advantages might also be topic-dependent: authors tend to select citation-attractive topics, especially for OA outlets, which are more likely to attract citations (Sotudeh, 2019). However, it remains unclear whether OA benefits hot topics or TA cools down cold topics.

Building on these studies (Craig et al., 2007; Eysenbach, 2006; Lawrence, 2001; Norris et al., 2008; Small, 2006; Sotudeh, 2019; Wang et al., 2015), we designed two indicators, the hot-degree and the R-index, to examine OA-TA relations quantitatively. Unexpectedly, our results show that choosing OA or TA does not affect the normal spread of real scientific discoveries, although the dissemination of scientific discoveries varies with OA or TA.

Methodology

We design the indicators and apply the clustering method as follows.

Indicators

To measure hot topics (Banks, 2006; Ye, 2013), we follow the definition of the original h-index and define the hot-degree of a topic (a subject, keyword, and so on) in science as follows.

The hot-degree of a topic equals H if H is the largest number such that H publications on the topic each have at least H citations.

Put simply, H is the largest number of publications on the topic that each have at least H citations. In WoS, H is easy to obtain because it is simply the h-index of a topic.
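Because the hot-degree is defined exactly like an h-index, it can be computed from a topic's citation counts alone. The following is a minimal Python sketch, assuming the citation counts of all publications on a topic are already available as a list; the function name `hot_degree` is illustrative, not taken from the study.

```python
from typing import Sequence


def hot_degree(citations: Sequence[int]) -> int:
    """Return the largest H such that H publications each have at least H citations."""
    # Sort citation counts in descending order; the paper at rank r (1-based)
    # still counts toward H as long as it has at least r citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no later rank qualifies
    return h


# Hypothetical citation counts for the papers on one topic.
print(hot_degree([10, 8, 5, 4, 3]))  # 4
```

Four papers have at least 4 citations each, but there are not five papers with at least 5, so H = 4.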

To compare OA and TA on the same topic, we introduce the ratio of hot-degrees as a concise indicator, the R-index:

$$R = \frac{H_{OA}}{H_{TA}}$$

where $H_{OA}$ and $H_{TA}$ are the hot-degrees of the OA and TA publications on that topic. $R > 1$ (i.e. $H_{OA} > H_{TA}$) indicates an open advantage, and $R < 1$ (i.e. $H_{TA} > H_{OA}$) indicates a toll advantage.
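The R-index and its interpretation can be written as a small helper. This sketch assumes both hot-degrees were computed for the same topic; the text does not say how a zero TA hot-degree is handled, so the sketch simply raises an error, and the label "neither" for R = 1 is our own assumption.

```python
def r_index(h_oa: float, h_ta: float) -> float:
    """R = H_OA / H_TA for one topic."""
    if h_ta == 0:
        # Undefined ratio; how the original study treated this case is not stated.
        raise ValueError("TA hot-degree is zero; R-index is undefined")
    return h_oa / h_ta


def advantage(r: float) -> str:
    """Interpret an R-index value as described in the text."""
    if r > 1:
        return "open advantage"
    if r < 1:
        return "toll advantage"
    return "neither"  # R == 1: assumed label, not defined in the text


r = r_index(30, 20)
print(r, advantage(r))  # 1.5 open advantage
```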

Clustering

To clarify the topic-related properties of published papers, we used classical Latent Dirichlet Allocation (LDA; cf. Blei, Ng, & Jordan, 2003) to cluster the topics of Nature and Science. We divided the Nature dataset into three parts by discipline, namely biomedicine, physics, and others, and did the same for the Science dataset. Taking the Nature biomedicine dataset as an example, Figure 1 shows the data processing flowchart.

Figure 1. The flowchart of data processing (Nature biomedicine case).

The algorithm includes the following steps.

We divided the Nature biomedicine dataset into two parts, OA and TA.

Using the title and abstract fields, we applied the LDA model to extract 100 topics from each of the OA and TA datasets, denoted $Topic_{OA}$ and $Topic_{TA}$:

$$Topic_{OA} = \{t_{o1}, \ldots, t_{oi}, \ldots, t_{o100}\}, \quad Topic_{TA} = \{t_{t1}, \ldots, t_{tj}, \ldots, t_{t100}\}$$

Each paper in the OA or TA dataset is associated with the set of topics whose probability exceeds 0.01. For example, for a paper $p_{oi}$ in the OA dataset, its topics can be retrieved from the LDA model trained on the OA dataset:

$$top(p_{oi}) = \{topic_1, \ldots, topic_i, \ldots, topic_m\}, \quad \text{where } topic_i \in Topic_{OA}$$

Conversely, each topic $t_{oi}$ in $Topic_{OA}$ corresponds to a set of papers in the OA dataset:

$$Paper(t_{oi}) = \{p_1, \ldots, p_i, \ldots, p_m\}, \quad \text{where } p_i \in OA$$
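The two mappings top(p) and Paper(t) can be sketched as follows. The study does not name its LDA implementation, so this sketch assumes each paper's topic probabilities have already been exported as a list indexed by topic number (0-based here, unlike the 1-based notation in the text); the function names and paper IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

THRESHOLD = 0.01  # topic-probability cut-off stated in the text


def paper_topics(doc_topic: Dict[str, List[float]]) -> Dict[str, Set[int]]:
    """top(p): the topics whose probability exceeds THRESHOLD, for each paper."""
    return {
        paper: {k for k, prob in enumerate(probs) if prob > THRESHOLD}
        for paper, probs in doc_topic.items()
    }


def topic_papers(top: Dict[str, Set[int]]) -> Dict[int, Set[str]]:
    """Paper(t): invert top(p) so that each topic maps to the papers assigned to it."""
    inverted: Dict[int, Set[str]] = defaultdict(set)
    for paper, topics in top.items():
        for k in topics:
            inverted[k].add(paper)
    return dict(inverted)


# Hypothetical 3-topic distributions for two OA papers.
doc_topic = {"p_o1": [0.90, 0.095, 0.005], "p_o2": [0.50, 0.005, 0.495]}
top = paper_topics(doc_topic)
papers = topic_papers(top)
```

A paper can belong to several topics, so the inverted sets may overlap; here `p_o1` and `p_o2` both appear under topic 0.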

For the topics in $Topic_{OA}$ and $Topic_{TA}$, we calculate the hot-degree of each topic from the citations of the papers belonging to it, denoted $H_{OA}$ and $H_{TA}$, respectively:

$$H_{OA} = \{h_{o1}, \ldots, h_{oi}, \ldots, h_{o100}\}, \quad H_{TA} = \{h_{t1}, \ldots, h_{tj}, \ldots, h_{t100}\}$$

where $h_{oi}$ is the hot-degree of the $i$th topic of $Topic_{OA}$ and $h_{tj}$ is the hot-degree of the $j$th topic of $Topic_{TA}$.
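Given Paper(t) for every topic, the hot-degree vector is one h-index per topic. A hedged sketch, assuming citation counts are stored per paper ID and that a topic with no papers above the probability threshold gets a hot-degree of 0 (the text does not address this case); all names are illustrative.

```python
from typing import Dict, List, Set


def h_index(citations: List[int]) -> int:
    """Largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, "count >= rank" holds for a prefix of ranks only,
    # so counting the ranks that satisfy it gives H.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def hot_degrees(
    papers_of_topic: Dict[int, Set[str]],
    citations: Dict[str, int],
    n_topics: int,
) -> List[int]:
    """[h_1, ..., h_n]: the hot-degree of every topic; empty topics get 0."""
    return [
        h_index([citations[p] for p in papers_of_topic.get(k, set())])
        for k in range(n_topics)
    ]


# Hypothetical Paper(t) for three topics and citation counts per paper.
papers_of_topic = {0: {"a", "b", "c"}, 1: {"a"}}
citations = {"a": 12, "b": 3, "c": 2}
print(hot_degrees(papers_of_topic, citations, n_topics=3))  # [2, 1, 0]
```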

We assume that during 2010–2019 the research topics of OA and TA papers are similar, so the topics can be matched. We used the Pearson correlation coefficient to match OA topics with TA topics, so the set of OA-TA topic pairs can be written as

$$Pair_{\{OA-TA\}} = \{(t_{oi}, t_{tj}),\ 1 \le i \le 100,\ 1 \le j \le 100\}$$

where $t_{oi} \in Topic_{OA}$ and $t_{tj} \in Topic_{TA}$.
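The text states only that the Pearson correlation coefficient was used for matching. This sketch makes two assumptions of its own: that the correlated vectors are the topic-word weights over a shared vocabulary, and that each OA topic is paired with the single TA topic of highest correlation (a greedy match that does not enforce one-to-one pairing). All names and data are hypothetical.

```python
import math
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treated here as "no match"
    return cov / (sx * sy)


def match_topics(oa: List[List[float]], ta: List[List[float]]) -> List[Tuple[int, int]]:
    """Pair each OA topic i with the TA topic j whose word vector correlates best."""
    pairs = []
    for i, u in enumerate(oa):
        j = max(range(len(ta)), key=lambda k: pearson(u, ta[k]))
        pairs.append((i, j))
    return pairs


# Hypothetical topic-word weights over a shared 4-word vocabulary.
oa_topics = [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.3, 0.6]]
ta_topics = [[0.1, 0.1, 0.2, 0.6], [0.6, 0.3, 0.05, 0.05]]
print(match_topics(oa_topics, ta_topics))  # [(0, 1), (1, 0)]
```

If a strict one-to-one pairing is required, the greedy argmax could be replaced by an assignment algorithm over the full correlation matrix; the text does not say which was used.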

Because the paired OA topic $t_{oi}$ and TA topic $t_{tj}$ represent the same topic, the R-index can be calculated for each pair in $Pair_{\{OA-TA\}}$ as $R = h_{oi} / h_{tj}$.
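Once topics are paired, the per-pair R-index values can be summarized in the way Table 2 (median values) and Figure 4 (proportions of R > 1, R = 1, and R < 1) report them. A sketch, assuming pairs are given as (OA index, TA index) tuples; skipping pairs with a zero TA hot-degree is our choice, not the study's.

```python
import statistics
from typing import Dict, List, Tuple


def pair_r_indices(
    pairs: List[Tuple[int, int]], h_oa: List[int], h_ta: List[int]
) -> List[float]:
    """R = h_oi / h_tj for every matched (OA, TA) topic pair with h_tj > 0."""
    # Pairs with a zero TA hot-degree are dropped because R is undefined there.
    return [h_oa[i] / h_ta[j] for i, j in pairs if h_ta[j] > 0]


def summarize(rs: List[float]) -> Dict[str, float]:
    """Median R and the share of topic pairs with R > 1, R = 1 and R < 1."""
    n = len(rs)
    return {
        "median": statistics.median(rs),
        "R>1": sum(r > 1 for r in rs) / n,
        "R=1": sum(r == 1 for r in rs) / n,
        "R<1": sum(r < 1 for r in rs) / n,
    }


# Hypothetical hot-degrees for three matched topic pairs.
h_oa, h_ta = [6, 3, 2], [3, 3, 4]
rs = pair_r_indices([(0, 0), (1, 1), (2, 2)], h_oa, h_ta)
print(rs)  # [2.0, 1.0, 0.5]
print(summarize(rs))
```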

Data description

We collected data from the Web of Science (WoS) by searching for and downloading both OA and TA articles, letters, and reviews published in Nature and Science from 2010 to 2019. We included only papers published in Nature or Science; sister journals such as Nature Communications and Scientific Reports were not taken into account. As the most prominent comprehensive scientific magazines, both Nature and Science receive significant recognition among colleagues and may open the way to greater media visibility. Science and Nature publish only 10 percent of the papers they receive, creating what has been called a situation of “artificial scarcity” (a paradox in the age of digital information). According to the WoS OA classification, the data from both magazines are comparable.

As more and more academic journals offer an open-access option, the WoS platform also reports the OA status of each paper, which can be one of the following legal OA versions: Gold, Bronze (free content on a publisher's website), and Green (e.g. a copy self-archived by the author in a repository). Papers with any of these OA statuses were classified into the OA dataset; all other papers were classified into the TA dataset. Since WoS does not provide disciplinary classifications for Nature and Science, we collected the disciplinary data directly from www.nature.com and www.sciencemag.org and grouped the papers into three broad disciplines: biomedicine, physics, and others. Table 1 shows the characteristics of the dataset. Because the citation data are unevenly distributed, we report median rather than mean values.

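Table 1. Research dataset of Nature and Science.

| Statistic | Nature OA | Nature TA | Science OA | Science TA |
|---|---|---|---|---|
| Biomedical papers | 4,214 | 2,227 | 2,943 | 1,572 |
| Biomedical median citations | 138 | 101 | 104 | 114 |
| Physical papers | 444 | 1,690 | 1,006 | 1,487 |
| Physical median citations | 81 | 85 | 78 | 119 |
| Other papers | 2,419 | 510 | 674 | 2,770 |
| Other median citations | 1 | 7.5 | 76 | 1 |
| Total papers | 7,077 | 4,427 | 4,623 | 5,829 |
| Total median citations | 64 | 87 | 95 | 58 |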
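The OA/TA split and the use of medians described before Table 1 can be sketched as follows. The record layout (an `oa_status` list and a `citations` count per paper) is hypothetical and does not reproduce the actual WoS export format; the status labels are assumed to have been normalized to gold, bronze, and green beforehand.

```python
import statistics
from typing import Dict, List

# OA statuses counted as open access in the text; anything else means TA.
OA_STATUSES = {"gold", "bronze", "green"}


def split_oa_ta(records: List[Dict]) -> Dict[str, List[Dict]]:
    """Put a record in OA if any of its (hypothetical) status labels is Gold, Bronze or Green."""
    groups: Dict[str, List[Dict]] = {"OA": [], "TA": []}
    for rec in records:
        labels = {s.strip().lower() for s in rec.get("oa_status", [])}
        groups["OA" if labels & OA_STATUSES else "TA"].append(rec)
    return groups


def median_citations(group: List[Dict]) -> float:
    """Median rather than mean, because citation counts are highly skewed."""
    return statistics.median(rec["citations"] for rec in group)


# Hypothetical records; a paper may carry several OA labels.
records = [
    {"id": "a", "oa_status": ["Gold"], "citations": 120},
    {"id": "b", "oa_status": ["Green", "Bronze"], "citations": 40},
    {"id": "c", "oa_status": [], "citations": 75},
    {"id": "d", "oa_status": [], "citations": 5},
    {"id": "e", "oa_status": [], "citations": 300},
]
groups = split_oa_ta(records)
print(median_citations(groups["OA"]), median_citations(groups["TA"]))  # 80.0 75
```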

Counting OA and TA papers together, 56% of the papers in Nature belong to biomedicine, 19% to physics, and 25% to others, while in Science 43% belong to biomedicine, 24% to physics, and 33% to others. That is, publications in Nature and Science are skewed toward biomedicine.
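These shares can be checked directly against the paper counts in Table 1. A short Python check, in which the `TABLE_1` dictionary simply transcribes the table:

```python
from typing import Dict, Tuple

# Paper counts from Table 1 as (OA, TA) pairs per discipline.
TABLE_1 = {
    "Nature": {"biomedicine": (4214, 2227), "physics": (444, 1690), "others": (2419, 510)},
    "Science": {"biomedicine": (2943, 1572), "physics": (1006, 1487), "others": (674, 2770)},
}


def discipline_shares(by_field: Dict[str, Tuple[int, int]]) -> Tuple[int, Dict[str, int]]:
    """Total paper count and each discipline's share of it, in whole percent."""
    total = sum(oa + ta for oa, ta in by_field.values())
    shares = {field: round(100 * (oa + ta) / total) for field, (oa, ta) in by_field.items()}
    return total, shares


for journal, by_field in TABLE_1.items():
    print(journal, *discipline_shares(by_field))
# Nature 11504 {'biomedicine': 56, 'physics': 19, 'others': 25}
# Science 10452 {'biomedicine': 43, 'physics': 24, 'others': 33}
```

The totals (11,504 and 10,452) equal the sums of the OA and TA "Total papers" entries in Table 1.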

Furthermore, we used classical LDA to cluster all papers (representing average scientific discoveries) quantitatively into 100 topics, and compared them qualitatively with the qualitative dataset of 2011–2019 Nature remarkable papers and 2010–2019 Science breakthroughs (representing important scientific discoveries). For comparative analysis, we selected three topics that appear in both Nature and Science: CRISPR/Cas9, quantum computing & quantum entanglement, and astrophysics. CRISPR/Cas9 and quantum computing & entanglement are hot topics in biology and physics, respectively, and astronomy is a frontier field of physics. CRISPR/Cas9 was selected as one of the Science breakthroughs in 2013 (Genetic Microsurgery for the Masses, 2013), 2015 (Travis, 2015), and 2017 (Stokstad et al., 2017) and was among the Nature remarkable papers in 2019 (Robots, hominins, and superconductors: 10 remarkable papers from 2019, 2019). Quantum computing was a hot topic among the 2019 Science breakthroughs (Pennisi et al., 2019) and the 2016 Nature remarkable papers (2016 Editors’ choice, 2016), and quantum entanglement was a 2015 Science breakthrough (Runners-up, 2015) and a 2015 Nature remarkable paper (2015 Editors’ choice, 2015). There are 5 Nature remarkable papers and 9 Science breakthroughs in astronomy, mainly involving celestial mechanics, astrometry, and astrophysics.

Results and discussion

We present the results in two parts: quantitative results and a qualitative discussion.

Quantitative results

Table 2 summarizes the representative results for the two designed indicators, hot-degree and R-index, for biomedical, physical, and other papers, together with the p-values of the OA-TA t-tests, where p < 0.05 indicates a significant difference.

Table 2. Representative results of hot-degree and R-index (median values) for publications in Nature and Science (2010–2019).

| Category | Journal | OA hot-degree | TA hot-degree | p-value | R-index |
|---|---|---|---|---|---|
| Biomedical | Nature | 183.5 | 119 | 3.7e-38 | 1.57 |
| Biomedical | Science | 103 | 15.5 | 6.6e-61 | 7.3 |
| Physical | Nature | 5 | 26 | 1.6e-23 | 0.20 |
| Physical | Science | 11 | 19 | 6.9e-11 | 0.63 |
| Other | Nature | 2 | 2 | 0.0006 | 0.39 |
| Other | Science | 5 | 6 | 0.049 | 0.83 |
| Total | Nature | 188 | 123.5 | 2.5e-26 | 1.55 |
| Total | Science | 160 | 128 | 2e-20 | 1.27 |

Except for the Science Other category (p = 0.049), the t-tests show that the hot-degrees of OA and TA differ significantly at the p < 0.01 level (highly significant). The hot-degrees of OA and TA also differ by discipline: in biomedicine the hot-degree of OA is greater than that of TA, whereas in physics it is smaller. This conclusion holds for both the Nature and Science datasets. For the Other category, the OA-TA difference is significant in the Nature dataset but not at the 0.01 level in the Science dataset, and the R-index is below 1 in both datasets, indicating that the hot-degree of OA tends to be lower than that of TA. The combined (Total) data show a significant difference between the hot-degrees of OA and TA, with OA greater than TA.
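The text reports t-test p-values without specifying the test variant or the samples compared. Assuming the samples are the per-topic hot-degrees of OA and TA and that Welch's unequal-variance t-test is used, a standard-library sketch looks like this. It approximates the p-value with the normal distribution, which is adequate only for large degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the exact t-based p-value.

```python
import math
import statistics
from typing import List, Tuple


def welch_t_test(a: List[float], b: List[float]) -> Tuple[float, float, float]:
    """Welch's t statistic, Welch-Satterthwaite df, and an approximate two-sided p."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution: two-sided P(|Z| > |t|).
    # Reasonable for large df (e.g. 100 topics per group), not for tiny samples.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p


# Hypothetical per-topic hot-degrees (tiny, for illustration only).
t, df, p = welch_t_test([2, 4, 6, 8], [1, 2, 3, 4])
print(round(t, 4), round(df, 2))  # 1.7321 4.41
```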

These results point to clear disciplinary differences. Figure 2 shows the annual distributions of total and median citations of biomedical and physical papers in Nature and Science.

Figure 2. Total and median citations in biomedical and physical fields: (a) Biomedical; (b) Physical.

Since OA > TA indicates an open advantage and TA > OA a toll advantage, we cannot say that OA is advantageous over TA in every field and under every condition. According to Figure 2, median citations show OA advantages in both biomedicine and physics. In terms of total citations, however, the disciplinary difference between biomedicine and physics is more evident: biomedical OA papers receive more total citations than TA papers, whereas in physics OA papers are at a disadvantage compared with TA papers.

After using LDA to extract 100 topics from all Nature and Science papers over the 10 years, we obtained the hot-degree distributions of biomedical and physical topics for OA and TA papers, shown in Figure 3.

Figure 3. Hot-degree distribution of different topics in biomedical and physical fields: (a) Biomedical; (b) Physical.

Figure 3 shows that hot-degree distributions differ across topics. In biomedicine, the OA hot-degree distribution is significantly higher than that of TA, whereas physics shows the opposite: there, the OA hot-degree distribution is significantly lower than that of TA.

Figure 4 shows the distribution of the R-index over the LDA topics. The pie chart in each plot depicts the proportions of topics with R > 1, R = 1, and R < 1 in each dataset.

Figure 4. R-index distribution on LDA topics.

Figure 4 shows that OA advantages do exist in biomedicine, whereas TA advantages account for most topics in physics.

Therefore, whether a paper in Science or Nature is published OA or TA does not necessarily affect the dissemination of average scientific discoveries in all fields. The analysis also indicates that OA may benefit and promote hot topics in biomedicine. We therefore selected typical qualitative evidence for further discussion.

Qualitative discussion

Quantum computing & quantum entanglement and CRISPR/Cas9 are two hot sub-fields of physics and biology, respectively, and can therefore be compared. The former has a total of 3 papers in the qualitative dataset, so we selected the three most important CRISPR/Cas9 papers. Jennifer Doudna and Emmanuelle Charpentier first demonstrated in 2012 that the CRISPR/Cas9 system could be used to edit genes accurately (Jinek et al., 2012). Zhang Feng's team demonstrated the ease of programming and the wide applicability of RNA-guided nuclease technology in human and mouse cells (Cong et al., 2013). George Church engineered the type II bacterial CRISPR system to function with custom guide RNA (gRNA) in human cells (Mali et al., 2013). These are pioneering papers on CRISPR/Cas9. Since there are 17 astronomy papers (7 OA and 10 TA papers, accessible at https://www.nature.com and https://www.sciencemag.org) covering several branches, they are used separately to compare OA and TA further.

Among these representative papers, we found that CRISPR/Cas9 papers were more likely to take the OA route, quantum computing papers the TA route, and astronomy papers both routes. All of these papers were published in Nature or Science in 2010–2019: the 3 CRISPR/Cas9 papers in Science, the 3 quantum papers in Nature, and the astronomy papers in Nature (9) and Science (8). Representative results are shown in Figure 5.

Figure 5. Citation curves of qualified papers: (a) CRISPR/Cas9 and quantum computing & quantum entanglement; (b) Astronomy.

In Figure 5, panel (a) shows the citation trends of the CRISPR/Cas9 and quantum computing & quantum entanglement papers, and panel (b) shows the median citation trends of the OA and TA astronomy papers. In Figure 5 (a), the three curves with higher citations are CRISPR/Cas9 papers, and the three with lower citations are quantum computing & quantum entanglement papers. CRISPR/Cas9 papers prefer the OA route and the quantum papers the TA route, which is consistent with the above quantitative analysis of the disciplinary differences between biomedicine and physics. Zooming in on quantum computing & quantum entanglement, we further found both OA and TA papers: although the one OA paper (the green curve) was published only in 2019, it obtained more citations than the other two TA papers. In Figure 5 (b), the OA median citation curves always stay above the TA median curve, which might indicate an OA advantage in astronomy. By contrast, Table 1 shows that in both Nature and Science the median citations of physics TA papers are higher than those of OA papers. Taken together, these results indicate that OA can promote the spread of important scientific discoveries more than TA.

These findings reveal that important scientific discoveries may be disseminated through either the open or the closed route, but the route never decides whether they become breakthroughs. OA or TA may affect the citations a paper receives, producing either OA or TA advantages; that OA papers may receive high citations while TA papers may not merely suggests that OA promotes the spread of scientific discoveries. Real scientific discoveries, however, do not become known because of the channel through which the papers reporting them are published.

More and more academic journals have become hybrid journals, in which authors can choose the OA or TA publishing route. We therefore used the OA and TA records in the WoS database as the research sample for this article. One limitation is that we ignored other open information sources such as arXiv and bioRxiv, so some citations may be missed. Another limitation is that Nature employs several strong measures to promote access to its subscription-based articles. For example, since December 2, 2014, all research papers from Nature have been free to read in a proprietary screen-view format that can be annotated but not copied, printed, or downloaded (Nature, 2014), so the boundary between OA and TA articles may be blurred.

Conclusion

We conclude that OA or TA does not necessarily affect the academic dissemination of average scientific discoveries, but OA promotes the spread of important scientific discoveries. A real scientific discovery spreads not because it takes the OA or TA route but because of its significance, which is the paramount factor in its dissemination. OA or TA is only the channel of dissemination; what matters is the scientific finding in the published content. Scientific discovery itself outweighs its channel of dissemination, and OA advantages do not exist universally. We hope that this research may stimulate further explorations.

eISSN:
2543-683X
Language:
English
Publication timeframe:
4 times per year
Journal subjects:
Computer Sciences, Information Technology, Project Management, Databases and Data Mining