Open Science (OS) is one of the most actively discussed topics in the scientific world. According to UNESCO, OS allows scientific information, data, and outputs to be more widely accessible (Open Access, OA) and more reliably harnessed (Open Data) with the active engagement of all stakeholders (Open to Society). The open movement began with open source in computer software; since then, OA, together with Open Data (OD) and Open Peer Review (OPR), has pushed OS forward and benefited from it, and OA has attracted the most attention.
Open access means that accessing, downloading, and reading scientific literature is free to the entire population of Internet users (Craig et al., 2007).
There are two routes to open access: open-access journals and e-print repositories (i.e., pre-prints or post-prints) (Antelman, 2004). OA supports the tracking and prediction of growth areas (Small, 2006), offers advantages in scholarly communication (Wang et al., 2015), and provides free online availability that substantially increases the impact of research (Lawrence, 2001). Through open practices, researchers can gain an OA advantage, including more citations, media attention, potential collaborators, job opportunities, and funding opportunities (McKiernan et al., 2016).
Researchers have conducted many studies on the OA advantage. Some studies have shown that authors who made their works openly accessible received more downloads (Davis, 2011) or citations (Antelman, 2004; Eysenbach, 2006; Lawrence, 2001; McKiernan et al., 2016; Norris, Oppenheim, & Rowland, 2008; Wang et al., 2015) than authors whose articles remained behind a subscription wall. A comparison of OA and TA (toll access) across 4,633 articles in ecology, applied mathematics, sociology, and economics found that the 2,280 (49%) OA articles achieved a mean citation count of 9.04, whereas the mean for TA articles was 5.76 (Norris et al., 2008). Other researchers, however, argue that there is no OA citation advantage. The observed citation differences between OA and TA articles have been attributed to three postulates: an open access postulate, a selection bias postulate, and an early view postulate (Craig et al., 2007). Davis (2011) states that OA articles receive more downloads from a broader group of readers but receive no more citations than TA articles. A study of working papers in economics shows that the impact of working papers is relatively low and provides no evidence of an OA advantage (Frandsen, 2009). OA advantages might be topic-dependent; that is, authors tend to select citation-attractive topics, especially for OA outlets that are more likely to attract citations (Sotudeh, 2019). However, we do not really know whether OA benefits hot topics or TA cools cold topics.
Therefore, following the studies mentioned above (Craig et al., 2007; Eysenbach, 2006; Lawrence, 2001; Norris et al., 2008; Small, 2006; Sotudeh, 2019; Wang et al., 2015), we designed two indicators, hot-degree and R-index, to examine OA-TA relations quantitatively. Unexpectedly, our results show that OA or TA does not affect the normal spread of real scientific discoveries, although the way scientific discoveries are disseminated varies with OA or TA.
Indicators are designed and the clustering methods are applied as follows.
For measuring hot topics (Banks, 2006; Ye, 2013), we can follow the definition of the original h-index to define the hot-degree of a topic (including subjects, keywords, and so on) in the sciences, as follows.
Simply put, a topic has hot-degree H if H is the largest number such that H publications on the topic each have at least H citations. In WoS, H is easy to obtain because it is just the h-index of the topic.
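As a minimal sketch of this definition, the hot-degree can be computed from a list of per-paper citation counts for a topic. The function name `hot_degree` and the citation counts below are hypothetical and serve only as an illustration; they are not taken from the WoS data used in this study.

```python
def hot_degree(citations):
    """Return the largest H such that H papers have at least H citations each.

    `citations` is a list of citation counts, one per paper on the topic.
    """
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for six papers on one topic (illustrative only).
topic_citations = [25, 8, 5, 3, 3, 1]
print(hot_degree(topic_citations))
```

Sorting once and scanning by rank mirrors how the h-index is usually read off a ranked citation list, so the same routine could be applied separately to the OA and TA papers of a topic.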
For comparing the OA and TA, we can introduce the ratio of hot-degree as a concise indicator
To clarify the topic-related properties of published papers, we used classical Latent Dirichlet Allocation (LDA; cf. Blei, Ng, & Jordan, 2003) to cluster the topics of Nature and Science. We divided the Nature dataset into three parts according to discipline, namely biomedicine, physics, and others, and performed the same operation on the Science dataset. Taking the Nature biomedicine dataset as an example, we show the data processing flowchart in Figure 1.
The algorithm includes the following steps.
Divide the Nature biomedicine dataset into two parts, OA and TA.
Based on the title and abstract fields, we used the LDA model to extract 100 topics of the OA and TA datasets, which can be represented by
Each paper in OA or TA datasets consists of a set of topics with a probability greater than 0.01. For example, one paper in OA datasets can be denoted by
So, for each topic
For topics in
We suppose that during the period of 2010–2019, the research topics of OA and TA are similar, so the topics can be matched. In this paper, we used the Pearson correlation coefficient to match the topics of OA and TA. So the OA-TA topic pair can be represented by
The OA topic
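The filtering and matching steps above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify which vectors are correlated, so the sketch assumes each topic is a vector of word weights over a shared vocabulary, and it pairs every OA topic with the TA topic of highest Pearson correlation. The names `pearson`, `paper_topics`, and `match_topics` and all numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector; treat as 0
    return cov / sqrt(var_x * var_y)


def paper_topics(doc_topic_probs, threshold=0.01):
    """Indices of the topics whose probability for one paper exceeds the threshold."""
    return {t for t, p in enumerate(doc_topic_probs) if p > threshold}


def match_topics(oa_topics, ta_topics):
    """Pair each OA topic with the TA topic of highest Pearson correlation.

    Assumption: each topic is a word-weight vector over the same vocabulary.
    Returns a list of (oa_index, ta_index, correlation) tuples.
    """
    pairs = []
    for i, oa_vec in enumerate(oa_topics):
        scores = [pearson(oa_vec, ta_vec) for ta_vec in ta_topics]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


# Hypothetical topic-word weights over a four-word vocabulary (illustrative only).
oa_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
ta_topics = [[0.1, 0.1, 0.3, 0.5], [0.6, 0.3, 0.1, 0.0]]

for oa_idx, ta_idx, r in match_topics(oa_topics, ta_topics):
    print(f"OA topic {oa_idx} <-> TA topic {ta_idx} (r = {r:.2f})")
```

This greedy pairing does not force a one-to-one match, so two OA topics could select the same TA topic; a stricter assignment would need an additional constraint.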
We collected data from the Web of Science (WoS) by searching and downloading both OA and TA articles, letters, and reviews published in
As more and more academic journals offer an open access option, the WoS platform also reports the OA status of each paper, which can be one of the following: legal Gold, Bronze (free content on a publisher's website), and Green (e.g., author self-archived in a repository) OA versions. In our research, papers with any of these open access statuses are classified into the OA dataset; all others are classified into the TA dataset. Since WoS does not provide the disciplinary classification for
Research dataset of
When summing both OA and TA items together, we found that 56% of the papers belong to biomedicine, 19% physics, and 25% others in
Furthermore, we used classical LDA to quantitatively cluster all papers (which represent average scientific discoveries) into 100 topics and compared them qualitatively with the qualitative dataset, as well as with the 2011–2019 Nature remarkable papers and 2010–2019 Science breakthroughs (which represent important scientific discoveries). For comparative analysis, we selected three topics to manifest in both
We present the results in two parts: quantitative results and qualitative discussion.
Using the designed indicators, hot-degree and R-index, we summarize the representative results in Table 2, including biomedical, physical, and other records as well as
Representative results of hot-degree and R-index (Median values) for publications in
Obviously, there exist disciplinary differences. Annual disciplinary distributions of biomedicine and physics suggest distribution curves of total and median citations in
As OA > TA means open advantages and TA > OA indicates toll advantages, we cannot say that OA is always advantageous over TA in every field and under every condition. According to Figure 2, median citations show OA advantages in both biomedicine and physics. In terms of total citations, the disciplinary difference between biomedicine and physics is somewhat more visible. The total citation count of biomedical OA papers is greater than that of TA papers, but in physics the total citation count of TA papers is at a disadvantage compared with that of OA ones.
We clustered all Nature and Science papers published over the 10 years with LDA and extracted 100 different topics. Figure 3 shows the resulting distributions of hot-degree for biomedical and physical topics under OA and TA, respectively.
Figure 3 shows that different topics have different hot-degree distributions. The OA hot-degree distribution in biomedicine is significantly greater than that of TA, while physics shows the opposite: the OA hot-degree distribution is significantly lower than that of TA.
Meanwhile, we summarize the R-index distribution over LDA topics in Figure 4. The pie chart in each distribution plot depicts the topic proportions of
In Figure 4, OA advantages clearly exist in biomedicine, whereas TA advantages account for the larger share in physics.
Therefore, we see that the way a paper is published in
Quantum computing & quantum entanglement and CRISPR/Cas9 are two hot sub-fields of physics and biology, respectively, which can be used for comparison. The former has a total of three papers in the qualitative dataset, so we selected the three most important CRISPR/Cas9 papers to match. Jennifer Doudna and Emmanuelle Charpentier first demonstrated in 2012 that the CRISPR/Cas9 system could be used to edit genes accurately (Jinek et al., 2012). Zhang Feng's team demonstrated the ease of programming and wide applicability of RNA-guided nuclease technology in human and mouse cells (Cong et al., 2013). George Church engineered the type II bacterial CRISPR system to function with custom guide RNA (gRNA) in human cells (Mali et al., 2013). These are pioneering papers on CRISPR/Cas9. Since there are 17 papers (7 OA and 10 TA papers, access at
From the representative qualitative papers, we found that CRISPR/Cas9 papers were more likely to take the OA route and quantum computing papers the TA route, while astronomy papers took both OA and TA routes. These papers were all published in 2010–2019
In Figure 5, the upper part (a) shows the citation trends of the CRISPR/Cas9 and quantum computing & quantum entanglement papers, and the lower part (b) shows the median citation trends of the astronomy papers in OA and TA. In Figure 5 (a), the three curves with higher citations are the CRISPR/Cas9 papers, and the three curves with lower citations are the quantum computing & quantum entanglement papers. CRISPR/Cas9 papers prefer the OA route and quantum papers the TA route, which is consistent with the quantitative analysis above on the disciplinary differences between biomedicine and physics. Zooming in on the topic of quantum computing & quantum entanglement, we further found that there were both OA and TA papers. Although one OA paper (the green curve) was published in 2019, it obtained more citations than the other two TA papers. In Figure 5 (b), the OA median citation curve always stays above the TA median curve, which might indicate an OA advantage in astronomy. However, whether the publication is Nature or Science, the median citation of Physics TA is higher than that of OA in Table 1. This result indicates that OA can promote the spread of important scientific discoveries more than TA.
These findings reveal that the dissemination of important scientific discoveries may follow either the open or the closed (toll) route, but scientific breakthroughs are never decided by that choice. OA or TA may affect the citations to papers, leading to either OA or TA advantages; OA papers may receive high citations while TA papers may not, which merely suggests that OA promotes the spread of scientific discoveries. However, real scientific discoveries do not become known because of the channel through which the papers about them are published.
More and more academic journals have become hybrid journals, in which authors can choose the OA or TA publishing route. Therefore, the OA and TA records in the WoS database served as the research sample in this article. As we ignored the effect of other open information sources such as arXiv and bioRxiv, we may have missed some citations, which is a limitation. Another flaw concerns that
We conclude that OA or TA does not necessarily affect the academic dissemination of average scientific discoveries, but OA promotes the spread of important scientific discoveries. A real scientific discovery spreads not because it takes the OA or TA route but because of its significance, which is the paramount factor in its dissemination. OA or TA is only the means of dissemination; the key is the scientific finding in the published content. A scientific discovery itself matters more than the way it is disseminated, and OA advantages do not hold universally. We hope that this research may stimulate further explorations.
Research dataset of Nature and Science.
Representative results of hot-degree and R-index (Median values) for publications in Nature and Science (2010–2019).