Open Access

A comparative study on characteristics of retracted publications across different open access levels



Introduction

Scientific publications are regarded as the cornerstone reflecting the development of the scientific community (Shah et al., 2021). Research integrity is important because it underpins the trust that characterizes science and its relationship with society (Olson & Griffiths, 1995). Misconduct and errors in publications undermine academic development and public trust in science; therefore, retracting dubious publications represents the fulfilment of social responsibilities (Vuong et al., 2020). Retraction is a vital means of self-correction in the scientific community and can reduce the negative influence of flawed research.

Open access (OA) aims to promote the transparency of results and widen the diffusion of knowledge (European Commission, 2020). Because OA publications reach a large number of readers, post-publication scrutiny of their content may accelerate the detection of misconduct and errors in flawed publications (Shah et al., 2021). A connection between OA and research integrity has also been identified: openness is increasingly recognized as a driver of responsible research practices (Tijdink et al., 2021), and Nosek et al. (2018) highlighted that openness in research means more than access to research, as it also brings equality to the research process.

However, several challenges in OA need to be addressed. OA journals are often criticized for high article processing charges (Björk & Solomon, 2015) and a lack of transparency in the review process (Butler, 2013; Bohannon, 2013). Moreover, fake science that exploits the OA publishing business model (such as “predatory journals”) is emerging (Shen & Björk, 2015), which dilutes high-quality research (European Commission, 2020). Therefore, despite its advantages, OA also poses potential threats to research integrity.

Previous studies have analyzed the relationship between OA and research integrity. However, there are few empirical studies of OA retracted publications based on a comprehensive literature dataset, especially studies comparing retracted publications across different OA levels. This study aims to compare the characteristics of retracted articles across different OA levels and to determine whether OA level influences those characteristics.

Literature review
Study on characteristics of retraction

The trend of retraction and the reasons for retraction are the aspects most often examined in previous empirical studies of retracted publications. Regarding the trend, studies have shown that the number of retracted publications grew rapidly over the past 20 years (Steen et al., 2013; Vuong et al., 2020; Zhang & Grieneisen, 2013). Specifically, the number of retracted publications began rising in 2006-2010 (Bar-Ilan & Halevi, 2018; He, 2013; Shuai et al., 2017), fluctuated during 2010-2012, and kept climbing rapidly in 2016-2021 (Sharma, 2021). The increase in retractions was due not only to rising misconduct and errors in research but also to researchers and editors becoming more skilled at identifying flawed publications (Fanelli, 2013). In addition, the reasons for retraction expanded over time, leading to more post-publication content scrutiny of articles (Steen et al., 2013).

The reasons for retraction, which are closely related to scientists' misbehavior, have attracted wide attention. Plagiarism and data falsification were the most common reasons for retraction in obstetrics and gynecology (Chambers et al., 2019) and in biomedical research in India (Elango, 2021). For biomedical studies in China, the common reasons for retraction were plagiarism, errors, self-plagiarism, and fake peer review (Chen et al., 2018). Most retractions in Iran were due to fake peer review and plagiarism (Ghorbi et al., 2021). A large-scale study of over 18,000 retracted articles covering 127 research fields found fake peer review to be the most common reason for retraction (Vuong et al., 2020). The reason for retraction can significantly affect an article's retraction time lag. Falsification and errors usually take longer to surface through post-publication scrutiny than plagiarism because they are harder to detect (Dal-Ré, 2019; Trikalinos, 2008). Publications involving falsification were found to take the longest to be retracted among all retraction reasons (Elango, 2021). In this study, we compare the trends and reasons for retraction among articles with different OA levels and try to explain their differences in retraction time lag from the perspective of reasons for retraction.

Study on OA retracted publications

Only a few studies have examined retracted publications from the perspective of OA. Comparing OA and non-OA retracted publications, Peterson (2013) found that OA literature did not differ from non-OA literature in impact factor, detection of errors, or change in post-retraction citation rates; however, its dataset covered only biomedicine, which limits its applicability to other disciplines. Shah et al. (2021) reported that the retraction rate of OA articles was 62% higher than that of non-OA articles, and that non-OA publications were retracted earlier. However, that study did not investigate the reasons for retraction, and it reduced OA status to only two categories, which might miss distinctive types of OA.

Generally, most studies have discussed only the characteristics of OA retracted publications themselves, without comparing them with non-OA retracted publications. Moreover, characteristics closely related to research integrity, such as reasons for retraction and retraction time lag, have rarely been discussed. Besides, studies have mostly focused on specific disciplines, such as biomedicine (Elango, 2021; Freedman & Inglese, 2014; Stojanovski, 2015; Wang et al., 2019) and obstetrics and gynecology (Chambers et al., 2019). Few studies have examined the characteristics of different OA levels using a comprehensive dataset covering all disciplines. This study examines the characteristics of retracted publications at different OA levels in greater depth from multiple perspectives, including trends of retraction, reasons for retraction, and retraction time lag.

Research questions

Attention is a key predictor of retraction (Furman et al., 2012), and OA can increase the attention an article receives (Vadhera et al., 2022; Wang et al., 2015). Because they are freely accessible online, OA articles may reach more readers more quickly, and post-publication scrutiny by more readers can lead to earlier detection of fraudulent publications (Foo, 2011; Shah et al., 2021). This study further explores the characteristics of retracted publications across different OA levels to find out whether OA level makes a difference in the retraction of scientific publications. Specifically, our research questions are as follows:

What are the characteristics of retracted publications across different OA levels from the perspectives of trends and reasons for retraction?

Do scientific publications with higher OA levels have higher retraction rates?

Do scientific publications with higher OA levels get retracted faster?

Data and methodology
Data collection

This study collected information on retracted publications from two comprehensive databases: Web of Science (WoS) and Retraction Watch. The Science Citation Index Expanded (SCIE) and Social Sciences Citation Index (SSCI) of the WoS Core Collection were selected to mitigate the influence of low-quality OA journals (such as “predatory journals”). The Retraction Watch database assembles information on retracted publications identified from several databases and contains abstracts dating back to the 1970s. It is currently the largest and most visible database of retracted publications (Brainard & You, 2018; Oransky, 2018; Retraction Watch, 2018). By collecting and cross-checking publications from these two databases, we aimed to ensure both the richness of the data and the credibility of the results.

This study adopted the same search strategy as our previous study (Zhang et al., 2020). The strategy consists of three main steps: (i) searching for retracted publications and retraction notices in the SCIE and SSCI databases of WoS between 2001 and 2020; (ii) searching for the corresponding reasons for retraction in the Retraction Watch database; and (iii) combining the records of retracted publications, retraction notices, and reasons for retraction by matching their titles. We extracted the publication year, authors, title, journal, total and annual citations, OA level, retraction year, full text of the retraction notice, detailed and classified reasons for retraction, and other retraction-related information. In total, 6,005 retracted publications were obtained for further analysis.
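Step (iii) states only that records were combined by matching titles. The sketch below shows one way such a join could work, assuming exact matching after normalization; the record fields (`title`, `reasons`) and the stripping of a leading "RETRACTED:" marker are hypothetical choices for illustration, not the authors' pipeline. In practice a join like this may also need DOIs or fuzzy matching to handle near-identical titles.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Normalize a title so that trivially different spellings compare equal."""
    text = unicodedata.normalize("NFKC", title).casefold().strip()
    # Hypothetical: drop a leading "retracted:" / "retracted article:" marker.
    text = re.sub(r"^retracted(?: article)?\s*:\s*", "", text)
    # Collapse every run of punctuation, underscores, or whitespace into one space.
    text = re.sub(r"[\W_]+", " ", text)
    return text.strip()


def match_by_title(wos_records: list[dict], rw_records: list[dict]) -> list[dict]:
    """Attach Retraction Watch reasons to WoS records whose normalized titles match."""
    reasons_by_title: dict[str, list[str]] = {}
    for rec in rw_records:
        # Keep the first Retraction Watch entry seen for each normalized title.
        reasons_by_title.setdefault(normalize_title(rec["title"]), rec["reasons"])
    matched = []
    for rec in wos_records:
        key = normalize_title(rec["title"])
        if key in reasons_by_title:
            matched.append({**rec, "reasons": reasons_by_title[key]})
    return matched


if __name__ == "__main__":
    wos = [{"title": "RETRACTED: A Study of X-Ray Effects", "oa": "Gold"},
           {"title": "Unrelated paper", "oa": None}]
    rw = [{"title": "A study of x-ray effects.", "reasons": ["Plagiarism of Text"]}]
    print(match_by_title(wos, rw))
```

Exact matching on a normalized key keeps the join transparent and easy to audit; unmatched records can then be reviewed by hand.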

Classification of OA level

In this study, we selected two types of OA, Gold OA and Green OA, and used non-OA publications as the reference group for these two categories. The primary difference between Gold OA and Green OA is explained in Chan et al. (2002), which recommends two complementary strategies for achieving open access to scholarly journal literature. The first is self-archiving, in which scholars deposit their articles in open electronic archives; this is now known as Green OA. The second is open-access journals, in which scholars publish their articles in fully open access journals; this is today referred to as Gold OA.

(1) High OA level – Gold OA: a freely accessible final version of an article, including articles published in journals listed in the Directory of Open Access Journals (DOAJ) and articles that ImpactStory’s Unpaywall Database identifies as having a Creative Commons license but that are not in DOAJ-listed journals (WoS, 2021). In this study, we assigned “Gold” and “Gold Hybrid” publications to the Gold OA group. If a publication was tagged as both Gold OA and Green OA in WoS, we classified it as Gold OA, because Gold OA is the higher OA level.

(2) Low OA level – Green OA: a freely accessible version of an article deposited in an institutional or discipline-based OA repository. We assigned “Green accepted” and “Green published” publications to the Green OA group.

(3) non-OA: only subscribers can access the full text of the article. We assigned publications without the “Open Access” tag in WoS to the non-OA group.
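The precedence rule in (1), under which a publication tagged as both Gold and Green counts as Gold OA, can be written as a small function. The tag strings are those quoted above; treating every publication that carries neither a Gold nor a Green tag as non-OA is an assumption of this sketch, since the text only describes publications without an “Open Access” tag.

```python
from collections import Counter
from collections.abc import Iterable

GOLD_TAGS = frozenset({"Gold", "Gold Hybrid"})
GREEN_TAGS = frozenset({"Green accepted", "Green published"})


def classify_oa_level(tags: Iterable[str]) -> str:
    """Assign one OA level per publication; Gold OA takes precedence over Green OA."""
    tag_set = set(tags)
    if tag_set & GOLD_TAGS:
        return "Gold OA"
    if tag_set & GREEN_TAGS:
        return "Green OA"
    # Assumption: anything without a Gold or Green tag (including no tag) is non-OA.
    return "non-OA"


if __name__ == "__main__":
    # Hypothetical tag sets for three publications.
    records = [{"Gold", "Green published"}, {"Green accepted"}, set()]
    print(Counter(classify_oa_level(tags) for tags in records))
```

Checking Gold before Green is what encodes the "highest OA level wins" rule; the same ordering is the source of the limitation noted at the end of the paper, since a publication's Green availability is hidden once it is labeled Gold.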

Among all retracted publications, 1,454 (24%) were Gold OA, 329 (6%) were Green OA, and 4,222 (70%) were non-OA, the lowest OA level.

Reasons for retraction

The reasons for retraction classified by the Retraction Watch database can be grouped into eight broad categories. The classification is the same as in our previous study (Zhang et al., 2020) and is shown in Table 1.

Table 1. Category of reasons for retraction.

| Major reason | Original reasons identified by Retraction Watch database |
|---|---|
| Error and concern | Error in image/data/text/results and/or conclusions/methods/materials (general)/cell lines/tissues/analyses; Concerns/issues about image/data/results/referencing/attributions; Contamination of reagents/materials (general)/cell lines/tissues; Unreliable data/image/results; Results not reproducible |
| Plagiarism | Plagiarism of text/image/data/article; Euphemisms for plagiarism |
| Self-plagiarism | Self-plagiarism of text/image/data/article; Euphemisms for self-plagiarism; Salami slicing |
| Falsification and manipulation | Falsification/fabrication of results/image/data; Manipulation of results/images; Hoax publication; Paper mill; Fake peer review; Sabotage of materials/methods |
| Authorship issues | Forged authorship; Concerns/issues about authorship |
| Ethical issues | Legal reasons/legal threats; Civil/criminal proceedings; Ethical violations; Lack of ethical approval; Informed/patient consent-none/retracted; Infringement of patient privacy; Lack of balance/bias issues; Conflict of interest; Copyright claims |
| Others | Other reasons for retraction |
| Not available | Lack of information |
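The grouping in Table 1 can be approximated by an ordered keyword lookup. This is a sketch under stated assumptions: the keywords paraphrase Table 1 rather than Retraction Watch's exact label strings, and stripping a leading "+" from labels is a hypothetical cleaning step. Rule order matters, because "self-plagiarism" contains "plagiarism" and "Concerns/issues about authorship" contains "concern".

```python
# Ordered (category, keywords) rules; the first matching rule wins.
CATEGORY_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("Self-plagiarism", ("self-plagiarism", "salami slicing")),
    ("Plagiarism", ("plagiarism",)),
    ("Falsification and manipulation", ("falsification", "fabrication", "manipulation",
                                        "hoax", "paper mill", "fake peer review", "sabotage")),
    ("Authorship issues", ("authorship",)),
    ("Ethical issues", ("legal", "civil", "criminal", "ethical", "consent",
                        "patient privacy", "bias", "conflict of interest", "copyright")),
    ("Error and concern", ("error", "concern", "contamination", "unreliable", "not reproducible")),
    ("Not available", ("lack of information",)),
]


def major_category(reason: str) -> str:
    """Map one detailed retraction reason to a major category from Table 1."""
    text = reason.lstrip("+").strip().lower()  # hypothetical cleaning of the label
    for category, keywords in CATEGORY_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "Others"


if __name__ == "__main__":
    # A retracted article may carry several detailed reasons.
    reasons = ["+Plagiarism of Text", "+Concerns/Issues About Authorship"]
    print({major_category(r) for r in reasons})
```

Because an article can list several detailed reasons, the sketch maps one label at a time and leaves the choice of how to count multi-reason articles to the analysis.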
Indicators
Retraction rate

The retraction rate of an OA type is the ratio of the number of retracted publications to the total number of publications of that type, i.e., $\text{Retraction rate} = n_{\text{retraction}} / n_{\text{publication}}$. We used the retraction rate, together with the number of retracted publications, to describe the trends of and reasons for retraction across different OA levels. The retraction rate depends on the proportion of potentially flawed articles. To examine this effect, we investigated the proportion of retracted articles across the Journal Impact Factor quartiles in WoS (see Figure A1 in Appendix 1). No OA type exhibits a disproportionately high percentage of journals in the low quartiles, which suggests that the comparison of retraction rates is unlikely to be significantly influenced by differences in the proportion of potentially flawed articles across OA types.
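The retraction-rate formula is simple, but the unit matters: the Results report the overall rate in per mille (‰). A minimal sketch follows; the function names are illustrative, and the counts in the usage block are hypothetical, since the paper does not report the publication denominators.

```python
def retraction_rate(n_retraction: int, n_publication: int) -> float:
    """Retraction rate of one OA type: n_retraction / n_publication."""
    if n_publication <= 0:
        raise ValueError("n_publication must be positive")
    if not 0 <= n_retraction <= n_publication:
        raise ValueError("n_retraction must lie between 0 and n_publication")
    return n_retraction / n_publication


def to_per_mille(rate: float) -> float:
    """Convert a proportion to per mille (‰)."""
    return rate * 1000


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = {"Group A": (30, 10_000), "Group B": (12, 10_000)}
    for group, (n_ret, n_pub) in counts.items():
        print(group, round(to_per_mille(retraction_rate(n_ret, n_pub)), 2), "‰")
```

Note that the rate is computed per OA type, so the overall rate is a publication-weighted average of the group rates, which is why it can fall between the Green OA and non-OA values.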

Retraction time lag

Retraction speed is another vital indicator and is usually measured by retraction time lag: the interval between an article's publication and its retraction, which characterizes how long it takes the scientific community to detect and retract flawed publications.

We used the publication year of the article as the publication time and the publication year of the retraction notice as the retraction time; the difference between them is the retraction time lag. Based on the retraction time lag, we plotted survival curves of retracted publications to reveal the retraction speed of each OA level. We further examined the retraction time lag of each OA level for each reason for retraction to find out how and why retraction time lag varies across OA levels.
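The time-lag and survival computations described above can be sketched as follows. Two assumptions are made here and are not stated in the paper: a publication counts as still "alive" after t years when its lag is greater than t, and years are integers. Because the dataset contains only publications that were eventually retracted, every item "dies" at some point, so this is a plain empirical share rather than a Kaplan-Meier estimate with censored observations.

```python
from collections.abc import Sequence


def retraction_lags(pub_years: Sequence[int], retraction_years: Sequence[int]) -> list[int]:
    """Retraction time lag (years) = year of the retraction notice minus publication year."""
    if len(pub_years) != len(retraction_years):
        raise ValueError("year lists must have the same length")
    return [r - p for p, r in zip(pub_years, retraction_years)]


def survival_curve(lags: Sequence[int], max_year: int) -> list[float]:
    """Share of retracted publications still 'alive' (lag > t) for t = 0..max_year."""
    if not lags:
        raise ValueError("need at least one retracted publication")
    n = len(lags)
    return [sum(1 for lag in lags if lag > t) / n for t in range(max_year + 1)]


if __name__ == "__main__":
    # Hypothetical publication and retraction years for five articles.
    lags = retraction_lags([2010, 2010, 2011, 2012, 2015], [2010, 2011, 2012, 2015, 2020])
    print(lags)                     # [0, 1, 1, 3, 5]
    print(survival_curve(lags, 5))  # [0.8, 0.4, 0.4, 0.2, 0.2, 0.0]
```

Reading the curve at t = 3 gives the kind of "still alive three years after publication" share used in the Results.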

Results
Retraction rate of OA level
Trends of retraction

Figure 1 shows the trends in the number and retraction rate of retracted publications across different OA levels. The number of OA journals increased rapidly from 80 to 4,672 between 2001 and 2020. Similarly, the number of OA retracted publications increased from 2001 to 2014 (Figure 1a). Non-OA had the largest number of retracted publications among the OA levels. Gold OA ranked second; its number surged from 2009 and peaked in 2014. Green OA had the fewest retracted publications, with relatively mild fluctuation over the investigated period.

Figure 1.

Trends of number of retracted publications (a) and retraction rate of different OA levels (b). The horizontal axis is the publication year.

Figure 1b shows that the gap in retraction rate among OA levels was small before 2010. After 2010, the retraction rate of Gold OA was significantly higher than those of Green OA and non-OA, but it declined substantially from 2015, approaching the levels of Green OA and non-OA by 2020. The retraction rates of Green OA and non-OA stayed low over the past 20 years, and their curves followed similar trends. The turning point after 2014 can primarily be attributed to retraction time lag: publishers and journals need time to examine, investigate, and respond to inquiries about a flawed publication before reaching a final decision, so many recently published flawed articles may not yet have been retracted. We calculated the average retraction rate of each OA level (Figure 2). The overall retraction rate is 1.90‰, which is lower than those of Gold OA and Green OA but higher than that of non-OA. The retraction rate of Gold OA was higher than that of Green OA, and that of Green OA was higher than that of non-OA, consistent with the previous finding that the retraction rate of OA articles is higher than that of non-OA articles (Shah et al., 2021).

Figure 2.

Average retraction rate of different OA levels. Note: *p < 0.05; **p < 0.01; ***p < 0.001; the same applies below.
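The note to Figure 2 reports significance levels, but the text does not say which test produced them. A two-proportion z-test is one common way to compare two retraction rates; the sketch below is an assumption for illustration, not necessarily the authors' test. With publication counts in the hundreds of thousands, even small differences in rate can reach these significance thresholds, so effect sizes deserve attention alongside p-values.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p1 == p2 using the pooled standard error.

    x1, x2 are retraction counts; n1, n2 are publication counts.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 60 of 1,000 versus 40 of 1,000 publications retracted.
    z, p = two_proportion_z_test(60, 1000, 40, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```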

Generally, the gap between the numbers of OA and non-OA retracted publications narrowed year by year (Figure 1a). The number of Gold OA retracted publications increased quickly after 2010, which suggests that retraction was particularly active among articles of high OA level.

Reasons for retraction

Ranked in descending order of frequency, the six main categories of reasons for retraction were error and concern, self-plagiarism, falsification and manipulation, plagiarism, ethical issues, and authorship issues. Table 2 shows the proportion of each reason for retraction by OA type. The proportions for Gold OA and non-OA were similar, whereas those for Green OA differed markedly from the other two. In Green OA, the proportions of falsification and manipulation and of error and concern were much higher than in Gold OA and non-OA, while the shares of plagiarism and authorship issues were much lower.

Table 2. The proportion and retraction time lag of different reasons for retraction.

| Reason for retraction | Gold OA | Green OA | non-OA | Total | Retraction time lag (years) |
|---|---|---|---|---|---|
| Self-plagiarism | 25% | 21% | 23% | 23% | 3.75 |
| Falsification and manipulation | 16% | 25% | 19% | 18% | 3.67 |
| Ethical issues | 8% | 6% | 8% | 8% | 2.92 |
| Error and concern | 36% | 42% | 32% | 34% | 2.90 |
| Authorship issues | 5% | 1% | 6% | 5% | 2.44 |
| Plagiarism | 10% | 5% | 13% | 12% | 2.40 |

Note: the Gold OA, Green OA, non-OA, and Total columns give the % of reasons for retraction.

Figure 3 shows the retraction rate for each reason for retraction across OA levels. Gold OA had the highest retraction rate for every reason. Non-OA had the lowest retraction rate for error and concern, self-plagiarism, and falsification and manipulation, but a higher rate than Green OA for plagiarism and authorship issues. For ethical issues, Green OA and non-OA had similar retraction rates.

Figure 3.

Retraction rate of reasons for retraction across different OA levels.

Besides error and concern, Green OA had the highest proportion of falsification and manipulation among the three OA levels, while it unexpectedly had the lowest proportions of plagiarism and authorship issues. This may be partially explained by the authors' “selection bias” hypothesis (Craig et al., 2007). Green OA articles are uploaded to OA repositories by the authors themselves, and authors tend to avoid uploading publications with easily detected issues, such as plagiarism.

Therefore, the proportion of Green OA publications with these issues was relatively low. Conversely, to enhance their scientific impact, authors may take the risk of uploading publications with hard-to-detect issues, such as falsification and manipulation, which are considered harder to identify (Dal-Ré, 2019; Gerber, 2006; Trikalinos, 2008). This behavior may be one reason for the higher proportion of these issues in Green OA.

Retraction time lag of OA level
The retraction time lag of different OA levels

Figure 4a shows the average retraction time lag of publications at each OA level. The overall retraction time lag is 2.95 years, longer than that of Gold OA but shorter than those of non-OA and Green OA. Gold OA had the shortest retraction time lag, followed by non-OA, while Green OA had the longest.

Figure 4.

Retraction time lag (a) and survival rate curve (b) of different OA levels.

To reveal the retraction speed of each OA level in more detail, we plotted the survival curves of retracted publications at each OA level (Figure 4b). The horizontal axis is the retraction time lag, and the vertical axis is the percentage of problematic publications still “alive” (not yet retracted) in that year. Three years after publication, 56% of Green OA retracted publications were still “alive”, compared with about 41% of non-OA publications and only 36% of Gold OA publications. To check the robustness of these findings, we also analyzed retracted articles from Q1 journals across OA types; the conclusions regarding retraction time lags are consistent with the overall results (see Figure A2 in Appendix 2).

Gold OA publications were clearly retracted faster than Green OA and non-OA publications. This result supports the view that Gold OA publications are more forthcoming about errors once they are detected (Peterson, 2013; Shah et al., 2021). Specifically, Gold OA literature benefits from greater post-publication scrutiny because it is read by more readers, so potential unethical behavior is easier to identify (Fox & Beall, 2014; Lin & McPhee, 2007). The retraction time lag of Green OA was much longer than that of non-OA, which is unexpected. We therefore analyzed the retraction time lag by reason for retraction to explore why Green OA has the longest retraction time lag.

The reasons why Green OA has the longest retraction time lag

This study suggests that the unusual distribution of reasons for retraction in Green OA may partly explain the long retraction time lag observed for Green OA. Table 2 presents the proportion and retraction time lag of each reason for retraction across OA levels. Different reasons for retraction clearly led to different retraction time lags. Green OA articles had higher proportions of falsification and manipulation and of error and concern, and lower proportions of plagiarism and authorship issues. Publications involving falsification and errors usually take longer to be retracted than plagiarized publications because these problems are harder to detect and require more time under post-publication scrutiny (Dal-Ré, 2019; Gerber, 2006; Trikalinos, 2008). The retraction time lags for plagiarism and authorship issues were comparatively short, mainly because these issues are easier to raise and identify.

Furthermore, we plotted the cumulative survival curve for each reason for retraction (Figure 5) to shed light on the detailed survival characteristics of each OA level by reason. For most reasons, the survival rate of Green OA decreased the slowest, followed by non-OA, while that of Gold OA decreased the fastest, consistent with the retraction time lag results.

Figure 5.

Survival rate curve of different reasons for retraction.

Therefore, in Green OA publications, the reasons with longer retraction time lags accounted for the highest shares, while the reasons with shorter retraction time lags accounted for the lowest shares. This could partly account for the long retraction time lag of Green OA.
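The argument above can be checked roughly with the figures in Table 2 by weighting each reason's average time lag by that reason's share within an OA level. This back-of-envelope sketch assumes that a reason's lag is the same at every OA level (Table 2 reports only the pooled lag per reason), and it renormalizes shares because the rounded percentages need not sum to exactly 100%.

```python
# Average retraction time lag (years) per reason: last column of Table 2.
LAG_BY_REASON = {
    "Self-plagiarism": 3.75,
    "Falsification and manipulation": 3.67,
    "Ethical issues": 2.92,
    "Error and concern": 2.90,
    "Authorship issues": 2.44,
    "Plagiarism": 2.40,
}

# Share (%) of each reason within each OA level: Gold OA, Green OA, non-OA columns of Table 2.
REASON_MIX = {
    "Gold OA": {"Self-plagiarism": 25, "Falsification and manipulation": 16, "Ethical issues": 8,
                "Error and concern": 36, "Authorship issues": 5, "Plagiarism": 10},
    "Green OA": {"Self-plagiarism": 21, "Falsification and manipulation": 25, "Ethical issues": 6,
                 "Error and concern": 42, "Authorship issues": 1, "Plagiarism": 5},
    "non-OA": {"Self-plagiarism": 23, "Falsification and manipulation": 19, "Ethical issues": 8,
               "Error and concern": 32, "Authorship issues": 6, "Plagiarism": 13},
}


def mix_weighted_lag(mix: dict[str, float]) -> float:
    """Expected lag of a group if every reason kept its pooled average lag."""
    total = sum(mix.values())  # renormalize rounded percentages
    return sum(share / total * LAG_BY_REASON[reason] for reason, share in mix.items())


if __name__ == "__main__":
    for level, mix in REASON_MIX.items():
        print(f"{level}: {mix_weighted_lag(mix):.2f} years")
```

With these inputs Green OA comes out highest, in the direction the argument predicts, although the spread across the three groups is under 0.1 year, which fits the hedged claim that the reason mix is only partly responsible.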

Discussion and conclusion

Using data on retracted publications from the Web of Science and Retraction Watch databases, this study compared the characteristics of retracted publications across different OA levels and drew the following conclusions.

(1) The retraction rate of Gold OA was much higher than those of Green OA and non-OA, and higher OA levels tended to have higher retraction rates. Non-OA articles accounted for the largest share of retracted publications over the past 20 years, followed by Gold OA and Green OA. The numbers of Gold OA and non-OA retracted articles both increased during 2001-2014, and Gold OA has gradually narrowed the gap with non-OA, especially in recent years. The number of Green OA retracted publications remained stable over the past 20 years.

(2) Ranked in descending order of frequency, the reasons for retraction were error and concern, self-plagiarism, falsification and manipulation, plagiarism, ethical issues, and authorship issues. The distributions of retraction reasons in Gold OA and non-OA were similar, but the retraction rate of Gold OA was much higher than that of non-OA for every reason. Green OA had a higher proportion of falsification and manipulation than the other two OA types, which may have contributed to its long retraction time lag.

(3) Gold OA had the shortest retraction time lag, and Green OA, rather than non-OA as expected, had the longest. The long retraction time lag of Green OA could be partly explained by its unusual distribution of reasons for retraction.

Overall, the high OA level (Gold OA) had the highest retraction rate and the fastest retraction among all OA levels, the low OA level (Green OA) had the second-highest retraction rate, and non-OA had the lowest. However, this does not necessarily mean that a higher OA level makes retraction more effective, because many factors can affect the retraction rate and speed of an OA type, such as the proportion of potentially flawed articles, the attention that articles receive, and the difficulty of detecting problems. Two potential reasons may explain why articles of higher OA level have higher retraction rates: (1) articles of higher OA level could attract more readers, which makes problems in them easier to identify; (2) articles of higher OA level could contain more potential problems (or more easily detected ones), which leads to higher retraction rates. Therefore, whether a high OA level enhances the effectiveness of retraction requires further investigation.

From the perspective of research integrity, more activities can be recommended and supported to promote the OA movement. Peer reviewers, editors, readers, and publishers should work together to promote the detection of problematic OA publications. Editors and peer reviewers need to pay more attention to the reliability of images and data in OA publications. Although OA journals should be promoted in scholarly communication (Shah et al., 2021), quality control currently varies widely among OA journals (Erfanmanesh & Teixeira, 2019). Duplicate-checking technology could be adopted in the submission systems of OA journals (Liu & Lei, 2021), so that publications with image or data issues can be detected and corrected at the review stage. Readers could use PubPeer or other online platforms to help build an early warning system that makes the scientific community aware of problems in publications (Haunschild & Bornmann, 2021).

We have analyzed the retraction rate and time lag of retracted articles across different OA types. Citation impact is another feature that may distinguish retracted articles in different OA categories: exploring citation patterns before and after retraction across OA types could illuminate the relationship between openness and changes in post-retraction citations, and thus lead to a better understanding of how open access affects retraction. It is also important for future studies to consider other factors that affect retraction rate and speed, such as discipline and publication date. Disciplines receive varying levels of attention; those under greater scrutiny are more likely to have problems in articles detected by readers, potentially leading to higher retraction rates and quicker retraction (Yeo-Teh & Tang, 2022). Articles published earlier might face less rigorous scrutiny from readers and editors, resulting in lower retraction rates and speeds. Conversely, recently published articles may also show lower retraction rates and speeds, since it takes time for issues to be identified and articles to be retracted. Therefore, articles whose publication dates are neither too early nor too recent may have relatively higher retraction rates and speeds (Fang et al., 2012; Richard, 2011). Future research that controls for these variables when exploring the impact of OA on retraction could lead to more precise and insightful conclusions.

This study has several limitations. (1) The OA levels in this study are limited to Gold OA, Green OA, and non-OA; a wider range of OA types, such as Hybrid OA and Bronze OA, was not explored. (2) This study assigns each retracted publication to its highest OA level. For publications available at both a high and a low OA level, we therefore underestimate the low OA level and, to some extent, overlook the multiplicity of publications' OA levels.

Despite these limitations, our bibliometric analysis of the OA levels of retracted publications provides a broader overview of the relationship between open access and research integrity, and may serve as a reference for readers, researchers, and policymakers interested in the development of the OA publishing model and in research integrity in OA literature.

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining