The recent surge in retractions (Brainard, 2018; Van Noorden, 2023) has raised significant concerns within the scientific community. Of particular worry is the prevalence of academic misconduct, which accounts for the majority of retractions (Fang et al., 2012). Despite representing a small fraction of the published literature, retractions not only undermine trust in scientific research but also impede its advancement (Cokol et al., 2007; Furman et al., 2012; Hsiao & Schneider, 2022; Sharma, 2021). Many retracted papers continue to be cited frequently, drawing widespread attention from various stakeholders in the research ecosystem (Alam & Wilson, 2023; Gross, 2016; Peng et al., 2022; Tong et al., 2022; Wilson, 2023), including countries, research institutions, professional associations, publishers, journals, and funding agencies. These entities are particularly concerned about retractions linked to themselves, especially those associated with academic misconduct.
While numerous studies have examined the prevalence of retractions over the past two decades (Hesselmann et al., 2017), much of this research has focused on papers indexed in PubMed (Furuse, 2024; Hsiao & Schneider, 2022; Madlock-Brown & Eichmann, 2015; Mongeon & Lariviere, 2016; Sharma et al., 2023), predominantly within specific fields such as radiology (Rosenkrantz, 2016), cardiovascular research (Audisio et al., 2022), urology (Mena et al., 2019), clinical trials (Steen & Hamer, 2014), and cancer (da Silva & Nazarovets, 2023). Only a few studies have taken a broader approach, utilizing data from the Web of Science (Fanelli, 2013; Grieneisen & Zhang, 2012; He, 2013; Lu et al., 2013; Sharma, 2021; Trikalinos et al., 2008). These studies categorize articles into different fields at the journal level, offering a more comprehensive understanding of retraction trends across disciplines (Grieneisen & Zhang, 2012; Li & Shen, 2024). However, the underlying data are often not kept up to date and do not fully cover all scientific fields, particularly biology.
The classification of retraction reasons remains a topic of debate in previous research (Davis et al., 2007; Gilbert & Denison, 2003; Mousavi & Abdollahi, 2020; Rong et al., 2022; Sharma et al., 2023). While some studies treat plagiarism as an unintentional error (Fang et al., 2012; Steneck, 2006; Steen, 2011), others classify it as intentional misconduct (Mousavi & Abdollahi, 2020; Sharma et al., 2023), and there is uncertainty regarding the impact of plagiarism on scientific reliability (Steneck, 2006). The lack of clarity in distinguishing between the investigation process and the actual reasons for retraction further complicates this issue. Moreover, the line between retraction reasons attributed to academic misconduct and those due to honest errors is not always clear, making it challenging to conduct accurate research on scientific integrity. A more precise and comprehensive classification of retraction reasons is therefore essential.
Existing collections of retractions have limitations in fully capturing the entire scientific landscape and may suffer from ambiguous labeling and classification of retraction reasons. Furthermore, crucial information related to research integrity, such as concerns from social media and punishment announcements from administrative agencies, is often not collected or linked to retracted papers. These limitations hinder the ability to conduct thorough research on scientific integrity. Therefore, it is vital to establish a globally comprehensive platform for retracted and concerning articles to address these limitations and facilitate research on scientific integrity. Through this platform, all stakeholders can investigate their respective concerns, fostering collaborative governance of academic misconduct.
In this paper, we introduce “Amend,” a comprehensive platform inspired by Retraction Watch. Amend consolidates concerns and lists of problematic articles from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Additionally, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS).
Whether an article has been retracted is contingent upon the official publication of a retraction notice by the journal, and the notice is generally published in the journal where the original article appeared. Compiling a comprehensive list of retracted papers by collecting data from numerous individual journals is therefore challenging. In addition, bibliographic databases may not update article status changes in a timely manner due to varying retrospective policies (Grieneisen & Zhang, 2012). For these reasons, we collect retraction notices from Crossref (crossref.org), the central hub for registering and retrieving metadata of academic publications, and from the Web of Science, a widely used online research database and citation index. The process of compiling the list of retracted papers is as follows:
1. Retrieve the metadata of retracted papers or retraction notices, such as the title, journal, publication date, and digital object identifier (DOI), by querying the Crossref API with keywords such as "retracted", "withdrawn", and "retraction". Similarly, retrieve such metadata by querying "TI=retract*" in the Web of Science.
2. Using the DOIs of the retraction notices, collect the content of each notice and the identifying information (e.g., the DOI) of the corresponding original article from the journal's official website.
3. Pair the retracted papers with their corresponding retraction notices.
4. Repeat the above steps at regular intervals.
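The first step above can be sketched as query construction against the public Crossref REST API. The `/works` endpoint, the `update-type:retraction` filter, and the `query.bibliographic` parameter are part of that API; the paging values here are illustrative defaults, not the values Amend necessarily uses.

```python
# Sketch of Crossref query construction for harvesting retraction notices.
# The endpoint and parameter names follow the public Crossref REST API;
# rows/cursor values are illustrative.
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_retraction_query(cursor: str = "*", rows: int = 1000) -> str:
    """URL for one page of retraction-notice metadata (deep paging via cursor)."""
    params = {
        "filter": "update-type:retraction",  # notices registered as retractions
        "rows": rows,
        "cursor": cursor,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

def crossref_keyword_query(keyword: str, rows: int = 1000) -> str:
    """Fallback keyword search, e.g. 'retracted', 'withdrawn', 'retraction'."""
    params = {"query.bibliographic": keyword, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"
```

Fetching each page and following the `next-cursor` field returned by Crossref would then yield the full notice list for the pairing step.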
The key terms for identifying retracted papers, such as "retraction", "retracted", and "withdrawal", are used in various contexts beyond retractions. For instance, "teeth-retraction" is a term used in orthodontics, and "retracted" can also appear in articles studying academic misconduct. Hence, the Amend platform exclusively includes paired retracted papers and retraction notices. As of now, Amend has accumulated over 40,000 retracted papers. Investigation and punishment announcements from the NSFC, MOE, MOST, and NHC are collected and linked to the corresponding retracted papers.
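The pairing rule above can be sketched as follows, assuming Crossref-style records in which a retraction notice carries an `update-to` field naming the DOI of the article it retracts (this field is part of the Crossref works schema; the records below are synthetic).

```python
# Sketch of the pairing step: keep only records that explicitly link a
# notice to an original article, which drops false keyword hits such as
# orthodontics papers about "teeth-retraction".
def pair_notices(records):
    """Map original-article DOI -> retraction-notice DOI."""
    pairs = {}
    for rec in records:
        for upd in rec.get("update-to", []):
            if upd.get("type") in ("retraction", "withdrawal"):
                pairs[upd["DOI"].lower()] = rec["DOI"].lower()
    return pairs

# Synthetic illustration: one genuine notice, one false keyword hit.
records = [
    {"DOI": "10.1234/notice1", "title": ["Retraction: Some study"],
     "update-to": [{"DOI": "10.1234/Orig1", "type": "retraction"}]},
    {"DOI": "10.1234/ortho",
     "title": ["Teeth-retraction forces in orthodontics"]},
]
```

Running `pair_notices(records)` keeps only the genuine notice, mirroring Amend's rule that unpaired keyword matches are excluded.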
To characterize retracted articles from various perspectives, the Amend platform collects multiple attributes for each article. The collected information includes the following:
The retraction notice not only marks the official retraction of an article but also provides valuable information for analyzing the reasons behind it. Complete retraction notices have been collected from journal websites, and their statements have been consistently analyzed to summarize the reasons for retraction. In cases of duplicate publication or plagiarism, cross-checking has been conducted by comparing the titles and authors of the references cited in the retraction notices: if the publications share common authors, the case may be considered duplicate publication rather than plagiarism. In addition to the retraction notice, recognizing Paper Mill or AIGC cases also involves consulting the lists of suspicious articles released on websites such as For Better Science, Science Integrity Digest, and the Problematic Paper Screener.
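The author cross-check described above can be sketched as a simple heuristic; the name normalization here is deliberately minimal (lowercase and strip) and a real pipeline would need fuller disambiguation.

```python
# Minimal sketch of the duplication-vs-plagiarism cross-check: if the
# retracted paper and the overlapping reference share any author, the
# case is treated as duplicate publication rather than plagiarism.
def classify_overlap(retracted_authors, reference_authors):
    norm = lambda names: {n.strip().lower() for n in names}
    common = norm(retracted_authors) & norm(reference_authors)
    return "duplication" if common else "plagiarism"
```

For example, a shared author such as "A. Smith" appearing on both papers flags the case as duplication.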
After carefully reviewing journal retraction notices and consulting the definitions of retraction reasons in previous literature (Davis et al., 2007; Gilbert & Denison, 2003; Mousavi & Abdollahi, 2020; Sharma et al., 2023; Steen, 2011; Steneck, 2006), the reasons for retraction have been identified for each paper. These reasons are organized into levels of increasing specificity. First, based on whether the behavior was intentional, the reasons are divided into two main categories: academic misconduct and honest errors. Second, academic misconduct is subdivided into 10 meso-level causes, while honest errors are categorized into two meso-level causes. Last, within each meso-level cause, there are multiple specific reasons for retraction. Please see the details below:
Categories of Reasons for Retraction:
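The two upper levels of this scheme can be encoded as a small lookup. The ten misconduct meso-categories below are taken from Table 2; the text does not enumerate the two honest-error meso-categories, so they are left as labeled placeholders.

```python
# Macro- and meso-level taxonomy of retraction reasons. Misconduct
# categories come from Table 2; honest-error placeholders stand in for
# the two categories the text mentions but does not name here.
MISCONDUCT = [
    "Falsification", "Fabrication", "Plagiarism", "Fake Peer-review",
    "Paper Mill", "AIGC", "Inappropriate Authorship", "Duplication",
    "Ethical Violations", "Other Misconduct",
]
HONEST_ERROR = ["(honest-error category 1)", "(honest-error category 2)"]

TAXONOMY = {reason: "academic misconduct" for reason in MISCONDUCT}
TAXONOMY.update({reason: "honest error" for reason in HONEST_ERROR})

def top_level(reason: str) -> str:
    """Macro-level category (intentional vs. not) for a meso-level reason."""
    return TAXONOMY[reason]
```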
In this section, we utilize Amend to analyze the retraction patterns of papers indexed in the Web of Science (WoS). Within Amend, a total of 34,615 retracted papers are indexed in the Web of Science Core Collection, spanning the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Emerging Sources Citation Index (ESCI) from 1980 to 2023. Among them, 32,515 retracted papers are assigned a Citation Topic. For brevity, we refer to this subset as the Amend dataset throughout the subsequent discussion.
Table 1 provides an overview of key characteristics within the Amend dataset. It encompasses 32,515 retracted papers, comprising 14,519 under Gold Open Access and 17,996 under non-Gold Open Access, all published and retracted between 1980 and 2023. Of these retracted papers, 26,620 (81.87%) were identified as cases of academic misconduct, including 13,487 Gold Open Access papers and 13,133 non-Gold Open Access papers. Notably, 92.89% of Gold Open Access papers and 72.98% of non-Gold Open Access papers were associated with academic misconduct.
Characteristics of the Amend dataset for retracted papers.
| | Retractions | Retraction Rate (per 10,000) | Misconduct | Misconduct Rate (per 10,000) | Misconduct Ratio |
|---|---|---|---|---|---|
| All | 32,515 | 6.64 | 26,620 | 5.44 | 81.87% |
| Gold | 14,519 | 16.38 | 13,487 | 15.21 | 92.89% |
| Non-Gold | 17,996 | 4.49 | 13,133 | 3.28 | 72.98% |
During the selected period, the retraction rate in the Amend dataset was 6.64 per 10,000 papers overall, and 5.44 per 10,000 for papers retracted due to misconduct. For Gold Open Access papers, the retraction rate was 16.38 per 10,000 (15.21 for misconduct), both significantly higher than the overall rates. Conversely, for non-Gold Open Access papers, the retraction rate was 4.49 per 10,000 (3.28 for misconduct), both notably lower than for Gold Open Access papers.
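The misconduct ratios in Table 1 can be re-derived directly from its counts. Total publication counts are not given in the text, so the per-10,000 rates are not recomputed here.

```python
# Re-deriving Table 1's misconduct ratios (misconduct / retractions).
def misconduct_ratio(misconduct: int, retractions: int) -> float:
    """Share of retractions attributed to misconduct, as a percentage."""
    return round(100 * misconduct / retractions, 2)

overall  = misconduct_ratio(26_620, 32_515)   # 81.87
gold     = misconduct_ratio(13_487, 14_519)   # 92.89
non_gold = misconduct_ratio(13_133, 17_996)   # 72.98
```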
Of the retractions, 81.87% were associated with some form of academic misconduct. Because a single retracted paper may cite multiple reasons, these correspond to 35,442 reason instances across 25,710 misconduct-related retractions. Specifically, 12,413 retracted papers involved Fake Peer-review, accounting for 35.02% of all misconduct reasons, while 6,152 retractions were related to duplication. More details can be found in Table 2.
Frequency and proportion of reasons for academic misconduct retraction
Reason | Frequency | Percent | Reason | Frequency | Percent |
---|---|---|---|---|---|
Falsification | 1,712 | 4.83% | AIGC | 1,176 | 3.32% |
Fabrication | 1,568 | 4.42% | Inappropriate Authorship | 1,562 | 4.41% |
Plagiarism | 2,648 | 7.47% | Duplication | 6,152 | 17.36% |
Fake Peer-review | 12,413 | 35.02% | Ethical Violations | 1,130 | 3.19% |
Paper Mill | 2,283 | 6.44% | Other Misconduct | 4,798 | 13.54% |
Furthermore, through the integration of the Amend dataset with WoS, we have identified 143,783 researchers affiliated with 6,126 research institutions spanning 167 countries or regions. The retracted papers within the Amend dataset were published across 4,680 journals by 622 publishers and cover a diverse range of topics, including 10 macro topics, 324 meso topics, and 2,279 micro topics. These retracted papers received funding from 635 different agencies, including prominent organizations such as the National Institutes of Health (NIH) in the USA, the National Science Foundation (NSF), the National Natural Science Foundation of China (NSFC), the Ministry of Education, Culture, Sports, Science and Technology in Japan (MEXT), the Japan Society for the Promotion of Science, UK Research & Innovation (UKRI), the German Research Foundation (DFG), the National Research Foundation of Korea, among others.
Recent years have witnessed a notable increase in the number of retractions, consistent with previous research findings (Bar-Ilan & Halevi, 2018; Van Noorden, 2023). Specifically, the number of retractions in 2023 exceeded 12,100, primarily attributable to mass retractions by Hindawi, a fifteen-fold increase compared to a decade earlier. Among these retractions, over 11,700 were due to academic misconduct, and more than 9,700 were Gold Open Access papers (Figure 1(a)).
Additionally, the number of retractions has surged when counted by publication year. More than 8,300 papers published in 2022 have been retracted, an approximately nine-fold increase compared to a decade earlier. Among these retractions, over 8,000 were attributed to academic misconduct, and more than 7,500 were Gold Open Access papers (Figure 1(b)).
Although retractions still represent a small fraction of all publications (0.066%), the overall retraction rate, which measures the number of retractions relative to the number of newly published journal articles in a given year, has been rising. Figure 2 illustrates the annual retraction rate; its rapid escalation indicates that retractions are growing faster than the scientific literature itself. Notably, the overall retraction rate now surpasses 30 per 10,000 papers, and the rate for gold open access papers exceeds 60 per 10,000, double the overall rate.
To delve deeper into the reasons for retraction, the river chart in Figure 3 illustrates the distribution of retraction reasons by year. Before 2010, academic misconduct was primarily associated with fabrication, falsification, plagiarism, and duplication, typically considered occasional personal behaviors. While these reasons have increased slightly in recent years, they have generally remained stable. However, the emergence of organized large-scale fraud has introduced new forms of academic misconduct, such as Paper Mill, Fake Peer-review, and AIGC, resulting in a significant number of retractions. For instance, only 10 articles published in 2010 were retracted due to Fake Peer-review, but for 2022 the count had surged to over 7,500. Similarly, the count of papers retracted due to AIGC rose from 4 in 2010 to over 500 in 2021 (Table 3). Due to delays in retraction processing, the number of retracted papers published in 2023 is relatively small.
The number of retractions attributed to each reason in the year of publication.
Reasons | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Falsification | 81 | 91 | 85 | 118 | 101 | 101 | 82 | 90 | 141 | 148 | 128 | 83 | 30 |
Fabrication | 68 | 81 | 78 | 81 | 55 | 48 | 61 | 69 | 59 | 54 | 58 | 49 | 27 |
Plagiarism | 125 | 96 | 157 | 160 | 195 | 220 | 194 | 208 | 203 | 191 | 181 | 183 | 88 |
Fake Peer-review | 10 | 28 | 51 | 119 | 194 | 162 | 86 | 125 | 221 | 196 | 724 | 2,774 | 7,592 |
Paper Mill | 1 | 3 | 1 | 8 | 64 | 89 | 75 | 179 | 416 | 594 | 460 | 288 | 98 |
AIGC | 4 | 1 | 3 | 1 | 7 | 7 | 8 | 21 | 55 | 46 | 289 | 521 | 194 |
Inappropriate Authorship | 24 | 26 | 34 | 52 | 72 | 132 | 112 | 128 | 104 | 123 | 173 | 230 | 191 |
Duplication | 231 | 314 | 319 | 354 | 367 | 425 | 396 | 452 | 547 | 614 | 535 | 315 | 138 |
Ethical Violations | 23 | 19 | 18 | 31 | 29 | 22 | 30 | 45 | 54 | 63 | 74 | 161 | 369 |
Other Misconduct | 56 | 61 | 88 | 103 | 100 | 100 | 111 | 119 | 135 | 179 | 283 | 567 | 2,466
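The fold increases quoted in the discussion can be read directly off Table 3's rows (retraction counts by publication year):

```python
# Growth factors for the two fast-rising reasons, taken from Table 3.
fake_peer_review = {2010: 10, 2022: 7_592}
aigc = {2010: 4, 2021: 521}

fpr_fold = fake_peer_review[2022] / fake_peer_review[2010]   # ~759-fold
aigc_fold = aigc[2021] / aigc[2010]                          # ~130-fold
```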
Research integrity has become a pressing concern for all stakeholders in the scientific research community. Thus, a globally comprehensive platform of papers that may have research integrity issues is indispensable. This article introduces the Amend platform and outlines the information it provides. Through the analysis of 32,515 retracted articles published and retracted between 1980 and 2023, we find that 81.87% of retracted papers are linked to academic misconduct.
Furthermore, the number of retractions has been steadily increasing, aligning with prior research findings. It is notable that the number of retractions in 2023 alone has exceeded 12,100. Additionally, academic misconduct appears to be more prevalent in gold open access papers compared to non-gold open access papers.
In the Amend dataset, reasons for retraction are categorized into levels based on the retraction notice. First, they are divided into two main categories, academic misconduct and honest error, depending on whether the behavior was intentional. Academic misconduct is further divided into 10 categories, while honest error is subdivided into two. Each category is then further subdivided into specific causes according to the retraction notice. This classification aids in gaining a comprehensive understanding of the issues surrounding retractions and helps prevent the blanket stigmatization of retractions as academic misconduct.
The Amend database serves as a platform for recording retracted papers, with various potential applications. First, it acts as an alert system for the academic community, enabling researchers to identify problematic papers and avoid citing retracted ones. Second, the database can support the governance of scientific integrity: it helps management departments understand the prevalence of academic misconduct across institutions and helps funding agencies evaluate the integrity of applicants, ensuring funds are allocated to trustworthy research projects. Additionally, it provides data for research on scientific integrity, such as large-scale investigation of pre- and post-retraction citations (Palla et al., 2023) based on retracted papers in the Amend database; the prevalence of academic misconduct can likewise be examined across disciplines or research topics (Li & Shen, 2024). Overall, the Amend database contributes to upholding scientific integrity, enhancing research quality, and promoting the healthy development of scientific research.
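One such analysis, splitting a retracted paper's incoming citations into those made before and after its retraction date, can be sketched as follows; the dates here are synthetic illustrations, not Amend data.

```python
# Sketch of a pre/post-retraction citation split for one retracted paper.
from datetime import date

def split_citations(citing_dates, retraction_date):
    """Return (pre_retraction_count, post_retraction_count)."""
    pre = sum(1 for d in citing_dates if d < retraction_date)
    return pre, len(citing_dates) - pre
```

Aggregating these counts over all retracted papers in Amend would show how often retracted work keeps being cited after its retraction.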
However, the Amend database has limitations. For instance, it does not cover all retracted papers, which affects the completeness of the dataset. The accuracy of tagged reasons relies on the retraction notices, which may contain errors or inaccurate information. In particular, tagging reasons such as Paper Mill and AIGC heavily depends on information disclosed by academic communities. The process of tagging reasons for retraction can be challenging, and labeling errors may stem from the diverse expressions used in retraction notices. It is therefore hoped that journals will adopt a standardized format for describing investigation conclusions in retraction notices, disclose more information related to retractions, and clarify the detailed reasons for retraction.