The recent surge in retractions (Brainard, 2018; Van Noorden, 2023) has raised significant concerns within the scientific community. Of particular worry is the prevalence of academic misconduct, which accounts for the majority of retractions (Fang et al., 2012). Despite representing a small fraction of the published literature, retractions not only undermine trust in scientific research but also impede its advancement (Cokol et al., 2007; Furman et al., 2012; Hsiao & Schneider, 2022; Sharma, 2021). Many retracted papers continue to be cited frequently, drawing widespread attention from various stakeholders in the research ecosystem (Alam & Wilson, 2023; Gross, 2016; Peng et al., 2022; Tong et al., 2022; Wilson, 2023), including countries, research institutions, professional associations, publishers, journals, and funding agencies. These entities are particularly concerned about retractions linked to themselves, especially those associated with academic misconduct.
While numerous studies have examined the prevalence of retractions over the past two decades (Hesselmann et al., 2017), much of this research has focused on papers indexed in PubMed (Furuse, 2024; Hsiao & Schneider, 2022; Madlock-Brown & Eichmann, 2015; Mongeon & Lariviere, 2016; Sharma et al., 2023), predominantly within specific fields such as radiology (Rosenkrantz, 2016), cardiovascular research (Audisio et al., 2022), urology (Mena et al., 2019), clinical trials (Steen & Hamer, 2014), and cancer (da Silva & Nazarovets, 2023). Only a few studies have taken a broader approach, utilizing data from the Web of Science (Fanelli, 2013; Grieneisen & Zhang, 2012; He, 2013; Lu et al., 2013; Sharma, 2021; Trikalinos et al., 2008). These studies categorize articles into different fields at the journal level, offering a more comprehensive understanding of retraction trends across disciplines (Grieneisen & Zhang, 2012; Li & Shen, 2024). However, the underlying data are often not kept up to date and do not fully cover all scientific fields, particularly biology.
The classification of retraction reasons remains a topic of debate in previous research (Davis et al., 2007; Gilbert & Denison, 2003; Mousavi & Abdollahi, 2020; Rong et al., 2022; Sharma et al., 2023). While some studies treat plagiarism as an unintentional error (Fang et al., 2012; Steneck, 2006; Steen, 2011), others classify it as intentional misconduct (Mousavi & Abdollahi, 2020; Sharma et al., 2023), and there is uncertainty regarding the impact of plagiarism on scientific reliability (Steneck, 2006). The lack of clarity in distinguishing between the investigation process and the actual reasons for retraction further complicates this issue. Moreover, the line between retraction reasons attributed to academic misconduct and those due to honest errors is not always clear, making it challenging to conduct accurate research on scientific integrity. A more precise and comprehensive classification of retraction reasons is therefore essential.
Existing collections of retractions have limitations in fully capturing the entire scientific landscape and may suffer from ambiguous labeling and classification of retraction reasons. Furthermore, crucial information related to research integrity, such as concerns from social media and punishment announcements from administrative agencies, is often not collected or linked to retracted papers. These limitations hinder the ability to conduct thorough research on scientific integrity. Therefore, it is vital to establish a globally comprehensive platform for retracted and concerning articles to address these limitations and facilitate research on scientific integrity. Through this platform, all stakeholders can investigate their respective concerns, fostering collaborative governance of academic misconduct.
In this paper, we introduce “Amend,” a comprehensive platform inspired by Retraction Watch. Amend consolidates concerns and lists of problematic articles from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Additionally, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS).
Whether an article has been retracted is contingent upon the official publication of a retraction notice by the journal, and the notice is generally published in the journal where the original article appeared. Compiling a comprehensive list of retracted papers by collecting data from numerous individual journals is therefore challenging. In addition, bibliographic databases may not update article status changes in a timely manner due to varying retrospective policies (Grieneisen & Zhang, 2012). For these reasons, we collect retraction notices from Crossref (crossref.org), the central hub for registering and retrieving metadata of academic publications, and from the Web of Science, a widely used online research database and citation index. The process of compiling the list of retracted papers is as follows:
1. Retrieve the metadata of retracted papers or retraction notices, such as the title, journal, publication date, and digital object identifier (DOI), by querying the Crossref API with keywords such as "retracted", "withdrawn", and "retraction". Similarly, retrieve such metadata by querying "TI=retract*" in the Web of Science.
2. Using the DOIs of the retraction notices, collect the content of each notice and the identifying information (e.g., the DOI) of the corresponding original article from the journal's official website.
3. Pair the retracted papers with their corresponding retraction notices.
4. Repeat the above steps at regular intervals.
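The first step above can be sketched as query construction against the public Crossref REST API. The `/works` endpoint, the `update-type:retraction` filter, and the `query.bibliographic` parameter are part of that API; the paging values here are illustrative defaults, not the values Amend necessarily uses.

```python
# Sketch of Crossref query construction for harvesting retraction notices.
# The endpoint and parameter names follow the public Crossref REST API;
# rows/cursor values are illustrative.
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_retraction_query(cursor: str = "*", rows: int = 1000) -> str:
    """URL for one page of retraction-notice metadata (deep paging via cursor)."""
    params = {
        "filter": "update-type:retraction",  # notices registered as retractions
        "rows": rows,
        "cursor": cursor,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

def crossref_keyword_query(keyword: str, rows: int = 1000) -> str:
    """Fallback keyword search, e.g. 'retracted', 'withdrawn', 'retraction'."""
    params = {"query.bibliographic": keyword, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"
```

Fetching each page and following the `next-cursor` field returned by Crossref would then yield the full notice list for the pairing step.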
The key terms for identifying retracted papers, such as "retraction", "retracted", and "withdrawal", are used in various contexts beyond retractions. For instance, "teeth-retraction" is a term used in orthodontics, and "retracted" can also appear in articles studying academic misconduct. Hence, the Amend platform exclusively includes paired retracted papers and retraction notices. As of now, Amend has accumulated over 40,000 retracted papers. Investigation and punishment announcements from the NSFC, MOE, MOST, and NHC are collected and linked to the corresponding retracted papers.
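The pairing rule above can be sketched as follows, assuming Crossref-style records in which a retraction notice carries an `update-to` field naming the DOI of the article it retracts (this field is part of the Crossref works schema; the records below are synthetic).

```python
# Sketch of the pairing step: keep only records that explicitly link a
# notice to an original article, which drops false keyword hits such as
# orthodontics papers about "teeth-retraction".
def pair_notices(records):
    """Map original-article DOI -> retraction-notice DOI."""
    pairs = {}
    for rec in records:
        for upd in rec.get("update-to", []):
            if upd.get("type") in ("retraction", "withdrawal"):
                pairs[upd["DOI"].lower()] = rec["DOI"].lower()
    return pairs

# Synthetic illustration: one genuine notice, one false keyword hit.
records = [
    {"DOI": "10.1234/notice1", "title": ["Retraction: Some study"],
     "update-to": [{"DOI": "10.1234/Orig1", "type": "retraction"}]},
    {"DOI": "10.1234/ortho",
     "title": ["Teeth-retraction forces in orthodontics"]},
]
```

Running `pair_notices(records)` keeps only the genuine notice, mirroring Amend's rule that unpaired keyword matches are excluded.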
To characterize retracted articles from various perspectives, the Amend platform collects multiple attributes for each article. The collected information includes the following:
The retraction notice not only marks the official retraction of an article but also provides valuable information for analyzing the reasons behind it. Complete retraction notices have been collected from journal websites, and their statements have been consistently analyzed to summarize the reasons for retraction. In cases of duplicate publication or plagiarism, cross-checking has been conducted by comparing the titles and authors of the references cited in the retraction notices: if the publications share common authors, the case may be considered duplicate publication rather than plagiarism. In addition to the retraction notice, recognizing Paper Mill or AIGC cases also involves consulting the lists of suspicious articles released on websites such as For Better Science, Science Integrity Digest, and the Problematic Paper Screener.
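The author cross-check described above can be sketched as a simple heuristic; the name normalization here is deliberately minimal (lowercase and strip) and a real pipeline would need fuller disambiguation.

```python
# Minimal sketch of the duplication-vs-plagiarism cross-check: if the
# retracted paper and the overlapping reference share any author, the
# case is treated as duplicate publication rather than plagiarism.
def classify_overlap(retracted_authors, reference_authors):
    norm = lambda names: {n.strip().lower() for n in names}
    common = norm(retracted_authors) & norm(reference_authors)
    return "duplication" if common else "plagiarism"
```

For example, a shared author such as "A. Smith" appearing on both papers flags the case as duplication.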
After carefully reviewing journal retraction notices and consulting the definitions of retraction reasons in previous literature (Davis et al., 2007; Gilbert & Denison, 2003; Mousavi & Abdollahi, 2020; Sharma et al., 2023; Steen, 2011; Steneck, 2006), the reasons for retraction have been identified for each paper. These reasons are organized into levels of increasing specificity. First, based on whether the behavior was intentional, the reasons are divided into two main categories: academic misconduct and honest errors. Second, academic misconduct is subdivided into 10 meso-level causes, while honest errors are categorized into two meso-level causes. Last, within each meso-level cause, there are multiple specific reasons for retraction. Please see the details below:
Categories of Reasons for Retraction:
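The two upper levels of this scheme can be encoded as a small lookup. The ten misconduct meso-categories below are taken from Table 2; the text does not enumerate the two honest-error meso-categories, so they are left as labeled placeholders.

```python
# Macro- and meso-level taxonomy of retraction reasons. Misconduct
# categories come from Table 2; honest-error placeholders stand in for
# the two categories the text mentions but does not name here.
MISCONDUCT = [
    "Falsification", "Fabrication", "Plagiarism", "Fake Peer-review",
    "Paper Mill", "AIGC", "Inappropriate Authorship", "Duplication",
    "Ethical Violations", "Other Misconduct",
]
HONEST_ERROR = ["(honest-error category 1)", "(honest-error category 2)"]

TAXONOMY = {reason: "academic misconduct" for reason in MISCONDUCT}
TAXONOMY.update({reason: "honest error" for reason in HONEST_ERROR})

def top_level(reason: str) -> str:
    """Macro-level category (intentional vs. not) for a meso-level reason."""
    return TAXONOMY[reason]
```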
In this section, we utilize Amend to analyze the retraction patterns of papers indexed in the Web of Science (WoS). Within Amend, a total of 34,615 retracted papers are indexed in the Web of Science Core Collection, spanning the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Emerging Sources Citation Index (ESCI) from 1980 to 2023. Among them, 32,515 retracted papers are assigned a Citation Topic. For brevity, we refer to this subset as the Amend dataset throughout the subsequent discussion.
Table 1 provides an overview of key characteristics within the Amend dataset. It encompasses 32,515 retracted papers, comprising 14,519 under Gold Open Access and 17,996 under non-Gold Open Access, all published and retracted between 1980 and 2023. Of these retracted papers, 26,620 (81.87%) were identified as cases of academic misconduct, including 13,487 Gold Open Access papers and 13,133 non-Gold Open Access papers. Notably, 92.89% of Gold Open Access papers and 72.98% of non-Gold Open Access papers were associated with academic misconduct.
Characteristics of the Amend dataset for retracted papers.
| | Retractions | Retraction Rate (per 10,000) | Misconduct | Misconduct Rate (per 10,000) | Misconduct Ratio |
|---|---|---|---|---|---|
| All | 32,515 | 6.64 | 26,620 | 5.44 | 81.87% |
| Gold | 14,519 | 16.38 | 13,487 | 15.21 | 92.89% |
| Non-Gold | 17,996 | 4.49 | 13,133 | 3.28 | 72.98% |
During the selected period, the retraction rate in the Amend dataset was 6.64 per 10,000 papers overall, and 5.44 per 10,000 for papers retracted due to misconduct. For Gold Open Access papers, the retraction rate was 16.38 per 10,000 (15.21 for misconduct), both significantly higher than the overall rates. Conversely, for non-Gold Open Access papers, the retraction rate was 4.49 per 10,000 (3.28 for misconduct), both notably lower than for Gold Open Access papers.
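The misconduct ratios in Table 1 can be re-derived directly from its counts. Total publication counts are not given in the text, so the per-10,000 rates are not recomputed here.

```python
# Re-deriving Table 1's misconduct ratios (misconduct / retractions).
def misconduct_ratio(misconduct: int, retractions: int) -> float:
    """Share of retractions attributed to misconduct, as a percentage."""
    return round(100 * misconduct / retractions, 2)

overall  = misconduct_ratio(26_620, 32_515)   # 81.87
gold     = misconduct_ratio(13_487, 14_519)   # 92.89
non_gold = misconduct_ratio(13_133, 17_996)   # 72.98
```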
Of the retractions, 81.87% were associated with some form of academic misconduct. Because a single retracted paper may cite multiple reasons, these correspond to 35,442 reason instances across 25,710 misconduct-related retractions. Specifically, 12,413 retracted papers involved Fake Peer-review, accounting for 35.02% of all misconduct reasons, while 6,152 retractions were related to duplication. More details can be found in Table 2.
Frequency and proportion of reasons for academic misconduct retraction
Reason | Frequency | Percent | Reason | Frequency | Percent |
---|---|---|---|---|---|
Falsification | 1,712 | 4.83% | AIGC | 1,176 | 3.32% |
Fabrication | 1,568 | 4.42% | Inappropriate Authorship | 1,562 | 4.41% |
Plagiarism | 2,648 | 7.47% | Duplication | 6,152 | 17.36% |
Fake Peer-review | 12,413 | 35.02% | Ethical Violations | 1,130 | 3.19% |
Paper Mill | 2,283 | 6.44% | Other Misconduct | 4,798 | 13.54% |
Furthermore, through the integration of the Amend dataset with WoS, we have identified 143,783 researchers affiliated with 6,126 research institutions spanning 167 countries or regions. The retracted papers within the Amend dataset were published across 4,680 journals by 622 publishers and cover a diverse range of topics, including 10 macro topics, 324 meso topics, and 2,279 micro topics. These retracted papers received funding from 635 different agencies, including prominent organizations such as the National Institutes of Health (NIH) in the USA, the National Science Foundation (NSF), the National Natural Science Foundation of China (NSFC), the Ministry of Education, Culture, Sports, Science and Technology in Japan (MEXT), the Japan Society for the Promotion of Science, UK Research & Innovation (UKRI), the German Research Foundation (DFG), the National Research Foundation of Korea, among others.
Recent years have witnessed a notable increase in the number of retractions, consistent with previous research findings (Bar-Ilan & Halevi, 2018; Van Noorden, 2023). Specifically, the number of retractions in 2023 exceeded 12,100, primarily attributable to mass retractions by Hindawi, a fifteen-fold increase compared to a decade earlier. Among these retractions, over 11,700 were due to academic misconduct, and more than 9,700 were Gold Open Access papers (Figure 1(a)).
Additionally, the number of retractions has surged when counted by publication year. More than 8,300 papers published in 2022 have been retracted, an approximately nine-fold increase compared to a decade earlier. Among these retractions, over 8,000 were attributed to academic misconduct, and more than 7,500 were Gold Open Access papers (Figure 1(b)).
Although retractions still represent a small fraction of all publications (0.066%), the overall retraction rate, which measures the number of retractions relative to the number of newly published journal articles in a given year, has been rising. Figure 2 illustrates the annual retraction rate; its rapid escalation indicates that retractions are growing faster than the scientific literature itself. Notably, the overall retraction rate now surpasses 30 per 10,000 papers, and the rate for gold open access papers exceeds 60 per 10,000, double the overall rate.
To delve deeper into the reasons for retraction, the river chart in Figure 3 illustrates the distribution of retraction reasons by year. Before 2010, academic misconduct was primarily associated with fabrication, falsification, plagiarism, and duplication, typically considered occasional personal behaviors. While these reasons have increased slightly in recent years, they have generally remained stable. However, the emergence of organized large-scale fraud has introduced new forms of academic misconduct, such as Paper Mill, Fake Peer-review, and AIGC, resulting in a significant number of retractions. For instance, only 10 articles published in 2010 were retracted due to Fake Peer-review, but for 2022 the count had surged to over 7,500. Similarly, the count of papers retracted due to AIGC rose from 4 in 2010 to over 500 in 2021 (Table 3). Due to delays in retraction processing, the number of retracted papers published in 2023 is relatively small.
The number of retractions attributed to each reason in the year of publication.
Reasons | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Falsification | 81 | 91 | 85 | 118 | 101 | 101 | 82 | 90 | 141 | 148 | 128 | 83 | 30 |
Fabrication | 68 | 81 | 78 | 81 | 55 | 48 | 61 | 69 | 59 | 54 | 58 | 49 | 27 |
Plagiarism | 125 | 96 | 157 | 160 | 195 | 220 | 194 | 208 | 203 | 191 | 181 | 183 | 88 |
Fake Peer-review | 10 | 28 | 51 | 119 | 194 | 162 | 86 | 125 | 221 | 196 | 724 | 2,774 | 7,592 |
Paper Mill | 1 | 3 | 1 | 8 | 64 | 89 | 75 | 179 | 416 | 594 | 460 | 288 | 98 |
AIGC | 4 | 1 | 3 | 1 | 7 | 7 | 8 | 21 | 55 | 46 | 289 | 521 | 194 |
Inappropriate Authorship | 24 | 26 | 34 | 52 | 72 | 132 | 112 | 128 | 104 | 123 | 173 | 230 | 191 |
Duplication | 231 | 314 | 319 | 354 | 367 | 425 | 396 | 452 | 547 | 614 | 535 | 315 | 138 |
Ethical Violations | 23 | 19 | 18 | 31 | 29 | 22 | 30 | 45 | 54 | 63 | 74 | 161 | 369 |
Other Misconduct | 56 | 61 | 88 | 103 | 100 | 100 | 111 | 119 | 135 | 179 | 283 | 567 | 2,466
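The fold increases quoted in the discussion can be read directly off Table 3's rows (retraction counts by publication year):

```python
# Growth factors for the two fast-rising reasons, taken from Table 3.
fake_peer_review = {2010: 10, 2022: 7_592}
aigc = {2010: 4, 2021: 521}

fpr_fold = fake_peer_review[2022] / fake_peer_review[2010]   # ~759-fold
aigc_fold = aigc[2021] / aigc[2010]                          # ~130-fold
```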
Research integrity has become a pressing concern for all stakeholders in the scientific research community. Thus, a globally comprehensive platform of papers that may have research integrity issues is indispensable. This article introduces the Amend platform and outlines the information it provides. Through the analysis of 32,515 retracted articles published and retracted between 1980 and 2023, we find that 81.87% of retracted papers are linked to academic misconduct.
Furthermore, the number of retractions has been steadily increasing, aligning with prior research findings. It is notable that the number of retractions in 2023 alone has exceeded 12,100. Additionally, academic misconduct appears to be more prevalent in gold open access papers compared to non-gold open access papers.
In the Amend dataset, reasons for retraction are categorized into levels based on the retraction notice. First, they are divided into two main categories, academic misconduct and honest error, depending on whether the behavior was intentional. Academic misconduct is further divided into 10 categories, while honest error is subdivided into two. Each category is then further subdivided into specific causes according to the retraction notice. This classification aids in gaining a comprehensive understanding of the issues surrounding retractions and helps prevent the blanket stigmatization of retractions as academic misconduct.
The Amend database serves as a platform for recording retracted papers, with various potential applications. First, it acts as an alert system for the academic community, enabling researchers to identify problematic papers and avoid citing retracted ones. Second, the database can support the governance of scientific integrity: it helps management departments understand the prevalence of academic misconduct across institutions and helps funding agencies evaluate the integrity of applicants, ensuring funds are allocated to trustworthy research projects. Additionally, it provides data for research on scientific integrity, such as large-scale investigation of pre- and post-retraction citations (Palla et al., 2023) based on retracted papers in the Amend database; the prevalence of academic misconduct can likewise be examined across disciplines or research topics (Li & Shen, 2024). Overall, the Amend database contributes to upholding scientific integrity, enhancing research quality, and promoting the healthy development of scientific research.
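One such analysis, splitting a retracted paper's incoming citations into those made before and after its retraction date, can be sketched as follows; the dates here are synthetic illustrations, not Amend data.

```python
# Sketch of a pre/post-retraction citation split for one retracted paper.
from datetime import date

def split_citations(citing_dates, retraction_date):
    """Return (pre_retraction_count, post_retraction_count)."""
    pre = sum(1 for d in citing_dates if d < retraction_date)
    return pre, len(citing_dates) - pre
```

Aggregating these counts over all retracted papers in Amend would show how often retracted work keeps being cited after its retraction.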
However, the Amend database has limitations. For instance, it does not cover all retracted papers, which affects the completeness of the dataset. The accuracy of tagged reasons relies on the retraction notices, which may contain errors or inaccurate information. In particular, tagging reasons such as Paper Mill and AIGC heavily depends on information disclosed by academic communities. The process of tagging reasons for retraction can be challenging, and labeling errors may stem from the diverse expressions used in retraction notices. It is therefore hoped that journals will adopt a standardized format for describing investigation conclusions in retraction notices, disclose more information related to retractions, and clarify the detailed reasons for retraction.