Plagiarism is an important issue that profoundly compromises the dissemination and development of science. Plagiarism in medical sciences is frequent and the characteristics of plagiarism in papers from different medical fields vary (Kwee et al., 2022; Dal-Ré & Ayuso, 2019). It is necessary to reveal the variations between different fields and provide recommendations for the supervision of medical academic misconduct.
Previous studies have examined plagiarism in papers from different medical fields, but most focused on a single field. For example, Baskaran et al. (2019) analyzed plagiarism in andrology papers in terms of trends and differences in similarity indices between review and research articles. Shafer (2016) analyzed plagiarism in papers published in
In this study, we analyzed 2,469 representative Chinese journal papers, aiming to reveal the extent of text duplication and plagiarism in papers from different medical fields. We compared the fields in terms of trends in similarity indices and the sections containing duplication. We also analyzed, for each field, the differences between the similarity indices of research and review articles.
In total, 2,469 representative Chinese medical journal papers submitted by researchers in 2020 and 2021 were included in our study. We received detailed information (institute, author, journal, publication date, etc.) about these papers, as well as the manuscripts. These papers were classified into four groups according to their fields of research: basic medicine, health management, pharmacology and pharmacy, and public health and preventive medicine. Basic medicine concerns the structure of the human body and how it operates, as well as the causes of disease; immunology, pathogenic biology, pathology, and pathophysiology are its main focuses (Drews, 1999; Pandey, 2010). Health service management, or health management science, is a cross-disciplinary field of research that includes medicine, social science, and management science. It applies the theory and methods of social and management science to reveal the social, cultural, economic, and other impacts of population health. Its main focus is on public health policy, health law, hospital management, and medical ethics (Bener & Mazroei, 2010; Ozcan & Smith, 1998). Pharmacology and pharmacy deals with the interactions between drugs and animals or humans, particularly the mechanisms of drug action and the therapeutic and other uses of drugs (Reidenberg, 1991; Shcherbakova & Desselle, 2020). Public health and preventive medicine deals with groups or populations rather than individuals, aiming to reduce the risk of disease transmission. It mainly focuses on epidemics, hygienic toxicology, nutrition, and food hygiene (Brenner & Siu, 2009; Foege, 1994).
The general similarity index was generated using the Academic Misconduct Literature Check System (AMLC,
Plagiarism check systems are irreplaceable because they compare manuscripts against a comprehensive range of academic resources. However, the original general similarity index must be corrected, because invalid duplication in funding statements, addresses, references, etc. is not excluded (Manley, 2023; Zhang et al., 2014). We therefore generated a corrected general similarity index based on the original AMLC index, according to the following steps:
1. Excluding similar sources published in the same month as the checked paper or later.
2. Excluding invalid duplication, such as duplication in the funding, addresses, references, etc., and duplication of terminology and of the names of laws, regulations, policies, databases, and tools.
Similar sources with only invalid duplication were deleted directly in the AMLC, which then regenerated the corrected general similarity index automatically and synchronously. For similar sources with both valid and invalid duplication, the corrected general similarity index was calculated from the information listed in the AMLC (length of paper, length of all duplicated passages and of each duplicated part, etc.).
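The correction steps above can be sketched in code. This is only an illustration under stated assumptions, not the AMLC's actual procedure: the `Source` structure, its field names, and the formula (valid duplicated characters divided by paper length, with no overlap between duplicated passages) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One similar source reported by the checker (hypothetical structure)."""
    published: str      # publication month of the source, "YYYY-MM"
    valid_chars: int    # duplicated characters in the body text
    invalid_chars: int  # duplicated characters in funding, addresses, references, terminology, etc.


def corrected_similarity_index(paper_chars: int, paper_published: str,
                               sources: list[Source]) -> float:
    """Return a corrected similarity index as a percentage of the paper length.

    Assumption: duplicated passages from different sources do not overlap.
    """
    valid_total = 0
    for src in sources:
        # Step 1: skip sources published in the same month as the paper or later.
        # "YYYY-MM" strings compare correctly in lexicographic order.
        if src.published >= paper_published:
            continue
        # Step 2: count only valid duplication; a source with only invalid
        # duplication contributes nothing.
        valid_total += src.valid_chars
    return 100.0 * valid_total / paper_chars


# Made-up example: a 10,000-character paper published in June 2020.
sources = [
    Source(published="2019-03", valid_chars=600, invalid_chars=200),
    Source(published="2020-06", valid_chars=900, invalid_chars=0),    # same month: excluded
    Source(published="2018-11", valid_chars=0, invalid_chars=350),    # invalid only
]
index = corrected_similarity_index(10_000, "2020-06", sources)
```

In this made-up example only the 600 valid characters from the first source count, so the corrected index is 6.0%.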
We compared the AMLC general similarity index and the corrected index, which showed that the difference value (D-value) for all medical papers was 4.02 ± 10.43%. Pharmacology and pharmacy papers had the highest D-value (6.42 ± 10.46%), followed by basic medicine (5.21 ± 10.45%), and public health and preventive medicine (5.18 ± 10.41%), while those on health management had the lowest D-value (2.31 ± 10.34%). Mann-Whitney tests showed that the AMLC similarity index was significantly higher than the corrected index in all four fields (P < 0.001). Using the AMLC similarity index alone may lead to misjudgment of plagiarism; therefore the corrected general similarity index was used for further analysis.
A unified threshold for defining suspected plagiarism in medical journal papers has not yet been established. The Committee on Publication Ethics has defined text duplication and recommended actions for cases of suspected plagiarism, but the threshold for suspected plagiarism, that is, the extent to which text reuse is tolerated, is left to the editor’s discretion. Many studies on medical misconduct suggest a threshold of > 20%; for example, Swaan (2010) described that manuscripts submitted to
The data were not normally distributed; therefore, non-parametric tests, such as Kruskal-Wallis and Mann-Whitney tests, were performed using SPSS 26.0.
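For readers without SPSS, the pairwise comparison can be approximated in a few lines. The sketch below is not the SPSS procedure used in the study. It implements a two-sided Mann-Whitney U test with midranks for ties and a tie-corrected normal approximation, without continuity correction. The Kruskal-Wallis test is not covered, and the sample values are made up.

```python
import math


def midranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation is identical
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_value


# Made-up similarity indices (%) for two groups of papers.
reviews = [9.8, 12.1, 7.4, 15.3, 10.0]
research = [6.2, 7.4, 5.1, 8.0, 4.9, 6.6]
u_stat, p = mann_whitney_u(reviews, research)
```

The normal approximation is reasonable for samples of the size analyzed here. For very small samples an exact test is preferable.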
The average similarity index for 2,469 representative Chinese medical papers was 8.03 ± 7.58%, and pharmacology and pharmacy papers had the highest average similarity index (8.67 ± 5.92%), followed by health management (8.50 ± 8.85%), while those on basic medicine had the lowest value (7.33 ± 8.00%). Kruskal-Wallis tests showed that the average similarity index was significantly different among the four fields (P < 0.001), and Mann-Whitney tests showed that the average similarity index for pharmacology and pharmacy papers was significantly higher than that in other fields, except health management (Table 1).
Table 1. Similarity index for Chinese journal papers in four fields.
Field | No. papers | Mean corrected general similarity index ± SD (P value) |
---|---|---|
Basic Medicine | 265 | 7.33 ± 8.00%** (0.009) |
Health Management | 1,067 | 8.50 ± 8.85% (0.509) |
Pharmacology & Pharmacy | 143 | 8.67 ± 5.92% (benchmark) |
Public Health & Preventive Medicine | 994 | 7.61 ± 6.00%* (0.032) |
Total | 2,469 | 8.03 ± 7.58% (0.003) |
The benchmark subset for Mann-Whitney tests is Pharmacology & Pharmacy.
Similarity indices for papers in the different fields peaked in 2011–2013 and have decreased since 2014. Armen et al. (2017) also observed a peak of global interest in plagiarism in 2013. The Chinese government has strengthened its emphasis on academic ethics since 2013, and many policies and rules on the management of academic misconduct released by the Chinese Ministry of Education and Ministry of Science and Technology have now been officially implemented. For example, the
Most medical papers in our study were research articles (1,828, 74.04%). The average similarity index for review articles (9.77 ± 10.28%) was significantly higher than that for research articles (7.41 ± 6.26%). Mann-Whitney tests showed that the average similarity index for reviews in health management and in public health and preventive medicine was significantly higher than that for research articles (P < 0.001 and P = 0.037, respectively). However, there was no significant difference in basic medicine, and pharmacology and pharmacy contained no review articles (Table 2).
Table 2. Similarity indices for research and review articles.
Field | Research, no. papers (%) | Review, no. papers (%) | Research, mean ± SD | Review, mean ± SD | Mann-Whitney P |
---|---|---|---|---|---|
Basic Medicine | 228 (86.04%) | 37 (13.96%) | 7.18 ± 8.23% | 8.28 ± 6.52% | 0.123 |
Health Management | 524 (49.11%) | 543 (50.89%) | 7.08 ± 6.12% | 9.87 ± 10.67% | < 0.001 |
Pharmacology & Pharmacy | 143 (100.00%) | 0 (0.00%) | 8.67 ± 5.92% | N/A | N/A |
Public Health & Preventive Medicine | 933 (93.86%) | 61 (6.14%) | 7.47 ± 5.80% | 9.84 ± 8.32% | 0.037 |
Total | 1,828 (74.04%) | 641 (25.96%) | 7.41 ± 6.26% | 9.77 ± 10.28% | < 0.001 |
In total, 143 papers had similarity indices ≥ 15%, accounting for 5.79% of all papers. The percentage of papers suspected of plagiarism was highest in health management (7.31%), followed by pharmacology and pharmacy (6.99%), and lowest in public health and preventive medicine (3.82%). These results were similar to those in Table 1; papers on health management and on pharmacology and pharmacy were more likely to contain suspected plagiarism than papers in the other two fields (Table 3).
Table 3. Similarity indices according to field.
Field | No. papers | Similarity index < 15% | Similarity index ≥ 15% |
---|---|---|---|
Basic Medicine | 265 | 248 (93.58%) | 17 (6.42%) |
Health Management | 1,067 | 989 (92.69%) | 78 (7.31%) |
Pharmacology & Pharmacy | 143 | 133 (93.01%) | 10 (6.99%) |
Public Health & Preventive Medicine | 994 | 956 (96.18%) | 38 (3.82%) |
Total | 2,469 | 2,326 (94.21%) | 143 (5.79%) |
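The shares of papers at or above the 15% threshold can be recomputed directly from the counts in Table 3. The short sketch below does this. The variable and function names are illustrative only.

```python
# Counts from Table 3: field -> (total papers, papers with similarity index >= 15%)
counts = {
    "Basic Medicine": (265, 17),
    "Health Management": (1067, 78),
    "Pharmacology & Pharmacy": (143, 10),
    "Public Health & Preventive Medicine": (994, 38),
}


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


total_papers = sum(n for n, _ in counts.values())
total_flagged = sum(k for _, k in counts.values())

# Share of flagged papers within each field, and over all papers.
shares = {field: share(k, n) for field, (n, k) in counts.items()}
overall = share(total_flagged, total_papers)

for field, pct in shares.items():
    print(f"{field}: {pct:.2f}%")
print(f"Total: {total_flagged}/{total_papers} = {overall:.2f}%")
```

For these counts, the overall share evaluates to 143/2,469 ≈ 5.79%.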
In total, 70 review papers and 73 research papers had similarity indices ≥ 15%. Most of these reviews were on health management (85.71%). As shown in Table 2, the similarity indices of review articles were significantly higher than those of research articles, especially in health management, which may explain why this field had the highest percentage of papers suspected of plagiarism (Figure 2).
Each paper containing suspected plagiarism was split into three sections: introduction/background, data/methods, and results/discussion. According to the distribution of sections containing duplication, we classified all papers into two groups: plagiarism in a single section and plagiarism in multiple sections. Table 4 shows that medical papers with plagiarism in multiple sections accounted for a larger percentage (90.21%) than those with plagiarism in a single section (9.79%); the distributions of the two groups in all four fields aligned with this.
Table 4. Distribution of sections with duplication in papers with similarity index ≥ 15% according to field.
Field | Plagiarism in a single section | Plagiarism in multiple sections | Total |
---|---|---|---|
Basic Medicine | 0 (0.00%) | 17 (100.00%) | 17 |
Health Management | 11 (14.10%) | 67 (85.90%) | 78 |
Pharmacology & Pharmacy | 1 (10.00%) | 9 (90.00%) | 10 |
Public Health & Preventive Medicine | 2 (5.26%) | 36 (94.74%) | 38 |
Total | 14 (9.79%) | 129 (90.21%) | 143 |
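The grouping rule behind Table 4 can be written down explicitly. The sketch below is a minimal illustration, assuming each flagged paper is annotated with the set of sections in which duplication was found. The function name and the sample annotations are hypothetical.

```python
from collections import Counter

# The three sections into which each flagged paper was split.
SECTIONS = ("introduction/background", "data/methods", "results/discussion")


def classify(sections_with_duplication):
    """Return 'single' or 'multiple' depending on how many sections contain duplication."""
    found = set(sections_with_duplication)
    unknown = found - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown section(s): {sorted(unknown)}")
    if not found:
        raise ValueError("a flagged paper must have duplication in at least one section")
    return "single" if len(found) == 1 else "multiple"


# Made-up annotations for four flagged papers.
papers = [
    {"data/methods"},
    {"introduction/background", "results/discussion"},
    {"introduction/background", "data/methods", "results/discussion"},
    {"results/discussion"},
]
tally = Counter(classify(p) for p in papers)
```

Tallying the labels per field in the same way yields the two columns of Table 4.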
Papers in basic medicine were more likely to contain duplication in the introduction/background (94.12%) and results/discussion (94.12%) sections, followed by the data/methods section (76.47%). Most of these contained duplication in all three sections. Basic medicine studies the mechanisms of how the human body operates and how diseases occur. These mechanisms are complex and diverse and are generally explained in terms of several existing mechanisms. If the descriptions of the existing mechanisms in these papers were similar to those in previous papers, this may have caused a high similarity index. For example, a study on basic medicine analyzed the pathology of Turner’s syndrome from the perspective of chromosome karyotype, and most of the duplication in this paper concerned the pathology of the X and Y chromosomes in Turner’s syndrome, which had already been established by existing studies.
Papers on pharmacology and pharmacy were more likely to contain duplication in the data/methods section (Table 5). They mostly contained duplication of materials, devices, or procedures, as the descriptions of these parts were almost the same as in papers with similar topics or samples. These parts must be described in as much detail as possible, which may explain why so many papers in this field contained duplication in this section.
All health management papers containing suspected plagiarism contained duplication in the results/discussion section (100%), and 83.33% contained duplication in the introduction/background. Most were review papers, which require comprehensive analysis of existing studies or policies; this may explain why so many papers in this field contained duplication in these sections.
In total, 94.74% of flagged papers on public health and preventive medicine contained duplication in the results/discussion section, most of which was description of the demographic characteristics of respondents (age, education, gender, etc.). Papers in this field shared the same or similar questionnaires, especially those on similar research topics, which may explain why so many papers in this field contained duplication in this section.
Table 5. Sections with duplication in papers with similarity index ≥ 15% in four fields. Values are no. papers (% of such papers in the field).
Field | Introduction/background | Data/methods | Results/discussion |
---|---|---|---|
Basic Medicine | 16 (94.12%) | 13 (76.47%) | 16 (94.12%) |
Health Management | 65 (83.33%) | 18 (23.08%) | 78 (100.00%) |
Pharmacology & Pharmacy | 8 (80.00%) | 10 (100.00%) | 7 (70.00%) |
Public Health & Preventive Medicine | 32 (84.21%) | 29 (76.32%) | 36 (94.74%) |
Total | 121 (84.62%) | 70 (48.95%) | 137 (95.80%) |
This study analyzed the duplicated text in 2,469 representative Chinese medical journal papers and revealed the differences in their similarity indices among four medical fields. The trends in similarity index, differences in similarity index between review and research articles, and distribution of papers containing suspected plagiarism were analyzed according to these different fields. We generated a corrected similarity index based on the AMLC general similarity index, which was used for all analyses.
We found that the AMLC general similarity index was significantly higher than the corrected index (P < 0.001). The initial similarity index generated by the plagiarism check system therefore required correction, which aligns with previous studies (Baskaran et al., 2019). The accuracy of the plagiarism check system when identifying duplication in bibliographic sections of manuscripts still requires improvement. It is also necessary to build exclusion vocabulary lists for different medical fields, so that duplicates such as medical terms and their abbreviations, policy names, etc. are excluded automatically. Once corrected, however, a similarity index ≥ 15% should be treated as suspected plagiarism and investigated further.
The comparison of similarity indices among the four fields indicates that text duplication and plagiarism in papers on pharmacology and pharmacy and on health management require more attention than in the other two fields. Pharmacology and pharmacy papers had the highest similarity index (8.67 ± 5.92%), significantly higher than in all other fields except health management (8.50 ± 8.85%, P > 0.05); this value increased over time (2.80% before 2011 to 8.62% in 2020–2021). A survey of attitudes of pharmacy and medical biochemistry students toward plagiarism showed that 63% thought that plagiarism was not very important, while 59% thought that plagiarism was harmless (Pupovac et al., 2010). It is therefore important to enhance researchers’ and students’ awareness of academic misconduct in this field. Furthermore, papers suspected of plagiarism in pharmacology and pharmacy were more likely to contain duplication in the data/methods section (100%). Some papers subdivided their data into small individual publications, also known as salami slice publications (Martin, 2013). An increasing number of journals require a clinical trial registration number when medical manuscripts are submitted, which helps to avoid unnecessary duplication of experiments. For editors and reviewers, this also helps in recognizing salami slice publications in this field. Manuscripts with the same trial registration number deserve closer attention in order to avoid plagiarism (Menon et al., 2022).
The percentage of papers containing suspected plagiarism was highest in health management (7.31%); most of these were review papers, which had higher similarity indices than research papers. Researchers in health management should avoid quoting or copying large passages from policies or rules and should instead summarize them in their own words.
The average similarity index for papers in different fields has decreased since 2014, potentially due to policies and rules released by the Chinese government in relation to research integrity management. However, 143 papers containing suspected plagiarism were still published, as plagiarism checks were not required by every Chinese journal, especially before 2014. In addition, some journals conduct plagiarism checks, but not for each submission. The research integrity rules of some Chinese medical journals are not transparent to researchers, and the criterion for defining suspected plagiarism varies among journals. These factors may explain why papers with a high similarity index were published. Medical journals should publicize their rules on research ethics and plagiarism in as much detail as possible, and every manuscript submitted to a journal should be checked, rather than a random sample. Furthermore, the rules for judging plagiarism should be unified among journals in the same or related fields.
In this study, we performed a comprehensive analysis of text duplication in medical papers. We aimed to provide recommendations for the supervision of medical academic misconduct and the formation of criteria for defining suspected plagiarism in medical papers. However, only four medical fields were included in our research, and all papers in our study were supplied by researchers themselves. Additional analysis on a wider scale is required in the future.