A paper mill detection model based on citation manipulation paradigm
Jan 06, 2025
About this article
Article Category: Research Papers
Published Online: Jan 06, 2025
Page range: 167 - 187
Received: May 10, 2024
Accepted: Nov 05, 2024
DOI: https://doi.org/10.2478/jdis-2025-0003
© 2025 Jun Zhang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Matching table for citation patterns and meta-paths
Pattern Classification | Specific modalities | Meta-paths: 1-PP | 2-PPP | 3-PBP | 4-PJP | 5-PAuP | 6-PUP |
---|---|---|---|---|---|---|---|
Manipulation of References | Circular citation of papers in a journal | ☑ | ☑ | |||||
Cross referencing in a journal | ☑ | ☑ | ☑ | |||
Cross-referencing of papers between journals | ☑ | ☑ | ☑ | ||||
Papers within journals citing the same papers | ☑ | ☑ | ☑ | ||||
Irrelevant Citations | Citing papers that are not relevant to the topic | ☑ | |||||
Carrying citations directly from cited references | ☑ | ☑ | |||||
Aggregation of Cited Papers | Same publisher | ☑ | ☑ | ||||
Same journal | ☑ | ☑ | |||||
Same academic society | ☑ | ☑ |
Experimental results of ablation experiments
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
H | 0.101 | 0.865 | 0.186 | 0.091 | 0.129 |
T+H | 0.057 | 0.857 | 0.108 | 0.038 | 0.039 |
T+L | 0.231 | 0.041 | 0.069 | 0.018 | 0.058 |
H+L | 0.428 | 0.439 | 0.434 | 0.234 | 0.414 |
Heterogeneous graph dataset details
Edge (A-B) | Num of A | Num of B | Meta-path | Num of Meta-path |
---|---|---|---|---|
Paper-Paper | 25,900 | 25,900 | PP | 549,452 |
Paper-Paper | 25,900 | 25,900 | PPP | 1,339,889 |
Paper-Journal | 25,900 | 3,226 | PJP | 2,742,868 |
Paper-Publisher | 25,900 | 285 | PBP | 81,173,164 |
Paper-Auxiliary_Paper | 25,900 | 5 million | PUP | 62,193 |
Paper-Auxiliary_Paper | 25,900 | 5 million | PUP5 | 45,854 |
Paper-Academic Society | 25,900 | 16,010 | PAuP | 18,016 |
Feature dimension: 768. Training set: 14,258; validation set: 3,872; testing set: 7,770.
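To make the meta-path counts in the table above more concrete, the following is a minimal sketch of how instances of a two-hop meta-path such as PJP (paper, journal, paper) could be enumerated from paper-journal edges. The toy edge list and the helper name `metapath_pairs` are hypothetical, and the source does not say whether instances are counted as ordered or unordered pairs; unordered pairs are assumed here.

```python
from collections import defaultdict
from itertools import combinations


def metapath_pairs(edges):
    """Return unordered paper pairs linked through a shared intermediate node.

    `edges` is an iterable of (paper, intermediate) tuples, e.g. (paper, journal)
    for the PJP meta-path. Counting unordered pairs is an assumption.
    """
    papers_by_node = defaultdict(set)
    for paper, node in edges:
        papers_by_node[node].add(paper)

    pairs = set()
    for papers in papers_by_node.values():
        # Every two distinct papers attached to the same node form one instance.
        for a, b in combinations(sorted(papers), 2):
            pairs.add((a, b))
    return pairs


# Toy data (hypothetical): three papers in journal j1, one paper in journal j2.
toy_edges = [("p1", "j1"), ("p2", "j1"), ("p3", "j1"), ("p4", "j2")]
pjp = metapath_pairs(toy_edges)
print(sorted(pjp))
```

Under this construction, n papers attached to the same intermediate node yield n(n-1)/2 pairs. With only 285 publisher nodes for 25,900 papers, this may explain why PBP has by far the most instances in the table.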
Meta-path weights in the heterogeneous graph attention network
Name of Meta-path | Meta-path weights |
---|---|
PP | 0.0049 |
PPP | 0.0052 |
PJP | 0.4129 |
PBP | 0.0052 |
PUP | 0.0111 |
PUP5 | 0.3353 |
PAuP | 0.2255 |
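The weights above sum to roughly one, which is consistent with a softmax-normalized semantic-level attention of the kind used in HAN-style models; whether the authors' network follows exactly that scheme is an assumption. The sketch below checks the sum, ranks the meta-paths, and shows how such weights would combine per-meta-path embeddings of a paper as a weighted sum. The function name `fuse` and the embedding values are made up for illustration.

```python
# Meta-path weights copied from the table above.
weights = {
    "PP": 0.0049, "PPP": 0.0052, "PJP": 0.4129, "PBP": 0.0052,
    "PUP": 0.0111, "PUP5": 0.3353, "PAuP": 0.2255,
}


def fuse(embeddings, weights):
    """Weighted sum of per-meta-path embeddings: z = sum_p beta_p * z_p."""
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for name, vec in embeddings.items():
        for i, value in enumerate(vec):
            fused[i] += weights[name] * value
    return fused


total = sum(weights.values())
ranked = sorted(weights, key=weights.get, reverse=True)

# Hypothetical 2-dimensional embeddings of one paper, one per meta-path.
toy = {name: [1.0, float(i)] for i, name in enumerate(weights)}
z = fuse(toy, weights)

print(round(total, 4), ranked[:3])
```

Read this way, PJP, PUP5, and PAuP together carry almost all of the weight, while the direct citation paths PP and PPP contribute very little to the fused representation.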
Comparison of experimental results
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
RGCN | 0.047 | 0.794 | 0.089 | 0.013 | -0.01 |
HGT | 0.095 | 0.303 | 0.145 | 0.023 | 0.091 |
GIN | 0.333 | 0.954 | 0.494 | 0.325 | 0.438 |
RGAT | 0.029 | 0.845 | 0.058 | 0.001 | 0.007 |
LDA-Title | 0.381 | 0.1039 | 0.1633 | | |
LDA-Abstract | 0.549 | 0.437 | 0.487 | | |
LDA-Full-text | 0.438 | 0.259 | 0.326 | | |
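The F1-scores in the table can be related to the precision and recall columns with the standard harmonic-mean definition, F1 = 2PR / (P + R). The source does not spell out the formula, so this definition is an assumption; the sketch below recomputes F1 for a few rows, and small differences from the reported values are expected because the inputs are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table above.
rows = {
    "GIN": (0.333, 0.954, 0.494),
    "LDA-Title": (0.381, 0.1039, 0.1633),
    "LDA-Abstract": (0.549, 0.437, 0.487),
    "LDA-Full-text": (0.438, 0.259, 0.326),
}

for model, (p, r, reported) in rows.items():
    print(f"{model}: computed {f1_score(p, r):.3f}, reported {reported}")
```

The harmonic mean penalizes imbalance between the two inputs, which is why rows with high recall but very low precision, such as RGCN and RGAT, still end up with a low F1-score.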