A paper mill detection model based on citation manipulation paradigm
06 Jan. 2025
About this article
Article category: Research Papers
Published online: 06 Jan. 2025
Pages: 167 - 187
Received: 10 May 2024
Accepted: 05 Nov. 2024
DOI: https://doi.org/10.2478/jdis-2025-0003
© 2025 Jun Zhang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.

Matching table for citation patterns and meta-paths
Pattern classification | Specific modalities | Meta-paths matched (of 1-PP, 2-PPP, 3-PBP, 4-PJP, 5-PAuP, 6-PUP) |
---|---|---|
Manipulation of References | Circular citation of papers in a journal | ☑ ☑ |
Manipulation of References | Cross-referencing in a journal | ☑ ☑ ☑ |
Manipulation of References | Cross-referencing of papers between journals | ☑ ☑ ☑ |
Manipulation of References | Papers within journals citing the same papers | ☑ ☑ ☑ |
Irrelevant Citations | Citing papers that are not relevant to the topic | ☑ |
Irrelevant Citations | Carrying citations directly from cited references | ☑ ☑ |
Aggregation of Cited Papers | Same publisher | ☑ ☑ |
Aggregation of Cited Papers | Same journal | ☑ ☑ |
Aggregation of Cited Papers | Same academic society | ☑ ☑ |
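
A meta-path such as PJP (paper, journal, paper) links two papers that share an intermediate node, here a journal. The following is a minimal sketch of how such pairs could be enumerated, assuming a toy paper-to-journal mapping; the function name `metapath_pairs` and the identifiers `p1`, `j1`, and so on are hypothetical and are not taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def metapath_pairs(paper_to_nodes):
    """Return unordered paper pairs connected through a shared intermediate node.

    paper_to_nodes maps a paper id to the ids of the intermediate nodes it is
    linked to (journals for PJP, publishers for PBP, and so on).
    """
    # Group papers by the intermediate node they attach to.
    members = defaultdict(set)
    for paper, nodes in paper_to_nodes.items():
        for node in nodes:
            members[node].add(paper)

    # Every two papers in the same group form one meta-path instance.
    pairs = set()
    for papers in members.values():
        for a, b in combinations(sorted(papers), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical toy data: three papers, two journals.
paper_journal = {"p1": ["j1"], "p2": ["j1"], "p3": ["j2"]}
pjp = metapath_pairs(paper_journal)
print(sorted(pjp))
```

The same helper would apply to any meta-path of the form paper, X, paper by swapping the mapping, which is one reason a set of pairs is used here rather than a journal-specific structure.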
Results of the ablation experiments
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
H | 0.101 | 0.865 | 0.186 | 0.091 | 0.129 |
T+H | 0.057 | 0.857 | 0.108 | 0.038 | 0.039 |
T+L | 0.231 | 0.041 | 0.069 | 0.018 | 0.058 |
H+L | 0.428 | 0.439 | 0.434 | 0.234 | 0.414 |
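
The F1-score column is the harmonic mean of the precision and recall columns. As a quick sketch (the helper name `f1_score` is ours, not the authors'), the H+L row can be recomputed from its reported precision and recall. Because the table entries are already rounded to three decimals, the recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the H+L row in the ablation table.
f1_hl = f1_score(0.428, 0.439)
print(round(f1_hl, 3))
```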
Heterogeneous graph dataset details
Edge (A-B) | Num of A | Num of B | Meta-path | Num of Meta-path |
---|---|---|---|---|
Paper-Paper | 25,900 | 25,900 | PP | 549,452 |
Paper-Paper | 25,900 | 25,900 | PPP | 1,339,889 |
Paper-Journal | 25,900 | 3,226 | PJP | 2,742,868 |
Paper-Publisher | 25,900 | 285 | PBP | 81,173,164 |
Paper-Auxiliary_Paper | 25,900 | about 5 million (500w) | PUP | 62,193 |
Paper-Auxiliary_Paper | 25,900 | about 5 million (500w) | PUP5 | 45,854 |
Paper-Academic Society | 25,900 | 16,010 | PAuP | 18,016 |

The feature dimension and split sizes apply to the whole dataset; the three splits sum to the 25,900 papers.

Feature dimension | Training set | Validation set | Testing set |
---|---|---|---|
768 | 14,258 | 3,872 | 7,770 |
Meta-path weights in the heterogeneous graph attention network
Name of Meta-path | Meta-path weights |
---|---|
PP | 0.0049 |
PPP | 0.0052 |
PJP | 0.4129 |
PBP | 0.0052 |
PUP | 0.0111 |
PUP5 | 0.3353 |
PAuP | 0.2255 |
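
The weights above sum to roughly 1, and three meta-paths (PJP, PUP5, PAuP) carry most of the mass. The sketch below assumes the weights act as semantic-level attention coefficients that combine per-meta-path paper embeddings by a weighted sum, as is common in heterogeneous graph attention models; this is an assumption about the architecture rather than a detail confirmed here. The `fuse` helper and the 2-dimensional toy embeddings are hypothetical.

```python
# Meta-path weights as reported in the table above.
weights = {
    "PP": 0.0049,
    "PPP": 0.0052,
    "PJP": 0.4129,
    "PBP": 0.0052,
    "PUP": 0.0111,
    "PUP5": 0.3353,
    "PAuP": 0.2255,
}


def fuse(embeddings, weights):
    """Weighted sum of per-meta-path embedding vectors for one paper."""
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for name, vec in embeddings.items():
        w = weights[name]
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused


# Hypothetical embeddings: every meta-path yields the same toy vector,
# so the fused vector is simply that vector scaled by the weight total.
toy = {name: [1.0, 2.0] for name in weights}
fused = fuse(toy, weights)

total = sum(weights.values())
top3 = sorted(weights, key=weights.get, reverse=True)[:3]
top3_share = sum(weights[name] for name in top3)
print(top3, round(top3_share, 4), round(total, 4))
```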
Comparison of experimental results
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
RGCN | 0.047 | 0.794 | 0.089 | 0.013 | -0.01 |
HGT | 0.095 | 0.303 | 0.145 | 0.023 | 0.091 |
GIN | 0.333 | 0.954 | 0.494 | 0.325 | 0.438 |
RGAT | 0.029 | 0.845 | 0.058 | 0.001 | 0.007 |
LDA-Title | 0.381 | 0.1039 | 0.1633 | - | - |
LDA-Abstract | 0.549 | 0.437 | 0.487 | - | - |
LDA-Full-text | 0.438 | 0.259 | 0.326 | - | - |
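
The ARI column reports the adjusted Rand index, which corrects pair-counting agreement for chance and can therefore be negative, as with the -0.01 reported for RGCN. The following is a minimal pure-Python sketch of the standard formula, written for illustration; the function name and the toy labels are ours, and the authors' evaluation code may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings agree trivially

    # Pairs of items placed together in both, in the true, and in the
    # predicted partition, respectively.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labelings.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, renamed labels
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
print(same, opposite)
```

The second toy case yields a negative value because the two partitions agree on fewer pairs than random labelings with the same cluster sizes would on average.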