A paper mill detection model based on citation manipulation paradigm
6 January 2025
About the article
Article category: Research Papers
Publication date: 6 January 2025
Page range: 167-187
Received: 10 May 2024
Accepted: 5 November 2024
DOI: https://doi.org/10.2478/jdis-2025-0003
© 2025 Jun Zhang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Mapping of citation patterns to meta-paths
| Pattern classification | Specific modality | Meta-paths (1-PP, 2-PPP, 3-PBP, 4-PJP, 5-PAuP, 6-PUP) |
|---|---|---|
| Manipulation of references | Circular citation of papers in a journal | ☑ ☑ |
| | Cross-referencing in a journal | ☑ ☑ ☑ |
| | Cross-referencing of papers between journals | ☑ ☑ ☑ |
| | Papers within journals citing the same papers | ☑ ☑ ☑ |
| Irrelevant citations | Citing papers that are not relevant to the topic | ☑ |
| | Carrying citations directly from cited references | ☑ ☑ |
| Aggregation of cited papers | Same publisher | ☑ ☑ |
| | Same journal | ☑ ☑ |
| | Same academic society | ☑ ☑ |
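The simplest instance of the first pattern, circular citation of papers within a journal, can be illustrated with a short sketch that flags reciprocal citation pairs (A cites B and B cites A) whose two papers belong to the same journal. The input format (`citations` as citing/cited pairs, `journal_of` as a paper-to-journal map) and the function name are hypothetical, only 2-cycles are handled, and this is not the detection procedure used in the paper.

```python
from collections import defaultdict


def reciprocal_pairs_by_journal(citations, journal_of):
    """Group reciprocal citation pairs (a cites b and b cites a) by shared journal.

    citations: iterable of (citing_paper, cited_paper) tuples (hypothetical format).
    journal_of: dict mapping a paper id to its journal id (hypothetical format).
    Only 2-cycles are detected; longer citation loops are ignored.
    """
    edges = set(citations)
    found = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-citations
        ja, jb = journal_of.get(a), journal_of.get(b)
        if ja is not None and ja == jb and (b, a) in edges:
            found[ja].add(tuple(sorted((a, b))))  # store each pair once
    return {journal: sorted(pairs) for journal, pairs in found.items()}


# Toy data: p3 and p4 cite each other but sit in different journals.
citations = [("p1", "p2"), ("p2", "p1"), ("p2", "p3"), ("p3", "p4"), ("p4", "p3")]
journal_of = {"p1": "J1", "p2": "J1", "p3": "J2", "p4": "J3"}
print(reciprocal_pairs_by_journal(citations, journal_of))  # {'J1': [('p1', 'p2')]}
```

Extending this to longer loops would require a cycle search on the citation graph restricted to each journal; the pairwise check is kept here only because it is easy to verify by hand.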
Results of the ablation experiments
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
H | 0.101 | 0.865 | 0.186 | 0.091 | 0.129 |
T+H | 0.057 | 0.857 | 0.108 | 0.038 | 0.039 |
T+L | 0.231 | 0.041 | 0.069 | 0.018 | 0.058 |
H+L | 0.428 | 0.439 | 0.434 | 0.234 | 0.414 |
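The NMI and ARI columns are standard scores for comparing two partitions of the same items. The sketch below computes both from two label lists in plain Python. The NMI is normalized by the arithmetic mean of the two entropies (the default of scikit-learn's `normalized_mutual_info_score`); this normalization is an assumption, because the one used in the paper is not given here. The function names are hypothetical. ARI is zero in expectation for random labelings and can be negative, which is how a value such as RGCN's -0.01 in the comparison of experimental results can occur.

```python
import math
from collections import Counter


def _pairs(n):
    """Number of unordered pairs among n items, i.e. C(n, 2)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))  # contingency table cells
    rows, cols = Counter(labels_true), Counter(labels_pred)
    index = sum(_pairs(c) for c in joint.values())
    sum_rows = sum(_pairs(c) for c in rows.values())
    sum_cols = sum(_pairs(c) for c in cols.values())
    expected = sum_rows * sum_cols / _pairs(n)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions (e.g. one cluster each)
        return 1.0
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)
    mi = sum(c / n * math.log(n * c / (rows[i] * cols[j])) for (i, j), c in joint.items())
    h_true = -sum(c / n * math.log(c / n) for c in rows.values())
    h_pred = -sum(c / n * math.log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    if denom == 0:  # both partitions consist of a single cluster
        return 1.0
    return mi / denom


# Relabeling does not matter; a "checkerboard" split scores below chance.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # negative
print(normalized_mutual_info([0, 0, 1, 1], [0, 0, 1, 2]))
```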
Details of the heterogeneous graph dataset
| Edge (A-B) | Number of A | Number of B | Meta-path | Number of meta-path instances |
|---|---|---|---|---|
| Paper-Paper | 25,900 | 25,900 | PP | 549,452 |
| Paper-Paper | 25,900 | 25,900 | PPP | 1,339,889 |
| Paper-Journal | 25,900 | 3,226 | PJP | 2,742,868 |
| Paper-Publisher | 25,900 | 285 | PBP | 81,173,164 |
| Paper-Auxiliary_Paper | 25,900 | 500w (5 million) | PUP | 62,193 |
| Paper-Auxiliary_Paper | 25,900 | 500w (5 million) | PUP5 | 45,854 |
| Paper-Academic Society | 25,900 | 16,010 | PAuP | 18,016 |

Feature dimension and dataset split (common to all rows above):

| Feature dimension | Training set | Validation set | Testing set |
|---|---|---|---|
| 768 | 14,258 | 3,872 | 7,770 |
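Meta-paths of the form P-X-P (for example PJP, PBP and PAuP) connect two papers that share an intermediate node X. The sketch below derives such meta-path neighbours from a paper-to-X relation and counts instances under one possible convention (ordered pairs of distinct papers per intermediate node). The function names and input format are hypothetical, and because the paper's exact counting convention is not stated here, this is not expected to reproduce the counts in the table. The final check only confirms that the training, validation and testing sets add up to the 25,900 paper nodes.

```python
from collections import defaultdict


def group_by_intermediate(paper_to_mid):
    """Invert a paper -> {intermediate nodes} map into intermediate -> {papers}."""
    mid_to_papers = defaultdict(set)
    for paper, mids in paper_to_mid.items():
        for mid in mids:
            mid_to_papers[mid].add(paper)
    return mid_to_papers


def metapath_neighbors(paper_to_mid):
    """Papers linked by a P-X-P meta-path, i.e. sharing at least one X node."""
    neighbors = {paper: set() for paper in paper_to_mid}
    for papers in group_by_intermediate(paper_to_mid).values():
        for paper in papers:
            neighbors[paper] |= papers - {paper}
    return neighbors


def count_metapath_instances(paper_to_mid):
    """Count ordered instances p-X-q with p != q (one of several possible conventions)."""
    return sum(len(ps) * (len(ps) - 1) for ps in group_by_intermediate(paper_to_mid).values())


# Split sizes from the dataset table: they add up to the 25,900 paper nodes.
TRAIN, VALIDATION, TEST, NUM_PAPERS = 14_258, 3_872, 7_770, 25_900
assert TRAIN + VALIDATION + TEST == NUM_PAPERS

# Toy paper -> journal map (hypothetical ids) for the PJP meta-path.
paper_to_journal = {"p1": {"J1"}, "p2": {"J1"}, "p3": {"J1"}, "p4": {"J2"}}
print(sorted(metapath_neighbors(paper_to_journal)["p1"]))
print(count_metapath_instances(paper_to_journal))
```

Grouping by the intermediate node first avoids materializing the full paper-by-paper product, although hub nodes (a large publisher, for instance) still produce quadratically many pairs, which is consistent with PBP having by far the largest count in the table.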
Meta-path weights in the heterogeneous graph attention network
Name of Meta-path | Meta-path weights |
---|---|
PP | 0.0049 |
PPP | 0.0052 |
PJP | 0.4129 |
PBP | 0.0052 |
PUP | 0.0111 |
PUP5 | 0.3353 |
PAuP | 0.2255 |
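The reported weights sum to 1.0001, that is, to 1 up to rounding, which is what softmax-normalized semantic attention would produce. The sketch below assumes HAN-style semantic-level attention: each meta-path's importance is the node-averaged score q·tanh(W z + b), the importances are passed through a softmax to obtain weights β, and the final node embedding is the β-weighted sum of the meta-path-specific embeddings. The function name, the toy values of `W`, `b` and `q`, and the use of plain lists are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(embeddings_per_path, W, b, q):
    """Fuse meta-path-specific node embeddings with semantic-level attention.

    embeddings_per_path: list over meta-paths; each item is a list of node
        embeddings (lists of floats) from that meta-path's node-level layer.
    W (d' x d), b (d'), q (d'): projection, bias and attention vector shared by all meta-paths.
    Returns (beta, fused): beta[p] is the weight of meta-path p and
    fused[i] = sum_p beta[p] * embeddings_per_path[p][i].
    """
    def score(z):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + b_j) for row, b_j in zip(W, b)]
        return sum(q_j * h_j for q_j, h_j in zip(q, hidden))

    # Importance of each meta-path, averaged over all nodes.
    raw = [sum(score(z) for z in Z) / len(Z) for Z in embeddings_per_path]
    beta = softmax(raw)
    num_nodes, dim = len(embeddings_per_path[0]), len(embeddings_per_path[0][0])
    fused = [[sum(beta[p] * embeddings_per_path[p][i][k] for p in range(len(beta)))
              for k in range(dim)]
             for i in range(num_nodes)]
    return beta, fused


# Weights reported in the table above: they sum to 1 up to rounding.
REPORTED_WEIGHTS = {"PP": 0.0049, "PPP": 0.0052, "PJP": 0.4129, "PBP": 0.0052,
                    "PUP": 0.0111, "PUP5": 0.3353, "PAuP": 0.2255}
assert abs(sum(REPORTED_WEIGHTS.values()) - 1.0) < 1e-3

# Toy example with two meta-paths and two nodes (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
Z_a = [[1.0, 0.0], [1.0, 0.0]]  # embeddings from meta-path A
Z_b = [[0.0, 0.0], [0.0, 0.0]]  # embeddings from meta-path B
beta, fused = semantic_attention([Z_a, Z_b], W, b, q)
print(beta)
```

Under this reading, PJP, PUP5 and PAuP together receive about 97% of the attention mass in the table, while PP, PPP, PBP and PUP contribute little to the fused representation.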
Comparison of experimental results
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
RGCN | 0.047 | 0.794 | 0.089 | 0.013 | -0.01 |
HGT | 0.095 | 0.303 | 0.145 | 0.023 | 0.091 |
GIN | 0.333 | 0.954 | 0.494 | 0.325 | 0.438 |
RGAT | 0.029 | 0.845 | 0.058 | 0.001 | 0.007 |
LDA-Title | 0.381 | 0.1039 | 0.1633 | ||
LDA-Abstract | 0.549 | 0.437 | 0.487 | ||
LDA-Full-text | 0.438 | 0.259 | 0.326 | ||
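F1 is conventionally the harmonic mean of precision and recall. For the rows checked below, recomputing it from the rounded precision and recall reproduces the reported F1 to within about 0.001. This is consistent with the standard definition, although the paper's exact computation is not shown here. The helper name is hypothetical, and the values are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
REPORTED = {
    "GIN": (0.333, 0.954, 0.494),
    "HGT": (0.095, 0.303, 0.145),
    "LDA-Title": (0.381, 0.1039, 0.1633),
    "LDA-Abstract": (0.549, 0.437, 0.487),
}

for model, (p, r, f1) in REPORTED.items():
    print(f"{model}: recomputed {f1_score(p, r):.4f}, reported {f1}")
```

Because the harmonic mean is dominated by the smaller of the two values, models such as RGCN and RGAT, which combine high recall with very low precision, still end up with F1 below 0.1.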