A paper mill detection model based on citation manipulation paradigm
06 Jan 2025
About this article
Article category: Research Papers
Published online: 06 Jan 2025
Pages: 167 - 187
Received: 10 May 2024
Accepted: 05 Nov 2024
DOI: https://doi.org/10.2478/jdis-2025-0003
Keywords
© 2025 Jun Zhang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Matching table for citation patterns and meta-paths
Pattern Classification | Specific modalities | 1-PP | 2-PPP | 3-PBP | 4-PJP | 5-PAuP | 6-PUP |
---|---|---|---|---|---|---|---|
Manipulation of References | Circular citation of papers in a journal | ☑ | ☑ | | | | |
Manipulation of References | Cross referencing in a journal | ☑ | ☑ | ☑ | | | |
Manipulation of References | Cross-referencing of papers between journals | ☑ | ☑ | ☑ | | | |
Manipulation of References | Papers within journals citing the same papers | ☑ | ☑ | ☑ | | | |
Irrelevant Citations | Citing papers that are not relevant to the topic | ☑ | | | | | |
Irrelevant Citations | Carrying citations directly from cited references | ☑ | ☑ | | | | |
Aggregation of Cited Papers | Same publisher | ☑ | ☑ | | | | |
Aggregation of Cited Papers | Same journal | ☑ | ☑ | | | | |
Aggregation of Cited Papers | Same academic society | ☑ | ☑ | | | | |
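To make the meta-path notation in the table concrete, the following is a minimal sketch of how paper pairs along a P-X-P meta-path could be enumerated. It assumes PJP reads as Paper-Journal-Paper, i.e. two papers linked through a shared journal. The function name `meta_path_pairs` and the toy identifiers (`p1`, `jA`, and so on) are hypothetical and are not taken from the article; the authors' actual graph construction may differ.

```python
from collections import defaultdict
from itertools import combinations


def meta_path_pairs(paper_to_node):
    """Return unordered paper pairs that share an intermediate node (P-X-P).

    paper_to_node: iterable of (paper_id, node_id) edges, e.g. paper-journal.
    """
    # Group papers by the intermediate node they attach to.
    members = defaultdict(set)
    for paper, node in paper_to_node:
        members[node].add(paper)

    # Every pair of papers attached to the same node is one meta-path link.
    pairs = set()
    for papers in members.values():
        for a, b in combinations(sorted(papers), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical toy data: (paper, journal) edges.
paper_journal = [("p1", "jA"), ("p2", "jA"), ("p3", "jB"), ("p4", "jA")]
pjp = meta_path_pairs(paper_journal)
print(sorted(pjp))
```

The same helper would apply to any other P-X-P meta-path by passing the corresponding edge list, for example paper-publisher edges for PBP.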
Results of the ablation experiments
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
H | 0.101 | 0.865 | 0.186 | 0.091 | 0.129 |
T+H | 0.057 | 0.857 | 0.108 | 0.038 | 0.039 |
T+L | 0.231 | 0.041 | 0.069 | 0.018 | 0.058 |
H+L | 0.428 | 0.439 | 0.434 | 0.234 | 0.414 |
Heterogeneous graph dataset details
Edge (A-B) | Num of A | Num of B | Meta-path | Num of Meta-path | Feature dimension | Training set | Validation set | Testing set |
---|---|---|---|---|---|---|---|---|
Paper-Paper | 25,900 | 25,900 | PP | 549,452 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Paper | 25,900 | 25,900 | PPP | 1,339,889 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Journal | 25,900 | 3,226 | PJP | 2,742,868 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Publisher | 25,900 | 285 | PBP | 81,173,164 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Auxiliary_Paper | 25,900 | 500w (≈5 million) | PUP | 62,193 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Auxiliary_Paper | 25,900 | 500w (≈5 million) | PUP5 | 45,854 | 768 | 14,258 | 3,872 | 7,770 |
Paper-Academic Society | 25,900 | 16,010 | PAuP | 18,016 | 768 | 14,258 | 3,872 | 7,770 |
Meta-path weights in the heterogeneous graph attention network
Name of Meta-path | Meta-path weights |
---|---|
PP | 0.0049 |
PPP | 0.0052 |
PJP | 0.4129 |
PBP | 0.0052 |
PUP | 0.0111 |
PUP5 | 0.3353 |
PAuP | 0.2255 |
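As a quick reading aid for the weights above, the sketch below ranks the meta-paths and checks how close the weights come to summing to 1. The values are copied from the table; the variable names are illustrative only. A sum near 1 would be consistent with weights normalized across meta-paths, but the table alone does not confirm how they were computed.

```python
# Meta-path weights copied from the table above.
weights = {
    "PP": 0.0049,
    "PPP": 0.0052,
    "PJP": 0.4129,
    "PBP": 0.0052,
    "PUP": 0.0111,
    "PUP5": 0.3353,
    "PAuP": 0.2255,
}

# Sum of all weights (rounding in the table may leave a small residue).
total = sum(weights.values())

# Meta-paths ordered from most to least influential.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Share of the total weight carried by the three largest meta-paths.
top3_share = sum(w for _, w in ranked[:3]) / total

for name, w in ranked:
    print(f"{name:5s} {w:.4f}")
print(f"sum of weights: {total:.4f}")
print(f"top-3 share:    {top3_share:.3f}")
```

Under these numbers, PJP, PUP5, and PAuP carry almost all of the weight, while the direct citation paths PP and PPP contribute very little.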
Comparison of experimental results
Model | Precision | Recall | F1-score | NMI | ARI |
---|---|---|---|---|---|
RGCN | 0.047 | 0.794 | 0.089 | 0.013 | -0.01 |
HGT | 0.095 | 0.303 | 0.145 | 0.023 | 0.091 |
GIN | 0.333 | 0.954 | 0.494 | 0.325 | 0.438 |
RGAT | 0.029 | 0.845 | 0.058 | 0.001 | 0.007 |
LDA-Title | 0.381 | 0.1039 | 0.1633 | ||
LDA-Abstract | 0.549 | 0.437 | 0.487 | ||
LDA-Full-text | 0.438 | 0.259 | 0.326 | ||
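The F1-score column can be cross-checked against the precision and recall columns using the standard harmonic-mean definition. The sketch below does this for three rows of the table; the helper name `f1_score` and the choice of rows are illustrative, and the tolerance only allows for rounding in the reported values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "GIN": (0.333, 0.954, 0.494),
    "LDA-Abstract": (0.549, 0.437, 0.487),
    "LDA-Title": (0.381, 0.1039, 0.1633),
}

# Absolute gap between the recomputed and the reported F1 for each model.
gaps = {}
for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    gaps[name] = abs(computed - reported)
    print(f"{name:13s} computed={computed:.4f} reported={reported}")
```

For these rows the recomputed values agree with the reported ones to within rounding.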