Open access

A paper mill detection model based on citation manipulation paradigm

06 Jan 2025


Figure 1. Three “Paper mills” citation patterns.

Figure 2. The process of constructing meta-paths from heterogeneous graphs.

Figure 3. The framework of our proposed PDCN model.

Figure 4. Hierarchical attention structure.

Figure 5. Dataset construction process.

Figure 6. “Paper mills” cases.

Matching table for citation patterns and meta-paths

Meta-paths considered: 1-PP*, 2-PPP, 3-PBP, 4-PJP, 5-PAuP, 6-PUP
Pattern classification: specific modalities
Manipulation of References: circular citation of papers in a journal; cross-referencing in a journal; cross-referencing of papers between journals; papers within journals citing the same papers
Irrelevant Citations: citing papers that are not relevant to the topic; carrying citations directly from cited references
Aggregation of Cited Papers: same publisher; same journal; same academic society
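
To make the first modality concrete, circular citation within a journal can be flagged by searching for cycles in the citation graph restricted to a single journal. The sketch below does this with Tarjan's strongly-connected-components algorithm. It is an illustration under assumed inputs (a list of (citing, cited) paper-ID pairs and a hypothetical `journal_of` mapping from paper to journal), not the PDCN model, which captures such patterns through meta-paths rather than explicit cycle search.

```python
from collections import defaultdict


def circular_groups(citations, journal_of):
    """Hypothetical helper (not part of PDCN): same-journal papers that cite each other in a cycle.

    citations:  iterable of (citing_id, cited_id) pairs
    journal_of: dict mapping paper id -> journal id
    Keeps only intra-journal edges, then returns every strongly connected
    component (Tarjan's algorithm) that contains more than one paper.
    """
    graph = defaultdict(list)
    nodes = set()
    for citing, cited in citations:
        journal = journal_of.get(citing)
        if journal is not None and journal == journal_of.get(cited):
            graph[citing].append(cited)
            nodes.update((citing, cited))

    index, low = {}, {}          # discovery order / lowest reachable discovery order
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:   # v is the root of a component: pop the component off
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) > 1:   # a single paper is not a citation ring
                groups.append(sorted(component))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return groups


# Toy data: p1 -> p2 -> p3 -> p1 is a ring inside journal J1; p5 -> p6 crosses journals.
citations = [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p3", "p4"), ("p5", "p6")]
journal_of = {"p1": "J1", "p2": "J1", "p3": "J1", "p4": "J1", "p5": "J2", "p6": "J3"}
print(circular_groups(citations, journal_of))
```

The recursive depth-first search keeps the sketch short; a citation graph the size of a real journal would call for an iterative version to stay within Python's recursion limit.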

Results of the ablation experiments

Model Precision Recall F1-score NMI ARI
H 0.101 0.865 0.186 0.091 0.129
T+H 0.057 0.857 0.108 0.038 0.039
T+L 0.231 0.041 0.069 0.018 0.058
H+L 0.428 0.439 0.434 0.234 0.414
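
Besides precision, recall and F1, the tables report NMI and ARI, which score the predicted labels as a clustering against the ground-truth labels. The sketch below computes both from scratch in plain Python. It assumes the arithmetic-mean normalization for NMI (the scikit-learn default) and the standard Hubert-Arabie adjusted Rand index, since the exact variants used are not stated here; the helper names are illustrative. ARI can fall below zero when agreement is worse than chance, as the RGCN row of the comparison table shows.

```python
import math
from collections import Counter


def _pairs(n):
    """Number of unordered pairs among n items (n choose 2)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie) between two labelings of the same items."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))            # contingency table cells
    sum_cells = sum(_pairs(c) for c in cells.values())
    sum_true = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_pred = sum(_pairs(c) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / _pairs(n)                 # index expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:                                    # degenerate partitions
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization: I(T; P) / ((H(T) + H(P)) / 2)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    mean_entropy = (h_true + h_pred) / 2
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


truth = [1, 1, 0, 0, 0, 0]   # toy labels: 1 = paper-mill paper, 0 = regular paper
pred = [1, 0, 0, 0, 0, 1]
print(adjusted_rand_index(truth, pred), normalized_mutual_info(truth, pred))
```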

Heterogeneous graph dataset details

Edge (A-B)              Num of A  Num of B   Meta-path  Num of meta-path instances
Paper-Paper             25,900    25,900     PP         549,452
Paper-Paper             25,900    25,900     PPP        1,339,889
Paper-Journal           25,900    3,226      PJP        2,742,868
Paper-Publisher         25,900    285        PBP        81,173,164
Paper-Auxiliary_Paper   25,900    5,000,000  PUP        62,193
Paper-Auxiliary_Paper   25,900    5,000,000  PUP5       45,854
Paper-Academic Society  25,900    16,010     PAuP       18,016

Feature dimension (all rows): 768. Training / validation / testing set: 14,258 / 3,872 / 7,770 papers (25,900 in total).
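
Meta-path instances of the P-X-P kind can be enumerated by grouping papers around the shared intermediate node, and PPP instances by following two citation hops. The sketch below assumes unordered pairs of distinct papers for P-X-P meta-paths (PJP, PBP, PAuP) and directed two-hop chains for PPP; the counting convention behind the reported numbers is not stated here, so these hypothetical helpers (`metapath_pairs`, `two_hop_pairs`) will not necessarily reproduce them.

```python
from collections import defaultdict
from itertools import combinations


def metapath_pairs(paper_to_nodes):
    """Hypothetical helper: unordered pairs of distinct papers joined by a P-X-P meta-path.

    paper_to_nodes maps a paper id to the set of intermediate nodes X it links to
    (its journal for PJP, its publisher for PBP, its academic society for PAuP).
    A pair sharing several intermediate nodes is counted once (set semantics).
    """
    papers_of = defaultdict(set)
    for paper, nodes in paper_to_nodes.items():
        for node in nodes:
            papers_of[node].add(paper)
    pairs = set()
    for members in papers_of.values():
        # every two papers sharing the intermediate node form one instance
        pairs.update(combinations(sorted(members), 2))
    return pairs


def two_hop_pairs(citations):
    """Hypothetical helper: directed PPP instances, a cites m and m cites b (a != b)."""
    cites = defaultdict(set)
    for citing, cited in citations:
        cites[citing].add(cited)
    pairs = set()
    for a, middles in cites.items():
        for m in middles:
            for b in cites.get(m, ()):
                if b != a:
                    pairs.add((a, b))
    return pairs


paper_journal = {"p1": {"J1"}, "p2": {"J1"}, "p3": {"J1"}, "p4": {"J2"}}
print(len(metapath_pairs(paper_journal)))                              # PJP-style pairs
print(len(two_hop_pairs([("p1", "p2"), ("p2", "p3"), ("p3", "p1")])))  # PPP-style pairs
```

Because every pair inside a group is enumerated, the count grows quadratically with group size, which would explain why PBP, with only 285 publishers for 25,900 papers, yields by far the most instances in the table.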

Meta-path weights in the heterogeneous graph attention network

Name of Meta-path Meta-path weights
PP 0.0049
PPP 0.0052
PJP 0.4129
PBP 0.0052
PUP 0.0111
PUP5 0.3353
PAuP 0.2255
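
The seven weights sum to 1.0001, that is, to 1 up to rounding, which is what a softmax over meta-paths produces. The sketch below shows semantic-level attention in the style of the heterogeneous graph attention network (HAN): each meta-path's node embeddings are scored as q · tanh(W z + b) averaged over nodes, the scores are softmax-normalized into meta-path weights, and the embeddings are fused with those weights. That PDCN's hierarchical attention (Figure 4) uses exactly this scoring function is an assumption; the parameters `W`, `b`, `q` and the random toy embeddings are illustrative only.

```python
import math
import random


def semantic_attention(embeddings, W, b, q):
    """Fuse meta-path-specific node embeddings with semantic-level attention (HAN-style sketch).

    embeddings: dict mapping meta-path name -> list of node vectors
                (every meta-path covers the same nodes in the same order).
    W (hidden x dim), b (hidden), q (hidden): attention parameters, which would be
    learned during training; here they are plain inputs (illustrative assumption).
    Returns (weights, fused): softmax weight per meta-path and fused node vectors.
    """
    names = list(embeddings)
    scores = []
    for name in names:
        vectors = embeddings[name]
        total = 0.0
        for z in vectors:
            # hidden = tanh(W z + b); importance of this node = q . hidden
            hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + bias)
                      for row, bias in zip(W, b)]
            total += sum(qi * hi for qi, hi in zip(q, hidden))
        scores.append(total / len(vectors))  # average importance over all nodes

    # Softmax over meta-paths, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = {name: e / norm for name, e in zip(names, exps)}

    # Weighted sum of the meta-path-specific embeddings, node by node.
    n_nodes = len(embeddings[names[0]])
    dim = len(embeddings[names[0]][0])
    fused = [[sum(weights[name] * embeddings[name][i][k] for name in names)
              for k in range(dim)]
             for i in range(n_nodes)]
    return weights, fused


rng = random.Random(0)
dim, hidden, n_nodes = 4, 3, 5
paths = ["PP", "PPP", "PJP", "PBP", "PUP", "PUP5", "PAuP"]  # meta-path names from the table
emb = {p: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_nodes)] for p in paths}
W = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
b = [0.0] * hidden
q = [rng.gauss(0, 0.5) for _ in range(hidden)]
weights, fused = semantic_attention(emb, W, b, q)
for p, w in weights.items():
    print(f"{p}: {w:.4f}")
```

With random parameters the printed weights are arbitrary; only their normalization (positive, summing to 1) mirrors the table.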

Comparison of experimental results

Model Precision Recall F1-score NMI ARI
RGCN 0.047 0.794 0.089 0.013 -0.01
HGT 0.095 0.303 0.145 0.023 0.091
GIN 0.333 0.954 0.494 0.325 0.438
RGAT 0.029 0.845 0.058 0.001 0.007
LDA-Title 0.381 0.1039 0.1633
LDA-Abstract 0.549 0.437 0.487
LDA-Full-text 0.438 0.259 0.326
PDCN 0.819 0.795 0.805 0.626 0.788
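
As a consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the reported precision and recall for a few rows (values copied from the table; `f1_score` here is a local helper, not a library call) agrees with the reported F1 to within about 0.002; for LDA-Title, for example, 2 × 0.381 × 0.1039 / (0.381 + 0.1039) ≈ 0.1633.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table above.
reported = {
    "LDA-Title": (0.381, 0.1039, 0.1633),
    "LDA-Abstract": (0.549, 0.437, 0.487),
    "GIN": (0.333, 0.954, 0.494),
    "PDCN": (0.819, 0.795, 0.805),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1}")
```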

Language: English
Publication frequency: 4 times a year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining