Open Access

Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling

Figure 1

An example of character-level sequence labeling.

Figure 2

An example of word-level sequence labeling.

Figure 3

An example of character-level IOB format generation.
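The character-level IOB generation named in this caption could be sketched roughly as follows. This is a minimal illustration, not the article's procedure: it assumes the labels come from exact string matching of the known keyphrases against the abstract, that the tags are the plain letters B, I, and O, and that longer keyphrases take priority over shorter ones nested inside them. The function name `char_iob_labels` and the sample sentence are hypothetical.

```python
def char_iob_labels(text, keyphrases):
    """Tag every character of `text` as B (begin), I (inside) or O (outside).

    Assumption: keyphrases are located by exact substring matching.
    """
    labels = ["O"] * len(text)
    # Longest phrases first, so a short phrase never overwrites part of a
    # longer match that contains it.
    for phrase in sorted(set(keyphrases), key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only label spans that are still completely unlabeled.
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B"
                for i in range(start + 1, end):
                    labels[i] = "I"
            start = text.find(phrase, start + 1)
    return labels


# Illustrative input (not taken from the article's data set).
text = "糖尿病患者的血糖控制"
labels = char_iob_labels(text, ["糖尿病", "血糖"])
print(" ".join(labels))  # B I I O O O B I O O
```

Because every character receives its own tag, this labeling does not depend on a word segmenter, which is the property that separates the character-level formulation from the word-level one.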

Figure 4

Character-level sequence labeling keyphrase extraction model architecture.

Figure 5

Input representations of character-level sequence labeling keyphrase extraction model.

Lexicon scale comparative experiments of unsupervised approaches.

Method                                   P        R        F
Term Frequency (whole lexicon)           47.66%   33.36%   39.24%
Term Frequency (training set lexicon)    37.31%   26.11%   30.72%
TF*IDF (whole lexicon)                   54.14%   37.90%   44.59%
TF*IDF (training set lexicon)            42.18%   29.53%   34.74%
TextRank (whole lexicon)                 43.13%   30.19%   35.52%
TextRank (training set lexicon)          34.37%   24.06%   28.30%

Character-level IOB generation results on data sets.

Data Set          P        R        F        Number of Correctly Recognized Keyphrases   Number of Recognized Keyphrases   Number of Ground-truth Keyphrases
Training Set      99.18%   99.42%   99.30%   406,013                                     409,371                           408,373
Development Set   99.13%   99.54%   99.34%   25,942                                      26,169                            26,061
Test Set          99.15%   99.56%   99.36%   13,344                                      13,458                            13,403
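The P, R, and F columns can be reproduced from the three count columns. The sketch below assumes the standard definitions (precision = correct / recognized, recall = correct / ground truth, F = their harmonic mean) and reads the test-set row as 13,344 correctly recognized, 13,458 recognized, and 13,403 ground-truth keyphrases. The function name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(correct, recognized, ground_truth):
    """Return (precision, recall, F1) as fractions from raw keyphrase counts."""
    precision = correct / recognized      # share of recognized phrases that are right
    recall = correct / ground_truth       # share of true phrases that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Test-set row of the character-level IOB generation table.
p, r, f = precision_recall_f1(correct=13_344, recognized=13_458, ground_truth=13_403)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")  # P=99.15% R=99.56% F=99.36%
```

The same check applies to any row of the generation tables: the reported percentages match only when the smallest of the three counts is treated as the number of correctly recognized keyphrases.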

Word-level IOB generation results on data sets.

Data Set          P        R        F        Number of Correctly Recognized Keyphrases   Number of Recognized Keyphrases   Number of Ground-truth Keyphrases
Training Set      91.15%   96.93%   93.96%   395,852                                     434,266                           408,373
Development Set   91.35%   97.03%   94.11%   25,287                                      27,680                            26,061
Test Set          90.99%   97.11%   93.95%   13,016                                      14,305                            13,403

N-value comparative experiments of unsupervised baseline approaches.

                 Top 3 Candidate Keyphrases   Top 5 Candidate Keyphrases
Method           P        R        F          P        R        F
Term Frequency   47.66%   33.36%   39.24%     37.53%   43.78%   40.42%
TF*IDF           54.14%   37.90%   44.59%     40.37%   47.11%   43.48%
TextRank         43.13%   30.19%   35.52%     33.29%   38.84%   35.85%

Word-level and character-level comparative experiments of supervised machine learning baselines.

Method       Word-Level   Character-Level
CRF          47.90%       46.37%
BiLSTM       44.35%       38.38%
BiLSTM-CRF   49.86%       50.16%

Word-level and character-level comparative experiments of BERT-based models.

Metrics   Word-Level   Character-Level
P         26.88%       60.33%
R         54.93%       59.28%
F         36.10%       59.80%

Performance evaluation of keyphrase extraction.

Method                         P        R        F
TF*IDF (Baseline)              54.14%   37.90%   44.59%
BiLSTM-CRF (Baseline)          42.55%   61.09%   50.16%
BERT-based Model (our model)   60.33%   59.28%   59.80%
Adjusted Model (our model)     61.95%   59.22%   60.56%

eISSN: 2543-683X
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining