Method | P | R | F |
---|---|---|---|
Term Frequency (whole lexicon) | 47.66% | 33.36% | 39.24% |
Term Frequency (training set lexicon) | 37.31% | 26.11% | 30.72% |
TF*IDF (whole lexicon) | 54.14% | 37.90% | 44.59% |
TF*IDF (training set lexicon) | 42.18% | 29.53% | 34.74% |
TextRank (whole lexicon) | 43.13% | 30.19% | 35.52% |
TextRank (training set lexicon) | 34.37% | 24.06% | 28.30% |

Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
---|---|---|---|---|---|---|
Training Set | 99.18% | 99.42% | 99.30% | 406,013 | 409,371 | 408,373 |
Development Set | 99.13% | 99.54% | 99.34% | 25,942 | 26,169 | 26,061 |
Test Set | 99.15% | 99.56% | 99.36% | 13,344 | 13,458 | 13,403 |

Data Set | P | R | F | Number of Correctly Recognized Keyphrases | Number of Recognized Keyphrases | Number of Ground-truth Keyphrases |
---|---|---|---|---|---|---|
Training Set | 91.15% | 96.93% | 93.96% | 395,852 | 434,266 | 408,373 |
Development Set | 91.35% | 97.03% | 94.11% | 25,287 | 27,680 | 26,061 |
Test Set | 90.99% | 97.11% | 93.95% | 13,016 | 14,305 | 13,403 |
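
The P, R, and F columns in the two tables above can be recomputed from the three count columns. The sketch below assumes the usual definitions (precision = correct / recognized, recall = correct / ground truth, F = harmonic mean of the two) and reads the first count in each row as the number of correctly recognized keyphrases, which is the reading that matches the reported percentages. The function name `prf` is hypothetical.

```python
from typing import Tuple


def prf(correct: int, recognized: int, ground_truth: int) -> Tuple[float, float, float]:
    """Return precision, recall, and F-score as percentages rounded to 2 places."""
    # Guard against empty denominators so the function never divides by zero.
    precision = correct / recognized if recognized else 0.0
    recall = correct / ground_truth if ground_truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return (
        round(100 * precision, 2),
        round(100 * recall, 2),
        round(100 * f_score, 2),
    )


# Test-set counts from the table directly above:
# 13,016 correct, 14,305 recognized, 13,403 ground-truth keyphrases.
print(prf(13_016, 14_305, 13_403))  # (90.99, 97.11, 93.95)
```

Running the same check on the other rows is a quick way to spot transcription slips in the count columns.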

Method | Top 3 P | Top 3 R | Top 3 F | Top 5 P | Top 5 R | Top 5 F |
---|---|---|---|---|---|---|
Term Frequency | 47.66% | 33.36% | 39.24% | 37.53% | 43.78% | 40.42% |
TF*IDF | 54.14% | 37.90% | 44.59% | 40.37% | 47.11% | 43.48% |
TextRank | 43.13% | 30.19% | 35.52% | 33.29% | 38.84% | 35.85% |
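
For readers unfamiliar with the unsupervised baselines, the following is a minimal sketch of how a TF*IDF ranker might select the top-k candidate keyphrases per document. It is not the authors' implementation: the input is assumed to be already segmented into candidate terms, the smoothed idf formula is one common variant chosen for illustration, and the function name `tfidf_top_k` and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tfidf_top_k(docs: List[List[str]], k: int = 3) -> List[List[str]]:
    """Rank each document's terms by TF*IDF and keep the k highest-scoring ones."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            # Assumed smoothed idf variant: log((1 + N) / (1 + df)) + 1.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical pre-segmented documents.
docs = [
    ["heart", "failure", "patient", "heart"],
    ["patient", "fever"],
    ["patient", "cough", "fever"],
]
print(tfidf_top_k(docs, k=2))
```

Raising `k` from 3 to 5 returns more candidates per document, which is the trade-off visible in the table above: recall rises while precision falls.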

Method | Word-Level | Character-Level |
---|---|---|
CRF | 47.90% | 46.37% |
BiLSTM | 44.35% | 38.38% |
BiLSTM-CRF | 49.86% |

Metrics | Word-Level | Character-Level |
---|---|---|
P | 26.88% | 60.33% |
R | 54.93% | 59.28% |
F | 36.10% | 59.80% |

Method | P | R | F |
---|---|---|---|
TF*IDF (Baseline) | 54.14% | 37.90% | 44.59% |
BiLSTM-CRF (Baseline) | 42.55% | 61.09% | 50.16% |
BERT-based Model (our model) | 60.33% | 59.28% | 59.80% |
Adjusted Model (our model) | 61.95% | 59.22% |