Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling

Ding, Liangping; Zhang, Zhixiong; Liu, Huan; Li, Jie; Yu, Gaihong

Open Access


Journal of Data and Information Science

Volume 6 (2021): Issue 3 (June 2021)


Article category: Research Paper

Published online: 02 Mar 2021

Pages: 35 - 57

Received: 31 Oct 2020

Accepted: 15 Jan 2021

DOI: https://doi.org/10.2478/jdis-2021-0013

Keywords
Automatic keyphrase extraction, Character-level sequence labeling, Pretrained language model, Scientific Chinese medical abstracts

© 2021 Liangping Ding et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

An example of character-level sequence labeling.

An example of word-level sequence labeling.

An example of character-level IOB format generation.

Character-level sequence labeling keyphrase extraction model architecture.

Input representations of character-level sequence labeling keyphrase extraction model.
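
The character-level IOB format named in the captions above can be illustrated with a minimal sketch. It assumes that labels are produced by exact string matching of author-assigned keyphrases against the abstract text, and that the tag set is simply B (first character of a keyphrase), I (inside a keyphrase), and O (outside). The function name `char_level_iob`, the longest-match-first rule, and the sample sentence are hypothetical choices for illustration, not the procedure reported in the paper.

```python
from typing import List, Tuple


def char_level_iob(text: str, keyphrases: List[str]) -> List[Tuple[str, str]]:
    """Tag every character of `text` with B, I or O by exact string matching.

    Assumption: longer keyphrases are matched first, and a span is tagged
    only if none of its characters has been tagged already.
    """
    labels = ["O"] * len(text)
    # Sort by descending length (then alphabetically for determinism).
    for phrase in sorted(set(keyphrases), key=lambda p: (-len(p), p)):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B"
                for i in range(start + 1, end):
                    labels[i] = "I"
            # Continue searching for further occurrences.
            start = text.find(phrase, start + 1)
    return list(zip(text, labels))


# Hypothetical example sentence and keyphrases.
tagged = char_level_iob("糖尿病患者的血糖控制", ["糖尿病", "血糖"])
print("".join(label for _, label in tagged))  # prints BIIOOOBIOO
```

Because every character receives its own tag, this labeling does not depend on a Chinese word segmenter, which is the practical difference from the word-level format.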

Lexicon scale comparative experiments of unsupervised approaches

Method	P	R	F
Term Frequency (whole lexicon)	47.66%	33.36%	39.24%
Term Frequency (training set lexicon)	37.31%	26.11%	30.72%
TF*IDF (whole lexicon)	54.14%	37.90%	44.59%
TF*IDF (training set lexicon)	42.18%	29.53%	34.74%
TextRank (whole lexicon)	43.13%	30.19%	35.52%
TextRank (training set lexicon)	34.37%	24.06%	28.30%

Character-level IOB generation results on data sets

Data Set	P	R	F	Number of Correctly Recognized Keyphrases	Number of Recognized Keyphrases	Number of Ground-truth Keyphrases
Training Set	99.18%	99.42%	99.30%	406,013	409,371	408,373
Development Set	99.13%	99.54%	99.34%	25,942	26,169	26,061
Test Set	99.15%	99.56%	99.36%	13,344	13,458	13,403
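
The P, R, and F columns can be checked against the counts with a short sketch. It assumes the standard definitions, precision = correct / recognized and recall = correct / ground truth, with F as their harmonic mean; the function name `prf` is a hypothetical helper. Under these definitions the development set row is reproduced when 25,942 is taken as the number of correctly recognized keyphrases, 26,169 as the number of recognized keyphrases, and 26,061 as the number of ground-truth keyphrases.

```python
from typing import Tuple


def prf(correct: int, recognized: int, ground_truth: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F) from raw counts, guarding against zero division."""
    precision = correct / recognized if recognized else 0.0
    recall = correct / ground_truth if ground_truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Development set counts from the character-level table.
p, r, f = prf(correct=25_942, recognized=26_169, ground_truth=26_061)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")  # prints P=99.13% R=99.54% F=99.34%
```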

Word-level IOB generation results on data sets

Data Set	P	R	F	Number of Correctly Recognized Keyphrases	Number of Recognized Keyphrases	Number of Ground-truth Keyphrases
Training Set	91.15%	96.93%	93.96%	395,852	434,266	408,373
Development Set	91.35%	97.03%	94.11%	25,287	27,680	26,061
Test Set	90.99%	97.11%	93.95%	13,016	14,305	13,403

N-value comparative experiments of unsupervised baseline approaches

Method	Top 3 P	Top 3 R	Top 3 F	Top 5 P	Top 5 R	Top 5 F
Term Frequency	47.66%	33.36%	39.24%	37.53%	43.78%	40.42%
TF*IDF	54.14%	37.90%	44.59%	40.37%	47.11%	43.48%
TextRank	43.13%	30.19%	35.52%	33.29%	38.84%	35.85%

Word-level and character-level comparative experiments of supervised machine learning baselines

Method	Word-Level F	Character-Level F
CRF	47.90%	46.37%
BiLSTM	44.35%	38.38%
BiLSTM-CRF	49.86%	50.16%

Word-level and character-level comparative experiments of BERT-based models

Metrics	Word-Level	Character-Level
P	26.88%	60.33%
R	54.93%	59.28%
F	36.10%	59.80%

Performance evaluation of keyphrase extraction

Method	P	R	F
TF*IDF (Baseline)	54.14%	37.90%	44.59%
BiLSTM-CRF (Baseline)	42.55%	61.09%	50.16%
BERT-based Model (our model)	60.33%	59.28%	59.80%
Adjusted Model (our model)	61.95%	59.22%	60.56%

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining
