Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements

Purpose

Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.

Design/methodology/approach

Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject–object pairs (SO pairs).
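As an illustration of how IE could be computed at the SO-pair level, the following is a minimal sketch rather than the authors' implementation. It assumes that each sentence mentioning an SO pair has already been labeled with exactly one epistemic status, and that IE is the Shannon entropy of those labels. The status names (`certain`, `unknown`, `hedging`, `conflicting`), the function names, and the toy records are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1 / p) summed over the observed categories
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_by_so_pair(statements):
    """Group (subject, predicate, object, status) records by SO pair.

    The predicate is ignored here; only the epistemic status labels
    (an assumed labeling scheme) contribute to the entropy.
    """
    grouped = defaultdict(list)
    for subject, _predicate, obj, status in statements:
        grouped[(subject, obj)].append(status)
    return {pair: shannon_entropy(labels) for pair, labels in grouped.items()}


# Hypothetical toy records, not data from the study
statements = [
    ("biomarker A", "PREDICTS", "disease X", "certain"),
    ("biomarker A", "PREDICTS", "disease X", "hedging"),
    ("biomarker A", "PREDICTS", "disease X", "conflicting"),
    ("biomarker A", "PREDICTS", "disease X", "unknown"),
    ("drug B", "TREATS", "disease X", "certain"),
    ("drug B", "TREATS", "disease X", "certain"),
]

scores = entropy_by_so_pair(statements)
# Rank SO pairs from most to least uncertain
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy input, the pair whose statements are spread evenly over four statuses scores higher than the pair whose statements all agree, which is the sense in which high IE signals pluralism of epistemic status.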

Findings

The results indicated extraordinary growth in cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism of epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.

Research limitations

The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
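One way to see why sentence count matters: if IE is taken to be Shannon entropy over a fixed set of status categories, the highest value an SO pair can reach depends on how many sentences mention it. The sketch below computes that ceiling by spreading the sentences as evenly as possible across categories. The choice of four categories and the function name are assumptions made for illustration, not details taken from the study.

```python
import math


def max_entropy_bits(n_sentences, n_categories=4):
    """Largest Shannon entropy (bits) reachable with n_sentences labels.

    The maximum is attained by the most even allocation of sentences
    across the categories. n_categories=4 is an assumed value.
    """
    if n_sentences <= 0:
        return 0.0
    base, extra = divmod(n_sentences, n_categories)
    # 'extra' categories receive one more sentence than the rest
    counts = [base + 1] * extra + [base] * (n_categories - extra)
    return sum(
        (c / n_sentences) * math.log2(n_sentences / c) for c in counts if c > 0
    )


# Ceiling on entropy for SO pairs supported by 1 to 8 sentences
ceilings = {n: max_entropy_bits(n) for n in range(1, 9)}
```

Under these assumptions, an SO pair mentioned in a single sentence can never score above zero, and a pair mentioned in two sentences cannot exceed one bit, so low IE for sparsely mentioned pairs may reflect limited evidence rather than consensus.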

Practical implications

Our approach identified major areas of uncertain knowledge, such as diagnostic biomarkers, genetic polymorphisms, and co-existing risk factors related to cardiovascular diseases in China. We suggest prioritizing these areas: new hypotheses need to be verified, and disputes, conflicts, and contradictions need to be settled.

Originality/value

We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining

Xin Guo

Yuming Chen

Jian Du

Erdan Dong

Article category: Research Paper

Published online: 25 Apr 2022

Pages: 6-30

Received: 25 Oct 2021

Accepted: 05 Mar 2022

DOI: https://doi.org/10.2478/jdis-2022-0008

Keywords: Uncertain knowledge, Information entropy, Natural language processing, Cardiovascular diseases, China

© 2022 Xin Guo et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
