Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements
Article Category: Research Paper
Published Online: 25 Apr 2022
Pages: 6-30
Received: 25 Oct 2021
Accepted: 05 Mar 2022
DOI: https://doi.org/10.2478/jdis-2022-0008
© 2022 Xin Guo et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Purpose
Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.
Design/methodology/approach
Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown, hedging, and conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject–object pairs (SO pairs).
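To make the IE metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes that IE is the Shannon entropy (base 2) of the distribution of epistemic status labels across the statements that mention a given SO pair; the abstract does not state the exact formula. The function names, the label set, and the placeholder entities (`biomarker_A`, `disease_X`, and so on) are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def information_entropy(labels):
    """Shannon entropy (in bits) of a list of epistemic status labels.

    Assumption: each statement mentioning an SO pair carries one label,
    e.g. "certain", "unknown", "hedging", or "conflicting".
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over labels of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rank_so_pairs(statements, top_n=10):
    """Group (subject, object, status) records by SO pair and rank by entropy."""
    by_pair = {}
    for subject, obj, status in statements:
        by_pair.setdefault((subject, obj), []).append(status)
    scored = [(pair, information_entropy(labels)) for pair, labels in by_pair.items()]
    # Highest entropy first; ties are broken alphabetically by pair.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical toy records, not data from the study.
toy_statements = [
    ("biomarker_A", "disease_X", "certain"),
    ("biomarker_A", "disease_X", "hedging"),
    ("biomarker_A", "disease_X", "conflicting"),
    ("biomarker_A", "disease_X", "unknown"),
    ("gene_B", "disease_X", "certain"),
    ("gene_B", "disease_X", "certain"),
    ("risk_factor_C", "disease_Y", "certain"),
    ("risk_factor_C", "disease_Y", "hedging"),
]

ranking = rank_so_pairs(toy_statements)
```

Under this assumption, an SO pair whose statements all share one epistemic status has an IE of zero, while a pair whose statements are spread evenly across several statuses has the highest IE. That matches the reading of high IE as epistemic status pluralism.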
Findings
The results indicated extraordinary growth in cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism in their epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.
Research limitations
The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
Practical implications
Our approach identified major areas of uncertain knowledge related to cardiovascular diseases in China, such as diagnostic biomarkers, genetic polymorphisms, and co-existing risk factors. We suggest that these areas be prioritized: new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.
Originality/value
We provide a novel approach that combines natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.