Scientific data function as evidence of veracity in scientific research argumentation (Parsons, Duerr, & Minster, 2010). As the sharing and reuse of scientific data are integral to research progress, effective ways in which data can be accessed, explored, compared, organized, and exchanged among and across academic fields and subfields are key to accelerating problem-solving and disciplinary advancement (Aalbersberg, Dunham, & Koers, 2013; Chao, 2011; Mooney & Newton, 2012; Piwowar & Chapman, 2008). The Text REtrieval Conference (TREC), established in 1992 and cosponsored by the US National Institute of Standards and Technology and the US Department of Defense, supports research within the information retrieval community by providing the infrastructure for large-scale evaluation of text retrieval methodologies for use in industry and academia. TREC requires participants in the same track to adopt a unified dataset, so that issues related to information retrieval are addressed collaboratively. The ArrayExpress public database archive plays a similar role for functional genomics data in the life sciences.
As the quantity of digital scientific data grows extraordinarily, identifying, mining, and organizing valuable information from these data sources becomes a challenge. Ever-increasing data repositories, especially in quickly changing and increasingly important fields such as biology, medicine, and earth sciences, are being developed and deployed (Robinson, Jiménez, & Torres, 2015; Torres, Martín, & Fuente, 2014). Data-related research, such as data-usage tracking (Konkiel, 2013; Mayernik, 2013), studies of the motivations and influences of data-sharing (Piwowar, 2011; Piwowar & Chapman, 2008; Piwowar & Vision, 2013), and dataset evaluation, is flourishing.
These and related studies are committed to exploring the value of scientific data as a kind of emerging academic resource. The basis of the above endeavors and of much related research, however, has been to identify effective ways of extracting data-usage statements (DUS), which specify how scientific data are obtained, processed, and utilized by the author(s); the tools used are largely semi-automatic or human-intensive. They include: (1) retrieving literature from academic databases through manually formulated queries and then filtering the results by manual inspection (Belter, 2014; Piwowar, Carlson, & Vision, 2011); and (2) constructing rules for dataset identification using metadata recorded in data repositories such as DataONE.
In this paper, we propose a bootstrapping-based unsupervised method to extract DUS automatically, without manual intervention. The method is independent of research field; in other words, it can easily be adapted to different academic areas. Satisfactory results are obtained when the method is applied to computer science, a typical data-driven research field in which data are routinely used to test hypotheses or verify algorithms. The rest of the paper is organized as follows. In Section 2, we present related work. Section 3 elaborates the fundamentals and procedures of our approach. Section 4 illustrates our experimental process and reports the evaluation results. Section 5 presents concluding remarks.
Data-usage in academic literature can be divided into two major categories: (1) using data created or introduced by others, i.e. data-reuse, and (2) using first-hand data created by observation, measurement, recording, searching, etc. In this paper, we extract DUS using both of these data-usage patterns.
Digital Object Identifiers (DOIs), database accession numbers, names of data repositories, and other relevant references constitute important indications for the extraction of data-related statements. These features are often used to formulate queries or to construct rules for pattern recognition. Piwowar, Carlson, and Vision (2011) collected research papers from academic search engines using DOIs as search queries and manually examined whether specific data or datasets were used in the retrieved literature. Belter (2014) studied the usage of three well-known oceanographic data collections; in his study, citations of these collections were estimated by querying databases with their names. Other attempts retrieved articles whose authors share their data by determining whether there is a link to a certain data repository (Piwowar, 2011; Piwowar & Vision, 2013).
Machine-learning methods have also been used to extract data-related statements. Piwowar and Chapman (2008) employed both machine learning and pattern matching to determine whether or not authors in the biomedical field disclosed their datasets in their papers. Névéol, Wilbur, and Lu (2011) constructed a support vector machine (SVM) classifier for the automatic recognition of data deposition statements in medical literature, but relied on manually labeled training and test sets. Given the lack of existing annotated resources and the high cost of manual annotation, unsupervised extraction methods offer major advantages. For example, Boland et al. (2012) used a bootstrapping method to identify references to datasets in research papers. Although their method achieved satisfactory performance, some problems remained: judging the validity of a pattern relied on a manually set threshold, and the number of initial seed words was fixed at one.
Data-usage statements (DUS) are statements describing the name, source, structure, composition, or application of the datasets used in academic literature. The smallest unit of a statement is a single sentence segment delimited by commas. Some positive and negative examples are given in Table 1. DUS identification is achieved by extracting these statements from academic publications; specifically, given a research paper, the task is to extract every sentence in it that qualifies as a DUS.
Table 1. DUS examples.

Statements | Positive (☑) or negative (☒)
---|---
In our experiments, the experimental subset contains 1,552 images selected from the GT database and the FERET databases. | ☑ The name, source, and compositions of data
The large-scale database contains 93,638 images captured from 9,668 palms of 4,834 individuals, in which 4–10 images are collected for each palm. | ☑ The source and compositions of data
Consequently, both of the two experimental subsets contain 1,200 samples for training and 1,200 samples for testing. | ☑ Data compositions and application
In order to show the robustness over short noisy intervals and satisfy the two defined semantics R1 and R2, we generate two completely separated clusters, C1 and C2, using two disjoint interval sequences, Q1 and Q2, and add the synthetically generated short noisy intervals marked in red. | ☒ Algorithm description
Each group contains 10 subjects. | ☒ Experiment participants
The average training time of the repeated random sub-sampling validation is 1.83 × 30 = 54.9 s, and that of the CBE cross-validation is 1.84 × 5 = 9.2 s. | ☒ Experiment process
As words of the same type tend to appear in similar contexts, if we set a few words as the starting point for searching, accompanied by their common contextual features, we can identify more words that share those features. By repeating this process, we can find more and more words that are similar to each other. What is needed to achieve this goal is the selection of a few representative data_clue words as the initial seed words, together with the data_patterns of these data_clues. To this end, we propose an unsupervised bootstrapping method (Figure 1) that acquires a set of data_clue and data_pattern pairs sharing similar features, i.e. a list of <data_clue, data_pattern> pairs.
Figure 1
Bootstrapping framework for extraction.

Seed-word selection is the only process that requires manual intervention in our method. The quality of seed words will directly affect the performance of the extraction method. In this paper, three seed-word selection strategies are chosen for implementation:
1. Selecting both the names of a few well-known datasets and a category of general indicative words, such as "dataset," as seed words, referred to as COM-SEED;
2. Selecting only the names of a few well-known datasets as seed words, referred to as SPE-SEED;
3. Selecting only a category of general indicative words, such as "dataset," as seed words, referred to as GEN-SEED.
To compare the performance of these strategies, we conduct extraction experiments with each of the three seed-word selection strategies in turn.
A pattern describes the structural features of the target sentences to be extracted. Normally, the stronger the generalizability of a pattern, the wider the scope of the sentences it covers; conversely, the stronger the representativeness of a pattern, the narrower that scope. Weighing both generalizability and representativeness, we believe that the components of a pattern should at least include the core part of a complete sentence, i.e. the predicate part. Meanwhile, given that the seed words, which are almost exclusively nouns or noun phrases, usually occur in the subject or object part, we construct two types of patterns:
1. Subject part + predicate part, dealing with circumstances in which seed words occur in the object part;
2. Predicate part + object part, dealing with circumstances in which seed words occur in the subject part.
Table 2 provides some examples of the two types of patterns, together with the sentences they cover and the data_clue words they extract.
Table 2. Exemplifications of pattern construction.

Pattern | Sentences covered by this pattern and the extracted data_clue words
---|---
Consists of # samples |
We perform experiments | To assess the ability of the proposed clustering algorithm for classifying the shape classes, we perform experiments on
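To make the two pattern forms concrete, the following is a minimal sketch of pattern construction from a (subject, predicate, object) triple; the slot marker <X> and the function names are illustrative assumptions, not the authors' implementation.

```python
# Build the two pattern types from a ReVerb-style triple, depending on
# where the seed/data_clue word occurs. "<X>" marks the open slot.

def build_patterns(subject: str, predicate: str, obj: str, clue: str):
    """Return the patterns that generalize over `clue` in this triple."""
    patterns = []
    if clue in subject:
        # Seed word in the subject part: keep predicate + object.
        patterns.append(f"<X> {predicate} {obj}")
    if clue in obj:
        # Seed word in the object part: keep subject + predicate.
        patterns.append(f"{subject} {predicate} <X>")
    return patterns

# E.g. for "The experimental subset consists of 1,552 samples":
print(build_patterns("the experimental subset", "consists of",
                     "1,552 samples", "subset"))
# -> ['<X> consists of 1,552 samples']
```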
In computer science, a single data-usage relationship pattern can apply to varying data objects; e.g. different face-recognition datasets may all serve as the training dataset for machine learning in different articles. Moreover, a dataset entity can have multiple applications; e.g. the ClueWeb09 dataset has regularly been used for search-result ranking, query sub-topic mining, relevance evaluation of retrieval systems, and many other types of experiments. We therefore presume that any combination of one data_clue and one data_pattern adopted from their respective result lists has the potential to be involved in a DUS. If a sentence contains at least one data_clue and at least one data_pattern from these result lists, it is identified as a DUS.
In each iteration, data_clue words and data_patterns are added to their respective final lists if their scores exceed the current threshold. A score can be interpreted as the relative probability that a data_clue word or data_pattern is valid, based on the currently available evidence.
As illustrated in Figure 1, the bootstrapping process is triggered by adding the original seed words to the seed pool, after which the following procedures are performed (define the current iteration number as $i$):

1. Obtain all of the patterns in the dataset of research papers;
2. Calculate the score of each pattern, and add the patterns whose scores are within the top $(20 + i)$ into the data_pattern list. The pattern score is calculated with Equation (1):

$$\text{score}(\mathit{pattern}_k) = \frac{F_k}{N_k} \cdot \log_2 F_k \tag{1}$$

where $F_k$ is the number of unique data_clue words in the current seed pool extracted by $\mathit{pattern}_k$ and $N_k$ is the total number of unique words it extracts. This equation was originally used in Riloff's study (1996) for extraction-pattern learning. The valid range is set to the top $(20 + i)$ so that the pattern pool grows as the iterations proceed.
3. Make use of the patterns in the current data_pattern list to extract candidate data_clue words;
4. Calculate the score of each candidate data_clue word, and add the candidate words whose scores are within the top five into the data_clue list. The word score is calculated with Equation (2):

$$\text{score}(\mathit{word}_m) = \frac{1}{P_m}\sum_{j=1}^{P_m} \log_2 (F_j + 1) \tag{2}$$

where $P_m$ is the number of patterns that extract $\mathit{word}_m$ and $F_j$ is the number of seed-pool members extracted by the $j$-th of those patterns. This equation was first used by Thelen and Riloff (2002) for semantic lexicon learning. If no new data_clue words or data_patterns can be added, or the maximum number of iterations is reached, the process terminates; otherwise, the newly accepted data_clue words are added to the seed pool and the next iteration begins. A minimal sketch of both scoring functions is given below.
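The following sketch illustrates the two scores, assuming Equation (1) is the RlogF metric from Riloff (1996) and Equation (2) the AvgLog metric from Thelen and Riloff (2002); the data structures and function names are illustrative, not the authors' implementation.

```python
import math

# `extractions` maps each pattern to the set of candidate words it
# extracts from the corpus; `seed_pool` holds the accepted data_clue
# words. Both structures are assumptions made for this sketch.

def pattern_score(pattern: str, extractions: dict, seed_pool: set) -> float:
    """Equation (1), RlogF: (F/N) * log2(F), where F counts extracted
    words already in the seed pool and N all words the pattern extracts."""
    extracted = extractions[pattern]
    F, N = len(extracted & seed_pool), len(extracted)
    return (F / N) * math.log2(F) if F > 0 else 0.0

def word_score(word: str, extractions: dict, seed_pool: set) -> float:
    """Equation (2), AvgLog: mean of log2(F_j + 1) over the patterns j
    that extract `word`."""
    logs = [math.log2(len(words & seed_pool) + 1)
            for words in extractions.values() if word in words]
    return sum(logs) / len(logs) if logs else 0.0
```

In each iteration, one would rank all candidate patterns by pattern_score and keep the top (20 + i), then rank candidate words by word_score, keep the top five, and add them to the seed pool.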
After the bootstrapping process, a collection of <data_clue, data_pattern> pairs, referred to as the pair_set, is generated by pairing any data_clue word from the final data_clue list with any data_pattern from the final data_pattern list. If any single sentence contains components in accordance with any pair in the pair_set, it is identified as a DUS.
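As a rough illustration of this matching step, here is a naive sketch that tests a sentence against the pair_set using substring matching; the authors' matching operates on relation triples and is stricter, so this is only an approximation under that assumption.

```python
# A sentence qualifies as a DUS if it contains at least one data_clue
# word and matches at least one data_pattern. The "<X>" slot marker
# and substring matching are simplifications made for this sketch.

def is_dus(sentence: str, clues: set, patterns: set) -> bool:
    s = sentence.lower()
    has_clue = any(clue in s for clue in clues)
    has_pattern = any(p.replace("<X>", "").strip() in s for p in patterns)
    return has_clue and has_pattern

clues = {"dataset", "corpus"}
patterns = {"<X> consists of 1,200 samples"}
print(is_dus("The dataset consists of 1,200 samples for training.",
             clues, patterns))  # True
```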
Full-text articles from 116 computer science journals published between 2000 and 2014 in ScienceDirect were used for evaluation. We collected the data manually and converted the articles from HTML to well-formed XML. To facilitate pattern acquisition and noise reduction, the following pre-processing steps were conducted, resulting in 6,586,852 relations in total:
1. Remove equations in the body of the articles;
2. Remove all XML elements whose headings do not contain "result/results," "experiment/experiments," or "evaluation" (a minimal sketch of this filtering step follows the list);
3. Extract relations from the remaining text in the form of triples (subject, predicate, object) using a program called ReVerb.
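The sketch below illustrates step 2, assuming a generic XML layout in which sections are <section> elements with <section-title> headings; these element names are assumptions, not the actual schema of our converted files, and running ReVerb over the retained text is an external step not shown.

```python
import xml.etree.ElementTree as ET

# Keep only sections whose headings mention an experiment-related
# keyword. The element names <section> and <section-title> are
# assumptions made for this sketch.

KEYWORDS = ("result", "results", "experiment", "experiments", "evaluation")

def experiment_sections(path: str):
    """Yield the text of experiment-related sections of one article."""
    root = ET.parse(path).getroot()
    for sec in root.iter("section"):
        title = (sec.findtext("section-title") or "").lower()
        if any(k in title for k in KEYWORDS):
            yield "".join(sec.itertext())
```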
Consequently, the final data collection for extraction is a collection of sentences from experiment-related sections, together with the triple relations extracted from them, referred to hereinafter as CSExperiment-triple. The whole collection is split into two parts: sentences derived from articles published during 2000–2013, and those published in 2014. The former part is used for the main extraction experiment, and the latter for pattern-extensibility evaluation. It should be noted that a single sentence may embed more than one relation.
Following the three strategies designed for seed-word selection, we performed a series of extraction experiments on the CSExperiment-triple (2000–2013) data collection, each time with the maximum number of iterations set to 300. As the iterations progressed, we regularly inspected the extraction results and found that the performance of the SPE-SEED strategy was far from ideal. We therefore abandoned the SPE-SEED experiment and report only the results of the other two seed-selection strategies: COM-SEED and GEN-SEED. The final yield of the iterations is a list of data_clue words and a list of data_patterns, each member accompanied by its final score. Table 3 shows the initial seed words used in the experiments.
Table 3. Initial seed words.

COM-SEED | GEN-SEED
---|---
trec # | data
kdd cup | dataset
trec | corpus
wall street journal | data set
the # kdd cup |
dataset |
corpus |
With the final lists of data_clue words and data_patterns available, we accumulated all sentences in the CSExperiment-triple (2000–2013) data collection that contained at least one data_clue word and simultaneously matched at least one data_pattern, which generated the target sentence collection conforming to our definition of DUS. The results are displayed in Table 4 in terms of the total numbers of data_clue words and data_patterns under the different seed-selection strategies; the total number of DUS was counted by relation triples.
Table 4. Elementary statistics on extraction results.

Seed-selection strategy | Pattern | Seed number | Pattern number | Statement number
---|---|---|---|---
COM-SEED | Predicate + Object | 14,000 | 670 | 29,722
COM-SEED | Subject + Predicate | 5,105 | 596 | 11,869
GEN-SEED | Predicate + Object | 18,235 | 404 | 35,711
GEN-SEED | Subject + Predicate | 5,530 | 334 | 11,247
We believe that a thorough evaluation should consider two facets: (1) the performance of the proposed method in extracting DUS in the field of computer science, and (2) the extraction extensibility of the data_patterns in the final list.
Table 5. Precision of statement extraction from CSExperiment-triple (2000–2013).

Seed-selection strategy | Pattern | Precision (%)
---|---|---
COM-SEED | Predicate + Object | 96.34
COM-SEED | Subject + Predicate | 69.67
COM-SEED | Overall | 83.01
GEN-SEED | Predicate + Object | 95.34
GEN-SEED | Subject + Predicate | 37.00
GEN-SEED | Overall | 66.17
Specifically, for both seed-selection strategies, the "Predicate + Object" pattern form performs substantially better than the "Subject + Predicate" form, which is consistent with the structural properties of human language. If we intend to generate a well-formed sentence, it is much easier to find an eligible object for a given combination of subject and predicate than to find an eligible subject for a given combination of predicate and object. In other words, when the data_clue words are embedded in the subject part and extracted through patterns of the form "Predicate + Object," their connection with the initial seed words is closer and more stable. Conversely, when the data_clue words are embedded in the object part and extracted through patterns of the form "Subject + Predicate," the newly added data_clue words are more prone to drift away from the scope of the initial seed words during the iterations.
Given that the extraction precision under the COM-SEED strategy is much greater than that under the GEN-SEED strategy, it is logical to deduce that the specific dataset names among the initial seed words confine the contexts of candidate words to the target extraction range, which reduces the noise caused by general indicative words such as "data" or "dataset."
To evaluate the within-field extensibility of the patterns, we randomly selected 25 unique articles from the CSExperiment-triple (2014) data collection to create the evaluation dataset, which contains 2,015 sentences in total. A golden standard of 487 data-usage sentences was generated by manually annotating the evaluation dataset word-for-word. Any sentence that matches at least one data_pattern in the final list is automatically extracted from the evaluation dataset to form the results collection, which is then compared with the golden standard.
Extensibility evaluation is achieved by comparing the results collection with the golden standard. Counting sentences delimited by periods, let the number of all sentences in the results collection be denoted by $N_e$, the number of sentences in the golden standard by $N_g$, and the number of sentences appearing in both by $N_c$; precision and recall are then computed as $N_c / N_e$ and $N_c / N_g$, respectively.
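A minimal sketch of these two measures, treating the results collection and the golden standard as sets of normalized sentence strings (an assumption made for this sketch):

```python
# Extensibility metrics over sentence sets, following the definitions
# above: N_c sentences in both collections, N_e extracted, N_g gold.

def precision_recall(extracted: set, gold: set):
    n_c = len(extracted & gold)
    precision = n_c / len(extracted) if extracted else 0.0
    recall = n_c / len(gold) if gold else 0.0
    return precision, recall
```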
Figure 2
Extensibility of pattern changes over the process of iteration under COM-SEED. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words.

Figure 3
Extensibility of pattern changes over the process of iteration under GEN-SEED. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

The recall rate exhibits a noticeable tendency: within a certain range, as the number of iterations increases, the number of valid patterns increases and the recall rate rises accordingly. Once the number of iterations exceeds that range, however, the number of valid patterns extracted by the bootstrapping method gradually stabilizes, and so does the recall rate. In terms of precision, there is an evident distinction between COM-SEED and GEN-SEED: under the former strategy, precision maintains a high level, whereas under the latter, a clear decreasing trend appears after temporary stability in the early iterations. This phenomenon is consistent with our conclusion in the previous section that integrating the two types of words as initial seed words (i.e. COM-SEED) refines the contexts and keeps them in better accordance with the extraction range.
As shown in Figure 4, for the "Predicate + Object" type of pattern, the difference between the two seed-selection strategies in recall is insignificant, but the precision rate under GEN-SEED exhibits a sudden decrease after a certain number of iterations, whereas under COM-SEED it remains stable.
Figure 4
Extensibility of patterns in the form of “Predicate + Object” changes over the process of iteration. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

As seen in Figure 5, for the "Subject + Predicate" type of pattern, the GEN-SEED strategy outperforms the COM-SEED strategy in both precision and recall, although its precision is slightly less stable. It can thus be observed that the COM-SEED strategy is more suitable for extracting patterns of the "Predicate + Object" form, whereas the GEN-SEED strategy is more suitable for the "Subject + Predicate" form. Figure 6 presents the extensibility of patterns under this optimum combination, in which the COM-SEED strategy is executed only to extract patterns of the form "Predicate + Object" and the GEN-SEED strategy only for "Subject + Predicate." As Figure 6 shows, in the within-field extensibility experiments, our method reaches a sufficiently high level of precision while guaranteeing an acceptable level of recall.
Figure 5
Extensibility of patterns in the form of “Subject + Predicate” changes over the process of iteration. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

Figure 6
Extensibility of pattern changes over the process of iteration under an optimum combination of seed-selection strategy and pattern-construction strategy.

Scientific data-usage is currently one of the most important academic instruments for conducting data-driven research, which is quickly developing and flourishing. Studies of its motivations and effects constitute critical efforts to explore the behavioral characteristics of scientific data-usage, making it possible to better exploit the value of data and effectively serve increasing numbers of researchers. Yet due to the complexity of these diverse behaviors and the lack of common standards for using data, most studies still follow traditional research concepts that focus on literature citation to perform data-citation analysis. Unlike such studies, this article conducts a preliminary investigation of the issue from the perspective of extracting DUS.
The proposed bootstrapping-based automatic extraction method achieves favorable results to varying degrees. The key to achieving high precision and recall is improving the extensibility of the patterns. Experimental results demonstrate that the suitability of a seed-selection strategy varies with the pattern type, as shown by the fact that pattern extensibility attains more satisfying results after the combination of seed-selection and pattern-construction strategies is optimized.
This paper uses relational triples generated by ReVerb to represent a sentence. The advantage of this strategy is that it achieves high precision through noise reduction during the iterations with a relatively simple method. While triple representation contributes to noise reduction, it also weakens the ability to identify sentences with complex structures. This conforms to our error-analysis results, in which a considerable proportion of candidate words were located outside the subject and object parts, for example in object-complement constructions. Overall, interpreting a sentence through relational triples is a valid way to identify its core components; nonetheless, it does not cover the full variety of sentence structures.
This work is therefore far from complete. The proposed method is designed to be domain-independent, i.e. applicable to article collections from various research areas; however, we have not yet verified this with actual data from other fields. The triple representation of a sentence is also sometimes too simple to retain the information needed for DUS identification, so we will continue to improve the features used to represent a sentence. Moreover, since our work is still at an early stage and the granularity of extraction remains coarse, more fine-grained extraction will be explored in the future.
Some additional problems can be investigated on the basis of this paper. The first is the construction of a domain dataset list by identifying the names of datasets, for which possible methods include rule-based filtering and occurrence-frequency statistics. The second is data-driven dataset evaluation: while previous studies on dataset analysis have often used manually collected data, the DUS extraction tool applied here can help evaluate datasets at scale. In addition, we will explore the use of DUS extraction in scholarly search and attempt to develop a viable dataset-retrieval service.