Open Access

A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications



Figure 1

Bootstrapping framework for extraction.
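The framework in Figure 1 is shown only as an image here. The loop below is a generic seed-and-pattern bootstrapping sketch for illustration, not the authors' implementation. It assumes a pattern is simply the two words immediately preceding a seed word (a crude stand-in for the paper's “Predicate + Object” and “Subject + Predicate” constructions). The names `bootstrap` and `tokenize` are hypothetical.

```python
def tokenize(sentence):
    # Naive tokenizer: lowercase, split on whitespace, strip edge punctuation.
    return [t.strip(".,;:()") for t in sentence.lower().split()]


def bootstrap(sentences, seeds, window=2, max_iterations=5):
    """Alternate between inducing patterns from known seed words and applying
    those patterns to harvest statements and new seed words."""
    seeds = set(seeds)
    patterns = set()
    statements = set()
    for _ in range(max_iterations):
        # Step 1: a pattern is the `window` words immediately before a seed.
        for sent in sentences:
            toks = tokenize(sent)
            for i, tok in enumerate(toks):
                if tok in seeds and i >= window:
                    patterns.add(tuple(toks[i - window:i]))
        # Step 2: every sentence containing a pattern is kept as a statement,
        # and the word that follows the pattern becomes a candidate seed.
        new_seeds = set()
        for sent in sentences:
            toks = tokenize(sent)
            for i in range(window, len(toks)):
                if tuple(toks[i - window:i]) in patterns:
                    statements.add(sent)
                    new_seeds.add(toks[i])
        if new_seeds <= seeds:
            break  # converged: this round produced no unseen seed words
        seeds |= new_seeds
    return seeds, patterns, statements


corpus = [
    "We evaluate on the dataset.",
    "We evaluate on the corpus.",
    "The corpus was collected by hand.",
]
seeds, patterns, statements = bootstrap(corpus, {"dataset"})
print(sorted(seeds))     # ['corpus', 'dataset']
print(sorted(patterns))  # [('on', 'the')]
```

Stopping when an iteration yields no unseen seed words is one common convergence criterion. A practical system would also score and filter patterns before promoting new seeds, which this sketch omits.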

Figure 2

Change in pattern extensibility over the iterations under COM-SEED. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words.

Figure 3

Change in pattern extensibility over the iterations under GEN-SEED. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

Figure 4

Change in the extensibility of patterns in the form of “Predicate + Object” over the iterations. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

Figure 5

Change in the extensibility of patterns in the form of “Subject + Predicate” over the iterations. COM-SEED refers to the strategy of selecting both the names of a few well-known datasets and a category of general indicative words as seed words. GEN-SEED refers to the strategy of selecting a category of general indicative words as seed words.

Figure 6

Change in pattern extensibility over the iterations under the optimal combination of seed-selection and pattern-construction strategies.

Examples of data-usage statements (DUS).

| Statement | Positive (☑) or negative (☒) |
| --- | --- |
| In our experiments, the experimental subset contains 1,552 images selected from the GT database and the FERET databases. | ☑ The name, source, and compositions of data |
| The large-scale database contains 93,638 images captured from 9,668 palms of 4,834 individuals, in which 4–10 images are collected for each palm. | ☑ The source and compositions of data |
| Consequently, both of the two experimental subsets contain 1,200 samples for training and 1,200 samples for testing. | ☑ Data compositions and application |
| In order to show the robustness over short noisy intervals and satisfy the two defined semantics R1 and R2, we generate two completely separated clusters, C1 and C2, using two disjoint interval sequences, Q1 and Q2, and add the synthetically generated short noisy intervals marked in red. | ☒ Algorithm description |
| Each group contains 10 subjects. | ☒ Experiment participants |
| The average training time of the repeated random sub-sampling validation is 1.83 × 30 = 54.9 s, and that of the CBE cross-validation is 1.84 × 5 = 9.2 s. | ☒ Experiment process |

Elementary statistics on extraction results.

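| Seed-selection strategy | Pattern | No. of seeds | No. of patterns | No. of statements |
| --- | --- | --- | --- | --- |
| COM-SEED | Predicate + Object | 14,000 | 670 | 29,722 |
| | Subject + Predicate | 5,105 | 596 | 11,869 |
| GEN-SEED | Predicate + Object | 18,235 | 404 | 35,711 |
| | Subject + Predicate | 5,530 | 334 | 11,247 |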

Examples of pattern construction.

| Pattern | Sentences covered by this pattern and the extracted data_clue words |
| --- | --- |
| Consists of # samples | The breast cancer set consists of 569 samples with 357 benign and 212 malignant. Dataset 1 is referred to as Char250, which has 250 samples per category for lower and upper cases, respectively; dataset 2 is referred to as Char1000, which has 1,000 samples per category for lower and upper cases, respectively. (Please note this pattern occurs twice here.) |
| We perform experiments | To assess the ability of the proposed clustering algorithm for classifying the shape classes, we perform experiments on an increasing number of shapes in the two Aslan and Tari datasets. We perform our experiments on a real-estate system with real-life house dataset used in. |
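In the patterns above, `#` stands where a number appears in the covered sentence. A minimal sketch of matching such a pattern against text, assuming `#` is a wildcard for one number (optionally with comma thousands separators or a decimal part) and that all other words match literally and case-insensitively. `pattern_to_regex` and `NUMBER` are hypothetical names, not part of the source.

```python
import re

# A number such as 569, 1,000, or 1.83 (comma thousands separators allowed).
NUMBER = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"


def pattern_to_regex(pattern):
    """Compile a pattern like 'consists of # samples' into a regex.

    Assumption: '#' stands for any number and is captured as a group; all
    other words must match literally (case-insensitively), with any amount
    of whitespace between them.
    """
    parts = [f"({NUMBER})" if word == "#" else re.escape(word)
             for word in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


rx = pattern_to_regex("consists of # samples")
sentence = ("The breast cancer set consists of 569 samples "
            "with 357 benign and 212 malignant.")
print(rx.findall(sentence))  # ['569']
```

Capturing the number separately makes it easy to record the sample count alongside the matched sentence, while a pattern without `#` (such as “we perform experiments”) simply becomes a literal phrase match.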

Initial seed words.

| Seed-selection strategy | COM-SEED | GEN-SEED |
| --- | --- | --- |
| Initial seed words | tree # | data |
| | kdd cup | dataset |
| | tree | corpus |
| | wall street journal | data set |
| | the # kdd cup | |
| | dataset | |
| | corpus | |

Precision of statement extraction from CSExperiment-triple (2000–2013).

| Seed-selection strategy | Pattern | Precision (%) |
| --- | --- | --- |
| COM-SEED | Predicate + Object | 96.34 |
| | Subject + Predicate | 69.67 |
| | Overall | 83.01 |
| GEN-SEED | Predicate + Object | 95.34 |
| | Subject + Predicate | 37.00 |
| | Overall | 66.17 |
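The Overall rows are consistent with an unweighted (macro) mean of the two pattern-level precisions: (96.34 + 69.67) / 2 = 83.005 ≈ 83.01 and (95.34 + 37.00) / 2 = 66.17. The check below reproduces them; reading Overall as this mean is an inference from the arithmetic, not a definition stated in the table. `Decimal` is used so that 83.005 rounds half-up to 83.01 instead of being affected by binary floating-point representation. `PRECISION` and `overall_precision` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Pattern-level precision (%) copied from the table above; strings keep
# the values exact, avoiding binary floating-point rounding surprises.
PRECISION = {
    "COM-SEED": {"Predicate + Object": "96.34", "Subject + Predicate": "69.67"},
    "GEN-SEED": {"Predicate + Object": "95.34", "Subject + Predicate": "37.00"},
}


def overall_precision(table):
    """Unweighted (macro) mean of the pattern-level precisions per strategy,
    rounded half-up to two decimal places."""
    result = {}
    for strategy, by_pattern in table.items():
        values = [Decimal(v) for v in by_pattern.values()]
        mean = sum(values) / len(values)
        result[strategy] = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return result


for strategy, overall in overall_precision(PRECISION).items():
    print(f"{strategy}: {overall}")  # COM-SEED: 83.01, then GEN-SEED: 66.17
```

A statement-weighted average would generally give different figures; the unweighted mean is the one that matches the reported Overall values.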
eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining