Open Access

Can Automatic Classification Help to Increase Accuracy in Data Collection?


Accuracy – Individual benchmark.

Training set   10%      20%      50%      80%
Default model  73.56%   73.60%   73.11%   69.94%
Forest         80.69%   96.57%   96.52%   96.72%
GLMNet         82.88%   82.58%   94.98%   93.26%
Boosting       95.45%   95.22%   96.68%   95.18%
MaxEnt         86.95%   90.44%   92.89%   93.83%
SLDA           73.56%   74.90%   75.66%   88.25%
SVM            94.29%   95.80%   96.75%   98.07%
Tree           94.38%   95.56%   95.13%   93.83%
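
A minimal Python sketch of how the accuracy table can be read. It copies the figures above and, for each training-set share, picks the best algorithm and its margin in percentage points. It assumes the Default model row is the baseline for that margin, and `best_per_share` is a hypothetical helper name.

```python
# Accuracy (%) by training-set share, copied from the table above.
TRAIN_SHARES = ["10%", "20%", "50%", "80%"]
ACCURACY = {
    "Default model": [73.56, 73.60, 73.11, 69.94],
    "Forest": [80.69, 96.57, 96.52, 96.72],
    "GLMNet": [82.88, 82.58, 94.98, 93.26],
    "Boosting": [95.45, 95.22, 96.68, 95.18],
    "MaxEnt": [86.95, 90.44, 92.89, 93.83],
    "SLDA": [73.56, 74.90, 75.66, 88.25],
    "SVM": [94.29, 95.80, 96.75, 98.07],
    "Tree": [94.38, 95.56, 95.13, 93.83],
}


def best_per_share(table, baseline="Default model"):
    """Return {share: (best algorithm, its accuracy, gain over the baseline row
    in percentage points)}. Treating the baseline row as the reference is an assumption."""
    result = {}
    for i, share in enumerate(TRAIN_SHARES):
        candidates = {name: vals[i] for name, vals in table.items() if name != baseline}
        best = max(candidates, key=candidates.get)
        gain = round(candidates[best] - table[baseline][i], 2)
        result[share] = (best, candidates[best], gain)
    return result


if __name__ == "__main__":
    for share, (name, acc, gain) in best_per_share(ACCURACY).items():
        print(f"{share}: {name} {acc:.2f}% (+{gain:.2f} points over Default model)")
```

Run on the table's figures, this selects Boosting at 10%, Forest at 20% and SVM at 50% and 80%. At 80%, SVM sits 28.13 points above the Default model.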

Recall of individual algorithms.

Training set 10%               20%               50%               80%
Algorithm    No       Yes      No       Yes      No       Yes      No       Yes
Forest       26.95%   100.00%  89.95%   98.95%   90.23%   98.84%   89.74%   99.72%
GLMNet       35.55%   99.88%   34.37%   99.87%   88.22%   97.46%   88.46%   95.32%
Boosting     93.34%   96.21%   95.43%   95.15%   96.26%   96.83%   90.38%   97.25%
MaxEnt       51.14%   99.82%   64.35%   99.80%   73.85%   99.89%   79.49%   100.00%
SLDA         0.00%    100.00%  11.88%   97.51%   44.25%   87.21%   60.90%   100.00%
SVM          84.74%   97.72%   93.78%   96.52%   95.69%   97.15%   97.44%   98.35%
Tree         90.10%   95.92%   91.04%   97.18%   88.79%   97.46%   91.03%   95.04%
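
The recall table reports one figure per category. A short sketch, assuming the usual per-class definition: of the records that truly belong to a category, recall is the share the algorithm assigns to that category. `per_class_recall` is a hypothetical helper and the labels are toy data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred, labels=("No", "Yes")):
    """Recall of each class: records of that class predicted correctly,
    divided by all records that truly belong to it. A class with no true
    records gets NaN, since there is nothing to recall."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {
        c: hits[c] / totals[c] if totals[c] else float("nan")
        for c in labels
    }


if __name__ == "__main__":
    # Toy labels: six true "Yes" records and four true "No" records.
    truth = ["Yes"] * 6 + ["No"] * 4
    # A classifier leaning towards "Yes": two of the four "No" records are missed,
    # so overall accuracy is 8/10 while "No" recall is only 0.5.
    pred = ["Yes"] * 6 + ["Yes", "Yes", "No", "No"]
    print(per_class_recall(truth, pred))  # {'No': 0.5, 'Yes': 1.0}
```

The same reading explains the SLDA row at 10%. A No recall of 0% together with a Yes recall of 100% means every record was labelled Yes.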

Search process.

#    Search string    Number of items
#1   TS="her 2"       8,542
#2   String 1         26,972
#3   #2 AND #1        6,396
#4   #1 NOT #3        2,146
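
The search history combines result sets with Boolean operators. A minimal sketch, assuming each search returns a set of record identifiers (toy integers below, not real records): AND is set intersection and NOT is set difference. `combine_searches` is a hypothetical name.

```python
def combine_searches(results_1, results_2):
    """Set logic of the search history:
    #3 = #2 AND #1 (intersection), #4 = #1 NOT #3 (set difference)."""
    results_3 = results_2 & results_1
    results_4 = results_1 - results_3
    return results_3, results_4


if __name__ == "__main__":
    # Toy record IDs standing in for the results of searches #1 and #2.
    s1 = set(range(1, 11))
    s2 = set(range(5, 21))
    s3, s4 = combine_searches(s1, s2)
    print(sorted(s3))  # [5, 6, 7, 8, 9, 10]
    print(sorted(s4))  # [1, 2, 3, 4]

    # Because #3 is a subset of #1, |#4| = |#1| - |#3|; check with the table's counts.
    print(8542 - 6396)  # 2146
```

The counts in the table are consistent with this reading. Search #4 keeps exactly the 8,542 - 6,396 = 2,146 records of #1 that are not in #3.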

Performance of the two best algorithms combined.

Algorithm         Coverage  Coverage Yes  Coverage No  Accuracy  Recall (Yes)  Recall (No)
SVM and Boosting  2,131     1,620         511          99.06%    99.69%        97.06%
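
The table does not spell out how the two algorithms are combined. One plausible reading, assumed here rather than taken from the source, is that a record is classified only when SVM and Boosting agree, and coverage counts those records. The hypothetical `combine_two` implements that agreement rule on toy predictions. Grouping covered records by the agreed label is a further assumption about what Coverage Yes and Coverage No mean.

```python
from collections import Counter


def combine_two(pred_a, pred_b, y_true):
    """Hypothetical agreement rule: a record is classified only when both
    classifiers give it the same label; otherwise it is left uncovered.
    Returns (coverage, accuracy on covered records, covered records per agreed label)."""
    covered = [(a, t) for a, b, t in zip(pred_a, pred_b, y_true) if a == b]
    coverage = len(covered)
    correct = sum(a == t for a, t in covered)
    accuracy = correct / coverage if coverage else float("nan")
    per_label = Counter(a for a, _ in covered)
    return coverage, accuracy, dict(per_label)


if __name__ == "__main__":
    svm = ["Yes", "Yes", "No", "No", "Yes"]       # toy predictions
    boosting = ["Yes", "No", "No", "No", "Yes"]   # toy predictions
    truth = ["Yes", "Yes", "No", "Yes", "Yes"]
    print(combine_two(svm, boosting, truth))  # (4, 0.75, {'Yes': 2, 'No': 2})
```

In the table, Coverage Yes and Coverage No add up to the total coverage (1,620 + 511 = 2,131).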

Consensus on the classification of records.

Consensus (number of algorithms)  Coverage  Coverage No  Coverage Yes  Accuracy  Recall (Yes)  Recall (No)
≥ 4                               2,330     616          1,714         87.51%    99.88%        53.08%
≥ 5                               2,045     336          1,709         93.99%    99.94%        63.69%
≥ 6                               1,825     162          1,663         98.19%    100.00%       79.63%
≥ 7                               1,606     11           1,595         99.32%    100.00%       N/A
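
A possible reading of the consensus levels, stated as an assumption: a record is classified only when at least that many of the seven algorithms (Forest, GLMNet, Boosting, MaxEnt, SLDA, SVM, Tree) assign it the same label. The rule can be sketched as follows. `consensus` and `coverage_and_accuracy` are hypothetical helpers, and the votes are toy data.

```python
from collections import Counter


def consensus(votes_per_record, k):
    """Hypothetical k-of-n rule: label a record only if exactly one label
    received at least k votes; otherwise return None (record left uncovered)."""
    decisions = []
    for votes in votes_per_record:
        winners = [label for label, n in Counter(votes).items() if n >= k]
        decisions.append(winners[0] if len(winners) == 1 else None)
    return decisions


def coverage_and_accuracy(decisions, y_true):
    """Coverage = number of labelled records; accuracy is computed on those only."""
    covered = [(d, t) for d, t in zip(decisions, y_true) if d is not None]
    if not covered:
        return 0, float("nan")
    return len(covered), sum(d == t for d, t in covered) / len(covered)


if __name__ == "__main__":
    # Toy votes from seven classifiers for four records.
    votes = [
        ["Yes"] * 7,
        ["Yes"] * 5 + ["No"] * 2,
        ["Yes"] * 4 + ["No"] * 3,
        ["No"] * 6 + ["Yes"],
    ]
    truth = ["Yes", "Yes", "No", "No"]
    for k in (4, 5, 6, 7):
        print(k, coverage_and_accuracy(consensus(votes, k), truth))
    # 4 (4, 0.75) / 5 (3, 1.0) / 6 (2, 1.0) / 7 (1, 1.0)
```

Raising the threshold trades coverage for accuracy, as in the table. At ≥ 4 the algorithms cover 2,330 records at 87.51% accuracy, while at ≥ 7 they cover 1,606 records at 99.32%. With seven voters and two labels, any threshold of four or more lets at most one label qualify, so the tie check only matters for other setups.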

eISSN: 2543-683X
Language: English
Publication frequency: 4 issues per year
Journal subject areas: Computer Science, Information Technology, Project Management, Databases and Data Mining