Dataset | # of topics | Macro F1 | Micro F1 | Hamming loss | |
---|---|---|---|---|---|
20 | 0.18 | 0.0101 | |||
50 | 0.07 | 0.0438 | 0.2540 | ||
100 | 0.04 | 0.0436 | 0.0096 | 0.2544 | |
20 | 0.47 | 0.0100 | |||
50 | 0.19 | 0.0410 | 0.2590 | ||
100 | 0.09 | 0.0386 | 0.2603 | ||
50 | 0.40 | 0.0023 | 0.0010 | 0.1274 | |
100 | 0.20 | 0.0022 | 0.1316 | ||
200 | 0.10 | 0.0022 | 0.1343 | ||
400 | 0.05 | 0.0010 |
Datasets | Training Set | Test Set |
---|---|---|
Health-Sciences | 16,932 | 4,236 |
Biological-Sciences | 9,034 | 2,258 |
USPTO | 283,899 | 71,622 |
Datasets | Top_k | Model | Macro F1 | Micro F1 | Hamming loss |
---|---|---|---|---|---|
Health-Sciences | 7 | TextCNN | 0.2304 | |||
TextRNN | 0.0788 | 0.2341 | |||
TextRCNN | 0.0836 | 0.2294 | 0.1359 | ||
Biological-Sciences | 4 | TextCNN | 0.3070 | 0.4693 | 0.5055 |
TextRNN | 0.2026 | 0.4202 | 0.2548 | ||
TextRCNN | 0.4714 | ||||
USPTO | 65 | TextCNN | 0.0341 | 0.2018 | 0.0127 |
TextRNN | 0.0301 | 0.2437 | 0.0107 | ||
TextRCNN | 0.0401 | 0.2408 | 0.0089 | ||
Dataset | # of instances | # of labels | # of hierarchies | Label cardinality | Avg. doc/label | Max. doc/label | Min. doc/label |
---|---|---|---|---|---|---|---|
Health-Sciences | 21,168 | 507 | 5 | 2.25 | 94.10 | 1,571 | 1 |
Biological-Sciences | 11,292 | 484 | 6 | 1.56 | 36.29 | 606 | 1 |
USPTO | 355,058 | 8,867 | 4 | 4.08 | 152.38 | 20,988 | 1 |
Emotions | 593 | 6 | 1 | 1.869 | 184.67 | 264 | 148 |
Scene | 2,407 | 6 | 1 | 1.074 | 430.83 | 533 | 364 |
Bibtex | 7,395 | 159 | 1 | 2.401 | 112 | 1,042 | 51 |
Medical | 978 | 45 | 1 | 1.25 | 27 | 266 | 1 |
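
The statistics in the table above can be derived from the per-document label sets. The sketch below is illustrative only: `label_statistics` and the toy corpus are hypothetical names and data, not taken from the source. It treats label cardinality as the mean number of labels per document and doc/label as the number of documents carrying each label. Under these definitions the average doc/label equals instances × cardinality / labels, which is consistent with the rows above (for Emotions, 593 × 1.869 / 6 ≈ 184.7).

```python
from collections import Counter


def label_statistics(label_sets):
    """Summarise a multi-label dataset given one collection of labels per document."""
    n_docs = len(label_sets)
    # Number of documents that carry each label (duplicates within a document ignored).
    docs_per_label = Counter(label for labels in label_sets for label in set(labels))
    counts = list(docs_per_label.values())
    return {
        "instances": n_docs,
        "labels": len(docs_per_label),
        # Label cardinality: average number of distinct labels per document.
        "cardinality": sum(len(set(labels)) for labels in label_sets) / n_docs,
        "docs_per_label_avg": sum(counts) / len(counts),
        "docs_per_label_max": max(counts),
        "docs_per_label_min": min(counts),
    }


# Toy corpus with hypothetical labels, not drawn from the datasets above.
toy = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]
stats = label_statistics(toy)
print(stats)
```
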
Name | Link |
---|---|
Dependency LDA | |
Scikit-Multilearn | |
Magpie | |
Hierarchical Text Multi Label Classification | |
Keras-TextClassification | |
Neural Classifier |
Dataset | Method | Macro F1 | Micro F1 | Hamming Loss |
---|---|---|---|---|
Biological-Sciences | Dependency LDA | 0.0443 | 0.0102 | |
ML | 0.0836 | 0.1535 | 0.3389 | |
RA | 0.0280 | 0.0794 | 0.2859 | |
LabelPowerset | 0.0044 | 0.0219 | 0.3845 | |
TextCNN | 0.3070 | 0.4693 | 0.5055 | |
TextRNN | 0.2026 | 0.4202 | 0.2548 | |
TextRCNN | 0.3114 | 0.4714 | ||
Health-Sciences | Dependency LDA | 0.0412 | 0.0102 | 0.2542 |
ML | 0.0806 | 0.1364 | 0.2727 | |
RA | 0.0294 | 0.0928 | 0.1761 | |
LabelPowerset | 0.0066 | 0.0620 | 0.3113 | |
TextCNN | 0.2304 | |||
TextRNN | 0.0788 | 0.2341 | ||
TextRCNN | 0.0836 | 0.2294 | 0.1359 | |
USPTO | Dependency LDA | 0.0026 | 0.0012 | 0.0895 |
ML | 0.0618 | |||
RA | 0.0423 | 0.1102 | 0.0587 | |
LabelPowerset | 0.0274 | 0.1161 | 0.0594 | |
TextCNN | 0.0341 | 0.2018 | 0.0127 | |
TextRNN | 0.0301 | 0.2437 | 0.0107 | |
TextRCNN | 0.0401 | 0.2408 | 0.0089 | |
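
The three score columns follow standard multi-label definitions and can be computed from binary label-indicator matrices. The sketch below is a minimal pure-Python illustration, not necessarily how the reported scores were produced. The function names and the toy matrices are hypothetical, and returning an F1 of 0 for a label with no true or predicted positives is an assumed convention.

```python
def f1(tp, fp, fn):
    """F1 from counts; 0.0 when the denominator is zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true, y_pred):
    """Return (macro F1, micro F1, Hamming loss) for lists of 0/1 rows."""
    n_docs, n_labels = len(y_true), len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    # Macro F1: average of per-label F1, so rare labels weigh as much as frequent ones.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    # Micro F1: F1 over counts pooled across all labels.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Hamming loss: fraction of document-label cells that are predicted wrongly.
    hamming = (sum(fp) + sum(fn)) / (n_docs * n_labels)
    return macro, micro, hamming


# Toy example (hypothetical data): 3 documents, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
macro, micro, hamming = multilabel_scores(y_true, y_pred)
print(macro, micro, hamming)
```

Because macro F1 averages over labels while micro F1 pools counts, a label set with many rare labels can give a macro F1 far below the micro F1, as in the rows above.
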
Dataset | Method | Max-features | Macro F1 | Micro F1 | Hamming Loss |
---|---|---|---|---|---|
Biological-Sciences | ML | 800 | 0.0831 | 0.1535 | 0.3389 |
1,000 | 0.0836 | 0.1532 | 0.3404 | ||
RA | 800 | 0.0280 | 0.0794 | 0.2859 | |
1,000 | 0.0229 | 0.0637 | 0.2923 | ||
LabelPowerset | 800 | 0.0044 | 0.0219 | 0.3845 | |
1,000 | 0.0042 | 0.0195 | 0.3850 | ||
Health-Sciences | ML | 800 | 0.0738 | 0.1364 | 0.2727 |
1,000 | 0.0806 | 0.1242 | 0.2759 | ||
RA | 800 | 0.0284 | 0.0858 | 0.1761 | |
1,000 | 0.0294 | 0.0928 | 0.1844 | ||
LabelPowerset | 800 | 0.0062 | 0.0610 | 0.3115 | |
1,000 | 0.0066 | 0.0620 | 0.3113 | ||
USPTO | ML | 12,000 | 0.1142 | 0.0643 | |
18,000 | 0.2673 | 0.0618 | |||
RA | 12,000 | 0.0421 | 0.1038 | 0.0588 | |
18,000 | 0.0423 | 0.1102 | 0.0587 | ||
LabelPowerset | 12,000 | 0.0273 | 0.1161 | 0.0594 | |
18,000 | 0.0274 | 0.1151 | 0.0624 |
Datasets | # of word tokens | # of unique words |
---|---|---|
Health-Sciences | 1,556,854 | 64,113 |
Biological-Sciences | 1,486,840 | 45,610 |
USPTO | 2,540,118 | 81,268 |
Metric | Macro F1 | Micro F1 | Hamming loss |
---|---|---|---|
Macro F1 | 1.000 | 0.762 | 0.277 |
Micro F1 | 0.762 | 1.000 | -0.008 |
Hamming loss | 0.277 | -0.008 | 1.000 |
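
A symmetric matrix like the one above can be obtained by correlating the three score columns pairwise across experiments. The source does not state which coefficient was used; the sketch below assumes Pearson correlation. The helper names and the toy score columns are hypothetical and do not reproduce the reported coefficients.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between named columns of scores."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy scores for three hypothetical experiments (not values from the tables above).
scores = {
    "Macro F1": [0.1, 0.2, 0.3],
    "Micro F1": [0.2, 0.4, 0.6],
    "Hamming loss": [0.3, 0.1, 0.2],
}
matrix = correlation_matrix(scores)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```
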