Dataset | ID | Records | Classes |
---|---|---|---|
Titles | T | 143,838 | 816 |
Titles and keywords | T_KW | 121,505 | 802 |
Keywords only | KW | 121,505 | 802 |
Titles, major classes | T_MC | 72,937 | 29 |
Titles and keywords, major classes | T_KW_MC | 60,641 | 29 |
Keywords only, major classes | KW_MC | 60,641 | 29 |

Support Vector Machine

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T_KW_stm_2d | 90.60% | 72.68% | 96.23% | 73.32% |
T_KW_2d | 91.21% | 72.14% | 95.48% | 73.24% |
KW_2d | 81.75% | 71.86% | 86.18% | 71.96% |

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T | 93.74% | 40.91% | 99.59% | 40.45% |
T_KW | 97.50% | 65.25% | 99.90% | 66.13% |
KW | 83.09% | 64.02% | 92.38% | 64.09% |
T_MC | 93.95% | 57.99% | 99.62% | 57.80% |
T_KW_MC | 97.89% | 80.75% | 99.93% | 81.37% |
KW_MC | 90.58% | 79.56% | 96.30% | 80.38% |

Support Vector Machine

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T_KW_MC | 97.89% | 80.75% | 99.93% | 81.37% |
T_KW_MC_rem | 92.51% | 80.94% | 95.02% | 81.83% |
T_KW_MC_stm | 97.21% | 81.07% | 99.91% | 81.80% |
T_KW_MC_stm_rem | 92.18% | 81.34% | 94.89% | 82.20% |
T_KW_MC_sw | 95.44% | 80.98% | 98.48% | 81.24% |
T_KW_MC_sw_rem | 92.46% | 81.04% | 94.30% | 82.13% |
T_KW_MC_sw_stm | 94.87% | 81.40% | 98.72% | 81.24% |
T_KW_MC_sw_stm_rem | 92.17% | 81.54% | 94.16% | 81.90% |

Dataset | Linear, training set | Linear, test set | RNN, training set | RNN, test set |
---|---|---|---|---|
T_KW_MC | 97.17% | 79.99% | 92.76% | 78.70% |
KW_MC | 91.30% | 78.41% | 88.03% | 78.74% |
T_KW_MC_stm | 96.90% | 80.81% | 92.38% | 79.16% |

Naïve Bayes

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T_KW_stm_2d | 87.40% | 65.64% | 93.18% | 67.79% |
T_KW_2d | 88.26% | 64.78% | 93.55% | 66.92% |
KW_2d | 78.36% | 68.12% | 82.53% | 67.94% |

Dataset | NN, training set | NN, test set | CNN, training set | CNN, test set |
---|---|---|---|---|
T_KW_MC | 96.19% | 79.40% | 95.33% | 79.92% |
KW_MC | 90.54% | 78.23% | 90.39% | 79.15% |
T_KW_MC_stm | 95.92% | 79.57% | 94.60% | 80.38% |

Naïve Bayes

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T_KW_MC | 95.42% | 76.52% | 99.66% | 75.96% |
T_KW_MC_rem | 90.17% | 76.79% | 93.25% | 78.21% |
T_KW_MC_stm | 94.32% | 76.36% | 99.59% | 76.36% |
T_KW_MC_stm_rem | 89.62% | 76.26% | 92.95% | 78.27% |
T_KW_MC_sw | 95.50% | 76.46% | 99.64% | 76.62% |
T_KW_MC_sw_rem | 90.28% | 77.09% | 92.33% | 78.60% |
T_KW_MC_sw_stm | 94.49% | 76.59% | 99.53% | 76.95% |
T_KW_MC_sw_stm_rem | 89.79% | 76.36% | 91.96% | 78.90% |

Dataset | Unigrams, training accuracy | Unigrams, test accuracy | Unigrams + 2-grams, training accuracy | Unigrams + 2-grams, test accuracy |
---|---|---|---|---|
T | 83.54% | 34.89% | 95.82% | 34.15% |
T_KW | 90.01% | 55.33% | 98.14% | 55.45% |
KW | 75.28% | 59.15% | 84.95% | 58.11% |
T_MC | 90.83% | 54.21% | 98.63% | 50.51% |
T_KW_MC | 95.42% | 76.52% | 99.66% | 75.96% |
KW_MC | 86.94% | 77.25% | 94.24% | 77.09% |