Predicting the Amount of Compensation for Harm Awarded by Courts Using Machine-Learning Algorithms
26 May 2024
About this article
Published online: 26 May 2024
Pages: 214-232
DOI: https://doi.org/10.2478/ceej-2024-0015
© 2024 Maciej Świtała, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Tokens Characterized by the Highest Differences Between Maximum and Minimum Value of Their Partial Dependence Plots
No. | Token | Value at 0 occurrences | Minimum (token count) | Maximum (token count) | Difference |
1 | pension | 21,027.80 | 21,027.80 (0) | 38,725.99 (5) | 17,698.18 |
2 | family | 16,532.26 | 16,532.26 (0) | 34,000.38 (16) | 17,468.12 |
3 | hospital | 20,559.30 | 20,559.30 (0) | 35,327.96 (20) | 14,768.65 |
4 | fracture | 19,852.62 | 19,852.62 (0) | 30,671.07 (25) | 10,818.46 |
5 | year | 19,473.51 | 19,473.51 (0) | 28,048.15 (10) | 8,574.65 |
6 | life | 19,712.14 | 19,712.14 (0) | 25,905.34 (36) | 6,193.20 |
7 | dead | 22,645.50 | 22,645.50 (0) | 28,181.21 (7) | 5,535.71 |
8 | zloty_monthly | 22,557.29 | 22,557.29 (0) | 27,708.96 (29) | 5,151.67 |
9 | extend | 22,570.76 | 22,570.76 (0) | 26,658.50 (6) | 4,087.74 |
10 | substantial | 22,346.35 | 22,346.35 (0) | 25,140.15 (19) | 2,793.80 |
11 | bones | 23,056.96 | 23,056.96 (0) | 25,536.80 (25) | 2,479.84 |
12 | child | 22,439.85 | 22,439.85 (0) | 24,908.14 (39) | 2,468.29 |
13 | bond | 23,108.54 | 23,108.54 (0) | 25,423.49 (24) | 2,314.95 |
14 | cervical | 23,675.45 | 21,368.98 (22) | 23,675.45 (0) | 2,306.47 |
15 | disorders | 22,655.92 | 22,655.92 (0) | 24,947.08 (17) | 2,291.17 |
16 | collar | 23,419.00 | 22,944.30 (11) | 23,419.00 (0) | 2,264.04 |
17 | disability | 22,944.30 | 22,944.30 (0) | 25,129.94 (5) | 2,185.64 |
18 | family_bond | 23,262.10 | 23,262.10 (0) | 25,430.35 (10) | 2,168.25 |
19 | twist | 23,417.04 | 21,541.70 (7) | 23,417.04 (0) | 1,875.34 |
20 | future | 22,640.79 | 22,640.79 (0) | 24,386.69 (22) | 1,745.90 |
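The quantity ranked in the table above, the gap between the highest and lowest point of a token's partial dependence curve, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: `toy_predict`, the sample `rows`, and the `grid` of token counts are hypothetical, and the curve is computed by brute force (force the feature to each grid value, then average the predictions).

```python
from typing import Callable, Dict, List, Sequence, Tuple


def partial_dependence(
    predict: Callable[[List[float]], float],
    rows: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite the token count
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def pdp_range(
    grid: Sequence[float], curve: Sequence[float]
) -> Dict[str, object]:
    """Minimum and maximum of the curve (with the grid value where each
    occurs) and their difference."""
    lo = min(range(len(curve)), key=lambda i: curve[i])
    hi = max(range(len(curve)), key=lambda i: curve[i])
    min_point: Tuple[float, float] = (curve[lo], grid[lo])
    max_point: Tuple[float, float] = (curve[hi], grid[hi])
    return {
        "min": min_point,
        "max": max_point,
        "difference": curve[hi] - curve[lo],
    }


# Hypothetical stand-in for a fitted regressor: column 0 raises the
# predicted amount, column 1 lowers it slightly.
def toy_predict(row: List[float]) -> float:
    return 20000.0 + 3000.0 * row[0] - 100.0 * row[1]


rows = [[0, 2], [1, 0], [3, 4]]   # hypothetical token counts per judgment
grid = [0, 1, 2, 3, 4, 5]         # token counts at which the curve is evaluated

curve = partial_dependence(toy_predict, rows, feature=0, grid=grid)
summary = pdp_range(grid, curve)
print(summary)
```

Ranking tokens by `summary["difference"]` would reproduce the ordering logic of the table; with a real fitted model, a library routine for partial dependence would normally replace the brute-force loop.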
Error Measures Obtained on a Test Set With Different Algorithms Applied
OLS | Token counts | 308,207.87 | 112.32 | 11,747.55 | 52.79 |
OLS | TF-IDF | 100,743.48 | 94.59 | 11,759.07 | 49.25 |
LASSO | Token counts | 287,706.24 | 109.68 | 11,474.52 | 52.64 |
LASSO | TF-IDF | 95,976.82 | 93.46 | 11,435.79 | 48.64 |
Random forests | Token counts | 75,268.12 | 86.43 | 11,064.28 | 47.16 |
Random forests | TF-IDF | 74,674.55 | 88.59 | 10,518.75 | 48.13 |
XGBoost | Token counts | 94,271.95 | 105.27 | 12,654.46 | 52.18 |
XGBoost | TF-IDF | 192,230.66 | 102.79 | 12,821.45 | 51.48 |
Multilingual BERT | - | 95,665.94 | 426.45 | 38,062.44 | 88.03 |
Error Measures Obtained on a Test Set With Different Algorithms Applied and Models’ Overestimates Assumed as Not Errors
OLS | Token counts | 61,829.04 | 23.92 | 1,024.79 | 5.29 |
OLS | TF-IDF | 67,563.77 | 21.83 | 178.39 | 1.08 |
LASSO | Token counts | 62,019.89 | 23.83 | 1,061.22 | 5.57 |
LASSO | TF-IDF | 68,215.32 | 21.78 | 129.66 | 0.69 |
Random forests | Token counts | 74,939.24 | 23.11 | 1,729.41 | 8.01 |
Random forests | TF-IDF | 73,939.24 | 22.80 | 1,741.93 | 9.02 |
XGBoost | Token counts | 71,414.42 | 23.83 | 1,069.24 | 7.93 |
XGBoost | TF-IDF | 65,657.99 | 24.13 | 1,209.66 | 7.68 |
Multilingual BERT | - | 82,212.22 | 29.04 | 0.00 | 0.00 |
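The idea behind the second table, treating a model's overestimates as non-errors, can be illustrated as follows. This is a hedged sketch: the excerpt does not name the four error measures reported, so root mean squared error and median absolute error are used here purely as examples, and the `actual` and `predicted` amounts are made up.

```python
import math
from statistics import median
from typing import List, Sequence


def one_sided_errors(
    actual: Sequence[float], predicted: Sequence[float]
) -> List[float]:
    """Error counted only when the model underestimates the awarded amount;
    overestimates contribute zero."""
    return [max(a - p, 0.0) for a, p in zip(actual, predicted)]


def two_sided_errors(
    actual: Sequence[float], predicted: Sequence[float]
) -> List[float]:
    """Ordinary absolute errors, for comparison."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def rmse(errors: Sequence[float]) -> float:
    """Root mean squared error over a list of per-case errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical awarded amounts and model predictions.
actual = [10000.0, 20000.0, 50000.0, 5000.0]
predicted = [12000.0, 15000.0, 20000.0, 9000.0]

one_sided = one_sided_errors(actual, predicted)
two_sided = two_sided_errors(actual, predicted)

print("one-sided RMSE:", rmse(one_sided))
print("two-sided RMSE:", rmse(two_sided))
print("one-sided median error:", median(one_sided))
print("two-sided median error:", median(two_sided))
```

Under this convention a median-based measure falls to zero as soon as more than half of the predictions are overestimates, which is one way a row of zeros could arise in such a table.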