Predicting the Amount of Compensation for Harm Awarded by Courts Using Machine-Learning Algorithms
Published online: May 26, 2024
Pages: 214 - 232
© 2024 Maciej Świtała, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Figure 1. Distribution of Amounts Awarded as Compensation for Harm Suffered and Their Logarithms
Figure 2. Distribution of Amounts Awarded as Compensation for Harm Suffered among the 25 Most Impactful Features
Figure 3. Partial Dependence Plots for the Most Impactful Features
Figure 4. Example Model Prediction: Judgement of the Regional Court in Gliwice Dated August 31, 2014, File Number I C 1946/14
Figure 5. Example Model Prediction: Judgement of the Regional Court in Zambrów Dated September 24, 2015, File Number I C 335/15

Tokens Characterized by the Highest Differences Between Maximum and Minimum Value of Their Partial Dependence Plots
| Id | Token | Average awarded compensation when token does not occur | Minimum awarded compensation | Maximum awarded compensation | Difference between maximum and minimum awarded compensation |
|---|---|---|---|---|---|
| 1 | pension | 21,027.80 | 21,027.80 (0) | 38,725.99 (5) | 17,698.18 |
| 2 | family | 16,532.26 | 16,532.26 (0) | 34,000.38 (16) | 17,468.12 |
| 3 | hospital | 20,559.30 | 20,559.30 (0) | 35,327.96 (20) | 14,768.65 |
| 4 | fracture | 19,852.62 | 19,852.62 (0) | 30,671.07 (25) | 10,818.46 |
| 5 | year | 19,473.51 | 19,473.51 (0) | 28,048.15 (10) | 8,574.65 |
| 6 | life | 19,712.14 | 19,712.14 (0) | 25,905.34 (36) | 6,193.20 |
| 7 | dead | 22,645.50 | 22,645.50 (0) | 28,181.21 (7) | 5,535.71 |
| 8 | zloty_monthly | 22,557.29 | 22,557.29 (0) | 27,708.96 (29) | 5,151.67 |
| 9 | extend | 22,570.76 | 22,570.76 (0) | 26,658.50 (6) | 4,087.74 |
| 10 | substantial | 22,346.35 | 22,346.35 (0) | 25,140.15 (19) | 2,793.80 |
| 11 | bones | 23,056.96 | 23,056.96 (0) | 25,536.80 (25) | 2,479.84 |
| 12 | child | 22,439.85 | 22,439.85 (0) | 24,908.14 (39) | 2,468.29 |
| 13 | bond | 23,108.54 | 23,108.54 (0) | 25,423.49 (24) | 2,314.95 |
| 14 | cervical | 23,675.45 | 21,368.98 (22) | 23,675.45 (0) | 2,306.47 |
| 15 | disorders | 22,655.92 | 22,655.92 (0) | 24,947.08 (17) | 2,291.17 |
| 16 | collar | 23,419.00 | 22,944.30 (11) | 23,419.00 (0) | 2,264.04 |
| 17 | disability | 22,944.30 | 22,944.30 (0) | 25,129.94 (5) | 2,185.64 |
| 18 | family_bond | 23,262.10 | 23,262.10 (0) | 25,430.35 (10) | 2,168.25 |
| 19 | twist | 23,417.04 | 21,541.70 (7) | 23,417.04 (0) | 1,875.34 |
| 20 | future | 22,640.79 | 22,640.79 (0) | 24,386.69 (22) | 1,745.90 |
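
The ranking above rests on the spread of each token's partial dependence curve. The sketch below shows one common way to compute such a spread, not necessarily the procedure used in the paper: the token count is forced to each grid value for every document, the predictions are averaged, and the minimum is subtracted from the maximum. The function `partial_dependence`, the toy model `toy_predict`, and the data are hypothetical illustrations. The reading of the parenthesised numbers in the table as the token counts at which each extreme is reached is likewise an assumption.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the token count in every document
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Hypothetical toy model (not the paper's): the predicted amount rises with
# the count of token 0, capped at 3 occurrences, plus a small effect of token 1.
def toy_predict(X):
    return 20000.0 + 1500.0 * np.minimum(X[:, 0], 3) + 100.0 * X[:, 1]

# Made-up token counts for four documents and two tokens.
X = np.array([[0, 2], [1, 0], [4, 1], [2, 3]], dtype=float)
grid = np.arange(0, 6)  # token counts 0..5

pd_values = partial_dependence(toy_predict, X, feature=0, grid=grid)
spread = pd_values.max() - pd_values.min()      # the "difference" column
count_at_min = int(grid[pd_values.argmin()])    # count where the curve is lowest
count_at_max = int(grid[pd_values.argmax()])    # first count where it is highest
print(spread, count_at_min, count_at_max)
```

In this toy setting the curve is lowest when the token is absent, which mirrors most rows of the table, where the minimum coincides with the average for a count of zero.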
Error Measures Obtained on a Test Set With Different Algorithms Applied

| Algorithm applied | Predictors | Root mean squared error | Mean absolute percentage error | Root median squared error | Median absolute percentage error |
|---|---|---|---|---|---|
| OLS | Token counts | 308,207.87 | 112.32 | 11,747.55 | 52.79 |
| OLS | TF-IDF | 100,743.48 | 94.59 | 11,759.07 | 49.25 |
| LASSO | Token counts | 287,706.24 | 109.68 | 11,474.52 | 52.64 |
| LASSO | TF-IDF | 95,976.82 | 93.46 | 11,435.79 | 48.64 |
| Random forests | Token counts | 75,268.12 | 86.43 | 11,064.28 | 47.16 |
| Random forests | TF-IDF | 74,674.55 | 88.59 | 10,518.75 | 48.13 |
| XGBoost | Token counts | 94,271.95 | 105.27 | 12,654.46 | 52.18 |
| XGBoost | TF-IDF | 192,230.66 | 102.79 | 12,821.45 | 51.48 |
| Multilingual BERT | - | 95,665.94 | 426.45 | 38,062.44 | 88.03 |
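
For readers who want to reproduce measures of this kind, the sketch below computes the four statistics named in the column headers using their usual textbook definitions. The exact formulas used in the paper are assumed rather than confirmed here, and the function name `error_measures` and the sample amounts are made up for illustration.

```python
import numpy as np

def error_measures(y_true, y_pred):
    """Four error statistics under their standard definitions (an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ape = np.abs(err) / np.abs(y_true) * 100.0  # absolute percentage errors
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mape": float(np.mean(ape)),
        "root_median_se": float(np.sqrt(np.median(err ** 2))),
        "median_ape": float(np.median(ape)),
    }

# Made-up awarded amounts and predictions, for illustration only.
y_true = [10000.0, 20000.0, 40000.0]
y_pred = [12000.0, 15000.0, 40000.0]
m = error_measures(y_true, y_pred)
print(m)
```

The median-based measures are far less sensitive to a few very large awards than the mean-based ones, which is one plausible reason the root mean squared errors in the table are an order of magnitude larger than the root median squared errors.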
Error Measures Obtained on a Test Set With Different Algorithms Applied and Models’ Overestimates Assumed as Not Errors

| Algorithm applied | Predictors | Root mean squared error | Mean absolute percentage error | Root median squared error | Median absolute percentage error |
|---|---|---|---|---|---|
| OLS | Token counts | 61,829.04 | 23.92 | 1,024.79 | 5.29 |
| OLS | TF-IDF | 67,563.77 | 21.83 | 178.39 | 1.08 |
| LASSO | Token counts | 62,019.89 | 23.83 | 1,061.22 | 5.57 |
| LASSO | TF-IDF | 68,215.32 | 21.78 | 129.66 | 0.69 |
| Random forests | Token counts | 74,939.24 | 23.11 | 1,729.41 | 8.01 |
| Random forests | TF-IDF | 73,939.24 | 22.80 | 1,741.93 | 9.02 |
| XGBoost | Token counts | 71,414.42 | 23.83 | 1,069.24 | 7.93 |
| XGBoost | TF-IDF | 65,657.99 | 24.13 | 1,209.66 | 7.68 |
| Multilingual BERT | - | 82,212.22 | 29.04 | 0.00 | 0.00 |
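
One way to read "overestimates assumed as not errors" is that an error is counted only when the model predicts less than the court awarded, and is set to zero otherwise. The sketch below implements that reading. It is an assumption about the table's construction, not a confirmed description of the paper's method, and the function name `one_sided_error_measures` and the sample amounts are hypothetical.

```python
import numpy as np

def one_sided_error_measures(y_true, y_pred):
    """Error statistics where only underestimates count (assumed reading)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Shortfall is positive for underestimates and zero for overestimates.
    shortfall = np.maximum(y_true - y_pred, 0.0)
    ape = shortfall / np.abs(y_true) * 100.0
    return {
        "rmse": float(np.sqrt(np.mean(shortfall ** 2))),
        "mape": float(np.mean(ape)),
        "root_median_se": float(np.sqrt(np.median(shortfall ** 2))),
        "median_ape": float(np.median(ape)),
    }

# Made-up amounts: two of the three predictions overestimate the award.
y_true = [10000.0, 20000.0, 40000.0]
y_pred = [12000.0, 15000.0, 45000.0]
one_sided = one_sided_error_measures(y_true, y_pred)
print(one_sided)
```

Under this reading, both median-based measures are exactly zero whenever more than half of the predictions are overestimates. That would be consistent with the 0.00 entries for Multilingual BERT, although the table alone does not confirm this explanation.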