Open Access

Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

23 Sep 2024

Overall proportion of correct and incorrect answers (McNemar’s test: chi-squared = 13.4; df = 1; p < 0.001; OR (95% CI) = 3.78 (1.78–8.96))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 28 (28.6%) | 9 (9.2%) | 37 (37.8%) |
| No | 34 (34.7%) | 27 (27.6%) | 61 (62.2%) |
| Column total | 62 (63.3%) | 36 (36.7%) | N = 98 |
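
The statistics in the caption above can be reproduced from the two discordant cells of the table (questions answered correctly by only one model). The following is a minimal sketch, not the authors' code: it assumes the continuity-corrected form of McNemar's test, which is consistent with the reported chi-squared values, and takes the odds ratio as the ratio of the discordant counts. The function name `mcnemar` and its parameters `b` and `c` are illustrative. The confidence interval is not reproduced here because the source does not state how it was computed.

```python
import math


def mcnemar(b: int, c: int) -> tuple[float, float, float]:
    """McNemar's test with continuity correction for a paired 2x2 table.

    b: questions GPT-3.5 answered correctly and GPT-4 incorrectly
    c: questions GPT-3.5 answered incorrectly and GPT-4 correctly
    Assumes b > 0 and b + c > 0 (otherwise the ratios are undefined).
    """
    # Continuity-corrected statistic (an assumption that matches the captions)
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds ratio from the discordant pairs, in favour of GPT-4
    odds_ratio = c / b
    return chi2, p_value, odds_ratio


# Discordant cells from the overall table: 9 (GPT-3.5 only) and 34 (GPT-4 only)
chi2, p_value, odds_ratio = mcnemar(b=9, c=34)
print(f"chi-squared = {chi2:.1f}, p < 0.001: {p_value < 0.001}, OR = {odds_ratio:.2f}")
```

Only the discordant cells enter the calculation; the 28 questions both models answered correctly and the 27 both answered incorrectly carry no information about which model is better.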

Distribution of correct and incorrect answers for Treatment & Pharmacology question category (McNemar’s test: chi-squared = 1.5; df = 1; p = 0.2207; OR (95% CI) = 5 (0.559–236.488))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 2 (20%) | 1 (10%) | 3 (30%) |
| No | 5 (50%) | 2 (20%) | 7 (70%) |
| Column total | 7 (70%) | 3 (30%) | N = 10 |

Distribution of correct and incorrect answers for Physiology & Diagnostics question category (McNemar’s test: chi-squared = 2.5; df = 1; p = 0.1138; OR (95% CI) = 4 (0.798–38.666))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 6 (28.6%) | 2 (9.5%) | 8 (38.1%) |
| No | 8 (38.1%) | 5 (23.8%) | 13 (61.9%) |
| Column total | 14 (66.7%) | 7 (33.3%) | N = 21 |

Distribution of correct and incorrect answers by level of confidence for GPT-4

| Level of confidence | GPT-4 correct | GPT-4 incorrect |
|---|---|---|
| Definitely sure | 8 | 1 |
| Very sure | 40 | 19 |
| Almost sure | 14 | 16 |
| Not very sure | - | - |
| Definitely not sure | - | - |

Distribution of correct and incorrect answers for Pediatrics question category (McNemar’s test: chi-squared = 0.571; df = 1; p = 0.4497; OR (95% CI) = 2.5 (0.409–26.253))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 2 (18.2%) | 2 (18.2%) | 4 (36.4%) |
| No | 5 (45.5%) | 2 (18.2%) | 7 (63.6%) |
| Column total | 7 (63.6%) | 4 (36.4%) | N = 11 |

Distribution of certainty levels between LLMs

| GPT-3.5 level of confidence | GPT-4: Definitely sure | GPT-4: Very sure | GPT-4: Almost sure | GPT-4: Not very sure | GPT-4: Definitely not sure | Total |
|---|---|---|---|---|---|---|
| Definitely sure | 2 (2.04%) | 18 (18.37%) | 21 (21.43%) | - | - | 41 (41.84%) |
| Very sure | 7 (7.14%) | 41 (41.84%) | 9 (9.18%) | - | - | 57 (58.16%) |
| Almost sure | - | - | - | - | - | - |
| Not very sure | - | - | - | - | - | - |
| Definitely not sure | - | - | - | - | - | - |
| Total | 9 (9.18%) | 59 (60.20%) | 30 (30.61%) | - | - | N = 98 |

Distribution of correct and incorrect answers by level of confidence for GPT-3.5

| Level of confidence | GPT-3.5 correct | GPT-3.5 incorrect |
|---|---|---|
| Definitely sure | 18 | 23 |
| Very sure | 19 | 38 |
| Almost sure | - | - |
| Not very sure | - | - |
| Definitely not sure | - | - |
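
One way to read the two confidence tables is to convert each row into the share of correct answers at that confidence level. The sketch below does only that arithmetic on the counts printed in the tables; the names `confidence_counts` and `accuracy_by_confidence` are illustrative, and levels reported as "-" (never selected by a model) are simply left out.

```python
# (correct, incorrect) counts copied from the two confidence tables
confidence_counts = {
    "GPT-4": {
        "Definitely sure": (8, 1),
        "Very sure": (40, 19),
        "Almost sure": (14, 16),
    },
    "GPT-3.5": {
        "Definitely sure": (18, 23),
        "Very sure": (19, 38),
    },
}


def accuracy_by_confidence(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return the fraction of correct answers for each confidence level."""
    return {
        level: correct / (correct + incorrect)
        for level, (correct, incorrect) in counts.items()
    }


accuracy = {model: accuracy_by_confidence(c) for model, c in confidence_counts.items()}

for model, levels in accuracy.items():
    for level, share in levels.items():
        print(f"{model:<8} {level:<16} {share:.1%}")
```

Under these counts, the share of correct answers for GPT-4 falls as its stated confidence falls, while GPT-3.5 is correct in fewer than half of its answers at both of the confidence levels it reported.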

Distribution of correct and incorrect answers for Clinical & Case Questions question category (McNemar’s test: chi-squared = 4.083; df = 1; p = 0.0433; OR (95% CI) = 5 (1.066–46.933))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 17 (39.5%) | 2 (4.7%) | 19 (44.2%) |
| No | 10 (23.3%) | 14 (32.6%) | 24 (55.8%) |
| Column total | 27 (62.8%) | 16 (37.2%) | N = 43 |

Distribution of correct and incorrect answers for Surgery question category (McNemar’s test: chi-squared = 1.125; df = 1; p = 0.2888; OR (95% CI) = 3 (0.536–30.393))

| GPT-3.5 correct answer | GPT-4 correct: Yes | GPT-4 correct: No | Row total |
|---|---|---|---|
| Yes | 1 (7.7%) | 2 (15.4%) | 3 (23.1%) |
| No | 6 (46.2%) | 4 (30.8%) | 10 (76.9%) |
| Column total | 7 (53.8%) | 6 (46.2%) | N = 13 |

Language:
English
Publication frequency:
Once a year
Journal subjects:
Life Sciences, Molecular Biology, Microbiology and Virology, Medicine, Basic Medical Science, Immunology