Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Ciekalski, Marcin; Laskowski, Maciej; Koperczak, Agnieszka; Śmierciak, Maria; Sirek, Sebastian

Open access

23 Sep 2024

Postępy Higieny i Medycyny Doświadczalnej

Volume 78 (2024): Issue 1 (January 2024)

Article category: Original Study

Published online: 23 Sep 2024

Pages: 111 - 116

Received: 11 Jan 2024

Accepted: 19 Jun 2024

DOI: https://doi.org/10.2478/ahem-2024-0006

Keywords
ophthalmology, ChatGPT, Polish national specialty exam

© 2024 Marcin Ciekalski et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Overall proportion of correct and incorrect answers (McNemar’s test: chi-squared = 13.4; df = 1; p < 0.001; OR (95% CI) = 3.78 (1.78–8.96))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	28 (28.6%)	9 (9.2%)	37 (37.8%)
No	34 (34.7%)	27 (27.6%)	61 (62.2%)
Column total	62 (63.3%)	36 (36.7%)	N = 98
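The test statistic in the caption above can be checked from the two discordant cells of the table (questions that only one model answered correctly). The sketch below assumes the continuity-corrected form of McNemar's test, which is consistent with the reported chi-squared values but is not stated explicitly on this page; the function name `mcnemar_cc` is a hypothetical helper, and the 95% confidence interval of the odds ratio is not reproduced here.

```python
import math


def mcnemar_cc(b: int, c: int) -> tuple[float, float, float]:
    """McNemar's test with continuity correction for paired binary outcomes.

    b: questions GPT-3.5 answered correctly and GPT-4 answered incorrectly
    c: questions GPT-4 answered correctly and GPT-3.5 answered incorrectly
    Returns (chi-squared statistic, p-value, odds ratio c / b).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = c / b
    return chi2, p_value, odds_ratio


# Overall table: 9 questions only GPT-3.5 got right, 34 only GPT-4 got right.
chi2, p_value, odds_ratio = mcnemar_cc(b=9, c=34)
print(f"chi-squared = {chi2:.1f}, p = {p_value:.2g}, OR = {odds_ratio:.2f}")
```

Only the discordant cells enter the calculation; the 28 questions both models answered correctly and the 27 both answered incorrectly do not affect the statistic.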

Distribution of correct and incorrect answers for Treatment & Pharmacology question category (McNemar’s test: chi-squared = 1.5; df = 1; p = 0.2207; OR (95% CI) = 5 (0.559–236.488))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	2 (20%)	1 (10%)	3 (30%)
No	5 (50%)	2 (20%)	7 (70%)
Column total	7 (70%)	3 (30%)	N = 10

Distribution of correct and incorrect answers for Physiology & Diagnostics question category (McNemar’s test: chi-squared = 2.5; df = 1; p = 0.1138; OR (95% CI) = 4 (0.798–38.666))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	6 (28.6%)	2 (9.5%)	8 (38.1%)
No	8 (38.1%)	5 (23.8%)	13 (61.9%)
Column total	14 (66.7%)	7 (33.3%)	N = 21

Distribution of correct and incorrect answers by level of confidence for GPT-4

Level of confidence	Correct	Incorrect
Definitely sure	8	1
Very sure	40	19
Almost sure	14	16
Not very sure	-	-
Definitely not sure	-	-

Distribution of correct and incorrect answers for Pediatrics question category (McNemar’s test: chi-squared = 0.571; df = 1; p = 0.4497; OR (95% CI) = 2.5 (0.409–26.253))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	2 (18.2%)	2 (18.2%)	4 (36.4%)
No	5 (45.5%)	2 (18.2%)	7 (63.6%)
Column total	7 (63.6%)	4 (36.4%)	N = 11

Distribution of certainty levels between LLMs

GPT-3.5 level of confidence	GPT-4: Definitely sure	GPT-4: Very sure	GPT-4: Almost sure	GPT-4: Not very sure	GPT-4: Definitely not sure	Total
Definitely sure	2 (2.04%)	18 (18.37%)	21 (21.43%)	-	-	41 (41.84%)
Very sure	7 (7.14%)	41 (41.84%)	9 (9.18%)	-	-	57 (58.16%)
Almost sure	-	-	-	-	-	-
Not very sure	-	-	-	-	-	-
Definitely not sure	-	-	-	-	-	-
Total	9 (9.18%)	59 (60.20%)	30 (30.61%)	-	-	N = 98

Distribution of correct and incorrect answers by level of confidence for GPT-3.5

Level of confidence	Correct	Incorrect
Definitely sure	18	23
Very sure	19	38
Almost sure	-	-
Not very sure	-	-
Definitely not sure	-	-
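The two confidence tables (GPT-4 and GPT-3.5) can be condensed into one accuracy figure per stated confidence level, which makes it easier to see whether higher declared confidence went along with more correct answers. The following is a minimal sketch, not part of the study's analysis: the counts are copied from the tables above, empty levels (shown as "-") are left out, and the names `accuracy_by_confidence`, `gpt4_counts` and `gpt35_counts` are illustrative.

```python
# (correct, incorrect) counts per stated confidence level, copied from the tables.
gpt4_counts = {
    "Definitely sure": (8, 1),
    "Very sure": (40, 19),
    "Almost sure": (14, 16),
}
gpt35_counts = {
    "Definitely sure": (18, 23),
    "Very sure": (19, 38),
}


def accuracy_by_confidence(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return the share of correct answers within each confidence level."""
    return {
        level: correct / (correct + incorrect)
        for level, (correct, incorrect) in counts.items()
    }


gpt4_accuracy = accuracy_by_confidence(gpt4_counts)
gpt35_accuracy = accuracy_by_confidence(gpt35_counts)

for model, accuracy in (("GPT-4", gpt4_accuracy), ("GPT-3.5", gpt35_accuracy)):
    for level, share in accuracy.items():
        print(f"{model} | {level}: {share:.1%}")
```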

Distribution of correct and incorrect answers for Clinical & Case Questions question category (McNemar’s test: chi-squared = 4.083; df = 1; p = 0.0433; OR (95% CI) = 5 (1.066–46.933))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	17 (39.5%)	2 (4.7%)	19 (44.2%)
No	10 (23.3%)	14 (32.6%)	24 (55.8%)
Column total	27 (62.8%)	16 (37.2%)	N = 43

Distribution of correct and incorrect answers for Surgery question category (McNemar’s test: chi-squared = 1.125; df = 1; p = 0.2888; OR (95% CI) = 3 (0.536–30.393))

GPT-3.5 correct	GPT-4 correct: Yes	GPT-4 correct: No	Row total
Yes	1 (7.7%)	2 (15.4%)	3 (23.1%)
No	6 (46.2%)	4 (30.8%)	10 (76.9%)
Column total	7 (53.9%)	6 (46.2%)	N = 13

Language: English

Publication frequency: once per year
Journal subjects: Life Sciences, Molecular Biology, Microbiology and Virology, Medicine, Basic Medical Science, Immunology
