Evaluation of the responses from different chatbots to frequently asked patient questions about impacted canines
Published online: 1 September 2025
Page range: 288-300
Submitted: 1 March 2025
Accepted: 1 May 2025
DOI: https://doi.org/10.2478/aoj-2025-0020
© 2025 Elif Gökçe Erkan Acar et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Background
To evaluate the reliability, accuracy and readability of responses given by the ChatGPT 4.0, Google Gemini 1.5 and Claude 3.5 Sonnet chatbots to questions about impacted canines.
Methods
Thirty-five questions were posed to three different chatbots and 105 responses were received. The answers were evaluated for reliability (modified DISCERN), accuracy (Likert scale and the Accuracy of Information (AOI) index) and readability (Flesch-Kincaid Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL)). Statistical significance was set at p < 0.05.
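The FRES and FKGL metrics used above are standard published formulas based on sentence length and syllables per word. A minimal sketch of how such scores can be computed is shown below; the syllable counter here is a naive vowel-group heuristic (an assumption for illustration), whereas validated tools use dictionary-based syllabification:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of vowels; every word has at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a block of text using the standard formulas."""
    # Count sentences by terminal punctuation runs (at least one sentence).
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher = easier (90-100 very easy, <30 college level).
    fres = 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)
    # Flesch-Kincaid Grade Level: approximate US school grade required.
    fkgl = 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
    return fres, fkgl
```

Applied to chatbot responses, a FKGL above roughly 12 corresponds to the college-level reading difficulty reported in the Conclusions.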
Results
Gemini had the highest modified DISCERN score (33.66 ± 2.64), followed by Claude (29.70 ± 3.08) and ChatGPT (28.13 ± 2.83). ChatGPT had the highest mean Likert score (4.76 ± 0.43), while Claude and Gemini scored 4.71 ± 0.47 and 4.66 ± 0.47, respectively. For the AOI index, ChatGPT had the highest mean score (8.67 ± 0.55), which was significantly higher than the other chatbots (ChatGPT vs Claude: p = 0.042; ChatGPT vs Gemini: p = 0.036). All chatbots showed similar FRES and FKGL readability scores without any significant differences (p = 0.121 and p = 0.377, respectively). Claude expressed responses in significantly fewer words than the other chatbots (Claude vs ChatGPT: p = 0.019; Claude vs Gemini: p = 0.001), while ChatGPT used the most words (239.74 ± 114.21).
Conclusions
In answering questions about impacted canines, Gemini showed good reliability, while ChatGPT and Claude showed moderate reliability. All chatbots achieved high accuracy scores. However, the responses were difficult to understand for anyone below a college reading level. Chatbots can serve as a resource for patients seeking general information about impacted canines, potentially enhancing and expediting clinician–patient communication. However, the readability of chatbot-generated texts may pose challenges and thereby limit overall comprehension. Moreover, because individual cases vary, the most accurate interpretation should come from the patient’s own healthcare professional. In the future, improved outcomes across all parameters may be achieved through advances in chatbot technology and closer integration with healthcare providers.