Open Access

Can ChatGPT evaluate research quality?

May 27, 2024

Figure 1.

The average REF star rating given by the REF D GPT against the author’s prior evaluation of the REF score of 51 of his open access articles.

Figure 2.

The range of REF star ratings given by the REF D GPT against the author’s prior evaluation of the REF score of 51 of his open access articles. The area of each bubble is proportional to the number of times the y axis score was given by ChatGPT to the x axis article. My REF scores are marked on the x axis.

The scores given by ChatGPT-4 REF D and me to 51 of my open access articles.

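| Score | GPT (count) | GPT (%) | Me (count) | Me (%) |
|-------|-------------|---------|------------|--------|
| 1*    | 0           | 0.0%    | 2          | 4%     |
| 1.5*  | 0           | 0.0%    | 3          | 6%     |
| 2*    | 14          | 1.8%    | 12         | 24%    |
| 2.33* | 1           | 0.1%    | 0          | 0%     |
| 2.5*  | 2           | 0.3%    | 9          | 18%    |
| 2.67* | 2           | 0.3%    | 0          | 0%     |
| 2.75* | 0           | 0.0%    | 1          | 2%     |
| 3*    | 509         | 66.5%   | 8          | 16%    |
| 3.33* | 9           | 1.2%    | 0          | 0%     |
| 3.5*  | 14          | 1.8%    | 7          | 14%    |
| 3.67* | 15          | 2.0%    | 0          | 0%     |
| 4*    | 199         | 26.0%   | 9          | 18%    |
| Total | 765         | 100.0%  | 51         | 100%   |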
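
The percentage columns follow directly from the counts. The short Python sketch below recomputes the shares from the tallies in the table. It assumes, as an inference from the totals (765 GPT scores for 51 articles) rather than a statement in this text, that each article was scored 15 times by ChatGPT. The variable names are illustrative only.

```python
# Score tallies copied from the table above.
gpt_counts = {
    "1*": 0, "1.5*": 0, "2*": 14, "2.33*": 1, "2.5*": 2, "2.67*": 2,
    "2.75*": 0, "3*": 509, "3.33*": 9, "3.5*": 14, "3.67*": 15, "4*": 199,
}
author_counts = {
    "1*": 2, "1.5*": 3, "2*": 12, "2.33*": 0, "2.5*": 9, "2.67*": 0,
    "2.75*": 1, "3*": 8, "3.33*": 0, "3.5*": 7, "3.67*": 0, "4*": 9,
}

gpt_total = sum(gpt_counts.values())        # total number of GPT scores
author_total = sum(author_counts.values())  # number of articles

# Share of each score, as a percentage of the column total.
gpt_share = {s: 100 * n / gpt_total for s, n in gpt_counts.items()}
author_share = {s: 100 * n / author_total for s, n in author_counts.items()}

# Inferred (assumption): GPT scores per article.
scores_per_article = gpt_total / author_total

for score in gpt_counts:
    print(f"{score:>6}  GPT {gpt_share[score]:5.1f}%  Me {author_share[score]:5.1f}%")
print("GPT scores per article:", scores_per_article)
```

On these tallies, 3* and 4* together account for more than 90% of the GPT scores, whereas the author's own scores are spread much more evenly across the scale.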

Pearson correlations for 51 of my open access articles, comparing my initial scores with the scores from ChatGPT-4 REF D.

| Correlation | All articles | Articles scored 2.5+ by me | Articles scored 3+ by me |
|-------------|--------------|----------------------------|--------------------------|
| GPT average vs. author (95% CI) | 0.509 (0.271, 0.688) | 0.200 (-0.148, 0.504) | 0.246 (-0.175, 0.590) |
| GPT vs. author, average of 15 pairs (fraction of 95% CIs excluding 0) | 0.281 (8/15) | 0.102 (1/15) | 0.128 (1/15) |
| GPT vs. GPT (average of 105 pairs) | 0.245 | 0.194 | 0.215 |
| Sample size (articles) | 51 | 34 | 24 |
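
The text here does not state how the 95% confidence intervals were computed. As a hedged illustration, the sketch below applies the standard Fisher z-transformation to the reported correlations and sample sizes. This is an assumption about the method, not a confirmed description of it, although it agrees with the intervals in the table to within about 0.001. The function name `fisher_ci` is hypothetical. The last lines show that 15 sets of GPT scores yield 105 distinct GPT vs. GPT pairs.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation r from n paired
    observations, using the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Correlations and sample sizes from the "GPT average vs. author" row.
reported = {
    "All articles": (0.509, 51),
    "Articles scored 2.5+ by me": (0.200, 34),
    "Articles scored 3+ by me": (0.246, 24),
}

intervals = {label: fisher_ci(r, n) for label, (r, n) in reported.items()}
for label, (low, high) in intervals.items():
    print(f"{label}: ({low:.3f}, {high:.3f})")

# Number of distinct pairs among 15 sets of GPT scores.
gpt_pairs = math.comb(15, 2)
print("GPT vs. GPT pairs:", gpt_pairs)
```

Only the interval for all 51 articles excludes zero. For the two subsets restricted to higher-scoring articles, the intervals span zero, so the positive correlations there cannot be distinguished from no association at the 95% level.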

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining