ChatGPT is an emerging text-based artificial intelligence (AI) tool that is becoming increasingly popular. By May 2023, it had reached over 100 million users (Ciaccio 2023). Since Alan Turing posed his famous question ‘Can machines think?’ (Turing 1950), machine learning has progressed steadily. Nowadays, thanks to deep learning techniques, in which algorithms generate patterns and predictions after learning from a huge dataset, natural language processing has become possible, which in turn enabled the development of ChatGPT (Haleem et al. 2022). ChatGPT was created by OpenAI and released in November 2022 as the result of continual improvements to the GPT series, including GPT-2 and GPT-3 (Haleem et al. 2022; Ray 2023).
There are multiple examples of its application in different fields, such as medicine (Jungwirth & Haluza 2023; Ollivier et al. 2023), education (Cooper 2023) and the atmospheric sciences (Biswas 2023). Although multiple machine learning techniques can be incredible aids in climate research, ChatGPT, which uses machine learning to generate human-like responses, has some limitations (Biswas 2023; Cooper 2023), such as providing inaccurate information and creating fake references, URLs and DOI numbers (Day 2023). Even if ChatGPT can generate scientifically reasonable text, as it was trained on data that included scientific content, all information needs to be verified and critically evaluated (Ollivier et al. 2023).
The aim of this paper is to verify and evaluate the ChatGPT responses connected to different aspects of changing climate in Poland; the responses cover spatial and temporal changes of particular weather elements and include references to the sources of the presented information.
ChatGPT was asked eight questions regarding the spatial and temporal changes of weather elements in Poland (in one session between May and June 2023). The generated answers were then assessed in terms of scientific accuracy, and verified using published journal articles and books, as well as official governmental and intergovernmental documents. Each sentence in each answer produced by ChatGPT was assessed as true, false or neutral. The next step was to assess the overall answer on the scale of 0–10, where 0 means that no information given is correct, and 10 that all information is correct. This assessment was based on how much correct information was given in the answer in order to make it as objective as possible. For example:
The grading process was as follows: two sentences incorrect and one correct, which means that about 30% of the answer is correct. Overall grade: 3/10.
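The grading rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: the function name `grade_answer`, the choice to leave neutral sentences out of the denominator, and rounding to the nearest integer are all assumptions made for the example.

```python
from collections import Counter


def grade_answer(labels):
    """Turn per-sentence labels ('true', 'false', 'neutral') into a 0-10 grade.

    Assumption: neutral sentences carry no verifiable information,
    so they are excluded from the proportion of correct content.
    """
    counts = Counter(labels)
    assessed = counts["true"] + counts["false"]
    if assessed == 0:
        # Nothing verifiable was said, so no information is correct.
        return 0
    # Share of correct sentences, scaled to 0-10 and rounded (assumed rounding).
    return round(10 * counts["true"] / assessed)


# The example from the text: two incorrect sentences and one correct sentence.
example_labels = ["false", "false", "true"]
print(grade_answer(example_labels))  # 3
```

With one correct sentence out of three, the share of correct content is about 33%, which the sketch maps to 3/10, in line with the grade given in the example.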
ChatGPT answer:
Assessment: The general answer looks legitimate; for example, the reference to IPCC is accurate. However, it was not specified from which report this information was taken. Since ChatGPT 3.5 was only trained on information up to 2021 (Grünebaum et al. 2023), it has no access to the latest IPCC report (2022), which indicates temperature changes of 2.5–4°C, with the most probable value being 3°C. If the user is aware of ChatGPT limitations, it could be a quick and useful tool to find general information about the temperature rise in Poland. The information about the specific value for the rise in temperatures is slightly inaccurate – the annual mean temperature rise is predicted to be 1–1.5°C until 2050 (not 2.5°C) and 2–4°C (correct) until the end of the century (Falarz 2021). However, this answer is very general – it has no information about regional changes in air temperature. Overall assessment: 7/10.
ChatGPT:
Assessment: ‘Fourth National Communication on Climate Change of the Republic of Poland’ is a real document, which was published in 2006; however, on the same website, there are other documents such as ‘Fifth National Communication on Climate Change of the Republic of Poland’, published in 2010, ‘Sixth National Communication on Climate Change of the Republic of Poland’ (2013), ‘Seventh National Communication on Climate Change of the Republic of Poland’ (2017), with the latest one being ‘Eighth National Communication and Fifth Biennial Report under the United Nations Framework Convention on Climate Change’, published in 2022 (Website of the Republic of Poland 2022). This means that ChatGPT mixed up those reports, referring to the fourth, yet citing the year in which the seventh report was published (2017). Also, none of the reports uses the period 1901–2016; however, in the fourth report, it is stated that ‘Stations located at the Baltic coast also demonstrated temperature growth in 1901–1990 by 0.5°C/100 years in Gdańsk Wrzeszcz and in Hel (…) and by 0.7°C/100 years in Koszalin’ (Fourth National Communication under the United Nations Framework Convention on Climate Change 2006, p. 18). In the Seventh Report (2017), the year 1901 never appears. Also, the information ‘
ChatGPT:
Assessment: The same mistake as in the previous question appears – the Fourth Report was published in 2006, not 2017. In that document, it is written that ‘
ChatGPT:
Assessment: First, most references are focused on snow cover, not snowfall. Changes in snow cover vary depending on region, but the trends are decreasing and changing from 4.6 days/decade in Suwałki to 3.0 days/decade in Wrocław, based on data from the years 1966/67 to 2019/20, according to Tomczyk et al. (2021), who indicate the most drastic changes in north-eastern Poland, not exactly the eastern and southern parts, as stated by ChatGPT. Some data from Poznań, based on a similar period (1966/1967 to 2019/2020) shows a statistically significant decrease of 5.3 days/decade (Szyga-Pluta 2022a), so the ranges given by ChatGPT are not quite accurate. Also, this time ChatGPT delivered real surnames of scientists. Two of the three scientists mentioned published results connected with snow cover (Miętus 1991 and Szwed et al. 2017), but the third focused more on history than climate. The last paragraph overgeneralized the shift from snowfall to rainfall being due to global warming. In fact, it ‘depends not only on temperature, but also on the water vapor content in the air and on the topography’ (Twardosz et al. 2012). Overall assessment: 2/10, mostly due to an off-topic answer.
ChatGPT:
Łupikasza, E, Wibig, J & Szyga-Pluta, K 2015, ‘High-level cloud cover over Poland in 1966–2010’,
Szyga-Pluta, K, Łupikasza, E & Wibig, J 2017, ‘Trends and variability of high-level cloud cover over Poland’,
Assessment: This time ChatGPT provided a bibliography, which looks very real – those names represent real scientists who published some papers connected with cloudiness. Unfortunately, those are not real publications; the titles have been fabricated. The first paragraph overgeneralizes because the height of high-level clouds is dependent on latitude, but the cloud genera classified as high-level cloud are correct. In the fourth and fifth paragraphs, the answers are mutually exclusive – a decreasing trend in summer and a decreasing trend in winter. Also, the trend is incorrect – the occurrence of high-level clouds is increasing (Matuszko et al. 2022; Wibig 2008). Other research shows that, for example, the frequency of Cirrus clouds is increasing and, at the same time, the frequency of Cirrocumulus is decreasing, while Cirrostratus frequency was decreasing until the 1990s but has increased sharply since 2006 (Szyga-Pluta 2022b). In many papers, relationships between cloud genera and circulation type are described; however, those relationships are very varied and complicated. For example, clouds are usually observed in groups, not just as single clouds (Matuszko 2018), and there are many circulation classifications that influence cloudiness, some more than others (Szyga-Pluta 2022b). So, again, this is an oversimplification, which could be misleading for someone who does not know much about this topic. It is worth mentioning that there are no numeric values here – for example, in relation to trends or the percentage of high-level clouds in general cloudiness. Overall assessment: 1/10.
ChatGPT:
Wibig, J & Łupikasza, E 2015, ‘Sunshine duration in Poland in the years 1951–2010’,
Kędziora, A, Błaś, M & Markowicz, K 2016, ‘Relationship between aerosols and sunshine duration in Poland’,
Szyga-Pluta, K, Łupikasza, E & Wibig, J 2019, ‘Trends and variability of sunshine duration in Poland in the years 1966–2015’,
Assessment: Again, ChatGPT presented real names of scientists who publish papers about different aspects of climate in Poland. However, scientists who are strongly connected with topics of sunshine duration are not mentioned. Sunshine duration is a very interesting meteorological element because its variations are caused by many different factors (besides the consequences of contemporary climate change, such as changes in circulation patterns and cloudiness). These include changes in the measurement devices on IMGW meteorological stations – from the traditional Campbell-Stokes heliograph to the CSD3 automatic sensor, since 2014, to the ‘global brightening’ caused by the collapse of heavy industry polluters in Europe (Matuszko et al. 2020). The claim, repeated by ChatGPT in three paragraphs, that ‘
ChatGPT:
Krzyścin, JW & Kolendowicz, L 2018, ‘Spatiotemporal variability of wind speed in Poland during 1986–2015’,
Szyga-Pluta, K, Łupikasza, E & Wibig, J 2019, ‘Trends and variability of wind speed in Poland in the years 1966–2015’,
Czernecki, B, Dore, AJ & Hofman, J 2020, ‘A high-resolution gridded dataset of wind speed for Poland (1979–2016)’,
Assessment: The general trend of wind speed over Poland, but also over Europe, is decreasing, mostly due to increased friction (Wibig 2021a), so ChatGPT’s analysis of the overall trend is correct. Similar to other answers, the names of scientists and journals are real but the publications have been fabricated. It is true that NAO (North Atlantic Oscillation) and wind speed are correlated, but not as stated – rather the opposite: wind speed decreases during negative NAO phases (Phillips et al. 2013). The strongest winds are in the winter season, which means that the decrease is also more pronounced then, and the weakest winds are in the summer season (Wibig 2021a), which ChatGPT stated correctly. The strongest winds are from W, WSW and WNW sectors but, due to the small decreases of wind speed in general, those are getting weaker (Wibig 2021a); however, this doesn’t mean that easterly winds are stronger. The statement about the highest wind speeds occurring in coastal areas is partly correct because the strongest winds in Poland are noted in mountain areas, especially on Śnieżka (Wibig 2021a); however, in older publications, mountain stations were not included (Koźmiński & Michalska 2002), so this is probably the reason for this inaccuracy. The reasons not mentioned by ChatGPT that influence spatial and temporal changes in wind speed and direction are changes in measurement equipment – before automatization, the wind direction was measured ‘by eye’ – and various factors connected with particular station localization, which can influence both wind speed and direction (Koźmiński & Michalska 2002; Wibig 2021a). ChatGPT does not provide the information about the size of the trend, only about the direction. Overall assessment: 4/10.
ChatGPT:
Kundzewicz, ZW & Matczak, P 2016, ‘Climate change regional review: Poland’,
Kryza, M & Dore, AJ 2019, ‘Changes in extreme precipitation events in Poland in the period 1951–2015’,
Spinoni, J, Vogt, JV, Naumann, G, Barbosa, P & Dosio, A 2018, ‘Will drought events become more frequent and severe in Europe?’,
Wibig, J & Głowicki, B 2015, ‘Climate change in Poland: past and future’,
Assessment: This is quite a coherent summary of the most important changes in climate extremes in the future. Heatwaves will definitely increase in the future, but the information about a 30% rise in heatwave days is incomplete. This 30% is in reference to what base period? Pre-industrial or, for example, 1961–1990? Or 1991–2020? And how are heatwave days calculated? There are multiple methods, the most popular being percentiles or absolute thresholds. For Poland, heatwaves are based on maximum temperatures reaching above 30°C (Kuchcik 2006). In recent scientific papers, the trend value of heatwave days, calculated both ways, is 7 to 8 days per decade since the 1980s (Wibig 2021b); however, if we take a longer base period – since the 1960s – the trend is 2 to 4 days per decade (Tomczyk & Bednorz 2019; Wibig 2021b) due to the low number of heatwaves in the late 1960s and 1970s.
It is true that, in general, precipitation is going to increase in Poland in the future, but the rate of the change may vary significantly, depending on the RCP scenario and the period predicted (2021–2050 or 2071–2100) (Pińskwar & Choryński 2021). Also, ChatGPT’s paragraph about drought is partly incorrect. First, there is no information about the kind of drought – meteorological, hydrological or agricultural. Second, the area most prone to drought is not the south-east, but rather central Poland in the region of Kujawy (Doroszewski et al. 2014; Łabędzki & Bąk 2015). According to the latest IPCC report (Pörtner et al. 2022), droughts, heatwaves and heavy precipitation events will be more frequent and severe in the future, so in this aspect ChatGPT was correct.
The paragraph about thunderstorms is inaccurate because periods with potential conditions to form thunderstorms are getting longer; moreover, based on the years 1951–2018, the eastern part of Poland has experienced an increased frequency of thunderstorms, while in the western part their number has decreased (Falarz 2021). Overall, no long-term tendency of changes has been observed, with the exception of a statistically significant trend during the cool part of the year (Falarz 2021).
The references cited are getting closer to reality – in fact, Kundzewicz, Z. W. and Matczak, P. are authors of the publication ‘Climate change regional review: Poland’ in Wiley Interdisciplinary Reviews: Climate Change (Kundzewicz & Matczak 2012). The incorrect information relates to the year, volume, issue and page numbers provided. Also, the reference entry ‘Climate change in Poland: past and future’ is very similar to the book
ChatGPT is an emerging tool of great potential for improving human work in different aspects of life. Despite it being a conversational model (Grünebaum et al. 2023; Haleem et al. 2022), research papers dealing with its multiple applications in science are constantly being created and published (Biswas 2023; Ciaccio 2023; Grünebaum et al. 2023; Ray 2023). The goal of this paper is to verify and assess the information provided by ChatGPT regarding different aspects of the changing climate in Poland. Although the assessment is designed to be as fair as possible, I am aware of the limitations of the applied methodology. It was not entirely possible to completely eliminate subjectivity.
Unfortunately, the scientific information provided by ChatGPT is very disappointing. Below is a brief summary of several aspects of the ChatGPT answers, which are compiled in Table 1:
References. Almost all references are fake; however, with each answer the accuracy improves – at the beginning, the surnames were fabricated (for example, Kovalski) but, by the end, the authors and title in the bibliography related to a real scientific paper (Kundzewicz & Matczak 2012).
General trends, factors and dependencies. In most cases, general information, such as cloud genera, factors influencing sunshine duration and future changes in extreme events, is correct. Unfortunately, sometimes this leads to overgeneralization and oversimplification; for example, the dependence of the height of high-level clouds on latitude was omitted.
Numbers and values. In most cases the numbers given by ChatGPT are highly inaccurate or incorrect. While the general trend is usually correct (for example, the increase in air temperature in Poland), the value of this trend is mostly incorrect (for example, the temperature trend should be 0.29°C/10 years, not 0.17°C/10 years). The reason for this might be that the proper reference period was not applied. In some responses, the values of climatological trends, averages, totals, and so on are not given at all.
Weather elements. While some weather elements were presented more accurately, such as air temperature and precipitation, others were completely misinterpreted – for example, snow precipitation. Many scientific results in the field of geography are presented in the form of maps and charts, which are difficult for ChatGPT 3.5 to interpret accurately.
Table 1. The assessment of eight answers given by ChatGPT with short descriptions and grades
| Question | Assessment | Description | Grade (0–10) |
| --- | --- | --- | --- |
| 1. How much is air temperature predicted to rise in Poland? | Correct | Temperature rise in the future, IPCC reference, 2–4°C rise until the end of the century, factors that influence temperature rise and information about projection uncertainties. | 7 |
| | Incorrect | Outdated information, inaccurate temperature values (should be 1–1.5°C until 2050, not 2.5°C), no information about regional variability. | |
| 2. Describe with proper references the past changes of air temperature in Poland. | Correct | General information correct (such as increasing temperature trends, especially in winter season), IPCC fifth report correct. | 3 |
| | Incorrect | Mixed national reports, fake years and references, some data are highly inaccurate (temperature trend should be 0.29°C/10 years, not 0.17°C/10 years). | |
| 3. Describe the pattern of precipitation in Poland with proper references and links to the sources of information. | Correct | General pattern correct, IPCC fifth report correct. | 7 |
| | Incorrect | Incorrect precipitation totals, incorrect references, some inaccurate data (precipitation totals should be 500–1500 mm, not 400–1000 mm). | |
| 4. Explain how snow precipitation in Poland changed over time with references to published scientific papers. | Correct | Correct real names of scientists. | 2 |
| | Incorrect | Off topic – the answer is focused on snow cover, not snow precipitation, incorrect years and journals in references, varied inaccurate ranges of changes, overgeneralization. | |
| 5. Explain the changes of occurrence of high-level clouds over Poland with references to scientific papers. | Correct | Correct cloud genera, real scientists connected with cloudiness research quoted, the names of real journals mentioned. | 1 |
| | Incorrect | Publication titles and teams are fake, information about changes in high-level clouds is mostly incorrect and without numerical values, two facts are mutually exclusive (decreasing trend in summer and in winter). | |
| 6. Explain the reason for changes in sunshine duration in Poland with references to real scientific papers. | Correct | Correct factors influencing sunshine duration. | 2 |
| | Incorrect | Important scientists omitted, information about trends incorrect, incomplete information about factors that influence this meteorological element, no numerical values given. | |
| 7. Describe temporal and spatial variation of wind speed in Poland with references. | Correct | Real scientists and journals, trends mostly correct. | 4 |
| | Incorrect | Fake publications, incorrect information about the correlation with NAO index, incomplete information about spatial distribution, no numerical values given. | |
| 8. How are extreme weather events going to change in Poland in the future? Give examples with references. | Correct | Improved accuracy of bibliography, mostly correct general trends and patterns of changes. | 5 |
| | Incorrect | Some incomplete and inaccurate information about the future of extreme events, for example, a 30% increase in heatwave days without any additional information. | |
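The overall result can be reproduced directly from the eight grades listed in the table. The short Python sketch below is only an arithmetic check; the variable names are illustrative, and reading the mean grade as a share of correct information assumes that the 0–10 scale maps linearly onto 0–100%.

```python
from statistics import mean

# Grades for questions 1-8, copied from the table above.
grades = {1: 7, 2: 3, 3: 7, 4: 2, 5: 1, 6: 2, 7: 4, 8: 5}

average_grade = mean(grades.values())
# Assumption: the 0-10 scale maps linearly to the share of correct information.
share_correct = average_grade / 10

print(f"Average grade: {average_grade}")                 # Average grade: 3.875
print(f"Approximate share correct: {share_correct:.0%}")
```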
The average grade for the assessment of the eight questions related to climate change in Poland is only 3.87, suggesting that only 30–40% of the information is correct. At this stage of development, ChatGPT is a rather poor tool to be used in research and academic education (Castro Nascimento & Pimentel 2023; Zheng & Zhan 2023). However, thanks to its ability to learn on the basis of previous prompts (Haleem et al. 2022), it is possible its responses will improve. This requires further research to focus on the proper construction and improvement of prompt building, which should result in more accurate answers from ChatGPT.
Currently, its only viable applications are those recognized by scientific journals, such as correcting and improving English language in submitted manuscripts, sorting out bibliographies (e.g. changing styles, removing DOIs, etc.) and clearly indicating any parts written by ChatGPT as generated by the AI and not to be considered as co-authored contributions by, for example,