Open Access

Mapping Research on User-Generated Content in the Service Sector — A Bibliometric Analysis



Introduction

The development of World Wide Web platforms and technologies has had a significant impact on marketing. Through digital platforms, companies can now communicate directly with their target audiences and obtain quick responses in the form of comments, likes, clicks or shares. On online services such as e-commerce websites, social media (SM) platforms, web blogs and peer networking sites, customers can express their opinions about the products and services they have purchased and used. Consumers' comments and opinions about a good or service form part of 'word of mouth' ('WOM'), which is nowadays increasingly transmitted electronically, a phenomenon referred to as electronic word-of-mouth (eWOM).

Internet consumers tend to find eWOM more credible than traditional media (Cheung et al., 2008). According to Babić Rosario et al. (2016), the vast majority of consumers read customer reviews posted online before making their final purchasing decisions. According to Rasool et al. (2020), the development of electronic technology has increased the level of customer engagement that takes place in the virtual environment. In addition, the growth of SM has enabled customers to engage in conversations about businesses by liking, commenting and sharing. SM platforms such as Facebook, Twitter, LinkedIn, YouTube and Instagram are broad and important sources of social data about everyday life (Bello-Orgaz et al., 2016). Much offline behaviour has now moved to the web and has thus become potentially available for analysis. Word-of-mouth communication now takes place on a much larger scale and is no longer limited by the geographical distance that separates individual consumers. Reviews circulate rapidly and can influence potential purchasers' buying decisions (Hutter et al., 2013).

eWOM not only influences customer purchasing decisions but also helps service providers increase revenue and improve their standing in the industry. eWOM is primarily a form of web user-generated content (UGC). According to the definition from the Organisation for Economic Co-operation and Development (OECD), UGC is 'content that is publicly available over the internet reflecting a certain amount of creative input and created outside of professional routines and practices' (OECD, 2007).

Content generated by web users can relate to practically all spheres of human activity and mirrors people's attitudes and opinions. Some content is generated spontaneously, such as opinions shared in discussion groups or on SM. Other content may be created in response to the actions of businesses and other institutions that have an interest in engaging Internet users, for instance, on-site question boxes, electronic questionnaires, and games or competitions in which participants are encouraged to share texts or multimedia content in exchange for rewards. Additionally, online content takes a variety of formats that go far beyond standard text, including fan fiction, pictures, films, presentations, animated GIFs, music, wikis, vlogs and questions and answers (Q&A) (Wąsowicz-Zaborek, 2020).

Given the growing body of literature on the subject, it needs to be synthesised. Some authors have already summarised the literature on UGC topics. These reviews provide a comprehensive overview, but new, innovative publications appear all the time. Moreover, the existing reviews either analysed research on UGC usage in general, with no specific focus on the service sector (Cortis & Davis, 2021; Donthu et al., 2021; Iswari et al., 2022), or limited their analysis to one or a few industries, especially hospitality or tourism (Lu & Stepchenkova, 2015; Manosso & Domareski Ruiz, 2021; Molina-Collado et al., 2022).

As the number of publications in the field grows rapidly, traditional reviewing techniques, which are primarily descriptive and narrative, appear less effective for summarising the literature. This study therefore employs bibliometric methodology to review research published on the subject in the last 10 years. The article mainly presents a bibliometric analysis that gives a broad picture of the subject, its evolution and the relationships between related works. The review is also used to analyse the content-mining-related topics addressed by academics. The purpose of this review is to describe UGC-based research in services. The article presents the findings of a search of the Scopus database, undertaken at the end of 2022, to better understand the state of the art in UGC mining and to make recommendations for future research in this area.

The paper is structured as follows: first, the research method is described; then the main findings of the bibliometric and systematic review are presented; finally, conclusions and recommendations for future research are provided.

Research Method

The main goal of the research is to diagnose how UGC is used in service-sector research. The following research questions were posed to specify the research problem further:

How has the usage of UGC in services research evolved in the last 10 years?

Which journals most often publish articles on UGC analysis in the service sector?

What methods have academics used to analyse UGC?

Which service industries have the authors studied?

Which data sources (platforms) have the authors used?

What have been the main themes of services research using UGC?

The bibliometric method is used to investigate the topics that matter in services research employing UGC. Pritchard (1969) conceptualised bibliometrics as 'the application of mathematics and statistical methods for books and other media.' The method uses quantitative tools to analyse bibliographic data. It has developed into a respectable academic field with relevance across several disciplines, including management and organisation (Zupic & Čater, 2015). Bibliometric methods, such as keyword co-occurrence, bibliographic coupling or co-citation analysis, use bibliographic data from publication databases such as Scopus or Web of Science to generate systematic pictures of scientific domains.

The two primary applications of bibliometric approaches are performance analysis and science mapping. Performance analysis assesses the research and publication output of individuals and organisations, whereas science mapping reveals the structure and dynamics of scientific domains. Understanding this structure and evolution is helpful when an author's objective is to review a given research field. Bibliometric methods add quantitative rigour to the otherwise subjective assessment of literature, and during a review they allow a researcher to provide evidence for theoretically derived categories (Zupic & Čater, 2015).

In the present study, publication trends in journals indexed in the Scopus database were analysed. An initial query using the key words 'user generated content', 'online reviews' and 'online text mining', limited to 'services', returned 840 articles. Further limits were applied to the search: years of publication 2012–2022, English language, final versions of papers only and journal articles only. The search was also limited to the subject areas 'Social Sciences', 'Business, Management and Accounting' (with subareas 'Accounting', 'Business and International Management', 'Industrial relations', 'Management of Technology and Innovation', 'Marketing', 'Organisational Behaviour and Human Resource Management', 'Strategy and Management' and 'Tourism, Leisure and Hospitality Management') and 'Economics, Econometrics and Finance'.
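The search limits listed above can be written down as a single Scopus advanced-search string. The article does not report its exact query, so the sketch below is a hypothetical reconstruction: the way the key phrases are combined, the field restriction (TITLE-ABS-KEY) and the subject-area codes are assumptions, the subareas are not encoded, and the clause syntax should be checked against the Scopus advanced-search documentation before reuse.

```python
# Hypothetical reconstruction of the Scopus search described in the text.
# Assumptions (not reported in the article): the three key phrases are ORed,
# combined with "services" via AND and matched in titles, abstracts and
# keywords; the subareas of "Business, Management and Accounting" are omitted.

KEY_PHRASES = ["user generated content", "online reviews", "online text mining"]
# Assumed Scopus subject-area codes: Social Sciences; Business, Management and
# Accounting; Economics, Econometrics and Finance.
SUBJECT_AREAS = ["SOCI", "BUSI", "ECON"]


def build_scopus_query(phrases, first_year=2012, last_year=2022,
                       subject_areas=SUBJECT_AREAS):
    """Assemble an advanced-search string from the stated search limits."""
    topic = " OR ".join(f'"{p}"' for p in phrases)
    subjects = " OR ".join(f'LIMIT-TO(SUBJAREA, "{s}")' for s in subject_areas)
    clauses = [
        f"TITLE-ABS-KEY({topic})",
        'TITLE-ABS-KEY("services")',
        f"PUBYEAR > {first_year - 1}",   # makes first_year inclusive
        f"PUBYEAR < {last_year + 1}",    # makes last_year inclusive
        'LIMIT-TO(LANGUAGE, "English")',
        'LIMIT-TO(DOCTYPE, "ar")',       # document type: article
        'LIMIT-TO(SRCTYPE, "j")',        # source type: journal
        'LIMIT-TO(PUBSTAGE, "final")',   # final versions only
        f"({subjects})",
    ]
    return " AND ".join(clauses)


print(build_scopus_query(KEY_PHRASES))
```

Writing the query as code keeps every inclusion criterion visible in one place, which makes the search easier to rerun and audit later.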

The initial set of articles was then screened manually (mainly titles and abstracts, but also full texts where necessary) to eliminate papers not relevant to the research topic. All non-empirical papers were excluded, as were those in which the authors did not use UGC as a data source and those with no relation to the service sector. As a result, a set of 433 articles was accepted for the final analysis.
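The three exclusion criteria amount to a simple conjunction: an article is kept only if it passes all of them. The sketch below encodes that rule over manually assigned codes. The record fields and example titles are hypothetical, and the actual screening was a manual reading of titles, abstracts and full texts rather than an automated step.

```python
from dataclasses import dataclass


@dataclass
class ScreenedRecord:
    """Manual screening codes for one retrieved article (hypothetical fields)."""
    title: str
    empirical: bool        # reports an empirical study
    uses_ugc_data: bool    # UGC is a data source in the study
    service_related: bool  # the study concerns the service sector


def passes_screening(record: ScreenedRecord) -> bool:
    # An article is retained only if it meets all three criteria.
    return record.empirical and record.uses_ugc_data and record.service_related


records = [
    ScreenedRecord("Hotel reviews and guest satisfaction", True, True, True),
    ScreenedRecord("A conceptual model of eWOM", False, True, True),
    ScreenedRecord("Survey of smartphone buyers", True, False, False),
]
included = [r.title for r in records if passes_screening(r)]
print(included)  # ['Hotel reviews and guest satisfaction']
```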

Data preprocessing and formatting, as well as additional calculations and visualisations, were performed in Excel spreadsheets. The identified documents were coded and analysed with MaxQDA software to determine publication years, main publication journals, the most frequently publishing authors and their countries, the research methods used, the service industries evaluated and the main data sources.
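The yearly and per-journal tallies reported in the findings are plain frequency counts over the coded records. The authors used Excel and MaxQDA; the Python sketch below is only an illustrative equivalent, and the records in it are hypothetical.

```python
from collections import Counter

# Hypothetical coded records: (publication year, journal).
coded = [
    (2019, "Tourism Management"),
    (2020, "Sustainability (Switzerland)"),
    (2020, "Tourism Management"),
    (2021, "Sustainability (Switzerland)"),
    (2021, "Journal of Travel Research"),
]

# Number of publications per year.
per_year = Counter(year for year, _ in coded)

# Number of publications per journal.
per_journal = Counter(journal for _, journal in coded)

# Journals meeting a minimum count (Table 1 uses five; two suits this toy data),
# ordered by count and then alphabetically.
MIN_ARTICLES = 2
frequent = sorted(
    (j for j, n in per_journal.items() if n >= MIN_ARTICLES),
    key=lambda j: (-per_journal[j], j),
)
print(dict(sorted(per_year.items())), frequent)
```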

For science mapping, several bibliometric techniques were applied: co-authorship analysis, to identify research collaborations, as well as co-citation analysis and co-word analysis. VOSviewer version 1.6.18 was used for this purpose.

Co-word analysis is a type of content analysis that uses the words within documents to identify connections and construct a conceptual structure of the subject. The approach assumes that when words frequently co-occur in documents, the concepts behind them are closely related. Co-word analysis produces a network of themes and their relationships that represents the conceptual space of a domain. It can be applied to whole texts, abstracts, document titles or keywords (Zupic & Čater, 2015). In this research, co-word analysis is applied to the keywords describing the analysed articles.
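The co-occurrence counts on which co-word analysis rests can be sketched in a few lines. The example below assumes full counting (each document adds one to every pair of distinct keywords it contains) and simple lower-case normalisation. VOSviewer performs this step internally, and its thesaurus-based merging of keyword variants is not reproduced here. The keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(docs):
    """Count, for each unordered keyword pair, the documents containing both."""
    pairs = Counter()
    for keywords in docs:
        # Normalise case and de-duplicate so a document counts a pair only once.
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    ["Sentiment analysis", "Hotel industry", "Text mining"],
    ["Sentiment analysis", "Text mining"],
    ["Service quality", "Hotel industry"],
]
co = keyword_cooccurrence(docs)
print(co[("sentiment analysis", "text mining")])  # 2
```

Pairs are stored in alphabetical order, so a lookup must use the sorted pair. The resulting weighted pairs are the edges of the keyword network that is then clustered.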

The method was used to extract thematic clusters, which were adopted as the main topics of the research conducted by the authors of the analysed articles. The results of the co-word analysis were also verified by analysing word frequencies in the article abstracts with MaxQDA. This verification aided interpretation and confirmed the relevance of the identified clusters.

The research topics identified in the clusters then guided a further literature review, in which the main conclusions of selected papers in each cluster were described using a narrative method.

The concept and process of the study are visualised below.

Figure 1.

Research process workflow

Source: Own elaboration.

Findings

The bibliometric analysis made it possible to answer the research questions concerning the development and current status of research using UGC.

Time of publication

The sample extracted from the Scopus database contained 433 papers published between 2012 and 2022. The distribution of publications by year shows that until 2015 the annual number of publications did not exceed 10. Growth followed in subsequent years. The largest increases occurred in 2017, when the number of publications nearly tripled compared with the previous year, and in 2020, when it exceeded 100 (compared with 60 in 2019). The sharp rise in 2020 may reflect the growing popularity of the subject, which may in turn have been driven by the impact of studies published in previous years. This period also coincided with the first year of the COVID-19 pandemic, when restrictions introduced in most countries limited traditional research based on surveys or in-depth interviews. Authors looking for large, reliable datasets may therefore have taken a closer look at content generated by Internet users.
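Taking the figures given above (60 publications in 2019 and more than 100 in 2020), the year-on-year growth in 2020 can be bounded from below. The exact 2020 count appears only in Figure 2, so the result is a lower bound rather than an exact rate.

```latex
g_{2020} = \frac{n_{2020} - n_{2019}}{n_{2019}} > \frac{100 - 60}{60} = \frac{2}{3} \approx 0.67
```

In other words, output rose by more than two-thirds in a single year.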

Figure 2.

Number of publications by year

Source: Own elaboration.

Publication venue

As for the place of publication, most journals published at most a few articles presenting the use of UGC in service research. Nonetheless, a few journals published a few dozen such papers. The most articles appeared in Sustainability (Switzerland), followed by the International Journal of Hospitality Management and the International Journal of Contemporary Hospitality Management. Table 1 lists the journals that published at least five articles on service-sector research using UGC, together with the years in which the articles appeared.

Table 1.

Journals most often publishing articles reporting findings of services research using UGC

Journal 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Total

Sustainability (Switzerland) 2 8 11 5 14 40
International Journal of Hospitality Management 1 1 4 1 3 12 6 6 34
International Journal of Contemporary Hospitality Management 1 1 3 2 2 5 6 9 29
Tourism Management 1 1 5 3 4 3 2 4 23
Decision Support Systems 1 2 2 2 3 10
Journal of Hospitality and Tourism Technology 1 1 1 2 2 1 8
Journal of Business Research 2 2 3 1 8
Journal of Retailing and Consumer Services 3 2 1 1 7
Electronic Commerce Research and Applications 1 1 1 1 3 7
International Journal of Culture, Tourism, and Hospitality Research 1 1 2 2 1 7
Journal of Hospitality and Tourism Management 4 1 2 7
Tourism Management Perspectives 1 1 1 2 2 7
Journal of Hospitality Marketing and Management 2 1 1 2 6
Journal of Travel Research 1 1 2 2 6
Current Issues in Tourism 3 1 2 6
Industrial Management and Data Systems 1 4 5
International Journal of Information Management 1 1 1 1 1 5
Journal of Quality Assurance in Hospitality and Tourism 2 1 2 5

Source: Own elaboration. UGC, user-generated content.

Authors and affiliations

In total, 1,038 different authors were represented in the sample. The most prolific authors, each with at least five papers, are listed in Table 2.

Table 2.

Most-published authors

S. No. Author Affiliation Number of publications

1 Xu Xun Department of Management, Operations, and Marketing, California State University, Turlock, CA, USA 13
2 Law Rob Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China 9
3 Rita Paulo ISCTE-IUL, Business Research Unit (BRU-IUL), Lisbon, Portugal 9
4 Li Hengyun School of Hotel and Tourism Management, The Hong Kong Polytechnic University, Hong Kong SAR, China 8
5 Li Yong-Hai Department of E-commerce, Henan University of Technology, Zhengzhou, China 7
6 Nilashi Mehrbakhsh Centre for Global Sustainability Studies (CGSS), Universiti Sains Malaysia, 11800, USM Penang, Malaysia 7
7 Borghi Matteo Henley Business School, University of Reading, Greenlands, Henley on Thames, Oxfordshire, UK 6
8 Brochado Ana ISCTE-IUL, University Institute of Lisbon, Centre for Socioeconomic and Territorial Studies (DINÂMIA’CET — IUL) & Business Research Unit (BRU-IUL), Lisbon, Portugal 6
9 Chatterjee Swagato IIT Kharagpur, Kharagpur, West Bengal, India 6
10 Kim Hak-Seon School of Hospitality and Tourism Management, Kyungsung University, 309 Suyoungro, Nam-Gu, Busan, South Korea 6
11 Mariani Marcello Henley Business School, University of Reading, Greenlands, Henley on Thames, Oxfordshire, UK 6
12 Moro Sérgio University Institute of Lisbon (ISCTE-IUL), ISTAR-IUL, Lisbon, Portugal 6
13 Samad Sarminah Department of Business Administration, College of Business and Administration, Princess Nourah Bint Abdulrahman University, Saudi Arabia 6
14 Bala Pradip Kumar Indian Institute of Management Ranchi, Suchana Bhawan, 5th Floor, Audrey House Campus, Meur's Road, Ranchi, Jharkhand, India 5
15 Gao Baojun Economics and Management School, Wuhan University, Wuhan, China 5
16 Ray Arghya Indian Institute of Management Ranchi, Suchana Bhawan, 5th Floor, Audrey House Campus, Meur's Road, Ranchi, Jharkhand, India 5
17 Teichert Thorsten Marketing and Innovation, Faculty of Business, Economics and Social Sciences, University of Hamburg, Hamburg, Germany 5
18 Ye Qiang Harbin Institute of Technology, China 5

Source: Own elaboration.

Most of the publications were co-authored. The co-authorship network is presented below. As may be observed, the most frequently publishing authors are also among those who most often collaborated with other authors.

Figure 3.

Co-authors’ clusters

Source: Own elaboration with VOSviewer.

According to the analysis of the authors' affiliations, the largest numbers of papers were associated with academics from the USA (162) and China (103). Table 3 presents the number of papers by the authors' country of affiliation. To keep the table concise, only countries with at least five papers are listed.

Table 3.

Number of articles by authors' country of affiliation

Country Number of publications

USA 162
China 103
UK 42
India 34
South Korea 22
Hong Kong 21
Taiwan 21
Germany 17
Canada 14
Australia 13
France 12
Italy 11
Spain 11
Malaysia 9
Singapore 9
Netherlands 5
Portugal 5

Source: Own elaboration.
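The figures in Table 3 sum to 511 even though only countries with at least five papers are listed, which exceeds the 433 papers in the sample. This is consistent with full counting, in which a paper is credited once to every country represented among its authors. The sketch below shows that counting scheme as an assumption (the article does not name the scheme explicitly); the example papers are hypothetical.

```python
from collections import Counter


def papers_per_country(papers):
    """Full counting: a paper counts once for every distinct country among its authors."""
    counts = Counter()
    for author_countries in papers:
        counts.update(set(author_countries))  # set() avoids double-counting one country
    return counts


papers = [
    ["USA", "USA", "China"],  # two US-affiliated authors and one Chinese-affiliated author
    ["China"],
    ["UK", "Portugal"],
]
counts = papers_per_country(papers)
print(counts["USA"], counts["China"])  # 1 2
print(sum(counts.values()), len(papers))  # 5 3
```

Under this scheme the country totals can legitimately exceed the number of papers, as they do in Table 3.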

Next, citations between articles by authors affiliated in different countries were analysed. Table 4 presents the clusters extracted on the basis of these citations. In Clusters 3 and 4, citations are concentrated mainly among articles from the same geographical regions (Asia and Europe, respectively). No such pattern appears in the other clusters, which suggests that citation there was driven mainly by the research focus, for example the industry analysed or the methods used.

Table 4.

Citation clusters by country

Cluster 1: Austria, Costa Rica, Finland, Iceland, Iran, Italy, Malta, Netherlands, Norway, Taiwan, UK, Vietnam
Cluster 2: Czechia, France, Iraq, Ireland, Malaysia, Saudi Arabia, Spain, Tunisia, Turkey
Cluster 3: China, Hong Kong, India, Japan, Morocco, Puerto Rico, Singapore, South Korea
Cluster 4: Australia, Germany, Poland, Portugal, Sweden, Switzerland, United Arab Emirates
Cluster 5: Belgium, Canada, Israel, Qatar, Russian Federation, USA

Source: Own elaboration with VOSviewer. Minimum number of articles from a country — 1; minimum number of citations — 1.

Analysed service industries and data sources

One of the research questions concerned the service industries studied by the authors. Every article was screened and coded, which made it possible to list the service industries on which the articles focussed. Table 5 presents the findings.

Table 5.

Service industries targeted in the articles

Industry Number of articles

Tourism (hotels; hostels; rental services; bed and breakfast) 293
Restaurant 53
Transport (airlines; public transport) 37
Other (different) 37
Healthcare 28
Airport services 8
Banking services 3
Convenience stores, shops, retail 3
Logistics and supply chain 3
Audiobook, film streaming 2
Education 2
Escort services 2
Fitness 2
Public libraries 2
Air pollution protection 1
Circular services 1
Convention centres 1
Crowdsourcing 1
Digital gaming 1
Insurance 1
Interpreting services 1
IT services 1
Self-service electronic vehicles 1
Telecommunication 1

Source: Own elaboration with MaxQDA; in some cases, the authors analysed multiple industries, and therefore the sum of articles in the table exceeds the study's total number of publications.

As can be observed, the overwhelming majority of publications addressed the tourism industry, focussing particularly on accommodation and rental services. A considerable number of articles also concerned gastronomic services, transport services and healthcare. It should be noted that only studies in which the authors used UGC to evaluate a service provider, its services or customers' attitudes towards them were included in the analysis. Articles evaluating only the products (especially tangible goods) offered by service companies were excluded; for example, analyses of product ratings posted on marketplaces (e.g. Amazon) were not considered. For this reason, the sample did not include several articles on the evaluation of movies or music offered on streaming platforms.

The distribution of the studied industries largely determined the choice of information sources. The vast majority of studies used data from platforms that publish ratings and opinions about travel and hospitality services. The most frequently used platform was TripAdvisor (80). Other review and travel platforms used in the research were Yelp (23), Airbnb (20), Booking.com (12), Ctrip (7), Expedia (2), Google Travel (2), Tuniu (1), Trivago (1) and Trustpilot (1). The authors also quite often used data from Amazon (35) to observe customer behaviour on the platform. Among the most popular global SM services, Twitter was used 14 times, Facebook 11 times, YouTube 3 times, and Instagram and Weibo once each. The authors also used data from Google Maps (7), Flickr (4), Coursera (3) and Uber (2). Individual pages and platforms of service providers were also used as sources of information.

It should be emphasised that general-purpose SM platforms (such as Facebook, Twitter and Instagram), which are not dedicated to a specific industry, are less useful for acquiring large datasets. That is why, most often, only brand fan pages were analysed. The goal of such research was usually to assess the service quality or brand image of an individual company or a group of companies (e.g. He et al., 2013).

Methods used in data analysis

The growth of research using UGC has also prompted a search for new and more efficient methods of analysing datasets acquired from the web, which are often growing rapidly in size. Traditional methods of content analysis, even when supported by computer-assisted qualitative data analysis software (CAQDAS), are proving too time-consuming and costly. Moreover, they fail to extract all the content relevant to the research questions posed. Researchers who published in earlier years tended to conduct simpler analyses. Some focussed only on descriptive statistics and did not examine the content of the opinions themselves (Racherla & Friske, 2012). Others concentrated more on content analysis but without the support of computer techniques (Barreda & Bilgihan, 2013; Sreenivasan et al., 2012). In subsequent years, authors showed greater interest in extended text analysis techniques based on supervised or unsupervised machine learning, which allow large datasets to be analysed. Analysis of the articles from the Scopus database indicates widespread use of text mining in research using UGC. Authors often combine different methods in their research, as presented in Table 6 and illustrated in Figure 4. Currently, the most commonly used data analysis methods are quantitative rather than qualitative, although some authors adopt a mixed-methods approach combining both (Liu et al., 2021; Mejia et al., 2021; Naeem et al., 2022).

Table 6.

Data analysis methods and their co-occurrence in articles

Source: Own elaboration with MaxQDA. LSTM, long short-term memory; NLP, natural language processing; TOPSIS, Technique for Order of Preference by Similarity to Ideal Solution.

Figure 4.

Clusters of research methods

Source: Own elaboration with MaxQDA.

Key words co-occurrence — main topic recognition

An analysis of keyword co-occurrence was performed to investigate the main topics addressed by the authors and to analyse the relationships between the articles. Across the 433 articles, 1,972 keywords were found, 106 of which met the minimum threshold of five occurrences. As shown in Figure 5, six major clusters were extracted from the data sample, each grouping articles that cover similar topics.

Figure 5.

Keywords co-occurrence

Source: Own elaboration with VOSviewer.

In the next step, keyword frequencies within the clusters were analysed to determine the topic most relevant to each cluster. Because all articles were based on UGC, the keywords 'online reviews' (including the singular form), 'WOM' and 'word of mouth' (including the variants prefixed with 'e') were disregarded when determining the main topics. Keyword frequencies were analysed, and a logical interpretation of the links between the keywords grouped in each cluster was derived. The extracted topics are customer satisfaction, social media analysis, services quality, tourist destinations and companies management, data mining in the hotel industry, and sentiment analysis. The results are presented in Table 7.
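The frequency step of this labelling can be illustrated with the Cluster 1 occurrences from Table 7. The sketch only ranks keywords after dropping the generic UGC terms; the exact set of excluded spellings is an assumption based on the description above. The final labels also drew on the authors' interpretation of the links between keywords, which no ranking captures: Cluster 2, for instance, is labelled SM analysis although 'Text mining' (65) has more occurrences there than 'Social media' (43).

```python
# Generic UGC keywords disregarded when naming clusters (assumed spellings,
# covering singular/plural and "e"-prefixed variants).
GENERIC = {"online review", "online reviews", "wom", "ewom",
           "word of mouth", "electronic word of mouth"}


def label_candidates(cluster, top_n=3):
    """Rank a cluster's keywords by occurrences, skipping generic UGC terms."""
    ranked = sorted(
        ((kw, occ) for kw, occ in cluster.items() if kw.lower() not in GENERIC),
        key=lambda item: (-item[1], item[0]),  # most frequent first, ties alphabetical
    )
    return [kw for kw, _ in ranked[:top_n]]


# Occurrences of selected Cluster 1 keywords, taken from Table 7.
cluster_1 = {
    "Customer satisfaction": 47,
    "Online review": 45,
    "Perception": 24,
    "Big data": 17,
    "Hospitality industry": 17,
    "Literature review": 13,
}
print(label_candidates(cluster_1))
# ['Customer satisfaction', 'Perception', 'Big data']
```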

Table 7.

Keyword clusters

Cluster 1 — Customer satisfaction Cluster 2 — SM analysis

Keywords Links Total link strength Occurrences Keywords Links Total link strength Occurrences

Airline industry 31 42 9 Consumer behaviour 21 28 5
Artificial intelligence 28 38 9 Customer engagement 13 16 5
Big data 55 102 17 Decision-making 37 70 14
Competitiveness 28 32 5 Electronic commerce 25 39 9
Customer experience 30 56 8 Empirical analysis 15 17 5
Customer satisfaction 68 197 47 Hospitals 17 22 5
Detection method 26 30 5 Hotels 33 79 17
Hospitality industry 55 95 17 Human 22 38 10
Life satisfaction 40 74 9 Information systems 6 8 5
Literature review 51 98 13 Sales 44 113 19
Marketing 45 68 12 Service provider 15 22 5
Natural language processing 14 17 5 Social media 72 178 43
Network analysis 32 47 6 Social networking (online) 34 74 16
Online review 67 166 45 Text mining 75 241 65
Perception 63 144 24 Twitter 11 14 5
Performance assessment 28 33 6
Profitability 15 17 5
Psychology 29 51 7
Qualitative analysis 32 42 7
Questionnaire survey 35 53 8
Regression analysis 28 42 7
Restaurant 18 22 8
Semantic network analysis 33 63 8
Service sector 35 49 11
Strategic approach 23 31 5
Sustainability 31 45 7
Sustainable development 40 68 10
World Wide Web 50 85 13

Cluster 3 — Services quality Cluster 4 — Tourist destinations and companies management

Keywords Links Total link strength Occurrences Keywords Links Total link strength Occurrences

China 46 76 13 Airbnb 36 65 16
Consumption behaviour 49 90 14 Culture 21 25 8
COVID-19 21 33 9 Hospitality 16 20 8
Data set 31 46 6 Hotel attributes 13 19 5
Deep learning 18 28 7 Netnography 15 18 6
Latent Dirichlet allocation 37 64 14 Portugal 18 23 6
Machine learning 56 114 26 Satisfaction 47 75 20
Numerical model 37 53 9 Topic modelling 13 17 6
Peer-to-peer accommodation 24 32 5 Tourism 49 102 24
Quality control 29 55 8 Tourism development 23 28 6
Quality of service 26 44 6 Tourism management 34 58 13
Service quality 81 266 58 Tourism market 21 29 5
Servqual 19 28 6 Tourist behaviour 44 95 17
Sharing economy 39 75 17 Tourist destination 37 64 12
Social network 21 23 5 Travel behaviour 38 71 13
Statistics 23 31 5
Text analytics 28 41 11
Topic model 16 19 5
Topic modelling 37 56 10
United States 40 59 10

Cluster 5 — Data mining in hotel industry Cluster 6 — Sentiment analysis

Keywords Links Total link strength Occurrences Keywords Links Total link strength Occurrences

Big data analytics 16 20 6 Comparative study 21 27 5
Data mining 75 215 39 Content analysis 34 52 17
Hotel 33 55 10 Google maps 13 17 5
Hotel industry 61 177 37 Sentiment analysis 80 242 68
Internet 75 220 38 Service evaluation 16 19 5
Language 37 53 10 Text analysis 28 37 10
Online hotel reviews 24 34 8
Online ratings 11 16 9
Review helpfulness 11 16 5
Tripadvisor 30 52 20
United Kingdom 16 27 5

Source: Own elaboration with VOSviewer. SM, social media.
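Table 7 reports three VOSviewer attributes for each keyword: Occurrences (the number of documents containing the keyword), Links (the number of other keywords it co-occurs with) and Total link strength (the summed co-occurrence counts over those links). The sketch below computes all three after applying a minimum-occurrence threshold like the one used in this study. It assumes full counting; VOSviewer also offers fractional counting, and the article does not state which was used. The example documents are hypothetical, so the output does not reproduce Table 7.

```python
from collections import Counter, defaultdict
from itertools import combinations


def keyword_metrics(docs, min_occurrences=5):
    """Occurrences, links and total link strength per retained keyword (full counting)."""
    docs = [{k.strip().lower() for k in kws} for kws in docs]  # one mention per document
    occurrences = Counter(k for kws in docs for k in kws)
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    # strength[a][b] = number of documents containing both a and b (kept keywords only)
    strength = defaultdict(Counter)
    for kws in docs:
        for a, b in combinations(sorted(kws & kept), 2):
            strength[a][b] += 1
            strength[b][a] += 1

    return {
        k: {
            "links": len(strength[k]),
            "total_link_strength": sum(strength[k].values()),
            "occurrences": occurrences[k],
        }
        for k in sorted(kept)
    }


docs = [
    ["Sentiment analysis", "Text mining", "Hotel industry"],
    ["Sentiment analysis", "Text mining"],
    ["Sentiment analysis", "Service quality"],
    ["Service quality", "Rare keyword"],
]
metrics = keyword_metrics(docs, min_occurrences=2)
print(metrics["sentiment analysis"])
# {'links': 2, 'total_link_strength': 3, 'occurrences': 3}
```

Because links are counted only between keywords that pass the threshold, raising the threshold can lower a keyword's Links and Total link strength without changing its Occurrences.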

Main topic analysis
Cluster 1 — Customer satisfaction

The authors primarily investigate the determinants of customer satisfaction with service companies, especially airline and hospitality companies, as well as companies offering services with artificial intelligence elements (e.g. robots in restaurants). Performance assessments confirmed that focusing on customer experience and improving customer satisfaction can affect the competitiveness and profitability of the service provider. The authors also noted the importance, as discussed by customers, of a sustainable approach on the part of service companies. The researchers used big data samples, which were mainly analysed using the following methods: natural language processing (NLP), semantic network analysis, detection method and regression analysis, as well as qualitative analysis (Ban et al., 2019; Ban & Kim, 2019; Bhattacharyya & Dash, 2021; Borghi & Mariani, 2021; Chen et al., 2019; Cherapanukorn & Sugunnasil, 2022; Fu et al., 2022; Gao et al., 2018; Hu et al., 2019; Huang et al., 2021; Kaya, 2018; Kostromitina et al., 2021; Lee et al., 2020; Lin et al., 2018; Liu et al., 2013, 2017, 2021a,b; Lynn & Kwortnik, 2020; Nilashi et al., 2021, 2022; Rasool & Pathania, 2021; Sezgen et al., 2019; Shadiyar et al., 2020; Shin & Nicolau, 2022; Song et al., 2022; Sudhakar & Gunasekar, 2020; Teichert et al., 2020; Terziyska, 2021; Xu & Li, 2016; Yu et al., 2022a,b; Zibarzani et al., 2022).

Cluster 2 — SM analysis

The authors explored UGC and eWOM on SM platforms, focussing on the impact of SM on consumer behaviour and decision-making and, consequently, on the purchase of services.

They also analysed consumer involvement in obtaining and sharing information. The main platform used was Twitter. The analyses concentrated mainly on hotel and hospital services, and the authors most often used text mining (Cox, 2015; Drevs & Hinz, 2014; Duan et al., 2016; Eslami et al., 2018; Gao et al., 2015; He et al., 2013; Huang et al., 2017; Hyman et al., 2022; Khurana et al., 2019; Kim et al., 2016; Liu et al., 2017; Lockie et al., 2015; Shen et al., 2020; Sreenivasan et al., 2012; Viswanathan et al., 2018).

Cluster 3 — Services quality

The authors examined factors related to service quality and quality control. Focussing specifically on peer-to-peer accommodation services, they identified the determinants of quality by relating them to the framework defined by the SERVQUAL method (a service quality measurement method; for a detailed description, see Parasuraman et al., 1988). They also addressed the impact of quality on customer behaviour, and attention was paid to the role of quality management in the face of the threats posed by the COVID-19 pandemic. The most frequently mentioned analysis methods were deep learning, latent Dirichlet allocation, machine learning, text analytics and topic modelling. The studies were mainly concentrated in China and the USA (Barravecchia et al., 2020; Brett, 2019; Ding et al., 2020; Gunasekar et al., 2021; Lim & Lee, 2020; Lin et al., 2022; Liu & Chen, 2022; Mejia et al., 2021; Palese & Usai, 2018; Stepaniuk, 2016; Wong et al., 2022; Ye et al., 2019; Zuo et al., 2022).

Cluster 4 — Tourist destinations and companies management

The authors analysed which factors have the greatest impact on tourists' satisfaction and behaviour and what is considered when tourist destinations and businesses are evaluated. Awareness and knowledge of these factors are crucial in planning the development and management of both destinations and tourism businesses. The authors used topic modelling and netnography techniques to analyse the data (Aggarwal & Gour, 2020; Dias et al., 2014; Guo et al., 2015; Henrique de Souza et al., 2020; Hu et al., 2019; Jin et al., 2018; Jin & Xu, 2020; Kim et al., 2017; Lamb et al., 2020; Lu & Stepchenkova, 2012; Medina et al., 2016; Mkono, 2013; Pjero & Gjermëni, 2020; Shang & Luo, 2022; Sthapit & Björk, 2020; Terziyska, 2021; Tontini et al., 2017; Veríssimo & Costa, 2018; Xie et al., 2017).

Cluster 5 — Data mining in hotel industry

The authors focus on the helpfulness of online ratings and reviews in the hotel industry. They concentrate on the methodological aspects of big data analysis, especially data mining techniques (Barreda & Bilgihan, 2013; Chen et al., 2019; Ghose et al., 2012; Guo et al., 2017; Lee et al., 2020; Liu et al., 2013, 2017; Nilashi et al., 2021; Ranjbari et al., 2020; Schuckert et al., 2015).

Cluster 6 — Sentiment analysis

The largest share of authors who used UGC in their research conducted sentiment analysis. This type of content analysis applies natural language processing (NLP) techniques to determine whether a text, mainly through the words it contains, expresses positive, negative or neutral sentiment. The authors used data from web mapping platforms such as Google Maps to evaluate services (Alamoudi & Alghamdi, 2021; AlSurayyi et al., 2019; Eletter, 2020; Hu et al., 2020; Khan & Loan, 2022; Li et al., 2022; Luo et al., 2021a,b; Mathayomchan & Taecharungroj, 2020; Mittal & Agrawal, 2022; Park et al., 2020; Rasool & Pathania, 2021; Witherspoon & Stone, 2013; Yu & Zhang, 2020; Yu et al., 2022a,b; Zhang et al., 2021).
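To make the idea of polarity classification concrete, the following is a minimal sketch of a lexicon-based classifier. It is not the method used by the cited studies, many of which rely on trained machine learning models or curated lexicons; the word lists and the function name `classify_review` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicons (illustration only); real studies use curated
# sentiment lexicons or trained machine learning models.
POSITIVE = {"great", "clean", "friendly", "excellent", "helpful", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "slow", "terrible", "broken"}


def classify_review(text: str) -> str:
    """Label a review as positive, negative or neutral by counting lexicon hits."""
    # Lowercase the text and split it into word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = [
        "Friendly staff and a clean, comfortable room.",
        "The room was dirty and the reception was rude.",
        "We stayed two nights near the station.",
    ]
    for review in reviews:
        print(classify_review(review), "|", review)
```

Simple word counting ignores negation ("not clean") and context, which is one reason why the studies in this cluster increasingly turn to machine learning approaches.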

The authors grouped in the last two clusters (5 and 6) focussed heavily on methodological issues, deepening the understanding of how big data can be used and presenting techniques for analysing such data. In contrast, three other clusters (1, 3 and 4) focussed on in-depth examination of selected marketing and management uses of UGC (service management, customer satisfaction and quality management). Cluster 2 focussed more on the place where UGC is published, namely SM. The analysis suggests that the range of topics addressed is limited. For example, reactions to negative comments, often analysed in experimental or survey research (Liao et al., 2021), have not been widely explored by researchers using UGC. The factors influencing customer dissatisfaction have also been addressed relatively seldom.

Conclusion

This paper reports the results of a bibliometric study analysing the use of UGC as data in services research. The findings show a significant increase in the number of publications on this subject, especially in the last 3 years. This trend reflects the expansion of Internet technologies and SM, which have driven the growth of UGC. Many academics regard such content as a reliable and valuable source of information because of its wide availability and diversity.

An examination of the methods used by the researchers shows that they are continually being refined and that new methods are constantly being developed. Notably, increasingly sophisticated machine learning algorithms are being applied to text analysis. In the analysed dataset, many authors reported affiliations with technical universities. It can therefore be assumed that this trend will continue, especially given the geometric growth of Internet content and the ever-larger datasets to be analysed.

The authors have focussed mainly on the tourism and hospitality industries, giving little attention to other industries. Future research should therefore extend such analyses to other sectors.

The keywords co-occurrence analysis made it possible to identify the main themes of the research. These are: customer satisfaction, SM analysis, services quality, tourist destination and company management, data mining in hotel industry and sentiment analysis.
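Keyword co-occurrence analysis rests on counting how many articles two keywords appear in together; strongly linked keywords are then grouped into clusters such as the six themes listed above. The sketch below shows only the counting step, under stated assumptions: the keyword lists are hypothetical, the function name `cooccurrence_counts` is illustrative, and dedicated bibliometric software typically performs this counting (plus normalisation and clustering) rather than hand-written code. It is not a reproduction of this study's procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count in how many articles each unordered keyword pair appears together."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case/whitespace and drop duplicates within one article.
        unique = sorted({k.strip().lower() for k in keywords})
        # Each unordered pair is counted once per article.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical author keywords for three articles (illustrative data only).
    articles = [
        ["Sentiment analysis", "Online reviews", "Hotels"],
        ["online reviews", "Customer satisfaction", "Hotels"],
        ["Sentiment analysis", "Online reviews", "Twitter"],
    ]
    for (a, b), n in cooccurrence_counts(articles).most_common(3):
        print(f"{a} -- {b}: {n}")
```

The resulting pair counts form the weighted edges of a keyword network; clustering that network is a separate step not shown here.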

This list of topics does not exhaust the potential of UGC data. Given the spontaneous nature of the opinions and the natural way in which they are expressed, it is worth exploring, for example, the use of UGC in constructing measurement scales for reliable questionnaire research, because users’ statements can provide a basis for better operationalisation of constructs.

Future research could also address reactions to negative comments. Moreover, in addition to analysing the text posted online, it is worth examining the broader context of each statement, including the time of publication, the people accompanying the consumer during service consumption (individual vs group consumption), whether the statement is spontaneous or triggered by the company’s actions (e.g. an inquiry or a contest) and where the review is posted (a third-party platform or the evaluated company’s own website or SM page/profile).

The research also has limitations. First, the set of analysed articles is limited to those published in journals indexed in the Scopus database between 2012 and 2022. Second, the search query relied on specific keywords and exclusion criteria, which further restricted the study sample. Using other keywords or a broader or narrower list of exclusions would have changed the scope of the study.

This study has theoretical implications. Future researchers can build on this literature review: it demonstrates the importance of the topic, identifies the main themes addressed by the authors and points to limitations that can help identify gaps to be addressed in further research.

The article is also useful for practitioners because it outlines patterns in current research, particularly topics and methods that can also be applied in practical analyses.