Open Access

Twitter sentiment analysis: An estimation of the trends in tourism after the outbreak of the Covid-19 pandemic



Introduction

SARS-CoV-2, a novel coronavirus, is the causative agent of Covid-19, which has driven a global pandemic and caused great health concerns. The virus originated in 2019 in the animal wet market of Wuhan city in China and was highly contagious, spreading easily through person-to-person contact. By the beginning of 2020, the virus had spread across most parts of Europe, America, and the Asian sub-continent (Guan et al., 2020; Mao et al., 2020; Mizumoto & Chowell, 2020; Shereen et al., 2020; Sohrabi et al., 2020). Nations worldwide have fought the pandemic and opted for lockdowns to curb its impact (Lau et al., 2020; Saadat et al., 2020; Setiati & Azwar, 2020; The Lancet, 2020). These lockdowns impacted economies drastically, and every sector is working toward revival and trying to cope with the dent created by the pandemic. Of all the sectors, the most severely impacted is tourism, where much has to be done to return to the position that existed before the pandemic (Bakar & Rosbi, 2020; Gössling et al., 2020; Higgins-Desbiolles, 2020; Lapointe, 2020; World Tourism Organization, 2020). The tourism sector, the primary focus of this study, was an exception: the role of IT was marginal and only acted as a supplementary service to the core service (Albeshr & Ahmad, 2015; Edghiem & Mouzughi, 2018; Nafei, 2019; Sinclair-Maragh, 2014).

Covid-19-related travel restrictions created an atmosphere of fear that badly hit the tourism sector across the world (Ulak, 2020). In the Indian economy, in contrast to countries where tourism contributes a significant share of national income, the e-commerce industry, automobile sector, information technology and software services, rail system, and many other industries played a vital role in national income in the third quarter of 2019–20, when the Covid-19 pandemic reached India (Bansal et al., 2020). With the lockdown, a tremendous loss has been reported in the respective sectors, leading to a loss in GDP (Osterholm & Olshaker, 2020). As economies unlock, domestic tourism in OECD countries, which accounts for around 75% of the tourism economy, has been predicted to recover more quickly (Sigala, 2020). The adverse impact of the pandemic has been felt across the entire tourism ecosystem, and restarting and restoring tourist destinations requires a collaborative approach. There is a need to build confidence among travelers and to stimulate them to visit destinations through the government’s tourism-specific measures (Assaf & Scuderi, 2020). The main objective of this research is therefore to conduct a sentiment analysis of Indian tourist tweets during Covid-19, using Python and the maximum likelihood method to determine the parameters.

Global Tourism and Pandemic

Global tourism has been exposed to many crises that have influenced the sector, especially from 2000 until today, such as the 9/11 terrorist attacks, the SARS outbreak (2003), the MERS outbreak, the global economic crisis of 2008/2009, and now Covid-19, as depicted in Figure 1. The relationships between pandemics and travel are central to understanding health security and global change (World Travel & Tourism Council, 2018). Globally, the tourism sector has been considerably affected, with a decline of 22% in the first quarter of 2020, which could further reach 60–80% in comparison to the travel figures of 2019 (World Tourism Organization, 2019).

Figure 1:

International Tourist Arrivals By Region In Q1 2020

Source: Report of World Tourism Organization

Review of literature

Technological changes related to the internet, including smartphones and tablets, have revolutionised the tourism industry from a brick-and-mortar and person-to-person service industry to a heavily digitally supported and omnipresent travel service network. Travelers have access to online platforms to provide feedback and make recommendations for other travelers (Neidhardt et al., 2017; Yang et al., 2017; Ye et al., 2009). In addition to professional systems, online social media, such as Twitter, Instagram, Facebook, Foursquare, Sina Weibo, and Google Plus, play a significant role in creating electronic word-of-mouth (Confente, 2015; García-Pablos et al., 2016; Leung et al., 2013; Phillips et al., 2017). Importantly, online social media, professional travel websites and platforms, and blogs present inexpensive means to gather rich, authentic, and unsolicited data on travelers’ opinions. Besides, sentiment can also be modelled by machines for automation and integration across various applications (Choi et al., 2007; Rabanser & Ricci, 2005). Previous research used Twitter data to discover people’s reactions to coronavirus-related issues such as lockdown, safety, job loss, mental health-related issues, and sentiments related to their resort visits (Barkur et al., 2020; Rajput et al., 2020; Cinelli et al., 2020; Samuel et al., 2020).

Sentiment analysis refers to the use of computational linguistics and natural language processing to analyze text and identify its subjective information. Its task is to detect and classify the opinions and feelings expressed in the text. Whilst there are many challenges to be solved before sentiment analysis programs attain expert human-level performance (Cambria et al., 2017), they give results that correlate positively with user ratings, so it is reasonable to use sentiment analysis scores when user ratings are absent (López-Barbosa et al., 2015). Sentiment analysis can also give additional insights into the important components of an offering. For example, an analysis of restaurant reviews found that context was important to diners in addition to the more widely recognised components: food, service, price, and ambiance (Gan et al., 2017). Sentiment analysis has become a standard component of the social media analysis toolkit of marketers and customer relationship managers in large organisations (Hofer-Shall, 2010). Such organisations may purchase relevant data (e.g., Facebook posts mentioning a hotel) and gain access to a suite of sentiment analysis and other tools to analyze it in a platform like Pulsar (pulsarplatform.com). Understanding how sentiment analysis works is therefore important for those working in tourism and the wider marketing sector.

Tourists can benefit from sentiment analysis through computing systems that provide recommendations. The software can do this by extracting the key aspects of potential destinations (e.g., hotels), extracting reviews about these destinations, processing the reviews with aspect-based sentiment analysis, and then summarising the overall sentiment expressed about the customers’ potential choices. A web user wishing to decide between three hotels might be presented with a graphic showing their key aspects (e.g., breakfast quality, cleanliness, view) and then be shown the average rating of customers on those aspects (Schmunk et al., 2013). In theory, this would save customers the task of reading reviews if, for example, they were only concerned with one aspect of a hotel (e.g., children’s entertainment, bar, beach), so that overall ratings covering all aspects of a hotel would not be relevant. Whilst this would seem to be a useful service, there does not seem to be a successful commercial example so far. A promising but so far also apparently unsuccessful application is to overlay a map with icons representing tourism services, color-coded by sentiment (Cresci et al., 2014). Other applications have also attempted to exploit sentiment in tourist reviews to generate improved recommendations for them (e.g., Dong & Smyth, 2016). An unusual research application analysed the sentiment of tourism-related press coverage for countries and cities, generating an overall map of destinations by sentiment (Scharl et al., 2008). Even as the global community faces the Covid-19 pandemic together, Twitter is helping people find reliable information, connect with others, and follow what is happening in real time.

Research Methodology

To conduct the sentiment analysis and opinion-mining process, in which a piece of writing is computationally classified as positive, negative, or neutral, the tweets were fetched from Twitter using Python. The lockdown and work-from-home model led to a wide-scale increase in the socialisation of people on the internet via Facebook, Twitter, etc. These platforms gave people opportunities to express their feelings and sentiments. The tweets were fetched during March, April, and May, when most economies were under lockdown due to the Covid-19 pandemic. Tweepy, TextBlob, and Corpora were installed in Python (Sai & Balachander, 2020), and the tweets were fetched through the Twitter API with an app registered through a Twitter account. Tweepy is a Python client for the official Twitter API, TextBlob was used for cleaning and processing the fetched tweets, and the sentiment classifier (D’Andrea et al., 2015) was then used to classify the tweets as positive, negative, or neutral (Da Silva et al., 2014; Pujazon-Zazik & Park, 2010; Shelar & Huang, 2018). Finally, logistic regression was used to analyse the fetched tweets, since it explains the relationship between one binary dependent variable and one or more independent variables. In other words, it is a classification algorithm that predicts the occurrence of a categorical (here binary) event from one or more predictor variables (Da Silva et al., 2014). The maximum likelihood method was used to estimate the logistic regression, as it determines the parameter values most likely to have produced the observed data under an assumed probability mass function (Allison, 2008; Friedman et al., 2000; Pregibon, 1981; Spitznagel, 2007).
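The maximum likelihood estimation of a logistic regression described above can be illustrated with a minimal sketch in plain Python. It is not the study's code: the study reports using scikit-learn, whereas this sketch maximises the log-likelihood by simple gradient ascent. The function names (`sigmoid`, `log_likelihood`, `fit_logistic`), the learning rate, the number of epochs, and the toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function mapping any real z into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(weights, bias, X, y):
    # Bernoulli log-likelihood of the labels y given the features X.
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Gradient ascent on the average log-likelihood (hypothetical settings).
    n, k = len(X), len(X[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            err = yi - p  # derivative of the log-likelihood w.r.t. the logit
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias

# Toy data (illustrative only): one feature, labels 1 = positive, 0 = negative.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
weights, bias = fit_logistic(X, y)
print("weights:", weights, "bias:", bias)
print("log-likelihood:", log_likelihood(weights, bias, X, y))
```

The toy labels are deliberately not perfectly separable, so the maximum likelihood estimate is finite and the fitted log-likelihood can be compared with that of the all-zero starting point.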

The sigmoid function, also called the S-curve, has been used as it can take up any real value and map it into values ranging from 0 to 1 (Lipovetsky, 2010; Pampel & Denney, 2011), as seen in Figure 2.

Figure 2:

Equation 1: Sigmoid Function
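For reference, the standard form of the sigmoid (logistic) function can be written as below. The symbols z, β and x are notation assumed here for illustration rather than taken from the figure: z is the linear combination of the k predictors, and the output lies strictly between 0 and 1.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \qquad
0 < \sigma(z) < 1
```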

A confusion matrix has been used to evaluate the performance of the classification model. A confusion matrix is “a table that is often used to describe the performance of a classification model (or ‘classifier’) on a set of test data for which the true values are known. It allows the visualisation of the performance of an algorithm” (Schulz et al., 2011; Ting, 2017). See Figure 3.

Figure 3:

Confusion matrix

where,

TP is true positive

TN is true negative

FN is false negative

FP is false positive
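The four cells defined above can be counted directly from true and predicted labels. The following is a minimal sketch, assuming binary labels coded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    # Count the four confusion-matrix cells for binary labels (1 = positive).
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1   # positive correctly predicted as positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1   # negative correctly predicted as negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1   # negative wrongly predicted as positive
        else:
            counts["FN"] += 1   # positive wrongly predicted as negative
    return counts

# Illustrative labels only, not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```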

Receiver operating characteristic (ROC) curve. The ROC curve is a “graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied and the ROC curve is created by plotting the true positive rate against the false-positive rate at various threshold settings”. The larger the area under the curve (AUC), the better the predictive power of the model. The ROC curve is built by considering different threshold values, and the area under it never exceeds 1 (Figure 4). The curve also helps to determine the threshold values (Bradley, 1997; Faraggi & Reiser, 2002).

Figure 4:

ROC Curve
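The construction of the ROC curve and the area under it can be sketched as follows. This is an illustrative plain-Python version, assuming binary labels and one predicted score per sample; `roc_points`, `auc`, and the toy scores are hypothetical names and values, and the area is computed with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    # One (false positive rate, true positive rate) point per threshold,
    # sweeping the threshold from the highest score down to the lowest.
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    # Trapezoidal area under the curve; the result lies between 0 and 1.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Illustrative scores only, not data from the study.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

An area of 0.5 corresponds to ranking no better than chance, and an area of 1 to perfect separation of the two classes.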

Results and discussion

The tweet data were processed using various Python libraries, after installing the resources needed to process the tweets. The data were checked for missing values and outliers to get accurate predictions. Logistic regression was imported from the sklearn linear model module and run on the train and test data with a train size of 0.70, as shown in Figure 5.

Figure 5:

Logistic Regression on Python
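The 70/30 split into train and test data can be sketched without external dependencies. The study reports using scikit-learn for this step; the version below is only an assumed plain-Python equivalent, and the function name `split_train_test`, the seed, and the sample rows are hypothetical.

```python
import random

def split_train_test(rows, labels, train_size=0.70, seed=42):
    # Shuffle the indices reproducibly, then cut at the requested train share.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_size * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Illustrative rows only: ten placeholder tweets with alternating labels.
rows = ["tweet %d" % i for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(rows, labels)
print(len(X_train), len(X_test))
```

Fixing the seed keeps the split reproducible, so that the reported test metrics can be regenerated from the same data.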

The confusion matrix was created with the following values

array([[632, 86], [287, 190]])

The classification report showed an accuracy of .687. The precision (the ability of the classifier not to label a negative sample as positive) was .688, and the recall (the proportion of positives that are correctly identified) was .398, while the F1 score was .50. The F1 score is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially when the class distribution is uneven. This F1 score was quite good compared to the training data, and the accuracy for positive and negative sentiments was considerably better than that for neutral sentiment. In the results, 1 and 0 represent positive and negative sentiments respectively (Figures 6 and 7).
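The reported metrics can be checked against the confusion matrix given above. The sketch below assumes the matrix follows the scikit-learn layout, with rows for the actual class and columns for the predicted class in the order 0, 1, so that it reads [[TN, FP], [FN, TP]]; this layout is an assumption, although it reproduces the reported precision, recall, and F1 score.

```python
# Confusion matrix as reported in the text; layout assumed to be
# [[TN, FP], [FN, TP]] (rows = actual class, columns = predicted class).
matrix = [[632, 86], [287, 190]]
(tn, fp), (fn, tp) = matrix

total = tn + fp + fn + tp
accuracy = (tp + tn) / total           # share of all tweets classified correctly
precision = tp / (tp + fp)             # predicted positives that are truly positive
recall = tp / (tp + fn)                # true positives that were recovered
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("accuracy  %.3f" % accuracy)
print("precision %.3f" % precision)
print("recall    %.3f" % recall)
print("F1        %.3f" % f1)
```

Under this layout, the gap between precision and recall arises because false negatives (287) far outnumber false positives (86).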

Figure 6:

Confusion matrix on Python

Figure 7:

Confusion matrix–accuracy on Python

The area under the receiver operating characteristic curve was high, signifying good predictive power of the model (Bradley, 1997; Faraggi & Reiser, 2002), as shown in Figure 8. Taken together, these results support the reliability and validity of the model and the algorithms.

Figure 8:

ROC curve on Python

From Table 1 it can be seen that any keyword with a sentiment score greater than or equal to 0.50 was considered positive, and a score below 0.49 was considered negative. Looking at each keyword in turn, trekking has a positive sentiment with 0.06K likes and 97.71K re-tweets, which indicates high performance for tourism sites where trekking is available. Similarly, the keyword nature has a positive sentiment score: although it has no likes, it was re-tweeted 1238.4K times. Performance is likewise high for the keywords travelgram, museums, and adventure tour. On the other hand, the keywords spiritual tourism and nostalgia tourism have low sentiment scores (0.12 and 0.36, respectively) and were considered negative; despite 5310.9K re-tweets for spiritual tourism, their performance was low. Other keywords, such as memorials, travel life, travel diaries, mountains, cruise, antitravel, holiday, wanderlust, and beaches, performed moderately.

Table 1: Sentiment Analysis

Keywords               Sentiment   Likes (K)   Re-tweets (K)   Performance
Trekking               0.52        0.06        97.71           High
Beaches                0.2         0.04        325.68          Moderate
Travel                 0.34        0.01        439.42          Moderate
Wanderlust             0.24        0.81        1521.9          Moderate
Holiday                0.3         0.15        1455.17         Moderate
Travelgram             0.55        0.5         16.04           High
Nature                 0.58        0           1238.4          High
Insta travel           0.44        0.59        5.83            Moderate
Travel life            0.43        1.43        9.33            Moderate
Travel diaries         0.44        0.58        6.78            Moderate
Mountains              0.38        0.04        3471.22         Moderate
Cruise                 0.25        0.12        822.25          Moderate
Birding                0.32        0.99        66.11           Moderate
Museums                0.51        0.03        6895.4          High
Memorials              0.27        0.85        965.5           Moderate
Nostalgia tourism      0.3636      2.367       3.41            Low
‘Spiritual’ tourism    0.12        0.12        5310.9          Low (many religious places are closed till date)
Adventure Tour         0.54        1.26        77.6            High

From the sentiment analysis, it can be seen that nature, trekking, adventure tours, and museum visits will be the most sought-after by tourists once economies open their borders for tourism. The tourism sector must therefore focus on developing sites where tourists can experience adventure, and further sites suitable for trekking should be developed, so that trekking coupled with adventure can draw more tourists to these destinations. Museums have been the most liked and visited, as they create an experience and build adventure, like travelling in time into the past. The tourism sector should take this into account, take good care of museums, and create further experiences through IT-enabled services. Such services may create a virtual experience that adds more adventure to the tours. The next-ranked destinations, such as memorials, mountains, cruises, holiday, wanderlust, and beaches, should be considered by the tourism sector to enhance customer experience according to the sentiment analysis depicted in Figure 9.

Figure 9:

Surface Diagram of Sentiment Analysis

Conclusion

This paper analysed tweets in detail with the help of sentiment analysis to highlight their usefulness to the tourism sector. The analysis in particular presents the road map to the revival of the tourism sector after the pandemic. The various algorithm scores predicted which enhancements of the tourism sector will improve the customer experience. The sentiment analysis of the tweets revealed that the majority of the tweets were positive and there was significant enthusiasm for the revival of the sector. The findings suggest that the tourism sector could leverage the customer sentiment expressed in the tweets in order to understand their needs and expectations and to enhance their customer experience. The findings from the analysis could be used to plan promotional activities and to understand the sentiments of target customers. Overall, this analysis has provided valuable insights into the sentiments of the people regarding the tourism sector and the road map to its revival. The paper also highlighted the need for changes to existing tourism policies and regulations to ensure the safety of tourists and employees. The analysis also discussed the importance of social media marketing and how to use it to promote tourism in the post-Covid era. All in all, this paper provides valuable insights and a practical road map to the revival of the tourism industry.

Implications and limitations

The study contributes to the tourism literature in several respects. Firstly, it contributes methodologically, in the way the tweets were fetched and the sentiment analysis was conducted, which is a novel way of predicting the footprint at various tourism sites. Next, the existing literature contributes to the hospitality sector concerning the occupancy of hotels, resorts, and other properties (McCartney & Pao Cheng Pek, 2018; Philander & Zhong, 2016). This study also looks at other tourism sites, such as trekking sites, beaches, spiritual sites, and so on, which is an altogether new dimension for the tourism sector. Further, the paucity of research on the tourism sector, especially in the Indian sub-continent, motivated this study, so it will subsequently contribute to the generalisation of findings in the Asian sub-continent because of the similarity in culture. Again, extant studies have undertaken sentiment analysis in developed countries, and the scarcity of studies in the context of developing countries stimulated us to undertake this one. The study thus contributes to the tourism literature on developing strategies to draw tourists from fast-developing countries, as a major share of income in the tourism sector is contributed by these countries (Crane et al., 1997; Lu et al., 2018; Mazumder et al., 2009; O’Sullivan & Jackson, 2002). Although a rapid revival of the tourism sector has been predicted, and is supported by this study, prior studies have not predicted which tourism sites should be considered to enhance the customer experience. This again is a substantial contribution to the tourism literature.

From the tourism perspective, the sentiment analysis enables tourism organisations to keep track of tweets and scale these to useful opinions, which could help them in developing their revival strategies during the global pandemic. The sentiment analysis helps marketers to know the trajectory of public opinion, which would enable them to be ready with the right kind of service mix to get maximum advantage. Further, these analyses would also help managers in developing short-term strategies to cope with the pandemic. The sentiment analysis would also help managers in understanding and quantifying the customer’s perception so that better strategies can be developed to revive the tourism sector.

With all its advantages, the paper also has certain limitations; it would be unethical not to mention them. The first limitation is data fetching: only a limited number of tweets can be fetched at a given point in time, some may be incorrect or biased, and this can create a data bubble. In addition, more keywords, such as vacation, honeymoon, and family trip, may widen the scope of such studies. Because the data set is small, tweets can sometimes be miscategorised. Future studies can use bigger data sets, improving the content validity of these classifier algorithms and establishing sample-size guidelines for managers. Further studies can also examine the demographic profile of the Twitter accounts, which could widen the scope of the study by giving a real-time picture of the sentiment analysis. Future studies can also consider traditional market research tools, which can give new insights into the issues, with tweets acting as a complement to these tools. This may enhance the scope and yield more concrete results. Last but not least, the tweets are from one developing economy, India; the results may be easier to generalise if more developing countries are included, and a comparative analysis of these nations may open new dimensions in the tourism literature.

eISSN:
2182-4924
Language:
English