Open access

Research on Forecasting Tourism Consumption Trends Based on Data Analysis Techniques

  
19 March 2025

Introduction

Over the past three years, travel consumption has gradually recovered as the epidemic subsided. At the same time, continuing technological progress and the spread of the Internet keep reshaping the tourism industry, and tourism consumption trends are changing with it. As expectations of the travel experience rise, tourism consumption is becoming more diversified and personalized. Traditional group tours are gradually losing market share to independent and customized travel [1, 2], in which travelers choose destinations, transportation, accommodation, and attractions according to their own interests and needs. Such personalized tourism both satisfies the pursuit of unique experiences and improves the quality of tourism products and services. Moreover, with continued cultural export, cross-border tourism consumption, including air tickets, hotels, shopping, and attraction tickets, has gradually increased [3, 4]. Since the epidemic, people have paid more attention to physical and mental health, and health and leisure tourism consumption is emerging [5, 6]: travelers seek relaxation through activities with health benefits, such as yoga, hiking, hot springs, and spa treatments. This brings new development opportunities to the tourism industry, and hotels, resorts, and tourist attractions have launched health and leisure products and services [7, 8]. Beyond traditional considerations such as price and quality, people increasingly value cultural experience and environmental quality [9, 10]. When choosing destinations and products, they pay more attention to local cultural heritage, historical sites, and folk customs. They are also increasingly concerned about environmental protection and sustainable development, and tend to choose green tourism products and services [11, 12]. The tourism industry therefore needs to innovate and adapt in order to meet this demand, and forecasting tourism consumption trends can support high-quality service for travelers, sellers, and producers across the tourism industry chain.

This article combines and optimizes the univariate regression prediction model, the grey system prediction model, and the exponential smoothing prediction model into an optimized combination prediction model for forecasting tourism consumption trends. To test its effectiveness, the relative errors of its predictions are compared with those of each single prediction model. The optimized combination forecasting model is then applied to the inbound tourism of City A: based on the number of inbound tourists and the foreign exchange income from international tourism in City A from 2007 to 2023, the number of international tourists and the foreign exchange income for 2024 to 2028 are forecast.

Overview

Currently, most studies focus on predicting travel demand, mainly using social platform comment data, search engine data, and travel platform data, with methods such as search cycle decomposition [13], Google Trends [14], multi-sequence structured time series models [15], and deep learning models [16]. Forecasts of tourism consumption trends are fewer, but they draw on the same data and similar forecasting principles. Tourism demand prediction also covers tourists' travel modes, destinations, hotel requirements, and attraction needs. These correspond to the components of tourism consumption, such as travel prices, attraction ticket and scenic spot commodity prices, hotel prices, and average consumption levels at destinations. The prediction of travel consumption trends is therefore closely associated with data such as the travel modes chosen on travel platforms, the destinations and surrounding attractions searched for, hotel bookings, and past travel consumption history. Analyzing these data is a major step in predicting trends. Data analysis techniques such as descriptive statistical analysis, exploratory data analysis, data mining, and cluster analysis can be used for trend prediction. Among them, descriptive statistical analysis summarizes data characteristics, focusing on the central tendency of the data [17]. Exploratory data analysis examines data features in depth through statistical plots and other means to reveal potential trends and anomalies [18]. These suggest that data analysis techniques are feasible for predicting tourism consumption trends.

Construction of optimized portfolio forecasting model based on tourism consumption
Tourism Consumption Forecasting under the Univariate Regression Forecasting Model
Overview of regression analysis forecasting

One of the main uses of regression is as a tool for prediction. If the regression model reflects the relationship between the dependent and independent variables well, the independent variables of the regression equation can be used to predict the dependent variable.

Regression forecasting methods come in several types. By the number of independent variables, they are divided into univariate and multiple regression analysis [19]: univariate regression has exactly one independent variable, whereas multiple regression has two or more. By the form of the relationship between the independent and dependent variables, they are divided into linear and nonlinear regression prediction.

The main object of regression analysis is the statistical relationship between the variables of objective things, which is based on a large number of experiments and observations of objective things, and is used to find the statistical regularity hidden in those phenomena that seem to be uncertain. Regression analysis is an effective tool for studying the closeness of interrelationships between variables, structural states, and model predictions through the establishment of statistical models.

Univariate linear regression forecasting modeling mechanism

If the relationship between an independent variable x and a dependent variable y is linear, a univariate linear model is built to describe it. This model is called the regression model and is denoted as:

$$y = \alpha_0 + \alpha_1 x + \varepsilon \tag{1}$$

Given n observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the following equation holds for each observation:

$$y_i = \alpha_0 + \alpha_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

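Let $Y = (y_1, y_2, \ldots, y_n)^T$, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$, $\alpha = (\alpha_0, \alpha_1)^T$, and

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$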

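Then model (1), applied to all n observations, can be written in matrix form as:

$$Y = X\alpha + \varepsilon$$

where $\alpha$ is the vector of unknown parameters, $Y$ is the vector of observations, $X$ is the matrix of observations, and $\varepsilon$ is the vector of random errors. The usual assumptions are $\operatorname{rank}(X) = m + 1$, where $m$ is the number of independent variables ($m = 1$ here), $E(\varepsilon) = 0$, and $\operatorname{cov}(\varepsilon, \varepsilon) = \sigma^2 I_n$, where $\sigma^2$ is the variance of the random errors and $I_n$ is the $n$-th order identity matrix.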

Least squares estimation of regression parameters

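The least squares estimate of $\alpha$ is the value that minimizes the sum of squared error terms:

$$S(\alpha) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon^T \varepsilon = (Y - X\alpha)^T (Y - X\alpha) = \sum_{i=1}^{n} (y_i - \alpha_0 - \alpha_1 x_i)^2$$

Setting the derivative of $S(\alpha)$ with respect to $\alpha$ to zero gives the normal equations:

$$X^T X \alpha = X^T Y$$

Solving the normal equations gives the least squares estimate of the regression parameter $\alpha$:

$$\hat{\alpha} = (\hat{\alpha}_0, \hat{\alpha}_1)^T = (X^T X)^{-1} X^T Y$$

Substituting $\hat{\alpha} = (\hat{\alpha}_0, \hat{\alpha}_1)^T$ into model (1) and omitting the error term gives the regression equation

$$\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x$$

from which the predicted value of y can be obtained for an observed value of the independent variable x.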
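
The least squares estimate above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and using a year index (0 for 2007, 1 for 2008, and so on) as the independent variable is an assumption, since the article does not state which regressor was used. The data are the 2007-2019 per capita consumption values reported in the article.

```python
def fit_univariate_ols(x, y):
    """Solve the 2x2 normal equations X^T X a = X^T Y for (alpha0, alpha1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T Y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical, so rank(X) < 2")
    alpha0 = (sxx * sy - sx * sxy) / det
    alpha1 = (n * sxy - sx * sy) / det
    return alpha0, alpha1


def predict(alpha0, alpha1, x_new):
    """Evaluate the regression equation y_hat = alpha0 + alpha1 * x."""
    return alpha0 + alpha1 * x_new


# Assumption: x is a year index (0 = 2007, ..., 12 = 2019).
year_index = list(range(13))
consumption = [3520, 3646, 3838, 4152, 4166, 4174, 4384,
               4766, 4779, 4857, 4894, 5135, 5242]

a0, a1 = fit_univariate_ols(year_index, consumption)
print(f"intercept={a0:.1f}, slope={a1:.1f} yuan per year")
```

Solving the 2x2 system in closed form avoids an explicit matrix inverse; for more regressors a general least squares solver would be the usual choice.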

Linear regression model significance test

In practice, we cannot know in advance that the dependent variable y and the independent variable x are linearly related; fitting a linear regression equation to them is only an assumption based on qualitative analysis. Whether the regression model is valid, whether the factors introduced are relevant, whether a linear correlation exists between the variables, and whether the model is applicable must therefore be determined by testing.

The test of the correlation coefficient ($R^2$ test) uses the ratio of the regression sum of squares to the total sum of squared deviations to describe the correlation between X and Y:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$R^2$ is called the coefficient of determination. It measures how well the regression equation fits the sample observations: it is the proportion of the total variation in y that is explained by the linear function of the independent variable. $R^2$ takes a value between 0 and 1, and the closer it is to 1, the better the fit.

The regression equation test (F test) checks the significance of the regression equation as a whole, that is, whether the independent variables taken together have a significant effect on the dependent variable. In a univariate linear regression equation there is only one independent variable, so testing x is equivalent to testing the whole equation:

$$F = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / 1}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2)} \sim F(1, n - 2)$$

Let $F_\alpha(1, n-2)$ denote the critical value at significance level $\alpha$, with the null hypothesis $\alpha_1 = 0$. When $F > F_\alpha(1, n-2)$, the null hypothesis is rejected in favor of the alternative hypothesis, i.e. the linear equation is significant.

When $F < F_\alpha(1, n-2)$, the null hypothesis cannot be rejected, i.e. the linear equation is not significant.

The regression coefficient test (t-test) examines individual coefficients. In the regression equation, the regression coefficient expresses the quantitative relationship between the independent variable and the dependent variable. A coefficient of zero indicates that there is no such relationship, so the independent variable contributes nothing to the equation and should be eliminated; a coefficient that differs from zero should be retained. The significance test of the regression coefficients therefore checks whether each independent variable acts significantly on the dependent variable Y, and any independent variable that is not significant should be dropped from the regression equation.
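
The three tests can be computed together from the fitted line. The sketch below is illustrative only: `regression_diagnostics` is a hypothetical helper name and the five data points are made up for the example. Critical values such as $F_\alpha(1, n-2)$ are not computed here and would be taken from a statistical table or library.

```python
import math


def regression_diagnostics(x, y):
    """Return R^2, the F statistic and the slope t statistic for y = a0 + a1*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    alpha1 = sxy / sxx
    alpha0 = y_bar - alpha1 * x_bar
    fitted = [alpha0 + alpha1 * v for v in x]

    ss_reg = sum((f - y_bar) ** 2 for f in fitted)        # regression sum of squares
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))  # residual sum of squares
    ss_tot = sum((v - y_bar) ** 2 for v in y)              # total sum of squares

    r2 = ss_reg / ss_tot
    if ss_res == 0:
        # Perfect fit: both statistics are unbounded.
        return {"r2": r2, "f": math.inf, "t": math.inf}
    mse = ss_res / (n - 2)
    f_stat = (ss_reg / 1) / mse
    t_stat = alpha1 / math.sqrt(mse / sxx)
    return {"r2": r2, "f": f_stat, "t": t_stat}


# Hypothetical toy data, not taken from the article.
stats = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(stats)
```

With a single regressor the t statistic of the slope squared equals the F statistic, which is why the article treats the two tests as equivalent in the univariate case.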

Tourism consumption forecasting under the gray system forecasting model

The gray GM(1,1) model requires little data, is testable and easy to operate, and achieves relatively high short-term prediction accuracy. These advantages over traditional forecasting methods have led more and more scholars to choose it for predictive modeling [20]. Taking the mean GM(1,1) model as an example, its modeling process is as follows:

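Let the observed values of a behavioral characteristic sequence of the system be:

$$X^{(0)} = \{ x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n) \}$$

Its first-order accumulated generating sequence is:

$$X^{(1)} = \{ x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n) \}, \quad x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n$$

The mean generating sequence is:

$$Z^{(1)} = \{ z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n) \}$$

where:

$$z^{(1)}(k) = \frac{1}{2} \left( x^{(1)}(k) + x^{(1)}(k-1) \right), \quad k = 2, 3, \ldots, n$$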

The equation

$$x^{(0)}(k) + a z^{(1)}(k) = b$$

is called the mean form of the GM(1,1) model.

The corresponding first-order linear differential equation for $X^{(1)}$ is:

$$\frac{dx^{(1)}(t)}{dt} + a x^{(1)}(t) = b \tag{14}$$

Eq. (14) is called the whitened differential equation of the mean form of the GM(1,1) model.

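Here a is the development coefficient, b is the gray action quantity, and k is the time index. The parameter vector $\hat{a} = [a, b]^T$ can be estimated by the least squares method:

$$\hat{a} = (B^T B)^{-1} B^T Y$$

where $Y$ and $B$ are respectively:

$$Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}$$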

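The solution of the differential equation (14), with initial condition $\hat{x}^{(1)}(1) = x^{(0)}(1)$, is the time response:

$$\hat{x}^{(1)}(t) = \left( x^{(0)}(1) - \frac{b}{a} \right) e^{-a(t-1)} + \frac{b}{a}$$

Applying the inverse accumulation (first-order differencing) to the time response gives:

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad k = 2, 3, \ldots, n$$

with $\hat{x}^{(0)}(1) = x^{(0)}(1)$. The gray prediction model for the original series is therefore:

$$\hat{x}^{(0)}(k) = (1 - e^{a}) \left( x^{(0)}(1) - \frac{b}{a} \right) e^{-a(k-1)}, \quad k = 2, 3, \ldots, n$$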
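
The modeling steps above (accumulation, mean sequence, least squares for a and b, restored prediction) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the least squares problem is solved in closed form for the two parameters, and the geometric series used in the example is made up to show the behavior.

```python
import math


def gm11_fit(x0):
    """Estimate (a, b) of the mean GM(1,1) model from the raw series x0."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least four points")
    # First-order accumulated generating sequence x1(k) = sum of x0(1..k).
    x1, running = [], 0.0
    for value in x0:
        running += value
        x1.append(running)
    # Mean generating sequence z1(k) for k = 2..n.
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0(k) = -a * z1(k) + b, i.e. B = [-z1, 1].
    m = n - 1
    sz, szz = sum(z1), sum(z * z for z in z1)
    sy, szy = sum(y), sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate series: B^T B is singular")
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def gm11_predict(x0_first, a, b, k):
    """Restored prediction x0_hat(k); k = 1 returns the first observation."""
    if k == 1:
        return float(x0_first)
    if a == 0:
        raise ValueError("development coefficient a must be non-zero")
    return (1 - math.exp(a)) * (x0_first - b / a) * math.exp(-a * (k - 1))


# Hypothetical series growing by 10% per step.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = gm11_fit(series)
forecast = [gm11_predict(series[0], a, b, k) for k in range(1, 8)]
print(round(a, 4), round(b, 3), [round(v, 1) for v in forecast])
```

A negative development coefficient a corresponds to a growing series. Values of k beyond n extrapolate the fitted exponential, which is how the model produces forecasts.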

Forecasting tourism consumption under the exponential smoothing forecasting model
Overview of exponential smoothing forecasting methods

The exponential smoothing method builds on the moving average method. It gives larger weights to recent data and smaller weights to earlier data, so that earlier observations influence the prediction less and recent observations influence it more. It usually achieves good short-term prediction accuracy [21, 22].

Brown's single-parameter linear exponential smoothing method is a second-order (double) exponential smoothing method: it applies exponential smoothing once more to the first-order smoothed values. Its basic principle is to use the smoothed values to correct for the linear trend in the time series and thereby establish a linear smoothing model for prediction. Its advantage is that it largely resolves the prediction lag that affects both the moving average method and the first-order exponential smoothing method.

Exponential Smoothing Forecasting Model Construction

The basic principle of Brown's single-parameter linear exponential smoothing method is similar to that of the linear (double) moving average method: when a trend exists, both the first-order and second-order smoothed values lag behind the actual values, and the lag can be corrected by adding the difference between the first-order and second-order smoothed values to the first-order smoothed value. The formulas of this model are:

$$S'_t = \alpha x_t + (1 - \alpha) S'_{t-1}$$

$$S''_t = \alpha S'_t + (1 - \alpha) S''_{t-1}$$

where $S'_t$ is the first-order exponential smoothing value, $S''_t$ is the second-order exponential smoothing value, and $\alpha$ is the smoothing coefficient. The level and trend terms and the forecast are:

$$a_t = S'_t + (S'_t - S''_t) = 2S'_t - S''_t$$

$$b_t = \frac{\alpha}{1 - \alpha} (S'_t - S''_t)$$

$$F_{t+m} = a_t + b_t m$$

where m is the number of periods ahead of the forecast.
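
The recursion can be sketched directly from these formulas. The function name `brown_forecast` is hypothetical, and initializing both smoothed values with the first observation is an assumption; the article does not say how the recursion was started or which smoothing coefficient was used.

```python
def brown_forecast(series, alpha, horizon=1):
    """Brown's linear (double) exponential smoothing forecast for m = 1..horizon."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: both smoothed values start at the first observation.
    s1 = s2 = float(series[0])
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first-order smoothing S'_t
        s2 = alpha * s1 + (1 - alpha) * s2   # second-order smoothing S''_t
    level = 2 * s1 - s2                      # a_t
    trend = alpha / (1 - alpha) * (s1 - s2)  # b_t
    return [level + trend * m for m in range(1, horizon + 1)]


# Per capita consumption 2007-2019 as reported in the article;
# alpha = 0.5 is an arbitrary illustrative choice.
consumption = [3520, 3646, 3838, 4152, 4166, 4174, 4384,
               4766, 4779, 4857, 4894, 5135, 5242]
print([round(v) for v in brown_forecast(consumption, alpha=0.5, horizon=3)])
```

On a long, exactly linear series the forecast converges to the true continuation of the line, which is the lag correction described above; in practice alpha would be tuned, for example by minimizing in-sample error.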

Tourism Consumption Forecasting under the Optimal Combination Model

The size of tourism consumption is affected by many internal and external factors. At present, tourism consumption is generally forecast with a single method, such as the regression, gray system, and exponential smoothing methods described above. However, a single prediction method often captures only certain major influencing factors and does not cover the full range of useful information on tourism consumption, which keeps the accuracy of its predictions relatively low. To make full use of the information captured by each single model and improve prediction accuracy, this paper adopts an optimized combination of prediction methods to analyze and predict tourism consumption.

Let there be n different forecasting methods (or models) for the same forecasting object, with predictions $\hat{Y}_{it}$ ($i = 1, 2, \ldots, n$; $t = 1, 2, \ldots, N$). If $Y_t$ is the actual observed value in period t, the general form of the optimal combination forecasting model is:

$$Y_t = \sum_{i=1}^{n} w_i \left( \hat{Y}_{it} - e_{it} \right)$$

where $e_{it} = \hat{Y}_{it} - Y_t$ is the prediction error of the $i$-th method in period t and is treated as a random perturbation term, and $w_i$ is the weight of the $i$-th method in the combined prediction model, satisfying $\sum_{i=1}^{n} w_i = 1$. The selected combined prediction model is $\hat{Y}_t = \sum_{i=1}^{n} w_i \hat{Y}_{it}$, and $e_t = \hat{Y}_t - Y_t$ is the fitting deviation of the combined prediction model.

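Set

$$E = \begin{bmatrix} \sum_{t=1}^{N} e_{1t}^2 & \sum_{t=1}^{N} e_{1t} e_{2t} & \cdots & \sum_{t=1}^{N} e_{1t} e_{nt} \\ \sum_{t=1}^{N} e_{2t} e_{1t} & \sum_{t=1}^{N} e_{2t}^2 & \cdots & \sum_{t=1}^{N} e_{2t} e_{nt} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{t=1}^{N} e_{nt} e_{1t} & \sum_{t=1}^{N} e_{nt} e_{2t} & \cdots & \sum_{t=1}^{N} e_{nt}^2 \end{bmatrix}$$

The mathematical programming model for determining the optimal weights based on the principle of least squares is:

$$\begin{cases} \min J = W^T E W \\ \text{s.t. } W^T R_n = 1, \; W \ge 0 \end{cases}$$

The analytical form of the optimal weight vector is obtained using the Lagrangian function method:

$$W_0 = \left( R_n^T E^{-1} R_n \right)^{-1} E^{-1} R_n$$

where $R_n = (1, 1, \ldots, 1)^T$ is an n-dimensional column vector. To ensure that $E^{-1}$ exists, the error vectors $e_i$ of the n models must be linearly independent.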
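
The closed-form weights can be computed by solving $E v = R_n$ and normalizing $v$ so that it sums to one. The sketch below is illustrative only: the function names and the two short error series are hypothetical. Note that the closed form enforces only the sum-to-one constraint, so individual weights can come out negative; a constrained solver would be needed to impose non-negativity.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("E is singular: error vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def optimal_weights(errors):
    """errors[i][t] is e_it; returns W0 = E^-1 R / (R^T E^-1 R)."""
    n = len(errors)
    e_matrix = [[sum(p * q for p, q in zip(errors[i], errors[j]))
                 for j in range(n)] for i in range(n)]
    v = solve_linear(e_matrix, [1.0] * n)  # v = E^-1 R_n
    total = sum(v)                         # R_n^T E^-1 R_n
    return [value / total for value in v]


def combine(predictions, weights):
    """Weighted combination of per-model prediction lists."""
    return [sum(w * p[t] for w, p in zip(weights, predictions))
            for t in range(len(predictions[0]))]


# Hypothetical errors of two models over three periods.
weights = optimal_weights([[3.0, -2.0, 1.0], [-1.0, 1.0, 2.0]])
print([round(w, 3) for w in weights])
```

Models with smaller and less correlated errors receive larger weights, which is what lets the combination outperform each single model in-sample.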

Forecast analysis of tourism consumption trends
Analysis of model prediction performance

This subsection takes Province S as the research object. Using the optimized combination model constructed in the previous section, the forecasting performance of each model is tested by comparing the relative errors between the actual per capita consumption of inbound tourism in Province S from 2007 to 2023 and the values predicted by each model.

The per capita consumption of inbound tourism in Province S from 2007 to 2023 is shown in Table 1. Before 2020, the per capita consumption of tourists entering Province S rose year by year, from 3,520 yuan in 2007 to 5,242 yuan in 2019, an increase of 48.92% over the 13-year period. In 2020-2022, the major public health emergency dealt a heavy blow to China's tourism industry: the number of inbound tourists plummeted and per capita consumption dropped dramatically. It was not until 2023 that per capita consumption basically returned to the 2019 level.

Table 1. Per capita consumption of inbound tourism in Province S, 2007-2023

Year Per capita consumption/yuan Year Per capita consumption/yuan
2007 3520 2016 4857
2008 3646 2017 4894
2009 3838 2018 5135
2010 4152 2019 5242
2011 4166 2020 245
2012 4174 2021 1384
2013 4384 2022 3746
2014 4766 2023 5216
2015 4779

Using PyCharm Professional 2022.2.3, the per capita consumption of inbound tourism in Province S from 2007 to 2023 (Table 1) was predicted with the single univariate regression prediction model (URM), the gray GM(1,1) model, the exponential smoothing prediction model (ESM), and the optimized combination prediction model constructed in this paper. The prediction results of each model were compared with the actual values, and the relative errors were used to test the prediction performance of each model.

The predicted values and relative errors of the single prediction models and the optimized combination prediction model are compared in Table 2. The univariate regression prediction model performs worst, with a mean relative error of 10.26%; the mean relative errors of the gray GM(1,1) model and the exponential smoothing prediction model are 6.21% and 6.26%, respectively. All three single models are less accurate than the optimized combination prediction model, which has the highest prediction accuracy with a mean relative error of only 1.65%. The optimized combination prediction model therefore has good prediction performance and can be used to predict the trend of inbound tourism consumption in Province S.

Table 2. Comparison of the prediction results of the four prediction models

Year URM value URM error/% GM(1,1) value GM(1,1) error/% ESM value ESM error/% Ours value Ours error/%
2007 3822 8.58 3479 1.16 3666 4.15 3537 0.48
2008 3982 9.22 3555 2.50 3945 8.20 3661 0.41
2009 4033 5.08 4017 4.66 4122 7.40 3902 1.67
2010 4373 5.32 4221 1.66 4280 3.08 4091 1.47
2011 4513 8.33 4433 6.41 4461 7.08 4164 0.05
2012 4565 9.37 4527 8.46 4520 8.29 4167 0.17
2013 4702 7.25 4600 4.93 4674 6.61 4324 1.37
2014 4977 4.43 4986 4.62 4998 4.87 4705 1.28
2015 5082 6.34 4994 4.50 5047 5.61 4834 1.15
2016 5201 7.08 5121 5.44 5075 4.49 4916 1.21
2017 5340 9.11 5324 8.79 5103 4.27 4977 1.70
2018 5554 8.16 5509 7.28 5281 2.84 5167 0.62
2019 5718 9.08 5531 5.51 5409 3.19 5260 0.34
2020 321 31.02 191 22.04 298 21.63 227 7.35
2021 1676 21.10 1498 8.24 1480 6.94 1447 4.55
2022 4391 17.22 3822 2.03 3576 4.54 3895 3.98
2023 5620 7.75 5600 7.36 5046 3.26 5225 0.17
Mean relative error 10.26 6.21 6.26 1.65
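
The relative errors in the table can be recomputed from the reported values as |predicted − actual| / actual. The following sketch does this for the combined model ("Ours") using the actual per capita consumption series and the predicted values listed in the article; the function names are hypothetical.

```python
def relative_errors(actual, predicted):
    """Per-period relative error in percent: |predicted - actual| / actual * 100."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return [abs(p - a) / a * 100.0 for a, p in zip(actual, predicted)]


def mean_relative_error(actual, predicted):
    errors = relative_errors(actual, predicted)
    return sum(errors) / len(errors)


# Actual per capita consumption, 2007-2023 (yuan).
actual = [3520, 3646, 3838, 4152, 4166, 4174, 4384, 4766, 4779,
          4857, 4894, 5135, 5242, 245, 1384, 3746, 5216]
# Predictions of the combined model ("Ours"), 2007-2023.
combined = [3537, 3661, 3902, 4091, 4164, 4167, 4324, 4705, 4834,
            4916, 4977, 5167, 5260, 227, 1447, 3895, 5225]

print(f"mean relative error: {mean_relative_error(actual, combined):.2f}%")
```

The same function applied to the other model columns gives their mean relative errors, so the ranking of the four models can be checked directly from the table.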
Analysis of the results of tourism consumption forecasts

This subsection takes City A, the capital city of Province S, as the research object, and analyzes the number of inbound tourists and income of City A from 2007 to 2023 by using the optimal combination of forecasting models in this paper, and predicts the number of inbound tourists and income of City A from 2024 to 2028 on this basis.

Results of the forecast of tourist arrivals

The forecast uses data published by the Bureau of Culture and Tourism of City A: the total number of inbound tourists to City A and the number of inbound tourists arriving from seven major source markets (Japan, South Korea, Russia, the United States, Europe, Southeast Asia, and Australia) in 2007-2023. These data are fed into the optimal combination forecast model to obtain the number of inbound tourists to City A in 2024-2028. The forecast results are shown in Table 3. The number of inbound tourists to City A increases year by year from 2024 to 2028 and the tourism market keeps expanding, with an average growth rate of 6.40%, so inbound tourism in City A continues to develop well. Among the seven major source markets, Southeast Asia and Russia have the highest average growth rates, 13.43% and 10.64% respectively, while Europe has the lowest, 2.98%. One possible reason for this difference is that City A developed its tourism early, and inbound tourists used to come mainly from developed European countries. After years of development, the European market has less room to grow, so the growth rate of European tourists is lower. In recent years, on the other hand, City A has launched tourism marketing in geographically closer markets such as Southeast Asia, which has made it more attractive to residents of these regions and raised the growth rate of inbound tourists from them.

Table 3. Forecast number of inbound tourists from 2024 to 2028

Year 2024 2025 2026 2027 2028 Average growth rate/%
Total number 179529 190931 203972 213265 229994 6.40
Japan 33123 34488 36830 37309 39580 4.57
Korea 25778 26398 28725 30138 32241 5.78
Russia 22132 27534 29758 30053 32782 10.64
USA 28391 28439 30351 32577 34644 5.14
Europe 35612 35774 36727 38267 40031 2.98
Southeast Asia 13337 16289 17876 19122 21967 13.43
Australia 21156 22009 23705 25799 28749 8.00
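
The reported average growth rates are consistent with the arithmetic mean of the four year-over-year growth rates in each row (an inference from the numbers; the article does not state the formula). A minimal sketch with a hypothetical function name, using three rows of the table:

```python
def average_growth_rate(values):
    """Arithmetic mean of year-over-year growth rates, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two periods")
    rates = [(curr - prev) / prev * 100.0
             for prev, curr in zip(values, values[1:])]
    return sum(rates) / len(rates)


# Forecast arrivals for 2024-2028, copied from the table.
forecast = {
    "Total": [179529, 190931, 203972, 213265, 229994],
    "Europe": [35612, 35774, 36727, 38267, 40031],
    "Southeast Asia": [13337, 16289, 17876, 19122, 21967],
}

for market, series in forecast.items():
    print(f"{market}: {average_growth_rate(series):.2f}%")
```

A compound annual growth rate computed from the first and last year would give slightly different figures, so the choice of definition matters when comparing markets.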

The overall upward trend in the number of inbound tourists to City A has several causes. In recent years, the Municipal Government of City A has continuously increased its investment in overseas promotion and set up tourism offices overseas, mobilizing tourism enterprises and encouraging them to take effective measures to attract inbound tourists. In addition, the number of international travel agencies and high-end hotels in City A keeps increasing; these enterprises continue to innovate their promotional methods and launch new routes and products in overseas markets through tourism websites in a timely manner. The air, sea, and land transportation facilities of City A are also becoming more developed, so more overseas tourists are expected to travel to City A in the future. Forecasting the number of inbound tourists makes it possible to adjust City A's tourism supply as necessary and to prepare in advance for receiving more overseas tourists.

Results of tourism revenue projections

Based on the tourism foreign exchange revenue data of City A for the 17 years from 2007 to 2023, the sequence of annual incremental percentages of tourism foreign exchange income is divided into states; the division is shown in Table 4. The raw data of international tourism foreign exchange revenue in City A are processed to obtain the incremental sequence shown in Table 5. Because the outbreak of a major public health event (the COVID-19 epidemic) in December 2019 seriously affected the tourism industry of City A over the following three to four years, the data for 2020-2022 are not representative. The international tourism foreign exchange revenue before 2020 is therefore used as the basis for forecasting the international tourism foreign exchange revenue of City A in the next five years (2024-2028).

Table 4. State division of the incremental percentage sequence of foreign tourism exchange income

Status A B C D
Annual increment percentage 0%-10% 10%-20% 20%-30% 30%-40%

Table 5. Relative annual increment state of foreign tourism exchange income

Year Number Foreign tourism exchange income/10^4 yuan Incremental percentage/% Status
2007 0 2628 - -
2008 1 2811 6.96 A
2009 2 3799 35.15 D
2010 3 4167 9.69 A
2011 4 4726 13.41 B
2012 5 4993 5.65 A
2013 6 5628 12.72 B
2014 7 6133 8.97 A
2015 8 7033 14.67 B
2016 9 8015 13.96 B
2017 10 8613 7.46 A
2018 11 9404 9.18 A
2019 12 10861 15.49 B
2020 13 184 / /
2021 14 915 / /
2022 15 4258 / /
2023 16 8512 / /
2024 17 10866 27.66 C
2025 18 14316 31.75 D
2026 19 17783 24.22 C
2027 20 21817 22.68 C
2028 21 26524 21.57 C

Take the forecast for the five years after 2023, i.e. 2024 to 2028, as an example. The relative annual increment percentages are 27.66% (state C), 31.75% (state D), 24.22% (state C), 22.68% (state C), and 21.57% (state C). According to the principle of maximum probability, the incremental percentage of inbound tourism foreign exchange income of City A in 2028 is taken to be in state C, i.e. 20%-30%. To simplify the prediction model, assume that the annual growth rate lies in [20%, 30%]; applied to the 2027 forecast of 21817, this gives a range of [26180, 28362] (in 10^4 yuan) for the foreign exchange income from inbound tourism in City A in 2028. The same applies to forecasts for any other year.
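
The state division and the range calculation can be sketched as follows. The helper names are hypothetical, and treating each state as a half-open interval [low, high) is an assumption, since the article does not say which state a boundary value belongs to. The income figures are those reported for 2023-2028.

```python
from collections import Counter

# State bounds in percent, following the state division table.
STATES = [("A", 0.0, 10.0), ("B", 10.0, 20.0), ("C", 20.0, 30.0), ("D", 30.0, 40.0)]


def increment_percent(previous, current):
    """Relative annual increment in percent."""
    return (current - previous) / previous * 100.0


def classify(percent):
    """Return the state whose interval [low, high) contains percent, else None."""
    for name, low, high in STATES:
        if low <= percent < high:
            return name
    return None


def income_range(previous_income, state):
    """Income interval implied by a state's growth bounds."""
    for name, low, high in STATES:
        if name == state:
            return (previous_income * (1 + low / 100.0),
                    previous_income * (1 + high / 100.0))
    raise ValueError(f"unknown state: {state}")


# Foreign exchange income (10^4 yuan) for 2023 and the 2024-2028 forecasts.
income = [8512, 10866, 14316, 17783, 21817, 26524]
states = [classify(increment_percent(p, c)) for p, c in zip(income, income[1:])]
most_likely = Counter(states).most_common(1)[0][0]
low, high = income_range(income[-2], most_likely)
print(states, most_likely, round(low), round(high))
```

Taking the most frequent state of the forecast years stands in for the "maximum probability" rule here; a full Markov-chain treatment would estimate transition probabilities between states instead.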

Conclusion

In this paper, the univariate regression prediction model, the gray system prediction model and the exponential smoothing prediction model are combined to construct an optimal combination prediction model, which is used in the prediction of tourism consumption trend. After comparing the performance of each prediction model, the number of tourists and income in the next few years are predicted based on the tourism data of previous years.

The mean relative errors between the predicted and actual results of the univariate regression prediction model, the gray GM(1,1) prediction model, the exponential smoothing prediction model, and the optimized combination prediction model in this paper are 10.26%, 6.21%, 6.26%, and 1.65%, respectively. The optimized combination prediction model has the best prediction performance.

Based on the tourism data of City A from 2007 to 2023, the number of tourist arrivals and the revenue from 2024 to 2028 are forecast. The number of inbound tourists to City A shows an upward trend from 2024 to 2028, with an average growth rate of 6.40%. The average growth rates of inbound tourists from the seven major source markets of Japan, South Korea, Russia, the United States, Europe, Southeast Asia, and Australia are 4.57%, 5.78%, 10.64%, 5.14%, 2.98%, 13.43%, and 8.00%, respectively. Because of the disruption caused by the COVID-19 epidemic, City A's international tourism foreign exchange earnings in 2024 will be basically the same as in 2019. By 2028, they are expected to reach 265.24 million yuan, with annual incremental percentages above 20% in each year from 2024 to 2028.