Big data analysis based on the correlation between live-streaming with goods, perceived value and consumer repurchase

Clarifying the correlation between live-streaming with goods, perceived value


Introduction
In the post-epidemic era, the popularity of the mobile Internet and the expanding scale of mobile online consumers have driven a shift in consumption habits, with live-streaming with goods becoming the first choice for most consumer purchases [1]. In today's society, with the rapid rise of e-commerce shopping, people can access a wide range of information about goods simply by going online [2].
The Internet has brought factories directly before people's eyes. Faced with abundant supplies, people are increasingly accustomed to placing orders online with a single click, relying on fast and professional modern logistics to obtain the supplies they need quickly and conveniently without leaving home [3]-[4]. The emergence of e-commerce shopping has greatly improved the efficiency of modern life and consumers' quality of life. Live-streaming with goods now appears in every aspect of our lives. Even if we do not deliberately look for it, it permeates our lives [5]-[6].
Numerous studies provide a reference for the correlation analysis of live-streaming, perceived value and consumers' repurchase behavior. The literature [7] argues that live-streaming is a marketing method in which merchants create contextualized consumption scenarios for audiences with specific needs, deliver information to consumers at a lower cost, satisfy consumers' emotional appeals and stimulate their desire to consume. The literature [8] argues that live streaming is a product marketing method created by merchants to meet consumers' shopping experience needs, and that it is interactive, entertaining, realistic and visual. The literature [9] found that the perceived value of consumers in the live-streaming context affects their trust in the product or in the seller and promotes the creation of custom fit. The literature [10] argues that live-streaming with goods differs from traditional advertising because of its strong immediacy, which can stimulate consumers' desire for interaction. The literature [11] studied live e-commerce and concluded that the sense of engagement and immersion in the live social context positively affects purchase intention.
In addition, the literature [12] verified that the interactivity, authenticity and vividness of live shopping could influence consumers' purchase intention, using spatial proximity and social proximity as mediating variables. The literature [13] investigated the impact of mobile traffic on consumers' mobile shopping intentions and found that both perceived usefulness and perceived ease of use were significantly correlated with mobile traffic, that mobile traffic played an overall mediating role between consumer attitudes and perceived usefulness, and that traffic was significantly correlated with both consumer attitudes and purchase intentions. The literature [14] argues that the sense of interactive experience on a website is a stimulus that positively affects the customer's perceived value and also increases the user's willingness to repurchase. The literature [15] argues that perceived value is more an expression of subjective will and further states that perceived value is the consumer's overall evaluation of product quality, shopping convenience, and shopping purpose.
This paper first describes big data analytics technology in detail and presents the basic framework of big data analytics and its six main directions, namely visual analytics, data mining algorithms, predictive analytics capabilities, semantic engines, data quality and data management, and data storage and data warehousing. The differences between big data analytics and traditional data mining are discussed, and the common algorithms for big data analytics are explained. Secondly, given the characteristics of the research content, the KNN algorithm is adopted as the main method of data analysis. The basic principle of the KNN algorithm, the definition of the distance criterion, and the specific measures for optimizing the KNN algorithm with the Gaussian kernel function are introduced, the latter from two aspects: improvement of the distance function and improvement of the decision rule. This paper also analyzes the performance of the KNN algorithm optimized with the Gaussian kernel function, comparing it with five other algorithms by ten-fold cross-validation on eight data sets from the UCI database. The concepts of live streaming with goods, perceived value and consumer repurchase are then explained in detail, and the KNN algorithm is used to mine data from the Taobao live streaming platform as an example. Finally, the data examples from the Taobao live streaming platform are analyzed to explore the correlation between live streaming with goods, perceived value and consumer repurchase.

Big Data Analytics
Big data analytics refers to the analysis of data of enormous size. It was created with the aim of supporting IT management: companies can combine real-time data flow analysis with relevant historical data, apply big data analytics, and discover the models they need, which in turn helps to predict and prevent future operational disruptions and performance issues. Further, they can use big data to understand usage models, deepening their insight into important users. They can also track and record network behavior, so that big data analytics can readily identify business impact. The basic framework of big data analytics is shown in Figure 1 [16].

The main directions of big data analytics
Big data analytics is developing along with the technology, and its application areas are also deepening. Current big data analytics is divided into six main areas: visual analytics, data mining algorithms, predictive analytics capabilities, semantic engines, data quality and data management, and data storage and data warehousing. Together, the six directions constitute the basic framework of current big data analytics and provide more intuitive data support for daily life.

The difference between big data analytics and data mining
Although big data analysis also deals with the analysis of large volumes of data, it differs in certain respects from traditional data mining. The details are shown in Table 1. Data analysis with big data requires relevant data analysis algorithms. Commonly used big data analysis algorithms include the C4.5 classification decision tree algorithm, the K-means algorithm, the support vector machine, the Apriori algorithm, the AdaBoost iterative algorithm, the K-nearest neighbor classification algorithm, the naive Bayes algorithm, and the classification and regression tree (CART) algorithm. In this paper, the K-nearest neighbor classification algorithm is chosen as the main research method based on the data characteristics of the research content.

Big Data Analytics Algorithm: K-Nearest Neighbor Classification Algorithm
The KNN algorithm, also called the K-nearest neighbor algorithm, is a widely used classification and regression method in machine learning. It classifies a sample to be tested according to its similarity to nearby samples, i.e., it calculates the distance between the sample to be tested and the samples in the training set and uses the nearest neighbors to determine the class to which the test sample belongs.

The basic principle of the KNN algorithm
The KNN algorithm, a commonly used supervised learning method in big data analytics, can perform regression and classification prediction tasks. Its working principle is simple and straightforward compared to other big data analytics algorithms [17]. The specific steps are as follows:
Step 1: Given the feature data of the test set, the distance between the test sample and each training sample is calculated based on some distance metric (e.g., Euclidean distance, Mahalanobis distance, Manhattan distance, or Chebyshev distance; usually, Euclidean distance is chosen).
Step 2: For each test sample, the K nearest training samples are selected from the training set, and predictions are made based on the values of these K "neighbors".
Step 3: In the classification task, the category that occurs most often among the K samples is selected as the classification prediction. In the regression task, the average of the label values of the K samples is selected as the regression prediction.
The KNN algorithm is shown schematically in Figure 2.
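The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper: the function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are assumptions introduced here, and Euclidean distance is used because the text names it as the usual choice.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Step 1: Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def k_nearest(train_x, train_y, query, k):
    """Step 2: labels (or values) of the k training samples closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return [train_y[i] for i in order[:k]]


def knn_classify(train_x, train_y, query, k=3):
    """Step 3 (classification): majority vote among the k neighbors."""
    votes = Counter(k_nearest(train_x, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3):
    """Step 3 (regression): mean of the k neighbors' label values."""
    values = k_nearest(train_x, train_y, query, k)
    return sum(values) / len(values)


# Toy data (hypothetical): two well-separated clusters.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low", "low", "low", "high", "high", "high"]
print(knn_classify(train_x, train_y, (1.1, 1.0), k=3))  # prints: low
```

With an even K, ties in the vote are possible; this sketch simply returns whichever tied label `Counter` lists first, which is one of several reasonable tie-breaking choices.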

Definition of distance criterion for KNN algorithm
Once the basic features of the sample data are defined, each sample can be represented by a vector, and the degree of similarity between two samples in the feature vector space can be expressed by a distance. From the training sample set with known labels, the samples with the smallest Euclidean distance to the sample to be tested are selected. Assume that each sample in the data set belongs to an $m$-dimensional space $\mathbb{R}^m$, with distance measured by the Euclidean distance, and that the $i$-th sample is $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(m)})$, where $x_i^{(l)}$ is the $l$-th feature attribute of the $i$-th sample. The Euclidean distance between two samples $x_i$ and $x_j$ is then defined as [18]:
$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{m} \left(x_i^{(l)} - x_j^{(l)}\right)^2} \qquad (1)$$

Gaussian kernel function
Based on the above idea, for the $i$-th observation $x_i$ we can use a function $K$ (the "kernel") to fit a probability density that is smaller far from $x_i$ and larger near it. The choice of kernel has little effect on the estimated probability distribution, and the Gaussian kernel function is used here for the estimation analysis [19].
The kernel density estimate is generally expressed, in its standard form, as:
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{d_i}{h}\right) \qquad (2)$$
where $h$ is the estimated bandwidth, $d_i = x - x_i$ is the distance between two points, and $n$ is the number of data points. In spatial data, the density estimate is interpreted as the probability magnitude of the distance between an instance and its neighboring instances [20]. The kernel function is generally a Gaussian function, i.e.:
$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right), \quad u = \frac{d}{h} \qquad (3)$$
Dropping the constant factor $1/\sqrt{2\pi}$ gives the following expression for the Gaussian kernel function:
$$K(d) = \exp\!\left(-\frac{d^2}{2h^2}\right) \qquad (4)$$
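As a small illustration of kernel density estimation with a Gaussian kernel, the sketch below evaluates the textbook estimator on one-dimensional data. It assumes the standard form of the estimator (average of kernels scaled by a bandwidth `h`); the names `gaussian_kernel` and `kde` and the sample values are hypothetical and are not taken from this paper.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Kernel density estimate at x from 1-D samples, with bandwidth h > 0."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


samples = [1.0, 1.2, 0.8, 3.0]      # hypothetical observations
near = kde(1.0, samples, h=0.5)     # density close to most observations
far = kde(6.0, samples, h=0.5)      # density far from all observations
print(near > far)                   # prints: True
```

The estimate is high where observations cluster and falls off with distance, which is the property the distance weighting in the following subsection relies on.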

Distance weights of Gaussian kernel functions
According to the Gaussian kernel function, the distance weight of neighboring instances is calculated by Eq. (5), where the estimated bandwidth is the Euclidean distance of the variable neighborhood threshold set.
The Gaussian kernel function calculates the distance weights under an influence coefficient. The influence coefficient indicates how strongly differences in distance affect the distance-based calculation of the degree of influence; for neighboring instances, the distance weights express that the closer two instances are to each other, the greater their degree of association.
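To make the idea of Gaussian distance weights concrete, the sketch below shows how such weights can change a neighbor vote. The weight form `exp(-d^2 / (2 h^2))` is an assumption made here for illustration (a Gaussian kernel with the constant dropped), not a reproduction of Eq. (5); `gaussian_weight`, `weighted_vote` and the neighbor list are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_weight(d, h):
    """Assumed weight form exp(-d^2 / (2 h^2)); shrinks as distance d grows."""
    return math.exp(-(d * d) / (2.0 * h * h))


def weighted_vote(neighbors, h):
    """neighbors: list of (distance, label). Return the label with the largest weight sum."""
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += gaussian_weight(d, h)
    return max(scores, key=scores.get)


# Hypothetical neighbors: one very close "A" and two more distant "B".
neighbors = [(0.1, "A"), (2.0, "B"), (2.2, "B")]
print(weighted_vote(neighbors, h=1.0))  # prints: A
```

An unweighted majority vote over the same three neighbors would return "B"; with the weights, the single close neighbor dominates. A very large bandwidth makes all weights nearly equal and recovers the majority vote.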

Improved KNN algorithm based on Gaussian kernel function
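Suppose the original data has $N$ samples in total, of which $n$ are training samples in an $m$-dimensional space, and the training samples can be represented as $X = \{x_1, x_2, \dots, x_n\}$. Let $\Omega$ be the set of labels of the training set, containing $c$ labels, i.e., $\Omega = \{\omega_1, \omega_2, \dots, \omega_c\}$. Then each item in the data set can be represented as $(x_i, t_i)$, where $t_i \in \Omega$ is the label of $x_i$.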

Improvement of distance function
The traditional KNN algorithm has several commonly used distance measures. Suppose $x = (x_1, \dots, x_m)$ is an $m$-dimensional test sample and $y = (y_1, \dots, y_m)$ is an $m$-dimensional training sample.
1) Euclidean distance: the distance between two points in $m$-dimensional space, or the natural length of a vector, i.e., the distance from the point to the origin. Its formula is:
$$d(x, y) = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2} \qquad (6)$$
2) Manhattan distance: the sum of the absolute axis distances of two points in a standard coordinate system. Its formula is:
$$d(x, y) = \sum_{k=1}^{m} |x_k - y_k| \qquad (7)$$
3) Mahalanobis distance: the covariance distance between two points. Its formula is:
$$d(x, y) = \sqrt{(x - y)^{T} \Sigma^{-1} (x - y)} \qquad (8)$$
where $\Sigma^{-1}$ is the inverse of the covariance matrix.
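The three distance measures listed above can be compared directly in code. The following sketch uses the standard definitions of the Euclidean, Manhattan and covariance-based (Mahalanobis) distances; the function names are chosen here for illustration, and the inverse covariance matrix `cov_inv` is assumed to be supplied by the caller rather than estimated from data.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, cov_inv):
    """Covariance-scaled distance; cov_inv is the inverse covariance matrix (list of rows)."""
    diff = [a - b for a, b in zip(x, y)]
    m = len(diff)
    # Quadratic form diff^T * cov_inv * diff
    q = sum(diff[i] * cov_inv[i][j] * diff[j] for i in range(m) for j in range(m))
    return math.sqrt(q)


x, y = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(x, y), manhattan(x, y), mahalanobis(x, y, identity))  # prints: 5.0 7.0 5.0
```

With the identity matrix as `cov_inv` the Mahalanobis distance reduces to the Euclidean distance; a feature with larger variance (a smaller diagonal entry in `cov_inv`) contributes less to the distance.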
If the distance is calculated without weighting the features, there is no guarantee that all features are relevant to the classification, so the distance between nearest neighbors may be dominated by irrelevant features, and the nearest neighbor method is sensitive to this situation. Therefore, the affinity distance function, a local distance function, is proposed; it combines the distance with local learning over the features, and its distance formula is given by Eq. (9). The weight function in Eq. (9) is given by Eq. (10), in which one term represents the sum of the distances between the test point and all training points on the $k$-th feature, and the other represents the sum of the distances between the training point and the rest of the training points on the $k$-th feature.
For the weight function, this paper chooses the Gaussian kernel function, whose expression is given by Eq. (11), where $d$ is the distance between two points and $K$ denotes the kernel function.
Based on the local distance of the affinity function and the weight of the Gaussian kernel function, this paper proposes a distance function based on the Gaussian kernel, whose expression on the $m$-dimensional space is given by Eq. (12). The first term on the right side of Eq. (12) is the absolute distance between the test point and the training point, the second term is related to the weight function in the above equation, and $h$ is the bandwidth.

Improvement of decision rules
The traditional KNN method does not use a similarity function. However, it has been shown that using a similarity function in classification gives higher classification accuracy than the traditional KNN algorithm, so a similarity function is proposed for KNN, given by Eq. (13), which involves a positive constant and the average distances between data points calculated over the entire data set.

Steps of the algorithm
In this paper, the steps of the KNN algorithm optimized with the Gaussian kernel function are as follows:
Step 1: Standardize the data. When the values of the features differ greatly in scale, the features with larger values play a decisive role in the overall analysis, so the original data set should be standardized before the experiment, using the formula:
$$x' = \frac{x - \mu}{\sigma} \qquad (15)$$
where $\mu$ and $\sigma$ are the mean and standard deviation of all samples, respectively.
Step 2: Cross-validation. To prevent overfitting and underfitting, this paper uses ten-fold cross-validation in all experiments; each experiment is run 10 times and the results are averaged.
Step 3: Distance metric. The distance metric in this algorithm uses a distance function based on a Gaussian kernel.
Step 4: Sorting. The distance values obtained in the previous step are sorted in ascending order to obtain the K nearest neighbors.
Step 6: Calculate the category score. The category scores of the test sample are calculated to determine the sample's category.
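The overall flow of these steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the algorithm of this paper: the proposed Gaussian-kernel distance and the score function are replaced by stand-ins (plain Euclidean distance and a Gaussian-weighted vote with an assumed bandwidth `h`), and the names `standardize`, `predict`, `cross_validate` and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def standardize(rows):
    """Step 1: z-score each feature column, (x - mean) / std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def predict(train, query, k, h):
    """Steps 3, 4 and 6 with stand-ins: Euclidean distance, Gaussian-weighted score."""
    dists = sorted((math.dist(x, query), label) for x, label in train)  # ascending
    scores = defaultdict(float)
    for d, label in dists[:k]:
        scores[label] += math.exp(-(d * d) / (2.0 * h * h))
    return max(scores, key=scores.get)


def cross_validate(xs, ys, k=3, h=1.0, folds=10, seed=0):
    """Step 2: k-fold cross-validation (ten folds by default); returns mean accuracy.

    Assumes at least `folds` samples. The whole data set is standardized
    before splitting, as the text describes.
    """
    xs = standardize(xs)
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train = [(xs[i], ys[i]) for i in idx if i not in held_out]
        hits = sum(predict(train, xs[i], k, h) == ys[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return sum(accs) / folds


# Toy data (hypothetical): two well-separated clusters of 20 points each.
xs = [[i * 0.1, 1.0 + i * 0.05] for i in range(20)] + \
     [[10 + i * 0.1, 8.0 + i * 0.05] for i in range(20)]
ys = ["a"] * 20 + ["b"] * 20
print(cross_validate(xs, ys))
```

Swapping in a different distance or score function only requires changing `predict`, which keeps the standardization and cross-validation logic independent of the particular KNN variant being evaluated.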

Introduction to the experimental data set
This section analyzes the performance of the improved KNN algorithm based on the Gaussian kernel function; eight datasets from the UCI database are selected for ten-fold cross-validation analysis. The details of the eight datasets are shown in Table 2. To make sure that the proposed algorithm does not achieve good classification results only on a certain class of datasets, datasets with as many as 55 attributes and as few as 4 attributes were selected. In addition, the selected datasets include both binary and multi-class datasets.

Comparison between different distance functions
From the model of the KNN algorithm, we can see that once the training set and the K value are determined, the model is still influenced by the distance metric. Therefore, in this section, with the training set and K value fixed, the influence of different distance functions on the classification accuracy is verified; the results are shown in Figure 3. Across the 6 datasets, the improved distance function proposed in this paper has higher classification accuracy than the other distance functions on four datasets, Sonar, Wine, WDBC, and Heart, with 88.53%, 97.36%, 96.79%, and 82.74%, respectively. However, on two datasets it does not achieve the highest accuracy. One is the Balance dataset, where the Mahalanobis distance has the highest accuracy, 0.3% higher than the distance function proposed in this paper; the other is the Parkinsons dataset, where the Manhattan distance function has the highest classification accuracy, 0.34% higher than the proposed distance function. In both cases the difference is small. Therefore, we can consider that the distance function proposed in this paper obtains the best classification results overall.

Comparative analysis with other classification methods
The previous section studied the effect of different distance functions on the algorithm model. In this section, the algorithm proposed in this paper is compared with the SVM, naive Bayes, random forest, and decision tree algorithms in terms of classification accuracy and the F1 evaluation index, respectively. The results of comparing the classification method in this paper with other classification methods are shown in Figure 4.
The algorithm proposed in this paper has the best classification results on four datasets, Balance, Sonar, Wine, and Heart, compared with the other algorithms, with classification accuracies of 85.54%, 86.13%, 89.09%, and 89.85%, respectively. However, on the WDBC dataset, the classification accuracy of the decision tree is 3.51 percentage points higher than that of the algorithm in this paper, and on the Parkinsons dataset, the classification accuracy of the random forest algorithm is 2.36 percentage points higher. These differences are small, so overall the algorithm in this paper gives the best classification results among the five algorithms.
In terms of the F1 value, the closer the value is to 1, the better the comprehensive performance of the algorithm. The F1 values of the proposed algorithm on the Balance, Sonar, Wine, and WDBC datasets are closer to 1 than those of the other algorithms, at 0.944, 0.899, 0.956, and 0.895, respectively. On the Heart dataset, however, the SVM algorithm has better comprehensive performance, with an F1 value 0.031 higher than that of the algorithm in this paper. On the Parkinsons dataset, the naive Bayes algorithm performs better, with an F1 value 0.03 higher than that of the algorithm in this paper.
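For reference, the two evaluation measures used in this comparison can be computed as in the sketch below. It uses the standard definitions of accuracy and of F1 as the harmonic mean of precision and recall for a single class treated as positive; how the F1 value is aggregated over classes on the multi-class datasets is not stated in the text, so the one-class form shown here is an assumption. The function names and the example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 = 2 * P * R / (P + R) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(accuracy(y_true, y_pred), round(f1_score(y_true, y_pred, positive=1), 3))  # prints: 0.6 0.667
```

Accuracy and F1 can rank algorithms differently on imbalanced data, which is consistent with the datasets on which the accuracy ranking and the F1 ranking above do not coincide.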

Analysis of the correlation between live streaming with goods, perceived value and consumer repurchase
Today, live streaming is no longer a new phenomenon. Live-streaming with goods not only solves the difficulty traditional enterprises face in connecting with customers online but also solves the lack of timely interaction in traditional e-commerce. Live marketing is currently booming in the e-commerce industry, and it is necessary to study the deeper influences behind live-streaming with goods in order to create a better environment for it. Based on the KNN algorithm under big data analysis technology, this paper conducts correlation analysis on three aspects, live-streaming with goods, perceived value and consumer repurchase, to provide a new reference direction for live-streaming with goods.

Analysis of live e-commerce with goods
As a subdivision of live webcasting, e-commerce live streaming belongs to consumption-based live streaming. During the broadcast, the anchor creates a sense of "presence" for users by sampling, tasting, wearing, and trying out products, so that users are immersed in different live scenarios and their willingness to purchase is influenced through the interaction [21].
In essence, e-commerce live-streaming with goods is a new model that integrates e-commerce and webcasting. It combines the advantageous features of both, specifically authenticity, real-time interactivity, community, and immersion, and thus forms a new retail scenario.

Study of perceived value
The concept of perceived value was first derived from customer value. Some scholars propose that perceived value reflects consumers' overall evaluation of goods in the shopping process, an evaluation made mainly by comparing the related costs and benefits. Perceived value reflects consumers' evaluation of product attributes, performance, and experience; the evaluation results can influence consumers' shopping needs, and customers' learned perceptions and personal preferences are the sources of perceived value. In other words, perceived value is the evaluation a consumer arrives at after comparing the benefits perceived from a good with the costs paid to obtain it.

Measurement of perceived value
Perceived value is the consumer's subjective feeling. It can be divided into four dimensions: perceived price, perceived function, perceived emotion and perceived social dimension. A valid measure of perceived value is necessary to analyze the correlation between live-streaming with goods and perceived value [22].
1) Take the emotional dimension of perceived value as the measurement direction. Live-streaming with goods creates economic value, conveys positive values, and guides sound consumption concepts. By grasping consumers' emotions, it can make them empathize and thus promote consumption.
2) Take perceived trust as the basis for measurement. Live-streaming with goods needs to be genuine. If it becomes merely a marketing gimmick, this will undoubtedly cast a lingering shadow over live-streaming with goods.

Analysis of consumer repurchase behavior
Consumer buying behavior is also known as consumer behavior. It covers all the personal behavior of consumers related to consumption that occurs around the purchase of living materials, including the psychological, physiological and other substantive activities that occur during the purchase or consumption process, from the formation of the demand motivation, through the purchase itself, to the post-purchase summary of feelings. It is generally expressed in five stages:
1) Confirmation of need: The consumer perceives a need after an internal physiological activity or an external stimulus.
2) Information gathering: Consumers obtain information about products through the influence of relevant groups, mass media campaigns and personal experience.
3) Evaluation of choices: The consumer analyzes and weighs the information obtained and makes preliminary choices.
4) Purchase decision: The final purchase intention expressed by the consumer.
5) Post-purchase consumption effect evaluation: This includes the degree of satisfaction and the attitude toward repurchasing.

Analysis of the correlation between live streaming with goods, perceived value and consumer repurchase
This section takes data from the Taobao live-streaming platform as an example and explores the correlation between live-streaming with goods, perceived value and consumer repurchase by mining and analyzing the data with the KNN algorithm optimized by the Gaussian kernel function under big data analysis technology.

The correlation between live streaming, perceived value and consumer repurchase
1) Correlation mining of live-streaming with goods and perceived value
In live-streaming with goods, the attributes conveyed by the goods allow consumers to feel where the perceived value of live-streaming lies, promoting the growth of the transaction volume of live-streaming with goods. In this section, seven indicators are mined based on the KNN algorithm optimized by the Gaussian kernel function to verify the correlation between live-streaming with goods and perceived value. The details are shown in Table 3.
2) The correlation between live-streaming with goods and consumer repurchase
The purpose of live streaming is to sell goods and create economic value, but a sale is not the end point; consumer repurchase must also be considered. The association between live streaming and consumer repurchase is conducive to forming a complete chain of commodity consumption. By grasping the components of live streaming, we can explore the full range of consumer repurchases. In this section, 14 indicators are mined based on the KNN algorithm optimized by the Gaussian kernel function to verify their correlation, as shown in Table 4.

LGCR1: The anchor is charming
LGCR2: The anchor is very humorous and funny
LGCR3: I agree with this anchor's lifestyle

Interactivity
LGCR4: The anchor responded positively to the audience's questions
LGCR5: The audience will respond positively to the topic initiated by the anchor

Sense of reality
LGCR6: Watching it live makes me feel real
LGCR7: Watching the studio makes me feel social
LGCR8: Unconsciously feel that the product is right in front of you

Mechanism of incentive
LGCR9: The special offer in the studio will catch my attention
LGCR10: I will buy it because of the limited number of live items
LGCR11: I will buy new items for live streaming

Edge information
LGCR12: I can share the experience and feeling of shopping with other viewers
LGCR13: I pay attention to the behavior of other people in the audience
LGCR14: I will keep an eye on the number of people in the room and the likes

Analysis of the correlation between live streaming with goods, perceived value and consumer repurchase
The previous subsection presented the indicators of the correlation between live streaming, perceived value and consumer repurchase, and this section explores the correlation among the three by analyzing live-streaming data from the Taobao Live platform with respect to these indicators.
1) Correlation analysis of live-streaming with goods and perceived value
Live-streaming with goods needs to assign perceived value to goods in order to better win consumers' emotional recognition. In this paper, we analyzed the live-streaming data of the Taobao Live platform with the KNN algorithm optimized by the Gaussian kernel function, and the results for live-streaming with goods and perceived value are shown in Figure 5.
Across the seven indicators of live-streaming with goods and perceived value, the A, B, C and D ratings accounted for 38.41%, 36.73%, 34.54% and 35.4%, respectively. Overall, each indicator reflects the perceived value of consumers.

Figure 5. Correlation analysis between live streaming and perceived value
At the A level, consumers believe that live-streaming with goods can provide services that satisfy them and make them feel relaxed and content, creating a sense of trust in the live-streaming platform. At the B level, 40.89% believe that live-streaming with goods lets consumers buy the goods they need and enhances their trust in the products sold. At the C and D levels, some consumers think that live-streaming with goods does not deliver the relevant perceived value experience, which requires anchors and platforms to provide more comprehensive commodity services and commodity value in order to enhance consumers' perceived value.
2) Analysis of the correlation between live-streaming with goods and consumer repurchase
Consumer repurchase is the purpose of live-streaming with goods: providing attentive after-sales service and guarantees makes consumers willing to pay again for the products sold. In this paper, we analyzed the Taobao live-streaming data with the KNN algorithm optimized by the Gaussian kernel function, and the results for live-streaming and consumer repurchase tendency are shown in Figure 6.
The mean value of the 14 indicators evaluating consumers' tendency to repurchase is 33.46%. The maximum value is 43.65%, indicating that the limited quantity of goods offered in the live stream drives the tendency to buy and thus influences consumers' tendency to repurchase. The minimum value is 18.16%, indicating that this share of consumers follow the behavior of other viewers in the live broadcast when deciding to purchase. Also, 43.56% of consumers said that the humor of the anchor would make them choose to buy the products recommended by the anchor again. In addition, more than 40% of consumers are influenced by the personal charisma of the anchor, so the charisma conveyed by the anchor is also an important factor in consumer repurchase. Based on the correlation analysis between the live broadcast and consumer repurchase under big data analysis, we can see that the interactive and attractive factors of the live broadcast can effectively raise the probability of consumer repurchase.

Conclusion
In this paper, under big data analysis technology, the KNN algorithm optimized by the Gaussian kernel function is used as the main means of data analysis, applied to data examples from Taobao live-streaming with goods, and the evaluation indexes of two aspects are mined and analyzed. As far as the evaluation of live-streaming with goods and perceived value is concerned, the four levels A, B, C and D account for 38.41%, 36.73%, 34.54% and 35.4%, respectively, which shows that live-streaming with goods delivers both good and poor perceived value experiences. In terms of live-streaming with goods and consumer repurchase, the average value of the evaluation is 33.46%; the tendency to buy is driven by the limited quantity of goods in the live stream and also by the purchase behavior of other viewers in the live room.
Big data analysis can effectively analyze the various situations in current live-streaming with goods, and targeted data analysis can point the direction for improvement. It makes it possible to correlate live-streaming with goods with consumers' perceived value and repurchase behavior, which in turn promotes the thriving of live-streaming with goods and helps create a new e-commerce shopping ecology.

Figure 1. Basic framework of big data analysis

Figure 2. Diagram of KNN algorithm

This section proposes a new distance function for the similarity function based on the Gaussian kernel function, and the score function used is given by Eq. (14). The score of the test point is calculated for each category, and the final category of the test point is the category with the largest score.

Figure 3. Comparison of different distance functions

Figure 4. Comparison with other classification algorithms

Figure 6. Correlation analysis between live streaming and consumer repurchase

Table 1. Difference between big data analysis and data mining

Table 2. Attributes and categories of eight data sets

Table 3. Indicators of live-streaming with goods and perceived value

Table 4. Indicators of live-streaming with goods and consumer repurchase