With the rapid development of the e-commerce industry, online product reviews have become an important aid to consumer decision-making. As online orders and product reviews have grown sharply, some merchants have begun hiring consumers to make fake purchases for profit, which raises the problem of identifying fake reviews. In this paper, we propose a method that uses feature engineering to eliminate reviews posted by fake reviewers and then combines a convolutional neural network and a recurrent neural network to classify the remaining reviews from the perspective of the text. Traditional neural network models such as CNN, LSTM and BILSTM are compared with the proposed hybrid model. The model is further optimised by replacing randomly initialised word vectors with word vectors pre-trained on a product-related corpus crawled from Baidu Baike. The experimental results show that combining a convolutional neural network with a recurrent neural network extracts the global and local features of fake reviews more effectively, and the resulting model performs well. Using pre-trained word vectors improves the recognition performance of every model.

#### Keywords

- fake review identification
- deep learning
- neural networks

In recent years, the development of e-commerce platforms has shifted from initially changing how consumers shop to the current transition from ‘quantity’ to ‘quality’. China has been the world's largest online retail market for eight consecutive years since 2013. By December 2020, the number of online shoppers had reached 782 million, accounting for 79.1% of all Internet users [1]. Under the new development pattern, in which the domestic economic cycle is the mainstay and domestic and international cycles reinforce each other, online consumption has played a driving role in cultivating new market momentum and in upgrading consumption in both ‘quality’ and ‘quantity’.

In the current global COVID-19 environment, consumers increasingly buy products online instead of in physical stores. When shopping online, consumers are shifting from simply satisfying their shopping needs to finding the most desirable goods among numerous options. This reflects not only a change in online consumer behaviour but also the rapid development of e-commerce platforms. At the same time, consumers have become used to commenting on goods after purchase and sharing their shopping experiences with the public [2]. As a result, the number of product reviews grows rapidly along with the sharp rise in sales volume.

Amid this sharp rise in the number of online reviews, review quality has become mixed. Out of self-interest, merchants want their product reviews to highlight the advantages of their products in many respects, and this incentive gives rise to fake reviews. Some consumers, motivated by rewards from merchants, fabricate their experience of using a product or write exaggerated praise of it [3]. This not only misleads consumers when they make shopping decisions but also harms the development of the entire e-commerce platform; hence an algorithm to identify fake e-commerce reviews is urgently needed.

Some feature-based screening methods already exist, but their predictive performance is still unsatisfactory. Using deep learning, this paper considers both the global and local features of the text, improving accuracy while maintaining efficiency.

The concept of a fake review was first put forward by Jindal and Liu [4], and it has since been redefined by many scholars. On the one hand, fake reviews are produced by fake shopping behaviour: the entire order is manipulated to meet the merchant's sales demands, so the fake purchase naturally produces a fake review. On the other hand, consumers with genuine product needs may, after a real purchase, write exaggerated reviews driven by merchant incentives such as ‘cash back for positive reviews’. Identifying the former requires information about the reviewer, whereas the latter can be identified from the review text alone. In this paper, fake reviews are defined as reviews that provide no useful purchase reference for consumers, including both invalid reviews generated by fake buyers and false reviews generated by ordinary buyers in specific situations. The following review of related research is therefore organised around these two aspects.

Banerjee and Chua [5] studied the process of writing fake reviews in depth to analyse how fake reviews are generated. They asked volunteers to post fake reviews for a hotel and then interviewed them to study how their language and mindset changed during the writing process, so as to explore the differences between genuine and fake reviews. A supervised learning algorithm was then used to identify fake reviews based on writing style, clarity of expression, level of descriptive detail and cognitive indicators; compared with two other methods, this approach showed clear advantages. Text mining can greatly improve the accuracy of extracting information from text. Using this technique, a review text analysis model was built to mine review attributes such as the proportions of nouns, verbs and quantifiers in the whole review text [6], and the experimental results show that this model performs well. Lim et al. [7] point out that because a person who posts fake reviews has a financial relationship with the seller, he serves only the merchants who employ him and not those who do not pay. As a result, such a reviewer comments repeatedly on certain merchants or products, which is called abnormal reviewer rating behaviour. Based on this assumption, they proposed a method that identifies fake reviews from reviewers' abnormal rating behaviour, combining an initial rating bias model with a full rating bias model to account for such behaviour. Zeng used a bidirectional long short-term memory (LSTM) model to encode three local representations, combined them into a global feature representation using self-attention and attention mechanisms, and obtained the classification results with a Softmax classifier [8].

Mukherjee built an unsupervised Bayesian model (ASM) that combines nine extracted features, including those mentioned above, to detect fake reviewers [9]; the experimental results showed that this method performs well. Li et al. [10] point out two patterns of fake reviewers: multiple fake reviewers comment on the same product at the same time, and they gather rapidly within a short period to comment intensively and actively. Based on these two patterns, and on the observation that the number of fake reviewers rises sharply within a certain period, they proposed a two-mode hidden Markov model to identify fake reviewers. Building on the Markov random field model, Akoglu et al. [11] proposed a bipartite graph that reflects the relationships between reviewers and products, in which the edge weights correspond to the reviews; the nodes of the graph are classified by an unsupervised model so as to identify fake reviewers. Rayana and Akoglu [12] proposed a relational matrix model that describes the relationships among reviewers, reviews and products; identification was based on a Markov random field combined with the behavioural characteristics of fake review publishers, and the experiments showed that the model performed excellently. Lu proposed a graph model that detects fake review text and identifies fake reviewers [13]. It combines features describing both reviews and reviewers and examines the review text and the reviewer simultaneously, and the experiments show that it outperforms the baseline algorithms on every metric.

In summary, several algorithms can identify fake reviews, but their recognition procedures are complex and their accuracy is unsatisfactory. This paper improves a deep learning-based algorithm to cope with the sparsity and complexity of short texts. After eliminating reviews produced by fake shopping behaviour, it mines the semantic information of the short texts so as to identify both fake reviews generated by fake shopping behaviour and exaggerated reviews generated by genuine shopping behaviour.

This paper uses Python to crawl online reviews of a brand of mobile phone products from the Jingdong website. Because online reviews contain a large amount of duplicate data, the data are first deduplicated. After deduplication, positive and negative samples are unevenly distributed. Because imbalanced samples affect model performance, the minority class is upsampled. Finally, the data set is divided into test, validation and training sets in a ratio of 1:1:8. The details are shown in Table 1.
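The preprocessing order described above (deduplicate, upsample, then split) can be sketched in plain Python. This is a minimal illustration, not the authors' code: reviews are assumed to be `(text, label)` pairs with a hypothetical label convention of `1` for positive and `0` for negative samples, duplicates are assumed to be exact text matches, and the minority class is upsampled by random duplication.

```python
import random


def deduplicate(reviews):
    """Keep the first occurrence of each review text and drop exact repeats."""
    seen = set()
    unique = []
    for text, label in reviews:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique


def upsample_minority(reviews, seed=0):
    """Randomly duplicate minority-class reviews until both classes are equal in size.

    Assumes both classes are non-empty (label 1 = positive, 0 = negative, hypothetical).
    """
    rng = random.Random(seed)
    positives = [r for r in reviews if r[1] == 1]
    negatives = [r for r in reviews if r[1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def split_train_val_test(reviews, seed=0):
    """Shuffle, then split into training, validation and test sets in an 8:1:1 ratio."""
    rng = random.Random(seed)
    data = list(reviews)
    rng.shuffle(data)
    n_train = int(len(data) * 0.8)
    n_val = int(len(data) * 0.1)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]
```

Because this order upsamples before splitting, copies of the same minority-class review can land in both the training and test sets; upsampling only the training split after splitting avoids that leakage, at the cost of departing from the order described in the text.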

Table 1. Data set information

| Data set | Total | Positive samples | Negative samples |
|---|---|---|---|
| Original data set | 38,874 | 21,213 | 17,661 |
| Data set after deduplication | 36,674 | 20,038 | 16,636 |
| Data set after upsampling | 40,076 | 20,038 | 20,038 |

Because Chinese text has complex semantics, a series of preprocessing steps is applied after the data collection described above. The deep learning-oriented preprocessing pipeline for Chinese short texts used in this paper is shown in Figure 1.

Remove duplicates. The main task of this paper is to identify fake reviews, but the raw data contain many repeated reviews, which place an unnecessary burden on the recognition model. Therefore, the collected data are deduplicated.

Data sampling. In the reviews collected for a mobile phone brand on Jingdong, there are somewhat more positive samples than negative samples, and this uneven distribution affects the accuracy and efficiency of the model. Therefore, the negative samples are upsampled to balance the positive and negative classes.

Word segmentation. Because subsequent processing in this paper uses the word as its basic unit, the original short-text sentences must be segmented into words. The most commonly used segmentation tool is Jieba, which this paper also adopts.

Remove stop words. After segmentation, some words contribute nothing to semantic analysis and serve only a connective role in sentences, such as ‘of’ and ‘the’. Therefore, stop words are removed after segmentation using the Baidu stop-word list.
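Stop-word removal after segmentation amounts to a set-membership filter. In the sketch below, the token list stands in for what a segmenter such as Jieba (for example `jieba.lcut`) might return for one review, and `STOP_WORDS` is a tiny hypothetical set; in practice it would be read from the Baidu stop-word list file, and the file path passed to `load_stop_words` is likewise an assumption.

```python
def load_stop_words(path):
    """Read one stop word per line from a UTF-8 text file (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep tokens that are neither stop words nor pure whitespace."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hypothetical miniature stop-word set (in place of the full Baidu list).
STOP_WORDS = {"的", "了", "是", "，", "。"}

# Tokens as a segmenter might return them for "手机的屏幕很清晰。"
tokens = ["手机", "的", "屏幕", "很", "清晰", "。"]
filtered = remove_stop_words(tokens, STOP_WORDS)  # ['手机', '屏幕', '很', '清晰']
```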

A convolutional neural network is a deep multi-layer neural network composed of regularly connected layers of neurons [14], including an input layer, hidden layers (convolutional and pooling layers) and an output layer. Because convolutional networks are sparsely (locally) connected, each node in a convolutional layer is connected only to a local region of the previous layer. Through the design of the convolution kernels, the network can extract information from the text, and thanks to local connectivity it captures the semantics of adjacent words. This makes it suitable for short-text classification and therefore for fake review recognition.

A recurrent neural network is also a deep neural network model, but unlike a convolutional neural network, its hidden layer is connected to itself across time steps, so information about earlier elements is passed on step by step; that is, a recurrent neural network has ‘memory’ [14]. Because of this structure, it processes sequential information more accurately and is therefore commonly used for text and speech data.

When processing text, a recurrent neural network not only processes the current word but also passes that word's information, multiplied by a weight $w$, to the next word. After a long chain of such weight multiplications, the gradient explodes when $|w| > 1$ and vanishes when $|w| < 1$.
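A scalar worked example makes this instability concrete. Assuming a single recurrent weight $w$ and ignoring the derivative of the activation function (a simplification), backpropagating through $T$ time steps multiplies the gradient by $w$ a total of $T$ times, so its magnitude scales as $|w|^T$.

```python
def backprop_through_time(grad, w, steps):
    """Multiply an upstream gradient by the recurrent weight once per time step."""
    for _ in range(steps):
        grad *= w
    return grad


for w in (0.5, 1.0, 1.5):
    print(w, backprop_through_time(1.0, w, 20))
```

After 20 steps, $w = 0.5$ shrinks the gradient to about $9.5 \times 10^{-7}$, while $w = 1.5$ inflates it to about 3325; only $|w| = 1$ leaves it unchanged.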

Genuine product reviews are statements made by consumers with actual shopping needs who went through the normal shopping process, based on their experience of using the product and of the purchase after receiving it. Since product reviews are essentially a collection of short texts generated by buyers, the model should identify fake reviews from both the buyer and the review text. This paper therefore combines feature engineering-based identification of fake reviewers with deep learning-based identification of review text, so as to identify fake reviews across the whole shopping process. The model structure is shown in Figure 2.

The short text of an online review is made up of sentences, and each sentence consists of a number of words; that is, a sentence can be written as the word sequence $w_1, \dots, w_n$.

Fake reviews include not only reviews generated by fake shopping behaviour but also fake reviews written at the review stage after normal shopping. Therefore, before the hybrid neural network extracts the global and local features of the text, reviews posted by fake reviewers through fake shopping behaviour should first be eliminated. The following indicators are used to quantify reviewers' shopping behaviour and identify fake reviewers, thereby improving the overall recognition performance of the model.

Correlation between search keywords and the product. When consumers search using the search bar, their search keywords should match the description keywords of the product they finally buy. Reviewers who post fake reviews, however, may not search at all, or their search may be inconsistent with what they buy.

Time to complete the shopping activity. Normal shopping takes longer from keyword search to placing an order, whereas fake shopping takes relatively little time because no comparison or browsing is needed.

Number of similar items browsed in other shops. After a keyword search, normal shoppers browse and compare related products, whereas fake reviewers directly buy the product designated by the merchant, so they view relatively few similar products in other shops.

Number of other items browsed in the purchasing store. When ordinary consumers settle on a product they intend to buy, they enter its store and browse other products there, whereas fake reviewers do not browse during the purchase.

Collecting products and following stores. After settling on a product, ordinary consumers often add related products to their favourites and follow the store so that they can buy again later. Fake reviewers do not do this because they do not intend to buy the product again.

Table 2. User behaviour characteristics

| Indicator | Behaviour of fake reviewers |
|---|---|
| Correlation between search keywords and the product | Low correlation between search keywords and the final product |
| Time to complete the shopping activity | Short shopping time |
| Number of similar items browsed in other shops | Few similar items browsed |
| Number of other items browsed in the purchasing store | Few other items browsed in the store |
| Collecting products and following stores | Do not collect products or follow stores |
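The text does not say how the five behaviour indicators are measured, thresholded or combined, so the following is only a hypothetical rule-based sketch: every field name, unit and threshold in `ShoppingSession` and `THRESHOLDS` is invented for illustration, and a reviewer is flagged when at least `min_flags` indicators look like fake shopping. Reviews from flagged reviewers would then be removed before the text model sees them.

```python
from dataclasses import dataclass


@dataclass
class ShoppingSession:
    # Hypothetical fields; the paper does not specify names or units.
    keyword_relevance: float        # similarity of search keywords to purchased item, 0..1
    shopping_seconds: float         # time from first search to placing the order
    similar_items_other_shops: int  # similar products viewed in other shops
    other_items_same_store: int     # other products viewed in the purchasing store
    collected_or_followed: bool     # item collected or store followed


# Hypothetical thresholds chosen only for illustration.
THRESHOLDS = {
    "keyword_relevance": 0.3,
    "shopping_seconds": 120,
    "similar_items_other_shops": 2,
    "other_items_same_store": 1,
}


def suspicious_indicators(s):
    """Count how many of the five indicators look like fake shopping."""
    flags = [
        s.keyword_relevance < THRESHOLDS["keyword_relevance"],
        s.shopping_seconds < THRESHOLDS["shopping_seconds"],
        s.similar_items_other_shops < THRESHOLDS["similar_items_other_shops"],
        s.other_items_same_store < THRESHOLDS["other_items_same_store"],
        not s.collected_or_followed,
    ]
    return sum(flags)


def is_fake_reviewer(s, min_flags=4):
    """Flag a reviewer when at least `min_flags` indicators are suspicious."""
    return suspicious_indicators(s) >= min_flags
```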

The hybrid neural network model uses a convolutional network to represent the local features of the text. This is done with a convolution kernel $W \in \mathbb{R}^{t \times d}$, whose width $t$ is the number of consecutive words it covers and whose other dimension $d$ equals the dimension of the word vectors.

Each convolution kernel yields a local feature $c_i$ at every window position; max pooling then keeps the largest of these, $c_{\max} = \max_i c_i$, and the optimal features of all kernels are concatenated.
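The convolution and max-pooling step can be sketched in plain Python, assuming the common TextCNN form $c_i = f\big(\sum (W \odot X_{i:i+t-1}) + b\big)$ with ReLU as $f$, no padding and a bias of zero; these choices are assumptions, since the paper's exact convolution formula is not shown here. `sentence` is an $n \times d$ list of word vectors and each kernel is a $t \times d$ list.

```python
def relu(x):
    return max(0.0, x)


def conv1d_feature_map(sentence, kernel, bias=0.0):
    """Slide a t x d kernel over an n x d sentence (no padding); return c_1..c_{n-t+1}.

    Assumes n >= t, otherwise the feature map is empty.
    """
    t = len(kernel)
    feature_map = []
    for i in range(len(sentence) - t + 1):
        window = sentence[i:i + t]
        s = sum(w * x for k_row, x_row in zip(kernel, window) for w, x in zip(k_row, x_row))
        feature_map.append(relu(s + bias))
    return feature_map


def local_features(sentence, kernels):
    """Max-pool each kernel's feature map and concatenate the pooled values."""
    return [max(conv1d_feature_map(sentence, k)) for k in kernels]


sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]  # n = 4 words, d = 2
kernels = [[[1, 0], [0, 1]],                 # t = 2
           [[-1, 0], [0, 0], [0, -1]]]       # t = 3
print(local_features(sentence, kernels))     # [2.0, 0.0]
```

With 100 kernels per layer, as in the experiments, the pooled output would simply be a 100-element list per kernel size.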

The global features of the model are extracted by the recurrent neural network. Because a plain recurrent neural network is prone to gradient explosion or vanishing, as noted above, and therefore retains only short-term memory, this paper adopts LSTM, which avoids these problems by adding a memory cell and gating units. A bidirectional LSTM (BI-LSTM) reads the text in both the forward and backward directions, which further improves the model. The output states are $\overrightarrow{h}_i = \mathrm{LSTM}(x_i, \overrightarrow{h}_{i-1})$ for the forward pass and $\overleftarrow{h}_i = \mathrm{LSTM}(x_i, \overleftarrow{h}_{i+1})$ for the backward pass, where $\overrightarrow{h}_{i-1}$ is the output of the previous unit and $x_i$ is the word vector at step $i$; the two are concatenated as $h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i]$.
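The bidirectional pass can be sketched with a minimal LSTM cell on plain Python lists. The gate equations are the standard LSTM ones (input, forget and output gates plus a tanh candidate), which the extracted text does not spell out; the weights are small random values rather than trained parameters, and the hidden size and initialisation range are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Minimal LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate; untrained random weights for illustration.
        self.params = {
            gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in "ifog"
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [
            sum(w * xj for w, xj in zip(W[k], x)) + sum(u * hj for u, hj in zip(U[k], h)) + b[k]
            for k in range(self.hidden_dim)
        ]

    def step(self, x, h_prev, c_prev):
        i = [sigmoid(z) for z in self._affine("i", x, h_prev)]
        f = [sigmoid(z) for z in self._affine("f", x, h_prev)]
        o = [sigmoid(z) for z in self._affine("o", x, h_prev)]
        g = [math.tanh(z) for z in self._affine("g", x, h_prev)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def run_lstm(cell, xs):
    """Run a cell over a sequence of word vectors; return the hidden state at each step."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bilstm(forward_cell, backward_cell, xs):
    """Concatenate forward states with backward states re-aligned to the word order."""
    fwd = run_lstm(forward_cell, xs)
    bwd = run_lstm(backward_cell, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```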

The next step is to extract the global features of the sentence from the hidden layer. Since the forward and backward output sequences obtained in the previous step both contain global features, different extraction methods can be chosen; their relative merits are judged below by comparing recognition accuracy. Specifically, the global features of the hidden layer are extracted in the following three ways:

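The first approach takes the final state of the hidden-state sequence, $g = h_n$; the second takes the average of all hidden states, $g = \frac{1}{n}\sum_{i=1}^{n} h_i$; the third uses an attention mechanism to form a weighted sum $g = \sum_{i=1}^{n} \alpha_i h_i$, where the attention weights $\alpha_i$ sum to 1.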
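The three extraction strategies can be sketched as functions over the list of hidden states. The attention variant scores each state by a dot product with a `query` vector that stands in for a learned attention parameter; this scoring function is an assumption, since the paper's attention formula is not given here.

```python
import math


def final_state(states):
    """Global feature = hidden state at the last time step."""
    return list(states[-1])


def average_state(states):
    """Global feature = element-wise mean of all hidden states."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


def attention_state(states, query):
    """Global feature = softmax-weighted sum of hidden states (dot-product scores assumed)."""
    scores = [sum(q * h for q, h in zip(query, s)) for s in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * s[k] for w, s in zip(weights, states)) for k in range(len(states[0]))]
```

With a zero query every state receives weight $1/n$, so the attention result coincides with the average state; the average is also the method the text later reports as performing best.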

The global feature vector obtained in the previous step is then concatenated with the local feature vector.

Next, a fully connected layer fuses them, producing a vector representation that contains both global and local features.

Finally, a Softmax classifier classifies this vector to achieve the ultimate goal of review recognition.
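The fusion and classification steps can be sketched as concatenation, a fully connected layer and Softmax. The ReLU activation in the fusion layer, the weight shapes and the class order are assumptions for illustration; Dropout, which is active only during training, is omitted.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: output k is sum_j weights[k][j] * x[j] + bias[k]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable Softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(local_feats, global_feats, fc_w, fc_b, out_w, out_b):
    """Concatenate local and global features, fuse them with a fully connected
    layer (ReLU assumed), and return Softmax class probabilities."""
    fused = local_feats + global_feats  # list concatenation [local; global]
    hidden = [max(0.0, v) for v in dense(fused, fc_w, fc_b)]
    return softmax(dense(hidden, out_w, out_b))
```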

An activation function introduces non-linearity into a model whose individual layers are otherwise linear. Because biological neurons fire only when a certain threshold is reached, the ReLU function is more consistent with the behaviour of biological neurons [16]. Its expression, formula (12), is $f(x) = \max(0, x)$, and its graph is shown in Figure 3.

A Dropout layer is placed between the fully connected layer and the Softmax layer to avoid overfitting. During training, neurons are randomly discarded with a certain probability, which reduces the number of parameters trained at each step and thus helps prevent overfitting.
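Because the paper's propagation formula is not reproduced here, the sketch below uses inverted dropout, a common variant and the same scaling applied by TensorFlow's `tf.nn.dropout`: each unit is zeroed with probability `rate` during training and the survivors are scaled by $1/(1-\text{rate})$, so nothing needs rescaling at inference.

```python
import random


def dropout(values, rate, training, rng=None):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale survivors by 1 / (1 - rate); at inference, return the input unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```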

Mini-batch gradient descent is used to shorten training time and improve the robustness of the model: at each iteration, a small batch of data is selected to compute the gradient and update the parameters.
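Mini-batch gradient descent can be illustrated on a toy linear regression, $y = wx + b$ with squared error, rather than on the full hybrid network. The batch size, learning rate and number of epochs are arbitrary; each epoch reshuffles the data and updates the parameters once per mini-batch.

```python
import random


def minibatch_sgd(data, lr=0.1, batch_size=4, epochs=500, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # a fresh random order of mini-batches each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of (1/m) * sum (w*x + b - y)^2 with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b


points = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]  # noiseless y = 2x + 1
w, b = minibatch_sgd(points, epochs=1000)
```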

In testing all models in this paper, as shown in Figure 4, the number of fake reviews identified as fake is A, the number of fake reviews identified as genuine is B, the number of genuine reviews identified as fake is C, and the number of genuine reviews identified as genuine is D.

Specific evaluation indicators are as follows:

precision: the proportion of all samples that are classified correctly (i.e. overall accuracy), $(A + D)/(A + B + C + D)$;

precision_fake: the proportion of reviews predicted to be fake that are actually fake, $A/(A + C)$;

recall_fake: the proportion of fake reviews that are predicted to be fake, $A/(A + B)$;

F-score_fake: combines the precision and recall for fake reviews to evaluate how well fake reviews are identified, $F = 2PR/(P + R)$, where $P$ and $R$ are precision_fake and recall_fake;

precision_true: the proportion of reviews predicted to be genuine that are actually genuine, $D/(B + D)$;

recall_true: the proportion of genuine reviews that are predicted to be genuine, $D/(C + D)$;

F-score_true: combines the precision and recall for genuine reviews to evaluate how well genuine reviews are identified;
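The indicators follow directly from the four counts A, B, C and D. The sketch below computes them, taking the F-score as the harmonic mean of precision and recall (the standard F1, assumed here).

```python
def evaluate(A, B, C, D):
    """Metrics from the confusion counts: A = fake predicted fake, B = fake predicted
    true, C = true predicted fake, D = true predicted true."""
    def f1(p, r):
        return 2 * p * r / (p + r)

    precision_fake = A / (A + C)
    recall_fake = A / (A + B)
    precision_true = D / (B + D)
    recall_true = D / (C + D)
    return {
        "precision": (A + D) / (A + B + C + D),  # overall share classified correctly
        "precision_fake": precision_fake,
        "recall_fake": recall_fake,
        "F-score_fake": f1(precision_fake, recall_fake),
        "precision_true": precision_true,
        "recall_true": recall_true,
        "F-score_true": f1(precision_true, recall_true),
    }


metrics = evaluate(A=90, B=10, C=20, D=80)
```

As a consistency check, the harmonic mean of precision_fake 0.882 and recall_fake 0.901 in the first column of the first results table is about 0.891, matching the fake-review F-score reported there.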

Because many classification models could serve as a fake review recognition model, the above indicators are used to measure each model's accuracy and its suitability for short-text classification. Since this paper combines a convolutional neural network and a recurrent neural network, the proposed model must be compared with the CNN and RNN models separately. The CNN model uses convolutional layers, pooling layers and a Softmax classifier to recognise the local features of text and perform classification. The RNN models include LSTM and BI-LSTM, which are compared with each other to verify the improvement brought by the bidirectional LSTM. In addition, there are three ways to extract the vector of global features from the RNN hidden states: the final state, the average state and an attention mechanism. To compare these extraction methods, the BI-LSTM model uses the final state and the average state to extract global features, denoted BI-LSTM-e and BI-LSTM-av, respectively. Similarly, the hybrid neural network models using the three extraction methods are denoted hybrid-e, hybrid-av and hybrid-at, respectively. Finally, the hybrid model with the feature engineering step is compared with the hybrid model that only combines the convolutional and recurrent networks, with both using the same global feature extraction method in their RNN parts. This verifies whether first eliminating the reviews of fake reviewers identified through feature engineering, and then classifying the global and local features of the short texts with the hybrid neural network, yields better recognition.

In this paper, the models are implemented in Python with TensorFlow, a framework widely used in deep learning, so that they can be compared side by side. The hidden-state dimension of the recurrent neural network is set to 64, at which the recurrent model performs best. To make the performance comparison among models more meaningful, the number of convolution kernels in each layer is set to 100. During training, the accuracy of the model rises as its loss falls in the early stage, and both stabilise in the later stage. Therefore, the model's performance is tested on randomly selected data every 200 training iterations, so that training can stop as soon as performance no longer improves; this improves efficiency, reduces running time and prevents overfitting. The recognition performance of the final hybrid model and the side-by-side comparison of all models are shown in Table 3.
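The periodic evaluation described above amounts to early stopping. The sketch is framework-agnostic: `train_step` and `evaluate` are hypothetical callables (in TensorFlow they would wrap one optimisation step and a validation pass), and the `patience` setting is an assumption, since the text only states that performance is checked every 200 iterations.

```python
def train_with_periodic_eval(train_step, evaluate, max_steps, eval_every=200, patience=5):
    """Run `train_step()` repeatedly; every `eval_every` steps call `evaluate()` and
    stop once the score has not improved for `patience` consecutive evaluations.
    Returns (best_score, step_of_best_score, steps_run)."""
    best_score, best_step, bad_evals = float("-inf"), 0, 0
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0:
            score = evaluate()
            if score > best_score:
                best_score, best_step, bad_evals = score, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    return best_score, best_step, step
    return best_score, best_step, max_steps
```

Keras offers comparable behaviour through `tf.keras.callbacks.EarlyStopping`, although by default it checks at the end of each epoch rather than every 200 steps.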

Table 3. Deep learning model results

| Metric | CNN | LSTM | BI-LSTM-e | BI-LSTM-av | hybrid-e | hybrid-av | hybrid-at | Feature engineering + hybrid |
|---|---|---|---|---|---|---|---|---|
| precision_fake | 0.882 | 0.878 | 0.880 | 0.798 | 0.892 | 0.893 | 0.887 | 0.894 |
| recall_fake | 0.901 | 0.894 | 0.902 | 0.832 | 0.912 | 0.912 | 0.909 | 0.911 |
| F-score_fake | 0.891 | 0.886 | 0.891 | 0.815 | 0.902 | 0.902 | 0.898 | 0.902 |
| precision_true | 0.897 | 0.892 | 0.899 | 0.829 | 0.915 | 0.916 | 0.914 | 0.917 |
| recall_true | 0.879 | 0.881 | 0.878 | 0.802 | 0.897 | 0.899 | 0.891 | 0.903 |
| F-score_true | 0.888 | 0.887 | 0.888 | 0.815 | 0.906 | 0.907 | 0.902 | 0.910 |
| precision | 0.893 | 0.890 | 0.892 | 0.824 | 0.902 | 0.905 | 0.898 | 0.906 |

LSTM, long short-term memory.

A comprehensive analysis of the above experimental results follows.

Among all the models compared, the best recognition performance is achieved by the method that first uses feature engineering to eliminate the reviews of fake reviewers and then combines a convolutional neural network and a recurrent neural network to extract global and local features from the text itself. Its overall recognition accuracy reaches 90.6%, better not only than the single neural networks but also than the hybrid neural network without feature engineering. This indicates that eliminating fake reviewers' reviews through feature engineering helps improve recognition performance.

Comparing the two single neural network models, the convolutional neural network recognises reviews better than the recurrent neural network. This is because the convolutional network recognises text through local features, whereas the recurrent network extracts global features. Online reviews are short texts, and their global features mainly reflect the overall sentiment of a review, which contributes little to identifying fake reviews. In contrast, the local features of online product reviews are relatively distinct, since reviews often evaluate specific product attributes; extracting these attributes is therefore more useful for distinguishing genuine from fake reviews.

For the recurrent neural network, the three methods of extracting global features make little difference to the final recognition performance. In particular, the three hybrid models all achieve high accuracy, and the differences among them are small. The reason is that product reviews are short, so taking the average or the final state yields similar global features. Relatively speaking, however, averaging performs best, which is why the hybrid model with feature engineering also uses average extraction for the global features in its recurrent part.

Comparing the LSTM and bidirectional LSTM models, BI-LSTM-e has the best recognition performance, similar to that of the LSTM model, whereas the BI-LSTM model that averages hidden states to extract global features performs poorly. This indicates that although the bidirectional LSTM model mitigates problems such as gradient explosion, the benefit of reading in both directions is weakened by the short length of product reviews, so its advantage in recognition is less significant.

In the above experiments, the word vectors of the input layer were randomly initialised. When the data set is not large enough, however, this leads to long training times and to word vectors whose semantics are insufficiently accurate after training. The initial word vectors should therefore be obtained by pre-training on a more accurate corpus.

This paper uses Python to crawl a corpus related to mobile phone products from Baidu Baike and uses it as the training set for pre-trained word vectors in order to optimise the model. Training on a large product-related corpus captures the deeper meanings of words and the relationships between word vectors.

Word2vec is a deep learning-based tool for generating word vectors with two model architectures: CBOW and skip-gram. The two have mirror-image structures: CBOW predicts the centre word from its surrounding context, whereas skip-gram takes a word as input and predicts its context. Computing, for each prediction, the probability that every word in the vocabulary appears in the context of the target word is expensive, so the algorithm is optimised with hierarchical Softmax and negative sampling.
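The difference between the two architectures shows up in how training examples are built from a sentence. In this sketch (the window size and tokens are illustrative), skip-gram yields one (centre, context) pair per neighbour, while CBOW yields one (context list, centre) example per position.

```python
def _window(tokens, i, window):
    """Indices of the neighbours of position i within `window` positions."""
    return [j for j in range(max(0, i - window), min(len(tokens), i + window + 1)) if j != i]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: the centre word predicts each context word separately."""
    return [(tokens[i], tokens[j]) for i in range(len(tokens)) for j in _window(tokens, i, window)]


def cbow_examples(tokens, window=2):
    """CBOW: the list of context words predicts the centre word."""
    return [([tokens[j] for j in _window(tokens, i, window)], tokens[i]) for i in range(len(tokens))]


tokens = ["screen", "clear", "battery", "durable"]
pairs = skipgram_pairs(tokens, window=1)
examples = cbow_examples(tokens, window=1)
```

Libraries expose the same choice as a flag; gensim's `Word2Vec`, for example, selects skip-gram with `sg=1` and switches between hierarchical Softmax and negative sampling through its `hs` and `negative` parameters.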

Hierarchical Softmax differs from standard Softmax in that it uses a Huffman tree and expresses the probability as a product of conditional probabilities, which makes it easy to compute. The Softmax layer is replaced by a sequence of binary decisions, and binary logistic regression is used to fit each conditional probability along the path to the target word.

The algorithm replaces the output layer of the neural network with a Huffman tree: the leaves correspond to the words in the vocabulary, the internal nodes play the role of the output-layer neurons, and the root receives the projected word vector. By splitting the vocabulary D layer by layer until only the target word remains, it reduces the number of computations per prediction from the size of the vocabulary to roughly its logarithm, which reduces the computation and running time of the whole model and improves its efficiency.
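A small sketch shows why hierarchical Softmax stays normalised while costing only about $\log_2 V$ binary decisions per word. It builds a Huffman tree from word frequencies with `heapq`, records each word's root-to-leaf path, and multiplies one logistic decision per internal node, using $\sigma(\theta_n^\top h)$ for the left branch and $1 - \sigma(\theta_n^\top h)$ for the right. The vocabulary, frequencies, random node vectors `theta` and the vector `h` (standing in for the projected word vector) are all hypothetical.

```python
import heapq
import itertools
import math
import random


def build_huffman_paths(freqs):
    """Return ({word: [(internal_node_id, bit), ...]}, number_of_internal_nodes).

    Each path lists the internal nodes from the root to the word's leaf and the
    branch taken at each (0 = left, 1 = right).
    """
    counter = itertools.count()  # tie-breaker so heapq never compares subtrees
    heap = [(f, next(counter), word) for word, f in freqs.items()]
    heapq.heapify(heap)
    children = {}  # internal node -> (left subtree, right subtree)
    next_id = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        node = ("node", next_id)
        children[node] = (left, right)
        next_id += 1
        heapq.heappush(heap, (f1 + f2, next(counter), node))
    paths = {}
    stack = [(heap[0][2], [])]
    while stack:
        subtree, path = stack.pop()
        if subtree in children:
            left, right = children[subtree]
            stack.append((left, path + [(subtree[1], 0)]))
            stack.append((right, path + [(subtree[1], 1)]))
        else:
            paths[subtree] = path
    return paths, next_id


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hs_probability(word, h, paths, theta):
    """p(word | h) as a product of binary logistic decisions along the Huffman path."""
    p = 1.0
    for node_id, bit in paths[word]:
        s = sigmoid(sum(t * x for t, x in zip(theta[node_id], h)))
        p *= s if bit == 0 else 1.0 - s
    return p


freqs = {"phone": 50, "screen": 30, "battery": 15, "fake": 5}  # hypothetical counts
paths, n_internal = build_huffman_paths(freqs)
rng = random.Random(0)
theta = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_internal)]
h = [0.2, -0.5, 0.9]
print(sum(hs_probability(w, h, paths, theta) for w in freqs))  # close to 1.0
print(len(paths["phone"]), len(paths["fake"]))  # 1 3
```

Frequent words get short paths, so the average cost per prediction is lower still than the worst case.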

Negative sampling labels words unrelated to the centre word as negative samples, so that only the weights of a small number of randomly selected negative samples are updated during training, instead of adjusting all output parameters after every training step. This greatly reduces computation and saves training time.

Equations (26) to (29) give the logistic regression representation, the likelihood function, the negative sampling probability and the optimisation objective for a training sample.
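Since equations (26) to (29) are not reproduced in this text, the sketch below follows the standard word2vec formulation of negative sampling, which is an assumption about the paper's exact form: negatives are drawn from unigram counts raised to the 3/4 power, and the loss for one (centre, context) pair is $-\log\sigma(u_o^\top v_c) - \sum_k \log\sigma(-u_k^\top v_c)$. Excluding the true context word from the negative draws is a simplification, and the word counts are hypothetical.

```python
import math
import random


def log_sigmoid(z):
    """Numerically stable log(sigma(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def noise_distribution(counts, power=0.75):
    """Unigram distribution raised to `power` and renormalised."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def sample_negatives(dist, k, exclude, rng):
    """Draw k negative words from the noise distribution, never returning `exclude`."""
    words = [w for w in dist if w != exclude]
    return rng.choices(words, weights=[dist[w] for w in words], k=k)


def negative_sampling_loss(v_centre, u_context, u_negatives):
    """-[log sigma(u_o . v_c) + sum_k log sigma(-u_k . v_c)] for one training pair."""
    loss = -log_sigmoid(dot(u_context, v_centre))
    for u_neg in u_negatives:
        loss -= log_sigmoid(-dot(u_neg, v_centre))
    return loss


rng = random.Random(0)
dist = noise_distribution({"phone": 120, "screen": 40, "battery": 30, "fake": 5})
negatives = sample_negatives(dist, k=3, exclude="screen", rng=rng)
```

Only the context vector and the $k$ sampled negative vectors receive gradient updates for each pair, which is where the saving over a full Softmax comes from.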

As Table 4 shows, after replacing randomly initialised word vectors with pre-trained word vectors, every recognition model achieves better accuracy and classification performance. The best model is still the hybrid neural network combined with feature engineering, with an accuracy of 91.5%, an increase of 0.9 percentage points. These results show that pre-trained word vectors affect model performance: the choice of initial word vectors strongly affects the running time and efficiency of the model, and pre-training also mines the semantic features of the text more comprehensively, improving the accuracy of fake review recognition.

Table 4. Results of the deep learning models with pre-trained word vectors

| Metric | CNN | LSTM | BI-LSTM-e | BI-LSTM-av | hybrid-e | hybrid-av | hybrid-at | Feature engineering + hybrid |
|---|---|---|---|---|---|---|---|---|
| precision_fake | 0.891 | 0.887 | 0.892 | 0.864 | 0.905 | 0.902 | 0.895 | 0.902 |
| recall_fake | 0.905 | 0.907 | 0.909 | 0.850 | 0.923 | 0.920 | 0.912 | 0.921 |
| F-score_fake | 0.898 | 0.897 | 0.900 | 0.857 | 0.914 | 0.911 | 0.903 | 0.898 |
| precision_true | 0.902 | 0.905 | 0.908 | 0.853 | 0.921 | 0.919 | 0.919 | 0.923 |
| recall_true | 0.892 | 0.892 | 0.894 | 0.862 | 0.902 | 0.903 | 0.899 | 0.905 |
| F-score_true | 0.897 | 0.898 | 0.901 | 0.857 | 0.911 | 0.911 | 0.909 | 0.897 |
| precision | 0.899 | 0.898 | 0.901 | 0.859 | 0.912 | 0.913 | 0.908 | 0.915 |

LSTM, long short-term memory.

This paper proposes a hybrid neural network model combined with feature engineering. It is compared with a convolutional neural network model, an LSTM model, bidirectional LSTM models and hybrid convolutional-recurrent models that use three different methods to extract global features, with recognition effectiveness judged by the evaluation indicators. The randomly initialised word vectors in these models are then replaced with pre-trained word vectors. The experimental results show that combining convolutional and recurrent neural networks better extracts the local and global features of text, and adding feature engineering eliminates some reviews from fake reviewers and further improves recognition. With pre-trained instead of randomly initialised word vectors, overall performance improves further, and the recognition accuracy of the proposed hybrid model reaches 91.5%. The proposed method can be applied on e-commerce platforms to identify fake reviews posted both by fake reviewers and by ordinary buyers driven by incentives, helping to curb fake online reviews to a certain extent and to create a healthy shopping environment.

#### Data set information

The original data set | 38,874 | 21,213 | 17,661 |

The data set after de-duplication | 36,674 | 20,038 | 16,636 |

Sampled data set | 40,076 | 20,038 | 20,038 |
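The row counts are consistent with de-duplicating the data and then randomly oversampling the smaller class up to the size of the larger one (2 × 20,038 = 40,076). A minimal sketch of that procedure, assuming review text is the de-duplication key and that oversampling draws with replacement; the paper's exact sampling method is not stated here, and the labels `class_a` and `class_b` are placeholders because the table does not say which column holds fake reviews.

```python
import random
from collections import Counter


def deduplicate(reviews):
    """Keep the first occurrence of each review text (assumed de-duplication key)."""
    seen, unique = set(), []
    for text, label in reviews:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique


def oversample(reviews, seed=0):
    """Repeat randomly chosen reviews of smaller classes (with replacement) until all classes match the largest."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in reviews:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Synthetic stand-in using the de-duplicated class sizes from the table.
data = [(f"a{i}", "class_a") for i in range(20_038)] + [(f"b{i}", "class_b") for i in range(16_636)]
data += data[:100]  # inject exact duplicates
unique = deduplicate(data)
balanced = oversample(unique)
counts = Counter(label for _, label in balanced)
print(len(unique), len(balanced))  # 36674 40076
```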

#### Deep learning model results with randomly initialised word vectors

precision_fake | 0.882 | 0.878 | 0.880 | 0.798 | 0.892 | 0.893 | 0.887 | 0.894 |

recall_fake | 0.901 | 0.894 | 0.902 | 0.832 | 0.912 | 0.912 | 0.909 | 0.911 |

identify_fake | 0.891 | 0.886 | 0.891 | 0.815 | 0.902 | 0.902 | 0.898 | 0.902 |

precision_true | 0.897 | 0.892 | 0.899 | 0.829 | 0.915 | 0.916 | 0.914 | 0.917 |

recall_true | 0.879 | 0.881 | 0.878 | 0.802 | 0.897 | 0.899 | 0.891 | 0.903 |

identify_true | 0.888 | 0.887 | 0.888 | 0.815 | 0.906 | 0.907 | 0.902 | 0.910 |

accuracy | 0.893 | 0.890 | 0.892 | 0.824 | 0.902 | 0.905 | 0.898 | 0.906 |
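Comparing the bottom row of this table with the bottom row of the pre-trained word-vector table (the overall score that the text reports as accuracy) column by column checks the claim that every model improves. A small sketch with a hypothetical helper; the values are copied from the two tables.

```python
# Bottom rows of the two result tables, one value per model column.
random_init = [0.893, 0.890, 0.892, 0.824, 0.902, 0.905, 0.898, 0.906]
pretrained = [0.899, 0.898, 0.901, 0.859, 0.912, 0.913, 0.908, 0.915]


def improvements_in_points(before, after):
    """Per-column change in percentage points (1 point = 0.01 on a 0-1 scale)."""
    return [round((a - b) * 100, 1) for b, a in zip(before, after)]


gains = improvements_in_points(random_init, pretrained)
print(gains)  # [0.6, 0.8, 0.9, 3.5, 1.0, 0.8, 1.0, 0.9]
print(all(g > 0 for g in gains))  # True
```

Every column improves, and the last column's gain of 0.9 points matches the increase quoted for the best model.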

#### User behaviour characteristics

Correlation between the search terms and the purchased product | Low correlation between search keywords and the final product |

Time taken to complete the purchase | Short browsing time |

Browsing similar items in other shops | Browse few items of the same kind |

Number of other items browsed in the purchasing store | Browse few other items in the store |

Adding items to favourites and following stores | Neither favourite the item nor follow the store |
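The behaviour patterns above can serve as a rule-based pre-filter that removes reviews from suspected hired reviewers before text classification. A minimal sketch: the record fields, the cut-off values and the rule of requiring at least four of the five signals are all hypothetical, since the section does not give the paper's feature definitions or thresholds.

```python
from dataclasses import dataclass


@dataclass
class BrowsingSession:
    """Hypothetical per-order behaviour record; field names are illustrative, not the paper's schema."""
    reviewer_id: str
    search_product_similarity: float  # 0-1 relevance between search keywords and the purchased item
    browse_seconds: float             # time from entering the store to completing the purchase
    similar_items_viewed: int         # comparable items browsed in other shops
    store_items_viewed: int           # other items browsed in the purchasing store
    favourited_or_followed: bool      # item added to favourites or store followed


# Hypothetical cut-offs; not taken from the paper.
MIN_SIMILARITY = 0.3
MIN_BROWSE_SECONDS = 60
MIN_SIMILAR_ITEMS = 2
MIN_STORE_ITEMS = 2


def suspicious_signals(s):
    """Count how many of the five behaviour characteristics match the fake-reviewer pattern."""
    return sum([
        s.search_product_similarity < MIN_SIMILARITY,
        s.browse_seconds < MIN_BROWSE_SECONDS,
        s.similar_items_viewed < MIN_SIMILAR_ITEMS,
        s.store_items_viewed < MIN_STORE_ITEMS,
        not s.favourited_or_followed,
    ])


def drop_suspected_reviewers(reviews, sessions, min_signals=4):
    """Remove reviews whose author matches at least `min_signals` of the five patterns."""
    suspected = {s.reviewer_id for s in sessions if suspicious_signals(s) >= min_signals}
    return [r for r in reviews if r["reviewer_id"] not in suspected]


sessions = [
    BrowsingSession("u1", 0.1, 20, 0, 0, False),   # matches all five patterns
    BrowsingSession("u2", 0.8, 600, 5, 3, True),   # matches none
]
reviews = [{"reviewer_id": "u1", "text": "..."}, {"reviewer_id": "u2", "text": "..."}]
kept = drop_suspected_reviewers(reviews, sessions)
print([r["reviewer_id"] for r in kept])  # ['u2']
```

Requiring several signals rather than one keeps genuine buyers who, for example, simply purchase quickly.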
