Along with the popularisation of the Internet, e-commerce has gradually integrated into people's lives and become an integral part of them. With increasingly fierce competition among major e-commerce companies, building an excellent recommendation system has become essential. Amazon, a giant of the overseas e-commerce industry, took the lead in using a recommendation system to serve users, which not only brought huge economic benefits in a short period of time but also cultivated many loyal users. Amazon's success triggered a new wave of recommendation-system applications in the e-commerce field. Overseas agricultural e-commerce companies, such as LocalHarvest and Hello Fresh, have also deployed their own recommendation systems and achieved considerable economic growth. Although research on recommendation systems in China started relatively late, several representative e-commerce platforms, such as Taobao, JD.COM and Alibaba, have deployed them. Recently, driven by the concept of 'precise poverty alleviation', agricultural-product e-commerce has gradually entered the public eye as a new source of economic growth. E-commerce giants for agricultural products, such as JD.COM Fresh, Fruit Day, No. 1 Fresh and HQW.COM, have also emerged in China. All these platforms provide users with a good shopping experience by constructing recommendation models suited to their own business characteristics, and they have driven a wave of online sales of agricultural products. This shows the importance of recommendation systems in the field of agricultural-product e-commerce: a good recommendation system can not only improve user loyalty but also create substantial economic growth.
Collaborative topic regression (CTR) was the first model to combine item content information with the rating matrix for joint training, using LDA and probabilistic matrix factorisation (PMF), and it achieved good results. With the development of deep learning, Wang et al. proposed the collaborative deep learning (CDL) model. To address the problem that CTR's effectiveness degrades sharply when the text information is sparse, CDL uses a Stacked Denoising Auto-Encoder (SDAE) to better extract the hidden features of item content information, and these hidden features are used to constrain the item latent features V decomposed by PMF, which alleviates the cold-start problem and further improves the recommendation effect. The model is shown in Figure 1.
Collaborative deep learning model.
As shown in the figure, the right-hand side is the Bayesian SDAE proposed by the author. W+ represents the set of weight matrices and bias vectors of each layer in the SDAE, and L represents the number of layers. X0 is a set of vectors representing the content information of N items; after noise is added, it becomes X1. The role of the SDAE is to reconstruct the clean input X0 from the corrupted input X1, i.e. to make the output of its final layer approximate X0.
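The denoising idea at the heart of the SDAE can be illustrated with a minimal single-hidden-layer autoencoder in numpy. This is a toy sketch, not the Bayesian SDAE of CDL: the dimensions, noise rate and training loop are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "bag-of-words" item vectors: N items, vocabulary of size D.
N, D, H = 32, 50, 8                              # H = hidden (latent) size, assumed
X0 = (rng.random((N, D)) < 0.2).astype(float)    # clean input
mask = rng.random((N, D)) > 0.3                  # masking noise
X1 = X0 * mask                                   # corrupted input

# One-hidden-layer denoising autoencoder (a two-layer slice of an SDAE).
W1 = rng.normal(0, 0.1, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, D)); b2 = np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    h = sigmoid(X1 @ W1 + b1)          # encode the corrupted input
    Xr = sigmoid(h @ W2 + b2)          # decode: reconstruction of X0
    # Gradients of the squared reconstruction error against the CLEAN input.
    dXr = (Xr - X0) * Xr * (1 - Xr)
    dh = (dXr @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ dXr / N; b2 -= lr * dXr.mean(0)
    W1 -= lr * X1.T @ dh / N; b1 -= lr * dh.mean(0)

h = sigmoid(X1 @ W1 + b1)              # latent features, analogous to the SDAE's middle layer
recon_err = np.mean((sigmoid(h @ W2 + b2) - X0) ** 2)
```

The hidden activations h play the role of the extracted item content features; a real SDAE stacks several such layers and trains them jointly.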
The SDAE in CDL uses a bag-of-words representation for text vectors. Although it can automatically extract the latent features of the text, this method considers only word frequencies, ignores the context of words, and cannot adequately represent the description information of items. To solve this problem, ConvMF uses a CNN to extract latent features from the item description text. Its model is shown in Figure 2.
Convolutional matrix factorisation model.
As shown in the figure, ConvMF combines PMF with a CNN: the CNN extracts latent features from the item description text, and these features constrain the item latent features V decomposed by PMF, playing the same role as the SDAE features in CDL.
According to the actual business logic of the crawled e-commerce websites and the types of data to be crawled, a custom web crawler was written to complete the data collection. The data come from large agricultural-product e-commerce websites. Generally speaking, the crawler can be divided into three parts: web access, web-page parsing and data storage. The overall structure is shown in Figure 3.
Crawler structure diagram.
To better complete the data-crawling task, the Web Collector framework is used to build the crawler. Web Collector is a crawler integration framework whose kernel is written in Java. Because it requires no configuration and provides a portable API, a crawler that meets the task requirements can be defined with a small amount of code. Starting from the initial queue of URLs to be crawled, the user-defined crawler is executed, pages are parsed from the stored seed URLs, and new URLs to be crawled are generated.
An HTML document is a text document written to a unified standard. Tag pairs represent the various pieces of information in a web page and how they are displayed, and browsers render pages by parsing these tag pairs. In general, the positions of the various tag pairs in an HTML document are fixed, so the document can be read and filtered line by line. By writing corresponding filtering rules (such as regular expressions), these features of HTML can be exploited to extract data.
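As a minimal sketch of this tag-based extraction using only the Python standard library (the tag name, class attribute and sample markup are assumptions, not the actual structure of any crawled site):

```python
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    """Collects the text of every <span class="review"> tag."""
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "review") in attrs:
            self.in_review = True

    def handle_data(self, data):
        if self.in_review:
            self.reviews.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_review = False

page = '<div><span class="review">Fresh and sweet</span><span class="price">3.50</span></div>'
parser = ReviewExtractor()
parser.feed(page)
# parser.reviews == ['Fresh and sweet']
```

In practice the same filtering could equally be done with regular expressions over the raw HTML lines, as the text above suggests; a parser-based rule is simply more robust to formatting changes.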
After the data have been parsed from the web page, the next step is to store them. Generally, one can choose either a relational database, represented by MySQL, or a non-relational database, represented by MongoDB. The advantage of a non-relational database is that it can store entire records directly without considering field schemas. Storing data in MySQL requires defining the fields and data types in advance, which takes more time and storage space, but it makes extracting or further processing the data more convenient, thereby reducing some of the later preprocessing work.
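A lightweight sketch of the relational option, using SQLite as a stand-in for MySQL and mirroring the fields of the crawled records (the column types and sample row are assumptions):

```python
import sqlite3

# Schema mirroring the crawled fields (UserID, ItemID, Rating, Review,
# ItemText, Timestamp); SQLite is used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reviews (
        UserID    INTEGER NOT NULL,
        ItemID    INTEGER NOT NULL,
        Rating    REAL    NOT NULL,   -- user's rating of the commodity
        Review    TEXT,               -- user comment text
        ItemText  TEXT,               -- item content text
        Timestamp INTEGER             -- time stamp
    )
""")
conn.execute("INSERT INTO reviews VALUES (?, ?, ?, ?, ?, ?)",
             (1, 42, 5.0, "Fresh and sweet", "Red Fuji apple, 5 kg", 1620000000))
rows = conn.execute("SELECT Rating FROM reviews WHERE UserID = 1").fetchall()
# rows == [(5.0,)]
```

Fixing the schema up front is exactly the extra work the paragraph describes, but it lets later steps query individual fields directly.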
This article mainly uses three fields, user comment, commodity description and user score, to carry out the model research and experimental proof. However, the initial data crawled directly from the web pages are not standardised, so the raw data must be processed to generate a training data set for later use. The specific processing steps are as follows:
1. Filter system-default records.
2. Clear special characters.
3. Clear duplicate content. Due to some users' language habits, some sentences are repeated, such as the common 'say important things three times', which undoubtedly reduces the efficiency of opinion-word retrieval.
4. Text error correction. As user comments are a space in which users write freely, typos are inevitable. Typos have a large influence on opinion-word mining, so they need to be corrected in advance.
5. Set a comment threshold. If a user has only one comment, it is difficult to obtain user characteristics from it, so to better complete the experiment only users with at least five comments are kept.
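The steps above can be sketched as follows. The field names (`UserID`, `Review`), the system-default placeholder text and the omission of the typo-correction step (the article does not specify the corrector) are all assumptions.

```python
import re
from collections import defaultdict

DEFAULT_COMMENT = "This user did not leave a review."   # assumed placeholder text

def clean(records, min_comments=5):
    """records: list of dicts with 'UserID' and 'Review' fields."""
    seen, by_user = set(), defaultdict(list)
    for r in records:
        text = r["Review"].strip()
        if text == DEFAULT_COMMENT:               # 1. filter system-default records
            continue
        text = re.sub(r"[^\w\s]", "", text)       # 2. clear special characters
        key = (r["UserID"], text)
        if key in seen:                           # 3. clear duplicate content
            continue
        seen.add(key)
        # 4. text error correction would go here (corrector not specified)
        by_user[r["UserID"]].append({**r, "Review": text})
    # 5. keep only users with at least `min_comments` comments
    return [r for rs in by_user.values() if len(rs) >= min_comments for r in rs]
```

Deduplication is keyed per user so that different users posting the same short phrase are not conflated.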
The ConvMF model uses a CNN to extract hidden features from item description information, a great improvement over feature-extraction models based on bag-of-words representations. However, the model also has three shortcomings. First, the CNN's local receptive field can capture only short-distance context, which results in serious loss of context information between distant words. Second, traditional static word embeddings do not account for semantics varying with context. Third, only the item description information and the rating matrix are used for modelling, and user comment information is not considered. Therefore, based on ConvMF, an improved collaborative filtering model is proposed using the pretrained language model BERT, the bidirectional long short-term memory network (Bi-LSTM) and user comment information.
In the earlier ConvMF model, the traditional Word2vec was used to construct word vectors, but the word vectors constructed in this way are static: even in different contexts, the same word has only one encoding, thus ignoring polysemy. For example, 'apple' can mean either 'fruit' or 'mobile phone', and it is clearly unreasonable to express both with the same static word vector. In addition, although ConvMF uses a CNN to extract some context information from the text, due to the nature of convolution the model cannot capture context between distant words, i.e. long-range dependencies. A common solution is to replace Word2vec with the BERT language model. As a powerful pretrained language model, BERT constructs word vectors with rich contextual semantics, which not only solves the polysemy problem but also directly yields sentence vectors. A user often comments on multiple items, whereas each item has only one description, so processing user comments is more complicated. Taking the processing of user comments as an example, the module is shown in Figure 4.
User comment text processing.
For the comments generated by the user, each comment is first fed into the BERT layer, which converts it into a sentence-level vector.
The sentence-level vector generated for each user comment by the BERT layer contains rich contextual semantics, so it does not need CNN processing as in ConvMF. Bi-LSTM is used here to extract higher-order features from the entire comment set. Specifically, each comment of a user reflects a certain user preference, and to obtain a latent feature vector for the user, all of the user's comments must be synthesised. The forward LSTM processes the comment vectors from the first comment to the last, the backward LSTM processes them in the reverse order, and the hidden states of the two directions are concatenated to form the representation of the comment set.
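A minimal numpy sketch of this bidirectional pass over one user's comment vectors. The dimensions, random initialisation and gate layout are toy assumptions; the real model is trained end to end, and this only illustrates the forward/backward concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, l = 16, 8                          # d: sentence-vector size (toy), l: LSTM units (200 in the article)
comments = rng.normal(size=(5, d))    # five BERT sentence vectors for one user

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(z[:l]), sig(z[l:2*l]), sig(z[2*l:3*l])
    g = np.tanh(z[3*l:])
    c = f * c + i * g
    return o * np.tanh(c), c

def run_lstm(seq, params):
    h, c = np.zeros(l), np.zeros(l)
    hs = []
    for x in seq:
        h, c = lstm_step(x, h, c, *params)
        hs.append(h)
    return np.array(hs)

params_f = (rng.normal(0, 0.1, (4*l, d)), rng.normal(0, 0.1, (4*l, l)), np.zeros(4*l))
params_b = (rng.normal(0, 0.1, (4*l, d)), rng.normal(0, 0.1, (4*l, l)), np.zeros(4*l))

h_fwd = run_lstm(comments, params_f)               # first comment -> last
h_bwd = run_lstm(comments[::-1], params_b)[::-1]   # last comment -> first, realigned
H = np.concatenate([h_fwd, h_bwd], axis=1)         # per-comment bidirectional states, (5, 2*l)
```

Each row of H carries context from both earlier and later comments, which is what lets the model relate distant comments in the set.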
To complete the subsequent recommendation task, the function of the output layer here is to map the overall representation of the comment set into a user latent feature vector whose dimension matches the latent factors used by PMF.
Finally, after the above steps and to simplify the notation, all the weight and bias parameters appearing in the whole feature-extraction model are collectively denoted W.
In contrast, the feature extraction of the item description text is much simpler, since each item has only one description. After BERT processes the description text of item j, the resulting sentence-level representation is mapped through the output layer to the item latent feature vector.
The ConvMF model uses a CNN to obtain better item features and successfully integrates a deep learning model into the collaborative filtering algorithm through PMF. Although it achieves a good recommendation effect, it considers neither polysemy nor the contribution of user features to the recommendation model. To address these problems, a recommendation model based on user comments and item descriptions is proposed in this chapter. The model introduces both user comment information and item description information, uses the feature-extraction model constructed in Section 3.1 to obtain the latent features of users and items, respectively, and finally combines these features with the latent features decomposed by PMF into the final latent features of users and items. The model is shown in Figure 5.
A recommendation model based on user comments and item descriptions.
The meaning of each parameter in the figure is basically the same as in ConvMF, and Y represents the introduced user comment text. The middle part of the model is the PMF part, and the left and right sides are the feature-extraction parts of Section 4.3.2. The two parts are connected by acting jointly on the final item latent features V and user latent features U. The generation process of the model is as follows:
For the feature-extraction part, to prevent confusion, the weight and bias parameters in this part are collectively denoted $W$. For each user $i$:

1. Draw a Gaussian noise vector $\epsilon_i \sim \mathcal{N}(0, \sigma_U^2 I)$ as the PMF latent feature vector of the user.
2. The user latent feature vector generated from the user's comments is $f(W, Y_i)$, where $f$ denotes the BERT + Bi-LSTM feature extractor and $Y_i$ is the comment set of user $i$.
3. The final user latent feature vector is $u_i = f(W, Y_i) + \epsilon_i$.

For each item $j$:

1. Draw $\epsilon_j \sim \mathcal{N}(0, \sigma_V^2 I)$ as the PMF latent feature vector of the item.
2. The item latent feature vector generated from the item description is $g(W, X_j)$, where $X_j$ is the description text of item $j$.
3. The final item latent feature vector is $v_j = g(W, X_j) + \epsilon_j$.

The predicted score for each observed rating is then drawn as $r_{ij} \sim \mathcal{N}(u_i^{\top} v_j, \sigma^2)$.
The optimisation of the model is similar to that of ConvMF. Since features are also extracted from user comments, the update rule of $U$, like that of $V$, must include a feature-network term. Analogous to ConvMF's coordinate-ascent updates,

$u_i = (V I_i V^{\top} + \lambda_U I_K)^{-1}(V I_i R_i + \lambda_U f(W, Y_i))$

$v_j = (U I_j U^{\top} + \lambda_V I_K)^{-1}(U I_j R_j + \lambda_V g(W, X_j))$

where $I_i$ is a diagonal indicator matrix marking the items rated by user $i$, $R_i$ is user $i$'s rating vector, and $\lambda_U$, $\lambda_V$ balance the PMF term against the feature-extraction term. With $U$ and $V$ fixed, $W$ is updated by back-propagation.
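One such coordinate-ascent update for a single user's latent vector can be sketched in numpy, assuming the comment-derived feature f_i has already been computed (all sizes here are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 10, 6                     # K: latent dimension, M: number of items (toy values)
V = rng.normal(size=(K, M))      # item latent matrix, one column per item
R_i = rng.integers(1, 6, M).astype(float)            # user i's ratings
I_i = np.diag((rng.random(M) < 0.5).astype(float))   # 1 where user i rated the item
f_i = rng.normal(size=K)         # feature vector from the user's comments
lam_U = 10.0                     # weight of the feature-extraction prior

# u_i = (V I_i V^T + lam_U I_K)^{-1} (V I_i R_i + lam_U f_i)
A = V @ I_i @ V.T + lam_U * np.eye(K)
b = V @ I_i @ R_i + lam_U * f_i
u_i = np.linalg.solve(A, b)
```

Because lam_U * I_K makes A positive definite, the linear solve is always well posed even when the user has rated few items, which is how the text features stabilise sparse users.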
The data set includes 55,231 scores and comments generated by 13,239 users on 1,246 commodities. The composition of the data set is given in Table 1.
Data set composition
Fields | Description |
---|---|
UserID | User id |
ItemID | Commodity id |
Rating | User's rating of commodities |
Review | User comment text |
ItemText | Item content text |
Timestamp | Time stamp |
The experimental environment is shown in Table 2.
Experimental environment settings
Environment | Description |
---|---|
Operating system | Centos7 |
Experimental language | Python 3.5.6 |
Deep Learning Framework | Tensorflow_gpu-1.3.0 |
GPU | NVIDIA Tesla |
CUDA | 8 |
cuDNN | 6 |
The main parameters of the experiment are set as follows. The pretrained BERT model used is the version Chinese_L-12_H-768_A-12, pretrained and open-sourced by Google. All of its parameters use their defaults except max_seq_length (MSL), which specifies the length of the input text; text beyond this length is truncated. The MSL for commodity descriptions is chosen so that 90% of descriptions are covered without truncation, and likewise for user comments. The data set is randomly divided into training, validation and test sets in proportions of 0.8, 0.1 and 0.1, respectively. In addition, the number of comments in each user's comment set is set to 5. In the model, the number of hidden units l of the LSTM is set to 200, and the dimension t of the attention weight vector is set to 400. To prevent overfitting, dropout of 0.2 is applied. The remaining PMF parameter settings are the same as those described for ConvMF.
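The 0.8/0.1/0.1 split can be sketched as follows (the seed and the use of plain record indices are placeholders):

```python
import random

def split_dataset(records, seed=42):
    """Randomly split records into train/validation/test at 0.8/0.1/0.1."""
    records = records[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

train, val, test = split_dataset(list(range(55231)))   # 55,231 ratings, as in the data set
# len(train), len(val), len(test) == 44184, 5523, 5524
```

Rounding leaves the leftover record in the test set; any remainder policy works as long as the three parts are disjoint.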
According to the application field and the objects of concern, the evaluation indices of recommendation models differ slightly. MSE is widely used in score-prediction tasks, so MSE is adopted as the evaluation standard of this experiment. The equation is as follows:

$\mathrm{MSE} = \frac{1}{N}\sum_{(i,j)\in T}\left(r_{ij} - \hat{r}_{ij}\right)^2$

Among them, $r_{ij}$ is the true score of user $i$ for item $j$, $\hat{r}_{ij}$ is the predicted score, and $T$ is the set of the $N$ user–item pairs in the test set.
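A direct implementation of the MSE metric, for reference:

```python
def mse(true_ratings, predicted_ratings):
    """Mean squared error over aligned lists of true and predicted scores."""
    assert len(true_ratings) == len(predicted_ratings)
    return sum((r - p) ** 2
               for r, p in zip(true_ratings, predicted_ratings)) / len(true_ratings)

mse([5.0, 3.0, 4.0], [4.5, 3.5, 4.0])   # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```

Lower values indicate better score prediction, which is why the comparison in Table 3 reports MSE directly.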
The model proposed in this article is based on PMF and improved by introducing additional data sources and deep learning techniques, so PMF, CDL and ConvMF are used for comparative experiments on this data set. The experimental results are shown in Table 3.
Experimental results
Model | MSE |
---|---|
PMF | 1.473 |
CDL | 1.324 |
ConvMF | 1.142 |
Ours | 1.031 |
Drawing on the advantages and disadvantages of the above three models, the proposed improved model uses BERT to represent word vectors, which not only solves the problem of polysemy but also retains rich contextual relationships, thereby obtaining a more complete feature representation of the item description. The model also introduces user comments: BERT is used to obtain rich user features, and Bi-LSTM is used to fully exploit the relationships between a user's comments, which not only effectively models user features but also further alleviates the sparsity of the rating data. As a result, the model proposed in this article outperforms both CDL and ConvMF, achieving an MSE of 1.031, which is 9.7% lower than that of ConvMF.