Journal information
Format
Journal
eISSN
2444-8656
First published
01 Jan 2016
Publication frequency
2 times per year
Languages
English
Open Access

Realization of Book Collaborative Filtering Personalized Recommendation System Based on Linear Regression Equation

Accepted: 12 Apr 2022
Model and algorithm analysis
Equation Model
Definition 1

The linear regression equation constructed from readers' satisfaction with books expresses the uncertain linear relationship between y and the x_i. The specific formula is as follows:
$$y = \sum_{i=1}^{3} \beta_i x_i + \varepsilon$$

In the above formula, y represents the dependent variable, namely reader satisfaction, the x_i represent the evaluation factors affecting reader satisfaction, the β_i are regression coefficients, and ε is the random error.

Theorem 1

Assume that library users' satisfaction is affected by three indicators: collection resources and equipment x_1, library service attitude x_2, and service and opening time x_3. Based on the accumulated data, the assumed ternary linear regression equation takes the usual form:
$$\hat y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

Then least-squares estimation gives:
$$b_1 = \frac{L_{1y} - L_{12} b_2 - L_{13} b_3}{L_{11}}$$
$$b_2 = \frac{(L_{1y}L_{21} - L_{2y}L_{11})(L_{13}^2 - L_{33}L_{11}) - (L_{1y}L_{31} - L_{3y}L_{11})(L_{13}L_{12} - L_{11}L_{23})}{(L_{12}^2 - L_{22}L_{11})(L_{13}^2 - L_{33}L_{11}) - (L_{13}L_{12} - L_{11}L_{23})^2}$$
$$b_3 = \frac{(L_{1y}L_{31} - L_{3y}L_{11})(L_{12}^2 - L_{22}L_{11}) - (L_{1y}L_{21} - L_{2y}L_{11})(L_{13}L_{12} - L_{11}L_{23})}{(L_{13}L_{12} - L_{11}L_{23})^2 - (L_{12}^2 - L_{22}L_{11})(L_{13}^2 - L_{33}L_{11})}$$
$$b_0 = \bar y - b_1 \bar x_1 - b_2 \bar x_2 - b_3 \bar x_3$$

Among them,
$$L_{ij} = L_{ji} = \sum_{m=1}^{197}(x_{im}-\bar x_i)(x_{jm}-\bar x_j) = \sum_{m=1}^{197} x_{im}x_{jm} - \frac{\left(\sum_{m=1}^{197} x_{im}\right)\left(\sum_{m=1}^{197} x_{jm}\right)}{197},\quad i,j=1,2,3$$
$$L_{iy} = \sum_{m=1}^{197}(x_{im}-\bar x_i)(y_m-\bar y) = \sum_{m=1}^{197} x_{im}y_m - \frac{\left(\sum_{m=1}^{197} x_{im}\right)\left(\sum_{m=1}^{197} y_m\right)}{197},\quad i=1,2,3$$
$$\bar y = \frac{\sum_{m=1}^{197} y_m}{197},\quad \bar x_1 = \frac{\sum_{m=1}^{197} x_{1m}}{197},\quad \bar x_2 = \frac{\sum_{m=1}^{197} x_{2m}}{197},\quad \bar x_3 = \frac{\sum_{m=1}^{197} x_{3m}}{197}$$
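As a hedged sketch of this computation, the centered cross-product sums L_ij and L_iy can be assembled and the normal equations solved directly. The data here is synthetic, since the paper's 197 survey records are not available:

```python
import numpy as np

# Minimal sketch of the least-squares fit described above, on synthetic data
# (true coefficients chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
n = 197
X = rng.normal(size=(n, 3))                     # columns x1, x2, x3
y = 2.0 + 0.5 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] \
    + rng.normal(scale=0.1, size=n)

# Centered cross-product sums: L is the 3x3 matrix of L_ij, Ly the vector of L_iy.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
L = Xc.T @ Xc
Ly = Xc.T @ yc

b = np.linalg.solve(L, Ly)                      # b1, b2, b3
b0 = y.mean() - b @ X.mean(axis=0)              # intercept b0

print(b0, b)
```

With 197 samples and small noise, the recovered coefficients land close to the generating values.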

Calculation yields: L_11 = 16.1, L_22 = 58.4, L_33 = 83.6, L_12 = L_21 = -4.8, L_13 = L_31 = -9.6, L_23 = L_32 = 13.8, L_1y = 4, L_2y = 53, L_3y = 106, and hence b_1 = 0.54, b_2 = 0.167, b_3 = 0.019, b_0 = 29.85.

It can be seen that the linear regression equation between y and x_1, x_2, x_3 is:
$$\hat y = 14.165 + 1.207 x_1 + 0.712 x_2 + 1.189 x_3$$

The variance analysis results are as follows:
$$L_{yy} = \sum_{m=1}^{197}(y_m - \bar y)^2 = \sum_{m=1}^{197} y_m^2 - \frac{\left(\sum_{m=1}^{197} y_m\right)^2}{197} = 219$$
$$U = \sum_{m=1}^{197}(\hat y_m - \bar y)^2 = \sum_{i=1}^{3} b_i L_{iy} = 169.78$$
$$Q = L_{yy} - U = 49.22$$

The standardized regression coefficients obtained above not only show directly how each indicator influences overall satisfaction, but also supply basic information for system design. Among the factors studied in this paper's equation model, collection resources and facilities are the most influential, followed by service and opening time, and finally the service attitude of library staff. It is therefore important to build the library's personalized recommendation system on a collaborative filtering algorithm selected according to the results of the linear regression analysis: the collection is both the basic resource platform and an important facility for delivering the library's various services.

Personalized recommendation algorithm
Proposition 2

On the one hand, content-based recommendation. This technique is essentially text processing, and both its practical application and its theory are mature. The most common method is the TF-IDF algorithm, which uses keywords to measure the importance weight w_ij of a keyword in a text. The specific formula is as follows [1]:
$$w_{ij} = TF_{ij}\cdot IDF_i = \frac{f_{ij}}{\max_z f_{zj}}\log\frac{N}{n_i}$$

In the above formula, N represents the total number of texts, f_ij represents the number of occurrences of keyword k_i in text d_j, and n_i is the number of texts containing keyword k_i. Text d_j can then be represented by the vector d_j = (w_1j, w_2j, ..., w_kj), where k is the number of keywords.

If UserProfile(c) = (w_1c, w_2c, ..., w_kc) denotes user c's preference for each keyword, then user c's preference score r_ic for text i is calculated as follows:
$$r_{ic} = \frac{\sum_{j=1}^{k} w_{jc} w_{ji}}{\sqrt{\sum_{j=1}^{k} w_{jc}^2 \sum_{j=1}^{k} w_{ji}^2}}$$
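A minimal sketch of the TF-IDF weighting and the cosine preference score above, with an illustrative toy corpus and a hypothetical user profile (none of these values come from the paper):

```python
import math

# Illustrative corpus and keyword vocabulary (assumptions, not from the paper).
docs = [["book", "library", "book"], ["library", "service"], ["book", "service", "time"]]
keywords = ["book", "library", "service", "time"]
N = len(docs)

def tfidf(doc):
    """Weight vector w_ij = (f_ij / max_z f_zj) * log(N / n_i) over the keyword list."""
    max_f = max(doc.count(w) for w in set(doc))
    weights = []
    for kw in keywords:
        n_i = sum(1 for d in docs if kw in d)   # texts containing the keyword
        tf = doc.count(kw) / max_f
        idf = math.log(N / n_i) if n_i else 0.0
        weights.append(tf * idf)
    return weights

def preference(profile, doc_vec):
    """Cosine preference score r_ic between a user profile and a document vector."""
    num = sum(a * b for a, b in zip(profile, doc_vec))
    den = math.sqrt(sum(a * a for a in profile)) * math.sqrt(sum(b * b for b in doc_vec))
    return num / den if den else 0.0

vec0 = tfidf(docs[0])
profile = [1.0, 0.5, 0.0, 0.0]   # hypothetical keyword preferences of one user
score = preference(profile, vec0)
print(score)
```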

The principle of content-based recommendation is very simple: it only requires the fixed content attributes of users and items to be defined, so problems such as system cold start and data sparsity do not occur during operation. Provided that users and item content can be represented explicitly with features, the accuracy of the actual recommendations can be improved.

On the other hand, collaborative filtering. This technique comes in two flavors: memory-based algorithms, such as user-based or item-based collaborative filtering, and model-based algorithms, such as clustering algorithms and graph-model algorithms.

Firstly, user-based collaborative filtering. It first requires user preference information; the similarity between the active user and other users is then calculated, and neighbors are selected for the active user according to similarity. Finally, the active user's scores for the items under evaluation are predicted from the historical preferences of the similar neighbors, forming the recommendation results. The specific flow is as follows.

Lemma 3

Firstly, the similarity between users must be calculated. The most commonly used measures are cosine similarity and the Pearson correlation coefficient. Cosine similarity is defined as:
$$sim_{uv} = \cos(u,v) = \frac{u\cdot v}{|u|\,|v|} = \frac{\sum_{i\in I_{uv}} r_{ui} r_{vi}}{\sqrt{\sum_{i\in I_u} r_{ui}^2}\,\sqrt{\sum_{i\in I_v} r_{vi}^2}}$$

In the above formula, sim_uv represents the similarity between user u and user v; u and v are the two users' rating vectors; |u| and |v| are the norms of those vectors; and r_ui and r_vi are the two users' ratings of item i.

The Pearson correlation coefficient is calculated as follows:
$$sim_{uv} = \frac{\sum_{i\in I_{uv}} (r_{ui}-\bar r_u)(r_{vi}-\bar r_v)}{\sqrt{\sum_{i\in I_{uv}} (r_{ui}-\bar r_u)^2}\,\sqrt{\sum_{i\in I_{uv}} (r_{vi}-\bar r_v)^2}}$$

In the above formula, r_ui and r_vi represent the ratings of users u and v for item i, r̄_u and r̄_v are the users' average ratings, and I_uv represents the set of items rated by both users.
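The two similarity measures can be sketched as follows; the rating dictionaries are illustrative, not taken from the paper:

```python
import math

# Illustrative ratings of two users over three shared items.
ratings = {
    "u": {"i1": 5, "i2": 3, "i3": 4},
    "v": {"i1": 4, "i2": 2, "i3": 5},
}

def cosine_sim(ru, rv):
    """Cosine similarity: co-rated items in the numerator, full norms below."""
    common = ru.keys() & rv.keys()
    num = sum(ru[i] * rv[i] for i in common)
    den = math.sqrt(sum(r * r for r in ru.values())) * \
          math.sqrt(sum(r * r for r in rv.values()))
    return num / den if den else 0.0

def pearson_sim(ru, rv):
    """Pearson correlation over the co-rated items only."""
    common = ru.keys() & rv.keys()
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * \
          math.sqrt(sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

c = cosine_sim(ratings["u"], ratings["v"])
p = pearson_sim(ratings["u"], ratings["v"])
print(c, p)
```

Note how the mean-centering of Pearson separates the two users more sharply than raw cosine does.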

Secondly, similar neighbors must be searched for. The most common way is the K-nearest-neighbor method: the K users most similar to the current active user are selected as the similar neighbors. Alternatively, a threshold can be set and each user's similarity to the active user compared against it; if user v's similarity exceeds the threshold, v is taken as a similar neighbor.

Corollary 4

Finally, the recommendation results are generated. The simplest way is to average the ratings of the neighboring users, with the specific formula as follows [2]:
$$r_{ui} = \frac{1}{k}\sum_{v\in N} r_{vi}$$

In the above formula, N represents the set of the active user's similar nearest neighbors, and k represents the number of neighbors in N. To obtain a better recommendation effect, a weighted average can be used instead: either weight the neighbors' ratings directly, or first compute each rating's deviation from that neighbor's average and then take the weighted average of the deviations. The specific formulas are as follows:
$$r_{ui} = \frac{\sum_{v\in N} sim_{uv}\, r_{vi}}{\sum_{v\in N} |sim_{uv}|}$$
$$r_{ui} = \bar r_u + \frac{\sum_{v\in N} sim_{uv}\,(r_{vi}-\bar r_v)}{\sum_{v\in N} |sim_{uv}|}$$
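The mean-centered weighted prediction above can be sketched as follows; all numbers are illustrative:

```python
# Each neighbor contributes (similarity to the active user,
# rating of item i, that neighbor's own average rating).
neighbors = [(0.9, 4.0, 3.5), (0.6, 5.0, 4.0), (0.3, 2.0, 3.0)]
user_mean = 3.8   # hypothetical average rating of the active user

# r_ui = r̄_u + sum(sim * (r_vi - r̄_v)) / sum(|sim|)
num = sum(sim * (r - mean) for sim, r, mean in neighbors)
den = sum(abs(sim) for sim, _, _ in neighbors)
prediction = user_mean + num / den

print(prediction)
```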

Second, the item-based collaborative filtering algorithm. According to its operation flow, item similarity is calculated first, in one of two ways. One is the cosine vector, with the corresponding formula:
$$sim_{ij} = \cos(i,j) = \frac{r_i\cdot r_j}{|r_i|\,|r_j|} = \frac{\sum_{u\in U_{ij}} r_{ui} r_{uj}}{\sqrt{\sum_{u\in U_i} r_{ui}^2}\,\sqrt{\sum_{u\in U_j} r_{uj}^2}}$$

The other is the correlation coefficient, with the actual formula:
$$sim_{ij} = \frac{\sum_{u\in U_{ij}} (r_{ui}-\bar r_i)(r_{uj}-\bar r_j)}{\sqrt{\sum_{u\in U_{ij}} (r_{ui}-\bar r_i)^2}\,\sqrt{\sum_{u\in U_{ij}} (r_{uj}-\bar r_j)^2}}$$

Conjecture 5

Secondly, similar neighbors are found; the selection method is the same as in the user-based collaborative filtering algorithm. Finally, a weighted average is again used to obtain the recommendation results, with the specific formulas:
$$r_{ui} = \frac{\sum_{j\in N} sim_{ij}\, r_{uj}}{\sum_{j\in N} |sim_{ij}|}$$
$$r_{ui} = \bar r_i + \frac{\sum_{j\in N} sim_{ij}\,(r_{uj}-\bar r_j)}{\sum_{j\in N} |sim_{ij}|}$$

Thirdly, model-based collaborative filtering constructs a recommendation model by machine learning or a mathematical model from the users' rating data, predicts values for the unrated items, and recommends to each user the content with the largest predicted value.

To evaluate the recommendation effect of the personalized recommendation systems proposed above within the book service system, the following criteria should be analyzed:

First, the error criteria. This calculation proceeds from two aspects. On the one hand, the mean absolute error, one of the most common and simplest standards for evaluating system performance: prediction accuracy is judged by the deviation between the system's predicted scores and the actual scores. Example 6. For a target user u, let the actual scores in the test set be {q_1, q_2, ..., q_n} and the predicted scores computed by the system be {p_1, p_2, ..., p_n}; then the mean absolute error is calculated as follows [3]:
$$MAE = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n}$$

On the other hand, the root mean square error, which intuitively shows the dispersion of the rating data. The corresponding formula is as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - q_i)^2}$$

In the above formulas, n is the number of ratings, q_i is the user's actual rating, and p_i is the predicted rating.
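A sketch of the two error criteria above, with illustrative rating lists:

```python
import math

# Illustrative actual and predicted ratings for one user.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
n = len(actual)

# MAE: mean of absolute deviations; RMSE: root of the mean squared deviation.
mae = sum(abs(p - q) for p, q in zip(predicted, actual)) / n
rmse = math.sqrt(sum((p - q) ** 2 for p, q in zip(predicted, actual)) / n)

print(mae, rmse)
```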

The second is the hit ratio. This calculation also proceeds from two points. One is precision, with the specific formula:
$$Precision = \sum_{u\in U} |R(u)\cap T(u)|\,/\,|R(u)|$$

Note 7

In the above formula, T(u) represents the test data set, and R(u) represents the list of items finally recommended by the system.

The other is recall, with the specific formula:
$$Recall = \sum_{u\in U} |R(u)\cap T(u)|\,/\,|T(u)|$$

The last is coverage, the proportion of the finally recommended items among the total number of items, which directly reflects the reach of the algorithm's recommendations. The specific formula is:
$$Coverage = \left|\bigcup_{u\in U} R(u)\right| \,/\, |I|$$

In the above formulas, U represents the user set, R(u) represents the set of items recommended to a given user in the user set, and I represents the set of all items.
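The three hit-ratio measures can be sketched as follows; the recommendation and test sets are illustrative:

```python
# Illustrative per-user recommendation lists and test sets.
recommended = {"u1": {"b1", "b2", "b3"}, "u2": {"b2", "b4"}}
test_set = {"u1": {"b1", "b3", "b5"}, "u2": {"b4"}}
all_items = {"b1", "b2", "b3", "b4", "b5", "b6"}

# Precision and recall as summed per-user ratios, coverage over all items,
# following the formulas above.
precision = sum(len(recommended[u] & test_set[u]) / len(recommended[u]) for u in recommended)
recall = sum(len(recommended[u] & test_set[u]) / len(test_set[u]) for u in recommended)
coverage = len(set().union(*recommended.values())) / len(all_items)

print(precision, recall, coverage)
```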

Clustering algorithm
Open Problem 8

Clustering is the process of dividing the data in a data set into different classes according to a specified criterion. It requires no prior knowledge of the classification; a chosen mathematical method completes the classification automatically according to the rules. Suppose the data sample set {X} contains N m-dimensional vectors, divided into k classes in total, and the center of data class c_j is y = (y_1, y_2, ..., y_m). The clustering criterion is that the sum of the distances between the sample points in a class and the cluster center is minimized. The actual formula is as follows [4]:
$$\min \sum_{j=1}^{k}\sum_{x\in c_j} d(x,y)$$

In the above formula, y represents the center of class c_j, x represents a vector in class c_j, and the distance d(x, y) between vector x and the cluster center is calculated as:
$$d(x,y) = \sqrt[q]{\sum_{i=1}^{m} |x_i - y_i|^q}$$

In the above formula, m represents the dimension of the data; d(x, y) is the Manhattan distance when q = 1 and the Euclidean distance when q = 2. To improve the traditional clustering algorithm, the cluster centers, i.e., the final number of clusters, must be chosen first. Note that this choice is subjective, since it is influenced by the user's experience. To keep it from affecting the final result, this paper proposes a self-organizing clustering idea: a membership threshold is specified so that the clustering process finishes automatically, and the actual number of clusters is determined by the algorithm.
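The distance d(x, y) above can be sketched as a small helper; the points are illustrative:

```python
# Minkowski distance: q = 1 gives the Manhattan distance,
# q = 2 the Euclidean distance, as stated above.
def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.0]

manhattan = minkowski(x, y, 1)   # |1-4| + |2-0| + |3-3|
euclidean = minkowski(x, y, 2)   # sqrt(9 + 4 + 0)

print(manhattan, euclidean)
```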

Analysis of recommendation algorithm based on fuzzy clustering

As proposed by Gori, Pucci and others in their research, the ItemRank algorithm uses a random-walk scoring mechanism for recommendation. The algorithm has two main parts: constructing the association graph and performing a random walk over it. When the association graph is generated, all products are treated as nodes of the graph, and the weight w_ij between two nodes is the number of users who rated both. The corresponding association matrix is [5]:
$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mm} \end{bmatrix}$$

The final association matrix is obtained through normalization.

Assume that the vector for every user u_i satisfies the following initial condition:
$$S_i(0) = \left[\frac{1}{M}, \frac{1}{M}, \ldots, \frac{1}{M}\right]^T,\quad 1\le i\le N$$

Then, for the M-dimensional vectors, the iteration is:
$$S_i(t+1) = \alpha W S_i(t) + (1-\alpha) R_i^T$$

In the above formula, R_i represents the i-th row of the rating matrix R, and α represents a user-defined constant, generally 0.85. The clustering-based improvement of the ItemRank algorithm proceeds as follows:

The first step is to tag the users. To improve the computational efficiency of the algorithm, the user rating vectors are clustered. The improved clustering algorithm proposed in this paper needs only a predefined constant and does not require the number of classes to be specified directly. At the same time, because users have different rating habits, all rating data can be normalized before clustering to reduce data error. If user u_i's rating vector is [r_i1, r_i2, ..., r_im], the normalization formula is:
$$x_{ij} = \frac{r_{ij}}{\sum_{k=1}^{m} r_{ik}},\quad 1\le i\le n,\ 1\le j\le m$$

In the above formula, n and m represent the numbers of users and products in the system, and user u_i's normalized rating vector becomes [x_i1, x_i2, ..., x_im].
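The normalization step can be sketched as follows; the rating matrix is illustrative:

```python
# Row-normalize each user's rating vector so that it sums to 1,
# per the formula above (illustrative ratings).
ratings = [[4.0, 0.0, 6.0],
           [1.0, 1.0, 2.0]]

normalized = [[r / sum(row) for r in row] for row in ratings]

print(normalized)
```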

The second step is matrix dimension reduction. For each product p_j, its z-dimensional feature vector x_j is taken as (x_j1, x_j2, ..., x_jz), where z is the number of clusters obtained in the operations above, and the elements of the feature vector x_j are calculated as follows:
$$x_{jk} = P(c_k \mid p_j) = \frac{\sum_{d=1}^{n} r_{dj}\,\delta_{dk}}{\sum_{d=1}^{n} r_{dj}},\quad 1\le k\le z$$

where δ_dk takes the values
$$\delta_{dk} = \begin{cases} 1 & y_d = c_k \\ 0 & y_d \ne c_k \end{cases}$$

The actual matrix T is:
$$T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1q} \\ t_{21} & t_{22} & \cdots & t_{2q} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mq} \end{bmatrix}$$

The original high-dimensional n×m rating matrix is transformed into a low-dimensional one as follows:
$$B = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{bmatrix} = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{bmatrix} T$$

Next, the association graph is composed. The weight between nodes is calculated as:
$$w_{ij} = \sum_{k=1}^{n} limit\!\left(\frac{b_{ki}}{b_{kj}}\right)$$

Among them,
$$limit\!\left(\frac{a_1}{a_2}\right) = \begin{cases} 0 & a_1 = 0 \text{ or } a_2 = 0 \\ \dfrac{a_1}{a_2} & a_1 < a_2 \\ 1 & \text{otherwise} \end{cases}$$

If the weight values were allowed to grow without bound, it would be difficult to recommend products accurately after clustering, so the actual upper limit is set to 1. The corresponding association matrix is:
$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1q} \\ w_{21} & w_{22} & \cdots & w_{2q} \\ \vdots & \vdots & & \vdots \\ w_{q1} & w_{q2} & \cdots & w_{qq} \end{bmatrix}$$
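The capped weight function limit(a1/a2) can be sketched as a small helper:

```python
# Capped ratio per the piecewise definition above: zero when either side is
# zero, the raw ratio when a1 < a2, and capped at 1 otherwise so that a single
# large ratio cannot dominate the association graph.
def limit(a1, a2):
    if a1 == 0 or a2 == 0:
        return 0.0
    if a1 < a2:
        return a1 / a2
    return 1.0

print(limit(0, 5), limit(2, 4), limit(7, 3))
```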

Normalization is then carried out column by column, with the specific formula:
$$Q_j = \sum_{k=1}^{q} w_{kj},\qquad w_{ij} \leftarrow \frac{w_{ij}}{Q_j}$$

In the above formula, i and j both take values from 1 to q.

Finally, the recommendation is made, with the specific formula:
$$S_i[j] = t_{j1} V_i[1] + t_{j2} V_i[2] + \cdots + t_{jq} V_i[q]$$
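The random-walk iteration S_i(t+1) = αW S_i(t) + (1-α)R_i^T at the core of the algorithm can be sketched on a tiny hand-made association matrix (illustrative, not the paper's data):

```python
import numpy as np

# Illustrative 3-node, column-normalized association matrix and one user's
# rating row; alpha = 0.85 as suggested above.
alpha = 0.85
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
R_i = np.array([1.0, 0.0, 0.0])          # user i rated only the first item

S = np.full(3, 1.0 / 3.0)                # S_i(0) = [1/M, ..., 1/M]^T
for _ in range(100):                     # iterate to approximate convergence
    S = alpha * (W @ S) + (1 - alpha) * R_i

print(S)
```

Because the restart vector favors the item the user already rated, the converged scores rank that item's node highest while still spreading mass over the graph.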

Empirical research on collaborative filtering personalized recommendation model of books
Data Preprocessing

According to the multiple linear regression equation constructed above and the collected book data resources, it is clear that three factors influence readers' satisfaction: collection resources and facilities, service and opening time, and the service attitude of library staff. To meet readers' needs and build an effective recommendation system with the collaborative filtering algorithm studied above, the book information must first be pre-processed to remove worthless original records and keep only the information useful for the recommendation process. Part of the data-cleaning code is shown in Table 1 below:

Part of the code for data cleansing

 SELECT UserID, BookID, Day
 FROM Borrow_Update
 WHERE UserID IN (
     SELECT UserID
     FROM Borrow_Update
     GROUP BY UserID
     HAVING COUNT(*) >= 10
 );

At the same time, the rating information should be obtained according to the above equation model. Generally speaking, the longer a person keeps a borrowed book, the more he likes it, so a user's borrowing time can be treated as rating information. If ratings are on a 10-point scale and the maximum borrowing period is 90 days, the actual rating formula is as follows:
$$r_{ui} = \begin{cases} 10 & t_{ui} \ge 90 \\ \dfrac{t_{ui}}{90}\times 10 & t_{ui} < 90 \end{cases}$$

In the above formula, r_ui represents user u's rating of book i, and t_ui represents the number of days user u borrowed book i. The user's average borrowing time is calculated as:
$$t_u = \frac{\sum_{i\in B(u)} t_{ui}}{|B(u)|}$$

Substituting it into the rating formula above gives a new rating formula [7,8]:
$$r_{ui} = \begin{cases} 10 & t_{ui} \ge 2t_u \\ \dfrac{t_{ui}}{2t_u}\times 10 & t_{ui} < 2t_u \end{cases}$$

At the same time, taking each book's average borrowing time into account can further reduce the rating error, as shown in the following formula:
$$r_{ui} = \begin{cases} 10 & t_{ui} \ge t_u + t_i \\ \dfrac{t_{ui}}{t_u + t_i}\times 10 & t_{ui} < t_u + t_i \end{cases}$$

In the above formula, t_i represents the average number of days book i is borrowed.
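The borrowing-time-to-rating conversion above (the t_u + t_i variant) can be sketched as follows; the borrow records are illustrative:

```python
# Illustrative borrow records: days each user kept each book.
borrow_days = {
    ("u1", "b1"): 30,
    ("u1", "b2"): 10,
    ("u2", "b1"): 60,
}

def user_mean(u):
    """t_u: the user's average borrowing time over their books."""
    days = [d for (uu, _), d in borrow_days.items() if uu == u]
    return sum(days) / len(days)

def book_mean(b):
    """t_i: the book's average borrowing time over its borrowers."""
    days = [d for (_, bb), d in borrow_days.items() if bb == b]
    return sum(days) / len(days)

def score(u, b):
    """r_ui capped at 10 once t_ui reaches the t_u + t_i threshold."""
    t_ui = borrow_days[(u, b)]
    cap = user_mean(u) + book_mean(b)
    return 10.0 if t_ui >= cap else t_ui / cap * 10.0

print(score("u1", "b1"), score("u2", "b1"))
```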

Experimental Verification

The data set selected in this paper is the book borrowing records of a university library in 2019 and 2020; after data cleaning and merging, 10,000 valid borrowing records were finally obtained. Building on the multiple linear regression equation model proposed in this paper, the improved fuzzy recommendation algorithm studied here (SCC) and the traditional ItemRank recommendation algorithm are compared in designed experiments. The evaluation criteria fall into three parts, MAE, precision and recall, whose formulas are given above.

Users are divided into multiple subsets according to the number of ratings they supplied, and five users are selected from each interval to participate in five-fold cross-validation of the recommendation. The actual running results are shown in Table 2 below:

Analysis of experimental running results

Interval  MAE(ItemRank)  MAE(SCC)  Precision(ItemRank)  Precision(SCC)  Recall(ItemRank)  Recall(SCC)
25 0.9136 0.8759 25.6345 28.918 21.6987 36.8932
50 0.8695 0.8412 26.8482 32.7127 23.6854 35.2653
75 0.8373 0.8223 27.8232 37.0952 21.7249 32.0987
100 0.8223 0.8106 29.2438 39.1182 20.2262 31.9691
125 0.8191 0.8109 31.0859 42.9914 20.6581 32.0761
150 0.8113 0.8101 32.1153 43.2067 21.2249 33.1001
175 0.8104 0.8081 33.2696 45.2217 21.4403 31.9589
200 0.8095 0.8078 35.5989 46.2654 19.9823 31.1310
225 0.8093 0.8073 35.5621 48.1123 21.4682 33.1041
250 0.8088 0.8071 37.6690 48.1182 23.7641 33.6662

Combining the results in the table above, the computational performance of the SCC algorithm studied in this paper is effectively improved compared with the traditional ItemRank algorithm. For the mean absolute error, a smaller value means more accurate prediction and stronger practical performance; the SCC curve lies below that of the traditional ItemRank algorithm, in other words its actual performance is superior. The gap is most obvious in the interval of 25 ratings, that is, when the data is very sparse the improvement of the SCC algorithm over traditional ItemRank is more significant. For precision and recall, higher values mean stronger performance, and the SCC curve lies above that of ItemRank. As the interval number increases, data sparsity decreases and users supply more ratings, so the precision of both algorithms improves. Because the SCC algorithm clusters the collected data, its results are clearer; its performance during calculation is therefore superior to that of the traditional ItemRank algorithm, and it can ultimately recommend books that interest users [9,10].

Conclusion

To sum up, in the field of books, constructing a collaborative filtering personalized recommendation system based on a recommendation algorithm is a main research topic at the present stage. This paper therefore adopts a clustering-improved ItemRank algorithm as its core during recommendation. The final practical results prove that this not only effectively improves the overall performance of the algorithm, but also, by combining the advantages of both techniques, genuinely recommends the books that users are interested in.



[1] Dongmin Mo, Xue-Gang Chen, Sheng Duan, Lu-Da Wang, Qian Wu, Meiling Zhang, Lanqing Xie. Personalized Resource Recommendation Based on Collaborative Filtering Algorithm[J]. Journal of Physics: Conference Series, 2019, 1302(2). doi:10.1088/1742-6596/1302/2/022025

[2] Yuning Bian, Yeli Li, Qingtao Zeng, Mengyang Liu. Design and implementation of book publishing topic selection system based on collaborative filtering algorithm[J]. IOP Conference Series: Materials Science and Engineering, 2019, 563(5). doi:10.1088/1757-899X/563/5/052018

[3] Yonghong Tian, Bing Zheng, Yanfang Wang, Yue Zhang, Qi Wu. College Library Personalized Recommendation System Based on Hybrid Recommendation Algorithm[J]. Procedia CIRP, 2019, 83. doi:10.1016/j.procir.2019.04.126

[4] Yuning Bian, Yeli Li, Qingtao Zeng, Mengyang Liu. Design and implementation of book publishing topic selection system based on collaborative filtering algorithm[A]. Proceedings of the 2019 2nd International Conference on Advanced Electronic Materials, Computers and Materials Engineering (AEMCME 2019)[C]. Hong Kong Global Research Association, 2019: 7.

[5] Yangdi Liu. Data Mining of University Library Management Based on Improved Collaborative Filtering Association Rules Algorithm[J]. Wireless Personal Communications, 2018, 102(4). doi:10.1007/s11277-018-5409-y

[6] Muhammad Jabbar, Qaisar Javaid, Muhammad Arif, Asim Munir, Ali Javed. An Efficient and Intelligent Recommender System for Mobile Platform[J]. Mehran University Research Journal of Engineering and Technology, 2018, 37(4). doi:10.22581/muet1982.1804.02

[7] Anu Taneja, Anuja Arora. Cross domain recommendation using multidimensional tensor factorization[J]. Expert Systems With Applications, 2018, 92. doi:10.1016/j.eswa.2017.09.042

[8] Xiaoqiang Guo, Lichao Feng, Yalou Liu, Xiuli Han. Collaborative filtering model of book recommendation system[J]. Int. J. of Advanced Media and Communication, 2016, 6(2/3/4). doi:10.1504/IJAMC.2016.080974

[9] Belgaid, Youcef, Helal, Mohamed and Venturino, Ezio. "Mathematical analysis of a B-cell chronic lymphocytic leukemia model with immune response". Applied Mathematics and Nonlinear Sciences, vol. 4, no. 2, 2019, pp. 551–558. https://doi.org/10.2478/AMNS.2019.2.00052

[10] Sulaiman, Tukur Abdulkadir, Bulut, Hasan and Atas, Sibel Sehriban. "Optical solitons to the fractional Schrödinger-Hirota equation". Applied Mathematics and Nonlinear Sciences, vol. 4, no. 2, 2019, pp. 535–542. https://doi.org/10.2478/AMNS.2019.2.00050
