Journal Details
Format
Journal
eISSN
2444-8656
First published
01 Jan 2016
Publication frequency
2 times per year
Languages
English
Open Access

# Study on Interactive Relations between Enterprise Social Media and Decision Style Based on a Vector Autoregressive Model

###### Accepted: 21 Apr 2022
Method analysis
Cosine similarity

Definition 1. Two texts in the vector space model are compared by computing the cosine of the angle between their vectors, which measures the similarity of their content. Assume that text A and text B are represented in n-dimensional space as $\left(a_1, a_2, \ldots, a_n\right)$ and $\left(b_1, b_2, \ldots, b_n\right)$.

Then the cosine similarity is calculated as follows: $\cos\langle A,B\rangle = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\left(a_1, a_2, \ldots, a_n\right) \cdot \left(b_1, b_2, \ldots, b_n\right)}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$

Expanding the dot product gives: $\cos\langle A,B\rangle = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$

Theorem 1. The cosine similarity always lies between −1 and 1. The more similar two texts are, the larger the calculated cosine similarity; the less similar they are, the smaller it is.
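As an illustration of the formula above, the following is a minimal Python sketch of cosine similarity for two equal-length vectors. The function name `cosine_similarity`, the sample vectors, and the choice to return 0 when one vector is all zeros are assumptions made for this example and do not come from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: an empty text is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


text_a = [1.0, 2.0, 0.0]
text_b = [2.0, 4.0, 0.0]  # same direction as text_a
text_c = [0.0, 0.0, 3.0]  # shares no term with text_a

print(cosine_similarity(text_a, text_b))
print(cosine_similarity(text_a, text_c))
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why the measure suits texts of different sizes.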

TF-IDF

In natural language processing, a collection of texts is commonly represented in a vector space built on words. Each dimension of the space corresponds to a subject word, and each text is a vector whose coordinate on a dimension is the occurrence frequency of the corresponding subject word. An M×N text-term matrix can therefore be constructed from the corporate social media corpus. This process is the feature-extraction step of text mining.

The most common feature-extraction method in text mining is TF-IDF, which identifies the keywords of a text from the product of term frequency and inverse document frequency. Term frequency is the number of occurrences of a subject word in a given text, while inverse document frequency is based on the reciprocal of the share of texts in the whole corpus that contain the subject word, and thus reflects how much the word contributes to expressing the text. This method not only reduces dimensionality but also alleviates the dimension explosion that is prone to occur in language processing. In practice, the calculation formula is as follows: $\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_{i,j}$

In the above formula, $\mathrm{tfidf}_{i,j}$ is the product of the term frequency $\mathrm{tf}_{i,j}$ and the inverse document frequency $\mathrm{idf}_{i,j}$.

The term frequency is calculated as follows: $\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$

The inverse document frequency is calculated as follows: $\mathrm{idf}_{i,j} = \log \frac{|D|}{1 + |D_n|}$

In the above formula, $|D|$ is the total number of texts in the corpus and $|D_n|$ is the number of texts that contain the feature word. In this study, all the microblog data are preprocessed and an M×N matrix is constructed, where M is the number of microblog posts and N is the number of subject words. Every element of the matrix is then obtained with the TF-IDF method, and the structure is as follows: $\begin{pmatrix} x_{1,1} & \ldots & x_{1,n} \\ \vdots & & \vdots \\ x_{m,1} & \ldots & x_{m,n} \end{pmatrix}$

In the matrix above, each row is a microblog vector, and each value in it is the weight of the corresponding subject word in that microblog. The matrix captures the correlation between topic keywords and microblogs, which can be studied further with non-negative matrix decomposition.
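The construction of the M×N matrix can be sketched in plain Python as follows. It is only an illustration under stated assumptions: the function name `tfidf_matrix`, the toy `posts` list, and the use of pre-tokenized word lists are hypothetical, while the term frequency and the smoothed inverse document frequency follow the two formulas given above.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one TF-IDF row per document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Number of documents that contain each term (|D_n| in the formula).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / (1 + doc_freq[term])) for term in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        row = [(counts[term] / total) * idf[term] if total else 0.0 for term in vocab]
        matrix.append(row)
    return vocab, matrix


# Hypothetical, already segmented microblog posts.
posts = [
    ["price", "rise", "price"],
    ["price", "drop"],
    ["market", "rise"],
    ["market", "news"],
]
vocab, X = tfidf_matrix(posts)
for term, weight in zip(vocab, X[0]):
    print(term, round(weight, 4))
```

One consequence of the `1 + |D_n|` smoothing is worth noting: a word that occurs in every text gets a slightly negative weight, so such entries would need to be clipped to zero (or the word dropped) before a non-negative decomposition is applied.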

Non-negative matrix decomposition algorithm

Non-negative matrix decomposition is a dimensionality-reduction method that imposes non-negativity constraints during decomposition, so that the factor matrices represent the original matrix in a lower dimension and can be used for clustering. In this analysis, the non-negative matrix is defined as $X \in \mathbb{R}^{n \times m}$. The goal is to find two non-negative matrices $W \in \mathbb{R}^{n \times q}$ and $H \in \mathbb{R}^{q \times m}$ such that: $X \approx W \times H = X'$

In this way, a high-dimensional matrix is decomposed into the product of a left and a right non-negative matrix. Because of the non-negativity constraints, each column vector of the original matrix X can be regarded as a weighted sum of all the column vectors of the left matrix W, with the weight coefficients given by the corresponding elements of H. Non-negative matrix factorization is NP-hard in general, so in practice it is solved by stepwise iteration that approaches a local optimum.

From the perspective of the squared-distance loss function, if the noise follows a normal distribution, maximum likelihood estimation is equivalent to least squares, and the corresponding loss function is the squared Euclidean distance: $D\left(X|WH\right) = \left\|X - WH\right\|^2 = \sum_{i,j} \left(X_{i,j} - \left(WH\right)_{i,j}\right)^2$

The corresponding objective function is: $\min_{W,\,H \ge 0} D_E\left(X|WH\right) = \left\|X - WH\right\|^2$

Based on the objective function above, if the noise follows a Gaussian distribution, the objective is differentiated first, which gives the gradients used for gradient descent: $\begin{aligned} \frac{\partial D_E\left(X|WH\right)}{\partial w_{ik}} &= -2\left[\left(XH^T\right)_{ik} - \left(WHH^T\right)_{ik}\right] \\ \frac{\partial D_E\left(X|WH\right)}{\partial h_{kj}} &= -2\left[\left(W^TX\right)_{kj} - \left(W^TWH\right)_{kj}\right] \end{aligned}$

Iterating by gradient descent with these gradients gives the update formulas: $\begin{aligned} w_{ik} &\leftarrow w_{ik} - u_{ik}\frac{\partial D_E\left(X|WH\right)}{\partial w_{ik}} \\ h_{kj} &\leftarrow h_{kj} - n_{kj}\frac{\partial D_E\left(X|WH\right)}{\partial h_{kj}} \end{aligned}$

Generally speaking, gradient descent works by addition and subtraction, so matrix elements may decrease into negative numbers. To ensure that all elements remain non-negative, the step sizes can be chosen so that the update becomes multiplicative: $u_{ik} = \frac{w_{ik}}{2\left(WHH^T\right)_{ik}},\quad n_{kj} = \frac{h_{kj}}{2\left(W^TWH\right)_{kj}}$

Substituting these step sizes transforms gradient descent into the multiplicative update rule: $\begin{aligned} w_{ik} &\leftarrow w_{ik}\frac{\left(XH^T\right)_{ik}}{\left(WHH^T\right)_{ik}} \\ h_{kj} &\leftarrow h_{kj}\frac{\left(W^TX\right)_{kj}}{\left(W^TWH\right)_{kj}} \end{aligned}$

Iterating the above formulas preserves the non-negativity of W and H during matrix decomposition, and the convergence of the algorithm can also be proved.
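The multiplicative rule for the squared Euclidean loss, $w_{ik} \leftarrow w_{ik}(XH^T)_{ik}/(WHH^T)_{ik}$ and $h_{kj} \leftarrow h_{kj}(W^TX)_{kj}/(W^TWH)_{kj}$, can be sketched with NumPy as below. This is a minimal illustration rather than the authors' implementation: the function names, the random initialization, the fixed iteration count, and the small constant `eps` that guards against division by zero are all assumptions.

```python
import numpy as np


def nmf_euclidean(X, q, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative X (n x m) into W (n x q) and H (q x m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, q)) + eps
    H = rng.random((q, m)) + eps
    for _ in range(n_iter):
        # w_ik <- w_ik * (X H^T)_ik / (W H H^T)_ik
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # h_kj <- h_kj * (W^T X)_kj / (W^T W H)_kj
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H


def squared_error(X, W, H):
    """Squared Euclidean loss ||X - WH||^2."""
    return float(np.sum((X - W @ H) ** 2))


# Hypothetical non-negative matrix of rank 2.
X = np.array([[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]], dtype=float)
W1, H1 = nmf_euclidean(X, q=2, n_iter=1)
W, H = nmf_euclidean(X, q=2, n_iter=500)
print(squared_error(X, W1, H1), squared_error(X, W, H))
```

Because every update multiplies an entry by a ratio of non-negative quantities, W and H stay non-negative without any explicit projection step.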

From the perspective of the KL divergence loss function, if the noise follows a Poisson distribution, the corresponding loss function can be expressed as follows: $D\left(X|WH\right) = \sum_{i,j}\left(X_{ij}\log\frac{X_{ij}}{\left(WH\right)_{ij}} - X_{ij} + \left(WH\right)_{ij}\right)$

In the above formula, $D\left(X|WH\right) \ge 0$, and under the condition $\sum_{i,j} X_{ij} = \sum_{i,j} \left(WH\right)_{ij} = 1$ it is the KL divergence. The corresponding minimization objective is: $\min D_{KL}\left(X|WH\right) = \sum_{i,j}\left(X_{ij}\log\frac{X_{ij}}{\left(WH\right)_{ij}} - X_{ij} + \left(WH\right)_{ij}\right)$

To minimize the objective function obtained above, gradient descent is applied as follows: $\begin{aligned} w_{ik} &\leftarrow w_{ik} - u_{ik}\frac{\partial D_{KL}\left(X|WH\right)}{\partial w_{ik}} \\ h_{kj} &\leftarrow h_{kj} - n_{kj}\frac{\partial D_{KL}\left(X|WH\right)}{\partial h_{kj}} \end{aligned}$

Gradient descent is transformed into a multiplicative algorithm by choosing the step sizes as follows: $u_{ik} = \frac{w_{ik}}{\sum_{j=1}^{m} h_{kj}},\quad n_{kj} = \frac{h_{kj}}{\sum_{i=1}^{n} w_{ik}}$

The resulting multiplicative update rule is: $\begin{aligned} w_{ik} &\leftarrow w_{ik}\frac{\sum_{j=1}^{m} h_{kj}\,x_{ij}/\left(WH\right)_{ij}}{\sum_{j=1}^{m} h_{kj}} \\ h_{kj} &\leftarrow h_{kj}\frac{\sum_{i=1}^{n} w_{ik}\,x_{ij}/\left(WH\right)_{ij}}{\sum_{i=1}^{n} w_{ik}} \end{aligned}$

These formulas define the iterative updates of the factor matrices W and H.
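For comparison with the Euclidean case, the KL-divergence variant can be sketched in the same way. The code below assumes the standard multiplicative updates for the generalized KL loss, $w_{ik} \leftarrow w_{ik}\sum_j h_{kj}x_{ij}/(WH)_{ij} \,/ \sum_j h_{kj}$ and the symmetric rule for $h_{kj}$; the function names, the initialization, the iteration count, and the guard constant `eps` are assumptions for illustration only.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-10):
    """Generalized KL divergence D(X | WH) summed over all entries."""
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))


def nmf_kl(X, q, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative X (n x m) into W (n x q) and H (q x m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, q)) + eps
    H = rng.random((q, m)) + eps
    for _ in range(n_iter):
        # w_ik <- w_ik * sum_j h_kj x_ij / (WH)_ij / sum_j h_kj
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        # h_kj <- h_kj * sum_i w_ik x_ij / (WH)_ij / sum_i w_ik
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H


# Hypothetical non-negative matrix of rank 2.
X = np.array([[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]], dtype=float)
W1, H1 = nmf_kl(X, q=2, n_iter=1)
W, H = nmf_kl(X, q=2, n_iter=500)
print(kl_divergence(X, W1 @ H1), kl_divergence(X, W @ H))
```

The only structural difference from the Euclidean sketch is that the data enter through the element-wise ratio `X / (W @ H)`, and the denominators reduce to row sums of H and column sums of W.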

Build an extraction model based on text themes and interactive relationships
Overall Framework

This paper first preprocesses the corporate social media data set to remove information of no value for extraction. It then combines text similarity and interactive relationships to build a weibo-relation matrix and a weibo-subject matrix. Finally, non-negative matrix decomposition is applied to these matrices to obtain the clustering of weibo posts on this social platform, together with the keywords that represent each theme. The research framework is shown in Figure 1 [1,2].

Data Preprocessing

Most of the text in corporate social media posts is unstructured natural language, which carries a large amount of linguistic information and is difficult for the model to use directly. In this paper, the data are preprocessed mainly to facilitate the subsequent calculation of text similarity and to evaluate the topical connection between microblog posts accurately. It is therefore necessary to clean the text, remove unnecessary words, perform word segmentation, and so on.[3,4]

Relationship Model

Suppose a microblog post is defined as a tuple $w = \langle P_w, F_w, T_w, C_w \rangle$. The elements of the tuple are as follows: $P_w$ is the set of users involved in the microblog; $F_w$ is the URL set of the original microblog post and of the forwarded microblog post; $T_w$ is the set of topic tags kept from the original post; and $C_w$ is the set of words of the microblog after text processing.
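One possible way to hold this tuple in code is sketched below. Everything in it is an assumption made for illustration: the class name `MicroblogPost`, the helper `parse_post`, the regular expressions (including the Weibo-style `#topic#` tag format and `@user` mentions), and the simple regex tokenizer that stands in for a real word segmenter.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Tuple

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#([^#\s]+)#")  # assumed "#topic#" tag format
MENTION_RE = re.compile(r"@([\w-]+)")    # assumed "@user" mention format


@dataclass(frozen=True)
class MicroblogPost:
    users: FrozenSet[str]     # P_w: author plus mentioned users
    urls: FrozenSet[str]      # F_w: URLs found in the post
    hashtags: FrozenSet[str]  # T_w: topic tags
    words: Tuple[str, ...]    # C_w: remaining words after cleaning


def parse_post(author: str, text: str) -> MicroblogPost:
    """Split raw post text into the four elements of the tuple."""
    urls = frozenset(URL_RE.findall(text))
    body = URL_RE.sub(" ", text)
    hashtags = frozenset(HASHTAG_RE.findall(body))
    body = HASHTAG_RE.sub(" ", body)
    mentions = set(MENTION_RE.findall(body))
    body = MENTION_RE.sub(" ", body)
    # Placeholder tokenizer; a word segmenter would be used for Chinese text.
    words = tuple(re.findall(r"\w+", body.lower()))
    return MicroblogPost(frozenset(mentions | {author}), urls, hashtags, words)


post = parse_post("alice", "New launch #product# with @bob http://t.cn/abc great news")
print(post)
```

Keeping the four elements as separate sets makes the intersections used in the relationship formulas direct set operations.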

The relationship between two microblogs $w_i$ and $w_j$ is denoted $R(w_i, w_j)$, and both text similarity and interaction are taken into account when it is computed. Four components are considered: explicit mentions $m(P_{wi}, P_{wj})$, forwarding $f(F_{wi}, F_{wj})$, topic tags $t(T_{wi}, T_{wj})$, and text similarity $\mathrm{sim}(C_{wi}, C_{wj})$. The value of the relationship lies between 0 and 1: 0 means that there is no relationship between the pair of microblogs, and the relationship becomes stronger as the value increases. The calculation formula is as follows: $R\left(w_i, w_j\right) = \begin{cases} 1, & f\left(F_{wi}, F_{wj}\right) > 0 \text{ or } t\left(T_{wi}, T_{wj}\right) > 0 \\ 0.8, & m\left(P_{wi}, P_{wj}\right) > 0 \\ \mathrm{sim}\left(C_{wi}, C_{wj}\right), & \text{otherwise} \end{cases}$

In the above formula, $m(P_{wi}, P_{wj})$ is the size of the intersection between the sets of users involved in the two microblog posts, that is, the users mentioned in the posts and the creators of the original posts. The formula is as follows: $m\left(P_{wi}, P_{wj}\right) = \left|P_{wi} \cap P_{wj}\right|$

The topic-tag interaction $t(T_{wi}, T_{wj})$ is calculated as follows: $t\left(T_{wi}, T_{wj}\right) = \left|T_{wi} \cap T_{wj}\right|$

The forwarding interaction is calculated as follows: $f\left(F_{wi}, F_{wj}\right) = \begin{cases} 1, & \left(F_{wi} = j\right) \text{ or } \left(i = F_{wj}\right) \text{ or } \left(F_{wi} = F_{wj}\right) \\ 0, & \text{otherwise} \end{cases}$

Text similarity is used to compare the content of weibo posts, and the formula is as follows: $\mathrm{sim}\left(C_{wi}, C_{wj}\right) = \begin{cases} \frac{1}{1 + e^{-\cos\left(C_{wi}, C_{wj}\right)}}, & \cos\left(C_{wi}, C_{wj}\right) > 0 \\ 0, & \text{otherwise} \end{cases}$ where $\cos\left(C_{wi}, C_{wj}\right) = \frac{C_{wi} \cdot C_{wj}}{\left\|C_{wi}\right\|\left\|C_{wj}\right\|}$
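The piecewise definition of $R(w_i, w_j)$ can be sketched in Python as follows. The dictionary layout of a post (`id`, `forwards`, `hashtags`, `users`, `words`) is hypothetical, forwarding is represented here by sets of post ids rather than URL sets, and two posts with empty forwarding sets are treated as not sharing a forward; these are assumptions of the example, not statements from the paper.

```python
import math
from typing import Dict


def cosine(c_i: Dict[str, float], c_j: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * c_j.get(term, 0.0) for term, w in c_i.items())
    norm_i = math.sqrt(sum(w * w for w in c_i.values()))
    norm_j = math.sqrt(sum(w * w for w in c_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def sim(c_i: Dict[str, float], c_j: Dict[str, float]) -> float:
    """Sigmoid of the cosine when it is positive, else 0."""
    cos = cosine(c_i, c_j)
    return 1.0 / (1.0 + math.exp(-cos)) if cos > 0 else 0.0


def relation(post_i: dict, post_j: dict) -> float:
    """R(w_i, w_j): 1 for forwarding or shared tags, 0.8 for shared users, else sim."""
    forwarded = bool(
        post_j["id"] in post_i["forwards"]
        or post_i["id"] in post_j["forwards"]
        or (post_i["forwards"] and post_i["forwards"] == post_j["forwards"])
    )
    if forwarded or post_i["hashtags"] & post_j["hashtags"]:
        return 1.0
    if post_i["users"] & post_j["users"]:
        return 0.8
    return sim(post_i["words"], post_j["words"])


def make_post(pid, forwards=(), hashtags=(), users=(), words=None):
    return {"id": pid, "forwards": set(forwards), "hashtags": set(hashtags),
            "users": set(users), "words": dict(words or {})}


p1 = make_post(1, hashtags={"launch"}, users={"alice"}, words={"new": 1.0, "phone": 1.0})
p2 = make_post(2, hashtags={"launch"}, users={"bob"}, words={"sale": 1.0})
p3 = make_post(3, forwards={1}, users={"carol"}, words={"wow": 1.0})
p4 = make_post(4, users={"alice"}, words={"lunch": 1.0})
p5 = make_post(5, users={"dave"}, words={"new": 1.0, "phone": 1.0})
p6 = make_post(6, users={"erin"}, words={"weather": 1.0})
print([relation(p1, p) for p in (p2, p3, p4, p5, p6)])
```

Since the cosine of non-negative term vectors is at most 1, the sigmoid branch returns values between 0.5 and about 0.73, so a purely textual relationship always ranks below a shared mention (0.8) in this scheme.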

After obtaining the above four relationship components, the relationship value $R(w_i, w_j)$ between microblog posts can be calculated, which provides the basis for the subsequent construction of the relationship matrix.

Topic Extraction

The first step is the cluster analysis of microblog posts. The relationship matrix A between microblogs is constructed first, and the formula is as follows: $A_{ij} = R\left(w_i, w_j\right)$

Non-negative matrix decomposition is used to factor it into two low-dimensional matrices, $W \in \mathbb{R}^{m \times k}$ and $Y \in \mathbb{R}^{k \times m}$, where m is the number of microblog posts and k is the number of topics.

The multiplicative update rules of the iteration are: $\begin{aligned} W &\leftarrow W \odot \frac{\left(A / \left(WY\right)\right) Y^T}{I\,Y^T} \\ Y &\leftarrow Y \odot \frac{W^T \left(A / \left(WY\right)\right)}{W^T I} \end{aligned}$ where $I$ is the all-ones matrix of the same size as $A$, and $\odot$, $A/(WY)$ and the fraction bars denote element-wise multiplication and division.

The second step is the cluster analysis of subject words. The microblog-subject matrix W and the microblog-subject-word matrix V are constructed first. The former comes from the clustering result of the microblog posts above, while the latter is computed with the TF-IDF method. The multiplicative update rule of the iteration is:[5] $H \leftarrow H \odot \frac{W^T \left(V / \left(WH\right)\right)}{W^T I}$ where $I$ is the all-ones matrix of the same size as $V$, and $\odot$, $V/(WH)$ and the fraction bar denote element-wise operations.

Combining the above, the process of microblog theme extraction is shown in the figure below, and the algorithm applied is listed in the table below:

Application algorithm

Algorithm 1 Topic derivation process
INPUT: number of topics k, weibo-weibo relationship matrix A, weibo-term matrix V
OUTPUT: weibo-topic matrix W and topic-term matrix H
initialize W, Y and H
NMF on A ≈ WY
repeat
    H ← f(V, W, H)
until V ≈ WH
return W, H
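A compact NumPy sketch of the topic derivation process is given below. It assumes that the update function f is the KL-type multiplicative rule, that a fixed number of iterations stands in for the "until V ≈ WH" stopping test, and that the matrices are small dense arrays; the function names, the toy matrices `A` and `V`, and these choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def update_left(A, W, Y, eps=1e-10):
    """Multiplicative update of the left factor W for A ~ W Y."""
    ones = np.ones_like(A)
    return W * ((A / (W @ Y + eps)) @ Y.T) / (ones @ Y.T + eps)


def update_right(A, W, Y, eps=1e-10):
    """Multiplicative update of the right factor Y for A ~ W Y, W held fixed."""
    ones = np.ones_like(A)
    return Y * (W.T @ (A / (W @ Y + eps))) / (W.T @ ones + eps)


def derive_topics(A, V, k, n_iter=200, seed=0):
    """Return weibo-topic matrix W and topic-term matrix H."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + 1e-3
    Y = rng.random((k, A.shape[1])) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    # NMF on A ~ W Y: cluster the posts from their pairwise relationships.
    for _ in range(n_iter):
        W = update_left(A, W, Y)
        Y = update_right(A, W, Y)
    # With W fixed, fit H so that V ~ W H (fixed iteration count as stop rule).
    for _ in range(n_iter):
        H = update_right(V, W, H)
    return W, H


# Hypothetical data: two groups of related posts and three terms.
A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
V = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]])
W, H = derive_topics(A, V, k=2)
labels = W.argmax(axis=1)  # topic index with the largest weight for each post
print(labels)
print(H.round(3))
```

Reading the largest entry in each row of W gives a cluster for each post, and the largest entries in each row of H give the terms that characterize the corresponding topic.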
Analysis of empirical results
Interactive Relationship

Based on the above research, the following hypotheses can be put forward for empirical analysis based on corporate social media:

H1: The social media orientation of enterprises will encourage boundary personnel to use social media more to conduct task-type interaction or relationship-type interaction with partners.

H2: The social media capability of enterprises will encourage boundary personnel to use social media more to conduct task-type interaction or relationship-type interaction with partners.

H3: The use of social media by boundary personnel of enterprises to conduct task-type interaction or relationship-type interaction with partners promotes the cooperative relationship between enterprises.

H4: Task-based interaction mediates the effect of the enterprise's social media orientation and social media capability on the cooperative relationship between enterprises.[6]

H5: Relational interaction mediates the effect of the enterprise's social media orientation and social media capability on the cooperative relationship between enterprises.

To verify the proposed hypotheses, survey data were collected from Chinese enterprises, and 550 complete questionnaires were finally obtained, a valid response rate of 91.2%. Exploratory factor analysis and confirmatory factor analysis were used to evaluate the reliability and validity of the scale. The square root of the average variance extracted of every variable was higher than its correlation coefficients with the other variables, which indicates that the scale has good discriminant validity. Discriminant validity was also examined with the chi-square difference test: for each pair of related constructs, a constrained model in which their correlation coefficient is fixed to 1 was compared with a model in which the correlation coefficient is freely estimated. The comparison shows that all chi-square differences are significant. For example, for social media orientation versus social media capability:[7]

$\Delta\chi^2(1) = 155.088$, $p < 0.001$

To test the hypotheses proposed above, this paper uses multivariate hierarchical regression to examine the relationships among the orientation, capability, interaction behavior and cooperative relationship of enterprise social media. The results are shown as follows:

Data analysis results

| Variable | TIB (Model 1) | RIB (Model 2) | COOP (Model 3) | COOP (Model 4) | COOP (Model 5) | COOP (Model 6) | COOP (Model 7) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Textile and apparel | −0.068 | −0.106″″ | 0.032 | 0.048 | −0.001 | 0.019 | 0.024 |
| Machinery | −0.056 | −0.058 | 0.008 | 0.007 | −0.011 | 0.005 | 0.002 |
| Medical apparatus and instruments | −0.012 | 0.006 | 0.044 | 0.033 | 0.041 | 0.044 | 0.039 |
| Electronic products | −0.018 | 0.010 | 0.049 | 0.035 | 0.008 | 0.013 | 0.006 |
| Food and beverage | 0.003 | −0.104″ | −0.061 | −0.013 | −0.037 | −0.038 | −0.014 |
| Software | −0.033 | −0.031 | 0.022 | 0.018 | 0.020 | 0.030 | 0.027 |
| Sales revenue | −0.187″ | −0.129″ | −0.194″ | −0.238″ | −0.230″ | −0.177″ | −0.201″ |
| Number of employees | 0.064 | 0.034 | 0.096″ | 0.119″ | 0.085″ | 0.066 | 0.077 |
| State-owned | 0.036 | 0.013 | 0.055 | 0.066 | 0.086 | 0.076″ | 0.084″ |
| Joint venture | 0.027 | −0.081″ | −0.078″ | −0.027 | −0.033 | −0.041 | −0.015 |
| Sole proprietorship | −0.068 | −0.058 | −0.002 | −0.012 | −0.020 | 0.000 | −0.006 |
| Social media orientation (SMO) | 0.473″ | 0.427″ | | | 0.476″ | 0.340″ | 0.378″ |
| Social media capability (SUC) | 0.199″ | 0.232″ | | | 0.224″″ | 0.167″ | 0.171″ |
| Task-based interaction behavior (TIB) | | | 0.531″ | | | 0.288″ | |
| Relational interaction behavior (RIB) | | | | 0.493″ | | | 0.231″ |
| Adjusted R² | 0.379 | 0.381 | 0.376 | 0.333 | 0.436 | 0.487 | 0.468 |
| VIF | 1.702 | 1.702 | 1.757 | 1.719 | 1.702 | 1.760 | 1.730 |
| F | 26.719″ | 26.972″ | 28.593″ | 23.867″ | 33.701″ | 38.221″ | 35.550 |

The table shows that hypotheses H1 and H2 correspond to Model 1 and Model 2. The data show that the social media orientation and capability of enterprises promote both kinds of interaction between boundary personnel and partners, which supports the two hypotheses. Hypothesis H3 corresponds to Model 3 and Model 4. The data show that both kinds of interaction behavior have a significant and positive influence on cooperation between enterprises, which supports the hypothesis. Hypotheses H4 and H5 are tested with the method of Baron and Kenny. Model 5 shows the positive influence of enterprise social media orientation and capability on cooperation between enterprises. Model 6 adds task-based interaction behavior to Model 5: the influence of orientation and capability remains significant, but the regression coefficients are lower than in Model 5, which indicates that task-based interaction partially mediates the relationship between enterprise social media and enterprise cooperation. Model 7 adds relational interaction behavior to Model 5: the influence again remains significant, but the regression coefficients are lower than in Model 5, which indicates that relational interaction also partially mediates the relationship between enterprise social media and enterprise cooperation.[8,9]
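The size of the coefficient reduction described above can be made concrete with a few lines of Python. The sketch assumes that the SMO and SUC coefficients 0.476 and 0.224 belong to Model 5, 0.340 and 0.167 to Model 6, and 0.378 and 0.171 to Model 7, as the regression results are read here; it is a descriptive calculation only and not a significance test of the mediation effect.

```python
# Regression coefficients of the two predictors on cooperation (COOP).
coef = {
    "SMO": {"model5": 0.476, "model6": 0.340, "model7": 0.378},
    "SUC": {"model5": 0.224, "model6": 0.167, "model7": 0.171},
}


def relative_drop(before: float, after: float) -> float:
    """Share of the original coefficient that disappears after adding a mediator."""
    return (before - after) / before


drops = {
    name: {
        "TIB": relative_drop(c["model5"], c["model6"]),  # task-based mediator
        "RIB": relative_drop(c["model5"], c["model7"]),  # relational mediator
    }
    for name, c in coef.items()
}

for name, d in drops.items():
    print(f"{name}: {d['TIB']:.1%} lower with TIB, {d['RIB']:.1%} lower with RIB")
```

Under this reading, the SMO coefficient falls by roughly 29% when task-based interaction is added and by roughly 21% when relational interaction is added, while it stays well above zero in both cases, which is the pattern associated with partial rather than full mediation.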

Case Analysis

After clarifying the interaction between enterprise social media and decision-making style, this paper analyzes a real case, using the microblogging open API and the Python Scrapy tool to collect data. A total of 291,816 records, including 30,599 forwarded microblogs, were obtained from the social media platform over two days. After the whole data set was obtained, the data were filtered, unnecessary characters and words were removed, and the Jieba word segmentation tool was used for word segmentation, so that forwarded posts and hashtag information could be saved separately.

The proposed method is compared with two common baseline methods for topic extraction, as follows:

The first baseline is the LDA model, which extends probabilistic latent semantic analysis by introducing Dirichlet distributions. This method represents all texts in a text space so that natural language can be modeled statistically, and it uses the "bag of words" idea: each document is described by a distribution over topics, and each topic in turn is described by a distribution over words.

The second baseline is the NMF algorithm. NMF solves a nonlinear minimization problem under non-negativity constraints during decomposition. After decomposition, the factor matrices represent the original matrix and are used to achieve clustering.[10]

A manually labeled topic data set is selected, and the clustering accuracy is measured with the following indicators to verify the effectiveness of the method:

The first indicator is purity. As one of the most common clustering evaluation measures, purity is defined as follows: $\mathrm{purity}\left(W, C\right) = \frac{1}{N}\sum_{i}\max_{j}\left|w_i \cap c_j\right|$

In the above formula, $W = \{w_1, w_2, \ldots, w_k\}$ is the set of clusters, where $w_k$ is the k-th cluster; $C = \{c_1, c_2, \ldots, c_j\}$ is the set of labeled classes of the microblogs, where $c_j$ is the j-th class; and N is the total number of microblogs. $\max_{j}\left|w_i \cap c_j\right|$ is the number of texts in the i-th cluster that belong to the class most represented in that cluster. A purity close to 0 indicates a poor clustering, while a purity of 1 indicates a completely correct clustering.
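The purity formula can be illustrated with a short Python function. The function name `purity`, the argument layout (one cluster id and one class label per post), and the sample data are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], class_labels: Sequence[Hashable]) -> float:
    """Fraction of items that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # For each cluster, count the items of its most frequent class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "c"]
print(purity(clusters, labels))
```

In the sample data each cluster contains two posts of its majority class out of three, so four of the six posts count as correctly grouped.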

The second indicator is normalized mutual information. Assuming that the joint distribution of two random variables X and Y is $p(x, y)$ and that the marginal distributions are $p(x)$ and $p(y)$, the mutual information is the relative entropy between the joint distribution $p(x, y)$ and the product distribution $p(x)\,p(y)$. The formula is as follows: $I\left(X, Y\right) = \sum_{x}\sum_{y} p\left(x, y\right)\log\frac{p\left(x, y\right)}{p\left(x\right)p\left(y\right)}$

Standard mutual information is also called normalized mutual information (NMI). The formula for evaluating clustering results is as follows, where $H(X)$ and $H(Y)$ are the entropies of X and Y: $\mathrm{NMI}\left(X, Y\right) = \frac{2\,I\left(X, Y\right)}{H\left(X\right) + H\left(Y\right)}$
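Both formulas can be computed directly from two label sequences, as in the following sketch. The function names and sample data are hypothetical, and returning 1 when both entropies are zero (a single cluster compared with a single class) is a convention chosen for this example.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X, Y) estimated from the joint counts of two labelings."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def nmi(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """NMI(X, Y) = 2 I(X, Y) / (H(X) + H(Y))."""
    denom = entropy(x) + entropy(y)
    # Assumed convention: two constant labelings agree perfectly.
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 1.0


predicted = [0, 0, 1, 1]
same_grouping = ["b", "b", "a", "a"]  # identical partition, different names
unrelated = [0, 1, 0, 1]              # independent of the predicted labels
print(nmi(predicted, same_grouping), nmi(predicted, unrelated))
```

Unlike purity, NMI does not depend on how clusters are named and does not automatically improve when the number of clusters grows, which is why the two indicators are usually reported together.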

The evaluation uses the manually labeled data set collected by the Practice Research Institute, from which the labels of six topics are selected. To verify the performance of the method, the number of topics is set to 6 as the experimental input of the three methods. Each of the three algorithms was run 20 times on the data set, and the parameters were tuned to obtain the best clustering result. For the non-negative matrix decomposition algorithm, the average density of the microblog relationship matrix and of the microblog keyword matrix is 32% and 0.08% respectively, which shows that the microblog keyword matrix is very sparse.

The purity results shown in the following figure indicate that, with the same number of topics and the same evaluation data for the three algorithms, the cluster purity obtained by the algorithm proposed in this paper is higher than that of the other two topic-extraction methods, which shows that this clustering method performs well.

The experiment was also evaluated with normalized mutual information (NMI). As shown in the figure below, the NMI results agree with the purity results: the two baseline methods perform relatively close to each other, while the algorithm proposed in this paper shows a clear improvement in NMI, which demonstrates the validity of the method.

The analysis of subject-word extraction covers not only the topic recognition results for the microblog data but also the keywords that express each topic concisely in the text data. The method in this paper also outperforms the two baseline methods in topic consistency, which shows that it has practical value for topic readability.

Conclusion

To sum up, in the era of big data, enterprise social media platforms have gradually become an important place for users to exchange information and gather ideas. Because these platforms contain a wealth of social relations, personal emotions, social hot spots and other content, analyzing and identifying this content can provide an effective basis for the economic investment of enterprises. It is therefore of practical significance to ensure the accuracy and effectiveness of topic keyword extraction.

References

[1] Tongbin Zhang, Jinkai Li, Liyan Cheng. Research on the Internal relationship between economic structure, growth mode and environmental pollution: An empirical Analysis based on time-varying parameter Vector autoregressive Model [J]. China Environmental Science, 2016(7): 11.

[2] Dongkai Zhang, Jiayin Qi. Journal of Beijing University of Posts and Telecommunications: Social Sciences, 2016(5): 11.

[3] Hongmiao Zhu, Xin Yan, Zhen Jin, et al. Journal of Systems Engineering, 35(2): 11. (in Chinese)

[4] Yitong Han. Service Science and Management, 2018, 7(5): 107–115. https://doi.org/10.12677/SSEM.2018.75014

[5] Bao Dai, Aiwen Deng. Employer brand communication based on social media: Research Status and Prospects [C]// Proceedings of the 13th China Management Annual Conference. 2018.

[6] Xiaofang Tan. Modern Enterprise Culture, 2015, No. 361(12): 16–17.

[7] Zhuojun Li, Deluo Huang. Corporate social media strategy in Hong Kong [J]. 2021(2016-10): 24–26.

[8] Mundewadi, R. A. and Kumbinarasaiah, S. "Numerical Solution of Abel's Integral Equations using Hermite Wavelet." Applied Mathematics and Nonlinear Sciences, vol. 4, no. 2, 2019, pp. 395–406. https://doi.org/10.2478/AMNS.2019.2.00037

[9] Sagna, Yaya. "Multidimensional BSDE with Poisson jumps of Osgood type." Applied Mathematics and Nonlinear Sciences, vol. 4, no. 2, 2019, pp. 387–394. https://doi.org/10.2478/AMNS.2019.2.00034

[10] Shirakol, Shailaja, Kalyanshetti, Manjula and Hosamani, Sunilkumar M. "QSPR Analysis of certain Distance Based Topological Indices." Applied Mathematics and Nonlinear Sciences, vol. 4, no. 2, 2019, pp. 371–386. https://doi.org/10.2478/AMNS.2019.2.00032

[11] White G, Ghosh S K. A stochastic neighborhood conditional autoregressive model for spatial data [J]. Computational Statistics & Data Analysis, 2009, 53. https://doi.org/10.1016/j.csda.2008.08.010
