A Study on Sentiment Visual Analysis of Educational Public Opinion Based on Online Big Data

Big data technology brings great convenience to people’s daily life and work. As the current development of Chinese society shows, the Internet has been popularised throughout the country. WeChat and microblogging have become an important part of Internet users’ online lives, and many users express their views and opinions on social networking sites and forums [1–4]. Eighty per cent of Internet users discuss news and hot events that have just occurred in society, which indicates to a certain extent that the Internet has become a gathering place for people’s thoughts and culture, as well as a channel for disseminating social opinion [5–8].

Education, as the cornerstone of social development, is no exception. Educational public opinion, that is, the analysis and study of public opinion in the field of education, is of great significance to stakeholders such as schools, the education industry and parents [9–11]. Studying the sentiment of online educational public opinion is of great value for visualising people’s attitudes and gaining a deeper understanding of the current state of education. Data visualisation technology maps abstract, difficult-to-understand data into visible forms such as symbols, shapes and colours through graphics and images, which makes it a powerful means of data analysis in the era of online big data [12–15]. However, most users who need educational opinion analysis have neither domain knowledge of education nor experience in data analysis [16–18]. Designing effective visualisation forms that show the characteristics of online education sentiment and help such users analyse it effectively is therefore a challenge for current research on online public opinion sentiment [19–21].

In this paper, web crawlers are used to collect Internet resources related to educational public opinion from web platforms, and the resources are preprocessed to extract emotion theme keywords by calculating keyword weights. The attention mechanism is then applied to calculate the attention value of comment statements and select the key comment data on educational public opinion. The encoded text information is input into the feature extraction layer, where the dual channels of the convolutional neural network model extract the features of word vectors and mixed word vectors, and a softmax classifier is used to identify and classify the emotions. Subsequently, a sentiment capsule model is constructed to analyse the sentiment of educational public opinion based on online big data, and a sentiment visualisation system is built on this model to show sentiment changes in educational public opinion through opinion garden charts and line graphs. In addition, this study verifies the accuracy of the sentiment analysis method, statistically analyses education public opinion events on the network, and selects two typical cases for sentiment visualisation analysis.

2

Methods of analysing the sentiment of educational public opinion

2.1

Acquisition and processing of network data

2.1.1

Web crawler acquisition programme

A theme web crawler [22] crawls pages within a restricted domain defined by a theme established in advance. It is the foundation and core of the whole public opinion monitoring system: following the relevant crawling strategy, it collects as many Internet resources related to the theme as possible, improving the breadth and depth of the resources on that theme. The theme crawler module of the Internet education public opinion system is based on the framework of a general-purpose web crawler, to which keyword extraction code for the content of web documents is added. Only the title of the web document and the title of the page body are extracted, i.e., the content inside the <title></title> and <h1></h1> tags. The extracted keywords are compared with the professional thesaurus of education and public opinion to filter content, and a URL filter performs an initial filtering of web page link addresses.

Initial URLs are set for the theme crawler (websites related to the education field, major mainstream news sites, forum sites, etc.), and the link information in the web documents at these URLs is analysed continuously. URL de-duplication and filtering are applied to remove duplicated URLs and URLs with low relevance, and the professional thesaurus of education public opinion is used to filter out web documents with low relevance to the theme. This ensures the purity of the extracted web documents and thus achieves the purpose of crawling web pages on the theme of educational public opinion information.
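The filtering steps described above can be illustrated with a minimal sketch. It is not the authors' crawler: the thesaurus entries, the function names and the whitespace tokenisation are assumptions made for illustration (Chinese text would first need word segmentation), and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Hypothetical thesaurus entries; the paper's professional thesaurus is not given.
EDUCATION_THESAURUS = {"education", "school", "exam", "teacher", "enrollment"}


class TitleH1Parser(HTMLParser):
    """Collect only the text inside <title> and <h1> tags."""

    def __init__(self):
        super().__init__()
        self._inside = None  # tag currently being read, if any
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside and data.strip():
            self.texts.append(data.strip())


def is_on_topic(html, thesaurus=EDUCATION_THESAURUS):
    """Return True if any word from <title>/<h1> appears in the thesaurus."""
    parser = TitleH1Parser()
    parser.feed(html)
    words = {w.strip(".,!?:;").lower() for text in parser.texts for w in text.split()}
    return bool(words & thesaurus)


def dedupe_urls(urls):
    """Drop duplicate URLs, ignoring fragments and trailing slashes, in first-seen order."""
    seen, unique = set(), []
    for url in urls:
        key = urldefrag(url).url.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

In this sketch a page whose body mentions education but whose title and first-level heading do not would be discarded, which mirrors the restriction to the <title> and <h1> tags.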

2.1.2

Emotional Theme Word Extraction Methods

1)

Pre-processing. Preprocessing mainly converts multi-format text, applies a Chinese lexical analysis system to segment the text into words, removes stop words, and so on.

2)

Based on the emotional theme word list, identify the theme words and perform word frequency statistics.

3)

Calculate the weight of keywords and sort them to extract emotional subject words.

The TF-IDF algorithm is used to calculate the weight of the subject words. TF (term frequency) is the number of times a word appears in the content of a document. IDF (inverse document frequency) reflects how widely the word appears across the entire document library: the fewer documents contain the word, the greater its IDF value, and the better the word characterises the content of this document and distinguishes it from other documents. The formula is: $W_{ik} = \frac{(\log C_{ik} + 1) \cdot \log (N / n_{k})}{\sqrt{\sum_{k = 1}^{t} {[(\log C_{ik} + 1) \cdot \log (N / n_{k})]}^{2}}}$ (1)

This formula calculates the importance of a word k in a document D_i; the calculated value is the weight of the word relative to the document. In practice, the TF value is the number of occurrences of the word divided by the length of the document: $TF_{ik} = \frac{C_{ik}}{\sum_{j = 1}^{t} C_{ij}}$ (2)

Where C_ik denotes the number of times the word k appears in the document D_i. IDF denotes the importance of the word in the document collection: $IDF_{k} = \log \frac{N}{n_{k}}$ (3)

Where N denotes the total number of documents in the document collection and n_k denotes the number of documents in which the word k appears. Given TF and IDF, the weight of word k in document D_i can be calculated. The TF-IDF weight W_ik is in essence the product of the TF and IDF values: W_ik = TF_ik × IDF_k. This weight is the final result, and its magnitude directly indicates how well the word reflects the theme of the document.
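As a rough illustration of Eqs. (2) and (3) and of the ranking in step 3), the following sketch computes W_ik = TF_ik × IDF_k for already segmented documents. The function names and the toy documents in the usage note are assumptions, and the normalised variant of Eq. (1) is not implemented here.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document, with W_ik = TF_ik * IDF_k."""
    n_docs = len(documents)
    # n_k: number of documents that contain word k
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)          # C_ik
        total = sum(counts.values())   # sum over j of C_ij
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def top_keywords(weight_map, k=3):
    """Sort words by descending weight and keep the first k."""
    return sorted(weight_map, key=weight_map.get, reverse=True)[:k]
```

For three toy documents `[["exam", "exam", "school"], ["school", "teacher"], ["school", "policy"]]`, the word "school" occurs in every document, so its IDF is log(3/3) = 0 and it drops out of the ranking, while "exam" receives the highest weight in the first document.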

2.2

Sentiment analysis model construction

2.2.1

Attention layer

The attention mechanism [23] can selectively focus on the important information in the text of educational opinion comments. In this paper, multi-head attention is used to capture the key information of a text sequence from multiple subspaces. Consider a given text S = {w₁,w₂,…,w_L} of length L, where w_i is the i-th word in the sentence S and each word is mapped into a D-dimensional vector, i.e. S ∈ R^L×D.

Firstly, the word vector matrix S is linearly transformed and split into three matrices Q∈R^L×D, K∈R^L×D, V∈R^L×D of the same dimension, which are mapped into several different subspaces: $\begin{matrix} [Q_{1}, \dots, Q_{h}] = [Q W^{Q_{1}}, \dots, Q W^{Q_{h}}] \\ [K_{1}, \dots, K_{h}] = [K W^{K_{1}}, \dots, K W^{K_{h}}] \\ [V_{1}, \dots, V_{h}] = [V W^{V_{1}}, \dots, V W^{V_{h}}] \end{matrix}$ (4)

Where Q_i, K_i, V_i are the query, key, and value matrices of each subspace; W^Qi, W^Ki, W^Vi are the transformation matrices; and h is the number of heads.

Then, the attention value of each subspace is calculated in parallel: $\mathrm{head}_{i} = \mathrm{softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{d}}\right) V_{i}$ (5)

Where head_i is the attention value of the i-th subspace, and dividing by $\sqrt{d}$ rescales the attention scores towards a standard normal distribution to prevent the gradient from vanishing during backpropagation.

Subsequently, the attention values of each subspace are spliced and linearly transformed: $\mathrm{Multi\_head} = \mathrm{concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{M}$ (6)

Where W^M is the transformation matrix, Multi_head is the attention value for the whole sentence, and concat is the splicing operation.

Finally, Multi_head is combined with S through a residual connection to obtain the sentence matrix: $X = \mathrm{residual\_connect}(S, \mathrm{Multi\_head})$ (7)

Where X∈R^L×D is the output of the multi-head attention and residual_connect is the residual operation.
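Eqs. (4) to (7) can be traced in a small NumPy sketch. Several points are assumptions rather than details from the paper: the projection matrices are random stand-ins for learned parameters, the per-head dimension is taken as d = D / h, and the residual connection is assumed to be a plain addition.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_attention(S, h, rng):
    """Self-attention over S (L x D) with h heads and an additive residual connection."""
    L, D = S.shape
    d = D // h  # per-head dimension (assumes h divides D)
    # Assumption: Q, K and V are plain linear projections of S.
    W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = S @ W_q, S @ W_k, S @ W_v
    heads = []
    for _ in range(h):
        # Random per-head projections stand in for the learned W^{Q_i}, W^{K_i}, W^{V_i}.
        Wq_i, Wk_i, Wv_i = (rng.standard_normal((D, d)) / np.sqrt(D) for _ in range(3))
        Q_i, K_i, V_i = Q @ Wq_i, K @ Wk_i, V @ Wv_i          # Eq. (4)
        scores = softmax(Q_i @ K_i.T / np.sqrt(d))            # Eq. (5), L x L
        heads.append(scores @ V_i)                            # L x d
    W_m = rng.standard_normal((h * d, D)) / np.sqrt(h * d)
    multi_head = np.concatenate(heads, axis=-1) @ W_m         # Eq. (6)
    return S + multi_head                                     # Eq. (7), assumed additive
```

The output keeps the L×D shape of the input, so it can be passed on to the feature extraction layer without reshaping.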

2.2.2

Emotion feature extraction layer

1)

Feature extraction layer

The encoded sentiment text representation is fed into the feature extraction layer, which uses dual channels to extract features from word vectors and mixed word vectors, respectively. The word vector channel uses convolution kernels of four scales (2, 3, 4 and 5) to extract local N-gram features [24] from the text representation, with 256 convolution kernels at each scale. The convolution operation is: $Vc = \mathrm{Conv}(C)$ (8)

The global maximum pooling operation [25] retains the largest component in the convolutional computation window at each scale, extracting features again while reducing the dimensionality and thus the amount of computation. The calculation is: $vc = \mathrm{GlobalMaxPooling}(Vc)$ (9)

When the convolution kernels are randomly initialised, the initial weights in each layer are small, and as the convolution depth increases these small weights hinder the propagation of the gradient, so that the network converges only after many iterations. To solve this problem, a residual connection is introduced: a shortcut is added every two layers between the convolution layer and the pooling layer to form a residual block. This structure also effectively alleviates vanishing gradients. The feature extraction of the residual structure is: $Vw = \mathrm{Conv}(W) + W$ (10)

The features extracted by deep convolution are fed into the maximum pooling layer, which scans for the maximum value in each region and suppresses the mean drift caused by estimation error. The calculation is: $vw = \mathrm{MaxPooling}(Vw)$ (11)

In sentiment analysis, Chinese text has many words that prominently indicate sentiment, and certain sentiment words will affect the sentiment direction of the whole sentence. Adding the attention mechanism selects the key sentiment words from the many word and character vectors, and increases the weight percentage assigned to the key points in the feature map to achieve the purpose of improving the performance of sentiment analysis.

The attention distribution is calculated by a dot product between the input vector and the target vector, as follows: $ac_{i} = \frac{\exp (vc_{i})}{\sum_{j = 1}^{n} \exp (vc_{j})}$ (12), $aw_{i} = \frac{\exp (vw_{i})}{\sum_{j = 1}^{n} \exp (vw_{j})}$ (13)

Where vc_i∈vc and vw_i∈vw. The softmax function maps the weights onto a fixed interval, and the vectors are then weighted by the attention weights and summed, as shown below: $V_{ac} = \sum_{i = 1}^{n} ac_{i} \cdot Vc$ (14), $V_{aw} = \sum_{i = 1}^{n} aw_{i} \cdot Vw$ (15)
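The word vector channel (Eqs. (8), (9), (12) and (14)) can be sketched in NumPy as follows. This is an illustrative simplification: one kernel per scale is used instead of 256, kernel values are supplied by the caller, and Eq. (14) is read as a weighted sum over the pooled entries, which is an interpretation rather than something the paper states explicitly.

```python
import numpy as np


def conv1d(X, kernel):
    """Valid 1-D convolution of X (L x D) with one kernel (k x D); returns L - k + 1 values."""
    k = kernel.shape[0]
    return np.array([np.sum(X[t:t + k] * kernel) for t in range(X.shape[0] - k + 1)])


def channel_features(X, kernels):
    """Eqs. (8)-(9): convolve with each kernel, then keep the global maximum."""
    return np.array([conv1d(X, kern).max() for kern in kernels])


def attention_pool(v):
    """Eqs. (12) and (14): softmax weights over v, then a weighted sum of its entries."""
    weights = np.exp(v - v.max())  # subtracting the maximum keeps exp() stable
    weights /= weights.sum()
    return weights, float(weights @ v)
```

Because the weights are non-negative and sum to one, the pooled value always lies between the smallest and largest pooled feature, with larger features contributing more.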

2)

Output layer

After the two channels have extracted features from text information of different granularity, the feature vector V_ac based on word representation and the feature vector V_aw based on mixed word representation are obtained. The two vectors are spliced, and the fused feature vector v is fed into the fully connected network. The softmax classifier then gives the probability p of the text data for every class, and the class with the highest probability is the final sentiment category. p is calculated as follows: $p = \mathrm{softmax}(W_{f} [V_{ac} \oplus V_{aw}] + b_{f})$ (16), $y_{i} = \frac{\exp (p_{i})}{\sum_{j = 1}^{m} \exp (p_{j})}$ (17)

Where W_f is the weight transformation matrix of the fully connected layer, b_f is the bias vector of the fully connected layer, ⊕ denotes the concatenation operation, and m is the number of classification labels.
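The output layer can be sketched as a concatenation followed by a dense layer and a softmax. The sketch applies the softmax once, which is an assumption about how Eqs. (16) and (17) are meant to combine, and the weights in the usage note are made-up values.

```python
import numpy as np


def classify(v_ac, v_aw, W_f, b_f):
    """Concatenate both channel vectors, apply the dense layer, return (probabilities, label)."""
    fused = np.concatenate([v_ac, v_aw])   # V_ac ⊕ V_aw
    logits = W_f @ fused + b_f
    exp = np.exp(logits - logits.max())    # stable softmax
    probs = exp / exp.sum()
    return probs, int(np.argmax(probs))
```

With three classes and a four-dimensional fused vector, W_f has shape 3×4 and b_f has length 3; the returned label is the index of the class with the highest probability.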

2.2.3

Emotional capsule construction

An emotion capsule consists of a representation module, a probability module and a reconstruction module. The representation module uses the attention mechanism to construct the capsule feature representation v_c,i. The probability module uses the sigmoid activation function [26] to predict the capsule activation probability P_i. The reconstruction module multiplies P_i and v_c,i to obtain the reconstructed feature representation r_s,i of the capsule.

The representation module combines the spliced feature vectors H with the attention mechanism to construct the internal emotional feature representation of the capsule. The attention mechanism is calculated as follows: $u_{i, t} = \tanh (W_{w} H + b_{w})$ (18), $\alpha_{i, t} = \frac{\exp (u_{i, t}^{T} u_{w})}{\sum_{t} \exp (u_{i, t}^{T} u_{w})}$ (19), $v_{c, i} = \sum_{t} \alpha_{i, t} H$ (20)

Where H is the spliced text feature representation and u_i,t is obtained as an implicit representation by inputting H to the fully connected layer. The importance of words is determined by calculating the similarity between u_i,t and a randomly initialised context vector u_w and normalised using the softmax function to obtain the attention weights of the words in the sentence α_i,t. Based on the weight matrix, the vectors H are weighted and summed to obtain the output of the attention mechanism v_c,i∈R^2d. W_w and u_w are the weight matrices, and b_w is the bias value, which are all learnt from the training process. The attention mechanism generates higher level deep features v_c,i to obtain key semantic sentiment information.

The probability module calculates the activation probability of the capsule from the semantic features v_c,i: $P_{i} = \sigma (W_{P, i} v_{c, i} + b_{P, i})$ (21)

Where P_i is the activation probability of the i-th capsule, W_P,i and b_P,i are the weight matrix and bias matrix respectively, and σ is the sigmoid activation function.

The reconstruction module multiplies the semantic feature v_c,i by the probability P_i to obtain the reconstructed semantic feature representation r_s,i∈R^2d: $r_{s, i} = P_{i} v_{c, i}$ (22)

The ultimate goal of model training is, first, to maximise the activation probability of the capsule matching the text sentiment while minimising the error between the reconstruction vectors and the text instance vectors, and second, to minimise the activation probability of the other capsules while maximising the error between their reconstruction vectors and the text instance vectors. Therefore, the hinge loss function is applied as shown in Eqs. (23) and (24): $J (\theta) = \sum \max (0, 1 + \sum_{i = 1}^{N} y_{i} P_{i})$ (23), $U (\theta) = \sum \max (0, 1 + \sum_{i = 1}^{N} y_{i} V_{s} r_{s, i})$ (24)

Where y_i is the sentiment category label corresponding to the text. The final loss function is: $L (\theta) = J (\theta) + U (\theta)$ (25)
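A single capsule and the probability term of the loss can be sketched as below. The parameter shapes, the row-vector convention `H @ W_w`, and the label encoding (y_i = -1 for the capsule that matches the text and +1 for the others, so that the plus sign in Eq. (23) rewards a high probability for the matching capsule) are assumptions; the paper does not spell them out.

```python
import numpy as np


def capsule_forward(H, W_w, b_w, u_w, W_p, b_p):
    """One capsule: attention over H (T x 2d), activation probability, reconstruction."""
    U = np.tanh(H @ W_w + b_w)                      # Eq. (18), T x a
    scores = U @ u_w                                # similarity with the context vector
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                            # Eq. (19)
    v_c = alpha @ H                                 # Eq. (20), length 2d
    p = 1.0 / (1.0 + np.exp(-(W_p @ v_c + b_p)))    # Eq. (21), scalar
    r_s = p * v_c                                   # Eq. (22)
    return v_c, float(p), r_s


def probability_hinge(y, p):
    """Eq. (23) for one sample: max(0, 1 + sum_i y_i * P_i)."""
    return max(0.0, 1.0 + float(np.dot(y, p)))
```

Under the assumed encoding, a sample whose matching capsule fires with probability 1 while the other stays at 0 incurs no loss, whereas the reverse situation gives a loss of 2.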

3

Emotional Visualisation System for Educational Public Opinion Based on Web Data

This system uses the Flask-SQLAlchemy database tool, which is based on an ORM framework, and adopts a B/S (browser/server) structure. The B/S structure is accessed through the browser and supports multiple scripting languages, so it can provide powerful functions without complex professional software. It saves development costs and is also user-friendly. The education opinion and emotion visualisation system uses mature technology frameworks; since these technologies have been briefly introduced above, they are not described in detail here. The overall structural design of the system is shown in Figure 1.

The web-based education opinion visualisation system designed and implemented in this paper adopts a three-layer structure consisting of a presentation layer, a logic layer, and a data layer. The layers are connected through interfaces to facilitate future iterative development and make the system portable.

The presentation layer relies on the browser interface, which is the interface between the user and the visualisation system. The page displays the development of hot topics and public opinion analyses in the form of charts and graphs, and users can bookmark topics according to their own needs in order to follow the trend of public opinion distribution. When the user submits a form, the presentation layer sends the request to the logic layer as form data.

The logic layer relies on the server side to implement the business logic of specific functions, for example obtaining the web topic information uploaded by the web backend management system and the data for analysing web topics, as well as other code that handles requests for specific functions. The logic layer first responds to the request sent by the presentation layer, then carries out the business processing of the specific function while sending commands to the data layer, and finally feeds the processing results back to the presentation layer.

The data layer is the code that accesses, stores, and manages the database. It responds to commands sent by the logic layer and operates on the database. Finally, the results of sentiment analysis are visualised and displayed to the user, mainly as opinion garden charts and line graphs.
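The division of responsibilities between the logic layer and the data layer can be sketched as follows. To keep the example self-contained, the standard-library `sqlite3` module is swapped in for Flask-SQLAlchemy, and the table, column, class and method names are hypothetical; the sketch illustrates the layering only, not the authors' implementation.

```python
import sqlite3


class DataLayer:
    """Data layer: owns the database connection and runs queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE sentiment (topic TEXT, day INTEGER, emotion TEXT, intensity REAL)"
        )

    def add(self, topic, day, emotion, intensity):
        self.conn.execute(
            "INSERT INTO sentiment VALUES (?, ?, ?, ?)", (topic, day, emotion, intensity)
        )

    def series(self, topic, emotion):
        rows = self.conn.execute(
            "SELECT day, intensity FROM sentiment WHERE topic = ? AND emotion = ? ORDER BY day",
            (topic, emotion),
        )
        return rows.fetchall()


class LogicLayer:
    """Logic layer: turns a presentation-layer request into chart-ready data."""

    def __init__(self, data_layer):
        self.data = data_layer

    def line_chart(self, topic, emotion):
        points = self.data.series(topic, emotion)
        return {
            "topic": topic,
            "emotion": emotion,
            "x": [day for day, _ in points],
            "y": [value for _, value in points],
        }
```

In a browser-based deployment, the dictionary returned by `line_chart` would typically be serialised to JSON and rendered by the presentation layer as a line graph.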

4

Application Analysis of the Sentiment Visualisation System for Educational Public Opinion

4.1

Validation of the accuracy of the sentiment analysis method

4.1.1

Evaluation indicators

To evaluate the performance of the aspect-level sentiment analysis algorithms from different perspectives, this paper uses accuracy and macro-averaged F1 as evaluation metrics for the proposed and compared methods. Accuracy reflects the percentage of correctly predicted samples among all samples and is calculated as: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (26)

Where TP and TN are the numbers of correctly predicted positive and negative samples respectively, while FP and FN are the numbers of samples incorrectly predicted as positive and as negative respectively.

The sample distribution of each aspect-level sentiment analysis dataset is very uneven, and accuracy alone cannot summarise the classification performance of an algorithm, so this paper also uses F1 values, of which there are two types, Macro-F1 and Micro-F1. F1 is the harmonic mean of precision and recall: $F1 = \frac{2 P R}{P + R}$ (27)

Where P and R represent precision and recall, respectively, and are calculated as: $P = \frac{TP}{TP + FP}$ (28), $R = \frac{TP}{TP + FN}$ (29)

Macro-F1 is obtained by directly averaging the F1 values of the individual classes, whereas Micro-F1 is calculated from the overall TP, FP and FN values. In multi-class classification, every misclassified sample counts once as a false positive for the predicted class and once as a false negative for the true class, so the overall counts satisfy FP = FN = F and F + TP = ALL. Therefore: $F1 = \frac{2 P R}{P + R} = \frac{2 TP}{2 TP + FP + FN} = \frac{TP}{TP + F} = \mathrm{Accuracy}$ (30)

Since the value of Micro-F1 in multiclassification is equal to the accuracy, only two evaluation metrics, accuracy and Macro-F1, are used in this paper.
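The relationship between the three metrics can be checked numerically with a short, dependency-free sketch. The function names and the toy labels in the usage note are illustrative only.

```python
def per_class_counts(y_true, y_pred, label):
    """Return (TP, FP, FN) for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred):
    """Return (accuracy, Macro-F1, Micro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = [per_class_counts(y_true, y_pred, label) for label in labels]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(f1_from_counts(*c) for c in counts) / len(labels)
    micro_f1 = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return accuracy, macro_f1, micro_f1
```

For the toy labels `["pos", "pos", "neg", "neu", "neg", "neu"]` with predictions `["pos", "neg", "neg", "neu", "neu", "neu"]`, accuracy and Micro-F1 are both 4/6, while Macro-F1 is the mean of the per-class values 0.5, 0.8 and 2/3.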

4.1.2

Results of accuracy analyses

To demonstrate the effectiveness of the proposed sentiment analysis method for educational public opinion, it is compared with the Feature-SVM, RAM, IAN, MemNet, MCTNN, CNN-ASP, GCAE, ReMemNN, SK-GCN, and BERT-SPC methods on the Restaurant, Laptop, and Twitter datasets. The experimental results are shown in Table 1, where “-” indicates that the original article for that method provides no data, and “Acc” and “F1” denote accuracy and Macro-F1, respectively. The traditional machine learning method, Feature-SVM, uses manually crafted features. Its accuracy (78.11%, 73.93% and 66.07%) is comparable to or higher than that of several of the attention-based LSTM methods. RAM, IAN, MemNet, and ReMemNN, which use LSTM with the attention mechanism as the main body of their structure, ignore the importance of local information in the aspect-level sentiment analysis task and therefore perform weakly on all three datasets (Acc: 63.21%-80.27%, F1: 62.51%-73.15%). The SK-GCN method uses a graph to represent syntactic tree information, so that the relationship between words that are far apart in a sentence is not weakened by the distance between them in the input sequence.

Table 1.

Comparison of sentiment analysis results of different methods (%)

Method	Restaurant Acc	Restaurant F1	Laptop Acc	Laptop F1	Twitter Acc	Twitter F1
Feature-SVM	78.11	—	73.93	—	66.07	65.69
RAM	76.41	65.9	72.77	66.57	66	65.49
IAN	77.5	66.29	73.37	66.74	63.21	62.51
MemNet	75.79	66.42	71.16	72.4	66.43	67.37
CNN-ASP	84.52	70.99	76.09	78.17	76.24	74.39
GCAE	86.48	71.3	77.48	78.42	77.07	74.86
ReMemNN	80.27	66.66	73.94	73.15	69.41	67.76
SK-GCN	84.3	70.94	75.8	77.84	74.65	74.14
MCTNN	84.2	69.19	75.06	75.04	73.47	72.41
BERT-SPC	83.55	68.83	74.65	73.16	71.55	70.65
This article	88.62	72.16	77.88	79.78	78.91	76.29

In contrast, GCAE, which uses CNNs as its main body, focuses more on local information and introduces fine-grained information from aspect-level sentiment analysis into sentiment classification to improve the model. The sentiment analysis model proposed in this paper has higher accuracy and F1 than the CNN-ASP and GCAE models, which are also based on CNNs, and achieves the highest accuracy (88.62%, 77.88%, and 78.91%) and F1 values (72.16%, 79.78%, and 76.29%) on the Restaurant, Laptop, and Twitter datasets respectively. This is because the multi-angle convolutional transformation in the proposed method acquires local features more efficiently, which strengthens the classification performance of the model.

4.2

Analysis of the overall sentiment of public opinion in education

4.2.1

Trends in Educational Public Opinion Concerns

In this paper, the proposed crawling method for topics related to education public opinion is applied to microblogs (Weibo), online media, news websites and clients for the period 2016-2023, and the results of the analysis of attention to education public opinion are shown in Table 2. A total of 101,263 topics related to education public opinion were retrieved. Based on the text of media reports, topics reporting the same news were grouped into one education opinion event, giving a total of 625 events. Media participation is the number of times media took part in reporting in each year, counting the same media outlet reporting the same event multiple times, and media coverage refers to the number of articles the media published about the events. Since 2016, online education public opinion events have shown an upward trend, and the increase in 2021 was particularly large (media coverage rose from 11,245 to 15,472 reports). Citizens are increasingly inclined to discuss education on the Internet. The college entrance examination subjects in 2016 differed from those of previous years, and the number of media reports that year reached 13,505. Excluding the influence of this event, the number of media reports on education public opinion has increased year by year since 2017.

Table 2.

Online education public opinion trend

Year	Number of education public opinion events	Media participation	Media coverage
2016	34	986	13505
2017	47	1062	8049
2018	78	2356	9256
2019	78	2605	9987
2020	89	3021	11245
2021	95	3340	15472
2022	101	3706	16321
2023	103	3908	17428
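The totals and trends quoted for Table 2 can be re-derived with a few lines of Python. The figures are copied from the table; the variable and function names are illustrative.

```python
# Values copied from Table 2: year -> (events, media participation, media coverage)
TABLE_2 = {
    2016: (34, 986, 13505), 2017: (47, 1062, 8049), 2018: (78, 2356, 9256),
    2019: (78, 2605, 9987), 2020: (89, 3021, 11245), 2021: (95, 3340, 15472),
    2022: (101, 3706, 16321), 2023: (103, 3908, 17428),
}


def total_events(table):
    """Sum the yearly number of education public opinion events."""
    return sum(events for events, _, _ in table.values())


def coverage_change(table):
    """Year-on-year change in media coverage, keyed by the later year."""
    years = sorted(table)
    return {year: table[year][2] - table[prev][2] for prev, year in zip(years, years[1:])}
```

Summing the event column gives 625, matching the total stated in the text, and the largest single-year rise in media coverage falls in 2021 (an increase of 4,227 reports over 2020).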

4.2.2

Results of the Overall Sentiment Visual Analysis of Public Opinion in Education

The opinion garden chart shows the emotional tendency of education events throughout the year, and the results of the visual analysis are shown in Figure 2. Pink circles represent flowers, and the coloured dots within the flowers represent different emotions, with red, blue, green and orange representing “angry”, “happy”, “fear” and “sadness”, respectively. There are three main types of education public opinion events in the period 2016-2023: campus violence, college entrance examination corruption and rural education. It can be clearly observed that the more events there are, the more emotional clusters there are. This is especially true of the theme “college entrance examination corruption”, which has three emotional clusters, and Sina.com reports the most articles in each of these clusters. The two main emotions in the campus violence theme were “fear” (25%) and “anger” (70%). Since the media must be objective and neutral when reporting such events, the articles cannot show obvious emotional tendencies. However, because users are anonymous in online public opinion, they comment more freely, and the comments on campus violence therefore mainly show anger. There is also an anomalous cluster (5% “happy”) that exhibits a positive affective tendency. Further review of the list of incidents in this cluster showed that the incidents were still negative and the main emotion in the articles was “angry”, but many of the comments used irony, for example “happy” and “It’s good, these bad students should go to jail, not in school”. Because irony uses positive words to express negative emotions, sentiment analysis based on emotion words sometimes mischaracterises such comments as positive.

It can also be observed from the figure that the sentiment tendency of one theme, “rural education”, is completely positive. This is because China introduced a series of policies to encourage rural education in 2017, and the year-on-year growth rate of rural teachers was 65.23%. In addition, according to the time distribution of education public opinion events shown in the garden chart, public opinion events were most frequent between 2021 and 2022.
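The effect of irony on word-based sentiment analysis can be reproduced with a deliberately naive classifier. The lexicon below is hypothetical and far smaller than any real emotion word list; the sketch only illustrates why a positive word in an ironic comment can flip the label.

```python
# Hypothetical mini-lexicon; the paper's emotion word list is not given.
EMOTION_LEXICON = {"good": "happy", "happy": "happy", "outrageous": "angry", "afraid": "fear"}


def lexicon_emotion(comment, lexicon=EMOTION_LEXICON):
    """Return the most frequent lexicon emotion in the comment, or None if no word matches."""
    counts = {}
    for word in comment.lower().split():
        emotion = lexicon.get(word.strip(".,!?\"'"))
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    return max(counts, key=counts.get) if counts else None
```

Applied to the ironic comment quoted above, the only lexicon hit is the word "good", so this naive approach labels the comment "happy" even though its intent is negative.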

4.3

Results of Educational Public Opinion Sentiment Case Analyses

Based on the high-incidence period of education public opinion events identified above, this paper takes two major education public opinion events from 2021-2022 as typical cases for analysis. Both events are related to the college entrance examination. For Event A, there are 42,568 relevant microblog posts from 2 March 2021 to 15 April 2021, with an average of 112 retweets, 58 comments and 596 likes, indicating good communication interactivity and discussion. Event B runs from 15 August 2021 to 24 December 2021. It also shows good communication interactivity and discussion and merits further in-depth research. In the following sections, the sentiment analysis method constructed in this paper is used to visualise and analyse the sentiment of the two selected cases.
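The case studies refer to events by day number as well as by date. Assuming that the first day of each statistical window counts as day 1 (the paper does not state this explicitly), the mapping can be sketched as:

```python
from datetime import date, timedelta


def day_to_date(start, day_number):
    """Map a 1-based day number to a calendar date, with `start` as day 1."""
    return start + timedelta(days=day_number - 1)


# Start dates of the two statistical windows given in the text.
EVENT_A_START = date(2021, 3, 2)
EVENT_B_START = date(2021, 8, 15)
```

Under this assumption, day 26 of Event A is 27 March 2021, day 32 of Event A is 2 April 2021, and day 33 of Event B is 16 September 2021.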

4.3.1

Case Study One

The intensity of the emotions displayed in the microblogs related to Event A was quantified, and its change over time is shown in Figure 3. By intensity, there are three dominant emotions: “angry”, “happy” and “sadness”. In the early stage of the outbreak of public opinion on Event A, the difference in intensity among the three emotions was not significant. On the 26th day (March 27th), “sadness” reached its peak (42,115.67) within the statistical period, reflecting microblog users’ sadness for Goumou, the protagonist of the event, and what he encountered in the early stage of the event. It also partly reflects disappointment with the college entrance examination system, with related comments such as “the college entrance examination system is chilling” and “the despair of the children of poor families”. As the official investigation deepened, the intensity of “anger” reached a peak (193,001.93) on the 25th day (March 27th). Microbloggers strongly condemned the offenders, including the protagonist’s high school homeroom teacher and the director of the enrollment office, in response to Gou’s statements such as “suspicion that he had been selected” and “facing threats from his homeroom teacher”. The microbloggers’ “anger” was completely aroused. On the 32nd day (April 2), after the release of the official investigation results, microbloggers praised the efficiency and process with which Event A was handled, and the intensity of the “happy” emotion reached its peak (195,000.42) within the statistical period. At the same time, however, because Event A had been over-packaged as it developed, the truth differed somewhat from the initial narrative of Gou, the main character of the event, and some microbloggers believed that Gou had lied to gain attention and had taken up excessive public resources. Related expressions included “exaggerating the truth”, “lying by design” and “collapse of persona”.

4.3.2

Case Study Two

Compared with Event A, Event B lasted longer, the protagonist did not exaggerate the description, and microbloggers supported the protagonist Wang Mou from the beginning to the end of the event. In addition, the main channel of emotional mobilisation in Event A was the microblog of the protagonist Goumou, whereas in Event B it was mostly the microblogs of news media, which influenced microbloggers through interviews and descriptions. The heat of Event B is measured by counting the retweets, comments and likes of related microblogs, and its change over time is shown in Figure 4, which shows multiple peaks when the number of interactive behaviours is used to evaluate the heat of discussion. In terms of the number of likes, the three most obvious peaks occurred on day 2 (41,498), day 9 (127,391) and day 27 (32,158). In terms of the evolution of Event B, the event was in the public opinion brewing period from August 15-20, 2021, when microbloggers gained a better understanding of the incident of “Henan Zhoukou girl forced to go to university” through the reports of People’s Daily and Guangming Daily. At the same time, the impostor said, “What’s the use of tossing? Tossed to the United Nations, we are not afraid of.”

This statement quickly triggered discussion among microblog users, and the number of interactive behaviours reached its first peak (41,498 likes, 16,200 comments and 11,718 retweets). Microblog users generally expressed solidarity with the protagonist Wang Mou. Most of the published comments were along the lines of “I’m afraid that this kind of thing is more than one, to check there will always be it” and “so domineering behind the power of who?”. Reprints by news media such as China Newsweek, Pengfeng News, and China News Network also contributed to the spread of Event B in the early stages of public opinion development.

The intensity of the emotions expressed in microblogs related to the Wang Mou incident was visualized and analyzed, and its change over time is shown in Figure 5. Ranked by intensity, two emotions dominate, “angry” and “happy,” and the expression of emotion is simple and direct. In the early stage of the outbreak of public opinion on incident B, around August 15, the intensity of “angry” was relatively high, reflecting Weibo users’ negative feelings towards the person who took Wang’s place. However, because there was no definite target of blame at this time and Weibo netizens had a limited understanding of the original cause of the incident, there was no large-scale outbreak of “angry” emotion. It is worth noting that around the 33rd day (September 16) the intensity of “angry” jumped to a high level (13,744.58). Tracing back through the relevant microblogs shows that on September 15 the investigation working group published a version of its findings stating that “the key figure in the case, the uncle of the impostor, has died, and the clues for in-depth investigation have been interrupted.” This statement led microblogging netizens to widely question whether the punishment of the personnel involved was merely superficial. At this time, some of the negative sentiment was also directed at the investigation working group, which was suspected of “dumping” (shifting the blame). Around September 17, the protagonist Wang Mou raised reasonable questions about the investigation report and asked the working group to make the full report public. Most microblogging netizens supported and praised Wang Mou’s determination to pursue the truth, and the investigation of incident B was launched again.

As the official investigation deepened, on the 116th day the investigation working group released an updated report that, for the first time, disclosed in detail how Wang’s college entrance examination admission notice had been passed along. Mobilized by the absurd route taken by the admission notice and by Wang Mou’s angry account, the intensity of microblogging users’ “angry” emotion reached its peak (15,455.45). Netizens strongly blamed the persons involved in the case and continued to express concern about the follow-up compensation in incident B.
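As a rough illustration of how a per-day emotion intensity curve like the one in Figure 5 could be assembled, the following Python sketch sums per-post intensity scores by day and emotion and then looks up the day on which a given emotion peaks. It is a sketch under assumptions, not the paper's method: the per-post scores are invented placeholders, summation is assumed as the aggregation rule, and the helper names `day_index`, `aggregate_intensity` and `peak_day` are hypothetical. The only detail taken from the text is that counting August 15, 2021 as day 1 puts September 16 on day 33.

```python
from collections import defaultdict
from datetime import date

# Day 1 of event B, following the dates given in the text.
START = date(2021, 8, 15)


def day_index(d: date) -> int:
    """Convert a calendar date to a 1-based day number within the event."""
    return (d - START).days + 1


def aggregate_intensity(posts):
    """Sum per-post intensity scores for each (day, emotion) pair.

    Assumption: daily intensity is the plain sum of post-level scores.
    """
    totals = defaultdict(float)
    for posted_on, emotion, score in posts:
        totals[(day_index(posted_on), emotion)] += score
    return dict(totals)


def peak_day(totals, emotion):
    """Return the day on which the given emotion has its highest total."""
    per_day = {day: value for (day, emo), value in totals.items() if emo == emotion}
    return max(per_day, key=per_day.get)


# Placeholder posts: (date, emotion label, intensity score).
posts = [
    (date(2021, 8, 15), "angry", 2.5),
    (date(2021, 8, 15), "happy", 1.0),
    (date(2021, 9, 16), "angry", 4.0),
    (date(2021, 9, 16), "angry", 3.5),
    (date(2021, 9, 17), "happy", 2.0),
]

totals = aggregate_intensity(posts)
print(day_index(date(2021, 9, 16)), peak_day(totals, "angry"))
```

In a real pipeline the emotion label and score for each post would come from the sentiment model rather than being supplied by hand.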

5 Conclusion

The emergence of “Internet+Education” has attracted wide attention and discussion, so it is necessary to study online public opinion on education in this context. In this paper, we combine an attention mechanism module with a convolutional neural network module to construct a sentiment analysis model for educational public opinion in network data, and build a sentiment visualization system based on a B/S architecture.

Validation of the sentiment analysis method shows that the model proposed in this paper achieves higher accuracy and F1 than the CNN-ASP and GCAE models, which are also CNN-based: its accuracy (88.62%, 77.88%, and 78.91%) and F1 values (72.16%, 79.78%, and 76.29%) are the highest. The system retrieved a total of 101,263 articles on the topic of education public opinion from 2016 to 2023. According to the time distribution of education public opinion events shown in the garden diagram, public opinion events were most frequent from 2021 to 2022. In typical case A, the emotion of “sadness” reached a peak on the 26th day (March 27) (42,115.67) in the statistical data, and a peak on the 25th day (March 27) (193,001.93) was reached as the official investigation deepened. In education public opinion event B, the intensity of netizens’ “anger” reached its peak (15,455.45) after the investigation working group released its updated report on the 116th day.

To sum up, the system in this paper can detect emotional changes and problems in online public opinion on education in a timely manner, and can help the relevant personnel to formulate coping strategies and effective responses to those emotions.

Xiong Wei

Published online: 03 Feb 2025

Received: 31 Aug 2024

Accepted: 25 Dec 2024

DOI: https://doi.org/10.2478/amns-2025-0030

Keywords: Web crawler, Attention mechanism, Convolutional neural network, Feature extraction, Softmax, Education public opinion

© 2025 Xiong Wei, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
