
Sentiment Analysis of Korean Modern Novel Texts Applying Deep Learning Models

  

Introduction

Sentiment analysis, an important discipline of natural language processing, plays an increasingly important role in the Web 2.0 era and has attracted wide attention [1-2]. Sentiment analysis, also known as opinion mining, refers to the process of analyzing, processing, summarizing, and reasoning about subjective texts with emotional overtones [3]. In the era of big data, the public has ever more channels for voicing opinions on products, services, and so on, such as apps and social media, on which reviews of a praising or disparaging nature are left [4-5]. In the field of literature, some scholars have already begun to use sentiment analysis techniques to study the evaluation of books, and then extract and summarize information of practical significance, such as book popularity and predictions of future development [6-8]. Sentiment analysis techniques are applied to large numbers of book reviews to quantify readers’ emotional attitudes and to analyze popular reviews and the characteristics of novels that readers care about most [9-11].

In today’s era, cultural influence has become an important indicator of a country’s comprehensive national power. Great powers whose economic and political strength has developed to a certain level inevitably seek worldwide appeal, and the attraction of a country’s culture to the people of other countries is an extremely important part of this [12-15]. The Korean modern novel opens a discussion on the position of the individual in a rapidly changing society and on the conflict between tradition and modernity. Modern Korean novels arose against many backdrops that are closely related to the historical changes in Korean society. With the rise of the democratization movement in Korea, deeper critical thinking on political and social issues emerged in literary creation [16]. Since the turn of the century, modern Korean novels have continued to develop, not only with increasingly diversified content but also with innovative forms, including the emergence of the web novel as a new literary form, reflecting the complexity and diversity of contemporary Korean society [17-18]. Readers’ acceptance of a work is the key to realizing the intended meaning of a literary work, and a lack of research on readers may leave the interpretation of a work incomplete [19-20]. Therefore, using sentiment analysis techniques to collect and analyze readers’ positive or negative evaluations of Korean novels provides a reference for improving the cultural interpretation and dissemination of modern Korean novels.

Traditional sentiment analysis pipelines that generate text word vectors with models such as Word2Vec cannot effectively represent polysemous words, and classical models such as the RNN cannot adequately extract semantic features. To address these problems, a BERT-BGRU text sentiment analysis model for modern Korean novels combined with an attention mechanism is proposed. After the text of modern Korean novels is preprocessed, it is vectorized using the BERT pre-trained language model; the BGRU network then extracts textual features in both the forward and backward directions so that the features integrate contextual information; the attention mechanism then weights the extracted feature information so that the model pays more attention to key information; and the final output is the sentiment classification result. After the performance of the model is verified on the relevant datasets, Park Wan-su’s novels are used as an example to trace the emotional tone of her novel texts and perform sentiment analysis.

A Model for Sentiment Analysis of Modern Korean Fiction Texts

The basic process of text sentiment analysis of modern Korean novels using deep learning methods is shown in Fig. 1: the text data is preprocessed, the organized text is converted into word vectors, the constructed neural network model extracts the text features, and the final output is the sentiment classification result. The pipeline covers natural language processing, text vectorization, and related steps.

Figure 1.

Basic flow chart of emotion analysis

Natural Language Processing
Text pre-processing

In a long novel text, words are the basic units of the text. When preprocessing a long text, it must be encoded and converted, tokenized, filtered of stop words, part-of-speech tagged, lemmatized, and subjected to named entity recognition, word sense disambiguation, and other text processing.

Text Segmentation

The first preprocessing step in text processing is tokenization. Because languages differ in composition, each word in an English sentence is separated by a space, so English text can be split into words on spaces. English tokenization is therefore relatively simple; modern tokenizers are statistically based, and most of their statistical samples come from standard corpora. The main goals of tokenization are to separate words from punctuation, to split words that end in a comma or an apostrophe, to split words delimited by quotation marks, to treat most punctuation marks as separate tokens, and so on.
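As a concrete illustration, the following is a minimal tokenization sketch in Python using NLTK’s word_tokenize; the sample sentence is made up for illustration only.

```python
# Minimal tokenization sketch using NLTK (illustrative; sample text is made up).
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models, downloaded once

text = "The novel's ending moved me deeply, and I couldn't put it down."
tokens = word_tokenize(text)
print(tokens)
# Punctuation and clitics such as "'s" and "n't" become separate tokens.
```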

Stop-word filtering

After tokenization, the long text has been transformed into separate words for subsequent processing. In text analysis it can be observed that many meaningless function words, or words that do not help subsequent analysis and reduce processing efficiency, appear in the text, so stop-word filtering is needed. Stop words are words that are filtered out when processing natural language data in order to reduce their impact on the results and on errors while improving retrieval efficiency. When filtering, the words that appear in a stop-word list, built from lists for the relevant domain and from manual input, are removed; this greatly improves the accuracy and efficiency of later processing and purifies the analysis results.
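A minimal sketch of stop-word filtering with NLTK’s English stop-word list follows; the extra domain-specific stop words are hypothetical.

```python
# Stop-word filtering sketch using NLTK's English stop-word list (illustrative).
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)

stop_words = set(stopwords.words("english"))
stop_words.update({"chapter", "page"})  # hypothetical domain-specific additions

tokens = word_tokenize("The mother of the family was the strongest person in the story.")
filtered = [t for t in tokens if t.lower() not in stop_words and t.isalpha()]
print(filtered)  # ['mother', 'family', 'strongest', 'person', 'story']
```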

Lexical annotation

Lexical annotation, also known as part-of-speech tagging, refers to assigning the correct part of speech to each word produced by tokenization. It is the main task of lexical analysis. The part of speech is the basic grammatical attribute of a word; labeling it for each word provides support for further analysis.
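A short part-of-speech tagging sketch with NLTK’s pos_tag is shown below; the sentence and the tagger resource used are illustrative.

```python
# Part-of-speech tagging sketch using NLTK (illustrative).
import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = word_tokenize("She quietly left the quiet village at dawn.")
print(pos_tag(tokens))
# Each token is paired with a Penn Treebank tag, e.g. ('quietly', 'RB'), ('quiet', 'JJ').
```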

Word Form Reduction

In English text, because of inflection, one root may correspond to several different surface forms, which need to be handled appropriately in lexical analysis, for example prefixes, suffixes, and endings. Lemmatization is the reduction of a word in any form to its base form. It is primarily used in text mining and natural language processing to provide more precise and finer-grained text analysis and representation.
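A minimal lemmatization sketch with NLTK’s WordNetLemmatizer follows; the part-of-speech hints are supplied manually here for illustration.

```python
# Lemmatization sketch using NLTK's WordNet lemmatizer (illustrative).
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("stakes"))           # -> 'stake'  (noun by default)
print(lemmatizer.lemmatize("better", pos="a"))  # -> 'good'   (adjective hint)
print(lemmatizer.lemmatize("running", pos="v")) # -> 'run'    (verb hint)
```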

Named entity identification

Named entity recognition is a common word-level task in natural language processing and has a wide range of applications in many text analysis tasks. Named entities usually refer to entities that have special significance or strong referential value in the text, typically including names of people, places, times, proper nouns, and so on. A named entity recognition system extracts the above entities from unstructured text and can also recognize further categories of entities [21].

In a novel text, the named entities it contains and the relationships between them are of great value for analyzing the text; this information can be used to present relevant details about a specific character, which is very useful for text analysis and text visualization. The information obtained through entity extraction expresses the characters and other identifiers in the novel most intuitively, has great practical value, and can also be used in the analysis of key characters, character relationships, and character emotions. Since most novel texts are unstructured, the number of named entities is huge and there are no fixed rules to follow. The commonly used methods for this task are rule-based and model-based.

Rule-based entity extraction is the easiest to implement. For entities with distinctive contexts, or texts where the entities themselves have many characteristic attributes, rule-based extraction of named entities is simple and effective. With a small corpus, the rules can be expressed as regular expressions, but if the entities to be extracted are complex, the rules must be revised to cover all possible cases. As the corpus grows, the situations to be handled become more complex and the rules may conflict with each other. The rule-based approach is therefore better suited to extraction from semi-structured or standardized text.

In the model-based approach, named entity recognition is treated as a sequence labeling problem: the input of the model is a sequence, and a specific label is output for each unit of the input sequence. Common sequence labeling models include the HMM, CRF, and RNN.

A CRF (conditional random field) is a discriminative model that satisfies the pairwise, local, and global Markov properties. For sequence labeling problems, linear-chain conditional random fields are generally used [22].

A conditional random field is an undirected graphical model that predicts the probability of an output sequence given an input sequence. Assume that $X=\{x_1,x_2,x_3,\ldots,x_T\}$ denotes the observed sequence and $Y=\{y_1,y_2,y_3,\ldots,y_T\}$ denotes the sequence of label states to be predicted.

Let an undirected graph $G=(V,E)$ be given, where $V$ denotes the set of vertices and $E$ the set of edges. The label sequence is indexed by the vertices, i.e., $Y=\{Y_v \mid v \in V\}$, so that each node in $V$ corresponds to one variable of the label sequence, denoted by the random variable $Y_v$. When each random variable $Y_v$ satisfies the Markov property, $(X,Y)$ is called a conditional random field, with the probability of the random variable given by $p(Y_v \mid X, Y_u, u \neq v, \{u,v\} \subset V)$.

According to graph theory, the structure of the graph $G$ can be arbitrary, as long as it expresses the conditional independence assumptions in the label sequence. Here a simple first-order chain structure is used as an example, in which each node corresponds to one element of the label sequence, as shown in Figure 2.

Figure 2.

The graphic structure of the chain CRF

The main steps of the statistical approach are: building the statistical model, extracting features, annotating the data, and training the model. Machine learning methods are used to obtain the corresponding statistical model, the test corpus is labeled in a uniform way, and the corresponding CRF model is obtained. The labeled test sequences are then post-processed to obtain the concrete named entity recognition results.
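The following is a minimal sketch of a linear-chain CRF sequence labeler, assuming the third-party sklearn-crfsuite package; the toy training sentence, features, and BIO labels are made up for illustration and are not the paper’s actual setup.

```python
# Linear-chain CRF sequence-labeling sketch (assumes the sklearn-crfsuite package).
import sklearn_crfsuite

def word_features(sent, i):
    """Hand-crafted features for the i-th token of a tokenized sentence."""
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: one tokenized sentence with BIO entity labels (made up).
sentence = ["The", "mother", "moved", "to", "Seoul", "with", "her", "children"]
labels   = ["O",   "O",      "O",     "O",  "B-LOC", "O",    "O",   "O"]

X_train = [[word_features(sentence, i) for i in range(len(sentence))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

print(crf.predict(X_train)[0])  # predicted BIO tags for the training sentence
```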

Vector representation of text
Discrete representation based on TF-IDF

One-hot encoding has several shortcomings; one optimization is to give each word in the text a weight. It is natural to use the frequency of a word’s occurrence as its weight, since this is the simplest approach, but a higher frequency does not mean that the word contributes more to the meaning of the sentence. For example, “the” in English, or “ah” and “of” in Chinese, appear very frequently in daily use yet have very little influence on the meaning of a sentence.

Term frequency–inverse document frequency (TF-IDF) is a more effective way of calculating frequency weights. TF (term frequency) represents how often a word is used in a document. IDF (inverse document frequency) reflects how widespread the word is: the fewer documents contain the word, the larger the IDF, meaning the word is good at distinguishing between documents. The overall TF-IDF formula is: $$\mathrm{TF\text{-}IDF}(w) = \frac{\text{number of occurrences of word } w \text{ in the document}}{\text{total number of words in the document}} \times \log\frac{\text{total number of documents in the corpus}}{\text{number of documents containing word } w + 1}$$

This method can effectively reduce the weight of irrelevant and interfering words while retaining important information. TF-IDF solves the weight allocation problem, but another important issue of vector-space representation remains: because a matrix is equivalent to itself under row permutation, the representation does not take into account that different word orders in the text may carry different emotional semantics.
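A minimal TF-IDF sketch using scikit-learn’s TfidfVectorizer is shown below; the three short review snippets are made up, and scikit-learn’s IDF smoothing differs slightly from the formula above.

```python
# TF-IDF sketch using scikit-learn (illustrative; corpus is made up,
# and scikit-learn's IDF smoothing differs slightly from the formula in the text).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "a moving story about a mother and her children",
    "a dull story with flat characters",
    "the mother is the most moving character in the novel",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)      # sparse (3 x vocab) matrix

print(vectorizer.get_feature_names_out())     # learned vocabulary
print(tfidf.toarray().round(3))               # TF-IDF weights per document
```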

Distributed representation based on the Word2Vec model

The Word2Vec model was proposed by Google in 2013; its basic algorithmic idea follows the word vector training model proposed by Mikolov et al. Its algorithmic complexity is lower than that of traditional models, and it can learn word vectors efficiently. The Word2Vec model has two basic network structures: the continuous bag-of-words (CBOW) model and the Skip-gram model [23].

The structure of the CBOW model is shown in Fig. 3; the basic idea is to predict the probability of occurrence of a target word $w_t$ given its context as input, as shown in Eq. (3), where $S_j$ is the $j$-th sentence in text $T$, $w_{ij}$ denotes the $i$-th word in that sentence, $C_{ij}$ denotes its context, and $\theta$ denotes the model parameters. The weight matrix obtained at the end of training is the word vector matrix of the text. $$\underset{\theta}{\arg\max} \prod_{S_j \in T}\left[\prod_{i=1}^{|S_j|} p\left(w_{ij} \mid C_{ij};\theta\right)\right] \tag{3}$$

Figure 3.

CBOW model structure

The Skip-gram model can be viewed as the inverse structure of the CBOW model; its optimization objective is to maximize the posterior probability of the text: $$\underset{\theta}{\arg\max} \prod_{w_{ij} \in T}\left[\prod_{c \in C_{ij}} p\left(c \mid w_{ij};\theta\right)\right]$$
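As a concrete illustration of training such embeddings, here is a minimal sketch with the gensim library; the toy corpus and hyperparameter values are illustrative only.

```python
# Word2Vec training sketch using gensim (illustrative corpus and hyperparameters).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of preprocessed tokens.
sentences = [
    ["mother", "moved", "to", "seoul", "with", "her", "children"],
    ["the", "daughter", "resented", "her", "mother"],
    ["war", "separated", "the", "mother", "and", "her", "son"],
]

# sg=0 selects CBOW, sg=1 selects Skip-gram.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

print(model.wv["mother"].shape)                 # (100,) embedding for one word
print(model.wv.most_similar("mother", topn=3))  # nearest neighbours in the toy space
```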

BERT-BGRU Sentiment Analysis Incorporating Attention Mechanisms
BERT pre-training models

The BERT model adopts the encoder part of the Transformer. Unlike earlier LSTM encoders that read from left to right or right to left, it uses a bidirectional encoding representation and is pre-trained on a large English corpus so that it fully learns the relevant semantic knowledge. The BERT model solves the problem of polysemy, which traditional word embedding models such as Word2Vec and GloVe cannot handle, so that the input sentence obtains a better word vector representation and semantic representation.

BERT uses the following two approaches for pre-training:

Masked language model (MLM)

As in a cloze task, the MLM randomly masks a certain percentage of the words and replaces them with the [MASK] token, forcing the model to learn which word should fill the [MASK] position from the global context, thereby realizing bidirectional encoding.

Next Sentence Prediction (NSP)

NSP can be viewed as a binary classification problem for sentences, where two sentences in a text are given, and the logical relationship between the sentences is mined by determining whether the latter sentence comes after the previous one.

Through the joint learning of these two pre-training tasks, the BERT pre-training model is able to capture the semantic information contained in the sentences more accurately and comprehensively, and also provide appropriate initial values of model training parameters for the subsequent model fine-tuning task.
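A minimal sketch of obtaining BERT token vectors with the Hugging Face transformers library follows; the checkpoint name bert-base-uncased and the sample sentence are illustrative assumptions, not necessarily the configuration used in this paper.

```python
# BERT text-vectorization sketch using Hugging Face transformers
# (the checkpoint and sample sentence are illustrative).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The mother cut her daughter's hair without asking.",
                   return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

token_vectors = outputs.last_hidden_state   # (1, seq_len, 768) contextual vectors
print(token_vectors.shape)
```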

BGRU layer

Unidirectional GRUs can only capture semantic information in one direction, whereas in text sentiment categorization, the output of the current moment is not only related to the previous information, but also to the information that comes after it. Therefore, this chapter uses a bidirectional gated recurrent neural network (BGRU) to capture semantic information from both directions.

BGRU solves the problem that unidirectional GRU networks cannot encode from backward to forward. BGRU combines forward and backward GRUs, as shown in Fig. 4, which can capture both forward and backward semantic information, and combine with the context to deeply extract text sentiment features.

Figure 4.

BGRU model

In the BGRU model, two GRU units operating in opposite directions coexist at each moment, where $\overrightarrow{h_t}$ represents the forward GRU state at moment $t$, $\overleftarrow{h_t}$ represents the backward GRU state at moment $t$, $h_t$ represents the output of the BGRU at moment $t$, and $x_t$ is the input at moment $t$. The computation of the state at each moment in the BGRU model is shown in Eqs. (5) and (6), and the final output is jointly determined by the GRU units of the two directions, as shown in Eq. (7). $$\overrightarrow{h_t} = \overrightarrow{\mathrm{GRU}}\left(x_t, \overrightarrow{h_{t-1}}\right) \tag{5}$$ $$\overleftarrow{h_t} = \overleftarrow{\mathrm{GRU}}\left(x_t, \overleftarrow{h_{t-1}}\right) \tag{6}$$ $$h_t = w_t \overrightarrow{h_t} + v_t \overleftarrow{h_t} + b_t \tag{7}$$

where $\mathrm{GRU}(\cdot)$ denotes the nonlinear transformation of the input that encodes the word vector into the corresponding hidden state; $w_t$ is the weight coefficient matrix of the forward direction; $v_t$ is the weight coefficient matrix of the backward direction; and $b_t$ is the bias at moment $t$.
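A minimal PyTorch sketch of such a bidirectional GRU layer is given below; the dimensions are illustrative.

```python
# Bidirectional GRU (BGRU) layer sketch in PyTorch (dimensions are illustrative).
import torch
import torch.nn as nn

batch_size, seq_len, embed_dim, hidden_dim = 2, 16, 768, 128

bgru = nn.GRU(input_size=embed_dim, hidden_size=hidden_dim,
              batch_first=True, bidirectional=True)

x = torch.randn(batch_size, seq_len, embed_dim)   # e.g. BERT token vectors
outputs, _ = bgru(x)

# Forward and backward hidden states are concatenated along the last dimension.
print(outputs.shape)   # torch.Size([2, 16, 256])
```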

Attention layer

To address the problem that the BGRU layer does not consider the sentiment weight of English words, this chapter adds an attention layer after the BGRU layer. The attention mechanism can mark and identify the contribution of different words to the sentiment polarity of the text, allowing the model to ignore unimportant information and focus on the key information, which improves the accuracy of classification. The attention mechanism uses weight coefficients to indicate the importance of the information, and larger weight coefficients indicate more important information.

The calculation process of the attention layer is as follows:

The feature representation $h_t$ obtained from BGRU learning is used as the input to the attention layer, $w$ is the weight matrix, and $b$ is the bias vector used to compute the target attention weight $u_t$, as shown in Eq. (8): $$u_t = \tanh(w h_t + b) \tag{8}$$

The target weights are normalized by the softmax function to obtain the weight coefficients $a_t$: $$a_t = \mathrm{softmax}(u_t) = \frac{\exp(u_t)}{\sum_{i=1}^{n}\exp(u_i)} \tag{9}$$

To carry out the weight assignment, the feature representations are combined in a weighted sum with the weight coefficients to obtain the feature representation $s_t$ that highlights the emotionally important information, as shown in Eq. (10): $$s_t = \sum_{i=1}^{n} a_i h_i \tag{10}$$
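A short PyTorch sketch of this attention layer, corresponding to Eqs. (8)-(10), follows; the layer sizes are illustrative.

```python
# Additive attention layer sketch corresponding to Eqs. (8)-(10) (illustrative sizes).
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)     # w, b of Eq. (8)
        self.score = nn.Linear(hidden_dim, 1, bias=False)  # scalar score per step

    def forward(self, h):                 # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))                 # Eq. (8)
        a = torch.softmax(self.score(u), dim=1)      # Eq. (9), weights over time steps
        s = (a * h).sum(dim=1)                       # Eq. (10), weighted feature sum
        return s, a

attn = Attention(hidden_dim=256)
h = torch.randn(2, 16, 256)               # e.g. BGRU outputs
s, a = attn(h)
print(s.shape, a.shape)                    # torch.Size([2, 256]) torch.Size([2, 16, 1])
```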

Model structure

The structure of the BERT-BGRU model incorporating the attention mechanism proposed in this chapter is shown in Figure 5.

Figure 5.

The BERT-BGRU-ATT model structure

It is mainly divided into the following parts:

Input layer

Input preprocessed English text data; the text is denoted as $\{w_1, w_2, w_3, \ldots, w_n\}$, where $w_i$ denotes a word in the sentence.

BERT Layer

Through the training of the BERT model to obtain the vector representation of the text, i.e., the text is processed into a form that can be received by the BGRU.

BGRU Layer

It includes a forward GRU part and a backward GRU part. This layer traverses the obtained word vectors in both the forward and backward directions to extract feature information. The output is $h_t = [\overrightarrow{h_t}, \overleftarrow{h_t}]$, where $\overrightarrow{h_t}$ denotes the forward GRU state at moment $t$, $\overleftarrow{h_t}$ denotes the backward GRU state at moment $t$, and $h_t$ denotes the output of the BGRU at moment $t$.

Attention layer

The attention layer inputs the extracted feature information and performs weighting calculation to strengthen the feature extraction of important information.

Output Layer

The processed feature vector is input to the fully connected layer and sentiment classification is performed with a softmax classifier, as shown in Eq. (11): $$y = \mathrm{softmax}(w s_t + b) \tag{11}$$

where st denotes the upper output feature vector, w denotes the weight matrix, and b denotes the bias.

The output vector $y$ is a multidimensional vector whose dimension equals the number of classification categories; the value of each dimension corresponds to the probability of one sentiment class. The softmax function maps each value in $y$ into the range 0 to 1, and the category with the largest probability is selected as the prediction result. The softmax formula is shown in Eq. (12): $$f_i = \frac{\exp(y_i)}{\sum_{j=1}^{n}\exp(y_j)} \tag{12}$$
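Putting the layers together, the following is a compact PyTorch sketch of a BERT-BGRU model with attention in the spirit of Figure 5; the checkpoint name, layer sizes, and two-class output are illustrative assumptions, not the exact configuration used in this paper.

```python
# BERT-BGRU-attention sentiment classifier sketch in PyTorch
# (checkpoint, sizes, and 2-class output are illustrative assumptions).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBGRUAttention(nn.Module):
    def __init__(self, checkpoint="bert-base-uncased", hidden_dim=128, num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(checkpoint)
        self.bgru = nn.GRU(self.bert.config.hidden_size, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.attn_proj = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        self.attn_score = nn.Linear(2 * hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        h_bert = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        h, _ = self.bgru(h_bert)                           # BGRU layer
        u = torch.tanh(self.attn_proj(h))                  # Eq. (8)
        a = torch.softmax(self.attn_score(u), dim=1)       # Eq. (9)
        s = (a * h).sum(dim=1)                             # Eq. (10)
        return torch.softmax(self.classifier(s), dim=-1)   # Eq. (11)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertBGRUAttention()
batch = tokenizer(["a deeply moving novel", "a dull and lifeless story"],
                  padding=True, return_tensors="pt")
probs = model(batch["input_ids"], batch["attention_mask"])
print(probs)   # per-class sentiment probabilities for each sentence
```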

Emotional Analysis of Korean Modern Fiction Texts
Sentiment Analysis Experiments and Analysis
Data sets

The SST-2 dataset and IMDB dataset were selected for the experiment. In the SST-2 dataset, the average length of the text is about 19.29, and the median length is 11. In the IMDB dataset, the average length of the text is about 585.26, and the median length is 159, which is used to check the performance of the model on text data of different lengths.

SST is a sentiment analysis dataset released by Stanford University. Its main content is movie reviews, and the statistics of the sample size of the dataset are shown in Table 1.

Table 1. SST-2 dataset

                  Sentences   Positive   Negative
Training set         6524       3472       3241
Validation set        785        393        382
Test set             1745        886        893

The BERT model reduces the vocabulary size by breaking uncommon words into smaller subword units, such as prefixes and suffixes, during tokenization. The sentence-length statistics reported here are therefore somewhat larger in value than the usual word-count lengths.

For the SST-2 and IMDB datasets, the distribution of sentiment words is also reported in this section. The distributions of sentiment words in the SST-2 and IMDB datasets are shown in Fig. 6 and Fig. 7, where the horizontal axis is the text serial number in the dataset and the vertical axis is the number of sentiment words contained in the corresponding text.

Figure 6.

Distribution of sentiment words in SST-2

Figure 7.

Distribution of sentiment words in IMDB

According to the statistics, the total number of words in the SST-2 dataset is 124,575, of which 11,251 are sentiment words, i.e., 9.03% of the total word count, while the total number of words in the IMDB dataset is 5,511,214, of which 381,251 are sentiment words, i.e., 6.92% of the total word count.

Evaluation indicators

The experiments evaluate model performance using precision (P), recall (R), and the F1 value. Precision is the proportion of samples predicted as positive that are actually positive, recall is the proportion of actual positive samples that are correctly identified, and the F1 value is the harmonic mean of precision and recall, which balances the distortion of precision and recall in extreme cases. The confusion matrix is shown in Table 2.

Table 2. Confusion matrix

                        Actual positive        Actual negative
Predicted positive      True positive (TP)     False positive (FP)
Predicted negative      False negative (FN)    True negative (TN)

Here TP and TN are the cases in which the model's classification matches the actual positive and negative labels, respectively, FP is the case identified as positive by the system but actually labeled as negative, and FN is the case identified as negative by the system but actually labeled as positive.

The calculation of precision, recall, and the F1 value is shown in Eqs. (13), (14), and (15): $$P = \frac{TP}{TP + FP} \tag{13}$$ $$R = \frac{TP}{TP + FN} \tag{14}$$ $$F1 = \frac{2PR}{P + R} \tag{15}$$

In order to reduce bias due to parameter initialization, each model is initialized with a different random seed, five runs are conducted, and the average is taken as the final result, as shown in Eq. (16): $$\mathrm{F1\_score\_mean} = \frac{\sum_{i=1}^{5} \mathrm{F1\_score}(i)}{5} \tag{16}$$

where F1_score_mean is the final average F1 value.
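A minimal sketch of computing these metrics with scikit-learn and averaging over five seeded runs is shown below; the train_and_predict function and its label arrays are hypothetical placeholders, not the experiments reported here.

```python
# Precision/recall/F1 evaluation sketch with scikit-learn, averaged over five seeds
# (train_and_predict and the label data are hypothetical placeholders).
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def train_and_predict(seed):
    """Placeholder: train the model with this seed and return (y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 2, size=200)
    y_pred = np.where(rng.random(200) < 0.9, y_true, 1 - y_true)  # ~90% accurate
    return y_true, y_pred

f1_scores = []
for seed in range(5):
    y_true, y_pred = train_and_predict(seed)
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                  average="binary", pos_label=1)
    f1_scores.append(f1)

print("F1_score_mean =", np.mean(f1_scores))   # Eq. (16)
```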

Experimental results and analysis

The experimental results of the BHW_LEX_GA model proposed in this section are compared with those of existing models. In order to evaluate the overall performance of the models, the experiments use different parameter initialization values, the average of several runs is taken as the final result, and the performance of each model on the three metrics of precision, recall, and F1 value is recorded.

The experiments are described as follows:

Model (1): BERT, using only the BERT model as the word embedding method, using the [CLS] token in the last hidden layer, and directly accessing the linear layer afterward, this method serves as the baseline for comparing the effect of other models.

Model (2): BG, using the BERT model as the word embedding method, without any processing of the hidden layer, taking the last hidden layer results into the downstream neural network. Bidirectional GRU is chosen as the downstream neural network.

Model (3): BL, the word embedding part is the same as model (2), and the downstream neural network part is replaced from bi-directional GRU to bi-directional LSTM.

Model (4): BGA, the embedding layer and downstream neural network are constructed the same as model (2), and the attention mechanism is introduced in the model.

Model (5): BLA, the embedding layer and downstream neural network are constructed the same as model (3), and the attention mechanism is introduced in the model.

Model (6): BH, using the BERT model as the word embedding method, fusing some hidden layers in the model, and the hidden layer weights are all set the same and fixed values. Other parts of the structure are the same as model (4).

Model (7): BHW, using the BERT model as a word embedding method, fuses part of the hidden layers in the model with the same initial values of the hidden layer weights, and the weights participate in the training process as parameters that can be learned by the network. Other parts of the structure are the same as model (4).

Model (8): B_LEX, using the sentiment dictionary to filter the text and incorporate the sentiment word information into the model, the other parts are the same as model (1).

Model (9): BHW_LEX, using the BERT model as a word embedding method, dynamic weighting of the hidden layer of the BERT model, and incorporating emotion word information, other parts are the same as model (6).

Model (10): BHW_LEX_GA, the proposed new model, uses the BERT model as a word embedding method, adds emotion word information with the help of an emotion dictionary, performs weighted fusion on the hidden layer of the BERT model, and also participates in the computation of weights as model parameters. The downstream neural network uses the bidirectional GRU model and introduces the attention mechanism at the end.

The performance of each model on the SST-2 dataset is shown in Table 3, and on the IMDB dataset in Table 4. As the results in Tables 3 and 4 show, the BERT-BGRU model improves the F1 value by 0.849 on the SST-2 dataset compared with the BERT model with only linear layers, and by 0.960 on the IMDB dataset. On SST-2 the BERT-BGRU model achieves the best performance, while on IMDB it obtains the second-best performance, after model (8).

Table 3. Results of the experiment on SST-2

No.    Model        Precision (P)   Recall (R)   F1 value
(1)    BERT            87.289         95.586      91.137
(2)    BG              90.649         93.275      91.914
(3)    BL              87.533         94.926      91.071
(4)    BGA             91.323         92.212      91.748
(5)    BLA             91.533         92.432      91.931
(6)    BH              89.523         93.826      91.595
(7)    BHW             90.595         93.092      91.816
(8)    B_LEX           88.354         94.339      91.237
(9)    BHW_LEX         88.240         95.146      91.557
(10)   BERT-BGRU       89.935         94.082      91.976

Table 4. Results of the experiment on IMDB

No.    Model        Precision (P)   Recall (R)   F1 value
(1)    BERT            86.592         95.182      90.316
(2)    BG              89.986         92.395      91.099
(3)    BL              89.305         93.129      91.128
(4)    BGA             90.120         91.589      90.789
(5)    BLA             88.515         93.422      90.897
(6)    BH              89.249         91.258      89.978
(7)    BHW             92.815         88.985      90.844
(8)    B_LEX           91.276         91.295      91.281
(9)    BHW_LEX         90.166         91.699      90.861
(10)   BERT-BGRU       87.965         94.669      91.276

The BERT-BGRU model has the best combined performance on both the SST-2 and IMDB datasets, obtaining better results and stability compared to the other models.

Analysis of Emotional Channels in Park Wan-su’s Novels
Maternal Colors and Emotional Veins in Park Wan-su’s Novels

Korea is a typical oriental culture country, deeply influenced by Chinese Confucian feudalism, and although the society is highly developed economically, the patriarchal ideology is still very strong, and discrimination against women fills every corner of the society.

Park Wan-su’s debut and best-known novel, “Naked Wood”, takes as its background the Korean War, which profoundly influenced the modern history of Korea, and, through the story of people unable to live normal lives in wartime, describes their desire for love, their extravagant hopes for happiness, and their deep inner loneliness.

Delicate emotion runs throughout Park Wan-su’s fiction. In the novel, the mother feels that her life has lost its meaning because of the death of her son in the war; she is unable to do anything, does not dress up, and has no interest in household chores. As if her life had disappeared along with her son, she lives in her own imagination and fantasies, drained of vitality, like a walking corpse.

In her other masterpiece “Mom’s Stakes”, the image of motherhood is depicted as both paternal and selfless, reflecting the coexistence of fatherhood and motherhood. After the death of her husband, the mother in the novel has to become the head of the family, taking up the heavy responsibility of supporting the family and educating the children. In order to get rid of old habits and traditions, she leaves her hometown and comes to the city with her children, declaring war on the traditional patriarchal system with her own stubbornness and independence, trying to find women’s self-respect and self-reliance.

In previous literature, influenced by the strong patriarchal discourse, it is often the father who imposes his will on his children, but in “Mom’s Stakes”, due to the lack of fatherhood and the pressure of life, the mother plays this role as a matter of course, and the image of a strong mother is thus formed. In the text, the mother cuts off her daughter’s hair without regard to her daughter’s opinion. In addition, the mother decided to move to Seoul without consulting the child. In the daughter’s view, “Mother is a person who doesn’t consult with anyone and is completely arbitrary”.

However, no matter how strong the maternal figure is, the love for her children that she was born with is undeniable and inexhaustible. The novel tells how, because of the war, there is not enough food, so the mother, in order to feed her children, goes to beg for food despite the danger of being caught by the gendarmes and losing her life at any moment. This episode fully demonstrates that the image of motherhood in the text has both the paternal characteristic of imposing one’s own will on the children and the traditional maternal characteristic of a selfless, fearless, loving mother who sacrifices for her children.

In this paper, we will take “Mom’s Stakes” as an example to analyze the emotions in the novel and analyze the emotional curve of the characters’ story.

Generation of the emotional curve of the novel “Mom’s Stakes”

Figures 8 and 9 show the sentiment curves of the novel “Mom’s Stakes”. Figure 8 is a fixed-length sentiment curve generated according to previous work, and Figure 9 is a variable-length sentiment curve generated according to the methodology presented in this paper.

Figure 8.

Fixed-length emotional curve of “Mom’s Stakes”

Figure 9.

Variable-length emotional curve of “Mom’s Stakes”

In general, the locations of the local maxima of the two sentiment curves in this comparison are quite similar. However, the fixed-length curve is clearly smoother. Looking at the specific numbers, the text of “Mom’s Stakes” contains 27,538 words after preprocessing at the curve-plotting stage. Because the earlier curve-generation method fixes its target length, while the window size for sampling the text is fixed, different texts must be sampled the same number of times to obtain curves of the same length. With the fixed settings of a window length of 10,000 and a time-series length of 200, the actual window stride is (27538 − 10000 − 1)/200 ≈ 86 words, which is equivalent to applying a moving average whose window spans about 10000/86 ≈ 116 steps of the series. Compared with the 4-step moving-average window used to generate the variable-length curve in this paper, that window is far too large. Indeed, although the generated fixed-length curve has higher temporal resolution, it is visibly much smoother.
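The following is a minimal sketch of generating a variable-length sentiment curve by sliding a window over the text, scoring each window, and smoothing with a short moving average; the scoring function, the toy text, and the window sizes are illustrative assumptions, not the paper's exact procedure.

```python
# Variable-length sentiment-curve sketch: window the text, score each window,
# then smooth with a short moving average (scorer and window sizes are illustrative).
import numpy as np

def sentiment_score(words):
    """Placeholder scorer: fraction of positive words minus fraction of negative words."""
    positive = {"love", "happy", "warm", "hope"}
    negative = {"war", "death", "lonely", "fear"}
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return (pos - neg) / max(len(words), 1)

def sentiment_curve(words, window=500, stride=250, smooth=4):
    """Score overlapping windows, then apply a moving average of length `smooth`."""
    raw = [sentiment_score(words[i:i + window])
           for i in range(0, max(len(words) - window, 1), stride)]
    kernel = np.ones(smooth) / smooth
    return np.convolve(raw, kernel, mode="valid")   # length varies with text length

# Toy token sequence standing in for the preprocessed novel.
words = ("war death lonely fear " * 50 + "hope warm love happy " * 50).split()
curve = sentiment_curve(words, window=50, stride=25, smooth=4)
print(len(curve), curve.round(3))
```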

If the generated curve is too smooth, effective information in the original text is lost. Observing the two curves, in the first half the emotion of the text shows a gradual upward trend overall. However, the variable-length curve generated by this paper's method shows the emotional fluctuations more clearly, while the fixed-length method smooths most of them out; scholars believe that this smoothing, in effect, loses information from the original text.

The smoothing phenomenon of the original method is more apparent in the last 20% of the curve. In “Mom’s Stakes”, in order to emphasize the mother’s authoritarianism and stubbornness, the mother ignores her daughter’s opinion and cuts off her daughter’s hair without her permission, and the daughter experiences a lot of negative emotions. In the end, with the daughter’s understanding, the emotions begin to calm down.

The curves generated by this paper’s method reflect this change very well, while the fixed-length curves generated by the previous method only show the situation of falling back from the highest point to the mean value, and do not reflect the negative emotions that are concentrated at the end.

In conclusion, for the emotion curve of the novel “Mom’s Stakes” taken as a sample, the curve generated by this paper's method avoids the over-smoothing produced by the previous method under its configuration, which fails to reflect the characteristics of the text, and better fits the emotional changes of the original novel text.

Conclusion

This study uses deep learning methods to construct a sentiment analysis model for modern Korean novels and proposes a BERT-BGRU sentiment analysis method that incorporates an attention mechanism. The IMDB and SST-2 datasets are selected and BERT-BGRU is evaluated against existing methods. The experimental results show that, compared with the method that only uses the BERT model as a word embedding connected to a downstream linear layer, BERT-BGRU improves the F1 value by 0.849 and 0.960 on the SST-2 and IMDB datasets respectively, obtaining better results overall. Then, taking Park Wan-su’s novel “Mom’s Stakes” as an example, this paper compares a previously proposed method for generating a fixed-length emotion curve with the BERT-BGRU emotion model proposed here, and the experimental results show that the BERT-BGRU model better expresses the emotional-curve characteristics of the novel text.