A Study on the Integrated Application of Deep Learning and Semantic Analysis Techniques in Sentiment Interpretation of Medical Texts

The development of artificial intelligence can be divided into two stages: perceptual intelligence and cognitive intelligence. In recent years, driven by big data and by machine learning techniques, above all deep learning, artificial intelligence has advanced rapidly in perceptual intelligence and can reach the level of human experts in tasks such as image recognition and speech recognition [1–3]. Progress in cognitive intelligence, and especially in natural language understanding, remains comparatively limited. Compared with the rich linguistic experience and knowledge of human beings, purely data-driven deep learning struggles to achieve genuine understanding [4–6]. To break through this performance bottleneck, combining semantic analysis with deep learning models is expected to become the next breakthrough for AI in cognitive functions. Such a combination can assist in analyzing the linguistic material obtained from user research, substantially improve the efficiency of that analysis, and even uncover important information that is difficult to find by manual analysis [7–9].

Medical text interpretation is an important embodiment of medical narrative competence, and grasping the emotion conveyed in a text is an important means of improving narrative competence and, in turn, medical literacy. Medical text interpretation ability is the ability to absorb, interpret, and be moved by stories of illness. It integrates medicine and literature, advocates incorporating narrative methods into diagnosis and treatment, and thereby compensates for the lack of human warmth in medical practice that emphasizes science alone [10–12]. In linguistic practice, sentence structures vary widely, and meaningful structural forms often carry important expressive functions in a text. When interpreting texts, consciously guiding readers to notice typical and atypical utterance forms, and to analyze and appreciate them, not only helps readers find the right path to the author's emotional intent but also helps them develop constructive linguistic ability [13–15]. Studying the integrated application of deep learning and semantic analysis to the emotional interpretation of medical texts can therefore probe more deeply into the differing attitudes of doctors and patients toward the body, toward ruling and being ruled, and toward concealing and searching, which are the deep reasons for their differences. After all, for patients illness means not only physical ailment but also the separation of self from body, the loss of self-worth, and the stripping away of life roles. Attention to, and interpretation of, the emotion in medical texts is precisely what is needed to improve medical narrative competence [16–18].

In this paper, we first discuss the general process of sentiment analysis of medical texts and propose a deep semantic feature extraction method. We then combine BiLSTM, CNN, and attention mechanism (AM) models to construct AC-BiLSTM, a medical text sentiment analysis model based on the attention mechanism and a bidirectional recurrent convolutional neural network. To verify the model's practicality, we analyze the effect of different input port sizes, i.e., fixed input sequence lengths, on model performance and compare its performance with that of other models. Finally, the model is applied to sentiment analysis of doctor-patient public opinion texts to further support its reliability.

2

General process of sentiment analysis of medical texts

Semantic sentiment analysis of medical texts proceeds as follows. Training texts that have already been labeled with categories are processed, and the features that best distinguish the categories, or the relationships between features and their categories, serve as the basis on which a classifier learns to recognize categories. A suitable classification model is then built and finally used to predict the categories of unlabeled test texts. The general flow of medical text sentiment analysis is shown in Figure 1. As the figure shows, every learning process passes through a training phase and a testing phase; the more detailed steps include feature selection and the choice of text representation, and the final classification results must be assessed with evaluation metrics.

2.1

Preprocessing of Chinese medical text

Preprocessing removes irrelevant information and converts medical text into a format that computers can process, which makes it easier to obtain structural information from the text. The main preprocessing steps are word segmentation, part-of-speech tagging, and stop-word removal. Segmentation must take account of the structural characteristics of Chinese sentences and divide each complete sentence into words or phrases that serve as independent units. After segmentation, stop words are removed; these are mostly common words that carry no practical meaning. A large number of stop words in a text not only increases the dimensionality of the feature space but also dilutes feature weights, makes feature extraction harder, and degrades the performance of machine learning. The main preprocessing operations are described below.

2.1.1

Segmentation processing

Existing word segmentation algorithms fall into three main categories: string matching methods, understanding-based methods, and statistical methods.

1)

The string matching method is a dictionary-based segmentation method. First, a sufficiently large machine segmentation dictionary is built; then strings in the text to be segmented are matched against the words in the dictionary. If a string is found in the dictionary, it is recognized as a word and split off from the text; otherwise it is not split. This method is commonly used in text proofreading, keyword retrieval, spam filtering, and similar applications.
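To make the dictionary-matching idea concrete, the following Python sketch implements forward maximum matching, one common variant of string matching; the paper does not say which variant is used anywhere in its pipeline. The function name, the `max_len` window, and the toy dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily from left to right, always taking the longest
    dictionary word that starts at the current position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                # A single character is emitted as-is when nothing longer matches.
                words.append(candidate)
                i += size
                break
    return words


toy_dict = {"医生", "病人", "治疗", "效果", "很好"}  # hypothetical dictionary
print(forward_max_match("医生治疗效果很好", toy_dict))  # ['医生', '治疗', '效果', '很好']
```

Because the matching is greedy, it cannot resolve overlapping ambiguities on its own, which is one motivation for the understanding-based and statistical methods described next.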

2)

The understanding-based method simulates human understanding of sentences by having the computer learn a large amount of linguistic knowledge, so that it can recognize words. The basic idea is to apply syntactic and semantic analysis to the segmented text to determine whether ambiguity exists; the method is therefore helpful for handling ambiguity and polysemy.

3)

The statistical method only needs to count the frequencies of character combinations in a corpus, exploiting the fixed collocation patterns of Chinese; it is a dictionary-free segmentation method. It can recognize out-of-vocabulary words and resolve ambiguity automatically and efficiently. However, because it uses no segmentation dictionary, its accuracy on common words is relatively poor, and it tends to split out character combinations that co-occur frequently but are not actually common words or phrases.

In this study, ICTCLAS, a Chinese lexical analysis system, was used to preprocess the medical texts. Its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition, and new-word detection, and it also supports user dictionaries. Its segmentation speed on a single machine is 996 KB/s, and its segmentation accuracy can reach 98.45%.

2.1.2

Stop-word removal

The meaning of a text is generally carried by content words such as nouns, verbs, and adjectives, whereas function words, and words that appear frequently in a text without conveying any meaning, are called stop words. Because stop words do not express the actual meaning of the text, they contribute nothing to text classification and only increase its space complexity. Removing stop words is therefore a key step in improving the accuracy of text classification algorithms. A stop-word list makes stop words easy to identify: every token that matches an entry in the list can be removed, so stop words can be removed in batches.
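A minimal sketch of batch stop-word removal, assuming the text has already been segmented into a token list. The stop-word list here is a hypothetical two-entry example, not the list used in the experiments.

```python
def remove_stop_words(tokens, stop_words):
    """Drop every token found in the stop-word list."""
    stop_set = set(stop_words)  # set lookup is constant time per token
    return [tok for tok in tokens if tok not in stop_set]


tokens = ["医生", "的", "态度", "非常", "好", "了"]
stop_list = ["的", "了"]  # hypothetical stop-word list
print(remove_stop_words(tokens, stop_list))  # ['医生', '态度', '非常', '好']
```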

2.1.3

Construction of a sentiment lexicon

Constructing a sentiment lexicon for medical texts can improve the performance of medical text sentiment classification. Sentiment classification depends heavily on the sentiment lexicon, yet no existing technique can build an accurate and complete Chinese sentiment lexicon, and new words constantly emerge on the Internet. The sentiment lexicon used in this paper's experiments therefore merges HowNet and NTUSD, removes duplicate words and words with unclear sentiment tendencies, and manually adds new sentiment-bearing words in response to newly popular Internet terms.
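The merging step can be sketched as follows, assuming each source lexicon has already been reduced to a word-to-polarity mapping (+1 or -1). Treating a word with conflicting polarities across sources as having an "unclear" tendency is our assumption about how that filter might work; the function name and toy entries are hypothetical, and the real HowNet and NTUSD files have their own formats.

```python
def merge_lexicons(*lexicons, new_words=None):
    """Merge word -> polarity dicts (+1 positive, -1 negative).

    Duplicate entries collapse into one. A word given opposite polarities by
    different sources is treated as having an unclear tendency and dropped
    (an assumed rule). Manually curated new words are added last.
    """
    merged, conflicting = {}, set()
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            if word in merged and merged[word] != polarity:
                conflicting.add(word)
            merged.setdefault(word, polarity)
    for word in conflicting:
        del merged[word]
    merged.update(new_words or {})
    return merged


hownet_toy = {"满意": 1, "痛苦": -1, "一般": 1}  # hypothetical entries
ntusd_toy = {"满意": 1, "一般": -1, "失望": -1}  # hypothetical entries
merged = merge_lexicons(hownet_toy, ntusd_toy, new_words={"点赞": 1})
print(merged)
```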

2.2

Medical text feature extraction process

Medical text feature extraction is the process of extracting representative features from medical text and using them to transform the target text into feature vectors. The general feature extraction process consists of text preprocessing, text sequence representation, feature selection, and weight calculation. Feature selection and weight calculation can in turn be divided into traditional feature selection and deep semantic feature selection.

2.2.1

Medical Text Sequence Representation

Text sequence representation converts text into a numerical representation. Common methods include the vector space model (VSM), the bag-of-words model (BOW), the n-gram model, and the word vector model.

1)

Vector Space Model

The vector space model (VSM) treats each target object as a point in an m-dimensional space, or equivalently as an m-dimensional vector in that space.

Mathematically, a vector space is spanned by a set of linearly independent basis vectors, and their number is the dimension of the vectors into which target objects are transformed. In a three-dimensional vector space model with basis vectors X, Y, and Z, a target object is represented as the vector (x, y, z). A common way to compute the similarity of two target objects is cosine similarity. For example, for sentences s₁ = (x₁, y₁, z₁) and s₂ = (x₂, y₂, z₂), the similarity of s₁ and s₂ reduces to computing cos θ:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{\sqrt{x_1^2 + y_1^2 + z_1^2}\,\sqrt{x_2^2 + y_2^2 + z_2^2}} \tag{1}$$
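Cosine similarity generalizes directly from three to m dimensions. A minimal Python sketch (the function name is hypothetical), whose denominator is the product of the two vector norms:

```python
import math


def cosine_similarity(s1, s2):
    """Cosine similarity of two equal-length vectors, as in Eq. (1)."""
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # The denominator is the product (not the sum) of the two norms.
    return dot / (norm1 * norm2)


print(cosine_similarity((1, 2, 2), (2, 1, 2)))  # 8 / 9, about 0.889
```

The explicit zero-vector check is a design choice: an all-zero representation (for example, a text with no known words) has no direction, so raising an error is safer than returning an arbitrary value.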

2)

Bag of Words Model

The bag-of-words model (BOW) treats a target object as a bag of words. Each word is a feature, and the feature's weight is usually the word's frequency of occurrence. BOW is the simplest and most general of the common text sequence representation methods, but it ignores the order in which words occur and merely counts vocabulary, so it cannot capture deeper information.
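A minimal bag-of-words sketch, assuming a fixed, hypothetical vocabulary: each text becomes a vector of word counts, so two texts with the same words in a different order map to the same vector.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Map a token list to a count vector over a fixed vocabulary; word order is lost."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vocab = ["医生", "耐心", "服务", "差"]  # hypothetical vocabulary
print(bag_of_words(["医生", "耐心", "医生", "服务"], vocab))  # [2, 1, 1, 0]
```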

3)

n-gram model

Unlike BOW, the n-gram model takes word order into account and thus captures deeper semantic information. Suppose a sentence s is represented after word segmentation as the sequence s = w₁w₂…w_m. By the chain rule, its probability is

$$P(s) = P(w_1, w_2, \dots, w_m) = \prod_{i=1}^{m} p(w_i \mid w_1 w_2 \dots w_{i-1}) \tag{2}$$

Under the Markov assumption, each factor can be simplified to

$$p(w_i \mid w_1 w_2 \dots w_{i-1}) = p(w_i \mid w_{i-n+1} \dots w_{i-1}) \tag{3}$$

Typically, n is 2, 3, or 4. For n = 3:

$$P(w_1, w_2, \dots, w_m) = \prod_{i=1}^{m} p(w_i \mid w_{i-2} w_{i-1}) \tag{4}$$

Maximum likelihood estimation then gives the conditional probability of word w_i:

$$p(w_i \mid w_{i-2} w_{i-1}) = \frac{C(w_{i-2} w_{i-1} w_i)}{C(w_{i-2} w_{i-1})} \tag{5}$$

where C(·) denotes the number of times the given word sequence occurs in the corpus.
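Eq. (5) amounts to counting. The sketch below estimates trigram probabilities from a toy, already segmented corpus; it adds no sentence-boundary padding and no smoothing, both of which practical n-gram models usually need, and the function name and corpus are hypothetical.

```python
from collections import Counter


def trigram_mle(sentences):
    """Eq. (5): p(w_i | w_{i-2} w_{i-1}) = C(w_{i-2} w_{i-1} w_i) / C(w_{i-2} w_{i-1}).

    The history count only includes bigrams that are followed by a word, so the
    probabilities for each history sum to 1.
    """
    tri, hist = Counter(), Counter()
    for words in sentences:
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            hist[tuple(words[i - 2:i])] += 1
    return {gram: count / hist[gram[:2]] for gram, count in tri.items()}


corpus = [["医生", "很", "耐心"], ["医生", "很", "专业"], ["医生", "很", "耐心"]]
probs = trigram_mle(corpus)
print(probs)  # p(耐心 | 医生 很) = 2/3, p(专业 | 医生 很) = 1/3
```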

4)

Word Vector Model

Researchers usually train word vector models with neural networks, which avoids problems such as the curse of dimensionality and the isolation of words from one another, and better captures word meaning. Classical neural language modeling frameworks include the NNLM model and the Word2Vec model. In this paper, we use the Word2Vec model for medical text sequence representation.

The Word2Vec model is a simple three-layer neural network that can quickly be trained on a given corpus to represent each word efficiently as a word vector, and it is often used to process text before deep learning algorithms are applied [19]. In word vector training, the dimension of the word vector is the number of features used to represent each word. The word vectors are in fact the hidden-layer weight matrix obtained after training. The core of Word2Vec consists of two training models: CBOW and Skip-gram.

The principle of the CBOW (Continuous Bag-of-Words) model is that the context determines the current word. CBOW uses a bag-of-words model, so the order of the context words is not taken into account. The CBOW model is shown in Fig. 2(a). In this paper, Hierarchical Softmax is used to train the model. The Hierarchical Softmax CBOW model takes the one-hot vectors of the context words Context(w) as input, and the projection layer sums their word vectors:

$$X_w = \sum_{i=1}^{c} v(\mathrm{Context}(w)_i) \tag{6}$$

where c is the number of context words and v(·) is the vector of a word.

The output layer corresponds to a Huffman tree, built with the words that appear in the corpus as leaf nodes and their occurrence counts as weights. The probability of the current word can then be expressed as

$$p(w \mid \mathrm{Context}(w)) = \prod_{j=2}^{l^w} p(d_j^w \mid X_w, \theta_{j-1}^w) \tag{7}$$

$$p(d_j^w \mid X_w, \theta_{j-1}^w) = \left[\sigma(X_w^{\top} \theta_{j-1}^w)\right]^{1-d_j^w} \cdot \left[1 - \sigma(X_w^{\top} \theta_{j-1}^w)\right]^{d_j^w} \tag{8}$$

where l^w is the number of nodes on the path from the root to the leaf of word w, $d_j^w \in \{0, 1\}$ is the j-th bit of the Huffman code of w, $\theta_j^w$ is the vector of the j-th non-leaf node on that path, and σ is the sigmoid function. The parameters are then optimized by maximum likelihood.
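To show how Eqs. (6) to (8) fit together, the sketch below sums context vectors into X_w and multiplies the binary decisions along a Huffman path. The three-leaf tree, its node vectors, and the function names are hypothetical; because every inner node splits probability between its two children, the leaf probabilities must sum to 1, which the example checks.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cbow_projection(context_vectors):
    """Eq. (6): X_w is the element-wise sum of the context word vectors."""
    return [sum(column) for column in zip(*context_vectors)]


def hs_probability(x_w, code_bits, node_vectors):
    """Eqs. (7)-(8): multiply one binary decision per inner node on the path.

    code_bits holds d_2 ... d_l of the target word's Huffman code and
    node_vectors holds theta_1 ... theta_{l-1}, aligned pairwise. Bit 0
    contributes sigma(x_w . theta) and bit 1 contributes 1 - sigma(x_w . theta).
    """
    prob = 1.0
    for d, theta in zip(code_bits, node_vectors):
        s = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        prob *= s if d == 0 else 1.0 - s
    return prob


# Hypothetical 3-leaf tree: root vector theta_r and one inner node theta_a.
x_w = cbow_projection([[0.1, 0.2], [0.3, -0.1]])
theta_r, theta_a = [0.5, -0.4], [0.2, 0.7]
leaves = {
    "A": ([0], [theta_r]),
    "B": ([1, 0], [theta_r, theta_a]),
    "C": ([1, 1], [theta_r, theta_a]),
}
total = sum(hs_probability(x_w, bits, nodes) for bits, nodes in leaves.values())
print(total)  # the leaf probabilities sum to 1 (up to rounding)
```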

Skip-gram, shown in Fig. 2(b), predicts the context from the current word, the opposite of CBOW. The objective function of Skip-gram is

$$L = \sum_{w \in \mathcal{C}} \log p(\mathrm{Context}(w) \mid w) \tag{9}$$

where $\mathcal{C}$ is the corpus and $p(\mathrm{Context}(w) \mid w) = \prod_{u \in \mathrm{Context}(w)} p(u \mid w)$. Following the same Hierarchical Softmax idea with a Huffman tree gives

$$p(u \mid w) = \prod_{j=2}^{l^u} p(d_j^u \mid v(w), \theta_{j-1}^u) \tag{10}$$

$$p(d_j^u \mid v(w), \theta_{j-1}^u) = \left[\sigma(v(w)^{\top} \theta_{j-1}^u)\right]^{1-d_j^u} \cdot \left[1 - \sigma(v(w)^{\top} \theta_{j-1}^u)\right]^{d_j^u} \tag{11}$$
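Skip-gram's objective in Eq. (9) sums log p(u | w) over (center, context) pairs. A minimal sketch of how such pairs can be enumerated, assuming Context(w) is the set of words within a fixed window on each side; the default window of 2 is an assumption, since the paper does not state one, and the function name is hypothetical.

```python
def skipgram_pairs(words, window=2):
    """List (center, context) pairs: each word w is paired with every word u
    at most `window` positions away, i.e. every u in Context(w)."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


print(skipgram_pairs(["医生", "很", "耐心"], window=1))
```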

2.2.2

Traditional Text Feature Selection and Weight Calculation

Weights of traditional text features are mainly computed with feature evaluation functions, including TF-IDF and MI.

1)

TF-IDF

Term frequency-inverse document frequency (TF-IDF) is a weighting technique from information retrieval that is commonly used to assess how important a word is to a document. It is defined as

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{12}$$

TF denotes term frequency, the frequency with which a word occurs in a given document. For a word t_i in document d_j:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \tag{13}$$

where n_{i,j} is the number of occurrences of t_i in d_j and the denominator is the total number of occurrences of all words in d_j. IDF denotes inverse document frequency, calculated as

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}| + 1} \tag{14}$$

where |D| is the total number of documents in the corpus and |{j : t_i ∈ d_j}| is the number of documents containing t_i.

A word with a high frequency within a particular document and a low document frequency across the whole collection therefore receives a high TF-IDF weight.
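A from-scratch sketch of Eqs. (12) to (14) on tokenized documents; the function name and toy documents are hypothetical. With the "+1" in the IDF denominator, a term that appears in every document gets a slightly negative weight; library implementations such as scikit-learn's use different smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document, using
    tf = n_ij / sum_k n_kj (Eq. 13) and idf = log(|D| / (df + 1)) (Eq. 14)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # document frequency: count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({t: (c / total) * math.log(n_docs / (df[t] + 1))
                        for t, c in counts.items()})
    return weights


docs = [["医生", "耐心", "耐心"], ["医生", "冷漠"], ["医生", "专业"]]  # hypothetical
weights = tf_idf(docs)
print(weights[0])  # "耐心" outweighs "医生", which occurs in every document
```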

TF-IDF is a very common text mining method, but it relies only on word occurrence statistics within and across documents and does not take into account the semantic associations behind the text.

2)

MI

Mutual information (MI) is, in mathematical terms, a measure of the mutual dependence of two variables, defined as

$$I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \tag{15}$$

where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal distributions of X and Y. Intuitively, mutual information measures the information shared by X and Y: how much knowing one variable reduces uncertainty about the other.

In text classification, mutual information describes the amount of information shared between a feature word of a document and the document's category. It can be defined as

$$MI(t, C) = \log \frac{P(t, C)}{P(t)\, P(C)} \tag{16}$$

where P(t, C) is the ratio of the number of documents in category C that contain feature t to the total number of documents, P(t) is the ratio of documents containing t to the total number of documents, and P(C) is the ratio of documents in category C to the total number of documents.
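Eq. (16) can be computed directly from four document counts. In this hypothetical example, a term appears in 40 of 100 documents, 30 of which belong to a category containing 50 documents. Returning negative infinity when the term never occurs in the category is an assumed convention, and the function name is hypothetical.

```python
import math


def mutual_information(n_tc, n_t, n_c, n_total):
    """Eq. (16) from document counts.

    n_tc: documents in category C that contain term t
    n_t: documents that contain t;  n_c: documents in C;  n_total: all documents
    """
    if n_tc == 0:
        return float("-inf")  # log(0): t never occurs in C (an assumed convention)
    p_tc = n_tc / n_total
    p_t = n_t / n_total
    p_c = n_c / n_total
    return math.log(p_tc / (p_t * p_c))


mi = mutual_information(n_tc=30, n_t=40, n_c=50, n_total=100)
print(mi)  # log(1.5), about 0.405: t occurs in C more often than chance predicts
```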

MI can measure how relevant a word is to a category, but its calculation ignores word frequency. MI and TF-IDF results can therefore be combined so that both feature frequency and feature distribution are taken into account.

2.2.3

Deep semantic feature selection and weight calculation

Besides traditional text features, there is a more abstract kind of feature called deep semantic features: intrinsic features mined with techniques such as neural networks and deep learning. They are usually obtained during model training and do not correspond to any explicit concept.

Word2Vec word vectors can also be regarded as a type of deep semantic feature. When building deep learning models, such features are often used as the initial input, and the deep learning network then mines still deeper features from them.

3

AC-BiLSTM-based Sentiment Analysis Model for Medical Texts

In this paper, we combine BiLSTM, CNN, and AM models to construct AC-BiLSTM, a medical text sentiment analysis model based on the attention mechanism and a bidirectional recurrent convolutional neural network. The Word2Vec model is first trained to obtain word vector representations of the medical text. The trained word vectors are then fed into the BiLSTM model, which learns rich contextual semantic features of the text. Next, the attention mechanism focuses on the most important information in the text and assigns corresponding feature weights. Finally, a CNN mines deeper semantic features to determine the sentiment tendency of the utterance.

3.1

General Architecture of the AC-BiLSTM Model

The overall structure of the AC-BiLSTM model is shown in Fig. 3; it consists mainly of a BiLSTM layer, an AM layer, and a CNN layer. First, the Word2Vec model is used to train word vectors that contain rich semantic information, and the BiLSTM then mines deep contextual semantic information from the text. The feature vectors output by the BiLSTM serve as input to the AM, which optimizes them by giving greater weight to key information in the sentence. The output of the AM in turn serves as input to the CNN, which performs deeper feature mining; max pooling reduces the feature dimension and hence the number of training parameters, and a Softmax layer finally outputs the sentiment classification label.

3.2

BiLSTM layer

3.2.1

LSTM

LSTM has a chain structure similar to that of a traditional RNN and effectively overcomes the RNN's inability to learn long-term dependencies. As a subclass of RNN, LSTM differs from it in the structure of its recurrent module. A standard LSTM model generally consists of input, hidden, and output layers [20]. The hidden layer contains three gates (the input gate, the output gate, and the forget gate) together with another important component, the memory cell. The memory cell controls the flow of information, and which information it keeps or discards is decided by the three gates, which protect and control the cell state.

Let x_t be the input of an LSTM node at time t with corresponding weight W_x, let h_t be its output at time t with corresponding weight W_h, and let σ denote the sigmoid activation function. The information update proceeds in the following stages:

1)

Forget gate

The forget gate, a sigmoid layer, lets the LSTM selectively discard information it has memorized. It takes h_{t−1} and x_t as inputs and outputs, for each element of the previous cell state C_{t−1}, a value between 0 and 1, where 1 means keeping the information completely and 0 means discarding it completely:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{17}$$

2)

Input gate

The input gate controls which new information the memory cell stores. First, a sigmoid layer selects the information to be updated; then a tanh layer generates a candidate state $\tilde{C}_t$ used to update the memory cell:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{18}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{19}$$

Once the input and forget information have been determined, the memory cell state can be updated. Multiplying f_t element-wise by C_{t−1} keeps the part of the old state that is not forgotten, and adding the product of i_t and $\tilde{C}_t$ yields the new cell state:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{20}$$

where f_t is the output of the forget gate, C_{t−1} is the previous cell state, i_t is the output of the input gate, $\tilde{C}_t$ is the candidate state, and * denotes element-wise multiplication.

3)

Output gate

The output gate selects the information to output, based on the current state of the memory cell. A sigmoid layer determines which parts of the cell state to output, the cell state is passed through tanh, and the product of the two gives the output:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{21}$$

$$h_t = o_t * \tanh(C_t) \tag{22}$$
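The gate equations (17) to (22) can be traced with a single forward step. The sketch below uses plain Python lists instead of a tensor library; the parameter layout and names are hypothetical, and each weight matrix acts on the concatenation [h_{t-1}, x_t] as in the equations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (17)-(22).

    params maps "f", "i", "c", "o" to (W, b) pairs; every W acts on [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(*params["f"], z)]        # forget gate, Eq. (17)
    i_t = [sigmoid(v) for v in affine(*params["i"], z)]        # input gate, Eq. (18)
    c_tilde = [math.tanh(v) for v in affine(*params["c"], z)]  # candidate state, Eq. (19)
    c_t = [f * cp + i * ct for f, cp, i, ct in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (20)
    o_t = [sigmoid(v) for v in affine(*params["o"], z)]        # output gate, Eq. (21)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]         # Eq. (22)
    return h_t, c_t


# Hidden size 1, input size 1, all weights and biases zero (hypothetical values).
zero_params = {gate: ([[0.0, 0.0]], [0.0]) for gate in ("f", "i", "c", "o")}
h, c = lstm_step([1.0], [0.0], [2.0], zero_params)
print(h, c)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved: a quick sanity check of Eq. (20).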

In a traditional LSTM, the memory cell at a given moment can only remember inputs up to that time step and cannot take later inputs into account. To obtain more sentiment information, this paper adopts a bidirectional long short-term memory network (BiLSTM) to encode the semantic information of the text.

3.2.2

BiLSTM

Bidirectional LSTM (BiLSTM) is derived from the bidirectional RNN (BRNN); its structure is shown in Fig. 4. BiLSTM combines a forward LSTM and a backward LSTM so that both preceding and following semantic information is available at each time step [21].

Because the output layer is connected jointly to two LSTMs running in opposite directions, BiLSTM attends to the context on both sides, memorizing the semantic information of both earlier and later moments.

The hidden state H_t of BiLSTM at time t contains the forward state $\overrightarrow{h}_t$ and the backward state $\overleftarrow{h}_t$:

$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(h_{t-1}, w_t, c_{t-1}), \quad t \in [1, T] \tag{23}$$

$$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(h_{t+1}, w_t, c_{t+1}), \quad t \in [T, 1] \tag{24}$$

$$H_t = [\overrightarrow{h}_t, \overleftarrow{h}_t] \tag{25}$$

The BiLSTM output H_t is used as the feature vector of the medical text. BiLSTM is trained in the same way as BRNN, with the backpropagation algorithm used to compute gradients.
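Eqs. (23) to (25) amount to running one recurrent pass left to right, another right to left, and concatenating the two states at each position. In the sketch below the recurrent step is passed in as a function; the toy tanh step in the example is only a stand-in for an LSTM cell, and all names are hypothetical.

```python
import math


def run_direction(inputs, step, h0):
    """Apply a recurrent step(x, h) -> h over inputs in order, keeping every state."""
    states, h = [], h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(inputs, step_fwd, step_bwd, h0):
    """Eqs. (23)-(25): H_t is the concatenation of the forward and backward states at t."""
    forward = run_direction(inputs, step_fwd, h0)
    # Run from position T back to 1, then reverse so index t lines up with the forward pass.
    backward = run_direction(inputs[::-1], step_bwd, h0)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def toy_step(x, h):
    """Hypothetical one-unit tanh RNN step standing in for an LSTM cell."""
    return [math.tanh(x[0] + h[0])]


H = bidirectional([[1.0], [0.0], [-1.0]], toy_step, toy_step, [0.0])
print(H)  # three positions, each a 2-element [forward, backward] state
```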

3.3

AM layer

An important function of the attention mechanism (AM) is to identify the most important sentiment features and assign them corresponding weights [22]. In natural language processing, attention was initially applied mainly to tasks such as machine translation. It can efficiently combine all the important factors in a text and refine the sentiment information by highlighting how much each word contributes.

Here, the input of the attention model is the output of the BiLSTM. The attention model computes an attention distribution over the input text: it compares a reference vector with the hidden states h_i of all words to obtain matching scores, which are then normalized into probabilities. Different attention models use different alignment functions.

To train the AM, this paper uses a two-layer feed-forward neural network, and the number of memory units of the attention model is set equal to the number of hidden-layer neurons of the BiLSTM. The attention probability assigned to each feature vector is calculated as

$$e_i = g(W h_t, U h_i, c) \tag{26}$$

$$a_i = \frac{\exp(e_i)}{\sum_{j=1}^{T} \exp(e_j)} \tag{27}$$

In these equations, T is the number of input elements (words), W and U are weight matrices, h_i is the hidden state of the i-th input element, and g is the activation function. The attention probabilities are obtained by normalizing the scores e_i with the Softmax function, and a_i represents how strongly input element i influences the result.

The final semantic encoding output by the attention model is the sum of the hidden states weighted by their attention probabilities:

$$o = \sum_{i=1}^{T} a_i h_i \tag{28}$$
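A minimal sketch of Eqs. (27) and (28). The paper leaves the alignment function g of Eq. (26) unspecified beyond its arguments, so the example substitutes a plain dot product with a hypothetical query vector; only the softmax normalization and the weighted sum follow the text directly.

```python
import math


def attention_pool(hidden_states, scores):
    """Eqs. (27)-(28): softmax the raw scores e_i into weights a_i and
    return (weights, o) with o = sum_i a_i * h_i."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(hidden_states[0])
    o = [sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, o


def dot_score(query, h):
    """Hypothetical stand-in for g in Eq. (26): a dot product with a query vector."""
    return sum(q * x for q, x in zip(query, h))


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy BiLSTM outputs h_1..h_3
query = [1.0, 0.0]
weights, o = attention_pool(states, [dot_score(query, h) for h in states])
print(weights, o)
```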

3.4

CNN layer

After the AM has optimized the feature vectors output by the BiLSTM, a CNN performs a second round of deep feature mining [23]. The CNN uses pooling to refine the extracted features, and Softmax then serves as the model's classifier.
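The convolution-and-pooling step can be illustrated on a sequence of feature vectors. This is a sketch under assumptions: stride 1, no padding, a ReLU activation, and non-overlapping max pooling of size 2. The pooling size matches Table 1, and a k x d kernel spanning the full vector width matches the 3×100 kernel shape there when d = 100; everything else is assumed, and the text does not specify exactly how the attention output is arranged as CNN input.

```python
def conv1d_valid(sequence, kernels, biases):
    """Slide each k x d kernel over a sequence of d-dimensional vectors
    (stride 1, no padding) and return one ReLU-activated feature map per kernel."""
    k = len(kernels[0])
    feature_maps = []
    for kernel, b in zip(kernels, biases):
        fmap = []
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            value = sum(w * x for krow, vrow in zip(kernel, window)
                        for w, x in zip(krow, vrow)) + b
            fmap.append(max(0.0, value))  # ReLU, an assumed activation
        feature_maps.append(fmap)
    return feature_maps


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


seq = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 0]]  # five hypothetical 2-d feature vectors
kernel = [[1, 1], [1, 1], [1, 1]]               # one 3 x 2 kernel of ones
maps = conv1d_valid(seq, [kernel], [0.0])
print(maps, max_pool1d(maps[0]))
```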

For the error function during training, this paper uses cross-entropy:

$$L(Y, O) = -\frac{1}{N} \sum_{i=1}^{N} \hat{y}_i \log(y_i) \tag{29}$$

where $\hat{y}_i$ is the label vector of the i-th input text, y_i is the model's output vector for that text, the product is summed over the sentiment classes, and N is the number of input texts. The model finally outputs the text's sentiment category label through Softmax.
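The cross-entropy loss, with one-hot label vectors ŷ_i and predicted distributions y_i as defined above, can be checked numerically. A minimal sketch with a numerically stable softmax; the `eps` floor that avoids log(0) is an added assumption, and the labels and logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(labels, outputs, eps=1e-12):
    """-(1/N) * sum_i sum_k yhat_ik * log(y_ik), where labels holds the
    one-hot vectors yhat_i and outputs the predicted distributions y_i."""
    n = len(labels)
    return -sum(yh * math.log(max(y, eps))
                for lab, out in zip(labels, outputs)
                for yh, y in zip(lab, out)) / n


labels = [[1, 0], [0, 1]]  # hypothetical one-hot labels
outputs = [softmax([2.0, 0.0]), softmax([0.0, 2.0])]
loss = cross_entropy(labels, outputs)
print(loss)  # equals log(1 + e^-2), about 0.127
```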

4

Results and analysis of model experiments

4.1

Effect of different model input port sizes on model performance

The model input port size, i.e., the fixed length to which input texts are padded or truncated, should be set according to the length distribution of each dataset. Setting it too large may introduce too many training parameters and reduce training efficiency, whereas setting it too small masks important semantic information in the data. An appropriate input port size therefore helps the model understand the textual information while reducing model complexity and improving both performance and training efficiency. In this paper, the IMDB, SST-2, and Medical-Reviews datasets are used as model input; their text length distributions are shown in Fig. 5(a) to (c), respectively.

As Fig. 5 shows, in the IMDB dataset few texts have lengths of 900-1800, a moderate number have lengths of 300-900, and most have lengths of 0-300. Therefore, 900, 1200, 1500, and 1800 are selected as input port sizes in the interval [900, 1800], 300, 500, and 700 in [300, 900), and 75, 150, and 225 in [0, 300).

The SST-2 texts are much shorter than those in IMDB: all lengths fall within [0, 90], and most fall within [0, 60]. Therefore, 60, 75, and 90 are selected as input port sizes in [60, 90], and 10, 20, 30, 40, and 50 in [0, 60).

The Medical-Reviews texts are concentrated mainly in [0, 100], while lengths in (100, 250) are very sparse. Therefore, 150 and 200 are selected as input port sizes in (100, 250), and 20, 40, 60, 80, and 100 in [0, 100].
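One way to reason about these choices is to measure what fraction of texts a given input port size covers without truncation. The function name and the lengths below are hypothetical, not the Medical-Reviews statistics behind Fig. 5:

```python
def coverage(lengths, input_size):
    """Fraction of texts whose length fits within input_size without truncation."""
    return sum(1 for n in lengths if n <= input_size) / len(lengths)


lengths = [12, 35, 48, 60, 77, 95, 130, 240]  # hypothetical token counts
for size in (20, 60, 100, 200):
    print(size, coverage(lengths, size))
```

Coverage rises quickly while the size sweeps the dense part of the distribution and slowly in the sparse tail, which is why candidate sizes are spaced more finely where texts are concentrated.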

4.2

Model Comparison Experiments

To test the effectiveness of AC-BiLSTM for sentiment interpretation of medical texts, this paper compares it with LSTM, CNN-LSTM, and BiLSTM. Using an open-source Chinese annotated corpus on GitHub, we collected 80,000 comments on various aspects of medicine, medical treatment, and the doctor-patient relationship (40,000 positive and 40,000 negative) and constructed the Medical-Reviews dataset, of which 64,000 reviews are used as training data and 16,000 as test data. The words in each review are vectorized with word2vec, each word vector has 100 dimensions, and each sequence is padded to a length of 100. The main parameter settings of AC-BiLSTM are shown in Table 1.
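Padding every review to a fixed length of 100 can be sketched as follows. Whether padding and truncation happen at the end or the start of the sequence, and which id is used for padding, are not stated in the text; the choices below (both at the end, pad id 0) and the function name are assumptions.

```python
def pad_or_truncate(token_ids, length=100, pad_id=0):
    """Fix a sequence to exactly `length` ids: cut off the tail of long sequences
    and append pad_id to short ones (end-side padding/truncation is assumed)."""
    clipped = token_ids[:length]
    return clipped + [pad_id] * (length - len(clipped))


print(pad_or_truncate([5, 6, 7], length=5))  # [5, 6, 7, 0, 0]
```

For comparison, Keras's `pad_sequences` pads and truncates at the start of the sequence by default, so results can differ depending on which convention a reimplementation uses.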

Table 1.

The parameter settings of the AC-BiLSTM model

Parameter name	Parameter setting
Word vector dimension	100
Number of convolution kernels	32
Convolution kernel size	3×100
Max-pooling size	2
Number of BiLSTM units in the first layer	128
Number of BiLSTM units in the second layer	64
Batch_size (batch processing)	128
L2 regularization coefficient	0.001
Optimizer	Adam
Dropout	0.5
Number of iterations	12
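To make Table 1 concrete, the sketch below traces tensor shapes and parameter counts through one plausible AC-BiLSTM layout. The layer order (convolution, max pooling, two stacked BiLSTM layers, attention pooling), "valid" convolution with stride 1, non-overlapping pooling, and the Keras-style single-bias LSTM parameter formula are all assumptions that the paper does not specify; the embedding and output layers are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACBiLSTMConfig:
    seq_len: int = 100    # padded sequence length (Section 4.2)
    emb_dim: int = 100    # word2vec dimension (Table 1)
    n_filters: int = 32   # convolution number (Table 1)
    kernel_h: int = 3     # kernel 3 x 100: 3 words by the full embedding width
    pool: int = 2         # max-pooling size (Table 1)
    lstm1: int = 128      # units per direction, first BiLSTM layer
    lstm2: int = 64       # units per direction, second BiLSTM layer


def lstm_params(input_dim: int, units: int) -> int:
    """Keras-style LSTM: 4 gates, each with input weights, recurrent weights, one bias."""
    return 4 * (units * (input_dim + units) + units)


def trace(cfg: ACBiLSTMConfig) -> tuple[dict, dict]:
    """Output shape (time steps, features) and trainable parameters per layer."""
    shapes, params = {}, {}
    conv_len = cfg.seq_len - cfg.kernel_h + 1   # 'valid' convolution, stride 1 (assumed)
    shapes["conv"] = (conv_len, cfg.n_filters)
    params["conv"] = cfg.kernel_h * cfg.emb_dim * cfg.n_filters + cfg.n_filters
    steps = conv_len // cfg.pool                # non-overlapping max pooling (assumed)
    shapes["maxpool"] = (steps, cfg.n_filters)
    shapes["bilstm1"] = (steps, 2 * cfg.lstm1)  # forward and backward states concatenated
    params["bilstm1"] = 2 * lstm_params(cfg.n_filters, cfg.lstm1)
    shapes["bilstm2"] = (steps, 2 * cfg.lstm2)
    params["bilstm2"] = 2 * lstm_params(2 * cfg.lstm1, cfg.lstm2)
    shapes["attention"] = (2 * cfg.lstm2,)      # weighted sum over the time steps
    return shapes, params


if __name__ == "__main__":
    shapes, params = trace(ACBiLSTMConfig())
    for layer, shape in shapes.items():
        print(f"{layer:10s} {str(shape):12s} params={params.get(layer, '-')}")
```

Under these assumptions the trace gives (98, 32) after convolution, (49, 32) after pooling, (49, 256) and (49, 128) after the two BiLSTM layers, and a 128-dimensional attention vector; the convolution has 9,632 parameters and the two BiLSTM layers 164,864 and 164,352.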

Precision, Recall, and F1 score are used as the evaluation metrics. The results of each model on the test set are shown in Table 2.

Table 2.

Classification results of each model on the Medical-Reviews test set

Model	Precision/%	Recall/%	F1/%
LSTM	87.48	85.79	86.63
BiLSTM	88.32	86.94	87.62
CNN-LSTM	89.79	86.54	88.14
AC-BiLSTM	91.45	88.65	90.03
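The F1 values in Table 2 can be checked from the reported Precision and Recall, since F1 is their harmonic mean. A short sketch (the helper names are hypothetical) that recomputes F1 and the gains of AC-BiLSTM over each baseline:

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from_pr(precision, recall)


# (Precision %, Recall %) as reported in Table 2.
TABLE2 = {
    "LSTM": (87.48, 85.79),
    "BiLSTM": (88.32, 86.94),
    "CNN-LSTM": (89.79, 86.54),
    "AC-BiLSTM": (91.45, 88.65),
}


def f1_gains(table: dict[str, tuple[float, float]], best: str = "AC-BiLSTM") -> dict[str, float]:
    """F1 gain of `best` over each baseline in percentage points (F1 rounded to 2 d.p. first)."""
    f1 = {m: round(f1_from_pr(p, r), 2) for m, (p, r) in table.items()}
    return {m: f1[best] - f1[m] for m in table if m != best}


if __name__ == "__main__":
    for model, gain in f1_gains(TABLE2).items():
        print(f"AC-BiLSTM vs {model}: +{gain:.2f} percentage points")
```

Running it prints gains of 3.40, 2.41, and 1.89. These are absolute differences in percentage points; relative to each baseline's F1 they correspond to improvements of roughly 3.9%, 2.8%, and 2.1%.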

The F1 values in Table 2 show that, among the four models, AC-BiLSTM performs best on the binary medical text sentiment classification task, improving F1 by 3.40, 2.41, and 1.89 percentage points over LSTM, BiLSTM, and CNN-LSTM, respectively. First, compared with LSTM, BiLSTM improves Precision, Recall, and F1 by 0.84, 1.15, and 0.99 percentage points, respectively, which indicates that a bidirectional LSTM can learn the context of medical text in both directions and that this helps sentiment classification. Second, CNN-LSTM improves F1 by 0.52 percentage points over BiLSTM: once a convolutional neural network has extracted text features, even a unidirectional LSTM outperforms the bidirectional model, which suggests that convolution is advantageous for feature extraction from medical text. Finally, with the attention mechanism added, AC-BiLSTM improves F1 by 1.89 percentage points over CNN-LSTM, indicating that during feature learning the model can selectively focus on the information that most influences the sentiment category. This improves the accuracy of medical text sentiment classification and can also enhance the model's robustness and generalization ability.
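The claim that attention lets the model focus on the most informative parts of the text can be illustrated with attention pooling over the BiLSTM outputs. The paper does not give its attention formula; the sketch assumes one common form, in which each hidden state h_t is scored as tanh(w·h_t), the scores are softmax-normalised into weights, and the context vector is the weighted sum of the states. In a trained model w is learned; here it is fixed by hand, and all names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: list[list[float]], w: list[float]) -> tuple[list[float], list[float]]:
    """Score each state h_t with tanh(w . h_t), normalise with softmax,
    and return (context vector, attention weights)."""
    scores = [math.tanh(sum(a * b for a, b in zip(h, w))) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(dim)]
    return context, alpha


if __name__ == "__main__":
    # Three hypothetical 2-d hidden states; w favours the first feature.
    states = [[1.0, 0.0], [0.2, 0.5], [0.0, 1.0]]
    ctx, weights = attention_pool(states, w=[2.0, 0.0])
    print([round(x, 3) for x in weights], [round(x, 3) for x in ctx])
```

States that score higher under w receive larger weights, so they dominate the context vector passed to the classifier.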

In addition, Fig. 6 shows how the loss and classification accuracy of each model change during training on the training and validation sets. Train loss and Val loss denote the loss values on the training and validation sets, respectively, and Train acc and Val acc denote the corresponding recognition accuracies.

Figures 6(a)-(h) show the training curves of LSTM, BiLSTM, CNN-LSTM, and AC-BiLSTM: panels (a), (c), (e), and (g) plot the loss on the training and validation sets, and panels (b), (d), (f), and (h) plot the corresponding accuracy, with the number of training rounds on the horizontal axis. Panels (a), (c), (e), and (g) show that the loss of AC-BiLSTM decreases more smoothly and steadily than that of the other models. Panels (b), (d), (f), and (h) show that the accuracy curves of the four models on the training and validation sets are broadly similar at the start of training, but as training proceeds, AC-BiLSTM reaches higher accuracy on both sets than the other three models. The validation accuracy of AC-BiLSTM gradually converges to 98%, while those of LSTM, BiLSTM, and CNN-LSTM converge to 91%, 93%, and 96%, respectively. Therefore, AC-BiLSTM performs best on the validation set for medical text sentiment classification.

4.3

Analysis of Emotional Satisfaction and Concern of Doctor-Patient Public Opinion

To further validate the efficacy of the AC-BiLSTM model in medical text sentiment interpretation, this paper takes doctor-patient public opinion texts as the experimental object and analyzes their sentiment states in order to predict text sentiment.

Public emotional satisfaction and attention are crucial to the evolution of doctor-patient public opinion. To refine the theme-word distribution of doctor-patient public opinion, this paper examines it year by year. After iterative estimation of the LDA model, the topic distribution of each year is obtained, and the best topic extraction is achieved with K = 4 topics per year. This yields 16 sub-themes from 2018 to 2021, together with the 10 highest-probability keywords under each sub-theme. Each sub-theme is named according to its keywords, and the resulting annual “theme-word” distribution of doctor-patient public opinion is shown in Table 3, which lists the five highest-probability keywords of each sub-theme.

Table 3.

Annual “theme-word” distribution of doctor-patient public opinion

Year	Topics	Key words
2018	Topic1: Thank you, doctor	Injury (0.051), Heart (0.044), Thank you (0.042), World (0.041), It’s worth (0.030)
	Topic2: Hope to understand	Reason (0.116), I hope (0.048), Social (0.048), Doctor (0.047), That happened at (0.045).
	Topic3: Doctor injury	Candles (0.125), Doctor (0.077), Hospital (0.030), Death (0.026), Injured doctor (0.024)
	Topic4: Medical dispute	Medical disputes (0.148), Violence (0.087), Dispute (0.069), Medical treatment (0.050), Medical (0.033)
2019	Topic5: Weibo forwarding	Smile (0.171), Forward (0.091), Medical staff (0.069), News (0.051), Child (0.042)
	Topic6: Get angry	Doctor (0.136), Hospital (0.059), Patient (0.044), Patient (0.042), Social (0.033)
	Topic7: Understanding protection	Occupation (0.091), Death (0.085), Understand (0.078), Doctor (0.071), Protection (0.062)
	Topic8: Doctor’s benevolence	Treatment (0.065), Doctor (0.057), Heart (0.038), Clinical (0.036), Benevolence (0.035)
2020	Topic9: Protect the doctor	Hospital (0.049), Protection (0.045), Doctor (0.042), Wife (0.040), Support at (0.039)
	Topic10: Execution	Death penalty (0.201), Execution (0.082), Judgement (0.063), Anger (0.054), Smile (0.047)
	Topic11: Kill the doctor	Death (0.119), Kill (0.103), Doctor (0.074), Medical troubles (0.042), Hurt (0.031)
	Topic12: Terrible anger	Heat (0.120), Search for (0.111), State (0.058), Terrible (0.043), Anger (0.038)
2021	Topic13: Conflict resolution	Health care commission (0.081), Death (0.067), State (0.056), Solve (0.040), Law (0.032)
	Topic14: Call for punishment	Call on (0.125), Punish (0.123), News (0.041), Medical personnel (0.037), Unfortunately (0.029)
	Topic15: Protection protection	Doctor (0.078), Response (0.043), Protection (0.041), I hope (0.039), Medical staff (0.033)
	Topic16: Attention to medical ethics	Hospital (0.096), Occur (0.069), Epidemic situation (0.048), Take seriously (0.047), Bad (0.029)
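The paper does not state which LDA implementation it used. As an assumption, the sketch below shows a compact collapsed Gibbs sampler, which is one standard way to carry out the iterative estimation of LDA, followed by the top-n keyword extraction that produces rows like those in Table 3. The hyperparameters alpha and beta, the iteration count, the function names, and the example posts are illustrative.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, phi), where phi[t][v] estimates
    P(word vocab[v] | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # document-topic counts
    nkw = [[0] * n_vocab for _ in range(k)]  # topic-word counts
    nk = [0] * k                             # tokens assigned to each topic
    z = []                                   # topic assignment of every token

    def move(d, t, v, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):           # random initial assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            move(d, t, index[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = index[w]
                move(d, z[d][n], v, -1)      # take the token out of the counts
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + n_vocab * beta)
                    for j in range(k)
                ]
                z[d][n] = rng.choices(range(k), weights=weights)[0]  # resample its topic
                move(d, z[d][n], v, +1)

    phi = [
        [(nkw[t][v] + beta) / (nk[t] + n_vocab * beta) for v in range(n_vocab)]
        for t in range(k)
    ]
    return vocab, phi


def top_words(vocab, phi_row, n=10):
    """The n most probable words of one topic, probabilities rounded to 3 d.p."""
    ranked = sorted(range(len(vocab)), key=lambda v: phi_row[v], reverse=True)
    return [(vocab[v], round(phi_row[v], 3)) for v in ranked[:n]]


if __name__ == "__main__":
    posts = [  # hypothetical tokenised posts from one year
        ["doctor", "protect", "hospital", "thanks"],
        ["death", "penalty", "anger", "judgement"],
        ["doctor", "thanks", "protect", "heart"],
        ["anger", "death", "kill", "penalty"],
    ]
    vocab, phi = lda_gibbs(posts, k=2, iters=100)
    for t, row in enumerate(phi):
        print(f"Topic{t + 1}:", top_words(vocab, row, n=5))
```

For the setting described in the text, the sampler would be run once per year with k=4, and `top_words` with n=10 would give the keyword list of each sub-theme.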

To explore the public's sentiment toward and attention to the 16 sub-themes, we construct a combined attention-satisfaction analysis framework. The horizontal axis of the framework is topic attention (TA), calculated as in equation (30):

$$TA = \frac{T_{K,t}}{\sum_{i=1}^{N} P_{i,t}} \tag{30}$$

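where $T_{K,t}$ is the number of documents belonging to topic $K$ in period $t$, and $P_{i,t}$ is the number of documents of topic $i$ in period $t$, so that the denominator $\sum_{i=1}^{N} P_{i,t}$, summed over all $N$ topics, is the total number of documents in period $t$.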

The vertical axis is emotional satisfaction (ES), calculated as in equation (31):

$$ES = \frac{\sum_{i=1}^{n_K} E_P(P_i)}{T_{K,t}} \tag{31}$$

where $E_P(P_i)$ is the positive sentiment value of document $P_i$ calculated by the BERT-LDA model, and $n_K$ is the number of documents in topic $K$. Computing TA and ES for each sub-theme gives its horizontal and vertical coordinates, from which the “attention-satisfaction” analysis framework is constructed.
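Equations (30) and (31) reduce to a share and a mean. A minimal sketch, assuming every post is assigned to a single dominant LDA topic and already carries a positive-sentiment score; the `Post` record and the function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Post:
    period: int       # t, e.g. the year
    topic: int        # dominant LDA topic K of the post (assumed single-topic assignment)
    pos_score: float  # E_P(P_i): positive-sentiment value from the sentiment model


def topic_attention(posts: list[Post], topic: int, period: int) -> float:
    """Eq. (30): documents of `topic` in `period` over all documents in `period`."""
    in_period = [p for p in posts if p.period == period]
    return sum(1 for p in in_period if p.topic == topic) / len(in_period)


def emotional_satisfaction(posts: list[Post], topic: int, period: int) -> float:
    """Eq. (31): mean positive-sentiment value of the topic's documents in `period`."""
    scores = [p.pos_score for p in posts if p.period == period and p.topic == topic]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    posts = [Post(2018, 1, 0.9), Post(2018, 1, 0.7), Post(2018, 2, 0.2), Post(2018, 2, 0.4)]
    print(topic_attention(posts, 1, 2018), emotional_satisfaction(posts, 1, 2018))
```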

The “attention-satisfaction” framework is applied to each annual sub-theme from 2018 to 2021, and the resulting distribution of sub-themes is shown in Figure 7.

As Figure 7 shows, the first quadrant is the high-attention, high-satisfaction region, which contains four sub-themes: Topic4, Topic6, Topic8, and Topic9. The public's attention to and emotional satisfaction with topics related to “doctor's benevolence” and “protecting doctors” are high, indicating that the public both cares about doctors' personal safety and recognizes their hard work. To improve the doctor-patient relationship, efforts can therefore start from “praising doctors” and “protecting doctors”, which can both attract public attention and raise netizens' emotional satisfaction.

The second quadrant is the low-attention, high-satisfaction region, which includes Topic1, Topic5, and Topic7. The public's emotional satisfaction with topics related to “thanking doctors”, “Weibo forwarding”, and “understanding and protection” is high, but attention to these topics is low. To a certain extent, this indicates that relevant departments and the public lack understanding and appreciation of doctors. Although this region carries relatively positive emotion in doctor-patient public opinion, it also reveals factors that are easily overlooked, and such neglected factors are often key to the doctor-patient relationship.

The third quadrant is the low-attention, low-satisfaction region, which includes six sub-themes: Topic2, Topic10, Topic11, Topic12, Topic13, and Topic14. This region mainly involves fatal incidents in medical disputes, including patients dying during surgery and doctors being hacked to death. The public's emotional satisfaction with “killing doctors” and “calling for severe punishment” is the lowest of all topics. This shows that, on low-attention, low-satisfaction topics, netizens on social media platforms express far more negative than positive emotion, and appropriate intervention by relevant government departments is needed to prevent large-scale outbreaks of doctor-patient public opinion.

The fourth quadrant is the high-attention, low-satisfaction region, which includes Topic3, Topic15, and Topic16. Topic3 and Topic15 lie close together near the axis, with high topic attention and intermediate emotional satisfaction, whereas Topic16 falls clearly into the high-attention, low-satisfaction category. This reflects that the public pays close attention to the theme of “attention to medical ethics” but its emotional attitude leans negative, revealing dissatisfaction with the current requirements for, and state of, doctors' professional ethics. Strengthening the training of medical personnel and raising the requirements for their medical ethics is therefore an important way to improve public emotional satisfaction.
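The quadrant assignment used above can be sketched as follows. The text does not state where the axes of Figure 7 cross, so this sketch assumes they cross at the mean TA and mean ES over all sub-themes; the function names and the example coordinates are hypothetical.

```python
from statistics import mean


def quadrant(ta: float, es: float, ta_mid: float, es_mid: float) -> int:
    """Quadrant numbering used in the text (TA on the horizontal axis, ES on the vertical)."""
    if ta >= ta_mid:
        return 1 if es >= es_mid else 4  # high attention: high / low satisfaction
    return 2 if es >= es_mid else 3      # low attention: high / low satisfaction


def classify(points: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Assign each sub-theme's (TA, ES) to a quadrant; axes cross at the means (assumed)."""
    ta_mid = mean(ta for ta, _ in points.values())
    es_mid = mean(es for _, es in points.values())
    return {name: quadrant(ta, es, ta_mid, es_mid) for name, (ta, es) in points.items()}


if __name__ == "__main__":
    points = {"A": (0.3, 0.8), "B": (0.1, 0.9), "C": (0.1, 0.2), "D": (0.3, 0.1)}
    print(classify(points))
```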

4.4

Predictive Analysis of Doctor-Patient Public Opinion Sentiments

A deep understanding of how public opinion evolves lays the foundation for early warning, prevention, and control of emergencies. To clarify the future evolution of doctor-patient public opinion sentiment, this paper uses the AC-BiLSTM model to analyze the sentiment of medical review texts and to make a scientific, reasoned forecast of the sentiment trend of doctor-patient public opinion. The comparative prediction results are shown in Figure 8, where (a) and (b) show the positive and negative sentiment predictions, respectively.

Figure 8 shows that in the second half of 2020 both positive and negative sentiment rise sharply to their highest values, while in the remaining periods they fluctuate within a certain range. This peak is closely related to the outbreak of the COVID-19 pandemic. Comparing Figure 8(a) and (b), the predicted gap between negative and positive sentiment gradually narrows, which indicates to a certain extent that the doctor-patient relationship is moving toward reconciliation. However, negative sentiment remains higher than positive sentiment, so vigilance should not be relaxed in building the doctor-patient relationship and preventing negative doctor-patient public opinion.

5

Conclusion

In this paper, AC-BiLSTM, a medical text sentiment analysis model, is constructed from BiLSTM, CNN, and an attention mechanism (AM). It integrates deep learning and semantic analysis techniques for sentiment interpretation, and its performance is analyzed empirically. The main conclusions are as follows:

1)

The IMDB, SST-2, and Medical-Reviews datasets are used as model input, and their sample-length distributions differ. Different model input port sizes therefore need to be set for each dataset, which helps the model understand the textual information, reduces model complexity, and improves performance and training efficiency.

2)

The F1 value of AC-BiLSTM on the medical text sentiment classification task is 3.40, 2.41, and 1.89 percentage points higher than those of LSTM, BiLSTM, and CNN-LSTM, respectively, so its sentiment classification performance is the best of the four models. During training, the loss of AC-BiLSTM decreases more smoothly and steadily than that of the other models, and its classification accuracy is higher than theirs. As the number of iterations increases, the validation accuracy of AC-BiLSTM converges to 98%, while those of LSTM, BiLSTM, and CNN-LSTM converge to 91%, 93%, and 96%, respectively. AC-BiLSTM therefore performs best on the validation set for medical text sentiment classification.

3)

To improve the doctor-patient relationship, breakthroughs can be sought in “praising doctors”, “protecting doctors”, and cultivating and raising the medical ethics of medical personnel, thereby improving public emotional satisfaction. At the same time, appropriate intervention by relevant government departments is needed to prevent large-scale outbreaks of doctor-patient public opinion.

4)

In the predicted future trend of doctor-patient public opinion, the gap between negative and positive sentiment gradually narrows, indicating that the doctor-patient relationship is moving toward reconciliation. However, negative sentiment remains higher than positive sentiment, so efforts to build doctor-patient relationships and to prevent negative public opinion toward doctors should not be relaxed.

Language: English

Publication frequency: once a year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other


A Study on the Integrated Application of Deep Learning and Semantic Analysis Techniques in Sentiment Interpretation of Medical Texts

Chunjun Cheng

Shui Cao

Guangyan Tang

Fang Ma

Di Cui

Saggella Madhumitha

Published online: 19 Mar 2025

Received: 22 Oct 2024

Accepted: 19 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0461

Keywords: BiLSTM, CNN, Attention mechanism, Medical text, Semantic sentiment analysis

© 2025 Chunjun Cheng et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
