Open access

Research on Medical Dialogue Generation of External Knowledge



Introduction

Since the outbreak of the COVID-19 epidemic in 2019, an increasing number of businesses have adopted a “home office” model, and online consultation has become a new trend in the medical industry in many countries. To cope with the surge in demand for consultations and the shortage of medical personnel who provide online diagnosis and consultation services, there is an urgent need for an intelligent medical dialogue system that can offer users online diagnosis and treatment suggestions. Such “online doctors” can greatly reduce the burden on human doctors, reduce the chance of cross-infection among patients, and improve the efficiency with which medical resources are used.

Current task-oriented dialogue generation methods can be divided into two types: traditional methods and deep learning methods. Building a dialogue system with traditional methods is simple, but such a system can only return answers that resemble the responses in its templates, it requires highly skilled teams to write different templates for different fields, and it is hard to transfer to other areas [2]. Among deep learning methods, decoder-based methods set a precedent for using neural networks in task-oriented dialogue. Because they combine language models with neural networks, the generated sentences are more diverse and less artificial.

With the proposal and development of large-scale pre-training architectures such as Transformer and GPT, their variants have also received widespread attention in task-oriented dialogue generation. These models improve their language modeling ability through pre-training, and their attention mechanisms can better process structured MR (semantic information) input, achieving the best results to date. In this paper, a filtering mechanism removes medical knowledge that fits the doctor-patient dialogue history poorly and strengthens the selection of accurate knowledge for the reply, which improves the accuracy of the model and gives its replies higher medical reference value.

Related work

Dialogue generation is an indispensable part of building a task-oriented dialogue system, and many scholars have studied methods for it. In 2015, a decoder-based method was applied to the dialogue generation task: a statistical language generator built on a joint recurrent and convolutional neural network structure, which can be trained on dialogue acts and utterances. Also in 2015, Wen et al. [4] drew on research from related fields and proposed an attention-based encoder-decoder architecture adapted to NLG, which outperformed decoder-based methods.

In 2016, Dušek et al. [5] compared the advantages and disadvantages of two approaches: generating sentence plan trees and generating natural language directly with the seq2seq method. In the same year, Dušek et al. [6] extended their previous model. While keeping the overall framework, they modified the input by prepending the user’s utterance to the input MR (semantic information) triples as a prefix, and they added a separate encoder that encodes the user utterance alone.

In 2017, Van-Khanh Tran et al. [7] proposed the encoder-aggregator-decoder model, an extension of the encoder-decoder architecture based on recurrent neural networks.

In 2018, Wei Z, Liu Q et al. [8] proposed a dialogue system for automatic diagnosis, constructed the Medical DS dataset, and proposed a dialogue system framework based on reinforcement learning (RL) to improve the accuracy of automatic diagnosis.

In 2019, Shang-Yu Su et al. [9] proposed a new learning framework for language understanding and generation based on dual supervised learning, providing a method that exploits the duality between the two tasks. Experiments show that this method significantly improves both language understanding and language generation performance.

In 2020, Peng et al. [10] proposed the SC-GPT model, which feeds the MR into the pre-trained GPT model and obtains the result directly with a sequence-to-sequence approach.

In 2021, Yangming Li et al. [12] proposed a new heterogeneous rendering machines (HRM) framework that explains how a neural generator renders the input dialogue act (DA) as an utterance. At each generation step, a mode switcher selects an appropriate decoder from a set of renderers to generate an item (a word or a phrase). This model explains the rendering process of the neural generator well.

Building on previous research and the development of deep learning, the natural language generation (NLG) task in dialogue systems has achieved good performance. However, if only dialogue data is used to train the model, it tends to produce safe responses such as “good” and “can”, which have no reference value for a physician’s treatment decisions. To address this problem, this paper extends previous research as follows. It first obtains medical knowledge that closely matches the symptoms described by the patient, then incorporates that knowledge while tracking the patient’s state and detecting the doctor’s behavior entities, and finally generates a reply from professional medical knowledge, the patient’s condition, and the doctor’s behavior, so as to provide patients with effective diagnosis and treatment opinions and suggestions.

Research methods
Task definition

The medical dialogue generation task is to help doctors provide diagnosis and treatment suggestions during a consultation. Based on the doctor-patient dialogue history and related medical knowledge, the model must generate a text reply to the user that fits the context and conforms to medical principles. This paper uses the dialogue history $H = \{\{h_1, r_1\}, \{h_2, r_2\}, \ldots, \{h_i, r_i\}, \ldots, \{h_n, r_n\}\}$ between patients and doctors and the medical knowledge $K = \{k_1, k_2, \ldots, k_i, \ldots, k_n\}$. Here $\{h_i, r_i\}$ is the $i$-th exchange in the doctor-patient dialogue history, and each $k_i$ is a piece of medical knowledge related to the dialogue history, stored as a triple of a head entity $h$, a relation $r$, and a tail entity $t$ that is related to the head entity. Given the dialogue history $H$ and the related medical knowledge $K$, the model generates a reply $Y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$, where $y_i$ is the $i$-th word of the reply. The goal of this paper is to maximize the probability of generating the reply $Y$, so that the response is accurate and useful as a reference.
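
The inputs and output of the task can be pictured as a small data structure. The sketch below is only illustrative: the class names `Triple` and `DialogueSample`, the reading of each pair as (patient utterance, doctor reply), and the example strings are assumptions that do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triple:
    """One piece of medical knowledge k_i (hypothetical container)."""
    head: str      # head entity h
    relation: str  # relation r
    tail: str      # tail entity t


@dataclass
class DialogueSample:
    """One training example: history H, knowledge K, and target reply Y."""
    history: List[Tuple[str, str]]  # assumed to be (patient utterance, doctor reply) pairs
    knowledge: List[Triple]         # triples related to the history
    reply: List[str]                # reply tokens y_1 ... y_n


# Made-up example values, for illustration only.
sample = DialogueSample(
    history=[("I have a runny nose and a cough.", "How long have you had these symptoms?")],
    knowledge=[Triple("cold", "has_symptom", "runny nose")],
    reply=["It", "may", "be", "a", "cold", "."],
)
```

A model for this task would consume `sample.history` and `sample.knowledge` and be trained to assign high probability to `sample.reply`.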

Model overview

This paper proposes a medical dialogue generation model that integrates external knowledge, as shown in Figure 1. The model consists of five modules: doctor-patient dialogue coding, patient status tracking, doctor behavior detection, medical knowledge distillation, and reply generation. It first encodes the doctor-patient dialogue into a vector that carries the dialogue context, uses this vector to track the patient’s condition during the dialogue, then predicts the doctor’s action in the next round, and finally fuses the medical knowledge extracted by the medical knowledge distillation module to produce a relevant reply.

Figure 1.

Structure diagram of knowledge-based medical dialogue generation

Doctor-patient dialogue coding module

In the doctor-patient dialogue coding module, this paper first uses a bidirectional gated recurrent unit (BiGRU) to encode the doctor-patient dialogue history $H = \{\{h_1, r_1\}, \{h_2, r_2\}, \ldots, \{h_i, r_i\}, \ldots, \{h_n, r_n\}\}$ and convert it into a vector representation that carries the doctor-patient context information.

Using a BiGRU network keeps the computational cost and complexity of the model low. The input representation of the doctor-patient dialogue coding module is shown in Figure 2.

Figure 2.

Input representation of doctor-patient dialogue coding module

Encoding the doctor-patient dialogue with BiGRU yields a vector $E_h$ that carries the context information of the dialogue, where $H$ is the doctor-patient dialogue history. Formally, the output of the encoder is given by Equation (1):

$$E_h = \mathrm{BiGRU}(H) \tag{1}$$
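
To make Equation (1) concrete, here is a minimal pure-Python sketch of a bidirectional GRU encoder. It is not the paper's implementation: it uses one common GRU formulation, omits biases, and uses random untrained weights. The names `GRUCell` and `bigru_encode` and all sizes are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A single GRU cell with update gate z, reset gate r, and candidate n."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate (biases omitted).
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to the old state
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # New state: interpolate between the candidate n and the old state h.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def bigru_encode(inputs, fwd, bwd):
    """Run one GRU left-to-right and one right-to-left, then concatenate per step."""
    h = [0.0] * fwd.hidden_size
    forward = []
    for x in inputs:
        h = fwd.step(x, h)
        forward.append(h)
    h = [0.0] * bwd.hidden_size
    backward = []
    for x in reversed(inputs):
        h = bwd.step(x, h)
        backward.append(h)
    backward.reverse()  # realign with the original token order
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
# Toy input: 5 token embeddings of dimension 4 standing in for the encoded history H.
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
E_h = bigru_encode(embeddings, GRUCell(4, 3, rng), GRUCell(4, 3, rng))
```

Each position in `E_h` holds a vector of twice the hidden size, so every token representation sees both its left and its right context.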

Patient status tracking module

The patient status tracking module tracks the patient’s condition over the doctor-patient dialogue history, uses the tracked patient state $S$ to predict the doctor’s next-round behavior $B$, and finally produces a response $Y$ that fits the context of the doctor-patient dialogue. To this end, a state tracker built from a variational autoencoder is added to the module; it indirectly learns the posterior distribution $p(y|x)$ of the data to obtain the probability distribution of the patient state.

The learning process first learns the joint probability distribution $p(x, y)$ from the dialogue data and then derives the posterior distribution from the prior probability and the class-conditional probability of the patient’s state. The resulting posterior is the distribution of the tracked patient state at time $t$ that this module seeks. The structure of the model is shown in Figure 3.

Figure 3.

Structure diagram of the variational autoencoder generative model

In each round of the medical conversation, the state tracker first samples the previous round’s patient state $S_{t-1}^{q}$ from the approximate posterior distribution $q_{\phi_s}(S_{t-1})$ and then feeds it into BiGRU to obtain the current round’s patient state. That is, the prior patient state of round $t$ is derived from the patient state $S_{t-1}^{q}$ of round $t-1$, the doctor’s response $R_{t-1}$, and the patient’s description $U_t$ in round $t$, as shown in Equation (2):

$$p_{\theta_s}(S_t) = p_{\theta_s}(S_t \mid S_{t-1}^{q}, R_{t-1}, U_t) \tag{2}$$

Specifically, once the prior patient state distribution $p_{\theta_s}(S_t)$ at time $t$ is defined, the decoding stage proceeds as follows. The decoder takes $b_{t,0}^{S^p} = W_s^p[h_t^c; h_{t-1}^{S^q}]$, computed at the initial step (step 0), as its initial hidden state, where $W_s^p$ is a learnable parameter and $[\cdot\,; \cdot]$ denotes vector concatenation. The decoder then decodes $S_t$ sequentially: at the $i$-th decoding step it receives the embedding $e_{t,i-1}^{S^p}$ of the word from the previous step as input, outputs the state $b_{t,i}^{S^p}$, and maps it into the patient state space. This paper fixes the length of the patient state $S_t$ to $|S|$, as shown in Equation (3):

$$p_{\theta_s}(S_t) = \prod_{i=1}^{|S|} \mathrm{softmax}\big(\mathrm{MLP}(b_{t,i}^{S^p})\big) \tag{3}$$

Here MLP denotes a multilayer perceptron. The inference state tracker follows a similar process but additionally takes the vector representation of $R_t$ (i.e., $h_t^R$) as input. Its BiGRU decoder is initialized with $b_{t,0}^{S^q} = W_s^q[h_t^c; h_{t-1}^{S^q}; h_t^R]$, where $W_s^q$ is a learnable parameter, and at the $i$-th decoding step it outputs $b_{t,i}^{S^q}$ for round $t$. The approximate posterior distribution is given by Equation (4):

$$q_{\phi_s}(S_t) = \prod_{i=1}^{|S|} \mathrm{softmax}\big(\mathrm{MLP}(b_{t,i}^{S^q})\big) \tag{4}$$
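
Equations (3) and (4) share one form: the probability of a fixed-length state sequence is a product of per-step softmax distributions. The sketch below evaluates such a product in log space. The toy logits stand in for the MLP outputs, and the function name `sequence_log_prob` and the vocabulary size are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_log_prob(step_logits, token_ids):
    """Log of prod_i softmax(logits_i)[token_i], computed as a sum of logs."""
    return sum(
        math.log(softmax(logits)[tok])
        for logits, tok in zip(step_logits, token_ids)
    )


# |S| = 3 decoding steps over a toy vocabulary of 4 state words.
# Each inner list plays the role of MLP(b_{t,i}) at one step.
step_logits = [
    [2.0, 0.1, 0.1, 0.1],
    [0.0, 0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
state = [0, 2, 3]  # indices of the chosen state words
log_p = sequence_log_prob(step_logits, state)
```

Working in log space avoids underflow when $|S|$ grows, since a product of many small probabilities becomes a sum of moderate negative numbers.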

Doctor behavior detection module

The doctor behavior detection module predicts the entities that the doctor will mention in the next reply. This paper builds a doctor behavior classifier in this module, which divides doctor behavior into four types: asking about symptoms, diagnosis, prescribing medication, and small talk. From the conversation history, the module judges whether the doctor’s behavioral state should shift, that is, whether the next sentence should continue to ask about the patient’s symptoms or give diagnostic suggestions, and so on. The doctor behavior detection process is shown in Figure 4.

Figure 4.

Flow chart of doctor behavior detection

Specifically, as in the patient status tracking module, this paper models a prior distribution and an approximate posterior distribution over physician actions, and both the prior policy network and the posterior policy network are based on encoder-decoder structures. The physician behavior $A_t$ at time $t$ is represented as a physician action category $A_t^c$ and a sequence of explicit keywords $A_t^k$, that is, $A_t = \{A_t^c, A_t^k\}$, where the length of $A_t^k$ is fixed to $|A|$.

For the prior policy network, in the encoding stage a BiGRU encoder first encodes the patient state $S_t^p$ at time $t$ and outputs a hidden vector $h_t^{S^p}$. In the decoding stage, an action classifier is designed to infer the doctor’s action category $A_t^c$: the hidden vector $h_t^c$ at time $t$ is used in an attention operation that yields an attention vector $q_t$ about the doctor’s behavior. The action classifier then combines this attention vector to assign the physician’s action to one of four categories, namely asking about symptoms, diagnosis, prescribing medication, and small talk. The doctor’s prior behavior distribution is given by Equation (5):

$$p_{\theta_{a,c}}(A_t^c) = \mathrm{softmax}\big(W_c^p[h_t^{S^p}; h_t^c; q_t]\big) \tag{5}$$

Here $W_c^p$ is a learnable parameter. The model samples the action category $A_t^{c,p}$ from the doctor’s prior behavior distribution $p_{\theta_{a,c}}(A_t^c)$.
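
The classifier in Equation (5) is a linear layer over three concatenated vectors followed by a softmax, after which an action category is sampled. A minimal sketch follows; the action labels, vector sizes, and random weights are illustrative assumptions, not values from the paper.

```python
import math
import random

# Hypothetical labels for the four action categories named in the text.
ACTIONS = ["ask_symptoms", "diagnosis", "prescribe", "small_talk"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_action(W, h_state, h_context, q_attn):
    """softmax(W [h_state; h_context; q_attn]) over the action categories."""
    features = h_state + h_context + q_attn  # list concatenation = [.;.;.]
    logits = [sum(w * x for w, x in zip(row, features)) for row in W]
    return softmax(logits)


rng = random.Random(0)
h_state = [0.2, -0.1]   # stands in for the encoded patient state
h_context = [0.5, 0.3]  # stands in for the dialogue context vector
q_attn = [0.0, 1.0]     # stands in for the attention vector q_t
# One weight row per action, one column per feature (4 x 6), untrained.
W = [[rng.uniform(-1, 1) for _ in range(6)] for _ in ACTIONS]

probs = classify_action(W, h_state, h_context, q_attn)
sampled = rng.choices(ACTIONS, weights=probs, k=1)[0]
```

Sampling rather than taking the arg max matches the text's description of drawing the action category from the prior distribution.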

The keywords $A_t^k$ are decoded sequentially by a BiGRU decoder, and at each decoding step the context inference detector maps the output vector of the decoder into the action space. The decoder is initialized at step 0 with $b_{t,0}^{A^{k,p}} = W_k^p[h_t^{S^p}; h_t^c; e_t^{A^{c,p}}]$, where $e_t^{A^{c,p}}$ is the embedding of $A_t^{c,p}$. At the $i$-th decoding step, the decoder outputs $b_{t,i}^{A^{k,p}}$. Based on this output vector, the context inference detector infers the doctor’s action keyword $A_{t,i}^k$ at step $i$, drawing on the original doctor-patient dialogue history and the detected patient state. Its prior distribution over $A_{t,i}^k$ is computed through a multilayer perceptron (MLP), with $z_A$ as a normalization term, as shown in Equation (6):

$$p_{\theta_{a,d}}(A_{t,i}^k) = \frac{1}{z_A} \exp\big(\mathrm{MLP}([h_t^{S^p}; h_t^c; e_t^{A^{c,p}}; b_{t,i}^{A^{k,p}}])\big) \tag{6}$$

Finally, the prior distribution of the doctor’s behavior $A_t$ is computed as shown in Equation (7):

$$p_{\theta_a}(A_t) = p_{\theta_{a,c}}(A_t^c) \prod_{i=1}^{|A|} \big[ p_{\theta_{a,d}}(A_{t,i}^k) + p_{\theta_{a,g}}(A_{t,i}^k) \big] \tag{7}$$

The inference policy network approximates the posterior distribution of the action category by extracting indicative key information from the doctor’s response $R_t$. A BiGRU encoder encodes $R_t$ and $S_t^q$ into the hidden vectors $h_t^R$ and $h_t^{S^q}$, respectively. The approximate posterior distribution of the physician’s action category at time $t$ is then obtained as shown in Equation (8):

$$q_{\phi_{a,c}}(A_t^c) = \mathrm{softmax}\big(W_c^q[h_t^c; h_t^{S^q}; h_t^R]\big) \tag{8}$$

The model then draws the doctor’s action category $A_t^{c,q}$ at time $t$ from the approximate posterior distribution $q_{\phi_{a,c}}(A_t^c)$. To strengthen the influence of the information in the doctor’s reply $R_t$ at time $t$, the inference policy network uses a contextual inference detector to approximate the posterior distribution of $A_t^k$. The decoder is initialized with $b_{t,0}^{A^{k,q}} = W_k^q[h_t^c; h_t^{S^q}; e_t^{A^{c,q}}; h_t^R]$, where $e_t^{A^{c,q}}$ is the word embedding that represents the doctor’s action category $A_t^{c,q}$ and $W_k^q$ is a matrix of learnable parameters. At the $i$-th decoding step, the decoder outputs a vector representation $b_{t,i}^{A^{k,q}}$, from which the approximate posterior distribution of the $i$-th action keyword is obtained, as shown in Equation (9):

$$q_{\phi_{a,d}}(A_{t,i}^k) = \mathrm{softmax}\big(\mathrm{MLP}([h_t^c; h_t^{S^q}; e_t^{A^{c,q}}; b_{t,i}^{A^{k,q}}])\big) \tag{9}$$

Finally, the approximate posterior distribution of the doctor’s behavior at time $t$ is obtained from the doctor’s action category and the action keywords at time $t$, as shown in Equation (10):

$$q_{\phi_a}(A_t) = q_{\phi_{a,c}}(A_t^c) \prod_{i=1}^{|A|} q_{\phi_{a,d}}(A_{t,i}^k) \tag{10}$$

Medical knowledge distillation module

The most important step in a knowledge-integrated dialogue system is selecting appropriate knowledge as input. Because external knowledge is large in scale and varied in data type, feeding in all candidate knowledge may introduce noise and heavy computation. This paper therefore adds a noise filtering mechanism to the medical knowledge distillation module to select better knowledge and generate more accurate responses.

Specifically, the medical knowledge distillation module first applies a global pooling operation to the encoded dialogue-history context vector and to each obtained medical background knowledge vector, producing the corresponding scalar representations $S_x$ and $\{S_{k_1}, S_{k_2}, \ldots, S_{k_N}\}$. Because pooling uses every dimension of the context vector and of each knowledge vector, each scalar can represent the semantic information of its vector to a certain extent. Moreover, converting vectors to scalars greatly reduces the complexity of the feature representation and of the model’s computation.

Secondly, to allow interaction both among the pieces of knowledge and between the knowledge and the doctor-patient dialogue context, this paper concatenates the scalars of each piece of knowledge and of the dialogue context and applies a two-layer fully connected operation. This yields the weights $r = \{r_1, r_2, \ldots, r_{N+1}\}$ assigned to each piece of knowledge and to the dialogue context, and the weighted sum of all knowledge, $k'$, is used as the representation of the selected knowledge. After the global pooling and the two-layer fully connected operation, the model can learn more interaction information.

The weights are computed as shown in Equation (11):

$$r = \sigma\big(W_2\,\delta(W_1[S_{k_1}, S_{k_2}, \ldots, S_{k_N}, S_x])\big) \tag{11}$$

Here $W_1$ and $W_2$ are learnable parameters, $\delta$ is the ReLU activation function, and $\sigma$ is the sigmoid activation function.

The representation $k'$ is computed as shown in Equations (12) and (13):

$$k_i' = r_i k_i \quad (i = 1, 2, \ldots, N) \tag{12}$$

$$k' = \sum_{i=1}^{N} k_i' \tag{13}$$

Here $r_i$ is the weight of the $i$-th piece of knowledge and $k_i'$ is the corresponding weighted knowledge. The output $k'$ of the knowledge distillation module is used in the reply generation module to help generate replies.
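
The filtering steps of Equations (11) to (13) can be sketched in a few lines. This is a toy version under stated assumptions: mean pooling is assumed as the global pooling operation, the matrices `W1` and `W2` are hand-picked rather than learned, and the function name `distill` is hypothetical.

```python
import math


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def distill(knowledge, context, W1, W2):
    """Return the weights r (length N+1) and the selected knowledge vector k'."""
    # Global pooling (mean assumed): one scalar per knowledge vector,
    # followed by one scalar for the dialogue context.
    s = [sum(k) / len(k) for k in knowledge] + [sum(context) / len(context)]
    hidden = [max(0.0, v) for v in matvec(W1, s)]                 # ReLU
    r = [1.0 / (1.0 + math.exp(-v)) for v in matvec(W2, hidden)]  # sigmoid
    # k_i' = r_i * k_i for the N knowledge vectors; the last weight belongs
    # to the context scalar and does not enter the sum.
    weighted = [[r[i] * x for x in k] for i, k in enumerate(knowledge)]
    k_prime = [sum(col) for col in zip(*weighted)]                # k' = sum_i k_i'
    return r, k_prime


knowledge = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]  # N = 2 knowledge vectors
context = [0.5, 0.5, 0.5]
W1 = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]    # (hidden = 2) x (N + 1 = 3)
W2 = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]   # (N + 1 = 3) x (hidden = 2)
r, k_prime = distill(knowledge, context, W1, W2)
```

Because each weight passes through a sigmoid, a piece of knowledge is scaled down rather than removed outright, so the filtering is soft.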

Reply generation module

The reply generation module takes the patient state obtained by the patient status tracking module, the next doctor behavior predicted by the doctor behavior detection module, and the knowledge selected by the knowledge distillation module for its close fit to the doctor-patient dialogue context. From these it generates a response that doctors can use as a reference for diagnosis and treatment.

In the first stage of reply generation, the model uses a BiGRU encoder to encode the patient state $S_t^q$ at time $t$ into a word-level embedding matrix, in which each row is the embedding vector of one word. Similarly, the physician behavior $A_t^{k,q}$ is encoded into a word-level embedding matrix. The hidden vectors of $S_t^q$ and $A_t^{k,q}$ are then computed and denoted $h_t^{S^q}$ and $h_t^{A^{k,q}}$, respectively. The reply decoder is based on a BiGRU unit and is initialized with $b_{t,0}^R = W_d[h_t^c; h_t^{S^q}; e_t^{A^{c,q}}; h_t^{A^{k,q}}]$, where $W_d$ is a parameter matrix. At the $i$-th decoding step, the previous decoder output $b_{t,i-1}^R$ attends over the context representation $H_t$ to obtain a vector $b_{t,i}^h$. In the same way, attention is applied to the patient state $S_t^q$ and the doctor’s behavior $A_t^{k,q}$ at time $t$, giving the attended hidden vectors $b_{t,i}^s$ and $b_{t,i}^a$, respectively. The concatenation $[b_{t,i}^h; b_{t,i}^s; b_{t,i}^a; e_{t,i-1}^R]$ is then fed into the BiGRU unit of the decoder, which outputs $b_{t,i}^R$ for the $i$-th word, where $e_{t,i-1}^R$ is the embedding of the word from the previous step. The probability that the model generates the reply word $R_{t,i}$ is the sum of the generation probability and the probability of copying knowledge, as shown in Equations (14), (15), and (16):

$$p_{\theta_g}(R_{t,i}) = p_{\theta_g}^{g}(R_{t,i}) + p_{\theta_g}^{c}(R_{t,i}) \tag{14}$$

$$p_{\theta_g}^{g}(R_{t,i}) = \frac{1}{z_R} \exp\big(\mathrm{MLP}(b_{t,i}^R)\big) \tag{15}$$

$$p_{\theta_g}^{c}(R_{t,i}) = \frac{1}{z_R} \sum_{j:\,W_j = R_{t,i}} \exp\big((h_j^{W})^{\top} b_{t,i}^R\big) \tag{16}$$

Here $p_{\theta_g}^{g}(R_{t,i})$ is the generation probability and $p_{\theta_g}^{c}(R_{t,i})$ is the probability of copying knowledge; $z_R$ is a normalization term shared by the two. The sequences $R_{t-1}$, $U_t$, $A_t^{k,q}$, and $S_t^q$ are concatenated into $W$, and $W_j$ denotes the $j$-th word in $W$.
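
The way the generation and copy terms share a single normalizer can be seen in a small sketch. The generation logits stand in for the MLP outputs and the copy scores stand in for the dot products between source-word vectors and the decoder state; the function name `generate_or_copy` and all numbers are assumptions for illustration.

```python
import math
from collections import defaultdict


def generate_or_copy(vocab, gen_logits, source_tokens, copy_scores):
    """Combine generation and copy scores under one shared normalizer z_R."""
    # Unnormalized generation mass for every vocabulary word.
    gen = {w: math.exp(s) for w, s in zip(vocab, gen_logits)}
    # Unnormalized copy mass: sum over all source positions j with W_j == word.
    copy = defaultdict(float)
    for tok, score in zip(source_tokens, copy_scores):
        copy[tok] += math.exp(score)
    z = sum(gen.values()) + sum(copy.values())  # shared normalization term
    words = set(gen) | set(copy)
    return {w: (gen.get(w, 0.0) + copy.get(w, 0.0)) / z for w in words}


# Toy case: "b" can be generated or copied, "c" can only be copied.
dist = generate_or_copy(
    vocab=["a", "b"],
    gen_logits=[0.0, 0.0],
    source_tokens=["b", "c"],
    copy_scores=[0.0, 0.0],
)
```

With all scores equal to zero, each of the four terms contributes the same mass, so `dist` assigns 0.25 to "a", 0.5 to "b", and 0.25 to "c". The word "c" receives probability even though it is outside the generation vocabulary, which is the point of the copy term.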

Experimental results and analysis
Datasets

To validate the effectiveness of the proposed method, this paper uses the large-scale medical dataset KaMed and a large-scale COVID-19 Chinese dialogue dataset proposed by Zeng et al. [10] in 2020. COVID-19 contains 1,088 conversations covering common COVID questions and answers and provides fine-grained entity-level annotations. KaMed covers more departments and a wider variety of diseases, with more than 17K medical conversations and 5,682 entities. The datasets used in this paper are summarized in Table 1.

Table 1. Dataset statistics

| Dataset | Dialogues | Utterances | Tokens | Knowledge |
|---|---|---|---|---|
| KaMed | 17,864 | 153,000 | 6,663,272 | Y |
| COVID-19 | 1,088 | 9,494 | 406,550 | N |

Experiment setup

The experimental hardware consists of an Intel i9-10900K CPU, two Nvidia GeForce 2080 Ti GPUs, 64 GB of RAM, and a 512 GB SSD. The software environment is Windows 10, with PyCharm, Visual Studio Code, PyTorch, and Neo4j used for development and TensorBoard used for visualization.

Evaluation indicators

To evaluate the linguistic quality of the generated responses, this paper uses the metrics BLEU@N, Distinct@N, and perplexity (PPL).
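
Two of these metrics are simple enough to sketch directly. The functions below follow the common definitions: Distinct@N as the ratio of unique n-grams to all n-grams in the generated responses, and perplexity as the exponential of the negative mean token log-probability. The exact variants and scaling used in the paper are not stated, so treat this as an assumption; BLEU is omitted for brevity.

```python
import math


def distinct_n(responses, n):
    """Unique n-grams divided by total n-grams over all tokenized responses."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in responses
        for i in range(len(tokens) - n + 1)
    ]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def perplexity(token_log_probs):
    """exp of the negative average natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy generated responses (already tokenized).
responses = [["a", "b", "a"], ["a", "b"]]
d1 = distinct_n(responses, 1)  # 2 unique unigrams out of 5
d2 = distinct_n(responses, 2)  # 2 unique bigrams out of 3

# A model that gives every token probability 0.25 has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)
```

Higher Distinct@N means more varied replies, while lower perplexity means the model assigns higher probability to the reference text.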

Experimental analysis

To test whether the proposed model outperforms previous models, this paper compares it with the Seq2Seq end-to-end model with an attention mechanism, the HRED model, and the VHRED model on the KaMed and COVID-19 datasets. During the experiments, a confusion matrix is used to summarize the generated results. The visualized attention scores show that different models assign their highest attention scores to different parts of the context. The basic Seq2Seq model gives low scores to entities such as “cold” and “medicine”. HRED and VHRED score such entities somewhat more accurately than Seq2Seq. The knowledge-integrated model of this paper gives the most accurate attention scores over the conversation context, and its high attention scores for entities such as “cold” and “taking medicine” indicate that the method can track and infer medical entities. The medical dialogue generation results of the different models on the different datasets are shown in Table 2.

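Table 2. Experimental results

| Dataset | Model | B@1 | B@2 | D@1 | D@2 | Perplexity |
|---|---|---|---|---|---|---|
| KaMed | Seq2Seq | 2.71 | 1.58 | 1.24 | 6.85 | 24.82 |
| KaMed | HRED | 2.59 | 1.59 | 1.17 | 6.65 | 27.14 |
| KaMed | VHRED | 2.49 | 1.55 | 1.15 | 6.42 | 28.65 |
| KaMed | Ours | 2.79 | 1.89 | 1.58 | 8.56 | 22.91 |
| COVID-19 | Seq2Seq | 3.13 | 5.70 | 5.5 | 29.0 | 53.3 |
| COVID-19 | HRED | 2.56 | 5.73 | 5.21 | 32.39 | 49.6 |
| COVID-19 | VHRED | 3.31 | 5.65 | 5.65 | 34.56 | 47.2 |
| COVID-19 | Ours | 3.57 | 5.90 | 5.89 | 31.21 | 40.8 |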

Experiments show that the proposed method performs well on both datasets. On KaMed, its perplexity is 1.91 lower than that of the baseline Seq2Seq method (22.91 versus 24.82), and lower perplexity is better. Compared with the VHRED model, B@1 increases by 0.30, B@2 by 0.34, and D@2 by 2.14, all of which indicate that the proposed method is better. The trend is largely the same on COVID-19, where the proposed method has the best B@1, B@2, D@1, and perplexity, although HRED and VHRED reach higher D@2. Figure 5 shows an example conversation generated by the model in this paper.
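
The differences quoted for KaMed can be checked directly against the reported table values. The short script below copies the three relevant KaMed rows and recomputes the gaps; the dictionary layout and the helper name `delta` are just one convenient way to organize the check.

```python
# KaMed rows copied from the results table.
kamed = {
    "Seq2Seq": {"B@1": 2.71, "B@2": 1.58, "D@1": 1.24, "D@2": 6.85, "PPL": 24.82},
    "VHRED": {"B@1": 2.49, "B@2": 1.55, "D@1": 1.15, "D@2": 6.42, "PPL": 28.65},
    "Ours": {"B@1": 2.79, "B@2": 1.89, "D@1": 1.58, "D@2": 8.56, "PPL": 22.91},
}


def delta(metric, model, baseline):
    """Difference model - baseline, rounded to two decimals."""
    return round(kamed[model][metric] - kamed[baseline][metric], 2)


ppl_drop = -delta("PPL", "Ours", "Seq2Seq")  # perplexity reduction vs Seq2Seq
b1_gain = delta("B@1", "Ours", "VHRED")
b2_gain = delta("B@2", "Ours", "VHRED")
d2_gain = delta("D@2", "Ours", "VHRED")
```

The recomputed values are 1.91, 0.30, 0.34, and 2.14, which agree with the figures stated in the text.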

Figure 5.

Example of model generation

For example, when a patient reports symptoms such as diarrhea, nasal congestion, and a runny nose and asks what to do, the model uses the patient’s description to look up related knowledge in the knowledge triples, determines which conditions the symptoms are associated with, inquires further about the patient’s condition, and finally gives a reply that the doctor can refer to.

Conclusion and outlook

To address the surging demand for consultations and the shortage of medical personnel providing online diagnosis and treatment consulting services in intelligent healthcare, this paper constructs a knowledge-based medical dialogue generation model. The traditional template-based method is simple, but it requires highly skilled personnel to write different templates for different fields, so its maintenance cost is high. Dialogue systems built by slot filling can only retrieve answers from an existing answer set, cannot be extended, and are of limited use to users.

This paper therefore proposes a medical dialogue generation model that incorporates external knowledge. While using pre-trained models to enhance the scalability and portability of the proposed model, it combines external knowledge with patient state tracking and doctor behavior detection to improve medical response generation. Experiments on KaMed and COVID-19 show that the proposed model is superior to most baseline models in BLEU, Distinct, and perplexity, which verifies its effectiveness.

Future work will integrate patients’ emotional variables into the dialogue generation model so that it can better understand patients’ colloquial expressions, with the aim of generating more personalized and targeted responses and further improving the model’s performance.

eISSN:
2470-8038
Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Sciences, other