With the outbreak of COVID-19 in 2019, an increasing number of businesses adopted a "work from home" model, and online consultation became a new trend in the medical industry of many countries. To address the surge in user demand for consultations and the shortage of medical personnel providing online diagnosis and consultation services, it is urgent to develop an intelligent medical dialogue system that can provide users with online diagnosis and treatment advice. Such "online doctors" can greatly reduce the burden on human doctors, lower the risk of cross-infection among patients, and improve the efficiency of medical resource utilization.
Current task-oriented dialogue generation methods can be divided into two types: traditional methods and deep learning methods. The advantage of traditional methods is that they are simple to build, but they can only return answers that match responses already in the template, they require highly skilled teams to write different templates for different fields, and they transfer poorly to other domains [2]. Among deep learning methods, decoder-based approaches pioneered the use of neural networks in task-oriented dialogue; by combining language models with neural network methods, the generated sentences become more diverse and less formulaic.
With the proposal and development of large-scale pre-trained architectures such as Transformer and GPT, their variants in the field of task-oriented dialogue generation have also received widespread attention. Pre-training improves a model's ability to model language, and cross-attention mechanisms can better process structured MR (semantic representation) inputs, achieving the best results at present. In this paper, a filtering mechanism discards medical knowledge with low relevance to the doctor-patient dialogue history and strengthens the selection of accurate knowledge in the reply, thereby improving the accuracy of the model and giving its responses higher medical reference value.
In building a task-oriented dialogue system, dialogue generation is an indispensable component, and many scholars have studied methods for it. In 2015, Wen et al. [3] applied decoder-based methods to the dialogue generation task and proposed a statistical language generator that combines recurrent and convolutional neural network structures and can be trained on dialogue acts and utterances. Later in 2015, Wen et al. [4] drew on related research and proposed an attention-based Encoder-Decoder architecture adapted to NLG, which achieved better results than decoder-based methods.
In 2016, Dušek et al. [5] explored the advantages and disadvantages of two approaches: generating sentence planning trees versus generating natural language directly with the Seq2Seq method. In the same year, Dušek et al. [6] introduced language-model information into their previous model; while keeping the overall framework, they improved the input by splicing the user's utterance in front of the input MR (semantic representation) triples as a prefix, and added a new encoder to encode the user utterance separately.
In 2017, Van-Khanh Tran et al. [7] proposed an encoder-aggregator-decoder model that extends the recurrent-neural-network-based encoder-decoder architecture.
In 2018, Wei Z, Liu Q et al. [8] proposed a dialogue system for automatic diagnosis, constructed the Medical DS dataset, and proposed a dialogue system framework based on reinforcement learning (RL), improving the accuracy of automatic diagnosis.
In 2019, Shang-Yu Su et al. [9] proposed a new learning framework for language understanding and generation based on dual supervised learning. Experiments show that this method significantly improves performance on both language understanding and language generation.
In 2020, Peng et al. [10] proposed the SC-GPT model, which feeds the MR into a GPT pre-trained model and obtains results directly in a sequence-to-sequence manner.
In 2021, Yangming Li et al. [12] proposed a new heterogeneous rendering machines (HRM) framework that explains how a neural generator renders input dialogue acts (DA) into utterances. At each generation step, a mode switcher selects the appropriate decoder from a set of renderers to generate an item (a word or phrase). This model can well explain the rendering process of neural generators.
Building on this prior research and on the development of deep learning, natural language generation (NLG) in dialogue systems has achieved good performance. However, if only dialogue data is used to train the model, it easily produces safe responses such as "good" and "okay", which have no reference value for a physician's treatment. To address this challenge, this paper extends prior work by adding medical knowledge that closely fits the symptoms described by patients, effectively incorporating that knowledge, tracking the patient's utterances, detecting the doctor's behavior entities, and finally generating a reply from professional medical knowledge, the patient's condition, and the doctor's behavior, so as to effectively provide patients with diagnosis and treatment opinions and suggestions.
The medical dialogue generation task aims to help doctors provide diagnosis and treatment suggestions during consultation. Based on the doctor-patient dialogue history and related medical knowledge, the system must generate a text reply that responds to the user, fits the context, and conforms to medical principles. This paper uses the dialogue history together with external medical knowledge as the basis for generating replies.
In this paper, a medical dialogue generation model that integrates external knowledge is proposed, as shown in Figure 1. The model contains five modules: doctor-patient dialogue encoding, patient status tracking, doctor behavior detection, medical knowledge distillation, and reply generation. The model first encodes the doctor-patient dialogue to obtain a vector carrying its context information, uses it to track the patient's condition during the dialogue, then predicts the doctor's next action, and finally fuses the medical knowledge extracted by the medical knowledge distillation module to produce a relevant reply.
Structure diagram of knowledge-based medical dialogue generation
For the doctor-patient dialogue encoding module, this paper first uses a bidirectional gated recurrent unit (BiGRU) to encode the doctor-patient dialogue history and convert it into a vector representation.
The BiGRU network greatly reduces the model's computational cost and complexity. The input representation of the doctor-patient dialogue encoding module is shown in Figure 2.
Input representation of doctor-patient dialogue coding module
After encoding the doctor-patient dialogue with BiGRU, a vector carrying the context information of the doctor-patient dialogue is obtained, where H is the doctor-patient dialogue history. Formally, the output of the encoder is shown in Equation (1):
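As a rough illustration of this encoder (not the paper's exact configuration), a BiGRU encoder can be sketched in PyTorch. The vocabulary size and hidden dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DialogueEncoder(nn.Module):
    """Encodes the doctor-patient dialogue history into context vectors.
    A minimal sketch; sizes are illustrative assumptions, not the paper's settings."""
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU reads the token sequence forward and backward
        self.bigru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)       # (batch, seq_len, emb_dim)
        outputs, hidden = self.bigru(emb)     # outputs: (batch, seq_len, 2*hidden_dim)
        return outputs, hidden

encoder = DialogueEncoder()
dummy_history = torch.randint(0, 5000, (1, 20))   # one dialogue of 20 token ids
ctx, _ = encoder(dummy_history)                   # context vectors for downstream modules
```

Each position in `ctx` concatenates the forward and backward hidden states, so later modules see both left and right context of every token.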
The goal of the patient status tracking module is to track the patient's condition in the doctor-patient dialogue history and pass the tracked patient state to the downstream modules. Following the variational autoencoder, the learning process first learns a joint probability distribution over the dialogue and the latent patient state.
Structure diagram of the variational autoencoder (VAE) generative model
When tracking the patient's status in each round of the medical conversation, the state tracker first samples a prior patient state.
Specifically, after obtaining the prior patient state, a multilayer perceptron (MLP) maps it to the parameters of the state distribution. The inference (posterior) process of the state tracker is similar to the above, but additionally incorporates a further vector representation.
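The sampling step in such a variational state tracker is commonly implemented with the reparameterization trick, where an MLP produces the mean and log-variance of a Gaussian over the latent state. The sketch below is a generic illustration under assumed dimensions, not the paper's exact network:

```python
import torch
import torch.nn as nn

class StateSampler(nn.Module):
    """Samples a latent patient-state vector via the reparameterization trick.
    Hypothetical sizes; the paper's MLP configuration is not specified here."""
    def __init__(self, ctx_dim=512, state_dim=64):
        super().__init__()
        # MLP maps the dialogue context vector to Gaussian parameters
        self.mlp = nn.Sequential(nn.Linear(ctx_dim, 128), nn.Tanh())
        self.to_mean = nn.Linear(128, state_dim)
        self.to_logvar = nn.Linear(128, state_dim)

    def forward(self, ctx):
        h = self.mlp(ctx)
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)   # external noise keeps sampling differentiable
        return mean + eps * std, mean, logvar

sampler = StateSampler()
state, mean, logvar = sampler(torch.randn(1, 512))   # one sampled patient state
```

The posterior network would follow the same pattern but take the extra vector representation as an additional input to the MLP.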
The doctor behavior detection module is designed to predict the entities the doctor will reply with in the next turn. This paper builds a doctor behavior classifier in this module, which divides doctor behavior into four types: asking about symptoms, diagnosing, prescribing drugs, and small talk. The module judges from the conversation history whether a shift between behavioral states has occurred, i.e., whether the next sentence should continue asking about the patient's symptoms, give diagnostic suggestions, and so on. The doctor behavior detection process is shown in Figure 4.
Flow chart of doctor behavior detection
Specifically, similar to the patient status tracking module, this paper models a prior distribution and an approximate posterior distribution of doctor actions. The prior policy network and the posterior policy network are both based on encoder-decoder structures, and a representation of doctor behavior is defined.
For the prior policy network, in the encoding stage, this paper first uses a BiGRU encoder to encode the patient state.
Finally, the prior distribution of the doctor's behavior is obtained.
The inference policy network approximates the posterior distribution of action categories by extracting directional key information from the doctor's previous responses.
Finally, by approximating the posterior distribution, the model extracts the doctor's action vector up to time t.
Finally, the approximate posterior distribution of doctor behavior detection at time t is obtained from the doctor action category and the i-th action keyword at time t, as shown in Equation (10).
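At its simplest, the four-way behavior decision described above can be sketched as a classifier over the dialogue context vector. This sketch omits the prior/posterior policy networks and uses assumed dimensions and hypothetical label names:

```python
import torch
import torch.nn as nn

# The four behavior types named in the text; identifiers are illustrative
ACTIONS = ["ask_symptom", "diagnose", "prescribe", "chitchat"]

class BehaviorClassifier(nn.Module):
    """Predicts the doctor's next action from the dialogue context vector.
    A minimal stand-in for the full prior/posterior policy networks."""
    def __init__(self, ctx_dim=512, n_actions=len(ACTIONS)):
        super().__init__()
        self.proj = nn.Linear(ctx_dim, n_actions)

    def forward(self, ctx):
        # Softmax yields a probability distribution over the four actions
        return torch.softmax(self.proj(ctx), dim=-1)

clf = BehaviorClassifier()
probs = clf(torch.randn(1, 512))   # e.g. argmax picks the predicted behavior
```

In the full model, the predicted action distribution conditions the reply generator, steering it toward a follow-up question, a diagnosis, or a prescription.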
The most important step in a knowledge-grounded dialogue system is to screen appropriate knowledge as input. Because external knowledge is large in scale and diverse in type, taking all possible knowledge as input may introduce noise and high computational cost. This paper therefore adds a noise filtering mechanism in the medical knowledge distillation module to select better knowledge and generate a more accurate response.
Specifically, the medical knowledge distillation module first performs a global pooling operation on the encoded vector carrying the dialogue-history context information and on the retrieved medical background knowledge vectors, obtaining the corresponding scalar representations.
Secondly, to allow interaction both among knowledge items and between knowledge and the doctor-patient dialogue context, this paper concatenates each piece of knowledge with the dialogue context and applies a two-layer fully connected operation to obtain the corresponding weight.
The weights are computed as shown in Equation (11).
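The filtering idea, pooling the context, scoring each knowledge item against it with a two-layer network, and keeping high-weight items, can be sketched as follows. Dimensions and the use of mean pooling and a softmax over items are illustrative assumptions:

```python
import torch
import torch.nn as nn

class KnowledgeFilter(nn.Module):
    """Scores candidate knowledge triples against the dialogue context and
    fuses them by weight. A sketch of the noise-filtering idea, not the
    paper's exact formulation."""
    def __init__(self, dim=512):
        super().__init__()
        # Two-layer fully connected scorer over [knowledge; context] pairs
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, ctx_seq, knowledge):
        # Global (mean) pooling collapses the context sequence to one vector
        ctx = ctx_seq.mean(dim=1)                             # (batch, dim)
        ctx = ctx.unsqueeze(1).expand(-1, knowledge.size(1), -1)
        pair = torch.cat([knowledge, ctx], dim=-1)            # pair each triple with context
        weights = torch.softmax(self.scorer(pair).squeeze(-1), dim=-1)
        # Weighted sum keeps relevant knowledge and damps noisy triples
        fused = (weights.unsqueeze(-1) * knowledge).sum(dim=1)
        return fused, weights

filt = KnowledgeFilter()
fused, w = filt(torch.randn(1, 20, 512), torch.randn(1, 5, 512))  # 5 candidate triples
```

Low-weight triples contribute almost nothing to `fused`, which is the filtering effect the module aims for.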
The reply generation module takes as input the patient status obtained by the patient status tracking module, the next doctor behavior predicted by the doctor behavior detection module, and the knowledge screened by the knowledge distillation module as highly consistent with the doctor-patient dialogue context, and jointly generates a response that gives doctors a reference for diagnosis and treatment.
In the first stage of reply generation, the model uses a BiGRU encoder to encode the patient state.
The generation probability is represented
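A conditional decoder of this kind can be sketched as a GRU whose initial hidden state is derived from the fused condition (patient state, predicted action, and distilled knowledge). Sizes and the single-layer design are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ReplyDecoder(nn.Module):
    """Generates a reply conditioned on the fused patient state, doctor action,
    and distilled knowledge vector. A minimal sketch under assumed sizes."""
    def __init__(self, vocab_size=5000, emb_dim=128, cond_dim=512, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(cond_dim, hidden_dim)   # condition seeds the decoder state
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, condition, prev_tokens):
        h0 = torch.tanh(self.init_h(condition)).unsqueeze(0)  # (1, batch, hidden)
        emb = self.embedding(prev_tokens)
        out, _ = self.gru(emb, h0)
        # Log-probabilities over the vocabulary at each step
        return torch.log_softmax(self.out(out), dim=-1)

dec = ReplyDecoder()
logp = dec(torch.randn(1, 512), torch.randint(0, 5000, (1, 10)))
```

At inference time the decoder would be unrolled token by token, feeding each sampled word back in until an end-of-sequence token is produced.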
To validate the effectiveness of the proposed method, this paper uses the large-scale medical dataset KaMed and the large-scale COVID-19 Chinese dialogue dataset proposed by Zeng et al. [10] in 2020. The COVID-19 dataset contains 1,088 conversations covering common COVID questions and answers and provides fine-grained entity-level annotations; KaMed covers more departments and a wider variety of diseases, including more than 17K medical conversations and 5,682 entities. The datasets used in this paper are shown in Table 1.
Dataset statistics
Datasets | Dialogues | Utterances | Tokens | Knowledge
---|---|---|---|---
KaMed | 17,864 | 153,000 | 6,663,272 | Y
COVID-19 | 1,088 | 9,494 | 406,550 | N
The experimental hardware environment uses an Intel i9-10900K CPU, two Nvidia GeForce 2080Ti GPUs, 64 GB RAM, and a 512 GB SSD. The software environment is Windows 10, with PyCharm, Visual Studio Code, PyTorch, Neo4j, etc. used for development, and TensorBoard used for displaying the dialogue results.
To evaluate the linguistic quality of the generated responses, the metrics BLEU@N, Distinct@N, and Perplexity (PPL) are used to evaluate the model proposed in this paper.
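Of these metrics, Distinct@N has a simple closed form: the ratio of unique n-grams to total n-grams across the generated replies, so a higher value means more diverse output. A minimal implementation:

```python
def distinct_n(replies, n):
    """Distinct@N: unique n-grams divided by total n-grams over all replies.
    `replies` is a list of token lists."""
    total, unique = 0, set()
    for tokens in replies:
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Tiny worked example: 6 unigrams, 4 of them unique
replies = [["take", "some", "medicine"], ["take", "some", "rest"]]
d1 = distinct_n(replies, 1)   # 4 / 6
```

BLEU@N and PPL are typically taken from standard library implementations (e.g. NLTK's BLEU and the model's own negative log-likelihood) rather than re-implemented.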
To validate that the model proposed in this paper outperforms previous models, the Seq2Seq end-to-end model with attention mechanism, the HRED model, the VHRED model, and the proposed model are tested on the KaMed and COVID-19 datasets. During the experiments, a confusion matrix is first used to summarize the generated results. The visualized attention scores show that the models attend to the context differently: the basic Seq2Seq model gives low attention scores to entities such as "cold" and "medicine"; HRED and VHRED improve the attention on such entities to some extent compared with Seq2Seq; and the knowledge-integrated model in this paper attends to the conversation context most accurately, giving high attention scores to entities such as "cold" and "taking medicine", which indicates that the proposed method can track and infer medical entities. The medical dialogue generation results of the different models on the two datasets are shown in Table 2 below.
Experimental results
Datasets | Model | B@1 | B@2 | D@1 | D@2 | Perplexity
---|---|---|---|---|---|---
KaMed | Seq2Seq | 2.71 | 1.58 | 1.24 | 6.85 | 24.82
KaMed | HRED | 2.59 | 1.59 | 1.17 | 6.65 | 27.14
KaMed | VHRED | 2.49 | 1.55 | 1.15 | 6.42 | 28.65
KaMed | Ours | 2.79 | 1.89 | 1.58 | 8.56 | 22.91
COVID-19 | Seq2Seq | 3.13 | 5.70 | 5.50 | 29.00 | 53.3
COVID-19 | HRED | 2.56 | 5.73 | 5.21 | 32.39 | 49.6
COVID-19 | VHRED | 3.31 | 5.65 | 5.65 | 34.56 | 47.2
COVID-19 | Ours | 3.57 | 5.90 | 5.89 | 31.21 | 40.8
Experiments show that the proposed method performs well on both datasets. On KaMed, compared with the baseline Seq2Seq method, the Perplexity of the proposed method is reduced by 1.91, and a smaller value indicates higher accuracy. Compared with the VHRED model, B@1 increases by 0.30, B@2 by 0.34, and D@2 by 2.14, all of which indicate that the proposed method is better; the same holds on the COVID-19 dataset. Figure 5 shows an example conversation generated by the model under test.
Example of model generation
For example, when the patient reports diarrhea, nasal congestion, runny nose, and other symptoms and asks what to do, the doctor needs to search the knowledge triples for knowledge related to the patient's disease based on the patient's description, see which conditions the symptoms relate to, further inquire about the patient's condition, and finally give a reply that the doctor can use as a reference.
To solve the problem of surging demand for user consultation and the shortage of medical personnel providing online diagnosis and treatment services in the field of intelligent healthcare, this paper constructs a knowledge-based medical dialogue generation model. Although the traditional template-based method is simple, it requires highly skilled experts to write different templates for different fields, incurring high labor and maintenance costs; and a slot-filling dialogue system can only select from an existing answer set, cannot be extended, and serves users inefficiently.
Therefore, this paper proposes a medical dialogue generation model that incorporates external knowledge. While using pre-trained models to enhance the scalability and portability of the proposed model, this paper combines external knowledge to improve the medical response generation task by tracking the patient's status and detecting the doctor's behavior. Extensive experiments on KaMed and COVID-19 show that the proposed medical dialogue generation model is superior to most baseline models in BLEU, Distinct, and Perplexity, verifying the effectiveness of this model.
In future work, we plan to integrate patients' emotional variables into the dialogue generation model so that it can better understand patients' spoken expressions, hoping to generate more personalized and targeted responses and further improve the model's response generation performance.