Named entity recognition (NER) is the task of extracting specific entity mentions from a natural language corpus. NER also faces problems in entity labeling. Traditional models usually predict BIO [1] or BIOES labels, which encode both the entity position and the entity category in a single tag. As a result, every token-level prediction must get the position subtag and the category subtag right at the same time. This is a serious weakness: if either subtag of any token inside an entity is predicted incorrectly, the whole entity is wrong, so errors accumulate easily. To address this, researchers have split the traditional NER task into a named entity prediction task and an entity classification task. Zheng et al. [2] proposed a named entity recognition algorithm based on multi-task learning. Multi-task learning is an integrated learning method [3] [4] that improves performance on multiple tasks by training them at the same time. Building on the BiLSTM model [5] [6], they divided the traditional NER model into two modules, a named entity prediction task and an entity category classification task, and trained them with multi-task learning. The approach achieves good results, but it does not make full use of context information outside the named entities and does not introduce comprehensive information about entity categories. Ding Yi qi et al. [7] proposed a Chinese named entity perception neural network model based on multi-task learning. It divides the traditional NER task into a named entity perception task and an entity classification task, and optimizes the loss function to better identify Chinese entities.
However, this model identifies the beginning and end of an entity with two separate modules, which leads to inconsistency between training and prediction, and its representation of entity categories lacks organization. This paper proposes MTL-NER, a named entity recognition model based on multi-task learning and a cascading pointer network, together with an entity labeling method based on HowNet [8][9][10] semantics. The traditional NER task is decomposed into global named entity perception and entity classification. A recognition calculation method based on comprehensive entity descriptions, built with the HowNet knowledge base, is introduced to classify entities and improve the recognition performance of the overall model. In addition, the entity category description statements are optimized for specific domains to improve the accuracy of domain entity recognition.
The model structure proposed in this paper is shown in Figure 1. The overall model is mainly composed of a data preprocessing layer, a shared feature extraction layer, a multi-task learning layer, and an output layer. In the data preprocessing stage, a semantic entity annotation method based on HowNet is proposed, which annotates named entities and their categories and constructs a sample set in units of sentences. Using the HowNet knowledge base, a comprehensive description of each domain entity category is constructed as input data for the model. In the model, the entity recognition task and the entity classification task both rely on the shared feature extraction layer for text vectorization and feature extraction. Each sentence is encoded by the shared feature layer to obtain the feature vector of the sample, and domain named entity prediction is then carried out.
MTL-NER structure diagram
The entity classification task takes its input from the sample set and from the natural language comprehensive descriptions of the entity categories, and uses the same feature extraction layer to obtain feature vectors. In the multi-task learning layer, each task generates its result vector from the input feature vectors and produces its prediction probabilities. Finally, the output layer fuses the results of the two tasks, determines the named entities and their categories, and outputs the named entity recognition results.
To reduce the cumulative impact of traditional entity labeling errors, this paper proposes a new labeling scheme based on HowNet semantics, which labels entity categories and domain entities separately. In HowNet, a word has one or more senses, and each sense is composed of smaller semantic units (sememes) and dozens of dynamic roles. Figure 2 shows an example:
Words and meanings in HowNet
The word “green” has two meanings: meaning category 1 is color, and meaning category 2 is environmental protection. Labels are then constructed so that entities are classified by meaning. For the input sequence
In line with the definition of an entity in the HowNet knowledge base, each meaning item is used as a category label for domain entities, which improves the accuracy of entity category determination.
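The sense-based labeling idea can be sketched in a few lines of Python. This is only an illustration: the `SENSE_INVENTORY` dictionary is a hypothetical, hand-written stand-in for a HowNet lookup, the function names are assumptions, and the tuple layout follows the ternary tag [entity category, entity start position, end position] used in the experiments of this paper.

```python
# Hypothetical, hand-written sense inventory standing in for a HowNet lookup.
SENSE_INVENTORY = {
    "green": ["color", "environmental protection"],
}


def candidate_categories(word):
    """Return the sense-based category labels a word may take.

    Words without a listed sense fall back to "O" (general text).
    """
    return SENSE_INVENTORY.get(word.lower(), ["O"])


def make_label(category, start, end):
    """Build a ternary tag (entity category, start position, end position)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return (category, start, end)


tokens = ["green", "energy", "saves", "resources"]
# The span "green energy" (tokens 0..1) uses the environmental protection sense.
label = make_label(candidate_categories(tokens[0])[1], 0, 1)
print(label)
```

Because the category and the boundaries are stored as separate fields, an error in one field does not corrupt the others, which is the motivation for separating the two subtasks.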
The semantic information of the HowNet knowledge base, as shown in Figure 4 below, includes Chinese and English words, sememes, DEF_CONCEPT (a combination of sememes and dynamic roles), and attributes. The relationships mainly include dynamic roles and hierarchical relationships. Other domain entity category description statements are supplemented from Wikipedia and the specific characteristics of the corpus.
Introducing the comprehensive description of each entity category assists the entity classification task: it improves the model's ability to obtain domain information, and therefore its pertinence to the specific domain and its classification accuracy. Examples of the description statements constructed in this paper are shown in Table 1 below.
Example of comprehensive description of domain entity categories
Entity category | Comprehensive description |
color | Yellow green blue, environment-friendly color, hue, lightness, saturation and various phenomena of light |
Environmental-friendly | Characteristic value, protection, positive evaluation, low carbon, energy conservation and emission reduction, life, agriculture, circular economy, wind and solar power generation |
O | General text |
In the shared feature extraction layer, named entity recognition tasks and category classification tasks share an embedding layer and feature extraction layer for joint training. According to the sequence length
Where, pos is the character position, i is the character vector dimension,
In the feature extraction layer, a Transformer model based on the multi-head attention mechanism is used to improve the extraction of context features. The input vector Xembedding is decomposed according to formula (4) to obtain the query matrix Q, the key matrix K, and the value matrix V, which serve as the input of the Transformer encoder module.
Then,
Where,
In this paper, the Transformer encoder [11][12][13] based on the multi-head attention mechanism is used as the feature extraction layer, and the encoder module can be stacked multiple times. With position encoding and multi-head self-attention, it learns unsupervised character-level representations of the input text sequence.
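The two building blocks of this layer, position encoding and attention over Q, K, and V, can be sketched in plain Python. The sketch assumes the standard sinusoidal position encoding and single-head scaled dot-product attention from the original Transformer. The learned projections that produce Q, K, and V are omitted, so the example passes the input directly as all three matrices, and the function names are illustrative.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes: even dimensions use sin, odd dimensions use cos."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# Toy input: two tokens with two-dimensional embeddings (assumed values).
x = [[1.0, 0.0], [0.0, 1.0]]
attended = scaled_dot_product_attention(x, x, x)
codes = positional_encoding(seq_len=4, d_model=6)
```

A multi-head layer would run several such attention heads on different learned projections of the input and concatenate their outputs.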
In the named entity recognition task, the shared feature extraction layer extracts the long-distance location dependent features of the context, and outputs the sample vector
Structure diagram of entity recognition model
The Transformer encoder produces the word-level vector representation of the sample. In this paper, a cascade pointer network is used for the sequence annotation task: two binary classification networks generate two 0/1 sequences that determine the start and end boundaries (spans) of entities in the sequence. Each span is determined by a head position pointer (start) and a tail position pointer (end). At the same time, multiple binary classification networks are used to recognize entities of multiple categories.
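How the two 0/1 sequences turn into spans can be illustrated with a small decoding sketch. It assumes a threshold of 0.5 and pairs each predicted start with the nearest predicted end at or after it. This pairing rule is a common heuristic for pointer networks and is an assumption here, as are the function names and the toy probabilities.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each predicted start with the nearest predicted end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def decode_entities(pointer_probs, threshold=0.5):
    """Decode one (start, end) pointer pair per category into labeled spans."""
    entities = []
    for category, (start_probs, end_probs) in pointer_probs.items():
        for s, e in decode_spans(start_probs, end_probs, threshold):
            entities.append((category, s, e))
    return sorted(entities, key=lambda item: (item[1], item[2]))


# Toy probabilities for a six-token sentence (assumed values).
pointer_probs = {
    "location": ([0.9, 0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 0.8, 0.1, 0.1, 0.1, 0.1]),
    "organization": ([0.7, 0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.9, 0.1, 0.1]),
}
print(decode_entities(pointer_probs))
```

Because each category has its own pointer pair, the location span (0, 1) and the organization span (0, 3) can overlap, which is how this kind of decoding accommodates nested entities.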
Each word (token) in the input sequence can be represented as the starting position of an element, and the span composed of any two tokens can be represented as any entity, which solves the problem of nested entity and multi class entity recognition, as shown in Figure 4:
Example of entity recognition
Figure 4 shows the annotation examples corresponding to the input samples. Each entity corresponds to a set of pointer vectors (start, end). By combining the start and end pointer vectors of all entity labels, two-dimensional matrices can be obtained, which are recorded as
In this paper, multiple groups of binary classification networks predict, for every position in the input sequence, whether the start and end pointer vectors of each entity label take the value 0 or 1, which determines the start and end positions of the elements. The whole task can be regarded as multi-label classification of each token in the input sequence. The probabilities that the i-th token is predicted as the start and end positions of an element of entity r are
Where,
With the help of the HowNet knowledge base, this paper improves the construction of domain entity categories and their comprehensive descriptions, and proposes a similarity calculation model based on the comprehensive entity descriptions to output the probability of each entity category. The structure of the entity classification model is shown in Figure 5.
Entity classification model
For the input sample feature vector
Where,
Finally, the output results of category probability corresponding to each input character are obtained respectively.
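A similarity-based classifier of this kind can be sketched as follows. The similarity formula used in this paper is not reproduced in the sketch. It assumes cosine similarity between a character feature vector and each category description vector, followed by a softmax, and all vectors and names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def category_probabilities(token_vec, description_vecs):
    """Map similarities to every category description into a probability distribution."""
    names = list(description_vecs)
    sims = [cosine(token_vec, description_vecs[name]) for name in names]
    return dict(zip(names, softmax(sims)))


# Assumed toy vectors; in the model both sides come from the shared encoder.
description_vecs = {
    "color": [1.0, 0.1],
    "environmental protection": [0.0, 1.0],
    "O": [-1.0, 0.0],
}
probs = category_probabilities([1.0, 0.0], description_vecs)
print(max(probs, key=probs.get))
```

Encoding the descriptions with the same shared layer as the samples keeps both kinds of vectors in one space, so a plain similarity score is meaningful.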
In this paper, the entity recognition probability and the entity classification probability are integrated to obtain the output results. The start and end positions of entities in the text are marked with the double-pointer (two-pointer) method, and the entity categories are marked by multiple binary classification networks. Finally, the loss function of the entity recognition part is obtained as follows:
Where n is the length of the input sequence;
In this paper, the loss function calculation formula based on the entity part of character classification result
Where
Following the idea of multi-task learning, this paper takes the weighted sum of the loss functions of the domain entity recognition task and the entity classification task as the overall loss function of the model. The specific formula is as follows:
Among them,
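As a hedged illustration of the weighted multi-task objective described in this section, the sketch below assumes binary cross-entropy for the start and end pointers, cross-entropy for the category probabilities, and two hypothetical weights `w_ner` and `w_cls`. The exact formulas and weight values of this paper are not reproduced; averaging over the sequence length is also an assumption.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def pointer_loss(start_probs, start_gold, end_probs, end_gold):
    """Entity recognition loss: start-pointer loss plus end-pointer loss."""
    return (binary_cross_entropy(start_probs, start_gold)
            + binary_cross_entropy(end_probs, end_gold))


def cross_entropy(prob_rows, gold_indices, eps=1e-12):
    """Mean negative log-likelihood of the gold category at each position."""
    losses = [-math.log(max(row[g], eps)) for row, g in zip(prob_rows, gold_indices)]
    return sum(losses) / len(losses)


def total_loss(ner_loss, cls_loss, w_ner=0.5, w_cls=0.5):
    """Overall loss as a weighted sum of the two task losses (weights are assumed)."""
    return w_ner * ner_loss + w_cls * cls_loss


# Toy three-token example with one entity spanning tokens 0..1 (assumed values).
ner = pointer_loss([0.9, 0.2, 0.1], [1, 0, 0], [0.1, 0.8, 0.1], [0, 1, 0])
cls = cross_entropy([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]], [0, 0, 1])
print(total_loss(ner, cls))
```

The weights trade off the two tasks: a larger `w_ner` pushes the shared encoder toward boundary detection, and a larger `w_cls` pushes it toward category discrimination.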
MSRA is a Chinese dataset released by Microsoft Research Asia. It comes from the news domain and is a benchmark dataset for Chinese named entity recognition. It contains about 90,000 Chinese named entities with annotation data. The entities fall into three categories: location, organization, and person.
OntoNotes4.0 is a Chinese dataset covering multiple data sources, including but not limited to telephone conversations, news agencies, radio news, radio conversations, and blogs. This paper selects four categories of entities from this version, such as people and organizations.
The CLUENER2020 dataset is a Chinese fine-grained named entity recognition dataset. It is based on the open-source text classification dataset THUCNEWS, from which part of the data was selected for fine-grained annotation. The dataset contains 10 different categories and 12,000 sentences.
The CMeEE dataset originated from CHIP 2020 (China Health Information Processing Conference). Entities extracted from sentences are classified into nine categories: diseases, clinical manifestations, drugs, medical equipment, medical procedures, body, physical examination, microorganisms, and departments. The dataset contains 25,000 sentences.
This paper evaluates model performance with the criteria commonly used in named entity recognition tasks: precision (P), recall (R), and F1 score, with F1 as the main criterion. Precision, recall, and F1 score are calculated as follows, where TP, FP, and FN denote the numbers of true positive, false positive, and false negative entities:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot P \cdot R}{P + R}$$
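These metrics are usually computed at the entity level, where a predicted entity counts as correct only if its category and both boundaries match the gold annotation. A minimal sketch under that assumption follows; the function name and the tuple layout (category, start, end) are illustrative and not taken from the code of this paper.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level precision, recall, and F1 from exact matches of (category, start, end)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [("person", 0, 1), ("location", 3, 4), ("organization", 6, 8)]
# The second prediction has the right category but a wrong end boundary.
predicted = [("person", 0, 1), ("location", 3, 5)]
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Here one of two predictions is correct and one of three gold entities is found, so precision is 1/2 and recall is 1/3.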
During the experiment, the parameters are shown in Table 2 below:
Experimental parameters
SGD | |
5e-6 | |
ReLU | |
16 | |
128 | |
6 | |
PyTorch | |
1 |
In this paper, the proposed MTL-NER model, a named entity recognition model based on multi-task learning and a cascading pointer network, is compared with state-of-the-art models on the test set of each dataset. In the experiments, MTL-NER uses a new ternary tag [entity category, entity start position, entity end position], BiLSTM-CRF and BERT-BiLSTM-CRF are trained with traditional BIOES tags, and BERT-MRC uses [entity start position, entity end position] as its label. BiLSTM-CRF serves as the experimental baseline. BERT-BiLSTM-CRF is built on a Transformer-based pre-trained model and treats named entity recognition as a sequence labeling task. BERT-MRC [14][15] is a BERT model retrained in a machine reading comprehension (MRC) framework. MTL-NER achieves the best precision, recall, and F1 on all four Chinese named entity datasets. Compared with the best competing model, its F1 improves by 0.77%, 2.62%, 2.27%, and 3.32% on the public Chinese datasets MSRA, OntoNotes4.0, CLUENER2020, and CMeEE, respectively. Compared with the BiLSTM-CRF baseline, it improves by 16.09%, 35.63%, 22.85%, and 24.93%, respectively, which demonstrates the advantage of this model. The detailed performance of each model is shown in Table 3 below:
MSRA dataset model indicators
Model | P (%) | R (%) | F1 (%) |
BiLSTM-CRF | 87.47 | 85.23 | 83.34 |
BERT-BiLSTM-CRF | 95.15 | 94.85 | 95.00 |
BERT-MRC | 96.28 | 95.74 | 96.01 |
MTL-NER | 97.07 | 96.43 | 96.75 |
OntoNotes4.0 dataset model indicators
Model | P (%) | R (%) | F1 (%) |
BiLSTM-CRF | 73.45 | 60.07 | 61.71 |
BERT-BiLSTM-CRF | 79.23 | 79.58 | 79.40 |
BERT-MRC | 82.49 | 81.23 | 81.56 |
MTL-NER | 84.87 | 82.56 | 83.70 |
CLUENER2020 dataset model indicators
Model | P (%) | R (%) | F1 (%) |
BiLSTM-CRF | 67.23 | 65.42 | 66.31 |
BERT-BiLSTM-CRF | 77.42 | 78.15 | 77.78 |
BERT-MRC | 79.04 | 80.26 | 79.65 |
MTL-NER | 82.14 | 80.79 | 81.46 |
CMeEE dataset model indicators
Model | P (%) | R (%) | F1 (%) |
BiLSTM-CRF | 56.41 | 49.52 | 52.74 |
BERT-BiLSTM-CRF | 68.98 | 66.25 | 67.59 |
BERT-MRC | 71.26 | 69.34 | 70.29 |
MTL-NER | 73.13 | 70.68 | 71.89 |
Because more features are integrated in the word vector generation stage, the model quickly reaches good performance during training. However, to obtain better results on the test set, it is important to select an appropriate dropout value, which prevents the model from overfitting and makes it more robust. To determine this value, this paper conducts several groups of comparative experiments on the four datasets, using F1 as the main measure. The results are shown in Figures 6, 7, 8, and 9 below:
MSRA dataset dropout
OntoNotes4.0 dataset dropout
CLUENER2020 dataset dropout
CMeEE dataset dropout
The comparison shows that the model achieves its best experimental results with dropout = 0.1 on the CLUENER2020 dataset and with dropout = 0.2 on the MSRA, OntoNotes4.0, and CMeEE datasets. Taking the CLUENER2020 dataset as an example, the domain entity recognition results of each model on a selected sample are shown below:
The BiLSTM-CRF and BERT-BiLSTM-CRF models recognize entities in long text poorly, and the BERT-MRC model makes multiple recognition errors, whereas the MTL-NER model proposed in this paper achieves the desired results on long-span entity recognition.
In the named entity recognition experiments, the MTL-NER model performs well against advanced models on four public Chinese datasets. It achieves 96.75% F1 on the MSRA dataset, which is 0.77% higher than the existing advanced model BERT-MRC. The model performs well on multiple datasets covering multiple domains, which shows its versatility across domains. The experiments also find that Transformer-based models outperform BiLSTM-based models, which confirms the information extraction and utilization ability of the Transformer and supports its use as the feature extractor in this paper. Overall, the experiments demonstrate the advantage of the proposed model.