Knowledge graphs (KGs) are becoming a leading driving force behind Artificial Intelligence (AI). Just as brains do for humans, KGs will serve as the brains of machines, enabling them to make connections, perform cognitive inference, and, most importantly, derive insights from vast amounts of heterogeneous data. Many KGs are built for AI-related applications, for instance, recommender systems (Li et al., 2019), question answering (Hao et al., 2017), semantic search and ranking (Gjergji et al., 2018; Xiong, Russell, & Jamie, 2017), and reading comprehension (Yang & Tom, 2017). However, regardless of how a KG is created, the data extracted for the initial KG is never complete. Because finding all valid triples manually is costly, predicting the missing links of the resulting KG is an important step. This problem is known as knowledge graph completion.
Many techniques have attempted to model relation patterns, and KG embedding approaches have so far proved to be powerful for KG completion. TransE (Antoine et al., 2013) is a well-known embedding method that views every relation as a translation from subject to object. For a factual triple (h, r, t), TransE expects h + r ≈ t.
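To model pairwise interactions between entities and relations, DistMult (Yang et al., 2015) represents each relation as a diagonal matrix. For a valid triple (h, r, t), DistMult scores the triple with the trilinear product ⟨h, r, t⟩ = Σ_i h_i r_i t_i.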
Recently, many attempts (Chang et al., 2014; Liu, Wu, & Yang, 2017; Minervini et al., 2017; Zhong et al., 2015, 2017) have been made to incorporate extra information beyond KG triples. RUGE (Guo et al., 2018) uses the AMIE+ tool (Galarraga et al., 2015) to extract soft rules from the training sets. The embeddings of entities and relations are learned both from labeled triples in the KG and from unlabeled triples whose labels are learned iteratively. As suggested by RUGE, length-1 and length-2 rules with PCA (partial completeness assumption) confidence higher than 0.8 are used. ComplEx-NNE+AER (Ding et al., 2018) uses two types of constraints: approximate entailment constraints on relation representations and non-negativity constraints on entity representations. The non-negativity constraints aim to learn simple, interpretable entity representations, while the approximate entailment constraints further encode logical regularities into relation representations. Such constraints impose prior knowledge, introduced by association rules, on the embedding space and thereby improve embedding learning. LineaRE (Peng & Zhang, 2020) views KG embedding as a linear regression task and represents a relation as a linear function of two entities; that is, each relation is represented by two weight vectors and a bias vector. This model is simple yet has a powerful reasoning ability.
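To make the contrast between these scoring functions concrete, here is a minimal sketch in plain Python of the TransE translation score, the DistMult trilinear score, and a LineaRE-style linear score. It assumes scores where higher means more plausible, an L1 norm by default for TransE, and an L1 norm for the LineaRE-style form; the function names are our own and do not come from any library.

```python
from typing import Sequence

Vec = Sequence[float]


def transe_score(h: Vec, r: Vec, t: Vec, p: int = 1) -> float:
    """TransE: negative L_p distance between h + r and t (higher = more plausible)."""
    return -sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)


def distmult_score(h: Vec, r: Vec, t: Vec) -> float:
    """DistMult: trilinear product <h, r, t>, i.e. h^T diag(r) t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def linear_relation_score(h: Vec, w1: Vec, b: Vec, w2: Vec, t: Vec) -> float:
    """LineaRE-style score (assumed L1 form): -|| w1 * h + b - w2 * t ||_1.

    The products are element-wise: w1 and w2 are the relation's two weight
    vectors and b is its bias vector.
    """
    return -sum(abs(a * hi + bi - c * ti) for hi, a, bi, c, ti in zip(h, w1, b, w2, t))
```

With w1 and w2 set to all-ones vectors and b equal to r, the LineaRE-style score reduces to the TransE L1 score, which shows how the extra weight vectors generalize the translation view.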
The most relevant work to ours is that of SimplE (Seyed & David, 2018). To model the different position contexts of each relation, SimplE takes two vectors
To handle the aforementioned issues, the encoding of an entity must consider both the relations the entity connects to and the position within a relation at which the entity appears. We present ContE, a contextualized embedding model for the link prediction task in KGs. To capture the multiple meanings of entities, it is necessary to distinguish entities in two circumstances: (1) entities in different positions of the same relation and (2) entities in different relations. The forward and backward impacts of the relationship
The parameters and scoring functions of the SOTA models and of ContE are summarized in Table 1. Some notations are defined as follows:
Parameters and scoring functions in SOTA baselines and in ContE model.
Model  Scoring function  Parameters
TransE
ComplEx
SimplE
ConvE
ConvKB
RotatE
HAKE
ContE
It is worth mentioning that the previous approaches are static, i.e., each entity has only one embedding vector regardless of its context. As a result, all meanings of a polysemous entity share a single embedding vector. In NLP, replacing static word embeddings with contextualized word representations has brought large improvements on multiple tasks (Devlin et al., 2019; Vaswani et al., 2017). This success suggests that considering the contexts of entities will also benefit KG embedding learning. We previously proposed the KGCR model (Pu et al., 2020), which encodes the contextual information of a relation through its forward and backward embedding vectors; this helps capture an entity's different semantics when it appears in different relations or in different positions of a relation. KGCR represents entities and relations as complex vectors combined through its own operation between two complex vectors; as a result, it has high complexity and is not fully expressive. Additionally, KGCR cannot model the compositional pattern of relations, although the performance of compositional inference may largely depend on the ability to model compositional relations. To overcome these shortcomings, we propose ContE, which improves on KGCR. ContE encodes entities and relations in real vector space with lower complexity than KGCR, is fully expressive, and can model compositional relations.
In sum, the contributions of this paper are threefold:
We propose the ContE model, a novel polysemous entity embedding method that considers relational contexts for KG completion.
ContE is fully expressive; that is, given any ground truth over the triples, there exist embedding assignments to entities and relations that precisely separate the true triples from the false ones.
ContE is a bilinear model whose number of parameters grows linearly with the number of entities and relations, scaled by the embedding dimension. ContE outperforms existing SOTA embedding models on benchmark datasets for KG completion.
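The linear-growth claim can be checked with simple arithmetic. The sketch below counts embedding parameters as (k_e·|ℰ| + k_r·|ℛ|)·d, where k_e and k_r are the numbers of d-dimensional vectors per entity and per relation. The TransE, ComplEx, and SimplE entries follow those models' published designs; the ContE entry (one vector per entity, three per relation) is a hypothetical assumption, not taken from Table 1.

```python
from typing import Dict, Tuple

# (vectors per entity, vectors per relation); each vector has `dim` coordinates.
# For ComplEx, `dim` is the complex dimension, stored as separate real and
# imaginary vectors.
VECTORS_PER_ITEM: Dict[str, Tuple[int, int]] = {
    "TransE": (1, 1),           # one vector per entity and per relation
    "ComplEx": (2, 2),          # real and imaginary parts
    "SimplE": (2, 2),           # head/tail vectors per entity; r and r^-1 per relation
    "ContE (assumed)": (1, 3),  # HYPOTHETICAL: entity vector; relation + forward + backward
}


def num_parameters(model: str, n_entities: int, n_relations: int, dim: int) -> int:
    """Embedding parameter count: linear in |E|, |R| and dim."""
    per_entity, per_relation = VECTORS_PER_ITEM[model]
    return (per_entity * n_entities + per_relation * n_relations) * dim
```

For example, with the FB15K-237 sizes (14,541 entities, 237 relations) and an illustrative d = 200, the count is dominated by the entity term, and doubling both the number of entities and the number of relations doubles the total.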
The rest of this paper is organized as follows. Section 2 summarizes related work. Section 3 details the ContE model. Section 4 presents experiments and results. Finally, Section 5 draws conclusions.
Link prediction with KG embedding approaches has been extensively explored in recent years, and many approaches have been designed for this task. Integrating deep neural networks with KGs has attracted wide attention in many NLP tasks, for instance, relation extraction (Ji et al., 2017; Zhang et al., 2019), language modeling (Liu et al., 2020; Sun et al., 2019), and knowledge fusion (Dong et al., 2014). NTN (Socher et al., 2013) is the first neural-network-based approach for reasoning over relationships between two entities. It uses a neural tensor network for relation classification and represents entities with word vectors obtained from pretrained models such as Word2Vec. ConvE (Dettmers et al., 2018) was the first to use a CNN for KG completion. It is a CNN with three layers: the lowest is a convolution layer applied to 2D-reshaped embeddings, the middle is a projection layer, and the top is an inner product layer. The 2D convolution can capture more interactions between different dimensional entries of
PTransE (Lin et al., 2015) uses relation paths to learn entity and relation representations. It considers not only one-step paths but also multi-step paths. Because there are often many relation paths between two entities, PTransE uses a path-constraint resource allocation algorithm to estimate the importance of each path and keeps only the paths whose importance score exceeds a threshold. Three encodings of a relation path are used for relation representations: multiplication of the relation representations on the path, addition of the relation representations on the path, and a recurrent neural network-based representation. PTransE can make full use of local path information. However, it uses relation paths only to enhance representation learning and still neglects the relational contexts of entities. Many entities are linked by only a few relations (so-called long-tailed entities); to learn their embeddings, RSNs (Guo, Sun, & Hu, 2019) introduce recurrent skipping networks. Unlike other models, RSNs let a head entity predict both its subsequent relation and, by skipping connections directly, its tail entity. Relation path based methods such as PTransE suffer from high computational cost, and their relation paths are limited to at most three steps. RSNs instead use biased random walks to model deep paths, which can capture more relational dependencies. It is worth mentioning that RSNs use only relational paths to learn KG embeddings and still neglect the textual information of relations.
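The path-constraint resource allocation used by PTransE can be sketched as follows: the head entity starts with one unit of resource, and at each step along the relation path every entity splits its resource evenly among its successors via the next relation; the amount that reaches a tail entity serves as the path's reliability for that entity pair. The sketch also shows the ADD encoding of a path. It is a minimal illustration under these assumptions, with hypothetical function names, not PTransE's released code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def path_resource(triples: Iterable[Triple], head: str, path: List[str]) -> Dict[str, float]:
    """Resource flowing from `head` along the relation `path` (PCRA-style).

    Returns a dict mapping each reachable end entity to the resource it
    receives, i.e. the reliability of `path` for (head, end entity).
    """
    successors = defaultdict(set)
    for h, r, t in triples:
        successors[(h, r)].add(t)
    resource = {head: 1.0}
    for rel in path:
        nxt: Dict[str, float] = defaultdict(float)
        for node, amount in resource.items():
            targets = successors.get((node, rel), set())
            # Split this node's resource evenly among its successors via `rel`.
            for m in targets:
                nxt[m] += amount / len(targets)
        resource = dict(nxt)
    return resource


def add_composition(relation_vectors: List[List[float]]) -> List[float]:
    """ADD encoding of a relation path: element-wise sum of the relation vectors."""
    return [sum(values) for values in zip(*relation_vectors)]
```

In PTransE, such a composed path vector is compared with the vector of the relation it should imply, and paths whose reliability falls below a threshold are discarded.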
So far, many existing approaches are static, and static representation learning has limitations: it learns entity and relation embeddings from a single-triple view and neglects the contexts of relations. On the one hand, an entity can appear in either of the two positions of a relation. On the other hand, many entities connect to different relational neighbors, i.e., different entities may link to different relations. Hence, exploiting the interactions between entities and relations is important for KG embedding.
We use TransE's procedure to generate negative triples. For each positive triple (
Note that negative triples can also be produced by randomly corrupting relations. However, such random corruption may produce false negatives, that is, corrupted triples that are actually true.
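Random corruption with false-negative filtering can be sketched as below: the head or the tail of a positive triple is replaced by a random entity, and candidates that are known true triples are rejected. For weighting the negatives in the loss, the sketch assumes the self-adversarial form of RotatE (Sun et al., 2019), a softmax of the negatives' scores with temperature alpha; whether ContE's sampling distribution matches it exactly is an assumption. All names are hypothetical.

```python
import math
import random
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def corrupt(triple: Triple, entities: List[str], known: Set[Triple],
            rng: random.Random, max_tries: int = 100) -> Optional[Triple]:
    """Replace the head or the tail (chosen uniformly) with a random entity.

    Candidates found in `known` are rejected to avoid false negatives.
    Returns None if no valid candidate is found within `max_tries`.
    """
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:
            return candidate
    return None


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def self_adversarial_weights(neg_scores: List[float], alpha: float = 1.0) -> List[float]:
    """Softmax of alpha * score over the negatives: harder negatives get more weight."""
    m = max(neg_scores)
    exps = [math.exp(alpha * (s - m)) for s in neg_scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_sampling_loss(pos_score: float, neg_scores: List[float], alpha: float = 1.0) -> float:
    """-log sigma(f(pos)) - sum_i p_i * log sigma(-f(neg_i)), scores: higher = more plausible.

    During training the weights p_i would be treated as constants (no gradient).
    """
    weights = self_adversarial_weights(neg_scores, alpha)
    return -log_sigmoid(pos_score) - sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
```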
To account for the context of an entity, we must consider both the relations the entity connects to and the position within a relation at which the entity appears. We propose an embedding model based on relational contexts, named ContE, for KG completion. In the ContE model, each relation
For each relation
After the scoring function is defined, the model can be optimized. Similar to Mikolov et al. (2013), one can use a negative sampling loss function to effectively minimize a margin-based loss:
We present a new approach for drawing negative samples from the following distribution:
Here,
ContE is a simple yet expressive model. The following theorem establishes the full expressiveness of ContE.
Given arbitrary ground truth on ℰ and ℛ that contains
We prove the |ℰ| · |ℛ| bound. For each head entity
Then, the (
Now we prove the
As stated in (Seyed & David, 2018), many translational models, for instance, STransE, TransE, TransR, FTransE, and TransH, are not fully expressive. Since DistMult forces relations to be symmetric, it is not fully expressive either. ComplEx and SimplE are both fully expressive with embedding vectors of size at most |ℰ| · |ℛ|.
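The symmetry argument for DistMult is easy to check numerically: its score Σ h_i r_i t_i is unchanged when h and t are swapped, while the ComplEx score Re(Σ h_i r_i conj(t_i)) can change sign once r has an imaginary part. The minimal sketch below uses Python's built-in complex numbers; the function names are our own.

```python
from typing import Sequence


def distmult(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: sum_i h_i r_i t_i; swapping h and t leaves the score unchanged."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """ComplEx: Re(sum_i h_i r_i conj(t_i)); not symmetric in h and t in general."""
    return sum(a * b * c.conjugate() for a, b, c in zip(h, r, t)).real
```

With h = [1], r = [i], t = [i], the ComplEx score is 1 for (h, r, t) and -1 for (t, r, h), so ComplEx can separate a triple from its reverse, which DistMult cannot.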
Using contextualized representations of entities allows us to easily model primary relation patterns. Four patterns of relations, that is, inversion, symmetry, antisymmetry as well as composition, can be modeled by ContE.
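The four patterns are logical properties of a set of triples. The minimal sketch below checks them on a small KG, following the definitions used for RotatE (Sun et al., 2019): symmetry r(x, y) ⇒ r(y, x), antisymmetry r(x, y) ⇒ ¬r(y, x), inversion r1(x, y) ⇔ r2(y, x), and composition r2(x, y) ∧ r3(y, z) ⇒ r1(x, z). The relation and entity names used with it are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def pairs(triples: List[Triple], rel: str) -> Set[Tuple[str, str]]:
    """All (head, tail) pairs connected by `rel`."""
    return {(h, t) for h, r, t in triples if r == rel}


def is_symmetric(triples: List[Triple], rel: str) -> bool:
    """r(x, y) => r(y, x)."""
    facts = pairs(triples, rel)
    return all((t, h) in facts for h, t in facts)


def is_antisymmetric(triples: List[Triple], rel: str) -> bool:
    """r(x, y) => not r(y, x); self-loops r(x, x) are skipped by our own convention."""
    facts = pairs(triples, rel)
    return all((t, h) not in facts for h, t in facts if h != t)


def is_inverse(triples: List[Triple], r1: str, r2: str) -> bool:
    """r1(x, y) <=> r2(y, x)."""
    return pairs(triples, r1) == {(t, h) for h, t in pairs(triples, r2)}


def is_composition(triples: List[Triple], r1: str, r2: str, r3: str) -> bool:
    """r2(x, y) and r3(y, z) => r1(x, z)."""
    p1, p2, p3 = pairs(triples, r1), pairs(triples, r2), pairs(triples, r3)
    return all((x, z) in p1 for x, y in p2 for y2, z in p3 if y == y2)
```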
Given a symmetric relation
If (
Given an antisymmetric relation
For any entities
Firstly, we expand the left term:
We then expand the right term:
In general, these two expressions are not equal, because some of their terms have opposite signs.
Assume that relation
We prove only case (1); case (2) is analogous. First, we have
Hence, (
Relation pattern modeling and inference abilities of baseline models.
Model  Symmetry  Antisymmetry  Inversion  Composition 

TransE (Antoine et al., 2013)  ×  ✓  ✓  ✓ 
DistMult (Yang et al., 2015)  ✓  ×  ×  × 
ComplEx (Trouillon et al., 2016)  ✓  ✓  ✓  × 
SimplE (Seyed & David, 2018)  ✓  ✓  ✓  × 
ConvE (Dettmers et al., 2018)  −  −  −  − 
ConvKB (Nguyen et al., 2018)  −  −  −  − 
RotatE (Sun et al., 2019)  ✓  ✓  ✓  ✓ 
HAKE (Zhang et al., 2020)  ✓  ✓  ✓  ✓ 
KGCR (Pu et al., 2020)  ✓  ✓  ✓  × 
LineaRE (Peng & Zhang, 2020)  ✓  ✓  ✓  ✓ 
ContE  ✓  ✓  ✓  ✓ 
Comparison of SOTA baselines and ContE model in terms of time complexity and number of parameters.
Models  #Parameters  Time Complexity 

TransE  
NTN  
ComplEx  
TransR  
SimplE  
ContE 
Summary of datasets.
Dataset  #Entities  #Relations  #Training  #Validation  #Test
FB15K-237  14,541  237  272,115  17,535  20,466
UMLS  135  46  5,216  652  661
Nations  14  55  1,592  199  201
Countries_S1  271  2  1,111  24  24
Countries_S2  271  2  1,063  24  24
Countries_S3  271  2  985  24  24
We evaluate ContE on four standard benchmarks: UMLS, Nations, FB15K-237, and Countries. Table 3 summarizes the statistics of these datasets. UMLS (Kok & Domingos, 2007) contains data from the Unified Medical Language System, a biomedical ontology database; it contains 135 concepts and 46 relations, with 6,529 true atoms. FB15K (Antoine et al., 2013) is extracted from Freebase, a large KG of general knowledge facts. FB15K-237 removes inverse relations from FB15K, because inverse relations in FB15K lead to test leakage (Dettmers et al., 2018). The Countries dataset (Guillaume, Sameer, & Theo, 2015) contains 272 entities and 2 relations and is used to test the ability to model and reason with the composition pattern. The two relations are
Experimental results for UMLS.
Model  MRR  Hits@1  Hits@3  Hits@10
TransE (Antoine et al., 2013)  0.7966  0.6452  0.9418  0.9841 
DistMult (Yang et al., 2015)  0.868  0.821    0.967 
ComplEx (Trouillon et al., 2016)  0.8753  0.7942  0.9531  0.9713 
ConvE (Dettmers et al., 2018)  0.957  0.932    0.994 
NeuralLP (Yang, Zhang, & Cohen, 2017)  0.778  0.643    0.962 
NTPλ (Rocktaschel et al., 2017)  0.912  0.843    1.0 
MINERVA (Das et al., 2018)  0.825  0.728    0.968 
KGRRS+ComplEx (Lin et al., 2018)  0.929  0.887    0.985 
KGRRS+ConvE (Lin et al., 2018)  0.940  0.902    0.992 
RotatE (Sun et al., 2019)  0.9274  0.8744  0.9788  0.9947
HAKE (Zhang et al., 2020)  0.8928  0.8366  0.9387  0.9849 
LineaRE (Peng & Zhang, 2020)  0.9508  0.9145  0.9992  
ContE  0.9811 
As Table 5 shows, ContE achieves the best performance on all metrics except Hits@3.
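MRR and Hits@N are computed from the rank of each correct answer. The minimal sketch below assumes the commonly used filtered setting, in which other known-true answers are removed before ranking, and breaks ties optimistically; both choices are assumptions, since evaluation protocols differ across papers.

```python
from typing import Dict, List, Set


def filtered_rank(scores: Dict[str, float], target: str, other_true: Set[str]) -> int:
    """Rank of `target` among candidates after removing other known-true answers.

    Ties are broken optimistically: only strictly higher scores push the target down.
    """
    s = scores[target]
    return 1 + sum(1 for e, v in scores.items()
                   if e != target and e not in other_true and v > s)


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks: List[int], k: int) -> float:
    """Fraction of correct answers ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```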
Experimental results for Nations.
Model  MRR  Hits@1  Hits@3  Hits@10
TransE (Antoine et al., 2013)  0.4813  0.2189  0.6667  0.9801 
DistMult (Yang et al., 2015)  0.7131  0.5970  0.7761  0.9776 
ComplEx (Trouillon et al., 2016)  0.6677  0.5274  0.7413  0.9776 
ConvE (Dettmers et al., 2018)  0.5616  0.3470  0.7155  0.9946 
RotatE (Sun et al., 2019)  0.7155  0.5796  0.7985  1.0
HAKE (Zhang et al., 2020)  0.7157  0.5945  0.7786  0.9851 
LineaRE (Peng & Zhang, 2020)  0.8146  0.7114  0.8881  0.9975 
ContE 
For all listed baselines except ConvE, we reran the code released with the original papers and performed a grid search to find the best hyperparameter values. The results of ConvE are obtained with PyKEEN (Ali et al., 2021). For TransE, the range of hyperparameters is batch_size = 20,
Table 6 shows that ContE is the best model: it outperforms the best SOTA baseline by 2.66 percentage points on MRR, by 4.73 points on Hits@1, and by 2.98 points on Hits@3.
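The baseline grid search described above can be sketched as an exhaustive loop over all hyperparameter combinations that keeps the configuration with the best validation score. The hyperparameter names and the evaluate callback are hypothetical; in practice, evaluate would train a model with the given configuration and return, for example, its validation MRR.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple


def grid_search(grid: Dict[str, List[Any]],
                evaluate: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Try every combination in `grid` and keep the one with the highest score."""
    keys = sorted(grid)
    best_cfg: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)  # e.g. validation MRR after training with cfg
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```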
Experimental results for FB15K-237.
Model  MRR  Hits@1  Hits@3  Hits@10
TransE (Antoine et al., 2013)  0.279  0.198  0.376  0.441 
DistMult (Yang et al., 2015)  0.281  0.199  0.301  0.446 
ComplEx (Trouillon et al., 2016)  0.278  0.194  0.297  0.45 
ConvE (Dettmers et al., 2018)  0.312  0.225  0.341  0.497 
ConvKB (Nguyen et al., 2018)  0.289  0.198  0.324  0.471 
RGCN (Schlichtkrull et al., 2018)  0.164  0.10  0.181  0.30 
SimplE (Seyed & David, 2018)  0.169  0.095  0.179  0.327 
CapsE (Nguyen et al., 2019)  0.150      0.356 
RotatE (Sun et al., 2019)  0.338  0.241  0.375  0.533
ContE 
Table 7 shows that ContE achieves the best MRR as well as Hits@k (k=1,3,10) compared with the SOTA baselines.
Experimental results for Countries_S1.
Model  MRR  Hits@1  Hits@3  Hits@10
TransE (Antoine et al., 2013)  0.8785  0.7708  1.0  1.0 
DistMult (Yang et al., 2015)  0.9028  0.8125  1.0  1.0 
ComplEx (Trouillon et al., 2016)  0.9792  0.9583  1.0  1.0 
RotatE (Sun et al., 2019)  0.8750  0.7708  1.0  1.0
HAKE (Zhang et al., 2020)  0.9045  0.8333  0.9792  1.0 
LineaRE (Peng & Zhang, 2020)  1.0  1.0  1.0  1.0 
Experimental results for Countries_S2 and Countries_S3.
Model  S2 MRR  S2 Hits@1  S2 Hits@3  S2 Hits@10  S3 MRR  S3 Hits@1  S3 Hits@3  S3 Hits@10
TransE  0.6997  0.50  0.9375  1.0  0.1206  0.00  0.0833  0.3542 
DistMult  0.7813  0.5833  1.0  1.0  0.2496  0.0625  0.333  0.6250 
ComplEx  0.7934  0.6042  0.2731  0.0833  0.3958  
RotatE  0.6979  0.4792  0.9583  1.0  0.1299  0.00  0.0833  0.4792
HAKE  0.6667  0.4583  0.8333  0.9583  0.2472  0.0625  0.3333  0.5417 
LineaRE  0.7873  0.6458  0.9583  0.9792  0.2393  0.0625  0.3542  0.5208 
ContE  0.9583  0.9792  0.625 
Tables 8 and 9 show that ContE has stronger reasoning ability than the other baselines. In particular, on Countries_S2, ContE outperforms the best SOTA baseline by 4.36 percentage points on MRR and by 8.32 points on Hits@1. On Countries_S3, ContE outperforms the best SOTA baseline by 19.64 percentage points on MRR, by 27.09 points on Hits@1, and by 10.42 points on Hits@3.
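Because the best baseline can differ from metric to metric, each improvement is measured against whichever baseline is strongest on that metric. The sketch below copies the Countries_S2 MRR and Hits@1 values of the baselines from the table above and reports gaps as absolute differences in percentage points; the helper names are our own.

```python
from typing import Dict, Tuple

# Countries_S2 baseline scores copied from the table above (MRR and Hits@1 only).
COUNTRIES_S2: Dict[str, Dict[str, float]] = {
    "TransE": {"MRR": 0.6997, "Hits@1": 0.50},
    "DistMult": {"MRR": 0.7813, "Hits@1": 0.5833},
    "ComplEx": {"MRR": 0.7934, "Hits@1": 0.6042},
    "RotatE": {"MRR": 0.6979, "Hits@1": 0.4792},
    "HAKE": {"MRR": 0.6667, "Hits@1": 0.4583},
    "LineaRE": {"MRR": 0.7873, "Hits@1": 0.6458},
}


def best_baseline(table: Dict[str, Dict[str, float]], metric: str) -> Tuple[str, float]:
    """Name and value of the strongest baseline on `metric`."""
    name = max(table, key=lambda m: table[m][metric])
    return name, table[name][metric]


def gap_in_points(score: float, reference: float) -> float:
    """Absolute difference, expressed in percentage points."""
    return 100.0 * (score - reference)
```

On Countries_S2, for instance, ComplEx is the strongest baseline on MRR (0.7934) while LineaRE is the strongest on Hits@1 (0.6458), so the two quoted gaps are measured against different models.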
Testing the ability of inferring relation patterns on UMLS.
Model  Symmetry  Antisymmetry  Inversion  Composition
ComplEx    0.8806  0.8615  0.8732 
RotatE    0.9065  0.9069  0.9141
HAKE    0.8558  0.8650  0.8723 
LineaRE    0.9446  0.9441  0.9490 
ContE   
Testing the ability of inferring relation patterns on Nations.
Model  Symmetry  Antisymmetry  Inversion  Composition
ComplEx    0.6499  0.6487  0.6499 
RotatE    0.6825  0.6970  0.6907
HAKE    0.6953  0.6835  0.6844 
LineaRE    0.8189  0.8240  
ContE    0.8303 
Tables 10 and 11 show that ContE has a stronger capability of inferring relation patterns than the other baselines.
We propose the ContE model, a novel entity embedding model that considers the contexts of relations for KG completion. ContE maps the forward and backward impacts of a relation to two different embedding vectors, which represent the contextual information of the relation. Then, depending on the entity's position, the entity's polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relation. ContE is a simple bilinear model whose number of parameters grows linearly with the number of entities and relations, scaled by the embedding dimension. Unlike methods such as TransE and HAKE, ContE is fully expressive. Moreover, ContE can model the symmetry, antisymmetry, inversion, and composition patterns, and it shows stronger reasoning capability than the SOTA baselines.
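A hypothetical sketch of the contextualization idea summarized above: the head is represented by its static vector plus the relation's forward context vector, and the tail by its static vector plus the relation's backward context vector. How the contextualized vectors are combined is an assumption here (a DistMult-style trilinear product); ContE's actual scoring function is the one listed in Table 1.

```python
from typing import List

Vec = List[float]


def contextualize(entity: Vec, context: Vec) -> Vec:
    """Polysemous entity vector: static embedding plus a relation context vector."""
    return [e + c for e, c in zip(entity, context)]


def contextual_score(h: Vec, t: Vec, rel: Vec, forward: Vec, backward: Vec) -> float:
    """HYPOTHETICAL ContE-like score (assumed trilinear form, not the paper's equation).

    The head is read through the relation's forward context, the tail through its
    backward context, and the two are combined with a DistMult-style product.
    """
    h_ctx = contextualize(h, forward)
    t_ctx = contextualize(t, backward)
    return sum(a * r * b for a, r, b in zip(h_ctx, rel, t_ctx))
```

Because the forward and backward context vectors differ, the same entity receives different vectors in the head and tail positions, and swapping head and tail generally changes the score; with both context vectors set to zero, the sketch reduces to DistMult.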
The empirical results demonstrate that ContE outperforms existing SOTA baselines on the link prediction task. In future work, we plan to leverage an entity's neighborhood to explore local graph structure, to use attention to capture the main sense of an entity from its heterogeneous neighborhood, and to extend ContE to multimodal knowledge graphs (Liu, Li, & Alberto, 2019) that contain not only (links to) images but also numerical features for all entities.