
Learning Context-based Embeddings for Knowledge Graph Completion



Introduction

Knowledge graphs (KGs) are becoming a foremost driving force enabling Artificial Intelligence (AI). Like human brains, KGs will serve as the brains of machines that connect to each other, perform cognitive inference and, most importantly, derive insights from vast amounts of heterogeneous data. KGs are built for many AI-related applications, for instance, recommender systems (Li et al., 2019), question answering (Hao et al., 2017), semantic search and ranking (Gjergji et al., 2018; Xiong, Russell, & Jamie, 2017), and reading comprehension (Yang & Tom, 2017). However, regardless of how a KG is created, the data extracted for the initial KG is almost never complete. Predicting the missing links of the resulting graph is therefore an important step, since finding all valid triples manually is costly. This problem is known as knowledge graph completion.

Many techniques have attempted to model relation patterns, and KG embedding has so far proved to be a powerful approach for KG completion. TransE (Antoine et al., 2013) is a well-known embedding method which views every relation as a translation from subject to object. For a factual triple (h,r,t), r is viewed as a translation vector r such that the embedding vector h plus r should be approximately equal to t. Thus, the scoring function is ψ(h,r,t) = ||h + r − t||. Following this translational schema, multiple models have been presented that extend TransE with entity or relation projections, translating the head entity into the tail entity in either the entity space or the relation space. TransH (Wang et al., 2014) proposes translations on hyperplanes, projecting entities onto a relation-specific hyperplane to handle one-to-many, many-to-one and many-to-many relations effectively. TransD (Ji, He, & Xu, 2015) decomposes the two projection matrices into products of two mapping vectors each. These matrices depend on both entities and relations, which lets the projection vectors interact sufficiently. KGs are characterised by heterogeneity: some relations link many entity pairs, while others link only a few. TranSparse (Ji et al., 2016) addresses this with adaptive sparse projection matrices, where the number of entity pairs linked by a relation determines the sparseness of its projection matrices.

To model pairwise interactions between entities and relations, DistMult (Yang et al., 2015) represents each relation as a matrix. For a valid triple (h,r,t), its score is defined as ψ(h,r,t) = hTMrt. To simplify ψ(h,r,t), DistMult restricts the matrix Mr to be diagonal. However, modeling compositional reasoning over relations, which carry richer latent semantics, requires a non-diagonal matrix Mr. ComplEx (Trouillon et al., 2016) uses complex-valued rather than real-valued embeddings to better model asymmetric relations. Rotate (Sun et al., 2019) views each relation as a rotation from the source entity to the target entity in complex vector space. The vector of the target entity t equals the Hadamard product of the vectors of the source entity h and the relation r, that is, t = h ○ r, where ○ is the element-wise Hadamard product. Rotate can capture symmetry, antisymmetry, inversion as well as composition patterns. HAKE (Zhang et al., 2020) represents semantic hierarchies by mapping entities into a polar coordinate system, where the modulus information represents different levels of the hierarchy and the phase information distinguishes entities at the same level of the hierarchy.

Recently, many attempts (Chang et al., 2014; Liu, Wu, & Yang, 2017; Minervini et al., 2017; Zhong et al., 2015, 2017) have been made to incorporate extra information beyond KG triples. RUGE (Guo et al., 2018) uses the AMIE+ tool (Galarraga et al., 2015) to extract soft rules from the training sets. The embeddings of entities and relations are learned both from labeled triples in a KG and from unlabeled triples whose labels are learned iteratively. As suggested by RUGE, length-1 and length-2 rules with PCA confidence higher than 0.8 are utilized. ComplEx-NNE+AER (Ding et al., 2018) uses two types of constraints: approximate entailment constraints for representing relations and non-negativity constraints for representing entities. The non-negativity constraints aim to learn simple, understandable entity representations, while the approximate entailment constraints further encode logical regularities into relation representations. Such constraints impose prior knowledge introduced by association rules on the embedding space and thereby improve embedding learning. LineaRE (Peng & Zhang, 2020) represents a relation as a linear function of the two entities, viewing KG embedding as a linear regression task. That is, LineaRE represents a relation as two weight vectors and a bias vector. This model is simple and has powerful reasoning ability.

The work most relevant to ours is SimplE (Seyed & David, 2018). To model the different position contexts of each relation, SimplE takes two vectors he and te as two position-dependent embeddings of each entity e. he captures the forward impact of r on e when e is at the head of r; similarly, te captures the backward impact of r on e when e is at the tail of r. To let information flow between he and te of an entity e, SimplE additionally uses the inverse of each relation. Let he,te ∈ ℝd be the embeddings of entity e when e is in the head or the tail of a relationship, respectively. Let vr ∈ ℝd be the embedding of relation r and νr−1 ∈ ℝd the embedding of its inverse relation r−1. In SimplE, the scoring function ψ of a fact (ei,r,ej) is defined as the average of two terms: < hei, νr, tej >, the score of (ei,r,ej), and < hej, νr−1, tei >, the score of (ej,r−1,ei). That is, ψ(ei,r,ej) = (< hei, νr, tej > + < hej, νr−1, tei >) / 2. SimplE still falls short of the expected outcome, however, because an entity can be multifaceted, connecting to different neighborhoods. SimplE only takes into account the position information of each entity e, i.e., whether e is in the head or in the tail of relation r. Each entity has two embedding vectors, representing the forward and the backward context of a relation, respectively, and he and te are usually unequal when entity e appears in different positions. However, in addition to the position within a relation, the different relations that an entity is associated with are also key context for that entity. For instance, consider two facts <e1,r1,e2> and <e1,r2,e3>, where e1, r1, and e2 stand for “Bank”, “issued”, and “bonds”, respectively, and e1, r2, and e3 stand for “Bank”, “played”, and “basketball”, respectively. SimplE encodes e1 with the same embedding vector even though e1 connects to two different relations r1 and r2. The meaning of the head entity e1 for relation r1 is obviously different from its meaning for relation r2, yet the two meanings of the polysemous entity e1 share one embedding vector. Hence, SimplE ignores the relational context, which could otherwise facilitate learning KG embeddings.

To handle the aforementioned issues, one must consider both the relations that an entity connects to and the position within a relation at which the entity occurs when encoding the entity. We present ContE, a contextualized embedding model for the link prediction task in KGs. To capture the multiple meanings of entities, it is necessary to distinguish entities in two circumstances: (1) entities in different positions of the same relation, and (2) entities in different relations. The forward and backward impacts of a relationship r in ContE are mapped to two different embedding vectors fr and br, which represent the contextual information of the relationship. Given a triple <e1,r,e2>, we define its scoring function as ψ(e1,r,e2) = < fr + νe1, br + νe2 >, where vr = (fr,br). Because fr + ve ≠ br + ve in general, the embedding of entity e differs between the two positions of relation r. Similarly, for triples <e1,r1,t1> and <e1,r2,t2>, the embedding of entity e1 in triple <e1,r1,t1> differs from the one in triple <e1,r2,t2>, since fr1 + νe1 ≠ fr2 + νe1 in general. For triples <e1,r1,t1> and <e2,r2,t1>, we reach a similar conclusion: the embedding of entity t1 in triple <e1,r1,t1> usually differs from the one in triple <e2,r2,t1>, because br1 + νt1 ≠ br2 + νt1. Therefore, ContE considers two different relational contexts when learning entity embeddings.

The parameters and scoring functions of SOTA models and of the ContE model are summarized in Table 1. The notation is as follows: < νh, νr, νt > = ∑i νhi νri νti denotes the tri-linear dot product of vectors. ||u||q denotes the q-norm of the vector u. ○ denotes the element-wise (Hadamard) product. * is a convolution operator. • denotes the dot product of vectors. g denotes an activation function used in neural networks. concat is the concatenation operation on vectors. Ω denotes a set of filters. ν̂ denotes a 2D reshaping of the vector ν.

Parameters and scoring functions in SOTA baselines and in ContE model.

Model Scoring function ψ(e1,r,e2) Parameters
TransE ||νe1 + νrνe2||p νe1, νr, νe2 ∈ ℝd
ComplEx Re(< νe1, νr, ν̄e2 >) νe1, νr, νe2 ∈ ℂd
SimplE (< he1, νr, te2 > + < he2, νr−1, te1 >) / 2 he1, νr, te2, he2, νr−1, te1 ∈ ℝd
ConvE g(vec(g(concat(ν̂e1, νr) * Ω))W) · νe2 νe1, νr, νe2 ∈ ℝd
ConvKB concat(g([νe1, νr, νe2] * Ω)) · w νe1, νr, νe2 ∈ ℝd
Rotate ||νe1 ○ νr − νe2|| νe1, νe2, νr ∈ ℂd
HAKE ||νe1m ○ νrm − νe2m||2 + λ ||sin(νe1p + νrp − νe2p)||1 νe1m, νe2m ∈ ℝd, νrm ∈ ℝ+d, νe1p, νrp, νe2p ∈ [0, 2π)d
ContE < fr + νe1, br + νe2 > νe1, νe2, fr, br ∈ ℝd
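To make the notation in Table 1 concrete, the following minimal numpy sketch evaluates a few of the listed scoring functions on toy vectors. The function names, toy vectors, and plain-numpy style are assumptions made for illustration; they are not the implementations released with the original papers.

```python
# Toy numpy sketch of several scoring functions from Table 1 (illustrative only).
import numpy as np

def score_transe(v_h, v_r, v_t, p=1):
    # TransE: negative p-norm of the translation residual h + r - t
    return -np.linalg.norm(v_h + v_r - v_t, ord=p)

def score_distmult(v_h, v_r, v_t):
    # DistMult: tri-linear dot product <h, r, t> = sum_i h_i * r_i * t_i
    return np.sum(v_h * v_r * v_t)

def score_complex(v_h, v_r, v_t):
    # ComplEx: real part of <h, r, conj(t)> with complex-valued embeddings
    return np.real(np.sum(v_h * v_r * np.conj(v_t)))

def score_simple(h_e1, v_r, t_e2, h_e2, v_r_inv, t_e1):
    # SimplE: average of the forward and inverse tri-linear scores
    return 0.5 * (np.sum(h_e1 * v_r * t_e2) + np.sum(h_e2 * v_r_inv * t_e1))

d = 4
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, d))
print(score_transe(h, r, t), score_distmult(h, r, t))
```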

It is worth mentioning that the previous approaches are static, i.e., there is only one embedding vector for each entity regardless of its context. This poses a notable problem: all meanings of a polysemous entity share one embedding vector. As shown in (Devlin et al., 2019; Vaswani et al., 2017), replacing static word embeddings with contextualized word representations has achieved great improvements on multiple natural language processing (NLP) tasks. This success suggests that considering the contexts of entities will be beneficial for learning KG embeddings. We previously proposed the KGCR model (Pu et al., 2020), which encodes the contextual information of a relation through its forward and backward embedding vectors; this helps capture an entity's different semantics when it appears in different relations or in different positions of a relation. However, KGCR uses complex-valued representations for entities and relations via a specially defined operation between two complex vectors, so it has high complexity and is not fully expressive. Additionally, KGCR cannot model the composition pattern of relationships, and the performance of compositional inference largely depends on the ability to model compositional relationships. To overcome these shortcomings, we propose the ContE model to improve on KGCR. ContE encodes entities and relations in real vector space with lower complexity than KGCR, is fully expressive, and can model compositional relationships.

In sum, the contributions of this paper are threefold:

We propose the ContE model—a novel polysemous entity embedding method with consideration of relational contexts for KG completion.

ContE is fully expressive; that is, given any ground truth over the triples, there are embedding assignments to entities and relations that precisely separate the true triples from the false ones.

ContE is a bilinear model. Its number of parameters grows linearly with the number of entities and relations and with the dimension of the embedding vectors. ContE outperforms existing SOTA embedding models on benchmark datasets for KG completion.

We organize the rest of this paper as follows. In Section 2, we summarize related work. In Section 3, we detail the ContE model. In Section 4, we present experiments and results. Finally, in Section 5, we draw conclusions.

Related work

Link prediction with KG embedding based approaches has been extensively explored in recent years, and plenty of approaches have been designed for this task. Integrating deep neural networks with KGs has attracted wide attention in many NLP tasks, for instance, relation extraction (Ji et al., 2017; Zhang et al., 2019), language modeling (Liu et al., 2020; Sun et al., 2019), and knowledge fusion (Dong et al., 2014). NTN (Socher et al., 2013) is the first neural network based approach for reasoning over relationships between two entities. NTN utilizes a neural tensor network for relation classification and represents entities with word vectors trained by pre-trained models such as Word2Vec. ConvE (Dettmers et al., 2018) first uses a CNN model for KG completion. It is a CNN with three layers: the lowest is the convolution layer performing 2D reshaping, the middle is the projection layer, and the top is the inner product layer. The 2D convolution can capture more interactions between different dimensional entries of vh and vr, concentrating on their local relationships. Instead of extracting local relationships between h and r, ConvKB (Nguyen et al., 2018) applies the convolution over same-dimensional entries of vh, vr and vt so as to extract their global relationship. Graph Convolutional Networks (GCNs) are based on CNNs and graph embedding; they operate on the graph structure to collectively aggregate information from the graph. R-GCN (Schlichtkrull, Thomas, & Bloem, 2018) first applies GCNs to relational data. For the missing link prediction task it acts as an auto-encoder: R-GCN produces latent entity representations like an encoder, and a tensor factorization model uses these entity representations to predict relation labels like a decoder. For the entity classification task, R-GCN introduces linear relation-specific transformations to accumulate feature embeddings from neighboring nodes. However, R-GCN does not outperform CNN based models such as ConvE on the link prediction task. CapsE (Nguyen et al., 2019) uses a capsule network with a deep architecture to model the entries of triple-based data at the same dimension. It can effectively represent complex relationships such as many-to-many relationships. KBAT (Nathani et al., 2019) focuses on the multi-hop neighborhood of each entity, which can capture multi-hop and similar relations. By extending the graph attention mechanism to entities and relations within each entity's neighborhood, KBAT collects multi-hop relation information from the n-hop neighborhood of the entity, which is particularly suitable for relation prediction tasks. It is worth noting that, as described in (Sun et al., 2020), some neural network-based methods such as ConvKB, CapsE and KBAT use inappropriate evaluation protocols, so their reported performance greatly exceeds other SOTA baselines. The main reason is that when multiple triples receive the same score from the model, these protocols place the correct triple at the beginning of the candidate set, whereas the correct triple should be placed randomly within the candidate set.

PTransE (Lin et al., 2015) utilizes relation paths to learn entity and relation representations. It considers not only one-step paths but also multi-step paths. Since there may be many relation paths between two entities, PTransE uses a resource-allocation algorithm to estimate the importance score of each path and selects the relation paths whose importance score exceeds a certain threshold. Three encodings of a relation path are used for relation representations: multiplication of the relation representations on the path, addition of the relation representations on the path, and a recurrent neural network-based relation representation. PTransE can make full use of local path information, but it falls short of the expected outcome because it only uses relation paths to enhance representation learning and still neglects the relational contexts of entities. Since many entities link to only a few relations (a.k.a. long-tailed entities), RSNs (Guo, Sun, & Hu, 2019) present recurrent skipping networks to learn the embeddings of long-tailed entities. Unlike other models, head entities can predict both their subsequent relations and their tail entities by skipping their connections directly. Relation path based learning methods such as PTransE suffer from high computational cost, and the length of a relation path is limited to at most three steps. RSNs leverage biased random walks to model deep paths, which can capture more relational dependencies. It is worth mentioning that RSNs only use relational paths to learn KG embeddings, whereas the textual information of relations is still neglected.

So far, most existing approaches are static. Static representation learning has limitations: it learns entity and relation embeddings from a single-triple view, and the contexts of relations are neglected. On the one hand, an entity can appear in two different positions of a relation. On the other hand, entities connect to different relational neighbors, i.e., different entities may link to different relations. Hence, exploiting the interactions between entities and relations is important for KG embedding.

ContE model

Notation Let ℛ be the relation set and ℰ the entity set. Let 𝒲 ⊂ ℰ × ℛ × ℰ represent the set of factual triples (i.e., triples true in a world), and let 𝒲𝒞 be the complement of 𝒲. A KG 𝒢 = {(e1,r,e2)} ⊆ 𝒲 is a subset of 𝒲. Let 𝒢′ be the complement of 𝒢. Each triple of 𝒢 is denoted by (head entity, relation, tail entity). KG completion is formalized as the following problem: for any triple (e1,r,e2), infer from 𝒢 whether (e1,r,e2) ∈ 𝒲 (or (e1,r,e2) ∈ 𝒲𝒞). For each triple (e1,r,e2), a scoring function ψ: ℰ × ℛ × ℰ ↦ ℝ is defined to measure its plausibility. Triples of 𝒢 should receive higher scores than those not in 𝒢.

We use TransE's procedure to generate negative triples. For each positive triple (e1,r,e2) ∈ 𝒢, we generate negative triples by substituting either entity e1 or entity e2 with an entity e1′ (or e2′) randomly selected from ℰ − {e1} (or ℰ − {e2}). Let 𝒢− be the set of negative triples. Then
𝒢− = {(e1′, r, e2) | e1′ ∈ ℰ ∧ e1′ ≠ e1 ∧ (e1, r, e2) ∈ 𝒢} ∪ {(e1, r, e2′) | e2′ ∈ ℰ ∧ e2′ ≠ e2 ∧ (e1, r, e2) ∈ 𝒢}
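A minimal sketch of this negative sampling procedure, assuming triples are stored as (head, relation, tail) tuples; the helper name and toy data are illustrative assumptions.

```python
# Generate negatives by corrupting the head or the tail of a positive triple.
import random

def corrupt_triple(triple, entities, corrupt_head=True):
    h, r, t = triple
    original = h if corrupt_head else t
    replacement = random.choice([e for e in entities if e != original])
    return (replacement, r, t) if corrupt_head else (h, r, replacement)

entities = ["Bank", "bonds", "basketball", "e4"]
positive = ("Bank", "issued", "bonds")
negatives = [corrupt_triple(positive, entities, corrupt_head=flag) for flag in (True, False)]
print(negatives)
```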

Note that negative triples can also be produced by randomly corrupting relations. However, this way of generating negative triples may introduce false negatives, that is, triples that are actually positive.

When taking the contextual circumstance of an entity into account, we must consider both the relations that the entity connects to and the position of a relation at which the entity occurs. We propose an embedding model based on relational contexts, named ContE, for KG completion. In the ContE model, each relation r is divided into two vectors fr and br, representing r's forward and backward impacts, respectively. The scoring function of ContE is as follows: for any triple (h,r,t), ψ(h,r,t) = < fr + νh, br + νt >, where vh ∈ ℝd, vt ∈ ℝd and vr = (fr,br) ∈ ℝ2d are the respective embeddings of h, t and r. vr is the concatenation of fr and br, where fr ∈ ℝd and br ∈ ℝd represent the forward and backward context vectors of the relation r, respectively, and d is the dimension of the embedding vectors. The bilinear product < fr + νh, br + νt > = ∑i=1..d [fr + νh]i [br + νt]i, where [x]i is the i-th entry of vector x.
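The following short numpy sketch shows how the ContE score can be computed from an entity embedding table and a relation embedding table whose rows concatenate fr and br; the initialisation and variable names are assumptions made for illustration.

```python
# ContE scoring: psi(h, r, t) = <f_r + v_h, b_r + v_t> (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, d = 5, 3, 4
E = rng.normal(size=(n_ent, d))       # entity embeddings v_e
R = rng.normal(size=(n_rel, 2 * d))   # relation embeddings v_r = (f_r, b_r)

def conte_score(h_idx, r_idx, t_idx):
    f_r, b_r = R[r_idx, :d], R[r_idx, d:]
    return np.sum((f_r + E[h_idx]) * (b_r + E[t_idx]))

print(conte_score(0, 1, 2))
```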

For each relation r, fr and br capture both the position information and the context of r. The role of an entity that appears at the head of relation r differs from its role at the tail of relation r; this follows from fr + ve ≠ br + ve in general. On the other hand, the same entity can also play different roles at the same position in different relations.

Optimizing model

After defining the scoring function, the model can be optimized. One can use a negative sampling loss, similar to (Mikolov et al., 2013), to effectively minimize a margin-based objective:
ℒ = −∑k=1n (1/d) [log σ(ψ(hk′, r, tk) − γ) + log σ(ψ(hk, r, tk′) − γ)] − log σ(γ − ψ(h, r, t))
where σ is the sigmoid function, d is the dimension of the entity embeddings, γ is a fixed margin, and (hk′, r, tk) and (hk, r, tk′) are the k-th corrupted triples.

We draw negative samples according to the following distribution:
p((hk′, r, tk′) | (hi, r, ti)) = softmax(ψ(hk′, r, tk′)) = exp ψ(hk′, r, tk′) / ∑i exp ψ(hi′, r, ti′)
where (hk′, r, tk′) stands for either (hk′, r, tk) or (hk, r, tk′). The loss function with weighted negative samples is then:
ℒ = −∑k=1n [p(hk′, r, tk) log σ(ψ(hk′, r, tk) − γ) + p(hk, r, tk′) log σ(ψ(hk, r, tk′) − γ)] − log σ(γ − ψ(h, r, t)) + λ ||θ||22

Here, θ denotes the parameters of the model, that is, the entity and relation embeddings. ||θ||22 is the regularization term of the model, and λ is the corresponding regularization coefficient.
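A hedged numpy sketch of this objective, combining the softmax weights p(·), the margin γ, and the L2 regularisation term; the argument names are assumptions, and in practice the weights are usually treated as constants during back-propagation.

```python
# Weighted negative-sampling loss for one positive triple and its n negatives.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conte_loss(pos_score, neg_scores, params, gamma=6.0, lam=0.0):
    neg_scores = np.asarray(neg_scores, dtype=float)
    w = np.exp(neg_scores - neg_scores.max())        # softmax weights p(.)
    w = w / w.sum()
    neg_term = -np.sum(w * np.log(sigmoid(neg_scores - gamma)))
    pos_term = -np.log(sigmoid(gamma - pos_score))
    reg = lam * sum(np.sum(p ** 2) for p in params)  # lambda * ||theta||_2^2
    return pos_term + neg_term + reg

print(conte_loss(5.0, [1.0, 2.0, -0.5], params=[np.ones(4)], gamma=6.0, lam=0.0))
```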

Full expressiveness

ContE is a simple yet expressive model. The following theorem establishes the full expressiveness of ContE.

Theorem 1

Given an arbitrary ground truth over ℰ and ℛ that contains ρ true facts, the ContE model has embeddings of size min(ρ + 1, |ℰ| · |ℛ|) which represent this ground truth.

Proof

We first prove the |ℰ| · |ℛ| bound. For each head entity ep and relation rq, let νep and frq (the forward context vector of relation rq) be embedding vectors of size |ℰ| · |ℛ| = N, so that the relation embedding νrq has size 2N. We set the m-th element of νep to 1/2 if m mod |ℰ| = p, and 0 otherwise. In the same way, we set frq[m] = 1/2 if m mod |ℰ| = p, and 0 otherwise. Furthermore, for a tail entity es and brq (the backward context vector of relation rq), we set
νes[m] = brq[m] = 1/2 if m div |ℰ| = s and (ep,rq,es) ∈ 𝒢;
νes[m] = brq[m] = −1/2 if m div |ℰ| = s and (ep,rq,es) ∈ 𝒢′;
νes[m] = brq[m] = 0 if m div |ℰ| ≠ s.

Then the (s · |ℰ| + p)-th term of the product < frq + νep, brq + νes > equals 1 if (ep,rq,es) holds, and −1 otherwise. It follows that ψ(ep,rq,es) = 1 if (ep,rq,es) holds, and −1 otherwise.

Now we prove the ρ + 1 bound by induction on ρ. For the base case, let ρ be zero. Give each relation r and entity e embedding vectors of size 1, with context vectors fr and br also of size 1; set the value for every entity to 1/2, for fr to 1/2, and for br to −3/2. Then < frq + νep, brq + νes > = −1 for all entities ep, es and relations rq. Hence there are embeddings of size ρ + 1 that represent this ground truth, so the claim holds for ρ = 0. Now suppose that for an arbitrary ground truth with ρ = m − 1 (1 ≤ m ≤ |ℛ|·|ℰ|2) there exists an assignment to embedding vectors of size m that represents it (induction hypothesis). We must verify that for an arbitrary ground truth with ρ = m there exists an assignment to embedding vectors of size m + 1 that represents it. Suppose (ep,rq,es) is one of the m facts, and consider the modified ground truth with m − 1 facts in which (ep,rq,es) is assigned false. By the induction hypothesis, it can be represented by embedding vectors of size m. Let u = < frq + νep, brq + νes >, where frq + νep and brq + νes are formed from the embedding vectors representing the modified ground truth. Append one element, set to 0, to the end of every vector νep, νes, fr and br. This increases the length of the vectors to m + 1 but changes no score. Now set the last element of νep to 1/2, of fr to 1/2, of br to 1, and of νes to −u. This ensures that < frq + νep, brq + νes > = 1 for the triple (ep,rq,es), while no other score is changed.

As stated in (Seyed & David, 2018), many translational models, for instance STransE, TransE, TransR, FTransE, and TransH, are not fully expressive. Since DistMult forces relations to be symmetric, it is also not fully expressive. ComplEx and SimplE are both fully expressive with embedding vectors of size at most |ℰ| · |ℛ|.

Ability to model relational patterns

Using contextualized representations of entities allows us to easily model primary relation patterns. Four patterns of relations, that is, inversion, symmetry, antisymmetry as well as composition, can be modeled by ContE.

r2 is an inverse relation of r1 on entities ℰ, if ∀ e1,e2, (e1,r1,e2)∈𝒢 ⇔ (e2,r2,e1) ∈ 𝒢.

r is an antisymmetric relation on entities ℰ, if ∀e1,e2, (e1,r,e2)∈𝒢 ⇒ (e2,r,e1) ∈𝒢′.

r is a symmetric relation on entities ℰ, if ∀e1,e2, (e1,r,e2)∈𝒢 ⇔ (e2,r,e1) ∈𝒢.

r3 is a compositional relation of r1 and r2 on entities ℰ, if ∀e1,e2,e3, (e1,r1,e2)∈𝒢 and (e2,r2,e3)∈𝒢 ⇒ (e1,r3,e3) ∈ 𝒢.

Theorem 2

Given a symmetric relation r on entities ℰ, r can be encoded in the ContE model by tying the parameters fr = br.

Proof

If (e1,r,e2) ∈ 𝒢, then ψ(e1,r,e2) = < fr + νe1, br + νe2 >. By tying fr = br, we conclude that < fr + νe2, br + νe1 > = < fr + νe1, br + νe2 >. So, ψ(e2,r,e1) = ψ(e1,r,e2). It implies that (e2,r,e1) is a positive triple in ContE.

Theorem 3

Given an antisymmetric relation r on entities ℰ, r can be encoded in the ContE model by tying the parameters fr = −br.

Proof

For any entities e1, e2 ∈ ℰ, if fr = −br, proving the antisymmetry pattern amounts to proving the inequality ψ(e1,r,e2) ≠ ψ(e2,r,e1).

Firstly, we expand the left term:
ψ(e1,r,e2) = < fr + νe1, br + νe2 >
= ∑i=1d [fr + νe1]i [br + νe2]i
= ∑i=1d [fr br + νe1 br + fr νe2 + νe1 νe2]i

We then expand the right term:
ψ(e2,r,e1) = < fr + νe2, br + νe1 >
= ∑i=1d [fr + νe2]i [br + νe1]i
= ∑i=1d [fr br + νe2 br + fr νe1 + νe1 νe2]i
= ∑i=1d [fr br − νe1 br − fr νe2 + νe1 νe2]i   (from fr = −br)

These two sums are not equal in general, since the cross terms νe1 br and fr νe2 appear with opposite signs in the two expansions.
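A small numeric check (with random toy vectors, purely for illustration) of Theorems 2 and 3: tying fr = br makes the ContE score symmetric in its two entity arguments, while tying fr = −br makes the two orderings generally score differently.

```python
# Numeric check of the symmetry / antisymmetry parameter tyings.
import numpy as np

rng = np.random.default_rng(1)
d = 4
v_e1, v_e2 = rng.normal(size=d), rng.normal(size=d)

def psi(f_r, b_r, v_h, v_t):
    return np.sum((f_r + v_h) * (b_r + v_t))

f = rng.normal(size=d)
print(np.isclose(psi(f, f, v_e1, v_e2), psi(f, f, v_e2, v_e1)))    # True: symmetric
print(np.isclose(psi(f, -f, v_e1, v_e2), psi(f, -f, v_e2, v_e1)))  # False in general
```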

Theorem 4

Assume that relation r1 is the inverse of relation r2. Then, ContE can encode r2 and r1 via making fr1 ∈ ℝd, fr2 ∈ ℝd, br1 ∈ ℝd and br2 ∈ ℝd satisfy one of the following conditions.

fr2 = br1 and br2 = fr1 = 0

br2 = fr1 and fr2 = br1 = 0

Proof

We only prove case (1), since case (2) is analogous. We have
ψ(e1,r1,e2) = < fr1 + νe1, br1 + νe2 >
= ∑i=1d [fr1 + νe1]i [br1 + νe2]i
= ∑i=1d [fr1 br1 + fr1 νe2 + νe1 br1 + νe1 νe2]i
= ∑i=1d [νe1 br1 + νe1 νe2]i   (from fr1 = 0)
= ∑i=1d [νe1 fr2 + νe1 νe2]i   (from fr2 = br1)
= ∑i=1d [fr2 br2 + νe2 br2 + νe1 fr2 + νe1 νe2]i   (from br2 = 0)
= < fr2 + νe2, br2 + νe1 > = ψ(e2,r2,e1)

Hence, (e1,r1,e2) ∈ 𝒢 ⇔ (e2,r2,e1) ∈ 𝒢.

Composition Pattern We analyze the abilities of existing models to infer and model these four relation patterns; the results are summarized in Table 2. Since deep neural networks (DNNs) are black boxes, it is hard to analyze which patterns they can model, so the corresponding entries for DNN based models such as ConvE and ConvKB are denoted as “−”, i.e., “unknown”. It is worth mentioning that TransE, Rotate and HAKE have drawbacks in modeling composition patterns. We use the Rotate model to illustrate this issue; the other two models are similar. For example, suppose there are three persons/entities e1, e2, and e3: e1 is the father of e2 (relation r1) and e2 is the brother of e3 (relation r2); then e1 is the father of e3 (relation r3). Note that r1 is the same relation as r3. Formally, for entity vectors νe1, νe2, νe3 and relation vectors νr1, νr2, νr3 satisfying νe2 = νe1 ○ νr1, νe3 = νe2 ○ νr2, and νe3 = νe1 ○ νr3, it follows that νr3 = νr1 ○ νr2. Assuming r3 = r1, the Rotate model then forces νr2 to be an all-one vector; that is, every relation r2 satisfying the above conditions must be the all-one vector. This is unreasonable, as the small numeric illustration below shows. ContE can model the composition pattern without this problem.
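The constraint discussed above can be checked numerically. The snippet below (an illustration added here, not an experiment from the paper) builds a random Rotate-style relation r1 with unit-modulus complex entries, sets r3 = r1, and solves the composition equation elementwise; r2 is then forced to be the all-one (identity) rotation.

```python
# Rotate-style composition: if r3 = r1 o r2 and r3 = r1, then r2 must be all ones.
import numpy as np

rng = np.random.default_rng(2)
d = 4
r1 = np.exp(1j * rng.uniform(0, 2 * np.pi, size=d))  # unit-modulus relation vector
r3 = r1                                              # e.g. "father of" used twice
r2 = r3 / r1                                         # solve r1 o r2 = r3 elementwise
print(np.allclose(r2, np.ones(d)))                   # True: identity rotation
```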

Relation pattern modeling and inference abilities of baseline models.

Model Symmetry Antisymmetry Inversion Composition
TransE (Antoine et al., 2013) ×
DistMult (Yang et al., 2015) × × ×
ComplEx (Trouillon et al., 2016) ×
SimplE (Seyed & David, 2018) ×
ConvE (Dettmers et al., 2018) − − − −
ConvKB (Nguyen et al., 2018) − − − −
RotatE (Sun et al., 2019)
HAKE (Zhang et al., 2020)
KGCR (Pu et al., 2020) ×
LineaRE (Peng & Zhang, 2020)
ContE ✓ ✓ ✓ ✓

Time Complexity We further analyse the time complexity of the ContE model. Let the numbers of entities, relations and triples of 𝒢 be ne, nr and nt, respectively, and let d and 2d be the dimensions of the entity and relation embedding vectors, respectively. From the scoring function, scoring a single triple costs O(d), so scoring all triples costs O(d · nt); the per-triple complexity of ContE is thus the same as that of TransE. The number of parameters needed by ContE is O(ned + 2nrd). Since ContE has few parameters and no matrix multiplication operations, it can be applied to large-scale KGs. The comparison between SOTA baselines and ContE in terms of parameters and time complexity is summarized in Table 3.

Comparison of SOTA baselines and ContE model in terms of time complexity and number of parameters.

Models #Parameters Time Complexity
TransE O(ned + nrd) O(d)
NTN O(ned + nrd²k) O(d³)
ComplEx O(ned + nrd) O(d)
TransR O(ned + nrdk) O(dk)
SimplE O(ned + nrd) O(d)
ContE O(ned + 2nrd) O(d)
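The parameter counts in Table 3 can be made concrete with a small sketch; the dataset sizes plugged in below are taken from Table 4, and the counts ignore any optimizer state or bias terms (an assumption for illustration).

```python
# Parameter counts implied by Table 3 for TransE-style and ContE embeddings.
def params_transe(ne, nr, d):
    return ne * d + nr * d        # one d-dim vector per entity and per relation

def params_conte(ne, nr, d):
    return ne * d + 2 * nr * d    # relations carry forward and backward context vectors

ne, nr, d = 14541, 237, 1000      # FB15K-237 sizes with d = 1000
print(params_transe(ne, nr, d))   # 14,778,000
print(params_conte(ne, nr, d))    # 15,015,000
```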

Summary of the datasets.

Dataset |E| |R| |training| |validation| |test|
FB15K-237 14,541 237 272,115 17,535 20,466
UMLS 135 46 5,216 652 661
Nations 14 55 1,592 199 201
Countries_S1 271 2 1,111 24 24
Countries_S2 271 2 1,063 24 24
Countries_S3 271 2 985 24 24
Experiments

We evaluate the performance of ContE on four standard benchmarks, i.e., UMLS, Nations, FB15K-237 and Countries. Table 4 summarizes the statistics of these datasets. UMLS (Kok & Domingos, 2007) contains data from the Unified Medical Language System, a biomedical ontology database. It contains 135 concepts and 46 relations, of which 6,529 atoms are true. FB15K (Antoine et al., 2013) is extracted from Freebase, a large KG containing general knowledge facts. FB15K-237 removes inverse relations from FB15K, because inverse relations cause test leakage in FB15K (Dettmers et al., 2018). The Countries dataset (Guillaume, Sameer, & Theo, 2015) contains 272 entities and 2 relations and is used to test the ability to model and reason over the composition pattern. The two relations are isInside(e1,e2) and isNeighbor(e1,e2). There are three versions of the Countries dataset, called S1, S2, and S3, which test the ability for 1-step, 2-step, and 3-step reasoning, respectively. We compare the reasoning ability of ContE with SOTA baselines using the Countries dataset.

Evaluation Protocol We use the following ranking metrics over correct entities to compare ContE with the baselines. To predict e1 in a test triple (e1,r,e2), a KG model scores all corrupted triples in the set T′ = {(e1′, r, e2) | e1′ ∈ ℰ}. Sorting the scores of the correct and corrupted triples in descending order gives the rank of the valid triple in the list. Analogously, we obtain another ranking of the triple (e1,r,e2) by replacing the tail entity e2. Aggregated over all test triples, five metrics can be reported: MR (Mean Rank) is the average rank of the valid triples, and MRR (Mean Reciprocal Rank) is the average inverse rank of the valid triples; MRR is usually a better metric than MR. Hits@n is the proportion of valid triples whose rank is not larger than n, where n = 1, 3, 10. We use MRR and Hits@n in our experiments. Higher Hits@n, higher MRR, and lower MR indicate better performance; we report the best MRR as the experimental result for all datasets. Notice that a corrupted triple may itself exist in the KG, meaning it is actually correct, and it is then reasonable for it to rank above the original triple. If corrupted triples are used regardless of whether they are correct, the setting is called “raw”; if only corrupted triples that do not appear in the training, validation, or test sets are used, the setting is called “filtered”. We report experimental results in the filtered setting (Antoine et al., 2013).
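A minimal sketch of the filtered ranking protocol described above, for head prediction only; the scoring function, the container of known true triples, and the tie handling are illustrative assumptions (ties should be broken randomly, as discussed in Section 2).

```python
# Filtered rank of a test triple plus aggregated MRR / Hits@n.
import numpy as np

def filtered_head_rank(score_fn, triple, entities, known_true):
    h, r, t = triple
    true_score = score_fn(h, r, t)
    # count corrupted heads that outrank the test triple, skipping known true triples
    higher = sum(1 for e in entities
                 if e != h and (e, r, t) not in known_true
                 and score_fn(e, r, t) > true_score)
    return higher + 1

def mrr_and_hits(ranks, ns=(1, 3, 10)):
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), {n: float(np.mean(ranks <= n)) for n in ns}

print(mrr_and_hits([1, 2, 5, 20]))  # example ranks
```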

Experimental Setup ContE uses the Adam optimizer to minimize the loss function and tunes its hyper-parameters on the validation set. We conduct a grid search over hyper-parameters to maximize MRR. The hyper-parameters are: k (embedding dimension), lr (learning rate), γ (maximal margin), λ (regularization coefficient), batch_size (size of each batch), max_steps (maximum number of iteration steps) and n (negative sample size).
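A hedged sketch of this grid search, assuming a hypothetical train_and_evaluate(cfg) helper that trains ContE with a given configuration and returns the validation MRR; the grid values shown are only a small subset of the ranges listed below.

```python
# Grid search over hyper-parameters, keeping the configuration with the best MRR.
from itertools import product

grid = {
    "k": [500, 1000],        # embedding dimension
    "lr": [0.0005, 0.001],   # learning rate
    "gamma": [6, 9],         # maximal margin
    "n": [5, 10],            # negative sample size
}

best_mrr, best_cfg = -1.0, None
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    mrr = train_and_evaluate(cfg)   # hypothetical helper returning validation MRR
    if mrr > best_mrr:
        best_mrr, best_cfg = mrr, cfg
```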

Results We compare ContE with several baseline models to empirically evaluate its ability to model relation patterns and predict missing links.

Performance on UMLS Table 5 summarizes the comparison results between ContE and the listed SOTA baselines on the UMLS dataset. The results of DistMult, ConvE, NeuralLP (Yang, Zhang, & Cohen, 2017), NTP-λ (Rocktaschel & Riedel, 2017), MINERVA (Das et al., 2018), KGRRS+ComplEx and KGRRS+ConvE are from (Lin, Socher, & Xiong, 2018). “−” means no result for this metric is reported in the original paper. For TransE, we rerun the code given by (Sun et al., 2019) and do a grid search to find the best values of hyper-parameters. The range of hyper-parameters is batch_size = 20, k = 500, max_steps = 10000, n ∈ {1,2,3,4,5,6,7,8,9,10}, γ ∈ {1,2,3,4,5,6,7,8,9,10}, lr ∈ {0.0005,0.001}. For ComplEx, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, max_steps = 10000, n ∈ {1,2,3,4,5,6,7,8}, γ ∈ {1,2,4,6,8,10,11,12,13,14}, lr ∈ {0.0005,0.0008,0.001,0.002}. For Rotate, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, max_steps = 10000, n ∈ {1,2,3,4,5,6,7,8,9,10,11,12}, γ ∈ {1,2,3,4,5,6,7,8,9,10,11,12,13}, lr ∈ {0.0005,0.001,0.002}. For HAKE, the range of hyper-parameters is batch_size ∈ {20,30,100,256}, k = 1000, max_steps ∈ {9000,10000}, n ∈ {4,5,6,7,8,9,10,20}, γ ∈ {0.4,0.5,0.6,0.8,0.9,1,2,3,4,5,6,7,8}, lr ∈ {0.001,0.002}, modulus_weight ∈ {1.0,2.5,3.5} and phase_weight ∈ {0.5,1.0,2.5}. For LineaRE, the range of hyper-parameters is k ∈ {250,500,1000}, temperature of sampling α ∈ {0.5,1.0}, batch_size b ∈ {128,256,512}, learning_rate lr = 0.001, drop_rate = 0.05, n ∈ {128,256,512}, γ ∈ {9,12}. For ContE, the range of hyper-parameters is λ = 0, batch_size ∈ {16,20,32,50,128,512}, k ∈ {500,1000,2000,3000,3500}, max_steps ∈ {9000,10000}, n ∈ {3,4,5,6,7,8,9,10,20,30,50,100}, γ ∈ {1,2,3,4,5,6,7,8,9,10}, lr ∈ {0.0005,0.0008,0.001,0.0015,0.002,0.003}.

Experimental results for UMLS.

Model MRR Hits@1 Hits@3 Hits@10
TransE (Antoine et al., 2013) 0.7966 0.6452 0.9418 0.9841
DistMult (Yang et al., 2015) 0.868 0.821 - 0.967
ComplEx (Trouillon et al., 2016) 0.8753 0.7942 0.9531 0.9713
ConvE (Dettmers et al., 2018) 0.957 0.932 - 0.994
NeuralLP (Yang, Zhang, & Cohen, 2017) 0.778 0.643 - 0.962
NTP-λ (Rocktaschel et al., 2017) 0.912 0.843 - 1.0
MINERVA (Das et al., 2018) 0.825 0.728 - 0.968
KGRRS+ComplEx (Lin et al., 2018) 0.929 0.887 - 0.985
KGRRS+ConvE (Lin et al., 2018) 0.940 0.902 - 0.992
Rotate (Sun et al., 2019) 0.9274 0.8744 0.9788 0.9947
HAKE (Zhang et al., 2020) 0.8928 0.8366 0.9387 0.9849
LineaRE (Peng & Zhang, 2020) 0.9508 0.9145 0.9856 0.9992
ContE 0.9677 0.9501 0.9811 1.0

The ContE model achieves the best performance on all metrics except metric Hits@3 in Table 5.

Performance on Nations Table 6 summarizes the comparison results between ContE and the listed SOTA baselines on the Nations dataset.

Experimental results for Nations.

Model MRR Hits@1 Hits@3 Hits@10
TransE (Antoine et al., 2013) 0.4813 0.2189 0.6667 0.9801
DistMult (Yang et al., 2015) 0.7131 0.5970 0.7761 0.9776
ComplEx (Trouillon et al., 2016) 0.6677 0.5274 0.7413 0.9776
ConvE (Dettmers et al., 2018) 0.5616 0.3470 0.7155 0.9946
Rotate (Sun et al., 2019) 0.7155 0.5796 0.7985 1.0
HAKE (Zhang et al., 2020) 0.7157 0.5945 0.7786 0.9851
LineaRE (Peng & Zhang, 2020) 0.8146 0.7114 0.8881 0.9975
ContE 0.8412 0.7587 0.9179 1.0

For all listed baselines except ConvE, we rerun the code given by their original papers and do a grid search to find the best values of hyper-parameters. The results of ConvE are obtained with PyKEEN (Ali et al., 2021). For TransE, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, max_steps = 10000, n ∈ {4,64,256,512,1024}, γ ∈ {0.5,1,6,9,10,12,24,100}, lr ∈ {0.00005,0.001}. For DistMult, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, n ∈ {4,64,128,256,512}, max_steps = 10000, γ ∈ {0.5,2,6,9,10}, lr ∈ {0.00005,0.001}. For ComplEx, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, max_steps = 10000, n ∈ {4,64,128,256,512,1024}, γ ∈ {0.5,1,2,6,9,10,12,13,24,100}, lr ∈ {0.00005,0.001}. For Rotate, the range of hyper-parameters is batch_size = 20, k ∈ {500,1000}, max_steps = 10000, n ∈ {4,5,6,7,8,9,10}, γ ∈ {1,2,3,4,5,6,7,8,9,10}, lr ∈ {0.0005,0.001}. For HAKE, the range of hyper-parameters is batch_size ∈ {20,30,100,256}, k ∈ {500,1000}, n ∈ {4,5,6,7,8,9,10}, max_steps = 10000, γ ∈ {1,2,3,4,5,6,7,8,9,10}, lr ∈ {0.0005,0.001}, modulus_weight ∈ {1.0,2.5,3.5} and phase_weight ∈ {0.5,1.0,2.5}. For LineaRE, the range of hyper-parameters is k ∈ {250,500,1000}, temperature of sampling α ∈ {0.5,1.0}, batch_size b ∈ {128,256,512}, learning_rate lr = 0.001, drop_rate = 0.05, n ∈ {128,256,512}, γ ∈ {9,12}. For ContE, the range of hyper-parameters is λ = 0, batch_size b ∈ {16,20,30}, k ∈ {1000,2000}, max_steps ∈ {6000,7000,8000,9000}, n ∈ {3,4,5,6,7,8,9,10}, γ ∈ {1,2,3,4,5,6,7,8,9,10}, lr ∈ {0.0005,0.0006,0.0007,0.001}.

As shown in Table 6, ContE is the best model, outperforming the best SOTA baseline by 2.66% on MRR, 4.73% on Hits@1, and 2.98% on Hits@3.

Performance on FB15K-237 Table 7 summarizes the comparison results between ContE and the listed SOTA baselines on the FB15K-237 dataset. The results of the baselines TransE, DistMult, ComplEx, and Rotate are from (Sun et al., 2019). The results of R-GCN are from its original paper, and those of CapsE, ConvKB and ConvE are from (Sun et al., 2020). For SimplE, we rerun the code given by its original paper and do a grid search to find the best values of hyper-parameters. The range of its hyper-parameters is n ∈ {1,2,5,10}, lr ∈ {0.001,0.05,0.01,0.02}, k ∈ {100,200,500}, number_of_epochs = 1000. For ContE, the range of hyper-parameters is λ ∈ {0,0.01,0.05,0.1}, batch_size ∈ {64,128,256,512,1024}, k ∈ {1000,2000}, max_steps ∈ {6000,7000,8000,9000}, n ∈ {256,512,1024}, γ ∈ {1,5,10,12,14,18,20}, lr ∈ {0.00005,0.0005,0.001,0.002}.

Experimental results for FB15K-237.

Model MRR Hits@1 Hits@3 Hits@10
TransE (Antoine et al., 2013) 0.279 0.198 0.376 0.441
DistMult (Yang et al., 2015) 0.281 0.199 0.301 0.446
ComplEx (Trouillon et al., 2016) 0.278 0.194 0.297 0.45
ConvE (Dettmers et al., 2018) 0.312 0.225 0.341 0.497
ConvKB (Nguyen et al., 2018) 0.289 0.198 0.324 0.471
R-GCN (Schlichtkrull et al., 2018) 0.164 0.10 0.181 0.30
SimplE (Seyed & David, 2018) 0.169 0.095 0.179 0.327
CapsE (Nguyen et al., 2019) 0.150 - - 0.356
Rotate (Sun et al., 2019) 0.338 0.241 0.375 0.533
ContE 0.3445 0.2454 0.3823 0.5383

Table 7 shows that ContE achieves the best MRR as well as Hits@k (k=1,3,10) compared with the SOTA baselines.

Performance on Countries Table 8 and Table 9 summarize the comparison results between ContE and the listed SOTA baselines on the Countries_S1, Countries_S2 and Countries_S3 datasets, respectively. For all baselines, the range of hyper-parameters is similar to that used for UMLS, Nations, and FB15K-237.

Experimental results for Countries_S1.

Model MRR Hits@1 Hits@3 Hits@10
TransE (Antoine et al., 2013) 0.8785 0.7708 1.0 1.0
DistMult (Yang et al., 2015) 0.9028 0.8125 1.0 1.0
ComplEx (Trouillon et al., 2016) 0.9792 0.9583 1.0 1.0
Rotate (Sun et al., 2019) 0.8750 0.7708 1.0 1.0
HAKE (Zhang et al., 2020) 0.9045 0.8333 0.9792 1.0
LineaRE (Peng & Zhang, 2020) 1.0 1.0 1.0 1.0
ContE 1.0 1.0 1.0 1.0

Experimental results for Countries_S2 and Countries_S3.

Model Countries_S2: MRR Hits@1 Hits@3 Hits@10 Countries_S3: MRR Hits@1 Hits@3 Hits@10
TransE 0.6997 0.50 0.9375 1.0 0.1206 0.00 0.0833 0.3542
DistMult 0.7813 0.5833 1.0 1.0 0.2496 0.0625 0.333 0.6250
ComplEx 0.7934 0.6042 0.9792 1.0 0.2731 0.0833 0.3958 0.6667
Rotate 0.6979 0.4792 0.9583 1.0 0.1299 0.00 0.0833 0.4792
HAKE 0.6667 0.4583 0.8333 0.9583 0.2472 0.0625 0.3333 0.5417
LineaRE 0.7873 0.6458 0.9583 0.9792 0.2393 0.0625 0.3542 0.5208
ContE 0.8370 0.7292 0.9583 0.9792 0.4695 0.3542 0.5 0.625

As can be seen from Tables 8 and 9, ContE shows stronger reasoning ability than the other baselines. Specifically, on Countries_S2, ContE outperforms the best SOTA baseline by 4.36% on MRR and 8.32% on Hits@1. On Countries_S3, ContE outperforms the best SOTA baseline by 19.64% on MRR, 27.09% on Hits@1, and 10.42% on Hits@3.

Inferring relation patterns We evaluate ContE on the UMLS and Nations datasets to test its ability to infer relation patterns. Table 10 and Table 11 summarize the comparison results between ContE and the listed SOTA baselines in terms of MRR. Note that UMLS and Nations have no symmetric relations, so the corresponding column is denoted as “−”.

Testing the ability of inferring relation patterns on UMLS.

UMLS

Model Symmetry Antisymmetry Inversion Composition
ComplEx - 0.8806 0.8615 0.8732
Rotate - 0.9065 0.9069 0.9141
HAKE - 0.8558 0.8650 0.8723
LineaRE - 0.9446 0.9441 0.9490
ContE - 0.9644 0.9622 0.9645

Testing the ability of inferring relation patterns on Nations.

Nations

Model Symmetry Antisymmetry Inversion Composition
ComplEx - 0.6499 0.6487 0.6499
Rotate - 0.6825 0.6970 0.6907
HAKE - 0.6953 0.6835 0.6844
LineaRE - 0.8189 0.8240 0.8333
ContE - 0.8256 0.8326 0.8303

From Tables 10 and 11, ContE shows a stronger capability of inferring relation patterns than the other baselines.

Conclusion

We propose ContE, a novel entity embedding model that considers the contexts of relations for KG completion. The forward and backward impacts of a relationship in ContE are mapped to two different embedding vectors, which represent the contextual information of the relationship. Then, according to the position of the entity, the entity's polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relationship. ContE is a simple bilinear model: as the number of entities or relations increases, its number of parameters grows linearly for a given embedding dimension. ContE is also fully expressive, unlike methods such as TransE and HAKE. It is worth mentioning that ContE can model symmetry, antisymmetry, inversion as well as composition patterns, and it shows stronger reasoning capability than SOTA baselines.

The empirical results demonstrate that, for the link prediction task, ContE achieves better performance than existing SOTA baselines. In future work, we will leverage an entity's neighborhood to explore local graph structure, use attention to capture the main sense of an entity from its heterogeneous neighborhood, and further extend the ContE model to multi-modal knowledge graphs (Liu, Li, & Alberto, 2019) that contain not only (links to) images but also numerical features for entities.
