
Ontology optimization tactics via distance calculating



Introduction

Ontology originally comes from philosophy, where it describes the nature of things and the inherent, often hidden, connections among their components. In information and computer science, an ontology serves as a model for knowledge storage and representation. It has been applied extensively in fields such as knowledge management, machine learning, information systems, image retrieval, information retrieval and search expansion, collaboration, and intelligent information integration. As an effective conceptual semantic model and analysis tool, ontology has in recent years also been favored by researchers in pharmacology, biology, medicine, geographic information systems and the social sciences (see, for instance, Przydzial et al. [1], Koehler et al. [2], Ivanovic and Budimac [3], Hristoskova et al. [4], and Kabir [5]).

Researchers usually represent the structure of an ontology by a simple graph: every concept, object and element of the ontology corresponds to a vertex, and each (directed or undirected) edge represents a relationship (or potential link) between two concepts (objects or elements). Let O be an ontology and G the simple graph corresponding to O. The core task of ontology engineering applications is to compute the similarities between ontology vertices, which quantify the intrinsic links between vertices in the ontology graph. An ontology similarity measuring function is a semi-positive score function Sim : V × V → ℝ+ ∪ {0} that maps each pair of vertices to a non-negative real number. Ontology mapping, in turn, measures the similarity between vertices from different ontologies and serves as a bridge connecting them: only through such a mapping do we obtain potential associations between the objects or elements from different ontologies.
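
To fix ideas, the following minimal Python sketch represents an ontology as a simple graph whose vertices carry feature vectors, together with a semi-positive score function of the form Sim : V × V → ℝ+ ∪ {0}. The vertex names, the random features and the dot-product scoring are illustrative placeholders only, not the learning method developed below.

```python
import numpy as np

# Illustrative ontology graph: vertices are concepts, edges are relationships.
ontology_edges = {
    "plant": ["root", "leaf"],   # hypothetical concept names
    "root": [],
    "leaf": [],
}

# Each vertex is attached to a p-dimensional ontology vector (p = 4 here).
features = {v: np.random.rand(4) for v in ontology_edges}

def sim(u, v):
    """Semi-positive score function Sim: V x V -> R+ ∪ {0} (placeholder scoring)."""
    return float(max(0.0, features[u] @ features[v]))

print(sim("plant", "leaf"))
```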

Several effective methods exist for deriving efficient ontology similarity measuring or ontology mapping algorithms in terms of an ontology function. Wang et al. [12] considered ontology similarity calculation based on ranking learning technology. Huang et al. [13] proposed a fast ontology algorithm that cuts the time complexity of ontology applications. Gao and Liang [14] presented an ontology optimization model in which the ontology function is determined by means of the NDCG measure, and applied it successfully to physics education. More ontology applications in various engineering fields can be found in Gao et al. [11].

In this article, we present a new ontology learning method based on distance calculation. Moreover, we give a theoretical analysis of the proposed ontology algorithm.

Algorithm Description

Let $\mathscr{S}=\{(v_{i},v_{j},y_{ij})\}_{i,j=1}^{N}$ be the target ontology training data, where $v_{i},v_{j}\in\mathbb{R}^{p}$ are ontology vectors and $y_{ij}=\pm 1$ (if $v_{i}$ and $v_{j}$ are similar, then $y_{ij}=1$; otherwise $y_{ij}=-1$). If the number of target ontology training samples $N$ is not large, we additionally fix $m$ relevant source ontology training sets $\mathscr{S}_{q}=\{(v_{qi},v_{qj},y_{qij})\}_{i,j=1}^{N_{q}}$ ($q=1,\dots,m$); in this setting $v_{qi},v_{qj}\in\mathbb{R}^{p}$ belong to the same ontology feature space as $v_{i},v_{j}$.

We aim to learn a distance function $d(v_{i},v_{j}\,|\,\mathbf{W})=(v_{i}-v_{j})^{T}\mathbf{W}(v_{i}-v_{j})$, which is equivalent to learning a distance matrix $\mathbf{W}$; whether an ontology vertex pair $v_{i}$ and $v_{j}$ is similar or dissimilar is then decided by comparing $d(v_{i},v_{j}\,|\,\mathbf{W})$ with a constant threshold parameter $c$. Specifically, our ontology optimization problem can be stated as

$$\arg\min_{\mathbf{W}}\ \frac{1}{\binom{N}{2}}\sum_{i<j}g\big(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}]\big)+\frac{\eta}{2}\|\mathbf{W}\|_{F}^{2}\qquad\mathrm{s.t.}\quad\mathbf{W}\succeq 0,\tag{1}$$

where $\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}=(v_{i}-v_{j})^{T}\mathbf{W}(v_{i}-v_{j})$, $g(z)=\max(0,b-z)$ is an ontology hinge loss function, $\|\mathbf{W}\|_{F}$ is the Frobenius norm of the metric $\mathbf{W}$, which is used to control the model complexity, $\eta$ is a balance parameter, and the constraint condition requires that $\mathbf{W}$ is positive semi-definite.
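
As a small illustration of the quantities entering problem (1), the sketch below evaluates the metric distance $d(v_{i},v_{j}\,|\,\mathbf{W})$ and the ontology hinge loss $g$ on a single labelled pair. The vectors, the identity metric and the threshold $b=0$ are placeholder choices, not values from the paper.

```python
import numpy as np

def dist_W(vi, vj, W):
    """d(v_i, v_j | W) = (v_i - v_j)^T W (v_i - v_j)."""
    d = vi - vj
    return float(d @ W @ d)

def hinge(z, b=0.0):
    """Ontology hinge loss g(z) = max(0, b - z)."""
    return max(0.0, b - z)

# Hypothetical pair: y = +1 if similar, -1 otherwise.
vi, vj, y = np.array([1.0, 0.5]), np.array([0.8, 0.4]), 1
W = np.eye(2)                                   # any positive semi-definite metric
loss = hinge(y * (1.0 - dist_W(vi, vj, W)))     # one summand of the objective in (1)
print(loss)
```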

The general version of the ontology distance learning approach is formulated as

$$\begin{aligned}\arg\min\ &\frac{1}{\binom{N}{2}}\sum_{i<j}L(v_{i},v_{j},y_{ij})+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m,\end{aligned}\tag{2}$$

where $\mathbf{W}=\sum_{i=1}^{n}\theta_{i}\mathbf{u}_{i}\mathbf{u}_{i}^{T}$ and $\mathbf{W}_{D}=\sum_{q=1}^{m}\alpha_{q}\mathbf{W}_{q}$. Both $\|\alpha\|_{2}^{2}$ and $\|\theta\|_{1}$ are employed to control the complexity of the model. In what follows, $\gamma_{1}$, $\gamma_{2}$ and $\gamma_{3}$ are all positive balance parameters.
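
The following sketch assembles $\mathbf{W}$ from the basis vectors $\mathbf{u}_{i}$ with weights $\theta$ and $\mathbf{W}_{D}$ from the source metrics $\mathbf{W}_{q}$ with simplex weights $\alpha$. The random orthonormal basis and the identity source metrics are assumptions made purely for illustration.

```python
import numpy as np

p, n, m = 4, 4, 3                             # dimensions (illustrative)
U = np.linalg.qr(np.random.randn(p, n))[0]    # columns u_1, ..., u_n
theta = np.abs(np.random.rand(n))             # weights theta_i >= 0
W = (U * theta) @ U.T                         # W = sum_i theta_i u_i u_i^T

W_q = [np.eye(p) for _ in range(m)]           # source metrics W_q (placeholders)
alpha = np.full(m, 1.0 / m)                   # alpha on the simplex: sum_q alpha_q = 1
W_D = sum(a * Wq for a, Wq in zip(alpha, W_q))  # W_D = sum_q alpha_q W_q
```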

Select $L(v_{i},v_{j},y_{ij})=g(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}])$ and use the ontology hinge loss for $g$, that is, $g(z)=\max(0,b-z)$ with $b$ set to 0. Thus, we deduce the following ontology optimization problem:

$$\begin{aligned}\arg\min\ &\frac{1}{\binom{N}{2}}\sum_{i<j}g\big(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}]\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{3}$$

For brevity, we write $v_{k}^{1}$, $v_{k}^{2}$ and $y_{k}$ for $v_{i}$, $v_{j}$ and $y_{ij}$, with $k=1,\dots,\binom{N}{2}=N'$. Let $\delta_{k}=v_{k}^{1}-v_{k}^{2}$, so that $\|v_{k}^{1}-v_{k}^{2}\|_{\mathbf{W}}^{2}=\sum_{i=1}^{n}\theta_{i}\delta_{k}^{T}\mathbf{u}_{i}\mathbf{u}_{i}^{T}\delta_{k}=\theta^{T}f_{k}$, where $f_{k}=[f_{k}^{1},\dots,f_{k}^{n}]^{T}$ and $f_{k}^{i}=\delta_{k}^{T}\mathbf{u}_{i}\mathbf{u}_{i}^{T}\delta_{k}$. Therefore, the ontology problem (3) can be re-expressed as

$$\begin{aligned}\arg\min_{\alpha,\theta}\ &\frac{1}{N'}\sum_{k=1}^{N'}g\big(y_{k}[1-\theta^{T}f_{k}]\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{4}$$
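
A quick numerical check of this re-parameterization: with $f_{k}^{i}=(\mathbf{u}_{i}^{T}\delta_{k})^{2}$, the metric distance $\|\delta_{k}\|_{\mathbf{W}}^{2}$ reduces to the linear form $\theta^{T}f_{k}$ that appears in (4). The data below are random placeholders.

```python
import numpy as np

def pair_features(v1, v2, U):
    """f_k with f_k^i = (u_i^T delta_k)^2, so that ||delta_k||_W^2 = theta^T f_k."""
    delta = v1 - v2
    return (U.T @ delta) ** 2                 # vector of length n

p, n = 3, 3
U = np.linalg.qr(np.random.randn(p, n))[0]    # basis u_1, ..., u_n
theta = np.random.rand(n)
W = (U * theta) @ U.T                         # W = sum_i theta_i u_i u_i^T
v1, v2 = np.random.randn(p), np.random.randn(p)
f_k = pair_features(v1, v2, U)
assert np.isclose(theta @ f_k, (v1 - v2) @ W @ (v1 - v2))
```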

The solution can be obtained by alternating between two ontology sub-problems (minimizing over $\alpha=[\alpha_{1},\dots,\alpha_{m}]^{T}$ and over $\theta=[\theta_{1},\dots,\theta_{n}]^{T}$, respectively) until convergence.

Given $\alpha$, the ontology optimization problem with respect to $\theta$ can be stated as

$$\arg\min_{\theta}F(\theta)=\Lambda(\theta)+\Omega(\theta),\tag{5}$$

where $\Lambda(\theta)=\frac{1}{N'}\sum_{k=1}^{N'}g(y_{k}[1-\theta^{T}f_{k}])+\gamma_{3}\|\theta\|_{1}$ and $\Omega(\theta)=\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}$. Since the ontology loss part $\Lambda(\theta)$ is non-differentiable, we smooth the ontology loss and then solve (5) via a gradient technique. Let $\Theta=\{x:0\le x_{k}\le 1,\ x\in\mathbb{R}^{N'}\}$ and let $\sigma$ be the smoothing parameter. Then the smoothed expression of the ontology hinge loss $g(f_{k},y_{k},\theta)=\max\{0,-y_{k}(1-\theta^{T}f_{k})\}$ can be formulated as

$$g_{\sigma}=\max_{x\in\Theta}\ x_{k}\big(-y_{k}(1-\theta^{T}f_{k})\big)-\frac{\sigma}{2}\|f_{k}\|_{\infty}x_{k}^{2},\tag{6}$$

where the $\|f_{k}\|_{\infty}$ term is used as a normalization. Setting the derivative of the objective ontology function of (6) with respect to $x_{k}$ to zero and projecting $x_{k}$ onto $\Theta$, we infer the solution $x_{k}=\mathrm{median}\{\frac{-y_{k}(1-\theta^{T}f_{k})}{\sigma\|f_{k}\|_{\infty}},0,1\}$. Furthermore, the piecewise approximation of $g$ can be expressed as

$$g_{\sigma}=\begin{cases}0, & y_{k}(1-\theta^{T}f_{k})>0,\\ -y_{k}(1-\theta^{T}f_{k})-\frac{\sigma}{2}\|f_{k}\|_{\infty}, & y_{k}(1-\theta^{T}f_{k})<-\sigma\|f_{k}\|_{\infty},\\ \frac{(y_{k}(1-\theta^{T}f_{k}))^{2}}{2\sigma\|f_{k}\|_{\infty}}, & \text{otherwise}.\end{cases}$$
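
A minimal numpy sketch of this smoothing step, assuming $\sigma>0$ and $f_{k}\ne 0$: it returns both the smoothed loss value $g_{\sigma}$ and the auxiliary variable $x_{k}$ that later enters the gradient $y_{k}f_{k}x_{k}$.

```python
import numpy as np

def smoothed_hinge(f_k, y_k, theta, sigma):
    """Piecewise approximation g_sigma of g = max{0, -y_k (1 - theta^T f_k)}."""
    z = y_k * (1.0 - theta @ f_k)
    s = sigma * np.max(np.abs(f_k))         # sigma * ||f_k||_inf
    x_k = float(np.clip(-z / s, 0.0, 1.0))  # median{-z / (sigma ||f_k||_inf), 0, 1}
    if z > 0:
        return 0.0, x_k
    if z < -s:
        return -z - s / 2.0, x_k
    return z * z / (2.0 * s), x_k
```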

By direct computation, the gradient of the smoothed ontology hinge loss $g_{\sigma}(\theta)$ is

$$\frac{\partial g_{\sigma}(f_{k},y_{k},\theta)}{\partial\theta}=y_{k}f_{k}x_{k}.$$

Let $H^{\Lambda}=[f_{1},\dots,f_{N'}]$ and $Y=\mathrm{diag}(y)$. We get $\frac{\partial g_{\sigma}(\theta)}{\partial\theta}=\sum_{k}y_{k}f_{k}x_{k}=H^{\Lambda}Yx$, and $L^{g}(\theta)=\frac{N'}{\sigma}\max_{k}\frac{\|f_{k}f_{k}^{T}\|_{2}}{\|f_{k}\|_{\infty}}$ is the Lipschitz constant of $g_{\sigma}(\theta)$.

By setting $l(\theta)=\|\theta\|_{1}$, we infer the approximation of $l$ with the smoothing parameter $\sigma'$ as

$$l_{\sigma'}=\begin{cases}-\theta_{r}-\frac{\sigma'}{2}, & \theta_{r}<-\sigma',\\ \theta_{r}-\frac{\sigma'}{2}, & \theta_{r}>\sigma',\\ \frac{\theta_{r}^{2}}{2\sigma'}, & \text{otherwise}.\end{cases}$$

Furthermore, with $x_{r}'=\mathrm{median}\{\frac{\theta_{r}}{\sigma'},-1,1\}$ for each $r$, the gradient can be computed as $\frac{\partial\sum_{r=1}^{n}l_{\sigma'}(\theta_{r})}{\partial\theta}=x'$, and the corresponding Lipschitz constant is $L^{l}(\theta)=\frac{1}{\sigma'}$.
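
A corresponding sketch of the smoothed $\ell_{1}$ penalty: it returns the smoothed value and the gradient $x'$ with $x_{r}'=\mathrm{median}\{\theta_{r}/\sigma',-1,1\}$ (names are illustrative).

```python
import numpy as np

def smoothed_l1(theta, sigma_p):
    """Smoothed |theta_r| summed over r, and its gradient x'_r = median{theta_r / sigma', -1, 1}."""
    grad = np.clip(theta / sigma_p, -1.0, 1.0)
    val = np.where(np.abs(theta) > sigma_p,
                   np.abs(theta) - sigma_p / 2.0,   # outer branches of the piecewise form
                   theta ** 2 / (2.0 * sigma_p))    # quadratic middle branch
    return float(val.sum()), grad
```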

Moreover, setting $H_{st}^{\Omega}=\gamma_{1}\mathrm{Tr}\big((\mathbf{u}_{s}\mathbf{u}_{s}^{T})(\mathbf{u}_{t}\mathbf{u}_{t}^{T})\big)$ and $f_{r}^{\Omega}=\gamma_{1}\mathrm{Tr}\big(\mathbf{W}_{D}^{T}(\mathbf{u}_{r}\mathbf{u}_{r}^{T})\big)$, we have $\frac{\partial\Omega(\theta)}{\partial\theta}=H^{\Omega}\theta-f^{\Omega}$, $\frac{\partial F_{\sigma}(\theta)}{\partial\theta}=\frac{1}{N'}H^{\Lambda}Yx+\gamma_{3}x'+H^{\Omega}\theta-f^{\Omega}$, and $L_{\sigma}=\frac{1}{\sigma}\max_{k}\frac{\|f_{k}f_{k}^{T}\|_{2}}{\|f_{k}\|_{\infty}}+\frac{\gamma_{3}}{\sigma'}+\|H^{\Omega}\|_{2}$ is the Lipschitz constant of $F_{\sigma}(\theta)$.
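
The gradient of $\Omega(\theta)$ admits a simple closed form because $\mathrm{Tr}\big((\mathbf{u}_{s}\mathbf{u}_{s}^{T})(\mathbf{u}_{t}\mathbf{u}_{t}^{T})\big)=(\mathbf{u}_{s}^{T}\mathbf{u}_{t})^{2}$. The sketch below computes $H^{\Omega}$, $f^{\Omega}$ and $\partial\Omega/\partial\theta$ under that identity; the inputs are placeholders.

```python
import numpy as np

def omega_gradient(theta, U, W_D, gamma1):
    """Gradient of Omega(theta) = (gamma1/2) * ||W - W_D||_F^2 with W = sum_i theta_i u_i u_i^T."""
    G = U.T @ U                           # G[s, t] = u_s^T u_t
    H_omega = gamma1 * G ** 2             # H^Omega_{st} = gamma1 * (u_s^T u_t)^2
    f_omega = gamma1 * np.einsum('is,ij,js->s', U, W_D, U)  # f^Omega_r = gamma1 * u_r^T W_D u_r
    return H_omega @ theta - f_omega      # dOmega/dtheta = H^Omega theta - f^Omega
```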

Denote by $\theta^{t}$, $y^{t}$ and $z^{t}$ the solutions in the $t$-th iteration round, and let $\widehat{\theta}$ be an initial guess of $\theta$. With $L_{\sigma}$ the Lipschitz constant of $F_{\sigma}(\theta)$, the two auxiliary ontology optimization problems are stated as

$$\min_{y}\ \langle\nabla F_{\sigma}(\theta^{t}),y-\theta^{t}\rangle+\frac{L_{\sigma}}{2}\|y-\theta^{t}\|_{2}^{2}$$

and

$$\min_{z}\ \sum_{i=0}^{t}\frac{i+1}{2}\big[F_{\sigma}(\theta^{i})+\langle\nabla F_{\sigma}(\theta^{i}),z-\theta^{i}\rangle\big]+\frac{L_{\sigma}}{2}\|z-\widehat{\theta}\|_{2}^{2},$$

respectively. Setting the gradients of the objective ontology functions in the above two auxiliary ontology problems to zero yields $y^{t}=\theta^{t}-\frac{1}{L_{\sigma}}\nabla F_{\sigma}(\theta^{t})$ and $z^{t}=\widehat{\theta}-\frac{1}{L_{\sigma}}\sum_{i=0}^{t}\frac{i+1}{2}\nabla F_{\sigma}(\theta^{i})$. Hence, we set $\theta^{t+1}=\frac{2}{t+3}z^{t}+\frac{t+1}{t+3}y^{t}$, and the stopping criterion is $|F_{\sigma}(\theta^{t+1})-F_{\sigma}(\theta^{t})|<\varepsilon$.
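
One accelerated update step, written as a sketch; `weighted_grad_sum` accumulates $\sum_{i=0}^{t}\frac{i+1}{2}\nabla F_{\sigma}(\theta^{i})$ and should start as a zero vector. The function and parameter names are illustrative.

```python
import numpy as np

def accelerated_step(theta_t, theta_hat, grad_t, weighted_grad_sum, L_sigma, t):
    """One step: y^t and z^t from the two auxiliary problems, then the convex combination."""
    weighted_grad_sum = weighted_grad_sum + (t + 1) / 2.0 * grad_t
    y_t = theta_t - grad_t / L_sigma                 # minimizer of the first auxiliary problem
    z_t = theta_hat - weighted_grad_sum / L_sigma    # minimizer of the second auxiliary problem
    theta_next = 2.0 / (t + 3) * z_t + (t + 1.0) / (t + 3) * y_t
    return theta_next, weighted_grad_sum
```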

Given θ , the optimization ontology problem on parameter α can be stated as

$$\begin{aligned}\arg\min_{\alpha}\ &\frac{\gamma_{1}}{2}\Big\|\mathbf{W}-\sum_{p=1}^{m}\alpha_{p}\mathbf{W}_{p}\Big\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{9}$$

The ontology problem (9) can be expressed in the compact form

$$\begin{aligned}\arg\min_{\alpha}\ &\frac{1}{2}\alpha^{T}\mathbf{H}\alpha-f^{T}\alpha+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m,\end{aligned}\tag{10}$$

where $f=[f_{1},\dots,f_{m}]$ with $f_{q}=\gamma_{1}\mathrm{Tr}(\mathbf{W}^{T}\mathbf{W}_{q})$, and $\mathbf{H}$ is a symmetric positive semi-definite matrix with $\mathbf{H}_{st}=\gamma_{1}\mathrm{Tr}(\mathbf{W}_{s}^{T}\mathbf{W}_{t})$. We choose only two elements $\alpha_{i}$ and $\alpha_{j}$ to update in each iteration. To satisfy the constraint $\sum_{q=1}^{m}\alpha_{q}=1$, we require $\alpha_{i}^{*}+\alpha_{j}^{*}=\alpha_{i}+\alpha_{j}$, where $\alpha_{i}^{*}$ and $\alpha_{j}^{*}$ are the solutions of the current iteration. Then, according to (10) and setting $\varepsilon_{ij}=(H_{ii}-H_{ij}-H_{ji}+H_{jj})\alpha_{i}-\sum_{k}(H_{ik}-H_{jk})\alpha_{k}$, we design the updating rule as $\alpha_{i}^{*}=\frac{\gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{i}-f_{j})+\varepsilon_{ij}}{(H_{ii}-H_{ij}-H_{ji}+H_{jj})+2\gamma_{2}}$ and $\alpha_{j}^{*}=\alpha_{i}+\alpha_{j}-\alpha_{i}^{*}$. In case the obtained $\alpha_{i}^{*}$ and $\alpha_{j}^{*}$ do not satisfy the constraint $\alpha_{q}\ge 0$, we further set

$$\begin{cases}\alpha_{i}^{*}=0,\ \alpha_{j}^{*}=\alpha_{i}+\alpha_{j}, & \text{if}\ \gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{i}-f_{j})+\varepsilon_{ij}\le 0,\\ \alpha_{j}^{*}=0,\ \alpha_{i}^{*}=\alpha_{i}+\alpha_{j}, & \text{if}\ \gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{j}-f_{i})+\varepsilon_{ij}\le 0.\end{cases}$$
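
A compact sketch of this pairwise $\alpha$ update; clipping $\alpha_{i}^{*}$ to $[0,\alpha_{i}+\alpha_{j}]$ plays the role of the two boundary cases above and keeps both the simplex and the non-negativity constraints satisfied.

```python
import numpy as np

def update_alpha_pair(alpha, f, H, i, j, gamma2):
    """Update (alpha_i, alpha_j) keeping sum_q alpha_q = 1 and alpha_q >= 0."""
    s = alpha[i] + alpha[j]
    kappa = H[i, i] - H[i, j] - H[j, i] + H[j, j]
    eps_ij = kappa * alpha[i] - (H[i, :] - H[j, :]) @ alpha
    alpha_i_new = (gamma2 * s + (f[i] - f[j]) + eps_ij) / (kappa + 2.0 * gamma2)
    alpha_i_new = float(np.clip(alpha_i_new, 0.0, s))   # enforce 0 <= alpha_i* <= alpha_i + alpha_j
    out = alpha.copy()
    out[i], out[j] = alpha_i_new, s - alpha_i_new
    return out
```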

The whole ontology algorithm is stated as follows:

Initialize: $\alpha^{(0)}$, $\theta^{(0)}$, $\gamma_{2}^{(0)}$ and $\gamma_{3}^{(0)}$. Set $t=0$, and construct $\mathbf{W}^{0}=\sum_{r=1}^{n}\theta_{r}^{(0)}\mathbf{u}_{r}\mathbf{u}_{r}^{T}$ and $\mathbf{W}_{S}^{0}=\sum_{q=1}^{m}\alpha_{q}^{(0)}\mathbf{W}_{q}$.

Iterate:

Optimize $\theta^{(t+1)}\leftarrow\arg\min_{\theta}\frac{1}{N'}\sum_{k=1}^{N'}g\big(y_{k}(1-\theta^{T}f_{k})\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{S}^{t}\|_{F}^{2}+\gamma_{3}^{(t)}\|\theta\|_{1}$, and update $\mathbf{W}^{(t+1)}=\sum_{r=1}^{n}\theta_{r}^{(t+1)}\mathbf{u}_{r}\mathbf{u}_{r}^{T}$;

Optimize $\alpha^{(t+1)}\leftarrow\arg\min_{\alpha}\frac{\gamma_{1}}{2}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}\|_{F}^{2}+\frac{\gamma_{2}^{(t)}}{2}\|\alpha\|_{2}^{2}$, and update $\mathbf{W}_{S}^{t+1}=\sum_{q=1}^{m}\alpha_{q}^{(t+1)}\mathbf{W}_{q}$;

Determine $\gamma_{3}^{(t+1)}=\frac{|\rho_{C}|\big(\frac{1}{N'}\sum_{k=1}^{N'}g(y_{k}(1-(\theta^{(t+1)})^{T}f_{k}))+\frac{\gamma_{1}}{2}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}^{(t)}\|_{F}^{2}\big)}{\|\theta^{(t+1)}\|_{1}}$;

Obtain $\gamma_{2}^{(t+1)}=\frac{|\rho_{B}|\big(\gamma_{1}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}^{(t+1)}\|_{F}^{2}\big)}{\|\alpha^{(t+1)}\|_{2}^{2}}$;

$t\leftarrow t+1$.

Until convergence.

Stability Analysis

In this section, we give a theoretical analysis of our ontology algorithm based on stability arguments.

Uniform stability
Definition 1.

(Leave-One-Out) An ontology algorithm has uniform stability β1 with respect to the ontology loss function l if the following holds

$$\forall s\in Z^{m},\ \forall i\in\{1,\dots,m\},\quad\|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)\|_{\infty}\le\beta_{1},$$

where Z is the ontology sample space, fs is the ontology function determined by the ontology algorithm learning with the set of samples s, and si = {z1,··· ,zi−1, zi+1,··· ,zm} denotes an ontology sample set with the i-th element zi deleted.

Definition 2

(Leave-Two-Out) An ontology algorithm has uniform stability β2 with respect to the ontology loss function l if the following holds

$$\forall s\in Z^{m},\ \forall i,j\in\{1,\dots,m\},\quad\|l(f_{s},\cdot)-l(f_{s^{i,j}},\cdot)\|_{\infty}\le\beta_{2},$$

where Z is the ontology sample space, fs is the ontology function determined by the ontology algorithm learning with the set of samples s, and si, j is the ontology sample set given from s by deleting two elements zi and zj.

For any convex and differentiable ontology function $F:\mathscr{H}\to\mathbb{R}$ (here $\mathscr{H}$ denotes the Hilbert space), the Bregman divergence is defined by $B_{F}(f\|g)=F(f)-F(g)-\langle f-g,\nabla F(g)\rangle$ for all $f,g\in\mathscr{H}$. For a convex but non-differentiable $F$, the subdifferential at $f$ is $\partial F(f)=\{g\in\mathscr{H}\mid\forall f'\in\mathscr{H},\ F(f')-F(f)\ge\langle f'-f,g\rangle\}$; letting $\delta F(f)$ denote any element of $\partial F(f)$, the generalized Bregman divergence is $B_{F}(f'\|f)=F(f')-F(f)-\langle f'-f,\delta F(f)\rangle$. In both cases $B_{F}(f'\|f)\ge 0$, and $B_{P+Q}=B_{P}+B_{Q}$ for any convex ontology functions $P$ and $Q$.

Lemma 1

For any two distance metrics $\mathbf{W}$ and $\mathbf{W}'$, the following inequality holds for any ontology samples $z_{i}$ and $z_{j}$:

$$|V(\mathbf{W},z_{i},z_{j})-V(\mathbf{W}',z_{i},z_{j})|\le 4LM^{2}\|\mathbf{W}-\mathbf{W}'\|_{F}.$$

Next, we describe the LOO and LTO stability of our algorithm.

Theorem 2

Let $\beta_{1}$ and $\beta_{2}$ be the LOO and LTO stabilities of our ontology algorithm problem (2). Suppose that $\|v\|_{2}\le M$ for any sample $v$. Then we have

$$\beta_{1}\le\frac{32L^{2}M^{4}}{\gamma_{1}N},\qquad\beta_{2}\le\frac{64L^{2}M^{4}}{\gamma_{1}N},$$

where L is the Lipschitz constant of the function g.

Proof

We only present the detailed proof of the first inequality; the second can be obtained in a similar way. Let $F_{\mathscr{N}}(\theta)=P_{\mathscr{N}}(\theta)+Q(\theta)$, where $P_{\mathscr{N}}(\theta)=\frac{1}{\binom{N}{2}}\sum_{i<j}V(\mathbf{W},z_{i},z_{j})$ and $Q(\theta)=\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{S}\|_{F}^{2}+\gamma_{3}\|\theta\|_{1}$. Clearly, both $P_{\mathscr{N}}(\theta)$ and $Q(\theta)$ are convex. Let $\theta_{\mathscr{N}}$ and $\theta_{\mathscr{N}'}$ be the minimizers of $F_{\mathscr{N}}(\theta)$ and $F_{\mathscr{N}'}(\theta)$, respectively, where $\mathscr{N}'$ is the set of ontology examples obtained by deleting $z_{i}\in\mathscr{N}$ from $\mathscr{N}$.

Note that

$$B_{F_{\mathscr{N}}}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{F_{\mathscr{N}'}}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})\ge B_{Q}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{Q}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'}).$$

Let $\mathrm{sgn}(\theta)=[\mathrm{sgn}(\theta_{1}),\dots,\mathrm{sgn}(\theta_{n})]^{T}$ denote a sub-gradient of $\|\theta\|_{1}$, and set $\Delta=\|\theta_{\mathscr{N}'}\|_{1}-\langle\theta_{\mathscr{N}'},\mathrm{sgn}(\theta_{\mathscr{N}})\rangle+\|\theta_{\mathscr{N}}\|_{1}-\langle\theta_{\mathscr{N}},\mathrm{sgn}(\theta_{\mathscr{N}'})\rangle\ge 0$. Hence, we have

$$B_{Q}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{Q}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})=\gamma_{1}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}^{2}+\gamma_{3}\Delta.$$

We have $\delta F_{\mathscr{N}}(\theta_{\mathscr{N}})=\delta F_{\mathscr{N}'}(\theta_{\mathscr{N}'})=0$ since $\theta_{\mathscr{N}}$ and $\theta_{\mathscr{N}'}$ are the minimizers of $F_{\mathscr{N}}(\theta)$ and $F_{\mathscr{N}'}(\theta)$. Using Lemma 1, we obtain

$$\begin{aligned}
\gamma_{1}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}^{2}
&\le B_{F_{\mathscr{N}}}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{F_{\mathscr{N}'}}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})\\
&=F_{\mathscr{N}}(\theta_{\mathscr{N}'})-F_{\mathscr{N}}(\theta_{\mathscr{N}})-\langle\theta_{\mathscr{N}'}-\theta_{\mathscr{N}},\partial F_{\mathscr{N}}(\theta_{\mathscr{N}})\rangle+F_{\mathscr{N}'}(\theta_{\mathscr{N}})-F_{\mathscr{N}'}(\theta_{\mathscr{N}'})-\langle\theta_{\mathscr{N}}-\theta_{\mathscr{N}'},\partial F_{\mathscr{N}'}(\theta_{\mathscr{N}'})\rangle\\
&=F_{\mathscr{N}}(\theta_{\mathscr{N}'})-F_{\mathscr{N}}(\theta_{\mathscr{N}})+F_{\mathscr{N}'}(\theta_{\mathscr{N}})-F_{\mathscr{N}'}(\theta_{\mathscr{N}'})\\
&=\frac{1}{\binom{N}{2}}\Big(\sum_{\mathscr{N}}V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})-\sum_{\mathscr{N}}V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})+\sum_{\mathscr{N}'}V(\mathbf{W}_{\mathscr{N}},z_{i'},z_{j})-\sum_{\mathscr{N}'}V(\mathbf{W}_{\mathscr{N}'},z_{i'},z_{j})\Big)\\
&\le\frac{1}{\binom{N}{2}}\Big(\sum_{\mathscr{N}}|V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})|+\sum_{\mathscr{N}'}|V(\mathbf{W}_{\mathscr{N}},z_{i'},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i'},z_{j})|\Big)\\
&\le\frac{8LM^{2}}{N}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}.
\end{aligned}$$

This implies that

$$\|\mathbf{W}_{\mathscr{N}}-\mathbf{W}_{\mathscr{N}'}\|_{F}\le\frac{8LM^{2}}{\gamma_{1}N}.$$

By virtue of $|V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})|\le 4LM^{2}\|\mathbf{W}_{\mathscr{N}}-\mathbf{W}_{\mathscr{N}'}\|_{F}$, we deduce

$$|V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})|\le\frac{32L^{2}M^{4}}{\gamma_{1}N}.$$

Therefore, the expected result is obtained.

Let $\mathscr{N}$ be the ontology sample set and $V(\mathbf{W},z_{i},z_{j})=g(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}])$. In this subsection, the empirical ontology risk and the expected ontology risk are denoted by $R_{\mathscr{N}}(\mathbf{W})=\frac{1}{\binom{N}{2}}\sum_{i<j}V(\mathbf{W},z_{i},z_{j})$ and $R(\mathbf{W})=\mathbf{E}_{(z_{i},z_{j})}[V(\mathbf{W},z_{i},z_{j})]$, respectively. We will determine the generalization bound on $R(\mathbf{W})-R_{\mathscr{N}}(\mathbf{W})$ in the next theorem. For this purpose, we use the following McDiarmid inequality.

Theorem 3

[15] Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $A$. Let $\phi:A^{N}\to\mathbb{R}$ be such that for each $i\in\{1,\dots,N\}$, there exists a constant $c_{i}>0$ such that

$$\sup_{x_{1},\dots,x_{N}\in A,\ x_{i}'\in A}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{i-1},x_{i}',x_{i+1},\dots,x_{N})|\le c_{i}.$$

Then for any ε > 0,

$$\mathbf{P}\{\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]\ge\varepsilon\}\le e^{-2\varepsilon^{2}/\sum_{i=1}^{N}c_{i}^{2}}.$$

The generalization error bound via uniform stability is presented as follows.

Theorem 4

Let $\mathscr{N}$ be a set of $N$ randomly selected ontology samples and $\mathbf{W}_{\mathscr{N}}$ be the ontology distance matrix determined by (2). With probability at least $1-\delta$, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\frac{128L^{2}M^{4}}{\gamma_{1}N}+\varsigma\sqrt{\frac{\ln\frac{1}{\delta}}{2N}},$$

where

$$\varsigma=\frac{128L^{2}M^{4}+4\gamma_{1}g_{\mathbf{W}_{S}}+16\sqrt{2\gamma_{1}}\,LM^{2}\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}\|\theta_{S}\|_{1}}}{\gamma_{1}}.$$

The proof of Theorem 4 mainly follows [16-19]; we skip the details here.

Strong and weak stabilities

The uniform version of stability is too restrictive for most learning algorithms: only a small number of works have shown that standard learning algorithms satisfy uniform stability directly, and for most ontology learning algorithms this remains unclear. We are therefore motivated to consider other "almost everywhere" notions of stability beyond uniform stability in our ontology setting. We define strong and weak stabilities for our ontology framework, which are also good measures of how robust an ontology algorithm is. We assume 0 < δ3, δ4 < 1 in this subsection.

Definition 3

(Strong Stability) Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by $f_{s}$, and let $l$ be an ontology loss function. Let $\beta_{3}:\mathbb{N}\to\mathbb{R}$ and let $s^{i}$ be the ontology sample set in which $v_{i}$ is replaced by $v_{i}'$. We say that the ontology algorithm A is $\beta_{3}$-loss stable at $s$ with respect to the ontology loss $l$ if for all $n\in\mathbb{N}$, $v_{i}'\in V$, $i\in\{1,\dots,n\}$, we have

$$|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)|\le\beta_{3}.$$

We say that the ontology algorithm A has strong loss stability $(\beta_{3},\delta)$ if

$$\mathbb{P}\{A\ \text{is}\ \beta_{3}\text{-loss stable at}\ s\}\ge 1-\delta.$$

Definition 4

(Weak Stability) Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by $f_{s}$, and let $l$ be an ontology loss function. Let $\beta_{4}:\mathbb{N}\to\mathbb{R}$. We say that our ontology algorithm A has weak loss stability $(\beta_{4},\delta')$ if for all $n\in\mathbb{N}$, $i\in\{1,\dots,n\}$, we have

$$\mathbb{P}\{|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)|\le\beta_{4}\}\ge 1-\delta'.$$

We now present the following lemmas, which are fundamental for proving the results on strong and weak stability.

Lemma 5

(Kutin [22]) Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $C$. There is a "bad" subset $B\subseteq C^{N}$ with $\mathbb{P}(x_{1},\dots,x_{N}\in B)=\delta$. Let $\phi:C^{N}\to\mathbb{R}$ be such that for each $k\in\{1,\dots,N\}$, there exist $b\ge c_{k}>0$ such that

$$\begin{aligned}&\sup_{x_{1},\dots,x_{N}\in C\setminus B,\ x_{k}'\in C}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{k-1},x_{k}',x_{k+1},\dots,x_{N})|\le c_{k},\\ &\sup_{x_{1},\dots,x_{N}\in C,\ x_{k}'\in C}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{k-1},x_{k}',x_{k+1},\dots,x_{N})|\le b.\end{aligned}$$

Then for any ε > 0,

$$\mathbb{P}\{|\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]|\ge\varepsilon\}\le 2\Big(e^{-\varepsilon^{2}/8\sum_{i=1}^{N}c_{i}^{2}}+\frac{N^{2}b\delta}{\sum_{i=1}^{N}c_{i}}\Big).$$

Lemma 6

(Kutin [22]) Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $C$. Let $\phi:C^{N}\to\mathbb{R}$ satisfy the two condition inequalities of Lemma 5 with $\frac{\lambda_{k}}{N}$ substituted for $c_{k}$ and $e^{-KN}$ substituted for $\delta$. If $0<\varepsilon\le\min_{k}T(b,\lambda_{k},K)$ and $N\ge\max_{k}\Delta(b,\lambda_{k},K,\varepsilon)$, then

$$\mathbb{P}\{|\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]|\ge\varepsilon\}\le 4e^{-\varepsilon^{2}N^{2}/40\sum_{i=1}^{N}\lambda_{i}^{2}},$$

where

$$T(b,\lambda_{k},K)=\min\Big\{\frac{15\lambda_{k}}{2},\,4\lambda_{k}\sqrt{K},\,\frac{\lambda_{k}^{2}K}{b}\Big\},\qquad\Delta(b,\lambda_{k},K,\varepsilon)=\max\Big\{\frac{b}{\lambda_{k}},\,\lambda_{k}\sqrt{40},\,3\Big(\frac{24}{K}+3\Big)\ln\Big(\frac{24}{K}+3\Big),\,\frac{1}{\varepsilon}\Big\}.$$

The main result in this subsection is stated as follows.

Theorem 7

Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by fs. Let l be an ontology loss function such that 0 ≤ l( f ,·) ≤ Ξ for all f and

$$M^{*}=2\beta_{3}+\frac{4g_{\mathbf{W}_{S}}}{N}+\frac{8\sqrt{2}\,LM^{2}\Big(\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}(\|\theta_{S}\|_{1}-\|\theta_{\mathscr{N}}\|_{1})}+\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}\|\theta_{S}\|_{1}}\Big)}{\sqrt{\gamma_{1}}\,N}.$$

1) Let $\beta_{3}$ be such that our ontology algorithm A has strong loss stability $(\beta_{3},\delta_{1})$. Then for any $0<\delta<1$, with probability at least $1-\delta$, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\Xi+\sqrt{8N(M^{*})^{2}\ln\frac{2(M^{*})^{2}}{8(M^{*})^{2}-4N\Xi\delta_{1}}}.$$

2) Let $\beta_{4}$ be such that our ontology algorithm A has weak loss stability $(\beta_{4},\delta_{2})$. If

$$0<\varepsilon\le\min\Big\{\frac{15NM^{*}}{2},\,4NM^{*}\sqrt{\frac{\ln(1/\delta_{1})}{N}},\,\frac{N^{2}(M^{*})^{2}\frac{\ln(1/\delta_{2})}{N}}{2\Xi}\Big\},$$

and

$$N\ge\max\Big\{\frac{2\Xi}{NM^{*}},\,NM^{*}\sqrt{40},\,3\Big(\frac{24N}{\ln(1/\delta_{2})}+3\Big)\ln\Big(\frac{24N}{\ln(1/\delta_{2})}+3\Big),\,\frac{1}{\varepsilon}\Big\}.$$

Then, for any 0 < δ < 1, with probability at least 1 − δ, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\Xi+\sqrt{\frac{40N^{2}(M^{*})^{2}\ln(\frac{4}{\delta})}{N}}.$$

The proof of Theorem 7 mainly follows [20, 21]; we skip the details here.

Experiments

In this section, we design five simulation experiments concerning ontology similarity measuring and ontology mapping. In the experiments, we select the square loss as the ontology loss function. To ensure the accuracy of the comparison, we implemented our algorithm in C++ using the LAPACK and BLAS libraries for the linear algebra computations. The five experiments were run on a dual-core CPU with 8 GB of memory.

Ontology similarity measure experiment on plant data

We use O1, a plant “PO” ontology, in the first experiment. It was constructed at www.plantontology.org, and its structure is presented in Fig. 1. We use P@N (precision ratio; see Craswell and Hawking [5]) to measure the quality of the experiment data. First, experts give the closest N concepts for every vertex on the ontology graph in the plant field. Then we obtain the first N concepts for every vertex on the ontology graph by our algorithm and compute the precision ratio.
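
One plausible reading of the P@N computation described above, with toy data: for each vertex, the fraction of the expert's closest N concepts that also appear among the algorithm's first N returned concepts, averaged over all vertices. The concept names below are hypothetical.

```python
def precision_at_n(expert_lists, predicted_lists, n):
    """Average P@N over all vertices of the ontology graph."""
    ratios = []
    for v, expert in expert_lists.items():
        predicted = predicted_lists[v][:n]
        ratios.append(len(set(expert[:n]) & set(predicted)) / n)
    return sum(ratios) / len(ratios)

# Toy usage (hypothetical concepts):
expert = {"leaf": ["blade", "stem", "root"]}
predicted = {"leaf": ["blade", "root", "flower"]}
print(precision_at_n(expert, predicted, 3))   # 2 of 3 expert concepts recovered
```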

Fig. 1

The Structure of “PO” Ontology.

Meanwhile, we apply the ontology methods in [12], [13] and [14] to the “PO” ontology. After computing the average precision ratios of these three algorithms, we compare them with the results of our algorithm. Part of the data is shown in Table 1.

Tab. 1. The Experiment Results of Ontology Similarity Measure

| | P@3 average precision ratio | P@5 average precision ratio | P@10 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.5358 | 0.6517 | 0.8821 |
| Algorithm in [12] | 0.4549 | 0.5117 | 0.5859 |
| Algorithm in [13] | 0.4282 | 0.4849 | 0.5632 |
| Algorithm in [14] | 0.4831 | 0.5635 | 0.6871 |

When N = 3, 5 or 10, the precision ratios obtained by our algorithm are somewhat higher than those determined by the algorithms proposed in [12], [13] and [14]. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm proves to be more effective than those raised in [12], [13] and [14].

Ontology mapping experiment on humanoid robotics data

“Humanoid robotics” ontologies O2 and O3 are used in the second experiment. The structures of O2 and O3 are presented in Fig. 2 and Fig. 3, respectively. Ontology O2 presents the leg joint structure of a bionic walking device for a six-legged robot, and ontology O3 presents the exoskeleton frame of a robot with wearable, power-assisted lower extremities.

Fig. 2

“Humanoid Robotics” Ontology O2.

Fig. 3

“Humanoid Robotics” Ontology O3.

Fig. 4

The Structure of “GO” Ontology.

We set up this experiment aiming to obtain an ontology mapping between O2 and O3, and the P@N precision ratio is taken as a measure of the quality of the experiment. After applying the ontology algorithms in [24], [13] and [14] to the “humanoid robotics” ontologies and computing their average precision ratios, we compare the precision ratios obtained by these three methods with ours. Some results are shown in Table 2.

Tab. 2. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.2778 | 0.5000 | 0.7667 |
| Algorithm in [24] | 0.2778 | 0.4815 | 0.5444 |
| Algorithm in [13] | 0.2222 | 0.4074 | 0.4889 |
| Algorithm in [14] | 0.2778 | 0.4630 | 0.5333 |

When N = 1, 3 or 5, the precision ratios obtained by our new ontology algorithm are at least as high as those determined by the algorithms proposed in [24], [13] and [14], and clearly higher for N = 3 and N = 5. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm is more efficient than those raised in [24], [13] and [14].

Ontology similarity measure experiment on biology data

Gene “GO” ontology O4 is used in the third experiment; it was constructed at the website http://www.geneontology.org. We present the structure of O4 in Fig. 4. Again, P@N is chosen as the measure of the quality of the experiment data. We then apply the ontology methods in [13], [14] and [25] to the “GO” ontology. After computing the average precision ratios of these three algorithms, we compare them with the results of our algorithm. Part of the data is shown in Table 3.

Tab. 3. The Experiment Results of Ontology Similarity measure

| | P@3 average precision ratio | P@5 average precision ratio | P@10 average precision ratio | P@20 average precision ratio |
|---|---|---|---|---|
| Our Algorithm | 0.4987 | 0.6364 | 0.7602 | 0.8546 |
| Algorithm in [13] | 0.4638 | 0.5348 | 0.6234 | 0.7459 |
| Algorithm in [14] | 0.4356 | 0.4938 | 0.5647 | 0.7194 |
| Algorithm in [25] | 0.4213 | 0.5183 | 0.6019 | 0.7239 |

When N = 3, 5, 10 or 20, the precision ratios obtained by our ontology algorithm are higher than those determined by the algorithms proposed in [13], [14] and [25]. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm turns out to be more effective than those raised in [13], [14] and [25].

Ontology mapping experiment on physics education data

“Physics education” ontologies O5 and O6 are used in the fourth experiment. We respectively present the structures of O5 and O6 in Fig. 5 and Fig. 6.

Fig. 5

“Physics Education” Ontology O5.

Fig. 6

“Physics Education” Ontology O6.

We set up this experiment aiming to obtain an ontology mapping between O5 and O6, and the P@N precision ratio is taken as a measure of the quality of the experiment. The ontology algorithms in [13], [14] and [26] are applied to the “physics education” ontologies, and the precision ratios obtained by these three methods are compared with ours. Some results are shown in Table 4.

Tab. 4. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.6913 | 0.7556 | 0.9161 |
| Algorithm in [13] | 0.6129 | 0.7312 | 0.7935 |
| Algorithm in [14] | 0.6913 | 0.7556 | 0.8452 |
| Algorithm in [26] | 0.6774 | 0.7742 | 0.8968 |

When N = 1 or 3, the precision ratios of our new ontology mapping algorithm are comparable to those determined by the algorithms proposed in [13], [14] and [26], and for N = 5 ours is clearly higher. Furthermore, the precision ratios increase as N increases. As a result, our algorithm shows more effectiveness than those raised in [13], [14] and [26], especially for larger N.

Ontology mapping experiment on university data

“University” ontologies O7 and O8 are applied in the last experiment. We present the structures of O7 and O8 in Fig. 7 and Fig. 8.

Fig. 7

“University” Ontology O7.

Fig. 8

“University” Ontology O8.

We set up this experiment aiming to obtain an ontology mapping between O7 and O8, and the P@N precision ratio is taken as a criterion to measure the quality of the experiment. The ontology algorithms in [12], [13] and [14] are applied to the “University” ontologies, and the precision ratios obtained by these three methods are compared with ours. Some results are shown in Table 5.

Tab. 5. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.5714 | 0.6786 | 0.7714 |
| Algorithm in [12] | 0.5000 | 0.5952 | 0.6857 |
| Algorithm in [13] | 0.4286 | 0.5238 | 0.6071 |
| Algorithm in [14] | 0.5714 | 0.6429 | 0.6500 |

When N = 1, the precision ratio of our new ontology mapping algorithm matches the best of the algorithms proposed in [12], [13] and [14], and for N = 3 and N = 5 it is clearly higher. Furthermore, the precision ratios increase as N increases. As a result, our algorithm turns out to be more effective than those raised in [12], [13] and [14].

Conclusions

In this paper, a new ontology learning framework and its optimization approach are presented for ontology similarity calculation and ontology mapping. The new ontology algorithm is based on distance function learning. We also present a stability analysis and generalization bounds for the ontology learning algorithm. Finally, the data from five simulation experiments show that our new ontology learning algorithm is highly efficient in these engineering applications. The distance-learning-based ontology algorithm proposed in this paper thus shows promising application prospects across multiple disciplines.
