
Ontology optimization tactics via distance calculating



Introduction

Ontology originally comes from philosophy, where it describes the nature of things and the inherent, often hidden, connections among their components. In information and computer science, an ontology serves as a model for knowledge storage and representation. It has been applied extensively in fields such as knowledge management, machine learning, information systems, image retrieval, information retrieval and search expansion, collaboration, and intelligent information integration. As an effective conceptual semantic model and analysis tool, ontology has in recent years also been favored by researchers in pharmacology, biology, medicine, geographic information systems and the social sciences (see, for instance, Przydzial et al. [1], Koehler et al. [2], Ivanovic and Budimac [3], Hristoskova et al. [4], and Kabir [5]).

Researchers usually represent the structure of an ontology by a simple graph: every concept, object and element of the ontology corresponds to a vertex, and each (directed or undirected) edge represents a relationship (or potential link) between two concepts (objects or elements). Let O be an ontology and G the simple graph corresponding to O. The core task of ontology engineering applications is to compute the similarities between ontology vertices, which quantify the intrinsic links between vertices in the ontology graph. An ontology similarity measuring function is a semi-positive score function Sim : V × V → ℝ+ ∪ {0} that maps each pair of vertices to a non-negative real number. Ontology mapping, in turn, measures the similarity between vertices from different ontologies and serves as a bridge connecting them: only through such a mapping do we obtain potential associations between the objects or elements from different ontologies.
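
To fix ideas, the following minimal Python sketch represents an ontology as a simple graph whose vertices carry feature vectors, together with a semi-positive score function of the form Sim : V × V → ℝ+ ∪ {0}. The vertex names, the random features and the dot-product scoring are illustrative placeholders only, not the learning method developed below.

```python
import numpy as np

# Illustrative ontology graph: vertices are concepts, edges are relationships.
ontology_edges = {
    "plant": ["root", "leaf"],   # hypothetical concept names
    "root": [],
    "leaf": [],
}

# Each vertex is attached to a p-dimensional ontology vector (p = 4 here).
features = {v: np.random.rand(4) for v in ontology_edges}

def sim(u, v):
    """Semi-positive score function Sim: V x V -> R+ ∪ {0} (placeholder scoring)."""
    return float(max(0.0, features[u] @ features[v]))

print(sim("plant", "leaf"))
```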

Several effective methods exist for deriving efficient ontology similarity measuring or ontology mapping algorithms in terms of an ontology function. Wang et al. [12] considered ontology similarity calculation based on ranking learning technology. Huang et al. [13] proposed a fast ontology algorithm that cuts the time complexity of ontology applications. Gao and Liang [14] presented an ontology optimization model in which the ontology function is determined by means of the NDCG measure, and applied it successfully to physics education. More ontology applications in various engineering fields can be found in Gao et al. [11].

In this article, we present a new ontology learning method based on distance calculation. Moreover, we give a theoretical analysis of the proposed ontology algorithm.

Algorithm Description

Let $\mathscr{S}=\{(v_{i},v_{j},y_{ij})\}_{i,j=1}^{N}$ be the target ontology training data, where $v_{i},v_{j}\in\mathbb{R}^{p}$ are ontology vectors and $y_{ij}=\pm 1$ (if $v_{i}$ and $v_{j}$ are similar, then $y_{ij}=1$; otherwise $y_{ij}=-1$). If the number of target ontology training samples $N$ is not large, we additionally fix $m$ relevant source ontology training sets $\mathscr{S}_{q}=\{(v_{qi},v_{qj},y_{qij})\}_{i,j=1}^{N_{q}}$ ($q=1,\dots,m$); in this setting $v_{qi},v_{qj}\in\mathbb{R}^{p}$ belong to the same ontology feature space as $v_{i},v_{j}$.

We aim to learn a distance function $d(v_{i},v_{j}\,|\,\mathbf{W})=(v_{i}-v_{j})^{T}\mathbf{W}(v_{i}-v_{j})$, which is equivalent to learning a distance matrix $\mathbf{W}$; whether an ontology vertex pair $v_{i}$ and $v_{j}$ is similar or dissimilar is then decided by comparing $d(v_{i},v_{j}\,|\,\mathbf{W})$ with a constant threshold parameter $c$. Specifically, our ontology optimization problem can be stated as

$$\arg\min_{\mathbf{W}}\ \frac{1}{\binom{N}{2}}\sum_{i<j}g\big(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}]\big)+\frac{\eta}{2}\|\mathbf{W}\|_{F}^{2}\qquad\mathrm{s.t.}\quad\mathbf{W}\succeq 0,\tag{1}$$

where $\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}=(v_{i}-v_{j})^{T}\mathbf{W}(v_{i}-v_{j})$, $g(z)=\max(0,b-z)$ is an ontology hinge loss function, $\|\mathbf{W}\|_{F}$ is the Frobenius norm of the metric $\mathbf{W}$, which is used to control the model complexity, $\eta$ is a balance parameter, and the constraint condition requires that $\mathbf{W}$ is positive semi-definite.
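
As a small illustration of the quantities entering problem (1), the sketch below evaluates the metric distance $d(v_{i},v_{j}\,|\,\mathbf{W})$ and the ontology hinge loss $g$ on a single labelled pair. The vectors, the identity metric and the threshold $b=0$ are placeholder choices, not values from the paper.

```python
import numpy as np

def dist_W(vi, vj, W):
    """d(v_i, v_j | W) = (v_i - v_j)^T W (v_i - v_j)."""
    d = vi - vj
    return float(d @ W @ d)

def hinge(z, b=0.0):
    """Ontology hinge loss g(z) = max(0, b - z)."""
    return max(0.0, b - z)

# Hypothetical pair: y = +1 if similar, -1 otherwise.
vi, vj, y = np.array([1.0, 0.5]), np.array([0.8, 0.4]), 1
W = np.eye(2)                                   # any positive semi-definite metric
loss = hinge(y * (1.0 - dist_W(vi, vj, W)))     # one summand of the objective in (1)
print(loss)
```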

The general version of the ontology distance learning approach is formulated as

$$\begin{aligned}\arg\min\ &\frac{1}{\binom{N}{2}}\sum_{i<j}L(v_{i},v_{j},y_{ij})+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m,\end{aligned}\tag{2}$$

where $\mathbf{W}=\sum_{i=1}^{n}\theta_{i}\mathbf{u}_{i}\mathbf{u}_{i}^{T}$ and $\mathbf{W}_{D}=\sum_{q=1}^{m}\alpha_{q}\mathbf{W}_{q}$. Both $\|\alpha\|_{2}^{2}$ and $\|\theta\|_{1}$ are employed to control the complexity of the model. In what follows, $\gamma_{1}$, $\gamma_{2}$ and $\gamma_{3}$ are all positive balance parameters.
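
The following sketch assembles $\mathbf{W}$ from the basis vectors $\mathbf{u}_{i}$ with weights $\theta$ and $\mathbf{W}_{D}$ from the source metrics $\mathbf{W}_{q}$ with simplex weights $\alpha$. The random orthonormal basis and the identity source metrics are assumptions made purely for illustration.

```python
import numpy as np

p, n, m = 4, 4, 3                             # dimensions (illustrative)
U = np.linalg.qr(np.random.randn(p, n))[0]    # columns u_1, ..., u_n
theta = np.abs(np.random.rand(n))             # weights theta_i >= 0
W = (U * theta) @ U.T                         # W = sum_i theta_i u_i u_i^T

W_q = [np.eye(p) for _ in range(m)]           # source metrics W_q (placeholders)
alpha = np.full(m, 1.0 / m)                   # alpha on the simplex: sum_q alpha_q = 1
W_D = sum(a * Wq for a, Wq in zip(alpha, W_q))  # W_D = sum_q alpha_q W_q
```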

Select $L(v_{i},v_{j},y_{ij})=g(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}])$ and use the ontology hinge loss for $g$, that is, $g(z)=\max(0,b-z)$ with $b$ set to 0. Thus, we deduce the following ontology optimization problem:

$$\begin{aligned}\arg\min\ &\frac{1}{\binom{N}{2}}\sum_{i<j}g\big(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}]\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{3}$$

For brevity, we write $v_{k}^{1}$, $v_{k}^{2}$ and $y_{k}$ for $v_{i}$, $v_{j}$ and $y_{ij}$, with $k=1,\dots,\binom{N}{2}=N'$. Let $\delta_{k}=v_{k}^{1}-v_{k}^{2}$, so that $\|v_{k}^{1}-v_{k}^{2}\|_{\mathbf{W}}^{2}=\sum_{i=1}^{n}\theta_{i}\delta_{k}^{T}\mathbf{u}_{i}\mathbf{u}_{i}^{T}\delta_{k}=\theta^{T}f_{k}$, where $f_{k}=[f_{k}^{1},\dots,f_{k}^{n}]^{T}$ and $f_{k}^{i}=\delta_{k}^{T}\mathbf{u}_{i}\mathbf{u}_{i}^{T}\delta_{k}$. Therefore, the ontology problem (3) can be re-expressed as

$$\begin{aligned}\arg\min_{\alpha,\theta}\ &\frac{1}{N'}\sum_{k=1}^{N'}g\big(y_{k}[1-\theta^{T}f_{k}]\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}+\gamma_{3}\|\theta\|_{1},\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{4}$$
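
A quick numerical check of this re-parameterization: with $f_{k}^{i}=(\mathbf{u}_{i}^{T}\delta_{k})^{2}$, the metric distance $\|\delta_{k}\|_{\mathbf{W}}^{2}$ reduces to the linear form $\theta^{T}f_{k}$ that appears in (4). The data below are random placeholders.

```python
import numpy as np

def pair_features(v1, v2, U):
    """f_k with f_k^i = (u_i^T delta_k)^2, so that ||delta_k||_W^2 = theta^T f_k."""
    delta = v1 - v2
    return (U.T @ delta) ** 2                 # vector of length n

p, n = 3, 3
U = np.linalg.qr(np.random.randn(p, n))[0]    # basis u_1, ..., u_n
theta = np.random.rand(n)
W = (U * theta) @ U.T                         # W = sum_i theta_i u_i u_i^T
v1, v2 = np.random.randn(p), np.random.randn(p)
f_k = pair_features(v1, v2, U)
assert np.isclose(theta @ f_k, (v1 - v2) @ W @ (v1 - v2))
```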

The solution can be obtained by alternating between two ontology sub-problems (minimizing over $\alpha=[\alpha_{1},\dots,\alpha_{m}]^{T}$ and over $\theta=[\theta_{1},\dots,\theta_{n}]^{T}$, respectively) until convergence.

Given $\alpha$, the ontology optimization problem with respect to $\theta$ can be stated as

$$\arg\min_{\theta}F(\theta)=\Lambda(\theta)+\Omega(\theta),\tag{5}$$

where $\Lambda(\theta)=\frac{1}{N'}\sum_{k=1}^{N'}g(y_{k}[1-\theta^{T}f_{k}])+\gamma_{3}\|\theta\|_{1}$ and $\Omega(\theta)=\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{D}\|_{F}^{2}$. Since the ontology loss part $\Lambda(\theta)$ is non-differentiable, we smooth the ontology loss and then solve (5) via a gradient technique. Let $\Theta=\{x:0\le x_{k}\le 1,\ x\in\mathbb{R}^{N'}\}$ and let $\sigma$ be the smoothing parameter. Then the smoothed expression of the ontology hinge loss $g(f_{k},y_{k},\theta)=\max\{0,-y_{k}(1-\theta^{T}f_{k})\}$ can be formulated as

$$g_{\sigma}=\max_{x\in\Theta}\ x_{k}\big(-y_{k}(1-\theta^{T}f_{k})\big)-\frac{\sigma}{2}\|f_{k}\|_{\infty}x_{k}^{2},\tag{6}$$

where the $\|f_{k}\|_{\infty}$ term is used as a normalization. Setting the derivative of the objective ontology function of (6) with respect to $x_{k}$ to zero and projecting $x_{k}$ onto $\Theta$, we infer the solution $x_{k}=\mathrm{median}\{\frac{-y_{k}(1-\theta^{T}f_{k})}{\sigma\|f_{k}\|_{\infty}},0,1\}$. Furthermore, the piecewise approximation of $g$ can be expressed as

$$g_{\sigma}=\begin{cases}0, & y_{k}(1-\theta^{T}f_{k})>0,\\ -y_{k}(1-\theta^{T}f_{k})-\frac{\sigma}{2}\|f_{k}\|_{\infty}, & y_{k}(1-\theta^{T}f_{k})<-\sigma\|f_{k}\|_{\infty},\\ \frac{(y_{k}(1-\theta^{T}f_{k}))^{2}}{2\sigma\|f_{k}\|_{\infty}}, & \text{otherwise}.\end{cases}$$
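
A minimal numpy sketch of this smoothing step, assuming $\sigma>0$ and $f_{k}\ne 0$: it returns both the smoothed loss value $g_{\sigma}$ and the auxiliary variable $x_{k}$ that later enters the gradient $y_{k}f_{k}x_{k}$.

```python
import numpy as np

def smoothed_hinge(f_k, y_k, theta, sigma):
    """Piecewise approximation g_sigma of g = max{0, -y_k (1 - theta^T f_k)}."""
    z = y_k * (1.0 - theta @ f_k)
    s = sigma * np.max(np.abs(f_k))         # sigma * ||f_k||_inf
    x_k = float(np.clip(-z / s, 0.0, 1.0))  # median{-z / (sigma ||f_k||_inf), 0, 1}
    if z > 0:
        return 0.0, x_k
    if z < -s:
        return -z - s / 2.0, x_k
    return z * z / (2.0 * s), x_k
```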

By direct computation, the gradient of the smoothed ontology hinge loss $g_{\sigma}(\theta)$ is

$$\frac{\partial g_{\sigma}(f_{k},y_{k},\theta)}{\partial\theta}=y_{k}f_{k}x_{k}.$$

Let $H^{\Lambda}=[f_{1},\dots,f_{N'}]$ and $Y=\mathrm{diag}(y)$. We get $\frac{\partial g_{\sigma}(\theta)}{\partial\theta}=\sum_{k}y_{k}f_{k}x_{k}=H^{\Lambda}Yx$, and $L^{g}(\theta)=\frac{N'}{\sigma}\max_{k}\frac{\|f_{k}f_{k}^{T}\|_{2}}{\|f_{k}\|_{\infty}}$ is the Lipschitz constant of $g_{\sigma}(\theta)$.

By setting $l(\theta)=\|\theta\|_{1}$, we infer the approximation of $l$ with the smoothing parameter $\sigma'$ as

$$l_{\sigma'}=\begin{cases}-\theta_{r}-\frac{\sigma'}{2}, & \theta_{r}<-\sigma',\\ \theta_{r}-\frac{\sigma'}{2}, & \theta_{r}>\sigma',\\ \frac{\theta_{r}^{2}}{2\sigma'}, & \text{otherwise}.\end{cases}$$

Furthermore, with $x_{r}'=\mathrm{median}\{\frac{\theta_{r}}{\sigma'},-1,1\}$ for each $r$, the gradient can be computed as $\frac{\partial\sum_{r=1}^{n}l_{\sigma'}(\theta_{r})}{\partial\theta}=x'$, and the corresponding Lipschitz constant is $L^{l}(\theta)=\frac{1}{\sigma'}$.
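
A corresponding sketch of the smoothed $\ell_{1}$ penalty: it returns the smoothed value and the gradient $x'$ with $x_{r}'=\mathrm{median}\{\theta_{r}/\sigma',-1,1\}$ (names are illustrative).

```python
import numpy as np

def smoothed_l1(theta, sigma_p):
    """Smoothed |theta_r| summed over r, and its gradient x'_r = median{theta_r / sigma', -1, 1}."""
    grad = np.clip(theta / sigma_p, -1.0, 1.0)
    val = np.where(np.abs(theta) > sigma_p,
                   np.abs(theta) - sigma_p / 2.0,   # outer branches of the piecewise form
                   theta ** 2 / (2.0 * sigma_p))    # quadratic middle branch
    return float(val.sum()), grad
```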

Moreover, setting $H_{st}^{\Omega}=\gamma_{1}\mathrm{Tr}\big((\mathbf{u}_{s}\mathbf{u}_{s}^{T})(\mathbf{u}_{t}\mathbf{u}_{t}^{T})\big)$ and $f_{r}^{\Omega}=\gamma_{1}\mathrm{Tr}\big(\mathbf{W}_{D}^{T}(\mathbf{u}_{r}\mathbf{u}_{r}^{T})\big)$, we have $\frac{\partial\Omega(\theta)}{\partial\theta}=H^{\Omega}\theta-f^{\Omega}$, $\frac{\partial F_{\sigma}(\theta)}{\partial\theta}=\frac{1}{N'}H^{\Lambda}Yx+\gamma_{3}x'+H^{\Omega}\theta-f^{\Omega}$, and $L_{\sigma}=\frac{1}{\sigma}\max_{k}\frac{\|f_{k}f_{k}^{T}\|_{2}}{\|f_{k}\|_{\infty}}+\frac{\gamma_{3}}{\sigma'}+\|H^{\Omega}\|_{2}$ is the Lipschitz constant of $F_{\sigma}(\theta)$.
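
The gradient of $\Omega(\theta)$ admits a simple closed form because $\mathrm{Tr}\big((\mathbf{u}_{s}\mathbf{u}_{s}^{T})(\mathbf{u}_{t}\mathbf{u}_{t}^{T})\big)=(\mathbf{u}_{s}^{T}\mathbf{u}_{t})^{2}$. The sketch below computes $H^{\Omega}$, $f^{\Omega}$ and $\partial\Omega/\partial\theta$ under that identity; the inputs are placeholders.

```python
import numpy as np

def omega_gradient(theta, U, W_D, gamma1):
    """Gradient of Omega(theta) = (gamma1/2) * ||W - W_D||_F^2 with W = sum_i theta_i u_i u_i^T."""
    G = U.T @ U                           # G[s, t] = u_s^T u_t
    H_omega = gamma1 * G ** 2             # H^Omega_{st} = gamma1 * (u_s^T u_t)^2
    f_omega = gamma1 * np.einsum('is,ij,js->s', U, W_D, U)  # f^Omega_r = gamma1 * u_r^T W_D u_r
    return H_omega @ theta - f_omega      # dOmega/dtheta = H^Omega theta - f^Omega
```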

Denote by $\theta^{t}$, $y^{t}$ and $z^{t}$ the solutions in the $t$-th iteration round, and let $\widehat{\theta}$ be an initial guess of $\theta$. With $L_{\sigma}$ the Lipschitz constant of $F_{\sigma}(\theta)$, the two auxiliary ontology optimization problems are stated as

$$\min_{y}\ \langle\nabla F_{\sigma}(\theta^{t}),y-\theta^{t}\rangle+\frac{L_{\sigma}}{2}\|y-\theta^{t}\|_{2}^{2}$$

and

$$\min_{z}\ \sum_{i=0}^{t}\frac{i+1}{2}\big[F_{\sigma}(\theta^{i})+\langle\nabla F_{\sigma}(\theta^{i}),z-\theta^{i}\rangle\big]+\frac{L_{\sigma}}{2}\|z-\widehat{\theta}\|_{2}^{2},$$

respectively. Setting the gradients of the objective ontology functions in the above two auxiliary ontology problems to zero yields $y^{t}=\theta^{t}-\frac{1}{L_{\sigma}}\nabla F_{\sigma}(\theta^{t})$ and $z^{t}=\widehat{\theta}-\frac{1}{L_{\sigma}}\sum_{i=0}^{t}\frac{i+1}{2}\nabla F_{\sigma}(\theta^{i})$. Hence, we set $\theta^{t+1}=\frac{2}{t+3}z^{t}+\frac{t+1}{t+3}y^{t}$, and the stopping criterion is $|F_{\sigma}(\theta^{t+1})-F_{\sigma}(\theta^{t})|<\varepsilon$.
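
One accelerated update step, written as a sketch; `weighted_grad_sum` accumulates $\sum_{i=0}^{t}\frac{i+1}{2}\nabla F_{\sigma}(\theta^{i})$ and should start as a zero vector. The function and parameter names are illustrative.

```python
import numpy as np

def accelerated_step(theta_t, theta_hat, grad_t, weighted_grad_sum, L_sigma, t):
    """One step: y^t and z^t from the two auxiliary problems, then the convex combination."""
    weighted_grad_sum = weighted_grad_sum + (t + 1) / 2.0 * grad_t
    y_t = theta_t - grad_t / L_sigma                 # minimizer of the first auxiliary problem
    z_t = theta_hat - weighted_grad_sum / L_sigma    # minimizer of the second auxiliary problem
    theta_next = 2.0 / (t + 3) * z_t + (t + 1.0) / (t + 3) * y_t
    return theta_next, weighted_grad_sum
```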

Given θ , the optimization ontology problem on parameter α can be stated as

$$\begin{aligned}\arg\min_{\alpha}\ &\frac{\gamma_{1}}{2}\Big\|\mathbf{W}-\sum_{p=1}^{m}\alpha_{p}\mathbf{W}_{p}\Big\|_{F}^{2}+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m.\end{aligned}\tag{9}$$

The ontology problem (9) can be expressed in the compact form

$$\begin{aligned}\arg\min_{\alpha}\ &\frac{1}{2}\alpha^{T}\mathbf{H}\alpha-f^{T}\alpha+\frac{\gamma_{2}}{2}\|\alpha\|_{2}^{2}\\ \mathrm{s.t.}\ &\sum_{q=1}^{m}\alpha_{q}=1,\quad\alpha_{q}\ge 0,\ q=1,\dots,m,\end{aligned}\tag{10}$$

where $f=[f_{1},\dots,f_{m}]$ with $f_{q}=\gamma_{1}\mathrm{Tr}(\mathbf{W}^{T}\mathbf{W}_{q})$, and $\mathbf{H}$ is a symmetric positive semi-definite matrix with $\mathbf{H}_{st}=\gamma_{1}\mathrm{Tr}(\mathbf{W}_{s}^{T}\mathbf{W}_{t})$. We choose only two elements $\alpha_{i}$ and $\alpha_{j}$ to update in each iteration. To satisfy the constraint $\sum_{q=1}^{m}\alpha_{q}=1$, we require $\alpha_{i}^{*}+\alpha_{j}^{*}=\alpha_{i}+\alpha_{j}$, where $\alpha_{i}^{*}$ and $\alpha_{j}^{*}$ are the solutions of the current iteration. Then, according to (10) and setting $\varepsilon_{ij}=(H_{ii}-H_{ij}-H_{ji}+H_{jj})\alpha_{i}-\sum_{k}(H_{ik}-H_{jk})\alpha_{k}$, we design the updating rule as $\alpha_{i}^{*}=\frac{\gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{i}-f_{j})+\varepsilon_{ij}}{(H_{ii}-H_{ij}-H_{ji}+H_{jj})+2\gamma_{2}}$ and $\alpha_{j}^{*}=\alpha_{i}+\alpha_{j}-\alpha_{i}^{*}$. In case the obtained $\alpha_{i}^{*}$ and $\alpha_{j}^{*}$ do not satisfy the constraint $\alpha_{q}\ge 0$, we further set

$$\begin{cases}\alpha_{i}^{*}=0,\ \alpha_{j}^{*}=\alpha_{i}+\alpha_{j}, & \text{if}\ \gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{i}-f_{j})+\varepsilon_{ij}\le 0,\\ \alpha_{j}^{*}=0,\ \alpha_{i}^{*}=\alpha_{i}+\alpha_{j}, & \text{if}\ \gamma_{2}(\alpha_{i}+\alpha_{j})+(f_{j}-f_{i})+\varepsilon_{ij}\le 0.\end{cases}$$
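
A compact sketch of this pairwise $\alpha$ update; clipping $\alpha_{i}^{*}$ to $[0,\alpha_{i}+\alpha_{j}]$ plays the role of the two boundary cases above and keeps both the simplex and the non-negativity constraints satisfied.

```python
import numpy as np

def update_alpha_pair(alpha, f, H, i, j, gamma2):
    """Update (alpha_i, alpha_j) keeping sum_q alpha_q = 1 and alpha_q >= 0."""
    s = alpha[i] + alpha[j]
    kappa = H[i, i] - H[i, j] - H[j, i] + H[j, j]
    eps_ij = kappa * alpha[i] - (H[i, :] - H[j, :]) @ alpha
    alpha_i_new = (gamma2 * s + (f[i] - f[j]) + eps_ij) / (kappa + 2.0 * gamma2)
    alpha_i_new = float(np.clip(alpha_i_new, 0.0, s))   # enforce 0 <= alpha_i* <= alpha_i + alpha_j
    out = alpha.copy()
    out[i], out[j] = alpha_i_new, s - alpha_i_new
    return out
```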

The whole ontology algorithm is stated as follows:

Initialize: $\alpha^{(0)}$, $\theta^{(0)}$, $\gamma_{2}^{(0)}$ and $\gamma_{3}^{(0)}$. Set $t=0$, and construct $\mathbf{W}^{0}=\sum_{r=1}^{n}\theta_{r}^{(0)}\mathbf{u}_{r}\mathbf{u}_{r}^{T}$ and $\mathbf{W}_{S}^{0}=\sum_{q=1}^{m}\alpha_{q}^{(0)}\mathbf{W}_{q}$.

Iterate:

Optimize $\theta^{(t+1)}\leftarrow\arg\min_{\theta}\frac{1}{N'}\sum_{k=1}^{N'}g\big(y_{k}(1-\theta^{T}f_{k})\big)+\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{S}^{t}\|_{F}^{2}+\gamma_{3}^{(t)}\|\theta\|_{1}$, and update $\mathbf{W}^{(t+1)}=\sum_{r=1}^{n}\theta_{r}^{(t+1)}\mathbf{u}_{r}\mathbf{u}_{r}^{T}$;

Optimize $\alpha^{(t+1)}\leftarrow\arg\min_{\alpha}\frac{\gamma_{1}}{2}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}\|_{F}^{2}+\frac{\gamma_{2}^{(t)}}{2}\|\alpha\|_{2}^{2}$, and update $\mathbf{W}_{S}^{t+1}=\sum_{q=1}^{m}\alpha_{q}^{(t+1)}\mathbf{W}_{q}$;

Determine $\gamma_{3}^{(t+1)}=\frac{|\rho_{C}|\big(\frac{1}{N'}\sum_{k=1}^{N'}g(y_{k}(1-(\theta^{(t+1)})^{T}f_{k}))+\frac{\gamma_{1}}{2}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}^{(t)}\|_{F}^{2}\big)}{\|\theta^{(t+1)}\|_{1}}$;

Obtain $\gamma_{2}^{(t+1)}=\frac{|\rho_{B}|\big(\gamma_{1}\|\mathbf{W}^{(t+1)}-\mathbf{W}_{S}^{(t+1)}\|_{F}^{2}\big)}{\|\alpha^{(t+1)}\|_{2}^{2}}$;

$t\leftarrow t+1$.

Until convergence.

Stability Analysis

In this section, we give a theoretical analysis of our ontology algorithm based on stability arguments.

Uniform stability
Definition 1.

(Leave-One-Out) An ontology algorithm has uniform stability β1 with respect to the ontology loss function l if the following holds

$$\forall s\in Z^{m},\ \forall i\in\{1,\dots,m\},\quad\|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)\|_{\infty}\le\beta_{1},$$

where Z is the ontology sample space, fs is the ontology function determined by the ontology algorithm learning with the set of samples s, and si = {z1,··· ,zi−1, zi+1,··· ,zm} denotes an ontology sample set with the i-th element zi deleted.

Definition 2

(Leave-Two-Out) An ontology algorithm has uniform stability β2 with respect to the ontology loss function l if the following holds

$$\forall s\in Z^{m},\ \forall i,j\in\{1,\dots,m\},\quad\|l(f_{s},\cdot)-l(f_{s^{i,j}},\cdot)\|_{\infty}\le\beta_{2},$$

where Z is the ontology sample space, fs is the ontology function determined by the ontology algorithm learning with the set of samples s, and si, j is the ontology sample set given from s by deleting two elements zi and zj.

For any convex and differentiable ontology function $F:\mathscr{H}\to\mathbb{R}$ (here $\mathscr{H}$ denotes the Hilbert space), the Bregman divergence is defined by $B_{F}(f\|g)=F(f)-F(g)-\langle f-g,\nabla F(g)\rangle$ for all $f,g\in\mathscr{H}$. For a convex but non-differentiable $F$, the subdifferential at $f$ is $\partial F(f)=\{g\in\mathscr{H}\mid\forall f'\in\mathscr{H},\ F(f')-F(f)\ge\langle f'-f,g\rangle\}$; letting $\delta F(f)$ denote any element of $\partial F(f)$, the generalized Bregman divergence is $B_{F}(f'\|f)=F(f')-F(f)-\langle f'-f,\delta F(f)\rangle$. In both cases $B_{F}(f'\|f)\ge 0$, and $B_{P+Q}=B_{P}+B_{Q}$ for any convex ontology functions $P$ and $Q$.

Lemma 1

For any two distance metrics $\mathbf{W}$ and $\mathbf{W}'$, the following inequality holds for any ontology samples $z_{i}$ and $z_{j}$:

$$|V(\mathbf{W},z_{i},z_{j})-V(\mathbf{W}',z_{i},z_{j})|\le 4LM^{2}\|\mathbf{W}-\mathbf{W}'\|_{F}.$$

Next, we describe the LOO and LTO stability of our algorithm.

Theorem 2

Let $\beta_{1}$ and $\beta_{2}$ be the LOO and LTO stabilities of our ontology algorithm problem (2). Suppose that $\|v\|_{2}\le M$ for any sample $v$. Then we have

$$\beta_{1}\le\frac{32L^{2}M^{4}}{\gamma_{1}N},\qquad\beta_{2}\le\frac{64L^{2}M^{4}}{\gamma_{1}N},$$

where L is the Lipschitz constant of the function g.

Proof

We only present the detailed proof of the first inequality; the second can be obtained in a similar way. Let $F_{\mathscr{N}}(\theta)=P_{\mathscr{N}}(\theta)+Q(\theta)$, where $P_{\mathscr{N}}(\theta)=\frac{1}{\binom{N}{2}}\sum_{i<j}V(\mathbf{W},z_{i},z_{j})$ and $Q(\theta)=\frac{\gamma_{1}}{2}\|\mathbf{W}-\mathbf{W}_{S}\|_{F}^{2}+\gamma_{3}\|\theta\|_{1}$. Clearly, both $P_{\mathscr{N}}(\theta)$ and $Q(\theta)$ are convex. Let $\theta_{\mathscr{N}}$ and $\theta_{\mathscr{N}'}$ be the minimizers of $F_{\mathscr{N}}(\theta)$ and $F_{\mathscr{N}'}(\theta)$, respectively, where $\mathscr{N}'$ is the set of ontology examples obtained by deleting $z_{i}\in\mathscr{N}$ from $\mathscr{N}$.

Note that

$$B_{F_{\mathscr{N}}}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{F_{\mathscr{N}'}}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})\ge B_{Q}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{Q}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'}).$$

Let $\mathrm{sgn}(\theta)=[\mathrm{sgn}(\theta_{1}),\dots,\mathrm{sgn}(\theta_{n})]^{T}$ denote a sub-gradient of $\|\theta\|_{1}$, and set $\Delta=\|\theta_{\mathscr{N}'}\|_{1}-\langle\theta_{\mathscr{N}'},\mathrm{sgn}(\theta_{\mathscr{N}})\rangle+\|\theta_{\mathscr{N}}\|_{1}-\langle\theta_{\mathscr{N}},\mathrm{sgn}(\theta_{\mathscr{N}'})\rangle\ge 0$. Hence, we have

$$B_{Q}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{Q}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})=\gamma_{1}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}^{2}+\gamma_{3}\Delta.$$

We have $\delta F_{\mathscr{N}}(\theta_{\mathscr{N}})=\delta F_{\mathscr{N}'}(\theta_{\mathscr{N}'})=0$ since $\theta_{\mathscr{N}}$ and $\theta_{\mathscr{N}'}$ are the minimizers of $F_{\mathscr{N}}(\theta)$ and $F_{\mathscr{N}'}(\theta)$. Using Lemma 1, we obtain

$$\begin{aligned}
\gamma_{1}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}^{2}
&\le B_{F_{\mathscr{N}}}(\theta_{\mathscr{N}'}\|\theta_{\mathscr{N}})+B_{F_{\mathscr{N}'}}(\theta_{\mathscr{N}}\|\theta_{\mathscr{N}'})\\
&=F_{\mathscr{N}}(\theta_{\mathscr{N}'})-F_{\mathscr{N}}(\theta_{\mathscr{N}})-\langle\theta_{\mathscr{N}'}-\theta_{\mathscr{N}},\partial F_{\mathscr{N}}(\theta_{\mathscr{N}})\rangle+F_{\mathscr{N}'}(\theta_{\mathscr{N}})-F_{\mathscr{N}'}(\theta_{\mathscr{N}'})-\langle\theta_{\mathscr{N}}-\theta_{\mathscr{N}'},\partial F_{\mathscr{N}'}(\theta_{\mathscr{N}'})\rangle\\
&=F_{\mathscr{N}}(\theta_{\mathscr{N}'})-F_{\mathscr{N}}(\theta_{\mathscr{N}})+F_{\mathscr{N}'}(\theta_{\mathscr{N}})-F_{\mathscr{N}'}(\theta_{\mathscr{N}'})\\
&=\frac{1}{\binom{N}{2}}\Big(\sum_{\mathscr{N}}V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})-\sum_{\mathscr{N}}V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})+\sum_{\mathscr{N}'}V(\mathbf{W}_{\mathscr{N}},z_{i'},z_{j})-\sum_{\mathscr{N}'}V(\mathbf{W}_{\mathscr{N}'},z_{i'},z_{j})\Big)\\
&\le\frac{1}{\binom{N}{2}}\Big(\sum_{\mathscr{N}}|V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})|+\sum_{\mathscr{N}'}|V(\mathbf{W}_{\mathscr{N}},z_{i'},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i'},z_{j})|\Big)\\
&\le\frac{8LM^{2}}{N}\|\mathbf{W}_{\mathscr{N}'}-\mathbf{W}_{\mathscr{N}}\|_{F}.
\end{aligned}$$

This implies that

$$\|\mathbf{W}_{\mathscr{N}}-\mathbf{W}_{\mathscr{N}'}\|_{F}\le\frac{8LM^{2}}{\gamma_{1}N}.$$

By virtue of $|V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})|\le 4LM^{2}\|\mathbf{W}_{\mathscr{N}}-\mathbf{W}_{\mathscr{N}'}\|_{F}$, we deduce

$$|V(\mathbf{W}_{\mathscr{N}},z_{i},z_{j})-V(\mathbf{W}_{\mathscr{N}'},z_{i},z_{j})|\le\frac{32L^{2}M^{4}}{\gamma_{1}N}.$$

Therefore, the expected result is obtained.

Let $\mathscr{N}$ be the ontology sample set and $V(\mathbf{W},z_{i},z_{j})=g(y_{ij}[1-\|v_{i}-v_{j}\|_{\mathbf{W}}^{2}])$. In this subsection, the empirical ontology risk and the expected ontology risk are denoted by $R_{\mathscr{N}}(\mathbf{W})=\frac{1}{\binom{N}{2}}\sum_{i<j}V(\mathbf{W},z_{i},z_{j})$ and $R(\mathbf{W})=\mathbf{E}_{(z_{i},z_{j})}[V(\mathbf{W},z_{i},z_{j})]$, respectively. We will determine the generalization bound on $R(\mathbf{W})-R_{\mathscr{N}}(\mathbf{W})$ in the next theorem. For this purpose, we use the following McDiarmid inequality.

Theorem 3

[15] Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $A$. Let $\phi:A^{N}\to\mathbb{R}$ be such that for each $i\in\{1,\dots,N\}$, there exists a constant $c_{i}>0$ such that

$$\sup_{x_{1},\dots,x_{N}\in A,\ x_{i}'\in A}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{i-1},x_{i}',x_{i+1},\dots,x_{N})|\le c_{i}.$$

Then for any ε > 0,

$$\mathbf{P}\{\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]\ge\varepsilon\}\le e^{-2\varepsilon^{2}/\sum_{i=1}^{N}c_{i}^{2}}.$$

The generalization error bound via uniform stability is presented as follows.

Theorem 4

Let $\mathscr{N}$ be a set of $N$ randomly selected ontology samples and $\mathbf{W}_{\mathscr{N}}$ be the ontology distance matrix determined by (2). With probability at least $1-\delta$, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\frac{128L^{2}M^{4}}{\gamma_{1}N}+\varsigma\sqrt{\frac{\ln\frac{1}{\delta}}{2N}},$$

where

$$\varsigma=\frac{128L^{2}M^{4}+4\gamma_{1}g_{\mathbf{W}_{S}}+16\sqrt{2\gamma_{1}}\,LM^{2}\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}\|\theta_{S}\|_{1}}}{\gamma_{1}}.$$

The proof of Theorem 4 mainly follows [16-19]; we skip the details here.

Strong and weak stabilities

The uniform version of stability is too restrictive for most learning algorithms: only a small number of works have shown that standard learning algorithms satisfy uniform stability directly, and for most ontology learning algorithms this remains unclear. We are therefore motivated to consider other "almost everywhere" notions of stability beyond uniform stability in our ontology setting. We define strong and weak stabilities for our ontology framework, which are also good measures of how robust an ontology algorithm is. We assume 0 < δ3, δ4 < 1 in this subsection.

Definition 3

(Strong Stability) Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by $f_{s}$, and let $l$ be an ontology loss function. Let $\beta_{3}:\mathbb{N}\to\mathbb{R}$ and let $s^{i}$ be the ontology sample set in which $v_{i}$ is replaced by $v_{i}'$. We say that the ontology algorithm A is $\beta_{3}$-loss stable at $s$ with respect to the ontology loss $l$ if for all $n\in\mathbb{N}$, $v_{i}'\in V$, $i\in\{1,\dots,n\}$, we have

$$|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)|\le\beta_{3}.$$

We say that the ontology algorithm A has strong loss stability $(\beta_{3},\delta)$ if

$$\mathbb{P}\{A\ \text{is}\ \beta_{3}\text{-loss stable at}\ s\}\ge 1-\delta.$$

Definition 4

(Weak Stability) Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by $f_{s}$, and let $l$ be an ontology loss function. Let $\beta_{4}:\mathbb{N}\to\mathbb{R}$. We say that our ontology algorithm A has weak loss stability $(\beta_{4},\delta')$ if for all $n\in\mathbb{N}$, $i\in\{1,\dots,n\}$, we have

$$\mathbb{P}\{|l(f_{s},\cdot)-l(f_{s^{i}},\cdot)|\le\beta_{4}\}\ge 1-\delta'.$$

We now present the following lemmas, which are fundamental for proving the results on strong and weak stability.

Lemma 5

(Kutin [22]) Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $C$. There is a "bad" subset $B\subseteq C^{N}$ with $\mathbb{P}(x_{1},\dots,x_{N}\in B)=\delta$. Let $\phi:C^{N}\to\mathbb{R}$ be such that for each $k\in\{1,\dots,N\}$, there exist $b\ge c_{k}>0$ such that

$$\begin{aligned}&\sup_{x_{1},\dots,x_{N}\in C\setminus B,\ x_{k}'\in C}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{k-1},x_{k}',x_{k+1},\dots,x_{N})|\le c_{k},\\ &\sup_{x_{1},\dots,x_{N}\in C,\ x_{k}'\in C}|\phi(x_{1},\dots,x_{N})-\phi(x_{1},\dots,x_{k-1},x_{k}',x_{k+1},\dots,x_{N})|\le b.\end{aligned}$$

Then for any ε > 0,

$$\mathbb{P}\{|\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]|\ge\varepsilon\}\le 2\Big(e^{-\varepsilon^{2}/8\sum_{i=1}^{N}c_{i}^{2}}+\frac{N^{2}b\delta}{\sum_{i=1}^{N}c_{i}}\Big).$$

Lemma 6

(Kutin [22]) Let $X_{1},\dots,X_{N}$ be independent random variables, each taking values in a set $C$. Let $\phi:C^{N}\to\mathbb{R}$ satisfy the two condition inequalities of Lemma 5 with $\frac{\lambda_{k}}{N}$ substituted for $c_{k}$ and $e^{-KN}$ substituted for $\delta$. If $0<\varepsilon\le\min_{k}T(b,\lambda_{k},K)$ and $N\ge\max_{k}\Delta(b,\lambda_{k},K,\varepsilon)$, then

$$\mathbb{P}\{|\phi(X_{1},\dots,X_{N})-\mathbf{E}[\phi(X_{1},\dots,X_{N})]|\ge\varepsilon\}\le 4e^{-\varepsilon^{2}N^{2}/40\sum_{i=1}^{N}\lambda_{i}^{2}},$$

where

$$T(b,\lambda_{k},K)=\min\Big\{\frac{15\lambda_{k}}{2},\,4\lambda_{k}\sqrt{K},\,\frac{\lambda_{k}^{2}K}{b}\Big\},\qquad\Delta(b,\lambda_{k},K,\varepsilon)=\max\Big\{\frac{b}{\lambda_{k}},\,\lambda_{k}\sqrt{40},\,3\Big(\frac{24}{K}+3\Big)\ln\Big(\frac{24}{K}+3\Big),\,\frac{1}{\varepsilon}\Big\}.$$

The main result in this subsection is stated as follows.

Theorem 7

Let A be our ontology algorithm whose output on an ontology training sample Z is denoted by fs. Let l be an ontology loss function such that 0 ≤ l( f ,·) ≤ Ξ for all f and

$$M^{*}=2\beta_{3}+\frac{4g_{\mathbf{W}_{S}}}{N}+\frac{8\sqrt{2}\,LM^{2}\Big(\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}(\|\theta_{S}\|_{1}-\|\theta_{\mathscr{N}}\|_{1})}+\sqrt{g_{\mathbf{W}_{S}}+\gamma_{3}\|\theta_{S}\|_{1}}\Big)}{\sqrt{\gamma_{1}}\,N}.$$

1) Let $\beta_{3}$ be such that our ontology algorithm A has strong loss stability $(\beta_{3},\delta_{1})$. Then for any $0<\delta<1$, with probability at least $1-\delta$, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\Xi+\sqrt{8N(M^{*})^{2}\ln\frac{2(M^{*})^{2}}{8(M^{*})^{2}-4N\Xi\delta_{1}}}.$$

2) Let $\beta_{4}$ be such that our ontology algorithm A has weak loss stability $(\beta_{4},\delta_{2})$. If

$$0<\varepsilon\le\min\Big\{\frac{15NM^{*}}{2},\,4NM^{*}\sqrt{\frac{\ln(1/\delta_{1})}{N}},\,\frac{N^{2}(M^{*})^{2}\frac{\ln(1/\delta_{2})}{N}}{2\Xi}\Big\},$$

and

$$N\ge\max\Big\{\frac{2\Xi}{NM^{*}},\,NM^{*}\sqrt{40},\,3\Big(\frac{24N}{\ln(1/\delta_{2})}+3\Big)\ln\Big(\frac{24N}{\ln(1/\delta_{2})}+3\Big),\,\frac{1}{\varepsilon}\Big\}.$$

Then, for any 0 < δ < 1, with probability at least 1 − δ, we have

$$R(\mathbf{W}_{\mathscr{N}})-R_{\mathscr{N}}(\mathbf{W}_{\mathscr{N}})\le\Xi+\sqrt{\frac{40N^{2}(M^{*})^{2}\ln(\frac{4}{\delta})}{N}}.$$

The proof of Theorem 7 mainly follows [20, 21]; we skip the details here.

Experiments

In this section, we design five simulation experiments concerning ontology similarity measuring and ontology mapping. In the experiments, we select the square loss as the ontology loss function. To ensure the accuracy of the comparison, we implemented our algorithm in C++ using the LAPACK and BLAS libraries for the linear algebra computations. The five experiments were run on a dual-core CPU with 8 GB of memory.

Ontology similarity measure experiment on plant data

We use O1, a plant “PO” ontology, in the first experiment. It was constructed at www.plantontology.org, and its structure is presented in Fig. 1. We use P@N (precision ratio; see Craswell and Hawking [5]) to measure the quality of the experiment data. First, experts give the closest N concepts for every vertex on the ontology graph in the plant field. Then we obtain the first N concepts for every vertex on the ontology graph by our algorithm and compute the precision ratio.
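
One plausible reading of the P@N computation described above, with toy data: for each vertex, the fraction of the expert's closest N concepts that also appear among the algorithm's first N returned concepts, averaged over all vertices. The concept names below are hypothetical.

```python
def precision_at_n(expert_lists, predicted_lists, n):
    """Average P@N over all vertices of the ontology graph."""
    ratios = []
    for v, expert in expert_lists.items():
        predicted = predicted_lists[v][:n]
        ratios.append(len(set(expert[:n]) & set(predicted)) / n)
    return sum(ratios) / len(ratios)

# Toy usage (hypothetical concepts):
expert = {"leaf": ["blade", "stem", "root"]}
predicted = {"leaf": ["blade", "root", "flower"]}
print(precision_at_n(expert, predicted, 3))   # 2 of 3 expert concepts recovered
```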

Fig. 1

The Structure of “PO” Ontology.

Meanwhile, we apply the ontology methods in [12], [13] and [14] to the “PO” ontology. After computing the average precision ratios of these three algorithms, we compare them with the results of our algorithm. Part of the data is shown in Table 1.

Tab. 1. The Experiment Results of Ontology Similarity Measure

| | P@3 average precision ratio | P@5 average precision ratio | P@10 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.5358 | 0.6517 | 0.8821 |
| Algorithm in [12] | 0.4549 | 0.5117 | 0.5859 |
| Algorithm in [13] | 0.4282 | 0.4849 | 0.5632 |
| Algorithm in [14] | 0.4831 | 0.5635 | 0.6871 |

When N = 3, 5 or 10, the precision ratios obtained by our algorithm are somewhat higher than those determined by the algorithms proposed in [12], [13] and [14]. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm proves to be more effective than those raised in [12], [13] and [14].

Ontology mapping experiment on humanoid robotics data

“Humanoid robotics” ontologies O2 and O3 are used in the second experiment. The structures of O2 and O3 are presented in Fig. 2 and Fig. 3, respectively. Ontology O2 presents the leg joint structure of a bionic walking device for a six-legged robot, and ontology O3 presents the exoskeleton frame of a robot with wearable, power-assisted lower extremities.

Fig. 2

“Humanoid Robotics” Ontology O2.

Fig. 3

“Humanoid Robotics” Ontology O3.

Fig. 4

The Structure of “GO” Ontology.

We set up this experiment aiming to obtain an ontology mapping between O2 and O3, and the P@N precision ratio is taken as a measure of the quality of the experiment. After applying the ontology algorithms in [24], [13] and [14] to the “humanoid robotics” ontologies and computing their average precision ratios, we compare the precision ratios obtained by these three methods with ours. Some results are shown in Table 2.

Tab. 2. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.2778 | 0.5000 | 0.7667 |
| Algorithm in [24] | 0.2778 | 0.4815 | 0.5444 |
| Algorithm in [13] | 0.2222 | 0.4074 | 0.4889 |
| Algorithm in [14] | 0.2778 | 0.4630 | 0.5333 |

When N = 1, 3 or 5, the precision ratios obtained by our new ontology algorithm are at least as high as those determined by the algorithms proposed in [24], [13] and [14], and clearly higher for N = 3 and N = 5. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm is more efficient than those raised in [24], [13] and [14].

Ontology similarity measure experiment on biology data

Gene “GO” ontology O4 is used in the third experiment; it was constructed at the website http://www.geneontology.org. We present the structure of O4 in Fig. 4. Again, P@N is chosen as the measure of the quality of the experiment data. We then apply the ontology methods in [13], [14] and [25] to the “GO” ontology. After computing the average precision ratios of these three algorithms, we compare them with the results of our algorithm. Part of the data is shown in Table 3.

Tab. 3. The Experiment Results of Ontology Similarity measure

| | P@3 average precision ratio | P@5 average precision ratio | P@10 average precision ratio | P@20 average precision ratio |
|---|---|---|---|---|
| Our Algorithm | 0.4987 | 0.6364 | 0.7602 | 0.8546 |
| Algorithm in [13] | 0.4638 | 0.5348 | 0.6234 | 0.7459 |
| Algorithm in [14] | 0.4356 | 0.4938 | 0.5647 | 0.7194 |
| Algorithm in [25] | 0.4213 | 0.5183 | 0.6019 | 0.7239 |

When N = 3, 5, 10 or 20, the precision ratios obtained by our ontology algorithm are higher than those determined by the algorithms proposed in [13], [14] and [25]. Furthermore, the precision ratios increase noticeably as N increases. As a result, our algorithm turns out to be more effective than those raised in [13], [14] and [25].

Ontology mapping experiment on physics education data

“Physics education” ontologies O5 and O6 are used in the fourth experiment. We respectively present the structures of O5 and O6 in Fig. 5 and Fig. 6.

Fig. 5

“Physics Education” Ontology O5.

Fig. 6

“Physics Education” Ontology O6.

We set up this experiment aiming to obtain an ontology mapping between O5 and O6, and the P@N precision ratio is taken as a measure of the quality of the experiment. The ontology algorithms in [13], [14] and [26] are applied to the “physics education” ontologies, and the precision ratios obtained by these three methods are compared with ours. Some results are shown in Table 4.

Tab. 4. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.6913 | 0.7556 | 0.9161 |
| Algorithm in [13] | 0.6129 | 0.7312 | 0.7935 |
| Algorithm in [14] | 0.6913 | 0.7556 | 0.8452 |
| Algorithm in [26] | 0.6774 | 0.7742 | 0.8968 |

When N = 1 or 3, the precision ratios of our new ontology mapping algorithm are comparable to those determined by the algorithms proposed in [13], [14] and [26], and for N = 5 ours is clearly higher. Furthermore, the precision ratios increase as N increases. As a result, our algorithm shows more effectiveness than those raised in [13], [14] and [26], especially for larger N.

Ontology mapping experiment on university data

“University” ontologies O7 and O8 are applied in the last experiment. We present the structures of O7 and O8 in Fig. 7 and Fig. 8.

Fig. 7

“University” Ontology O7.

Fig. 8

“University” Ontology O8.

We set up this experiment aiming to obtain an ontology mapping between O7 and O8, and the P@N precision ratio is taken as a criterion to measure the quality of the experiment. The ontology algorithms in [12], [13] and [14] are applied to the “University” ontologies, and the precision ratios obtained by these three methods are compared with ours. Some results are shown in Table 5.

Tab. 5. The Experiment Results of Ontology Mapping

| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Our Algorithm | 0.5714 | 0.6786 | 0.7714 |
| Algorithm in [12] | 0.5000 | 0.5952 | 0.6857 |
| Algorithm in [13] | 0.4286 | 0.5238 | 0.6071 |
| Algorithm in [14] | 0.5714 | 0.6429 | 0.6500 |

When N = 1, the precision ratio of our new ontology mapping algorithm matches the best of the algorithms proposed in [12], [13] and [14], and for N = 3 and N = 5 it is clearly higher. Furthermore, the precision ratios increase as N increases. As a result, our algorithm turns out to be more effective than those raised in [12], [13] and [14].

Conclusions

In this paper, a new ontology learning framework and its optimization approach are presented for ontology similarity calculation and ontology mapping. The new ontology algorithm is based on distance function learning. We also present a stability analysis and generalization bounds for the ontology learning algorithm. Finally, the data from five simulation experiments show that our new ontology learning algorithm is highly efficient in these engineering applications. The distance-learning-based ontology algorithm proposed in this paper thus shows promising application prospects across multiple disciplines.
