A study of local smoothness-informed convolutional neural network models for image inpainting

Published Online: 14 Apr 2022
Received: 25 Dec 2021
Accepted: 10 Apr 2022
Introduction

Image inpainting aims to fill an undetectable domain, and is applicable to object removal, restoration of corrupted pictures, etc. In practice, only the known information can be used to estimate the undetectable information, so as to obtain an inpainting that blends into the surroundings. In other words, inpainting can be viewed as a prediction process.

Diffusion is one of the main ideas for inpainting: the undetectable domain is filled from its surroundings according to a diffusion law. Diffusion-based inpainting such as [1,2,3] is completed by solving a nonlinear diffusion equation. The total variation (TV) inpainting model of Chan and Shen [2] is well known for image inpainting and was later developed by the same authors into the Curvature-Driven Diffusions (CDD) inpainting model [3]. The TV serves as a regularizer and was originally designed by Rudin et al. [4] for noise removal with the aim of preserving image edges. The convolutional algorithm proposed by Oliveira et al. [5] is also a diffusion approach.

Diffusion-based algorithms perform well on smooth regions. For textured images, exemplar-matching algorithms perform well; see, for example, [6, 7]. The main idea of exemplar matching is to search for the most similar patches to fill the undetectable domain. In recent years, deep learning has shown promise in more challenging inpainting tasks. The Context Encoders (CE) proposed by Pathak et al. [8] applied generative adversarial networks (GAN) [9] to restore complex features and motivated many studies. Yeh et al. [10] adopted Poisson blending [11] in a second training stage of the generative neural network to enhance the inpainting. Unlike [10], which improves inpainting performance via a second training stage, Iizuka et al. [12] considered the combination of global and local discriminators simultaneously. For GAN-based inpainting, it is mainly the feature discrimination that enhances the performance of the generative model.

In this paper, we propose local smoothness-informed (regularized) convolutional neural network (CNN) models for image inpainting and then provide a comparative study of them with various versions of the TV model. In contrast to artificially defined convolutional kernels, the learned kernels are determined by the image itself. To determine the convolutional kernels, we consider two types of instructive knowledge. One is the TV. Traditionally, TV-based inpainting is completed by iteratively solving the associated Euler-Lagrange equation and is used for noise removal as well. For inpainting without noise removal, a TV term may still act as a regularization. We also propose to study an inpainting based on an H1 or H2 regularization term (see a similar idea in [10]).

With a deep learning method, the inpainting is determined by the training data in the detectable domain. Hence, for an image composed of various gray intensities and structures, it is difficult or even impossible to discover a representation of the global image. Additionally, only the data relevant to the surroundings of the undetectable domain contributes to the feature estimation. Since the computational cost is proportional to the image size, we take the local image containing the damaged domain as the input, with the aim of mostly using the relevant data while largely reducing the computational time. Unlike [10], in our implementation the H1 or H2 regularization is applied only locally, around the boundary of the undetectable region, in order to enhance the smoothness of the image from the detectable part to the undetectable part. In addition, only the observed data of a single input image is used in this comparative study. We do not use (and, in fact, one does not usually have) the numerous nonlocal training data required by most neural-network-based inpainting. We only consider the CNN with single-image training data and discuss the effect of the TV and of our proposed simple local (smoothness-enhancing) H1 or H2 regularization associated with the traditional least-squares loss function.

In general, this paper presents the following studies:

A deep learning method for the Euler-Lagrange equation, which differs from point-based numerical iterative methods; the learning method fills the domain automatically, without computing complicated derivatives on the surroundings.

Exploration of the local H1 and H2 regularization, applied on the sub-domain most connected to the undetectable domain, for the learning.

Learning without extra images, which saves training time.

The rest of this paper is organized as follows. In Section 2, we mathematically describe the inpainting problem. In Section 3, we describe the CNN method with the TV and our proposed regularization terms, along with the numerical computational details for image inpainting. Results are presented in Section 4. Discussions and conclusions are given in Section 5.

Problem setup

Given merely a damaged image, inpainting reconstructs the information of the damaged parts using the observed data of the image. Inpainting can be described mathematically as follows.

Consider a damaged gray-level image on a domain Ω as in Fig. 1, divided into an undetectable domain Ω1 and a detectable domain Ω2. Denote by u(x, y) the complete gray-level image, with (x, y) ∈ Ω the pixel coordinate. Setting the undetectable values in Ω1 to 0, the damaged image can be represented as
$$f(x,y) = h(x,y)\,u(x,y),$$
where h(x, y) equals 0 in Ω1 and 1 in Ω2. Inpainting aims to recover the gray values in Ω1.
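As a concrete illustration of this setup, a mask h and a damaged image f = h·u can be formed as in the following Python sketch; the rectangular hole location and the random stand-in image are illustrative assumptions, not values fixed by the paper.

```python
import numpy as np

def make_damaged_image(u, top, left, height, width):
    """Build the mask h (1 on the detectable domain, 0 on the hole) and f = h * u.

    u is a gray-level image as a 2-D array; the rectangular hole location and size
    are illustrative parameters, not values fixed by the paper.
    """
    h = np.ones_like(u, dtype=float)
    h[top:top + height, left:left + width] = 0.0   # undetectable domain Omega_1
    f = h * u                                      # damaged image, zero inside the hole
    return h, f

# example: a 40 x 40 local image with a 24 x 24 hole, as in the experiments
u = np.random.rand(40, 40)          # stand-in for the true local image
h, f = make_damaged_image(u, top=8, left=8, height=24, width=24)
```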

Fig. 1

A damaged image. Ω1 is the undetectable domain, Ω2 is the detectable domain.

Methodology

For the inpainting problem, we explore a CNN approach with the TV and the local smoothness constraints separately. The main idea of local smoothness preservation is to condition, at the training stage, the derivatives of the surroundings of the undetectable domain on the given image. The remaining parts of this section present the designed CNN for inpainting and the computational details.

Convolutional neural network

CNNs have been designed since the 1980s and were later developed in diverse directions, for example the well-known CNNs of Fukushima [13] and Zhang et al. [14]. We apply a CNN to gray-level image inpainting and design it so that every layer produces an output of the same size as its input, to avoid data loss. The CNN accepts one damaged image and outputs one inpainted image; from input to output, the image is convolved with the kernels and activated. The kernels stride over the image by one step to complete the convolution of the whole image, and the Rectified Linear Unit [15] is used as the activation operator. The main computational details and the architecture of the CNN are as follows.

Consider a CNN, denoted g(x, y; w, b), with L layers whose kernels are of odd size. The gray value at location (x, y) of the ith output of the (l + 1)th layer reads
$$g_i^{l+1}(x,y;\,w_i^{l+1},b_i^{l+1})=\sigma\left(\sum_{k=1}^{K^l}\sum_{p=-s}^{s}\sum_{q=-s}^{s} g_k^{l}(x+p,\,y+q;\,w_i^{l},b_i^{l})\,w_{ik}^{l+1}(c+p,\,c+q)+b_i^{l+1}\right),$$
$$s=\frac{S-1}{2},\qquad c=\frac{S+1}{2},$$
where σ is the activation operator, $K^l$ is the number of outputs of the lth layer, $w_{ik}^{l+1}$ denotes the kth kernel corresponding to the ith output of the (l + 1)th layer, $b_i^{l+1}$ denotes the ith component of the bias vector of the (l + 1)th layer, $g^l$ is the input of the (l + 1)th layer, l = 0, ⋯, L − 1, S is the kernel size, s is the offset of the convolution from the center of the kernel, and c is the center of the kernel.

Training the CNN aims to learn the kernels and biases using the loss function presented in Section 3.2.
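The following Keras sketch shows one possible realization of such a same-size CNN; the 39 hidden layers with 16 kernels each follow the framework described in Fig. 2 and the 40 × 40 input follows the local images used in the experiments, while the 3 × 3 kernel size, the linear output layer and all other hyper-parameters are assumptions made here for illustration only.

```python
import tensorflow as tf

def build_inpainting_cnn(W, hidden_layers=39, filters=16, kernel_size=3):
    """Same-size CNN: every layer keeps the W x W spatial size ('same' padding, stride 1)."""
    inp = tf.keras.Input(shape=(W, W, 1))                      # damaged gray-level image f
    x = inp
    for _ in range(hidden_layers):
        x = tf.keras.layers.Conv2D(filters, kernel_size,
                                   padding="same", activation="relu")(x)
    # linear output layer producing the estimate g; activating it is optional here
    out = tf.keras.layers.Conv2D(1, kernel_size, padding="same")(x)
    return tf.keras.Model(inp, out)

model = build_inpainting_cnn(W=40)
```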

Instructive knowledge

Let g(x, y) be the unknown function representing the inpainted image; inpainting aims to determine g(x, y). The main idea of deep learning methods for image inpainting is to generate an estimate from the available data in Ω2 and to train the neural network (finding the kernels and biases) via loss minimization. The basic loss function is the least-squares loss
$$L_d = \int_\Omega h\,(g-u)^2\,dx\,dy.$$
However, this loss function lacks connectivity preservation between Ω1 and its surroundings. Without connectivity, the estimate in Ω1 has weaker smoothness, which usually results in odd or irregular reconstructions. The TV is widely used for edge preservation and low-order smoothness, and was originally designed for noise removal by Rudin et al. [4]. The TV-based energy functional for inpainting (see, e.g., [2, 16]), with f = hu, is given by
$$L_{cs} = \int_\Omega \left( \frac{h}{2}\,(g-u)^2 + \lambda\,|\nabla g| \right) dx\,dy,$$
where λ > 0 is a weighting coefficient and ∇ is the gradient operator. We may choose a very small λ if noise removal is not a major task.

The variational problem of finding a solution g is to minimize the energy functional $L_{cs}$. Ultimately, determining g amounts to solving the associated Euler-Lagrange equation
$$h\,(g-u) - \lambda\, \nabla\cdot\left( \frac{\nabla g}{\sqrt{|\nabla g|^2 + \varepsilon}} \right) = 0$$
with ɛ = 0. Here ɛ > 0 is a small parameter introduced in the numerical implementation to avoid division by zero. We set ɛ according to the flatness of the detectable domain, since the estimate depends on the given information, and ɛ is given by
$$\varepsilon = \exp\left( -\frac{1}{A} \int_{\Omega_2} \frac{|\nabla g|^2}{n^2}\, dx\,dy \right),$$
where A is the total area of Ω and n > 0 adjusts the gradient magnitude. The parameter ɛ ∈ (0, 1] thus depends inversely on the flatness, preserving the role of the TV in the spirit of edge detection [17]. For comparison, in the implementation we actually minimize the shifted TV of $L_{cs}$, namely
$$\sqrt{|\nabla g|^2 + \varepsilon},$$
where ɛ again avoids division by zero in the minimization of $L_{cs}$. Traditionally, the Euler-Lagrange equation is solved by a numerical iterative algorithm. Let
$$\frac{\partial g(x,y;t)}{\partial t} = h\,\big(g(x,y;t) - u\big) - \lambda\, \nabla\cdot\left( \frac{\nabla g(x,y;t)}{\sqrt{|\nabla g(x,y;t)|^2 + \varepsilon}} \right)$$
denote the spatial variation at temporal level t, where t is an artificially introduced temporal variable for the numerical realization; the final solution g satisfies ∂g(x, y; t)/∂t → 0. However, for the convenience of the optimization algorithm we mainly use to train the CNN in this paper (see the next subsection), we convert the Euler-Lagrange equation into the minimization of the loss function
$$L_{eu} = \int_\Omega \left| h\,(g-u) - \lambda\, \nabla\cdot\left( \frac{\nabla g}{\sqrt{|\nabla g|^2 + \varepsilon}} \right) \right|^2 dx\,dy.$$
Consequently, solving the variational problem for the energy functional $L_{cs}$ is shifted to the minimization problem for the loss function $L_{eu}$. Note that there might be slight differences between their solutions, due to the addition of the small ɛ.
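As an illustration of how ɛ may be computed from the detectable data, the following sketch evaluates the formula above with a simple finite-difference gradient of the observed image over Ω2; using the observed image f in place of g on Ω2, and using np.gradient for ∇, are assumptions made here for simplicity.

```python
import numpy as np

def estimate_epsilon(f, h, n=100.0):
    """Heuristic epsilon from the flatness of the detectable domain (a sketch).

    f : damaged image (gray values), h : mask (1 = detectable, 0 = undetectable),
    n : gradient-scaling constant; the finite-difference gradient is an assumption.
    """
    gy, gx = np.gradient(f.astype(float))            # gradients along rows (y) and columns (x)
    grad_sq = (gx ** 2 + gy ** 2) / n ** 2
    A = f.size                                       # total number of pixels (area of Omega)
    mean_grad = (grad_sq * h).sum() / A              # integrate over the detectable domain only
    return np.exp(-mean_grad)                        # epsilon in (0, 1]
```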

The TV on the whole domain Ω was originally intended for noise removal. For image inpainting without noise removal, applied to the partially unknown domain, it mainly serves to enhance the connectivity between the corrupted domain and the detectable domain. In this paper, we explore a local H1 or H2 regularization for this connectivity, which captures one of the necessary geometrical features of the images. The regularization conditions the low-order derivatives on the sub-domain of the detectable domain Ω2 most connected to Ω1 on the given function f (inspired by [10]), preserving various levels of smoothness to enhance the connection between Ω1 and its surroundings. Accordingly, g is subject to
$$Dg(x,y) = Du(x,y), \qquad (x,y) \in \Omega_3,$$
where D is a first- or second-order differential operator and Ω3 ⊂ Ω2 is a band around Ω1. Assume f ∈ C²(Ω3). Theoretically, the first-order differential operator ∇ (which we call the local H1 regularization) enhances the smoothness of g approaching f, and the second-order differential operator Δ (which we call the local H2 regularization) yields an even smoother connection.

Denote Vg(x, y) = Dg(x, y) and Vu(x, y) = Du(x, y). Combined with the loss function Ld, determining g amounts to minimizing the loss
$$L_D = \int_\Omega h\,(g-u)^2\,dx\,dy + \beta \int_{\Omega_3} |V_g - V_u|^2\,dx\,dy,$$
where β is a penalty coefficient.

Algorithm

In this section, we present the computational details of the deep learning models described above for image inpainting. The integrals are discretized as summations, and the derivatives are approximated by finite differences. To avoid data loss, we zero-pad the boundaries of the image matrix.

Denote by f = f(x, y) the input image, w the weight matrices representing the kernels, b the bias vectors, and g = g(x, y; w, b) the estimate produced by the CNN. In this subsection, (x, y) ∈ [0, W) × [0, W) ⊂ ℕ × ℕ is the coordinate of any pixel (discrete point) of the image and W is the side length of the image.

To compute the gradient for the TV, we use the Sobel operators [18] {Sx, Sy} to approximate the first-order partial derivatives gx and gy, where
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}.$$
With these operators, for all (x, y) ∈ [0, W) × [0, W),
$$g_x(x,y) \approx (S_x * g)(x,y), \qquad g_y(x,y) \approx (S_y * g)(x,y),$$
where * denotes the convolution operator.

To compute the second-order partial derivatives, we apply the operators Sx and Sy twice, so the divergence is approximated as
$$\nabla\cdot\left( \frac{\nabla g}{\sqrt{|\nabla g|^2 + \varepsilon}} \right)(x,y) \approx S_x * \left( \frac{g_x}{\sqrt{|\nabla g|^2 + \varepsilon}} \right)(x,y) + S_y * \left( \frac{g_y}{\sqrt{|\nabla g|^2 + \varepsilon}} \right)(x,y).$$
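A possible TensorFlow realization of these Sobel convolutions and of the approximated divergence is sketched below; the image g is assumed to be a single-channel tensor of shape (1, W, W, 1), and the 'SAME' padding plays the role of the zero-padding mentioned above.

```python
import tensorflow as tf

# 3x3 Sobel kernels, shaped (height, width, in_channels, out_channels) for tf.nn.conv2d
SX = tf.constant([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])[:, :, None, None]
SY = tf.constant([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]])[:, :, None, None]

def sobel_grad(g):
    """Approximate g_x and g_y by Sobel convolutions ('SAME' padding keeps the image size)."""
    gx = tf.nn.conv2d(g, SX, strides=1, padding="SAME")
    gy = tf.nn.conv2d(g, SY, strides=1, padding="SAME")
    return gx, gy

def tv_divergence(g, eps):
    """div( grad g / sqrt(|grad g|^2 + eps) ), approximated by applying the Sobel operators twice."""
    gx, gy = sobel_grad(g)
    norm = tf.sqrt(gx ** 2 + gy ** 2 + eps)
    return (tf.nn.conv2d(gx / norm, SX, strides=1, padding="SAME")
            + tf.nn.conv2d(gy / norm, SY, strides=1, padding="SAME"))
```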

The H1 and H2 regularizations on Ω3 are written, respectively, as
$$|V_g - V_u|^2 = |\nabla g - \nabla u|^2 \qquad \text{or} \qquad |V_g - V_u|^2 = |\Delta g - \Delta u|^2.$$

Near the boundary of Ω1 it is not convenient to compute derivatives with the operators Sx and Sy, so for all (x, y) ∈ Ω3 we approximate the derivatives as
$$g_x(x,y) \approx (D_1 * g)(x,y), \qquad g_y(x,y) \approx (D_1^T * g)(x,y),$$
$$u_x(x,y) \approx (D_1 * u)(x,y), \qquad u_y(x,y) \approx (D_1^T * u)(x,y),$$
where
$$D_1 = [-1,\ 1].$$
For the rectangular damaged domains considered in this paper, the second-order partial derivatives are approximated by the second-order central difference operator, or by applying $D_1$ or $D_1^T$ twice.

Let d = [0, W) × [0, W), let d2 ⊂ d be the discrete Ω2, and let d3 ⊂ d2 be the discrete Ω3. With the instructive knowledge presented above, determining g amounts to solving the optimization problems
$$\min_{w,b} \tilde L_d, \qquad \min_{w,b} \tilde L_{eu}, \qquad \min_{w,b} \tilde L_{cs}, \qquad \min_{w,b} \tilde L_D$$
separately, where
$$\tilde L_d = \frac{1}{W^2} \sum_{(x,y) \in d} h\,(g-u)^2,$$
$$\tilde L_{eu} = \frac{1}{W^2} \sum_{(x,y) \in d} \left[ h\,(g-u) - \lambda\, \nabla\cdot\left( \frac{\nabla g}{\sqrt{|\nabla g|^2 + \varepsilon}} \right) \right]^2,$$
$$\tilde L_{cs} = \frac{1}{W^2} \sum_{(x,y) \in d} \left[ \frac{h}{2}\,(g-u)^2 + \lambda \sqrt{g_x^2 + g_y^2 + \varepsilon} \right],$$
$$\tilde L_D = \frac{1}{W^2} \sum_{(x,y) \in d} h\,(g-u)^2 + \frac{\beta}{W^2} \sum_{(x,y) \in d_3} |V_g - V_u|^2.$$
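The discrete data loss and the locally regularized loss $\tilde L_D$ (with the H1 choice of D) could be evaluated as in the following sketch; the masks h and m3 (a 0/1 indicator of d3), the slicing-based realization of the forward difference D1, and the use of reduce_mean (which normalizes by the number of summed terms rather than exactly by W²) are illustrative assumptions.

```python
import tensorflow as tf

def data_loss(g, u, h):
    """Discrete least-squares loss over the detectable domain (mask h)."""
    return tf.reduce_mean(h * (g - u) ** 2)

def forward_diff(t):
    """First-order forward differences D1 * t in x and y (one column/row shorter)."""
    dx = t[:, :, 1:, :] - t[:, :, :-1, :]
    dy = t[:, 1:, :, :] - t[:, :-1, :, :]
    return dx, dy

def local_h1_loss(g, u, h, m3, beta):
    """L_D with the local H1 term: data loss plus beta * |grad g - grad u|^2 on Omega_3."""
    gx, gy = forward_diff(g)
    ux, uy = forward_diff(u)
    m3x = m3[:, :, 1:, :] * m3[:, :, :-1, :]     # keep differences whose endpoints lie in Omega_3
    m3y = m3[:, 1:, :, :] * m3[:, :-1, :, :]
    reg = tf.reduce_mean(m3x * (gx - ux) ** 2) + tf.reduce_mean(m3y * (gy - uy) ** 2)
    return data_loss(g, u, h) + beta * reg
```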

We adopt the Adam algorithm [19] to update the parameters. The computational process with the loss function $\tilde L_D$ is stated in Algorithm 1.

Algorithm 1: Smoothness-Informed Deep Learning for Image Inpainting

Input: damaged gray-level image f(x, y).
Output: inpainted image g(x, y).
1: Give the required parameters α, β1, β2 of the Adam algorithm.
2: Initialize the parameters w, b.
3: while not converged do
4:     Produce the estimate: g ← g(x, y; w, b).
5:     Compute the loss: $L \leftarrow \frac{1}{W^2}\sum_{(x,y)\in d} h\,(g-u)^2 + \frac{\beta}{W^2}\sum_{(x,y)\in d_3} |V_g - V_u|^2$.
6:     Update the parameters with the Adam algorithm in TensorFlow [20].
7: return the trained parameters w, b.
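A minimal TensorFlow training loop corresponding to Algorithm 1 might look as follows; the model and loss refer to the illustrative sketches above, and the learning rate and iteration budget are assumptions rather than values reported in the paper. Since f = h·u coincides with u on the detectable domain (which is all the loss reads), the damaged image f can serve as the reference.

```python
import tensorflow as tf

def train_inpainting(model, f, h, m3, beta=0.001, lr=1e-3, max_iters=5000):
    """Train the CNN on the single damaged image f by minimizing the loss L_D (a sketch)."""
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(max_iters):
        with tf.GradientTape() as tape:
            g = model(f, training=True)                 # current estimate g(x, y; w, b)
            loss = local_h1_loss(g, f, h, m3, beta)     # data term + local smoothness term
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
    return model(f, training=False)                     # inpainted image
```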
Results

We present the implementation details and results of the smoothness-informed deep learning for image inpainting in this section. The algorithm is implemented in Python. We test these models on photos captured by ourselves (Fig. 3) and, for comparison, on images from [16] (Fig. 4). The shape and size of the damaged domains follow [10, 16]. For the damaged blocks in Fig. 3, the input images are 40 × 40 and the damaged domains are 24 × 24. For the damaged blocks in Fig. 4, the input images are 60 × 60 and the damaged domains are 32 × 32. The inpainting results are shown in Figs. 5 and 6. We evaluate the inpainting by the Peak Signal-to-Noise Ratio (PSNR) [21] and the Structural Similarity (SSIM) [22], defined by
$$PSNR(g,u) = 10 \log_{10}\left( \frac{255^2}{MSE} \right),$$
$$SSIM(g,u) = \frac{(2\mu_g\mu_u + C_1)(2\sigma_{gu} + C_2)}{(\mu_g^2 + \mu_u^2 + C_1)(\sigma_g^2 + \sigma_u^2 + C_2)},$$
where
$$MSE(g,u) = \frac{1}{W^2} \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} (g_{i,j} - u_{i,j})^2$$
is the mean square error, and g and u denote the inpainted image and the original image, respectively. We remark that here g and u contain only the inpainted domain, and W denotes the side length of this domain. A higher PSNR means less noise and a larger SSIM means a higher structural similarity. In the implementation, the constants are C1 = 0.050625 and C2 = 0.2601.
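For reference, the two metrics with the stated constants can be computed as in the sketch below; the global-statistics form of SSIM (means, variances and covariance taken over the whole inpainted block) is an assumption about the implementation.

```python
import numpy as np

C1, C2 = 0.050625, 0.2601   # constants stated in the implementation

def psnr(g, u):
    """Peak signal-to-noise ratio over the inpainted block (255 gray-level range)."""
    mse = np.mean((g.astype(float) - u.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ssim(g, u):
    """SSIM with means, variances and covariance computed over the whole block."""
    g, u = g.astype(float), u.astype(float)
    mu_g, mu_u = g.mean(), u.mean()
    var_g, var_u = g.var(), u.var()
    cov = ((g - mu_g) * (u - mu_u)).mean()
    return ((2 * mu_g * mu_u + C1) * (2 * cov + C2)) / \
           ((mu_g ** 2 + mu_u ** 2 + C1) * (var_g + var_u + C2))
```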

Discussions and conclusions

In this study, we designed a smoothness-informed CNN approach for image inpainting, in which the TV and the local H1 or H2 regularization (smoothness) were considered. Deep learning for the Euler-Lagrange equation associated with the TV inpainting model was also investigated. The quantitative evaluation results in Table 1 show that the H2 regularization can match, and perhaps achieve better estimates than, the TV regularization for local image inpainting.

Fig. 2

Computational Framework. The 0th layer is the input (damaged image), the 39 hidden layers contain the kernels and biases, each hidden layer has 16 neurons, and the last layer outputs the estimate of the real image. W × W denotes the size of the image. The upper backward line indicates that each output is fed back to the input layer for the next training iteration; at the first training iteration, g0 = f. Each formula denotes the gray value of a point of the image after convolution and activation; * denotes the convolution symbol.

Fig. 3

Images captured by the authors. (a) denotes the complete images, (b) denotes the damaged images.

Fig. 4

The left image is from [16], the right one is the damaged image.

Fig. 5

Inpainting results. (a) column is the complete local image, (b) column is the damaged local image, (c)–(g) columns are the outputs of the CNN trained with Ld, Leu(λ = 0.001, n = 100), Lcs(λ = 0.001, n = 100), LD(H1, β = 0.001), LD(H2, β = 0.001).

Fig. 6

Inpainting results. (a) column is the complete local image, (b) column is the damaged local image, (c)–(g) columns are the outputs of the CNN trained with Ld, Leu(λ = 0.001, n = 100), Lcs(λ = 0.001, n = 100), LD(H1, β = 10, 100, 10, 100, 100 orderly from the first row to the last row), LD(H2,β = 0.001, 0.001, 0.001, 0.1, 1 orderly from the first row to the last row).

Table 1. Quantitative evaluation for the inpainting in Fig. 2/Fig. 3

Models    Quantity    PSNR               SSIM
Ld        8/5         13.0833/12.9284    0.9979/0.7977
Leu       8/5         13.1586/13.6270    0.9983/0.9960
Lcs       8/5         13.1694/13.9076    0.9987/0.9985
LD (H1)   8/5         13.1419/13.6265    0.9966/0.9871
LD (H2)   8/5         13.1984/14.5362    0.9974/0.9990

The inpainted images in Fig. 5 show that the local H1 or H2 regularized (smoothness-informed) model and the Euler-Lagrange-equation-based model may provide sharper structures, especially for textures. Nevertheless, more tests are needed to confirm this.

The deep learning algorithm for the TV inpainting model yields better PSNR and SSIM values for the same image than the Split Bregman algorithm in [16], although this comparison may be influenced by the inpainting regions and the number of images.

Only the detectable data of an image is used for learning by our method, so the method is not limited by a training dataset. The smoothness-informed CNN is applicable to smooth and to some textured images, given appropriate detectable data in the damaged image, and it can deal with large damaged regions as studied in [10, 16] without numerous images for training. The local H1 or H2 regularization enhances the capability of the CNN to learn from the limited detectable data of a single image.

This study mainly shows that, for deep CNN-based image inpainting, the local H1 or H2 regularization, even without the TV, may also enhance the connectivity and uniqueness so as to largely recover the damaged or missing parts of an image. The regularization coefficient and the local smoothness domain play a vital role in the inpainting quality.


References

[1] Bertalmio M, Sapiro G, Caselles V, Ballester C. Image Inpainting. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'00), ACM Press, USA, 2000, pp. 417–424. https://doi.org/10.1145/344779.344972.

[2] Chan T F, Shen J. Mathematical Models for Local Nontexture Inpaintings. SIAM Journal on Applied Mathematics, 2002, 62(3), pp. 1019–1043. https://doi.org/10.2307/3061798.

[3] Chan T F, Shen J. Nontexture Inpainting by Curvature-Driven Diffusions. Journal of Visual Communication and Image Representation, 2001, 12, pp. 436–449. https://doi.org/10.1006/jvci.2001.0487.

[4] Rudin L I, Osher S, Fatemi E. Nonlinear Total Variation Based Noise Removal Algorithms. Physica D: Nonlinear Phenomena, 1992, 60(1–4), pp. 259–268.

[5] Oliveira M M, Bowen B, Mckenna R, Chang Y S. Fast Digital Image Inpainting. In: Proceedings of the International Conference on Visualization, Imaging and Image Processing (VIIP 2001), Marbella, Spain, 2001.

[6] Efros A A, Leung T K. Texture Synthesis by Non-Parametric Sampling. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1033–1038. https://doi.org/10.1109/ICCV.1999.790383.

[7] Xu Z, Sun J. Image Inpainting by Patch Propagation Using Patch Sparsity. IEEE Transactions on Image Processing, 2010, 19(5), pp. 1153–1165. https://doi.org/10.1109/TIP.2010.2042098.

[8] Pathak D, Krähenbühl P, Donahue J, Darrell T, Efros A A. Context Encoders: Feature Learning by Inpainting. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 2536–2544. https://doi.org/10.1109/CVPR.2016.278.

[9] Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative Adversarial Nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14), MIT Press, Cambridge, MA, USA, 2014, vol. 2, pp. 2672–2680.

[10] Yeh R A, Chen C, Lim T Y, Schwing A G, Hasegawa-Johnson M, Do M N. Semantic Image Inpainting with Deep Generative Models. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6882–6890. https://doi.org/10.1109/CVPR.2017.728.

[11] Pérez P, Gangnet M, Blake A. Poisson Image Editing. In: ACM SIGGRAPH 2003 Papers (SIGGRAPH'03), Association for Computing Machinery, New York, NY, USA, 2003, pp. 313–318. https://doi.org/10.1145/1201775.882269.

[12] Iizuka S, Simo-Serra E, Ishikawa H. Globally and Locally Consistent Image Completion. ACM Transactions on Graphics, 2017, 36(4), pp. 1–14. https://doi.org/10.1145/3072959.3073659.

[13] Fukushima K. Neocognitron: A Hierarchical Neural Network Capable of Visual Pattern Recognition. Neural Networks, 1988, 1, pp. 119–130. https://doi.org/10.1016/0893-6080(88)90014-7.

[14] Zhang W, Itoh K, Tanida J, Ichioka Y. Parallel Distributed Processing Model with Local Space-Invariant Interconnections and Its Optical Architecture. Applied Optics, 1990, 29(32), pp. 4790–4797. https://doi.org/10.1364/AO.29.004790.

[15] Nair V, Hinton G E. Rectified Linear Units Improve Restricted Boltzmann Machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML'10), Omnipress, Madison, WI, USA, 2010, pp. 807–814.

[16] Pascal G. Total Variation Inpainting Using Split Bregman. Image Processing On Line, 2012, 2, pp. 147–157. https://doi.org/10.5201/ipol.2012.g-tvi.

[17] Perona P, Malik J. Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(7), pp. 629–639. https://doi.org/10.1109/34.56205.

[18] Sobel I. An Isotropic 3x3 Image Gradient Operator. Presentation at the Stanford A.I. Project, 1968.

[19] Kingma D P, Ba J. Adam: A Method for Stochastic Optimization. CoRR, 2015, abs/1412.6980.

[20] Abadi M, Barham P, Chen J M, et al. TensorFlow: A System for Large-Scale Machine Learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16), USENIX Association, USA, 2016, pp. 265–283.

[21] VQEG. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, March 2000. http://www.vqeg.org/.

[22] Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 2004. https://doi.org/10.1109/TIP.2003.819861.
