
Introduction

Tomb murals, buried underground for thousands of years, vividly reflect the living conditions, social customs, and artistic tastes of ancient royalty. Tomb murals differ from other kinds of murals such as cave murals and temple murals: cave murals are mainly affected by wind, dust, and ultraviolet light, and temple murals are mainly affected by human-made damage, whereas tomb murals lie in completely enclosed spaces before excavation, so the residual information they carry is highly reliable. However, a whole mural is too large to excavate in one piece, so archaeologists always cut the mural into several blocks to move it to the museum. This causes a loss of information between the mural blocks, which affects the coherence and integrity of the entire mural. Figure 1 shows the blocks of the polo-playing mural from the tomb of Crown Prince Zhanghuai.

Fig. 1

Blocks of the tomb mural in Zhanghuai's tomb

This mural is 6.88 m long and 2.29 m high and is divided into 5 blocks. Some information is missing between neighbouring blocks, and restoring the mural directly is difficult because it may lead to stitching dislocation. The first step is therefore to use information extension to recover the missing parts and help reconstruct the whole mural. The current method of computer-aided digital restoration of ancient murals is digital image inpainting. It diffuses the residual mural information toward the edges of the filling region and fills the missing parts layer by layer, like peeling an onion, until the mural information is reconstructed. This technology follows two main technical directions. On the one hand, inpainting algorithms based on partial differential equations (PDE) use different diffusion equations to spread the residual information into the missing parts. For example, Shen and Chan [1] proposed a Total Variation (TV) diffusion equation to solve the diffusion-limited optimization problem. Chan et al. [2] added the Euler-Lagrange equation to Total Variation to constrain the diffusion direction. Humphrey et al. [3] added auxiliary variables and replaced a function of one variable with a function of two variables to improve inpainting efficiency. Barbu et al. [4] proposed a novel TV model based on a second-order PDE using the nonlinear second-order Euler-Lagrange diffusion model. All of these belong to the PDE class of algorithms. Their biggest advantage is that they are convenient and quick to use, but their disadvantage is that they produce blur when the missing part is large, as is the case for the interspace between mural blocks. On the other hand, there are texture synthesis models based on exemplar filling. For example, Criminisi et al. [5] proposed an exemplar-filling algorithm that compares the priority values of the exemplars. [6] changed the exemplar source from global to local using a content-aware Markov model. Siadati et al. [7] redefined the priority value of Criminisi's algorithm by adding the weight of the structure tensor. All of these belong to the exemplar-based class of algorithms. Their disadvantage is that filling with a large number of similar exemplars produces a mosaic effect. Neither PDE-based nor exemplar-based inpainting helps to rebuild the whole mural, because the residual information is limited and the missing parts between the blocks are large. This is a more challenging task than recovering a deleted part of an image: the texture of the extension must be deduced from the information inside the blocks, and the generated information needs to be as realistic as possible. Deep learning offers new techniques for generating such information; for example, Iizuka et al. [8] set up a CGAN that generates information by adding predefined conditions, and Sabini et al. [9] set up a DCGAN that generates information outside the image. This kind of information generation is well suited to recovering the information lost between the blocks. Compared with the inpainting methods, the generative adversarial network is a suitable deep learning technique with an excellent data-fitting effect, which manifests as higher fitting efficiency and sharper generated image samples. However, the traditional GAN is difficult to train and unstable, so it needs to be modified to achieve better results for the restoration of large tomb murals.
So in this paper, we set up a mural generation method based on DCGAN to restore the missing information, and we change the construction of the convolution and deconvolution layers by optimizing the nonlinear activation functions. After adjusting the pooling layers of the GAN structure, the tomb mural blocks are extended toward both sides of the mural to obtain the missing information.

Set up the mural information generating network

The generative adversarial network is a typical deep learning network [10]. Its main parts, the generator module G and the discriminator module D, occupy the two pillar positions of the network architecture. The generator constantly captures input image samples and generates samples similar to them, while the discriminator judges and scores the input data. The work of the generator and discriminator is iterated repeatedly; as a result, the samples generated by the generator become more and more realistic, and it becomes more and more difficult for the discriminator to distinguish real input data from fake. The aim is to reach a Nash equilibrium in which the images generated by the generator come infinitely close to the distribution of the real images, the discriminator cannot distinguish the new images from the original ones, and the prediction probability for the generator's output images is close to 1/2 [11]. Figure 2 shows the essential GAN structure.

Fig. 2

GAN Network Structure Diagram

Both the generator model and the discriminator model shown in Figure 2 can be constructed with differentiable functions. Samples of the original images serve as the input data of the discriminative model, and a random variable serves as the input data of the generative model, whose output G(Z) should be as close as possible to the real data distribution. The goal of the discriminator is to perform binary identification of the data source: if the input comes from real data, the judgment is true; if the input is a sample G(Z), the judgment is false. The goal of the generator is to make the discriminator output D(G(Z)) for its fake data G(Z) the same as the discriminator output D(x) for the real data x.

There are more and more extended models based on GANs, and different models make different improvements [12]. The typical GAN variants have different characteristics; they are compared in Table 1.

The comparison of different GANs

GANS type Improvement points Highlight improvement effect
f-GAN Distance metric The f-divergence metric was used for the discriminator model
WGAN Distance metric Wasserstein distance measurement is adopted to improve the stability of network training and prevent network crash effectively
WGAN-GP Distance metric Optimize the objective function of the identification model
EB-GAN Energy model There are more options for network structure and loss functions
PG-GAN Add incentive convergence The training efficiency is improved to obtain high-quality and diverse generated images
LAP-GAN Add CGAN layer to the Laplace pyramid Increase the number of pixels in the resulting image
CVAE-GAN Fused VAE and GAN Assign a specific label to the image
SGAN Learn both a single generation model and a single semi-supervised classification The classification efficiency of experimental data set is improved and the training time scheduling is more flexible
CGAN Add additional information to the target function in the network Customize the type of image generation
DCGAN Add deep convolution layer and deconvolution layer Recognize more advanced image features to improve network stability performance

The deep convolutional generative adversarial network (DCGAN) largely overcomes the technical difficulties of the original generative adversarial network, namely unstable training and a tendency to model collapse. Therefore, this paper adopts this network model to carry out mural information restoration. The generator is a convolutional neural network [13]. Its input differs from that of an ordinary convolutional neural network in that images replace the random noise, and the convolutional layers mainly use deconvolution instead of the convolution operation. The activation function of the output layer is set to Tanh, and the activation functions of the other layers are set to the nonlinear ReLU. The discriminative model eliminates the fully connected layer and replaces it with another functional layer; a convolutional layer set up in this way can also achieve a good sampling effect. Similar to the generative model, a batch normalization operation is added at all layers except the first input layer, and the activation function of each layer is set to the nonlinear LeakyReLU. Figure 3 shows the network structure of DCGAN.

Fig. 3

The network structure of DCGAN

This network greatly promoted the development of GANs: the organic combination of the convolutional neural network (CNN) and the generative adversarial network (GAN) ensures the quality and diversity of the generated images. In the training process of DCGAN, batch normalization (BN) is adopted to make training more stable and reliable, and after each convolutional or deconvolution layer a corresponding activation function effectively alleviates the vanishing gradient problem. However, BN may generate artifacts, so more nonlinear layers are needed to make the network more complex, and the image batches need to be normalized to a typical Gaussian distribution with values from zero to one. By removing the pooling layers, the convolutional neural network can retain the high-level feature information of the image more effectively. Most importantly, the training of the generator and the discriminator should be kept in balance throughout the training process.

The nonlinear optimization for generating mural information
The training scheme for the tomb mural blocks

This paper chooses DCGAN as the basic (D, G) construction: the generative model G uses an encoder-decoder convolutional neural network, and the discriminative model D applies convolution repeatedly to down-sample the image for binary classification. At its heart, a CNN is trained to capture the features of the mural blocks. The dimensionality reduction in deep convolutional networks is similar to that of the Galerkin reduced-order model (ROM) [14], the Variational Autoencoder (VAE) [15], and the cluster-based reduced-order model (CROM) [16]. Nevertheless, the CNN is able to capture nonlinear image features, since its effective loss functions are nonlinear.

Figure 4 shows the flow chart of mural information generation based on DCGAN.

Fig. 4

The information generation flow chart on tomb mural

Each training image Id is pre-processed into a Numpy array to obtain In and Ip as the testing and training data. The generator G loads Ip and produces the output image Io = G(Ip) ∈ [0,1]128×128×3. The generator G contains 13 convolution layers and 2 deconvolution layers; the activation function of the convolutions is the nonlinear LeakyReLU, the activation function of the deconvolutions is the nonlinear ReLU, and the activation function of the final output layer is the nonlinear Tanh. The discriminator D contains only 5 convolution layers with the nonlinear LeakyReLU activation, and its output layer is a Flatten layer. This GAN constantly generates new mural information for the discriminator to judge.

The nonlinear optimization based on DCGAN
The design of the generator G

Since the aim is to predict the tomb mural at its original size, and convolution usually reduces the size of the mural images, a 2-layer transposed convolution is used to restore the original size; the transposed convolution can generate images from specific input data.

Convolution function

Conv computes an N-dimensional convolution. Layer-input is the input of each layer; Filter-size is the size of the filter; the kernel size is 4; the stride is 2; the activation function is LeakyReLU; and the dilation of the convolution kernel is 1.

AtrousConvolution2D computes a 2-D convolution of a four-dimensional input with a four-dimensional convolution kernel; all its parameters are the same as those of Conv. The definition of the convolution function for the generator G is shown in Figure 5, and a minimal code sketch follows the figure.

Fig. 5

The setting of convolution layers
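The code in Figure 5 is not reproduced here. As an illustration only, a minimal Keras helper with the parameters just listed (kernel size 4, stride 2, dilation 1, LeakyReLU) might look as follows; the name conv_block, the padding='same' choice, and the batch-normalization placement are assumptions, and in current Keras the older AtrousConvolution2D is expressed through the dilation_rate argument of Conv2D.

```python
# Hedged sketch of a convolution helper for the generator (names are assumptions).
from tensorflow.keras.layers import Conv2D, LeakyReLU, BatchNormalization

def conv_block(layer_input, filters, kernel_size=4, strides=2, dilation=1):
    """Convolution + batch normalization + LeakyReLU, per the parameters above."""
    x = Conv2D(filters, kernel_size, strides=strides,
               dilation_rate=dilation, padding='same')(layer_input)
    x = BatchNormalization()(x)      # BN after each convolution layer, as described earlier
    x = LeakyReLU(alpha=0.2)(x)      # nonlinear activation for the convolution layers
    return x
```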

Deconvolution function

Deconv computes an N-dimensional deconvolution. Layer-input is the input of each layer; Filter-size is the size of the filter; the kernel size is 3; the stride is 2; and the activation function is ReLU. The definition of the deconvolution function for the generator G is shown in Figure 6, and a minimal code sketch follows the figure.

Fig. 6

The setting of deconvolution layers
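Similarly, a hedged sketch of the deconvolution helper described above (kernel size 3, stride 2, ReLU) can be written with Conv2DTranspose; the helper name and the padding choice are again assumptions rather than the authors' code.

```python
# Hedged sketch of a deconvolution (transposed convolution) helper.
from tensorflow.keras.layers import Conv2DTranspose, BatchNormalization, Activation

def deconv_block(layer_input, filters, kernel_size=3, strides=2):
    """Transposed convolution + batch normalization + ReLU, used to restore image size."""
    x = Conv2DTranspose(filters, kernel_size, strides=strides,
                        padding='same')(layer_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)        # ReLU for the deconvolution layers
    return x
```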

For the generator G, the original encoder-decoder structure is maintained, and dilated convolutions are used to enlarge the receptive field of the neurons and improve authenticity. The original mural image data pass through 13 convolutions and 2 deconvolutions; the full parameter construction is shown below.

The detailed parameters of the generator are shown in Table 2 below, and a code sketch assembling these layers follows the table.

Construction parameter list of generator G

input filter kernel strides dilation output
Conv1 g_shape 64 4 1 1 g1
Conv2 g1 128 4 2 1 g2
Conv3 g2 256 4 2 1 g3
Conv4 g3 512 4 1 1 g4
Conv5 g4 512 4 1 1 g5
Conv6 g5 512 4 1 2 g6
Conv7 g6 512 4 1 4 g7
Conv8 g7 512 4 1 8 g8
Conv9 g8 512 4 1 16 g9
Conv10 g9 512 4 1 1 g10
Conv11 g10 512 4 1 1 g11
Deconv12 g11 256 4 2 1 g12
Deconv13 g12 128 4 2 1 g13
Conv14 g13 128 4 1 1 g14
Conv15 g14 64 4 1 1 g15
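Putting the two helpers sketched above together according to Table 2 gives a generator along the following lines. This is a sketch under stated assumptions, not the authors' released code: the 128 × 128 × 4 input (image plus mask channel, see the pre-processing section) and the final 3-channel Tanh output layer follow the descriptions given earlier, and the layer names mirror the rows of Table 2.

```python
# Hedged sketch of the generator G following Table 2 (uses conv_block / deconv_block above).
from tensorflow.keras.layers import Input, Conv2D, Activation
from tensorflow.keras.models import Model

def build_generator(input_shape=(128, 128, 4)):
    """Encoder-decoder generator: 13 convolutions + 2 transposed convolutions."""
    g_in = Input(shape=input_shape)                      # I_p: masked image + mask channel
    x = conv_block(g_in,  64, strides=1)                 # Conv1
    x = conv_block(x,    128, strides=2)                 # Conv2  (128 -> 64)
    x = conv_block(x,    256, strides=2)                 # Conv3  (64 -> 32)
    x = conv_block(x,    512, strides=1)                 # Conv4
    x = conv_block(x,    512, strides=1)                 # Conv5
    x = conv_block(x,    512, strides=1, dilation=2)     # Conv6  (dilated)
    x = conv_block(x,    512, strides=1, dilation=4)     # Conv7
    x = conv_block(x,    512, strides=1, dilation=8)     # Conv8
    x = conv_block(x,    512, strides=1, dilation=16)    # Conv9
    x = conv_block(x,    512, strides=1)                 # Conv10
    x = conv_block(x,    512, strides=1)                 # Conv11
    x = deconv_block(x,  256)                            # Deconv12 (32 -> 64)
    x = deconv_block(x,  128)                            # Deconv13 (64 -> 128)
    x = conv_block(x,    128, strides=1)                 # Conv14
    x = conv_block(x,     64, strides=1)                 # Conv15
    # Final 3-channel output with Tanh, per the description of the output layer.
    g_out = Conv2D(3, 4, strides=1, padding='same')(x)
    g_out = Activation('tanh')(g_out)
    return Model(g_in, g_out, name='generator')
```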
The design of the discriminator D
Convolution function

The convolution function for discriminator D is the same as the generator G.

Dcrm-loss is the loss function of the discriminator, defined on the two tensors y_true and y_pred. Let Il be the left part of Id, and let Ir' be the right part of Id flipped about the vertical axis, so that the local discriminator Dl receives its input in the left domain. To produce a prediction on Id, the discriminator computes Dg(Id), Dl(Il) and Dl(Ir') separately; these three outputs are fed into the connector C, which gives the final result of the discriminator D:

$$p = C\big(D_g(I_g)\,\|\,D_l(I_l)\,\|\,D_l(I_r')\big)$$

The discriminator uses 5 convolutional layers with 5 × 5 kernels to repeatedly down-sample the mural images and performs binary classification. The final parameters of the discriminator are shown in Table 3 below, and a code sketch of this construction follows the table.

Construction parameter list of discriminator D

input filter kernel strides output
Conv1 d_input 32 5 2 d1
Conv2 d1 64 5 2 d2
Conv3 d2 64 5 2 d3
Conv4 d3 128 5 2 d4
Conv5 d4 128 5 2 d5

The discriminator is used to distinguish the real mural images Id from the generated mural images Io. In the discriminator, Dg(Id) = 1 if the targeted outputs are accepted and Dg(Id) = 0 if they are rejected.
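A corresponding sketch of the discriminator, combining the five convolution layers of Table 3 with the global/local concatenation described above, might look as follows. The sigmoid Dense head at the end and the exact way of splitting and flipping the halves are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the discriminator D per Table 3 and the connector C described above.
from tensorflow.keras.layers import (Input, Conv2D, LeakyReLU, Flatten,
                                     Dense, Concatenate, Lambda)
from tensorflow.keras.models import Model

def conv_stack(filters_list, input_shape, kernel_size=5, strides=2, name=None):
    """Five 5x5 strided convolutions with LeakyReLU and a Flatten output (Table 3)."""
    inp = Input(shape=input_shape)
    x = inp
    for filters in filters_list:
        x = Conv2D(filters, kernel_size, strides=strides, padding='same')(x)
        x = LeakyReLU(alpha=0.2)(x)
    return Model(inp, Flatten()(x), name=name)

def build_discriminator(img_shape=(128, 128, 3)):
    h, w, c = img_shape
    full = Input(shape=img_shape)                                        # I_d or I_o
    d_g = conv_stack([32, 64, 64, 128, 128], img_shape, name='D_g')      # global branch
    d_l = conv_stack([32, 64, 64, 128, 128], (h, w // 2, c), name='D_l') # shared local branch
    left  = Lambda(lambda t: t[:, :, : w // 2, :])(full)                 # I_l
    right = Lambda(lambda t: t[:, :, w // 2 :, :][:, :, ::-1, :])(full)  # I_r' (flipped)
    # Connector C: concatenate D_g, D_l(I_l), D_l(I_r'); the Dense head is an assumption.
    p = Concatenate()([d_g(full), d_l(left), d_l(right)])
    p = Dense(1, activation='sigmoid')(p)
    return Model(full, p, name='discriminator')
```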

Training and defining the loss functions

In general, the training process of a GAN can be considered a minimization-maximization problem based on the cross-entropy loss function J(D, G):

$$\min_G \max_D \; \mathbb{E}_{I_d \sim p_{data}(I_d)}\big[\log D(I_d)\big] + \mathbb{E}_{\mu \sim p_\mu}\big[\log\big(1 - D(G(\mu))\big)\big]$$

where pμ(μ) is a prior distribution of the random dataset μ, and pdata(Id) is the corresponding probability distribution of the target outputs Id. The min-max process consists of two steps. Step 1: given a fixed generator, update the discriminator by maximizing the discriminator objective J(D)(θ(D), θ(G)):

$$J^{(D)}(\theta^{(D)},\theta^{(G)})_{\max} = \mathbb{E}_{I_d \sim p_{data}(I_d)}\big[\log D(I_d,\theta^{(D)})\big] + \mathbb{E}_{\mu \sim p_\mu(\mu)}\big[\log\big(1 - D(G(\mu,\theta^{(G)}),\theta^{(D)})\big)\big]$$

Step 2: update the generator by minimizing the generator objective J(G)(θ(D), θ(G)):

$$J^{(G)}(\theta^{(D)},\theta^{(G)})_{\min} = \mathbb{E}_{\mu \sim p_\mu(\mu)}\big[\log\big(1 - D(G(\mu))\big)\big]$$
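As an illustration of these two alternating steps, a minimal TensorFlow training update could be written as follows. This is a hedged sketch rather than the authors' training script: generator and discriminator are assumed to be the models sketched earlier, and Step 2 uses the non-saturating form (maximizing log D(G(μ)) instead of directly minimizing log(1 − D(G(μ)))), as is common in practice.

```python
# Hedged sketch of the alternating min-max updates (Step 1 and Step 2).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
opt_D = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
opt_G = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(real_batch, masked_batch):
    # Step 1: fixed generator, maximize the discriminator objective J^(D).
    fake_batch = generator(masked_batch, training=False)
    with tf.GradientTape() as tape:
        d_real = discriminator(real_batch, training=True)
        d_fake = discriminator(fake_batch, training=True)
        loss_D = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    opt_D.apply_gradients(zip(tape.gradient(loss_D, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    # Step 2: fixed discriminator, update the generator (non-saturating form).
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(masked_batch, training=True), training=False)
        loss_G = bce(tf.ones_like(d_fake), d_fake)
    opt_G.apply_gradients(zip(tape.gradient(loss_G, generator.trainable_variables),
                              generator.trainable_variables))
    return loss_D, loss_G
```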

The min-max game has a global optimum at pg(I) = pdata(Id) if D and G have enough capacity. The optimal discriminator for J(G)(G, D) can be characterized by:

$$D^*(I_d) = \frac{p_{data}(I_d)}{p_{data}(I_d) + p_g(I)}$$

The min-max game in (2) can then be rewritten as

$$\max_D J^{(G)}(G,D) = \mathbb{E}_{I_d \sim p_{data}}\!\left[\log\frac{p_{data}(I_d)}{p_{data}(I_d) + p_g(I)}\right] + \mathbb{E}_{I \sim p_g}\!\left[\log\frac{p_g(I)}{p_{data}(I_d) + p_g(I)}\right]$$

When D*(Id) = 1/2, the generator can use any random dataset μ to synthesize the generated mural image I, and the discriminator loses the ability to distinguish between the real dataset Id and the newly generated mural image I.

In order to stabilise the training process, three loss functions are defined:

$$J_D(I_n,I_p) = -\left|\log D(I_n) + \log\big(1 - D(G(I_p))\big)\right|$$

$$J_G(I_n,I_p) = J_{MSE}(I_n,I_p) - \alpha \cdot \log D\big(G(I_p)\big)$$

$$J_{MSE}(I_n,I_p) = \big\| M \odot \big(G(I_p) - I_n\big) \big\|_2^2$$

JMSE is the mean square error between the original image In and the restored image G(Ip); the smaller, the better. JD is the loss function of the discriminator; the goal is that the discriminator's ability to tell whether an image is generated or real degrades until the discrimination probability is infinitely close to 1/2. JG is the generator loss function, and α is an adjustable hyperparameter used to weigh the mean square error against the standard generative adversarial loss. Finally, the weights of the generator are updated in the training iterations according to JMSE, and the weights of the discriminator are updated according to JD.
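A hedged sketch of these three terms, assuming TensorFlow tensors and the mask M from the pre-processing section, could look like this; the value of α is left as an illustrative parameter and the function names are not from the paper.

```python
# Hedged sketch of the three loss terms J_MSE, J_D and J_G defined above.
import tensorflow as tf

def j_mse(i_n, g_of_ip, mask):
    """Masked squared L2 error between the original image I_n and the generated G(I_p)."""
    return tf.reduce_sum(tf.square(mask * (g_of_ip - i_n)))

def j_d(d_real, d_fake, eps=1e-7):
    """Discriminator loss in the standard cross-entropy form -(log D(I_n) + log(1 - D(G(I_p))))."""
    return -(tf.math.log(d_real + eps) + tf.math.log(1.0 - d_fake + eps))

def j_g(i_n, g_of_ip, mask, d_fake, alpha=1e-3, eps=1e-7):
    """Generator loss J_MSE - alpha * log D(G(I_p)); alpha is an illustrative default."""
    return j_mse(i_n, g_of_ip, mask) - alpha * tf.math.log(d_fake + eps)
```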

Set the training mural image database
Choose training images for pre-processing and post-processing

The tomb mural is too large to digitize in a single image, so it is captured in 114 sub-lens images divided into nine groups. Each image is divided into 4 × 4 = 16 non-overlapping blocks, and the 114 × 16 = 1824 blocks are compressed into 256 × 256 images and numbered uniformly. From these, a tomb mural data set with a total of 1487 images was created. The normalized data set is shown in Figure 7 below, and a sketch of this tiling step follows the figure.

Fig. 7

The training mural image database
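As a sketch of the block preparation step described above (OpenCV and the function name are assumptions, not the authors' tooling), each sub-lens image could be tiled into a 4 × 4 grid of non-overlapping blocks and resized to 256 × 256 as follows.

```python
# Hedged sketch of splitting one sub-lens image into 16 blocks of 256 x 256.
import cv2

def tile_image(img, grid=4, out_size=256):
    """Return grid*grid non-overlapping blocks of img, each resized to out_size x out_size."""
    h, w = img.shape[:2]
    bh, bw = h // grid, w // grid
    blocks = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            blocks.append(cv2.resize(block, (out_size, out_size)))
    return blocks   # 16 blocks per sub-lens image; 114 * 16 = 1824 blocks in total
```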

In total there are more than 20 GB of training data in the five large blocks shown in Figure 1.

The training data of the tomb mural blocks in Crown Prince Zhanghuai's tomb

Block 1 Block 2 Block 3 Block 4 Block 5
Pixel 2210*1800 2220*1850 2270*1290 2510*1830 2060*2120
Number of blocks 384 367 362 324 387
Data / GB 4.44 5.01 2.07 4.44 5.00

To prepare for training, this paper applies the following pre-processing to each training image Itr. First, the image is normalized to In ∈ [0,1]128×128×3, a mask M ∈ [0,1]128×128 is defined, and Mij is set so that it covers the middle part of the image. Second, the average pixel intensity μ of the non-masked part In ⊙ (1 − M) is calculated. Third, μ is set as the pixel value of the masked region in each channel. To improve the quality of the final output, a little post-processing is applied to the final Io: Io is rescaled with Io' = 255 · Io, and a seamless clone fuses Itr and Io' into the final result Iop.
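A hedged sketch of this post-processing step using OpenCV might look as follows; the clone center and the exact mask passed to seamlessClone are illustrative assumptions.

```python
# Hedged sketch of the post-processing: rescale I_o to 0-255 and blend it into I_tr.
import cv2
import numpy as np

def postprocess(i_o, i_tr, mask):
    """Blend the generated image I_o into the original I_tr with seamless cloning."""
    i_o_255 = np.clip(i_o * 255.0, 0, 255).astype(np.uint8)   # I_o' = 255 * I_o
    blend_mask = (mask * 255).astype(np.uint8)                 # region taken from I_o'
    h, w = i_tr.shape[:2]
    center = (w // 2, h // 2)                                   # assumed clone center
    return cv2.seamlessClone(i_o_255, i_tr, blend_mask, center, cv2.NORMAL_CLONE)
```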

Define the Mask

After preparing the data set and normalizing it to In ∈ [0,1]128×128×3, a binary mask M with only the two values 0 and 1 is defined. The value 1 marks the effective mural information that needs to remain, and the value 0 marks the missing part that needs to be restored. The elements of In are multiplied by the elements of M; this element-wise product of the two matrices is the Hadamard product. Im is defined as below:

$$I_m = \mu \cdot M + I_n \odot (1 - M)$$

where the parameter μ is the average pixel value and M is the mask area; In ⊙ (1 − M) is the area without the mask. The two terms are summed to give Im, and Im and M are used to build Ip ∈ [0,1]128×128×4. The program outputs the pair (In, Ip) used for training the generative adversarial network.
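A minimal sketch of this pre-processing, following the formula above (NumPy, with illustrative function and variable names), is given below.

```python
# Hedged sketch of the pre-processing: normalize, fill the masked region with mu,
# and stack the mask as a fourth channel to form I_p.
import numpy as np

def preprocess(i_tr, mask):
    """i_tr: 128x128x3 uint8 training image; mask: 128x128 array with values 0/1."""
    i_n = i_tr.astype(np.float32) / 255.0                         # I_n in [0,1]^{128x128x3}
    m3 = mask[..., None].astype(np.float32)                       # broadcast M over channels
    mu = (i_n * (1.0 - m3)).sum() / max((1.0 - m3).sum(), 1.0)    # mean over I_n ⊙ (1 - M)
    i_m = mu * m3 + i_n * (1.0 - m3)                              # I_m = mu*M + I_n ⊙ (1 - M)
    i_p = np.concatenate([i_m, m3], axis=-1)                      # I_p in [0,1]^{128x128x4}
    return i_n, i_p
```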

The mural generation results and analysis
The information generation for mural blocks

The mural Polo in Crown Prince Zhanghuai's tomb provides a total of 1487 256 × 256 images, which require more than 40 hours of training. The continuity of the generated information is good, the colors match the initial background well, and the pattern texture of the mural is well restored. Moreover, as the training time increases and the training cycle runs through more iterations, the training effect gradually improves and the overall image integration becomes better.

Figure 8 shows that different training cycles lead to different generation results. The mural is so large that training takes a long time, depending on the computer's processing capability. This experiment was run on a computer with an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz and an NVIDIA GeForce RTX 2080 Ti. The training module saves a log file during the repair process; the stored data are mainly the loss of the generator, the loss of the discriminator, and the recognition loss. The data file is exported for visualization and statistical analysis. The changes of the generator loss and the discriminator loss are shown in Figure 9.

Fig. 8

The generations in different training cycles

Fig. 9

The Generator loss and Discriminator loss

The generator loss fluctuated around 1.03 and the recognition loss fluctuated around 0.02. The execution time of each step was preset to two minutes. As the iterations accumulate, the fluctuation of the loss functions becomes smaller and smaller and eventually tends to a stable value; the quality of the generated information keeps improving and the detailed features are restored. The following figures show the mural information extension obtained on this platform.

The analysis of generated information

The experiment used Keras and Tensorflow to extend the tomb mural block images. Keras is an open source library written in Python; its biggest advantage is that its function library makes it easy to design and debug deep learning models and apply them to specific problems. Tensorflow is a very good open source numerical library that gives clear control over the training process. Two groups of experimental results are compared: one is generated by the essential outpainting algorithm with an 8-convolution, 1-deconvolution DCGAN from article [17], and the other by our algorithm with a 13-convolution, 2-deconvolution DCGAN.

The results generated by article [17] are shown in figure 10(a1) ∼ (d1), and our DCGAN results are shown in figure 10(a2) ∼ (d2). We choose structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) to evaluate the results.

Fig. 10

The comparisons of information extension results

SSIM is usually used to measure the structural similarity of two images. Its derivation involves three measurement indicators: brightness, contrast, and overall image structure. Specifically, the average value represents the brightness, the standard deviation represents the contrast, and the covariance represents the degree of structural similarity. It is shown in formula 11:

$$SSIM(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

μx is the average value of the existing mural information x, μy is the average value of the extended mural information y, σx² is the variance of x, σy² is the variance of y, and σxy is the covariance of x and y; c1 and c2 are constants. The range of the structural similarity is 0 to 1. When the generated information and the existing information are exactly the same, the value of SSIM equals 1.

PSNR is the peak signal-to-noise ratio. It is generally used for evaluating the overall visual result and measures whether the generated information has the same quality as the original mural image. The definition of PSNR is shown in formula 12:

$$PSNR = 10 \cdot \log_{10}\!\left(\frac{MAX_I^2}{MSE}\right)$$

MAXI² is the square of the maximum possible pixel value of the image; if each pixel is represented by an 8-bit binary, the maximum value is 255. Different values of PSNR represent different levels of image quality.
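As an illustration of how these two metrics could be computed for a pair of mural images, the following sketch uses scikit-image's reference implementations together with a direct PSNR from formula 12; the function names and the data range of 255 are illustrative assumptions.

```python
# Hedged sketch of the SSIM (formula 11) and PSNR (formula 12) evaluation.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(existing, generated):
    """existing, generated: uint8 arrays of the same shape (H x W x 3)."""
    ssim = structural_similarity(existing, generated, channel_axis=-1)  # scikit-image >= 0.19
    psnr = peak_signal_noise_ratio(existing, generated, data_range=255)
    return ssim, psnr

def psnr_manual(existing, generated, max_i=255.0):
    """PSNR computed directly from formula 12: 10 * log10(MAX_I^2 / MSE)."""
    mse = np.mean((existing.astype(np.float64) - generated.astype(np.float64)) ** 2)
    return 10.0 * np.log10((max_i ** 2) / mse)
```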

Table 5 shows the different results in SSIM and PSNR.

The evaluation of the generation results in article [17] and ours

Evaluation figure a figure b figure c figure d
SSIM value[17] 0.842 0.756 0.597 0.874
SSIM value(ours) 0.937 0.842 0.731 0.943
PSNR value[17] 26.505 24.272 23.174 25.224
PSNR value (ours) 34.852 31.758 28.226 31.741

By reconstructing the DCGAN to extend the inner information to the sides of the mural blocks and generate the missing information, the complete mural is obtained. Our algorithm performs better not only in visual effect but also in the quantitative evaluation.

Conclusion

Traditional image inpainting fills holes in an image, but our restoration technique helps to join the separated mural blocks by generating the missing information between them. Training a modified DCGAN model on the residual tomb mural images makes it possible to generate the extension parts of the mural blocks. Adding a CNN to the typical GAN gives it better nonlinear features. For the generator G, we keep the original encoder-decoder structure and increase the convolution layers from 8 to 13 and the deconvolution layers from 1 to 2 to enlarge the receptive field of the neurons and improve authenticity. The additional nonlinear convolution layers produce better results in completing the entire picture.

However, due to the scarcity of residual mural information, there are not enough images suitable for training, and the final restoration results are not perfect. The mural data set can be expanded later to optimize the restoration effect. The present program can only process 256 × 256 images, so the resolution is limited by the data size of the image and the generated information can be unclear and blurry. Increasing the tomb mural training database is therefore the next step.
