Tomb murals, buried underground for thousands of years, vividly reflect the living conditions, social customs, and artistic tastes of ancient royalty. Tomb murals differ from other kinds of murals such as cave murals and temple murals: cave murals are mainly affected by wind, dust, and ultraviolet light, and temple murals are mainly affected by human-made damage, whereas tomb murals sit in completely enclosed spaces before being excavated, so their residual information is highly reliable. However, a whole mural is too large to excavate in one piece, so archaeologists always cut the whole mural into several blocks to move it to the museum. This causes a loss of information between the mural blocks, which affects the coherence and integrity of the entire mural. Figure 1 shows the blocks of the polo-playing mural from the tomb of Crown Prince Zhanghuai.
This mural is 6.88 m long and 2.29 m high and is divided into 5 blocks, with some information missing between neighbouring blocks. Restoring the mural directly is difficult because it may lead to stitching dislocation; the first step is therefore to use information extension to obtain the missing parts and help reconstruct the whole mural. The current computer-aided method for digital information restoration of ancient murals is digital image inpainting. It diffuses the residual mural information toward the filling edges and fills the missing parts layer by layer, like peeling an onion, finally completing the mural information reconstruction. This technology follows two main technical directions. On the one hand, inpainting algorithms based on partial differential equations (PDEs) use different diffusion equations to spread the residual information into the missing parts. For example, Shen and Chan [1] proposed a Total Variation (TV) diffusion equation to solve the diffusion-limited optimization problem; Chan et al. [2] added the Euler-Lagrange equation to Total Variation to constrain the diffusion direction; Humphrey et al. [3] added auxiliary variables and replaced a function of one variable with a function of two variables to improve inpainting efficiency; and Barbu et al. [4] proposed a novel TV model based on a second-order PDE using a nonlinear second-order Euler-Lagrange diffusion model. These all belong to the PDE family of algorithms. Their biggest advantage is that they are convenient and quick to use, but their disadvantage is that they produce blur when the missing part is large, as with the interspace between the mural blocks. On the other hand, there are texture synthesis models based on exemplar filling. For example, Criminisi et al. [5] proposed an exemplar-filling algorithm that compares the priority values of the exemplars.
The authors of [6] changed the exemplar source from global to local using a content-aware Markov model, and Siadati et al. [7] redefined the priority value of Criminisi's algorithm by adding the weight of the structure tensor. These all belong to the exemplar-based family of algorithms; their disadvantage is that filling with a large number of similar exemplars produces a mosaic effect. Neither PDE-based nor exemplar-based inpainting helps to rebuild the whole mural information, because of the limited residual information and the large missing parts between the blocks. This is a more challenging task than the usual recovery of deleted parts of an image: the texture of the extension part must be deduced from the information inside the blocks, and the generated information needs to be as realistic as possible. Deep learning offers new techniques for generating such information. For example, Iizuka et al. [8] set up a CGAN to generate information by adding predefined conditions, and Sabini et al. [9] set up a DCGAN to generate information outside the image boundary. This kind of information-generation algorithm is well suited to recovering the lost information between the blocks. Compared with the inpainting methods, the generative adversarial network is a suitable deep learning technique: it has an excellent data-fitting effect, manifested in higher fitting efficiency and sharper generated image samples. However, the traditional GAN is difficult to train and unstable, so it needs to be modified to achieve better results for the restoration of a huge tomb mural. In this paper, we therefore set up a mural-generating method based on DCGAN to restore the missing information, changing the construction of the convolution and deconvolution layers by optimizing the nonlinear activation functions. After adjusting the pooling layers of the GAN structure, the tomb mural blocks are extended to their two sides to obtain the missing information.
The generative adversarial network is a typical deep learning network [10]. Its main parts are the generator module and the discriminator module.
Both the generator model and the discriminator model shown in Figure 2 can be constructed from differentiable functions. Samples of the original image are used as the input data of the discriminator model, and a random variable is used as the input data of the generator model, whose output should be as close as possible to the real samples.
More and more extended models based on GANs have been proposed, and different models offer different improvements [12]. The characteristics of typical GAN variants are compared in Table 1.
The comparison of different GANs
GAN type | Improvement points | Highlighted improvement effect |
---|---|---|
f-GAN | Distance metric | The f-divergence metric was used for the discriminator model |
WGAN | Distance metric | Wasserstein distance measurement is adopted to improve the stability of network training and prevent network crash effectively |
WGAN-GP | Distance metric | Optimize the objective function of the identification model |
EB-GAN | Energy model | There are more options for network structure and loss functions |
PG-GAN | Add incentive convergence | The training efficiency is improved to obtain high-quality and diverse generated images |
LAP-GAN | Add CGAN layer to the Laplace pyramid | Increase the number of pixels in the resulting image |
CVAE-GAN | Fused VAE and GAN | Assign a specific label to the image |
SGAN | Learns a generative model and a semi-supervised classifier simultaneously | The classification efficiency on the experimental data set is improved and the training time scheduling is more flexible |
CGAN | Add additional information to the target function in the network | Customize the type of image generation |
DCGAN | Add deep convolution layer and deconvolution layer | Recognize more advanced image features to improve network stability performance |
The proposal of the deep convolutional generative adversarial network (DCGAN) largely overcomes the technical difficulties of the original generative adversarial network, namely unstable training and proneness to mode collapse. Therefore, this paper adopts this network model to carry out mural information restoration. The generator is a convolutional neural network [13], but its input differs from that of an ordinary convolutional neural network: images are used in place of random noise. Its convolutional layers mainly use deconvolution in place of the ordinary convolution operation. The activation function of the output layer is set to Tanh, and the activation functions of the other layers are set to the nonlinear ReLU. The discriminator model eliminates the fully connected layer and replaces it with another functional layer; a convolutional layer set up in this way can also achieve a good sampling effect. Similar to the generator model, a batch normalization operation is added at all layers except the input, and the activation function of each layer is set to the nonlinear LeakyReLU. Figure 3 shows the network structure of DCGAN.
The proposal of this network greatly promoted the development of GANs, and the organic combination of the convolutional neural network (CNN) and the generative adversarial network (GAN) ensures the quality and diversity of the images generated by this kind of structure. In the training process of DCGAN, batch normalization (BN) is adopted to make training more stable and reliable, and the corresponding activation function after each convolutional or deconvolutional layer effectively alleviates the problem of vanishing gradients. However, BN may generate artifacts, so more nonlinear layers are needed to make the network more expressive, and the image batches need to be normalized toward a standard Gaussian with values from zero to one. By removing the pooling layers, the convolutional neural network can retain the high-level feature information of the image more effectively. Most importantly, the training of the generator and the discriminator must be kept in balance throughout the training process.
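The activation choices above (ReLU in the generator's hidden layers, Tanh at its output, LeakyReLU in the discriminator) can be illustrated with a minimal NumPy sketch; the 0.2 LeakyReLU slope is a common DCGAN default, not a value given in the text:

```python
import numpy as np

def relu(x):
    # Generator hidden layers: negatives are zeroed out.
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    # Discriminator layers: a small negative slope avoids "dead" units.
    return np.where(x >= 0, x, alpha * x)

def tanh(x):
    # Generator output layer: squashes pixel values into (-1, 1).
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives clipped to 0
print(leaky_relu(x))  # negatives scaled by 0.2
print(tanh(x))        # bounded in (-1, 1)
```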
This paper chooses DCGAN as the essential network model.
Figure 4 shows the flow chart of mural information generation based on DCGAN.
For each training image data
Since the aim is to predict the original size of the tomb mural, and convolution often reduces the size of the mural images, a 2-layer transposed convolution is used to restore the original size; the transposed convolution can generate images from specific input data.
Conv computes the N-dimensional convolution: Layer-input is the input of each layer; Filter-size is the size of the filter; the kernel size is 4; the stride is 2; the activation function is LeakyReLU; and the dilation of the convolution kernel is 1.
AtrousConvolution2D computes the operation between a four-dimensional input and a four-dimensional convolution kernel, performing a 2-D dilated convolution on the input data. All other parameters are the same as for Conv. The definition of the convolution function for the generator
Deconv computes the N-dimensional deconvolution: Layer-input is the input of each layer; Filter-size is the size of the filter; the kernel size is 3; the stride is 2; and the activation function is ReLU. The definition of the deconvolution function for the generator
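The spatial bookkeeping behind these Conv and Deconv definitions can be sketched with the usual size formulas; this assumes "same" padding, which the text does not state, so treat it as an illustrative assumption:

```python
import math

def conv_out(size, stride):
    # "Same"-padded convolution: the output size depends only on the
    # stride; dilation enlarges the receptive field, not the output.
    return math.ceil(size / stride)

def deconv_out(size, stride):
    # "Same"-padded transposed convolution (Deconv) upsamples by the stride.
    return size * stride

# A stride-2 Conv halves a 256-pixel feature map; the stride-2 Deconv
# used by the generator restores it.
print(conv_out(256, 2))    # 128
print(deconv_out(128, 2))  # 256
```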
For the generator
The detailed parameters of the generator are shown in Table 2 below.
Construction parameter list of generator
Layer | input | filter | kernel | strides | dilation | output |
---|---|---|---|---|---|---|
Conv1 | g_shape | 64 | 4 | 1 | 1 | g1 |
Conv2 | g1 | 128 | 4 | 2 | 1 | g2 |
Conv3 | g2 | 256 | 4 | 2 | 1 | g3 |
Conv4 | g3 | 512 | 4 | 1 | 1 | g4 |
Conv5 | g4 | 512 | 4 | 1 | 1 | g5 |
Conv6 | g5 | 512 | 4 | 1 | 2 | g6 |
Conv7 | g6 | 512 | 4 | 1 | 4 | g7 |
Conv8 | g7 | 512 | 4 | 1 | 8 | g8 |
Conv9 | g8 | 512 | 4 | 1 | 16 | g9 |
Conv10 | g9 | 512 | 4 | 1 | 1 | g10 |
Conv11 | g10 | 512 | 4 | 1 | 1 | g11 |
Deconv12 | g11 | 256 | 4 | 2 | 1 | g12 |
Deconv13 | g12 | 128 | 4 | 2 | 1 | g13 |
Conv14 | g13 | 128 | 4 | 1 | 1 | g14 |
Conv15 | g14 | 64 | 4 | 1 | 1 | g15 |
The convolution function for the discriminator
Dcrm-loss is the loss function of the discriminator, defined over two tensors, y-true and y-pred. Set
The discriminator uses 5 convolutional layers with 5 × 5 kernels to repeatedly down-sample the mural images and then performs binary classification. The final parameters of the discriminator are shown in Table 3 below.
Construction parameter list of discriminator
Layer | input | filter | kernel | strides | output |
---|---|---|---|---|---|
Conv1 | d_input | 32 | 5 | 2 | d1 |
Conv2 | d1 | 64 | 5 | 2 | d2 |
Conv3 | d2 | 64 | 5 | 2 | d3 |
Conv4 | d3 | 128 | 5 | 2 | d4 |
Conv5 | d4 | 128 | 5 | 2 | d5 |
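The down-sampling in Table 3 is straightforward to verify: five stride-2 convolutions (again assuming "same" padding) shrink a 256 × 256 block by a factor of 2 per layer before the binary real/fake decision:

```python
size = 256
sizes = [size]
for _ in range(5):          # Conv1 .. Conv5 of Table 3, stride 2 each
    size = (size + 1) // 2  # ceil(size / 2) for "same" padding
    sizes.append(size)
print(sizes)  # [256, 128, 64, 32, 16, 8]
```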
The discriminator is used to distinguish the real mural images
In general, the training process of a GAN can be considered a minimization-maximization problem based on a cross-entropy loss function
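The cross-entropy minimax objective referred to here is standardly written as follows, where $G$ is the generator, $D$ is the discriminator, $p_{\mathrm{data}}$ is the distribution of real mural samples, and $p_z$ is the generator's input distribution:

```latex
\min_G \max_D V(D,G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```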
Where
Step 2: Updating the generator by minimizing the generator function min
The min-max game has a global optimum for
The minimax game in (2) can be rewritten as
When the
In order to stabilise the training process, three typical loss functions are defined.
The tomb mural is too huge to digitize at once, so it is captured as 114 sub-lens images divided into nine groups. Each mural image is divided into 4 × 4 = 16 non-overlapping blocks, and the resulting 114 × 16 = 1824 blocks are compressed into 256 × 256 images and numbered uniformly. From these, a tomb mural data set with a total of 1487 images was created. The normalized data set is shown in Figure 7 below.
In total there are more than 20 GB of training data in the five big blocks shown in Figure 1.
The training data in tomb mural blocks in Crown Prince Zhanghuai
 | Block 1 | Block 2 | Block 3 | Block 4 | Block 5 |
---|---|---|---|---|---|
Pixels | 2210 × 1800 | 2220 × 1850 | 2270 × 1290 | 2510 × 1830 | 2060 × 2120 |
Number of blocks | 384 | 367 | 362 | 324 | 387 |
Data size (GB) | 4.44 | 5.01 | 2.07 | 4.44 | 5.00 |
In order to prepare for image training, this paper applies the following pre-processing to give a training image
After preparing the data set, normalize the data set to
Where the parameter
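Although the normalization target is truncated in the text, a typical pre-processing step, assuming the [-1, 1] range implied by the generator's Tanh output layer (an assumption, not stated here), would be:

```python
import numpy as np

def normalize(block):
    # Assumed scheme: map 8-bit pixels [0, 255] into [-1, 1],
    # matching the Tanh range of the generator's output layer.
    return block.astype(np.float32) / 127.5 - 1.0

def denormalize(block):
    # Invert the mapping to recover displayable 8-bit pixels.
    return np.clip(np.rint((block + 1.0) * 127.5), 0, 255).astype(np.uint8)

block = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
x = normalize(block)
assert x.min() >= -1.0 and x.max() <= 1.0
assert np.array_equal(denormalize(x), block)
```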
The Polo mural in the tomb of Crown Prince Zhanghuai yields a total of 1487 pieces of 256 × 256 images, which require more than 40 hours of training. The continuity of the generated information is good, the colors match the initial background well, and the pattern texture of the mural is restored well. Moreover, as the training time and the number of iterations of the training cycle increase, the training effect gradually improves and the overall image integration becomes better.
Figure 8 shows that different training cycles lead to different generation results. The mural is too huge to train for a long time; the feasible training time depends on the computer's computing capability. This experiment was run on a computer with an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz and an NVIDIA GeForce RTX 2080 Ti. The training module is set to save a log file during the repair process; the stored data are mainly the loss data of the generator and the discriminator and the recognition loss. The data file is exported for visual statistical analysis. The changes in the generator loss and the discriminator loss are shown in Figure 9.
The generator loss fluctuates around 1.03 and the recognition loss fluctuates around 0.02. The execution time of each step is preset to two minutes. As the iterations accumulate, the fluctuation of the loss function becomes smaller and smaller and eventually tends to a stable value; the quality of the generated information becomes better and better, and the detailed features are restored. The following figures show the mural information extension on this platform.
The experiment used Keras and TensorFlow to perform the tomb mural block image extension. Keras is an open-source library written in Python; its biggest advantage is that its function library makes it easy to design and debug deep learning models and apply them to specific problems. TensorFlow is a very good open-source numerical library; it can clearly manage the whole training process. There are two groups of experimental results: one is generated by the essential outpainting algorithm of article [17], a DCGAN with 8 convolutions and 1 deconvolution; the other is generated by our algorithm, a DCGAN with 13 convolutions and 2 deconvolutions.
The generating results in Figure 10(
SSIM is usually used to measure the structural similarity of two images. Its derivation involves three measurement indicators: brightness, contrast, and overall image structure. Specifically, the mean value represents the brightness, the standard deviation represents the contrast, and the covariance represents the degree of structural similarity. It is shown in formula 11.
PSNR is the peak signal-to-noise ratio. It is generally used to evaluate the overall visual result, measuring how closely the generated information matches the quality of the original mural image. The definition of PSNR is shown in formula 12.
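Both metrics are easy to compute directly. Below is a NumPy sketch: PSNR for 8-bit images, and a simplified SSIM computed from global statistics (standard SSIM averages the same expression over local windows, so this is an approximation for illustration):

```python
import numpy as np

def psnr(a, b, peak=255.0):
    # Peak signal-to-noise ratio over the whole image, in dB.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(a, b, peak=255.0):
    # Simplified SSIM from global statistics; the usual implementation
    # averages this formula over local windows of the image.
    a, b = a.astype(np.float64), b.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

Identical images give an SSIM of 1 and an infinite PSNR; larger pixel-wise errors lower both scores.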
The different SSIM and PSNR results can be seen in Table 5.
The evaluation of the generation results in article [17] and ours
Evaluation | figure a | figure b | figure c | figure d |
---|---|---|---|---|
SSIM value[17] | 0.842 | 0.756 | 0.597 | 0.874 |
SSIM value(ours) | 0.937 | 0.842 | 0.731 | 0.943 |
PSNR value[17] | 26.505 | 24.272 | 23.174 | 25.224 |
PSNR value (ours) | 34.852 | 31.758 | 28.226 | 31.741 |
By reconstructing the DCGAN to extend the inner information toward the sides of the mural blocks, the missing information is generated and the total mural is obtained. Our algorithm is better not only in the visual effect but also in the quantitative evaluation.
Traditional image inpainting fills holes inside an image, whereas our restoration technique helps to join the separated mural blocks by generating the missing information between them. Information training based on the modified DCGAN model, using the residual tomb mural images, can generate the extension parts of the mural blocks. It adds a CNN to the typical GAN to obtain better nonlinear features. For the generator
However, due to the scarcity of the residual mural information, there are not enough images suitable for training, and the final image restoration results are not perfect. The mural data set can be expanded later to optimize the restoration effect. This program can only process images of 256 × 256 pixels, which limits the resolution and can leave the generated information unclear and blurry. Increasing the tomb mural training database is therefore the next thing to do.