Development of Blind Deblurring Based on Deep Learning

With the advent of the intelligent era, images can be acquired more widely and conveniently, and they have become an everyday means of transmitting information. However, as acquisition becomes more widespread, image quality often suffers. Image restoration is the task of recovering a clean image from a degraded version; typical degradations include noise, blur, rain and fog. It is a highly ill-posed problem because infinitely many feasible solutions exist. Image blur has many causes. According to how it forms, blur introduced during photography can be divided into motion blur, defocus blur and Gaussian blur, as shown in Figure 1. Motion blur, the degradation caused by relative movement between the camera and the target object at the moment of capture, is the main cause of image degradation, the most common kind of blur in everyday pictures, and an active research topic. The rapid growth of consumer digital photography has made camera shake an especially prominent source of motion blur: small high-resolution cameras are light and difficult to hold steady. If camera shake occurs during capture for any reason, that moment is “lost”, a problem that has long plagued photography enthusiasts. How to avoid image degradation and improve image quality has always been a pressing problem in the field of image processing.

Motion deblurring plays an important role in everyday public safety, for example in highway enforcement cameras ("electronic eyes") that capture illegal vehicles and in the monitoring of suspects. From a mathematical point of view, motion blur can be regarded as the convolution of a clear image with a blur kernel. Real scenes usually also contain random noise, so the model can be expressed as: (1) $B = I \otimes K + N$

where B represents the blurred image, I the clear image, K the blur kernel (point spread function), $\otimes$ convolution, and N additive noise.
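The degradation model in equation (1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the horizontal box kernel, the zero padding at the border and the Gaussian form of the noise are choices made here, and the names `convolve2d_same`, `horizontal_motion_kernel` and `blur` are hypothetical helpers, not part of any method discussed in this paper.

```python
import random

def convolve2d_same(image, kernel):
    """Convolve a 2-D list `image` with an odd-sized `kernel`, zero-padding the border."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution reads the image at flipped kernel offsets.
                    yy, xx = y - (i - ry), x - (j - rx)
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out

def horizontal_motion_kernel(length):
    """Assumed kernel: a uniform horizontal streak of odd `length`, summing to 1."""
    return [[1.0 / length] * length]

def blur(image, kernel, noise_sigma=0.0, seed=0):
    """B = I (*) K + N, with N drawn as zero-mean Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    blurred = convolve2d_same(image, kernel)
    return [[v + rng.gauss(0.0, noise_sigma) for v in row] for row in blurred]

# A single bright point is smeared into a three-pixel horizontal streak.
sharp = [[0.0] * 7 for _ in range(3)]
sharp[1][3] = 90.0
blurred = blur(sharp, horizontal_motion_kernel(3))
print(blurred[1])
```

Because the kernel sums to one, the total intensity of the point is preserved and only spread along the assumed motion direction.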

Deblurring Methods

Restoring a blurred image is an image restoration task. Traditional deblurring methods are divided into non-blind and blind deblurring according to whether the blur kernel is known, and the form of the kernel differs with the type of blur. If the kernel is known, recovering the clear image is called non-blind deblurring: the clear image is computed directly from the known kernel and the blurred image. If the kernel is unknown, the task is called blind deblurring, and the kernel and the clear image must be estimated together. A clear image can be restored only after an accurate kernel has been estimated; an inaccurate estimate directly degrades the restored image. Using prior knowledge about the blurred image, a clear image close to the real one is reconstructed from one or more blurred images. The methods in [1, 2] estimate the blur kernel with probabilistic statistics. In practice the blur kernel is usually non-uniform: different pixels in a frame generally correspond to different kernels, so finding the kernel for each pixel is a severely ill-conditioned problem. Such methods deblur only specific kinds of images well. Their mathematical models are complex, they are computationally inefficient, they are strongly affected by noise, and they depend heavily on accurate kernel estimation; or they are not robust enough to adapt to different data sets and to blur caused by different factors. Simplifying assumptions about the blur model usually hinder their performance on real-world examples, where blur is far more complex than the model and is entangled with the image processing pipeline in the camera.
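Why blind deblurring is ill-posed can be illustrated with a one-dimensional toy example: the same observation is explained equally well by the true signal with the true kernel and by an already-blurred signal with a "no blur" kernel. The signal values, the three-tap kernel and the helper name `convolve_same` are assumptions made for this sketch only.

```python
def convolve_same(signal, kernel):
    """1-D convolution with an odd-length kernel, zero-padded to keep the length."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            src = i - (j - r)
            if 0 <= src < len(signal):
                acc += k * signal[src]
        out.append(acc)
    return out

sharp = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0]
true_kernel = [0.25, 0.5, 0.25]
observed = convolve_same(sharp, true_kernel)

# Explanation 1: the sharp signal blurred by the true kernel.
explanation_1 = convolve_same(sharp, true_kernel)
# Explanation 2: the blurred signal itself combined with a delta (no-blur) kernel.
explanation_2 = convolve_same(observed, [1.0])

print(observed)
print(explanation_1 == explanation_2)
```

Both pairs reproduce the observation exactly, so the data alone cannot decide between them; this is the role of the priors and kernel constraints that traditional methods add.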

In addition, recent machine learning based methods also rely on synthetic blurred data sets generated under these assumptions. As a result, traditional deblurring methods cannot remove blur whose kernel is difficult to approximate or parameterize (for example at object motion boundaries). Learning-based methods have also been proposed for deblurring, and recent work has begun to use end-to-end trainable networks for image [3] and video [4, 5] deblurring. Non-uniform blind deblurring of general dynamic scenes is a challenging computer vision problem, because blur arises not only from the motion of multiple objects but also from camera shake and scene depth variation.

Methods Based on Convolutional Neural Networks

To restrict the solution space to valid natural images, existing restoration techniques [6, 7, 8] explicitly use image priors hand-crafted from empirical observation. However, designing such priors is challenging, and they often do not generalize. To address this, deep learning methods are increasingly applied to image processing, and recent state-of-the-art methods [17, 44] adopt convolutional neural networks (CNNs). After 2016, early CNN-based deblurring methods usually used a CNN as a blur kernel estimator within a two-stage framework: a CNN-based kernel estimation stage followed by a kernel-based deconvolution stage. Chakrabarti [12] uses a CNN to estimate the blur kernel and then applies a non-blind deblurring algorithm. Schuler et al. [13] trained a deep network to estimate the blur kernel and then used a traditional non-blind deconvolution method to restore the latent clear image. Sun et al. [14] used a convolutional neural network to estimate the blur kernel and then used the estimated kernel to deblur the image. Although these methods perform blind deblurring, they still follow the non-blind idea of deconvolving the image once the kernel has been estimated.

This makes the algorithm slow, and the restoration result depends on the quality of the kernel estimate. Nevertheless, these methods brought convolutional neural networks to image deblurring and laid the foundation for later work. Li et al. [15] proposed a new convolution structure called “hole convolution”, whose kernel is computed over a rectangular ring; the experimental results show that it restores images effectively. Liu et al. [16] proposed a two-stage deblurring module that restores blurred images of dynamic scenes based on high-frequency residual images: the residual image is first thinned by an encoding network, the thinned residual is then combined with the blurred input to obtain the latent image, and a coarse-to-fine framework is built on this deblurring module. Noroozi [17] trains a multi-scale CNN end to end; the method does not need to estimate the blur kernel and therefore belongs to blind deblurring. The basic idea of CNN-based blind deblurring is to use blurred images and their corresponding clear images as training samples for a convolutional neural network. Training yields an optimized network model; at inference time, the blurred image is the network input and the network output is the deblurred image. Different image processing tasks rely on different convolutional network architectures, as shown in Figure 2.
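The train-then-infer idea described above can be sketched with a deliberately tiny stand-in: a single learnable five-tap linear filter replaces the CNN, one-dimensional random signals replace images, and plain gradient descent on the mean squared error replaces a deep learning framework. All of these are assumptions made for illustration; none of the surveyed networks is this simple.

```python
import random

def filter_same(signal, taps):
    """Apply an odd-length linear filter, zero-padded so the length is kept."""
    r = len(taps) // 2
    n = len(signal)
    return [sum(w * signal[i + j - r] for j, w in enumerate(taps) if 0 <= i + j - r < n)
            for i in range(n)]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def total_loss(pairs, taps):
    return sum(mse(filter_same(blurred, taps), sharp) for blurred, sharp in pairs) / len(pairs)

def train(pairs, n_taps=5, lr=0.2, epochs=200):
    """Fit the filter taps so that filter(blurred) approximates sharp."""
    r = n_taps // 2
    taps = [0.0] * n_taps
    taps[r] = 1.0  # start from the identity filter
    for _ in range(epochs):
        for blurred, sharp in pairs:
            n = len(blurred)
            pred = filter_same(blurred, taps)
            grads = [0.0] * n_taps
            for i in range(n):
                err = 2.0 * (pred[i] - sharp[i]) / n  # d(MSE)/d(pred[i])
                for j in range(n_taps):
                    k = i + j - r
                    if 0 <= k < n:
                        grads[j] += err * blurred[k]
            taps = [w - lr * g for w, g in zip(taps, grads)]
    return taps

# Training samples: (blurred, sharp) pairs produced with a fixed, assumed blur.
rng = random.Random(0)
pairs = []
for _ in range(8):
    sharp = [rng.random() for _ in range(32)]
    pairs.append((filter_same(sharp, [0.25, 0.5, 0.25]), sharp))

loss_before = total_loss(pairs, [0.0, 0.0, 1.0, 0.0, 0.0])
learned_taps = train(pairs)
loss_after = total_loss(pairs, learned_taps)
print(loss_before, loss_after)
```

After training, `filter_same(new_blurred, learned_taps)` plays the role of inference: the blurred signal goes in and the restored estimate comes out, with no blur kernel estimated explicitly.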

Different CNN architectures for image processing. (a) U-Net or encoder-decoder network. (b) Multi-scale or cascaded refinement network. (c) Dilated convolution network. (d) Scale-recurrent network (SRN).

More recent CNN-based deblurring methods instead aim to learn the complex relationship between blurred and clear image pairs directly, in an end-to-end manner. Nah et al. [18] proposed a multi-scale convolutional neural network for deblurring and created the GoPro data set, currently the most widely used. Trained end to end, the model is both effective and fast, and it recovers the latent image directly without assuming any restricted blur kernel model. The multi-scale structure is designed to imitate traditional coarse-to-fine optimization. Unlike other methods, it does not estimate an explicit blur kernel, so artifacts caused by kernel estimation errors do not arise. The model is also trained with a multi-scale loss; as shown in the figure, this loss suits the coarse-to-fine structure and greatly improves convergence. Each scale uses a modified residual network structure, as shown in Figure 3, to allow a deeper architecture. The results are further improved by an adversarial loss [19]. Because the loss terms push the result toward the ground truth, the method restores even occluded regions where the blur kernel is extremely complex, and it significantly improves the deblurring of dynamic scenes. Extensive experiments show that it clearly outperforms the previous state-of-the-art dynamic scene deblurring methods in both qualitative and quantitative evaluation.
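A multi-scale loss of the kind mentioned above can be sketched as the average of per-scale mean squared errors between the network output and a downsampled ground truth. The 2x2 averaging pyramid, the equal weighting of scales and the function names are assumptions made here; the exact normalization used in [18] is not restated in this text.

```python
def downsample2(image):
    """Halve width and height by averaging each 2x2 block (even dimensions assumed)."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def pyramid(image, levels):
    """Return [full, half, quarter, ...] resolution versions of `image`."""
    out = [image]
    for _ in range(levels - 1):
        out.append(downsample2(out[-1]))
    return out

def mse(a, b):
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def multiscale_loss(predictions, targets):
    """Average of the per-scale mean squared errors (equal weights assumed)."""
    assert len(predictions) == len(targets)
    return sum(mse(p, t) for p, t in zip(predictions, targets)) / len(predictions)

sharp = [[float((x + y) % 4) for x in range(8)] for y in range(8)]
targets = pyramid(sharp, 3)  # 8x8, 4x4 and 2x2 ground truths
# Stand-in for network outputs: every scale is off by a constant 1.
predictions = [[[v + 1.0 for v in row] for row in level] for level in targets]
loss = multiscale_loss(predictions, targets)
print(loss)
```

Supervising every scale gives the coarse levels their own training signal, which is one plausible reading of why such a loss helps a coarse-to-fine network converge.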

(a) The original residual network building block. (b) The modified building block of the network by Nah et al.

Methods Based on Recurrent Neural Networks

Recurrent neural networks are also used for image motion blur restoration. This approach innovatively applies recurrent networks, which are designed for sequential and time-dependent data, to image processing.

Tao et al. [20] proposed a scale-recurrent network (SRN-DeblurNet) that follows the strategy of gradually recovering clear images at different resolutions in a pyramid. Compared with other networks, SRN has a simpler structure, fewer parameters and is easier to train. Its ResBlock-based design is also inspired by the recent success of encoder-decoder structures in various computer vision tasks, and it explores an effective way to adapt them to image deblurring, since directly applying an existing encoder-decoder structure does not produce the best results. The encoder-decoder ResBlock network of Tao et al. combines the advantages of several CNN structures, remains feasible to train, and produces a very large receptive field, which is very important for deblurring large motion. Experiments show that, by using the recurrent structure and combining the above advantages, the end-to-end deep image deblurring framework greatly improves training efficiency.

Zhang et al. [21] proposed a deep hierarchical multi-patch network based on spatial pyramid matching, which processes blurred images through a fine-to-coarse hierarchical representation and runs 40 times faster than previous multi-scale methods. Zhang et al. [22] proposed a spatially variant neural network composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN). The RNN acts as a deconvolution operator on the feature maps that a CNN extracts from the input image. This method performs well in accuracy, speed and model size.

Deblurring Methods Based on Generative Adversarial Networks

Most traditional CNN-based deblurring methods suffer from a series of problems: the colors of the output image look unnatural, the texture is not rich enough, the image is overly smooth, and so on. Generative adversarial networks (GANs) are therefore gradually being applied to image deblurring, because they can retain texture details and generate realistic images. In 2014, Goodfellow et al. [23] proposed the groundbreaking generative adversarial network, which has shown strong ability in computer vision tasks.

Images produced by a generative adversarial network are very close to clear images and may even be indistinguishable to the naked eye [24–25]. However, the generator should be constrained: if the network is too free, training becomes unstable and it is difficult to learn the mapping between the input image and the target image. If L1 or L2 constraints are imposed directly at the pixel level between the generator output and the corresponding target image, the generated image becomes overly smooth and vulnerable to image noise. As GANs became increasingly prominent in the image field, Ledig et al. [26] proposed a generative adversarial network for image super-resolution (SRGAN). Its deep residual network restores realistic texture, and the results were assessed with a mean opinion score (MOS) test. SRGAN improves perceptual quality, and the approach transfers well to image deblurring.

In 2018, Kupyn et al. [27] removed camera-shake blur with a conditional adversarial network, proposing DeblurGAN, a kernel-free learning method for blind motion deblurring that addresses earlier shortcomings. DeblurGAN is optimized with a multi-component loss function, but it does not consider how different feature layers of the perceptual network affect the perceptual loss, so the details of the restored image are still smooth. In 2019, Kupyn et al. [28] introduced the feature pyramid network into DeblurGAN for the first time and proposed DeblurGAN-v2, a new end-to-end generative adversarial network for single-image motion deblurring that greatly improves on DeblurGAN in efficiency, quality and flexibility. Table II shows more deep learning methods for blind image deblurring. The networks were then reproduced, and the deblurred images are shown in Figure 4.
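A multi-component loss of this kind is commonly written as an adversarial term plus a weighted content term. The form below is a generic sketch rather than the exact objective of [27] or [28]: G (the generator), $\phi$ (a feature map of a pretrained perceptual network, of size W by H) and the weight $\lambda$ are notation introduced here, while B and I are the blurred and clear images of equation (1).

$\mathcal{L} = \mathcal{L}_{adv} + \lambda \, \mathcal{L}_{content}$

$\mathcal{L}_{content} = \frac{1}{W H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left( \phi(I)_{x,y} - \phi(G(B))_{x,y} \right)^{2}$

Written this way, the limitation noted above is visible: $\phi$ is taken from a single fixed feature layer, so the contribution of other layers of the perceptual network to the loss is not modeled.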

Table I

Comparison of characteristics of mainstream data sets

Data Set	Construction Method	Advantages and Disadvantages
Levin et al.	Blur kernels simulated by algorithm	Easy to obtain; local (non-uniform) blur is not considered
Kupyn et al.	Simulated trajectories	Easy to obtain; only motion in two-dimensional space is simulated, and real three-dimensional space is not considered
Kohler et al.	Motion trajectories captured by a 6D camera	Motion trajectories in three-dimensional space are collected; lens distortion, depth of field variation, etc. are not considered
GoPro	Averaging consecutive frames shot by a high-speed camera	Closer to real blur; the acquisition process is laborious and the scenes lack variety
Lai et al.	Real acquisition	Completely real blurred pictures; there are no corresponding clear images, so it is often used as a test set

Table II

Blind deblurring algorithm based on deep learning

Method	Applicable Scenario	Mechanism	Advantage	Limitations
Spatially variant RNN [22]	Motion blur, dynamic scene blur	The deblurring process is formulated as an infinite impulse response (IIR) model	Weights can be learned by another network, and different weights can be learned for different blur	Large regions and spatially varying structures need to be handled at the same time
SRN [20]	Motion blur	New multi-scale recurrent network structure	Fewer trainable parameters and higher training efficiency	Limited to fixed data sets and training periods
DMPHN [21]	Motion blur	End-to-end hierarchical CNN model similar to spatial pyramid matching	The required filters are small and inference is fast	Requires large GPU memory
DPSR [32]	Low-resolution (LR) blurred images	A new SISR degradation model is designed	The deep plug-and-play framework can handle any blur kernel	Most real images do not match the degradation model
BIE-RVD [33]	Motion blur	Spatiotemporal video autoencoder built on an end-to-end differentiable structure	High accuracy and fast network running speed	Training is complex and difficult
DDMS [34]	Motion blur	A fully convolutional structure with filter transformation and feature modulation is constructed	Real-time filtering completely eliminates multi-scale processing and large filters	Real time filtering completely eliminates multi-scale processing and large filters
DeblurGAN [27], DeblurGAN-v2 [28]	Motion blur	A generative adversarial network constrained by perceptual loss [9] is used for deblurring	The restored image is semantically more similar to the target image and closer to people's subjective evaluation of image quality	The influence of different feature layers of the perceptual network on the perceptual loss is not considered, so the restored image details are still smooth
DeepDeblur [18]	Dynamic scene blur	End-to-end multi-scale convolutional network	Without estimating the blur kernel, the multi-scale CNN restores clear images directly and flexibly	The stacked multi-scale subnetworks lead to many parameters, high GPU memory consumption and difficult training
SRN-Deblur [20]	Dynamic scene blur	End-to-end multi-scale recurrent network	Multi-scale structure and parameter sharing reduce the number of parameters, and learning is more stable	Edges are overly smooth and artifacts appear
DMPHN [21]	Motion blur	Deep multi-patch hierarchical network based on spatial pyramid matching; processes blurred images through a fine-to-coarse hierarchical representation	It can solve the problem of performance saturation and run faster than multi-scale method	It can solve the problem of performance saturation and run faster than multi-scale method
MPRNet [35]	Deblurring, deraining and denoising	Multi-stage progressive image restoration	Outputs accurate spatial details and contextual information; the network structure is simple and effective	Deblurring under low-light conditions is poor
MIMO-Net [29]	Motion blur	Single encoder with multiple inputs, single decoder with multiple outputs	Enlarges the receptive field and makes training less difficult	Spatial details are lost and texture is not clear enough

Visual comparison of image deblurring results on the GoPro test set [13]. Key blurred patches are displayed in (b), while patches magnified from the deblurring results are displayed in (c) – (h).

Key Problems of Image Motion Blur Restoration

Feature Extraction

Image recognition is essentially a classification process: to identify the category of an image, we must distinguish it from images of other categories. The selected features must therefore both describe the image well and separate different types of images well. We want features that differ little between similar images (small intra-class spacing) and differ greatly between images of different categories (large inter-class spacing); these are called the most discriminative features. Prior knowledge also plays an important role in feature extraction, and how to use it to guide feature selection is a problem that will continue to receive attention. The research process and ideas behind traditional feature extraction methods are very useful: because these methods are highly interpretable, they provide inspiration and analogies for designing machine learning methods for such problems. Existing convolutional neural networks resemble these methods, because each filter weight is in effect a linear recognition pattern, similar to the boundary and gradient detection in traditional feature extraction. Likewise, pooling aggregates the information of a region, similar to the feature integration (such as histograms) performed after features are extracted. Experiments show that the first few layers of a convolutional network in fact perform edge and gradient detection. However, when the convolutional network was invented, these feature extraction methods did not yet exist.

Evaluation Method

The evaluation of image quality can be divided into subjective evaluation and objective evaluation. Subjective evaluation relies on human visual judgment, while objective evaluation uses a defined metric to compare image quality. The commonly used objective metrics are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [29]. PSNR reflects the degree of distortion between the estimated image and the original clear image; generally speaking, the larger the PSNR, the better the restoration. Its expression is: (2) $\mathrm{PSNR} = 10 \log_{10} \left[ \frac{M \times N \times 255^{2}}{\left\| X - \hat{X} \right\|_{2}^{2}} \right]$

M and N represent the numbers of rows and columns of the image respectively, X represents the original clear image, and $\hat{X}$ represents the estimated image. Studies have found that PSNR is sometimes inconsistent with human visual evaluation, so SSIM is adopted to complement it. Its expression is: (3) $\mathrm{SSIM} = [l(x, y)]^{\alpha} [c(x, y)]^{\beta} [s(x, y)]^{\gamma}$ (4) $l(x, y) = \frac{2 u_{x} u_{y} + c_{1}}{u_{x}^{2} + u_{y}^{2} + c_{1}}$ (5) $c(x, y) = \frac{2 \sigma_{x} \sigma_{y} + c_{2}}{\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2}}$ (6) $s(x, y) = \frac{\sigma_{xy} + c_{3}}{\sigma_{x} \sigma_{y} + c_{3}}$

SSIM is a comprehensive image quality index that evaluates an image in terms of luminance, contrast and structural similarity, where $u_{x}$ and $u_{y}$ represent the mean values of images x and y, $\sigma_{x}$ and $\sigma_{y}$ their standard deviations (so that $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are the variances), and $\sigma_{xy}$ the covariance between x and y; $c_{1}$, $c_{2}$ and $c_{3}$ are small constants that stabilize the divisions. SSIM measures the similarity of two images, and its value typically lies between 0 and 1: the closer it is to 1, the higher the similarity and the better the restoration result.
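Equations (2) to (6) translate almost directly into code. The sketch below makes several simplifying assumptions: SSIM is evaluated once over the whole image rather than averaged over local windows as is usual in practice, the exponents default to 1, and the constants use the commonly quoted values $c_{1} = (0.01 \times 255)^{2}$, $c_{2} = (0.03 \times 255)^{2}$ and $c_{3} = c_{2}/2$, which this paper does not specify. The function names are hypothetical.

```python
import math

def _flatten(image):
    return [float(v) for row in image for v in row]

def psnr(reference, estimate, peak=255.0):
    """Equation (2): 10 * log10(M * N * peak^2 / ||X - X_hat||_2^2)."""
    x, y = _flatten(reference), _flatten(estimate)
    squared_error = sum((a - b) ** 2 for a, b in zip(x, y))
    if squared_error == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(len(x) * peak ** 2 / squared_error)

def ssim_global(image_x, image_y, peak=255.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Equations (3)-(6) computed from whole-image statistics (no sliding window)."""
    x, y = _flatten(image_x), _flatten(image_y)
    n = len(x)
    u_x, u_y = sum(x) / n, sum(y) / n
    var_x = sum((a - u_x) ** 2 for a in x) / n
    var_y = sum((b - u_y) ** 2 for b in y) / n
    cov_xy = sum((a - u_x) * (b - u_y) for a, b in zip(x, y)) / n
    s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # assumed constants
    c3 = c2 / 2.0
    luminance = (2 * u_x * u_y + c1) / (u_x ** 2 + u_y ** 2 + c1)
    contrast = (2 * s_x * s_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (s_x * s_y + c3)
    return (luminance ** alpha) * (contrast ** beta) * (structure ** gamma)

clean = [[50, 100], [150, 200]]
restored = [[60, 110], [160, 210]]  # every pixel off by 10
print(psnr(clean, restored))
print(ssim_global(clean, restored))
```

In this example the restored image differs from the reference only by a constant brightness offset, so contrast and structure agree and only the luminance term pulls SSIM below 1.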

Data Set Construction

Most blurred images in traditional data sets are produced with a few fixed kernels, which makes it difficult to imitate naturally blurred images. When machine learning algorithms are used, the quality of the data set directly affects the results, so high-quality data sets play an important role in subsequent research. The most direct way to obtain a data set is to capture images in real scenes, as in the data set of Lai et al. [30]. Images obtained this way have no corresponding clear images, so they can be used only for testing and not for network training. Restoring blurred images with neural networks requires paired blurred and clear images for training, but such pairs are difficult to obtain in the real world. Manually captured pairs cannot guarantee that the clear and blurred images have consistent content, while blurred images synthesized by algorithm cannot contain the various complex factors of a real environment. A network trained on such data may perform well on it yet give unsatisfactory results on real blurred images.
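The frame-averaging construction listed for the GoPro data set in Table I can be sketched as follows: consecutive sharp frames from a high-speed camera are averaged to form the blurred image. Taking the temporally central frame as the paired clear image and ignoring the camera response function are assumptions of this sketch, and the function names are hypothetical.

```python
def average_frames(frames):
    """Pixel-wise mean of equally sized 2-D frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]

def make_training_pair(frames):
    """Return (blurred, sharp): the frame average and the central frame (assumed)."""
    return average_frames(frames), frames[len(frames) // 2]

def moving_dot(position, width=5):
    """A one-row frame with a single bright pixel at `position`."""
    row = [0.0] * width
    row[position] = 90.0
    return [row]

# Three consecutive frames of a dot moving one pixel per frame.
frames = [moving_dot(p) for p in (1, 2, 3)]
blurred, sharp = make_training_pair(frames)
print(blurred[0])  # [0.0, 30.0, 30.0, 30.0, 0.0]
print(sharp[0])    # [0.0, 0.0, 90.0, 0.0, 0.0]
```

Because each pair comes from the same short capture, the clear and blurred images share their content, which is the consistency that manually captured pairs cannot guarantee.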

Trends and Prospects

Image deblurring has attracted increasing attention in the field of image processing. It has both important theoretical significance and urgent practical demand. Theoretical research and practical application have both made considerable progress, but several aspects still need to be improved and solved in the future.

Update of Data Set

In deep learning, the quality of the data set directly affects the experimental results, so the quality and updating of data sets are of great significance to image deblurring. The GoPro data set, the most widely used and the largest, contains only 2103 pairs of training images and 1111 pairs of test images.

This is very small compared with data sets in other fields of computer vision; the ImageNet data set, for example, contains 14197122 images. The GoPro data set only increases the number of images: its diversity is insufficient, its scenes are too uniform, and in some images the motion blur is not particularly obvious.

Similarly, other data sets synthesized by algorithm are small and not sufficiently representative. We therefore need to enrich and update the data, ensuring both that the amount of data is sufficient and that it fully meets experimental requirements. Unlike in image recognition or image segmentation, blurred image data sets are difficult to obtain. Yet in any field, data sets are the basis of research, and a lack of data sets directly slows progress. A new large-scale data set is therefore urgently needed.

Algorithm Efficiency

An important factor affecting the applicability of a deblurring algorithm is its running speed. Some scenarios require strong real-time performance, where efficiency is the primary concern. Improving the speed of image motion blur restoration would allow it to be applied in many scenarios and improve computer vision based solutions. For example, factory production monitoring increasingly relies on image processing. In the traditional approach, items must stop at a monitoring point so that images can be collected during production. A motion deblurring algorithm with strong real-time performance can capture images while items are moving, eliminate the stopping step, and greatly improve the efficiency of the production line. Improving algorithm efficiency therefore plays an important role in daily life.

More Objective Evaluation Indicators

At present, the widely used structural similarity index and peak signal-to-noise ratio are reasonably close to human subjective perception, especially when the training images contain non-uniform motion blur, so a reference-based evaluation index is generally adequate for evaluating deblurred images and the soundness of an algorithm on experimental results. As deblurring technology continues to develop, however, a fair evaluation of deblurred images requires not only reporting human perception tests alongside these indices in a paper, but also a more widely recognized evaluation standard for the deblurring field.

Summary

This paper systematically summarizes the current research status of image motion blur restoration technology, points out the key problems of the existing research, and looks forward to the future development trend and application prospect, which lays a foundation for further research.

eISSN: 2470-8038
Language: English

Frequency: 4 times per year
Journal subjects: Computer Sciences, other

Published online: 21 May 2023

Pages: 106 - 114

DOI: https://doi.org/10.21307/ijanmc-2021-040

Keywords: Deep Learning, Neural Networks, Computer Vision, Blind Deblurring Neural Network

© 2021 Shi Kecun et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
