Image super-resolution reconstruction aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. It is a classical problem with wide applications in astronomy, physics, medicine and other fields. Early super-resolution reconstruction relied on interpolation methods, including nearest-neighbor, bilinear and bicubic interpolation. Interpolation increases the image resolution with little computation, but simply increasing the number of pixels blurs the edges of the image. In addition, methods based on sparse coding, local linear regression, dictionary learning and random forests have also been widely used in many fields.
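To make the edge-blurring effect of interpolation concrete, the following is a minimal sketch of bilinear upscaling in plain Python. The function name `bilinear_upscale` and the pixel-centre coordinate mapping are illustrative assumptions (one common convention), not the implementation used in this paper.

```python
def bilinear_upscale(img, scale):
    """Upscale a 2-D grayscale image (a list of rows) by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * scale):
        # Map the output pixel centre back to input coordinates and clamp.
        y = min(max((i + 0.5) / scale - 0.5, 0.0), float(h - 1))
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(w * scale):
            x = min(max((j + 0.5) / scale - 0.5, 0.0), float(w - 1))
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Blend the four neighbouring input pixels.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


# A sharp 0 -> 100 edge, upscaled by a factor of 2.
edge = bilinear_upscale([[0.0, 100.0]], 2)
```

In this toy case the two-pixel edge 0, 100 becomes 0, 25, 75, 100: no new detail is created, and the transition is spread over more pixels, which is the blur described above.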
Single image super-resolution (SISR) is an ill-posed inverse problem: one LR image can correspond to multiple HR images, and the reconstructed HR images often suffer from defects such as detail loss, edge aliasing and blurring. Deep learning has greatly accelerated the development of computer vision. The vast majority of current SISR algorithms are based on end-to-end deep learning, that is, they directly learn the mapping between LR and HR images. Although deep-learning-based SISR methods have made great progress, they still face problems in practical applications: better results tend to rely on deeper networks, and more parameters require more training data. This in turn requires longer training and inference times, as well as more computing power and memory. As a result, their usefulness is greatly limited, especially on resource-constrained mobile devices.
In this paper, a new dual regression scheme is proposed that forms a closed loop to improve SR performance. We introduce an additional constraint that reduces the space of possible mappings: the super-resolved image should be able to reconstruct the input LR image. Ideally, if the LR→HR mapping is optimal, the super-resolved image can be down-sampled to obtain the same input LR image. With this constraint, we can estimate the underlying down-sampling kernel, thereby reducing the space of possible functions and finding a good mapping from LR to HR. The scheme is shown in Figure 1.
Figure 1. Composition of the dual regression model
Attention mechanisms stem from studies of the human visual system. Human vision tends to focus on salient areas and ignore useless information, which increases the efficiency with which the brain processes information. The squeeze-and-excitation module proposed by Hu et al. uses an attention mechanism to explicitly model the interdependence between feature channels. It learns the importance of each feature channel automatically and then suppresses the less useful features according to that importance, which improves the performance of image classification networks. The RCAN method proposed by Zhang et al. introduces the attention model of Hu et al. into SISR: its channel attention module adaptively selects feature channels that carry richer information, which improves SISR performance. However, the feature channels in this attention mechanism are treated independently of each other, which limits the flow of feature information between channels.
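The channel attention idea described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: biases are omitted, and the weight matrices `w_down` and `w_up` are hypothetical hand-picked values rather than learned parameters from any of the cited networks.

```python
import math


def channel_attention(features, w_down, w_up):
    """Rescale each channel by a gate in (0, 1).

    features: list of C channels, each a 2-D list (H x W).
    w_down:   (C/r) x C matrix that reduces the channel descriptor.
    w_up:     C x (C/r) matrix that expands it back to C gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: linear -> ReLU -> linear -> sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[v * g for v in r] for r in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Two 2x2 channels and hypothetical weights (reduction to a single hidden unit).
feats = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = channel_attention(feats, w_down=[[1.0, 1.0]], w_up=[[1.0], [-1.0]])
```

With these toy weights the first channel receives a gate close to 1 and the second a gate close to 0, so the second channel is suppressed.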
Increasing the network depth can generally improve SISR performance. However, deeper networks generally have more parameters and therefore require more training data. In practice, training data are often limited, so the risk of overfitting grows with depth. One advantage of a recursive network is that it can increase the depth of the network without increasing the number of parameters. DRCN was the first to introduce recursion into SISR: a convolutional layer is used as the recursive unit, and the weights are shared among the recursions. DRRN refines this idea: residual blocks are used as recursive units, and parameters are shared among the residual blocks, which improves on the performance of DRCN. Similarly, MemNet, proposed by Tai et al., also uses residual blocks as recursive units to build the network.
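The parameter saving from weight sharing can be checked with simple arithmetic. The sketch below counts the parameters of a stack of distinct convolutional layers against a single layer applied recursively; the channel width of 64 and the depth of 16 are illustrative assumptions, not values taken from DRCN or DRRN.

```python
def conv_params(in_ch, out_ch, kernel=3, bias=True):
    """Number of learnable parameters in one 2-D convolutional layer."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def stacked_params(channels, depth):
    """`depth` distinct layers: every layer has its own weights."""
    return depth * conv_params(channels, channels)


def recursive_params(channels, depth):
    """One layer applied `depth` times: the weights are shared, so depth is free."""
    return conv_params(channels, channels)


plain = stacked_params(64, 16)      # 16 separate 3x3 layers
shared = recursive_params(64, 16)   # one 3x3 layer, 16 recursions
```

Both networks have the same effective depth, but under these assumptions the recursive one stores 16 times fewer convolution parameters.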
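Image super-resolution under the dual regression model takes the LR image as input and aims to restore the HR image. In principle, the LR image $x$ is modeled as the output of a degradation process applied to the HR image $y$:

$$x = d(y)$$

where $d$ is the degradation function responsible for converting the HR image into the LR image. Super-resolution seeks to recover an approximation $\hat{y}$ of the HR image from the LR image:

$$\hat{y} = g(x; \theta)$$

where $g$ is the SR function and $\theta$ describes the parameters of $g$.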
Figure 2. Super-resolution reconstruction process
One learning strategy built on a basic network is to learn high-level features recursively by applying the same module multiple times. This approach also keeps the number of model parameters small, since only one module is needed and it is applied repeatedly, as shown in Figure 3.
Figure 3. Recursive learning
One of the most commonly used recursive networks is the deep recursive convolutional network (DRCN). Using a single convolutional layer as the recursive unit, DRCN reaches a receptive field of 41×41 without additional parameters. The deep recursive residual network (DRRN) uses a residual block (ResBlock) as the recursive unit, applied for a total of 25 recursions, and performs better than the non-recursive ResBlock baseline. In addition to end-to-end recursion, researchers have also used a dual-state recurrent network, which shares signals between the LR and generated HR image states within the network. In general, recursive learning networks can learn complex representations of the data with fewer parameters, but at the cost of more computation. In addition, the increased effective depth may cause gradients to explode or vanish. Therefore, recursive learning is often combined with multi-supervision or residual learning to reduce the risk of exploding or vanishing gradients.
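The 41×41 receptive field quoted above can be related to depth with a short calculation. The sketch assumes stride-1 convolutions, for which each k×k layer enlarges the receptive field by k − 1; the 3×3 kernel size and the count of 20 applications are assumptions chosen to reproduce the figure, since this section does not state how the layers are divided.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions, one entry per layer."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1   # each k x k layer adds k - 1 pixels
    return rf


rf_recursive = receptive_field([3] * 20)   # 20 applications of a 3x3 convolution
rf_shallow = receptive_field([9, 1, 5])    # a hypothetical three-layer network
```

Under these assumptions, 20 applications of a 3×3 convolution give a 41×41 receptive field, while the shallow three-layer example only reaches 13×13. Because the recursive unit reuses its weights, the larger receptive field comes with no extra parameters.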
Existing methods focus only on learning the mapping from LR to HR images. However, the space of possible mapping functions can be very large, which makes training very difficult. To solve this problem, we propose a dual regression scheme that introduces an additional constraint on the LR data. Specifically, in addition to learning the LR→HR mapping, we also learn the inverse (dual) mapping from the super-resolved image back to the LR image. We simultaneously learn to reconstruct HR images with the primal map P and LR images with the dual map D. Note that the dual map can be viewed as an estimate of the underlying down-sampling kernel. Formally, we formulate the SR problem as a dual regression scheme involving two regression tasks, as shown in Figure 1.
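Primal regression task: We seek a function $P: X \to Y$ such that the prediction $P(x)$ is close to the corresponding HR image $y$.
Dual regression task: We seek a function $D: Y \to X$ such that the prediction $D(y)$ is close to the original input LR image $x$.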
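The primal and dual learning tasks form a closed loop that provides informative supervision for training the models P and D. If P(x) is the correct HR image, then the down-sampled image D(P(x)) should be very close to the input LR image x. This constraint reduces the space of possible mapping functions, making it easier to learn a better mapping for reconstructing HR images. We train the super-resolution model by jointly learning these two tasks. Given N pairs of samples $S_P = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ and $y_i$ denote the $i$-th pair of LR and HR images, the training loss can be written as

$$\sum_{i=1}^{N} \mathcal{L}_P\big(P(x_i), y_i\big) + \lambda \, \mathcal{L}_D\big(D(P(x_i)), x_i\big)$$

where $\mathcal{L}_P$ and $\mathcal{L}_D$ denote the loss functions of the primal and dual regression tasks, respectively, and $\lambda$ controls the weight of the dual regression loss.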
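The joint objective of the primal and dual tasks can be sketched on toy one-dimensional signals. Everything concrete here is an assumption for illustration: the L1 distance as the loss for both tasks, the weight `lam=0.1`, and the hand-written `upsample` and `downsample` functions standing in for the learned maps P and D.

```python
def l1(a, b):
    """Mean absolute difference between two equally long signals."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def dual_regression_loss(P, D, pairs, lam=0.1):
    """Sum over pairs of: primal loss L_P(P(x), y) + lam * dual loss L_D(D(P(x)), x)."""
    total = 0.0
    for x, y in pairs:
        sr = P(x)                        # super-resolved estimate
        total += l1(sr, y)               # primal term: compare with HR target
        total += lam * l1(D(sr), x)      # dual term: closed loop back to LR input
    return total


def upsample(x):
    """Toy stand-in for P: repeat every sample twice."""
    return [v for v in x for _ in range(2)]


def downsample(y):
    """Toy stand-in for D: average neighbouring pairs."""
    return [(y[i] + y[i + 1]) / 2.0 for i in range(0, len(y), 2)]


pairs = [([1.0, 3.0], [1.0, 2.0, 3.0, 4.0])]
loss_good = dual_regression_loss(upsample, downsample, pairs)
loss_bad = dual_regression_loss(lambda x: [0.0] * (2 * len(x)), downsample, pairs)
```

The first map is consistent with the LR input, so its dual term is zero; the second map outputs zeros and is penalized by both terms, which is how the closed loop narrows the set of acceptable mappings.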
We also consider a more general SR setting in which real-world LR data have no corresponding HR data. More critically, the degradation of the LR images is often unknown, which makes this problem very challenging. In this setting, existing SR models often adapt poorly. To solve this problem, we propose an efficient algorithm to adapt the SR model to new LR data. The dual regression map learns the underlying degradation and does not necessarily depend on HR images. Therefore, we can learn directly from unpaired real-world LR data for model adaptation. To preserve the quality of HR image reconstruction, we also use paired synthetic data, which are very easy to obtain. Given M unpaired LR samples and N paired synthetic samples, the objective function can be expressed as follows:

$$\sum_{i=1}^{M+N} \mathbf{1}_{S_P}(x_i) \, \mathcal{L}_P\big(P(x_i), y_i\big) + \lambda \, \mathcal{L}_D\big(D(P(x_i)), x_i\big)$$

where $S_P$ denotes the set of paired synthetic samples, $\mathbf{1}_{S_P}(x_i)$ is an indicator function that equals 1 when $x_i \in S_P$ and 0 otherwise, $\mathcal{L}_P$ and $\mathcal{L}_D$ are the primal and dual regression losses, and $\lambda$ weights the dual regression loss.
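The adaptation objective for mixed paired and unpaired data can be sketched in the same toy setting. As assumptions for illustration, unpaired samples are represented by an HR target of `None`, the loss is the L1 distance, `lam=0.1`, and `toy_P` and `toy_D` are hypothetical stand-ins for the learned maps.

```python
def l1(a, b):
    """Mean absolute difference between two equally long signals."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def adaptation_loss(P, D, samples, lam=0.1):
    """Primal loss only where an HR target exists; dual loss for every sample."""
    total = 0.0
    for x, y in samples:
        sr = P(x)
        if y is not None:                # indicator equals 1: paired sample
            total += l1(sr, y)
        total += lam * l1(D(sr), x)      # needs only the LR image
    return total


def toy_P(x):
    """Hypothetical SR map: doubles the intensity and repeats each sample."""
    return [2.0 * v for v in x for _ in range(2)]


def toy_D(y):
    """Hypothetical degradation map: average neighbouring pairs."""
    return [(y[i] + y[i + 1]) / 2.0 for i in range(0, len(y), 2)]


samples = [
    ([1.0, 3.0], [1.0, 2.0, 3.0, 4.0]),  # paired synthetic sample
    ([2.0, 2.0], None),                  # unpaired real-world LR sample
]
loss = adaptation_loss(toy_P, toy_D, samples)
```

The unpaired sample still contributes a training signal through the dual term, even though no HR image is available for it.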
The hardware environment for the experiments in this section is as follows: a 3.6 GHz CPU, 32 GB of memory and an NVIDIA RTX 2080Ti graphics card. The software environment is the Windows operating system, the PyTorch deep learning framework and the Python 3.7 programming language. In the experiments, 800 high-quality training images from the public dataset DIV2K were used as the training set. In the test phase, Set5, Set14 and BSD100 were used as the standard test sets, all of which contain images of various animals and plants as well as natural scenes.
Before training, the training set is augmented by horizontal flipping and rotation. Because feeding full images directly into the model would increase the amount of computation and make training inconvenient, each image was randomly cropped to obtain 256×256 image patches. LR images in the training set and test set were obtained by bicubic interpolation in Matlab. To evaluate model performance, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are calculated on the Y channel (i.e., luminance) of the YCbCr color space. The algorithm in this paper has two models in total: the PSNR-oriented generative network model, Ours P, and the final model, Ours, obtained through adversarial network training. The experimental results on the open datasets Set5, Set14 and BSD100 are compared with those of the classical algorithms FSRCNN, SRGAN and ESRGAN. In the experiments, the number of modules in the intermediate feature layer of SRGAN and ESRGAN is set to 24, which is consistent with the original papers. The number of ECB modules in the proposed algorithm is set to 8, and the number of input channels is 24. The PSNR and SSIM values and the number of parameters were compared at ×2 and ×4 magnification.
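The Y-channel PSNR used for evaluation can be sketched as follows. The luma conversion below uses the ITU-R BT.601 coefficients with the 16 to 235 range, which is an assumption about the conversion (the text only states that the Y channel of YCbCr is used); the function names are illustrative.

```python
import math


def rgb_to_y(r, g, b):
    """BT.601 luma of an 8-bit RGB pixel, in the 16-235 range (assumed convention)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel of the estimate is off by one grey level.
hr_y = [rgb_to_y(*px) for px in [(10, 20, 30), (200, 180, 160)]]
sr_y = [v + 1.0 for v in hr_y]
score = psnr(hr_y, sr_y)
```

A uniform error of one grey level gives a mean squared error of 1 and therefore a PSNR of 20·log10(255) dB, roughly 48.13 dB, which provides a sense of scale for the values reported in the tables.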
Because training a generative adversarial network involves two training stages, only the PSNR-oriented generative network model, Ours P, is trained in this comparison in order to save time. It can be seen that the deeper the network, the higher the objective evaluation index, but also the longer the training time. The table also shows that, once the network is already deep, adding further layers does not significantly improve the objective evaluation index but does increase the number of model parameters. Therefore, the number of modules in the proposed algorithm is finally set to 8.
Comparison of PSNR (dB) of each algorithm at ×2 magnification
Dataset | FSRCNN | SRGAN | ESRGAN | Ours |
Set5 | 32.40 | 33.19 | 33.46 | 34.68 |
Set14 | 29.52 | 29.56 | 29.47 | 30.56 |
BSD100 | 26.74 | 29.47 | 29.78 | 29.78 |
Comparison of SSIM of each algorithm at ×4 magnification
Dataset | FSRCNN | SRGAN | ESRGAN | Ours |
Set5 | 0.8657 | 0.8657 | 0.8569 | 0.6895 |
Set14 | 0.7564 | 0.7369 | 0.8697 | 0.7698 |
BSD100 | 0.7156 | 0.6689 | 0.6783 | 0.6856 |
Figures 4 and 5 show the super-resolution images reconstructed by the proposed method and the other methods at ×2 and ×4 magnification, respectively. The reconstruction results obtained by the proposed method are significantly better than those obtained by the other methods. The first group of pictures recovered by FSRCNN is blurred and seriously distorted; the second group of pictures recovered by SRGAN shows serious ghosting around the branches; and the first group of pictures recovered by ESRGAN is also seriously blurred. The proposed method recovers the images almost perfectly at ×4 magnification. The recovery results of this method are similar. Compared with the current state-of-the-art, more lightweight network, the proposed method better recovers the image contours in the Set14 dataset at ×4 magnification. It is easy to observe that the reconstruction results in this paper are also significantly better than those of the other two cutting-edge methods.
Figure 4. Comparison of ×2 magnification reconstruction by each algorithm
Figure 5. Comparison of ×4 magnification reconstruction by each algorithm
This paper presents a new dual regression scheme for paired and unpaired data. On paired data, we introduce an additional constraint by reconstructing the LR images, which reduces the space of possible functions. As a result, we can significantly improve the performance of SR models. In addition, we address unpaired data and apply the dual regression scheme to real-world data. We performed ablation studies on the dual regression scheme, and the model with the dual regression scheme performed better than the baseline on all datasets. These results show that the dual regression scheme can improve the reconstruction of HR images by introducing additional constraints that reduce the space of mapping functions. We also evaluated the influence of our dual regression scheme on other models. We compared the proposed algorithm with other classical algorithms in terms of PSNR, SSIM and visual quality at ×2 and ×4 magnification. The improved algorithm is significantly better than the classical algorithms. Because super-resolution covers a large number of scenes, we only selected datasets dominated by buildings, trees, numbers and similar content. The algorithm may therefore not adapt well to medical and other specialized fields, which remains to be addressed in subsequent research.