Image quality assessment deals with the quantitative evaluation of the quality of images and can be widely used in image acquisition, compression, storage, transmission and other image processing systems. Generally, human beings are the ultimate receivers of images. Subjective evaluation by humans is a reliable IQA method, but it is cumbersome and difficult to apply in real-world scenarios. An objective IQA method aims to design mathematical models to automatically measure the image quality in a way that is consistent with human evaluations. According to the availability of ground-truth images, objective IQA indices fall into three categories: full-reference (FR), reduced-reference (RR) and no-reference (NR) models [1]. In this paper, the discussion is focused on FR models.
At present, there are two popular techniques for constructing FR models: knowledge-based and learning-based techniques. Learning-based methods, typically deep networks, learn the evaluation model in an end-to-end manner, and the resulting "black box" is difficult to interpret. Furthermore, this approach requires a large number of training samples, and obtaining high-quality and convincing samples is costly; data augmentation is still the most common way of obtaining them. In this work, we emphasize the knowledge-based approach, which uses knowledge about the human visual system (HVS) to heuristically construct IQA models. Investigating these models reveals that the gradient feature is widely employed. Analyzing the relationship between the gradient feature and the IQA task shows that the gradient has at least the following two characteristics. 1. The information contained in natural images is presented by changes in intensity value or color in the spatial domain. At the two extremes, a constant image (no variation) and a pure-noise image (variation in all directions) cannot convey any information. Thus, features that measure change are widely used in IQA, with the gradient as the basic tool for measuring change. 2. Judging the image quality level in IQA differs from a classic discrimination task. The features for discrimination tasks, such as face recognition and fingerprint recognition, should be robust to image distortion, while the features for IQA should be sensitive to image distortion. The gradient feature is sensitive to image distortion and image content and is weak in robustness.
Representative FR models using the gradient feature include the feature similarity index (FSIM) [2], gradient magnitude similarity deviation index (GMSD) [3], superpixel-based similarity index (SPSIM) [4] and directional anisotropic structure metric (DASM) [5]. In the FSIM and GMSD, the image gradient magnitude is employed as the fundamental feature. SPSIM is computed on the basis of three features: superpixel luminance, superpixel chrominance and pixel gradient. The DASM is obtained by incorporating the gradient magnitude, anisotropy and local directivity features. Objective IQA models are designed by simulating the behavior of the HVS, which integrates perception, understanding and assessment; that is, humans evaluate the image quality in the HVS perception space. Therefore, the features for IQA should be the subjective quantities perceived by the HVS. The gradient is often used directly in IQA models as an effective feature to measure change; however, does the change measured by the gradient actually correspond to that perceived by the HVS? In fact, the change measured by the gradient is an objective quantity (an objective physical stimulus), while that perceived by the HVS is a subjective quantity (a subjective response). Thus, how can one map the objective quantity to the subjective quantity? This mapping function is nonlinear, and it is difficult to describe its form accurately. Empirically, the ability of the human perception system to sense changes has a certain upper threshold. When the objective change exceeds the upper threshold, the subjective change increases only insignificantly, as in the human perception of changes in the saltiness of a salt solution, in the outside temperature, and in the weight of a carried object.
In this paper, we discuss the ability of the HVS to perceive changes affected by the upper threshold by employing the adaptively truncating gradient to measure the change perceived by the HVS. We propose an IQA index based on the adaptively truncating gradient. Specifically, the upper threshold at each pixel in the image is adaptively determined according to the image content, and the adaptively truncating gradient is obtained by retaining the part of the gradient magnitude that is less than the upper threshold and truncating the part that is greater than the upper threshold. Experimental results on public databases show that the proposed index correlates well with the subjective judgments.
The image information is presented by the change in the intensity values in the spatial domain, and this change may be destroyed by degradation of the image quality. The gradient feature can effectively measure the change and is widely used in IQA algorithms. The image gradient can be obtained by convolving the image with a gradient operator, such as the Sobel, Roberts, Scharr or Prewitt operator. In general, using a different gradient operator in an IQA model may yield different performance. This problem was discussed in [2,6], where the experimental results showed that the Scharr operator achieves slightly better performance than the others. Here, we adopt a 3×3 Scharr operator whose templates along the horizontal (
Denote
Where the symbol ⊗ denotes the convolution operation.
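As an illustration of the convolution step described above, the following is a minimal NumPy sketch. It assumes the 3×3 Scharr templates with the 1/16 normalization that is common in FR-IQA code; the templates, sign convention and border handling used in this paper are not reproduced in the text, so `SCHARR_X`, `SCHARR_Y`, `convolve3x3` and `scharr_gradient` are hypothetical names chosen for this example only.

```python
import numpy as np

# Assumed Scharr templates (1/16 normalization); not taken from the paper.
SCHARR_X = np.array([[3.0, 0.0, -3.0],
                     [10.0, 0.0, -10.0],
                     [3.0, 0.0, -3.0]]) / 16.0
SCHARR_Y = SCHARR_X.T


def convolve3x3(image, kernel):
    """2-D convolution with a 3x3 kernel, same-size output, replicated borders."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    flipped = kernel[::-1, ::-1]  # convolution is correlation with the flipped kernel
    out = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out


def scharr_gradient(image):
    """Return the two directional responses and the gradient magnitude."""
    gx = convolve3x3(image, SCHARR_X)
    gy = convolve3x3(image, SCHARR_Y)
    return gx, gy, np.sqrt(gx ** 2 + gy ** 2)
```

On an image whose intensity grows by one gray level per column, the interior response of `SCHARR_X` under this convention is 2 and the response of `SCHARR_Y` is 0, which is a quick sanity check of the kernel orientation.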
The image gradient only reflects the objective changes in images. Since human evaluation of image quality is carried out in the HVS perception space, the image features extracted for IQA models should reflect the subjective changes perceived by the HVS. We consider that the ability of the HVS to perceive changes is subject to an upper threshold: when the objective change exceeds the upper threshold, the subjective change does not increase noticeably. In this study, we define the adaptively truncating gradient to measure the subjective change sensed by the HVS.
Denote
The truncating gradients of
In Eq. (4), if the value of
Similarly, using formula (3),
Obviously, for the calculation of the truncating gradients
Inspired by this recognition, in contrast to Weber’s law, we consider that the upper threshold for truncating the significantly perceptible stimulus change is also related to the original stimulus intensity value. Because different pixels in the image correspond to different gray values, the original stimulus intensity values will also be different. Here, we adaptively determine the upper threshold according to the background luminance of different areas of the image.
The adaptive upper threshold is defined as
Where
In formula (7), the luminance values
Where
Based on Eq. (6), the value of the upper threshold at each pixel in an image can be adaptively determined according to the image content. Then, the adaptively truncating gradient is obtained by formulas (4) and (5). Figure 1 shows the gradient map and the adaptively truncating gradient map corresponding to the reference image and the distorted image. It can be seen that the maximum amplitude of the gradient map is approximately 250, while the maximum amplitude of the adaptively truncating gradient is approximately 70.
Figure 1. Gradient maps and adaptively truncating gradient maps of the reference image and the distorted image. (a) The reference image. (b) The distorted image. (c) and (d) are the gradient maps of (a) and (b), respectively. (e) and (f) are the adaptively truncating gradient maps of (a) and (b), respectively.
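The truncation step can be sketched as follows. This is a minimal illustration of the verbal description (keep the gradient magnitude where it is below the upper threshold, clip it to the threshold elsewhere); it takes the per-pixel threshold map as an input and does not reproduce the paper's formulas for computing that map. The function name `truncate_gradient` and the sample values are assumptions made for this example.

```python
import numpy as np


def truncate_gradient(grad_mag, upper_threshold):
    """Clip a gradient-magnitude map at a per-pixel (or scalar) upper threshold.

    grad_mag        : 2-D array of gradient magnitudes.
    upper_threshold : scalar or 2-D array of the same shape (assumed to be
                      computed elsewhere from the background luminance).
    """
    g = np.asarray(grad_mag, dtype=float)
    t = np.broadcast_to(np.asarray(upper_threshold, dtype=float), g.shape)
    return np.minimum(g, t)


# Hypothetical values: magnitudes below the threshold are kept, others clipped.
g = np.array([[10.0, 250.0], [70.0, 40.0]])
t = np.array([[70.0, 70.0], [50.0, 50.0]])
truncated = truncate_gradient(g, t)  # [[10, 70], [50, 40]]
```

Clipping leaves weak edges untouched and compresses only the strongest responses, which is consistent with the reduced maximum amplitude reported for the truncated maps in Figure 1.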
With the adaptively truncating gradient defined, the local quality of the distorted image is predicted by the similarity between the adaptively truncating gradient of
Where the parameter
The overall quality score of the distorted image is predicted by the local quality
A higher score indicates better image quality.
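The two steps above (a local similarity map followed by pooling into one score) can be sketched as below. The sketch assumes the similarity form (2ab + c)/(a² + b² + c) that is widely used in gradient-based FR metrics, and mean pooling, chosen here only because a higher score is stated to indicate better quality; the paper's exact expressions and parameter values are not reproduced in the text, so `local_quality`, `overall_quality` and the constant `c=100.0` are hypothetical.

```python
import numpy as np


def local_quality(tg_ref, tg_dist, c):
    """Pixel-wise similarity of two truncated-gradient maps (assumed form).

    c is a positive stabilizing constant; its value is an assumption here.
    """
    a = np.asarray(tg_ref, dtype=float)
    b = np.asarray(tg_dist, dtype=float)
    return (2.0 * a * b + c) / (a ** 2 + b ** 2 + c)


def overall_quality(local_quality_map):
    """Pool the local quality map into one score (mean pooling, assumed)."""
    return float(np.mean(local_quality_map))


# Hypothetical maps: identical inputs give 1.0, weakened gradients give less.
ref = np.array([[10.0, 20.0], [30.0, 40.0]])
score_same = overall_quality(local_quality(ref, ref, c=100.0))
score_weaker = overall_quality(local_quality(ref, 0.5 * ref, c=100.0))
```

With this form every local value lies in (0, 1] for non-negative inputs, and it equals 1 only where the two maps agree.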
All the experiments in this study were implemented in MATLAB R2016b and executed on a Lenovo Ideapad700 laptop with an Intel Core i5-6300HQ CPU at 2.3 GHz and 4 GB of RAM. Several well-known FR metrics were compared with the proposed method, including PSNR, SSIM [1], FSIM [2], GMSD [3], DASM [5], IFC [8], VIF [9], MS-SSIM [10], and SSRM [11]. To evaluate the performance of these metrics broadly, six public databases were employed for the experiments: TID2013 [12], TID2008 [13], CSIQ [14], LIVE [15], IVC [16] and A57 [17]. The TID2008 database consists of 25 reference images and a total of 1700 distorted images, obtained by applying 17 different types of distortion at four different levels to each reference image. TID2013 is an expanded version of TID2008, which contains 3000 distorted images with 24 distortion types. The LIVE database includes 29 reference images and 779 distorted images with five distortion types. The CSIQ database contains 30 original images and 866 distorted images degraded by six types of distortion. The IVC database consists of 10 reference images and 185 distorted images. The A57 database includes 3 reference images and 54 distorted images. Note that for the color images in these databases, only the luminance component is evaluated.
Four commonly used performance criteria are employed to evaluate the competing IQA metrics. The Spearman rank order correlation coefficient (SROCC) and Kendall rank order correlation coefficient (KROCC) are adopted for measuring the prediction monotonicity of an objective IQA metric. To compute the other two criteria, the Pearson linear correlation coefficient (PLCC) and the root mean squared error (RMSE), we need to apply a regression analysis. The PLCC measures the consistency between the objective scores after nonlinear regression and the subjective mean opinion scores (MOS). The RMSE measures the distance between the objective scores after nonlinear regression and the MOS. For the nonlinear regression, we used the following mapping function:
where
For the proposed metric, there are three parameters that need to be set to obtain the final quality score. They are
To further analyze the effect of threshold parameter
SROCC performance with different
Table I lists the SROCC, KROCC, PLCC and RMSE results of the ten metrics on the six databases; the two best results in each row are highlighted in bold. Overall, the methods that employ the gradient feature, such as FSIM, GMSD, DASM and the proposed metric, perform well across all the databases. This partly demonstrates the validity of considering the degradation of gray-level changes in quality evaluation. Furthermore, the proposed metric outperforms SSIM and SSRM and is competitive with FSIM and GMSD.
TABLE I. PERFORMANCE COMPARISON OF TEN IQA METRICS ON SIX PUBLIC DATABASES. THE TOP TWO RESULTS ARE MARKED IN BOLD
Database | Criterion | PSNR | SSIM (2004) | MS-SSIM (2003) | IFC (2005) | VIF (2006) | FSIM (2011) | GMSD (2014) | DASM (2017) | SSRM (2018) | Proposed |
---|---|---|---|---|---|---|---|---|---|---|---|
TID2013 | SROCC | 0.6396 | 0.7417 | 0.7859 | 0.5389 | 0.6769 | 0.8015 | 0.8025 | 0.7500 | 0.8105 | |
TID2013 | KROCC | 0.4698 | 0.5588 | 0.6047 | 0.3939 | 0.5147 | 0.6289 | 0.6321 | 0.5718 | 0.6387 | |
TID2013 | PLCC | 0.7017 | 0.7895 | 0.8329 | 0.5538 | 0.7720 | 0.8542 | 0.8574 | 0.8078 | 0.8601 | |
TID2013 | RMSE | 0.8832 | 0.7608 | 0.6861 | 1.0322 | 0.7880 | 0.6444 | 0.6547 | 0.7307 | 0.6324 | |
TID2008 | SROCC | 0.5531 | 0.7749 | 0.8542 | 0.5675 | 0.7491 | 0.8805 | - | 0.8332 | 0.8913 | |
TID2008 | KROCC | 0.4027 | 0.5768 | 0.6568 | 0.4236 | 0.5860 | 0.6946 | - | 0.6535 | 0.7042 | |
TID2008 | PLCC | 0.5734 | 0.7732 | 0.8451 | 0.7340 | 0.8090 | 0.8738 | - | 0.8379 | 0.8745 | |
TID2008 | RMSE | 1.0994 | 0.8511 | 0.7173 | 0.9113 | 0.7888 | 0.6525 | - | 0.7324 | 0.6458 | |
LIVE | SROCC | 0.8756 | 0.9479 | 0.9513 | 0.9259 | 0.9546 | 0.9601 | 0.9608 | 0.9531 | | |
LIVE | KROCC | 0.6865 | 0.7963 | 0.8045 | 0.7579 | 0.8282 | 0.8237 | 0.8218 | 0.8211 | | |
LIVE | PLCC | 0.8721 | 0.9449 | 0.9430 | 0.9248 | 0.9597 | 0.9515 | 0.9571 | 0.9379 | | |
LIVE | RMSE | 13.368 | 8.9455 | 9.0956 | 10.392 | 7.6734 | 7.6780 | 7.7716 | 8.0188 | | |
CSIQ | SROCC | 0.8058 | 0.8756 | 0.9133 | 0.7671 | 0.9195 | 0.9242 | 0.9369 | 0.9251 | | |
CSIQ | KROCC | 0.6084 | 0.6907 | 0.7393 | 0.5897 | 0.7537 | 0.7567 | 0.7791 | 0.7575 | | |
CSIQ | PLCC | 0.8001 | 0.8613 | 0.8998 | 0.8381 | 0.9277 | 0.9120 | 0.9097 | 0.9055 | | |
CSIQ | RMSE | 0.1575 | 0.1334 | 0.1145 | 0.1432 | 0.0980 | 0.1077 | 0.1138 | 0.1114 | | |
IVC | SROCC | 0.6884 | 0.9018 | 0.8980 | 0.8993 | 0.8964 | 0.8789 | 0.8966 | 0.9047 | | |
IVC | KROCC | 0.5218 | 0.7223 | 0.7203 | 0.7202 | 0.7158 | 0.6882 | 0.7179 | 0.7310 | | |
IVC | PLCC | 0.7199 | 0.9119 | 0.8934 | 0.9080 | 0.9028 | 0.8549 | 0.9190 | 0.9132 | | |
IVC | RMSE | 0.8456 | 0.4999 | 0.5474 | 0.5105 | 0.5239 | 0.6320 | 0.5220 | 0.4965 | | |
A57 | SROCC | 0.6189 | 0.8066 | 0.8394 | 0.3185 | 0.6223 | 0.9103 | 0.8527 | 0.9062 | | |
A57 | KROCC | 0.4309 | 0.6058 | 0.6478 | 0.2378 | 0.4589 | 0.7513 | 0.6604 | 0.7289 | | |
A57 | PLCC | 0.6587 | 0.8017 | 0.8504 | 0.4548 | 0.6158 | 0.9085 | 0.8528 | 0.8530 | | |
A57 | RMSE | 0.1849 | 0.1469 | 0.1293 | 0.2189 | 0.1936 | 0.0933 | 0.1027 | 0.1283 | | |
Among the six databases, TID2013 has the largest number of distortion types. Table II lists the SROCC results of the ten metrics for each individual distortion type of the TID2013 database. The proposed algorithm performs well across a variety of distortion types. In particular, the proposed algorithm is outstanding for the JPEG, JP2K and JPEG-trans-error distortion types, which are sensitive to variations.
TABLE II. SROCC COMPARISON OF TEN IQA METRICS FOR INDIVIDUAL DISTORTION TYPES ON THE TID2013 DATABASE. THE TOP TWO RESULTS ARE MARKED IN BOLD
Database | Distortion type | PSNR | SSIM (2004) | MS-SSIM (2003) | IFC (2005) | VIF (2006) | FSIM (2011) | GMSD (2014) | DASM (2017) | SSRM (2018) | Proposed |
---|---|---|---|---|---|---|---|---|---|---|---|
TID2013 | Awgn | 0.9291 | 0.8671 | 0.8646 | 0.6612 | 0.8994 | 0.8973 | 0.8545 | 0.9293 | | |
TID2013 | Awgn-color | 0.7726 | 0.7730 | 0.5352 | 0.8299 | 0.8208 | 0.8612 | 0.7757 | 0.8463 | | |
TID2013 | Spatial-correlated | 0.9197 | 0.8515 | 0.8544 | 0.6601 | 0.8835 | 0.8750 | 0.8392 | 0.9178 | | |
TID2013 | Mask-noise | 0.7767 | 0.8073 | 0.6932 | 0.7944 | 0.7085 | 0.8019 | 0.8184 | 0.8068 | | |
TID2013 | HF-noise | 0.9140 | 0.8634 | 0.8604 | 0.7406 | 0.8972 | 0.8984 | 0.8754 | 0.9069 | | |
TID2013 | Impulse-noise | 0.7503 | 0.7629 | 0.6408 | 0.8537 | 0.8072 | 0.7633 | 0.7872 | 0.8336 | | |
TID2013 | Quantization-noise | 0.8801 | 0.8657 | 0.8706 | 0.6282 | 0.7854 | 0.8719 | 0.8496 | 0.8629 | | |
TID2013 | GB | 0.9155 | 0.9668 | 0.9673 | 0.8907 | 0.9650 | 0.9551 | 0.9114 | 0.9546 | | |
TID2013 | Denoising | 0.9481 | 0.9254 | 0.9268 | 0.7779 | 0.8911 | 0.9302 | 0.9288 | 0.9361 | | |
TID2013 | JPEG | 0.9189 | 0.9200 | 0.9265 | 0.8357 | 0.9192 | 0.9324 | 0.9473 | 0.9287 | | |
TID2013 | JP2K | 0.8840 | 0.9468 | 0.9504 | 0.9078 | 0.9516 | 0.9577 | 0.9620 | 0.9562 | | |
TID2013 | JPEG-trans-error | 0.7682 | 0.8493 | 0.8475 | 0.7425 | 0.8409 | 0.8464 | 0.8401 | 0.8369 | | |
TID2013 | JP2K-trans-error | 0.8886 | 0.8828 | 0.8889 | 0.7769 | 0.8761 | 0.8913 | 0.8966 | 0.8765 | | |
TID2013 | Pattern-noise | 0.6864 | 0.7821 | 0.7968 | 0.5737 | 0.7720 | 0.7917 | 0.8138 | 0.7745 | 0.7632 | |
TID2013 | Block-distortion | 0.1552 | 0.5720 | 0.4801 | 0.2414 | 0.5306 | 0.5489 | 0.6338 | 0.3186 | | |
TID2013 | Mean-shift | 0.7671 | 0.5522 | 0.6276 | 0.7531 | 0.7356 | 0.6127 | 0.6919 | 0.6143 | | |
TID2013 | Contrast change | 0.4416 | 0.3775 | 0.4634 | -0.1798 | 0.4686 | 0.3253 | 0.3498 | 0.4519 | | |
TID2013 | Saturation change | -0.4141 | -0.4099 | -0.4029 | -0.3099 | -0.2748 | -0.1907 | -0.2513 | -0.2602 | | |
TID2013 | Multiple-noise | 0.7803 | 0.7786 | 0.61423 | 0.8468 | 0.8469 | 0.8814 | 0.8067 | 0.8698 | | |
TID2013 | Comfort-noise | 0.8410 | 0.8566 | 0.8528 | 0.81620 | 0.8946 | 0.9121 | 0.8921 | 0.9112 | | |
TID2013 | Noisy-compression | 0.9144 | 0.9057 | 0.9068 | 0.8180 | 0.9204 | 0.9402 | 0.9164 | 0.9367 | | |
TID2013 | Color quantization | 0.8542 | 0.8555 | 0.6006 | 0.8414 | 0.8760 | 0.9098 | 0.8546 | 0.8952 | | |
TID2013 | Chromatic abbr. | 0.8775 | 0.8784 | 0.8210 | 0.8848 | 0.8715 | 0.8517 | 0.8693 | 0.8844 | | |
TID2013 | Sparse sample | 0.9044 | 0.9461 | 0.9483 | 0.8885 | 0.9353 | 0.9565 | 0.9541 | 0.9601 | | |
In this paper, we discussed whether the change measured by the gradient corresponds to the change perceived by the HVS. Considering that the ability of the HVS to perceive changes is affected by the upper threshold, we defined the adaptively truncating gradient and proposed a novel IQA index. Numerical experimental results showed that this index performs well on multiple databases. Because of the complexity of this problem, more studies are needed to address it. In future research, we expect to use machine learning methods to further understand this issue.