INTRODUCTION

Image quality assessment (IQA) deals with the quantitative evaluation of image quality and is widely used in image acquisition, compression, storage, transmission and other image processing systems. Generally, human beings are the ultimate receivers of images. Subjective evaluation by humans is a reliable IQA method, but it is cumbersome and difficult to apply in real-world scenarios. Objective IQA methods aim to design mathematical models that automatically measure image quality in a way that is consistent with human evaluations. According to the availability of ground-truth reference images, objective IQA indices fall into three categories: full-reference (FR), reduced-reference (RR) and no-reference (NR) models [1]. In this paper, the discussion is focused on FR models.

At present, there are two popular techniques for constructing FR models: knowledge-based and learning-based techniques. Deep learning methods learn the evaluation model in an end-to-end manner, and their "black-box" nature lacks interpretability. Furthermore, this approach requires a large number of training samples, but the cost of obtaining high-quality and convincing samples is high, so the commonly used workaround is still data augmentation. In this work, we emphasize the knowledge-based approach, which uses knowledge about the human visual system (HVS) to heuristically construct IQA models. Investigating these models reveals that the gradient feature is widely employed. Analyzing the relationship between the gradient feature and the IQA task, the gradient has at least the following two characteristics. (1) The information contained in natural images is conveyed by changes in intensity or color in the spatial domain. In the extreme cases, a constant image (smooth everywhere) and a pure noise image (varying in all directions) convey no information. Thus, features that measure change are widely used in IQA, with the gradient as the basic tool for measuring change. (2) Judging the image quality level in IQA differs from classic discrimination tasks. Features for discrimination tasks, such as face recognition and fingerprint recognition, should be robust to image distortion, whereas features for IQA should be sensitive to image distortion. The gradient feature is sensitive to both image distortion and image content but is weak in robustness.

Representative FR models using the gradient feature include the feature similarity index (FSIM) [2], the gradient magnitude similarity deviation index (GMSD) [3], the superpixel-based similarity index (SPSIM) [4] and the directional anisotropic structure metric (DASM) [5]. In FSIM and GMSD, the image gradient magnitude is employed as the fundamental feature. SPSIM is computed on the basis of three features: superpixel luminance, superpixel chrominance and pixel gradient. DASM is obtained by incorporating gradient magnitude, anisotropy and local directivity features. Objective IQA models are designed by simulating the behaviors of the HVS, which integrates perception, understanding and assessment functions; that is, humans evaluate image quality in the HVS perception space. Therefore, the features for IQA should be subjective quantities perceived by the HVS. The gradient is often directly used in IQA models as an effective feature to measure change; however, does the change measured by the gradient actually correspond to the change perceived by the HVS? In fact, the change measured by the gradient is an objective quantity (an objective physical stimulus), whereas the change perceived by the HVS is a subjective quantity (a subjective response). How, then, can one map the objective quantity to the subjective quantity? This mapping is nonlinear, and its exact form is difficult to describe. Empirically, the ability of the human perceptual system to sense changes has an upper threshold: when the objective change exceeds this threshold, the subjective change increases only insignificantly, as in the human perception of changes in the saltiness of a salt solution, in the outdoor temperature, or in the weight of carried objects.

In this paper, we account for the upper threshold that limits the ability of the HVS to perceive changes by employing an adaptively truncating gradient to measure the change perceived by the HVS, and we propose an IQA index based on this adaptively truncating gradient. Specifically, the upper threshold at each pixel is adaptively determined according to the image content, and the adaptively truncating gradient is obtained by retaining the part of the gradient magnitude that is less than the upper threshold and truncating the part that is greater. Experimental results on public databases show that the proposed index correlates well with subjective judgments.

AN IQA INDEX BASED ON ADAPTIVELY TRUNCATING GRADIENT
Definition of Adaptively Truncating Gradient

The image information is presented by the change in intensity values in the spatial domain, and this change may be destroyed by degradation of the image quality. The gradient feature can effectively measure this change and is widely used in IQA algorithms. The image gradient can be obtained by convolving the image with a gradient operator, such as the Sobel, Roberts, Prewitt or Scharr operator. Different gradient operators may yield different performance for an IQA model. This problem was discussed in [2,6], where the experimental results showed that the Scharr operator obtains slightly better performance than the others. Here, we adopt the 3×3 Scharr operator, whose templates along the horizontal (H) and vertical (V) directions take the following form:

$$h_H = \frac{1}{16}\begin{bmatrix} 3 & 0 & -3 \\ 10 & 0 & -10 \\ 3 & 0 & -3 \end{bmatrix}, \qquad h_V = \frac{1}{16}\begin{bmatrix} 3 & 10 & 3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{bmatrix} \tag{1}$$

Denote by $r = [r_1, \cdots, r_i, \cdots, r_N]$ a reference image and by $d = [d_1, \cdots, d_i, \cdots, d_N]$ a distorted image, where $i$ is the pixel index and $N$ is the total number of pixels. The image gradients in the horizontal and vertical directions are obtained by convolving the image with $h_H$ and $h_V$, and the gradient magnitude is the square root of the sum of their squares. The gradient magnitudes of $r$ and $d$ at each pixel $i$, denoted as $G(r, i)$ and $G(d, i)$, are calculated as

$$G(r,i) = \sqrt{(r \otimes h_H)^2(i) + (r \otimes h_V)^2(i)}, \qquad G(d,i) = \sqrt{(d \otimes h_H)^2(i) + (d \otimes h_V)^2(i)} \tag{2}$$

where the symbol $\otimes$ denotes the convolution operation.
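For concreteness, Eqs. (1) and (2) can be sketched in a few lines of Python with NumPy/SciPy (the original experiments were implemented in MATLAB); the symmetric boundary handling below is our assumption, since the paper does not specify how image borders are treated.

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 Scharr templates along the horizontal and vertical directions, Eq. (1)
H_H = np.array([[ 3, 0,  -3],
                [10, 0, -10],
                [ 3, 0,  -3]], dtype=np.float64) / 16.0
H_V = H_H.T  # the vertical template is the transpose of the horizontal one

def gradient_magnitude(img):
    """Gradient magnitude G(., i) at every pixel, Eq. (2)."""
    gx = convolve2d(img, H_H, mode="same", boundary="symm")
    gy = convolve2d(img, H_V, mode="same", boundary="symm")
    return np.sqrt(gx ** 2 + gy ** 2)
```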

The image gradient only reflects the objective changes in images. Since human evaluation of image quality is carried out in the HVS perception space, the image features extracted for IQA models should reflect the subjective changes perceived by the HVS. We consider that the ability of the HVS to perceive changes is subject to an upper threshold: when the objective change exceeds the upper threshold, the subjective change does not obviously increase. In this study, we define the adaptively truncating gradient to measure the subjective change sensed by the HVS.

Denote by $T$ the upper threshold. We define a truncating function $\mathrm{trunc}(\cdot)$: any given variable $x$ is retained when it is less than $T$ and truncated when it is greater than or equal to $T$. The specific expression is

$$\mathrm{trunc}(x) = \begin{cases} T, & \text{if } x \ge T \\ x, & \text{if } x < T \end{cases} \tag{3}$$

The truncating gradients of $r$ and $d$ at each pixel $i$ are denoted as $G_T(r, i)$ and $G_T(d, i)$, and the upper threshold at this point is denoted as $T(i)$. Using formula (3), $G_T(r, i)$ is calculated as follows:

$$G_T(r,i) = \mathrm{trunc}(G(r,i)) = \begin{cases} T(i), & \text{if } G(r,i) \ge T(i) \\ G(r,i), & \text{if } G(r,i) < T(i) \end{cases} \tag{4}$$

In Eq. (4), if the value of $G(r, i)$ is greater than or equal to $T(i)$, then $G(r, i)$ is truncated, and the truncating gradient $G_T(r, i)$ is set to $T(i)$; that is, the part of the gradient magnitude that exceeds the upper threshold is masked. Otherwise, $G(r, i)$ is not masked, and the truncating gradient $G_T(r, i)$ is set equal to $G(r, i)$; that is, the part of the gradient magnitude below the upper threshold is perceived by the HVS.

Similarly, using formula (3), $G_T(d, i)$ is calculated as follows:

$$G_T(d,i) = \mathrm{trunc}(G(d,i)) = \begin{cases} T(i), & \text{if } G(d,i) \ge T(i) \\ G(d,i), & \text{if } G(d,i) < T(i) \end{cases} \tag{5}$$
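In array form, the truncation of Eqs. (3)–(5) is simply an elementwise minimum; a minimal sketch:

```python
def trunc(g, T):
    """Eqs. (3)-(5): keep g where g < T, clip it to the threshold elsewhere."""
    return np.minimum(g, T)
```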

Obviously, for the calculation of the truncating gradients $G_T(r, i)$ and $G_T(d, i)$ in Eqs. (4) and (5), the selection of the upper threshold $T(i)$ is very important. According to Weber's law, the ratio of the stimulus change that causes a just noticeable difference (JND) to the original stimulus intensity is a constant. In psychology, the HVS has the property of light adaptation, and the perception of luminance obeys Weber's law [7]: the luminance increment over the background that is just noticeable to the HVS is related to the background luminance.

Inspired by this observation, and by analogy with Weber's law, we consider that the upper threshold for truncating a significantly perceptible stimulus change is also related to the original stimulus intensity. Because different pixels in the image correspond to different gray values, the original stimulus intensities also differ. Here, we adaptively determine the upper threshold according to the background luminance of different areas of the image.

The adaptive upper threshold is defined as

$$T(i) = \frac{I(i)}{T_0} \tag{6}$$

where $T_0$ is an adjustable threshold parameter (the details of selecting $T_0$ are presented in Section III-A), and $I(i)$ takes the larger of the luminance values of $r$ and $d$ at pixel $i$:

$$I(i) = \max(\bar{r}(i), \bar{d}(i)) \tag{7}$$

In formula (7), the luminance values $\bar{r}(i)$ and $\bar{d}(i)$ at pixel $i$ of $r$ and $d$ are estimated by formulas (8) and (9). For the reference image $r$, denote by $\Omega_i^r$ the square neighborhood centered at pixel $i$ with radius $t$, and let the intensity value of any pixel in this neighborhood be $r_{i,j}$, $j \in \Omega_i^r$. Similarly, for the distorted image, denote by $\Omega_i^d$ the square neighborhood centered at pixel $i$ with radius $t$, and let the intensity value of any pixel in this neighborhood be $d_{i,j}$, $j \in \Omega_i^d$. Then

$$\bar{r}(i) = \frac{1}{m}\sum_{j=1}^{m} r_{i,j} \tag{8}$$

$$\bar{d}(i) = \frac{1}{m}\sum_{j=1}^{m} d_{i,j} \tag{9}$$

where $m = (2t + 1)^2$ is the number of pixels in the neighborhood.
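A sketch of the adaptive threshold and truncating gradients of Eqs. (4)–(9), reusing `gradient_magnitude` and `trunc` from the sketches above; the box-filter boundary mode is an implementation assumption:

```python
from scipy.ndimage import uniform_filter

def truncating_gradients(r, d, T0=3.0, t=51):
    """Adaptively truncating gradients G_T(r, .) and G_T(d, .), Eqs. (4)-(9)."""
    # Local mean luminance over the (2t+1) x (2t+1) neighborhood, Eqs. (8)-(9)
    r_bar = uniform_filter(r.astype(np.float64), size=2 * t + 1, mode="nearest")
    d_bar = uniform_filter(d.astype(np.float64), size=2 * t + 1, mode="nearest")
    # Background luminance I(i), Eq. (7), and adaptive threshold T(i), Eq. (6)
    T = np.maximum(r_bar, d_bar) / T0
    # Truncate both gradient magnitudes at T(i), Eqs. (4)-(5)
    return trunc(gradient_magnitude(r), T), trunc(gradient_magnitude(d), T)
```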

Based on Eq. (6), the value of the upper threshold at each pixel in an image can be adaptively determined according to the image content. Then, the adaptively truncating gradient is obtained by formulas (4) and (5). Figure 1 shows the gradient maps and the adaptively truncating gradient maps corresponding to the reference image and the distorted image. It can be seen that the maximum amplitude of the gradient maps is approximately 250, while that of the adaptively truncating gradient maps is approximately 70.

Figure 1.

The gradient map and the adaptively truncating gradient map corresponding to the reference image and the distorted image. (a) The reference image. (b) The distorted image. (c) and (d) The gradient maps of (a) and (b), respectively. (e) and (f) The adaptively truncating gradient maps of (a) and (b), respectively.

The Proposed IQA Index

With the adaptively truncating gradient defined, the local quality of the distorted image is predicted by the similarity between the adaptively truncating gradients of $r$ and $d$, which is defined as

$$S(i) = \frac{2\,G_T(r,i)\,G_T(d,i) + C}{G_T^2(r,i) + G_T^2(d,i) + C} \tag{10}$$

where the parameter $C$ is introduced to avoid a zero denominator and to supply numerical stability. The range of $S(i)$ is from 0 to 1: $S(i)$ is close to 0 when $G_T(r, i)$ and $G_T(d, i)$ are quite different, and it achieves its maximal value of 1 when $G_T(r, i)$ is equal to $G_T(d, i)$.

The overall quality score of the distorted image is predicted from the local quality $S(i)$ as follows:

$$\text{score} = \frac{1}{N}\sum_{i=1}^{N} S(i) \tag{11}$$

A higher score indicates better image quality.
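Putting the pieces together, a minimal sketch of the whole index follows; the default parameter values are those selected in Section III-A, and `truncating_gradients` comes from the sketch above:

```python
def atg_index(r, d, T0=3.0, t=51, C=1600.0):
    """Overall quality score of Eqs. (10)-(11); higher means better quality."""
    Gr, Gd = truncating_gradients(r, d, T0=T0, t=t)
    S = (2.0 * Gr * Gd + C) / (Gr ** 2 + Gd ** 2 + C)  # local similarity, Eq. (10)
    return float(S.mean())                             # average pooling, Eq. (11)
```

For color images, the luminance channel would be passed as `r` and `d`, consistent with the evaluation protocol described in the next section.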

EXPERIMENTAL RESULTS
Experimental Setup

All the experiments in this study were implemented in MATLAB R2016b and executed on a Lenovo Ideapad 700 laptop with an Intel Core i5-6300HQ @ 2.3 GHz CPU and 4 GB of RAM. Several well-known FR metrics were compared with the proposed method, including PSNR, SSIM [1], FSIM [2], GMSD [3], DASM [5], IFC [8], VIF [9], MS-SSIM [10] and SSRM [11]. To broadly evaluate the performance of these metrics, six public databases were employed for the experiments: TID2013 [12], TID2008 [13], CSIQ [14], LIVE [15], IVC [16] and A57 [17]. The TID2008 database consists of 25 reference images and a total of 1700 distorted images; each reference image is distorted by 17 different distortion types at four distortion levels. TID2013 is an expanded version of TID2008 that contains 3000 distorted images with 24 distortion types. The LIVE database includes 29 reference images and 779 distorted images with five distortion types. The CSIQ database contains 30 original images and 886 distorted images degraded by six types of distortion. The IVC database consists of 10 reference images and 185 distorted images. The A57 database includes 3 reference images and 54 distorted images. Note that for the color images in these databases, only the luminance component is evaluated.

Four commonly used performance criteria are employed to evaluate the competing IQA metrics. The Spearman rank-order correlation coefficient (SROCC) and the Kendall rank-order correlation coefficient (KROCC) measure the prediction monotonicity of an objective IQA metric. To compute the other two criteria, the Pearson linear correlation coefficient (PLCC) and the root mean squared error (RMSE), a regression analysis must first be applied. The PLCC measures the consistency between the objective scores after nonlinear regression and the subjective mean opinion scores (MOS), and the RMSE measures the distance between them. For the nonlinear regression, we used the following mapping function:

$$Q_P = \beta_1\left[\frac{1}{2} - \frac{1}{1 + e^{\beta_2 (Q - \beta_3)}}\right] + \beta_4 Q + \beta_5 \tag{12}$$

where $Q$ and $Q_P$ are the original objective scores of an IQA metric and the objective scores after regression, respectively, and $\beta_i$, $i = 1, 2, \cdots, 5$, are the parameters to be fitted. Higher SROCC, KROCC and PLCC values and lower RMSE values indicate better performance of an IQA metric.
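As an illustration, the four criteria can be computed with SciPy as sketched below; the initial guess `beta0` for the regression is our assumption, since the paper does not report how the fitting is initialized:

```python
from scipy.optimize import curve_fit
from scipy.stats import kendalltau, pearsonr, spearmanr

def logistic_map(q, b1, b2, b3, b4, b5):
    """Five-parameter nonlinear mapping of Eq. (12)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def criteria(obj_scores, mos):
    """SROCC/KROCC on raw scores; PLCC/RMSE after the nonlinear regression."""
    obj_scores, mos = np.asarray(obj_scores), np.asarray(mos)
    srocc = spearmanr(obj_scores, mos)[0]
    krocc = kendalltau(obj_scores, mos)[0]
    beta0 = [np.ptp(mos), 0.1, np.mean(obj_scores), 0.0, np.mean(mos)]  # assumed init
    beta, _ = curve_fit(logistic_map, obj_scores, mos, p0=beta0, maxfev=10000)
    mapped = logistic_map(obj_scores, *beta)
    plcc = pearsonr(mapped, mos)[0]
    rmse = float(np.sqrt(np.mean((mapped - mos) ** 2)))
    return srocc, krocc, plcc, rmse
```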

For the proposed metric, three parameters need to be set to obtain the final quality score: $T_0$, $t$ and $C$. Selecting the first 8 reference images and the corresponding 544 distorted images of the TID2008 database as a tuning subset, we chose the parameters that yield the highest SROCC. The result is $T_0 = 3$, $t = 51$ and $C = 1600$.

To further analyze the effect of the threshold parameter $T_0$, more experiments were carried out. Figure 2 shows the SROCC performance with different $T_0$ values on the six databases. On most databases, the SROCC is best when $T_0 = 3$. This result indicates that the range of the upper threshold $T$ is approximately $[0, 255/3]$ for an 8-bit grayscale image according to formula (6): the part of an intensity change above $255/3$ is masked in visual perception.

Figure 2.

SROCC performance with different T0 values on six databases

Performance Comparison

Table I lists the SROCC, KROCC, PLCC and RMSE results of ten metrics on the six databases; the two best results in each row are highlighted in bold. Overall, the methods that employ the gradient feature, such as FSIM, GMSD, DASM and the proposed metric, perform well across all the databases. This partly demonstrates the validity of considering the degradation of gray-level changes in quality evaluation. Furthermore, the proposed metric performs well, outperforming SSIM and SSRM and competing with FSIM and GMSD.

TABLE I. COMPARISON OF THE PERFORMANCE RESULTS OF TEN IQA METRICS ON SIX PUBLIC DATABASES. THE TOP TWO RESULTS IN EACH ROW ARE MARKED IN BOLD

| Database | Criteria | PSNR | SSIM (2004) | MS-SSIM (2003) | IFC (2005) | VIF (2006) | FSIM (2011) | GMSD (2014) | DASM (2017) | SSRM (2018) | Proposed |
|----------|----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| TID2013 | SROCC | 0.6396 | 0.7417 | 0.7859 | 0.5389 | 0.6769 | 0.8015 | **0.8038** | 0.8025 | 0.7500 | **0.8105** |
| | KROCC | 0.4698 | 0.5588 | 0.6047 | 0.3939 | 0.5147 | 0.6289 | **0.6334** | 0.6321 | 0.5718 | **0.6387** |
| | PLCC | 0.7017 | 0.7895 | 0.8329 | 0.5538 | 0.7720 | **0.8589** | 0.8542 | 0.8574 | 0.8078 | **0.8601** |
| | RMSE | 0.8832 | 0.7608 | 0.6861 | 1.0322 | 0.7880 | **0.6349** | 0.6444 | 0.6547 | 0.7307 | **0.6324** |
| TID2008 | SROCC | 0.5531 | 0.7749 | 0.8542 | 0.5675 | 0.7491 | 0.8805 | **0.8906** | - | 0.8332 | **0.8913** |
| | KROCC | 0.4027 | 0.5768 | 0.6568 | 0.4236 | 0.5860 | 0.6946 | **0.7090** | - | 0.6535 | **0.7042** |
| | PLCC | 0.5734 | 0.7732 | 0.8451 | 0.7340 | 0.8090 | 0.8738 | **0.8786** | - | 0.8379 | **0.8745** |
| | RMSE | 1.0994 | 0.8511 | 0.7173 | 0.9113 | 0.7888 | 0.6525 | **0.6408** | - | 0.7324 | **0.6458** |
| LIVE | SROCC | 0.8756 | 0.9479 | 0.9513 | 0.9259 | **0.9636** | **0.9634** | 0.9546 | 0.9601 | 0.9608 | 0.9531 |
| | KROCC | 0.6865 | 0.7963 | 0.8045 | 0.7579 | 0.8282 | **0.8337** | 0.8237 | 0.8218 | **0.8312** | 0.8211 |
| | PLCC | 0.8721 | 0.9449 | 0.9430 | 0.9248 | **0.9598** | 0.9597 | 0.9515 | 0.9571 | **0.9695** | 0.9379 |
| | RMSE | 13.368 | 8.9455 | 9.0956 | 10.392 | 7.6734 | 7.6780 | **7.1131** | 7.7716 | **5.6639** | 8.0188 |
| CSIQ | SROCC | 0.8058 | 0.8756 | 0.9133 | 0.7671 | 0.9195 | 0.9242 | **0.9571** | **0.9523** | 0.9369 | 0.9251 |
| | KROCC | 0.6084 | 0.6907 | 0.7393 | 0.5897 | 0.7537 | 0.7567 | **0.8122** | **0.8041** | 0.7791 | 0.7575 |
| | PLCC | 0.8001 | 0.8613 | 0.8998 | 0.8381 | 0.9277 | 0.9120 | **0.9543** | **0.9531** | 0.9097 | 0.9055 |
| | RMSE | 0.1575 | 0.1334 | 0.1145 | 0.1432 | 0.0980 | 0.1077 | **0.0791** | **0.0799** | 0.1138 | 0.1114 |
| IVC | SROCC | 0.6884 | 0.9018 | 0.8980 | 0.8993 | 0.8964 | **0.9262** | 0.8789 | 0.8966 | 0.9047 | **0.9103** |
| | KROCC | 0.5218 | 0.7223 | 0.7203 | 0.7202 | 0.7158 | **0.7564** | 0.6882 | 0.7179 | 0.7310 | **0.7352** |
| | PLCC | 0.7199 | 0.9119 | 0.8934 | 0.9080 | 0.9028 | **0.9376** | 0.8549 | 0.9190 | 0.9132 | **0.9206** |
| | RMSE | 0.8456 | 0.4999 | 0.5474 | 0.5105 | 0.5239 | **0.4236** | 0.6320 | 0.5220 | 0.4965 | **0.4758** |
| A57 | SROCC | 0.6189 | 0.8066 | 0.8394 | 0.3185 | 0.6223 | **0.9181** | 0.9103 | **0.9215** | 0.8527 | 0.9062 |
| | KROCC | 0.4309 | 0.6058 | 0.6478 | 0.2378 | 0.4589 | **0.7639** | 0.7513 | **0.7782** | 0.6604 | 0.7289 |
| | PLCC | 0.6587 | 0.8017 | 0.8504 | 0.4548 | 0.6158 | **0.9252** | 0.9085 | **0.9429** | 0.8528 | 0.8530 |
| | RMSE | 0.1849 | 0.1469 | 0.1293 | 0.2189 | 0.1936 | 0.0933 | 0.1027 | **0.0813** | **0.0707** | 0.1283 |

Among the six databases, TID2013 has the largest number of distortion types. Table II lists the SROCC results of the ten metrics for each individual distortion type of the TID2013 database. The proposed algorithm performs well on a variety of distortion types. In particular, it is outstanding for the JPEG, JP2K and JPEG-trans-error distortion types, which are sensitive to variations.

TABLE II. SROCC COMPARISON OF TEN IQA METRICS FOR EACH INDIVIDUAL DISTORTION TYPE OF THE TID2013 DATABASE. THE TOP TWO RESULTS IN EACH ROW ARE MARKED IN BOLD

| Distortion type | PSNR | SSIM (2004) | MS-SSIM (2003) | IFC (2005) | VIF (2006) | FSIM (2011) | GMSD (2014) | DASM (2017) | SSRM (2018) | Proposed |
|-----------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| Awgn | 0.9291 | 0.8671 | 0.8646 | 0.6612 | 0.8994 | 0.8973 | **0.9461** | **0.9299** | 0.8545 | 0.9293 |
| Awgn-color | **0.8986** | 0.7726 | 0.7730 | 0.5352 | 0.8299 | 0.8208 | **0.8689** | 0.8612 | 0.7757 | 0.8463 |
| Spatial-correlated | 0.9197 | 0.8515 | 0.8544 | 0.6601 | 0.8835 | 0.8750 | **0.9348** | **0.9301** | 0.8392 | 0.9178 |
| Mask-noise | **0.8321** | 0.7767 | 0.8073 | 0.6932 | **0.8450** | 0.7944 | 0.7085 | 0.8019 | 0.8184 | 0.8068 |
| HF-noise | 0.9140 | 0.8634 | 0.8604 | 0.7406 | 0.8972 | 0.8984 | **0.9164** | **0.9179** | 0.8754 | 0.9069 |
| Impulse-noise | **0.8969** | 0.7503 | 0.7629 | 0.6408 | 0.8537 | 0.8072 | 0.7633 | **0.8550** | 0.7872 | 0.8336 |
| Quantization-noise | 0.8801 | 0.8657 | 0.8706 | 0.6282 | 0.7854 | 0.8719 | **0.9057** | **0.9032** | 0.8496 | 0.8629 |
| GB | 0.9155 | 0.9668 | 0.9673 | 0.8907 | 0.9650 | 0.9551 | 0.9114 | 0.9546 | **0.9674** | **0.9686** |
| Denoising | 0.9481 | 0.9254 | 0.9268 | 0.7779 | 0.8911 | 0.9302 | **0.9525** | **0.9496** | 0.9288 | 0.9361 |
| JPEG | 0.9189 | 0.9200 | 0.9265 | 0.8357 | 0.9192 | 0.9324 | **0.9500** | 0.9473 | 0.9287 | **0.9515** |
| JP2K | 0.8840 | 0.9468 | 0.9504 | 0.9078 | 0.9516 | 0.9577 | **0.9656** | 0.9620 | 0.9562 | **0.9635** |
| JPEG-trans-error | 0.7682 | 0.8493 | 0.8475 | 0.7425 | 0.8409 | 0.8464 | 0.8401 | **0.8534** | 0.8369 | **0.8802** |
| JP2K-trans-error | 0.8886 | 0.8828 | 0.8889 | 0.7769 | 0.8761 | 0.8913 | **0.9135** | 0.8966 | 0.8765 | **0.9141** |
| Pattern-noise | 0.6864 | 0.7821 | 0.7968 | 0.5737 | 0.7720 | 0.7917 | **0.8143** | **0.8138** | 0.7745 | 0.7632 |
| Block-distortion | 0.1552 | 0.5720 | 0.4801 | 0.2414 | 0.5306 | 0.5489 | **0.6630** | 0.6338 | 0.3186 | **0.6635** |
| Mean-shift | 0.7671 | **0.7752** | **0.7906** | 0.5522 | 0.6276 | 0.7531 | 0.7356 | 0.6127 | 0.6919 | 0.6143 |
| Contrast change | 0.4416 | 0.3775 | 0.4634 | -0.1798 | **0.8386** | 0.4686 | 0.3253 | 0.3498 | 0.4519 | **0.4889** |
| Saturation change | **0.0944** | -0.4141 | -0.4099 | -0.4029 | -0.3099 | -0.2748 | -0.1907 | **0.0382** | -0.2513 | -0.2602 |
| Multiple-noise | **0.8911** | 0.7803 | 0.7786 | 0.6142 | 0.8468 | 0.8469 | **0.8880** | 0.8814 | 0.8067 | 0.8698 |
| Comfort-noise | 0.8410 | 0.8566 | 0.8528 | 0.8162 | 0.8946 | 0.9121 | **0.9298** | **0.9203** | 0.8921 | 0.9112 |
| Noisy-compression | 0.9144 | 0.9057 | 0.9068 | 0.8180 | 0.9204 | **0.9466** | **0.9631** | 0.9402 | 0.9164 | 0.9367 |
| Color quantization | **0.9269** | 0.8542 | 0.8555 | 0.6006 | 0.8414 | 0.8760 | 0.9098 | **0.9177** | 0.8546 | 0.8952 |
| Chromatic aberration | **0.8871** | 0.8775 | 0.8784 | 0.8210 | 0.8848 | 0.8715 | 0.8517 | 0.8693 | 0.8844 | **0.8849** |
| Sparse sample | 0.9044 | 0.9461 | 0.9483 | 0.8885 | 0.9353 | 0.9565 | **0.9684** | **0.9669** | 0.9541 | 0.9601 |
CONCLUSION

In this paper, we discussed the problem of whether the change measured by the gradient corresponds to the change perceived by the HVS. Considering that the ability of the HVS to perceive changes is limited by an upper threshold, we defined the adaptively truncating gradient and proposed a novel IQA index. Numerical experimental results showed that this index performs well on multiple databases. Nevertheless, more studies need to be conducted on this problem due to its complexity; in future research, we expect to use machine learning methods to further understand this issue.
