Full-Reference Image Quality Assessment with Linear Combination of Genetically Selected Quality Measures.

Mariusz Oszust1.   

Abstract

Information carried by an image can be distorted by the processing steps that electronic means of storage and communication introduce. It is therefore important to develop algorithms that automatically assess image quality in a way that is consistent with human evaluation. In this paper, an approach to image quality assessment (IQA) is proposed in which the quality of a given image is evaluated jointly by several IQA approaches. First, in order to obtain such joint models, an optimisation problem of aggregating IQA measures is defined, in which a weighted sum of their outputs, i.e., objective scores, is used as the aggregation operator. Then, the weight of each measure is treated as a decision variable in the minimisation of the root mean square error between the obtained objective scores and subjective scores. Subjective scores reflect ground truth and involve evaluation of images by human observers. The optimisation problem is solved using a genetic algorithm, which also selects suitable measures for the aggregation. The obtained multimeasures are evaluated on the four largest widely used image benchmarks and compared against state-of-the-art full-reference IQA approaches. The comparison reveals that the proposed approach outperforms the competing measures.

Year:  2016        PMID: 27341493      PMCID: PMC4920377          DOI: 10.1371/journal.pone.0158333

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Visual information is often subject to many processing steps, e.g., acquisition, enhancement, compression, or transmission. After processing, some information carried by the content of the image can be distorted. Therefore, its quality should be evaluated from a human perception point of view. There are three categories of image quality assessment (IQA) measures (metrics or models), depending on the availability of a pristine, i.e., distortion-free, image: (1) full-reference, (2) no-reference, and (3) reduced-reference models. In this paper, the full-reference approach is considered, in which a reference image is provided for each distorted image in a benchmark dataset. Peak signal-to-noise ratio (PSNR) is one of the simplest approaches to IQA. However, the output of PSNR is not well correlated with human evaluation; therefore, this technique often serves as a baseline model for comparison. In [1], Damera-Venkata et al. presented the noise quality measure (NQM), in which a distorted image is modelled using a linear frequency distortion and an additive noise injection. Wang et al. [2] introduced the universal image quality index (UQI). UQI evaluates the quality of an image using loss of correlation, luminance distortion, and contrast distortion. A further extension of UQI, structural similarity (SSIM), was proposed by Wang et al. [3]. A multi-scale SSIM, MSSIM, was presented in [4]. Wang and Li [5] proposed the information content weighted SSIM (IW-SSIM) approach as an extension of MSSIM. In that work, local information was measured using statistical models of natural scenes. Statistical properties of the natural environment are also utilised in the visual information fidelity (VIF) [6] measure and the information fidelity criterion (IFC) [7]. In [8], Riesz-transform based feature similarity (RFSIM) was proposed. The measure is computed by comparing Riesz-transform features at key locations between the distorted image and its reference image.
The authors of the feature similarity index (FSIM) [9] developed an approach which uses phase congruency and image gradient magnitude as low-level local features. FSIMc is a version of FSIM developed for processing colour images. In [10], spectral residual based similarity (SR-SIM) using a visual saliency map was proposed. Visual saliency is also used to calculate a local quality map of the distorted image in the visual saliency-induced index (VSI) [11]. The gradient similarity (GSM) measure [12] estimates image quality taking into consideration structure and contrast changes, as well as luminance distortions. In [13], image structural degradation was considered and determined using local binary patterns. In SURF-SIM [14], multiscale differences between features detected and described by the Speeded-Up Robust Features (SURF) approach are combined with a pooling strategy. An IQA measure that evaluates images taking into account inter-patch and intra-patch similarities was described in [15]. In that work, the authors used a modified normalised correlation coefficient and image curvature. Development of full-reference IQA measures can also involve different fusion strategies. For example, Liu and Yang [16] combined SNR, SSIM, VIF, and VSNR using canonical correlation analysis. The most apparent distortion algorithm (MAD) [17] adopts two strategies for IQA. In that approach, local luminance and contrast masking are used to evaluate high-quality images, whereas changes in the local statistics of spatial-frequency components are used for low-quality images. Three IQA metrics, MS-SIM, VIF and R-SVD, were non-linearly combined by Okarma in [18, 19]. A non-linear fusion of IQA measures was also investigated in [20]. In [21], up to seven IQA models were combined using a regularised regression. Peng and Li [22] presented an approach based on a conditional Bayesian mixture of experts model.
In that paper, a support vector machine classifier was used to predict the type of distortion, and then SSIM, VSNR, and VIF were fused with k-nearest-neighbour regression. In another paper [23], the same authors presented an adaptive combination of IQA measures with an edge-quality measure based on preservation of edge direction. In [24], a combination of local and global distortion measures was considered using saliency maps, gradient and contrast information. Many complex fusion approaches have been introduced recently. The main contribution of this paper is therefore to show that a solution based on a linear combination, together with a genetic algorithm, is able to find a well-performing fusion of IQA measures. Apart from a comparison of different approaches performed in accordance with a widely accepted protocol, the paper provides some insights into the selection of IQA techniques which are likely to be fused. In this paper, a decision fusion of 16 full-reference IQA measures is defined as an optimisation problem of finding weights in a weighted sum of their outputs. A genetic algorithm finds the solution that minimises the root mean square error (RMSE) of prediction. The number of measures used and the parameters of the regression model that fits objective scores to subjective scores prior to RMSE calculation are also found by the algorithm. Finally, the proposed approach is evaluated on the four largest IQA image benchmarks and compared with state-of-the-art approaches. The rest of this paper is organised as follows. In the section Methods, a formulation of the optimisation problem and the development of the proposed approach are presented. Experimental results with related discussions are covered in the section Results and Discussion. Finally, the last section concludes the paper.

Methods

Since digital processing can alter the appearance of an image, which may lead to different opinions on its quality, many IQA algorithms have been proposed for automatic assessment [25]. In order to compare IQA approaches, specific image databases have been proposed. They contain reference images, their corresponding distorted images, and ground-truth information obtained from human observers. Information on the perceived quality is reported as mean opinion scores (MOS values) or differential mean opinion scores (DMOS values). The desired IQA metric should produce objective scores which are consistent with human ratings (subjective scores). In this work, it is assumed that a joint metric can provide better results, in terms of prediction quality, than any single metric that contributes to the multimeasure. Let Q be the output of an aggregated decision of n IQA measures, where n ∈ N. It can be expressed as:
$$Q = A(q_1, q_2, \ldots, q_n), \tag{1}$$
where A is an aggregation operator and $q_i$ is the objective score produced by the i-th measure. The operator often has the form of a weighted sum [26-28]; therefore, Q can be expressed as follows:
$$Q = \sum_{i=1}^{n} x_i q_i, \tag{2}$$
where $x = [x_1, x_2, \ldots, x_n]$ denotes a vector of real-valued weights. The vector x contains the decision variables in an optimisation problem of finding an effective fusion of IQA measures. Since many fusions can be proposed, a given x should be evaluated. For this purpose, one of the indices typically used to evaluate IQA measures can be applied. In order to measure the consistency of the output of the examined IQA model with human assessment, the following indices of prediction accuracy, monotonicity, and consistency are often considered [29, 30]: Spearman Rank order Correlation Coefficient (SRCC), Kendall Rank order Correlation Coefficient (KRCC), Pearson linear Correlation Coefficient (PCC), and Root Mean Square Error (RMSE).
Evaluation indices are calculated after a nonlinear mapping between a vector of objective scores, $Q$, and MOS or differential MOS (DMOS), $S$, using the following mapping function for the nonlinear regression [30]:
$$Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp(\beta_2 (Q - \beta_3))} \right) + \beta_4 Q + \beta_5, \tag{3}$$
where $\beta = [\beta_1, \beta_2, \ldots, \beta_5]$ are parameters of the regression model [29], and $Q_p$ is the mapped equivalent of $Q$. SRCC is calculated as follows:
$$\mathrm{SRCC} = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m (m^2 - 1)}, \tag{4}$$
where $d_i$ is the difference between the ranks of the i-th image in the objective and subjective score vectors, and m is the total number of images. KRCC, in turn, uses the number of concordant pairs in the dataset, $m_c$, and the number of discordant pairs in the dataset, $m_d$:
$$\mathrm{KRCC} = \frac{m_c - m_d}{0.5\, m (m - 1)}. \tag{5}$$
PCC is defined as:
$$\mathrm{PCC} = \frac{\bar{Q}_p^{T} \bar{S}}{\sqrt{\bar{Q}_p^{T} \bar{Q}_p \; \bar{S}^{T} \bar{S}}}, \tag{6}$$
where $\bar{Q}_p$ and $\bar{S}$ denote mean-removed vectors. RMSE is given by Eq (7):
$$\mathrm{RMSE} = \sqrt{\frac{(Q_p - S)^{T} (Q_p - S)}{m}}. \tag{7}$$
Higher SRCC, KRCC, and PCC values are considered better, in contrast to the values of RMSE. Any of these performance indices could be used as the objective function in the considered optimisation problem. Preliminary experiments revealed that maximisation of SRCC or KRCC may lead to a fusion with unacceptably high RMSE values. On the other hand, RMSE requires determination of $\beta$. Finally, RMSE was used as the objective function in the considered problem (Eq (8)), and the components of $\beta$ were considered as decision variables in addition to the weights of the fused IQA measures:
$$\min_{x,\, \beta} \; \mathrm{RMSE}(Q_p, S). \tag{8}$$
A linear combination may produce negative weights, which can be unintuitive in terms of the contribution of the IQA measures that take part in the aggregation. Therefore, different combination types were considered: a convex combination, in which the weights are positive and sum to one; an affine combination, which preserves only the sum condition; and a conical combination, which requires only positive weights. Preliminary results confirmed that the proposed approach provides the best performance without constraining the weights. In this paper, an optimisation-based fusion was performed using N = 16 IQA measures with publicly available source code.
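
The evaluation indices described above can be sketched in a few lines. The following is a minimal illustration, not the author's Matlab code: it assumes the standard five-parameter logistic form for the mapping of Eq (3), uses the rank-difference formula for SRCC (exact only when there are no ties), and the function names are hypothetical.

```python
import math

def logistic_map(q, beta):
    """Map one objective score q with the five-parameter function (assumed form of Eq (3))."""
    b1, b2, b3, b4, b5 = beta
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (q - b3)))) + b4 * q + b5

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def srcc(a, b):
    """Spearman coefficient from squared rank differences (no-ties formula)."""
    m = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))

def krcc(a, b):
    """Kendall coefficient from concordant and discordant pairs."""
    m = len(a)
    concordant = discordant = 0
    for i in range(m):
        for j in range(i + 1, m):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (0.5 * m * (m - 1))

def pcc(a, b):
    """Pearson coefficient of two equally long score vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

def rmse(a, b):
    """Root mean square error between mapped objective and subjective scores."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

In practice, PCC and RMSE would be computed on `logistic_map` outputs after fitting `beta`, while the rank-based indices only depend on the ordering of the scores.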
The following 16 techniques were used: VSI [11], FSIM [9], FSIMc [9], GSM [12], IFC [7], IW-SSIM [5], MAD [17], MSSIM [4], NQM [1], PSNR [29], RFSIM [8], SR-SIM [10], SSIM [3], VIF [6], IFS [31], and SFF [32]. In the proposed approach, the vector of decision variables, x, is obtained in a data-driven fashion. Since four large IQA image benchmarks are widely used, four IQA fusion measures are introduced in this paper, one per benchmark. For this purpose, 20% of the reference images from the given dataset, along with their distorted counterparts, were used for training. In the literature, more images were sometimes utilised to tune the parameters of developed methods, e.g., 30% [9, 11] or 80% [13], or parameters were generated for each image dataset separately [13, 21–23]. Some approaches used images from all datasets for this purpose [15]. In order to show dataset-independent results, each fusion measure developed in this paper was evaluated on all datasets. Finally, the vector $x_d$, where d denotes a dataset, was obtained in the following steps: (1) selection of 20% of the reference images from a given dataset and their distorted equivalents; (2) evaluation of the images using N = 16 full-reference IQA measures; (3) selection of n ∈ N IQA measures, and finding the weights of the linear combination of their objective scores together with the regression parameters $\beta$. Objective scores of the measures used were, if needed, scaled to the 0-1 range. The optimisation problem was solved using a genetic algorithm (GA) [28, 33], since the number of possible solutions grows exponentially with the number of IQA metrics used. The GA uses a population of individuals, where each individual represents a single solution. Then, from generation to generation, better solutions emerge after the selection, crossover and mutation operators are applied. The GA was run for 200 generations, with a population of 100 individuals, an elite count equal to 0.05 of the population size, and a crossover fraction of 0.8.
Scattered crossover, Gaussian mutation and stochastic uniform selection rules were used [33]. All presented calculations were performed using Matlab software (version 7.14) with GA Toolbox [34]. After 100 runs, the best solution, $x_d$, was selected. An individual in the proposed solution is represented by a real-valued vector whose dimensions refer to the weights of the IQA measures, x, and the $\beta$ values. The parameters of the GA were determined experimentally by observing the convergence of the objective function over the generations. Fig 1 presents a flowchart of the approach, showing the process in which the introduced fusion measure is obtained and its usage for image quality assessment.
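
The GA configuration described above can be illustrated with a small real-coded GA. This is a simplified sketch rather than the Matlab GA Toolbox procedure used in the paper: it evolves only the weights (the paper also evolves the regression parameters), minimises RMSE directly against the subjective scores, and replaces stochastic uniform selection with truncation selection. The function names, the mutation width `sigma`, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def rmse(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def fuse(weights, scores):
    """Weighted sum of per-measure scores; each row of `scores` is one image."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]

def run_ga(scores, subjective, generations=200, pop_size=100,
           elite_frac=0.05, crossover_frac=0.8, sigma=0.1, seed=0):
    rng = random.Random(seed)
    n = len(scores[0])

    def fitness(x):
        return rmse(fuse(x, scores), subjective)

    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    n_elite = max(1, int(round(elite_frac * pop_size)))
    n_cross = int(round(crossover_frac * (pop_size - n_elite)))
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        nxt = [ind[:] for ind in pop[:n_elite]]        # elites survive unchanged
        parents = pop[:pop_size // 2]                  # truncation selection (simplification)
        while len(nxt) < n_elite + n_cross:
            a, b = rng.sample(parents, 2)
            # scattered crossover: each gene is taken from a randomly chosen parent
            nxt.append([ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)])
        while len(nxt) < pop_size:
            p = rng.choice(parents)
            # Gaussian mutation of every gene
            nxt.append([g + rng.gauss(0.0, sigma) for g in p])
        pop = nxt
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Synthetic example: three hypothetical measures, subjective score driven by two of them.
_rng = random.Random(1)
scores = [[_rng.random() for _ in range(3)] for _ in range(40)]
subjective = [0.7 * row[0] + 0.3 * row[2] for row in scores]
best, history = run_ga(scores, subjective, generations=40, pop_size=40)
```

Because the elite individuals are copied unchanged, the best RMSE in `history` never increases from one generation to the next; a weight that converges close to zero corresponds to a measure that is effectively left out of the fusion.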
Fig 1

Flowchart of the proposed approach.

In an offline training process, the proposed approach is obtained using some of the images from a benchmark dataset. The images are assessed by full-reference IQA measures. Then, a genetic algorithm selects IQA measures and assigns weights to them. The weights obtained for the linear combination of the selected measures are used in image quality assessment tasks.

In experiments, the following four image benchmarks were used: TID2013 [35], TID2008 [36], CSIQ [17], and LIVE [3]. The number of reference images, distortions, and subjects for each dataset are shown in Table 1. Each database contains reference images, their corresponding distorted images and subjective scores.
Table 1

IQA benchmark image datasets.

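| Dataset | No. of reference images | No. of distorted images | No. of distortions | No. of observers |
|---|---|---|---|---|
| TID2013 [35] | 25 | 3000 | 24 | 971 |
| TID2008 [36] | 25 | 1700 | 17 | 838 |
| CSIQ [17] | 30 | 866 | 6 | 35 |
| LIVE [3] | 29 | 779 | 5 | 161 |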
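
The training protocol described in Methods selects 20% of the reference images together with all of their distorted versions. A minimal sketch of such a split by reference image is given below; the function name, the random choice of references, and the image identifiers are hypothetical, since the text does not state how the training references were chosen.

```python
import random

def split_by_reference(image_refs, train_fraction=0.2, seed=0):
    """Split distorted images so that all versions of a reference stay together.

    image_refs maps a distorted-image id to the id of its reference image.
    """
    refs = sorted(set(image_refs.values()))
    rng = random.Random(seed)
    n_train = max(1, round(train_fraction * len(refs)))
    train_refs = set(rng.sample(refs, n_train))
    train = [img for img, ref in image_refs.items() if ref in train_refs]
    remaining = [img for img, ref in image_refs.items() if ref not in train_refs]
    return train_refs, train, remaining

# Hypothetical dataset: 25 reference images, 4 distorted versions of each.
image_refs = {f"ref{r:02d}_d{k}": f"ref{r:02d}" for r in range(25) for k in range(4)}
train_refs, train, remaining = split_by_reference(image_refs)
```

Splitting by reference rather than by distorted image keeps every distorted version of a training image in the training set.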
Finally, four IQA measures, namely Linearly Combined Similarity Measures (LCSIMs), were obtained: Their corresponding components are as follows:

Results and Discussion

This section presents experimental evaluation of the proposed approach in comparison with state-of-the-art techniques, as well as discussion on influence of the aggregated IQA measures and on resulting fusion models.

Comparative evaluation

For evaluation, the four largest image benchmarks (TID2013, TID2008, CSIQ, and LIVE) and four performance indices (SRCC, KRCC, PCC, RMSE) were used. Table 2 presents evaluation results for the best ten models and LCSIMs. The top two models for each criterion are shown in boldface. The table also contains direct and weighted averages of the obtained values. For the weighted average, the number of images in the database is used as its weight. Overall results for RMSE do not take the LIVE dataset into account because its scores are in a different range.
Table 2

Performance comparison of the resulting fusion measures with the IQA models that were used in optimisation.

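| Dataset | Index | VSI | FSIM | FSIMc | GSM | MAD | MSSIM | SR-SIM | VIF | IFS | SFF | LCSIM1 | LCSIM2 | LCSIM3 | LCSIM4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TID2013 | SRCC | 0.8965 | 0.8015 | 0.8510 | 0.7946 | 0.7807 | 0.7859 | 0.7999 | 0.6769 | 0.8697 | 0.8513 | 0.9044 | 0.8139 | 0.8307 | 0.8086 |
| TID2013 | KRCC | 0.7183 | 0.6289 | 0.6665 | 0.6255 | 0.6035 | 0.6047 | 0.6314 | 0.5147 | 0.6785 | 0.6581 | 0.7326 | 0.6387 | 0.6534 | 0.6292 |
| TID2013 | PCC | 0.9000 | 0.8589 | 0.8769 | 0.8464 | 0.8267 | 0.8329 | 0.8590 | 0.7735 | 0.8791 | 0.8706 | 0.9140 | 0.7993 | 0.8659 | 0.8651 |
| TID2013 | RMSE | 0.5404 | 0.6349 | 0.5959 | 0.6603 | 0.6975 | 0.6861 | 0.6347 | 0.7856 | 0.5909 | 0.6099 | 0.5030 | 0.7449 | 0.6201 | 0.6218 |
| TID2008 | SRCC | 0.8979 | 0.8805 | 0.8840 | 0.8504 | 0.8340 | 0.8542 | 0.8913 | 0.7491 | 0.8903 | 0.8767 | 0.9057 | 0.9178 | 0.9107 | 0.8892 |
| TID2008 | KRCC | 0.7123 | 0.6946 | 0.6991 | 0.6596 | 0.6445 | 0.6568 | 0.7149 | 0.5860 | 0.7009 | 0.6882 | 0.7271 | 0.7495 | 0.7391 | 0.7053 |
| TID2008 | PCC | 0.8762 | 0.8738 | 0.8762 | 0.8422 | 0.8306 | 0.8451 | 0.8866 | 0.8084 | 0.8810 | 0.8817 | 0.8956 | 0.9202 | 0.9113 | 0.8965 |
| TID2008 | RMSE | 0.6466 | 0.6525 | 0.6468 | 0.7235 | 0.7473 | 0.7173 | 0.6206 | 0.7899 | 0.6349 | 0.6333 | 0.5970 | 0.5253 | 0.5525 | 0.5945 |
| CSIQ | SRCC | 0.9423 | 0.9242 | 0.9310 | 0.9108 | 0.9466 | 0.9133 | 0.9319 | 0.9195 | 0.9582 | 0.9627 | 0.9494 | 0.9658 | 0.9733 | 0.9624 |
| CSIQ | KRCC | 0.7857 | 0.7567 | 0.7690 | 0.7374 | 0.7970 | 0.7393 | 0.7725 | 0.7537 | 0.8165 | 0.8288 | 0.7994 | 0.8396 | 0.8579 | 0.8323 |
| CSIQ | PCC | 0.9279 | 0.9120 | 0.9192 | 0.8964 | 0.9500 | 0.8991 | 0.9250 | 0.9277 | 0.9576 | 0.9643 | 0.8968 | 0.9665 | 0.9780 | 0.9704 |
| CSIQ | RMSE | 0.0979 | 0.1077 | 0.1034 | 0.1164 | 0.0820 | 0.1149 | 0.0997 | 0.0980 | 0.0757 | 0.0695 | 0.1162 | 0.0674 | 0.0547 | 0.0634 |
| LIVE | SRCC | 0.9524 | 0.9634 | 0.9645 | 0.9561 | 0.9669 | 0.9513 | 0.9618 | 0.9636 | 0.9599 | 0.9649 | 0.9627 | 0.9710 | 0.9722 | 0.9749 |
| LIVE | KRCC | 0.8058 | 0.8337 | 0.8363 | 0.8150 | 0.8421 | 0.8045 | 0.8299 | 0.8282 | 0.8254 | 0.8365 | 0.8281 | 0.8484 | 0.8526 | 0.8600 |
| LIVE | PCC | 0.9482 | 0.9597 | 0.9613 | 0.9512 | 0.9675 | 0.9489 | 0.9553 | 0.9411 | 0.9586 | 0.9632 | 0.8463 | 0.9580 | 0.9662 | 0.9757 |
| LIVE | RMSE | 8.6816 | 7.6781 | 7.5297 | 8.4327 | 6.9073 | 8.6188 | 8.0813 | 9.2402 | 7.7765 | 7.3461 | 14.5553 | 7.8351 | 7.0457 | 5.9821 |
| Overall direct | SRCC | 0.9223 | 0.8924 | 0.9076 | 0.8780 | 0.8820 | 0.8762 | 0.8963 | 0.8273 | 0.9195 | 0.9139 | 0.9306 | 0.9171 | 0.9217 | 0.9088 |
| Overall direct | KRCC | 0.7555 | 0.7285 | 0.7427 | 0.7094 | 0.7218 | 0.7013 | 0.7372 | 0.6707 | 0.7553 | 0.7529 | 0.7718 | 0.7691 | 0.7758 | 0.7567 |
| Overall direct | PCC | 0.9131 | 0.9011 | 0.9084 | 0.8840 | 0.8937 | 0.8815 | 0.9065 | 0.8627 | 0.9191 | 0.9199 | 0.8882 | 0.9110 | 0.9304 | 0.9269 |
| Overall direct | RMSE | 0.4283 | 0.4650 | 0.4487 | 0.5000 | 0.5089 | 0.5061 | 0.4517 | 0.5578 | 0.4338 | 0.4376 | 0.4054 | 0.4459 | 0.4091 | 0.4266 |
| Overall weighted | SRCC | 0.9103 | 0.8598 | 0.8851 | 0.8458 | 0.8412 | 0.8424 | 0.8628 | 0.7657 | 0.8988 | 0.8877 | 0.9183 | 0.8823 | 0.8895 | 0.8722 |
| Overall weighted | KRCC | 0.7370 | 0.6898 | 0.7107 | 0.6738 | 0.6711 | 0.6622 | 0.6980 | 0.6061 | 0.7220 | 0.7121 | 0.7524 | 0.7223 | 0.7295 | 0.7065 |
| Overall weighted | PCC | 0.9036 | 0.8829 | 0.8931 | 0.8653 | 0.8625 | 0.8599 | 0.8875 | 0.8252 | 0.9004 | 0.8981 | 0.8982 | 0.8745 | 0.9061 | 0.9019 |
| Overall weighted | RMSE | 0.5025 | 0.5566 | 0.5333 | 0.5932 | 0.6150 | 0.6050 | 0.5456 | 0.6779 | 0.5226 | 0.5313 | 0.4703 | 0.5706 | 0.5098 | 0.5249 |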

The best two IQA models for each criterion are shown in boldface. Overall results for RMSE do not take into account LIVE dataset due to range difference.
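
As a worked example of the two overall averages, the sketch below recomputes them for the SRCC values of VSI. It assumes that the weights are the numbers of distorted images listed in Table 1; the text only says "the number of images in the database", so the weighted result may differ from the tabulated 0.9103 in the last digits.

```python
# SRCC of VSI on each benchmark (Table 2) and distorted-image counts (Table 1).
srcc_vsi = {"TID2013": 0.8965, "TID2008": 0.8979, "CSIQ": 0.9423, "LIVE": 0.9524}
n_images = {"TID2013": 3000, "TID2008": 1700, "CSIQ": 866, "LIVE": 779}

# Direct average: every benchmark counts equally.
direct = sum(srcc_vsi.values()) / len(srcc_vsi)

# Weighted average: larger benchmarks contribute proportionally more.
weighted = sum(srcc_vsi[d] * n_images[d] for d in srcc_vsi) / sum(n_images.values())

print(f"direct = {direct:.4f}, weighted = {weighted:.4f}")
```

The direct average reproduces the tabulated 0.9223, and the weighted average is pulled towards the TID2013 value because that benchmark holds almost half of all images.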

The obtained results show that LCSIM3 clearly outperformed other measures, since it yielded the best results on LIVE and CSIQ. It was also the second best measure on the TID2008 dataset, after LCSIM2. LCSIM1 outperformed other measures on TID2013. Overall results are biased towards techniques that performed well on TID2013, which is the largest benchmark, i.e., LCSIM1, VSI, and IFS. Among the results obtained by measures that took part in the LCSIM1 fusion, those of VSI and MAD are worth noting. The good performance of the LCSIM family should be confirmed using statistical significance tests. In order to evaluate the statistical significance of the obtained IQA models, hypothesis tests based on the prediction residuals of each measure after non-linear mapping were conducted using a left-tailed F-test [17]. In the test, a smaller residual variance denotes better prediction. Table 3 presents the results of these tests, where the symbol “1”, “0” or “-1” denotes that the IQA fusion measure in the row is, respectively, statistically better with a confidence greater than 95%, indistinguishable, or worse than the IQA measure in the column.
Table 3

Statistical significance tests.

VSIFSIMFSIMcGSMMADMSSIMSR-SIMVIFIFSSFFLCSIM1LCSIM2LCSIM3LCSIM4
TID2013
11111111110111LCSIM1
-1-1-1-1-1-1-11-1-1-10-1-1LCSIM2
-1-1-1111110-1-110-1LCSIM3
-10-111101-10-1110LCSIM4
TID2008
11111111110-1-1-1LCSIM1
11111111111011LCSIM2
11111111111-101LCSIM3
11111101111-1-10LCSIM4
CSIQ
1111-1111-1-10-1-1-1LCSIM1
111111111010-1-1LCSIM2
11111111111101LCSIM3
111111111111-10LCSIM4
LIVE
-1-1-1-1-1-1-11-1-10-1-1-1LCSIM1
1001-11-110-110-1-1LCSIM2
1111011111110-1LCSIM3
11111111111110LCSIM4

The fusion measure in the row is significantly better than the IQA measure in the column (’1’), worse (’-1’), or indistinguishable (’0’).
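
The 1/0/-1 coding of Table 3 can be illustrated with a variance-ratio comparison of prediction residuals. The sketch below is an approximation under stated assumptions and not the procedure of [17]: instead of the exact F distribution it uses Fisher's large-sample normal approximation for half the logarithm of the F statistic, and the function name and residual vectors are hypothetical.

```python
import math
from statistics import NormalDist, variance

def compare_residuals(res_row, res_col, confidence=0.95):
    """Return 1 if the row measure has significantly smaller residual variance,
    -1 if the column measure does, and 0 if they are indistinguishable."""
    d1, d2 = len(res_row) - 1, len(res_col) - 1
    f = variance(res_row) / variance(res_col)
    # Fisher's z = 0.5 * ln(F) is approximately normal for large d1 and d2.
    z = 0.5 * math.log(f)
    mean = 0.5 * (1.0 / d2 - 1.0 / d1)
    std = math.sqrt(0.5 * (1.0 / d1 + 1.0 / d2))
    p_left = NormalDist(mean, std).cdf(z)  # approx. P(F <= f) under equal variances
    alpha = 1.0 - confidence
    if p_left < alpha:
        return 1
    if p_left > 1.0 - alpha:
        return -1
    return 0

# Hypothetical residuals: the second measure's errors are three times larger.
res_a = [math.sin(i) for i in range(500)]
res_b = [3.0 * v for v in res_a]
print(compare_residuals(res_a, res_b), compare_residuals(res_b, res_a))
```

With an exact F quantile available (for example from a statistics package), the normal approximation could be replaced without changing the decision logic.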

Significance tests confirm the good performance of the developed family of multimeasures. LCSIM3 was significantly better than other measures on the TID2008, LIVE and CSIQ databases. Its results on TID2013 were also good. However, since it was developed using scores from a dataset which does not contain many of the distortions that are present in the TID2013 benchmark, its objective scores were less correlated in this case than the scores of VSI, FSIMc, or IFS. Similarly, the LCSIM that was obtained on TID2013 (LCSIM1) performed worse than other measures on the LIVE benchmark. Fig 2 presents the scatter plots for LCSIM3 and the two best performing IQA models for each benchmark. It can be seen that, for databases other than TID2013, the compared models yielded less accurate quality predictions than LCSIM3 for large DMOS values and small MOS values (i.e., in the presence of severe distortions). Fig 3, in turn, contains absolute values of the difference between subjective scores and objective scores for the five best IQA measures after nonlinear fitting (Eq (3)). Here, the values were obtained for 50 images from the most popular dataset, LIVE. The figure shows how the scores obtained by IQA measures differ from the expected scores; smaller values are considered better. It can be seen that the introduced fusion measure, LCSIM3, returned scores which are visibly closer to the subjective scores obtained in tests with human subjects. This is also confirmed by the RMSE values reported for this dataset.
Fig 2

Scatter plots of subjective opinion scores against scores obtained by the two best IQA measures and LCSIM3 on used datasets.

Different types of distortions are represented by different colours; the set of colours is coherent within a dataset. Curves fitted with logistic functions are also shown.

Fig 3

Absolute values of the difference between objective scores and nonlinearly fitted subjective scores for 50 exemplary images from LIVE dataset.

For each image, a smaller value denotes objective assessment which is closer to human evaluation.

The proposed family of multimeasures aggregates different IQA measures. Therefore, it is worth examining their time and memory consumption. The processing time and memory requirements have been determined for all aggregated IQA measures assessing an exemplary image from the TID2013 dataset. The results are shown in Table 4. It can be seen that MAD and VIF are the most demanding techniques. Taking into account that processing time requirements for image quality assessment algorithms are less demanding than for video quality assessment techniques, the timings obtained on an ordinary 2200 MHz CPU seem to be acceptable. LCSIMs aggregate several IQA measures; therefore, their running time will be longer in the case of sequential execution of the measures used, or close to the execution time of the MAD measure in the case of a more memory-consuming parallel implementation.
Table 4

Time and memory costs of IQA measures used in the optimisation.

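| IQA measure | Time [s] | Memory [MB] |
|---|---|---|
| VSI | 0.2207 | 98.01 |
| FSIM | 0.2835 | 73.62 |
| FSIMc | 0.2835 | 73.62 |
| GSM | 0.1796 | 43.20 |
| IFC | 0.8102 | 356.18 |
| IW-SSIM | 0.7160 | 227.27 |
| MAD | 1.1032 | 725.74 |
| MSSIM | 0.1402 | 33.80 |
| NQM | 0.3159 | 135.55 |
| PSNR | 0.0343 | 1.50 |
| RFSIM | 0.0801 | 3.01 |
| SR-SIM | 0.0336 | 6.95 |
| SSIM | 0.0958 | 17.43 |
| VIF | 0.9461 | 343.89 |
| IFS | 0.1213 | 21.73 |
| SFF | 0.0971 | 17.63 |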
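
The cost of a fusion measure can be estimated from Table 4 with simple arithmetic. The sketch below uses a hypothetical selection of three measures, since the actual components of each LCSIM are not reproduced here, and it assumes that a parallel run takes as long as the slowest measure while holding all measures in memory at once.

```python
# (time in seconds, memory in MB) per measure, taken from Table 4.
cost = {
    "VSI": (0.2207, 98.01),
    "MAD": (1.1032, 725.74),
    "VIF": (0.9461, 343.89),
}

selected = ["VSI", "MAD", "VIF"]  # hypothetical fusion, for illustration only

sequential_time = sum(cost[m][0] for m in selected)   # measures run one after another
parallel_time = max(cost[m][0] for m in selected)     # bounded by the slowest measure
parallel_memory = sum(cost[m][1] for m in selected)   # all measures resident at once

print(f"sequential: {sequential_time:.4f} s, parallel: {parallel_time:.4f} s, "
      f"parallel memory: {parallel_memory:.2f} MB")
```

For this selection the parallel time equals the MAD time, which matches the remark that a parallel implementation runs close to the execution time of MAD at the price of higher memory use.
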
It would be desirable to compare the proposed multimeasures with other related fusion IQA measures. Table 5 contains such a comparative evaluation based on SRCC values. SRCC was used as the basis for comparison since many papers do not report other performance indices. The two best results for a given benchmark dataset are written in boldface. Some results were not reported in the referred works; they are denoted by “-”. IQA measures which were developed using images from the benchmark in the column are excluded from the comparison. Moreover, overall results were calculated excluding TID2013, since some measures have not been evaluated on it. Furthermore, in order to provide a fair comparison, overall results exclude works in which the authors obtained a separate IQA measure for each benchmark without providing cross-database evaluation, e.g., [18, 19, 21–23], or [37]. Results for approaches that are not dataset independent are written in italics.
Table 5

Comparison of the approach with other fusion IQA measures based on SRCC values.

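| IQA measure | TID2013 | TID2008 | CSIQ | LIVE | Overall direct | Overall weighted |
|---|---|---|---|---|---|---|
| MAD [17] | 0.7807 | 0.8340 | 0.9466 | 0.9669 | 0.9158 | 0.8944 |
| CQM [18] | - | 0.8720 | - | - | - | - |
| Lahouhou et al. [21] | - | - | - | 0.9500 | - | - |
| ADM [38] | - | 0.8617 | 0.9333 | 0.9460 | 0.9137 | 0.9001 |
| BMMF [39] | 0.8340 | 0.9471 | - | - | - | - |
| BME [22] | - | 0.8882 | 0.9573 | 0.9711 | - | - |
| RMSSIM [23] | - | 0.8569 | 0.9453 | 0.9633 | - | - |
| IGM [40] | - | 0.8902 | 0.9401 | 0.9580 | 0.9294 | 0.9190 |
| EHIS [19] | - | 0.9098 | 0.9498 | 0.9622 | - | - |
| MMF [37] | - | 0.9487 | 0.9755 | 0.9732 | - | - |
| GLD-PFT [24] | - | 0.8849 | 0.9549 | 0.9631 | 0.9343 | 0.9186 |
| Barri et al. [41] | - | 0.8100 | 0.9630 | 0.9570 | 0.9100 | 0.8843 |
| DOG-SSIM [42] | 0.8942 | 0.9259 | 0.9204 | 0.9423 | 0.9295 | 0.9282 |
| ESIM [20] | 0.8804 | 0.9026 | 0.9620 | 0.9420 | 0.9420 | 0.9300 |
| LCSIM1 | 0.9044 | 0.9057 | 0.9494 | 0.9627 | 0.9393 | 0.9280 |
| LCSIM2 | 0.8139 | 0.9178 | 0.9658 | 0.9710 | 0.9515 | 0.9408 |
| LCSIM3 | 0.8307 | 0.9107 | 0.9733 | 0.9722 | 0.9521 | 0.9395 |
| LCSIM4 | 0.8086 | 0.8892 | 0.9624 | 0.9749 | 0.9422 | 0.9251 |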

The result for the measure that was trained using images from the dataset indicated in the column is italicised in order to show the lack of the dataset independence. Overall results were calculated on the basis of three most popular datasets (i.e., TID2008, CSIQ and LIVE) taking into account IQA measures that provided independent results for at least two datasets. The two best measures for each dataset are shown in boldface.

Evaluation results show that LCSIM3 and LCSIM2 outperformed other approaches which use fusion of IQA measures. Among other measures, DOG-SSIM and ESIM provided good results on the TID2013 benchmark, and the approach developed by Barri et al. turned out to be the second best technique on the CSIQ dataset. The outstanding performances of LCSIM3 and LCSIM2 are also confirmed by overall results. Here, they are followed by ESIM, LCSIM4, LCSIM1, and DOG-SSIM. Most of these models were trained on TID2008, except LCSIM3, which was trained on images from CSIQ. This happened because all three most popular datasets share the same types of distortions.

Influence of parameters and IQA measures on fusion

The results presented so far confirm the good performance of the obtained IQA fusion measures in comparison with state-of-the-art fusion and single IQA measures. However, it would be desirable to explain why some measures took part in the fusion more often than others. The contribution of the aggregated models also requires some attention, since the linear combination can produce unintuitive negative weights. First, in order to show the contribution of a given measure, SRCC values between objective and subjective scores were obtained for each distortion type. This may explain why some measures were involved in a fusion, and also shows how well the developed LCSIMs perform, from the point of view of distortion type, in comparison with the IQA measures that were used in optimisation. Table 6 contains SRCC values of the best ten IQA models and LCSIMs obtained on benchmark datasets. The two best IQA measures for each distortion type are written in boldface.
Table 6

SRCC values of IQA measures for each distortion type.

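| Dataset | Dist. type | VSI | FSIM | FSIMc | GSM | MAD | MSSIM | SR-SIM | VIF | IFS | SFF | LCSIM1 | LCSIM2 | LCSIM3 | LCSIM4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TID2013 | AWGN | 0.9460 | 0.8973 | 0.9101 | 0.9064 | 0.8843 | 0.8646 | 0.9212 | 0.8994 | 0.9382 | 0.9066 | 0.9389 | 0.9248 | 0.9240 | 0.9141 |
| TID2013 | AWGNc | 0.8705 | 0.8208 | 0.8537 | 0.8175 | 0.8019 | 0.7730 | 0.8496 | 0.8299 | 0.8537 | 0.8166 | 0.8584 | 0.8482 | 0.8481 | 0.8464 |
| TID2013 | SCN | 0.9367 | 0.8750 | 0.8900 | 0.9158 | 0.8911 | 0.8544 | 0.9150 | 0.8835 | 0.9340 | 0.8982 | 0.9335 | 0.9253 | 0.9223 | 0.9090 |
| TID2013 | MN | 0.7697 | 0.7944 | 0.8094 | 0.7293 | 0.7380 | 0.8073 | 0.7645 | 0.8450 | 0.7960 | 0.8185 | 0.7732 | 0.8175 | 0.8021 | 0.7901 |
| TID2013 | HFN | 0.9200 | 0.8984 | 0.9040 | 0.8869 | 0.8876 | 0.8604 | 0.9102 | 0.8972 | 0.9140 | 0.8977 | 0.9085 | 0.9081 | 0.9098 | 0.9002 |
| TID2013 | IN | 0.8741 | 0.8072 | 0.8251 | 0.7965 | 0.2769 | 0.7629 | 0.8249 | 0.8537 | 0.8389 | 0.7871 | 0.7989 | 0.7905 | 0.7218 | 0.5710 |
| TID2013 | QN | 0.8748 | 0.8719 | 0.8807 | 0.8841 | 0.8514 | 0.8706 | 0.8447 | 0.7854 | 0.8335 | 0.8607 | 0.8787 | 0.8523 | 0.8721 | 0.8574 |
| TID2013 | GB | 0.9612 | 0.9551 | 0.9551 | 0.9689 | 0.9319 | 0.9673 | 0.9612 | 0.9650 | 0.9658 | 0.9675 | 0.9588 | 0.9623 | 0.9642 | 0.9503 |
| TID2013 | DEN | 0.9484 | 0.9302 | 0.9330 | 0.9432 | 0.9252 | 0.9268 | 0.9371 | 0.8911 | 0.9183 | 0.9091 | 0.9423 | 0.9194 | 0.9307 | 0.9144 |
| TID2013 | JPEG | 0.9541 | 0.9324 | 0.9339 | 0.9284 | 0.9217 | 0.9265 | 0.9398 | 0.9192 | 0.9290 | 0.9273 | 0.9346 | 0.9234 | 0.9256 | 0.9274 |
| TID2013 | JP2K | 0.9706 | 0.9577 | 0.9589 | 0.9602 | 0.9511 | 0.9504 | 0.9655 | 0.9516 | 0.9611 | 0.9571 | 0.9627 | 0.9605 | 0.9614 | 0.9574 |
| TID2013 | JGTE | 0.9216 | 0.8464 | 0.8610 | 0.8512 | 0.8283 | 0.8475 | 0.8527 | 0.8409 | 0.8925 | 0.8831 | 0.8917 | 0.8626 | 0.8607 | 0.8540 |
| TID2013 | J2TE | 0.9228 | 0.8913 | 0.8919 | 0.9182 | 0.8788 | 0.8889 | 0.9047 | 0.8761 | 0.9010 | 0.8708 | 0.9184 | 0.9194 | 0.9204 | 0.9008 |
| TID2013 | NEN | 0.8060 | 0.7917 | 0.7937 | 0.8130 | 0.8315 | 0.7968 | 0.7617 | 0.7720 | 0.7839 | 0.7668 | 0.8100 | 0.8129 | 0.8166 | 0.8286 |
| TID2013 | BWD | 0.1713 | 0.5489 | 0.5532 | 0.6418 | 0.2812 | 0.4801 | 0.4571 | 0.5306 | 0.1004 | 0.1786 | 0.3522 | 0.3295 | 0.1854 | 0.4517 |
| TID2013 | IS | 0.7700 | 0.7531 | 0.7487 | 0.7875 | 0.6450 | 0.7906 | 0.6402 | 0.6276 | 0.6575 | 0.6654 | 0.6778 | 0.4537 | 0.6150 | 0.5572 |
| TID2013 | CC | 0.4754 | 0.4686 | 0.4679 | 0.4857 | 0.1972 | 0.4634 | 0.4644 | 0.8386 | 0.4469 | 0.4691 | 0.4445 | 0.6286 | 0.4253 | 0.5324 |
| TID2013 | CCS | 0.8100 | 0.2748 | 0.8359 | 0.3578 | 0.0575 | 0.4099 | 0.1875 | 0.3099 | 0.8257 | 0.8269 | 0.8324 | 0.8094 | 0.8171 | 0.7155 |
| TID2013 | MGN | 0.9117 | 0.8469 | 0.8569 | 0.8348 | 0.8409 | 0.7786 | 0.8719 | 0.8468 | 0.8790 | 0.8434 | 0.8952 | 0.8830 | 0.8812 | 0.8746 |
| TID2013 | CN | 0.9243 | 0.9121 | 0.9135 | 0.9124 | 0.9064 | 0.8528 | 0.9199 | 0.8946 | 0.9037 | 0.9007 | 0.9180 | 0.9116 | 0.9127 | 0.9074 |
| TID2013 | LCNI | 0.9564 | 0.9466 | 0.9485 | 0.9563 | 0.9443 | 0.9068 | 0.9591 | 0.9204 | 0.9433 | 0.9262 | 0.9600 | 0.9503 | 0.9498 | 0.9436 |
| TID2013 | ICQD | 0.8839 | 0.8760 | 0.8815 | 0.8973 | 0.8745 | 0.8555 | 0.8727 | 0.8414 | 0.9007 | 0.8795 | 0.9019 | 0.8840 | 0.8983 | 0.8825 |
| TID2013 | CHA | 0.8906 | 0.8715 | 0.8925 | 0.8823 | 0.8310 | 0.8784 | 0.8746 | 0.8848 | 0.8862 | 0.8789 | 0.8805 | 0.8765 | 0.8736 | 0.8611 |
| TID2013 | SSR | 0.9628 | 0.9565 | 0.9576 | 0.9668 | 0.9567 | 0.9483 | 0.9613 | 0.9353 | 0.9556 | 0.9522 | 0.9656 | 0.9585 | 0.9619 | 0.9545 |
| TID2008 | AWGN | 0.9229 | 0.8566 | 0.8758 | 0.8606 | 0.8386 | 0.8086 | 0.8990 | 0.8797 | 0.9172 | 0.8731 | 0.9087 | 0.9147 | 0.8998 | 0.8803 |
| TID2008 | AWGNc | 0.9118 | 0.8527 | 0.8931 | 0.8091 | 0.8255 | 0.8054 | 0.8953 | 0.8757 | 0.8958 | 0.8626 | 0.8928 | 0.8993 | 0.8897 | 0.8852 |
| TID2008 | SCN | 0.9296 | 0.8483 | 0.8711 | 0.8941 | 0.8678 | 0.8209 | 0.9083 | 0.8698 | 0.9307 | 0.8939 | 0.9224 | 0.9330 | 0.9187 | 0.8967 |
| TID2008 | MN | 0.7734 | 0.8021 | 0.8264 | 0.7452 | 0.7336 | 0.8107 | 0.7870 | 0.8683 | 0.8021 | 0.8365 | 0.7288 | 0.8530 | 0.8181 | 0.7822 |
| TID2008 | HFN | 0.9253 | 0.9093 | 0.9156 | 0.8945 | 0.8864 | 0.8694 | 0.9197 | 0.9075 | 0.9215 | 0.9119 | 0.9108 | 0.9190 | 0.9205 | 0.9023 |
| TID2008 | IN | 0.8298 | 0.7452 | 0.7719 | 0.7235 | 0.0650 | 0.6907 | 0.7665 | 0.8327 | 0.8143 | 0.7484 | 0.7325 | 0.7524 | 0.6178 | 0.3996 |
| TID2008 | QN | 0.8731 | 0.8564 | 0.8726 | 0.8800 | 0.8160 | 0.8589 | 0.8364 | 0.7970 | 0.7973 | 0.8448 | 0.8608 | 0.8541 | 0.8536 | 0.8297 |
| TID2008 | GB | 0.9529 | 0.9472 | 0.9472 | 0.9600 | 0.9196 | 0.9563 | 0.9549 | 0.9540 | 0.9602 | 0.9624 | 0.9519 | 0.9574 | 0.9568 | 0.9405 |
| TID2008 | DEN | 0.9693 | 0.9603 | 0.9618 | 0.9725 | 0.9433 | 0.9582 | 0.9668 | 0.9161 | 0.9491 | 0.9383 | 0.9633 | 0.9594 | 0.9569 | 0.9424 |
| TID2008 | JPEG | 0.9616 | 0.9279 | 0.9294 | 0.9393 | 0.9275 | 0.9322 | 0.9394 | 0.9168 | 0.9279 | 0.9323 | 0.9340 | 0.9444 | 0.9344 | 0.9257 |
| TID2008 | JP2K | 0.9848 | 0.9773 | 0.9780 | 0.9758 | 0.9707 | 0.9700 | 0.9807 | 0.9709 | 0.9778 | 0.9772 | 0.9821 | 0.9825 | 0.9796 | 0.9768 |
| TID2008 | JGTE | 0.9160 | 0.8708 | 0.8756 | 0.8790 | 0.8661 | 0.8681 | 0.8881 | 0.8585 | 0.8735 | 0.8567 | 0.9123 | 0.8984 | 0.8987 | 0.8938 |
| TID2008 | J2TE | 0.8942 | 0.8544 | 0.8555 | 0.8936 | 0.8394 | 0.8606 | 0.8903 | 0.8501 | 0.8799 | 0.8386 | 0.8925 | 0.9137 | 0.9005 | 0.8816 |
| TID2008 | NEN | 0.7699 | 0.7491 | 0.7514 | 0.7386 | 0.8287 | 0.7377 | 0.7670 | 0.7619 | 0.7035 | 0.6970 | 0.7770 | 0.8031 | 0.7985 | 0.8238 |
| TID2008 | BWD | 0.6295 | 0.8492 | 0.8464 | 0.8862 | 0.7970 | 0.7546 | 0.7787 | 0.8324 | 0.0871 | 0.5369 | 0.6329 | 0.7376 | 0.6665 | 0.8259 |
| TID2008 | IS | 0.6714 | 0.6720 | 0.6554 | 0.7190 | 0.5163 | 0.7336 | 0.5728 | 0.5096 | 0.5215 | 0.5225 | 0.3808 | 0.5587 | 0.4928 | 0.4602 |
| TID2008 | CC | 0.6557 | 0.6481 | 0.6510 | 0.6691 | 0.2723 | 0.6381 | 0.6483 | 0.8188 | 0.6273 | 0.6461 | 0.6096 | 0.7258 | 0.5764 | 0.6323 |
| CSIQ | AWGN | 0.9636 | 0.9262 | 0.9359 | 0.9440 | 0.9541 | 0.9471 | 0.9628 | 0.9575 | 0.9593 | 0.9467 | 0.9618 | 0.9717 | 0.9718 | 0.9625 |
| CSIQ | JPEG | 0.9618 | 0.9654 | 0.9664 | 0.9632 | 0.9615 | 0.9634 | 0.9671 | 0.9705 | 0.9660 | 0.9641 | 0.9206 | 0.9711 | 0.9708 | 0.9570 |
| CSIQ | JP2K | 0.9694 | 0.9685 | 0.9704 | 0.9648 | 0.9752 | 0.9683 | 0.9773 | 0.9672 | 0.9712 | 0.9763 | 0.9759 | 0.9781 | 0.9791 | 0.9731 |
| CSIQ | AGPN | 0.9638 | 0.9234 | 0.9370 | 0.9387 | 0.9570 | 0.9331 | 0.9520 | 0.9511 | 0.9526 | 0.9550 | 0.9668 | 0.9650 | 0.9657 | 0.9650 |
| CSIQ | GB | 0.9679 | 0.9729 | 0.9729 | 0.9589 | 0.9682 | 0.9711 | 0.9767 | 0.9745 | 0.9621 | 0.9751 | 0.9640 | 0.9790 | 0.9774 | 0.9735 |
| CSIQ | GCD | 0.9504 | 0.9420 | 0.9438 | 0.9354 | 0.9207 | 0.9526 | 0.9528 | 0.9345 | 0.9485 | 0.9536 | 0.9510 | 0.9590 | 0.9443 | 0.9449 |
| LIVE | JP2K | 0.9604 | 0.9717 | 0.9724 | 0.9700 | 0.9676 | 0.9627 | 0.9701 | 0.9696 | 0.9694 | 0.9672 | 0.9695 | 0.9690 | 0.9682 | 0.9721 |
| LIVE | JPEG | 0.9761 | 0.9834 | 0.9840 | 0.9778 | 0.9764 | 0.9815 | 0.9823 | 0.9846 | 0.9778 | 0.9786 | 0.9761 | 0.9823 | 0.9792 | 0.9818 |
| LIVE | AWGN | 0.9835 | 0.9652 | 0.9716 | 0.9774 | 0.9844 | 0.9733 | 0.9810 | 0.9858 | 0.9883 | 0.9859 | 0.9855 | 0.9878 | 0.9884 | 0.9873 |
| LIVE | GB | 0.9527 | 0.9708 | 0.9708 | 0.9518 | 0.9465 | 0.9542 | 0.9660 | 0.9728 | 0.9665 | 0.9752 | 0.9664 | 0.9641 | 0.9640 | 0.9589 |
| LIVE | FF | 0.9430 | 0.9499 | 0.9519 | 0.9402 | 0.9569 | 0.9471 | 0.9465 | 0.9650 | 0.9404 | 0.9529 | 0.9499 | 0.9604 | 0.9616 | 0.9694 |
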
Results for distortion types reveal that VSI, FSIMc, GSM, VIF, IFS, and SFF are among the best single IQA models. They were also often part of the fusion models, as can be seen in Eqs (9)–(12). The LCSIM family was better than or close to the best IQA models and showed outstanding performance on the CSIQ dataset. To investigate further why some measures were fused together, SRCC values between IQA models on the CSIQ dataset were obtained. They are shown in Table 7. This time the sign of the correlation was preserved, since it may suggest why some measures have negative weights in fusions. Negative correlations can also be seen in Fig 2. Similar pairwise relations between IQA models were observed on the other datasets. It can be seen that some measures are less correlated with each other while preserving good correlation with subjective scores. VIF is the measure least correlated with MSSIM and MAD, yet all three perform well on the CSIQ dataset. SRCC values for these measures are written in boldface in Table 7. The IQA measures in the pairs MAD-VIF and MSSIM-VIF are complementary and thus likely to be fused together.
Table 7

SRCC between objective scores of IQA measures on CSIQ.

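| Measure | VSI | FSIM | FSIMc | GSM | MAD | MSSIM | SR-SIM | VIF | IFS | SFF |
|---|---|---|---|---|---|---|---|---|---|---|
| VSI | | 0.9722 | 0.9808 | 0.9719 | -0.9563 | 0.9743 | 0.9706 | 0.8813 | 0.9684 | 0.9603 |
| FSIM | 0.9722 | | 0.9985 | 0.9884 | -0.9613 | 0.9852 | 0.9865 | 0.8879 | 0.9462 | 0.9436 |
| FSIMc | 0.9808 | 0.9985 | | 0.9880 | -0.9662 | 0.9877 | 0.9850 | 0.8868 | 0.9552 | 0.9515 |
| GSM | 0.9719 | 0.9884 | 0.9880 | | -0.9548 | 0.9857 | 0.9831 | 0.8536 | 0.9336 | 0.9271 |
| MAD | -0.9563 | -0.9613 | -0.9662 | -0.9548 | | -0.9605 | -0.9552 | -0.8689 | -0.9422 | -0.9451 |
| MSSIM | 0.9743 | 0.9852 | 0.9877 | 0.9857 | -0.9605 | | 0.9719 | 0.8538 | 0.9358 | 0.9292 |
| SR-SIM | 0.9706 | 0.9865 | 0.9850 | 0.9831 | -0.9552 | 0.9719 | | 0.9010 | 0.9414 | 0.9422 |
| SSIM | 0.9382 | 0.9602 | 0.9603 | 0.9655 | -0.9318 | 0.9675 | 0.9428 | 0.8278 | 0.8986 | 0.8863 |
| VIF | 0.8813 | 0.8879 | 0.8868 | 0.8536 | -0.8689 | 0.8538 | 0.9010 | | 0.9233 | 0.9276 |
| IFS | 0.9684 | 0.9462 | 0.9552 | 0.9336 | -0.9422 | 0.9358 | 0.9414 | 0.9233 | | 0.9916 |
| SFF | 0.9603 | 0.9436 | 0.9515 | 0.9271 | -0.9451 | 0.9292 | 0.9422 | 0.9276 | 0.9916 | |
| Mean | 0.7662 | 0.7707 | 0.7727 | 0.7642 | -0.9442 | 0.7631 | 0.7669 | 0.7074 | 0.7552 | 0.7514 |
| SRCC on CSIQ | -0.9423 | -0.9242 | -0.9310 | -0.9108 | 0.9466 | -0.9133 | -0.9319 | -0.9195 | -0.9582 | -0.9627 |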
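As an illustration of how pairwise values such as those in Table 7 can be obtained, the following minimal Python sketch computes SRCC between the objective scores of every pair of measures. It is not the Matlab code accompanying the paper: the function names and the toy scores are hypothetical, and SRCC is computed here as the Pearson correlation of average ranks.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def srcc(x, y):
    """Spearman rank-order correlation, with its sign preserved."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_srcc(scores):
    """scores maps a measure name to its objective scores over the same images."""
    return {(a, b): srcc(scores[a], scores[b]) for a, b in combinations(scores, 2)}


# Made-up scores: measure_a and measure_b grow with quality,
# measure_c grows with distortion.
toy = {
    "measure_a": [0.91, 0.85, 0.70, 0.62, 0.40],
    "measure_b": [0.95, 0.80, 0.75, 0.60, 0.35],
    "measure_c": [2.0, 5.5, 9.1, 12.3, 30.0],
}
table = pairwise_srcc(toy)
```

A strongly negative entry, as in the MAD row of Table 7, indicates only that one measure grows with distortion while the other grows with quality; the absolute value reflects how similarly the two measures rank the images.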
These findings were confirmed in an experiment in which only a predefined number of IQA measures, k ∈ N, could take part in the fusion. Such reduced fusion models help to determine the contribution of each fused measure. In the experiment, k varied from 2 to 5. To estimate the influence of an IQA measure on the results obtained by a fusion model, the percentage decrease of RMSE without the measure was calculated. Table 8 contains such reduced LCSIMs for the CSIQ dataset, their RMSE values, and contributions. The table also contains LCSIM3, since it was developed on images from CSIQ.
Table 8

Results of the experiment with predefined number of aggregated IQA models on CSIQ dataset.

| No. of IQA models, k | Equation | RMSE | Contribution [%] |
|---|---|---|---|
| 2 | -6.0533 MAD + 3.8434 VIF | 0.0619 | 39.13, 24.51 |
| 3 | -1.8294 MAD + 1.1814 VIF + 8.7912 SFF | 0.0592 | 27.63, 94.36, 5.58 |
| 4 | -11.8203 GSM - 3.3564 MAD + 2.0515 VIF + 3.5018 IFS | 0.0587 | 2.65, 32.06, 19.81, 8.71 |
| 5 | -3.6729 FSIMc - 4.5873 MAD + 2.5546 VIF + 4.8361 IFS + 12.0375 SFF | 0.0574 | 3.20, 26.22, 24.17, 5.28, 2.55 |
| - | LCSIM3, Eq (11) | 0.0547 | 0.18, 2.32, 1.80, 0.01, 30.85, 79.75, 4.04, 6.17, 13.59, 3.36, 1.08 |

Contribution is calculated as percentage decrease of RMSE without a given IQA measure. Contributions of aggregated measures are separated with comma; values for MAD, VIF, and MSSIM are written in boldface.
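The contribution values can be illustrated with a short Python sketch. It assumes that the contribution of a measure is 100 · (RMSE_without − RMSE_full) / RMSE_without, that the remaining weights are kept fixed when a measure is removed, and that no nonlinear mapping is applied before RMSE is computed. None of these details is confirmed by the text, and all names and numbers below are hypothetical.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equally long score vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def fuse(weights, scores):
    """Weighted sum of objective scores; scores[name] is a list over images."""
    n = len(next(iter(scores.values())))
    return [sum(w * scores[name][i] for name, w in weights.items()) for i in range(n)]


def contributions(weights, scores, subjective):
    """Return the full-model RMSE and the assumed contribution [%] of each measure."""
    full = rmse(fuse(weights, scores), subjective)
    result = {}
    for name in weights:
        # Drop one measure and keep the other weights unchanged (assumption).
        reduced = {k: w for k, w in weights.items() if k != name}
        without = rmse(fuse(reduced, scores), subjective)
        result[name] = 100.0 * (without - full) / without if without > 0 else 0.0
    return full, result


# Made-up data for two hypothetical measures m1 and m2 over four images.
scores = {"m1": [1.0, 2.0, 3.0, 4.0], "m2": [1.0, 0.0, 1.0, 0.0]}
subjective = [2.6, 4.1, 6.4, 8.0]
weights = {"m1": 2.0, "m2": 0.5}

full_rmse, contrib = contributions(weights, scores, subjective)
```

Under this definition a contribution close to 100% means that the fusion degrades severely once the measure is removed, which is one way a measure with a small weight can still contribute more than a measure with a large one.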

Results shown in Table 8 confirm that IQA measures that achieve good performance on the CSIQ dataset and are less correlated with each other are likely to be aggregated. In the obtained fusion measures, weights do not reflect well the contribution of the selected IQA measures, as can be seen in the case of three (k = 3) fused models, where VIF and MAD, with lower weights, contributed more than SFF. The sign of a weight depends on the correlation of the measure with subjective scores (MOS or DMOS), but it can also be used as compensation, making the resulting vector of objective scores closer to the vector of subjective scores, since the optimisation uses the RMSE between them to find better aggregated models. It is worth noting that the RMSE results obtained by all measures developed in experiments with the predefined number of IQA measures are better than the results of state-of-the-art approaches on this dataset (see Table 8). MAD, VIF, and MSSIM contributed the most to LCSIM measures obtained on the CSIQ dataset. This can also be observed for the remaining LCSIM measures, where the three best contributing single IQA models are as follows: MAD (19.76%), IFS (16.90%), and PSNR (16.71%) for LCSIM1; VIF (15.87%), MAD (8.31%), and SSIM (4.49%) for LCSIM2; and VIF (38.54%), MAD (33.87%), and GSM (4.22%) for LCSIM4.
The vector β used in the calculation of RMSE (and PCC) also influenced the results. To show its influence, each component of β = [β1, β2, …, β5] determined in the optimisation for a given LCSIM was varied in the range 0.1 to 20 with a step of 0.1, while the other components remained unchanged. Table 9 presents the minimum, maximum, mean, and standard deviation of RMSE values for each component calculated on the benchmark datasets.
It can be seen that β4 has the largest influence on LCSIM1, β2 on LCSIM2, β3 on LCSIM3, and all components are similarly important to LCSIM4.
Table 9

Influence of the β on obtained IQA fusion measures.

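| Fusion measure (dataset) | β | Min | Max | Mean | SD |
|---|---|---|---|---|---|
| LCSIM1 on TID 2013 | β1 | 0.5030 | 0.6849 | 0.5474 | 0.0451 |
| LCSIM1 on TID 2013 | β2 | 0.5030 | 0.6768 | 0.5884 | 0.0087 |
| LCSIM1 on TID 2013 | β3 | 0.5030 | 0.5881 | 0.5856 | 0.0145 |
| LCSIM1 on TID 2013 | β4 | 0.5030 | 1.3554 | 0.5941 | 0.0810 |
| LCSIM1 on TID 2013 | β5 | 0.5030 | 0.5998 | 0.5570 | 0.0412 |
| LCSIM2 on TID 2008 | β1 | 0.5253 | 0.5253 | 0.5253 | 0.0000 |
| LCSIM2 on TID 2008 | β2 | 0.5253 | 10.1183 | 0.5815 | 0.6780 |
| LCSIM2 on TID 2008 | β3 | 0.5253 | 0.5253 | 0.5253 | 0.0000 |
| LCSIM2 on TID 2008 | β4 | 0.5253 | 0.5253 | 0.5253 | 0.0000 |
| LCSIM2 on TID 2008 | β5 | 0.5253 | 0.5253 | 0.5253 | 0.0000 |
| LCSIM3 on CSIQ | β1 | 0.0547 | 0.0571 | 0.0554 | 0.0008 |
| LCSIM3 on CSIQ | β2 | 0.0547 | 0.0571 | 0.0556 | 0.0008 |
| LCSIM3 on CSIQ | β3 | 0.0547 | 0.2053 | 0.0625 | 0.0287 |
| LCSIM3 on CSIQ | β4 | 0.0547 | 0.0571 | 0.0557 | 0.0009 |
| LCSIM3 on CSIQ | β5 | 0.0547 | 0.0571 | 0.0552 | 0.0005 |
| LCSIM4 on LIVE | β1 | 5.9820 | 6.4776 | 5.9976 | 0.0387 |
| LCSIM4 on LIVE | β2 | 5.9820 | 6.0315 | 6.0024 | 0.0187 |
| LCSIM4 on LIVE | β3 | 5.9820 | 6.9150 | 6.0253 | 0.0836 |
| LCSIM4 on LIVE | β4 | 5.9820 | 6.0320 | 6.0136 | 0.0185 |
| LCSIM4 on LIVE | β5 | 5.9820 | 6.0016 | 5.9837 | 0.0054 |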
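The sweep behind Table 9 can be sketched in Python as follows. The sketch assumes the five-parameter logistic mapping commonly used in IQA evaluation, Q_p = β1(1/2 − 1/(1 + exp(β2(Q − β3)))) + β4 Q + β5, and the sample standard deviation. The function names and toy data are hypothetical, and the resulting numbers are unrelated to those in the table.

```python
import math
import statistics


def logistic5(q, beta):
    """Assumed five-parameter logistic mapping of an objective score q."""
    b1, b2, b3, b4, b5 = beta
    # Clamp the exponent so that math.exp cannot overflow.
    z = max(-700.0, min(700.0, b2 * (q - b3)))
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(z))) + b4 * q + b5


def rmse(pred, target):
    """Root mean square error between two equally long score vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def sweep(objective, subjective, beta, index, lo=0.1, hi=20.0, step=0.1):
    """Vary one component of beta over [lo, hi]; return (min, max, mean, SD) of RMSE."""
    n_steps = int(round((hi - lo) / step)) + 1
    values = []
    for s in range(n_steps):
        trial = list(beta)
        trial[index] = lo + s * step  # all other components stay unchanged
        mapped = [logistic5(q, trial) for q in objective]
        values.append(rmse(mapped, subjective))
    return min(values), max(values), statistics.mean(values), statistics.stdev(values)


# Made-up fused scores and a made-up beta; the subjective scores are generated
# from the same beta, so the sweep passes through a near-zero RMSE.
objective = [0.2, 0.4, 0.6, 0.8]
beta = [1.0, 2.0, 0.5, 3.0, 0.5]
subjective = [logistic5(q, beta) for q in objective]

rmse_min, rmse_max, rmse_mean, rmse_sd = sweep(objective, subjective, beta, index=3)
```

A component whose sweep yields a large maximum or standard deviation, as β2 does for LCSIM2 in Table 9, is one to which the mapped scores are particularly sensitive.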

Conclusions

In this paper, a multimeasure resulting from a fusion of full-reference IQA measures is presented. The fusion was formulated as an optimisation problem that was solved using a genetic algorithm, which was also responsible for the selection of appropriate IQA measures. Evaluation of the proposed approach on the four largest widely used image benchmarks reveals that the LCSIM family of measures performs better than the compared state-of-the-art IQA models in terms of prediction quality reflected by SRCC, KRCC, PCC, and RMSE. The contribution of the aggregated IQA measures was also investigated. A further extension of the approach could involve using other IQA measures for fusion; therefore, Matlab source code that allows running the optimisation with any newly developed measure with known objective scores for the used image benchmarks, and evaluating the results, is available for download at http://marosz.kia.prz.edu.pl/LCSIM.html. Another direction of future research would be to develop a fusion measure oriented towards a given type of distortion, or a measure that aggregates full-reference IQA measures with a small memory footprint and short computation time.
