Variational Bayesian Pansharpening with Super-Gaussian Sparse Image Priors.

Fernando Pérez-Bueno, Miguel Vega, Javier Mateos, Rafael Molina, Aggelos K. Katsaggelos.

Abstract

Pansharpening is a technique that fuses a low spatial resolution multispectral image and a high spatial resolution panchromatic one to obtain a multispectral image with the spatial resolution of the latter while preserving the spectral information of the multispectral image. In this paper we propose a variational Bayesian methodology for pansharpening. The proposed methodology uses the sensor characteristics to model the observation process and Super-Gaussian sparse image priors on the expected characteristics of the pansharpened image. The pansharpened image, as well as all model and variational parameters, are estimated within the proposed methodology. Using real and synthetic data, the quality of the pansharpened images is assessed both visually and quantitatively and compared with other pansharpening methods. Theoretical and experimental results demonstrate the effectiveness, efficiency, and flexibility of the proposed formulation.

Keywords:  image fusion; pansharpening; super-Gaussians; variational Bayesian

Year:  2020        PMID: 32948056      PMCID: PMC7570633          DOI: 10.3390/s20185308

Source DB:  PubMed          Journal:  Sensors (Basel)        ISSN: 1424-8220            Impact factor:   3.576


1. Introduction

Remote sensing sensors simultaneously capture a Multispectral (MS) low resolution image along with a single-band high resolution image of the same area, referred to as the Panchromatic (PAN) image. However, MS high-resolution images are needed by many applications, such as land use and land cover analyses or change detection. Pansharpening is a technique that fuses the MS and PAN images into an MS high resolution image that has the spatial resolution of the PAN image and the spectral resolution of the MS one. In this paper we formulate the pansharpening problem following the Bayesian framework. Within this framework, we use the sensor characteristics to model the observation process as a conditional probability distribution. The observation process describes both the relationship between the MS high resolution and MS low resolution images and how the PAN image is obtained from the MS high resolution one. This probability distribution provides fidelity to the observed data in the pansharpened image reconstruction process. Together with fidelity to the data, Bayesian methods incorporate prior knowledge on the MS high resolution image in the form of prior probability distributions. Crisp images, such as high resolution MS images, are expected to have Super-Gaussian (SG) statistics, while upsampled images suffer from blur that smooths out sharp gradients, making their statistics more Gaussian [1]. Our goal is to integrate the sharp edges of the PAN image into the pansharpened image, leading to less Gaussian statistics, which makes SG priors a suitable choice. SG priors have been successfully applied to other image processing tasks, such as compressed sensing [2], blind deconvolution [1,3] and blind color deconvolution [4], so they are also expected to produce good results in pansharpening. However, the form of the SG prior does not allow us to obtain the posterior distribution in an analytical way, making full Bayesian inference intractable.
Hence, in this paper, we use variational Bayesian inference to estimate the distribution of the pansharpened image, as well as the model parameters, from the MS low resolution and PAN images. The rest of the paper is organized as follows: a categorization and short review of related pansharpening methods is presented in Section 2. In Section 3 the pansharpening problem is mathematically formulated. Following Bayesian modelling and inference, in Section 4 we propose a fully Bayesian method for the estimation of all the problem unknowns and model parameters. In Section 5, the quality of the pansharpened images is assessed both visually and quantitatively and compared with other classic and state-of-the-art pansharpening methods. In Section 6 we discuss the obtained results and, finally, Section 7 concludes the paper.

2. Related Work

Early pansharpening techniques, such as the Brovey method [5], substituted some bands for image visualization or performed simple arithmetic transformations. Other classical methods included the transformation of the MS image and the substitution of one of its components by the high spatial resolution PAN image. Examples of this strategy are Principal Component Analysis (PCA) substitution [6], the Brovey Transform [7] and Intensity-Hue-Saturation (IHS) [8] methods. A review of those early methods, among others, can be found in [9]. Over the past 20 years, numerous methods have been presented and, in an attempt to bring some order to the diversity of approaches, different reviews, comparisons and classifications have been proposed in the literature (see, for instance, [10,11,12,13,14,15,16,17]), each one with different criteria and, therefore, with a different categorization. Nevertheless, in recent years there seems to be a consensus on three main categories, namely Component Substitution (CS), Multi-Resolution Analysis (MRA) and Variational Optimization (VO) [15,16,17]. Additionally, the increasing number of Deep Learning (DL)-based pansharpening methods proposed in recent years can be regarded as a new category. The Component Substitution (CS) category includes the most widely used pansharpening methods. CS methods [12] usually upsample the MS image to the size of the PAN image and transform it to another space that separates the spatial and spectral image components. Then, the transformed component containing the spatial information is substituted by the PAN image (possibly after histogram matching). Finally, the backward transform is applied to obtain the pansharpened image. Examples of these methods include the already mentioned PCA substitution [6], IHS methods [8,18,19], the Gram–Schmidt (GS) methods [20] and the Brovey transform [7]. In [21], the transformation is replaced by any weighted average of the MS bands.
It is shown that this approach generalizes any CS image fusion method. Determination of the weights has been carried out in different ways. For instance, in [22] the weights are optimally estimated to minimize the mean squared error, while in [23] they are set to the correlation coefficient between a single-band low resolution image (obtained from the MS image) and each MS band. A local criterion, based on the membership of a given pixel in a fuzzy cluster, was applied in [24] to estimate weights that are different for each pixel of the image. To obtain a crisper MS high-resolution image, in [25] a Wiener deconvolution of the upsampled MS bands was performed before fusion. In general, CS-based methods produce spectral distortions due to the different statistics of the PAN image and the transformed component containing the spatial details. To tackle this issue, Multi-Resolution Analysis (MRA) methods decompose the MS and PAN images to different levels, extract spatial details from the decomposed PAN image, and inject them into the finer scales of the MS image. This principle is also known as the ARSIS concept [10]. The High-Pass Filtering (HPF) algorithm in [11,18] can be considered the first approach in this category, where only two levels are considered. Multi-scale decompositions, such as the wavelet transform (WT) [26,27,28], the Generalized Laplacian Pyramid (GLP) [29,30,31] or the Non-Subsampled Contourlet Transform (NSCT) [32,33,34], were used to bring more precision to the methods. The “a trous” wavelet transform (AWT) was the preferred decomposition technique [26,28] until the publication of [31] showed the advantages of GLP over AWT. This was later corroborated in [14], where a comparison of different methods based on decimated and undecimated WT, AWT, GLP and NSCT concluded that GLP outperforms AWT because it better removes aliasing.
The MRA category also includes the Smoothing Filter Based Intensity Modulation (SFIM) method [35,36], which first upsamples the MS image to the size of the PAN one and then uses a simplified solar radiation and land surface reflection model to increase its quality, and the Indusion method [37], in which upscaling and fusion steps are carried out together. Deep Learning (DL) techniques have gained prominence in the past years and several methods have been proposed for pansharpening. As far as we know, the use of Deep Neural Networks (DNN) for pansharpening was first introduced in [38], where a Modified Sparse Denoising Autoencoder (MSDA) algorithm was proposed. For the same task, a Coupled Sparse Denoising Autoencoder (CSDA) was used in [39]. Convolutional neural networks were introduced in [40] and also used, for instance, in [41]. Instead of facing the difficult task of learning the whole image, residual networks [42,43] learn, from upsampled MS and PAN patches, only the details of the MS high-resolution image that are not already in the upsampled MS image and add them to it to obtain the pansharpened image. To adjust the size of the MS image to the size of the PAN one in a coarse-to-fine manner, two residual networks in cascade were set in the so-called Progressive Cascade Deep Residual Network (PCDRN) [44]. In [45] a multi-scale approach is followed by learning a DNN to upsample each NSCT directional sub-band from the MS and PAN images. In general, the main weaknesses of the DL techniques are the high computational resources needed for training, the need for a huge amount of training data, which, in the case of pansharpening, might not be available, and the poor generalization to satellite images not used during training. The absence of ground-truth MS high-resolution images, needed for training these DL methods, is a problem pointed out in [46], where a non-supervised generative adversarial network (Pan-GAN) was proposed.
The GAN aims to generate pansharpened images that are consistent with the spectral information of the MS image while maintaining the spatial information of the PAN image. However, the generalization of this technique to satellite images different from the ones used for training is not clear. The adaptation of general image fusion methods, like the U2Fusion method in [47], to the pansharpening problem is a promising research area. From a practical perspective, Variational Optimization (VO)-based methods present advantages from both theoretical and computational points of view [48]. VO-based methods mathematically model the relation between the observed images and the original MS high resolution image, building an energy functional based on some desired properties of the original image. The pansharpened image is obtained as the image that minimizes this energy functional [49]. This mathematical formulation makes it possible to rigorously introduce and process features that are visually important into the energy functional. Variational optimization can be considered a particular case of the Bayesian approach [50], where the estimated image is obtained by maximizing the posterior probability distribution of the MS high resolution image. Bayesian methods for pansharpening formulate the relations between the observed images and the original MS high resolution image as probability distributions, model the desired properties as prior distributions and use Bayes’ theory to estimate the pansharpened image based on the posterior distribution of the original MS high resolution image. Following the seminal P+Xs method [51], the PAN image is usually modelled as a combination of the bands of the original high resolution multispectral image. However, in [49] this model was generalized by substituting the intensity images by their gradients. Note that while the P+Xs method [51] preserves spectral information, it produces blurring artifacts.
To remove blur while preserving spectral similarity, other restrictions are introduced as reasonable assumptions or prior knowledge about the original image, such as the Laplacian prior [52], total variation [53,54], sparse representations [55], band correlations [56,57], non-local priors [58,59], etc. Spectral information is also preserved by enforcing the pansharpened image to be close to the observed MS one when downsampled to the size of the latter [52,60,61]. A special class of VO-based methods are the super-resolution methods, which model pansharpening as the inverse problem of recovering the original high-resolution image by fusing the MS and PAN images (see [52,62] for a recent review and [63] for a recent work). Deconvolution methods, such as [64], also try to solve the inverse problem, but the upsampling of the MS image to the size of the PAN one is performed prior to the pansharpening procedure. Registration and fusion are carried out simultaneously in [65]. Note that the variational Bayesian approach, also followed in this paper, is more general than variational optimization. While VO-based methods aim at obtaining a single estimate of the pansharpened image, the variational Bayesian approach estimates the whole posterior distribution of the pansharpened images and the model parameters, given the observations. When a single image is needed, the mode of the distribution is usually selected, but other solutions can be obtained, for instance, by sampling the estimated distribution. Moreover, the proposed approach allows us to simultaneously estimate the model parameters along with the pansharpened image using the same framework.

3. Problem Formulation

Let us denote by x the MS high-resolution image hypothetically captured with an ideal high-resolution sensor, with B bands x_b, b = 1, ..., B, of size P pixels each, that is, x = [x_1^T, ..., x_B^T]^T, where the superscript ^T denotes the transpose of a vector or matrix. Note that each band of the image is flattened into a column vector containing its pixels in lexicographical order. Unfortunately, this high-resolution image is not available in real applications. Instead, we observe an MS low-resolution image y = [y_1^T, ..., y_B^T]^T with B bands of size p pixels each, with p < P. The bands in this image are flattened as well to express them as column vectors. The relation between each low-resolution band, y_b, and its corresponding high-resolution one, x_b, is defined by

    y_b = D H x_b + n_b,  b = 1, ..., B,    (1)

where D is the decimation operator, H is a blurring matrix, and the capture noise n_b is modeled as additive white Gaussian noise with variance β_b^(-1). A single-band high-resolution PAN image covering a wide range of frequencies is also provided by the sensor. This PAN image x_PAN, of size P, is modelled as a spectral average of the unknown high-resolution bands x_b, as

    x_PAN = sum_{b=1}^{B} λ_b x_b + n_PAN,    (2)

where the weights λ_b >= 0 are known quantities that depend on each particular satellite sensor, and the capture noise n_PAN is modeled as additive white Gaussian noise with variance γ^(-1). Once the image formation is formulated, let us use the Bayesian formulation to tackle the problem of recovering the MS high resolution image x from the observed data: its degraded MS low resolution version y and the PAN image x_PAN.
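As a concrete illustration of the observation model, the following sketch simulates Equations (1) and (2) for a toy high-resolution MS image. The sizes, Gaussian blur, decimation ratio, noise level and spectral weights are all assumed values for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def blur_downsample(band, ratio=4, sigma=1.0):
    """Apply a separable Gaussian blur (H) followed by decimation (D) to one band."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, band)
    blurred = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, blurred)
    return blurred[::ratio, ::ratio]  # decimation operator D

B, m, n, ratio = 4, 64, 64, 4         # toy sizes
x = rng.random((B, m, n))             # unknown high-resolution MS image

beta = 1e4                            # noise precision (variance = 1/beta)
y = np.stack([blur_downsample(x[b], ratio) +
              rng.normal(0.0, beta**-0.5, (m // ratio, n // ratio))
              for b in range(B)])     # observed low-resolution MS image, Eq. (1)

lam = np.array([0.1, 0.3, 0.4, 0.2])  # hypothetical sensor spectral weights
pan = np.tensordot(lam, x, axes=1) + rng.normal(0.0, beta**-0.5, (m, n))  # Eq. (2)
```

Here the bands are kept as 2-D arrays for readability; in the formulation above they would be flattened into column vectors and D, H expressed as matrices.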

4. Bayesian Modelling and Inference

We model the distribution of each low resolution band y_b, b = 1, ..., B, following the degradation model in Equation (1), as a Gaussian distribution with mean D H x_b and covariance matrix β_b^(-1) I. Then, the distribution of the observed image y is modelled by

    p(y | x, β) = prod_{b=1}^{B} N(y_b | D H x_b, β_b^(-1) I),    (3)

with β = {β_1, ..., β_B}. Analogously, using the degradation model in Equation (2), the distribution of the PAN image is given by

    p(x_PAN | x, γ) = N(x_PAN | sum_b λ_b x_b, γ^(-1) I).    (4)

The starting point for Bayesian methods is to choose a prior distribution for the unknowns. In this paper, we use SG distributions as priors for the bands of the MS high resolution image, as

    p(x_b | α_b) = (1/Z(α_b)) exp( − α_b sum_{j=1}^{J} sum_i ρ(ν_b^j(i)) ),    (5)

with α_b > 0, where Z(α_b) is a partition function. In Equation (5), ν_b^j = F_j x_b is a filtered version of the b-th band, where {F_j, j = 1, ..., J} is a set of J high-pass filters, ν_b^j(i) is the i-th pixel value of ν_b^j, and ρ(·) is a penalty function. The image priors are placed on the filtered images ν_b^j. It is well known that the application of high-pass filters to natural images returns sparse coefficients: most of the coefficients are zero or close to zero, while only the edge-related coefficients remain large. Sparse priors enjoy SG properties: heavier tails, a sharper peak, and positive excess kurtosis compared to the Gaussian distribution. The distribution mass is located around zero, but large values have a higher probability than in a Gaussian distribution. For p(x_b | α_b) in Equation (5) to be SG, ρ(s) has to be symmetric around zero and the function ρ(sqrt(s)) increasing and concave for s in (0, ∞). This condition is equivalent to ρ′(s)/s being decreasing on (0, ∞), and allows ρ to be represented as

    ρ(s) = inf_{ξ > 0} { (1/2) ξ s² − ρ*(ξ) },    (6)

where inf denotes the infimum, ρ* is the concave conjugate of ρ, and the ξ are a set of positive variational parameters. The relationship dual to Equation (6) is given by [66]

    ρ*(ξ) = inf_s { (1/2) ξ s² − ρ(s) }.    (7)

To achieve sparsity, the function ρ should suppress most of the coefficients in ν_b^j and preserve a small number of key features. Table 1 shows some penalty functions corresponding to SG distributions (see [1]).
Table 1

Some possible penalty functions.

Label              ρ(s)             ρ′(s)/|s|
ℓp, 0 < p ≤ 1      (1/p)|s|^p       |s|^(p−2)
log                log(ϵ + |s|)     (ϵ + |s|)^(−1) |s|^(−1)
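The super-Gaussianity condition, ρ′(s)/s decreasing on (0, ∞), can be checked numerically for the two penalties in Table 1. A minimal sketch with assumed parameter values p = 0.5 and ϵ = 10⁻³:

```python
import numpy as np

def rho_lp(s, p=0.5):
    """l_p penalty from Table 1: rho(s) = |s|^p / p, with 0 < p <= 1."""
    return np.abs(s) ** p / p

def rho_log(s, eps=1e-3):
    """log penalty from Table 1: rho(s) = log(eps + |s|)."""
    return np.log(eps + np.abs(s))

s = np.linspace(0.01, 3.0, 300)
# Both penalties are increasing in |s| ...
assert np.all(np.diff(rho_lp(s)) > 0) and np.all(np.diff(rho_log(s)) > 0)
# ... while their weights rho'(s)/s are decreasing: the SG condition.
w_lp = s ** (0.5 - 2.0)               # rho'(s)/s for l_p with p = 0.5
w_log = 1.0 / ((1e-3 + s) * s)        # rho'(s)/s for the log penalty
assert np.all(np.diff(w_lp) < 0) and np.all(np.diff(w_log) < 0)
```

The decreasing weight is what makes these priors sparsity-promoting: small coefficients are penalized relatively harder than large, edge-related ones.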
From Equations (3)–(5), the joint probability distribution p(Θ, y, x_PAN), with Θ = {x, α, β, γ} the set of all unknowns, is given by

    p(Θ, y, x_PAN) = p(y | x, β) p(x_PAN | x, γ) prod_b p(x_b | α_b) p(α) p(β) p(γ),    (8)

where flat hyperpriors p(α), p(β) and p(γ) on the model hyperparameters have been included. Following the Bayesian paradigm, inference will be based on the posterior p(Θ | y, x_PAN). Since this posterior distribution cannot be analytically calculated due to the form of the SG distribution, in this paper we use the mean-field variational Bayesian model [67] to approximate p(Θ | y, x_PAN) by a distribution of the form q(Θ) = q(x) q(α) q(β) q(γ) that minimizes the Kullback–Leibler divergence [68], defined as

    KL( q(Θ) || p(Θ | y, x_PAN) ) = ∫ q(Θ) log [ q(Θ) / p(Θ | y, x_PAN) ] dΘ.    (9)

The Kullback–Leibler divergence is always non-negative and it is equal to zero if and only if q(Θ) = p(Θ | y, x_PAN). Even with this factorization, the SG prior for x_b hampers the evaluation of this divergence, but the quadratic bound for ρ in Equation (7) allows us to bound the prior in Equation (5) with a Gaussian form, since

    ρ(ν_b^j(i)) ≤ (1/2) ξ_b^j(i) ν_b^j(i)² − ρ*(ξ_b^j(i))  for every ξ_b^j(i) > 0.

We then define the lower bound of the prior

    M(x_b, ξ_b | α_b) = (1/Z(α_b)) exp( − α_b sum_{j,i} [ (1/2) ξ_b^j(i) ν_b^j(i)² − ρ*(ξ_b^j(i)) ] ) ≤ p(x_b | α_b),

where ξ_b = {ξ_b^j(i), for all j, i}, and obtain the lower bound of the joint probability distribution

    F(Θ, ξ, y, x_PAN) = p(y | x, β) p(x_PAN | x, γ) prod_b M(x_b, ξ_b | α_b) p(α) p(β) p(γ),    (10)

which satisfies the inequality F(Θ, ξ, y, x_PAN) ≤ p(Θ, y, x_PAN). Utilizing the lower bound for the posterior probability distribution in Equation (10), we minimize KL( q(Θ) || F(Θ, ξ, y, x_PAN) ) instead of KL( q(Θ) || p(Θ | y, x_PAN) ). As shown in [67], for each unknown θ in Θ, the estimated posterior approximation will have the form

    q(θ) ∝ exp( E_{Θ\θ}[ log F(Θ, ξ, y, x_PAN) ] ),    (14)

where Θ\θ represents all the variables in Θ except θ and E[·] denotes the expected value calculated using the distribution q. When point estimates are required, the mean of q(θ) is used. For variables with a degenerate posterior approximation, that is, for q(θ) = δ(θ − θ̂), the value where the posterior degenerates is given by [67]

    θ̂ = argmax_θ E_{Θ\θ}[ log F(Θ, ξ, y, x_PAN) ].    (15)

Let us now obtain the analytic expressions for each unknown posterior approximation.

4.1. High Resolution Multispectral Image Update

Using Equation (14) we can show in a straightforward way that the posterior distribution for the high resolution MS image will have the form

    q(x) = N( x | <x>, Σ_x ),    (16)

where <·> denotes the expectation with respect to q, the inverse of the covariance matrix is given by

    Σ_x^(-1) = diag(<β>) ⊗ (DH)^T (DH) + <γ> (λ λ^T) ⊗ I + blockdiag_b( <α_b> sum_j F_j^T diag(<ξ_b^j>) F_j ),    (17)

with ⊗ denoting the Kronecker product and diag(·) a diagonal matrix formed from the elements of a vector, and the mean is obtained as

    <x> = Σ_x [ ( diag(<β>) ⊗ (DH)^T ) y + <γ> (λ ⊗ I) x_PAN ].    (18)

4.2. Variational Parameters Update

To estimate the value of the variational parameters ξ_b^j(i), introduced in Equation (7), we need to solve, for each band b, filter j and pixel i, the optimization problem

    ξ_b^j(i) = argmin_{ξ > 0} { (1/2) ξ E[ν_b^j(i)²] − ρ*(ξ) },    (19)

where E[ν_b^j(i)²] is computed using q(x). Since ρ*(ξ) = inf_s { (1/2) ξ s² − ρ(s) }, whose minimum is achieved at the point s satisfying ρ′(s) = ξ s, we have, differentiating the right hand side of (19) with respect to ξ and equating to zero,

    ξ_b^j(i) = ρ′(η) / η,  with η = sqrt( E[ν_b^j(i)²] ).    (20)
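For the penalties in Table 1 this update is closed-form: ξ = ρ′(η)/η evaluated at η. A small sketch follows; the `floor` value is an assumed numerical safeguard against division by zero, not part of the formulation:

```python
import numpy as np

def xi_lp(nu, p=0.5, floor=1e-8):
    """Variational weights rho'(nu)/|nu| = |nu|^(p-2) for the l_p penalty."""
    return np.maximum(np.abs(nu), floor) ** (p - 2.0)

def xi_log(nu, eps=1e-3, floor=1e-8):
    """Variational weights for the log penalty: 1 / ((eps + |nu|) |nu|)."""
    a = np.maximum(np.abs(nu), floor)
    return 1.0 / ((eps + a) * a)

nu = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])   # toy filtered coefficients
w = xi_lp(nu)
# Near-zero coefficients receive huge weights (strong smoothing), while large,
# edge-related coefficients receive small weights and are preserved.
assert w[2] > w[1] > w[0] and np.isclose(w[0], w[4])
```

These weights are exactly the entries of diag(<ξ_b^j>) in Equation (17), so each iteration behaves like an adaptively reweighted quadratic regularizer.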

4.3. Model Parameters Update

The estimates of the noise variances in the degradation models in Equations (3) and (4) are obtained using Equation (15) as

    <β_b>^(-1) = ( ||y_b − DH<x_b>||² + tr( DH Σ_{x_b} (DH)^T ) ) / p,    (21)
    <γ>^(-1) = ( ||x_PAN − sum_b λ_b <x_b>||² + tr( sum_{b,b′} λ_b λ_{b′} Σ_{x_b x_{b′}} ) ) / P,    (22)

where tr(·) represents the trace of the matrix and Σ_{x_b x_{b′}} denotes the corresponding block of Σ_x. From Equation (14) we obtain the following distribution for the parameter α_b of the SG prior in Equation (5):

    q(α_b) ∝ (1/Z(α_b)) exp( − α_b E[ sum_{j,i} ρ(ν_b^j(i)) ] ).

The mode of this distribution can be obtained (see [69]) by solving

    ∂ log Z(α_b) / ∂ α_b = − E[ sum_{j,i} ρ(ν_b^j(i)) ].    (23)

The ℓp penalty function shown in Table 1 produces proper priors, for which the partition function can be evaluated, but the log penalty function produces an improper prior. We tackle this problem by examining the behavior of the partition function and keeping in log Z(α_b) only the term that depends on α_b, which produces a closed-form update of α_b for the log prior.
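The variance updates above amount to dividing the expected residual energy (fit residual plus a covariance trace term) by the number of pixels. A generic sketch, with hypothetical names and toy numbers chosen only to make the arithmetic checkable:

```python
import numpy as np

def update_precision(obs, pred, trace_term):
    """Noise-precision estimate: #pixels / (residual energy + covariance trace term),
    i.e. the reciprocal of the estimated noise variance in Eqs. (21) and (22)."""
    resid = float(np.sum((obs - pred) ** 2))
    return obs.size / (resid + trace_term)

# Toy check: residual energy 16 spread over 16 pixels -> unit precision (variance 1).
beta_b = update_precision(np.ones((4, 4)), np.zeros((4, 4)), trace_term=0.0)
```

The trace term makes the estimate account for the uncertainty in <x_b>: the more uncertain the reconstruction, the larger the estimated noise variance.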

4.4. Calculating the Covariance Matrices

The covariance matrix Σ_x in Equation (17) must be explicitly computed to find its trace and also to calculate E[ν_b^j(i)²]. However, since its exact calculation is computationally very demanding, we propose the following approximation. We first approximate each diagonal matrix diag(<ξ_b^j>) by z_b^j I, where z_b^j is calculated as the mean of the values in <ξ_b^j>. We then approximate the decimation-related term so that all the terms in Equation (17) can be diagonalized in the Fourier domain. Finally, the required traces and second-order moments are computed from this diagonalized approximation of Σ_x.

4.5. Proposed Algorithm

Based on the previous derivations, we propose the Variational Bayesian SG Pansharpening Algorithm in Algorithm 1. The linear system in Equation (18), used in step 4 of Algorithm 1, is solved using the Conjugate Gradient method.
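Step 4 can be implemented matrix-free. The following 1-D toy solves a system with the same structure as Equations (17) and (18) by conjugate gradients; the operators are built as small dense matrices for clarity, the hyperparameter values are assumed, and the inter-band PAN coupling is simplified to a single band (λ_b² I instead of the full λλ^T ⊗ I block):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(1)
n, ratio = 32, 4

# Toy 1-D operators, explicit here; Algorithm 1 would apply them implicitly.
H = np.zeros((n, n))                  # 3-tap blur matrix
for i in range(n):
    for j, w in ((i - 1, 0.25), (i, 0.5), (i + 1, 0.25)):
        if 0 <= j < n:
            H[i, j] = w
D = np.eye(n)[::ratio]                # decimation operator
F = np.eye(n) - np.roll(np.eye(n), 1, axis=1)   # periodic first-difference filter

beta, gamma, alpha, lam_b = 1e2, 1e2, 1.0, 0.3  # assumed hyperparameter values
xi = rng.uniform(0.5, 2.0, n)         # current variational weights

# Posterior precision for one band: data term + (simplified) PAN term + SG prior term.
A = (beta * H.T @ D.T @ D @ H
     + gamma * lam_b**2 * np.eye(n)
     + alpha * F.T @ np.diag(xi) @ F)

y_b, pan = rng.random(n // ratio), rng.random(n)
rhs = beta * H.T @ D.T @ y_b + gamma * lam_b * pan   # right-hand side, as in Eq. (18)
x_b, info = cg(A, rhs)                # conjugate-gradient solve, no matrix inversion
assert info == 0                      # 0 means the tolerance was reached
```

A is symmetric positive definite by construction (sum of positive semidefinite terms plus a positive multiple of the identity), which is what guarantees CG convergence.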

5. Materials and Methods

To test the performance of the proposed methodology on different kinds of images, five satellite images were used: three LANDSAT 7-ETM+ [70] images, a SPOT-5 [71] image and a FORMOSAT-2 [72] image. LANDSAT MS images have six bands, with a fixed resolution ratio between the PAN and MS images. Figure 1 and Figure 2 show RGB color images formed by bands B4, B3 and B2 of the LANDSAT MS images, and their corresponding PAN images. Figure 1 corresponds to an area of Chesapeake Bay (US), while Figure 2 depicts two areas of the Netherlands. SPOT-5 MS images have four bands, and two PAN images with different resolution ratios are provided. FORMOSAT-2 MS images also have four bands and a single PAN image. Figure 3a,c show the RGB color images formed from bands B3, B2 and B1 of a SPOT-5 image of Roma (IT) and a FORMOSAT-2 MS image of Salon-de-Provence (FR), and Figure 3b,d their corresponding PAN images.
Figure 1

Observed LANDSAT 7-ETM+ Chesapeake Bay image: (a) multispectral (MS), (b) panchromatic (PAN).

Figure 2

Observed LANDSAT 7-ETM+ Netherlands images: (a) MS, (b) PAN, (c) MS, (d) PAN.

Figure 3

Observed SPOT-5 Roma image: (a) MS, (b) PAN. FORMOSAT-2 Salon-de-Provence image: (c) MS, (d) PAN.

Both the observed MS and PAN images have been normalized to the range [0, 1] before running Algorithm 1. The algorithm was run until the relative difference between consecutive estimates of the pansharpened image fell below a threshold or 50 iterations were reached, whichever occurred first. The relationship between the MS high resolution image and the panchromatic image in Equation (2) is governed by the parameters λ_b, which need to be set before pansharpening is carried out. If we knew the sensor spectral response characteristics, the values of λ_b could be estimated from them. For instance, for LANDSAT 7-ETM+, Figure 4 shows the sensor spectral response curves for the MS bands B1-B6, shown in color, and the PAN band, shown in black. For this sensor, the PAN band mainly overlaps the B2-B4 MS bands, and the coefficients λ_b could be obtained from this overlapping (see [52]). In this paper, however, a more general approach is followed to estimate λ_b from the observations. First, we define x_PAN↓, a version of the PAN image downsampled to the size of the MS image. Then, since the sensor spectral response is the same at high and low resolution, the parameters λ_b can be obtained by solving

    min_{λ_b >= 0} || x_PAN↓ − sum_{b=1}^{B} λ_b y_b ||².
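This constrained least-squares fit can be sketched with scipy's non-negative least squares solver. The toy below uses a hypothetical B = 4 band image with known weights and no noise, so the recovery is exact:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
B, npix = 4, 1024

Y = rng.random((npix, B))             # flattened observed MS bands y_b as columns
lam_true = np.array([0.1, 0.3, 0.0, 0.6])
pan_lr = Y @ lam_true                 # noiseless downsampled PAN image (toy)

lam_hat, _ = nnls(Y, pan_lr)          # solve min ||pan_lr - Y @ lam||^2, lam >= 0
assert np.all(lam_hat >= 0) and np.allclose(lam_hat, lam_true, atol=1e-8)
```

The non-negativity constraint matches the physical interpretation of λ_b as a spectral average of the MS bands.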
Figure 4

LANDSAT 7-ETM+ band spectral response normalized to one.

Table 2 shows the λ_b values associated with the different observed images. For the LANDSAT 7-ETM+ images only the λ_b of the first four bands are positive; the λ_5 and λ_6 values are 0 since we know that bands B5 and B6 are not covered by the panchromatic sensor. For this process, each band is normalized to the interval [0, 1]. Note that, due to the normalization, the estimated λ_b values do not only depend on the sensor spectral response but also on the characteristics of the observed area. This explains the differences between the values obtained for the images in Figure 2a,c. Although those images are from the same area of the Netherlands, the clouds in Figure 2a modify the estimation of the values of λ_b.
Table 2

Estimated λ_b values for the different sensors.

Sensor            Image       B1       B2       B3       B4       B5    B6
LANDSAT 7-ETM+    Figure 1    0.0986   0.1011   0.2576   0.5427   0     0
LANDSAT 7-ETM+    Figure 2a   0.0183   0.4243   0.0576   0.4998   0     0
LANDSAT 7-ETM+    Figure 2c   0        0.2283   0.1611   0.6106   0     0
SPOT-5            Figure 3a   0        0.2993   0.6897   0.0110   -     -
FORMOSAT-2        Figure 3c   0.0384   0.5566   0        0.4051   -     -

6. Discussion

Within the variational Bayesian methodology, two methods are proposed in this paper: one using the log penalty function (see Table 1), hence named the log method, and another using the ℓp penalty function with p = 1, referred to as the ℓ1 method. The proposed methods have been compared with the following classic and state-of-the-art pansharpening methods: the Principal Component Analysis (PCA) [6], the Intensity–Hue–Saturation (IHS) transform [19], the Brovey transform (Brovey) [7], the Band-Dependent Spatial-Detail (BDSD) method in [22], the Gram–Schmidt (GS) method in [20], the Gram–Schmidt adaptive (GSA) method in [21], the Partial Replacement Adaptive Component Substitution (PRACS) method in [23], the High Pass Filtering (HPF) algorithm in [18], the Smoothing Filter Based Intensity Modulation (SFIM) method [35,36], the Indusion method in [37], the Additive A Trous Wavelet Transform (ATWT) in [26], the Additive Wavelet Luminance Proportional (AWLP) method in [28], the ATWT Model 2 (ATWT-M2) and ATWT Model 3 (ATWT-M3) methods in [10], the Generalized Laplacian Pyramid (GLP)-based methods in [29], specifically the modulation transfer function (MTF)-GLP, GLP with High Pass Modulation (MTF-GLP-HPM) and GLP with Context Based Decision (MTF-GLP-CBD) methods, and the pansharpening method using a Total Variation (TV) image model in [53]. We have used the implementations of the methods and measures provided by the Pansharpening Toolbox (https://rscl-grss.org/coderecord.php?id=541) [13]. For those methods not included in the toolbox we have used the code provided by the authors. The code of the proposed methods will be publicly available at https://github.com/vipgugr. We have also included, as a reference, the results of bilinearly interpolating the MS image to the size of the PAN image, denoted EXP. Both quantitative and qualitative comparisons of the different methods have been performed.

6.1. Quantitative Comparison

A common problem in pansharpening is the nonexistence of an MS high resolution ground-truth image to compare with. Hence, we performed two kinds of quantitative comparisons. First, the images obtained using the different methods were compared following Wald’s protocol [73] as follows: the observed MS image and the PAN image are downsampled to generate low resolution versions of them. Then, pansharpening is applied to those low resolution images and the obtained estimate of the MS image is quantitatively compared with the observed MS image. Second, the different methods were compared using Quality with No Reference (QNR) measures [13,74]. As previously stated, the resolution ratio between the MS and PAN images is fixed by each sensor. Since the SPOT-5 satellite provides two PAN images, two experiments were carried out on the image in Figure 3, one with a decimation ratio of 4 and another with a ratio of 16. For the FORMOSAT-2 image, for the sake of completeness, two experiments were also carried out, one assuming a decimation ratio of 4 and another a ratio of 16. Both spatial and spectral quality metrics have been used to compare the results obtained by the different methods. Details of the metrics used are given below.

Spatial measures:
- Q: Universal Quality Index (UQI) [75] averaged over all MS bands. Range: [-1, 1]; the higher the better.
- Q4, Q8: instances of the Q2^n index [76], suitable for measuring the quality of multiband images having an arbitrary number of spectral bands. Q4 is used for the SPOT-5 and FORMOSAT-2 images, which have four bands, and Q8 for the LANDSAT images with six bands. Range: [0, 1]; the higher the better.
- SCC: Spatial Correlation Coefficient [77]. Measures the correlation coefficient between the compared images after the application of a Sobel filter. Range: [0, 1]; the higher the better.
- D_S: QNR spatial distortion [78]. Measures the spatial distortion between the MS bands and the PAN image. Range: [0, 1]; the lower the better.

Spectral measures:
- SAM: Spectral Angle Mapper [79], for spectral fidelity. Measures the mean angle between the corresponding pixels of the compared images in the space defined by considering each spectral band as a coordinate axis. Range: [0, 180]; the lower the better.
- ERGAS: Erreur Relative Globale Adimensionnelle de Synthèse [80]. Measures the spectral consistency between the compared images. Range: [0, ∞); the lower the ERGAS value the better the consistency, especially for values lower than the number of image bands B.
- D_λ: QNR spectral distortion [78]. Derived from the differences between the inter-band Q index values computed for the high and low resolution images. Range: [0, 1]; the lower the better.

Spatial and spectral measure:
- QNR: jointly spectral and spatial quality index [78], obtained as the product of (1 − D_λ) and (1 − D_S). Range: [0, 1]; the higher the better.

Table 3 shows the figures of merit obtained using Wald’s protocol for the LANDSAT image in Figure 1. As is clear from the table, ℓ1 outperforms all the other methods both in spectral fidelity and in the incorporation of spatial details. Note its high SCC value (meaning that the details in the PAN image have been successfully incorporated into the pansharpened image) while also obtaining the lowest spectral distortion, as evidenced by the SAM and ERGAS values. The TV method obtains the second best results except for the SAM metric; for this metric, the proposed log method has the second best value. The log method also obtains the third best values for the ERGAS and SCC measures. GLP-based and PRACS methods also obtain high values for the Q and Q8 indices and low values for SAM. However, their ERGAS and SCC performance is worse. Table 4 shows the QNR quantitative results for the LANDSAT image in Figure 1. In this table, the proposed methods achieve competitive results. The log method obtains the best D_λ value and, together with ℓ1, obtains the second and third QNR scores, respectively. Note that EXP obtained the highest QNR score since the bilinear interpolation of the observed MS low resolution image is used as the MS high resolution estimate in the D_λ and D_S calculations.
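Two of the spectral measures above are straightforward to compute. Below is a minimal sketch with plain-numpy definitions written from the formulas in [79,80]; it is not the toolbox implementation, and the function names are illustrative:

```python
import numpy as np

def sam_deg(x, y, eps=1e-12):
    """Spectral Angle Mapper in degrees; x and y have shape (B, H, W)."""
    num = np.sum(x * y, axis=0)
    den = np.linalg.norm(x, axis=0) * np.linalg.norm(y, axis=0) + eps
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))).mean()

def ergas(ref, est, ratio):
    """ERGAS: 100/ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2)."""
    rmse2 = np.mean((ref - est) ** 2, axis=(1, 2))
    mean2 = np.mean(ref, axis=(1, 2)) ** 2
    return 100.0 / ratio * np.sqrt(np.mean(rmse2 / mean2))

ref = np.random.default_rng(3).random((4, 8, 8)) + 0.5
assert ergas(ref, ref, ratio=4) == 0.0     # identical images: no spectral error
assert sam_deg(ref, 2.0 * ref) < 1e-3      # SAM is invariant to per-pixel scaling
```

The scale invariance of SAM is what makes it a pure spectral-shape measure: it ignores intensity differences and reacts only to changes in the spectral direction of each pixel.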
Table 3

Quantitative results on the LANDSAT 7-ETM+ Chesapeake Bay image. The best score in each column is marked with an asterisk.

Method         Q8        Q         SAM       ERGAS     SCC
EXP            0.8335    0.8383    2.0223    5.1113    0.8718
PCA            0.7830    0.8074    2.7937    5.8086    0.8569
IHS            0.6795    0.6734    2.6640    7.8586    0.8223
Brovey         0.6798    0.6790    2.1605    7.5226    0.8148
BDSD           0.7599    0.7829    2.4662    5.9244    0.8539
GS             0.7447    0.7401    3.3052    8.7258    0.8040
GSA            0.8010    0.8202    2.4033    5.0688    0.8764
PRACS          0.8363    0.8423    2.0998    4.8655    0.8774
HPF            0.8137    0.8243    2.2221    5.1279    0.8834
SFIM           0.8008    0.8184    2.1857    5.0228    0.8905
Indusion       0.8052    0.8167    2.2906    5.5412    0.8495
ATWT           0.8100    0.8208    2.3482    5.3642    0.8809
AWLP           0.8267    0.8315    2.2516    5.3873    0.8734
ATWT-M2        0.7737    0.7802    2.6004    5.8133    0.8317
ATWT-M3        0.7884    0.7925    2.5720    5.8973    0.8367
MTF-GLP        0.8157    0.8258    2.2418    5.1307    0.8838
MTF-GLP-HPM    0.8037    0.8206    2.1878    5.0236    0.8918
MTF-GLP-CBD    0.8200    0.8292    2.2499    5.1015    0.8809
TV             0.8492    0.8606    2.1362    4.2505    0.9163
ℓ1             0.8595*   0.8694*   1.8518*   4.0954*   0.9220*
log            0.7951    0.7856    1.8839    4.4819    0.9007
Table 4

QNR quantitative results on the LANDSAT 7-ETM+ Chesapeake Bay image. The best score in each column is marked with an asterisk.

Method         D_λ       D_S       QNR
EXP            0.0096    0.0166*   0.9740*
PCA            0.0555    0.1459    0.8066
IHS            0.1049    0.2888    0.6366
Brovey         0.0963    0.2290    0.6968
BDSD           0.1055    0.1486    0.7616
GS             0.0716    0.1992    0.7435
GSA            0.0605    0.1167    0.8298
PRACS          0.0154    0.0823    0.9035
HPF            0.0857    0.1308    0.7947
SFIM           0.0834    0.1146    0.8116
Indusion       0.0595    0.0525    0.8911
ATWT           0.1044    0.1500    0.7612
AWLP           0.1045    0.1517    0.7596
ATWT-M2        0.1530    0.2327    0.6499
ATWT-M3        0.1227    0.1975    0.7041
MTF-GLP        0.0917    0.1378    0.7831
MTF-GLP-HPM    0.0890    0.1215    0.8004
MTF-GLP-CBD    0.0585    0.1081    0.8397
TV             0.0425    0.0919    0.8695
ℓ1             0.0338    0.0527    0.9153
log            0.0090*   0.0364    0.9549
Table 5 and Table 6 show the quantitative results using Wald’s protocol for the LANDSAT images in Figure 2a,c, respectively. PRACS outperforms all the other methods on the image in Figure 2a (see Table 5), and the proposed ℓ1 and log methods obtain the first and second best scores on the image in Figure 2c (see Table 6). Table 7 and Table 8 show the QNR figures of merit obtained for those two images. The proposed methods produce good D_λ, D_S and QNR values for both images, with QNR values above 0.9, which supports their good performance. Again, the EXP results are the best in all the measures in Table 8, and EXP provides the best D_λ for the image associated with Table 7. The ℓ1 method obtains the best D_S for this image and BDSD the highest QNR.
Table 5

Quantitative results using Wald’s protocol on the LANDSAT 7-ETM+ Netherland image in Figure 2a. Asterisks mark the best score for each metric (bold in the original).

Method  Q8  Q  SAM  ERGAS  SCC
EXP  0.4727  0.8849  3.0445  7.7235  0.8686
PCA  0.3759  0.7617  3.8165  12.6420  0.8205
IHS  0.3322  0.7479  1.7633  10.7598  0.8570
Brovey  0.2892  0.7675  *0.0000*  10.7809  0.8492
BDSD  0.7205  0.9520  1.7296  4.6748  0.9777
GS  0.3860  0.7833  3.3486  11.7850  0.8323
GSA  0.5543  0.8474  2.5086  10.2990  0.8707
PRACS  *0.8230*  *0.9720*  *0.9558*  *2.9097*  *0.9878*
HPF  0.6420  0.9045  2.0699  7.4387  0.9458
SFIM  0.5950  0.9043  1.8898  8.1778  0.9379
Indusion  0.4108  0.8406  3.6963  9.6521  0.8269
ATWT  0.5582  0.8741  2.6859  9.6507  0.9267
AWLP  0.4715  0.8741  2.2059  10.1057  0.9195
ATWT-M2  0.3943  0.8436  3.7879  8.7289  0.8606
ATWT-M3  0.4861  0.8685  3.5274  7.7506  0.8829
MTF-GLP  0.5975  0.8946  2.2544  8.1279  0.9351
MTF-GLP-HPM  0.5659  0.8926  2.0133  9.0221  0.9272
MTF-GLP-CBD  0.6095  0.9001  2.1726  7.8290  0.9392
TV  0.4798  0.8906  3.4873  7.1207  0.8977
ℓ1  0.4815  0.9044  3.3783  6.9422  0.9022
log  0.4931  0.8920  2.9187  6.9321  0.8952
Table 6

Quantitative results using Wald’s protocol on the LANDSAT 7-ETM+ Netherland image in Figure 2c. Asterisks mark the best score for each metric (bold in the original).

Method  Q8  Q  SAM  ERGAS  SCC
EXP  0.7874  0.7867  3.3362  5.9279  0.8521
PCA  0.6167  0.4854  5.7078  12.8127  0.5788
IHS  0.5343  0.3778  4.7417  12.9526  0.5918
Brovey  0.5465  0.4063  3.4605  13.0876  0.5960
BDSD  0.7575  0.7733  3.9348  6.8610  0.7856
GS  0.5899  0.4369  6.1158  13.9275  0.5522
GSA  0.7323  0.7444  4.0920  7.2078  0.7535
PRACS  0.7822  0.7829  3.4787  6.3244  0.8037
HPF  0.7167  0.7241  3.8135  7.3266  0.7603
SFIM  0.6908  0.7181  4.6693  9.0262  0.7261
Indusion  0.7230  0.7346  3.8229  7.2177  0.7631
ATWT  0.6948  0.6971  4.0402  8.0306  0.7240
AWLP  0.7110  0.7066  3.8124  8.1733  0.7165
ATWT-M2  0.6332  0.6288  4.5223  8.5542  0.6135
ATWT-M3  0.6993  0.6967  4.3028  7.4420  0.7201
MTF-GLP  0.7116  0.7172  3.8632  7.5134  0.7457
MTF-GLP-HPM  0.6875  0.7133  4.6780  9.1110  0.7161
MTF-GLP-CBD  0.7519  0.7595  3.7522  6.8698  0.7740
TV  0.7843  0.8065  3.8402  5.7351  0.8519
ℓ1  *0.8118*  *0.8196*  3.1337  *5.1831*  *0.8853*
log  0.7750  0.7682  *3.0810*  5.3115  0.8818
Table 7

QNR quantitative results on the LANDSAT 7-ETM+ Netherland image in Figure 2a. Asterisks mark the best score for each metric (bold in the original).

Method  Dλ  DS  QNR
EXP  *0.0104*  0.0593  0.9309
PCA  0.2463  0.3998  0.4523
IHS  0.2632  0.4035  0.4394
Brovey  0.2182  0.3873  0.4790
BDSD  0.0159  0.0505  *0.9344*
GS  0.2335  0.4067  0.4548
GSA  0.2139  0.3240  0.5314
PRACS  0.0665  0.2106  0.7369
HPF  0.1711  0.2638  0.6102
SFIM  0.1610  0.2513  0.6282
Indusion  0.1354  0.1612  0.7252
ATWT  0.1968  0.2961  0.5654
AWLP  0.1977  0.2954  0.5653
ATWT-M2  0.1727  0.3127  0.5686
ATWT-M3  0.1187  0.2143  0.6924
MTF-GLP  0.1796  0.2791  0.5915
MTF-GLP-HPM  0.1688  0.2661  0.6100
MTF-GLP-CBD  0.1741  0.2778  0.5965
TV  0.0912  0.1127  0.8064
ℓ1  0.0342  *0.0386*  0.9286
log  0.0157  0.0627  0.9225
Table 8

QNR quantitative results on the LANDSAT 7-ETM+ Netherland image in Figure 2c. Asterisks mark the best score for each metric (bold in the original).

Method  Dλ  DS  QNR
EXP  *0.0067*  *0.0157*  *0.9778*
PCA  0.1893  0.5019  0.4038
IHS  0.2040  0.5973  0.3205
Brovey  0.1979  0.5227  0.3829
BDSD  0.0134  0.0107  0.9761
GS  0.1999  0.5237  0.3811
GSA  0.0986  0.2337  0.6907
PRACS  0.0255  0.1176  0.8599
HPF  0.1594  0.2772  0.6075
SFIM  0.1124  0.2272  0.6859
Indusion  0.1352  0.2054  0.6871
ATWT  0.1822  0.3234  0.5534
AWLP  0.1765  0.3101  0.5682
ATWT-M2  0.1903  0.3467  0.5290
ATWT-M3  0.0917  0.1581  0.7647
MTF-GLP  0.1656  0.2933  0.5897
MTF-GLP-HPM  0.1164  0.2424  0.6694
MTF-GLP-CBD  0.0854  0.1953  0.7359
TV  0.0381  0.1186  0.8478
ℓ1  0.0228  0.0704  0.9084
log  0.0073  0.0199  0.9730
Figure 5 and Figure 6 show a zoomed-in region of the RGB color images formed by bands B4, B3, and B2 of the MS ground truth images used to apply Wald’s protocol, as well as the absolute error images for the methods in Table 7 and Table 8. In those images, the darker the intensity, the lower the absolute error. Figure 5 and Figure 6 are consistent with the quantitative comparisons shown in Table 5 and Table 6, respectively. The best results for the image in Figure 2a were obtained using PRACS, while for the image in Figure 2c the best performing method is ℓ1. Note that brighter areas in Figure 5e,f correspond to the borders of cloudy areas in Figure 2a. We argue that, since clouds alter the weights estimated using Equation (30), the boundaries between clouds and land areas in Figure 2a are not well resolved. This explains the worse behavior of the proposed methods in the cloudy areas of this image.
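An error display of the kind described above can be computed in a couple of lines. This is our own sketch, following the darker-is-lower-error convention stated in the text; `gt` and `est` are assumed to be float arrays of shape (H, W, B):

```python
import numpy as np

def error_image(gt: np.ndarray, est: np.ndarray) -> np.ndarray:
    """Per-pixel absolute error, averaged over spectral bands and
    normalized to [0, 1]; darker pixels mean lower error."""
    err = np.abs(gt - est).mean(axis=-1)
    return err / err.max() if err.max() > 0 else err
```

The normalization makes error maps from different methods directly comparable only up to each map's own maximum; for a cross-method comparison one would normalize by a shared constant instead.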
Figure 5

(a) Ground truth image from Figure 2a. The normalized maximum absolute error minus absolute error images for the following methods: (b) Partial Replacement Adaptive Component Substitution (PRACS), (c) modulation transfer function (MTF)-generalized Laplacian pyramid (GLP)-context based decision (CBD), (d) Total Variation (TV), (e) ℓ1 and (f) log.

Figure 6

(a) Ground truth image from Figure 2c. The normalized maximum absolute error images for the following methods: (b) PRACS, (c) MTF-GLP-CBD, (d) TV, (e) ℓ1 and (f) log.

Table 9 and Table 10 show, respectively, the quantitative results using Wald’s protocol for the SPOT-5 and the FORMOSAT-2 images in Figure 3 for the decimation ratios 4 and 16. The proposed log method obtains the best figures of merit for the SPOT image in Figure 3a with ratio 4, except for the Q and Q4 metrics. The Q values obtained by log and ℓ1 are slightly lower than those obtained by BDSD. Note that BDSD achieved the third best general figures, just below the proposed log and ℓ1 algorithms. With ratio 16, the proposed log algorithm provides the best results except for the Q, Q4 and SAM values, where competitive values are obtained. The proposed log achieves a slightly lower Q value than PRACS and a slightly higher SAM value than Brovey. In general, PRACS is the second best performing method for this image at ratio 16. For the FORMOSAT-2 image in Figure 3c, the proposed ℓ1 and log algorithms obtained the best numerical results for a ratio of 4. Both methods provide similar results, which are better than those provided by the competing methods. For a ratio of 16, there is no clear winner. The proposed methods are competitive on this image although they do not stand out in any of the measures. Table 11 and Table 12 show, respectively, the QNR quantitative results for the SPOT-5 and the FORMOSAT-2 images in Figure 3 for the decimation ratios 4 and 16. In Table 11, EXP achieves the best Dλ, DS and QNR scores. In this table, the proposed methods obtain good scores: the log method obtains the second best Dλ and DS values for ratio 4 and very high QNR values for both decimation ratios. Results for the FORMOSAT image, shown in Table 12, are very similar, although in this case BDSD obtains the best DS and QNR values for ratio 4 and ATWT-M3 for ratio 16.
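Two of the figures of merit used throughout these tables, SAM and ERGAS, can be sketched compactly. The implementations below follow the standard definitions (mean spectral angle in degrees, and the relative global dimensional synthesis error); they are illustrative sketches with our own function names, not the paper's code, and conventions for the `scale` argument of ERGAS vary in the literature (here it is the MS-to-PAN linear resolution ratio, e.g., 4):

```python
import numpy as np

def sam_degrees(x: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Mean Spectral Angle Mapper, in degrees, between two (H, W, B) images."""
    dot = (x * y).sum(axis=-1)
    denom = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    ang = np.arccos(np.clip(dot / denom, -1.0, 1.0))  # per-pixel angle, radians
    return float(np.degrees(ang).mean())

def ergas(ref: np.ndarray, est: np.ndarray, scale: float) -> float:
    """ERGAS (lower is better); `scale` is the linear resolution ratio."""
    b = ref.shape[-1]
    rmse = np.sqrt(((ref - est) ** 2).reshape(-1, b).mean(axis=0))  # per band
    mu = ref.reshape(-1, b).mean(axis=0)                            # band means
    return float(100.0 / scale * np.sqrt(((rmse / mu) ** 2).mean()))
```

A perfect reconstruction gives SAM = 0° and ERGAS = 0; spectrally orthogonal pixels give a SAM of 90°.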
Table 9

Quantitative results using Wald’s protocol on the SPOT-5 Roma image. Asterisks mark the best score for each metric (bold in the original).

Method  Q4  Q  SAM  ERGAS  SCC (p/P = 4)  |  Q4  Q  SAM  ERGAS  SCC (p/P = 16)
EXP  *0.8766*  *0.8859*  1.7048  3.7857  0.8640  |  *0.7325*  0.7407  2.5071  2.8441  0.6049
PCA  0.4067  0.5360  5.1646  12.3346  0.2788  |  0.3927  0.5091  5.7208  6.2911  0.2443
IHS  0.4072  0.5238  3.9951  12.2342  0.2772  |  0.3973  0.5051  4.4508  6.1770  0.2520
Brovey  0.4124  0.5337  1.8413  12.1960  0.2594  |  0.4019  0.5124  *2.4000*  6.1718  0.2482
BDSD  0.8559  0.8825  2.0565  4.2776  0.8235  |  0.5947  0.6231  3.0328  4.3050  0.2804
GS  0.4102  0.5364  5.0523  12.1272  0.2634  |  0.3985  0.5124  5.5849  6.1471  0.2405
GSA  0.4897  0.5384  2.9893  11.0363  0.1932  |  0.4997  0.5354  3.1189  5.3703  0.2164
PRACS  0.8220  0.8380  2.0536  4.8936  0.7298  |  0.7291  *0.7458*  2.5190  2.9647  0.5295
HPF  0.7488  0.7695  2.0632  6.3388  0.6250  |  0.5888  0.6124  2.8370  4.5969  0.2988
SFIM  0.7744  0.7860  1.9658  6.0694  0.6447  |  0.6052  0.6232  2.6792  4.4425  0.3079
Indusion  0.7473  0.7894  2.1609  5.9717  0.6761  |  0.5301  0.5935  3.5689  4.9656  0.3408
ATWT  0.6928  0.7171  2.2358  7.6433  0.5183  |  0.5849  0.6092  2.8932  4.7245  0.3099
AWLP  0.7066  0.7260  2.1798  7.6246  0.5075  |  0.5947  0.6173  2.8504  4.6963  0.3052
ATWT-M2  0.7229  0.7343  2.3084  6.3145  0.4678  |  0.6723  0.6822  2.7291  3.3277  0.3965
ATWT-M3  0.7837  0.7930  2.2394  5.1935  0.6714  |  0.6924  0.7019  2.7026  3.1094  0.4801
MTF-GLP  0.7289  0.7507  2.1201  6.7975  0.5766  |  0.5775  0.6023  2.9275  4.8355  0.3022
MTF-GLP-HPM  0.7553  0.7675  2.0005  6.4923  0.5972  |  0.5928  0.6111  2.7388  4.7066  0.3073
MTF-GLP-CBD  0.7718  0.7870  2.0188  6.0490  0.6215  |  0.6021  0.6217  2.7767  4.5632  0.3118
TV  0.7472  0.7893  3.1882  6.1393  0.6162  |  0.6480  0.6872  3.5109  3.9763  0.3265
ℓ1  0.8617  0.8783  2.0688  4.1557  0.8196  |  0.6409  0.7017  3.7793  3.7117  0.3792
log  0.8636  0.8762  *1.6053*  *3.3673*  *0.8923*  |  0.7323  0.7395  2.4228  *2.7072*  *0.6262*
Table 10

Quantitative results using Wald’s protocol on the FORMOSAT-2 Salon-de-Provence image. Asterisks mark the best score for each metric (bold in the original).

Method  Q4  Q  SAM  ERGAS  SCC (p/P = 4)  |  Q4  Q  SAM  ERGAS  SCC (p/P = 16)
EXP  0.8610  0.8617  1.8158  3.7257  0.8460  |  0.6918  0.6925  2.5057  2.6934  0.5906
PCA  0.7858  0.8089  2.2098  5.0711  0.6595  |  0.7415  0.7668  2.6373  2.7485  0.5870
IHS  0.7912  0.8159  1.9637  4.6293  0.6695  |  0.7445  0.7720  2.4527  2.5803  0.5894
Brovey  0.7861  0.8146  1.9106  4.5332  0.6644  |  0.7385  0.7687  2.4345  2.5597  0.5793
BDSD  0.8443  0.8545  2.0292  4.2608  0.7232  |  0.7701  0.7823  2.5460  2.7217  0.5821
GS  0.7979  0.8195  2.1375  4.9193  0.6619  |  0.7506  0.7753  2.5761  2.6860  0.5888
GSA  0.7859  0.8167  2.1855  5.2508  0.6508  |  0.7607  0.7772  2.5510  2.7698  0.5796
PRACS  0.8495  0.8592  1.9581  4.0842  0.7373  |  *0.7947*  *0.8016*  2.4240  *2.3683*  0.6230
HPF  0.8469  0.8523  1.9799  4.2210  0.7782  |  0.7813  0.7885  2.4609  2.5388  0.5900
SFIM  0.8470  0.8523  1.9821  4.2718  0.7755  |  0.7817  0.7888  2.4600  2.5658  0.5874
Indusion  0.8304  0.8371  2.0075  4.4361  0.7407  |  0.7519  0.7698  2.4986  2.6941  0.5758
ATWT  0.8391  0.8449  2.0374  4.5675  0.7663  |  0.7901  0.7965  2.4679  2.5399  0.6161
AWLP  0.8408  0.8473  1.8896  4.3549  0.7671  |  0.7909  0.7979  *2.3689*  2.4540  0.6150
ATWT-M2  0.8277  0.8326  2.2307  4.2164  0.6975  |  0.7672  0.7704  2.5499  2.3709  0.6325
ATWT-M3  0.8325  0.8356  2.2137  4.0586  0.7341  |  0.7639  0.7660  2.5659  2.3878  *0.6384*
MTF-GLP  0.8477  0.8533  1.9883  4.2705  0.7704  |  0.7901  0.7968  2.4714  2.5563  0.6176
MTF-GLP-HPM  0.8476  0.8532  1.9889  4.3313  0.7676  |  0.7904  0.7971  2.4666  2.5937  0.6158
MTF-GLP-CBD  0.8493  0.8548  2.0076  4.2575  0.7730  |  0.7846  0.7916  2.5270  2.6394  0.6123
TV  0.8696  0.8790  2.1437  3.7602  0.7876  |  0.7807  0.7878  2.7906  2.4374  0.5956
ℓ1  *0.8946*  *0.8974*  1.9200  *3.3526*  0.8457  |  0.7691  0.7625  3.0726  2.6041  0.5503
log  0.8706  0.8683  *1.7597*  3.3686  *0.8751*  |  0.6889  0.6848  2.4975  2.6009  0.6050
Table 11

QNR quantitative results on the SPOT-5 Roma image. Asterisks mark the best score for each metric (bold in the original).

Method  Dλ  DS  QNR (p/P = 4)  |  Dλ  DS  QNR (p/P = 16)
EXP  *0.0041*  *0.0150*  *0.9809*  |  *0.0001*  *0.0312*  *0.9687*
PCA  0.2047  0.4094  0.4697  |  0.3094  0.5035  0.3429
IHS  0.2389  0.4158  0.4447  |  0.3574  0.5143  0.3121
Brovey  0.1804  0.3799  0.5082  |  0.2890  0.4754  0.3730
BDSD  0.0108  0.0922  0.8980  |  0.0388  0.0344  0.9281
GS  0.1964  0.4100  0.4741  |  0.3045  0.5044  0.3447
GSA  0.2194  0.3421  0.5135  |  0.3267  0.4287  0.3846
PRACS  0.0325  0.1555  0.8171  |  0.0656  0.2162  0.7324
HPF  0.0851  0.1405  0.7864  |  0.2149  0.2556  0.5844
SFIM  0.0661  0.1256  0.8167  |  0.1949  0.2416  0.6107
Indusion  0.0580  0.0370  0.9072  |  0.2458  0.1587  0.6345
ATWT  0.1310  0.2119  0.6849  |  0.2398  0.3037  0.5293
AWLP  0.1030  0.1950  0.7221  |  0.2015  0.2814  0.5738
ATWT-M2  0.0728  0.2002  0.7416  |  0.0996  0.1691  0.7482
ATWT-M3  0.0162  0.0349  0.9494  |  0.0493  0.0328  0.9195
MTF-GLP  0.1040  0.1586  0.7539  |  0.2511  0.3000  0.5242
MTF-GLP-HPM  0.0858  0.1441  0.7825  |  0.2314  0.2868  0.5481
MTF-GLP-CBD  0.0657  0.1272  0.8154  |  0.1922  0.2681  0.5912
TV  0.3399  0.1830  0.5394  |  0.1866  0.1510  0.6906
ℓ1  0.0277  0.0378  0.9356  |  0.1927  0.1500  0.6862
log  0.0056  0.0272  0.9674  |  0.0422  0.0380  0.9214
Table 12

QNR quantitative results on the FORMOSAT-2 Salon-de-Provence image. Asterisks mark the best score for each metric (bold in the original).

Method  Dλ  DS  QNR (p/P = 4)  |  Dλ  DS  QNR (p/P = 16)
EXP  *0.0087*  0.0837  0.9083  |  *0.0086*  0.0931  0.8990
PCA  0.1108  0.2190  0.6945  |  0.1503  0.3083  0.5877
IHS  0.1083  0.2127  0.7020  |  0.1508  0.3027  0.5921
Brovey  0.0859  0.1964  0.7345  |  0.1215  0.2815  0.6312
BDSD  0.0179  *0.0150*  *0.9673*  |  0.0264  0.1775  0.8008
GS  0.0970  0.2084  0.7149  |  0.1366  0.2971  0.6069
GSA  0.1254  0.2095  0.6914  |  0.1618  0.2963  0.5899
PRACS  0.0621  0.1656  0.7826  |  0.0878  0.2413  0.6921
HPF  0.0896  0.1130  0.8075  |  0.1147  0.1559  0.7473
SFIM  0.0877  0.1121  0.8101  |  0.1128  0.1556  0.7492
Indusion  0.0510  0.0174  0.9325  |  0.0938  0.0965  0.8187
ATWT  0.1227  0.1529  0.7432  |  0.1355  0.1951  0.6958
AWLP  0.1290  0.1500  0.7404  |  0.1393  0.1887  0.6983
ATWT-M2  0.1184  0.1550  0.7449  |  0.0967  0.1238  0.7915
ATWT-M3  0.0871  0.0951  0.8261  |  0.0586  *0.0388*  *0.9049*
MTF-GLP  0.1004  0.1277  0.7847  |  0.1437  0.1996  0.6854
MTF-GLP-HPM  0.0980  0.1269  0.7875  |  0.1410  0.1989  0.6881
MTF-GLP-CBD  0.0910  0.1222  0.7979  |  0.1317  0.1919  0.7017
TV  0.1114  0.0787  0.8186  |  0.0951  0.1811  0.7410
ℓ1  0.0425  0.0514  0.9083  |  0.0594  0.0847  0.8609
log  0.0094  0.0917  0.8998  |  0.0174  0.1011  0.8832
Table 13 shows the required CPU time in seconds on a 2.40 GHz Intel® Xeon® CPU for the pansharpening of an MS image with 4 bands to a size of 1024 × 1024 pixels, for ratios 4 and 16, using the different methods under comparison. Equation (18) has been solved using the Conjugate Gradient method, which required, to achieve convergence, fewer than 30 iterations for the ℓ1 prior and at least 1000 iterations for the log prior. This explains the difference between their required CPU times. Note that the proposed methods automatically estimate the model parameters, which increases the running time but makes our methods parameter free.
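The Conjugate Gradient step mentioned above can be illustrated with a generic matrix-free solver. This sketch is ours, not the paper's implementation; it only assumes the system in Equation (18) is symmetric positive definite, with `A_mul` applying the system operator (which in the paper bundles blurring, downsampling and the spatially adaptive prior weights) to a vector:

```python
import numpy as np

def conjugate_gradient(A_mul, b, tol=1e-8, maxiter=1000):
    """Matrix-free CG for A x = b, with A symmetric positive definite.

    A_mul(v) returns A @ v, so the (huge) system matrix never has to be
    formed explicitly."""
    x = np.zeros_like(b)
    r = b - A_mul(x)               # initial residual
    p = r.copy()                   # initial search direction
    rs = r @ r
    for it in range(1, maxiter + 1):
        Ap = A_mul(p)
        alpha = rs / (p @ Ap)      # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return x, it           # converged after `it` iterations
        p = r + (rs_new / rs) * p  # conjugate direction update
        rs = rs_new
    return x, maxiter

# Tiny SPD example: A = M M^T + I.
M = np.array([[2.0, 0.0], [1.0, 3.0]])
A = M @ M.T + np.eye(2)
x, iters = conjugate_gradient(lambda v: A @ v, np.array([1.0, 2.0]))
```

For an n-dimensional SPD system, exact CG terminates in at most n iterations; the iteration counts in the text reflect how much harder the log-prior system is to solve than the ℓ1 one.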
Table 13

Elapsed CPU time in seconds for the different pansharpening methods on a 1024 × 1024 image and with different ratios.

Method  (P = 512×512, p = 1024×1024)  (P = 256×256, p = 1024×1024)
PCA  0.9  0.3
IHS  0.04  0.03
Brovey  0.04  0.03
BDSD  0.9  0.8
GS  0.4  0.3
GSA  1  1.6
PRACS  2.2  1.2
HPF  0.2  0.2
SFIM  0.5  0.18
Indusion  0.5  0.3
ATWT  11  3
AWLP  14  3
ATWT-M2  10  7
ATWT-M3  10  7
MTF-GLP  0.9  0.6
MTF-GLP-HPM  0.8  0.6
MTF-GLP-CBD  2  0.6
TV  1.5 × 10³  1.6 × 10³
ℓ1  808  3 × 10³
log  1.1 × 10⁴  2.8 × 10⁴

6.2. Qualitative Comparison

Figure 7 shows a small region of interest of the observed LANDSAT-7 images in Figure 1 and the pansharpening results obtained by the proposed methods and by the competing ones with the best quantitative performance, that is, the PRACS, MTF-GLP-HPM, MTF-GLP-CBD and TV methods. All color images in this figure are RGB images formed from the B4, B3 and B2 LANDSAT bands. Since we are using full resolution images, there is no ground truth to compare with, so a visual analysis of the resulting images is performed. The improved resolution of all the pansharpening results in Figure 7c–h with respect to the observed MS image in Figure 7a is evident. The PRACS, MTF-GLP-HPM and MTF-GLP-CBD images in Figure 7c–e have a lower detail level than those of the TV and the proposed ℓ1 methods, see Figure 7f,g, respectively. See, for instance, the staircase effects in some diagonal edges, which are not present in the TV and proposed ℓ1 results. The PRACS, MTF-GLP-HPM and MTF-GLP-CBD methods produce similar, but lower, spectral quality than the proposed methods, which is consistent with the numerical results in Table 3 and the discussion presented in Section 6.1. The image obtained using the ℓ1 method, Figure 7g, has colors closer to those of the observed MS image than the TV image, Figure 7f, which is also somewhat noisier. The log method is very good at removing noise in the image (see the sea area) but it tends to remove other fine details too.
Figure 7

A region of interest of the LANDSAT 7-ETM+ Chesapeake Bay image in Figure 1a. Observed images: (a) MS, (b) PAN. Pansharpened images by the (c) PRACS, (d) MTF-GLP-High Pass Modulation (HPM), (e) MTF-GLP-CBD, (f) TV, (g) ℓ1 and (h) log methods.

Figure 8 shows a region of interest of the observed SPOT-5 images in Figure 3a,b and the pansharpening results obtained using the competing methods with the best performance on this image, that is, the Brovey, PRACS, ATWT-M3 and TV methods, and the proposed ℓ1 and log methods. All color images in this figure are RGB images formed from bands B3, B2 and B1 of the SPOT-5 image. The Brovey method (Figure 8c) produces the highest spectral distortion; however, it also recovers more spatial details in the image (see the airport runway and plane). ATWT-M3 (Figure 8e), on the other hand, produces a blurry image. PRACS produces a sharper image, see Figure 8d, but details in the PAN image do not seem to be well integrated. The TV method (Figure 8f) and the proposed ℓ1 and log methods (Figure 8g,h) obtain the most consistent results, with high spatial detail and low spectral distortion. However, TV introduces staircase artifacts on diagonal lines that are not noticeable in the ℓ1 and log images. As with the LANDSAT-7 image, the log image in Figure 8h lacks some small details, removed by the method along with the noise.
Figure 8

A region of interest of the SPOT-5 Roma image in Figure 3a. Observed images: (a) MS, (b) PAN. Pansharpened images by the (c) Brovey, (d) PRACS, (e) Additive A Trous Wavelet Transform (ATWT)-M3, (f) TV, (g) ℓ1 and (h) log methods.

7. Conclusions

A variational Bayesian methodology for the pansharpening problem has been proposed. In this methodology, we model the relation between the MS high resolution image and the PAN image as a linear combination of the MS bands whose weights are estimated from the available data. The observed MS image is modelled as a downsampled version of the original MS image. The expected characteristics of the pansharpened image are incorporated in the form of SG sparse image priors. Two penalty functions corresponding to SG distributions are used, ℓ1 and log. All the unknowns and model parameters are automatically estimated within the variational Bayesian modelling and inference, and an efficient algorithm has been obtained. The proposed ℓ1 and log methods have been compared to classic and state-of-the-art methods, obtaining very good results both quantitatively and qualitatively. In general, they obtained the best quantitative results for the LANDSAT-7 ETM+, SPOT-5 and FORMOSAT-2 images with a resolution ratio of 4 and for SPOT-5 with a resolution ratio of 16. Competitive results were also obtained for the FORMOSAT-2 image with a resolution ratio of 16. They stand out in terms of spectral consistency while improving the spatial resolution of the pansharpened images. We argue that the superior spectral consistency of the SG methods arises from the modelling of the PAN image, which selectively incorporates detailed PAN information into the different MS high resolution bands without changing their spectral properties. Qualitatively, the SG methods produce results consistent with the observed PAN and MS images and with the numerical results previously described. The log method is better at removing noise in the images, at the cost of removing some fine details.