Hongfeng Wang, Jianzhong Wang, Haonan Xu, Yong Sun, Zibo Yu.
Abstract
Infrared images are robust against illumination variation and disguises and contain the sharp edge contours of objects, while visible images are rich in texture details. Infrared and visible image fusion seeks to obtain high-quality images that keep the advantages of both source images. This paper proposes an object-aware image fusion method based on a deep residual shrinkage network, termed DRSNFuse. DRSNFuse exploits residual shrinkage blocks for image fusion and introduces a deeper network to infrared and visible image fusion than existing methods based on fully convolutional networks. The deeper network effectively extracts semantic information, while the residual shrinkage blocks maintain texture information throughout the whole network. The residual shrinkage blocks adapt a channel-wise attention mechanism to the fusion task, enabling feature map channels to focus on objects and backgrounds separately. A novel image fusion loss function is proposed to improve fusion performance and suppress artifacts. DRSNFuse trained with the proposed loss function generates fused images with fewer artifacts and more of the original textures, which are also more consistent with the human visual system. Experiments show that our method outperforms mainstream methods in quantitative comparison and obtains fused images with brighter targets, sharper edge contours, richer details, and fewer artifacts.
Keywords: artificial texture suppression; auto encoder and decoder; channel-wise attention mechanism; deep residual shrinkage network; image fusion
Year: 2022 PMID: 35890828 PMCID: PMC9318496 DOI: 10.3390/s22145149
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Summary of mainstream deep learning fusion methods.
| Groups | Subgroup | Representative Methods | Advantages | Challenges |
|---|---|---|---|---|
| GAN-based methods | — | FusionGAN, GANMcC | Unsupervised image fusion with GANs | Absence of ground truth |
| Encoder–decoder-based methods | Traditional encoders | Refs. [ | Learnable decoders | Handcrafted feature extraction |
| Encoder–decoder-based methods | Deep learning encoders | DenseFuse, DIDFuse, VIF-Net | Learnable encoders and decoders | Relatively shallow networks |
Figure 1. Framework of DRSNFuse.
Figure 2. Architecture of DRSNFuse.
Architecture of the auto-encoder in DRSNFuse.
| Block | Chl_in | Kernel Num. | Kernel Size | Stride | Padding |
|---|---|---|---|---|---|
| RSB 1 | 1 | 16 | 3 | 1 | 1 |
| RSB 2 | 16 | 16 | 3 | 1 | 1 |
| RSB 3 | 32 | 16 | 3 | 1 | 1 |
| RSB 4 | 48 | 16 | 3 | 1 | 1 |
| Conv 1 | 64 | 16 | 1 | 1 | 0 |
| Conv 2 | 16 | 64 | 3 | 1 | 1 |
| Conv 3 | 16 | 64 | 3 | 1 | 1 |
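The Chl_in column grows as 1, 16, 32, 48, 64 because each residual shrinkage block appears to receive the concatenation of all previous block outputs, DenseNet-style, before Conv 1 fuses the full 64-channel stack. A minimal numpy sketch of that channel bookkeeping (the `rsb` function is a hypothetical stand-in for the real block, which uses 3×3 convolutions):

```python
import numpy as np

def rsb(x, out_ch=16):
    # Stand-in for a residual shrinkage block: maps (C, H, W) -> (out_ch, H, W)
    # via a random channel projection (the real block uses 3x3 convs).
    w = np.random.randn(out_ch, x.shape[0])
    return np.tensordot(w, x, axes=([1], [0]))

img = np.random.randn(1, 64, 64)        # single-channel input image
outs, inp, in_channels = [], img, []
for _ in range(4):                      # RSB 1..4
    in_channels.append(inp.shape[0])
    outs.append(rsb(inp))               # each block contributes 16 channels
    inp = np.concatenate(outs, axis=0)  # dense concatenation of all outputs

print(in_channels, inp.shape[0])        # [1, 16, 32, 48] 64
```

This reproduces the Chl_in sequence in the table and explains why Conv 1 takes 64 input channels.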
Figure 3. Residual shrinkage block. The left figure shows a residual shrinkage block with channel-shared thresholds; the right figure shows a residual shrinkage block with channel-wise thresholds.
Architecture of the residual shrinkage block.
| Layer | Kernel Num. | Kernel Size | Stride | Padding | Chl_in | Chl_out |
|---|---|---|---|---|---|---|
| Residual Shrinkage Block with Channel-Shared Thresholds | ||||||
| Conv 1 | Chs | 3 | 1 | 1 | - | - |
| Conv 2 | Chs | 3 | 1 | 1 | - | - |
| FC 3 | - | - | - | - | Chs | Chs |
| FC 4 | - | - | - | - | Chs | 1 |
| Residual Shrinkage Block with Channel-Wise Thresholds | ||||||
| Conv 1 | Chs | 3 | 1 | 1 | - | - |
| Conv 2 | Chs | 3 | 1 | 1 | - | - |
| FC 3 | - | - | - | - | Chs | Chs |
| FC 4 | - | - | - | - | Chs | Chs |
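In a deep residual shrinkage network, the attention branch (FC 3 and FC 4 above) produces soft-thresholding values: a single scalar in the channel-shared variant (FC 4 output size 1), one threshold per channel in the channel-wise variant (FC 4 output size Chs). A minimal numpy sketch of the shrinkage step, with the two learned FC layers replaced by a direct sigmoid gate as an illustrative assumption:

```python
import numpy as np

def soft_threshold(x, tau):
    # Element-wise soft thresholding: shrink magnitudes by tau, zero the rest.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def channel_wise_tau(x):
    # x: (C, H, W). Per-channel threshold: mean|x_c| scaled by a sigmoid
    # gate, so 0 < tau_c < mean|x_c| and a channel is never fully zeroed.
    abs_mean = np.abs(x).mean(axis=(1, 2))    # (C,)
    gate = 1.0 / (1.0 + np.exp(-abs_mean))    # stand-in for FC 3 -> FC 4
    return (abs_mean * gate)[:, None, None]

x = np.random.randn(4, 8, 8)                  # 4 feature-map channels
y = soft_threshold(x, channel_wise_tau(x))    # shrunken feature maps
```

The channel-shared variant would collapse `abs_mean` to a single scalar before the gate, applying one threshold to every channel.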
Architecture of the auto-decoder in DRSNFuse.
| Layer | Kernel Num. | Kernel Size | Stride | Padding |
|---|---|---|---|---|
| Conv 4 | 64 | 3 | 1 | 1 |
| Conv 5 | 32 | 3 | 1 | 1 |
| Conv 6 | 16 | 3 | 1 | 1 |
| Conv 7 | 1 | 3 | 1 | 0 |
System requirements.
| Component | Specification |
|---|---|
| CPU | Intel 10700K |
| GPU | NVIDIA RTX 3090 |
| OS | Ubuntu 20.04 |
| Language | Python 3.8 with PyTorch 1.11.0 |
Datasets used in experiments.
| Split | Dataset (Pairs) | Illumination | Average Size |
|---|---|---|---|
| Training | RoadScene—train (181) | Daylight & Nightlight | 514×302 |
| Validation | RoadScene—validation (20) | Daylight & Nightlight | 514×302 |
| Test | RoadScene—test (20) | Daylight & Nightlight | 514×302 |
| Test | TNO (20) | Daylight & Nightlight | 597×450 |
Figure 4. Base parts and details from infrared images and visible images.
Figure 5. Qualitative results for different methods.
Figure 6. The sigmoid layer enhances image contrast.
Figure 7. Infrared and visible image pairs in the test set.
Results of quantitative evaluation on RoadScene and TNO. In the source, the best results are shown in bold and the second-best are underlined; entries marked — could not be recovered.
| Method | ADF | HMSD-GF | VSMWLS | FusionGAN | DenseFuse | GANMcC | DIDFuse | DRSNFuse_CS | DRSNFuse_CW |
|---|---|---|---|---|---|---|---|---|---|
| RoadScene | | | | | | | | | |
| MI | 2.252 | 2.410 | 2.274 | — | 2.412 | 2.407 | 2.317 | — | 2.458 |
| AG | 6.668 | 9.044 | 9.190 | 4.800 | 8.324 | 3.977 | 4.720 | — | — |
| EN | 6.948 | 7.447 | 7.282 | 6.841 | 7.386 | 7.034 | 7.297 | — | — |
| SD | 9.349 | 10.217 | 9.844 | 9.242 | — | 9.975 | 9.925 | — | 10.624 |
| SF | 0.065 | 0.089 | 0.091 | 0.047 | 0.082 | 0.038 | 0.042 | — | — |
| SCD | 1.556 | 1.612 | 1.624 | 1.579 | 1.780 | 1.416 | 1.752 | — | — |
| SSIM | 0.866 | — | 0.869 | 0.892 | 0.659 | 0.811 | 0.884 | 0.901 | — |
| VIF | 0.619 | — | 0.678 | 0.611 | 0.657 | 0.517 | 0.618 | — | 0.687 |
| TNO | | | | | | | | | |
| MI | 1.712 | 1.825 | 1.899 | 1.954 | — | 1.995 | 2.098 | — | 1.977 |
| AG | 3.881 | — | 4.989 | 2.888 | 4.774 | 2.614 | 2.493 | — | 5.011 |
| EN | 6.432 | 7.075 | 6.800 | 6.371 | — | 6.370 | 6.576 | — | 7.180 |
| SD | 8.754 | 9.448 | 8.874 | 8.702 | — | 8.080 | 9.048 | — | 9.338 |
| SF | 0.038 | — | 0.050 | 0.030 | 0.047 | 0.027 | 0.024 | — | 0.049 |
| SCD | 1.579 | 1.658 | 1.739 | 1.598 | 1.807 | 1.421 | 1.696 | — | — |
| SSIM | 0.856 | — | 0.885 | — | 0.859 | 0.791 | 0.832 | 0.871 | 0.879 |
| VIF | 0.624 | — | 0.738 | 0.623 | 0.799 | 0.629 | 0.627 | — | 0.788 |
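Two of the metrics above have simple, commonly used definitions: EN is the Shannon entropy of the grayscale histogram, and AG is the mean local gradient magnitude. A minimal numpy sketch under those standard definitions (the paper's exact formulations may differ in detail):

```python
import numpy as np

def entropy(img, bins=256):
    # EN: Shannon entropy (bits) of the grayscale histogram.
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                        # ignore empty bins (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    # AG: mean RMS of horizontal and vertical intensity differences.
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]   # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]   # vertical differences
    return float(np.sqrt((gx ** 2 + gy ** 2) / 2.0).mean())

flat = np.full((32, 32), 128, dtype=np.uint8)
print(entropy(flat), average_gradient(flat))   # 0.0 0.0
```

Higher EN and AG indicate a fused image that carries more information and sharper detail, which is why they rise for the better fusion methods in the table.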
Figure 8Fusion results of pairs in RoadScene.
Figure 9Fusion results of pairs in TNO.
Average inference time. In the source, the best results are shown in bold and the second-best are underlined; entries marked — could not be recovered.
| Dataset | ADF | HMSD-GF | VSMWLS | FusionGAN | DenseFuse | GANMcC | DIDFuse | DRSNFuse_CS | DRSNFuse_CW |
|---|---|---|---|---|---|---|---|---|---|
| RoadScene | 0.114 s | 0.164 s | 0.301 s | 0.212 s | — | 0.456 s | 0.017 s | 0.015 s | — |
| TNO | 0.219 s | 0.337 s | 0.603 s | 0.396 s | — | 0.851 s | 0.099 s | 0.028 s | — |