Longqian Huang1, Ruichen Luo2, Xu Liu1, Xiang Hao3,4,5.
The goal of spectral imaging is to capture the spectral signature of a target. Traditional scanning method for spectral imaging suffers from large system volume and low image acquisition speed for large scenes. In contrast, computational spectral imaging methods have resorted to computation power for reduced system volume, but still endure long computation time for iterative spectral reconstructions. Recently, deep learning techniques are introduced into computational spectral imaging, witnessing fast reconstruction speed, great reconstruction quality, and the potential to drastically reduce the system volume. In this article, we review state-of-the-art deep-learning-empowered computational spectral imaging methods. They are further divided into amplitude-coded, phase-coded, and wavelength-coded methods, based on different light properties used for encoding. To boost future researches, we've also organized publicly available spectral datasets.Entities:
Year: 2022 PMID: 35296633 PMCID: PMC8927154 DOI: 10.1038/s41377-022-00743-6
Source DB: PubMed Journal: Light Sci Appl ISSN: 2047-7538 Impact factor: 17.782
Fig. 1Typical scanning spectral imaging approaches.
a Whiskbroom. b Pushbroom (PGP: prism–grating–prism). c Wavelength scan.
Fig. 2Illustration of four CASSI architectures.
In the DD-CASSI architecture, the spectral scene undergoes a shearing–coding–unshearing procedure. In SD-CASSI, the diffraction grating before the coded aperture is removed, so the process becomes coding then shearing. In SCCSI, the coded aperture is placed behind the dispersive element, so the spectral data undergo a shearing–coding procedure. In SS-CASSI, the coded aperture position is flexible between the spectral plane and the camera sensor, where the ratio (d1 + d2)/d2 determines the extent of spectral encoding.
Fig. 3Spectral imaging process within SD-CASSI architecture.
The spectral data cube first passes through a coded aperture for spatial encoding; then its spectral arrangement is shifted by a dispersive element. Finally, a detector captures the spatially and spectrally encoded image.
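The SD-CASSI forward process described in the caption can be sketched numerically: each spectral band is multiplied by the coded aperture, sheared sideways by the dispersion, and summed on the detector. The following is a minimal NumPy sketch under the common simplifying assumption of exactly one pixel of dispersion per band; the function name and shapes are illustrative, not from the original article.

```python
import numpy as np

def sd_cassi_forward(cube, mask):
    """Simulate an SD-CASSI measurement of a spectral cube.

    cube: (M, N, L) spectral data cube f(m, n, l)
    mask: (M, N) coded aperture
    Returns an (M, N + L - 1) 2D measurement: each band is spatially
    coded by the mask, sheared by one pixel per band index (the
    dispersion), and integrated on the detector.
    """
    M, N, L = cube.shape
    meas = np.zeros((M, N + L - 1))
    for l in range(L):
        coded = cube[:, :, l] * mask      # spatial encoding by the aperture
        meas[:, l:l + N] += coded         # shear by l pixels, then integrate
    return meas
```

Running this on a uniform cube makes the shearing visible: the detector columns where fewer bands overlap receive proportionally less energy.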
Fig. 4Vectorization and coded-aperture-related sensing matrix generation procedure.
a Illustration of the vectorization process. For a matrix A, vectorization means stacking the columns of A on top of one another; for a spectral cube of the input scene f(m, n, l), it means stacking the vectorized 2D slices on top of one another. b Illustration of generating the sensing matrix from a colored coded aperture in the SD-CASSI architecture. The matrix consists of a set of diagonal patterns that repeat in the horizontal direction, each time with a downward shift of M rows that accounts for dispersion. Each diagonal pattern is generated from the vectorized coded-aperture pattern of one band. The block–unblock coded aperture case is similar, with the color bands replaced by black and white.
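The sensing-matrix structure in the caption can be reproduced directly in code: with column-stacking vectorization, a one-column shear of the measurement corresponds to an M-row shift of the diagonal block. A minimal sketch, assuming a single (grayscale) coded aperture shared by all bands and one pixel of dispersion per band:

```python
import numpy as np

def build_sensing_matrix(mask, L):
    """Build the SD-CASSI sensing matrix H so that vec(g) = H @ vec(f).

    mask: (M, N) coded aperture; L: number of spectral bands.
    vec() stacks columns ('F' order), matching the figure. Band l's block
    is diag(vec(mask)) placed with a downward shift of l*M rows, because
    shifting the measurement by one column equals shifting its
    column-stacked vector by M entries.
    """
    M, N = mask.shape
    H = np.zeros((M * (N + L - 1), M * N * L))
    m_vec = mask.flatten(order='F')            # vectorized aperture pattern
    for l in range(L):
        H[l * M:l * M + M * N, l * M * N:(l + 1) * M * N] = np.diag(m_vec)
    return H
```

Multiplying H by the column-stacked cube reproduces the sheared-and-summed detector image, which is a quick consistency check on the construction.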
Comparison of different amplitude-coded compressive spectral imaging methods
| Article | CSI architecture | Performance (PSNR) | Reconstruction model | Deep-learning techniques |
|---|---|---|---|---|
| AutoEncoder | SD/DD/SS CASSI | 32.46 on CAVE (SS-CASSI) | Autoencoder (Eq. (11) in the original article) | Autoencoder prior |
| HyperReconNet | SD CASSI | 33.63 on ICVL, 31.36 on Harvard | CNN | Hardware representation layer (joint training) |
| Spatial–spectral prior | SD CASSI | 34.13 on ICVL, 32.84 on Harvard, 30.03 on KAIST | Unrolled network | Learned network prior |
| External–internal learning | SD CASSI | 35.884 on ICVL, 33.585 on Harvard, 29.055 on CAVE | CNN | Dense structure, back-projection pixel loss |
| — | SD CASSI | 32.29 on ICVL (average of 16 scenes) | Conditional GAN | Self-attention, hierarchical structure |
| DNU | SD CASSI | 34.24 on ICVL, 32.71 on Harvard | Unrolled network | Learned network prior |
| HCS2-Net | SD/SS CASSI | 34.52 on ICVL (10 scenes), 39.22 on CAVE (SS-CASSI), 29.33 on CAVE (SD-CASSI) | CNN (untrained) | Residual block, attention module, unsupervised learning, hardware code concatenated to the input measurement, deep image prior |
| Deep-Tensor | SD CASSI | 30.92 on ICVL, Harvard and KAIST (best mean) | CNN (untrained) | Learned tensor decomposition |
Evaluation results are collected from the original works.
Fig. 5Main ideas of the four deep compressive reconstruction approaches.
a End-to-end reconstruction. b Joint mask learning. c Unrolled network. d Untrained network
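The unrolled-network idea in panel c alternates a data-fidelity gradient step with a learned prior step at each stage. The sketch below shows the structure only: in an actual unrolled network the prior step is a small trained CNN with stage-specific weights, whereas here a hand-written soft-threshold (a sparsity prior) stands in, and the function name and step size are illustrative assumptions.

```python
import numpy as np

def unrolled_reconstruct(y, H, K=10, eta=0.5):
    """Structural sketch of unrolled reconstruction: K stages, each a
    gradient step on the data term ||y - Hx||^2 followed by a prior
    step (here soft-thresholding in place of a learned CNN prior)."""
    x = H.T @ y                               # initialize by back-projection
    for _ in range(K):
        x = x - eta * H.T @ (H @ x - y)       # data-fidelity gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - 1e-3, 0.0)  # prior step
    return x
```

Replacing the thresholding line with a per-stage network, and training all stages end-to-end on measurement/ground-truth pairs, recovers the "learned network prior" entries in the table above.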
Fig. 6Schematic diagram of diffractive spectral imaging via a diffractive optical element (DOE).
On the left is a system using a transmissive DOE and a sensor: the incident wave passes through the DOE and then propagates a distance z before hitting the sensor. The propagation can be modeled by Fresnel diffraction. On the right, an imaging lens is placed just behind the DOE: after passing through the DOE, the incident wave is converged onto the sensor by the lens. The DOE has a height profile that introduces a phase shift.
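The DOE-to-sensor propagation in the left-hand system can be simulated with the standard FFT implementation of the Fresnel transfer function. A minimal sketch, assuming a square sampled field and monochromatic light; function name and arguments are illustrative:

```python
import numpy as np

def fresnel_propagate(field, wavelength, z, dx):
    """Propagate a complex field a distance z via the Fresnel transfer
    function H(fx, fy) = exp(ikz) exp(-i*pi*lambda*z*(fx^2 + fy^2)),
    applied in the spatial-frequency domain.

    field: (N, N) complex field just after the DOE
    wavelength, z, dx: wavelength, distance, pixel pitch (same units)
    """
    N = field.shape[0]
    fx = np.fft.fftfreq(N, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    k = 2 * np.pi / wavelength
    Hf = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * Hf)
```

Because |H(fx, fy)| = 1, the propagation is energy-conserving, which is a convenient sanity check; the wavelength dependence of H is what lets a single DOE produce the wavelength-varying point spread functions exploited by diffractive spectral imaging.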
Fig. 7Illustration of wavelength encoding spectral imaging process.
The scene S is illuminated by the light source E and wavelength-coded through the filters Q. The encoded scene spectral radiance is then captured by the imaging lens on a sensor with spectral response D.
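At a single pixel, the wavelength-coded measurement model amounts to a discrete integral over wavelength of the product of the four spectra named in the caption. A minimal sketch with illustrative shapes (L wavelength samples, C sensor channels):

```python
import numpy as np

def coded_measurement(E, S, Q, D):
    """Discrete wavelength-coded imaging model at one pixel:
    each channel reading sums, over wavelength, the product of
    illumination E, scene reflectance S, filter code Q, and the
    channel's sensor response.

    E, S, Q: (L,) sampled spectra; D: (C, L) responses of C channels.
    Returns (C,) channel measurements.
    """
    radiance = E * S * Q      # encoded spectral radiance reaching the sensor
    return D @ radiance       # integrate against each channel response
```

Spectral reconstruction then inverts this many-to-few mapping: the network learns to recover the L-sample spectrum from the C channel readings, given (implicitly or explicitly) the codes Q and responses D.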
Comparison of neural network-based works for end-to-end spectral reconstruction from RGB images
| Method | Network Backbone | Deep-Learning Techniques | Evaluation (RMSE) | Evaluation (MRAE) |
|---|---|---|---|---|
| HSCNN | CNN | Multi-layer CNN | 17.006 on Clean track (NTIRE 2018) | 0.0190 on Clean track (NTIRE 2018) |
| Adv_rgb2hs | cGAN | Additional MAE loss on cGAN | 1.457 on ICVL, 24.81 on Clean, 34.05 on Real World (NTIRE 2018) | 0.0218 on Clean, 0.0396 on Real World (NTIRE 2018) |
| Spectral super-resolution | DenseNet | Dense structure, sub-pixel convolution layer | 1.98 on ICVL, 5.27 on NUS, 4.76 on CAVE | / |
| HSCNN+ | 2-level CNN | Residual blocks, dense structure, feature fusion | 13.128[14.45] on Clean, 22.935[24.06] on Real World (NTIRE 2018) | 0.0135[0.0137] on Clean, 0.0293[0.0310] on Real World (NTIRE 2018) |
| LFB | U-Net | Camera sensitivity prior | 20.146[16.19] on Clean, 27.557[26.44] on Real World (NTIRE 2018) | 0.01704 on Clean, 0.03081 on Real World (NTIRE 2018) |
| CVL | 2-level CNN | Residual blocks, PReLU activation | 1.23 on ICVL, 3.66 on NUS, 3.5275 on CAVE, 17.27 on Clean, 27.09 on Real World (NTIRE 2018) | 0.0174[0.0152] on Clean, 0.0364[0.0335] on Real World (NTIRE 2018) |
| 3D-CNN | CNN | 3D CNN | 1.115 on ICVL, 2.86 on CAVE, 20.010[19.41] on Clean (NTIRE 2018) | 0.018[0.0181] on Clean track (NTIRE 2018) |
| Sensitivity estimation | CNN | Back-projection loss | (s) 0.0282 on ICVL, 0.0316 on CAVE | (s) 0.13 on ICVL, 0.38 on CAVE |
| Deep Function-Mixture Network | Multi-level CNN | Pixel attention, feature fusion | 4.54 on CAVE, 2.54 on Harvard, 1.03 on NTIRE 2018, 0.01268 on Clean, 0.01946 on Real World (NTIRE 2020) | 0.03075 on Clean, 0.06212 on Real World (NTIRE 2020) |
| MXR-U-Nets | U-Net | XResnet block | 0.01645 on Clean, 0.02255 on Real World (NTIRE 2020) | 0.0454[0.04441] on Clean, 0.0840[0.09322] on Real World (NTIRE 2020) |
| AWAN | CNN | Residual blocks, channel attention, PReLU activation, back-projection loss, self ensemble, model ensemble | 0.0111[0.01293] on Clean, 0.0170[0.01991] on Real World (NTIRE 2020); 10.24 on Clean, 21.33 on Real World (NTIRE 2018) | 0.0312[0.03010] on Clean, 0.0639[0.06210] on Real World (NTIRE 2020); 0.0114 on Clean, 0.0277 on Real World (NTIRE 2018) |
| RPAN | CNN | Pixel attention, global residual connection, feature fusion | 4.301[0.01695] on Clean, 4.984[0.02071] on Real World (NTIRE 2020) | 0.03756[0.03601] on Clean, 0.06787[0.06780] on Real World (NTIRE 2020) |
| HRNet | 4-level CNN | Residual blocks, dense structure, feature fusion, sub-pixel convolution layer, model ensemble | 0.01354[0.01389] on Clean, 0.01786[0.01923] on Real World (NTIRE 2020) | 0.04233[0.03231] on Clean, 0.06825[0.06200] on Real World (NTIRE 2020) |
| C2H-Net | U-Net | Additional category prior | 4.7313 on CAVE | / |
| Double Ghost | GhostNet | GhostNet block | 0.0162 on Real World track (NTIRE 2020) | 0.0439 on Real World track (NTIRE 2020) |
In the network-backbone column, "level" means parallel CNN layers for data flow. In the deep-learning-techniques column, we highlight the techniques that may play an important role in the method's performance. Performance evaluations are collected from the reported results of the original article, corresponding articles, or the NTIRE competition reports. Evaluation results from the original article are considered first, then the NTIRE competition, and finally the corresponding articles. If an evaluation result appears in both the original article and the NTIRE competition report, we use [] to denote the value in the NTIRE report. Evaluation values labeled "(s)" are from scaled datasets (datasets linearly scaled to the [0, 1] range).
Fig. 8Illustration of the physically implausible and plausible set.
The physically implausible set (left) contains the spectra that cannot be mapped to the original RGB point by the RGB filter response, while the physically plausible set (right) contains the spectra that can be mapped back.
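Membership in the physically plausible set reduces to a linear consistency check: a candidate spectrum is plausible for a given RGB point only if projecting it through the camera's RGB filter response reproduces that point. A minimal sketch, with the response matrix R (3 × L) and tolerance as illustrative assumptions:

```python
import numpy as np

def is_physically_plausible(spectrum, rgb, R, tol=1e-6):
    """Return True if the candidate spectrum maps back to the given
    RGB point under the RGB filter response R (shape (3, L)), i.e. it
    lies in the physically plausible set for that RGB value."""
    return bool(np.allclose(R @ spectrum, rgb, atol=tol))
```

Since R maps L-dimensional spectra to 3 values, infinitely many spectra pass this test for any RGB point; spectral reconstruction networks effectively learn which member of the plausible set natural scenes actually produce.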
Fig. 9Schematic of the PCSED framework.
The broadband encoding stochastic (BEST) filters act as an encoding neural network (encoder), whose learned weights are the filter responses. The weights are constrained by the filters' structure parameters and are generated from a pre-trained structure-to-response forward modeling network (FMN). The figure is reprinted from ref. [84] under a CC BY license (Creative Commons Attribution 4.0 International license).
Spectral dataset parameters
| Dataset Name | Year | Spectral range/nm | Channel step/nm | Resolution | Capacity/images |
|---|---|---|---|---|---|
| CAVE | 2008 | 400–700 | 10 | 512 × 512 | 32 |
| Harvard | 2011 | 420–720 | 10 | 1040 × 1392 | 75 |
| NUS | 2014 | 400–700 | 10 | 1312 × >950 | 66 |
| ICVL | 2016 | 400–1000, 400–700 | about 1.25, 10 | 1392 × 1300 | 201 |
| KAIST | 2017 | 420–720 | 10 | 2704 × 3376 | 30 |
| NTIRE 2018 | 2018 | 400–700 | 10 | 1392 × 1300 | 256 |
| NTIRE 2020 | 2020 | 400–700 | 10 | 482 × 512 | 460 |
| Hyperspectral and color imaging | / | / | / | / | 191 |
| Scyllarus | / | 400–700 | 10 | 1040 × 1392 | 73 |
| C2H-Data | 2020 | 374.1–988.1, 450–740 | about 4.6, 10 | 1392 × 1650 | 697 |
Description of spectral datasets
| Dataset name | Description |
|---|---|
| CAVE | Contains diverse objects and materials; taken under CIE Standard Illuminant D65. |
| Harvard | Contains indoor and outdoor scenes; 50 were taken under daylight illumination, 25 under artificial or mixed illumination. |
| NUS | Contains outdoor images of natural objects, man-made objects, and buildings; taken under artificial wideband lights of different color temperatures; illumination spectra are provided. |
| ICVL | Contains urban, suburban, rural, indoor, and plant-life scenes; taken under natural light. |
| KAIST | Contains color checkerboards with objects for network evaluation; uses xenon illumination (Thorlabs HPLS-30-4) as the light source. |
| Hyperspectral and color imaging | Contains 88 outdoor scenes, 57 fruits, 46 color charts and patches; taken under different lighting conditions; illumination data are provided. |
| Scyllarus | Contains portraits, office scenes, close-ups, fruits and flowers, and landscapes, each with around 15 images; office scenes were taken under fluorescent lighting, the others under natural lighting. |
| C2H-Data | Contains various real and artificial fruits and vegetables; taken under tungsten–bromine lamps. |