Hyun-Koo Kim, Kook-Yeol Yoo, Ho-Youl Jung.
Abstract
In this paper, a modified encoder-decoder structured fully convolutional network (ED-FCN) is proposed to generate a camera-like color image from a light detection and ranging (LiDAR) reflection image. Previously, we showed that a color image can be generated from a heterogeneous source using an asymmetric ED-FCN. In addition, modified ED-FCNs, i.e., UNET and selected connection UNET (SC-UNET), have been successfully applied to biomedical image segmentation and concealed-object detection for military purposes, respectively. In this paper, we apply the SC-UNET to generate a color image from a heterogeneous image and analyze various connections between the encoder and decoder. The LiDAR reflection image has only 5.28% valid values, i.e., its data are extremely sparse. This severe sparseness limits the generation performance when the UNET is applied directly to heterogeneous image generation. We therefore present a methodology of network connection in SC-UNET that considers the sparseness of each level in the encoder network and the similarity between the same levels of the encoder and decoder networks. The simulation results show that the proposed SC-UNET with connections between the encoder and decoder at the two lowest levels yields improvements of 3.87 dB in peak signal-to-noise ratio and 0.17 in structural similarity over the conventional asymmetric ED-FCN. The methodology presented in this paper could be a powerful tool for generating data from heterogeneous sources.
Keywords: LiDAR imaging; LiDAR sensor; artificial intelligence; heterogeneous transfer method; image generation; learning systems; selected-connection network; sparse input data.
Year: 2020 PMID: 32549397 PMCID: PMC7349066 DOI: 10.3390/s20123387
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Various network architectures for semantic segmentation.
Figure 2. Encoder-decoder structured fully convolutional network (ED-FCN)-based color image-generation network from light detection and ranging (LiDAR) reflection data; the network has five levels including the transition level (level 0); for each level, the similarities between representative feature maps of the encoder and decoder parts are provided; the kernel size of the convolution filter is , where represents the number of channels at the level L.
Figure 3. Conventional and proposed selected connection UNET (SC-UNET) network architectures for color image generation from LiDAR reflection intensity. (a,b) Conventional network architectures; (c–e) proposed SC-UNET network architectures; in the networks shown in (b–e), the feature maps of the encoder part are concatenated into the feature maps of the decoder part.
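The selected connection described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `merge_selected`, the dictionary layout, and the toy feature-map shapes are assumptions made only for illustration. It shows encoder feature maps being concatenated onto decoder feature maps along the channel axis, but only at the chosen levels.

```python
import numpy as np

def merge_selected(decoder_feats, encoder_feats, selected_levels):
    """Concatenate encoder maps onto decoder maps at the selected levels only.

    Both inputs are dicts mapping a level index to an array of shape
    (height, width, channels). This layout is a hypothetical convention.
    """
    merged = {}
    for level, dec in decoder_feats.items():
        if level in selected_levels:
            enc = encoder_feats[level]
            if enc.shape[:2] != dec.shape[:2]:
                raise ValueError(f"spatial size mismatch at level {level}")
            # Channel-wise concatenation: the channel count grows, H and W do not.
            merged[level] = np.concatenate([dec, enc], axis=-1)
        else:
            # No connection at this level: the decoder map passes through unchanged.
            merged[level] = dec
    return merged

# Toy shapes (assumed): level 4 is the highest resolution, level 1 the lowest.
shapes = {4: (16, 16, 4), 3: (8, 8, 8), 2: (4, 4, 16), 1: (2, 2, 32)}
encoder = {lv: np.zeros(s) for lv, s in shapes.items()}
decoder = {lv: np.ones(s) for lv, s in shapes.items()}

# Connect only the two lowest levels, as in the configuration the abstract highlights.
merged = merge_selected(decoder, encoder, selected_levels={1, 2})
for lv in sorted(merged, reverse=True):
    print(lv, merged[lv].shape)
```

In a full network, the convolution that follows each merged level would need its input channel count adjusted accordingly, which is why the number of weights differs between connection choices.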
Summary of Information at Each Level of Encoder of the Conventional ED-FCN.
| Level | The Number of Weights in the Encoder Feature Map | Size of Receptive Field | Sparseness (%) | Similarity |
|---|---|---|---|---|
| Level 4 | 66,304 | 5 × 5 | 42.63 | 0.355 |
| Level 3 | 33,152 | 18 × 18 | 8.72 | 0.567 |
| Level 2 | 16,576 | 52 × 52 | 0.92 | 0.573 |
| Level 1 | 8,288 | 136 × 136 | 0.00 | 0.821 |
Performance results of the proposed SC-UNET-based architectures.
| The Number of Filters in the First Convolution Layer | Network Architecture | The Number of Weights [ea.] | Average Processing Time [ms] | Dataset in [ | | The 5-Fold Cross Validation | |
|---|---|---|---|---|---|---|---|
| | | | | PSNR | SSIM | PSNR | SSIM |
| | ED-FCN | 1,747,955 | 4.47 | 17.98 | 0.43 | 17.90 (2.12) | 0.43 (0.14) |
| | UNET | 1,943,795 | 4.88 | 18.01 | 0.43 | 17.92 (2.88) | 0.43 (0.19) |
| | SC-UNET w/Lv(1,2,3) | 1,941,491 | 4.53 | 18.04 | 0.44 | 18.01 (2.11) | 0.44 (0.17) |
| | | 1,932,275 | 4.50 | | | | |
| | SC-UNET w/Lv(3,4) | 1,759,475 | 4.48 | 17.99 | 0.42 | 17.82 (2.15) | 0.41 (0.17) |
| | SC-UNET w/Lv(4) | 1,750,259 | 4.48 | 17.99 | 0.43 | 17.90 (2.11) | 0.43 (0.18) |
| | SC-UNET w/Lv(3) | 1,757,171 | 4.48 | 18.01 | 0.43 | 17.94 (2.19) | 0.43 (0.17) |
| | SC-UNET w/Lv(2) | 1,784,819 | 4.48 | 18.08 | 0.44 | 17.99 (2.18) | 0.44 (0.18) |
| | | 1,895,411 | 4.49 | | | | |
| | ED-FCN | 6,991,331 | 8.97 | 18.65 | 0.47 | 18.55 (2.28) | 0.44 (0.11) |
| | UNET | 7,765,475 | 9.49 | 18.88 | 0.49 | 18.81 (2.27) | 0.49 (0.11) |
| | SC-UNET w/Lv(1,2,3) | 7,018,979 | 9.28 | 18.90 | 0.48 | 18.76 (2.28) | 0.46 (0.11) |
| | | 7,719,395 | 9.34 | | | | |
| | SC-UNET w/Lv(3,4) | 6,982,115 | 8.87 | 18.69 | 0.48 | 18.56 (2.23) | 0.47 (0.11) |
| | SC-UNET w/Lv(4) | 7,028,195 | 9.19 | 18.71 | 0.48 | 18.62 (2.28) | 0.47 (0.11) |
| | SC-UNET w/Lv(3) | 7,571,939 | 9.31 | 18.88 | 0.49 | 18.76 (2.28) | 0.48 (0.11) |
| | SC-UNET w/Lv(2) | 7,129,571 | 9.31 | 18.91 | 0.49 | 18.79 (2.31) | 0.48 (0.11) |
| | | 7,756,259 | 9.40 | | | | |
| | ED-FCN | 15,702,483 | 14.16 | 18.91 | 0.49 | 18.73 (2.20) | 0.47 (0.12) |
| | UNET | 17,465,043 | 15.29 | 19.14 | 0.50 | 19.06 (2.29) | 0.50 (0.11) |
| | SC-UNET w/Lv(1,2,3) | 17,444,307 | 15.12 | 19.98 | 0.55 | 19.88 (2.29) | 0.55 (0.11) |
| | | 17,361,363 | 15.01 | | | | |
| | SC-UNET w/Lv(3,4) | 15,806,163 | 14.74 | 19.07 | 0.50 | 18.94 (2.31) | 0.49 (0.11) |
| | SC-UNET w/Lv(4) | 15,723,219 | 14.34 | 18.99 | 0.49 | 18.86 (2.29) | 0.48 (0.12) |
| | SC-UNET w/Lv(3) | 15,785,427 | 14.91 | 19.08 | 0.50 | 18.98 (2.28) | 0.49 (0.11) |
| | SC-UNET w/Lv(2) | 16,034,259 | 14.95 | 20.12 | 0.55 | 19.96 (2.52) | 0.55 (0.12) |
| | | 17,029,587 | 14.96 | | | | |
| | ED-FCN | 27,909,059 | 21.43 | 19.01 | 0.48 | 18.98 (2.21) | 0.47 (0.11) |
| | UNET | 31,042,499 | 23.13 | 19.37 | 0.52 | 19.27 (2.22) | 0.51 (0.12) |
| | SC-UNET w/Lv(1,2,3) | 31,005,635 | 22.87 | 20.29 | 0.56 | 19.89 (2.21) | 0.56 (0.11) |
| | | 30,858,179 | 22.71 | | | | |
| | SC-UNET w/Lv(3,4) | 28,093,379 | 22.30 | 19.16 | 0.51 | 19.08 (2.32) | 0.50 (0.13) |
| | SC-UNET w/Lv(4) | 27,945,923 | 21.69 | 19.08 | 0.48 | 18.92 (2.21) | 0.48 (0.12) |
| | SC-UNET w/Lv(3) | 28,056,515 | 22.56 | 19.31 | 0.51 | 19.16 (2.28) | 0.50 (0.12) |
| | SC-UNET w/Lv(2) | 28,498,883 | 22.62 | 21.83 | 0.58 | 21.60 (2.52) | 0.58 (0.12) |
| | | 30,268,355 | 22.63 | | | | |
| - | Asymmetric ED-FCN [ | 3,350,243 | 7.74 | 19.38 | 0.50 | 19.28 (2.18) | 0.50 (0.11) |
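The PSNR columns above are the peak signal-to-noise ratio, which the abstract reports in dB. As a reminder of how such a figure is obtained, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX² / MSE). The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions; the record does not state the authors' evaluation code or image range.

```python
import numpy as np

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the largest possible pixel value; 255 assumes 8-bit images.
    """
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    if reference.shape != generated.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - generated) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy check: a uniform error of 25.5 on every pixel gives MAX^2 / MSE = 100.
target = np.full((4, 4, 3), 100.0)
print(psnr(target, target + 25.5))
```

Because the scale is logarithmic, a gain of a few dB corresponds to a substantial reduction in mean squared error.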
Figure 4. Sample inference results.
Numbers of pooling and convolution operations; the second and third columns give the numbers of pooling and convolution operations at level L, respectively; P and B denote the cumulative numbers of pooling and convolution operations, respectively.
| Level (L) | Pooling operations at level L | Convolution operations at level L | P | B |
|---|---|---|---|---|
| 4 | 0 | 2 | 0 | 2 |
| 3 | 1 | 2 | 1 | 4 |
| 2 | 1 | 2 | 2 | 6 |
| 1 | 1 | 2 | 3 | 8 |
| 0 | 1 | 2 | 4 | 10 |
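The cumulative columns P and B in the table above are running sums of the per-level counts, taken from level 4 down to level 0. A short sketch reproduces them; the variable names are illustrative, and the derived downsampling factor assumes 2 × 2 pooling with stride 2, which this record does not state.

```python
from itertools import accumulate

# Per-level counts, ordered from level 4 (input side) down to level 0.
levels = [4, 3, 2, 1, 0]
pool_per_level = [0, 1, 1, 1, 1]
conv_per_level = [2, 2, 2, 2, 2]

# Cumulative numbers of pooling (P) and convolution (B) operations.
P = list(accumulate(pool_per_level))
B = list(accumulate(conv_per_level))

# Assumption: each pooling halves the resolution (2 x 2 window, stride 2).
downsampling = [2 ** p for p in P]

for level, p, b, d in zip(levels, P, B, downsampling):
    print(f"level {level}: P={p}, B={b}, assumed downsampling x{d}")
```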
Size of Receptive Field and Sparseness at Each Level of Encoder.
| Level (L) | Size of Receptive Field | Sparseness (%) |
|---|---|---|
| 4 | 5 × 5 | 42.63 |
| 3 | 18 × 18 | 8.72 |
| 2 | 52 × 52 | 0.92 |
| 1 | 136 × 136 | 0.00 |
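The table suggests that sparseness falls as the receptive field grows. One plausible reading, which is an assumption here and not a definition given in this record, is that a position counts as sparse when its receptive field contains no valid LiDAR value. The sketch below measures that quantity on a synthetic random mask with stride 1 and no padding, so it will not reproduce the table's numbers; `empty_window_percent` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def empty_window_percent(valid_mask, k):
    """Percentage of k x k windows (stride 1, no padding) with no valid pixel.

    Assumed stand-in for the 'sparseness' of a level whose receptive
    field is k x k; the paper's exact definition may differ.
    """
    windows = sliding_window_view(np.asarray(valid_mask, dtype=bool), (k, k))
    has_valid = windows.any(axis=(-2, -1))
    return 100.0 * (1.0 - has_valid.mean())

# Synthetic mask with roughly 5.28% valid pixels (uniformly random, unlike
# a real LiDAR projection, where valid points follow scan lines).
rng = np.random.default_rng(0)
mask = rng.random((64, 256)) < 0.0528

for k in (1, 5, 18, 52):
    print(f"{k} x {k}: {empty_window_percent(mask, k):.2f}% empty windows")
```

With `k = 1` the function returns the percentage of invalid pixels in the mask itself, which corresponds to the input-level sparsity the abstract describes.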