Lijuan Shi, Guoying Wang, Lufeng Mo, Xiaomei Yi, Xiaoping Wu, Peng Wu.
Abstract
Semantic segmentation of standing trees is an important step in extracting tree measurement factors from images automatically and efficiently. Traditional methods for segmenting multiple standing trees against complex backgrounds suffer from low segmentation accuracy and require manual intervention. To address this, this article proposes SEMD, a lightweight segmentation network based on deep learning. DeepLabV3+ serves as the base framework and performs multi-scale fusion of the convolutional features of the standing trees in an image, which reduces the loss of edge detail and of feature information during segmentation. MobileNet, a lightweight network, is integrated into the backbone to reduce computational complexity. SENet, an attention mechanism, is added to extract feature information efficiently and to suppress useless feature information. Extensive experiments show that, for standing tree images of different varieties and categories, the SEMD model reaches an MIoU of 91.78% under simple backgrounds and 86.90% under complex backgrounds. The proposed SEMD model can therefore segment multiple standing trees with high accuracy.
Keywords: attention mechanism; deep learning; semantic segmentation; standing tree image
Year: 2022 PMID: 36081122 PMCID: PMC9460454 DOI: 10.3390/s22176663
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Flowchart of standing tree image segmentation.
Figure 2. Overall architecture of the SEMD model.
Figure 3. One-dimensional view of atrous convolution. (a) Convolution. (b) Atrous convolution.
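The difference between the two panels of Figure 3 can be sketched in a few lines of plain Python. This is only an illustration of the general idea of atrous (dilated) convolution, not code from the paper; the function name `conv1d`, the `rate` parameter, and the sample values are assumptions chosen for the example.

```python
def conv1d(signal, kernel, rate=1):
    """Valid 1-D convolution (cross-correlation form) with dilation `rate`.

    rate=1 is ordinary convolution; rate>1 leaves rate-1 gaps between
    kernel taps, which widens the receptive field without adding weights.
    """
    span = (len(kernel) - 1) * rate + 1  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate]
                       for i, w in enumerate(kernel)))
    return out


# Illustrative values only (not taken from the paper).
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

standard = conv1d(signal, kernel, rate=1)  # each output sees 3 neighbours
atrous = conv1d(signal, kernel, rate=2)    # each output spans 5 positions
```

With the same three weights, the atrous version covers five input positions per output, which is why it is useful for gathering multi-scale context.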
Figure 4. Depthwise separable convolution.
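To make the saving from depthwise separable convolution concrete, the following sketch counts weights for a standard convolution and for its depthwise-plus-pointwise replacement. The layer sizes (3 × 3 kernel, 32 input channels, 64 output channels) are illustrative assumptions, biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    # Every output channel has its own k x k filter over all input channels.
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # equals 1/c_out + 1/k**2
```

For these sizes the separable version needs roughly one eighth of the weights, which is the kind of reduction that motivates a MobileNet backbone.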
MobileNetV2 network structure.
| Input | Operator | t | c | n | s |
|---|---|---|---|---|---|
| 224² × 3 | Conv2d | - | 32 | 1 | 2 |
| 112² × 32 | Bottleneck | 1 | 16 | 1 | 1 |
| 112² × 16 | Bottleneck | 6 | 24 | 2 | 2 |
| 56² × 24 | Bottleneck | 6 | 32 | 3 | 2 |
| 28² × 32 | Bottleneck | 6 | 64 | 4 | 2 |
| 14² × 64 | Bottleneck | 6 | 96 | 3 | 1 |
| 14² × 96 | Bottleneck | 6 | 160 | 3 | 2 |
| 7² × 160 | Bottleneck | 6 | 320 | 1 | 1 |
| 7² × 320 | Conv2d 1 × 1 | - | 1280 | 1 | 1 |
| 7² × 1280 | Avgpool 7 × 7 | - | - | 1 | - |
| 1 × 1 × 1280 | Conv2d 1 × 1 | - | k | - | - |
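The Input column of the table above follows from the stride column s and the output-channel column c. The sketch below replays the convolutional rows and recomputes each stage's input shape; it assumes the usual MobileNetV2 convention that the stride applies only to the first of the n repeated blocks in a row. The tuple layout and the name `trace_shapes` are hypothetical.

```python
# (operator, t, c, n, s) rows copied from the table; None = not applicable.
layers = [
    ("Conv2d",     None,   32, 1, 2),
    ("Bottleneck",    1,   16, 1, 1),
    ("Bottleneck",    6,   24, 2, 2),
    ("Bottleneck",    6,   32, 3, 2),
    ("Bottleneck",    6,   64, 4, 2),
    ("Bottleneck",    6,   96, 3, 1),
    ("Bottleneck",    6,  160, 3, 2),
    ("Bottleneck",    6,  320, 1, 1),
    ("Conv2d 1x1", None, 1280, 1, 1),
]


def trace_shapes(size, channels, rows):
    """Return (spatial size, channels) before each row and after the last."""
    shapes = [(size, channels)]
    for _op, _t, c, _n, s in rows:
        size //= s        # stride s shrinks height and width by a factor s
        channels = c      # row outputs c channels
        shapes.append((size, channels))
    return shapes


shapes = trace_shapes(224, 3, layers)
for size, channels in shapes:
    print(f"{size} x {size} x {channels}")
```

The printed shapes reproduce the Input column down to the 7² × 1280 feature map that feeds the average-pooling row.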
Figure 5. Structure of SENet.
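A squeeze-and-excitation block of the kind Figure 5 depicts can be sketched as three steps: squeeze (global average pooling), excitation (two fully connected layers ending in a sigmoid), and scale (channel-wise reweighting). The code below is a minimal plain-Python illustration of that general scheme, not the authors' implementation; the weights, the tiny feature map, and the name `se_block` are assumptions, and biases are omitted.

```python
import math


def se_block(feature_maps, w1, w2):
    """feature_maps: list of C channels, each an H x W nested list.
    w1: (C/r) x C weights, w2: C x (C/r) weights (biases omitted)."""
    # Squeeze: one average value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's weight.
    out = [[[s * v for v in row] for row in ch]
           for ch, s in zip(feature_maps, scale)]
    return out, scale


# Hypothetical 2-channel, 2 x 2 input with reduction ratio r = 2.
feature_maps = [[[1, 1], [1, 1]], [[3, 3], [3, 3]]]
w1 = [[0.5, 0.5]]      # 1 x 2
w2 = [[1.0], [-1.0]]   # 2 x 1
out, scale = se_block(feature_maps, w1, w2)
```

Channels whose learned weight is close to 0 are suppressed, while channels with a weight close to 1 pass through almost unchanged.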
MobileNetV2 + SENet network structure.
| Input | Operator | t | c | n | s |
|---|---|---|---|---|---|
| 224² × 3 | Conv2d | - | 32 | 1 | 2 |
| 112² × 32 | Bottleneck | 1 | 16 | 1 | 1 |
| 112² × 16 | Bottleneck | 6 | 24 | 2 | 2 |
| 56² × 24 | Bottleneck | 6 | 32 | 3 | 2 |
| 28² × 32 | Bottleneck | 6 | 64 | 4 | 2 |
| 14² × 64 | Bottleneck | 6 | 96 | 3 | 1 |
| 14² × 96 | Bottleneck | 6 | 160 | 3 | 2 |
| 7² × 160 | Bottleneck | 6 | 320 | 1 | 1 |
Experimental software and hardware configuration.
| Project | Detail |
|---|---|
| CPU | AMD Ryzen 7 5800H with Radeon Graphics @3.20 GHz |
| RAM | 16 GB |
| Operating system | Windows 11 64-bit |
| CUDA | CUDA 11.3 |
| Data processing | Python 3.6 |
Figure 6. Data augmentation. (a) Original image. (b) Rotated image. (c) Flipped vertically. (d) Flipped horizontally.
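The three augmentations named in Figure 6 can be sketched on a small nested-list "image". The caption does not state the rotation angle, so the 90° clockwise rotation here is an assumption, as are the function names and the sample pixel values.

```python
def flip_vertical(img):
    return img[::-1]                      # reverse row order (top <-> bottom)


def flip_horizontal(img):
    return [row[::-1] for row in img]     # reverse each row (left <-> right)


def rotate_90_clockwise(img):
    # Reverse the rows, then transpose.
    return [list(col) for col in zip(*img[::-1])]


# Tiny 2 x 3 stand-in for an image (illustrative values).
img = [[1, 2, 3],
       [4, 5, 6]]

rotated = rotate_90_clockwise(img)
v_flipped = flip_vertical(img)
h_flipped = flip_horizontal(img)
```

For segmentation data, the same transform has to be applied to the image and to its label mask so that the pixels and their labels stay aligned.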
Figure 7. Loss function diagram of standing tree segmentation.
The hyper-parameters.
| Project | Value |
|---|---|
| Epoch | 100 |
| Batch size | 8 |
| Learning rate (Lr) | 5 × 10⁻⁵ |
| Input-shape | 512 × 512 |
Figure 8. Overlay of segmentation results. (a) Original image. (b) Segmentation process. (c) Mask overlay.
Figure 9. Comparison of standing tree segmentation under simple backgrounds. (a) Original image. (b) Ground-truth. (c) FCN. (d) SegNet. (e) U-Net. (f) PSPNet. (g) DeepLabV3. (h) SEMD.
Figure 10. Comparison of standing tree segmentation under complex backgrounds. (a) Original image. (b) Ground-truth. (c) FCN. (d) SegNet. (e) U-Net. (f) PSPNet. (g) DeepLabV3. (h) SEMD.
Performance comparison in a simple background.
| Model | Category | MIoU (%) | MPA (%) |
|---|---|---|---|
| FCN | Single tree | 73.11 | 91.23 |
| | Multiple trees | 71.52 | 90.46 |
| SegNet | Single tree | 75.36 | 91.78 |
| | Multiple trees | 74.89 | 90.08 |
| U-Net | Single tree | 87.60 | 93.82 |
| | Multiple trees | 86.43 | 92.13 |
| PSPNet | Single tree | 73.23 | 80.36 |
| | Multiple trees | 66.35 | 85.73 |
| DeepLabV3+ | Single tree | 92.15 | 95.57 |
| | Multiple trees | 85.62 | 94.85 |
| SEMD | Single tree | 93.22 | 96.71 |
| | Multiple trees | 89.21 | 94.23 |
Performance comparison under complex background.
| Model | Category | MIoU (%) | MPA (%) |
|---|---|---|---|
| FCN | Single tree | 72.64 | 86.75 |
| | Multiple trees | 71.35 | 85.32 |
| SegNet | Single tree | 74.83 | 87.01 |
| | Multiple trees | 73.69 | 86.47 |
| U-Net | Single tree | 76.48 | 85.79 |
| | Multiple trees | 81.46 | 89.22 |
| PSPNet | Single tree | 72.69 | 82.34 |
| | Multiple trees | 70.62 | 81.78 |
| DeepLabV3+ | Single tree | 83.18 | 90.17 |
| | Multiple trees | 79.23 | 90.46 |
| SEMD | Single tree | 87.45 | 94.72 |
| | Multiple trees | 86.36 | 92.23 |
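The MIoU and MPA columns in the two comparison tables are standard segmentation metrics. The sketch below computes them from a confusion matrix using the common definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes). The paper's exact formulas are not part of this excerpt, so treat this as an assumption; the labels and function names are illustrative.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm):
    ious, accs = [], []
    for c in range(len(cm)):
        tp = cm[c][c]
        gt = sum(cm[c])                    # pixels that truly belong to c
        pr = sum(row[c] for row in cm)     # pixels predicted as c
        union = gt + pr - tp
        if union:
            ious.append(tp / union)
        if gt:
            accs.append(tp / gt)
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Flattened toy masks: 0 = background, 1 = tree (made-up labels).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(truth, pred, num_classes=2)
miou, mpa = miou_and_mpa(cm)
```

In this toy case each class has 3 correct pixels out of a union of 5, so MIoU is 0.60 while MPA is 0.75, which shows why MIoU is usually the stricter of the two numbers.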
PSNR (dB) comparison of different models.
| Background | Sample | FCN | SegNet | U-Net | PSPNet | DeepLabV3+ | SEMD |
|---|---|---|---|---|---|---|---|
| Simple | 1 | 15.76 | 16.75 | 17.49 | 16.49 | 20.59 | 23.42 |
| | 2 | 15.32 | 15.86 | 17.06 | 15.95 | 20.94 | 23.96 |
| | 3 | 16.77 | 15.34 | 19.98 | 16.86 | 21.38 | 25.42 |
| Complex | 1 | 14.69 | 15.98 | 16.23 | 14.76 | 20.46 | 26.32 |
| | 2 | 14.32 | 15.33 | 15.46 | 14.23 | 19.58 | 22.20 |
| | 3 | 15.13 | 14.62 | 15.74 | 14.06 | 19.63 | 22.48 |
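PSNR values such as those in the table above are conventionally computed from the mean squared error between a reference image and a test image. The sketch below uses the standard definition for 8-bit data (peak value 255); which images the paper compares is not stated in this excerpt, so the inputs here are made-up arrays and the function name is hypothetical.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-sized 2-D images."""
    diffs = [(r - t) ** 2
             for ref_row, test_row in zip(reference, test)
             for r, t in zip(ref_row, test_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up 2 x 2 masks that differ in one pixel.
reference = [[0, 255], [255, 0]]
test_img = [[0, 255], [255, 255]]
value = psnr(reference, test_img)
```

A higher PSNR means a smaller pixel-wise error relative to the peak value, so larger numbers in the table indicate output closer to the reference.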
Comparison results of different model architectures.
| Model | MIoU (%) | Speed (s) | Size (MB) |
|---|---|---|---|
| SE-M-D | 85.21 | 0.53 | 94.30 |
| SE-M+D | 88.63 | 0.27 | 22.30 |
| SEMD | 90.35 | 0.16 | 22.00 |
Average processing time for each method.
| Model | Training Time (h:min) | Inference Time (s) |
|---|---|---|
| FCN | 1:25 | 0.55 |
| SegNet | 1:06 | 0.63 |
| U-Net | 0:53 | 0.23 |
| PSPNet | 2:16 | 0.65 |
| DeepLabV3+ | 1:32 | 0.53 |
| SEMD | 1:03 | 0.16 |
Comparison results of different input-shape.
| Input Shape | MIoU (%) | Speed (s) |
|---|---|---|
| 512 × 512 | 90.35 | 0.16 |
| 321 × 321 | 89.96 | 0.12 |
| 415 × 415 | 90.07 | 0.15 |