Huasheng Huang, Yubin Lan, Jizhong Deng, Aqing Yang, Xiaoling Deng, Lei Zhang, Sheng Wen.
Abstract
Weed control is necessary in rice cultivation, but the excessive use of herbicides has caused serious agronomic and environmental problems. Site-specific weed management (SSWM) can address this problem while maintaining the quality and quantity of rice production. In the context of SSWM, an accurate weed distribution map is needed to provide decision support for herbicide treatment. UAV remote sensing offers an efficient and effective platform for weed monitoring thanks to its high spatial resolution. In this work, UAV imagery was captured in a rice field located in South China. A semantic labeling approach was adopted to generate weed distribution maps from the UAV imagery. An ImageNet pre-trained CNN with a residual framework was adapted into a fully convolutional form and transferred to our dataset by fine-tuning. Atrous convolution was applied to extend the field of view of the convolutional filters; the performance of multi-scale processing was evaluated; and a fully connected conditional random field (CRF) was applied after the CNN to further refine the spatial details. Finally, our approach was compared with a pixel-based SVM and the classical FCN-8s. Experimental results demonstrated that our approach achieved the best accuracy; in particular, it significantly outperformed the other methods in detecting small weed patches. The mean intersection over union (mean IU), overall accuracy, and Kappa coefficient of our method were 0.7751, 0.9445, and 0.9128, respectively. These experiments show that our approach has high potential for accurate weed mapping of UAV imagery.
Keywords: Deep Fully Convolutional Network; UAV; remote sensing; semantic labeling; weed mapping
Year: 2018 PMID: 29966392 PMCID: PMC6069478 DOI: 10.3390/s18072113
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The general location of the study site.
Figure 2. An overview of the weed patches in the field.
Technical characteristics of the UAV platform.
| Parameters | Specifications |
|---|---|
| Weight (battery included) | 1380 g |
| Max flight time | 28 min |
| Max speed | 20 m/s |
| Typical operating altitude | 10–300 m |
| Resolution | 4000 × 3000 pixels |
| Lens | 35 mm |
| Typical spatial resolution (at 6 m altitude) | 0.3 cm |
Figure 3. An example of the UAV imagery collected in the experiment.
Figure 4. Three samples of our dataset: (a) aerial images; (b) corresponding ground truth (GT) labels.
Figure 5. The workflow of our methodology.
Architectures of ResNet-101 before and after adaptation.
| Layer Name | Size of Output (Before) | Layer Type (Before) | Size of Output (After) | Layer Type (After) |
|---|---|---|---|---|
| conv1 | 500 × 500 | 7 × 7, 64 (stride 2); max-pooling (stride 2) | 500 × 500 | 7 × 7, 64 (stride 2); max-pooling (stride 2) |
| conv2_x | 250 × 250 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3 | 250 × 250 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3 |
| conv3_x | 125 × 125 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4 | 125 × 125 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4 |
| conv4_x | 64 × 64 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 23 | 125 × 125 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 23 (stride 1, atrous) |
| conv5_x | 32 × 32 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3 | 125 × 125 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3 (stride 1, atrous) |
| classifier | 1 × 1 | 1000-d fc, softmax | 125 × 125 | 3 × 3, 3 |
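The adaptation in the table can be sketched in PyTorch; this is a minimal illustration assuming torchvision's ResNet-101 (with its `replace_stride_with_dilation` option), not the authors' original implementation:

```python
import torch.nn as nn
from torchvision import models

class FullyConvResNet(nn.Module):
    """ResNet-101 backbone in fully convolutional form (output stride 8)."""
    def __init__(self, num_classes=3):  # rice, weeds, others
        super().__init__()
        # replace_stride_with_dilation turns the stride-2 convolutions of
        # conv4_x and conv5_x into stride-1 atrous convolutions, keeping the
        # feature map at 1/8 resolution (125 x 125 for a 1000 x 1000 input).
        backbone = models.resnet101(pretrained=True,
                                    replace_stride_with_dilation=[False, True, True])
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc
        # A 3 x 3 convolution with 3 output channels replaces the 1000-way
        # fully connected classifier, yielding per-pixel class scores.
        self.classifier = nn.Conv2d(2048, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return self.classifier(self.features(x))
```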
Architectures of the VGG-16 net before and after adaptation.
| Layer Name | Size of Output (Before) | Layer Type (Before) | Size of Output (After) | Layer Type (After) |
|---|---|---|---|---|
| conv1 | 1000 × 1000 | [3 × 3, 64] × 2 | 1000 × 1000 | [3 × 3, 64] × 2 |
| pool1 | 500 × 500 | max-pooling (stride 2) | 500 × 500 | max-pooling (stride 2) |
| conv2 | 500 × 500 | [3 × 3, 128] × 2 | 500 × 500 | [3 × 3, 128] × 2 |
| pool2 | 250 × 250 | max-pooling (stride 2) | 250 × 250 | max-pooling (stride 2) |
| conv3 | 250 × 250 | [3 × 3, 256] × 3 | 250 × 250 | [3 × 3, 256] × 3 |
| pool3 | 125 × 125 | max-pooling (stride 2) | 125 × 125 | max-pooling (stride 2) |
| conv4 | 125 × 125 | [3 × 3, 512] × 3 | 125 × 125 | [3 × 3, 512] × 3 |
| pool4 | 64 × 64 | max-pooling (stride 2) | 125 × 125 | max-pooling (stride 1) |
| conv5 | 64 × 64 | [3 × 3, 512] × 3 | 125 × 125 | [3 × 3, 512] × 3 |
| pool5 | 32 × 32 | max-pooling (stride 2) | 125 × 125 | max-pooling (stride 1) |
| classifier | 1 × 1 | 1000-d fc, softmax | 125 × 125 | 3 × 3, 3 |
Figure 6. Illustration of two types of convolution: (a) standard convolution; (b) atrous convolution.
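The difference between the two convolution types in Figure 6 can be seen directly in code; a short PyTorch illustration (the rate of 2 is chosen arbitrarily here):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 125, 125)

standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # 3 x 3 field of view
atrous   = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5 x 5 field of view

# Both preserve the 125 x 125 spatial size; the atrous filter samples its
# input with gaps (rate 2), enlarging the field of view with no extra weights.
print(standard(x).shape, atrous(x).shape)  # both torch.Size([1, 64, 125, 125])
```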
Figure 7. Illustration of multi-scale processing.
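Multi-scale processing here follows the atrous spatial pyramid pooling (ASPP) idea: several atrous branches with different rates score the same feature map in parallel, and their outputs are fused. A minimal sketch, assuming sum fusion and the DeepLab-style rates (6, 12, 18, 24) of an ASPP-L configuration; the paper's exact branch design may differ:

```python
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous branches with different rates, fused by summation."""
    def __init__(self, in_channels=2048, num_classes=3, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )

    def forward(self, x):
        # Each branch sees the feature map with a different effective field of
        # view; summing the branch scores makes the prediction scale-robust.
        return sum(branch(x) for branch in self.branches)
```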
Figure 8. Illustration of the fully connected CRF.
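The fully connected CRF refines the CNN's softmax output with pairwise potentials defined over all pixel pairs. A sketch using the pydensecrf package (a common implementation of this inference; the kernel parameters below are illustrative defaults, not the paper's tuned values):

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, iters=10):
    """image: H x W x 3 uint8 RGB tile; probs: C x H x W CNN softmax output."""
    c, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, c)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness kernel: removes small isolated label regions.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: nearby pixels with similar color prefer the same
    # label, which sharpens the boundaries between weed patches and rice.
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(iters)
    return np.argmax(np.array(q).reshape(c, h, w), axis=0)  # refined label map
```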
Figure 9. Illustration of the skip architecture of FCN-8s. Only the pooling and prediction layers are shown; other layer types are omitted.
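For reference, the skip architecture of the FCN-8s baseline in Figure 9 can be sketched as follows; bilinear upsampling stands in for FCN's learned deconvolutions to keep the sketch short:

```python
import torch.nn as nn
import torch.nn.functional as F

class FCN8sHead(nn.Module):
    """FCN-8s skip head: fuse coarse semantic scores with finer pool4/pool3
    scores before the final x8 upsampling to full resolution."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.score_conv7 = nn.Conv2d(4096, num_classes, kernel_size=1)
        self.score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)
        self.score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, pool3, pool4, conv7):
        up2 = lambda t: F.interpolate(t, scale_factor=2, mode="bilinear",
                                      align_corners=False)
        s = up2(self.score_conv7(conv7)) + self.score_pool4(pool4)  # 1/32 -> 1/16
        s = up2(s) + self.score_pool3(pool3)                        # 1/16 -> 1/8
        return F.interpolate(s, scale_factor=8, mode="bilinear",
                             align_corners=False)                   # 1/8 -> full
```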
Experimental results of different baseline architectures.
| Approach | Mean IU | Overall Accuracy | Kappa |
|---|---|---|---|
| ResNet-101 | 0.7668 | 0.9409 | 0.9076 |
| VGG-16 net | 0.7478 | 0.9350 | 0.8979 |
Figure 10. The training process with and without transfer learning.
Experimental results of transfer learning.
| Approach | Mean IU | Overall Accuracy | Kappa |
|---|---|---|---|
| ResNet-101 with transfer learning | 0.7668 | 0.9409 | 0.9076 |
| ResNet-101 without transfer learning | 0.6959 | 0.8995 | 0.8417 |
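In code, the transfer-learning setting of this table amounts to initializing the backbone with ImageNet weights and fine-tuning on the weed dataset. A sketch building on the `FullyConvResNet` illustration above; the learning rates are illustrative, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

model = FullyConvResNet(num_classes=3)  # backbone starts from ImageNet weights

# Pretrained layers get a small learning rate so fine-tuning does not destroy
# the transferred features; the freshly initialized classifier learns faster.
optimizer = torch.optim.SGD([
    {"params": model.features.parameters(), "lr": 1e-4},
    {"params": model.classifier.parameters(), "lr": 1e-3},
], momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()  # per-pixel loss over the three classes
```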
Experimental results of ASPP.
| Approach | Mean IU | Overall Accuracy | Kappa |
|---|---|---|---|
| ASPP-12 | 0.7660 | 0.9395 | 0.9054 |
| ASPP-S | 0.7703 | 0.9397 | 0.9059 |
| ASPP-L | 0.7668 | 0.9409 | 0.9076 |
| ASPP-1 | 0.7721 | 0.9423 | 0.9094 |
Figure 11. Classification results obtained by ASPP-1 before and after the CRF: (a) aerial images; (b) corresponding GT labels; (c) ASPP-1 output before the CRF; (d) ASPP-1 output after the CRF.
Experimental results of CRF.
| Approach | Mean IU | Overall Accuracy | Kappa |
|---|---|---|---|
| ASPP-12 before CRF | 0.7660 | 0.9395 | 0.9054 |
| ASPP-12 after CRF | 0.7690 | 0.9415 | 0.9084 |
| ASPP-S before CRF | 0.7703 | 0.9397 | 0.9059 |
| ASPP-S after CRF | 0.7731 | 0.9417 | 0.9088 |
| ASPP-L before CRF | 0.7668 | 0.9409 | 0.9076 |
| ASPP-L after CRF | 0.7674 | 0.9433 | 0.9112 |
| ASPP-1 before CRF | 0.7721 | 0.9423 | 0.9094 |
| ASPP-1 after CRF | 0.7751 | 0.9445 | 0.9128 |
Figure 12. Classification results obtained by the methods in comparison: (a) aerial images; (b) corresponding GT labels; (c) pixel-based SVM; (d) FCN-8s; (e) ASPP-1 without CRF; (f) ASPP-1 with CRF.
Classification results of the methods in comparison. Speed is measured as the inference time for a single image, in seconds.
| Approach | Mean IU | Overall Accuracy | Kappa | Speed |
|---|---|---|---|---|
| Pixel-based-SVM | 0.6549 | 0.8513 | 0.6451 | 233.7187 s |
| FCN-8s | 0.7478 | 0.9350 | 0.8979 | 0.1406 s |
| ASPP-1 without CRF | 0.7721 | 0.9423 | 0.9094 | 0.2916 s |
| ASPP-1 with CRF | 0.7751 | 0.9445 | 0.9128 | 2.9171 s |
Confusion matrices of the methods in comparison.
| Approach | GT/Predicted Class | Others | Rice | Weeds |
|---|---|---|---|---|
| Pixel-based-SVM | others | | 0.054 | 0.048 |
| | rice | 0.037 | | 0.098 |
| | weeds | 0.141 | 0.128 | |
| FCN-8s | others | | 0.050 | 0.010 |
| | rice | 0.027 | | 0.017 |
| | weeds | 0.063 | 0.054 | |
| ASPP-1 before CRF | others | | 0.034 | 0.016 |
| | rice | 0.039 | | 0.017 |
| | weeds | 0.050 | 0.025 | |
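All three reported metrics can be derived from such a confusion matrix; a small sketch, assuming raw pixel counts with GT classes in rows:

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j]: number of pixels of GT class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    overall_accuracy = tp.sum() / cm.sum()
    # Per-class IU = TP / (GT pixels + predicted pixels - TP); mean IU averages it.
    iu = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)
    # Kappa compares observed agreement with the chance agreement implied
    # by the row and column marginals.
    chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / cm.sum() ** 2
    kappa = (overall_accuracy - chance) / (1 - chance)
    return iu.mean(), overall_accuracy, kappa
```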