Ziran Ye, Bo Si, Yue Lin, Qiming Zheng, Ran Zhou, Lu Huang, Ke Wang.
Abstract
Ongoing rural construction has produced an extensive mixture of new and old settlements in the rural areas of China. Understanding the spatial characteristics of these rural settlements is crucial because it provides essential information for land management and decision-making. Despite great advances in High Spatial Resolution (HSR) satellite imagery and deep learning techniques, mapping rural settlements accurately remains challenging because of their irregular morphology and distribution patterns. In this study, we propose a novel framework to map rural settlements by combining Gaofen-2 HSR images with the representation learning capability of deep learning. We combine a dilated residual convolutional network (Dilated-ResNet) and a multi-scale context subnetwork into an end-to-end architecture: the Dilated-ResNet learns high-resolution feature representations from HSR images, and the context subnetwork aggregates and refines the multi-scale features it extracts. Our experiment in Tongxiang city showed that the proposed framework effectively mapped and discriminated rural settlements with an overall accuracy of 98% and a Kappa coefficient of 85%, achieving performance comparable to or better than existing methods. Our results can support other convolutional neural network (CNN)-based methods in accurate and timely rural settlement mapping, particularly when up-to-date ground truth is absent. The proposed method not only offers an effective way to extract rural settlements from HSR images but also opens a new opportunity to obtain a spatially explicit understanding of rural settlements.
Keywords: fully convolutional network; high spatial resolution images; multi-scale context; rural settlements
Year: 2020 PMID: 33113788 PMCID: PMC7662595 DOI: 10.3390/s20216062
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. GaoFen-2 image of the Tongxiang study area in July 2016. Examples of (a) a low-density rural settlement and (b) a high-density rural settlement.
Figure 2. Flowchart of the proposed research framework: (A) data set generation, (B) model training, and (C) accuracy assessment.
Figure 3. Overview of the proposed detection architecture. (A) The Dilated-ResNet extracts multi-level features with high spatial resolution; (B) the context subnetwork exploits the multi-scale context and maps features to the desired outputs.
Figure 4. (a) The Tongxiang data set used in the experiments. (b) Examples of test samples.
The number of testing samples.
| | LDS | HDS | Backgrounds | Sum |
|---|---|---|---|---|
| Point-based testing samples | 6125 | 2616 | 2887 | 11,628 |
| Polygon-based testing samples | 1831 | 438 | / | 2269 |
Figure 5. Classification result of the polygon test area.
Confusion matrix of point test set (rows: ground truth; columns: predicted class).
| Ground truth | LDS | HDS | Backgrounds | Sum |
|---|---|---|---|---|
| LDS | 5997 | 3 | 125 | 6125 |
| HDS | 4 | 2551 | 61 | 2616 |
| Backgrounds | 4 | 0 | 2883 | 2887 |
| Sum | 6005 | 2554 | 3069 | 11,628 |
| UA | 99.87% | 99.88% | 93.94% | |
| PA | 97.91% | 97.52% | 99.86% | |
| OA | 98.31% | | | |
| Kappa | 0.9724 | | | |
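The accuracy measures reported with the confusion matrix can be recomputed from its counts. The sketch below is illustrative only and is not the authors' code: the function name `accuracy_metrics` is hypothetical, and it assumes the standard definitions (UA as the diagonal count divided by the predicted-class column total, PA as the diagonal count divided by the ground-truth row total, and Cohen's Kappa from the observed and chance agreement), which reproduce the values listed for the point test set.

```python
def accuracy_metrics(matrix):
    """Compute OA, per-class UA and PA, and Cohen's Kappa.

    matrix[i][j] is the number of samples whose ground-truth class is i
    and whose predicted class is j (assumed layout, matching the table).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]  # ground-truth totals
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    diag = [matrix[i][i] for i in range(n)]

    oa = sum(diag) / total                          # overall accuracy
    ua = [diag[j] / col_sums[j] for j in range(n)]  # user's accuracy
    pa = [diag[i] / row_sums[i] for i in range(n)]  # producer's accuracy

    # Chance agreement expected from the marginal totals.
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, ua, pa, kappa


# Point test set counts, class order: LDS, HDS, Backgrounds.
point_matrix = [
    [5997, 3, 125],
    [4, 2551, 61],
    [4, 0, 2883],
]

oa, ua, pa, kappa = accuracy_metrics(point_matrix)
print(f"OA = {oa:.2%}, Kappa = {kappa:.4f}")
print("UA:", [f"{v:.2%}" for v in ua])
print("PA:", [f"{v:.2%}" for v in pa])
```

The same function applies to an area-based matrix, since only ratios of the cell values enter the measures.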
Confusion matrix of polygon test set, in m² (rows: ground truth; columns: predicted class).
| Ground truth | LDS | HDS | Backgrounds | Sum |
|---|---|---|---|---|
| LDS | 720,551 | 9228 | 118,198 | 847,977 |
| HDS | 2673 | 349,060 | 60,476 | 412,209 |
| Backgrounds | 95,539 | 51,323 | 24,231,862 | 24,378,724 |
| Sum | 818,763 | 409,611 | 24,410,536 | 25,638,910 |
| UA | 88.00% | 85.22% | 99.27% | |
| PA | 84.97% | 84.68% | 99.40% | |
| OA | 98.68% | | | |
| Kappa | 0.8591 | | | |
Model comparisons with baseline, where values in bold are the best.
| Model | OA | UA (LDS) | UA (HDS) | PA (LDS) | PA (HDS) | Kappa |
|---|---|---|---|---|---|---|
| Res50Seg (Baseline) | 98.36% | 82.50% | 80.45% | 83.30% | 67.75% | 0.8329 |
| +Dilation | 98.39% | 84.25% | 78.76% | 80.53% | 76.90% | 0.8363 |
| +Dilation+Multiscale | 98.53% | 87.24% | 84.88% | 81.90% | 83.19% | 0.8513 |
| +Dilation+Multiscale+SE (Ours) | | | | | | |
Figure 6. Visualization of test set samples before (A) and after (B) recalibration with the SE block. Different colors represent different categories.
Figure 7. Accuracy assessment of different data input strategies.
Figure 8. Example of results on the Tongxiang polygon test set. (a) Original images, (b) OBIA, (c) FCN, (d) UNet, (e) SegNet, (f) DeeplabV3+, (g) the proposed method.
Accuracy assessment of different methods, where values in bold are the best.
| Method | OA | UA (LDS) | UA (HDS) | PA (LDS) | PA (HDS) | Kappa |
|---|---|---|---|---|---|---|
| OBIA | 97.54% | 75.24% | 71.44% | 72.24% | 79.95% | 0.7397 |
| FCN | 97.46% | 73.11% | 75.44% | 70.28% | 55.46% | 0.7205 |
| UNet | 98.39% | 84.58% | 77.08% | 80.32% | 66.45% | 0.8245 |
| SegNet | 98.37% | 84.06% | 78.51% | 80.20% | 68.79% | 0.8232 |
| DeeplabV3+ | 98.69% | 87.92% | 83.43% | | 82.93% | 0.8520 |
| Ours | | | | 84.97% | | |
The efficiency of different methods.
| Method | Parameters | Training Time | Inference Time |
|---|---|---|---|
| OBIA | | ~0.5 h | ~10 m |
| FCN | 12.38 million | ~3.1 h | 0 m 17 s |
| UNet | 33.40 million | ~11.8 h | 0 m 39 s |
| SegNet | 29.44 million | ~8.2 h | 0 m 31 s |
| DeeplabV3+ | 39.76 million | ~12.9 h | 0 m 32 s |
| Ours | 28.04 million | ~5.8 h | 0 m 25 s |