Ying Li1,2, Wenyue Li2,3, Zhijie Zhao2,3, JiaHao Fan1,2.
Abstract
Three-dimensional (3D) image reconstruction is an important field of computer vision that aims to restore the 3D geometry of a given scene. Because of their large memory demands, prevalent 3D reconstruction methods yield inaccurate results, so highly accurate reconstruction of a scene remains an outstanding challenge. This study proposes a cascaded depth residual inference network, called DRI-MVSNet, that uses a cross-view similarity-based feature map fusion module for residual inference. It involves three improvements. First, a combined module processes channel-related and spatial information to capture the relevant contextual information and improve feature representation. It combines the channel attention mechanism and spatial pooling networks. Second, a cross-view similarity-based feature map fusion module is proposed that learns the similarity between pairs of pixels in each source image and the reference image at planes of different depths along the frustum of the reference camera. Third, a deep, multi-stage residual prediction module is designed to generate a high-precision depth map; it uses a non-uniform depth sampling strategy to construct hypothetical depth planes. The results of extensive experiments show that DRI-MVSNet delivers competitive performance on the DTU and the Tanks & Temples datasets, and the accuracy and completeness of the point cloud reconstructed by it are significantly superior to those of state-of-the-art benchmarks.
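The channel attention mechanism named in the abstract (Fig 3 identifies it as SENet) can be illustrated with a minimal squeeze-and-excitation sketch. This is a generic illustration of the technique, not the authors' implementation: the function name `se_channel_attention`, the reduction ratio `r`, the omission of biases, and the toy weights are all assumptions made for the example.

```python
import math

def se_channel_attention(feat, w1, w2):
    """Reweight channels of `feat` in squeeze-and-excitation style.

    feat: list of C channels, each an H x W nested list of floats.
    w1:   (C // r) x C weight matrix (reduction layer, r is hypothetical).
    w2:   C x (C // r) weight matrix (expansion layer).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in ch] for ch, g in zip(feat, gates)]
    return scaled, gates

# Toy input: two constant 2 x 2 channels and hand-picked (hypothetical) weights.
feat = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]      # 2 channels -> 1 hidden unit
w2 = [[0.0], [1.0]]    # 1 hidden unit -> 2 gates
scaled, gates = se_channel_attention(feat, w1, w2)
print(gates)
```

In a trained network the two weight matrices are learned, so the gates emphasize informative channels and suppress less useful ones before the spatial pooling branch is applied.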
Year: 2022 PMID: 35320265 PMCID: PMC8942269 DOI: 10.1371/journal.pone.0264721
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of recent methods for 3D reconstruction.
| Method | Reference | Year | Single-view Reconstruction | Multiview Reconstruction |
|---|---|---|---|---|
| Voxel-based Methods | [ | 2021 | Yes | No |
| | [ | 2021 | No | Yes |
| | [ | 2021 | No | Yes |
| Patch-based Methods | [ | 2019 | No | Yes |
| | [ | 2021 | No | Yes |
| | [ | 2021 | No | Yes |
| Point cloud-based Methods | [ | 2020 | Yes | No |
| | [ | 2021 | Yes | No |
| | [ | 2021 | Yes | No |
| Depth map-based Methods | [ | 2020 | No | Yes |
| | [ | 2020 | No | Yes |
| | [ | 2021 | No | Yes |
Fig 1. The network structure of DRI-MVSNet.
Fig 2. The channel and spatial combined processing (CSCP) module.
Fig 3. Illustration of SENet.
Fig 4. Illustration of the strip pooling network.
Fig 5. Illustration of the cross-view similarity-based feature map fusion module.
Fig 6. The non-uniform depth sampling strategy.
R is the total range of search and Δ is the initial depth interval. The number of hypothetical depth planes M at different stages is 64, 32, and 8.
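One way to picture how hypothetical depth planes might be laid out across the three stages is sketched below. The caption gives only R, Δ, and the plane counts (64, 32, 8); everything else here is an assumption for illustration, including the relation Δ = R / (M − 1), the geometric `growth` factor, the shrink factors between stages, the stand-in depth estimates, and the function names. The paper's actual sampling rule may differ.

```python
def uniform_planes(d_min, R, M):
    """Stage 1: M equally spaced planes covering the search range R.

    Assumes the initial interval is delta = R / (M - 1).
    """
    delta = R / (M - 1)
    return [d_min + i * delta for i in range(M)], delta

def residual_planes(center, delta, M, growth=1.1):
    """Later stages: M planes around a previous depth estimate.

    Spacing is `delta` next to the estimate and grows by `growth` per step
    outwards, so hypotheses are densest where the depth is most likely.
    This growth rule is a hypothetical choice, not taken from the source.
    """
    assert M % 2 == 0, "sketch assumes an even number of planes"
    offsets, pos, step = [], delta / 2, delta
    for _ in range(M // 2):
        offsets.append(pos)
        step *= growth
        pos += step
    return [center - o for o in reversed(offsets)] + [center + o for o in offsets]

# Illustrative numbers only (not from the paper).
stage1, delta1 = uniform_planes(d_min=400.0, R=630.0, M=64)
estimate1 = 650.0                      # stand-in for the stage-1 depth estimate
stage2 = residual_planes(estimate1, delta1 / 4, M=32)
estimate2 = 652.0                      # stand-in for the stage-2 depth estimate
stage3 = residual_planes(estimate2, delta1 / 16, M=8)
print(len(stage1), len(stage2), len(stage3))
```

The point of the sketch is the coarse-to-fine pattern: each stage searches a narrower band with fewer planes, which is what keeps the total number of hypotheses (64 + 32 + 8) far below a single-stage sweep at the finest interval.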
Quantitative results on the DTU dataset (lower is better).
| Method | Acc. | Comp. | Overall |
|---|---|---|---|
| Camp [ | 0.835 | 0.554 | 0.695 |
| Furu [ | 0.613 | 0.941 | 0.777 |
| Tola [ | 0.342 | 1.190 | 0.766 |
| Gipuma [ | | 0.873 | 0.578 |
| Colmap [ | 0.400 | 0.664 | 0.532 |
| MVSNet [ | 0.396 | 0.527 | 0.462 |
| SurfaceNet [ | 0.450 | 1.040 | 0.745 |
| MVSNet [ | 0.406 | 0.434 | 0.420 |
| R-MVSNet [ | 0.383 | 0.452 | 0.417 |
| PruMVSNet [ | 0.495 | 0.433 | 0.464 |
| MVSCRF [ | 0.371 | 0.426 | 0.398 |
| AttMVS (M = 256) [ | 0.412 | 0.397 | 0.403 |
| PVSNet (Low-Res) [ | 0.408 | 0.393 | 0.4001 |
| SurfaceNet+ [ | 0.385 | 0.448 | 0.416 |
| DRI-MVSNet (Ours) | 0.432 | 0.327 | 0.379 |
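The Overall column in the table above appears to be the arithmetic mean of Acc. and Comp. The record does not state this definition, so it is treated here as an assumption; the short check below recomputes it for a few complete rows and measures how far the result is from the reported value.

```python
# (Acc., Comp., reported Overall) copied from complete rows of the DTU table.
rows = {
    "Camp": (0.835, 0.554, 0.695),
    "Furu": (0.613, 0.941, 0.777),
    "Tola": (0.342, 1.190, 0.766),
    "Colmap": (0.400, 0.664, 0.532),
    "R-MVSNet": (0.383, 0.452, 0.417),
    "MVSCRF": (0.371, 0.426, 0.398),
}

def overall(acc, comp):
    """Arithmetic mean of accuracy and completeness (assumed definition)."""
    return (acc + comp) / 2

# Largest disagreement between the recomputed and the reported Overall.
max_gap = max(abs(overall(a, c) - rep) for a, c, rep in rows.values())
print(f"largest gap: {max_gap:.4f}")
```

For these rows the recomputed values agree with the reported ones to within rounding of the third decimal place, which supports reading Overall as a simple average of the two error measures.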
Quantitative results on Tanks & Temples benchmark (higher is better).
| Method | Mean | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
|---|---|---|---|---|---|---|---|---|---|
| MVSNet [ | 43.48 | 55.99 | 28.55 | 25.07 | 50.79 | 53.96 | 50.86 | 47.9 | 34.69 |
| R-MVSNet [ | 48.4 | 69.96 | 46.65 | 32.59 | 42.95 | 51.88 | 48.8 | 52 | 42.38 |
| Point-MVSNet [ | 48.27 | 61.79 | 41.15 | 34.24 | 50.79 | 51.97 | 50.85 | 52.38 | 43.06 |
| VA-Point-MVSNet [ | 48.7 | 61.95 | 43.73 | 34.45 | 50.01 | 52.67 | 49.71 | 52.29 | 44.75 |
| MVSNet++ [ | 49.12 | 62.64 | 38.49 | 39.60 | 48.40 | | | 52.28 | 44.92 |
| MVSCRF [ | 45.73 | 59.83 | 30.6 | 29.93 | 51.15 | 50.61 | 51.45 | 52.60 | 39.68 |
| Fast-MVSNet [ | 47.39 | 65.18 | 39.59 | 34.98 | 47.81 | 49.16 | 46.20 | 53.27 | 42.91 |
| HighRes-MVSNet [ | 49.81 | 66.62 | 44.17 | 30.84 | 55.13 | 53.20 | 50.32 | 55.45 | 42.73 |
| SurfaceNet+ [ | 49.38 | 62.38 | 32.35 | 29.35 | | 54.77 | 54.14 | 56.13 | 43.10 |
| DRI-MVSNet (Ours) | | | | | 53.90 | 48.48 | 46.44 | | |
Ablation study of the CSCP, CVSF, and MDRP on the DTU dataset.
| Experiment | Model | Acc. | Comp. | Overall |
|---|---|---|---|---|
| Ablation study of the CSCP | CVSF+MDRP+baseline (without CSCP) | 0.436 | | 0.380 |
| Ablation study of the CVSF | CSCP+MDRP+baseline (without CVSF) | 0.435 | 0.338 | 0.387 |
| Ablation study of the MDRP | CSCP+CVSF+baseline (without MDRP) | 0.549 | 0.415 | 0.482 |
| DRI-MVSNet | CSCP+CVSF+MDRP+baseline | 0.432 | 0.327 | 0.379 |
Ablation study of the CSCP, the CVSF and the MDRP on the Tanks & Temples dataset.
| Experiment | Mean | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
|---|---|---|---|---|---|---|---|---|---|
| Ablation study of the CSCP | 48.06 | 67.39 | 50.01 | 37.10 | 50.25 | 41.69 | | 48.35 | 42.07 |
| Ablation study of the CVSF | 49.12 | 69.30 | 53.35 | 35.72 | 51.22 | 48.02 | 44.53 | 51.73 | 39.10 |
| Ablation study of the MDRP | 42.78 | 57.12 | 48.63 | 29.74 | 46.39 | 39.32 | 35.37 | 45.21 | 40.42 |
| DRI-MVSNet | | | | | 53.90 | 48.48 | 46.44 | | |
Comparisons of time complexity and memory consumption on the DTU dataset.
| Methods | Input Size | Depth number | Depth Map Size | GPU Memory | Runtime | Overall |
|---|---|---|---|---|---|---|
| MVSNet [ | 1600x1184 | 256 | 400x296 | 15.4GB | 1.18s | 0.462 |
| R-MVSNet [ | 1600x1184 | 512 | 400x296 | 6.7GB | 2.35s | 0.417 |
| Point-MVSNet-HiRes [ | 1600x1152 | 96 | 800x576 | 8.9GB | 5.44s | |
| VA-Point-MVSNet [ | 1280x960 | 96 | 640x480 | 8.7GB | 3.35s | 0.391 |
| DRI-MVSNet (Ours) | 800x576 | 64,32,8 | 800x576 | | | 0.379 |