| Literature DB >> 31561639 |
Weiping Liu, Jia Sun, Wanyi Li, Ting Hu, Peng Wang.
Abstract
Point clouds are a widely used 3D data form that can be produced by depth sensors such as Light Detection and Ranging (LIDAR) and RGB-D cameras. Because point clouds are unordered and irregular, many researchers have focused on feature engineering for them. Deep learning, which can learn complex hierarchical structures, has achieved great success with camera images, and many researchers have recently adapted it to point cloud applications. In this paper, existing point cloud feature learning methods are classified as point-based and tree-based. The former directly take the raw point cloud as input to deep learning; the latter first employ a k-dimensional tree (Kd-tree) to convert the point cloud into a regular representation and then feed that representation into deep learning models. The advantages and disadvantages of both are analyzed. Applications of point cloud feature learning, including 3D object classification, semantic segmentation, and 3D object detection, are introduced, and the relevant datasets and evaluation metrics are collected. Finally, future research trends are predicted.
Keywords: application of point cloud; deep learning; feature learning; point cloud
Year: 2019 PMID: 31561639 PMCID: PMC6806315 DOI: 10.3390/s19194188
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
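The tree-based methods described in the abstract impose a regular structure on unordered points before feeding them to a network. A minimal sketch (toy synthetic data, not from the paper) of a median-split Kd-tree build, assuming only NumPy:

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Recursively split points along alternating axes (x, y, z, ...)."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]            # cycle through dimensions
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2                    # median split
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def leaf_order(node, out):
    """In-order traversal: emits the same points in a spatially regular order."""
    if node is None:
        return out
    leaf_order(node["left"], out)
    out.append(node["point"])
    leaf_order(node["right"], out)
    return out

rng = np.random.default_rng(0)
cloud = rng.random((8, 3))                    # toy 8-point cloud
tree = build_kdtree(cloud)
ordered = np.stack(leaf_order(tree, []))
print(ordered.shape)                          # (8, 3): same points, regular order
```

The traversal order is what gives tree-based models such as Kd-Net a fixed, grid-like input; production code would typically use an optimized implementation such as `scipy.spatial.cKDTree`.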
Related survey works on point clouds and their applications.
| Reference | Main Contents |
|---|---|
| Nygren et al. 2016 | Traditional algorithms for 3D point cloud segmentation and classification. |
| Nguyen et al. 2013 | Segmentation methods for 3D point clouds. |
| Ahmed et al. 2018 | 3D data from Euclidean and non-Euclidean geometry, with a discussion of how to apply deep learning to 3D datasets. |
| Hana et al. 2018 | Feature descriptors of point clouds in three classes: local-based, global-based, and hybrid-based. |
| Garcia et al. 2017 | Semantic segmentation methods based on deep learning. |
| Bronstein et al. 2017 | Problems of geometric deep learning, i.e., extending grid-like deep learning methods to non-Euclidean structures. |
| Griffiths et al. 2019 | Classification models for processing unstructured 3D Euclidean data. |
Figure 1. The main models for feature learning with raw point clouds as input.
Figure 2. The architecture of a recurrent neural network (RNN).
Figure 3. The architecture of an autoencoder.
Available point cloud datasets for classification, segmentation, and object detection.
| Dataset Name | Description | Application Tasks |
|---|---|---|
| ModelNet40 | 12,311 CAD models in 40 object classes. | 3D object classification |
| ShapeNet part | 16,881 shapes represented by 3D CAD models in 16 categories, with a total of 50 annotated parts. | Part segmentation |
| Stanford 3D semantic parsing | 271 rooms in six areas captured by Matterport 3D scanners. | Semantic segmentation |
| SHREC15 | 1,200 shapes in 50 categories, obtained by scanning real human participants and using 3D modeling software. | Non-rigid shape classification |
| SHREC16 | About 51,300 3D models in 55 categories. | 3D shape retrieval |
| ScanNet | 1,513 scanned and reconstructed indoor scenes. | Virtual scan generation |
| S3DIS | 271 rooms in six areas captured by Matterport 3D scanners. | 3D semantic segmentation |
| TU-Berlin | Sketches in 250 categories, with 80 sketches per category. | Classification |
| ShapeNetCore | 51,300 3D shapes in 55 categories, represented by triangular meshes; a manually labeled subset of the ShapeNet dataset. | 3D shape retrieval |
| ModelNet10 | The 10-class (ModelNet10) and 40-class (ModelNet40) ModelNet benchmarks used for 3D shape classification contain 4,899 and 12,311 models, respectively. | Object classification |
| RueMonge2014 | 428 high-resolution multi-view images of a street in Paris. | 3D point cloud labeling |
| 3DMatch Benchmark | A total of 62 scenes. | Point cloud representation |
| KITTI 3D Object Detection | 16 classes, including 40,000 objects in 12,000 images, captured with a Velodyne laser scanner. | 3D object detection |
| vKITTI | A sparse point cloud captured by LIDAR without color information; usable for generalization verification but not for supervised training. | Semantic segmentation |
| 3DRMS | From the challenge of combining 3D and semantic information in complex scenarios; captured by a robot driving through a semantically rich garden with fine geometric detail. | Semantic segmentation |
| Cornell RGBD Dataset | 52 labeled indoor point cloud scenes (24 office and 28 home scenes) captured with the Microsoft Kinect sensor; approximately 550 views with 2,495 segments labeled with 27 object classes. | Segmentation |
| VMR-Oakland dataset | Point clouds captured by the Navlab11 mobile platform around the Carnegie Mellon University (CMU) campus. | Segmentation |
| Robot 3D Scanning Repository | 3D point clouds of both indoor and outdoor environments acquired by a Cyberware 3030 MS scanner; heat and color information is included in some datasets. | Segmentation |
| ATG4D | Over 1.2 million, 5,969, and 11,969 frames in the training, validation, and test sets, respectively; captured by a PrimeSense sensor. | Object detection |
| Paris-Lille-3D | 50 classes in 143.1M points acquired by mobile laser scanning. | Segmentation and classification |
| Semantic3D | Eight classes in 1,660M points acquired by static LIDAR scanners. | Semantic segmentation |
| Paris-rueMadame | 17 classes in 20M points acquired by static LIDAR. | Segmentation, classification, and detection |
| IQmulus | 22 classes in 12M points acquired by static LIDAR. | Classification and detection |
| MLS1 - TUM City Campus | More than 16,000 scans captured by mobile laser scanning (MLS). | 3D detection |
Classification performance of different models on the ModelNet benchmarks.
| Methods | ModelNet10 Class Accuracy | ModelNet10 Instance Accuracy | ModelNet40 Class Accuracy | ModelNet40 Instance Accuracy | Training Time |
|---|---|---|---|---|---|
| PointNet | - | - | 86.2 | 89.2 | 3–6 h |
| PointNet++ | - | - | - | 91.9 | 20 h |
| Deepsets | - | - | - | 90.0 | - |
| SO-Net | - | - | - | 90.8 | 3 h |
| Dynamic Graph CNN | - | - | - | 92.2 | - |
| PointCNN | - | - | - | - | - |
| Kd-Net | 93.5 | 94.0 | 88.5 | 91.8 | 120 h |
| 3DContextNet | - | - | - | 91.1 | - |
| MRTNet | - | - | - | 91.7 | - |
| SPLATNet | - | - | 83.7 | 86.4 | - |
| FoldingNet | - | 94.4 | - | 88.4 | - |
| NeuralSampler | - | 95.3 | - | 88.7 | - |
Evaluation of semantic segmentation performance on the ShapeNet part dataset [6,45,46,48,51,58,59,81].
| Methods | Mean IoU | Airplane | Bag | Cap | Car | Chair | Earphone | Guitar | Knife |
|---|---|---|---|---|---|---|---|---|---|
| PointNet | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 |
| PointNet++ | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 |
| SO-Net | 84.6 | 81.9 | 83.5 | 84.8 | 78.1 | 90.8 | 72.2 | 90.1 | 83.6 |
| Dynamic Graph CNN | 85.1 | 84.2 | 83.7 | 84.4 | 77.1 | 90.9 | 78.5 | 91.5 | 87.3 |
| Kd-Net | 82.3 | 80.1 | 74.6 | 74.3 | 70.3 | 88.6 | 73.5 | 90.2 | 87.2 |
| 3DContextNet | 84.3 | 83.3 | 78.0 | 84.2 | 77.2 | 90.1 | 73.1 | 91.6 | 85.9 |
| MRTNet | 79.3 | 81.0 | 76.7 | 87.0 | 73.8 | 89.1 | 67.6 | 90.6 | 85.4 |
| SPLATNet | 83.7 | 85.4 | 83.2 | 84.3 | 89.1 | 80.3 | 90.7 | 75.5 | 93.1 |
Evaluation of semantic segmentation performance on the ShapeNet part dataset (continued) [6,45,46,48,51,58,59,81].
| Methods | Mean IoU | Lamp | Laptop | Motor | Mug | Pistol | Rocket | Skateboard | Table |
|---|---|---|---|---|---|---|---|---|---|
| PointNet | 83.7 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6 |
| PointNet++ | 85.1 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6 |
| SO-Net | 84.6 | 82.3 | 95.2 | 69.3 | 94.2 | 80.0 | 51.6 | 73.1 | 82.6 |
| Dynamic Graph CNN | 85.1 | 82.9 | 96.0 | 67.8 | 93.3 | 82.6 | 59.7 | 75.5 | 82.0 |
| Kd-Net | 82.3 | 81.0 | 94.9 | 57.4 | 86.7 | 78.1 | 51.8 | 69.9 | 80.3 |
| 3DContextNet | 84.3 | 81.4 | 95.4 | 69.1 | 92.3 | 81.7 | 60.8 | 71.8 | 81.4 |
| MRTNet | 79.3 | 80.6 | 95.1 | 64.4 | 91.8 | 79.7 | 57.0 | 69.1 | 80.6 |
| SPLATNet | 83.7 | 83.9 | 96.3 | 75.6 | 95.8 | 83.8 | 64.0 | 75.5 | 81.8 |
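The per-category numbers above are intersection-over-union (IoU) scores. A minimal sketch of part IoU on toy labels (with the simplifying convention that parts absent from both prediction and ground truth are skipped; evaluation protocols differ on this detail):

```python
import numpy as np

def part_iou(pred, gt, num_parts):
    """Mean IoU over the parts of one shape.

    pred, gt: integer part labels, one per point.
    """
    ious = []
    for part in range(num_parts):
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        if union == 0:            # part absent everywhere: skip it
            continue
        ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
print(round(part_iou(pred, gt, 3), 3))   # → 0.722
```

The benchmark's mean IoU then averages such per-shape (or per-category) scores over the dataset.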
Point cloud object detection results [93,110]: mAP on the ScanNet and SUN RGB-D datasets, and 3D mAP on the KITTI dataset ('Car' category only).
| Model | Feature Extraction | mAP (ScanNet) | mAP (SUN RGB-D) | mAP3D (Easy) | mAP3D (Moderate) | mAP3D (Hard) |
|---|---|---|---|---|---|---|
| FVNet | PointNet | - | - | 65.43 | 57.34 | 51.85 |
| VoxelNet | - | - | - | 81.97 | 65.46 | 62.85 |
| PointRCNN | PointNet++, multi-scale grouping | - | - | 88.88 | 78.63 | 77.38 |
| F-PointNet | PointNet++ | - | - | 81.20 | 70.39 | 62.19 |
| MVX-Net | VoxelNet | - | - | 83.20 | 72.70 | 65.20 |
| Deep Hough voting model | PointNet++ | 46.80 | 57.70 | - | - | - |
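The mAP figures above average per-class average precision (AP). A minimal sketch of AP for one class, assuming detections have already been matched to ground-truth boxes at some IoU threshold (that matching step is abstracted away here), using the all-point interpolation convention (benchmarks vary, e.g. KITTI historically used 11-point interpolation):

```python
import numpy as np

def average_precision(scores, labels, num_gt):
    """AP for one class: area under the interpolated precision-recall curve.

    scores: detection confidences; labels: 1 if the detection matched a
    ground-truth box, else 0; num_gt: number of ground-truth objects.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # sort by confidence
    tp = np.asarray(labels, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # precision envelope: make precision monotonically non-increasing
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # sum precision over recall increments
    steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(steps * precision))

# 3 detections, the second is a false positive, 2 ground-truth objects
print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2))
```

mAP is then the mean of this quantity over classes (or, for KITTI, reported separately per difficulty level).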