Yujin Chen, Ruizhi Chen, Mengyun Liu, Aoran Xiao, Dewen Wu, Shuheng Zhao.
Abstract
Indoor localization is one of the fundamentals of location-based services (LBS) such as seamless indoor and outdoor navigation, location-based precision marketing, spatial cognition of robotics, etc. Visual features carry a dominant part of the information that helps humans and robots understand the environment, and many visual localization systems have been proposed. However, the problem of indoor visual localization has not been well solved due to the difficult trade-off between accuracy and cost. To better address this problem, a localization method based on image retrieval is proposed in this paper, which mainly consists of two parts. The first is the CNN-based image retrieval phase: CNN features extracted from images by pre-trained deep convolutional neural networks (DCNNs) are used to compare similarity, and the output of this phase is the set of database images matched to the target image. The second is the pose estimation phase, which computes the accurate localization result. Owing to the robust CNN feature extractor, our scheme is applicable to complex indoor environments and can easily be transplanted to outdoor environments. The pose estimation scheme is inspired by monocular visual odometry; therefore, only RGB images and the poses of reference images are needed for accurate image geo-localization. Furthermore, our method attempts to represent the scene with lightweight data. To evaluate the performance, experiments were conducted, and the results demonstrate that our scheme efficiently achieves high location accuracy as well as orientation estimation. The positioning accuracy and usability are enhanced compared with similar solutions. Moreover, our approach has good application prospects, because its data acquisition and pose estimation algorithms are compatible with the current pace of data expansion.
Keywords: CNN features; image geo-localization; image retrieval; indoor positioning; pose estimation
Year: 2018 PMID: 30115845 PMCID: PMC6111796 DOI: 10.3390/s18082692
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of our visual indoor positioning method. The process is composed of (a) database construction; (b) image retrieval; and (c) pose estimation stages.
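The retrieval stage (b) compares CNN features of a query image against precomputed features of database images. A minimal sketch of this idea is given below; it assumes PyTorch/torchvision with a pretrained VGG16 backbone (cf. Figure 2), globally average-pooled 512-dimensional conv features, and cosine similarity, all of which are illustrative choices rather than the authors' exact pipeline.

```python
# Hedged sketch of CNN-feature image retrieval: a pretrained VGG16 is used as a
# fixed feature extractor, and database images are ranked by cosine similarity.
# The layer choice, pooling, and similarity measure are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Convolutional part of a pre-trained VGG16, used without fine-tuning.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(path: str) -> torch.Tensor:
    """Return a 512-D descriptor: last conv block, globally average pooled."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    fmap = vgg16(x)                                    # shape (1, 512, 7, 7)
    return torch.nn.functional.normalize(fmap.mean(dim=(2, 3)), dim=1)[0]

def retrieve(query_path: str, database: dict[str, torch.Tensor], k: int = 3):
    """Rank database images by cosine similarity to the query descriptor."""
    q = extract_feature(query_path)
    scores = {name: float(q @ feat) for name, feat in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

In such a scheme, the descriptors in `database` would be computed once per scene and stored next to the pose information, as in the database composition table below.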
Composition of database.
| Scene Labels | Color Images | Pose Information | CNN Features |
|---|---|---|---|
| … | … | … | … |
Figure 2. Architecture of VGG16.
Figure 3. Convolution layer visualization. The first 16 matrices of each layer are visualized; empty matrices correspond to the dropped-out part of the CNN. To better visualize the features in each layer, a viridis color map was employed, so the layer maps look greenish.
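A Figure 3-style rendering of intermediate activations could be produced as in the sketch below, which reuses the hypothetical `vgg16`/`preprocess` objects from the retrieval sketch above; the chosen layer index and the 4 × 4 grid are arbitrary.

```python
# Hedged sketch of Figure 3-style visualization: show the first 16 feature maps
# of one VGG16 conv layer with a viridis colormap (hence the greenish look).
import matplotlib.pyplot as plt
import torch
from PIL import Image

@torch.no_grad()
def show_feature_maps(image_path: str, layer_idx: int = 4, n_maps: int = 16):
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    # Run the network only up to (and including) the requested layer.
    for i, layer in enumerate(vgg16):
        x = layer(x)
        if i == layer_idx:
            break
    maps = x[0, :n_maps].cpu()                 # first n_maps channels
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for ax, fmap in zip(axes.flat, maps):
        ax.imshow(fmap.numpy(), cmap="viridis")
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```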
Figure 4. Image feature vector visualization. (a) shows the vector (512 dimensions) of the query image; (b) shows the vector of the retrieved image with the top score; (c) shows the vector of an unrelated image in the same scene; and (d) shows the vector of an image from a different scene.
Figure 5. Results of feature detection and matching. (a,b) show key-points detected in a pair of images; and (c) shows the first 50 matches.
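Keypoint detection and matching as in Figure 5 can be reproduced with OpenCV's ORB implementation; in the sketch below, Lowe's ratio test is used as an assumed definition of a "good match", since the paper only reports good-match counts (see the table after Figure 8).

```python
# Hedged sketch of ORB keypoint detection and matching (Figure 5). The ratio
# test used to keep "good" matches is an assumption, not the paper's criterion.
import cv2

def orb_good_matches(path_a: str, path_b: str, ratio: float = 0.75):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Hamming distance is the appropriate metric for ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    good.sort(key=lambda m: m.distance)
    return img_a, kp_a, img_b, kp_b, good

# Drawing the first 50 matches, as in Figure 5c:
# img_a, kp_a, img_b, kp_b, good = orb_good_matches("query.png", "retrieved.png")
# vis = cv2.drawMatches(img_a, kp_a, img_b, kp_b, good[:50], None)
```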
Figure 6. Images in the office room scene of the ICL-NUIM dataset.
Figure 7. Images in the living room scene of the ICL-NUIM dataset.
Figure 8. Images in the TUM RGB-D dataset.
We chose images from different scenarios of ICL-NUIM and TUM RGB-D to compose our own experimental dataset.
| Dataset | Scenario | The Number of Raw Images | The Number of Database Images | The Number of Test Images |
|---|---|---|---|---|
| ICL-NUIM | office room | 4602 | 289 | 1495 |
| ICL-NUIM | living room | 4602 | 304 | 1533 |
| TUM RGB-D | freiburg1_plant | 1141 | 115 | 456 |
| TUM RGB-D | freiburg1_room | 1362 | 91 | 454 |
| TUM RGB-D | freiburg2_360_hemisphere | 2729 | 273 | 1092 |
| TUM RGB-D | freiburg2_flowerbouquet | 2972 | 149 | 1188 |
| TUM RGB-D | freiburg2_pioneer_slam3 | 2544 | 128 | 1017 |
| TUM RGB-D | freiburg3_long_office_household | 2585 | 130 | 1034 |
The average number of ORB (Oriented FAST and Rotated BRIEF) good matches in the two datasets.
| Similarity Rank | ICL-NUIM Dataset (Test on 3026 Images) | TUM RGB-D Dataset (Test on 5241 Images) |
|---|---|---|
| 1 | 252.8 | 225.3 |
| 2 | 203.1 | 135.9 |
| 3 | 158.4 | 104.0 |
| Average | 204.8 | 155.1 |
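Once good ORB matches link the query image to a retrieved database image whose pose is known, the query pose can be recovered in the spirit of monocular visual odometry. The sketch below uses OpenCV's essential-matrix route; the intrinsics matrix `K`, the world-to-camera pose convention, and the treatment of the monocular scale ambiguity are assumptions, not the authors' exact formulation.

```python
# Hedged sketch of the pose-estimation idea: chain the relative pose between the
# retrieved reference image (known pose) and the query image onto the reference
# pose. Monocular translation is recovered only up to an unknown scale.
import cv2
import numpy as np

def estimate_query_pose(pts_ref: np.ndarray, pts_query: np.ndarray,
                        K: np.ndarray, R_ref: np.ndarray, t_ref: np.ndarray):
    """pts_ref, pts_query: Nx2 matched pixel coordinates (reference -> query).
    R_ref, t_ref: known world-to-camera pose of the retrieved reference image."""
    E, inliers = cv2.findEssentialMat(pts_ref, pts_query, K,
                                      method=cv2.RANSAC, threshold=1.0)
    # Relative rotation and translation direction (unit-norm translation).
    _, R_rel, t_rel, _ = cv2.recoverPose(E, pts_ref, pts_query, K, mask=inliers)

    # x_query = R_rel (R_ref x_world + t_ref) + t_rel
    R_query = R_rel @ R_ref
    t_query = R_rel @ t_ref + t_rel            # t_rel known only up to scale
    return R_query, t_query
```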
Figure 9. Cumulative distribution function of location error.
Figure 10. Cumulative distribution function of angle error.
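The statistics in the table below (median, mean, and a 90% figure, interpreted here as the 90th-percentile error, which is an assumption) can be computed from per-image errors as in this short NumPy sketch.

```python
# Hedged sketch: summary statistics of per-image position (m) and angle (deg)
# errors, matching the columns of the localization-performance table.
import numpy as np

def summarize_errors(pos_err_m: np.ndarray, ang_err_deg: np.ndarray) -> dict:
    return {
        "median": (np.median(pos_err_m), np.median(ang_err_deg)),
        "mean": (np.mean(pos_err_m), np.mean(ang_err_deg)),
        "90th percentile": (np.percentile(pos_err_m, 90),
                            np.percentile(ang_err_deg, 90)),
    }
```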
Localization performance in different scenarios from different datasets.
| Dataset | Scenario | The Median Error | The Mean Error | 90% Accuracy |
|---|---|---|---|---|
| ICL-NUIM | office room | 0.07 m, 0.01° | 0.31 m, 2.47° | 0.35 m, 0.83° |
| ICL-NUIM | living room | 0.05 m, 0.02° | 0.36 m, 4.36° | 0.23 m, 1.03° |
| ICL-NUIM | Average | 0.06 m, 0.01° | 0.34 m, 3.43° | 0.28 m, 0.94° |
| TUM RGB-D | freiburg1_plant | 0.12 m, 0.01° | 0.38 m, 3.37° | 0.45 m, 1.95° |
| TUM RGB-D | freiburg1_room | 0.17 m, 0.54° | 0.43 m, 4.82° | 0.71 m, 4.04° |
| TUM RGB-D | freiburg2_360_hemisphere | 0.05 m, 0.16° | 0.38 m, 6.55° | 0.38 m, 1.08° |
| TUM RGB-D | freiburg2_flowerbouquet | 0.07 m, 0.12° | 0.15 m, 5.32° | 0.26 m, 2.54° |
| TUM RGB-D | freiburg2_pioneer_slam3 | 0.13 m, 0.13° | 0.34 m, 8.80° | 0.66 m, 1.54° |
| TUM RGB-D | freiburg3_long_office_household | 0.15 m, 0.21° | 0.36 m, 3.00° | 0.41 m, 2.05° |
| TUM RGB-D | Average | 0.10 m, 0.16° | 0.32 m, 5.58° | 0.45 m, 2.03° |
Comparison of average pose estimation error on the ICL-NUIM dataset.
| Method | Living Room | Office Room |
|---|---|---|
| PoseNet | 0.60 m, 3.64° | 0.46 m, 2.97° |
| 4D PoseNet | 0.58 m, 3.40° | 0.44 m, 2.81° |
| CNN+LSTM | 0.54 m, 3.21° | 0.41 m, 2.66° |
| Proposed method | … | … |
Comparison of database sizes.
| Method | Database Images per Scene | Median Localization Error |
|---|---|---|
| PoseNet | 3000 | 0.47 m, 14.40° |
| PoseNet | 6000 | 0.48 m, 7.68° |
| NNnet | 2000 | 0.27 m, 11.82° |
| NNnet | 4000 | 0.24 m, 6.35° |
| VLocNet | 2000 | 0.097 m, 6.48° |
| VLocNet | 4000 | 0.036 m, 1.71° |
| Proposed method | 148 | 0.102 m, 0.164° |
| Proposed method | 289 | 0.062 m, 0.011° |