Dongdong Bai, Chaoqun Wang, Bo Zhang, Xiaodong Yi, Yuhua Tang.
Abstract
Loop closure detection (LCD) is an essential part of visual simultaneous localization and mapping (SLAM) systems. When loops are detected correctly, LCD can identify and compensate for the accumulated drift of localization algorithms to produce a consistent map. Deep convolutional neural networks (CNNs) have outperformed state-of-the-art solutions based on traditional hand-crafted features in many computer vision and pattern recognition applications. Following this success, there has been much interest in applying CNN features to robotic tasks such as visual LCD. Some researchers focus on using a pre-trained CNN model to generate image representations suitable for visual loop closure detection in SLAM. However, there are fundamental differences and challenges that separate simple computer vision applications from robotic applications. First, adjacent images in a loop closure detection dataset may resemble each other more than the images that actually form a loop closure. Second, real-time performance is one of the most critical demands on robots. In this paper, we focus on using features generated by CNN layers to implement LCD in real environments. To address the above challenges, we explicitly provide a limit value that restricts the matching range of images, solving the first problem; at the same time, we obtain better results than state-of-the-art methods and improve real-time performance through an efficient feature compression method.
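The first fix described in the abstract, excluding the most recent frames from matching, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the cosine-similarity ranking, and the `top_k` parameter are all assumptions; the abstract only states that a limit value restricts the matching range.

```python
import numpy as np

def loop_closure_candidates(features, query_idx, L, top_k=5):
    """Rank earlier frames as loop-closure candidates for frame query_idx,
    excluding the L frames immediately preceding it (hypothetical helper;
    the abstract only says a limit value restricts the matching range)."""
    valid = np.arange(max(0, query_idx - L))   # frames old enough to match
    if valid.size == 0:
        return []
    query = features[query_idx]
    cands = features[valid]
    # Cosine similarity between the query descriptor and each candidate.
    sims = cands @ query / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(query) + 1e-12
    )
    order = np.argsort(-sims)[:top_k]
    return [(int(valid[i]), float(sims[i])) for i in order]
```

The point of the exclusion window is that frame `query_idx - 1` will almost always look more similar to the query than a genuine revisit does, so it must never be allowed to win the match.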
Keywords: CNNs; Feature compression; Loop closure detection
Year: 2016 PMID: 27730029 PMCID: PMC5028405 DOI: 10.1186/s40638-016-0047-x
Source DB: PubMed Journal: Robotics Biomim ISSN: 2197-3768
Architecture of the Places CNN model and the feature dimension of each layer (CONV1–POOL5: convolutional; FC6–FC8: fully connected)

| Layer | CONV1 | POOL1 | CONV2 | POOL2 | CONV3 | CONV4 | CONV5 | POOL5 | FC6 | FC7 | FC8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dimension | 290,400 | 69,984 | 186,624 | 43,264 | 64,896 | 64,896 | 43,264 | 9216 | 4096 | 4096 | 1000 |
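The dimensions in the table are consistent with an AlexNet-style topology, which the Places CNN reuses; each entry is just channels × height × width. The channel counts and spatial sizes below are read off that standard layout as an assumption, not stated in this record:

```python
# Sanity check of the table: channels * height * width per layer, assuming
# the standard AlexNet-style sizes (96/256/384/384/256 feature maps) that
# the Places CNN inherits -- an assumption, not stated in this record.
dims = {
    "CONV1": 96 * 55 * 55,    # 290,400
    "POOL1": 96 * 27 * 27,    # 69,984
    "CONV2": 256 * 27 * 27,   # 186,624
    "POOL2": 256 * 13 * 13,   # 43,264
    "CONV3": 384 * 13 * 13,   # 64,896
    "CONV4": 384 * 13 * 13,   # 64,896
    "CONV5": 256 * 13 * 13,   # 43,264
    "POOL5": 256 * 6 * 6,     # 9,216
}
```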
Fig. 1 Three example images of the City Centre dataset. a NO.0349, b NO.1399, c NO.1401
Fig. 2 Visual loop closure detection precision-recall curves on the City Centre and New College datasets, using features generated by pool5 with different values of L. a Precision-recall on the City Centre dataset, b precision-recall on the New College dataset
Fig. 3 The Hamming distance over the original 9216-dimensional feature vectors generated by pool5 can be closely approximated by the Hamming distance over bit vectors of length 1024 with marginal precision loss. a City Centre dataset, b New College dataset
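One standard way to obtain such bit vectors is random hyperplane hashing (locality-sensitive hashing for cosine similarity). The sketch below is a generic illustration of that idea; this record does not specify the exact compression scheme the authors use, so the function names and the choice of LSH are assumptions.

```python
import numpy as np

def compress_to_bits(features, n_bits=1024, seed=0):
    """Binarize real-valued descriptors (e.g. 9216-d pool5 features) by
    thresholding random hyperplane projections. A generic LSH sketch; the
    record does not state the authors' actual compression method."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    return features @ planes > 0      # one boolean code per input row

def hamming(a, b):
    """Hamming distance between two boolean bit vectors."""
    return int(np.count_nonzero(a != b))
```

For this family of codes, the expected Hamming distance between two codes is proportional to the angle between the original vectors, which is why 1024 bits can approximate distances over the full 9216-dimensional features with only marginal precision loss.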
Runtime comparison between original features and compressed features
| | Original pool5 | 128 bits | 256 bits | 512 bits | 1024 bits | 2048 bits |
|---|---|---|---|---|---|---|
| Feature compression | – | 0.1492 ms | 0.2822 ms | 0.5493 ms | 1.0913 ms | 2.1550 ms |
| Match 1000 candidates | 21.3476 s | 3.3334 s | 3.4750 s | 3.8738 s | 4.5803 s | 5.8495 s |
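The speedups in the table come from replacing floating-point distance computations over 9216-dimensional vectors with bitwise operations over short codes. A minimal NumPy sketch of matching one packed binary code against a database with XOR-and-popcount (the paper's implementation details are not given in this record, so the function and layout here are illustrative):

```python
import numpy as np

def match_binary(query, database):
    """Hamming distances from one packed binary code to a database of codes.
    Codes are uint8 rows produced by np.packbits (1024 bits -> 128 bytes)."""
    # XOR marks the differing bits; unpacking and summing counts them.
    return np.unpackbits(database ^ query, axis=1).sum(axis=1)
```

With 1024-bit codes, each comparison touches 128 bytes instead of 9216 floats, which is consistent with the order-of-magnitude drop from 21.3 s to a few seconds when matching 1000 candidates.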