Neetika Gupta, Naimul Mefraz Khan.
Abstract
Two-dimensional (2D) object detection is an intensely researched field of computer vision. Despite numerous advances over the years, a robust approach for efficiently classifying and localizing objects in our environment using only a mobile device has yet to be identified. Moreover, 2D object detection limits the overall understanding of a detected object because it provides no information about the object's size and position in the real world. This work proposes a novel three-dimensional (3D) object localization solution for mobile devices. The proposed method combines a 2D object detection Convolutional Neural Network (CNN) model with Augmented Reality (AR) technologies to recognize objects in the environment and determine their real-world coordinates. We leverage the built-in Simultaneous Localization and Mapping (SLAM) capability of Google's ARCore to detect planes and obtain the camera information needed to generate cuboid proposals from an object's 2D bounding box. The proposed method is fast and efficient at identifying everyday objects in real-world space and, unlike mobile offloading techniques, is designed to work within the limited resources of a mobile device.
Keywords: ARCore; object detection; object localization
Year: 2022; PMID: 35877632; PMCID: PMC9323171; DOI: 10.3390/jimaging8070188
Source DB: PubMed; Journal: J Imaging; ISSN: 2313-433X
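The abstract describes using detected planes and camera information to lift a 2D bounding box into real-world coordinates. One common building block for this kind of lifting is intersecting the camera ray through a pixel with a known plane. The following is a minimal sketch of that step only, not the authors' implementation. It assumes a pinhole camera at the origin of its own frame with +z forward and +y down (a computer-vision convention; ARCore's own conventions may differ), and all function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> Vec3:
    """Direction, in the camera frame, of the ray through pixel (u, v).

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels.
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_plane(origin: Vec3, direction: Vec3,
                    normal: Vec3, d: float) -> Optional[Vec3]:
    """Intersect the ray origin + t * direction (t > 0) with the plane
    normal . p + d = 0. Returns None if there is no forward intersection."""
    denom = sum(n * r for n, r in zip(normal, direction))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = -(sum(n * o for n, o in zip(normal, origin)) + d) / denom
    if t <= 0:
        return None  # plane lies behind the camera
    return tuple(o + t * r for o, r in zip(origin, direction))


def bbox_bottom_on_plane(bbox: Tuple[float, float, float, float],
                         intrinsics: Tuple[float, float, float, float],
                         normal: Vec3, d: float) -> List[Optional[Vec3]]:
    """Project the two bottom corners of a 2D box (u_min, v_min, u_max, v_max)
    onto the plane, giving candidate 3D points for the base of a cuboid."""
    u_min, _v_min, u_max, v_max = bbox
    camera_center = (0.0, 0.0, 0.0)
    return [
        intersect_plane(camera_center, pixel_to_ray(u, v_max, *intrinsics),
                        normal, d)
        for u in (u_min, u_max)
    ]


# Hypothetical numbers: a 640x480 image and a floor 1.5 m below the camera
# (the plane y = 1.5 in a frame where +y points down).
intrinsics = (500.0, 500.0, 320.0, 240.0)
floor_normal, floor_d = (0.0, 1.0, 0.0), -1.5
base_points = bbox_bottom_on_plane((220.0, 100.0, 420.0, 480.0),
                                   intrinsics, floor_normal, floor_d)
```

In a real pipeline the plane and camera pose would come from the AR framework and would be expressed in a shared world frame; the sketch keeps everything in the camera frame to stay self-contained.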
Figure 1. Flowgraph of the proposed method.
Figure 2. Screenshots of the mobile application developed based on our framework. (a) The 2D bounding box (green) of the object detected using SSD-MobileNetV1; (b) the final 3D cuboid computed from the 2D bounding box (light blue and red surfaces represent the top and bottom of the cuboid, respectively, and are joined by dark blue line segments).
Figure 3. Vanishing point representation for a cube.
Figure 4. Alignment of world and camera coordinate system.
Images used for experiments (up to four images per object category; images not reproduced here). Object categories: Book, Cellphone, Chair, Dog, Laptop, Mug, Potted_Plant, Table, Tennis_Racket.
Object predicted by SSD-MobileNetV1 (second column from the left) in the image and the corresponding 3D cuboid output using Yang and Scherer [26] and our approach (cuboid images not reproduced here).
| Object Category | Object Predicted |
|---|---|
| Book | TV 56% |
| Chair | Chair 56% |
| Dog | Dog 76% |
| Potted_Plant | Potted Plant 53% |
3D cuboids generated when the 2D bounding box coordinates are obtained using SSD-MobileNetV1 and when they are defined manually (images not reproduced here). Note that for object category […], no object is detected by SSD-MobileNetV1 and hence no 3D cuboid is generated. Rows: […], Mug, Table, Tennis_Racket. Columns: SSD-MobileNetV1, Manual.
The 3D-IoU results for the cuboids generated by [26] and by our approach.
| Object Category | Book | Cellphone | Chair | Dog | Laptop | Mug | Potted Plant | Table | Tennis Racket |
|---|---|---|---|---|---|---|---|---|---|
| Yang and Scherer [26] | 0.0903 | 0.0036 | 0.2804 | 0.0303 | | | 0.0993 | | |
| Ours | | | | | 0.1135 | 0.0238 | | 0.1847 | 0.0529 |
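For readers unfamiliar with the 3D-IoU metric reported in the table, it is the volume of the intersection of two cuboids divided by the volume of their union. The sketch below shows the computation for the simplified case of axis-aligned cuboids. This is an assumption made for brevity: the evaluation in the paper may use oriented cuboids, which require a more involved intersection computation. The `Box` layout and function names are hypothetical.

```python
from typing import Tuple

# Axis-aligned cuboid as (x_min, y_min, z_min, x_max, y_max, z_max).
Box = Tuple[float, float, float, float, float, float]


def volume(box: Box) -> float:
    """Volume of an axis-aligned cuboid (0 if any extent is non-positive)."""
    dx, dy, dz = (box[i + 3] - box[i] for i in range(3))
    if dx <= 0 or dy <= 0 or dz <= 0:
        return 0.0
    return dx * dy * dz


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned cuboids."""
    intersection = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        intersection *= hi - lo
    union = volume(a) + volume(b) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2x2 cubes offset by 1 along every axis overlap in a unit cube,
# so the IoU is 1 / (8 + 8 - 1).
ground_truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
predicted = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)
score = iou_3d(ground_truth, predicted)
```

Because the metric is volumetric, even a modest offset along each axis reduces the score sharply, which helps explain why 3D-IoU values are typically much lower than 2D IoU values for comparably good detections.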
Figure 5. Comparison graph for time taken (in seconds) by [26] and our approach.