Efficient Multi-Object Detection and Smart Navigation Using Artificial Intelligence for Visually Impaired People
Rakesh Chandra Joshi, Saumya Yadav, Malay Kishore Dutta, Carlos M. Travieso-González
Abstract
Visually impaired people face numerous difficulties in their daily lives, and technological interventions can help them meet these challenges. This paper proposes an artificial-intelligence-based, fully automatic assistive technology that recognizes different objects and provides auditory feedback to the user in real time, giving the visually impaired person a better understanding of their surroundings. A deep-learning model is trained on multiple images of objects that are highly relevant to visually impaired people. Training images are augmented and manually annotated to make the trained model more robust. In addition to computer-vision-based object recognition, a distance-measuring sensor is integrated to make the device more comprehensive by recognizing obstacles while navigating from one place to another. The auditory information conveyed to the user after scene segmentation and obstacle identification is optimized to deliver more information in less time, allowing faster processing of video frames. The average accuracy of the proposed method is 95.19% for object detection and 99.69% for object recognition. The time complexity is low, allowing the user to perceive the surrounding scene in real time.
Keywords: YOLO-v3; artificial intelligence; assistive systems; computer vision; deep learning; machine learning; object recognition; visually impaired person
Year: 2020 PMID: 33286711 PMCID: PMC7597210 DOI: 10.3390/e22090941
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Block diagram of the proposed methodology.
Figure 2. Image augmentation techniques applied to acquired images: (a) original image; (b) 90° left rotation; (c) 90° right rotation; (d) 180° rotation; (e) horizontal flip; (f) vertical flip; (g) increased brightness; (h) added noise; (i) low contrast; (j) background removal.
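For reference, the augmentations in Figure 2 map onto standard OpenCV/NumPy operations. The sketch below is illustrative only: the paper does not state the exact brightness, contrast, or noise parameters, so the values here are assumptions, and background removal (panel (j)) is omitted since it typically requires a separate segmentation step.

```python
# Illustrative versions of the Figure 2 augmentations (parameter values assumed).
import cv2
import numpy as np

def augment(img):
    out = {
        "rot90_left": cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE),  # (b)
        "rot90_right": cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),        # (c)
        "rot180": cv2.rotate(img, cv2.ROTATE_180),                      # (d)
        "hflip": cv2.flip(img, 1),                                      # (e)
        "vflip": cv2.flip(img, 0),                                      # (f)
        "bright": cv2.convertScaleAbs(img, alpha=1.0, beta=60),         # (g)
        "low_contrast": cv2.convertScaleAbs(img, alpha=0.5, beta=0),    # (i)
    }
    # (h) additive Gaussian noise; the standard deviation is an assumed value.
    noise = np.random.normal(0, 15, img.shape).astype(np.float32)
    out["noisy"] = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return out
```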
Figure 3. Network structure of YOLO-v3.
Figure 4. (a) Bounding box prediction; (b) object detection with YOLO-v3.
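As a minimal sketch of the detection step in Figures 3 and 4, the snippet below runs a Darknet-format YOLO-v3 model through OpenCV's DNN module. The file names (yolov3_vip.cfg, yolov3_vip.weights, classes.txt) are hypothetical placeholders for the paper's trained model, which is not distributed with this entry, and the confidence/NMS thresholds are assumed values.

```python
# Minimal sketch: YOLO-v3 inference with OpenCV's DNN module.
# File names are placeholders; the paper's trained weights are not public.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3_vip.cfg", "yolov3_vip.weights")
classes = open("classes.txt").read().splitlines()

def detect(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    # YOLO-v3 expects a square, normalized RGB blob (416x416 is the common choice).
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:        # one output per YOLO detection scale
        for row in output:        # row = [cx, cy, bw, bh, objectness, class scores...]
            scores = row[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence < conf_thresh:
                continue
            cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

    # Non-maximum suppression removes overlapping duplicate boxes (Figure 4a).
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(classes[class_ids[i]], confidences[i], boxes[i])
            for i in np.array(keep).flatten()]
```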
Figure 5. Information optimization and object–obstacle differentiation.
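The camera is paired with a distance-measuring sensor to flag nearby obstacles (Figure 5). This entry does not name the sensor model, so the sketch below assumes an HC-SR04-style ultrasonic sensor on a Raspberry Pi; the pin numbers and the 1 m warning threshold are chosen purely for illustration.

```python
# Hypothetical obstacle check with an HC-SR04-style ultrasonic sensor;
# pins and the warning threshold are assumptions, not the paper's values.
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24  # assumed BCM pin numbers

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm():
    # A 10-microsecond trigger pulse starts one measurement.
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)
    start = stop = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        stop = time.time()
    # Echo duration times the speed of sound (34300 cm/s), halved for the round trip.
    return (stop - start) * 34300 / 2

if distance_cm() < 100:  # warn when an obstacle is within ~1 m (assumed threshold)
    print("Obstacle ahead")
```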
Figure 6. Activity diagram of the proposed assistive approach for visually impaired people.
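Figures 5 and 6 describe collapsing per-object detections into one spoken phrase per class ("name of object with count"), which is the information optimization that keeps audio prompts short. A minimal sketch of that grouping, assuming the offline pyttsx3 engine (the paper's actual text-to-speech engine is not specified in this entry):

```python
# Minimal sketch of the audio-prompt optimization: detections are grouped by
# class so each object name is spoken once, with a count when there are several.
# pyttsx3 is an assumed offline TTS engine, not necessarily the paper's choice.
from collections import Counter
import pyttsx3

engine = pyttsx3.init()

def announce(labels):
    # labels: one class name per detected object, e.g. ["person", "person", "car"]
    for name, n in Counter(labels).items():
        phrase = name if n == 1 else f"{n} {name}s"
        engine.say(phrase)
    engine.runAndWait()

announce(["person", "person", "car"])  # speaks "2 persons", then "car"
```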
Number of images in each class of the collected dataset.
| Total Images per Object Class | Training Set (Original) | Validation Set (Original) | Training Set (Augmented) | Validation Set (Augmented) | Test Set |
|---|---|---|---|---|---|
| 650 | 350 | 150 | 3500 | 1500 | 150 |
Performance analysis of the proposed model on the most relevant objects.
| Objects | Total Testing Images | Correctly Detected | Detection Accuracy (%) | Correctly Recognized | Recognition Accuracy (%) |
|---|---|---|---|---|---|
| Person | 150 | 148 | 98.67 | 148 | 100.00 |
| Car | 150 | 146 | 97.33 | 145 | 99.32 |
| Bus | 150 | 144 | 96.00 | 144 | 100.00 |
| Truck | 150 | 143 | 95.33 | 141 | 98.60 |
| Chair | 150 | 147 | 98.00 | 146 | 99.32 |
| TV | 150 | 140 | 93.33 | 140 | 100.00 |
| Bottle | 150 | 148 | 98.67 | 148 | 100.00 |
| Dog | 150 | 145 | 96.67 | 144 | 99.31 |
| Fire hydrant | 150 | 146 | 97.33 | 146 | 100.00 |
| Stop Sign | 150 | 149 | 99.33 | 147 | 98.66 |
| Socket | 150 | 143 | 95.33 | 143 | 100.00 |
| Pothole | 150 | 129 | 86.00 | 128 | 99.22 |
| Pharmacy | 150 | 141 | 94.00 | 139 | 98.58 |
| Stairs | 150 | 139 | 92.67 | 139 | 100.00 |
| Washroom | 150 | 145 | 96.67 | 145 | 100.00 |
| Wrist Watch | 150 | 140 | 93.33 | 139 | 99.29 |
| Eye glasses | 150 | 141 | 94.00 | 141 | 100.00 |
| Cylinder | 150 | 131 | 87.33 | 131 | 100.00 |
| 10 ₹ Note | 150 | 141 | 94.00 | 141 | 100.00 |
| 20 ₹ Note | 150 | 148 | 98.67 | 148 | 100.00 |
| 50 ₹ Note | 150 | 143 | 95.33 | 143 | 100.00 |
| 100 ₹ Note | 150 | 140 | 93.33 | 140 | 100.00 |
| 200 ₹ Note | 150 | 144 | 96.00 | 144 | 100.00 |
| 500 ₹ Note | 150 | 140 | 93.33 | 140 | 100.00 |
| 2000 ₹ Note | 150 | 149 | 99.33 | 149 | 100.00 |
| Average | 3750 | 3570 | 95.19 | 3559 | 99.69 |
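The 95.19% and 99.69% averages quoted in the abstract are the means of the per-class accuracies in the table above, which can be checked directly:

```python
# Check the reported averages against the per-class accuracies above.
detection = [98.67, 97.33, 96.00, 95.33, 98.00, 93.33, 98.67, 96.67, 97.33,
             99.33, 95.33, 86.00, 94.00, 92.67, 96.67, 93.33, 94.00, 87.33,
             94.00, 98.67, 95.33, 93.33, 96.00, 93.33, 99.33]
recognition = [100.00, 99.32, 100.00, 98.60, 99.32, 100.00, 100.00, 99.31,
               100.00, 98.66, 100.00, 99.22, 98.58, 100.00, 100.00, 99.29,
               100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00,
               100.00, 100.00]
print(sum(detection) / len(detection))      # 95.1992, reported as 95.19
print(sum(recognition) / len(recognition))  # 99.692, reported as 99.69
```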
Figure 7. Confusion matrix.
Figure 8. Sample results of object detection and recognition.
Testing accuracy and frame processing time for the proposed and other methods.
| Method | Testing Accuracy (%) | Frame Processing Time (s) |
|---|---|---|
| AlexNet | 83.39 | 0.275 |
| VGG-16 | 86.80 | 0.53 |
| VGG-19 | 90.21 | 0.39 |
| YOLO-v3 (proposed) | 95.19 | 0.1 |
Average time taken for different parameters.
| Parameter | Average Time Taken (s) |
|---|---|
| Object detection in a single frame with GPU | 0.1 |
| Object detection in a single frame on a single-board DSP processor without GPU | 0.3 |
| Audio prompt for object name | 0.4 |
| Audio prompt for object count | 0.2 |
| Audio prompt for object name with count | 0.6 |
Processing time of each frame under different conditions on a single-board computer.
| Number of Object Classes | Instances per Class | Total Objects in Frame | Average Object Detection Time (s) | Average Audio Prompt Time (s) | Total Time to Process a Single Frame (s) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.3 | 0 | 0.3 |
| 1 | 1 | 1 | 0.3 | 0.4 | 0.7 |
| 1 | 2 | 2 | 0.3 | 0.6 | 0.9 |
| 1 | 5 | 5 | 0.3 | 0.6 | 0.9 |
| 2 | 1 | 2 | 0.3 | 0.4 + 0.4 | 1.1 |
| 2 | 2 | 4 | 0.3 | 0.6 + 0.6 | 1.5 |
| 3 | 5 | 15 | 0.3 | 0.6 + 0.6 + 0.6 | 2.1 |
| 4 | 1 | 4 | 0.3 | 0.4 + 0.4 + 0.4 + 0.4 | 1.9 |
| 4 | 5 | 20 | 0.3 | 0.6 + 0.6 + 0.6 + 0.6 | 2.7 |
| 5 | 1 | 5 | 0.3 | 0.4 + 0.4 + 0.4 + 0.4 + 0.4 | 2.3 |
| 5 | 3 | 15 | 0.3 | 0.6 + 0.6 + 0.6 | 2.1 |
| 5 | 5 | 25 | 0.3 | 0.6 + 0.6 + 0.6 + 0.6 + 0.6 | 3.3 |
| 5 | 10 | 50 | 0.3 | 0.6 + 0.6 + 0.6 + 0.6 + 0.6 | 3.3 |
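The rows of this table follow a simple additive model built from the parameter table above: the single-board detection time (0.3 s) plus one audio prompt per detected class, 0.4 s for a name alone and 0.6 s for a name with a count. A small sketch of that model, which reproduces the rows where every class is announced:

```python
# Additive frame-time model implied by the two tables above: detection time
# plus one audio prompt per announced class (0.4 s name-only, 0.6 s with count).
def frame_time(n_classes, instances_per_class, detect=0.3):
    if n_classes == 0:
        return detect
    prompt = 0.4 if instances_per_class == 1 else 0.6
    return detect + n_classes * prompt

assert abs(frame_time(1, 1) - 0.7) < 1e-9  # 1 object: 0.3 + 0.4
assert abs(frame_time(2, 2) - 1.5) < 1e-9  # 2 classes x 2: 0.3 + 0.6 + 0.6
assert abs(frame_time(5, 5) - 3.3) < 1e-9  # 5 classes x 5: 0.3 + 5 * 0.6
```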
Comparison with state-of-the-art methods.
| Method | Components | Dataset | Result | Coverage Area | Connection | Cost |
|---|---|---|---|---|---|---|
| Hoang et al. | Mobile Kinect, laptop, electrode matrix, headphone, and RF transmitter | Local dataset | Obstacle detection with audio warning | Indoor | Offline | High |
| Bai et al. | Depth camera, glasses, CPU, headphone, and ultrasonic sensor | Not included | Obstacle recognition with audio output | Indoor | Offline | High |
| Yang et al. | Depth camera on smart glasses, laptop, and headphone | ADE20K, PASCAL, and COCO | Obstacle recognition with a clarinet sound as warning | Indoor, Outdoor | Internet required | High |
| Mancini et al. | Camera, PCB, and vibration motor | Not included | Obstacle recognition with vibration feedback for direction | Outdoor | Offline | Low |
| Bauer et al. | Camera, smartwatch, and smartphone | PASCAL VOC | Object detection with object direction as audio output | Outdoor | Internet required | High |
| Patil et al. | Sensors and vibration motors | Not included | Obstacle detection with audio output | Indoor, Outdoor | Offline | Low |
| Eckert et al. | RGB-D camera and IMU sensors | PASCAL VOC | Object detection with audio output | Indoor | Internet required | High |
| Parikh et al. | Smartphone, server, and headphone | Local dataset of 11 objects | Object detection with audio output | Outdoor | Internet required | High |
| Al-Madani et al. | BLE fingerprinting and fuzzy logic | Not included | Localization of the person inside a building | Indoor | Offline | Low |
| Proposed method | RGB camera, distance sensor, DSP processor, and headphone | Local dataset of objects highly relevant to visually impaired people | Object detection, object counting, obstacle warnings, text reading, and multiple operating modes | Indoor, Outdoor | Offline | Low |