Salim Malek, Farid Melgani, Mohamed Lamine Mekhalfi, Yakoub Bazi.
Abstract
This paper describes three coarse image description strategies intended to give visually impaired individuals a rough perception of surrounding objects in indoor spaces. The described algorithms operate on images captured by the user with a chest-mounted camera and output a list of objects that are likely present in the indoor scene. First, different colour-, texture-, and shape-based features are extracted, followed by a feature learning step by means of AutoEncoder (AE) models. Second, the learned features are fused and fed into a multilabel classifier in order to list the potential objects. The experiments show that fusing a set of AE-learned features yields higher classification rates than using the features individually. Furthermore, with respect to reference works, our method: (i) yields higher classification accuracies, and (ii) runs at least four times faster, which enables a potential full real-time application.
Keywords: assistive technologies; coarse scene description; deep learning; feature fusion; image representation; multiobject recognition; visible cameras; visually impaired (VI) people
Year: 2017; PMID: 29144395; PMCID: PMC5712811; DOI: 10.3390/s17112641
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
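The feature learning step mentioned in the abstract can be illustrated with a minimal one-hidden-layer autoencoder. This is only a sketch under stated assumptions, not the authors' implementation: the sigmoid activations, mean squared reconstruction loss, plain gradient descent, layer sizes, and the names `train_autoencoder` and `encode` are all hypothetical choices made for illustration, and the input is random toy data rather than real image descriptors.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Fit a one-hidden-layer AE on X (n_samples, n_features), values in [0, 1].

    Assumed setup (not taken from the paper): sigmoid units, squared
    reconstruction error, full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)   # hidden code (the learned feature)
        R = sigmoid(H @ W_dec + b_dec)   # reconstruction of the input
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared reconstruction error through both layers.
        grad_out = err * R * (1.0 - R) / n_samples
        grad_hid = (grad_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * (H.T @ grad_out)
        b_dec -= lr * grad_out.sum(axis=0)
        W_enc -= lr * (X.T @ grad_hid)
        b_enc -= lr * grad_hid.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses


def encode(X, params):
    """Map descriptors to the hidden representation of a trained AE."""
    W_enc, b_enc, _, _ = params
    return sigmoid(X @ W_enc + b_enc)


# Toy stand-in for a matrix of hand-crafted descriptors (60 images, 8 values each).
X_demo = np.random.default_rng(1).random((60, 8)) * 0.4
params, losses = train_autoencoder(X_demo, n_hidden=3)
codes = encode(X_demo, params)
print(codes.shape, losses[0], losses[-1])
```

After training, only the encoder half is kept: the hidden activations returned by `encode` play the role of the AE-learned features that are later fused and classified.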
Figure 1. Binary descriptor construction for a training image.
Figure 2. Pipeline of the feature learning-based image multilabeling scheme.
Figure 3. One layer architecture of an AE.
Figure 4. Diagram of the first fusion strategy (Fusion 1) based on a low-level feature aggregation.
Figure 5. Diagram of the second fusion strategy (Fusion 2) based on an AE induced-level aggregation.
Figure 6. Diagram of the third fusion strategy (Fusion 3) based on a decision-level aggregation.
Figure 7. View of the wearable prototype with its main components.
Figure 8. Impact of the threshold value on the classification rates using the three feature types. Upper row for Dataset 1, bottom row for Dataset 2.
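The three aggregation levels named in the captions of Figures 4 to 6, together with the thresholding step of Figure 8, can be contrasted in a short sketch. Everything below is an assumption for illustration: the captions only state the level at which fusion happens, so the concatenation used for Fusion 1 and Fusion 2, the score averaging used for Fusion 3, the `>=` threshold rule, the function names, the object names, and the toy scores are hypothetical. Encoders and classifiers are passed in as plain callables.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], Vector]


def concat(vectors: Sequence[Vector]) -> Vector:
    """Join several vectors end to end."""
    return [x for v in vectors for x in v]


def fusion_low_level(descriptors: Sequence[Vector], encoder: Model, classifier: Model) -> Vector:
    """Fusion 1 (assumed form): concatenate raw descriptors, then encode and classify."""
    return classifier(encoder(concat(descriptors)))


def fusion_ae_level(descriptors: Sequence[Vector], encoders: Sequence[Model], classifier: Model) -> Vector:
    """Fusion 2 (assumed form): encode each descriptor separately, concatenate the codes."""
    codes = [enc(d) for enc, d in zip(encoders, descriptors)]
    return classifier(concat(codes))


def fusion_decision_level(descriptors: Sequence[Vector], encoders: Sequence[Model],
                          classifiers: Sequence[Model]) -> Vector:
    """Fusion 3 (assumed form): one classifier per feature type, scores averaged."""
    per_feature = [clf(enc(d)) for d, enc, clf in zip(descriptors, encoders, classifiers)]
    return [sum(col) / len(per_feature) for col in zip(*per_feature)]


def to_labels(scores: Vector, names: Sequence[str], threshold: float = 0.5) -> List[str]:
    """Keep every object whose fused score reaches the threshold."""
    return [name for name, s in zip(names, scores) if s >= threshold]


# Toy demo: identity stand-ins for the encoders and classifiers, so each
# "descriptor" is already a vector of per-object scores (hypothetical values).
identity: Model = lambda v: list(v)
names = ["chair", "door", "stairs"]
hog, bow, lbp = [0.9, 0.2, 0.6], [0.7, 0.4, 0.5], [0.8, 0.3, 0.7]

fused = fusion_decision_level([hog, bow, lbp], [identity] * 3, [identity] * 3)
labels = to_labels(fused, names, threshold=0.5)
print(labels)  # ['chair', 'stairs']
```

Raising the threshold shortens the returned object list and lowering it lengthens the list, which is the trade-off a threshold sweep such as the one in Figure 8 would explore.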
Obtained recognition results using single features.
| Metric | Dataset 1: HOG | Dataset 1: BoW_RGB | Dataset 1: LBP | Dataset 2: HOG | Dataset 2: BoW_RGB | Dataset 2: LBP |
|---|---|---|---|---|---|---|
| SEN (%) | 76.77 | 79.77 | 76.77 | 72.73 | 88.64 | 81.82 |
| SPE (%) | 82.16 | 82.90 | 80.07 | 88.67 | 90.24 | 86.27 |
| AVG (%) | 79.46 | 81.33 | 78.42 | 80.70 | 89.44 | 84.04 |
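The three rates reported in these tables can be reproduced from multilabel predictions as sketched below. In every row of the tables, AVG equals the arithmetic mean of SEN and SPE (for example, (76.77 + 82.16) / 2 ≈ 79.46). The sensitivity and specificity formulas are the standard ones; pooling the counts over all image/object pairs is an assumption, since this record does not say how the counts were aggregated, and the function names and toy labels are hypothetical.

```python
def sen_spe_avg(y_true, y_pred):
    """Return (SEN, SPE, AVG) in percent for binary multilabel matrices.

    y_true, y_pred: lists of rows (one per image), each row holding 0/1 flags
    (one per object class). Counts are pooled over all entries (assumption).
    Raises ZeroDivisionError if there are no positives or no negatives.
    """
    tp = fn = tn = fp = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1      # object present and reported
            elif t:
                fn += 1      # object present but missed
            elif p:
                fp += 1      # object absent but reported
            else:
                tn += 1      # object absent and not reported
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    return sen, spe, average_rate(sen, spe)


def average_rate(sen, spe):
    """AVG as used in the tables: the mean of sensitivity and specificity."""
    return (sen + spe) / 2.0


# Toy example: two images, three object classes (hypothetical labels).
y_true = [[1, 0, 1], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1]]
sen, spe, avg = sen_spe_avg(y_true, y_pred)
print(round(sen, 2), round(spe, 2), round(avg, 2))
```

Because AVG weights the two rates equally, a method with high specificity but low sensitivity can reach a similar AVG to one with the opposite profile, so the SEN and SPE columns are worth reading alongside it.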
Classification outcomes of all the fusion schemes.
| Method | Dataset 1: SEN (%) | Dataset 1: SPE (%) | Dataset 1: AVG (%) | Dataset 2: SEN (%) | Dataset 2: SPE (%) | Dataset 2: AVG (%) |
|---|---|---|---|---|---|---|
| Fusion 1 | 79.40 | 87.45 | 83.42 | 85.00 | 91.80 | 88.40 |
| Fusion 2 | 80.89 | 87.69 | 84.29 | 87.27 | 91.56 | 89.41 |
| Fusion 3 | 89.51 | 81.30 | 85.40 | 90.00 | 90.12 | 90.06 |
Figure 9. Example of results obtained by the proposed multilabeling fusion approach for both datasets. Upper row for Dataset 1, and lower one for Dataset 2.
Comparison of classification rates on Dataset 1.
| Method | SEN (%) | SPE (%) | AVG (%) |
|---|---|---|---|
| SSCS | 79.77 | 66.54 | 73.15 |
| EDCS | 69.66 | 80.19 | 74.92 |
| ResNet | 66.29 | 94.46 | 80.38 |
| GoogLeNet | 67.04 | 94.22 | 80.63 |
| VDCNs | 71.91 | 94.46 | 83.19 |
| Ours | 89.51 | 81.30 | 85.40 |
Comparison of classification rates on Dataset 2.
| Method | SEN (%) | SPE (%) | AVG (%) |
|---|---|---|---|
| SSCS | 75.00 | 74.09 | 74.54 |
| EDCS | 70.00 | 90.12 | 80.06 |
| ResNet | 68.18 | 96.75 | 82.46 |
| GoogLeNet | 72.27 | 97.11 | 84.69 |
| VDCNs | 81.82 | 96.39 | 89.10 |
| Ours | 90.00 | 90.12 | 90.06 |
Comparison of classification rates on Dataset 1 under different resolutions.
| Method | 100% | 50% | 20% | 10% |
|---|---|---|---|---|
| SSCS | 73.15 | 73.34 | 74.52 | 74.51 |
| EDCS | 74.92 | 74.74 | 75.11 | 75.43 |
| ResNet | 80.38 | 79.88 | 79.32 | 78.76 |
| GoogLeNet | 80.63 | 81.63 | 82.52 | 79.02 |
| VDCNs | 83.19 | 83.37 | 84.50 | 84.57 |
| Ours | 85.40 | 86.02 | 86.14 | 86.63 |
Comparison of classification rates on Dataset 2 under different resolutions.
| Method | 100% | 50% | 20% | 10% |
|---|---|---|---|---|
| SSCS | 74.54 | 74.54 | 73.91 | 74.48 |
| EDCS | 80.06 | 80.06 | 79.30 | 78.60 |
| ResNet | 82.46 | 82.40 | 84.69 | 87.13 |
| GoogLeNet | 84.69 | 84.69 | 84.74 | 84.12 |
| VDCNs | 89.10 | 88.71 | 87.80 | 88.13 |
| Ours | 90.06 | 90.34 | 90.03 | 90.69 |
Comparison of average runtime per image on Dataset 1 under different resolutions.
| Method | 100% | 50% | 20% | 10% |
|---|---|---|---|---|
| SSCS | 2.16 | 1.42 | 1.22 | 1.17 |
| EDCS | 2.44 | 1.41 | 1.1 | 1.08 |
| ResNet | 0.136 | 0.132 | 0.131 | 0.131 |
| GoogLeNet | 0.100 | 0.098 | 0.096 | 0.093 |
| VDCNs | 0.300 | 0.295 | 0.291 | 0.288 |
| Ours | 1.230 | 0.200 | 0.048 | 0.022 |
Comparison of average runtime per image on Dataset 2 under different resolutions.
| Method | 100% | 50% | 20% | 10% |
|---|---|---|---|---|
| SSCS | 2.66 | 1.53 | 1.21 | 1.17 |
| EDCS | 2.69 | 1.54 | 1.23 | 1.2 |
| ResNet | 0.136 | 0.132 | 0.131 | 0.131 |
| GoogLeNet | 0.100 | 0.098 | 0.096 | 0.093 |
| VDCNs | 0.300 | 0.295 | 0.291 | 0.288 |
| Ours | 1.230 | 0.200 | 0.048 | 0.022 |
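The abstract's claim that the method runs at least four times faster than the reference works can be checked against the runtime tables with a short calculation. The sketch below uses the Dataset 1 values at 100% and 10% resolution exactly as tabulated; the variable and function names are hypothetical, and the ratios do not depend on the time unit, which this record does not state.

```python
# Average runtime per image on Dataset 1, copied from the table above.
runtime_100 = {"SSCS": 2.16, "EDCS": 2.44, "ResNet": 0.136,
               "GoogLeNet": 0.100, "VDCNs": 0.300, "Ours": 1.230}
runtime_10 = {"SSCS": 1.17, "EDCS": 1.08, "ResNet": 0.131,
              "GoogLeNet": 0.093, "VDCNs": 0.288, "Ours": 0.022}


def speedups(runtimes, reference="Ours"):
    """Ratio of each method's runtime to the reference method's runtime.

    A value above 1 means the reference method is faster than that method.
    """
    base = runtimes[reference]
    return {m: t / base for m, t in runtimes.items() if m != reference}


speedup_10 = speedups(runtime_10)
speedup_100 = speedups(runtime_100)

for method, ratio in sorted(speedup_10.items(), key=lambda kv: kv[1]):
    print(f"{method}: {ratio:.2f}x at 10% resolution")

smallest_gain = min(speedup_10.values())  # set by the fastest competitor, GoogLeNet
```

On these numbers the smallest gain at 10% resolution is against GoogLeNet (0.093 / 0.022, roughly 4.2), which matches the "at least four times" figure. At 100% resolution the ratio against the three deep networks falls below 1, so the speed advantage over them appears only once the input is downsampled.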