Leo Neat, Ren Peng, Siyang Qin, Roberto Manduchi.
Abstract
We present a study in which seven blind participants used three different mobile OCR apps to find text posted in various indoor environments. The first app was Microsoft SeeingAI in its Short Text mode, which reads any text in sight through a minimalistic interface. The second app was Spot+OCR, a custom application that separates the task of text detection from OCR proper. Upon detecting text in the image, Spot+OCR generates a short vibration; as soon as the user stabilizes the phone, a high-resolution snapshot is taken and OCR-processed. The third app, Guided OCR, was designed to guide the user in taking several pictures over a 360° span at the maximum resolution available from the camera, with minimum overlap between pictures. Quantitative results (in terms of true positive ratios and traversal speed) were recorded. Together with qualitative observations and the outcomes of an exit survey, these results allow us to identify and assess the different strategies used by our participants, as well as the challenges of operating these systems without sight.
Keywords: Assistive technologies; OCR; Text spotting
Year: 2019 PMID: 31681911 PMCID: PMC6824725 DOI: 10.1145/3301275.3302271
Source DB: PubMed Journal: IUI
Figure 1. Schematic comparison of the Short Text and Spot+OCR modalities for text discovery and reading. User interface tasks are marked by a thick frame.
Figure 2. Examples of binary masks produced by the text detector from images taken by our participants in Experiment 1.
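The Spot+OCR flow described in the abstract (detect text, vibrate, wait for the phone to be held still, then take a snapshot and run OCR) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Frame` fields, the `still_threshold` value, and the `vibrate` and `snapshot_and_ocr` callbacks are all hypothetical names introduced here, and the sketch assumes the app stays armed until a snapshot is taken.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    has_text: bool  # hypothetical: result of a text detector on a preview frame
    motion: float   # hypothetical: motion estimate for the phone at this frame


def spot_then_ocr(
    frames: Iterable[Frame],
    vibrate: Callable[[], None],
    snapshot_and_ocr: Callable[[], str],
    still_threshold: float = 0.1,  # assumed value, not taken from the source
) -> List[str]:
    """Vibrate when text is spotted, then OCR a snapshot once the phone is still."""
    results: List[str] = []
    armed = False  # True between a detection and the following snapshot
    for frame in frames:
        if not armed and frame.has_text:
            vibrate()          # signal the user that text is in view
            armed = True
        elif armed and frame.motion < still_threshold:
            results.append(snapshot_and_ocr())  # high-resolution capture + OCR
            armed = False
    return results
```

Separating detection from recognition in this way means the expensive high-resolution OCR step only runs after the user has been cued and has steadied the camera.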
Total walking length in the considered corridors (N or W type, 2nd or 3rd floor), and number of marked textual signs in the corridors and in the three open spaces (OS on the 3rd, 4th and 5th floors).
| | N2 | W2 | N3 | W3 | OS3 | OS4 | OS5 |
|---|---|---|---|---|---|---|---|
| Total length (m) | 85 | 67 | 85 | 79 | — | — | — |
| # textual tokens | 79 | 68 | 104 | 50 | 22 | 20 | 23 |
Questionnaire 1 outcomes.
| P1 | P2 | P3 | P4 | P5 | P6 | P7 | μ | σ |
|---|---|---|---|---|---|---|---|---|
| S1: I thought that the text read by the system was very valuable | ||||||||
| 4 | 4 | 4 | 5 | 2 | 4 | 4 | 3.9 | 0.9 |
| S2: I found that there was too much useless information produced by the system | ||||||||
| 2 | 3 | 1 | 1 | 3 | 3 | 2.1 | 0.9 | |
| S3: I think that hearing the direction towards the text was useful | ||||||||
| 5 | 3 | 5 | 3 | 4 | 5 | 5 | 4.3 | 0.9 |
| S4: I care only about the text content, not its precise location | ||||||||
| 3 | 4 | 1 | 2 | 2 | 4 | 3 | 2.7 | 1.1 |
| S5: I think that I would only use this system to find a certain desired text, rather than hearing all text in view | ||||||||
| 4 | 5 | 1 | 3 | 2 | 3 | 2 | 2.9 | 1.3 |
| S6: I think that it is necessary to discover any text in the scene, even if not all of it is understandable | ||||||||
| 3 | 1 | 4 | 1 | 1 | 3 | 4 | 2.4 | 1.4 |
| S7: I think that being able to read text visible in a scene using | ||||||||
| 4 | 3 | 5 | 3 | 3 | 5 | 2 | 3.6 | 1.1 |
| S8: I would only use this system if it worked much better than it does now | ||||||||
| 2 | 2 | 2 | 1 | 3 | 4 | 4 | 2.6 | 1.1 |
| S9: I felt that the text produced by the SeeingAI product was of better quality than the other systems | ||||||||
| 1 | 1 | 1 | 4 | 2 | 3 | 4 | 2.3 | 1.4 |
Figure 3. The different ways our participants held the smartphone in the Spot+OCR trials (Experiment 1).
Experiment 1 quantitative results.
| Modality | Short Text | Short Text | Spot+OCR | Spot+OCR |
|---|---|---|---|---|
| Corridor type | N | W | N | W |
| P1: TPR (%) | 0 | 0 | 28 | 35 |
| P2: TPR (%) | 1 | 0 | 32 | 42 |
| P3: TPR (%) | 18 | 18 | 38 | 64 |
| P4: TPR (%) | 63 | 66 | 35 | 16 |
| P5: TPR (%) | 12 | 4 | 23 | 26 |
| P6: TPR (%) | 69 | 82 | 46 | 29 |
| P7: TPR (%) | 11 | 38 | 31 | 48 |
Experiment 2 quantitative results.
| Modality | Short Text | Spot+OCR | Guided OCR |
|---|---|---|---|
| P1: TPR (%) | 15 | 14 | 61 |
| P2: TPR (%) | 4 | 23 | 50 |
| P3: TPR (%) | 9 | 20 | 65 |
| P4: TPR (%) | 10 | 17 | - |
| P5: TPR (%) | 0 | 0 | 11 |
| P6: TPR (%) | 27 | 61 | 55 |
| 376 | | | |
| P7: TPR (%) | 0 | 17 | 27 |
| 192 | 191 | 332 | |
Questionnaire 2 outcomes.
| P1 | P2 | P3 | P4 | P5 | P6 | P7 | μ | σ |
|---|---|---|---|---|---|---|---|---|
| SeeingAI | ||||||||
| 71 | 71 | 50 | 100 | 92 | 25 | 21 | 62.1 | 30.7 |
| Spot+OCR | ||||||||
| 67 | 83 | 67 | 67 | 67 | 50 | 71 | 67.3 | 9.8 |
| Guided OCR | ||||||||
| 92 | 83 | 71 | - | 87 | 63 | 96 | 82 | 12.8 |
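The μ and σ columns in the questionnaire tables summarize the per-participant scores. As a minimal sketch, the Python below recomputes them for the Guided OCR row, skipping the missing P4 entry. The record does not say whether σ is a sample or a population standard deviation; the sample form (n − 1 denominator) is assumed here, and the helper name `mean_and_sample_std` is introduced for illustration only.

```python
import statistics
from typing import Optional, Sequence, Tuple


def mean_and_sample_std(scores: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation), ignoring missing entries."""
    present = [s for s in scores if s is not None]
    # statistics.stdev uses the n - 1 denominator (assumed convention here)
    return statistics.mean(present), statistics.stdev(present)


# Guided OCR row of Questionnaire 2; P4 has no score ("-" in the table)
guided_ocr = [92, 83, 71, None, 87, 63, 96]
mu, sigma = mean_and_sample_std(guided_ocr)
print(f"{mu:.1f} {sigma:.1f}")  # prints "82.0 12.7"
```

Under this assumption the mean matches the tabulated 82, while the standard deviation comes out at 12.7 against the tabulated 12.8; a gap of that size could arise if the tabulated scores were rounded before publication.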