Zihao Ji, Weijian Hu, Ze Wang, Kailun Yang, Kaiwei Wang.
Abstract
Scene sonification is a powerful technique to help Visually Impaired People (VIP) understand their surroundings. Existing methods usually perform sonification either on the entire image of the surrounding scene acquired by a standard camera, or on static obstacles detected a priori by image processing algorithms on the RGB image of the scene. However, if all the information in the scene is delivered to VIP simultaneously, it causes information redundancy. In fact, biological vision is more sensitive to moving objects in the scene than to static objects, which is also the original motivation behind the event-based camera. In this paper, we propose a real-time sonification framework to help VIP understand the moving objects in the scene. First, we capture the events in the scene using an event-based camera and cluster them into multiple moving objects without relying on any prior knowledge. Then, sonification based on MIDI is applied to these objects synchronously. Finally, we conduct comprehensive experiments on scene videos with sonification audio, attended by 20 VIP and 20 Sighted People (SP). The results show that our method allows both groups of participants to clearly distinguish the number, size, motion speed, and motion trajectories of multiple objects, and that our audio is more comfortable to listen to than that of existing methods in terms of aesthetics.
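The record does not specify the clustering algorithm. As an illustration only, the sketch below groups a short time window of events by spatio-temporal proximity with DBSCAN, so no prior knowledge of the number or appearance of objects is required; the `eps`, `min_samples`, and `time_scale` values are assumptions, not the authors' parameters, and object speed would come from tracking cluster centroids across consecutive windows.

```python
# Minimal sketch (not the authors' exact algorithm): cluster a short window of
# events from an event-based camera into moving objects with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_events(events, eps=8.0, min_samples=30, time_scale=1e-3):
    """events: (N, 3) array of (x, y, t), with t in microseconds.
    Returns one attribute dict per detected object."""
    pts = events.astype(float)
    pts[:, 2] *= time_scale                      # scale time so it is comparable to pixels
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    objects = []
    for k in set(labels) - {-1}:                 # label -1 is DBSCAN noise
        cluster = events[labels == k]
        objects.append({
            "x": cluster[:, 0].mean(),           # centroid abscissa
            "y": cluster[:, 1].mean(),           # centroid ordinate
            "size": len(cluster),                # event count as a size proxy
        })
    return objects
```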
Keywords: computer vision for visually impaired people; event-based camera; sonification; unsupervised object tracking
Year: 2021 PMID: 34065360 PMCID: PMC8161033 DOI: 10.3390/s21103558
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Flow chart of the proposed sonification framework with the event-based camera.
The mapping relationship between object attributes and MIDI attributes.
| | | Note Pitch | Note Density | Pan | Pedal | Instrument | Polyphony |
|---|---|---|---|---|---|---|---|
| Appearance | Size | | | | | √ | |
| | New attribute | | | | √ | | √ |
| Motion | Speed | | √ | | | | |
| | Abscissa | | | √ | | | |
| | Ordinate | √ | | | | | |
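As a concrete illustration of such a mapping, the sketch below turns one object's attributes into MIDI messages with the mido library. The sensor resolution, pitch range, General MIDI program numbers, and scaling constants are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch, assuming the mido library and General MIDI program numbers.
import mido

def object_to_midi(obj, width=346, height=260):
    """obj: dict with x, y (pixels), size (event count), speed (px/s).
    Returns MIDI messages plus a note rate realizing the attribute mapping."""
    pan = min(127, int(127 * obj["x"] / width))       # abscissa -> stereo pan (CC #10)
    pitch = 96 - int(60 * obj["y"] / height)          # ordinate -> note pitch (higher = higher note)
    program = 88 if obj["size"] > 2000 else 80        # size -> instrument (GM pad vs. lead)
    rate = max(1.0, min(16.0, obj["speed"] / 50.0))   # speed -> note density (notes per second)
    msgs = [
        mido.Message("program_change", program=program),
        mido.Message("control_change", control=10, value=pan),
        mido.Message("note_on", note=pitch, velocity=100),
    ]
    return msgs, rate

# Example usage with an assumed default MIDI output port:
# with mido.open_output() as port:
#     msgs, rate = object_to_midi({"x": 120, "y": 80, "size": 2500, "speed": 300})
#     for m in msgs:
#         port.send(m)
```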
Figure 2. (a) Physical picture of using the event camera to record the laptop screen; (b–d) the animated PowerPoint slides that make up the dataset. The solid circle represents the starting point of the object's motion, the hollow circle represents the endpoint of the motion, and the dotted line represents the motion trajectory. Different colors represent different objects.
The numbers and names of all Control Change (CC) messages used in the pre-experiments.
| Attribute No. | 1 (CC #1) | 2 (CC #64) | 3 (CC #91) | 4 (CC #93) | 5 |
|---|---|---|---|---|---|
| Attribute Name | Vibrato | Pedal | Reverb | Chorus | Polyphony |
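For reference, the sketch below shows how each candidate cue could be emitted with mido when a new object appears; the on/off values and the extra note used for polyphony are illustrative assumptions.

```python
# Minimal sketch, assuming mido: emit one candidate "new attribute" cue per new object.
import mido

NEW_ATTRIBUTE_CUES = {
    "vibrato": mido.Message("control_change", control=1, value=127),   # CC #1 (modulation)
    "pedal":   mido.Message("control_change", control=64, value=127),  # CC #64 (sustain pedal)
    "reverb":  mido.Message("control_change", control=91, value=127),  # CC #91 (reverb send)
    "chorus":  mido.Message("control_change", control=93, value=127),  # CC #93 (chorus send)
}

def announce_new_object(port, cue, base_note=72):
    """Signal a newly appeared object with the chosen cue.
    'Polyphony' has no CC number: it is realized by adding a second note."""
    if cue == "polyphony":
        port.send(mido.Message("note_on", note=base_note, velocity=100))
        port.send(mido.Message("note_on", note=base_note + 7, velocity=100))  # extra voice
    else:
        port.send(NEW_ATTRIBUTE_CUES[cue])
        port.send(mido.Message("note_on", note=base_note, velocity=100))
```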
The numbering and specific content of the control group and each experimental group.
| Experimental Group 1 No. | Description | Experimental Group 2 No. | Description | Control Group |
|---|---|---|---|---|
| 1-1 | Lead with vibrato | 2-1 | Pad with vibrato | Monophonic with no new attributes |
| 1-2 | Lead with pedal | 2-2 | Pad with pedal | |
| 1-3 | Lead with reverb | 2-3 | Pad with reverb | |
| 1-4 | Lead with chorus | 2-4 | Pad with chorus | |
| 1-5 | Lead with polyphony | 2-5 | Pad with polyphony | |
The selection rate of each candidate attribute in the two experimental groups.
| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Experimental Group 1 | 30.0% | 10.0% | 0 | 0 | 60.0% |
| Experimental Group 2 | 22.5% | 72.5% | 0 | 0 | 5.0% |
The experimental settings of the control group and the experimental groups, including the sonification method and the appearance attributes conveyed by each group.
| | Sonification Method | Appearance Attributes Contained |
|---|---|---|
| Control Group | Ours | Object Size |
| Experimental Group 1 | Ours | Object Size and Object Width |
| Experimental Group 2 | Hu [ ] | Object Size |
The correct rates of objective questions under different conditions in the control group. SP indicates Sighted People; VIP indicates Visually Impaired People. Single and Multiple indicate the number of objects. Q1 and Q2 indicate the questions for perceiving the number and the size of objects, respectively.
| Objective | Single_Q1 | Multiple_Q1 | Single_Q2 | Multiple_Q2 |
|---|---|---|---|---|
| SP | 99.0% | 98.6% | 94.8% | 80.6% |
| VIP | 100.0% | 100.0% | 96.4% | 78.6% |
Figure 3. Box chart of subjective questions under different conditions in the control group. Q1 to Q6 indicate six subjective questions evaluating motion speed perception, trajectory perception, the cocktail party effect, perception difficulty, audio comfort, and the adaptation between audio and motion. Scores are on a 7-point Likert scale, where 1 means strongly disagree and 7 means strongly agree.
The correct rates of objective questions under different conditions for all groups. Q1 to Q3 indicate three objective questions for perceiving the number, size, and width of objects, respectively.
| Objective Questions | Single_Q1 | Multiple_Q1 | Single_Q2 | Multiple_Q2 | Single_Q3 | Multiple_Q3 |
|---|---|---|---|---|---|---|
| Control Group | 99.5% | 99.3% | 95.6% | 79.6% | / | / |
| Experimental Group 1 | 100.0% | 72.2% | 88.9% | 55.6% | 72.2% | 38.9% |
| Experimental Group 2 | 66.7% | 77.8% | 61.1% | 44.4% | / | / |
The 7-point Likert-scale scores of the subjective questions under different conditions for all groups.
| Subjective Questions | | Q1 | Q2 | Q4 | Q5 | Q6 |
|---|---|---|---|---|---|---|
| Control Group | Single | 6.509 | 5.72 | 5.684 | 5.938 | 6.142 |
| | Multiple | 6.175 | 5.314 | 5.371 | 5.54 | 5.671 |
| Experimental Group 1 | Single | 6.111 | 5.611 | 5.556 | 6.278 | 5.722 |
| | Multiple | 5.667 | 4.944 | 5.056 | 6.111 | 5.333 |
| Experimental Group 2 | Single | 5.667 | 4.556 | 4.833 | 5.833 | 5.111 |
| | Multiple | 4.833 | 3.611 | 3.722 | 5.111 | 3.833 |