Rebecca S Schaefer, Lilian J Beijer, Wiel Seuskens, Toni C M Rietveld, Makiko Sadakata.
Abstract
Visualizing acoustic features of speech has proven helpful in speech therapy; however, it is as yet unclear how to create intuitive and fitting visualizations. To better understand the mappings from speech sound aspects to visual space, a large web-based experiment (n = 249) was performed to evaluate spatial parameters that may optimally represent pitch and loudness of speech. To this end, five novel animated visualizations were developed and presented in pairwise comparisons, together with a static visualization. Pitch and loudness of speech were each mapped onto either the vertical (y-axis) or the size (z-axis) dimension, or combined (with size indicating loudness and vertical position indicating pitch height) and visualized as an animation along the horizontal dimension (x-axis) over time. The results indicated that firstly, there is a general preference towards the use of the y-axis for both pitch and loudness, with pitch ranking higher than loudness in terms of fit. Secondly, the data suggest that representing both pitch and loudness combined in a single visualization is preferred over visualization in only one dimension. Finally, the z-axis, although not preferred, was evaluated as corresponding better to loudness than to pitch. This relation between sound and visual space has not been reported previously for speech sounds, and elaborates earlier findings on musical material. In addition to elucidating more general mappings between auditory and visual modalities, the findings provide us with a method of visualizing speech that may be helpful in clinical applications such as computerized speech therapy, or other feedback-based learning paradigms.
Keywords: Audio-visual processing; Feedback learning; Speech therapy; Visualizing sound
Year: 2016 PMID: 26370217 PMCID: PMC4828474 DOI: 10.3758/s13423-015-0934-0
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
Fig. 1 Stimulus visualizations. Time is always represented on the x-axis, and pitch and loudness are represented either on the y- or z-axes (a–d) or both (e). The original static feedback used in the e-learning based speech therapy (EST) system is also shown (f)
Participants: age distribution and experiment language choice
| Age group (years) | 0–20 | 21–40 | 41–60 | 61–80 |
|---|---|---|---|---|
| Dutch version | 1 | 67 | 54 | 25 |
| English version | 1 | 70 | 23 | 8 |
Fig. 2 Example experiment screenshot showing the LoudnessY and YZ-Combined visualizations, with clickable rating options below the two panels
Fig. 3 The full-dataset ranking and distances according to Scheffé's test of paired comparisons, with higher values representing stronger preference. Visualizations ranked from best to worst fit: YZ-Combined, PitchY, LoudnessY, LoudnessZ, PitchZ, and the static graphs (EST)
Summary of estimated differences between preference scores for all visualization comparisons (abbreviations described under ‘Stimuli’)
| | PitchZ | LoudnessY | LoudnessZ | YZ-Combined | EST |
|---|---|---|---|---|---|
| PitchY | 0.98 | 0.155 | 0.885 | 0.185 | 1.697 |
| PitchZ | – | 0.825 | 0.095 | 1.165 | 0.718 |
| LoudnessY | – | – | 0.730 | 0.340 | 1.543 |
| LoudnessZ | – | – | – | 1.071 | 0.812 |
| YZ-Combined | – | – | – | – | 1.883 |