Anja Moos, David Simmons, Julia Simner, Rachel Smith.
Abstract
Voice-induced synesthesia, a form of synesthesia in which synesthetic perceptions are induced by the sounds of people's voices, appears to be relatively rare and has not been systematically studied. In this study we investigated the synesthetic color and visual texture perceptions experienced in response to different types of "voice quality" (e.g., nasal, whisper, falsetto). Experiences of three different groups (self-reported voice synesthetes, phoneticians, and controls) were compared using both qualitative and quantitative analysis in a study conducted online. Whilst, in the qualitative analysis, synesthetes used more color and texture terms to describe voices than either phoneticians or controls, only weak differences, and many similarities, between groups were found in the quantitative analysis. Notable consistent results between groups were the matching of higher speech fundamental frequencies with lighter and redder colors, the matching of "whispery" voices with smoke-like textures, and the matching of "harsh" and "creaky" voices with textures resembling dry cracked soil. These data are discussed in the light of current thinking about definitions and categorizations of synesthesia, especially in cases where individuals apparently have a range of different synesthetic inducers.
Keywords: color; cross-modal correspondence; speech acoustics; texture; voice-induced synesthesia
Year: 2013 PMID: 24032023 PMCID: PMC3759022 DOI: 10.3389/fpsyg.2013.00568
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
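Acoustic values for the different voice qualities, averaged across speakers and sentences.
| Voice quality | f0 mean (Hz) | LTF2 (Hz) | Spectral tilt | Pitch range (semitones) |
|---|---|---|---|---|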
| MODAL | 119 | 1339 | 20.32 | 12.22 |
| RAISED LARYNX | 156 | 1245 | 13.19 | 13.68 |
| LOWERED LARYNX | 124 | 1247 | 19.83 | 8.89 |
| NASAL | 110 | 1352 | 19.49 | 9.30 |
| DENASAL | 114 | 1341 | 17.63 | 9.14 |
| FALSETTO | 232 | N/A | N/A | 10.24 |
| BREATHY | 109 | 1319 | 23.33 | 7.46 |
| WHISPER | N/A | 1463 | 0.30 | N/A |
| HARSH | 106 | 1408 | −0.52 | 6.42 |
| CREAK | 92 | 1311 | 14.96 | 9.47 |
Pitch-related values (f0 mean, pitch range) are not available for WHISPER because whispered speech has no voicing. Formant-related measures (LTF2, spectral tilt) cannot be given for FALSETTO because the pitch is so high that formant frequencies cannot be resolved accurately.
RGB values and CIELUV coordinates of the 16 colors used for creating the color patches in the online survey.
| Color | R | G | B | L* | u* | v* |
|---|---|---|---|---|---|---|
| White | 255 | 255 | 255 | 129.8 | −1.6 | 6 |
| Yellow | 255 | 225 | 0 | 121.9 | 30.8 | 125.5 |
| Cyan | 0 | 220 | 220 | 110 | −80.6 | −23 |
| Pale pink | 255 | 175 | 175 | 109.4 | 44.3 | 4.4 |
| Olive | 150 | 200 | 0 | 105.4 | −29.2 | 111.5 |
| Orange | 255 | 128 | 0 | 92.6 | 124.6 | 83.2 |
| Green | 0 | 160 | 0 | 82.1 | −67.5 | 88.9 |
| Pink | 255 | 0 | 255 | 81.3 | 92 | −124.2 |
| Grey | 115 | 115 | 115 | 74.4 | −5.4 | −6.3 |
| Red | 255 | 0 | 25 | 73.9 | 206 | 52.4 |
| Blue | 0 | 100 | 255 | 69.6 | −39.8 | −154.9 |
| Purple | 120 | 0 | 150 | 48.7 | 32.5 | −107.4 |
| Brown | 110 | 60 | 0 | 48.1 | 51.7 | 38.4 |
| Dark green | 0 | 75 | 0 | 45.1 | −30.3 | 42.5 |
| Dark blue | 0 | 50 | 128 | 41.5 | −19.9 | −95.1 |
L*, u*, v* values were calculated from the average luminance (L) and chromaticity (x, y) measurements of ten randomly selected computer screens.
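The note above says that the CIELUV coordinates were derived from screen measurements of luminance and chromaticity. The sketch below shows that conversion using the standard CIE 1976 formulas. The reference white (here D65 chromaticity) and the reference luminance `Yn` are assumptions, because the source does not state them. The sketch therefore will not reproduce the table's numbers. The table gives White an L* of 129.8. An L* above 100 only occurs when the measured Y exceeds the chosen `Yn`.

```python
from __future__ import annotations

# ASSUMPTION: the source does not state the reference white used for CIELUV.
# This sketch uses D65 chromaticity as a placeholder white point and lets the
# caller supply the reference luminance Yn.
D65_XY = (0.3127, 0.3290)


def uv_prime(x: float, y: float) -> tuple[float, float]:
    """CIE 1976 UCS chromaticity (u', v') from CIE 1931 chromaticity (x, y)."""
    denom = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / denom, 9.0 * y / denom


def average_readings(readings: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Component-wise mean of (Y, x, y) readings taken on several screens."""
    n = len(readings)
    if n == 0:
        raise ValueError("need at least one reading")
    Y = sum(r[0] for r in readings) / n
    x = sum(r[1] for r in readings) / n
    y = sum(r[2] for r in readings) / n
    return Y, x, y


def yxy_to_luv(Y: float, x: float, y: float, Yn: float,
               white_xy: tuple[float, float] = D65_XY) -> tuple[float, float, float]:
    """Convert luminance Y and chromaticity (x, y) to CIELUV (L*, u*, v*)."""
    ratio = Y / Yn
    # CIE lightness: cube-root branch above (6/29)^3, linear branch below it.
    if ratio > (6.0 / 29.0) ** 3:
        L = 116.0 * ratio ** (1.0 / 3.0) - 16.0
    else:
        L = (29.0 / 3.0) ** 3 * ratio
    u_p, v_p = uv_prime(x, y)
    un_p, vn_p = uv_prime(*white_xy)
    return L, 13.0 * L * (u_p - un_p), 13.0 * L * (v_p - vn_p)


# Hypothetical readings (Y in cd/m^2, x, y) of one patch on three screens.
readings = [(62.0, 0.640, 0.330), (58.5, 0.635, 0.335), (60.1, 0.645, 0.328)]
Y, x, y = average_readings(readings)
print(yxy_to_luv(Y, x, y, Yn=100.0))
```

Averaging the raw (Y, x, y) readings before converting follows the note's wording ("calculated from the average ... measures"). Converting each screen separately and then averaging the L*, u*, v* values would give slightly different numbers, because the transform is nonlinear.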
Figure 1. Sixteen textures used in the response display.
Factor analysis with promax rotation of texture rating data.
| Semantic differential | Factor 1 | Factor 2 | Factor 3 | Factor 4 |
|---|---|---|---|---|
| Rough—smooth | ||||
| Fine—coarse | ||||
| Low—high contrast | −0.154 | |||
| High—low complexity | −0.183 | |||
| Repetitive—non-repetitive | 0.123 | −0.214 | ||
| Non-directional—directional | 0.130 | −0.390 | ||
| Line-like—blob-like | ||||
| Regular—irregular | 0.204 | |||
| SS loadings | 2.542 | 1.874 | 1.562 | 1.208 |
| Proportion variance | 0.318 | 0.234 | 0.195 | 0.151 |
| Cumulative variance | 0.318 | 0.552 | 0.747 | 0.898 |
Top: rotated factor loadings of the semantic differentials for four factors. Bottom: sums of squares (SS) loadings of the above factor loadings and the proportional and cumulative variance of the data that they account for. Bold typeface indicates a strong regression coefficient of the semantic differential with a factor. The semantic differential with the strongest regression coefficient, chosen as representative of a factor, is indicated in italics.
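Simple arithmetic links the bottom rows of the table. Eight semantic differentials were rated, so each factor's proportion of variance is its SS loading divided by 8, and the cumulative row is the running sum. The sketch below checks this against the reported values. Promax is an oblique rotation, so the factors are correlated. These proportions are therefore a conventional summary rather than an exact partition of the variance.

```python
from itertools import accumulate

# SS loadings of the four promax-rotated factors, as reported in the table.
ss_loadings = [2.542, 1.874, 1.562, 1.208]
# Number of standardized variables: the eight semantic differentials.
n_variables = 8


def variance_explained(ss, n_vars):
    """Per-factor proportion of total variance (SS loading / number of
    standardized variables) and the running cumulative total."""
    proportions = [s / n_vars for s in ss]
    cumulative = list(accumulate(proportions))
    return proportions, cumulative


props, cums = variance_explained(ss_loadings, n_variables)
for i, (p, c) in enumerate(zip(props, cums), start=1):
    print(f"Factor {i}: proportion {p:.4f}, cumulative {c:.4f}")
```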
Figure 2. Use of synesthetic codes in verbal descriptions by synesthetes. The result for taste is 0.1%.
Results of linear mixed effects modeling testing acoustic influences on participants' color associations.
| Color dimension | Predictor | t | p | Effect |
|---|---|---|---|---|
| L* | f0 mean | 12.57 | <0.001 | Higher f0, lighter |
| | pitch range | 6.58 | <0.001 | Larger range, lighter |
| | LTF2 | 2.46 | 0.014 | Higher LTF2, lighter |
| | group (control vs. phoneticians) | −3.52 | <0.001 | Phoneticians darker |
| | LTF2*group (control vs. phoneticians) | 3.67 | <0.001 | Phoneticians > control |
| u* | f0 mean | 5.30 | <0.001 | Higher f0, redder |
| v* | spectral tilt | −2.31 | 0.021 | Steeper tilt, bluer |
| | pitch range | 2.50 | 0.012 | Smaller range, bluer |
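The statistics and p values in the color table are mutually consistent under a large-sample reading, and a reader can check this quickly. The sketch assumes that the reported statistic is a t value with enough degrees of freedom for a standard normal reference to serve as an approximation. The source states neither the degrees of freedom nor how the p values were obtained.

```python
import math


def two_tailed_p_normal(t: float) -> float:
    """Two-tailed p for a test statistic under a standard normal reference,
    i.e. 2 * (1 - Phi(|t|)), a large-sample stand-in for the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# (t, reported p) pairs from the color table; rows reported as "<0.001" omitted.
reported = [(2.46, 0.014), (-2.31, 0.021), (2.50, 0.012)]
for t, p in reported:
    print(f"t = {t:+.2f}: approx. p = {two_tailed_p_normal(t):.4f} (reported {p})")
```

The approximations agree with the reported values to within about 0.001. The exact t distribution would give slightly larger p values for moderate degrees of freedom.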
The four voice quality—texture associations with highest agreement between and across groups.
| WHISPER | 30 | 36 | 30 | 29 | |
| HARSH | 26 | 36 | 30 | 22 | |
| CREAK | 20 | 25 | 18 | 20 | |
| NASAL | 17 | 18 | 15 | 18 | |
Results of linear mixed effects modeling testing acoustic influences on participants' texture associations.
| Texture dimension | Predictor | t | p | Effect |
|---|---|---|---|---|
| Rough—smooth | f0 mean | 2.63 | 0.009 | Higher f0, smoother |
| | Pitch range | 2.30 | 0.022 | Larger range, smoother |
| | LTF2 | 4.30 | <0.001 | Higher LTF2, smoother |
| | Spectral tilt | 5.18 | <0.001 | Steeper tilt, smoother |
| High complexity—low complexity | f0 mean | −3.02 | 0.003 | Higher f0, more complex |
| | Pitch range | 4.33 | <0.001 | Smaller range, more complex |
| | LTF2 | −2.13 | 0.034 | Higher LTF2, more complex |
| | Spectral tilt | 7.11 | <0.001 | Shallower tilt, more complex |
| | Group (syn vs. con) | −2.40 | 0.016 | Synesthetes more complex |
| | Group (syn vs. phon) | −2.46 | 0.014 | Synesthetes more complex |
| Repetitive—non-repetitive | f0 mean | 4.40 | <0.001 | Higher f0, less repetitive |
| | Pitch range | −2.48 | 0.013 | Smaller range, less repetitive |
| | LTF2 | 5.93 | <0.001 | Higher LTF2, less repetitive |
| Line-like—blob-like | Spectral tilt | 3.61 | <0.001 | Steeper tilt, more blob-like |
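The Effect column in the texture table follows from the sign of each statistic and the order of the scale's poles. The sketch below assumes that each bipolar scale is scored with its first-named pole at the low end. Under that assumption, a positive t means that higher values of the acoustic measure push ratings toward the second pole, and a negative t means they push ratings toward the first. The helper names are hypothetical.

```python
from __future__ import annotations


def direction(t: float, low_pole: str, high_pole: str) -> str:
    """Pole that higher predictor values push ratings toward.

    ASSUMPTION: the scale's first-named pole is its low end, so a positive t
    points to the second pole and a negative t to the first.
    """
    if t == 0:
        raise ValueError("t = 0 carries no direction")
    return high_pole if t > 0 else low_pole


def effect_pair(t: float, low_pole: str, high_pole: str) -> tuple[str, str]:
    """(pole for high predictor values, pole for low predictor values)."""
    high = direction(t, low_pole, high_pole)
    low = low_pole if high == high_pole else high_pole
    return high, low


# Rows taken from the texture table.
print("Higher f0:", direction(2.63, "rougher", "smoother"))            # -> smoother
print("Higher f0:", direction(-3.02, "more complex", "less complex"))  # -> more complex
print("Pitch range (large, small):", effect_pair(4.33, "more complex", "less complex"))
```

For pitch range on the complexity scale (t = 4.33), this yields "less complex" for a large range and "more complex" for a small range, as the table states.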
Summary of significant influences of acoustic characteristics of the voices on the associated textures.
| Acoustic characteristic | Level | Roughness | Complexity | Repetitiveness / shape |
|---|---|---|---|---|
| Pitch (f0) | High | Smoother | More complex | Less repetitive |
| | Low | Rougher | Less complex | More repetitive |
| Pitch range | Small | Rougher | More complex | Less repetitive |
| | Large | Smoother | Less complex | More repetitive |
| LTF2 | High | Smoother | More complex | Less repetitive |
| | Low | Rougher | Less complex | More repetitive |
| Spectral tilt | Steep | Smoother | Less complex | Blob-like |
| | Shallow | Rougher | More complex | Line-like |
| Voice quality | Description |
|---|---|
| MODAL | neutral setting of the speech organs; the sound of a healthy voice |
| NASAL | additional air flow through the nose |
| DENASAL | no air flow through the nose, as if the nose were blocked |
| RAISED LARYNX | with an elevated larynx, sounding slightly strained and higher pitched |
| LOWERED LARYNX | with a lowered larynx, sounding slightly relaxed and lower pitched |
| WHISPER | no voicing, turbulent airflow only |
| FALSETTO | so-called “head voice,” high pitched with taut vocal folds |
| HARSH | tense and rough irregular voicing, with constriction of the ventricular folds |
| BREATHY | soft and lax voice with increased air flow due to incomplete closure of the vocal folds |
| CREAK | low-pitched irregular voicing with slow, irregular vibration of the vocal folds |
| Acoustic measure | Description |
|---|---|
| f0 mean | Mean fundamental frequency of the voice recording; this relates to the overall pitch of the voice. |
| LTF2 | Long-term formant distribution (LTF) of the second vowel formant |
| Spectral tilt | Energy distribution across the frequency range measured in one accented vowel per recording (“app |
| Pitch range | Variability of f0 in a speaker, calculated by subtracting the minimum from the maximum pitch of a voice recording and converting to semitones. This describes the difference between (for example) a “singsongy” and a monotonous voice (i.e., large vs. small pitch range). |
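Converting a maximum-minus-minimum pitch difference to semitones is equivalent to computing 12 · log2(f0_max / f0_min). The reference frequency used for the semitone scale cancels out. The sketch below assumes an f0 track in Hz with unvoiced frames marked as 0. That convention and the helper names are hypothetical, not taken from the study.

```python
from __future__ import annotations

import math


def hz_to_semitones(f_hz: float, ref_hz: float = 100.0) -> float:
    """Frequency in semitones relative to an arbitrary reference (ref_hz is hypothetical)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def pitch_range_semitones(f0_track_hz: list[float]) -> float:
    """Maximum minus minimum f0 of a recording, expressed in semitones.

    ASSUMPTION: unvoiced frames are marked with 0 Hz and skipped.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        # e.g. whispered speech has no voicing, hence N/A in the acoustic table
        raise ValueError("need at least two voiced frames")
    return hz_to_semitones(max(voiced)) - hz_to_semitones(min(voiced))


# Hypothetical f0 track (Hz) for one recording; zeros are unvoiced frames.
track = [0.0, 110.0, 98.0, 130.0, 0.0, 121.0]
print(f"pitch range: {pitch_range_semitones(track):.2f} semitones")
```

Doubling the frequency (one octave) always adds 12 semitones, whatever the speaker's baseline pitch. This is why semitones suit comparisons between voices with very different f0, such as MODAL and FALSETTO.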
Formants are peaks of intensity in the frequency spectrum of a sound, located at particular frequencies (usually measured in Hz). They are created by the resonances of the vocal tract (Clark et al., 2007). A vowel sound contains several formants. The two lowest formants (F1 and F2) mainly determine vowel quality, while all formants also carry information about speaker characteristics.
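The link between vocal-tract resonances and formant frequencies can be illustrated with a textbook idealization that is not part of the study. In this idealization, a neutral vocal tract is modeled as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength. Assuming a speed of sound c of about 350 m/s and a tract length L of 17.5 cm (a typical adult value):

$$
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots
$$

$$
F_1 = \frac{350}{4 \times 0.175} = 500\ \text{Hz}, \qquad F_2 = \frac{3 \times 350}{4 \times 0.175} = 1500\ \text{Hz}, \qquad F_3 = \frac{5 \times 350}{4 \times 0.175} = 2500\ \text{Hz}
$$

Real vowels shift these resonances as the tongue, lips, and larynx change the tract's shape and length. For example, raising the larynx shortens the tube.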
| 1. Associations | |
| associations with real or fictitious people | |
| time period of recording | |
| comparing the speaker to one previously heard | |
| 2. Description of person | |
| age | |
| sex of speaker | |
| occupation | |
| physical appearance, clothing | |
| state of health | |
| character, habits, attitudes | |
| emotional state of the speaker, feelings of the speaker | |
| 3. Feelings in listener | |
| emotions or feelings evoked in the listener | |
| 4. Phonetics | |
| professional terminology or layperson's description of voice quality | |
| other phonetic terminology not relating to voice quality, e.g., pitch or speaking rate | |
| regional area, accent | |
| disguised or pretend voice | |
| evaluation of the voice, e.g., “good voice,” “could be better if …” | |
| speaking style, e.g., “story telling,” “newsreader,” “telling off…” | |
| 5. Synesthetic perceptions | |
| color terms | |
| texture terms | |
| terms describing a shape | |
| terms describing where in space something is positioned and/or whether it moves | |
| gustatory terms | |
| olfactory terms | |
| terms related to temperature | |
| 6. Unclassified | |
| terms not falling within the categories above. | |
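Percentages like those in Figure 2 can be obtained by tallying the coded terms of category 5 (synesthetic perceptions). The sketch below makes two assumptions, since the source states neither. First, the denominator is all category-5 codes. Second, each coded term carries exactly one subcode. The short subcode labels are hypothetical stand-ins for the descriptions listed above.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical short labels for the seven subcodes of category 5.
SYNESTHETIC_CODES = ("color", "texture", "shape", "space", "taste", "smell", "temperature")


def synesthetic_code_shares(coded_terms: list[str]) -> dict[str, float]:
    """Percentage of category-5 codes falling in each subcode.

    coded_terms: one subcode label per coded term; unknown labels raise ValueError.
    """
    counts = Counter(coded_terms)
    unknown = set(counts) - set(SYNESTHETIC_CODES)
    if unknown:
        raise ValueError(f"unknown subcodes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {code: 0.0 for code in SYNESTHETIC_CODES}
    return {code: 100.0 * counts[code] / total for code in SYNESTHETIC_CODES}


# Hypothetical coded terms from a handful of synesthete descriptions.
terms = ["color", "color", "texture", "color", "shape", "taste", "texture"]
for code, share in synesthetic_code_shares(terms).items():
    print(f"{code:12s} {share:5.1f}%")
```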