Laura Rachman1,2, Marco Liuni3, Pablo Arias3, Andreas Lind4, Petter Johansson4,5, Lars Hall4, Daniel Richardson6, Katsumi Watanabe7,8, Stéphanie Dubal9, Jean-Julien Aucouturier3.
Abstract
We present an open-source software platform that transforms the emotional cues expressed by speech signals using audio effects such as pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real time on live microphone input, with less than 20-ms latency. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here the results of a series of validation experiments that position the tool against several methodological requirements: that the transformed emotions be recognized at above-chance levels, remain valid in several languages (French, English, Swedish, and Japanese), and have a naturalness comparable to that of natural speech.
Keywords: Emotional transformations; Infra-segmental cues; Nonverbal behavior; Real-time; Software; Voice
Year: 2018 PMID: 28374144 PMCID: PMC5809549 DOI: 10.3758/s13428-017-0873-y
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
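The building blocks named in the abstract (pitch shift, inflection, vibrato, filtering) are all standard DSP operations. As one illustration, the sketch below implements a generic delay-line vibrato in Python; it is a hypothetical stand-in for the toolbox's actual effect, with the modulation amplitude derived from a peak pitch deviation given in cents.

```python
import numpy as np

def vibrato(x, fs, rate_hz=8.5, depth_cents=40.0):
    """Periodic pitch modulation via a modulated delay line (linear interpolation).

    The instantaneous pitch ratio of a delay line is 1 - d(delay)/dt, so the
    delay amplitude (in samples) that yields a peak deviation of depth_cents
    is (2**(depth/1200) - 1) * fs / (2*pi*rate).
    """
    n = np.arange(len(x))
    dev = 2.0 ** (depth_cents / 1200.0) - 1.0            # peak pitch-ratio deviation
    amp = dev * fs / (2.0 * np.pi * rate_hz)             # delay amplitude, samples
    delay = amp * (1.0 + np.sin(2.0 * np.pi * rate_hz * n / fs))  # causal: always >= 0
    pos = np.clip(n - delay, 0.0, len(x) - 1.0)          # fractional read positions
    i = np.floor(pos).astype(int)
    j = np.minimum(i + 1, len(x) - 1)
    frac = pos - i
    return (1.0 - frac) * x[i] + frac * x[j]             # linear interpolation
```

The default rate and depth mirror the afraid-transformation settings reported later in this record; for a real-time version the same read-position logic would run per audio block rather than over the whole file.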
List of the atomic digital audio effects used in this work, and how they are combined to form the happy, sad, and afraid emotional transformations

| Effect type | Effect | Happy | Sad | Afraid |
|---|---|---|---|---|
| Time-varying | Vibrato | | | ✓ |
| Time-varying | Inflection | ✓ | | ✓ |
| Pitch shift | Up | ✓ | | |
| Pitch shift | Down | | ✓ | |
| Filter | High-shelf (“brighter”) | ✓ | | |
| Filter | Low-shelf (“darker”) | | ✓ | |
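The filter row of this table corresponds to textbook shelving equalizers. Below is a standard RBJ Audio-EQ-Cookbook high-shelf biquad as a sketch of the "brighter" effect; treating the slope value as the shelf gain in dB and 8000 Hz as the corner frequency is an assumption made for illustration, not the toolbox's documented implementation.

```python
import math

def high_shelf_coeffs(fs, f0, gain_db, S=1.0):
    """RBJ Audio-EQ-Cookbook high-shelf biquad, returned as (b, a) with a[0] = 1.

    Boosts (gain_db > 0) or cuts (gain_db < 0) frequencies above f0;
    S = 1 is the steepest slope without resonance.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    c, s = math.cos(w0), math.sin(w0)
    alpha = s / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / S - 1.0) + 2.0)
    b0 = A * ((A + 1) + (A - 1) * c + 2 * math.sqrt(A) * alpha)
    b1 = -2 * A * ((A - 1) + (A + 1) * c)
    b2 = A * ((A + 1) + (A - 1) * c - 2 * math.sqrt(A) * alpha)
    a0 = (A + 1) - (A - 1) * c + 2 * math.sqrt(A) * alpha
    a1 = 2 * ((A - 1) - (A + 1) * c)
    a2 = (A + 1) - (A - 1) * c - 2 * math.sqrt(A) * alpha
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]
```

By construction the gain is unity at DC and 10^(gain_db/20) at Nyquist, which is easy to check from the coefficient sums; the low-shelf ("darker") variant has analogous cookbook formulas.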
Fig. 2 Illustration of the delays involved in the realization of our real-time audio processing system. Beyond a baseline I/O latency (input and output Δ), each atomic effect in the signal data flow (three, as illustrated here) adds further delay that depends on the effect’s algorithmic complexity
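The latency budget sketched in Fig. 2 is simple arithmetic: one input and one output buffer of baseline latency, plus each effect's internal delay. The helper below uses hypothetical buffer sizes (the toolbox's actual settings may differ) to show how a chain can stay under the 20-ms target quoted in the abstract.

```python
def total_latency_ms(fs, io_buffer_samples, effect_delays_samples):
    """Total latency = input Δ + output Δ (one buffer each) + per-effect delays."""
    io_ms = 2.0 * io_buffer_samples / fs * 1000.0
    fx_ms = sum(effect_delays_samples) / fs * 1000.0
    return io_ms + fx_ms

# e.g. 256-sample I/O buffers at 44.1 kHz (about 11.6 ms) plus three effects
# of 64 samples each (about 4.4 ms) totals roughly 16 ms, under 20 ms.
```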
List of the parameters used in the validation experiments. For the afraid transformation, different values were used for male and female voices (given as paired values in the table), because the audio effects differ strongly with the gender of the speaker

| Effect | Parameter | Happy low | Happy medium | Happy high | Sad low | Sad medium | Sad high | Afraid low | Afraid medium | Afraid high |
|---|---|---|---|---|---|---|---|---|---|---|
| Pitch | shift (cents) | +29.5 | +40.9 | +50.0 | −39.8 | −56.2 | −70.0 | – | – | – |
| Vibrato | rate (Hz) | – | – | – | – | – | – | 8.5 | 8.5 | 8.5 |
| Vibrato | depth (cents) | – | – | – | – | – | – | 26.1 | 33.8 | 40.0 |
| Inflection | duration (ms) | 500 | 500 | 500 | – | – | – | 500 | 500 | 500 |
| Inflection | min. (cents) | −144.8 | −158.9 | −200 | – | – | – | −109.3 / −50.2 | −141.0 / −101.1 | −169.2 / −158.6 |
| Inflection | max. (cents) | +101.3 | +111.3 | +140 | – | – | – | +109.3 / +50.2 | +141.0 / +101.1 | +169.2 / +158.6 |
| Filter | cut-off (Hz) | > 8000 | > 8000 | > 8000 | < 8000 | < 8000 | < 8000 | – | – | – |
| Filter | slope (dB) | +5.8 | +6.6 | +9.5 | −7.8 | −9.6 | −12 | – | – | – |
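The pitch values above are in cents (hundredths of a semitone). A small helper converts cents to frequency ratios, and a hypothetical linear ramp sketches how an inflection's min/max values could be spread over its 500-ms duration; the published effect's actual pitch contour may differ.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch interval: 100 cents = 1 semitone, 1200 = 1 octave."""
    return 2.0 ** (cents / 1200.0)

def inflection_envelope(min_cents, max_cents, duration_ms=500.0, n_points=64):
    """Hypothetical linear pitch ramp from min to max over the inflection duration.

    Returns (times_ms, pitch_ratios), suitable as a control curve for a
    time-varying pitch shifter.
    """
    times = [i * duration_ms / (n_points - 1) for i in range(n_points)]
    ratios = [cents_to_ratio(min_cents + (max_cents - min_cents) * i / (n_points - 1))
              for i in range(n_points)]
    return times, ratios
```

For example, the happy high shift of +50 cents corresponds to a ratio of about 1.029, i.e. a half-semitone increase in fundamental frequency.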
Fig. 3 Raw hit rates. French (a), English (b), Swedish (c), and Japanese (d) raw accuracy scores for three emotions at the nominal level (‘high’) and two lower intensity levels. Error bars represent SEM; the black line represents chance level (20%)
Fig. 4 Confusion matrices. French (a), English (b), Swedish (c), and Japanese (d) confusion matrices showing the distribution of responses (in %) at the nominal level. Diagonal cells in bold indicate correct responses
Emotion recognition scores for the four languages

| Language | Emotion | H (%) | pi | Hu | pc | df | t |
|---|---|---|---|---|---|---|---|
| FR | Happy | 43.8 | .76 | .34 | .042 | 19 | 6.2∗∗∗ |
| FR | Sad | 55.4 | .83 | .32 | .061 | 19 | 5.5∗∗ |
| FR | Afraid | 37.1 | .70 | .28 | .035 | 19 | 5.6∗∗ |
| EN | Happy | 31.9 | .65 | .31 | .042 | 23 | 4.6∗∗ |
| EN | Sad | 43.1 | .75 | .23 | .053 | 23 | 4.9∗∗ |
| EN | Afraid | 42.0 | .74 | .31 | .039 | 23 | 6.1∗∗∗ |
| SW | Happy | 29.2 | .62 | .19 | .047 | 19 | 3.7∗ |
| SW | Sad | 22.5 | .54 | .14 | .051 | 19 | 2.9∗ |
| SW | Afraid | 25.8 | .58 | .21 | .031 | 19 | 4.2∗ |
| JP | Happy | 28.3 | .61 | .26 | .049 | 19 | 5.2∗∗ |
| JP | Sad | 36.7 | .70 | .21 | .049 | 19 | 3.5∗ |
| JP | Afraid | 48.8 | .79 | .38 | .043 | 19 | 5.8∗∗ |

FR = French; EN = English; SW = Swedish; JP = Japanese; H = raw (biased) hit rate (%); pi = proportion index; Hu = unbiased hit rate; pc = chance proportion for Hu; df = degrees of freedom; t = t-score; p values are Holm–Bonferroni corrected. Please note that chance performance is 20% for H and .50 for pi. ∗ p < .01, ∗∗ p < .001, ∗∗∗ p < .0001
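Both derived measures in this table are easy to recompute. The proportion index pi maps a k-alternative hit rate onto a two-alternative scale (so chance 1/k always maps to .50), and the unbiased hit rate corrects each diagonal cell of a confusion matrix for response bias, Wagner-style. The sketch below reproduces the table's pi values to rounding (e.g. H = 43.8% with k = 5 gives pi ≈ .76).

```python
def proportion_index(hit_rate, k=5):
    """Two-alternative equivalent of a k-alternative hit rate (proportion in 0..1)."""
    return hit_rate * (k - 1) / (1.0 + hit_rate * (k - 2))

def unbiased_hit_rate(confusion, i):
    """Hu for category i of a square count matrix: hits^2 / (row total * column total).

    Penalizes a high raw hit rate that is achieved by over-using response i.
    """
    row_total = sum(confusion[i])
    col_total = sum(r[i] for r in confusion)
    return confusion[i][i] ** 2 / (row_total * col_total)
```

The exact Hu values in the table cannot be reproduced here because they depend on each study's full confusion matrices, which this record does not list numerically.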
Fig. 5 Naturalness. French (a), English (b), Swedish (c), and Japanese (d) naturalness ratings for three emotions at three intensity levels compared to unmodified voices (grey: mean ± 1 SD); error bars represent 95% confidence intervals
Cohen’s d and probability of inferiority (POI) of the naturalness ratings for each emotional transformation compared to natural emotional voices

| Transformation | Level | Cohen’s d (FR) | POI, % (FR) | Cohen’s d (EN) | POI, % (EN) | Cohen’s d (SW) | POI, % (SW) | Cohen’s d (JP) | POI, % (JP) |
|---|---|---|---|---|---|---|---|---|---|
| Happy | low | 0.86 | 27.6 | 0.54 | 35.1 | 0.29 | 41.9 | 1.51 | 14.3 |
| Happy | med | 0.81 | 28.3 | 0.92 | 25.8 | 0.45 | 37.5 | 1.40 | 16.1 |
| Happy | high | 1.08 | 22.2 | 0.75 | 29.8 | 0.64 | 32.5 | 1.42 | 15.8 |
| Sad | low | 0.21 | 44.1 | 0.67 | 31.8 | 0.18 | 44.9 | 1.44 | 15.4 |
| Sad | med | 0.57 | 34.3 | 0.77 | 29.3 | 0.58 | 34.1 | 1.30 | 17.9 |
| Sad | high | 0.85 | 27.4 | 1.04 | 23.1 | 0.78 | 29.1 | 1.69 | 11.6 |
| Afraid | low | 1.20 | 19.8 | 1.18 | 20.2 | 1.58 | 13.2 | 1.68 | 11.7 |
| Afraid | med | 1.71 | 11.3 | 1.43 | 15.6 | 2.77 | 2.5 | 2.31 | 5.1 |
| Afraid | high | 2.66 | 3.0 | 2.43 | 4.3 | 3.82 | 0.3 | 2.87 | 2.1 |

FR = French; EN = English; SW = Swedish; JP = Japanese
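Assuming two normal rating distributions with equal variance, POI follows directly from Cohen's d as Φ(−d/√2): the probability that a randomly drawn naturalness rating of a transformed voice falls below that of a natural voice. The sketch below reproduces most cells of the table to within rounding (e.g. d = 2.66 gives about 3.0%).

```python
import math

def poi_percent(cohens_d):
    """Probability of inferiority (%) implied by Cohen's d under the
    equal-variance normal model: 100 * Phi(-d / sqrt(2))."""
    z = -cohens_d / math.sqrt(2.0)
    # Standard normal CDF via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Note that d = 0 gives POI = 50% (no difference from natural voices), so rows approaching 50 indicate transformations whose naturalness is close to that of unmodified speech.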
Fig. 6 Intensity. French (a), English (b), Swedish (c), and Japanese (d) intensity ratings for three emotions at three intensity levels compared to unmodified voices (grey: mean ± 1 SD); error bars represent 95% confidence intervals