Steven R Livingstone<sup>1,2</sup>, Frank A Russo<sup>1</sup>.
Abstract
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender-balanced and consists of 24 professional actors vocalizing lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. Each of the 7356 recordings was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.
Year: 2018 PMID: 29768426 PMCID: PMC5955500 DOI: 10.1371/journal.pone.0196391
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flowchart of RAVDESS creation and validation.
Flowchart illustrating the method of stimulus recording, editing, and validation.
Fig 2. Physical setup of the recording studio.
The physical layout of the recording studio used to record RAVDESS stimuli. All measurements refer to horizontal distances unless otherwise specified.
Fig 3. Examples of the eight RAVDESS emotions.
Still frame examples of the eight emotions contained in the RAVDESS, in speech and song.
Description of factor-level coding of RAVDESS filenames.
| Identifier | Coding description of factor levels |
|---|---|
| Modality | 01 = Audio-video, 02 = Video-only, 03 = Audio-only |
| Channel | 01 = Speech, 02 = Song |
| Emotion | 01 = Neutral, 02 = Calm, 03 = Happy, 04 = Sad, 05 = Angry, 06 = Fearful, 07 = Disgust, 08 = Surprised |
| Intensity | 01 = Normal, 02 = Strong |
| Statement | 01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door" |
| Repetition | 01 = First repetition, 02 = Second repetition |
| Actor | 01 = First actor, …, 24 = Twenty-fourth actor |
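Because each filename encodes all seven factors, stimuli can be selected programmatically. The sketch below is a hypothetical helper and not part of the RAVDESS distribution. It assumes that each filename stem consists of seven two-digit codes joined by hyphens, in the order of the table (modality, channel, emotion, intensity, statement, repetition, actor). It also assumes that emotion codes 07 and 08 mean disgust and surprised, following the dataset's published filename convention.

```python
from pathlib import Path

# Field order assumed to follow the factor-level table above.
FIELDS = ["modality", "channel", "emotion", "intensity",
          "statement", "repetition", "actor"]

CODES = {
    "modality": {"01": "audio-video", "02": "video-only", "03": "audio-only"},
    "channel": {"01": "speech", "02": "song"},
    "emotion": {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
                "05": "angry", "06": "fearful", "07": "disgust",
                "08": "surprised"},
    "intensity": {"01": "normal", "02": "strong"},
    "statement": {"01": "Kids are talking by the door",
                  "02": "Dogs are sitting by the door"},
    "repetition": {"01": "first", "02": "second"},
}


def parse_ravdess_name(filename: str) -> dict:
    """Decode a RAVDESS-style filename such as '03-01-06-01-02-01-12.wav'."""
    parts = Path(filename).stem.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {filename}")
    info = {}
    for field, code in zip(FIELDS, parts):
        if field == "actor":
            actor = int(code)
            if not 1 <= actor <= 24:  # actors are numbered 01 to 24
                raise ValueError(f"actor code out of range: {code}")
            info[field] = actor
        else:
            try:
                info[field] = CODES[field][code]
            except KeyError:
                raise ValueError(f"unknown {field} code {code!r} in {filename}") from None
    return info


print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
```

The parser only decodes the codes. It does not check combinations that the design excludes, such as a song file with emotion code 07.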
Validity task accuracy measures across channel, modality, and intensity.
| Channel | Modality | Intensity | N | Mean (SD) Proportion correct | Mean (SD) Unbiased hit rate | Mean (SD) Intensity | Mean (SD) Genuineness | Kappa |
|---|---|---|---|---|---|---|---|---|
| Speech | AV | Normal | 768 | 0.77 (0.23) | 0.57 (0.17) | 3.44 (0.51) | 3.47 (0.44) | 0.62 |
| Speech | AV | Strong | 672 | 0.83 (0.19) | 0.62 (0.15) | 4.01 (0.56) | 3.56 (0.56) | 0.71 |
| Speech | VO | Normal | 768 | 0.70 (0.25) | 0.52 (0.19) | 3.40 (0.54) | 3.42 (0.46) | 0.53 |
| Speech | VO | Strong | 672 | 0.75 (0.25) | 0.56 (0.19) | 3.88 (0.60) | 3.55 (0.48) | 0.62 |
| Speech | AO | Normal | 768 | 0.58 (0.30) | 0.43 (0.22) | 3.14 (0.42) | 3.12 (0.41) | 0.41 |
| Speech | AO | Strong | 672 | 0.67 (0.27) | 0.50 (0.21) | 3.71 (0.62) | 3.51 (0.46) | 0.52 |
| Song | AV | Normal | 552 | 0.77 (0.23) | 0.57 (0.19) | 3.37 (0.49) | 3.33 (0.48) | 0.61 |
| Song | AV | Strong | 460 | 0.84 (0.20) | 0.63 (0.18) | 3.91 (0.58) | 3.46 (0.51) | 0.72 |
| Song | VO | Normal | 552 | 0.75 (0.25) | 0.55 (0.21) | 3.41 (0.53) | 3.36 (0.46) | 0.61 |
| Song | VO | Strong | 460 | 0.79 (0.23) | 0.59 (0.20) | 3.89 (0.61) | 3.54 (0.51) | 0.67 |
| Song | AO | Normal | 552 | 0.53 (0.28) | 0.39 (0.21) | 3.13 (0.39) | 3.24 (0.37) | 0.31 |
| Song | AO | Strong | 460 | 0.62 (0.28) | 0.47 (0.23) | 3.55 (0.57) | 3.37 (0.40) | 0.44 |
Description of validity ratings for spoken and sung expressions, across channel, modality, and emotional intensity (N = 247 participants, each rating 298 stimuli). AV = audio-video; VO = video only; AO = audio only. As neutral had no intensity manipulation, neutral scores were collapsed into the ‘normal’ intensity category.
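The N values follow from the recording design. The sketch below derives the expected number of clips per channel and intensity, and checks the result against the 7356-recording total in the abstract and the rating counts for Fig 4. Some of its inputs are assumptions inferred from the tables rather than stated in the text. Each emotion-intensity cell is taken to contain two statements × two repetitions per actor, and neutral is recorded at one intensity only. The song counts are taken to come from 23 singing actors, not 24.

```python
# Design parameters. STATEMENTS and REPETITIONS come from the filename table;
# the actor counts are assumptions inferred from the N column.
STATEMENTS = 2
REPETITIONS = 2
PER_CELL = STATEMENTS * REPETITIONS  # clips per actor per emotion-intensity cell


def clip_counts(n_actors: int, n_emotions: int) -> dict:
    """Clips per modality for one channel; n_emotions excludes neutral."""
    neutral = n_actors * PER_CELL                   # neutral has no strong level
    emotional = n_actors * n_emotions * PER_CELL    # clips at each intensity level
    return {"normal": neutral + emotional, "strong": emotional}


speech = clip_counts(n_actors=24, n_emotions=7)  # calm, happy, sad, angry, fearful, surprise, disgust
song = clip_counts(n_actors=23, n_emotions=5)    # calm, happy, sad, angry, fearful
per_modality = sum(speech.values()) + sum(song.values())
total = 3 * per_modality                         # AV, VO, and AO formats

print(speech, song, total)
# {'normal': 768, 'strong': 672} {'normal': 552, 'strong': 460} 7356

# Ten ratings per recording, summed over the three modalities.
print(10 * 3 * sum(speech.values()), 10 * 3 * sum(song.values()))  # 43200 30360
```

The rating totals agree with the N values given for the Fig 4 confusion matrices. This is consistent with, but does not confirm, the assumed actor counts.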
Validity task accuracy measures across emotion and channel.
| Emotion | N | Mean (SD) Proportion correct | Mean (SD) Unbiased hit rate | Mean (SD) Intensity | Mean (SD) Genuineness | Kappa |
|---|---|---|---|---|---|---|
| Neutral (speech) | 288 | 0.87 (0.14) | 0.60 (0.10) | 3.16 (0.44) | 3.36 (0.45) | 0.58 |
| Neutral (song) | 276 | 0.78 (0.18) | 0.53 (0.12) | 3.03 (0.36) | 3.22 (0.40) | 0.49 |
| Calm (speech) | 576 | 0.70 (0.24) | 0.48 (0.16) | 3.26 (0.41) | 3.39 (0.39) | 0.58 |
| Calm (song) | 552 | 0.63 (0.25) | 0.43 (0.17) | 3.24 (0.40) | 3.38 (0.40) | 0.49 |
| Happy (speech) | 576 | 0.68 (0.32) | 0.49 (0.23) | 3.68 (0.58) | 3.51 (0.45) | 0.63 |
| Happy (song) | 552 | 0.75 (0.29) | 0.55 (0.21) | 3.68 (0.59) | 3.40 (0.50) | 0.65 |
| Sad (speech) | 576 | 0.61 (0.30) | 0.42 (0.21) | 3.33 (0.61) | 3.37 (0.45) | 0.53 |
| Sad (song) | 552 | 0.68 (0.28) | 0.43 (0.18) | 3.41 (0.55) | 3.34 (0.46) | 0.51 |
| Angry (speech) | 576 | 0.81 (0.22) | 0.64 (0.17) | 3.96 (0.67) | 3.71 (0.55) | 0.67 |
| Angry (song) | 552 | 0.83 (0.22) | 0.73 (0.19) | 3.83 (0.62) | 3.45 (0.51) | 0.75 |
| Fearful (speech) | 576 | 0.71 (0.24) | 0.56 (0.19) | 3.76 (0.66) | 3.46 (0.49) | 0.60 |
| Fearful (song) | 552 | 0.65 (0.29) | 0.51 (0.22) | 3.70 (0.58) | 3.37 (0.47) | 0.57 |
| Disgust (speech) | 576 | 0.70 (0.27) | 0.55 (0.21) | 3.73 (0.57) | 3.43 (0.46) | 0.60 |
| Surprise (speech) | 576 | 0.72 (0.24) | 0.55 (0.19) | 3.53 (0.49) | 3.47 (0.45) | 0.60 |
Description of validity ratings and interrater reliability values for emotional expressions in speech and song.
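The kappa column summarizes how consistently raters agreed on the emotion label. The text does not say which kappa statistic was used. Fleiss' kappa is a standard choice when many raters label each stimulus, so it is sketched below as an assumption. The input is a hypothetical count matrix with one row per stimulus and one column per emotion label. Each cell holds the number of raters who chose that label, and every stimulus has the same number of raters.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of raters who assigned stimulus i to label j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each stimulus needs the same number (>= 2) of ratings")
    n_labels = len(counts[0])
    # Overall proportion of ratings that went to each label.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_labels)]
    # Per-stimulus agreement: share of rater pairs that chose the same label.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)    # undefined if every rating uses one label


# Three hypothetical stimuli, ten raters, three labels.
example = [[8, 2, 0], [1, 9, 0], [0, 1, 9]]
print(round(fleiss_kappa(example), 3))  # 0.618
```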
Validity task mean proportion correct scores.
| Intensity | Emotion | Speech AV | Speech VO | Speech AO | Speech Total | Song AV | Song VO | Song AO | Song Total | Channel Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Strong | Calm | 0.72 | 0.58 | 0.75 | 0.68 | 0.66 | 0.59 | 0.64 | 0.63 | 0.66 |
| Strong | Happy | 0.84 | 0.89 | 0.44 | 0.72 | 0.93 | 0.90 | 0.50 | 0.78 | 0.75 |
| Strong | Sad | 0.81 | 0.77 | 0.62 | 0.73 | 0.85 | 0.83 | 0.51 | 0.73 | 0.73 |
| Strong | Angry | 0.94 | 0.92 | 0.91 | 0.92 | 0.93 | 0.90 | 0.86 | 0.89 | 0.91 |
| Strong | Fearful | 0.79 | 0.70 | 0.73 | 0.74 | 0.83 | 0.75 | 0.59 | 0.72 | 0.73 |
| Strong | Disgust | 0.88 | 0.68 | 0.54 | 0.70 |  |  |  |  | 0.70 |
| Strong | Surprise | 0.86 | 0.69 | 0.74 | 0.76 |  |  |  |  | 0.76 |
| Normal | Calm | 0.73 | 0.62 | 0.79 | 0.71 | 0.61 | 0.58 | 0.68 | 0.62 | 0.67 |
| Normal | Happy | 0.80 | 0.85 | 0.29 | 0.65 | 0.86 | 0.88 | 0.40 | 0.72 | 0.68 |
| Normal | Sad | 0.56 | 0.56 | 0.34 | 0.49 | 0.73 | 0.74 | 0.40 | 0.63 | 0.55 |
| Normal | Angry | 0.75 | 0.78 | 0.59 | 0.71 | 0.88 | 0.88 | 0.57 | 0.77 | 0.74 |
| Normal | Fearful | 0.77 | 0.66 | 0.59 | 0.67 | 0.71 | 0.64 | 0.40 | 0.58 | 0.63 |
| Normal | Disgust | 0.89 | 0.69 | 0.50 | 0.70 |  |  |  |  | 0.70 |
| Normal | Surprise | 0.82 | 0.61 | 0.62 | 0.68 |  |  |  |  | 0.68 |
| n/a | Neutral | 0.88 | 0.81 | 0.91 | 0.87 | 0.83 | 0.76 | 0.75 | 0.78 | 0.82 |
Validity task mean proportion correct scores across channel, modality, emotion, and intensity, for speech and song. Disgust and surprise were recorded in speech only, so their song cells are empty.
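The per-emotion scores in the table above can be cross-checked against the channel-by-modality table. The flattened rows fall into two blocks of emotions followed by a neutral row. Averaging each block, with neutral added to the normal-intensity block, should recover the reported proportion-correct means for strong and normal intensity. The sketch assumes that every emotion cell within a channel, modality, and intensity contains the same number of clips, as the N column implies, so an unweighted mean is appropriate. The values are transcribed from the two tables; the function name is illustrative.

```python
# Scores per (channel, modality). List order within each block:
# calm, happy, sad, angry, fearful, then disgust and surprise for speech only.
FIRST_BLOCK = {
    ("speech", "AV"): [0.72, 0.84, 0.81, 0.94, 0.79, 0.88, 0.86],
    ("speech", "VO"): [0.58, 0.89, 0.77, 0.92, 0.70, 0.68, 0.69],
    ("speech", "AO"): [0.75, 0.44, 0.62, 0.91, 0.73, 0.54, 0.74],
    ("song", "AV"): [0.66, 0.93, 0.85, 0.93, 0.83],
    ("song", "VO"): [0.59, 0.90, 0.83, 0.90, 0.75],
    ("song", "AO"): [0.64, 0.50, 0.51, 0.86, 0.59],
}
SECOND_BLOCK = {
    ("speech", "AV"): [0.73, 0.80, 0.56, 0.75, 0.77, 0.89, 0.82],
    ("speech", "VO"): [0.62, 0.85, 0.56, 0.78, 0.66, 0.69, 0.61],
    ("speech", "AO"): [0.79, 0.29, 0.34, 0.59, 0.59, 0.50, 0.62],
    ("song", "AV"): [0.61, 0.86, 0.73, 0.88, 0.71],
    ("song", "VO"): [0.58, 0.88, 0.74, 0.88, 0.64],
    ("song", "AO"): [0.68, 0.40, 0.40, 0.57, 0.40],
}
NEUTRAL = {("speech", "AV"): 0.88, ("speech", "VO"): 0.81, ("speech", "AO"): 0.91,
           ("song", "AV"): 0.83, ("song", "VO"): 0.76, ("song", "AO"): 0.75}
# (normal, strong) mean proportion correct from the channel/modality table.
REPORTED = {("speech", "AV"): (0.77, 0.83), ("speech", "VO"): (0.70, 0.75),
            ("speech", "AO"): (0.58, 0.67), ("song", "AV"): (0.77, 0.84),
            ("song", "VO"): (0.75, 0.79), ("song", "AO"): (0.53, 0.62)}


def mean(values):
    return sum(values) / len(values)


def blocks_match(strong_block, normal_block, tol=0.01):
    """True if the given block assignment reproduces every reported mean."""
    for key, (rep_normal, rep_strong) in REPORTED.items():
        strong = mean(strong_block[key])
        normal = mean(normal_block[key] + [NEUTRAL[key]])  # neutral counts as normal
        if abs(strong - rep_strong) > tol or abs(normal - rep_normal) > tol:
            return False
    return True


print(blocks_match(FIRST_BLOCK, SECOND_BLOCK))  # True: first block is strong
print(blocks_match(SECOND_BLOCK, FIRST_BLOCK))  # False
```

The 0.01 tolerance allows for rounding, since the published cell values have only two decimals.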
Fig 4. Confusion matrices of emotional validity.
For each intended emotion, the confusion matrices show the mean proportion of ratings that raters assigned to each emotion label: (A) speech (N = 43200 ratings) and (B) song (N = 30360 ratings). Proportions of 15% or more are labeled on the corresponding bar.
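The tables report an unbiased hit rate alongside raw proportion correct. The text does not define the measure. The sketch below assumes Wagner's (1993) unbiased hit rate, H_u, which is computed from a count confusion matrix of the kind summarized in Fig 4. For each emotion, the squared number of hits is divided by the product of two totals: how often that emotion was presented, and how often its label was chosen. A label that raters overuse is therefore penalized. The matrix in the example is hypothetical.

```python
from typing import List, Sequence


def unbiased_hit_rates(matrix: Sequence[Sequence[int]]) -> List[float]:
    """matrix[i][j] = number of ratings where intended emotion i was labeled j."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]                             # times each emotion was presented
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # times each label was chosen
    rates = []
    for i in range(n):
        denom = row_totals[i] * col_totals[i]
        rates.append(matrix[i][i] ** 2 / denom if denom else 0.0)
    return rates


# Hypothetical 3-emotion matrix: raters overuse the first label.
confusions = [[8, 2, 0],
              [4, 6, 0],
              [0, 0, 10]]
print([round(h, 3) for h in unbiased_hit_rates(confusions)])  # [0.533, 0.45, 1.0]
```

Raw proportion correct for the first emotion would be 0.8. Its H_u is lower because four of the twelve uses of that label were wrong.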
Validity task ICC calculations for intensity and genuineness using single- and multiple-rating, consistency-agreement, 1-way random-effects models.
| Response scale | ICC test | Value | 95% CI lower bound | 95% CI upper bound | F (true value 0) | df1 | df2 | Sig. |
|---|---|---|---|---|---|---|---|---|
| Intensity (speech) | Single (1, 1) | 0.22 | 0.21 | 0.23 | 3.84 | 4319 | 38880 | < 0.001 |
| Intensity (speech) | Average (1, k) | 0.74 | 0.73 | 0.75 | 3.84 | 4319 | 38880 | < 0.001 |
| Intensity (song) | Single (1, 1) | 0.21 | 0.20 | 0.22 | 3.63 | 3035 | 27324 | < 0.001 |
| Intensity (song) | Average (1, k) | 0.72 | 0.71 | 0.74 | 3.63 | 3035 | 27324 | < 0.001 |
| Genuineness (speech) | Single (1, 1) | 0.07 | 0.06 | 0.08 | 1.73 | 4319 | 38880 | < 0.001 |
| Genuineness (speech) | Average (1, k) | 0.42 | 0.40 | 0.45 | 1.73 | 4319 | 38880 | < 0.001 |
| Genuineness (song) | Single (1, 1) | 0.07 | 0.06 | 0.07 | 1.71 | 3035 | 27324 | < 0.001 |
| Genuineness (song) | Average (1, k) | 0.42 | 0.38 | 0.45 | 1.71 | 3035 | 27324 | < 0.001 |
Validity task intraclass correlations of the response scales emotional intensity and genuineness, for speech and song.
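In a one-way random-effects model, the ICC(1, 1) and ICC(1, k) values above come from a one-way ANOVA in which stimuli are the groups and their ratings the observations. The sketch below computes both from a complete matrix of ratings, along with the F ratio and degrees of freedom. It is a minimal illustration under the assumption that every stimulus has the same number of ratings k. It is not the authors' analysis code, and the example ratings are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> dict:
    """ratings[i][j] = j-th rating of stimulus i; every row must have length k."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 stimuli, each with the same k >= 2 ratings")
    row_means = [fmean(row) for row in ratings]
    grand = fmean(row_means)  # equal row lengths, so this is the grand mean
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    df1, df2 = n - 1, n * (k - 1)
    ms_between, ms_within = ss_between / df1, ss_within / df2  # ms_within must be > 0
    return {
        "icc_1_1": (ms_between - ms_within) / (ms_between + (k - 1) * ms_within),
        "icc_1_k": (ms_between - ms_within) / ms_between,
        "F": ms_between / ms_within,
        "df1": df1,
        "df2": df2,
    }


# Four hypothetical stimuli, each rated by three raters on a 1-5 scale.
result = icc_oneway([[4, 5, 4], [2, 2, 3], [5, 5, 5], [1, 2, 1]])
print(f"{result['icc_1_1']:.3f} {result['icc_1_k']:.3f}")  # 0.919 0.971
```

The degrees of freedom follow the same pattern as the table: df1 = n − 1 and df2 = n(k − 1).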
Test-retest task intrarater reliability ratings by emotion and channel.
| Emotion | Mean Proportion correct Time 1 (SD) | Mean Proportion correct Time 2 (SD) | Kappa |
|---|---|---|---|
| Neutral (speech) | 0.85 (0.36) | 0.89 (0.31) | 0.75 |
| Neutral (song) | 0.78 (0.42) | 0.82 (0.39) | 0.67 |
| Calm (speech) | 0.73 (0.45) | 0.72 (0.45) | 0.75 |
| Calm (song) | 0.64 (0.48) | 0.62 (0.49) | 0.67 |
| Happy (speech) | 0.67 (0.47) | 0.67 (0.47) | 0.77 |
| Happy (song) | 0.73 (0.44) | 0.75 (0.44) | 0.79 |
| Sad (speech) | 0.62 (0.49) | 0.61 (0.49) | 0.73 |
| Sad (song) | 0.69 (0.46) | 0.67 (0.47) | 0.70 |
| Angry (speech) | 0.77 (0.42) | 0.80 (0.40) | 0.77 |
| Angry (song) | 0.82 (0.38) | 0.83 (0.37) | 0.83 |
| Fearful (speech) | 0.71 (0.45) | 0.73 (0.44) | 0.75 |
| Fearful (song) | 0.62 (0.49) | 0.64 (0.48) | 0.72 |
| Disgust (speech) | 0.68 (0.47) | 0.71 (0.45) | 0.73 |
| Surprise (speech) | 0.68 (0.44) | 0.73 (0.44) | 0.73 |
Ratings from the test-retest intrarater reliability task across emotions, in speech and song (N = 72 participants, each rating 102 stimuli twice).
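In the test-retest task, agreement is measured between one participant's emotion labels at Time 1 and Time 2. The text does not name the statistic. The sketch below assumes Cohen's kappa for two label sequences over the same stimuli, which corrects observed agreement for the agreement expected from each session's label frequencies alone. The label data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two labelings of the same stimuli."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1:
        return float("nan")  # both sessions used one identical label: kappa undefined
    return (observed - expected) / (1 - expected)


time_1 = ["happy", "sad", "angry", "happy", "calm", "sad"]
time_2 = ["happy", "sad", "angry", "calm", "calm", "happy"]
print(round(cohens_kappa(time_1, time_2), 3))  # 0.556
```

Here 4 of 6 labels repeat (observed 0.667), chance agreement is 0.25, and kappa is 5/9.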
Test-retest task intrarater ICC calculations for intensity and genuineness using single- and multiple-rating, consistency-agreement, 1-way random-effects models.
| Response scale | ICC test | Value | 95% CI lower bound | 95% CI upper bound | F (true value 0) | df1 | df2 | Sig. |
|---|---|---|---|---|---|---|---|---|
| Intensity (speech) | Single (1, 1) | 0.46 | 0.44 | 0.49 | 2.71 | 4319 | 4320 | < 0.001 |
| Intensity (speech) | Average (1, k) | 0.63 | 0.61 | 0.65 | 2.71 | 4319 | 4320 | < 0.001 |
| Intensity (song) | Single (1, 1) | 0.46 | 0.43 | 0.49 | 2.70 | 3035 | 3036 | < 0.001 |
| Intensity (song) | Average (1, k) | 0.63 | 0.60 | 0.66 | 2.70 | 3035 | 3036 | < 0.001 |
| Genuineness (speech) | Single (1, 1) | 0.42 | 0.39 | 0.44 | 2.42 | 4319 | 4320 | < 0.001 |
| Genuineness (speech) | Average (1, k) | 0.59 | 0.56 | 0.61 | 2.42 | 4319 | 4320 | < 0.001 |
| Genuineness (song) | Single (1, 1) | 0.43 | 0.40 | 0.45 | 2.48 | 3035 | 3036 | < 0.001 |
| Genuineness (song) | Average (1, k) | 0.60 | 0.57 | 0.62 | 2.48 | 3035 | 3036 | < 0.001 |
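In the one-way model, both ICCs depend only on the F ratio (F = MS_between / MS_within) and the number of ratings per stimulus k. Dividing the usual formulas through by MS_within gives ICC(1, 1) = (F − 1) / (F + k − 1) and ICC(1, k) = (F − 1) / F. The value of k can be read from the degrees of freedom, since df1 = n − 1 and df2 = n(k − 1). The sketch below uses these identities as a consistency check on the validity and test-retest ICC tables. The reported values are transcribed from those tables; the helper name is illustrative.

```python
from typing import Tuple


def icc_from_f(f_value: float, df1: int, df2: int) -> Tuple[float, float, int]:
    """Recover ICC(1,1), ICC(1,k), and k from a one-way ANOVA F test."""
    n = df1 + 1                      # number of stimuli
    if df2 % n:
        raise ValueError("df2 is not a multiple of n; ratings per stimulus unequal?")
    k = df2 // n + 1                 # ratings per stimulus, since df2 = n(k - 1)
    single = (f_value - 1) / (f_value + k - 1)
    average = (f_value - 1) / f_value
    return single, average, k


# (label, F, df1, df2, reported ICC(1,1), reported ICC(1,k))
ROWS = [
    ("validity intensity (speech)", 3.84, 4319, 38880, 0.22, 0.74),
    ("validity intensity (song)", 3.63, 3035, 27324, 0.21, 0.72),
    ("validity genuineness (speech)", 1.73, 4319, 38880, 0.07, 0.42),
    ("validity genuineness (song)", 1.71, 3035, 27324, 0.07, 0.42),
    ("retest intensity (speech)", 2.71, 4319, 4320, 0.46, 0.63),
    ("retest intensity (song)", 2.70, 3035, 3036, 0.46, 0.63),
    ("retest genuineness (speech)", 2.42, 4319, 4320, 0.42, 0.59),
    ("retest genuineness (song)", 2.48, 3035, 3036, 0.43, 0.60),
]

for label, f, df1, df2, rep_single, rep_avg in ROWS:
    single, average, k = icc_from_f(f, df1, df2)
    ok = abs(single - rep_single) < 0.01 and abs(average - rep_avg) < 0.01
    print(f"{label}: k={k}, ICC(1,1)={single:.3f}, ICC(1,k)={average:.3f}, matches={ok}")
```

The recovered k is 10 for the validity task, matching the ten ratings per recording, and 2 for the test-retest task. This is consistent with each stimulus being rated twice.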