Martine Van Puyvelde, Xavier Neyt, Francis McGlone, Nathalie Pattyn.
Abstract
People rely on speech for communication, in both personal and professional contexts, and often under different conditions of physical, cognitive and/or emotional load. Since vocalization is entirely integrated within both our central (CNS) and autonomic nervous system (ANS), a growing number of studies have examined the relationship between voice output and the impact of stress. In the current paper, we will outline the different stages of voice output, i.e., breathing, phonation and resonance, in relation to a neurovisceral integrated perspective on stress and human performance. In reviewing the function of these three stages of voice output, we will give an overview of the voice parameters encountered in studies on voice stress analysis (VSA) and review the impact of the different types of physiological, cognitive and/or emotional load. In the section "Discussion," with regard to physical load, a competition between the ventilation processes required to speak and those required to meet the metabolic demand of exercised muscles is described. With regard to cognitive and emotional load, we will present the "Model for Voice and Effort" (MoVE), which comprises the integration of ongoing top-down and bottom-up activity under different types of load and combined patterns of voice output. In the MoVE, it is proposed that the fundamental frequency (F0) values as well as jitter give insight into bottom-up/arousal activity and the effort a subject is capable of generating, whereas the F0 range and variance are related to ongoing top-down processes and the amount of control a subject can maintain. Within the MoVE, a key role is given to the anterior cingulate cortex (ACC), which is known to be involved both in the equilibration between bottom-up arousal and top-down regulation and in vocal activity. Moreover, the connectivity between the ACC and the nervus vagus (NV) is underlined as an indication of the importance of respiration. Since respiration is the driving force of both stress and voice production, it is hypothesized to be the missing link in our understanding of the underlying mechanisms of the dynamic between speech and stress.
Keywords: Model for Voice and Effort; bottom-up and top-down modeling; human performance; stress; voice output; voice stress analysis
Year: 2018 PMID: 30515113 PMCID: PMC6255927 DOI: 10.3389/fpsyg.2018.01994
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Glossary of abbreviations.
| Abbreviation | Name | Description |
|---|---|---|
| ACC | Anterior cingulate cortex | The frontal part of the cingulate cortex. |
| ANS | Autonomic nervous system | A division of the peripheral nervous system, which is itself part of the nervous system. |
| AR | Articulation rate | SPP divided by the total length of the sample minus the duration of pauses. |
| BP | Blood pressure | Pressure of the blood measured in the arteries. |
| CNS | Central nervous system | Part of the nervous system that contains the brain and spinal cord. |
| EEG | Electroencephalography | The electrophysiological monitoring of electrical activity of the brain. |
| GSR | Galvanic skin response | The monitoring of electrodermal activity as a reflection of sympathetic activity. |
| HNR | Harmonic to noise ratio | Indicator of the amount of periodicity against aperiodicity in the voice. |
| HR | Heart rate | The number of heartbeats per unit of time. |
| HRF | Harmonic richness factor | The ratio of the sum of the amplitudes of the harmonics and the amplitude of the component at the fundamental frequency. |
| HRV | Heart rate variability | The natural variability in the heart rate under the influence of the autonomic nervous system. |
| IP | Inappropriate pauses | Number of inappropriate breathing pauses within one phrase. |
| MFCC | Mel-frequency cepstral coefficients | The coefficients that make up the Mel-Frequency Cepstrum. |
| NAQ | Normalized amplitude quotient | Indicator for breathiness of the voice: ratio of the maximum peak-to-peak amplitude of the glottal flow to the minimum of the glottal flow derivative, normalized by the fundamental period and the sampling frequency. |
| NV | Nervus vagus | Cranial nerve X. |
| OCQ | Open and closing quotients | Timing of opened and closed phases of glottal waveform. |
| SNR | Signal to noise ratio | Indicator of the amount of periodicity against aperiodicity in the voice. |
| SPP | Syllables per phrase | Number of used syllables between two inspirations. |
| STAI | State-Trait Anxiety Inventory | Questionnaire to measure anxiety states and traits. |
| VO2/VO2max | Oxygen consumption/maximal oxygen consumption | Oxygen consumption/the maximum volume of oxygen the body can consume during intense exercise. |
| VOT | Voice onset time | The time interval between the release of a plosive such as ‘p,’ ‘b’ and the beginning of the vocal fold vibration associated with the subsequent vowel. |
| VSA | Voice stress analysis | The technique to analyze the impact of stress on the voice output. |
| VSSR | Vibration space shift rate | The widest vibration space of the voice during a baseline (standard vibration space) compared with that encountered during a target situation. |
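
Three of the glossary entries above (AR, HRF and NAQ) are stated as ratios, so they can be written out directly. The following is a minimal sketch that follows the glossary wording only; the function names, the input layout (a list of harmonic amplitudes with the F0 component first, one glottal cycle of flow samples) and the use of a first difference as the flow derivative are assumptions made for illustration, not the procedures of the reviewed studies.

```python
from typing import Sequence


def articulation_rate(spp: float, total_s: float, pause_s: float) -> float:
    """AR as glossed above: SPP / (total sample length - duration of pauses)."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("pauses must be shorter than the whole sample")
    return spp / speaking_s


def harmonic_richness_factor(amplitudes: Sequence[float]) -> float:
    """HRF: summed harmonic amplitudes over the amplitude at F0.

    Assumption: amplitudes[0] is the component at F0, the remaining entries
    are the higher harmonics, and all values are on a linear scale.
    """
    fundamental, harmonics = amplitudes[0], amplitudes[1:]
    if fundamental <= 0:
        raise ValueError("the F0 component must have a positive amplitude")
    return sum(harmonics) / fundamental


def normalized_amplitude_quotient(
    flow: Sequence[float], fs_hz: float, f0_hz: float
) -> float:
    """NAQ for one glottal cycle of flow samples (hypothetical input layout)."""
    peak_to_peak = max(flow) - min(flow)
    # First difference scaled by the sampling frequency approximates d(flow)/dt.
    derivative = [(b - a) * fs_hz for a, b in zip(flow, flow[1:])]
    steepest_closing = abs(min(derivative))
    if steepest_closing == 0:
        raise ValueError("flow has no closing phase")
    period_s = 1.0 / f0_hz
    return peak_to_peak / (steepest_closing * period_s)
```

Lower NAQ values from this sketch correspond to a steeper closing phase relative to the pulse amplitude; how that maps onto breathy or pressed voice quality should be taken from the cited studies rather than from this code.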
Overview of the speech variables related to their respective stage in the process of speech production (i.e., breathing, phonation and resonance) and the stressors that impact them.
| Speech parameters | Speech process | Stressors impact | Remarks |
|---|---|---|---|
| Respiration rate; articulation rate; word duration; vowel duration; respiration time between words or sentences; voice onset time (VOT); voicing and voiceless transients | Breathing | Physical load | Competition between the ventilation processes for speech and the metabolic demands of the muscles: inappropriate breathing pause placements. |
| | | Acute hypoxia | Different impact on speech between chronic and acute hypoxia. |
| | | Alcohol | Slurred speech. |
| | | Emergencies | Faster articulation rate. |
| Mean F0 SD; min to max range; F0 peaks; F0 floor values; relative average perturbation | Phonation | Physical activity | Fatigue vs. metabolic response? |
| | | Acute hypoxia | Difference between chronic and acute hypoxia. |
| | | Alcohol | Replication study needed. |
| | | Sleep deprivation | Impact in correspondence with circadian rhythm. |
| | | Emergencies | Real-life stress clear impact but influence of voluntary control. |
| | | Cognitive workload | Challenge to differentiate between emotional and cognitive load. |
| | | Different types of emotions | Variable results. |
| Jitter; shimmer | Phonation | Emergencies | Decreased jitter: only one study with |
| | | Cognitive workload | Decreased jitter; decreased shimmer. |
| Signal to noise ratio (SNR); harmonic to noise ratio (HNR) | Phonation | Alcohol | Strong indicator in combination with F0. |
| Harmonic Richness Factor | Phonation | Physical activity | Subject dependent. |
| Harmonics | Phonation | Different emotions: anger, neutral with little sadness and loudness | Only one study found. |
| Formants | Resonance | Physical load | Only one study, with many non-responders. |
| | | Emotional load – Emergency | Significant variations between stress and non-stress, but not for all vowel types, and with different directions of variation depending on vowel type under stress arousal. |
| | | Cognitive load | F1, F2, and F3 are vowel specific. The F1/F2 ratio has potential to differentiate between low and high cognitive load. |
| Glottal flow | Resonance | Physical load | Increased open and closing quotient indicative of a breathy voice – decreased open and closing quotient of a pressed voice. |
| NAQ | Resonance | Physical load | Potential for NAQ – F0 combination. |
| MFCC | Resonance | Sleep deprivation | Circadian pattern. |
| | | Emotional load | Vowel-dependent? Important to preselect appropriate mel-filters. |
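
The perturbation parameters in the overview above (jitter, shimmer, relative average perturbation) are not defined in this text. As a hedged illustration, the sketch below uses the commonly cited cycle-to-cycle definitions (as found, for example, in Praat's documentation): local jitter and shimmer as the mean absolute difference between consecutive cycles divided by the mean, and RAP as the mean deviation of each period from its three-point moving average. The function names and the assumption that glottal periods and peak amplitudes have already been extracted are illustrative only.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, relative to the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return _mean(diffs) / _mean(values)


def jitter_local(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the glottal period (fraction, not %)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitude (fraction, not %)."""
    return local_perturbation(peak_amplitudes)


def relative_average_perturbation(periods_s: Sequence[float]) -> float:
    """RAP: deviation of each period from its 3-point moving average."""
    if len(periods_s) < 3:
        raise ValueError("need at least three cycles")
    deviations = [
        abs(periods_s[i] - (periods_s[i - 1] + periods_s[i] + periods_s[i + 1]) / 3)
        for i in range(1, len(periods_s) - 1)
    ]
    return _mean(deviations) / _mean(periods_s)
```

Multiplying the returned fractions by 100 gives percentages comparable in form to the jitter values quoted in the study tables, although the individual studies may have used other jitter variants.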
Studies on the impact of physical load on voice and speech production.
| Study | Speech process | Subjects | Context | Task | Speech measures | Results |
|---|---|---|---|---|---|---|
| | Breathing | | Laboratory | Aerobic task with progressive workload at 50% and 75% of VO2max: speaking and no-speaking condition. Baseline with six additional time points at 50% of VO2max and two at 75% of VO2max. Speech task: 15 s standardized novel fragment every 3 min. | SPP, AR, IP | • SPP decreased in the 50% and 75% of VO2max speaking tasks. • IP increased in the 50% and 75% of VO2max speaking tasks. • AR: no change. |
| | Phonation, Resonance | | Laboratory | 35 standard speech sentences. Physical activity on an elliptical stair stepper. | F0, F0 SD, utterance duration, voiced – non-voiced frames, formants | • Speaker independent correlates: percentage of voicing (decrease in 88.2% of the participants). • Speaker dependent correlates: F0 (increase in 60.8%, decrease in 13.6% and no change in 25.5% of the participants), F0 SD (no significant impact), utterance duration (50–50%), glottal waveform and formant parameters (significant shift in F1 but many non-responders). |
| | Phonation, Resonance | | Laboratory | Five repeated series of eight vowel-consonant-vowel (VCV) and eight consonant-vowel (CV) utterances at baseline (BL, seated) and during physical load. | F1, F2, OQ | • F1: interaction effect between speaker and physical load. • F2: main effect of physical load. • OQ: interaction between speaker, vowel, and physical load. |
| | Phonation, Resonance | | Laboratory | 65 readings of 15 s. (Non)native read and spontaneous speech. Maintaining 10 mph on an elliptical stair stepper. | NAQ, HRF, F0 | Correlation between F0, NAQ, and HRF shift: a shift in F0 on the entire sample showed significant correlations with a NAQ shift ( |
| | Phonation | | Laboratory | Standard cycle test with progressively increased load until the breaking point of exhaustion. Speech task: counting 1–10. | F0 | Increased F0, only at submaximal and maximal effort. |
| | Breathing, Phonation | | Laboratory | Incremental treadmill test with 4 min exercise – 15 min pause intervals. Speech test: 3–5 s single ‘a.’ | F0 | Linear relationship between F0 and physical load in terms of dyspnea, oxygen consumption (VO2) and ventilation (VE). Anxiety creates a ceiling effect (i.e., higher F0 onset in anxious state). |
Deleterious impacts on human performance: studies on the impact of alcohol, sleep deprivation and hypoxia on voice and speech production.
| Study | Speech process | Subjects | Context | Task | Speech measures | Results |
|---|---|---|---|---|---|---|
| | Phonation | | Real-life | No task. Speech samples of the captain engaged in the Exxon Valdez oil disaster 33 h before, 1 h before and 1 h after the disaster. | Speech rate, articulatory errors | • Fewer syllables per hour. • Increased speech errors. |
| | Phonation, Resonance | | Laboratory | Reading task: text in sober and intoxicated condition. | F0 and SNR, F1/F2 ratio | • Combination of F0 and SNR robust detector, 0% error rate; F0 2.8%; SNR 3.2%. • The F1/F2 ratio responded only in high intoxication. |
| | Breathing, Phonation | | Laboratory | Three separate reading sessions: sober, mild, moderate intoxication with an interval of 48 h. Reading task: linguistic passage of 613 words. | Reading time, interjections, omissions | • Increased mean reading time. • Increased interjections. • Increased omissions. |
| | Phonation | | Laboratory | Three separate reading sessions: sober, moderate, high intoxication. | Reading rate, amplitude in dB, F0 | • Decreased reading rate. • Decreased amplitude (from sober to moderate). • No impact on F0. |
| | Phonation | | Laboratory | Reading task. | F0, articulation rate | • F0 mixed results. • Decreased articulation rate. |
| | Resonance | | Laboratory | List of 31 words read at six time points (10:00 AM, 4:00 PM, 10:00 PM, 4:00 AM, 10:00 AM, and 4:00 PM) through a 34 h sleep deprivation period. | SAFTE sleep reports; MFCC: 12000 formant frequencies | • Correlation between fatigue score and Mel-frequency cepstral coefficients (MFCCs). • Circadian periodicity in both sleep and voice measures. • Character “p” particularly sensitive. |
| | Phonation | | Laboratory | • Sleep deprivation 36 h, some naps allowed. • Speech: semi-standard sentence including standard words (e.g., ‘Futility Magelan’). • Speech: non-standard words (e.g., the pilot’s name and zulu-time). • Cognitive: matrix comparison task, a logical reasoning task, a tracking task, attention switching task and a recognition task. | F0, word duration | Circadian pattern in cognitive performance and voice aspects: during early A.M. hours lowest cognitive performance – increased F0 – decreased word duration. |
| | Breathing, Phonation | Chronic hypoxia | Real-life | Short text: phonetically balanced folk tale (about six sentences, “North Wind and the Sun”). | Articulation rate, transient segment rate | • Articulation rate: no effect. • Transient segment rate: decrease and similar shape but with a 30 days shift delay with regard to SO2-dip. |
| | Breathing, Phonation | Chronic hypoxia | Real-life | Cognitive test battery for Parkinson Disease patients: sorting card tests – Wisconsin Card Sorting Test, the ‘Odd-Man-Out test’ (OMO test), Mini-Cog Quick Assessment Battery of | VOT, vowel duration, comprehension errors | • Decreased VOT separation time. • Increased vowel duration. • Increased comprehension errors. |
| | Breathing, Phonation | Acute hypoxia | Real-life | No task. Fatal air crash. | F0, VOT | • Decreased F0. • Increased VOT. |
Studies on the impact of emotional load on voice and speech production.
| Study | Speech process | Subjects | Context | Task | Speech measures | Results |
|---|---|---|---|---|---|---|
| | Phonation | | Real-life | No task. Crash vs. routine check. | F0 | • Increased F0: 115–163 Hz. • Increased F1: 510–537 Hz. |
| | Phonation | | Real-life and laboratory | No task. Pilot communication. | F0, F0 SD, jitter | • Increased F0: 95–148 Hz; 101–123 Hz; 149–264 Hz. • Increased F0 SD: 12.9–23.7 Hz; 12.6–63.8 Hz; 30.1–66.0 Hz. • Decreased jitter: 1.90–1.53%. |
| | Breathing, Phonation | | Real-life | No task. Phone emergency calls in function of different types of emotions. | F0, F0 range, speech rate (i.e., syllables per s), maximal peak frequencies | • Fear: increased F0, F0 range and speech rate with high maximal peak frequencies. • Anger – irritation: increased F0 and F0 range. |
| | Breathing, Phonation | 400 F0 contours (number of participants not mentioned) | Real-life | Stress conditions from SUSAS corpus for anger. | | • Increased F0, F0-variance and F0-range. • Increase in formants F1 and F2. • Increased vowel duration and increased word intensity. |
| | Phonation | | Real-life | No task. Pilot communication of 14 aircraft accidents. | VSSR | A higher VSSR at the start of the emergency communication was related to a more critical/fatal accident. |
| | Phonation | | Real-life | No task. Crash vs. routine check. | Speech rate, syllable count, F0, F0-range | • Speaking rate: little impact. • Syllable count significantly decreased. • Increased mean F0: 123.9–200.1 Hz. • Increased F0 range from 124.2 to 297.3 Hz. |
| | Phonation | | Real-life | No task. Interview. | Galvanic skin response; mean and SD of F0, RAP and jitter | • Negative correlation between GSR and F0 SD. • Increased jitter levels in emotional load speech fragments. |
| | Phonation, Resonance | | Real-life | No task. Three stress stages before crash: Stress 0 (neutral), Stress 1 (first incident), Stress 2 (final incident before a crash). | F0, formants | • F0-increase during stress 1 in pilot (117–150 Hz) and only during stress 2 in co-pilot (to 204 Hz with a maximal frequency of 340 Hz). • Significant increase in F2 in the pilot and a significant decrease in F3 in the co-pilot. |
| | Phonation, Resonance | | Real-life | Exam stress – public presentation. | F0, F0 variance, MFCC, formants | • Increased F0 and F0 variance. • Increased F1 and F2 frequencies. • Decreased mel-cepstral coefficients. |
| | Phonation | | Real-life | No task. Radio communication during. | F0, speech rate, amplitude | • System operator: small decrease in F0 (138–136 Hz), stable F0-range (22 Hz), decreased F0 max (202–197 Hz), decreased speech rate (4.6–4.1 words/s), stable amplitude. • Superior: small increase in F0 (147–155 Hz) and F0-range (20–26 Hz), increased F0 max (193–218 Hz), decreased speech rate (5.3–4.8 words/s), increased amplitude. |
| | Phonation | | Real-life | No task. 1968: Crash vs. routine check. 1972: radio reporter. | F0 | Increased F0: 208–432 Hz. |
Studies on the impact of cognitive load on voice and speech production.
| Study | Speech process | Subjects | Context | Task | Speech measures | Results |
|---|---|---|---|---|---|---|
| | Phonation | | Laboratory | Psychomotor tests, increased difficulty. • Level 1: counting from 1 to 10 for 10 times. • Level 2: psychomotor test (PMT) while counting 0–9. • Level 3: simple dichotic listening task (DLT) with vocalized responses. • Level 4: combined DLT-PMT task. | F0, amplitude in dB, word duration | Level 4 caused: • Significantly increased F0 (106.95–118.91 Hz). • Increased intensity (49.38–57.12 dB). • Decreased word duration (384.81–338.80 ms). |
| | Phonation | | Laboratory | Time-constraints in a read-aloud calculation task. | F0 | Mixed results: increased and decreased F0-patterns. |
| | Phonation | | Simulation flight | Flight simulation task with three levels of cognitive load: situation awareness, information processing and decision making. | F0, F0 range, amplitude | • Increased F0 and amplitude in function of cognitive load. • Decreased F0 range. |
| | Phonation | Not mentioned – model testing | Laboratory | • BL: questionnaire and 10 min relaxation. • Low cognitive load: Stroop-Word congruent color test and easy mental-math test. • High cognitive load: Stroop-Word incongruent color test and hard mental-math tests. | F0 prediction models | No reliable predictive models in F0. Voice stress is an individual dependent factor. |
| | Phonation | | Laboratory | Psychomotor tests (time pressure, problem solving test, sensorimotor coordination and a handgrip physical strength test) alternated with relaxation (nature images with composed music). | F0 | • Increased F0 in time pressure and problem solving tasks. |
| | Breathing, Phonation | | Laboratory | Stroop task with increasing difficulty: the time between presentations of the matched and non-matched color-word samples was shortened by half a second every minute. | F0, jitter, duration | • Significantly increased F0 (114.28–122.20 Hz) and F0-variation (7.36–10.11 Hz). • Significantly decreased jitter (1.24–0.94%). • High frequency energy more present in longer time-slots. • Decreased utterance duration. |
| | Phonation | | Laboratory | Stroop task. | F0, formants F1, F2, F3 | • Significant increase in F0 (127–164.8 Hz with a maximal peak value of 250 Hz). • Impact on formants vowel specific. |
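
The F0 shifts in the table above are reported in Hz. Because speakers differ in baseline pitch, it can help to re-express such shifts on a relative scale before comparing studies. The sketch below converts three of the reported before/after pairs into semitones and percent change; this conversion is a standard one added here for illustration and is not an analysis performed in the reviewed studies, and the dictionary labels are shorthand chosen for this example.

```python
import math


def semitone_shift(f0_before_hz: float, f0_after_hz: float) -> float:
    """F0 change in semitones: 12 * log2(after / before)."""
    if f0_before_hz <= 0 or f0_after_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_after_hz / f0_before_hz)


def percent_change(before: float, after: float) -> float:
    """Relative change in percent of the baseline value."""
    return 100.0 * (after - before) / before


# Before/after mean F0 values (Hz) as reported in the table above.
reported_f0_hz = {
    "combined DLT-PMT task (level 4)": (106.95, 118.91),
    "Stroop task with shortening intervals": (114.28, 122.20),
    "Stroop task": (127.0, 164.8),
}

for label, (before, after) in reported_f0_hz.items():
    print(
        f"{label}: {semitone_shift(before, after):+.2f} semitones, "
        f"{percent_change(before, after):+.1f}%"
    )
```

On this scale a doubling of F0 is exactly 12 semitones regardless of the speaker's baseline, which is what makes the values comparable across speakers with different habitual pitch.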
Studies on the impact of mixed cognitive and emotional load on voice and speech production.
| Study | Speech process | Subjects | Context | Task | Speech measures | Results |
|---|---|---|---|---|---|---|
| | Breathing, Phonation | | Laboratory | Counting during a computer tracking task. | F0, speaking rate, amplitude, vocal intensity, vocal jitter and shimmer and a derived measure (combination of all of the speech variables except vocal shimmer) | Only the results of • Increased F0 (i.e., 2 Hz) and amplitude (i.e., 1 dB) at the difficult task level in comparison with the easy task level. • Marginal decrease in jitter, shimmer, and speech rate (i.e., 4%). • Amplitude and heart rate showed an interaction effect with the emotional factor of reward: increased heart rate when closer to the point of winning extra money. |
| | Phonation | | Cognitive load: laboratory; emotional load: real-life | Students were classified in high, medium, or low anxiety based on the STAI-Trait Anxiety Inventory. • Backward reading of the alphabet. • Tongue-twisters under time-pressure (with and without delayed auditory feedback). • Sustained ‘a.’ | F0, jitter, shimmer, harmonic energy | • Increased F0. • Decreased values in jitter and shimmer. • Increase in high-frequency harmonic energy (1600–4500 Hz). |
| | Breathing, Phonation | | Laboratory | • Psychomotor cognitive tests: time pressure, problem solving test, sensorimotor coordination; handgrip physical strength test alternated with relaxation periods. • No particular inducement of emotional load, only a subjective self-rate. | Speech rate, energy below 500 Hz, F0 | • Increased speech rate in cognitive load but no impact of emotional load. • Decreased proportion of energy below 500 Hz during cognitive load but no impact of emotional load. • Increased F0 related to emotional load and not cognitive load. |
| | Phonation | | Both cognitive and emotional load: laboratory | • Classification based on anxiety coping style (low anxiety, high anxiety and anxiety deniers) – based on the combination of the scores of the Manifestation Anxiety Scale ( • Cognitive load: easy and difficult logical reasoning tasks. • Low and high emotional load: pictures of skin diseases/severe car accident injuries. | F0, F0 floor, formants F1/F2 | • Male – low and high anxiety traits: higher F0 values under cognitive load than under emotional load. • Anxiety deniers (both male and female): higher mean F0 under emotional than cognitive load. • F0 floor: increase in high emotional load in high-anxious subjects and anxiety deniers; decrease in high emotional load in low-anxious subjects. • Anxiety-denying women: increased distance between F1/F2 and decreased distance between F1/F2 in high emotional load. |
FIGURE 1. Switching zone of voice reactivity to physical load. As a consequence of a competition for ventilation processes, F0 increases in response to physical load from the point that this load is not well tolerated anymore (e.g., Johannes et al., 2007).
FIGURE 2. “Model for Voice and Effort” (MoVE). The MoVE shows how the activity of ongoing top-down (TD in the figure) and bottom-up (BU in the figure) processes is mirrored in the phonation parameters F0, F0-range and jitter. Increased F0-ranges correspond with reduced top-down processes, reaching an alarm zone when cognitive top-down control is lost, such as in life-threatening emergency situations (e.g., flight crash, alcohol intoxication). Decreased F0-ranges are a consequence of high cognitive load and top-down control. The additional information of mean F0 values gives insight into the bottom-up arousal activity and the effort a subject is capable of generating. Highly increased or decreased F0-values are indicative of effort depletion, also reaching an alarm zone in life-threatening emergency situations (e.g., flight crash, alcohol intoxication). Jitter expresses bottom-up arousal in an inverse manner. Cognitive and emotional load correspond with smaller and larger reductions in jitter, respectively.
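
The qualitative reading of the MoVE described in the caption above can be written down as a small lookup from the direction of change in each parameter. This is a minimal sketch of the caption's wording only: the `VoiceShift` container, the function name and the `extreme_f0_shift_hz` threshold are hypothetical, and the source gives no numeric cut-off for "highly increased or decreased" F0, so that value must be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class VoiceShift:
    """Change relative to a speaker's own baseline (hypothetical container)."""
    f0_mean_hz: float   # change in mean F0
    f0_range_hz: float  # change in F0 range
    jitter_pct: float   # change in jitter, in percentage points


def move_reading(shift: VoiceShift, extreme_f0_shift_hz: float) -> dict:
    """Map directions of change onto the MoVE's top-down/bottom-up reading."""
    if shift.f0_range_hz > 0:
        top_down = "reduced top-down control"
    elif shift.f0_range_hz < 0:
        top_down = "high top-down control"
    else:
        top_down = "unchanged"

    # Jitter mirrors bottom-up arousal inversely: less jitter, more arousal.
    if shift.jitter_pct < 0:
        bottom_up = "increased bottom-up arousal"
    elif shift.jitter_pct > 0:
        bottom_up = "decreased bottom-up arousal"
    else:
        bottom_up = "unchanged"

    # The threshold is an assumption; the caption only says "highly" shifted.
    effort_depletion = abs(shift.f0_mean_hz) >= extreme_f0_shift_hz
    return {
        "top_down": top_down,
        "bottom_up": bottom_up,
        "effort_depletion": effort_depletion,
    }
```

For example, `move_reading(VoiceShift(8.0, -3.0, -0.3), extreme_f0_shift_hz=50.0)` describes a pattern with a narrowed F0 range and reduced jitter, which the caption associates with high top-down control and raised arousal, as in the cognitive load studies.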