Katrin D Bartl-Pokorny, Florian B Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W Schuller.
Abstract
COVID-19 is a global health crisis that has been affecting our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. Many symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the first time, the present study investigates voice acoustic correlates of a COVID-19 infection based on a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /u:/, /o:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels, indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in front vowels are additionally reflected in fundamental frequency variation and the harmonics-to-noise ratio, and group differences in back vowels in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Our findings represent an important proof-of-concept contribution for a potential voice-based identification of individuals infected with COVID-19.
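The group comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the two-sided normal approximation of the Mann-Whitney U test with tie correction and the common effect-size convention r = |z| / sqrt(N), which the abstract does not spell out. The function name `mann_whitney_r` and the sample values are hypothetical.

```python
import math
from collections import Counter


def mann_whitney_r(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, z, p, r), where r = |z| / sqrt(n1 + n2) is an assumed
    effect-size convention, not one confirmed by the source.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))

    # Assign mid-ranks so that tied values share the same average rank.
    rank_of = {}
    start = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = start + (c - 1) / 2
        start += c

    # U statistic of the first group from its rank sum.
    rank_sum_x = sum(rank_of[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    tie_term = sum(c**3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    r = abs(z) / math.sqrt(n)
    return u, z, p, r


# Hypothetical feature values for two small groups (not study data).
negative = [1.0, 2.0, 3.0]
positive = [4.0, 5.0, 6.0]
u, z, p, r = mann_whitney_r(negative, positive)
```

For samples as small as 11 per group, an exact test may give slightly different p-values than this approximation.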
Year: 2021 PMID: 34241490 PMCID: PMC8269757 DOI: 10.1121/10.0005194
Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840
Vowel-wise acoustic features with a differentiation effect r > 0.3 between COVID-19 negative and COVID-19 positive participants, ranked according to the effect size r. r is rounded to two decimal places. p-values of the underlying Mann-Whitney U tests rounded to three decimal places are given as well. Level of significance after Bonferroni correction: 7 × 10⁻⁴. A1 = relative amplitude of first harmonic, AF3 = amplitude of third vowel formant, B1,2,3 = bandwidth of first, second, and third vowel formant, fo = fundamental frequency, F3 = frequency of third vowel formant, HNR = harmonics-to-noise ratio, MFCC1,2,4 = first, second, and fourth Mel-frequency cepstral coefficient, pctl = percentile, pctlrg = percentile range, SDnorm = standard deviation normalised by the arithmetic mean (coefficient of variation), VR = voiced regions.
| Vowel | Rank | Feature | r | p |
|---|---|---|---|---|
| /i:/ | 1 | voiced segments per second | 0.46 | 0.030 |
| | 2 | mean local shimmer | 0.43 | 0.042 |
| | 3 | mean voiced segment length | 0.42 | 0.049 |
| | 4 | mean rising slope | 0.41 | 0.057 |
| | 5 | rising slope | 0.39 | 0.066 |
| | 6 | harmonic difference | 0.38 | 0.076 |
| | 7 | MFCC2 VR SDnorm | 0.36 | 0.088 |
| | 8 | HNR SDnorm | 0.35 | 0.101 |
| | 9 | | 0.34 | 0.115 |
| | 10 | voiced segment length SD | 0.33 | 0.118 |
| | 11 | | 0.32 | 0.131 |
| | 12 | MFCC1 VR SDnorm | 0.31 | 0.149 |
| /e:/ | 1 | mean voiced segment length | 0.51 | 0.017 |
| | 2 | mean local jitter | 0.49 | 0.022 |
| | 3 | | 0.48 | 0.026 |
| | 4 | voiced segments per second | 0.48 | 0.026 |
| | 5 | HNR SDnorm | 0.39 | 0.066 |
| | 6 | | 0.38 | 0.076 |
| | 7 | MFCC1 VR SDnorm | 0.34 | 0.115 |
| /u:/ | 1 | | 0.46 | 0.030 |
| | 2 | | 0.42 | 0.049 |
| | 3 | slope500–1500Hz VR SDnorm | 0.42 | 0.049 |
| | 4 | mean MFCC1 | 0.39 | 0.066 |
| | 5 | MFCC4 SDnorm | 0.39 | 0.066 |
| | 6 | MFCC1 VR SDnorm | 0.39 | 0.066 |
| | 7 | mean MFCC1 VR | 0.38 | 0.076 |
| | 8 | loudness pctl20 | 0.36 | 0.088 |
| | 9 | mean | 0.34 | 0.115 |
| | 10 | slope0–500Hz VR SDnorm | 0.34 | 0.115 |
| /o:/ | 1 | mean | 0.48 | 0.027 |
| | 2 | | 0.42 | 0.053 |
| | 3 | slope500–1500Hz VR SDnorm | 0.39 | 0.073 |
| | 4 | mean MFCC1 VR | 0.39 | 0.073 |
| | 5 | mean local shimmer | 0.35 | 0.113 |
| | 6 | | 0.35 | 0.113 |
| | 7 | falling slope loudness SD | 0.33 | 0.130 |
| | 8 | mean MFCC1 | 0.33 | 0.130 |
| | 9 | rising slope loudness SD | 0.32 | 0.149 |
| | 10 | mean MFCC2 VR | 0.32 | 0.149 |
| /a:/ | 1 | mean voiced segment length | 0.39 | 0.066 |
| | 2 | mean loudness | 0.38 | 0.076 |
| | 3 | mean MFCC2 VR | 0.38 | 0.076 |
| | 4 | loudness pctl50 | 0.35 | 0.101 |
| | 5 | spectral flux SDnorm | 0.35 | 0.101 |
| | 6 | mean MFCC2 | 0.34 | 0.115 |
| | 7 | mean Hammarberg index VR | 0.32 | 0.131 |
| | 8 | loudness pctl80 | 0.31 | 0.149 |
| | 9 | slope500–1500Hz VR SDnorm | 0.31 | 0.149 |
| | 10 | loudness peaks per second | 0.31 | 0.149 |
| | 11 | voiced segments per second | 0.31 | 0.149 |
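The role of the Bonferroni-corrected significance level in the table above can be illustrated with a short sketch. The threshold of 7 × 10⁻⁴ is taken from the caption; the number of tests behind it is not stated in this excerpt, so `bonferroni_threshold` is shown only as the generic rule (alpha divided by the number of tests), and its arguments in the example are hypothetical. The rank-1 p-values per vowel are copied from the table.

```python
def bonferroni_threshold(alpha, n_tests):
    """Generic Bonferroni rule: per-test level = alpha / number of tests."""
    return alpha / n_tests


# Corrected level as reported in the caption (not recomputed here).
REPORTED_THRESHOLD = 7e-4

# Smallest (rank-1) p-value per vowel, copied from the table above.
smallest_p = {
    "/i:/": 0.030,
    "/e:/": 0.017,
    "/u:/": 0.030,
    "/o:/": 0.027,
    "/a:/": 0.066,
}

# A feature counts as significant only if p falls below the corrected level.
significant = {vowel: p < REPORTED_THRESHOLD for vowel, p in smallest_p.items()}

# Hypothetical illustration of the rule itself: 10 tests at alpha = 0.05.
example_level = bonferroni_threshold(0.05, 10)
```

Under this check, none of the vowel-wise rank-1 features reaches the corrected level, which is why the table is organised around effect sizes rather than significance.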
Acoustic features with a differentiation effect r > 0.3 between COVID-19 negative and COVID-19 positive participants, ranked according to the effect size r for the combination of the front vowels /i:/ and /e:/, the back vowels /u:/ and /o:/, and all vowels. r is rounded to two decimal places. p-values of the underlying Mann-Whitney U tests rounded to three decimal places are given as well. Level of significance after Bonferroni correction: 7 × 10⁻⁴. A1 = relative amplitude of first harmonic, AF3 = amplitude of third vowel formant, B1,2,3 = bandwidth of first, second, and third vowel formant, fo = fundamental frequency, F3 = frequency of third vowel formant, HNR = harmonics-to-noise ratio, MFCC1,2 = first and second Mel-frequency cepstral coefficient, SDnorm = standard deviation normalised by the arithmetic mean (coefficient of variation), VR = voiced regions.
| Vowels | Rank | Feature | r | p |
|---|---|---|---|---|
| /i:/, /e:/ | 1 | voiced segments per second | 0.48 | 0.002 |
| | 2 | mean voiced segment length | 0.47 | 0.002 |
| | 3 | HNR SDnorm | 0.37 | 0.014 |
| | 4 | | 0.37 | 0.014 |
| | 5 | mean local shimmer | 0.35 | 0.022 |
| | 6 | | 0.32 | 0.036 |
| | 7 | harmonic difference | 0.31 | 0.038 |
| | 8 | MFCC1 VR SDnorm | 0.31 | 0.038 |
| | 9 | voiced segment length SD | 0.31 | 0.039 |
| /u:/, /o:/ | 1 | mean | 0.42 | 0.006 |
| | 2 | mean MFCC1 VR | 0.49 | 0.009 |
| | 3 | | 0.38 | 0.012 |
| | 4 | | 0.38 | 0.014 |
| | 5 | mean MFCC1 | 0.36 | 0.018 |
| | 6 | MFCC1 VR SDnorm | 0.34 | 0.026 |
| | 7 | mean local shimmer | 0.33 | 0.032 |
| | 8 | | 0.32 | 0.036 |
| | 9 | slope500–1500Hz VR SDnorm | 0.31 | 0.040 |
| | 10 | mean voiced segment length | 0.31 | 0.041 |
| | 11 | mean MFCC2 VR | 0.30 | 0.048 |
| all vowels | 1 | mean voiced segment length | 0.39 | 4 × 10⁻⁵ |
| | 2 | voiced segments per second | 0.38 | 8 × 10⁻⁵ |
FIG. 1. (Color online) Vowel-wise acoustic feature comparisons between COVID-19 negative (–COV) and COVID-19 positive (+COV) participants in the form of boxplots for features with a differentiation effect r > 0.4, ordered from left to right by decreasing r. The effect size r and the p-value of the Mann-Whitney U difference test are given above each boxplot. r is rounded to two decimal places. p is rounded to three decimal places. Level of significance after Bonferroni correction: 7 × 10⁻⁴. Outliers (marked with red plus symbols) are defined as values that are more than 1.5 times the interquartile range away from the bottom or top of the respective box. # = number of, B1,3 = bandwidth of first and third vowel formant, fo = fundamental frequency, F3 = frequency of third vowel formant, len. = length, pctlrg = percentile range, RS = rising slope, seg. = segment, ST = semitone from 27.5 Hz, SDnorm = standard deviation normalised by the arithmetic mean (coefficient of variation), slp = slope, VR = voiced regions.
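The outlier rule stated in the caption of FIG. 1 (values more than 1.5 times the interquartile range below the bottom or above the top of the box) can be sketched as follows. The quartile estimator is an assumption: plotting tools differ in how they interpolate quartiles, and the caption does not say which one was used. The function name `iqr_outliers` and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return values lying more than factor * IQR outside the box.

    Quartiles use the 'inclusive' method of statistics.quantiles,
    which is an assumption and may differ from the original plots.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical feature values (not study data).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = iqr_outliers(sample)
```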
FIG. 2. (Color online) Fundamental frequency (fo) trajectories for the production of the sustained vowel /e:/ by a 68-year-old male COVID-19 negative participant (–COV, green line) and a 54-year-old male COVID-19 positive participant (+COV, red line). ST = semitone from 27.5 Hz.
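The unit "semitone from 27.5 Hz" used for fo in the figures can be read as a logarithmic frequency scale with 12 semitones per octave and 27.5 Hz as the zero point. The conversion below is a sketch under that reading; the function name `hz_to_semitones` is hypothetical and not taken from the source.

```python
import math

REFERENCE_HZ = 27.5  # zero point of the semitone scale, from the captions


def hz_to_semitones(frequency_hz, reference_hz=REFERENCE_HZ):
    """Convert a frequency in Hz to semitones above the reference frequency.

    One octave (a doubling of frequency) corresponds to 12 semitones.
    """
    return 12.0 * math.log2(frequency_hz / reference_hz)


# Each doubling of the reference adds one octave, i.e. 12 semitones.
one_octave = hz_to_semitones(55.0)
two_octaves = hz_to_semitones(110.0)
```

On this scale, an fo of 110 Hz lies 24 semitones above the reference.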