Modal and non-modal voice quality classification using acoustic and electroglottographic features.

Michal Borsky, Daryush D Mehta, Jarrad H Van Stan, Jon Gudnason.

Abstract

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set, which included glottal source features, frequency-warped cepstrum, and harmonic model features, against the mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. The classification was done using support vector machines, random forests, deep neural networks, and Gaussian mixture model classifiers, which were built as speaker independent using a leave-one-speaker-out strategy. The best classification accuracy of 79.97% was achieved for the full COVAREP set. The harmonic model features were the best performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored at 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. Reduced classification performance was exhibited by the EGG waveform.
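The leave-one-speaker-out strategy mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature vectors are synthetic two-dimensional points rather than MFCC or COVAREP features, the speaker IDs are invented, and a simple nearest-centroid classifier stands in for the support vector machines, random forests, deep neural networks, and Gaussian mixture models used in the study. The only point is the fold structure: each fold holds out every recording of one speaker, so no speaker appears in both training and test data.

```python
import random
from collections import defaultdict


def leave_one_speaker_out(speakers):
    """Yield (held_out, train_idx, test_idx), holding out one speaker per fold."""
    by_speaker = defaultdict(list)
    for i, spk in enumerate(speakers):
        by_speaker[spk].append(i)
    for held_out in sorted(by_speaker):
        test_idx = by_speaker[held_out]
        train_idx = [i for spk, idx in by_speaker.items()
                     if spk != held_out for i in idx]
        yield held_out, train_idx, test_idx


def fit_centroids(X, y):
    """Compute the mean feature vector of each class (stand-in classifier)."""
    grouped = defaultdict(list)
    for x, label in zip(X, y):
        grouped[label].append(x)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def loso_accuracy(X, y, speakers):
    """Pool correct predictions over all folds and return overall accuracy."""
    correct = 0
    for _, train_idx, test_idx in leave_one_speaker_out(speakers):
        centroids = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic, well-separated data: 4 hypothetical speakers, 4 voice qualities,
# 3 recordings each. These numbers are NOT taken from the study.
rng = random.Random(0)
LABELS = ["modal", "breathy", "strained", "rough"]
X, y, speakers = [], [], []
for spk in range(4):
    for k, label in enumerate(LABELS):
        for _ in range(3):
            X.append([k * 5.0 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
            y.append(label)
            speakers.append(f"spk{spk}")

accuracy = loso_accuracy(X, y, speakers)
print(f"leave-one-speaker-out accuracy: {accuracy:.2%}")
```

Grouping the folds by speaker rather than by recording is what makes the evaluation speaker independent: a random split over recordings would let the classifier see other recordings of the test speaker during training.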

Keywords: COVAREP; Consensus Auditory-Perceptual Evaluation of Voice; acoustics; electroglottograph; glottal inverse filtering; mel-frequency cepstral coefficients; modal voice; non-modal voice; voice quality assessment

Year: 2017 PMID: 33748320 PMCID: PMC7971071 DOI: 10.1109/taslp.2017.2759002

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process

20 in total

1. Vocal projection in actors: the long-term average spectral features that distinguish comfortable acting voice from voicing with maximal projection in male actors.

Authors: Rachel Pinczower; Jennifer Oates
Journal: J Voice Date: 2005-09 Impact factor: 2.009

Review 2. Automatic detection of laryngeal pathologies in records of sustained vowels by means of mel-frequency cepstral coefficient parameters and differentiation of patients by sex.

Authors: R Fraile; N Sáenz-Lechón; J I Godino-Llorente; V Osma-Ruiz; C Fredouille
Journal: Folia Phoniatr Logop Date: 2009-07-01 Impact factor: 0.849

3. Spectral- and cepstral-based acoustic features of dysphonic, strained voice quality.

Authors: Soren Y Lowell; Richard T Kelley; Shaheen N Awan; Raymond H Colton; Natalie H Chan
Journal: Ann Otol Rhinol Laryngol Date: 2012-08 Impact factor: 1.547

4. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease.

Authors: Athanasios Tsanas; Max A Little; Patrick E McSharry; Jennifer Spielman; Lorraine O Ramig
Journal: IEEE Trans Biomed Eng Date: 2012-01-09 Impact factor: 4.538

5. Reliability of clinician-based (GRBAS and CAPE-V) and patient-based (V-RQOL and IPVI) documentation of voice disorders.

Authors: Michael P Karnell; Sarah D Melton; Jana M Childes; Todd C Coleman; Scott A Dailey; Henry T Hoffman
Journal: J Voice Date: 2006-07-05 Impact factor: 2.009

Review 6. Perceptual evaluation of voice quality: review, tutorial, and a framework for future research.

7. Modeling the glottal volume-velocity waveform for three voice types.

8. Laryngeal aerodynamics associated with selected voice disorders.

9. Cepstral analysis of hypokinetic and ataxic voices: correlations with perceptual and other acoustic measures.

Review 10. Evidence-based clinical voice assessment: a systematic review.