
Modal and non-modal voice quality classification using acoustic and electroglottographic features.

Michal Borsky, Daryush D Mehta, Jarrad H Van Stan, Jon Gudnason.   

Abstract

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set, which included glottal source features, frequency-warped cepstrum, and harmonic model features, against mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. Classification was performed with support vector machine, random forest, deep neural network, and Gaussian mixture model classifiers, all built to be speaker independent using a leave-one-speaker-out strategy. The best classification accuracy, 79.97%, was achieved with the full COVAREP set. The harmonic model features were the best-performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify the modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. The EGG waveform yielded reduced classification performance.
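The leave-one-speaker-out strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the nearest-centroid classifier is a stand-in for the classifiers named above, the two-dimensional feature vectors are made-up toy values rather than MFCC or COVAREP features, and all function names are hypothetical. It only shows how every speaker is held out once so that no speaker appears in both training and test data.

```python
from collections import defaultdict


def leave_one_speaker_out(speakers):
    """Yield (held_out, train_idx, test_idx), holding out each speaker once."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def fit_centroids(features, labels):
    """Compute the mean feature vector per class (stand-in classifier)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def loso_accuracy(features, labels, speakers):
    """Pool predictions over all held-out speakers and return overall accuracy."""
    correct = 0
    for _, train, test in leave_one_speaker_out(speakers):
        centroids = fit_centroids([features[i] for i in train],
                                  [labels[i] for i in train])
        correct += sum(predict(centroids, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Toy data (assumed values): three speakers, one modal and one breathy token each.
features = [[0.0, 0.1], [1.0, 0.9], [0.2, 0.0], [0.9, 1.1], [0.1, 0.2], [1.1, 1.0]]
labels = ["modal", "breathy", "modal", "breathy", "modal", "breathy"]
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]

accuracy = loso_accuracy(features, labels, speakers)
print(f"leave-one-speaker-out accuracy: {accuracy:.2f}")
```

Pooling correct predictions across folds, as done here, is one common way to report a single accuracy figure; the abstract does not state how the per-speaker results were aggregated.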


Keywords:  COVAREP; Consensus Auditory-Perceptual Evaluation of Voice; acoustics; electroglottograph; glottal inverse filtering; mel-frequency cepstral coefficients; modal voice; non-modal voice; voice quality assessment

Year:  2017        PMID: 33748320      PMCID: PMC7971071          DOI: 10.1109/taslp.2017.2759002

Source DB:  PubMed          Journal:  IEEE/ACM Trans Audio Speech Lang Process


