Maria E Powell, Marcelino Rodriguez Cancio, David Young, William Nock, Beshoy Abdelmessih, Amy Zeller, Irvin Perez Morales, Peng Zhang, C Gaelyn Garrett, Douglas Schmidt, Jules White, Alexander Gelbard.
Abstract
OBJECTIVE: Acoustic analysis of voice has the potential to expedite detection and diagnosis of voice disorders. Applying an image-based, neural-network approach to the acoustic signal may be an effective means of detecting and differentially diagnosing voice disorders. The purpose of this study is to provide a proof of concept that data embedded within human phonation can be accurately and efficiently decoded with deep learning neural network analysis to differentiate between normal and disordered voices.
Keywords: Voice disorders; acoustic analysis; classification; convolutional neural network; detection
Year: 2019 PMID: 31236467 PMCID: PMC6580072 DOI: 10.1002/lio2.259
Source DB: PubMed Journal: Laryngoscope Investig Otolaryngol ISSN: 2378-8038
Figure 1. To standardize input into the neural network, acoustic signals were segmented into 3‐second chunks (left) and transformed into spectrograms using the Fourier transform (middle). Spectrograms displayed frequency over time, with intensity coded by grayscale (right).
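The preprocessing described in Figure 1 can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: the function names, the frame length, the hop size, the Hann window, the log scaling, and the choice to drop a trailing partial chunk are all assumptions, since the record gives only the 3-second chunk length, the use of the Fourier transform, and the grayscale intensity coding.

```python
import cmath
import math


def split_into_chunks(samples, sample_rate, chunk_seconds=3.0):
    """Split a mono signal into consecutive fixed-length chunks.

    Assumption: a trailing remainder shorter than one chunk is dropped.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def magnitude_spectrogram(chunk, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes as rows=frequency, cols=time.

    frame_len and hop are illustrative values, not taken from the source.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(chunk) - frame_len + 1, hop):
        frame = [chunk[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of one windowed frame (slow but explicit).
        column = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(column)
    # Transpose so that each row is one frequency bin over time.
    return [list(row) for row in zip(*frames)]


def to_grayscale(spec):
    """Map log-magnitudes linearly onto integer gray levels 0..255."""
    log_spec = [[math.log10(v + 1e-10) for v in row] for row in spec]
    lo = min(min(row) for row in log_spec)
    hi = max(max(row) for row in log_spec)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((v - lo) * scale)) for v in row] for row in log_spec]
```

The explicit DFT loop is there only to keep the sketch free of dependencies. In practice a library FFT such as `numpy.fft.rfft` would replace it, and real recordings would need a much longer frame than 64 samples.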
Total Sample Size for Each Group and the Derived Baseline Accuracy for Each Classification Task.
| Diagnostic Group | Normal | ADSD | ETV | MTD | PCord | UVFP | Polyp | RRP | Total |
|---|---|---|---|---|---|---|---|---|---|
| Sample size (n) | 45 | 56 | 74 | 49 | 54 | 59 | 56 | 58 | 451 |
| Naïve algorithm (%) | – | 56/101 (55%) | 74/119 (62%) | 49/94 (52%) | 54/99 (55%) | 59/104 (57%) | 56/101 (55%) | 58/103 (56%) | – |
Normal = vocally healthy; ADSD = adductor spasmodic dysphonia; ETV = essential tremor of voice; MTD = muscle tension dysphonia; PCord = polypoid corditis or Reinke's edema; Polyp = vocal fold polyp; RRP = recurrent respiratory papillomatosis; UVFP = unilateral vocal fold paralysis.
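The naïve-algorithm figures in the table are consistent with a majority-class baseline: for each binary task, always predict the larger of the two groups. The sketch below reproduces the percentages under that reading. The interpretation and the names `NORMAL_N`, `DISORDER_N`, and `naive_baseline` are assumptions for illustration; the counts are taken from the table.

```python
# Group sizes from the table; the Normal group is shared by all seven tasks.
NORMAL_N = 45
DISORDER_N = {
    "ADSD": 56, "ETV": 74, "MTD": 49, "PCord": 54,
    "UVFP": 59, "Polyp": 56, "RRP": 58,
}


def naive_baseline(n_disorder, n_normal=NORMAL_N):
    """Accuracy of always predicting the larger class in a binary task."""
    return max(n_disorder, n_normal) / (n_disorder + n_normal)


for name, n in DISORDER_N.items():
    majority = max(n, NORMAL_N)
    print(f"{name}: {majority}/{n + NORMAL_N} ({naive_baseline(n):.0%})")
```

A trained classifier is informative only to the extent that its validation accuracy exceeds this baseline for the same task.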
Figure 2. Summary of the Keras convolutional neural network models trained for the seven binary classification tasks. Conv2D = 2D convolutional layer; MaxPooling2D = 2D max‐pooling layer.
Figure 3. Spectrograms of all audio files from vocally healthy individuals (left) and patients with adductor spasmodic dysphonia (right). The fifth validation fold classified all spectrograms from normal subject 5 and ADSD patient 5. Frames used in the binary classification task are surrounded by lines. Synthetic images derived from all other organic spectrograms were used to train the model. ADSD = adductor spasmodic dysphonia.
Figure 4. Accuracy and loss results for the fifth fold (best case) from the ADSD diagnostic category. Baseline accuracy and the accuracy and loss results from the highest-performing epoch (epoch 10) are labeled. ADSD = adductor spasmodic dysphonia.
Figure 5. Average results of all folds obtained from 10‐fold cross validation for the binary classification of (a) adductor spasmodic dysphonia, (b) polypoid corditis or Reinke's edema, (c) unilateral vocal fold paralysis, (d) vocal fold polyp, (e) recurrent respiratory papillomatosis, (f) essential tremor of voice, and (g) muscle tension dysphonia. Baseline accuracies and the accuracy and loss results from the highest-performing epochs are labeled for each model.
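One plausible way to obtain the fold-averaged curves and the highest-performing epoch described in Figure 5 is sketched below. It is an assumption that the curves were averaged epoch by epoch and that the best epoch was picked by mean validation accuracy. The helper names are hypothetical, and the numbers are placeholders for illustration only, not results from the study.

```python
def mean_curve(fold_curves):
    """Average per-epoch values across folds (all folds equally long)."""
    n_folds = len(fold_curves)
    return [sum(values) / n_folds for values in zip(*fold_curves)]


def best_epoch(mean_accuracy):
    """Return (1-based epoch, accuracy) with the highest mean accuracy."""
    idx = max(range(len(mean_accuracy)), key=mean_accuracy.__getitem__)
    return idx + 1, mean_accuracy[idx]


# Placeholder validation accuracies: 3 folds x 4 epochs (made-up values).
fold_accuracy = [
    [0.50, 0.60, 0.75, 0.65],
    [0.55, 0.65, 0.70, 0.70],
    [0.45, 0.70, 0.65, 0.60],
]

averaged = mean_curve(fold_accuracy)
epoch, accuracy = best_epoch(averaged)
print(f"best epoch: {epoch}, mean accuracy: {accuracy:.2f}")
```

The same `mean_curve` helper applies unchanged to per-epoch loss values.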
Figure 6. Accuracy and loss results for the best fold for (a) muscle tension dysphonia and (b) essential tremor of voice. Baseline accuracies and the accuracy and loss results from the highest-performing epochs are labeled for each model.