Manuel Milling (1), Florian B. Pokorny (1, 2, 3), Katrin D. Bartl-Pokorny (1, 2, 3), Björn W. Schuller (1, 4).
Abstract
In recent years, advancements in the field of artificial intelligence (AI) have impacted several areas of research and application. Besides more prominent examples such as self-driving cars or media consumption algorithms, AI-based systems have increasingly gained popularity in the healthcare sector, although they are constrained there by high requirements for accuracy, robustness, and explainability. Health-oriented AI research, as a sub-field of digital health, investigates a plethora of human-centered modalities. In this article, we address recent advances in the so far understudied but highly promising audio domain, with a particular focus on speech data, and present corresponding state-of-the-art technologies. Moreover, we give an excerpt of recent studies on the automatic audio-based detection of diseases ranging from acute and chronic respiratory diseases via psychiatric disorders to developmental and neurodegenerative disorders. Our selection of presented literature shows that the recent success of deep learning methods in other fields of AI increasingly translates to the field of digital health, although expert-designed feature extractors and classical machine learning (ML) methodologies are still prominently used. Limiting factors, especially for speech-based disease detection systems, are related to the amount and diversity of available data, e.g., the number of patients and healthy controls as well as the underlying distribution of age, languages, and cultures. Finally, we contextualize and outline application scenarios of speech-based disease detection systems as supportive tools for healthcare professionals, with ethical consideration of privacy protection and faulty predictions.
Keywords: artificial intelligence; disease detection; healthcare; machine learning; speech
Year: 2022 PMID: 35651538 PMCID: PMC9149088 DOI: 10.3389/fdgth.2022.886615
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. Comparison of a 19-year-old symptomatic male with COVID-19 (top) and a 37-year-old asymptomatic COVID-19-negative male (bottom) by means of speech spectrograms of the recorded first clause of the German version of the standard text “The North Wind and the Sun” (“Einst stritten sich Nordwind und Sonne, wer von ihnen beiden wohl der Stärkere wäre.”; in English: “Once the North Wind and the Sun argued about which of the two of them was the stronger.”). The recordings are part of the “Your Voice Counts” dataset (12, 13).
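For readers less familiar with spectrograms such as those in Figure 1, the following is a minimal, dependency-free sketch of how a magnitude spectrogram is obtained: the signal is cut into overlapping, Hann-windowed frames and each frame is Fourier-transformed. The frame length, hop size, window type, and the 16 kHz sampling rate implied by the defaults are illustrative assumptions, not the settings used to produce Figure 1, and the naive DFT is used only to avoid external libraries (an FFT would be used in practice).

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Return one list of DFT magnitudes per frame (bins 0..frame_len // 2).

    The defaults (400-sample frames, 160-sample hop) correspond to 25 ms
    windows with a 10 ms shift at an assumed 16 kHz sampling rate.
    Bin k corresponds to the frequency k * sampling_rate / frame_len.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT over the non-negative frequency bins only; for real-valued
        # input the negative-frequency bins mirror these.
        spectrum = [
            abs(sum(segment[m] * cmath.exp(-2j * math.pi * k * m / frame_len)
                    for m in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def to_db(frames, floor=1e-10):
    """Log-compress magnitudes to decibels, as is usual for display."""
    return [[20 * math.log10(max(v, floor)) for v in frame] for frame in frames]
```

As a usage example, a 1 kHz sine sampled at 8 kHz and analyzed with 64-sample frames has its energy peak in bin 8 (8 × 8000 / 64 = 1000 Hz) of every frame; plotting the `to_db` output over time yields an image of the kind shown in Figure 1.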
Overview of recent speech-based disease detection studies.
| Disease | Ref. | Participants | Speech material | Features | Task | Appr. | Perform. |
|---|---|---|---|---|---|---|---|
| Alzheimer's disease | ( | 78, 50–80 y, w/ and w/o, MMST interval: 0–30 | Spontaneous speech | BoAW, ZFF-Signals | Prediction of MMST value | SVR | RMSE: 6.97 |
| | | | | | w/ vs w/o (2-class) | E2E CNN | Acc: 0.74 |
| Anxiety disorder | ( | 239 (69/170), 18–68 y (31.5 ± 12.3 y), BAI interval: 0–63 | Vocalization exercises | ComParE, eGeMAPS, DS | BAI prediction | SVR | ρ ≤ 0.70 |
| Bipolar disorder | ( | 46 (30/16), 18–60 y, YMRS: remission (0–7), hypomania (8–19), mania (20–60) | Audio from structured interview | MFCCs | YMRS (3-class) | DNN | UAR: 0.57 |
| Bronchial asthma | ( | 71 (N/A) w/ and 135 (N/A) w/o bronchial asthma, 8 ± N/A y | Sustained vowel /a:/ | MFCCs, CQCCs | w/ vs w/o (2-class) | GMM-UBM | Acc: 0.72 |
| COVID-19 | ( | 52 (32/20), 63.4 ± 9.9 y, hospitalized w/, 3 severity categories | Speech, 5 sentences | eGeMAPS, ComParE | Severity prediction (3-class) | SVM | UAR ≤ 0.68 |
| | ( | 20 (12/8) w/, 60 (40/20) w/o and healthy | Speech | (Δ/Δ2)-MFCCs, LLDs | w/ vs w/o (2-class) | LSTM-RNN | Acc: 0.88 |
| Depression | ( | 275, PHQ-8 interval: 0–24 | Audio from semi-clinical interviews | LLDs, BoAW, DS | PHQ-8 prediction | RNNs | CCC ≤ 0.108, RMSE ≥ 8.19 |
| | ( | 292 (N/A), 18–63 y (31.5 ± 12.3 y), BDI-II interval: 0–63 | Audio from HCI scenario | log-Mel-spectrograms | BDI-II prediction | CNN | RMSE: 9.65 |
| | ( | 182 (N/A) w/ or w/o, binary PHQ-8 | Speech from clinical interviews | MFCCs | Depression prediction (2-class) | LSTM-RNN | Acc: 0.763 |
| Developmental disorder | ( | 11 children w/ ASD, 10 w/ PDD, 13 w/ SLI, 68 typically developed, 6–18 y | Spontaneous speech | eGeMAPS, ComParE | Developmental disorder prediction (4-class) | GANs | UAR: 0.47 |
| | ( | 10 infants later diagnosed w/ ASD (5/5), 10 typically developed (5/5), 10 m | Audio from PCI scenario | eGeMAPS | ASD prediction (2-class) | SVM, RNN | Acc: 0.75 |
| Parkinson's disease | ( | 23 (N/A) w/ and 8 (N/A) w/o | Speech sound samples | 22 selected acoustic features | w/ vs w/o (2-class) | k-NN, RF, NB, SVM | CR ≤ 85.81% |
| | ( | 50 w/ (25/25) and 50 w/o (25/25), 31–86 y | Read words/texts, monolog, diadochokinetic exercises | 488 articulatory features, 28 phonation features, 103 prosody features, 192 glottal features | w/ vs w/o (2-class) | SVM | Acc ≤ 0.68 |
| | | | | | | E2E CNN | Acc ≤ 0.69 |
| Pathological speech | ( | 126 (N/A) | Speech | Cochleogram, Hilbert spectrum | w/ vs w/o (2-class) | VGG-16 CNN | Acc: 0.92 |
| Upper respiratory tract infection | ( | 630 (382/248), 12–84 y (29.5 ± 12.1 y), w/ and w/o, WURSS-24 (German version) | Spontaneous speech, text reading | ComParE | w/ vs w/o (2-class) | DNNs | UAR: 0.67 |
For further details, the reader is referred to the original articles. Acc, accuracy; Appr., approach; ASD, autism spectrum disorder; BAI, Beck anxiety inventory; BDI-II, Beck depression inventory-II; BoAW, Bag-of-Audio-Words; CCC, concordance correlation coefficient; CNN, convolutional neural network; ComParE, computational paralinguistics challenge [representations]; CQCCs, constant-Q cepstral coefficients; CR, classification rate; Δ, first derivative; Δ2, second derivative; DNN, deep neural network; DS, deep spectrum [features]; E2E, end-to-end; eGeMAPS, extended Geneva minimalistic acoustic parameter set; GAN, generative adversarial network; GMM-UBM, Gaussian mixture model-universal background model; HCI, human-computer interaction; k-NN, k-nearest neighbor; LLDs, low-level descriptors; LSTM, long short-term memory; m, months; MFCCs, Mel frequency cepstral coefficients; MMST, mini-mental-status-test; N/A, not available; NB, naive Bayes; PCI, parent-child interaction; PDD, pervasive developmental disorder; Perform., performance; PHQ-8, 8-item patient health questionnaire depression scale; RF, random forest; ρ, Spearman's correlation coefficient; RMSE, root mean square error; RNN, recurrent neural network; SLI, specific language impairment; SVM, support vector machine; SVR, support vector regressor; UAR, unweighted average recall; w/, with [corresponding disease]; w/o, without [corresponding disease]; y, years; ZFF, zero-frequency filtered.