Alireza Goudarzi, Gemma Moya-Galé.
Abstract
The sophistication of artificial intelligence (AI) technologies has advanced significantly in the past decade. However, the unpredictability and variability of AI behavior with noisy signals remain underexplored and make it difficult to generalize AI behavior to real-life environments, especially for people with a speech disorder, who already experience reduced speech intelligibility. In the context of developing assistive technology for people with Parkinson's disease using automatic speech recognition (ASR), this pilot study reports on the performance of Google Cloud speech-to-text technology with dysarthric and healthy speech in the presence of multi-talker babble noise at different intensity levels. Despite sensitivities and shortcomings, it is possible to control the performance of these systems with current tools in order to measure speech intelligibility in real-life conditions.
Keywords: Parkinson's disease; automatic speech recognition; dysarthria; intelligibility; multi-talker babble noise
Year: 2021 PMID: 35005616 PMCID: PMC8727902 DOI: 10.3389/frai.2021.809321
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Participant information, including age, sex, years since PD diagnosis and dysarthria severity.
| Participant | Group | Age (years) | Sex | Years since PD diagnosis | Dysarthria severity |
|---|---|---|---|---|---|
| P1 | PD | 81 | M | 8 | Moderate |
| P2 | PD | 71 | F | 4 | Mild |
| P3 | PD | 76 | F | 9 | Mild |
| P4 | PD | 48 | M | 7 | Mild |
| P5 | PD | 79 | M | 10 | Moderate |
| P6 | HC | 66 | M | | |
| P7 | HC | 71 | F | | |
| P8 | HC | 71 | M | | |
| P9 | HC | 68 | F | | |
| P10 | HC | 40 | M | | |
PD, Parkinson's disease; HC, healthy control; M, male; F, female.
Automatic speech recognition accuracy scores at sampling rates of 8 kHz and 48 kHz.
| | | | |
|---|---|---|---|
| Mean accuracy (%) | 90 | 92 | 91 |
| Standard deviation | 14 | 13 | 13 |
| | | | |
| Mean accuracy (%) | 96 | 100 | 98 |
| Standard deviation | 8 | 0 | 6 |
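
The record does not say how the accuracy scores in the table above were computed. As a minimal sketch, assuming accuracy is scored as 100 × (1 − word error rate), floored at 0, an ASR transcript could be scored against a reference sentence as follows. The function names `word_errors`, `word_accuracy`, and `summarize` are hypothetical, and the choice of the sample standard deviation is also an assumption rather than the authors' stated method.

```python
import statistics


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Percent accuracy, assumed here to be 100 * (1 - WER), floored at 0."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 100.0 * (1 - errors / n_ref))


def summarize(scores):
    """Mean and sample standard deviation of per-sentence scores (assumption)."""
    return statistics.mean(scores), statistics.stdev(scores)
```

Scoring each sentence with `word_accuracy` and passing the resulting list to `summarize` would give a mean and standard deviation of the kind reported in the table, under the assumptions stated above.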
Figure 1. Change in accuracy of speech recognition with no hinting for speech embedded in 10-talker babble noise at different SNR levels.
Figure 2. Change in accuracy of speech recognition with hinting for speech embedded in 10-talker babble noise at different SNR levels.
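
The figure captions refer to speech embedded in babble noise at different SNR levels. The record does not describe the authors' mixing procedure, so the following is only a sketch of one common approach: the noise is scaled so that the ratio of speech RMS to noise RMS matches the target SNR in decibels. The names `rms` and `mix_at_snr` are hypothetical, and signals are assumed to be plain lists of float samples at the same sampling rate.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Uses SNR_dB = 20 * log10(rms(speech) / rms(gain * noise)),
    solved for the noise gain.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the speech duration
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("speech and noise must both be non-silent")
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Lower `snr_db` values produce louder babble relative to the speech, so sweeping `snr_db` over a range yields a set of stimuli like those plotted against SNR in the figures.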