Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment.

Apiwat Ditthapron¹, Emmanuel O Agu¹, Adam C Lammert².

Abstract

Goal: Smartphones can be used to passively assess and monitor patients' speech impairments caused by ailments such as Parkinson's disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD) and neurodegenerative diseases such as Alzheimer's disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Consequently, speaker separation, which identifies the target speaker's speech in audio recordings containing two or more speakers' voices, is a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, to preserve speaker privacy, passively recorded smartphone audio is often reduced to derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs), and machine learning-based speech assessment is performed on those features rather than on raw audio. In this paper, we propose a novel Deep MFCC bAsed SpeaKer Separation (Deep-MASKS) method.
Methods: Deep-MASKS uses an autoencoder to reconstruct MFCC components of an individual's speech from an i-vector, x-vector or d-vector representation of their speech learned during the enrollment period. Deep-MASKS utilizes a Deep Neural Network (DNN) for MFCC signal reconstructions, which yields a more accurate, higher-order function compared to prior work that utilized a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings.
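As a rough illustration of the contrast drawn in the Methods paragraph, the sketch below compares a mask-based estimate with a direct DNN reconstruction of one MFCC frame, both conditioned on a speaker embedding. It is not the authors' architecture: the layer sizes, the names `mask_separation` and `direct_reconstruction`, and the random untrained weights are all assumptions made only to show the data flow.

```python
import math
import random

N_MFCC = 13   # assumed number of MFCC coefficients per frame
EMB_DIM = 8   # assumed speaker-embedding size (real d-/x-vectors are larger)
HIDDEN = 16   # assumed hidden-layer width

def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation=None):
    """Apply one dense layer to the vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mask_separation(mixed, embedding, hidden_layer, mask_layer):
    """Mask-style estimate: predict a mask in (0, 1), then rescale the input."""
    h = dense(mixed + embedding, hidden_layer, math.tanh)
    mask = dense(h, mask_layer, sigmoid)
    return [m * x for m, x in zip(mask, mixed)]

def direct_reconstruction(mixed, embedding, hidden_layer, out_layer):
    """Direct estimate: the network outputs the target MFCC frame itself."""
    h = dense(mixed + embedding, hidden_layer, math.tanh)
    return dense(h, out_layer)  # linear output, not tied to the input values

rng = random.Random(0)
hidden_layer = make_layer(N_MFCC + EMB_DIM, HIDDEN, rng)
mask_layer = make_layer(HIDDEN, N_MFCC, rng)
out_layer = make_layer(HIDDEN, N_MFCC, rng)

mixed = [rng.gauss(0.0, 1.0) for _ in range(N_MFCC)]       # toy mixed-speech frame
embedding = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]  # toy enrollment embedding

masked = mask_separation(mixed, embedding, hidden_layer, mask_layer)
direct = direct_reconstruction(mixed, embedding, hidden_layer, out_layer)
```

In this toy setup the mask variant can only shrink each mixed coefficient toward zero, while the direct variant may output any value. That is one way to read the "higher-order function" claim, though the abstract does not spell out the details.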
Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent clean speech entropy by 36%.
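The two quantities reported in the Results paragraph can be sketched as follows. The MSE definition is standard. Reading "additional bits" as a Kullback-Leibler divergence with a base-2 logarithm is an assumption, since the abstract does not name the measure. All numbers below are made-up toy values, not results from the paper.

```python
import math

def mse(reference, estimate):
    """Mean squared error between a clean MFCC frame and its reconstruction."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def kl_divergence_bits(p, q):
    """Extra bits needed to code samples from p using a code built for q.

    Assumes p and q are discrete distributions over the same bins and that
    q is non-zero wherever p is non-zero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_reduction(baseline, proposed):
    """Fractional improvement of `proposed` over `baseline` (0.44 means 44%)."""
    return (baseline - proposed) / baseline

# Toy values for illustration only.
clean = [1.0, 2.0, 3.0, 4.0]
baseline_estimate = [1.5, 2.5, 2.5, 4.5]
proposed_estimate = [1.25, 2.25, 2.75, 4.25]

baseline_mse = mse(clean, baseline_estimate)
proposed_mse = mse(clean, proposed_estimate)
mse_reduction = relative_reduction(baseline_mse, proposed_mse)

extra_bits = kl_divergence_bits([0.5, 0.5], [0.25, 0.75])
```

A reduction "by up to 44%" in this sense means `relative_reduction` reaches 0.44 in the best reported configuration.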

Impact Statement: The proposed Deep-MASKS mitigates cross-talk in speech encoded as MFCC features, which are widely utilized to preserve voice privacy in passive health assessment and other speech applications on smartphones.

Keywords: Mel-Frequency Cepstrum Coefficients (MFCCs); overlapped speech; speaker representation; speech separation

Year: 2021 PMID: 35402977 PMCID: PMC8940203 DOI: 10.1109/OJEMB.2021.3063994

Source DB: PubMed Journal: IEEE Open J Eng Med Biol ISSN: 2644-1276
