Lasse Hansen (1,2,3,4), Yan-Ping Zhang (4), Detlef Wolf (4), Konstantinos Sechidis (5), Nicolai Ladegaard (1,2), Riccardo Fusaroli (6,7)
1. Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
2. Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark.
3. Center for Humanities Computing Aarhus, Aarhus University, Aarhus, Denmark.
4. Roche Pharmaceutical Research & Early Development Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland.
5. Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland.
6. Cognitive Science, School of Communication and Culture, Aarhus University, Aarhus, Denmark.
7. The Interacting Minds Centre, Aarhus University, Aarhus, Denmark.
Abstract
OBJECTIVE: Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability to external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: we trained machine learning models on easily accessible non-clinical datasets and tested them on novel clinical data in a different language.
METHODS: A Mixture of Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. We examined the model's ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw, de-noised, and speaker-diarized data.
RESULTS: The model showed separation between healthy controls and depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20-30 s of speech might be enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions.
CONCLUSION: A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.