Daniel M Low, Kate H Bentley, Satrajit S Ghosh.
Abstract
OBJECTIVE: There are many barriers to accessing mental health assessments including cost and stigma. Even when individuals receive professional care, assessments are intermittent and may be limited partly due to the episodic nature of psychiatric symptoms. Therefore, machine‐learning technology using speech samples obtained in the clinic or remotely could one day be a biomarker to improve diagnosis and treatment. To date, reviews have only focused on using acoustic features from speech to detect depression and schizophrenia. Here, we present the first systematic review of studies using speech for automated assessments across a broader range of psychiatric disorders.
Keywords: machine learning; mental health; psychiatry; speech; voice
Year: 2020 PMID: 32128436 PMCID: PMC7042657 DOI: 10.1002/lio2.354
Source DB: PubMed Journal: Laryngoscope Investig Otolaryngol ISSN: 2378-8038
Figure 1. How machine learning works
Advantages and disadvantages of different types of psychiatric assessments
| Measurement | Advantages | Disadvantages |
|---|---|---|
| | Clinician experience; Tests have populational norms; Clinicians can ask for further assessments; Clinicians can offer treatment pathway; Questionnaire items are interpretable | Costs of clinic and clinician; Time‐consuming; Requires extensive training; Normally assessed sporadically in clinic; Questionnaires often use ordinal and vague variables (eg, never, sometimes); Prone to clinician's biases: expertise, culture and race; Patient's memory distortions; Patients' perceived barriers to pursuing treatment (see main text); Inter‐rater reliability can be low; Cannot capture complex features |
| | Potentially free; Less time‐consuming than clinician assessments; No clinical training required; Can be administered anywhere; Tests have populational norms | More narrow than clinical evaluation; Biased by patient's voluntary responses; Generally cannot offer personalized treatment; Cannot capture complex features; Assessments have to be created and validated based on observation of symptoms |
| | Potentially free; Potentially instantaneous; Can be done remotely, continuously, and naturalistically (app prompts); Can incorporate larger and more varied samples than clinic samples; Avoids human biases and single rater; Can capture multimodal features (audio, video, text, accelerometer); Ratio and continuous variables; Can capture complex features due to linear and nonlinear multivariate models, and find new structure in data; Allows scalability because models can be fast and automated | Most models have not been validated through clinical trials thus far; Needs large amounts of data; Many sources of variation in the signal, and their relative contributions are poorly understood; Models can be affected by biases in data (eg, race, age, noise); Difficult to incorporate expert priors (eg, body language, clinical history) into models; Assessment does not automatically lead to treatment or intervention options |
Figure 2. PRISMA flow diagram of study inclusion and exclusion criteria for the systematic review
Summary of systematic review results
| Disorder | Articles % (N) | Median sample size (range) | Clinical assessment % (N) | Predictive models % (N) |
|---|---|---|---|---|
| Depression | 49.6 (63) | 123 (11‐1688) | 38 (24) | 87 (55) |
| PTSD | 7.9 (10) | 41 (10‐253) | 70 (7) | 80 (8) |
| Schizophrenia | 18.1 (23) | 44 (18‐195) | 86 (20) | 13 (3) |
| Anxiety | 4.7 (6) | 45 (20‐104) | 50 (3) | 0 (0) |
| Bipolar | 16.5 (21) | 39 (5‐89) | 90 (19) | 66 (14) |
| Bulimia | 0.8 (1) | 22 (‐) | 100 (1) | 0 (0) |
| Anorexia | 1.6 (2) | 107 (66‐148) | 100 (2) | 0 (0) |
| OCD | 0.8 (1) | 35 (‐) | 100 (1) | 0 (0) |
Note: The distribution of the 127 studies that matched the inclusion criteria is described in the Articles column. Within each disorder, the following characteristics are described: median sample size (case group plus control group), proportion of clinical diagnosis vs self‐report measures, and proportion of predictive vs null‐hypothesis testing studies.
Abbreviations: OCD, obsessive‐compulsive disorder; PTSD, post‐traumatic stress disorder.
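The Articles column can be cross-checked against the counts in the table. The following is a minimal sketch, not part of the original review: the dictionary name `article_counts` and the one-decimal rounding are assumptions, while the counts and the total of 127 studies are taken from the table and its note.

```python
# Article counts per disorder, copied from the "Articles % (N)" column.
article_counts = {
    "Depression": 63,
    "PTSD": 10,
    "Schizophrenia": 23,
    "Anxiety": 6,
    "Bipolar": 21,
    "Bulimia": 1,
    "Anorexia": 2,
    "OCD": 1,
}

# The table note states that 127 studies matched the inclusion criteria.
total_studies = sum(article_counts.values())

# Share of all included studies per disorder, rounded to one decimal
# (the rounding rule is an assumption made for this sketch).
article_percent = {
    disorder: round(100 * n / total_studies, 1)
    for disorder, n in article_counts.items()
}

for disorder, pct in article_percent.items():
    print(f"{disorder}: {pct}% ({article_counts[disorder]})")
```

Under these assumptions the counts sum to 127 and the recomputed shares agree with the Articles column.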
Figure 3. Synthesis of null‐hypothesis testing studies across psychiatric disorders. Acoustic features are color‐coded on the y‐axis into source features from the vocal folds (blue), filter features from the vocal tract (red), spectral features (purple), and prosodic or melodic features (black) (Reference 56). Features that are significantly higher in a psychiatric population than in healthy controls, or that correlate positively with the severity of a disorder, receive a score of 1 (red); features that are lower or correlate negatively receive a score of −1 (blue); and nonsignificant or contradicting findings receive a score of 0 (gray). The mean is computed for features with multiple results. Cell size is weighted by the number of studies. Features not studied in a disorder are blank. Anxiety, social or general anxiety disorder; OCD, obsessive‐compulsive disorder; PTSD, post‐traumatic stress disorder.
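The scoring rule in the Figure 3 caption can be illustrated with a short sketch. This is one possible reading of the caption, not the authors' code: the function name `synthesize`, the direction labels, and the example findings (`disorder_x`, `feature_a`, `feature_b`) are all hypothetical and do not correspond to any result in the review.

```python
from statistics import fmean

# Score per finding, following the caption: higher or positive correlation = 1,
# lower or negative correlation = -1, nonsignificant or contradicting = 0.
SCORE = {"higher": 1, "lower": -1, "nonsignificant": 0}


def synthesize(findings):
    """Return {(disorder, feature): (mean score, number of results)}.

    `findings` is an iterable of (disorder, feature, direction) tuples.
    The mean score would set the cell color and the count the cell size.
    """
    grouped = {}
    for disorder, feature, direction in findings:
        grouped.setdefault((disorder, feature), []).append(SCORE[direction])
    return {key: (fmean(scores), len(scores)) for key, scores in grouped.items()}


# Made-up findings for illustration only; not data from the review.
findings = [
    ("disorder_x", "feature_a", "lower"),
    ("disorder_x", "feature_a", "lower"),
    ("disorder_x", "feature_a", "nonsignificant"),
    ("disorder_x", "feature_b", "higher"),
]

cells = synthesize(findings)
for (disorder, feature), (mean_score, n_results) in cells.items():
    print(disorder, feature, round(mean_score, 2), n_results)
```

Pairs that never appear in `findings` are simply absent from the result, which corresponds to the blank cells described in the caption.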
Figure 4. Glossary of acoustic features. Classification based on References 29 and 56. For further discussion, see the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) (Reference 57) and Section 4.3.3
Figure 5. Nested bootstrapping for more robust performance estimation on small datasets and hyperparameter tuning. Example uses RMSE as performance metric on 60 bootstrapping samples and 5‐fold cross‐validation. K‐fold cross‐validation assumes large sample sizes and on small datasets may return a biased estimate of the underlying performance distribution. RMSE, root mean squared error
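One plausible reading of the Figure 5 caption is an outer loop of 60 bootstrap resamples, each scored by RMSE on its out-of-bag cases, with an inner 5-fold cross-validation that tunes a hyperparameter on the resampled training set. The sketch below follows that reading under stated assumptions: the one-dimensional ridge model, the penalty grid, the synthetic data, and every function name are illustrative choices, not the procedure used in the review.

```python
import math
import random
import statistics


def fit_ridge_1d(xs, ys, lam):
    """Closed-form one-feature ridge regression; returns (slope, intercept)."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, y_mean - slope * x_mean


def rmse(model, xs, ys):
    """Root mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return math.sqrt(
        statistics.fmean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])
    )


def tune_lambda(xs, ys, lambdas, k=5):
    """Inner loop: pick the penalty with the lowest mean k-fold RMSE."""
    idx = list(range(len(xs)))
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_score = None, float("inf")
    for lam in lambdas:
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
            fold_scores.append(rmse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
        score = statistics.fmean(fold_scores)
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam


def nested_bootstrap(xs, ys, lambdas, n_boot=60, k=5, seed=0):
    """Outer loop: resample with replacement, tune, test on out-of-bag cases."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    while len(scores) < n_boot:
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        test = [i for i in range(n) if i not in in_bag]
        if not test:  # skip the rare resample that leaves nothing out of bag
            continue
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        # Resampled duplicates can land in different inner folds; a stricter
        # version would split on unique cases before tuning.
        lam = tune_lambda(tx, ty, lambdas, k)
        model = fit_ridge_1d(tx, ty, lam)
        scores.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


# Synthetic small dataset (40 cases), for illustration only.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(40)]
ys = [2 * x + 1 + data_rng.gauss(0, 1) for x in xs]

scores = nested_bootstrap(xs, ys, lambdas=(0.01, 1.0, 10.0))
print("median RMSE:", round(statistics.median(scores), 3))
print("RMSE range:", round(min(scores), 3), "to", round(max(scores), 3))
```

The result is a distribution of 60 RMSE values rather than a single number, so the spread of performance on a small dataset is visible alongside its central estimate.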
Advantages of different types of speech‐eliciting tasks
| Task type | Task and examples | Advantages |
|---|---|---|
| Constrained | Sustained vowel; Maximum phonation time | Optimal for measuring source and respiration features; Captures muscle weakness and aspects of motor control |
| | Repeating “PATAKA” | Tests diadochokinetic rate, |
| | Counting | More control over acoustic patterns using a common vocabulary |
| | Reading: Emotion‐evoking sentences; Rainbow passage; The Grandfather passage | More control over evoked emotions; Contains every sound in English and is representative of normal speech; Paragraph used to assess communication disorders |
| Free speech | Monologue: describing, retelling happy, or traumatic memory | More ecologically valid than reading |
| | Dialogue: Semi‐structured interviews; Phone conversations | Social dynamics (turn taking, intimacy); Already done in many clinics; By not recording other caller, no need for diarization; Smartphones provide accelerometer data |