Carla Agurto, Guillermo A Cecchi, Raquel Norel, Rachel Ostrand, Matthew Kirkpatrick, Matthew J Baggott, Margaret C Wardle, Harriet de Wit, Gillinder Bedi.
Abstract
The detection of changes in mental states such as those caused by psychoactive drugs relies on clinical assessments that are inherently subjective. Automated speech analysis may represent a novel method to detect objective markers, which could help improve the characterization of these mental states. In this study, we employed computer-extracted speech features from multiple domains (acoustic, semantic, and psycholinguistic) to assess mental states after controlled administration of 3,4-methylenedioxymethamphetamine (MDMA) and intranasal oxytocin. The training/validation set comprised within-participants data from 31 healthy adults who, over four sessions, were administered MDMA (0.75, 1.5 mg/kg), oxytocin (20 IU), and placebo in randomized, double-blind fashion. Participants completed two 5-min speech tasks during peak drug effects. Analyses included group-level comparisons of drug conditions and estimation of classification at the individual level within this dataset and on two independent datasets. Promising classification results were obtained for detecting drug conditions, with cross-validated accuracies of up to 87% in training/validation and 92% in the independent datasets, suggesting that the detected patterns of speech variability are associated with drug consumption. Specifically, we found that the effects of oxytocin seem to be mostly driven by changes in emotion and prosody, which are mainly captured by acoustic features. In contrast, mental states driven by MDMA consumption appear to manifest in multiple domains of speech. Furthermore, we find that the experimental task has an effect on the speech response within these mental states, which can be attributed to the presence or absence of an interaction with another individual. These results represent a proof-of-concept application of the potential of speech to provide an objective measurement of mental states elicited during intoxication.
Year: 2020 PMID: 31978933 PMCID: PMC7075895 DOI: 10.1038/s41386-020-0620-4
Source DB: PubMed Journal: Neuropsychopharmacology ISSN: 0893-133X Impact factor: 7.853
Demographics and substance use characteristics.
| Category | Characteristic | Training/validation dataset (N = 31) | ID1 | ID2 |
|---|---|---|---|---|
| Demographics | Sex | 39% females | 50% females | 31% females |
| | Age | 24.3 (4.4) | 24.6 (4.7) | 24.5 (5.4) |
| | Race | 100% Caucasian | 67% Caucasian, 11% African American, 3% Asian, 19% other/mixed race | 84% Caucasian, 8% African American, 8% other/mixed race |
| | Education in years | 14.7 (1.30) | 15.1 (1.5) | – |
| Current substance use | Alcoholic drinks per week | 8.7 (6.8) | 9.9 (10.6) | 7.4 (5.5) |
| | Smoking past month | 32% | 22% | – |
| Lifetime occasions recreational use | MDMA | 12.6 (9.3) | 10.2 (8.2) | 12.6 (19.1) |
| | Cannabis (days in past month) | 7 (7.24) | 64% (more than 100 times) | 9.5 (10.8) |
Notes: A statistically significant difference in the race category was found when the training/validation set was compared to ID1 (p = 4E−4) and to ID2 (p = 2E−2).
Description of extracted speech features.
| Type of Feature | Category | List of all features |
|---|---|---|
| Acoustic | Voice stability | Jitter, shimmer, voice breaks |
| | Noise measurements | Noise-to-harmonics ratio, harmonics-to-noise ratio, mean autocorrelation |
| | Pitch variations | Pitch distribution |
| | Spectral characterization | Max dB, max frequency, energy, slope |
| | Vowel space | Total area, ‘a-i-u’ area, distribution of formants 1, 2, 3 |
| | Mel-frequency cepstral coefficients (MFCC) | Sixteen MFCCs |
| | Temporal features | Pause duration distribution, articulation and speech rates |
| Semantic | LSA (21 concepts of interest) | |
| Psycholinguistic | CPIDR | Ideas, total words, propositional density |
| | Parts of speech | Pronouns, nouns, verbs, determiners, indefinites and definites, I (first-person singular pronoun) |
| | Lexical content | Honore’s statistic and Brunet’s index, content words, total words, empty words, type-token ratio, frequency, and fillers |
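Three of the lexical content measures listed above have standard closed-form definitions, sketched here in Python. This is a minimal illustration, not the study's pipeline: the whitespace tokenization, the lowercasing, and the function name `lexical_richness` are assumptions, and the constant 0.165 in Brunet's index is the conventional choice rather than a value taken from the source.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Return type-token ratio, Brunet's index, and Honore's statistic.

    `tokens` is a list of words; how the study tokenized its transcripts
    is not stated in the source, so this input format is an assumption.
    """
    counts = Counter(t.lower() for t in tokens)
    n = sum(counts.values())  # total words (tokens)
    if n == 0:
        raise ValueError("at least one token is required")
    v = len(counts)  # distinct words (types)
    v1 = sum(1 for c in counts.values() if c == 1)  # words used exactly once

    ttr = v / n
    brunet = n ** (v ** -0.165)
    # Honore's statistic is undefined when every type occurs exactly once.
    honore = 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf")
    return {
        "type_token_ratio": ttr,
        "brunet_index": brunet,
        "honore_statistic": honore,
    }


# Hypothetical sample sentence, for illustration only.
sample = "the cat sat on the mat and the dog sat too".split()
stats = lexical_richness(sample)
```

A higher type-token ratio or Honore's statistic indicates a richer vocabulary, whereas a lower Brunet's index does; all three depend on transcript length, so comparisons are most meaningful between samples of similar size.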
Univariate analysis: features ranked by the p-value of the paired Wilcoxon signed-rank test.
| Conditions | Monologue: Acoustic | Monologue: Semantic | Monologue: Psycholinguistic | Description: Acoustic | Description: Semantic | Description: Psycholinguistic |
|---|---|---|---|---|---|---|
| MDMA 0.75 vs. PBO | Pitch (a) | Think* | W-Empty | F1 (b) | Sad* | Density |
| | Pitch (e) | Talk* | Determiners | Angle | Happy | W-Empty |
| | MFCC #13 (b) | Feeling* | Indefinites | PauseDist (e) | Confidence | N-Nouns |
| MDMA 1.5 vs. PBO | MFCC #12 (a) | Talk | Frequency | Angle | Support | Ideas* |
| | F3 (e) | Love | Determiners | PauseDist (a) | Think | Honores* |
| | PauseDist (g) | Peace | Definites | PauseDist (b) | Love | W-Total* |
| MDMA 0.75 vs. MDMA 1.5 | Pitch (a) | Support | Frequency | PauseDist (a) | Sad | W-Empty* |
| | MFCC #4 (b) | Think | Determiners | PauseDist (g) | Love | W-Total |
| | MFCC #12 (a) | Affect | Density | PauseDist (b) | Rapport | W-Content |
| OT vs. PBO | F2 (c)* | Emotion | Density | PauseDist (a) | Support | Definites |
| | F2 (b)* | Anxiety | N-Nouns | Shimmer (h) | Peace | Determiners |
| | F2 (e) | Talk | N-Verbs | Unvoiced (i) | Feeling | W-Empty |
Notes: The letter appended to a feature name indicates the descriptor: (a) median, (b) IQR, (c) kurtosis, (d) skewness, (e) 5th percentile, (f) 50th percentile, (g) 95th percentile, (h) local, (i) frames. W refers to number of words. * indicates that the test passed FDR correction. PBO = placebo; MDMA 0.75 = 3,4-methylenedioxymethamphetamine 0.75 mg/kg; MDMA 1.5 = 3,4-methylenedioxymethamphetamine 1.5 mg/kg; OT = oxytocin 20 international units.
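The ranking and FDR step described in the notes can be sketched as follows. The source does not say which FDR procedure was used, so the Benjamini-Hochberg step-up rule here is an assumption, as are the function name `benjamini_hochberg`, the 0.05 level, and the feature names and p-values, which are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it survives FDR control.

    Implements the Benjamini-Hochberg step-up rule: find the largest
    rank k with p_(k) <= alpha * k / m and accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


# Hypothetical per-feature p-values, e.g. from paired tests (not study data).
feature_p = {"feat_a": 0.001, "feat_b": 0.03, "feat_c": 0.035, "feat_d": 0.5}
names = list(feature_p)
flags = benjamini_hochberg([feature_p[n] for n in names])
ranked = sorted(names, key=feature_p.get)  # smallest p-value first
```

In this toy input `feat_b` misses its own threshold (0.03 > 0.025) but is still accepted because `feat_c`, ranked after it, meets its threshold; that is the step-up behavior that distinguishes this rule from a per-test cutoff.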
Fig. 1 (a) Partial correlations between the statistically significant features found in Table 2, shown by drug condition and task (Monologue in the top row, Description in the bottom row). (b) Multidimensional scaling representation of the partial correlations in (a). The horizontal axis separates the monologue and description tasks for each drug condition, and the vertical axis separates the low and high MDMA conditions for each task. The dashed line encloses all of the monologue tasks and only those, underscoring the consistency of the representation with the experimental conditions.
Fig. 2 Classification accuracy by task, feature type, and binary condition comparison.
The number of features obtained after feature selection is specified at the top of each bar. The symbols at the bottom of the bar indicate with which algorithm the maximum accuracy was achieved: o Linear SVM, * Random Forest, and + Nearest neighbors. The types of features are indicated as follows: A = Acoustic features only; B = Semantic features only; C = Psycholinguistic/syntactic features only; D = Combined features. PBO = placebo; MDMA 0.75 = 3,4-methylenedioxymethamphetamine 0.75 mg/kg; MDMA 1.5 = 3,4-methylenedioxymethamphetamine 1.5 mg/kg; OT = oxytocin 20 international units. Letters underlined in black indicate that at least one of the models achieved classification higher than chance at p < 0.05, underlined in red at p < 0.001.
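One common way to judge whether a binary classifier beats chance is a one-sided binomial test on the number of correct predictions, sketched below. The source does not state how its significance levels were computed, so this test, the function name `binomial_p_above_chance`, and the example counts are all assumptions for illustration.

```python
from math import comb


def binomial_p_above_chance(n_correct, n_total, chance=0.5):
    """One-sided P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical outcome: 40 of 50 held-out samples classified correctly.
p_value = binomial_p_above_chance(40, 50)
significant_at_001 = p_value < 0.001
```

This treats predictions as independent, which is optimistic for within-participant data; a label-permutation test is a frequent alternative when that assumption is doubtful.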
Fig. 3 Weight representation of combined features found by optimal linear classification models (2 tasks × 4 conditions).
Weights are normalized to represent the relative contribution of each feature as a percentage. Two heatmaps are shown, one for each speech task analyzed in this study (left: monologue, right: description). Features that contributed less than 10% are not displayed. The first letter of each feature name indicates the type of feature: A = acoustic, S = semantic, P = psycholinguistic.
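The normalization and 10% display threshold described in this caption can be sketched as below. Using absolute weight magnitudes is an assumption (the caption does not say how negative weights were handled), and the function name `weights_to_percent` and the feature names and values are hypothetical.

```python
def weights_to_percent(weights, min_percent=10.0):
    """Convert linear-model weights to percentage contributions.

    Assumes contribution is the share of each weight's absolute value in
    the total; entries below `min_percent` are dropped, mirroring the
    caption's display threshold.
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("all weights are zero")
    percents = {name: 100 * abs(w) / total for name, w in weights.items()}
    return {name: p for name, p in percents.items() if p >= min_percent}


# Hypothetical weights; prefixes follow the caption (A, S, P).
example_weights = {"A_pitch_median": 8, "S_talk": -6, "P_density": 5, "A_mfcc12": 1}
shown = weights_to_percent(example_weights)
```

Here `A_mfcc12` accounts for 5% of the total magnitude and is filtered out, while the remaining three features are kept at 40%, 30%, and 25%.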
Fig. 4 Classification accuracy by feature type and binary condition comparison in the validation datasets.
PBO = placebo; MDMA 0.75 = 3,4-methylenedioxymethamphetamine 0.75 mg/kg; MDMA 1.5 = 3,4-methylenedioxymethamphetamine 1.5 mg/kg. Letters underlined in black indicate that at least one of the models achieved classification higher than chance at p < 0.05, underlined in red at p < 0.002 (note these are pessimistic significance estimates based on multiple differences between the training and validation datasets).