Amir H Nikzad, Yan Cong, Sarah Berretta, Katrin Hänsel, Sunghye Cho, Sameer Pradhan, Leily Behbehani, Danielle D DeSouza, Mark Y Liberman, Sunny X Tang.
Abstract
Graphical representations of speech generate powerful computational measures related to psychosis. Previous studies have mostly relied on structural relations between words as the basis of graph formation, i.e., connecting each word to the next in a sequence of words. Here, we introduced a method of graph formation grounded in semantic relationships by identifying elements that act upon each other (action relation) and the contents of those actions (predication relation). Speech from picture description and open-ended narrative tasks was collected from a cross-diagnostic group of healthy volunteers and people with psychotic or non-psychotic disorders. Recordings were transcribed and underwent automated language processing, including semantic role labeling to identify action and predication relations. Structural and semantic graph features were computed using static and dynamic (moving-window) techniques. Compared to structural graphs, semantic graphs were more strongly correlated with dimensional psychosis symptoms. Dynamic features also outperformed static features, and samples from picture descriptions yielded larger effect sizes than narrative responses for psychosis diagnoses and symptom dimensions. Overall, semantic graphs captured unique and clinically meaningful information about psychosis and related symptom dimensions. These features, particularly when derived from semi-structured tasks using dynamic measurement, are meaningful additions to the repertoire of computational linguistic methods in psychiatry.
Year: 2022 PMID: 35853912 PMCID: PMC9261087 DOI: 10.1038/s41537-022-00263-7
Source DB: PubMed Journal: Schizophrenia (Heidelb) ISSN: 2754-6993
Fig. 1 Speech graph methodology.
a Structural and semantic graph representations of a given text are illustrated. The structural representation is produced from sequential relations between lemmatized content words (e.g., I→see→cookie→jar, etc.). The semantic representation is produced by connecting elements that act upon each other (e.g., I→chair; kid→cookie jar), and by linking verb predicates to their arguments (e.g., see→I; see→chair; grab→kid; grab→cookie jar). b Dynamic graph features are computed by sliding a window of fixed length through each sample to produce n graph instances. Each graph feature is then calculated as the mean of its n per-instance values. Successive sequential graphs advance one word at a time in windows of 30 words, and semantic graphs advance one utterance at a time in windows of three utterances.
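The two graph constructions and the moving-window average described in this caption can be sketched in plain Python as follows. This is a minimal illustration, not the study's pipeline: the function names, the `(predicate, agent, patient)` frame format, and the choice to collapse repeated edges into one are assumptions, and lemmatization and semantic role labeling are taken as already done.

```python
from statistics import mean


def sequential_edges(words):
    """Structural graph: a directed edge from each word to the next one."""
    return set(zip(words, words[1:]))


def semantic_edges(frames):
    """Semantic graph from hypothetical (predicate, agent, patient) frames.

    Action relation:      agent -> patient
    Predication relation: predicate -> each of its arguments
    """
    edges = set()
    for predicate, agent, patient in frames:
        edges.add((agent, patient))
        edges.add((predicate, agent))
        edges.add((predicate, patient))
    return edges


def graph_size(edges):
    """Return (number of nodes, number of edges) for a set of edges."""
    nodes = {node for edge in edges for node in edge}
    return len(nodes), len(edges)


def dynamic_feature(units, window, build_edges, feature):
    """Mean of `feature` over the graphs built from each sliding window.

    The window advances one unit at a time. Assumption: a sample shorter
    than the window is treated as a single window.
    """
    if len(units) < window:
        windows = [units]
    else:
        windows = [units[i:i + window] for i in range(len(units) - window + 1)]
    return mean(feature(build_edges(w)) for w in windows)


# Toy input loosely based on the caption's example (not study data).
words = ["i", "see", "cookie", "jar", "kid", "grab", "cookie", "jar"]
frames = [("see", "I", "chair"), ("grab", "kid", "cookie jar")]

static_nn, static_ne = graph_size(sequential_edges(words))
dynamic_nn = dynamic_feature(words, 4, sequential_edges,
                             lambda edges: graph_size(edges)[0])
```

For semantic graphs the units would be utterances rather than words, so `build_edges` would first pool the frames of the utterances in the window before calling `semantic_edges`.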
Demographic characteristics of the participants.
| Characteristic | Overall | PS+ | PS− | p-value |
|---|---|---|---|---|
| N | 205 | 81 (40%) | 124 (60%) | |
| Age | | 25.9 ± 6.3 | 25.2 ± 5.9 | 0.20 |
| Sex | | | | <0.01** |
| Female | 118 (58%) | 35 | 83 | |
| Male | 87 (42%) | 46 | 41 | |
| Race | | | | <0.01** |
| African American | 53 (26%) | 31 | 22 | |
| Asian | 24 (12%) | 11 | 13 | |
| Caucasian | 89 (43%) | 20 | 69 | |
| Other | 39 (19%) | 19 | 20 | |
| Education | | 14.1 ± 2 | 15.5 ± 1.9 | <0.001*** |
| Caregiver education | | 14.6 ± 3 | 15.3 ± 2.3 | 0.07 |
| BPRS total score | | 42 ± 14 | 22 ± 4 | <0.001*** |
| SANS total score | | 30 ± 15 | 4 ± 4.5 | <0.001*** |
| TLC total score | | 16 ± 14 | 4 ± 5 | <0.001*** |
Significant p-values are marked with asterisks (*p < 0.05, **p < 0.01, and ***p < 0.001). P-values were calculated using Pearson’s chi-squared test (for sex and race) and the Mann–Whitney U test (for age, education, caregiver education, and BPRS, SANS, and TLC total scores).
PS+ individuals with psychosis spectrum disorders, PS− individuals without psychosis spectrum disorders, BPRS Brief Psychiatric Rating Scale, SANS scale for assessment of negative symptoms, TLC scale for the assessment of thought, language, and communication.
Graph features surviving sequential variance inflation factor (VIF) comparisons in the picture description task.
Columns within each segment list the graph features that survived. VIFs were compared and the feature with the highest VIF was excluded, one at a time, until every remaining feature had VIF < 5. Surviving features were then passed to the next column on the right for another comparison at a more integrated level. 1. The Domain column shows the results of intra-domain comparisons. 2. The Type column presents the features integrated at the graph-type level, i.e., semantic vs. structural graph features. 3. The Task column combines all graph features for each task. For the dynamic semantic graph, features from all three domains (size, connectedness, and organization) remained in the final set. Graph features from different methods are color coded. VIF comparisons for the open-ended narrative task are provided in Supplementary Table 4. More details on graph features are available in Supplementary Table 3.
S_AP static action-predication graph feature, D_AP dynamic action-predication graph feature, S_SEQ static sequential graph feature, D_SEQ dynamic sequential graph feature, NN number of nodes, NE number of edges, diameter graph diameter, ASPL average shortest path length, AWD average weighted degree, density graph density, LSCC size of largest strongly connected component, LSCCZ z-score of LSCC compared to 1000 random graphs, ASPLZ z-score of ASPL compared to 1000 random graphs.
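The stepwise exclusion described above (drop the feature with the highest VIF, recompute, stop once all VIFs are below 5) can be sketched as follows. This is an illustrative sketch, not the authors' code: it uses the standard identity that each feature's VIF is the corresponding diagonal entry of the inverse correlation matrix, the function names are hypothetical, and the toy numbers are made up. It assumes no feature is constant and no two features are perfectly collinear, since either would make the matrix singular.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardized]
            for zi in standardized]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vifs(features):
    """VIF per feature: diagonal of the inverse correlation matrix."""
    names = list(features)
    inverse = invert(correlation_matrix([features[n] for n in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def prune_by_vif(features, threshold=5.0):
    """Drop the highest-VIF feature until every VIF is below `threshold`."""
    kept = dict(features)
    while len(kept) > 1:
        scores = vifs(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)


# Made-up toy columns: f1 and f2 are nearly collinear, f3 is unrelated.
toy = {
    "f1": [1, 2, 3, 4, 5, 6, 7, 8],
    "f2": [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9],
    "f3": [1, -1, -1, 1, 1, -1, -1, 1],
}
survivors = prune_by_vif(toy)
```

Running the same function first within each domain, then on the pooled survivors per graph type, and finally per task would mirror the Domain, Type, and Task columns.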
Relationships between structural and semantic graph features and psychosis.
| Graph feature | Task | Static action-predication graph (S_AP) | Dynamic action-predication graph (D_AP) | Static sequential graph (S_SEQ) | Dynamic sequential graph (D_SEQ) |
|---|---|---|---|---|---|
| Number of nodes (NN) | Picture | 0.17* | 0.27*** | 0.20* | |
| | Open-ended | 0.13 | 0.21* | 0.14 | |
| Number of edges (NE) | Picture | 0.19* | 0.17* | 0.27** | |
| | Open-ended | 0.12 | 0.19* | 0.11 | |
| Diameter | Picture | 0.16 | 0.23** | 0.24** | 0.30*** |
| | Open-ended | 0.00 | 0.03 | 0.28*** | 0.25** |
| Average shortest path length (ASPL) | Picture | 0.11 | 0.18* | 0.24** | 0.28** |
| | Open-ended | 0.08 | 0.17* | 0.27** | 0.26** |
| Average weighted degree (AWD) | Picture | 0.15 | 0.22** | −0.05 | |
| | Open-ended | −0.03 | 0.06 | −0.13 | −0.29*** |
| Density | Picture | −0.06 | −0.15 | −0.26** | |
| | Open-ended | −0.20* | −0.23** | −0.21* | |
| Size of largest strongly connected component (LSCC) | Picture | 0.15 | 0.24** | 0.20* | 0.18* |
| | Open-ended | 0.08 | 0.13 | 0.14 | 0.29*** |
| LSCC z-score (LSCCZ) | Picture | −0.22** | 0.28*** | 0.25** | |
| | Open-ended | −0.09 | −0.13 | 0.26** | 0.31*** |
| ASPL z-score (ASPLZ) | Picture | −0.22** | −0.02 | 0.25** | |
| | Open-ended | −0.11 | −0.14 | 0.01 | 0.24** |
Graph features are categorized into three domains: size, connectedness, and organization. For each feature, the rank-biserial correlation coefficient (RBC) is reported as a measure of effect size for the picture description task (Picture) and the open-ended narrative task (Open-ended). Significant associations are marked with asterisks (*p < 0.05, **p < 0.01, and ***p < 0.001). Associations surviving Bonferroni correction are bolded (alpha = 0.0003). Semantic graphs favored the picture description task, with higher effect sizes and more significant associations. Structural graphs showed a task-independent relation with psychosis. Overall, dynamic graph features outperformed their static counterparts in both graph types.
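For readers unfamiliar with the effect size used in this table, the rank-biserial correlation for two groups can be computed directly from pairwise comparisons: the proportion of cross-group pairs in which the first group is higher, minus the proportion in which it is lower. The sketch below is illustrative; the function name, the sign convention (positive when the first group is higher), and the toy values are assumptions rather than details taken from the study.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation between two independent samples.

    Equals (pairs with a > b minus pairs with a < b) / (all a-b pairs);
    tied pairs count toward the denominator only. This is the same as
    2 * U_a / (n_a * n_b) - 1, where U_a is the Mann-Whitney U statistic
    for group_a with ties counted as one half.
    """
    wins = sum(a > b for a in group_a for b in group_b)
    losses = sum(a < b for a in group_a for b in group_b)
    return (wins - losses) / (len(group_a) * len(group_b))


# Made-up feature values for two small groups (not study data).
higher_group = [12, 15, 14, 18]
lower_group = [10, 13, 11]
effect = rank_biserial(higher_group, lower_group)
```

The coefficient ranges from −1 to 1, and swapping the two groups flips its sign, so the sign of each table entry depends on which group is taken as the reference.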
Fig. 2 Correlations between structural and semantic graph features and dimensional clinical characteristics.
a Heatmap representations of the Spearman’s correlation coefficient for structural and semantic graph features and clinical measures in picture description and open-ended narrative tasks across all participants. Significant relationships with uncorrected p values < 0.05 are shaded based on their effect sizes (Spearman’s rho). Correlations surviving Bonferroni correction are starred. Bar plots of correlation coefficients per clinical dimension are available in Supplementary Fig. 2. b Network representation of significant relationships between graph features and clinical measures. Multi-collinearities were separately handled for structural and semantic graph features by stepwise comparison of variance inflation factors and feature exclusion. Multiple comparisons were accounted for using Bonferroni correction. S_AP static action-predication graph feature, D_AP dynamic action-predication graph feature, S_SEQ static sequential graph feature, D_SEQ dynamic sequential graph feature, NN number of nodes, NE number of edges, diameter graph diameter, ASPL average shortest path length, AWD average weighted degree, density graph density, LSCC size of largest strongly connected component, LSCCZ z-score of LSCC compared to 1000 random graphs, ASPLZ z-score of ASPL compared to 1000 random graphs. More details on graph features are available in Supplementary Table 3.
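As a reminder of what each heatmap cell measures, Spearman's rho is the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. The following is a minimal stdlib sketch with hypothetical function names; it does not compute p-values or apply the Bonferroni correction mentioned in the caption, and it assumes neither input is constant.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In use, `x` would hold one graph feature across participants and `y` one clinical score, giving a single cell of the heatmap.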