Linguistic measures of psychological distance track symptom levels and treatment outcomes in a large set of psychotherapy transcripts.

Erik C Nook¹, Thomas D Hull²,³, Matthew K Nock¹, Leah H Somerville¹.

Abstract

Significance

Using language to "distance" ourselves from distressing situations (i.e., by talking less about ourselves and the present moment) can help us manage emotions. Here, we translate this basic research to discover that such "linguistic distancing" is a replicable measure of mental health in a large set of therapy transcripts (N = 6,229). Additionally, clustering techniques showed that language alone could identify participants who differed on both symptom severity and treatment outcomes. These findings lay the foundation for 1) tools that can rapidly identify people in need of psychological services based on language alone and 2) linguistic interventions that can improve mental health.

Keywords: internalizing symptoms; language; linguistic distance; psychotherapy; treatment outcomes

Year: 2022 PMID: 35316132 PMCID: PMC9060508 DOI: 10.1073/pnas.2114737119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

Psychopathology is both common and costly: Approximately 50% of Americans will experience a psychological disorder, and psychopathology accounts for 7% of the global burden of disease (1, 2). Scholars agree that the current mental healthcare system is insufficient for meeting this demand, due to a number of factors, including there being too few therapists, therapists being hard to reach, and most therapies having only moderate efficacy (3–5). As such, there is a dire need for tools that increase treatment accessibility and efficacy. Clinical scientists have called for technological innovations that could achieve these goals (6–9), leading to a wave of technology-assisted psychotherapies in which therapists treat clients via text messaging (10–12). These platforms increase the reach of any given therapist and can effectively treat internalizing disorders like anxiety and depression at a fraction of the cost of in-person treatment (11–14). Although these technological innovations provide a vital step toward addressing the burden of global mental health, there remains a need for tools that can detect individuals who may need them, as well as techniques that improve the efficacy of existing treatments. Language represents a prime entry point for developing these detection and intervention tools at a large scale, given the facts that 1) verbal and digital conversation is ubiquitous in human society and 2) psychotherapeutic interventions are essentially a set of conversations. In this study, we leverage basic understanding of the relationships between language and emotion to demonstrate that a linguistic measure of healthy emotion regulation tracks psychological symptoms in a large set of psychotherapy transcripts. Every day, people use words to identify and communicate about their emotional experiences (15–17). 
Difficulties with emotion regulation [i.e., the ways in which people modify or manage their emotional experiences (18, 19)] are robustly connected to psychopathology (20–22), and successful therapy operates through changing patients’ emotion regulation habits (23–26). Consequently, finding a linguistic signature of poor emotion regulation could measure levels of psychopathology and their remission over treatment at a large scale. Interestingly, a growing number of studies show that using language to increase psychological distance [i.e., “taking a step back” and seeing challenging situations as separated from oneself (27, 28)] can facilitate effective emotion regulation (29–33). Increasing distance along “social” and “temporal” dimensions by reducing use of first-person singular pronouns (e.g., “I”) and present-tense verbs (e.g., “feel”) both decreases the intensity of negative affect and predicts the success of emotion regulation (29, 30). These studies provide compelling evidence that linguistic distance predicts emotion regulation success in the laboratory, and here we ask about its clinical relevance in real-world therapeutic conversations. Individuals who fail to gain distance from their negative emotions may struggle to effectively regulate these emotions, leading to symptoms of internalizing disorders like anxiety and depression. If so, linguistic markers of low psychological distance should track symptoms of psychopathology. Preliminary results align with this notion, as greater use of first-person singular pronouns (indicating less psychological distance along the social dimension) is associated with clinical problems like depression, anxiety, PTSD, and suicide (34–40). However, only a handful of studies have investigated this relationship within the context of treatment itself, and they have returned mixed results (41–44). 
Consequently, there is a need for a large, systematic test of whether linguistic distance is longitudinally associated with psychological symptoms in naturalistic exchanges during psychotherapy. Here, we investigate relations between linguistic distance and psychotherapy outcomes in a large dataset (n = 6,229 participants) from a message-based psychotherapy service (Talkspace). Talkspace clients and their licensed therapists communicate primarily via text messaging, and clients complete self-report inventories of anxiety and depression symptoms every 3 wk. In this study, we examined a programmatic set of research questions that, together, test whether a client’s linguistic distance tracks levels of internalizing symptoms both between and within individuals and whether it might serve as a mediating mechanism of treatment outcomes. We initially conducted analyses in an exploratory subset of the data (n = 3,727). We then preregistered our hypothesized model and analytic plan (https://osf.io/r5gn2) and replicated all analyses in a holdout validation dataset (n = 2,500). Our analyses investigated 1) simple linear relations between time in treatment, symptoms, and linguistic distance; 2) whether increasing linguistic distance mediated reduced symptoms over time in treatment; and 3) whether clusters of clients defined purely on the trajectory of their linguistic distance over treatment differed in their symptom severity or treatment outcomes. Establishing replicable relations between linguistic distance and symptoms in such large-scale real-world data lays a foundation for research that can use language to both detect people at risk for psychopathology and enhance the efficacy of psychotherapy, ultimately reducing the global burden of psychopathology.

Results

Are Internalizing Symptoms, Linguistic Distance, and Time in Treatment Related?

Internalizing symptoms over time in treatment.

Mixed-effects regressions showed that internalizing symptoms fell over the course of treatment with a medium effect size in both the exploratory and validation datasets; βe = −0.42, pe < 0.001, R2e = 0.37, βv = −0.43, pv < 0.001, and R2v = 0.40 (Fig. 1). The subscripts “e” and “v” are used to indicate that statistics are from the exploratory and validation datasets, respectively.

Fig. 1.

Effect size plot for mixed-effects regressions depicting relations between internalizing symptoms, linguistic distance, and time in treatment within the exploratory (black) and validation (gray) datasets. None of the 95% CIs include zero, indicating significant associations. Ss = Subjects.

Linguistic distance over time.

Linguistic distance (i.e., client’s use of verbs and pronouns that were distanced from themselves and the present moment) increased over time in therapy in both the exploratory and validation datasets; βe = 0.07, pe < 0.001, R2e = 0.02, βv = 0.08, pv < 0.001, and R2v = 0.02 (Fig. 1). A small effect size indicated that this was a subtle linguistic shift over the course of therapy.
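
As a rough illustration of how a score of this kind could be computed, the sketch below counts first-person singular pronouns and present-tense verbs in a message and reverse-scores their proportions. The word lists, the equal weighting of the two components, and the function name `linguistic_distance` are assumptions made for this example; they are not the dictionaries or the scoring procedure used in the study.

```python
import re

# Hypothetical, deliberately tiny word lists (illustration only).
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
PRESENT_TENSE_VERBS = {"am", "is", "are", "feel", "feels", "have", "has", "want", "think"}


def linguistic_distance(message: str) -> float:
    """Return a score between 0 and 1; higher means more distanced language.

    Assumed scoring: 1 minus the average of the proportion of first-person
    singular pronouns (low social distance) and the proportion of
    present-tense verbs (low temporal distance).
    """
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        raise ValueError("message contains no words")
    social_closeness = sum(w in FIRST_PERSON_SINGULAR for w in words) / len(words)
    temporal_closeness = sum(w in PRESENT_TENSE_VERBS for w in words) / len(words)
    return 1.0 - (social_closeness + temporal_closeness) / 2.0
```

Under these assumptions, a message such as "I feel sad" scores lower (less distanced) than "They walked home yesterday".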

Linguistic distance and internalizing symptoms.

We observed small but significant relationships between internalizing symptoms and linguistic distance in the exploratory and validation datasets; βe = −0.12, pe < 0.001, R2e = 0.05, βv = −0.14, pv < 0.001, and R2v = 0.06. As hypothesized, worse internalizing symptoms were associated with less distanced language. We next decomposed variance in linguistic distance into within-person and between-person components (Fig. 1). Mixed-effects models revealed that internalizing symptoms were significantly associated with both between-person, βe = −0.20, pe < 0.001, R2e = 0.03, βv = −0.21, pv < 0.001, and R2v = 0.04, and within-person, βe = −0.04, pe < 0.001, R2e = 0.02, βv = −0.06, pv < 0.001, and R2v = 0.005, variance in linguistic distance. Effects ranged from very small to small, and effects were larger for between-person than within-person relationships.
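
One common way to separate between-person from within-person variance is person-mean centering: each client's own average is the between-person component, and each observation's deviation from that average is the within-person component. The sketch below shows that decomposition. Whether the study used exactly this procedure is an assumption, and the function name `decompose` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def decompose(records):
    """Split each observation into between- and within-person parts.

    records: list of (person_id, value) pairs.
    Returns a list of (person_id, between, within), where
    between is the person's own mean and within is value - between.
    """
    by_person = defaultdict(list)
    for person_id, value in records:
        by_person[person_id].append(value)
    person_means = {pid: mean(vals) for pid, vals in by_person.items()}
    return [(pid, person_means[pid], value - person_means[pid]) for pid, value in records]
```

The two resulting columns can then be entered as separate predictors, so that one coefficient describes differences among clients and the other describes fluctuations within a client over time.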

Does Linguistic Distance Mediate Symptom Reduction?

Mediation analyses implemented in a Bayesian framework produced mixed support for the notion that within-person increases in linguistic distance mediate decreased symptoms across time. The mediation model was not significant in the exploratory dataset (Fig. 2), but it was significant in the validation dataset (Fig. 2). The very small proportion mediated in the validation dataset (0.5%) indicates that the potential mediating role of within-person fluctuations in linguistic distance is extremely small.
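
To make the "proportion mediated" quantity concrete, the sketch below uses the standard product-of-coefficients definition: the indirect effect is a × b (time to linguistic distance, then linguistic distance to symptoms), and the proportion mediated is the indirect effect divided by the total effect (indirect plus direct). Applying it draw by draw to posterior samples and summarizing with the median is an assumption about how such a model might be summarized, not a description of the authors' code; the function and argument names are hypothetical.

```python
from statistics import median


def proportion_mediated(a_draws, b_draws, direct_draws):
    """Summarize mediation across posterior draws (illustrative sketch).

    a_draws: effect of time on the mediator, one value per draw.
    b_draws: effect of the mediator on the outcome, one value per draw.
    direct_draws: direct effect of time on the outcome, one value per draw.
    Returns (median indirect effect, median proportion mediated).
    Assumes the total effect is nonzero in every draw.
    """
    indirect = [a * b for a, b in zip(a_draws, b_draws)]
    total = [ab + c for ab, c in zip(indirect, direct_draws)]
    proportion = [ab / t for ab, t in zip(indirect, total)]
    return median(indirect), median(proportion)
```

A proportion near 0.005 would correspond to the 0.5% figure reported for the validation dataset.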

Fig. 2.

Bayesian mediation models testing whether within-person variance in client linguistic distance mediated changes in internalizing symptoms across time in the (A) exploratory and (B) validation datasets. The 95% CR for the indirect effect included zero for the exploratory but not the validation dataset, providing mixed evidence for the proposed mediation model. Median regression estimates are reported from Bayesian regression models, with their corresponding 95% CRs.

Can Symptoms Be Inferred from Linguistic Patterns Alone?

We used finite mixture regression analyses to cluster clients into groups that shared similar trajectories of linguistic distance over treatment. We then found that these groups, defined based on language alone, significantly differed in both treatment outcomes and symptom severity. We first used Akaike Information Criterion (AIC) values to establish that a four-cluster mixture regression solution provided the best fit to the data for both the exploratory dataset (AICe-4-cluster = 3,562,482; AICe-3-cluster = 3,567,742; AICe-2-cluster = 3,574,110; AICe-1-cluster = 3,593,658) and the validation dataset (AICv-4-cluster = 2,388,523; AICv-3-cluster = 2,392,075; AICv-2-cluster = 2,395,775; AICv-1-cluster = 2,411,255). Clusters were remarkably similar across the exploratory and validation datasets, even though they were defined completely independently (Fig. 3). Linguistic distance for clusters 1e and 1v started high and rose over therapy (although not significantly for the validation dataset); βe = 0.08, pe = 0.002, R2e = 0.02, Ne = 569, βv = 0.06, pv = 0.139, R2v = 0.01, and Nv = 270. Clusters 2e and 2v started slightly less high but rose strongly over therapy; βe = 0.13, pe < 0.001, R2e = 0.05, Ne = 1,277, βv = 0.14, pv < 0.001, R2v = 0.06, and Nv = 722. Clusters 3e and 3v started low and remained low over therapy; βe = 0.01, pe = 0.610, R2e = 0.0004, Ne = 735, βv = 0.02, pv = 0.501, R2v = 0.001, and Nv = 654. Finally, clusters 4e and 4v started low and rose over therapy; βe = 0.06, pe = 0.001, R2e = 0.01, Ne = 1,146, βv = 0.09, pv < 0.001, R2v = 0.02, and Nv = 854.
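
The model-selection step can be illustrated with the AIC values reported in this section: among the candidate solutions, the one with the smallest AIC is preferred. The sketch below only compares the one- to four-cluster values given in the text; the dictionary layout and the function name `best_cluster_count` are choices made for this example.

```python
# AIC values copied from the text (lower indicates better fit).
AIC = {
    "exploratory": {1: 3_593_658, 2: 3_574_110, 3: 3_567_742, 4: 3_562_482},
    "validation": {1: 2_411_255, 2: 2_395_775, 3: 2_392_075, 4: 2_388_523},
}


def best_cluster_count(aic_by_k):
    """Return the number of clusters with the smallest AIC."""
    return min(aic_by_k, key=aic_by_k.get)


for name, values in AIC.items():
    print(name, best_cluster_count(values))
```

For both datasets this selects the four-cluster solution, matching the choice described in the text.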

Fig. 3.

Results of finite mixture regressions clustering participants based on the linguistic distance in their texts over the course of therapy for the (A–D) exploratory and (E–H) validation datasets. (A and E) Four clusters were identified, which differed in starting and ending linguistic distance as well as the slope of changes in linguistic distance over the course of treatment. (B and F) Clusters differed significantly in baseline internalizing symptoms such that the clusters that started with higher linguistic distance (i.e., clusters 1 and 2) had lower symptoms at the start of treatment than those that started with lower linguistic distance (i.e., clusters 3 and 4). (C and G) Clusters differed significantly in final internalizing symptoms, with clusters 1 and 2 also reporting significantly fewer symptoms than clusters 3 and 4. (D and H) Estimated marginal means of changes in internalizing symptoms across clusters (accounting for baseline symptom levels). Cluster 2, which had both a high starting level of linguistic distance and the strongest increase over time, achieved the best treatment response, significantly stronger than clusters 3 and 4. ***P < 0.001, **P < 0.01, *P < 0.05.

A one-way ANOVA revealed that these clusters, defined solely based on trajectories of linguistic distance over time, differed significantly in baseline internalizing symptom scores in both the exploratory and validation datasets; Fe (3, 3,723) = 14.31, pe < 0.001, η2e = 0.01, Fv (3, 2,496) = 9.18, pv < 0.001, and η2v = 0.01 (Fig. 3). Pairwise comparisons of conditions revealed remarkably consistent results in both the validation and exploratory datasets. Post hoc Tukey comparisons revealed that baseline internalizing symptom levels were significantly lower for clusters 1 and 2 than clusters 3 and 4; pes < 0.001 and pvs < 0.05. Clusters 1 and 2 did not differ in their baseline symptom levels, and neither did clusters 3 and 4; pes > 0.175 and pvs > 0.641. 
As such, clusters of clients who started treatment with higher linguistic distance had lower internalizing symptoms at baseline. Analyses of final internalizing symptom scores showed similar patterns. Clusters differed significantly in their final symptom levels in both the exploratory and validation datasets; Fe (3, 3,723) = 23.18, pe < 0.001, η2e = 0.02, Fv (3, 2,496) = 13.94, pv < 0.001, and η2v = 0.02 (Fig. 3). Tukey post hoc comparisons indicated that final internalizing symptom scores were significantly lower for clusters 1 and 2 compared to clusters 3 and 4; pes < 0.001 and pvs < 0.006. Again, clusters 1 and 2 did not differ from each other, and neither did clusters 3 and 4; pes > 0.848 and pvs > 0.848. As such, clusters of clients with higher linguistic distance had less severe internalizing symptoms. Finally, one-way analyses of covariance (ANCOVAs) showed that these clusters also differed in how strongly their internalizing symptoms changed across therapy (i.e., final – baseline internalizing symptom scores, controlling for baseline scores); Fe (3, 3,722) = 5.49, pe < 0.001, ηp2e = 0.004, Fv (3, 2,495) = 2.85, pv = 0.036, and ηp2v = 0.003 (Fig. 3). Post hoc Tukey tests revealed that symptoms fell more strongly across treatment for cluster 2 than for clusters 3 and 4; pes < 0.001 and pvs < 0.02. As such, the cluster with the strongest increase in linguistic distance across treatment also had the greatest treatment response. In the validation dataset only, cluster 3v also showed significantly less treatment gain than cluster 1v, pv = 0.029, meaning that the cluster that did not increase in linguistic distance over treatment fared poorest. No other pairwise comparisons for change in internalizing symptoms reached significance; pes > 0.07 and pvs > 0.31. As such, clustering participants based on trajectories of linguistic distance revealed replicable signatures of participants who differed in both their overall symptom severity and their treatment response. 
Analyses of temporal and social components of the linguistic distancing measure revealed that temporal distance clusters differed in treatment outcomes but not baseline symptom levels, whereas social distance clusters differed in chronic symptom levels but not treatment outcomes.
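
The effect sizes reported for the cluster comparisons can be cross-checked from the F statistics and their degrees of freedom, because for a single-factor design (partial) eta squared equals F × df_effect / (F × df_effect + df_error). The sketch below applies that identity to the values quoted above; the function name and the dictionary labels are choices made for this example.

```python
def eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover (partial) eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom copied from the text.
reported = {
    "baseline symptoms, exploratory": (14.31, 3, 3723),
    "baseline symptoms, validation": (9.18, 3, 2496),
    "final symptoms, exploratory": (23.18, 3, 3723),
    "final symptoms, validation": (13.94, 3, 2496),
    "symptom change (ANCOVA), exploratory": (5.49, 3, 3722),
    "symptom change (ANCOVA), validation": (2.85, 3, 2495),
}

for label, (f_value, df_effect, df_error) in reported.items():
    print(f"{label}: {eta_squared_from_f(f_value, df_effect, df_error):.4f}")
```

After rounding, the recovered values agree with the η² and ηp² figures given in the text (0.01, 0.02, 0.004, and 0.003).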

Discussion

Given the immense burden of mental illness, there is a dire need for methods that can detect psychological symptoms and enhance current treatments at a large scale. Due to the central role of language in psychotherapy, we tested whether a linguistic marker of psychological distance could be used as an indicator of a client’s symptom severity throughout treatment. Using a large sample of psychotherapy transcripts, we found replicable evidence that linguistic distance indeed tracks internalizing symptoms at both within- and between-person levels and that clustering analyses reveal groups of participants who differ in both their symptom severity and treatment gains. These results support a theoretical model in which linguistic distance reflects healthy emotion regulation, making linguistic distance a tool for measuring mental health and treatment progress. Both initial analyses of an exploratory dataset and preregistered analyses of a holdout validation dataset provided consistent evidence that linguistic distance increased over time and tracked internalizing symptoms. Although prior research has shown that using language to distance oneself from aversive stimuli is related to effective emotion regulation (29–33), here we demonstrate the translational impact of this basic finding by showing its relationship with psychological symptoms in a naturalistic and longitudinal dataset of psychotherapy transcripts. Not only does this result support a theoretical model in which distancing language facilitates healthy emotion regulation, it also lends evidence to theories that emphasize the transdiagnostic role of emotion regulation in psychopathology (20, 45, 46). Additionally, the discovery of a stable small relation to psychopathology in a large dataset clarifies mixed results obtained from prior studies, most of which used small samples (41–44). 
Furthermore, this study shows that the relationship between linguistic distance and internalizing symptoms exists at both between-person and within-person levels. This extends prior research that focused only on between-person relationships (34, 47), providing strong evidence that linguistic distance can serve as a diagnostic and prognostic indicator of symptom severity, even as symptom levels fluctuate over the course of treatment. However, evidence for mediation (i.e., that increasing linguistic distance explained reduced symptoms across treatment) was inconsistent across exploratory and validation datasets. These inconsistencies suggest either that linguistic distance plays a very small mechanistic role in psychotherapy or that it merely reflects (rather than reduces) internalizing symptoms in therapy. Although this result runs contrary to hypotheses, it prompts future research that can decipher how linguistic distance tracks internalizing symptoms without serving a mediating role. Testing relations between language and symptoms at closer timescales or examining measures of a client’s actual emotion regulation or psychological distancing skills (rather than their linguistic correlates) could provide better tests of this underlying model. Finally, clustering approaches provided replicable evidence that trajectories of linguistic distance can predict treatment outcomes and symptom severity. In particular, we found that starting levels and slopes of linguistic distance related to symptom severity and treatment response, respectively. These results supplement the regression results described above to demonstrate that client language is intimately linked to mental health and treatment response, allowing us to deduce clinically relevant diagnostic and prognostic information from linguistic data alone. These clustering methods pave the way for more sophisticated machine learning approaches that could provide accurate clinical predictions from a client’s linguistic data. 
Additionally, differences between the temporal and social distancing metrics carry several interesting implications for psychotherapy. These analyses showed that temporal distance clusters differed significantly in their treatment outcomes but not baseline symptoms, whereas social distance clusters differed in internalizing symptom severity but not changes in symptoms over time. Consequently, linguistic measures of social distance (i.e., pronoun use) may provide a trait-like measure of overall internalizing dysfunction, whereas temporal distance may reflect within-person shifts in one’s retreating symptoms. Future research that parses temporal and social distance at both the linguistic and phenomenological levels (i.e., assessing clients’ experienced tendency to dilate their psychological focus away from themselves and/or the present moment) could shed further light on these hallmark symptoms of depression and anxiety, as well as the role of this process in successful treatment. A strength of the current study is its unprecedented scale and naturalism, made possible through an inclusive approach to analyses. Indeed, using unfiltered data reduced experimenter degrees of freedom and provided the most conservative test of our research questions. However, taking such an unconstrained approach means that substantial noise remains in the data. The decisions to 1) include every text message (even if they are extremely short or may not be related to therapeutic interventions; e.g., messages about scheduling), 2) include all participants (even those who provided very few text messages), and 3) average linguistic data over a 3-wk period to match the frequency of symptom measures could add noise and cloud accurate assessments of effect sizes. Developing principled inclusion criteria and filtering methods could improve effect size estimates of relationships between variables. 
Nonetheless, this study serves as a foundational litmus test of these relationships, and future studies using machine learning and natural language processing approaches could further refine effect size estimates. Potentially because of this naturalistic approach, effect sizes for linguistic relationships were consistently small. This indicates that we observed subtle linguistic shifts over treatment and that linguistic interventions may only provide a small “nudge” when it comes to actual clinical impact. However, it is important to remember that linguistic distance was a byproduct, not a target of treatment, meaning that observing this effect in the context of an inclusive and naturalistic dataset provides strong support for the underlying theoretical model. Additionally, it’s possible that the higher level of noise at the text level and relatively small number of within-person symptom measurements (i.e., three to five) compared to the high number of subjects (i.e., thousands) could have rendered within-subject relations much weaker than between-person relations. Future research should use principles noted above to reduce noise and improve estimates of effect sizes, examine whether there are moderators that shift “for whom” these effects work, and increase the frequency of within-person symptom sampling to test whether within-person effects are actually larger than those estimated here. That said, there are reasons to value these small effects. Researchers have recently argued that celebrating small effects is key to developing a replicable psychological science (48), and even small effect sizes can have a large impact when they are employed on a large scale. For example, if 25% of 327 million Americans suffer from psychopathology in any given year, helping patients recover just 1 d faster will restore 82 million days of human productivity. As such, it would also be prudent to conduct cost–benefit analyses to quantify the actual impact of these interventions. 
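
The productivity example above is simple arithmetic, reproduced here so the rounding is explicit. The population and prevalence figures are the ones stated in the text; the variable names are chosen for this example.

```python
us_population = 327_000_000   # population figure used in the text
annual_prevalence = 0.25      # share affected in a given year, as stated in the text
days_saved_per_person = 1     # recovering one day faster

people_affected = us_population * annual_prevalence
days_restored = people_affected * days_saved_per_person

print(f"{days_restored / 1e6:.2f} million days")  # prints "81.75 million days"
```

The exact product is 81.75 million days, which the text rounds to 82 million.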
The current findings advance the field’s ability to detect mental health problems from language alone. To work toward deployable tools with real-world impact, future research should address a few key limitations of the current study. First, we propose a theoretical model in which linguistic distance reflects emotion regulation abilities, which increase across time and ultimately improve internalizing symptoms. However, the current study does not include measures of emotion regulation, leaving it unclear what linguistic distance represents in this study. It could, indeed, reflect improved emotion regulation, but it could also reflect myriad other constructs (e.g., avoidance or improved therapeutic alliance). Future studies that empirically evaluate how adaptive emotion regulation fits in the proposed model are needed. Second, because there was no control group in this study, we cannot infer that Talkspace conversations were causally involved in either clients’ decreased symptoms across time or their increased linguistic distance across time. Studies that utilize active control conditions are needed to determine causal relationships. These experiments could also 1) give a clearer sense of the downstream impacts of these relationships (e.g., if they can causally reduce symptoms in the long term) and 2) adjudicate between the directions of language–symptom relationships. The current study tests one direction (i.e., that language predicts and explains symptom changes), and, although this association emerges, mediations were inconsistent. Follow-up experiments that include measures at a fine timescale could compare this direction with its reverse (i.e., that symptoms predict and explain language changes). Third, although we demonstrate an overall relationship between higher linguistic distance and reduced symptoms, it is possible that distancing is not always an adaptive strategy. 
Indeed, substantial data show that “experiential avoidance” (i.e., pushing away internal or external stressors; ostensibly increasing distance) is maladaptive, whereas mindfully attending to the present moment (ostensibly decreasing distance) is adaptive (49–52). Similarly, depression and anxiety are stereotypically seen as disorders in which people are overly focused on past losses or future threats, respectively. Why then would distancing from the present be helpful if habitually being “away from” the present moment is associated with psychopathology? One way to reconcile this apparent paradox is to consider that people suffering from depression and anxiety might not be distancing themselves from the present moment to think about the past and future; instead they are pulling these past and future moments into the present, seeing them with very low psychological distance, and acting as if they are currently happening (e.g., “I can’t believe I am such a failure” or “catastrophe is imminent”). As such, it is possible that learning to resist the avoidant strategies of worry and rumination requires taking a distanced perspective on that maladaptive habit and gaining skills to interrupt these processes. Mindful awareness is one strategy to do just that, as, even though it requires attending to the present moment, it also calls for viewing one’s thoughts as detached and separate from oneself (51, 53), a highly distanced perspective. These are initial attempts at resolving the puzzle of how a distanced perspective may facilitate psychological health, even though prior research establishes avoidance as unhelpful and present-focused mindful awareness as helpful. However, these possibilities require additional empirical investigation, a line of research that would benefit from incorporating emerging frameworks that emphasize the contextual nature of emotion regulation to parse when and in which contexts high distance is adaptive (54–56). 
In conclusion, this study used a large dataset of therapeutic exchanges to show that the psychological distance encoded in one’s speech reflects one’s level of internalizing symptoms and can even track within-person changes in symptom severity across time in treatment. Although mixed results emerged for whether linguistic distance played a mediating role in treatment outcomes, the current study lends support to the theoretical model suggesting that linguistic distancing tracks both emotion regulation and mental health. Findings extend prior research, foster future research questions, and lay the foundation for future tools that can use language to both detect individuals suffering from psychopathology and guide interventions that reduce human suffering.

Methods

Participants.

This study included data from a random sample of 6,229 clients who utilized the digital psychotherapy service Talkspace (https://www.talkspace.com/) between 2016 and 2019. Given the longitudinal focus of this study, participants were only included if they had completed at least three symptom inventory questionnaires spanning at least 6 wk of treatment. For included clients, we downloaded 1) a fully deidentified record of all text message exchanges between the client and their Talkspace therapist, 2) their responses to measures of depression and anxiety, and 3) their self-reported demographics. Talkspace clients and therapists agreed to third parties conducting research on their data as a part of the terms of use (https://www.talkspace.com/public/terms). The Harvard University institutional review board designated the current study not human research (IRB18-1583), as the study utilized preexisting deidentified data for which consent to research was provided. The overall sample of 6,229 participants was randomly divided into an exploratory dataset (Ne = 3,729; 60%) and a validation dataset (Nv = 2,500; 40%), with analyses of the validation dataset only occurring after preregistering analyses and hypotheses (see https://osf.io/r5gn2). Participant demographics are displayed in Table 1.
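
A random division into exploratory and holdout validation sets of the sizes stated above could be sketched as follows. The seed, the use of integer client identifiers, and the function name `split_sample` are assumptions for this example; the text does not describe how the authors implemented the split.

```python
import random


def split_sample(ids, n_exploratory, seed=0):
    """Shuffle ids reproducibly and split into exploratory and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be reproduced
    return shuffled[:n_exploratory], shuffled[n_exploratory:]


# Sizes taken from the text: 6,229 clients, 3,729 exploratory, 2,500 validation.
exploratory, validation = split_sample(range(6229), n_exploratory=3729)
print(len(exploratory), len(validation))
```

Fixing the split before any analysis, and touching the validation set only after preregistration, is what allows the second set to serve as an independent replication.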

Table 1.

Sample and platform description

	Full sample	Exploratory subsample	Validation subsample
Gender, No. (%)
Female	4,742 (77.4)	2,857 (77.7)	1,885 (77.0)
Male	1,306 (21.3)	766 (20.8)	540 (22.1)
Transgender female	11 (0.2)	8 (0.2)	3 (0.1)
Transgender male	11 (0.2)	5 (0.1)	6 (0.2)
Gender queer	27 (0.4)	21 (0.6)	6 (0.2)
Gender variant	6 (0.1)	3 (0.1)	3 (0.1)
Other	20 (0.3)	16 (0.4)	4 (0.2)
No response	106	53	53
Age, No. (%)*
18–25	1,031 (22.2)	606 (21.8)	425 (22.8)
26–35	2,536 (54.6)	1,517 (54.6)	1,019 (54.8)
36–49	871 (18.8)	526 (18.9)	345 (18.5)
50+	203 (4.4)	131 (4.7)	72 (3.9)
No response	1,588	949	639
Race, No. (%)
Caucasian	1,172 (60.4)	698 (60.3)	474 (60.5)
African American	284 (14.6)	179 (15.5)	105 (13.4)
Asian	140 (7.2)	82 (7.1)	58 (7.4)
Hispanic	120 (6.2)	66 (5.7)	54 (6.9)
Native American	5 (0.3)	2 (0.2)	3 (0.4)
Other	195 (10.1)	115 (9.9)	80 (10.2)
Declined to identify	24 (1.2)	15 (1.3)	9 (1.1)
No response	4,289	2,572	1,717
Education level, No. (%)
Less than high school	28 (0.5)	14 (0.4)	14 (0.7)
High school	808 (15.7)	477 (15.3)	331 (16.2)
Associate’s degree	78 (1.5)	40 (1.3)	38 (1.9)
Some college no degree	200 (3.9)	126 (4.0)	74 (3.6)
Bachelor’s degree	3,683 (71.4)	2,238 (71.9)	1,445 (70.7)
Master’s degree	260 (5.0)	163 (5.2)	97 (4.7)
Professional degree	43 (0.8)	23 (0.7)	20 (1.0)
Doctoral degree	56 (1.1)	31 (1.0)	25 (1.2)
No response	1,073	617	456
Symptom measures, mean (SD)
Baseline internalizing symptoms	22.21 (9.90)	21.99 (9.89)	22.54 (9.92)
Final internalizing symptoms	15.16 (9.87)	14.95 (9.79)	15.48 (10.00)
Baseline depression symptoms	11.03 (5.82)	10.93 (5.84)	11.18 (5.80)
Final depression symptoms	7.56 (5.58)	7.44 (5.54)	7.73 (5.62)
Baseline anxiety symptoms	11.18 (5.04)	11.06 (5.02)	11.36 (5.06)
Final anxiety symptoms	7.60 (4.90)	7.50 (4.83)	7.75 (4.99)
Therapy and text qualities
Text-only subscription, No. (%)	6,108 (98.1)	3,659 (98.1)	2,449 (98.0)
No. of client messages	759,706	455,379	304,327
Length of client messages (words), mean (SD)	80.81 (146.80)	81.61 (147.02)	79.61 (146.46)
No. of therapist messages	461,911	273,208	188,703
Length of therapist messages (words), mean (SD)	82.69 (107.30)	84.32 (108.00)	80.33 (106.23)
Present-tense verbs per message, mean (SD)	8.73 (15.10)	8.81 (15.11)	8.61 (15.09)
Past-tense verbs per message, mean (SD)	3.87 (9.34)	3.89 (9.34)	3.83 (9.34)
Future-tense verbs per message, mean (SD)	0.82 (1.80)	0.83 (1.81)	0.81 (1.79)
First-person singular pronouns per message, mean (SD)	8.13 (14.50)	8.17 (14.49)	8.06 (14.51)
Other pronouns per message, mean (SD)	3.75 (9.38)	3.78 (9.38)	3.71 (9.38)
Number of therapists, mean (SD)	1.21 (0.51)	1.21 (0.51)	1.22 (0.52)
Number of symptom measures, mean (SD)	3.45 (0.51)	3.45 (0.51)	3.45 (0.51)
Exactly 3 measures, No. (%)	3,453 (55)	2,058 (55)	1,395 (56)
Exactly 4 measures, No. (%)	2,741 (44)	1,651 (44)	1,090 (44)
Exactly 5 measures, No. (%)	35 (1)	20 (1)	15 (1)
Days between start of therapy and final symptom measure, mean (SD)	63.93 (11.90)	63.94 (11.89)	63.91 (11.92)

Percentages ignore clients who did not respond to each demographic question.

*Participants selected from predefined age bins. Age is given in years.

Sample and platform description

Therapy Platform.

Talkspace is a digital mental health platform that provides session-based teletherapy, as well as asynchronous messaging therapy, from which these data were drawn. Potential clients register with the service and begin by describing their presenting complaint and treatment goals to a consultation therapist. This information enables the system to offer the client a choice of three licensed therapists credentialed by the National Committee for Quality Assurance. These recommendations are based on each therapist’s history with demographically and diagnostically similar clients. The chosen “primary therapist” then treats the client. Clients can purchase live phone and video sessions, but most clients select the messaging-only plan (98.1% in this dataset; Table 1). Clients may send therapists messages whenever they wish using the HIPAA-compliant smartphone-based application or the Talkspace website. Therapists respond by message during designated hours. Clients have the option to transfer to a different primary therapist, but most clients interacted with only one therapist (i.e., 82.5% in the full dataset; Table 1). Mean length of text messages in the full dataset was ∼80 words, providing substantial data for linguistic analysis (Table 1).

Symptom Assessments.

Procedure.

Symptom questionnaires were sent to clients via the messaging platform approximately every 3 wk over the course of therapy. The link to complete questionnaires expired only when the next set of questionnaires was sent (i.e., participants could complete questionnaires at any time after receiving the link, up until the next questionnaire administration). The date on which participants completed the questionnaire was recorded. This date was transformed into a measure of their current time in therapy at that symptom measurement by computing the number of days between questionnaire completion and the start of therapy (i.e., the date of the first text message between the client and their primary therapist).

Depression symptoms.

Symptoms of depression were measured using the eight-item Personal Health Questionnaire (PHQ-8), a validated and widely used tool for assessing depressive symptoms (57). Participants rated how often over the last 2 wk they had been bothered by eight of the nine symptoms of major depressive disorder (i.e., anhedonia, low mood, sleep disturbance, fatigue, appetite disturbance, low self-esteem, concentration difficulties, and psychomotor agitation or slowing). Responses were made on a four-point scale (0 = not at all, 1 = several days, 2 = more than half the days, and 3 = nearly every day). Responses were summed to provide a measure of overall depression symptom severity, with scores ranging from 0 to 24. Unlike PHQ-9, PHQ-8 does not include an item assessing suicidal ideation. However, studies have shown that PHQ-8 and PHQ-9 provide equivalently sensitive and valid measures of depressive symptoms (58–60).

Anxiety symptoms.

Anxiety symptoms were assessed using the seven-item Generalized Anxiety Disorder Questionnaire (GAD-7) (61), a widely used and validated measure of anxiety symptoms. Participants rated how often over the last 2 wk they had been bothered by core symptoms of generalized anxiety disorder (i.e., feelings of anxiety, uncontrollable worrying, difficulty relaxing, restlessness, irritability, and fears of catastrophic outcomes). Responses were made on a four-point scale (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day) and summed to provide a measure of overall anxiety symptom severity, with scores ranging from 0 to 21.

Data Processing.

Producing a combined measure of internalizing symptoms.

Preliminary analyses in the exploratory dataset revealed that scores on PHQ-8 and GAD-7 were strongly related to each other [within-person correlation using the statsBy function in the psych package (62): r_e = 0.70, p_e < 0.001; r_v = 0.69, p_v < 0.001, where subscripts e and v denote the exploratory and validation datasets]. We consequently collapsed these two measures into a single assessment of internalizing symptoms by summing the two scales together, as has been done in prior work (63). Nonetheless, preregistered supplementary analyses were conducted on depression and anxiety scores separately both to present these individual statistics and to show that results were largely equivalent across the two measures.
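The scoring rules described for the PHQ-8, the GAD-7, and the combined internalizing score can be sketched as follows. This is a minimal illustration rather than the authors' code: the function and constant names are hypothetical, and rejecting incomplete questionnaires is an assumption, since the text does not describe how missing items were handled.

```python
from typing import Sequence

PHQ8_ITEMS = 8  # eight depression items, each rated 0-3
GAD7_ITEMS = 7  # seven anxiety items, each rated 0-3


def score_scale(responses: Sequence[int], n_items: int) -> int:
    """Sum item responses after checking the item count and the 0-3 range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


def internalizing_score(phq8: Sequence[int], gad7: Sequence[int]) -> int:
    """Combined internalizing score: PHQ-8 total plus GAD-7 total (0-45)."""
    return score_scale(phq8, PHQ8_ITEMS) + score_scale(gad7, GAD7_ITEMS)
```

Because the PHQ-8 ranges from 0 to 24 and the GAD-7 from 0 to 21, the summed score ranges from 0 to 45.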

Text processing.

We developed code in R Version 4.0.4 (64) to extract individual text messages from Talkspace text records. Text messages were extracted with their corresponding date and time of delivery, as well as the author of the text (i.e., client or therapist). Linguistic distance for each text was first computed following prior work (29, 30, 65). However, analyses of the exploratory dataset suggested that overall pronoun and verb use increased over the course of treatment (presumably due to changes in topics of conversation). These overall shifts across time made this measure unsuited to the current study. We consequently developed more precise measures of linguistic distance that controlled for overall shifts in verb and pronoun use over the course of treatment. Linguistic Inquiry and Word Count (LIWC) (66) software was used to compute the percentage of words that were verbs (divided into past, present, and future tense) and pronouns (divided into first-person singular, first-person plural, second person, third-person singular, and third-person plural). We computed a temporal distance score for each text message by computing the proportion of verbs that were not in the present tense [i.e., (past + future)/(past + future + present)]. Similarly, we computed a social distance score for each text message by computing the proportion of pronouns that were not first-person singular [i.e., (second person + first-person plural + third-person singular + third-person plural)/(second person + first-person plural + third-person singular + third-person plural + first-person singular)]. Temporal distance scores were treated as missing for text messages that included no verbs (7.1% of client text messages in the exploratory dataset and 6.9% in the validation dataset), and social distance scores were treated as missing for text messages that included no pronouns (9.4% of client text messages in the exploratory dataset and 9.1% in the validation dataset).
We then averaged these two measures at the text level into a single combined linguistic distance score (11.2% of messages in the exploratory dataset and 10.9% in the validation dataset were unusable because they lacked pronouns or verbs). This revised measure of linguistic distance 1) captures the relative focus on temporal and social targets that are distanced from the present moment and 2) accounts for overall differences in verb and pronoun use across treatment. Analyses of social and temporal distance as separate metrics are presented in .
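The per-message distance scores defined above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' R code: the function and argument names are hypothetical, the inputs may be either raw counts or LIWC percentages (the ratios are identical), and treating the combined score as missing whenever either component is missing is an assumption inferred from the reported percentages of unusable messages.

```python
from typing import Optional


def temporal_distance(past: float, present: float, future: float) -> Optional[float]:
    """Proportion of verbs not in the present tense; None if there are no verbs."""
    total = past + present + future
    if total == 0:
        return None
    return (past + future) / total


def social_distance(i: float, we: float, you: float, shehe: float, they: float) -> Optional[float]:
    """Proportion of pronouns that are not first-person singular; None if no pronouns."""
    total = i + we + you + shehe + they
    if total == 0:
        return None
    return (we + you + shehe + they) / total


def linguistic_distance(temporal: Optional[float], social: Optional[float]) -> Optional[float]:
    """Average of the two components; missing if either is missing (assumption)."""
    if temporal is None or social is None:
        return None
    return (temporal + social) / 2
```

For example, a message with 2 past-tense, 6 present-tense, and 2 future-tense verbs has a temporal distance of 4/10, regardless of how many verbs the message contains overall, which is what makes the measure robust to shifts in total verb use.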

Aligning text and questionnaire data.

Text data were collected at a more granular timescale (i.e., minutes, hours, or days) compared to symptom measures (i.e., every 3 wk). We computed the mean linguistic distance in users’ text messages within the ∼3-wk periods between symptom assessments and aligned these averages with the symptom assessments completed at the end of each of these observation periods. We quantified time (i.e., days in therapy) by computing the number of days between the date questionnaires were completed and the beginning of therapy (i.e., the date of the first text message between the client and the primary therapist). This resulted in a dataset comprising baseline symptom measures (at time = 0), symptom measures at each subsequent symptom measurement point, and the mean linguistic distance of client text messages sent before each of these symptom measurements, all nested within participants.
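The alignment step can be illustrated with a small sketch. It assumes each message is represented as a hypothetical `(day, score)` pair, where `day` is days since the start of therapy and `score` may be missing; the exact handling of messages sent on an assessment day is not stated in the text, so the half-open window used here is an assumption.

```python
from statistics import mean
from typing import List, Optional, Tuple


def align_to_assessments(
    messages: List[Tuple[float, Optional[float]]],
    assessment_days: List[float],
) -> List[Tuple[float, Optional[float]]]:
    """Average message-level distance within each window ending at an assessment.

    Returns one (assessment_day, mean_distance) row per follow-up assessment;
    mean_distance is None when the window contains no usable messages.
    """
    rows = []
    window_start = float("-inf")  # first window reaches back to the start of therapy
    for window_end in sorted(assessment_days):
        scores = [
            score
            for day, score in messages
            if window_start < day <= window_end and score is not None
        ]
        rows.append((window_end, mean(scores) if scores else None))
        window_start = window_end
    return rows
```

The baseline assessment (time = 0) has no preceding messages, so it would be carried as a symptom-only row alongside the output of this function.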

Consideration of exclusion criteria.

It is worth noting that we adopted an inclusive approach to analyzing this real-world dataset. Although criteria could have been developed to exclude participants (e.g., minimum number of text messages, minimum initial symptom severity, or type of subscription) or text messages (e.g., minimum word count), we refrained from imposing experimenter-defined cutoffs as much as possible. Given the novelty of this naturalistic analysis, we chose to take an inclusive approach to provide unbiased insight into research questions, but the presence of unfiltered noise should be noted when interpreting results.

Research Questions, Analyses, and Hypotheses.

Are internalizing symptoms, linguistic distance, and time in treatment related?

We first tested the “arms” of a mediation model in which linguistic distance mediates reductions in internalizing symptoms over time in treatment. This involved using mixed-effects models to test for linear relationships between 1) days in treatment and internalizing symptoms, 2) days in treatment and linguistic distance, and 3) linguistic distance and internalizing symptoms. We hypothesized that 1) time in therapy would be negatively related to symptoms, 2) linguistic distance would be positively related to time in therapy, and 3) linguistic distance would be negatively related to internalizing symptoms at both within-person and between-person levels of analysis. For this third relationship, it was important to decompose measures of linguistic distance into within-person and between-person components within this longitudinal design (67, 68). This is because a relationship between linguistic distance and internalizing symptoms could emerge in mixed-effects models either 1) because, as people increase their linguistic distance, their symptoms reduce (a within-person relationship) or 2) because individuals who, overall, have higher linguistic distance have lower symptoms than individuals who, overall, have lower linguistic distance (a between-person relationship). We consequently followed prior work in decomposing linguistic distance into within-person and between-person components and used these components in mixed-effects regressions (67–72). A variable representing the within-person fluctuation in linguistic distance was created by subtracting each individual’s mean linguistic distance score from the score of each of their observations, producing a variable representing within-person deviation, centered around their individual mean. Then, a variable representing between-person variance in linguistic distance was constructed by subtracting the overall group mean of linguistic distance from each participant’s average linguistic distance.
This produced a variable that was constant for each participant and represented how their mean level deviated from the group’s mean. These within-person and between-person variables were entered simultaneously in mixed-effects models testing relations between linguistic distance and internalizing symptoms.
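The within-person and between-person decomposition can be sketched as follows. The names are hypothetical, and computing the group mean as the mean of participant means (rather than the mean of all observations) is an assumption, because the text does not specify which was used.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, List, Tuple


def decompose(
    observations: List[Tuple[Hashable, float]],
) -> List[Tuple[Hashable, float, float]]:
    """Split each (participant, score) observation into two components.

    within  = score minus that participant's own mean (varies by observation)
    between = participant's mean minus the group mean (constant per participant)
    """
    by_person = defaultdict(list)
    for pid, score in observations:
        by_person[pid].append(score)

    person_means = {pid: mean(scores) for pid, scores in by_person.items()}
    group_mean = mean(list(person_means.values()))  # assumption: mean of person means

    return [
        (pid, score - person_means[pid], person_means[pid] - group_mean)
        for pid, score in observations
    ]
```

Both components would then enter the same mixed-effects model as separate predictors, so that the within-person slope and the between-person slope are estimated simultaneously.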

Does linguistic distance mediate symptom reduction?

We next conducted mediation analyses to formally test whether increasing linguistic distance over the course of treatment mediated symptom changes, using measures of linguistic distance that had been decomposed into their within-person and between-person components. Typically, mediation analyses utilize bootstrapping methods (i.e., randomly sampling from the original dataset with replacement thousands of times) to generate many samples from which a confidence interval can be constructed to test the significance of the mediation model (73). However, the appropriate method for bootstrapping multilevel data is not clear, as random samples can be drawn at the participant level, at the observation level, or at both participant and observation levels. We thus used Bayesian analytic procedures, which do not involve bootstrapping methods, for our mediation analyses to sidestep this issue (70). Like the mixed-effects models described above, Bayesian regression models included a random effect of participant to account for the multilevel nature of the dataset. To provide relatively unbiased starting points for Bayesian analyses, we supplied weakly informative priors (Gaussian distribution of M = 0, SD = 10) for all regressors in the models. Bayesian analyses were implemented using the Stan language in R (74). Two Markov chains used the Monte Carlo No-U-Turn Sampler (75) to approximate the posterior distribution of each regressor across 12,500 iterations, with the first 2,500 iterations discarded as burn-in. The indirect effect (i.e., the a × b pathway for the within-person parameter) and proportion mediated (i.e., indirect effect/[indirect effect + direct effect] × 100) were computed for each mediation model. A significant mediation was determined when the 95% credible range (CR) of posterior density for the indirect effect did not include zero. We hypothesized that within-person increases in linguistic distance would mediate decreased symptoms across therapy.
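The summary quantities described above can be computed from posterior draws as in the sketch below. It does not fit any model: the draws for the a path, b path, and direct effect are assumed to come from already-fitted Bayesian regressions, the function name is hypothetical, and computing the proportion mediated from posterior means (rather than draw by draw) is an assumption.

```python
from statistics import mean, quantiles
from typing import Dict, List


def summarize_mediation(
    a_draws: List[float], b_draws: List[float], direct_draws: List[float]
) -> Dict[str, float]:
    """Indirect effect (a x b), its 95% credible range, and proportion mediated."""
    if not (len(a_draws) == len(b_draws) == len(direct_draws)):
        raise ValueError("all draw lists must have the same length")

    indirect = [a * b for a, b in zip(a_draws, b_draws)]
    cuts = quantiles(indirect, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]

    indirect_mean = mean(indirect)
    direct_mean = mean(direct_draws)
    proportion = indirect_mean / (indirect_mean + direct_mean) * 100

    return {
        "indirect": indirect_mean,
        "lower": lower,
        "upper": upper,
        "proportion_mediated": proportion,
        "excludes_zero": lower > 0 or upper < 0,
    }
```

Under the criterion stated in the text, mediation would be called significant when `excludes_zero` is true.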

Can symptoms be inferred from linguistic patterns alone?

Finally, we used clustering approaches to supplement the regression models utilized above. One limitation of regressions is the extent of aggregation that is required to align text and questionnaire data, resulting in loss of information and introduction of noise. As such, we utilized finite mixture regression techniques (76), which analyze data at the text level. In essence, mixture regressions identify clusters of individuals based on similarities of joint distributions among variables. This means that participants who tend to have the same relationship between two variables are grouped together. In this case, we used mixture regressions to cluster individuals based on how their linguistic distance in individual text messages varied across time (e.g., grouping clients whose linguistic distance increased over time into one cluster and grouping clients whose linguistic distance decreased over time into a different cluster). This allowed us to test whether text data could be used to draw inferences about clients’ symptoms and treatment outcomes. Mixture regression models were conducted on a dataset that included the linguistic distance score for every text message that clients sent to their primary therapist over the course of therapy. For added precision, time in therapy was quantified as a decimal value that included the proportion of a day that had passed since the first text sent between the client and the therapist. We conducted mixture regression analyses that grouped participants into one, two, three, and four clusters, and we then selected the number of clusters that provided the best fit, as determined by the Akaike information criterion (AIC). For additional stability and model fit, mixture regressions for each number of clusters were implemented 10 times (to account for subtle differences that can emerge depending on random starting points of the clustering algorithm), and the best-fitting model was selected.
Mixture regressions included a random effect of subject to account for nesting of text messages within subjects. We then conducted analyses in the aggregated dataset described above (i.e., in which linguistic data were averaged to match the timeline of symptom inventories) to test how each measure of linguistic distance varied across time in each cluster (using mixed-effects models), as well as how clusters differed in baseline and final internalizing symptoms (using ANOVAs and post hoc Tukey tests), and how they differed in their change in internalizing symptom scores [i.e., analyzing final – baseline internalizing symptoms change scores using ANCOVAs to control for baseline symptom levels (77, 78)].
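The model-selection logic (repeated fits per number of clusters, then comparison by AIC) can be sketched as follows. The sketch does not fit mixture regressions; it assumes each fit is already summarized as a hypothetical `(n_clusters, log_likelihood, n_params)` tuple and uses the standard definition AIC = 2k - 2 log L.

```python
from typing import Dict, List, Tuple

Fit = Tuple[int, float, int]  # (n_clusters, log_likelihood, n_params), hypothetical layout


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def select_n_clusters(fits: List[Fit]) -> Tuple[int, float]:
    """Keep the best restart for each cluster count, then pick the lowest AIC."""
    best_per_k: Dict[int, float] = {}
    for n_clusters, log_likelihood, n_params in fits:
        score = aic(log_likelihood, n_params)
        if n_clusters not in best_per_k or score < best_per_k[n_clusters]:
            best_per_k[n_clusters] = score
    best_k = min(best_per_k, key=lambda k: best_per_k[k])
    return best_k, best_per_k[best_k]
```

Within a fixed number of clusters the parameter count is the same across restarts, so keeping the lowest-AIC restart is equivalent to keeping the restart with the highest log-likelihood.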

Additional preregistered analyses.

All analyses were initially only conducted in the exploratory dataset of 3,729 participants, and analyses of the 2,500 participants in the validation dataset occurred following preregistration. Note that we preregistered analyzing social and temporal components of the linguistic distancing measure separately. In the revision process, we decided to combine these into a single measure. Results and conclusions are largely the same when each component is analyzed separately, and all preregistered analyses are provided in , including separate analyses of depression and anxiety symptoms. We also preregistered an additional set of analyses related to the role of therapist linguistic distancing in treatment outcomes. Because the current paper focuses on client language, we have reserved analyses of therapist language for a subsequent report focused on interpersonal processes in therapy.

Model building.

Mixed-effects models all included a random intercept for subject. We followed conventional model-building steps to test whether adding random slopes improved model fit (as determined by a lower AIC and a significant model comparison). For both the exploratory and validation datasets, these steps consistently revealed that adding a random slope for time in therapy significantly improved model fit. Hence, a random slope of time in therapy was added to all models that included this variable as a fixed effect. Random slopes for models without time as a predictor (e.g., relating client temporal distance and internalizing symptoms) were included when doing so improved model fit. Linear mixed-effects model regression estimates are reported in standardized units (i.e., β). Note that there are several methods for computing standardized βs in mixed-effects models, and here coefficients are standardized at their relevant “level” (i.e., in relation to within-person or between-person variance) using the “pseudo” option of the “standardize_parameters” function in the effectsize package (79). We characterize effect sizes according to conventions for correlation coefficients (i.e., ∼0.1 = small, ∼0.3 = medium, and ∼0.5 = large) (80). To provide an additional estimate of effect sizes in mixed-effects models, we report the proportion of variance explained by each predictor (i.e., semipartial R²) following the conventions described by Edwards and coworkers (81, 82) and using Satterthwaite estimation of degrees of freedom. Regression estimates for Bayesian mediation models are reported in their raw unstandardized form (i.e., b), but we report the proportion mediated as the key effect size for each mediation model. We use eta squared (i.e., η²) to report the effect size of one-way ANOVAs and use partial eta squared (i.e., η_p²) for ANCOVAs that control for baseline symptoms.
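As a small illustration of one of these effect sizes, eta squared for a one-way ANOVA is the between-group sum of squares divided by the total sum of squares. The sketch below uses that standard definition with a hypothetical function name; it is not the authors' code and does not cover partial eta squared for ANCOVAs.

```python
from statistics import mean
from typing import List


def eta_squared(groups: List[List[float]]) -> float:
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    return ss_between / ss_total
```

Applied to cluster comparisons, each inner list would hold the symptom scores of the participants assigned to one cluster.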

Software.

LIWC 2007 (66) was used to extract word class frequencies from text messages. Mixed-effects models were conducted in lme4 (83), with P values calculated using the lmerTest package (84). Standardized betas of linear mixed-effects models were extracted using the effectsize package (79). Bayesian analyses were conducted using the brms package (85, 86). Mixture regressions were conducted in the flexmix package (76).

53 in total

1. Shortages of rural mental health professionals.

Authors: Elizabeth Merwin; Ivora Hinton; Bruce Dembling; Steven Stern
Journal: Arch Psychiatr Nurs Date: 2003-02 Impact factor: 2.218

2. Experiential avoidance as a generalized psychological vulnerability: comparisons with coping and emotion regulation strategies.

Authors: Todd B Kashdan; Velma Barrios; John P Forsyth; Michael F Steger
Journal: Behav Res Ther Date: 2006-09

3. Emotion-regulation strategies across psychopathology: A meta-analytic review.

Authors: Amelia Aldao; Susan Nolen-Hoeksema; Susanne Schweizer
Journal: Clin Psychol Rev Date: 2009-11-20

4. Emotion-regulation skills as a treatment target in psychotherapy.

Authors: Matthias Berking; Peggilee Wupperman; Alexander Reichardt; Tanja Pejic; Alexandra Dippel; Hansjörg Znoj
Journal: Behav Res Ther Date: 2008-08-30

5. The Future of Emotion Regulation Research: Capturing Context.

Authors: Amelia Aldao
Journal: Perspect Psychol Sci Date: 2013-03

6. A brief measure for assessing generalized anxiety disorder: the GAD-7.

Authors: Robert L Spitzer; Kurt Kroenke; Janet B W Williams; Bernd Löwe
Journal: Arch Intern Med Date: 2006-05-22

7. A linguistic signature of psychological distancing in emotion regulation.

Authors: Erik C Nook; Jessica L Schleider; Leah H Somerville
Journal: J Exp Psychol Gen Date: 2017-01-23

8. The PHQ-9 versus the PHQ-8--is item 9 useful for assessing suicide risk in coronary artery disease patients? Data from the Heart and Soul Study.

Authors: Ilya Razykov; Roy C Ziegelstein; Mary A Whooley; Brett D Thombs
Journal: J Psychosom Res Date: 2012-06-30 Impact factor: 3.006

9. Me, myself, and I: self-referent word use as an indicator of self-focused attention in relation to depression and anxiety.

Authors: Timo Brockmeyer; Johannes Zimmermann; Dominika Kulessa; Martin Hautzinger; Hinrich Bents; Hans-Christoph Friederich; Wolfgang Herzog; Matthias Backenstrass
Journal: Front Psychol Date: 2015-10-09

10. Two-way messaging therapy for depression and anxiety: longitudinal response trajectories.

Authors: Thomas D Hull; Matteo Malgaroli; Philippa S Connolly; Seth Feuerstein; Naomi M Simon
Journal: BMC Psychiatry Date: 2020-06-12 Impact factor: 3.630