Zhaolu Liu, Robert L Peach, Emma L Lawrance, Ariele Noble, Mark A Ungless, Mauricio Barahona.
Abstract
The current mental health crisis is a growing public health issue requiring a large-scale response that cannot be met with traditional services alone. Digital support tools are proliferating, yet most are not systematically evaluated, and we know little about their users and their needs. Shout is a free mental health text messaging service run by the charity Mental Health Innovations, which provides support for individuals in the UK experiencing mental or emotional distress and seeking help. Here we study a large data set of anonymised text message conversations and post-conversation surveys compiled through Shout. These data provide an opportunity to hear at scale from those experiencing distress; to better understand mental health needs for people not using traditional mental health services; and to evaluate the impact of a novel form of crisis support. We use natural language processing (NLP) to assess the adherence of volunteers to conversation techniques and formats, and to gain insight into demographic user groups and their behavioural expressions of distress. Our textual analyses achieve accurate classification of conversation stages (weighted accuracy = 88%), behaviours (1 − Hamming loss = 95%) and texter demographics (weighted accuracy = 96%), exemplifying how the application of NLP to frontline mental health data sets can aid with post-hoc analysis and evaluation of quality of service provision in digital mental health services.
Keywords: crisis; deep learning; digital mental health; machine learning; mental health; natural language processing; transformers
Year: 2021 PMID: 34939068 PMCID: PMC8685221 DOI: 10.3389/fdgth.2021.779091
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. High-level overview of the Shout data set. (A) The distribution of conversation lengths as measured by the number of characters. The red box indicates the conversations between 1,200 and 9,200 characters which passed the inclusion criteria for analysis. (B) The histogram of the number of words in a conversation. Since 82% of the conversations exceed the maximum word limit (512 words) of traditional BERT-style NLP models, we use the Longformer model. (C) Age distribution and (D) gender distribution of texters as obtained from the texter surveys (which we show in section 3.3 to be biased toward particular demographic groups). (E) A synthetic example of a portion of a conversation to illustrate annotations of behaviours and conversation stages.
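The inclusion window and the length check described in the caption can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names are hypothetical, and it assumes that the bounds are inclusive, that conversation length is the sum of message lengths, and that words are counted by whitespace splitting.

```python
# Thresholds quoted in the Figure 1 caption.
MIN_CHARS = 1_200       # lower bound of the inclusion window (panel A)
MAX_CHARS = 9_200       # upper bound of the inclusion window (panel A)
BERT_WORD_LIMIT = 512   # limit the caption quotes for BERT-style models (panel B)


def conversation_chars(messages):
    """Total number of characters over all messages of one conversation."""
    return sum(len(message) for message in messages)


def passes_inclusion(messages):
    """True if the conversation length lies in the window (bounds assumed inclusive)."""
    return MIN_CHARS <= conversation_chars(messages) <= MAX_CHARS


def exceeds_bert_limit(messages):
    """True if the whitespace-delimited word count is above the quoted limit."""
    n_words = sum(len(message.split()) for message in messages)
    return n_words > BERT_WORD_LIMIT
```

Conversations for which `exceeds_bert_limit` returns `True` are the ones that motivate a long-input model such as Longformer.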
Table 1. Pre-defined conversation stages which Shout Volunteers are trained to follow to best support the texters.
| Stage | Name | Description |
|---|---|---|
| 0 | Initialise | Greetings and bot messages |
| 1 | Build rapport | Active listening and good contact techniques. |
| 2 | Explore | Understand what the crisis is and assess risk. |
| 3 | Identify goal | Clarify what support the texter needs. |
| 4 | Problem-solve | Identify current resources and create an action plan. |
| 5 | End the conversation | Review the action plan and close warmly |
Table 2. Definitions of behaviours.
| ID | Behaviour | Definition |
|---|---|---|
| 1 | Setting intention | Communicate an immediate (near future) aim |
| 2 | Enquiring | Explore one's understanding of another's experience |
| 3 | Expressing distress | Communicate an offloading of negative feelings |
| 4 | Reflecting | Mirror something the texter has said |
| 5 | Corresponding | Show comparability between both parties |
| 6 | Discord | Involves a lack of harmony between both parties |
Figure 2. Additional pre-training on the masked language modelling (MLM) task using Shout mental health data. E_i represents the input embedding and R_i the contextual representation of token i. B and T are the contextual representations of the special tokens [bot] and [texter], respectively. After tokenisation, the numerical vectors are processed by Longformer.
Figure 3. Mechanism to include context in input data: m_i represents the message at time i and S_j the stage assigned to message j. Exactly one preceding and one following message are concatenated with the message of interest. The contextualised text inherits the label of the current message.
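The concatenation scheme in this caption can be sketched in a few lines. This is an illustrative sketch only: the function name `add_context`, the space separator, and the handling of the first and last messages (which simply receive whichever neighbour exists) are assumptions, since the caption does not specify them.

```python
def add_context(messages, labels, window=1, sep=" "):
    """Pair each message with its neighbours while keeping its own label.

    messages: list of message strings in conversation order.
    labels:   one label per message (for example, the conversation stage).
    window:   number of neighbouring messages taken on each side (1 in the caption).
    sep:      string placed between concatenated messages (an assumption).
    """
    if len(messages) != len(labels):
        raise ValueError("messages and labels must have the same length")
    contextualised = []
    for i, label in enumerate(labels):
        start = max(0, i - window)                 # clip at the start of the conversation
        end = min(len(messages), i + window + 1)   # clip at the end of the conversation
        text = sep.join(messages[start:end])
        contextualised.append((text, label))       # label of message i is inherited
    return contextualised
```

With `window=0` the function returns each message on its own, which corresponds to the "No context" setting.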
Table 3. Comparison of conversation stage classification performance for models with different performance improvement techniques.
| Model ID | | | | Context | |
|---|---|---|---|---|---|
| 1 | 100 | 100 | × | No context | 0.6231 |
| 2 | 100 | 100 | × | ± 1 message | 0.8620 |
| 3 | 100 | 100 | ✓ | ± 1 message | |
| 4 | 0 | 100 | × | ± 1 message | 0.2010 |
| 5 | 33 | 100 | × | ± 1 message | 0.7512 |
| 6 | 100 | 25 | × | ± 1 message | 0.6364 |
| 7 | 100 | 50 | × | ± 1 message | 0.6966 |
“No context” means that only the message of interest is input to the model, whereas “± 1 message” refers to the context concatenation scheme illustrated in Figure 3.
Table 4. Classification accuracy for each conversation stage with the optimal model (Model ID 3 in Table 3).
| | | | | | | |
|---|---|---|---|---|---|---|
| Accuracy | 0.9001 | 0.8890 | 0.8936 | 0.8995 | 0.8709 | 0.8929 |
Figure 4. Word clouds for LIME-derived words that best explain the model prediction for each conversation stage. (A) Bot messages are detected from the initialisation. (B) Greeting words are frequent when building rapport. (C) Volunteer begins to explore the texter's problem with key words from the discussion featuring “sharing” and “life.” (D) Volunteer works to identify goals with the texter and these messages usually start with first-person and second-person pronouns. (E) Texter-specific solutions are suggested such as a “helpful” “URL.” (F) The conversations are brought to a close using phrases like “take care” and “bye”.
Table 5. Results of behaviour classification for individual messages.
| Model | | | |
|---|---|---|---|
| No context | | | |
| ± 1 message | 0.7407 | 0.9410 | 0.9485 |
Comparison of two models: without context and with message context, i.e., including the text of the preceding and following messages. Bold values indicate the model with the best performance.
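The abstract reports behaviour classification as 1 − Hamming loss. The sketch below uses the standard definition of Hamming loss for multi-label data (the fraction of individual label slots predicted incorrectly). It assumes each message is encoded as a binary indicator vector over the six behaviours; that encoding is an assumption, not a detail stated here.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots on which prediction and truth differ.

    y_true, y_pred: lists of equal-length binary vectors, one vector per message.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of rows")
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row pair must have the same number of labels")
        wrong += sum(t != p for t, p in zip(true_row, pred_row))
        total += len(true_row)
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


def hamming_score(y_true, y_pred):
    """The quantity written as 1 - Hamming loss."""
    return 1.0 - hamming_loss(y_true, y_pred)
```

Because the score counts every label slot separately, a message with one of six behaviours mislabelled still contributes five correct slots. This makes the score more lenient than requiring the full label set to match exactly.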
Table 6. Classification performance for three demographic variables.
| Demographic | | Survey proportion | Predicted proportion | Difference |
|---|---|---|---|---|
| Age 13 or under | 0.9575 | 0.0663 | 0.0435 | −0.0228 |
| Autism | 0.9533 | 0.0618 | 0.0128 | −0.0490 |
| Non-binary gender | 0.9592 | 0.0494 | 0.0160 | −0.0334 |
We report the proportion of texters that self-identified in each demographic category in the survey. Using the trained models, we then predicted the class of the remaining (unlabelled) conversations and report the proportion of all texters predicted in each demographic category. We also report the difference between the two proportions (predicted minus survey-reported) for each category.
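The difference column can be reproduced from the two proportion columns. The sketch below assumes that, for each row, the first proportion listed is the survey-reported value and the second is the predicted value, and that the difference is taken as predicted minus survey-reported. Under those assumptions the arithmetic matches the last column of the table.

```python
# (survey-reported proportion, predicted proportion), copied from the table rows.
# The column order is an assumption checked against the reported differences.
PROPORTIONS = {
    "Age 13 or under": (0.0663, 0.0435),
    "Autism": (0.0618, 0.0128),
    "Non-binary gender": (0.0494, 0.0160),
}


def proportion_differences(proportions):
    """Return predicted minus survey-reported proportion, rounded to 4 decimals."""
    return {
        category: round(predicted - survey, 4)
        for category, (survey, predicted) in proportions.items()
    }


differences = proportion_differences(PROPORTIONS)
for category, diff in differences.items():
    print(f"{category}: {diff:+.4f}")
```

All three differences are negative, meaning the predicted share of each group among unlabelled conversations is smaller than its share among survey respondents.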
Figure 5. Word clouds for the words that best explain the model predictions for (A) age group 13 or under, (B) autism, and (C) non-binary gender.