Tianlin Zhang, Annika M Schoene, Shaoxiong Ji, Sophia Ananiadou.
Abstract
Mental illness is highly prevalent, constituting a major cause of distress in people's lives, with an impact on society's health and well-being. Mental illness is a complex, multi-factorial disease associated with individual risk factors and a variety of socioeconomic and clinical associations. To capture these complex associations, which are expressed in a wide variety of textual data including social media posts, interviews, and clinical notes, natural language processing (NLP) methods show promising improvements that can empower proactive mental healthcare and assist early diagnosis. We provide a narrative review of mental illness detection using NLP in the past decade, to understand methods, trends, challenges, and future directions. A total of 399 studies from 10,467 records were included. The review reveals an upward trend in NLP research on mental illness detection. Deep learning methods receive more attention and perform better than traditional machine learning methods. We also provide some recommendations for future studies, including the development of novel detection methods, deep learning paradigms, and interpretable models.
Year: 2022 PMID: 35396451 PMCID: PMC8993841 DOI: 10.1038/s41746-022-00589-7
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Keywords for literature search.
| Category | Keywords |
|---|---|
| Mental illness (1) | Mental disorder, mental health, mental illness |
| | Depression, suicide, psychology, insomnia, stress, anxiety, schizophrenia, phobias, PTSD (post-traumatic stress disorder), ASD (autism spectrum disorder), anorexia, bipolar |
| Data sources (2) | Social media, text, language, posts, notes, interviews, records, survey |
| | Twitter, reddit, weibo, microblog, facebook, instagram |
| Methods (3) | Natural language processing, deep learning, machine learning, text mining, text analysis |
| | Neural network, CNN, LSTM, SVM, tree |
| Detection (4) | Detect, identify, recognize, predict, prevent, screen, assess, understand |
| Search query | (1) AND (2) AND (3) AND (4) |
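
The search query combines the four keyword categories with AND. A minimal sketch of how such a query string could be assembled is shown below. It assumes that keywords within a category are joined with OR, which the table does not state explicitly. The abbreviated `KEYWORD_GROUPS` dictionary and the helper names `build_query` and `matches` are hypothetical, and real databases each have their own query syntax.

```python
# Hypothetical, abbreviated keyword groups taken from the table above.
KEYWORD_GROUPS = {
    "mental_illness": ["mental disorder", "mental health", "mental illness", "depression"],
    "data_sources": ["social media", "text", "Twitter", "reddit"],
    "methods": ["natural language processing", "deep learning", "machine learning"],
    "detection": ["detect", "identify", "predict", "screen"],
}


def build_query(groups):
    """OR the keywords inside each group (an assumption), then AND the groups."""
    clauses = []
    for keywords in groups.values():
        quoted = [f'"{kw}"' for kw in keywords]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


def matches(text, groups):
    """Crude local filter: True if the text contains a keyword from every group.

    Uses case-insensitive substring matching, so "text" also matches "context".
    """
    lowered = text.lower()
    return all(
        any(kw.lower() in lowered for kw in keywords)
        for keywords in groups.values()
    )


if __name__ == "__main__":
    print(build_query(KEYWORD_GROUPS))
    print(matches("Using deep learning to detect depression from Twitter posts",
                  KEYWORD_GROUPS))
```

The `matches` helper only illustrates the AND-of-ORs logic on a title or abstract; it is not a substitute for a database's own search engine.
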
Fig. 1 Overview of article selection process.
Six databases (PubMed, Scopus, Web of Science, DBLP computer science bibliography, IEEE Xplore, and ACM Digital Library) were searched. The flowchart lists the reasons for excluding studies from data extraction and quality assessment.
Fig. 2 NLP trends applied to mental illness detection research using machine learning and deep learning.
The number of articles per year that use machine learning-based and deep learning-based methods for detecting mental illness, from 2012 to 2021.
Fig. 3 Sankey diagram of NLP methods, illness, languages and applications.
Flows connect each method to its associated applications. Nodes are drawn as rectangles whose height represents their value, and the width of each curved line is proportional to its value.
Fig. 4 Distribution of different data sources.
The pie chart depicts the percentages of different textual data sources based on their numbers.
Fig. 5 Proportions of various types of mental illness.
The chart depicts the percentages of different mental illness types based on their numbers.
An overview of features used in machine learning-based models.
| Feature categories | Feature types | Features | Description | Typical examples |
|---|---|---|---|---|
| Linguistic features | Syntactic features | Part-of-Speech (POS) | Based on their grammatical use and function, words are categorized into different POS types (such as Noun, Verb, Adverb). | |
| | | Dependency parsing | The grammatical structure of a sentence. | |
| | Lexicon-based features | Bag-of-words (BoW) | The simplest form of text representation, using counts of vocabulary words. | |
| | | Lexical diversity, lexical density | The unique vocabulary usage and the proportion of content words. | |
| | Emotion features | Sentiment scores | Sentiment scores quantify the feeling of a text and determine its sentiment polarity (positive, negative, or neutral). Calculation methods include VADER sentiment analysis (Valence Aware Dictionary and sEntiment Reasoner). | |
| | | Emotion scores | Emotion scores indicate, to an extent, the user’s emotions and opinions in a text, which is beneficial for detecting mental health issues. Example lexicon: NRC Affect Intensity Lexicon. | |
| | Semantic features | Semantic similarity | Semantic similarity is used to predict whether a sentence or word is semantically related to the target sentence or word. | |
| | Topic features | Topic features | The topics extracted from texts using topic-modeling algorithms, such as Latent Dirichlet Allocation (LDA). | |
| | Linguistic features | LIWC | Linguistic Inquiry and Word Count (LIWC). | |
| | Others | Hashtag, emoji | A hashtag is a metadata tag from a social media platform that presents a theme or topic. Emoticons or emojis are often used to show various types of emotions. | |
| Statistical features | Statistical corpus features | n-gram | An n-gram is a contiguous sequence of n words. | |
| | | TF-IDF | Term frequency-inverse document frequency (TF-IDF) reflects the importance of a word in a document. | |
| | | Length statistics | The length of posts or documents, or the average sentence length. | |
| | Vector-based features | Word embedding | The vector-based representation of words. Example: word2vec. | |
| | | Document embedding | The vector-based representation of a document. | |
| Domain knowledge features | Conceptual features | UMLS | The Unified Medical Language System (UMLS) is a set of key terminology, coding standards, and associated resources related to biomedical information. | |
| | | Linguistic dictionary | A dictionary containing words related to mental illness. | |
| Other auxiliary features | Social behavioral features | Social connectivity | The degree of social interaction on social media, such as the number of followers, friends, and communities joined. | |
| | | User behaviors | The user’s behavioral signals on social media, such as the frequency of comments and forwards. | |
| | Time features | Time features | Time-related features, such as sending time and time interval. | |
| | User’s profile features | User’s profile features | The user’s profile features contain their individual information on social networks. | |
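
Several of the lexicon-based and statistical features above are simple to compute. The following is a minimal sketch in plain Python, not the implementation used by any reviewed study. The regex tokenizer, the function names, and the specific TF-IDF variant (relative term frequency times the natural log of N/df, without smoothing) are assumptions chosen for illustration; libraries and papers differ in these details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Bag-of-words: a count for each vocabulary word, ignoring order."""
    return Counter(tokens)


def lexical_diversity(tokens):
    """Type-token ratio: unique words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    posts = ["I feel sad, I feel tired", "What a happy day"]
    docs = [tokenize(p) for p in posts]
    print(ngrams(docs[0], 2))
    print(bag_of_words(docs[0]))
    print(lexical_diversity(docs[0]))
    print(tf_idf(docs))
```

With this variant, a term that occurs in every document receives a weight of zero, which is why TF-IDF down-weights common words relative to raw bag-of-words counts.
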
The deep learning methods for mental illness detection.
| Type | Method | Description |
|---|---|---|
| CNN-based methods | Standard CNN | Standard CNN structure: convolutional layer, pooling layer, and fully connected layer. Some studies also incorporate other textual features (such as POS, LIWC, and BoW). |
| | Multi-Gated LeakyReLU CNN (MGL-CNN) | Two hierarchical (post-level and user-level) neural network models with gated units and convolutional networks. |
| | Graph model combined with Convolutional Neural Network | A unified hybrid model combining a CNN with a factor graph model, which leverages social interactions and content. |
| RNN-based methods | LSTM or GRU (some with attention mechanism) | Standard RNN structure: Long Short-Term Memory networks (LSTM) or Gated Recurrent Unit (GRU); some studies add an attention mechanism. |
| | Hierarchical Attention Network (HAN) with GRU | A GRU with a word-level attention layer and a sentence-level attention layer. |
| | LSTM with transfer learning | Using transfer learning on an open dataset for model pre-training. |
| | LSTM or GRU with multi-task learning | Using multi-task learning to improve illness detection. The tasks include multi-risky behaviors classification, severity score prediction, word vector classification, and sentiment classification. |
| | LSTM or GRU with reinforcement learning | Using reinforcement learning to automatically select the important posts. |
| | LSTM or GRU with multiple instance learning | Using multiple instance learning to estimate the probability of post-level labels and improve the prediction of user-level labels. |
| | SISMO | An ordinal hierarchical LSTM attention model. |
| Transformer-based methods | Self-attention models | Using the encoder structure of the transformer, which has a self-attention module. |
| | BERT-based models (e.g., BERT) | Different BERT-based pre-trained models. |
| Hybrid-based methods | LSTM+CNN | Combining LSTM with CNN to extract local features and sequence features. |
| | STATENet (using transformer and LSTM) | A time-aware transformer combining emotional and historical information. |
| | Sub-emotion network | Integrating Bag-of-Sub-Emotion embeddings into an LSTM to capture emotional information. |
| | Events and Personality traits for Stress Prediction (EPSP) model | A joint memory network for learning the dynamics of a user’s emotions and personality. |
| | PHASE | A time- and phase-aware model that learns historical emotional features from users. |
| | Hyperbolic graph convolution networks | Hyperbolic graph convolutions with the Hawkes process to learn the historical emotional spectrum of a user. |
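
The self-attention module mentioned for the transformer-based methods can be illustrated with a small sketch of scaled dot-product self-attention in plain Python. This is a simplified illustration, not the architecture of any model in the table: it assumes the queries, keys, and values all equal the input vectors (no learned projection matrices, a single head), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x is a list of equal-length lists of floats. Simplifying assumption:
    queries, keys, and values are all x itself, with no learned weights.
    """
    dim = len(x[0])
    output = []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in x
        ]
        weights = softmax(scores)
        # Each output vector is a weighted average of all token vectors.
        output.append([
            sum(w * value[j] for w, value in zip(weights, x))
            for j in range(dim)
        ])
    return output


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(tokens):
        print([round(v, 3) for v in row])
```

In a full transformer encoder, the input would first pass through learned query, key, and value projections, and several attention heads would run in parallel; the weighted-average step shown here is the part they share.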