
Predicting subjective well-being in a high-risk sample of Russian mental health app users.

Polina Panicheva1, Larisa Mararitsa1,2, Semen Sorokin1, Olessia Koltsova1, Paolo Rosso3.   

Abstract

Despite recent achievements in predicting personality traits and some other human psychological features with digital traces, prediction of subjective well-being (SWB) appears to be a relatively new task with few solutions. The COVID-19 pandemic has added both a stronger need for rapid SWB screening and new opportunities for it, with online mental health applications gaining popularity and accumulating large and diverse user data. Nevertheless, only a few existing works have aimed at predicting SWB, and they have done so only in terms of Diener's Satisfaction with Life Scale. None of them analyzes the scale developed by the World Health Organization, known as WHO-5, a widely accepted tool for screening mental well-being and, specifically, for detecting depression risk. Moreover, existing research is limited to English-speaking populations and tends to use text, network and app usage data separately. In the current work, we address these gaps by predicting both SWB scales on a sample of Russian mental health app users, who represent a population at high risk of mental health problems. In doing so, we employ a unique combination of phone application usage data with private messaging and networking digital traces from VKontakte, the most popular social media platform in Russia. As a result, we predict Diener's SWB scale with state-of-the-art quality, introduce the first predictive models for WHO-5, with similar quality, and reach high accuracy in predicting clinically meaningful classes of the latter scale. Moreover, our feature analysis sheds light on the interrelated nature of the two scales: both are characterized by negative sentiment expressed in text messages and by phone application usage in the morning hours, confirming some previous findings on manifestations of subjective well-being. 
At the same time, SWB measured by Diener's scale is reflected mostly in lexical features referring to social and affective interactions, while mental well-being is characterized by objective features that reflect physiological functioning, circadian rhythms and somatic conditions, thus saliently demonstrating the underlying theoretical differences between the two scales.
© The Author(s) 2022.


Keywords:  Digital traces; Mental health prediction; Subjective well-being

Year:  2022        PMID: 35402139      PMCID: PMC8978494          DOI: 10.1140/epjds/s13688-022-00333-x

Source DB:  PubMed          Journal:  EPJ Data Sci        ISSN: 2193-1127            Impact factor:   3.184


Introduction

In recent years, the evaluation, analysis and improvement of subjective well-being (SWB) have gained growing attention from both researchers and practitioners [1, 2]. Attention to SWB has naturally been coupled with the increasing research interest in depression, the leading cause of disability and subjective well-being loss worldwide [3, 4]. The COVID-19 pandemic, resulting in the shift to hybrid work and the decline in face-to-face communication, has put many individuals at additional mental health risk [5, 6]. Some of the most widely available instruments to mitigate such risks are online and mobile services that offer quick screening tests of subjective well-being and mental health states and automatically generate respective recommendations. More than 240 mental health apps are available in the App Store today, some of which make extensive use of machine learning to classify and score their users in terms of their psychological or mental conditions [7-9]. Such apps attract consumers concerned with their psychological states, and these concerns are usually associated with higher risks for users' SWB or mental health. As these individuals agree to donate parts of their digital traces, psychological apps become natural hubs accumulating data on individuals at risk. Such data, if available, provide ample opportunities for the development of open-source algorithms for early automatic detection of threats to well-being in high-risk populations based on their digital traces. Subjective well-being is most commonly defined in accordance with Diener's approach [10] as a person's satisfaction with their life (the cognitive component of SWB) and the prevalence of positive emotions over negative ones (affective balance, the affective component of SWB). To date, about 100 assessment tools measuring about 200 facets of well-being have been proposed, which complicates the selection of relevant metrics [1]. 
The two most widely used SWB measurement tools are Diener's Satisfaction with Life Scale (SWLS) [10] and the scale introduced by the World Health Organization in 1998, known as the WHO-5 index [11]. The former aims to capture generalized long-term subjective well-being, while the original goal of the latter was to screen and rate depression. Later, Bech, one of the WHO-5 developers, also showed that this scale is equally good at detecting high degrees of psychological well-being, which he proposed to consider a component of mental health, along with the absence of depression symptoms [12]. Both SWLS and WHO-5 are short unidimensional 5-item scales with proven validity and reliability (α coefficients of 0.79–0.89 for the former and 0.82–0.95 for the latter) [13-15]. Both have become common for well-being screening in a wide range of populations and among different nationalities [15-18]. The wide use and proven quality of these metrics define their choice for our research on automatic SWB prediction; however, some more details on their distinctive features should be added. SWLS, apart from being centered on pleasure and satisfaction, is also meant to be time- and dimension-independent. Time independence means that it is not tied to a specific time interval and measures satisfaction with one's past, present and future. Dimension independence refers to the generalized character of such satisfaction, which is not tied to any particular dimension of human life, such as health, relationships or finance. The choice of the dimensions to be taken into account and the weight assigned to them is left to the subject and is expected to be based on a blend of objective reality and the subject's experience of it. It is assumed that a person is able to adequately assess her well-being and has all the necessary and unbiased information to do so [10]. SWLS is widely used by psychologists, public health professionals, and economists. 
According to the World Happiness Report, SWLS provides a more informative measure for international comparisons of well-being than some measures capturing the affective component only [19]. Importantly, SWLS is stable under unchanging conditions but is sensitive to changes in life circumstances: thus, its growth is associated with a higher likelihood of marriage and childbirth and with a lower likelihood of job loss and relocation [20]. It is also predictive of physical and physiological outcomes, as judged from a 4-year follow-up in the same study. These meaningful changes have been found responsible for the drop in SWLS test-retest reliability from 0.84 over a window of a few weeks to 0.54 over a 4-year window [21]. They are clearly distinct from short-term random mood fluctuations, which explain 16% of the variance in the short run. It can thus be said that SWLS captures both a stable and a transient component of human well-being. In contrast to SWLS, the WHO-5 index aims at a brief assessment of emotional well-being over a 14-day period (thus containing no cognitive component and being highly time-sensitive). Its items represent positive affect, whose absence corresponds to depression symptoms (negative affect). This is an important advantage of WHO-5, as subjects are not forced to confess to the presence of any unpleasant and potentially hard-to-admit negative emotions or states. As mentioned above, WHO-5 has proven effective for detecting both depression risk [22, 23] and high levels of well-being [12]. Being a short, sensitive, specific and non-invasive tool, it has an advantage over more detailed but heavier methods for preliminary depression and suicide risk assessment in settings without psychological or psychiatric expertise. 
WHO-5 has shown high clinimetric validity and the ability to accurately predict a wide range of mental health conditions, including depression; moreover, it has been recommended as an outcome measure balancing the wanted and unwanted effects of treatments [24]. That is why WHO-5 has been adopted in many research fields, such as suicidology, geriatrics, youth and alcohol abuse studies, personality disorder research, and occupational psychology [15, 24]. Thus, WHO-5 and SWLS, being psychometrically sound screening tools with known outcomes, also measure complementary aspects of subjective well-being. Although measures of emotional affect and reported life satisfaction often correlate, substantial divergences have been found. For instance, almost half of the people who rated themselves as 'completely satisfied' also reported significant symptoms of anxiety and distress [17]. Therefore, quality of life in the current coronavirus crisis is usually measured with both scales [5, 6, 25–27]: while WHO-5 helps to assess the influence of different practices on SWB and the persistence of diminished well-being during and beyond COVID-19, SWLS shows how people feel and how their life perspective changes due to the pandemic. This complementarity indicates the importance of comparative research on predicting both metrics. This task is novel for SWB prediction with digital traces: despite advances in the detection of specific mental health problems and attempts to predict some SWB metrics, no research so far has been dedicated to predicting WHO-5 or comparing it with SWLS in terms of digital behavior traces; moreover, most research is limited to English-speaking populations. 
The best models predicting SWLS with digital traces from social media, search engine and smartphone activity data demonstrate performance below 0.4 in terms of Pearson correlation, a well-known threshold for the correlation between psychological characteristics and objective behavior [28, 29] (see also [30, 31] for an overview). None of these models combines language, social media and smartphone usage data. The goal of this study is to predict individual WHO-5 and SWLS levels with a new combination of digital traces in a high-risk Russian-speaking population, and to find out which features are the most predictive and what the overall predictive power of our models is. A high-risk population is defined as a population with a higher probability of problematic levels of SWB compared to more general populations. We thus address a completely novel task of comparative prediction of two different aspects of subjective well-being, which should have different objective indicators and suggest different actions to be taken by the user. Additionally, we find that depression risk in a Russian-speaking population can be detected by a WHO-5 level below a certain threshold as successfully as in the populations for which WHO-5 was tested earlier, which allows us to predict the threshold-based classes as well. To do so, we use a sample of 372 psychological application users who have explicitly consented to share their private messages, social media data and mobile device usage traces. We combine extensive feature engineering with regression and classification modeling: the former is aimed at SWB score prediction, and the latter at depression risk identification based on theoretically justified thresholds. We also check our regression models against the newest neural network approaches, which, however, do not show sufficient quality on a dataset of our size. The rest of the paper is structured as follows. 
In the next section we review the existing literature in prediction of SWB and related psychological and mental health phenomena with digital traces. Next, we describe our dataset, our numerous features and the approach to their engineering, as well as the models used. In the Results section we report our best models’ performance and the most useful features. In the Discussion section we interpret our results and indicate the most important limitations. We conclude with the perspectives for future research.

Subjective well-being prediction

Prediction of internal psychological and mental states from objective behavior patterns is a highly difficult task [29, 32]. Additionally, clinically diagnosed mental disorders (such as depression) and mental disorder risks assessed through threshold scores of screening tests (such as WHO-5) are different categories for prediction. While the former may be partially manifest, the latter, along with psychological traits and conditions, are latent constructs. This means that psychological theory does not expect them to fully correlate with any observable patterns, since the former are not thought of as reducible to the latter in principle. This may be one of the reasons why such correlations are seldom high, although this is a subject for further research. As both high SWB and the absence of mental disorder symptoms have been shown to be components of mental health [12, 33], prediction of SWB and prediction of mental disorder (or its risk) constitute two related tasks. However, due to the different nature of SWB and mental disorder as concepts, the former is usually evaluated with continuous predictive models, while the detection of the latter is most often formulated as a classification task.

Detection of mental disorders

A large number of studies predict specific mental health conditions with digital traces, mostly with data from social media such as Facebook and Twitter. The most widely analyzed conditions in such studies are depression and Post-Traumatic Stress Disorder [34-38]. Other conditions include Bipolar Disorder, Anxiety and Social Anxiety Disorder, eating disorders, self-harm and suicide attempts [39-42]. Typical linguistic features include word n-grams, sentiment, specific lexica (e.g., the Linguistic Inquiry & Word Count dictionary, LIWC) and topic modeling, with other features related to social networks, emotions, cognitive styles, user activity and demographics [34–39, 42]. Model evaluation metrics include the Area Under the Curve (AUC), Precision and Accuracy for classification, and correlation for continuous measurements. The results for binary identification of mental health problems are high, reaching an AUC of 0.7–0.89, Precision of up to 0.85, and Accuracy of 0.69–0.72 [30]. Ground truth information in such studies is obtained from different sources and is therefore of varying quality. Most studies use either self-reported survey data [34, 37] or self-declared mental illness [36, 39]; the latter is prone to errors and biases induced by specific data collection methods. In a recent study, Eichstaedt et al. [38] effectively predict depression of Facebook users against medical records. The authors use a 6-month history of Facebook statuses posted by 683 hospital patients, of whom 114 were diagnosed with depression (a rate similar to that of the general population), and classify depression vs. other medical diagnoses with an AUC of 0.72. Features of Facebook statuses include words and word bigrams, temporal characteristics of posting activity, metainformation on post length and frequency, topics and dictionary categories, with interpersonal, emotional and cognitive categories being among the best predictors. 
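The AUC values quoted above can be read as a ranking probability: the chance that a randomly chosen positive case (e.g., a user with a depression diagnosis) receives a higher model score than a randomly chosen negative case. The minimal Python sketch below computes AUC in exactly this way, counting ties as one half. It is an illustration on invented toy scores, not the evaluation code of any cited study, and it uses a quadratic pairwise loop for clarity rather than the faster rank-based formula.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return ROC AUC as P(score_pos > score_neg), counting ties as 0.5.

    labels: 1 for the positive class (e.g., condition present), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Toy example (invented numbers): two positive and three negative cases.
toy_scores = [0.9, 0.3, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 0, 0]
print(round(auc_pairwise(toy_scores, toy_labels), 3))  # 4 of 6 pairs ordered correctly -> 0.667
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of 0.7 to 0.89 are considered high for this task.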
The effects of smartphone usage on mental disorders have, until very recently, been studied mostly with self-reported data (see [43, 44] for an overview). Meanwhile, apps that collect usage data provide an unprecedented opportunity to access objective and precise information on smartphone application usage. Hung et al. [45] find that phone call duration and rhythm patterns are predictive of negative emotions, while Saeb et al. [46] predict depressive symptom severity with geographical location and phone usage frequency information. However, as feature engineering with phone app usage data requires considerable time and effort [47], the potential of such data for psychological research is yet to be discovered.

Prediction of SWB levels

A few studies have aimed at predicting subjective well-being levels, mostly with regression, and they obtain modest results. Individual and relational well-being was predicted from social network data [28, 48] and from objective smartphone use data [49]. The reported results are close to the upper bound expected in this task: the meta-analytic correlation between digital traces and psychological well-being has been estimated as across nine studies, including prediction of subjective well-being, emotional distress and depression [28]. The only study that reaches a higher correlation of 0.66 in one of its models [49] does not specify the scales used for measuring SWB; interestingly, however, it finds that while some apps predictably have a negative effect on well-being, others affect it positively. Diener's SWLS, to our knowledge, has been predicted in only four studies that use digital traces in a cross-validated setting. In their pioneering study, Kosinski et al. [50] predicted SWLS with linear regression for 2340 Facebook users based on 58K 'Likes', i.e., webpage preferences indicated by the users. The dimensionality of the Likes data was reduced to the top 100 components of an SVD model based on a larger dataset (58K users). The obtained correlation reached , whereas the empirical test-retest correlation for SWLS was . Collins et al. [51] predicted SWLS with Random Forest Regression and various Facebook features, including demographics, networking data, photos, likes, ground truth Big Five traits of the users, their significant others and friends, and predicted Big Five traits as a proxy. The best result, for a sample of 1360 users with Big Five features as a proxy, reached a Mean Absolute Error (MAE) of 0.162, whereas the model with social network features produced an MAE of 0.173 for SWLS. Unfortunately, no other evaluation metrics were reported in this study. Schwartz et al. [52] applied Ridge Regression to predict the SWLS of 2198 individuals using their Facebook statuses. 
Thousands of linguistic features were extracted from the status texts, including 2000 topics obtained with the Latent Dirichlet Allocation topic modeling algorithm, word unigrams and bigrams, LIWC and sentiment lexica. A message-user level cascaded aggregation model was additionally trained on a disjoint dataset, which improved the regression results from Pearson to . Facebook status data were also used by Chen et al. [53] to predict the SWLS of 2612 users. Features included affect measured by sentiment word usage, 2K topics obtained with topic modeling, and 66 LIWC categories. After feature selection with Elastic Net regression, a Random Forest model was tested on an unseen subset. The results reach a Root-Mean-Square Error (RMSE) of 1.30 (0.217 when rescaled to [0, 1]) and . Several studies predict SWB with app usage data. Some of them rely on self-reported measures of app use [54], while others collect objective data [49, 55]. Correlations in David's model [49] range from 0.31 to 0.66; as noted above, however, this research does not specify the scales used for measuring SWB. Gao and colleagues [55] report correlations from 0.34 for male users to 0.66 for female users in predicting SWLS; however, they report neither the full feature set nor the contribution of each feature in their best models. Instead, they mention that the most predictive variables are communication apps, certain types of games and the frequency of photo taking. None of these studies mentions cross-validation. Overall, although the results of subjective well-being prediction are promising, several gaps in the existing research can be identified. First, WHO-5, which is an effective screening tool for depression risk and subjective well-being, has never been studied in a predictive research design. 
Second, all the studies predicting SWLS are limited to English-speaking populations and respective linguistic features. Moreover, these works only address Facebook digital traces, including profile, texts and likes. Finally, only scarce feature interpretation is reported in the previous studies, and digital trace manifestations of different well-being dimensions have never been compared.
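The studies reviewed above report their results with different metrics (Pearson r, MAE, RMSE) and on different score scales, which makes direct comparison difficult. The minimal Python sketch below implements these metrics together with a range-based rescaling of error values. The rescaling example assumes that Chen et al.'s SWLS scores lie on a 1 to 7 item-average scale (a range of 6); this is our assumption, chosen because it reproduces the reported conversion of RMSE = 1.30 to 0.217.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between true and predicted scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-Mean-Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rescale_error(error: float, scale_min: float, scale_max: float) -> float:
    """Express an MAE/RMSE value in units of the full score range, i.e. on [0, 1]."""
    return error / (scale_max - scale_min)


# Assumed 1-7 SWLS item-average scale (range 6), see the lead-in.
print(round(rescale_error(1.30, 1, 7), 3))  # 0.217
```

Pearson r is unaffected by such a linear rescaling, whereas MAE and RMSE shrink by the same factor as the score range, which is why error values are only comparable across studies once they are expressed on a common [0, 1] scale.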

Our approach

In this study, we set out to predict two different concepts of subjective well-being: one combining affective balance and life satisfaction (measured by the SWLS index and further referred to as satisfaction-related SWB) and the other conceptualized as a reflection of mental health (measured by the WHO-5 index and further referred to as mental SWB). For predicting well-being values, our task is defined as regression, while for detecting depression risk, we formulate it as a binary and a trinary classification task. For the latter, we identify the threshold values of WHO-5 by validating them against the scores of the same users on the depression, anxiety and stress scales, choosing the WHO-5 values that predict these scores with the highest sensitivity and specificity. We predict SWB from the texts of private messages, social media data and smartphone usage information, and we perform regression and classification experiments in a cross-validated machine learning design. The novelty of the current study lies in the following:
(1) we present the first study so far on predicting subjective well-being measured by WHO-5;
(2) we find a close association of WHO-5 thresholds with three mental health scales, which is promising for extending our approach to the simultaneous prediction of a range of mental health risks;
(3) we are the first to compare satisfaction-related and mental SWB, analyzing their intersections and differences in terms of predictive features;
(4) this is the first study to combine language, social media and phone app usage features in well-being research;
(5) to our knowledge, our study is the first to address subjective well-being prediction in a Russian-speaking population and the respective data: the Russian social network VKontakte and texts in the Russian language;
(6) we use a dataset of psychological application users, which allows us to predict subjective well-being in real-world conditions for a sample with high mental health risks, something that has never been done before.

Materials and methods

Dataset

Our dataset was collected in collaboration with the Humanteq social analytics company, using its DigitalFreud app (DF), a Russian-language phone application for psychological self-assessment that was promoted among Android smartphone users through Google Ads. Android was chosen as the operating system for data collection because, at the time of the app's development and promotion, its users constituted the majority (68–76%) [56] of Russian smartphone users, who in turn were the app's target audience and constituted 57–64% [57] of Russia's population. Additionally, the app was available to Russian speakers from any country; although users from countries other than Russia constituted a minority, none of the samples we analyze below is intended to be representative of Russia. Data collection via a psychological app of this type was used to access a high-risk population (its high-risk status was confirmed by a subsequent comparison of its mean SWB with that of other populations, presented below). Users were invited to take as many free tests as they wanted (including personality traits, depression, anxiety, stress, cognitive, motivation and SWB tests) and to explicitly consent to access to their VKontakte profile data and/or smartphone usage data. Based on the test results, users were offered psychological feedback and analytics on their use of VKontakte and/or their smartphones. On average, DigitalFreud users chose to fill in 1.5 questionnaires and shared varying subsets of their data, which made the overall dataset quite sparse. The privacy policy included a clause stating that the data could be used for research. The study was approved by the HSE Ethics Committee; in addition, the data were anonymized prior to the analysis. No personal information (i.e., information that would allow the users to be identified) was included in the sample. In particular, all user profile ids were encrypted. 
The initial sample included 2050 accounts of DigitalFreud users who had completed at least one of the two questionnaires of interest: SWLS [10] or WHO-5 [58]. The vast majority completed either test only once; for those who completed it more than once, the earliest score was included in our dataset. The following digital trace data were available for the participants: DigitalFreud profile data, VKontakte user data, and phone application data. Due to data sparsity, our final sample used in prediction contains digital traces of 372 users. The data cleaning procedure that produced this dataset is given in Appendix 1. The dataset is small because data on well-being combined with personal digital traces are very difficult to obtain: they require both considerable effort from users to complete the questionnaires and enough trust for them to share sensitive digital traces. However, our dataset is uniquely tailored to the task of predicting SWB in a high-risk population of mental health app users. Additionally, there is a held-out dataset of messages written by 572 users who lack other important features for prediction (demographics, phone app usage) but have text data. The held-out dataset is used for preliminary feature selection (see sections Words and Word clusters below). Before feature selection, texts were tokenized with happiestfuntokenizing and lemmatized with pymorphy [59]. The phone app dataset consists of phone application usage data from 992 users who lack other important features for prediction. The phone app dataset was used for preliminary phone application categorization and feature engineering. 
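The rule of keeping only the earliest score for users who completed a questionnaire more than once can be expressed as a single pass over the questionnaire records. The sketch below assumes a hypothetical record layout (encrypted user id, completion timestamp, raw score); the actual DigitalFreud export format is not described here.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (encrypted user id, completion time, raw score).
Record = Tuple[str, datetime, int]


def earliest_scores(records: Iterable[Record]) -> Dict[str, int]:
    """Keep, for each user, the score of their earliest completed questionnaire."""
    earliest: Dict[str, Tuple[datetime, int]] = {}
    for user_id, completed_at, score in records:
        current = earliest.get(user_id)
        if current is None or completed_at < current[0]:
            earliest[user_id] = (completed_at, score)
    return {user_id: score for user_id, (_, score) in earliest.items()}


records = [
    ("u1", datetime(2020, 5, 3, 10, 0), 21),
    ("u1", datetime(2020, 4, 1, 9, 30), 17),  # earlier attempt: this score is kept
    ("u2", datetime(2020, 6, 12, 20, 15), 25),
]
print(earliest_scores(records))  # {'u1': 17, 'u2': 25}
```

Comparing timestamps rather than relying on the order of records makes the result independent of how the export happens to be sorted.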
We also collected a sub-sample of users (N = 417) who had completed WHO-5 and at least one of the following questionnaires evaluating different mental health risks (the mental health dataset): depression, measured with the Patient Health Questionnaire (PHQ-9) [60]; anxiety, measured with the Generalized Anxiety Disorder scale (GAD) [61]; and stress, measured with the Perceived Stress Scale (PSS) [62, 63]. The mental health dataset was used in the WHO-5 classification task to select the cutoff thresholds of the classes to be predicted, so that these classes would be representative of a range of mental health conditions.

Self-reported well-being measures

Satisfaction-related well-being scale (SWLS)

The SWLS questionnaire was translated into Russian and validated by Ledovaya et al. [64]. The questionnaire contains 5 statements, each rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The resulting SWLS score ranges from 5 (low satisfaction) to 35 (high satisfaction). The scale has good internal consistency, with α coefficients ranging from 0.79 to 0.89. The test-retest coefficient, as already mentioned, ranges from 0.54 to 0.84 depending on the time lag between measurements (years or weeks, respectively) [21] and amounts to 0.78 in the Russian-language version [64]. In our sample, 1727 accounts have information about the SWLS score.

Mental well-being scale (WHO-5)

We use the official Russian-language version of the WHO-5 scale developed by WHO itself [58]. Each WHO-5 item is scored on a 6-point Likert scale ranging from 0 (at no time) to 5 (all of the time). The WHO-5 score ranges from 5 (absence of well-being) to 30 (maximal well-being). The scale has good internal consistency, with α coefficients ranging from 0.82 to 0.95 [13]. Test-retest coefficients are available only for specific populations and only over short intervals, ranging from 0.81 to 0.83 [65, 66]. In our sample, 1791 accounts have information about the WHO-5 score.

Mental well-being classes

As mentioned earlier, WHO-5, unlike SWLS, is indicative of a range of mental health conditions [24] and was directly designed to detect one of them [11]. Mental health decisions, be they screening test results or medical diagnoses, are usually binary and indicate either the absence or the presence of a disorder. For such tasks, scales need to be transformed into sets of discrete classes based on certain threshold values. Such validated values exist for the original English-language WHO-5 scale (0.28 for major depression and 0.5 for depression). They are recommended for all nations and languages, but in fact have never been tested for the Russian-speaking population. Meanwhile, it has been shown that cultural differences matter in scale construction [67] and that, specifically, they complicate both the comparison of mean WHO-5 scores and the comparison of thresholds across countries [15]. Therefore, we validated several thresholds ourselves. For this, we analyzed the mental health dataset of 417 DigitalFreud users who had completed both WHO-5 and at least one of the three questionnaires on depression, anxiety and stress, and found the values of the WHO-5 index most predictive of the classes of these three scales. This approach was our choice for two reasons: first, data on clinically diagnosed depression are absent from our dataset; second, the three mentioned scales have been validated for the Russian language and thus serve here as the best available benchmarks. We tried out different WHO-5 thresholds to reach better sensitivity and specificity in representing the following conditions: PHQ/GAD ≥ 10 for depression and anxiety [68], and PSS ≥ 21 for stress [63]. Additionally, since we know from our earlier work [69] that classes derived from scale reduction might be better predicted in a trinary design in social science NLP tasks, we also experimented with three-class divisions. 
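The threshold search described above can be sketched as a scan over candidate WHO-5 cutoffs that computes sensitivity and specificity against a reference condition (e.g., PHQ-9 ≥ 10). The text does not state how sensitivity and specificity were traded off, so this sketch assumes Youden's J (sensitivity + specificity − 1) as the selection criterion, and it assumes that a user is screen-positive when the normalized WHO-5 score is at or below the cutoff. The scores and condition labels are invented toy values.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    who5: Sequence[float], condition: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of 'WHO-5 <= cutoff' as a screen for a condition."""
    tp = fn = tn = fp = 0
    for score, has_condition in zip(who5, condition):
        flagged = score <= cutoff  # assumed direction: low well-being flags risk
        if has_condition:
            if flagged:
                tp += 1
            else:
                fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need cases both with and without the condition")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(
    who5: Sequence[float], condition: Sequence[bool], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (cutoff, Youden's J) for the candidate cutoff maximizing J (assumed criterion)."""
    best: Optional[Tuple[float, float]] = None
    for cutoff in candidates:
        sens, spec = sensitivity_specificity(who5, condition, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j)
    if best is None:
        raise ValueError("no candidate cutoffs given")
    return best


# Invented toy data on the normalized [0, 1] WHO-5 scale.
who5 = [0.20, 0.30, 0.45, 0.50, 0.60, 0.70, 0.80, 0.40]
depressed = [True, True, True, False, False, False, False, False]
print(best_cutoff(who5, depressed, [0.35, 0.51, 0.59]))  # cutoff 0.35 wins on this toy data
```

Other criteria, such as fixing a minimum sensitivity and maximizing specificity, fit the same loop by changing only the scoring line.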
Eventually, our analysis resulted in the following cutoff values of the normalized WHO-5 scale: 0.51 for the binary division, and 0.35 and 0.59 for the trinary division. Table 1 illustrates sample statistics for each of the mental health conditions, as well as the specificity and sensitivity of the selected WHO-5 cutoff values.
Table 1

Specificity and sensitivity of the selected WHO-5 cutoff values in the mental health dataset

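Condition | N (mental health dataset) | Metric | Binary cutoff (0.51) | Lower trinary cutoff (0.35) | Upper trinary cutoff (0.59)
--- | --- | --- | --- | --- | ---
Depression | 344 | Sensitivity | 0.80 | 0.49 | 0.90
Depression | 344 | Specificity | 0.58 | 0.87 | 0.45
Anxiety | 309 | Sensitivity | 0.82 | 0.53 | 0.92
Anxiety | 309 | Specificity | 0.54 | 0.83 | 0.41
Stress | 323 | Sensitivity | 0.85 | 0.47 | 0.93
Stress | 323 | Specificity | 0.66 | 0.88 | 0.50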
Binary cutoff = 0.51, with 221 and 151 users in the low and high SWB classes, respectively; trinary cutoffs, with 111, 158 and 103 users in the low, medium and high SWB classes.

In our high-risk sample of mental health app users, the binary WHO-5 cutoff value of 0.51 achieves high sensitivity across the analyzed mental health conditions while preserving moderate specificity. The trinary cutoff values of 0.35 and 0.59 yield low and high mental well-being classes with very high specificity.
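Once the cutoffs are fixed, mapping a normalized WHO-5 score to the binary and trinary classes reduces to simple comparisons, as in the sketch below. Treating a score exactly equal to a cutoff as belonging to the upper class is our assumption. However, with the normalization used in this paper (raw score minus 5, divided by 25), normalized WHO-5 values are multiples of 0.04, so none of them can equal 0.51, 0.35 or 0.59, and the boundary convention does not change the resulting classes.

```python
from collections import Counter

# Cutoffs on the normalized [0, 1] WHO-5 scale, as reported in Table 1.
BINARY_CUTOFF = 0.51
LOWER_CUTOFF, UPPER_CUTOFF = 0.35, 0.59


def binary_class(who5_norm: float) -> str:
    """Low vs. high mental well-being (scores equal to the cutoff go up: an assumption)."""
    return "low" if who5_norm < BINARY_CUTOFF else "high"


def trinary_class(who5_norm: float) -> str:
    """Low / medium / high mental well-being using the two trinary cutoffs."""
    if who5_norm < LOWER_CUTOFF:
        return "low"
    if who5_norm < UPPER_CUTOFF:
        return "medium"
    return "high"


# Invented normalized scores (multiples of 0.04, as produced by the normalization).
scores = [0.12, 0.36, 0.48, 0.56, 0.64, 0.92]
print(Counter(binary_class(s) for s in scores))
print(Counter(trinary_class(s) for s in scores))
```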

Digital traces

DigitalFreud profile

Account information about the DigitalFreud user includes encrypted DigitalFreud and VKontakte user ids, SWLS and WHO-5 scores, gender, birth year, education, employment and marital status, and date and time of the DigitalFreud app installation.

VKontakte user information

Humanteq matches DigitalFreud data with VKontakte data because the latter is the most popular social networking site in Russia. We use the following data obtained through the VKontakte application programming interface (API): (1) user profile data: although the VKontakte API provides access to potentially rich user information, in practice users seldom fill in their profiles, and the data are sparse; as a result, we only use gender, birthdate, and the numbers of friends and subscriptions in our analysis; (2) wall posts (text, date and time, reposting information with the original post contents and encrypted user id, and the numbers of reposts, comments and likes), available for 1871 users; (3) directed private messages (text, date and time, encrypted author and addressee ids), available for 1044 users.

Phone application usage

Phone application usage was monitored for one week following the initial consent obtained from the user when she started using DigitalFreud, which was consistent with both the app’s terms of use and the policies of the Android platform. The collected information includes the name and package of each application, and the start time and duration (in milliseconds) of its usage in the foreground. It is available for 992 users. In a few cases, when users stopped sharing phone app data before the end of the week, the recorded period was shorter.

Descriptive statistics

The main descriptive statistics for our final dataset of 372 users are given in Tables 2 and 3. Our dataset is predictably skewed towards females (80%) and young people (mean age 23 ± 5 years), compared with 53% females and a mean age of 39 years in the general Russian population [70]. However, as mentioned above, this sample is not intended to represent the Russian population. Consistent with Collins et al. [51], we normalize both well-being scores to the range [0, 1]; to do so, we subtract 5 from both scores, then multiply SWLS values by 1/30 and WHO-5 values by 1/25. The distributions of the SWB and demographic data in the final dataset are illustrated in Appendix 2, Figs. 1–4.
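The normalization above is a linear map, so it can be checked against Table 2: applying it to the raw means reproduces the normalized means. A minimal sketch, assuming the raw ranges reported in Table 2 (SWLS 5–35, WHO-5 5–30); the function name is hypothetical.

```python
def normalize_swb(raw_score, scale):
    """Map a raw SWB sum score to [0, 1]: subtract 5, then divide by the score span."""
    span = {"SWLS": 30, "WHO-5": 25}[scale]  # SWLS ranges 5-35, WHO-5 (as scored here) 5-30
    return (raw_score - 5) / span


# Because the map is linear, normalizing the raw means gives the normalized means
print(round(normalize_swb(18.30, "SWLS"), 4))   # 0.4433
print(round(normalize_swb(16.51, "WHO-5"), 4))  # 0.4604
```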
Table 2

Descriptive statistics for subjective well-being, age and gender in the final dataset

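Variable | N | Range | Mean | Std | Mean (norm) | Std (norm) | Cronbach’s α
SWLS | 372 | 5–35 | 18.30 | 6.73 | 0.4433 | 0.2243 | 0.8365
WHO-5 | 372 | 5–30 | 16.51 | 4.66 | 0.4604 | 0.1865 | 0.8205
Age | 372 | 18–53 | 23.06 | 5.06 | | |
Gender | 372 | Male, Female | 298 (80%) Female | | | |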
Table 3

Descriptive statistics for the textual and phone app usage features in the final dataset

Data | Sum | Mean | Median | Min | Max
Messages | 6739K | 18,115 | 10,948.5 | 52 | 131,368
Message alters | 53K | 143 | 107.5 | 2 | 1029
Message volume (chars) | 160,707K | 432,009 | 240,831 | 671 | 2,983,231
Posts | 7K | 19 | 4 | 0 | 1880
Post volume (chars) | 857K | 2303 | 84 | 0 | 87,708
App usage (seconds) | 1573K | 4231 | 3715.5 | 24 | 16,329
Figure 1

Distribution of SWLS values

Figure 4

Distribution of Gender values

SWLS and WHO-5 are strongly intercorrelated. The internal consistency of both scales is high (Cronbach’s α = 0.84 for SWLS and 0.82 for WHO-5). Both SWB scores in our final sample are consistently lower than those reported for other groups of Russians. Thus, the mean WHO-5 score is 0.46 ± 0.187 in our dataset, against 0.60 ± 0.191 in a study of Russian Facebook users [71], the only available evaluation of WHO-5 for Russia. Likewise, while the mean SWLS score among our participants is 18.3, a study on a sample close to the general Russian population (mean age 41 years, 54% women) reports a score of 23.6 [72]. A younger group of Russian students (mean age 20 years, 65% women), which is more similar to our sample, scores even higher: 24.4 [73]. The lower SWB levels in our dataset are most likely explained by the self-selection of specific individuals into the DigitalFreud app: it naturally attracts users interested in seeking psychological and mental health information and advice, i.e., users potentially more likely to have problematic mental health conditions. This is in line with our research goal of studying high-risk populations, of which our sample, with its lower SWB scores, is a clear example.
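For reference, the internal-consistency values in Table 2 follow the standard Cronbach's α formula, α = k/(k − 1) · (1 − Σ var(item) / var(total)), for k items. The sketch below implements that formula with NumPy; item-level responses are not part of this section, so the toy matrix is hypothetical and the authors may have used a different tool.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses of three people to a two-item scale
print(cronbach_alpha([[1, 2], [2, 2], [3, 4]]))  # ≈ 0.923 (exactly 12/13)
```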

Feature engineering

For our task of SWLS and WHO-5 prediction, we construct features of three main types: (1) user metadata and overall activity: demographics, DigitalFreud and VKontakte profile statistics, and overall phone app usage statistics; (2) textual, or linguistic, features: words, sentiment scores, RuLIWC categories and word clusters; (3) phone app usage statistics by app category. Overall, we constructed 660 features for SWLS and 651 for WHO-5. Most features were calculated as counts, ratios or counts by time period directly from the final dataset. However, the word and word cluster features were derived from the heldout dataset, which does not intersect with the final dataset; of these features, only those that correlated with the target variables were selected for the main experiments. In the main experiments, the features were submitted to regression or classification models applied to the final dataset, which was divided into train, development and test subsets in a 10-fold cross-validation scenario. In this scenario, (1) multiple models were trained on the train set, (2) recursive feature elimination was performed on the development set based on the MAE of the models, and (3) final scores for each feature type and each model were computed on the test set. More details on the main experimental procedure are given in the Machine learning experiments section.

User metadata and overall activity features

There are 40 features describing demographics, overall phone application usage, and overall activity patterns based on DigitalFreud and VKontakte profiles (see Table 4). The activity-related data include three groups of features: (1) numbers and volumes of personal messages written during the month preceding test completion, (2) numbers of alters, i.e., accounts with which a user has a message history, for each of the 12 months preceding test completion, and (3) weighted differences between the last two months in terms of message volume and number of alters. In building phone app usage features, we follow previous research [74, 75] that identified three- and six-hour periods of online activity as significant markers of mental illness. In our research, we break phone app usage into three-hour periods of activity. Some features have been excluded from the analysis due to data sparsity.
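The eight AppUsage 3-hour features and their Ratio counterparts in Table 4 can be illustrated with a small NumPy sketch. It assumes that each foreground session is attributed entirely to the slot in which it starts (sessions crossing a slot boundary are not split) and that durations have already been converted from milliseconds to seconds; the function name is hypothetical.

```python
import numpy as np


def app_usage_by_3h(start_hours, durations_s):
    """Total foreground time in each of the eight 3-hour slots (0-3, ..., 21-24)
    and the same totals as ratios of the overall usage time."""
    slots = (np.asarray(start_hours, dtype=float) // 3).astype(int)  # hour in [0, 24) -> slot 0..7
    totals = np.bincount(slots, weights=np.asarray(durations_s, dtype=float), minlength=8)
    return totals, totals / totals.sum()


totals, ratios = app_usage_by_3h([1.5, 10.0, 10.5, 23.0], [100, 200, 100, 600])
print(totals.tolist())  # [100.0, 0.0, 0.0, 300.0, 0.0, 0.0, 0.0, 600.0]
print(ratios.tolist())  # [0.1, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.6]
```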
Table 4

User metadata and overall activity features

Feature name | Description | Number
Age | | 1
Gender | | 1
NVkFriends | No. of friends in VKontakte | 1
AllAlters | No. of alters (accounts that a user has a message history with) in the last 12 months | 1
Subscriptions | No. of VKontakte page subscriptions | 1
Mess_1 | Total number of messages written in the last 30 days | 1
MessChars_1 | Total size (in characters) of messages written in the last 30 days | 1
growth-2to-1weighted | Weighted difference between the total size of messages written in months −1 and −2 | 1
altersdiff | Weighted difference between the numbers of alters in months −1 and −2 | 1
AppUsage1Week | Number of active app usage instances in the app data sharing period (one week) | 1
AllAppTime1Week | Total time of phone app usage in the app data sharing period (in seconds) | 1
RatioAppTime1Week | Ratio of phone app usage time in the app data sharing period (one week) | 1
AppUsage 0–3, 3–6, 6–9, 9–12, 12–15, 15–18, 18–21, 21–24 | Time of phone app usage in 3-hour periods; each of the 8 features represents one 3-hour period | 8
AppUsage 0–3, 3–6, 6–9, 9–12, 12–15, 15–18, 18–21, 21–24 Ratio | Time of phone app usage in 3-hour periods normalized by total app usage time; each of the 8 features represents one 3-hour period | 8
Alters −1 – −12 | Numbers of alters in each month (30 days) before the DigitalFreud installation time, for months −1 to −12 | 12
Total | | 40

Linguistic features

Our extensive analysis of user texts has shown that VKontakte public wall posts are too sparse and include mostly web link content, which does not allow for effective prediction. As a result, we construct all the linguistic features based on private messages written by the users in VKontakte messenger, mostly during one year preceding the installation of DigitalFreud app.

Sentiment scores

We use six features representing the proportions of positive and of negative words in the messages written during the month or the year preceding test participation, or in the entire messaging history of a user. Each feature represents the proportion, or l1-normalized frequency, of positive or negative sentiment words written in one of the three time periods (which results in 2 × 3 = 6 features). The sentiment words were identified with a closed-vocabulary approach based on the Russian sentiment lexicon RuSentiLex [76].
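The six sentiment features can be sketched as follows. This is an illustration under assumptions: messages are taken to be already lemmatized into token lists, the proportion is computed over all tokens in the period, "month" and "year" are approximated as 30 and 365 days before test participation, and the two-word lexicon is a toy stand-in for RuSentiLex. The Positive_* names are hypothetical; Negative_month, Negative_year and Negative_all match the feature names in the result tables.

```python
from collections import Counter
from datetime import datetime, timedelta


def sentiment_proportions(tokens, positive, negative):
    """l1-normalized frequency of positive and negative lexicon words among all tokens."""
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    pos = sum(c for w, c in counts.items() if w in positive)
    neg = sum(c for w, c in counts.items() if w in negative)
    return pos / len(tokens), neg / len(tokens)


def sentiment_features(messages, test_date, positive, negative):
    """messages: list of (datetime, list_of_lemmas). Returns six features:
    positive/negative proportions for the last month, last year and all messages."""
    windows = {"month": timedelta(days=30), "year": timedelta(days=365), "all": None}
    feats = {}
    for name, span in windows.items():
        tokens = [tok for date, toks in messages
                  if span is None or test_date - span <= date <= test_date
                  for tok in toks]
        feats[f"Positive_{name}"], feats[f"Negative_{name}"] = sentiment_proportions(
            tokens, positive, negative)
    return feats


positive, negative = {"рад"}, {"грустно"}  # toy lexicon, not RuSentiLex
msgs = [(datetime(2020, 1, 1), ["рад"]),
        (datetime(2020, 6, 20), ["грустно", "но"])]
f = sentiment_features(msgs, datetime(2020, 7, 1), positive, negative)
print(f["Negative_month"], f["Positive_month"])  # 0.5 0.0
```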

Words

We adopt the open-vocabulary approach to word features predictive of well-being [77]. Given the small size of our final dataset (372 observations), using all the frequent words as features (12K words with frequency ≥ 200) would inevitably result in overfitting. To overcome this and to select a reasonable number of interpretable features, we use the heldout dataset as follows: (1) a sub-sample of users who had filled in both well-being questionnaires was selected from the heldout dataset (396 users); (2) we selected 12.5K words occurring more than 200 times in the joint one-year message collection of all users and calculated their TfIDF scores, treating the 396 individual message collections as 396 texts; (3) we filtered out words that did not reach significance in the ANOVA tests relating them to SWLS and WHO-5 values in the heldout dataset, which resulted in the selection of 165 words for SWLS and 224 words for WHO-5 (see Appendix 3 for the full list). Words belonging to either of these sets (353 words) are used as features for prediction.

RuLIWC

To obtain closed-vocabulary features, we used the RuLIWC dictionary, a translation of the most prominent categories of the Linguistic Inquiry and Word Count (LIWC, [78]) produced by Panicheva & Litvinova [79]. RuLIWC consists of eight word categories: Bio, Cognitive, Social, Time and Percept, plus the subcategories of the latter, Feel, Hear and See, with 563–2624 words in each category and 20–303 words in each subcategory. For this research, RuLIWC feature values have been computed for every user as the sum of the TfIDF values of all the words in a category. All words were taken into account regardless of their frequency.
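A minimal sketch of the category features described above: each user's score for a category is the sum of that user's TfIDF values over the category's words, with no frequency cut-off. It assumes lemmatized, space-separated user texts, and the two-category dictionary is a toy example, not the actual RuLIWC word lists; the function name is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def category_scores(user_texts, categories):
    """Per-user sum of TfIDF values over the words of each dictionary category.
    categories: {category_name: iterable_of_words}."""
    vec = TfidfVectorizer(lowercase=False, token_pattern=r"\S+")  # no frequency cut-off
    X = vec.fit_transform(user_texts)
    index = vec.vocabulary_
    scores = {}
    for name, words in categories.items():
        cols = sorted({index[w] for w in words if w in index})
        if cols:
            scores[name] = np.asarray(X[:, cols].sum(axis=1)).ravel()
        else:
            scores[name] = np.zeros(X.shape[0])
    return scores


# Toy dictionary (hypothetical), not the actual RuLIWC categories
toy_dict = {"Bio": ["спать", "есть"], "Cognitive": ["думать", "знать"]}
s = category_scores(["спать есть", "думать"], toy_dict)
print(s["Bio"].round(3).tolist(), s["Cognitive"].round(3).tolist())  # [1.414, 0.0] [0.0, 1.0]
```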

Word clusters

Content features were computed by clustering words with a word2vec semantic model [80] based on the heldout dataset. The word2vec model we used had been trained by Kutuzov & Kuzmenko [82] on the web-based Taiga corpus containing over 5 billion words [81], with the skipgram algorithm, vector dimensionality = 300, and window size = 2. For clustering, we used 7128 words present in the model vocabulary with frequency ≥ 200 in the heldout dataset. Next, we performed KMeans clustering with cosine distance and 300 clusters. As the KMeans algorithm is stochastic and may give very different results in different runs, we used the following procedure to obtain reproducible cluster solutions: (1) we employed cluster regularization, where the regularization term was the sum of the p-values of the correlations between cluster occurrence and SWLS or WHO-5, with four different regularization weights; (2) for every weight value, ten random cluster solutions were obtained; (3) based on these solutions, consensus cluster solutions were constructed with five different consensus thresholds; (4) this resulted in five consensus cluster solutions for every weight value, i.e., 20 solutions overall; (5) in each solution, clusters were additionally augmented with words infrequent in the dataset, every infrequent word being assigned to the closest cluster. Thus, each of the 20 solutions was supplemented by a paired solution with augmented clusters.
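The consensus step can be illustrated with a co-association approach, which is one common way to build consensus clusters and is swapped in here as an assumption: the exact consensus procedure and the p-value regularization of the original work are not reproduced. Word vectors are L2-normalized so that Euclidean KMeans approximates cosine distance; two words are linked if they share a cluster in more than a threshold share of the runs (0.45, the threshold of the best solutions in Table 5, is used as the default), and the consensus clusters are the connected components of that graph. The function name and toy vectors are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def consensus_clusters(vectors, n_clusters, n_runs=10, threshold=0.45, seed=0):
    """Co-association consensus over several KMeans runs."""
    unit = normalize(vectors)  # unit-length rows: Euclidean KMeans then approximates cosine distance
    n = unit.shape[0]
    co_assoc = np.zeros((n, n))
    for run in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=seed + run).fit_predict(unit)
        co_assoc += labels[:, None] == labels[None, :]  # 1 where two words share a cluster
    co_assoc /= n_runs                                  # share of runs in which they co-occur
    # link words that co-occur in more than `threshold` of the runs
    _, consensus = connected_components((co_assoc > threshold).astype(float), directed=False)
    return consensus


# Toy "word vectors" pointing in two clearly different directions
vecs = np.array([[1, 0.01], [1, 0.02], [0.99, 0], [0.01, 1], [0, 1], [0.02, 0.98]])
print(consensus_clusters(vecs, n_clusters=2))
```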
The clustering results were evaluated on the heldout dataset as follows: (1) for every cluster solution, only the clusters that correlated with SWLS or WHO-5 were used as features; (2) each cluster feature was computed as the sum of the respective words’ TfIDF values; (3) the resulting features were used in RandomForest regression predicting SWLS and WHO-5 on the heldout dataset, with 10-fold train/test cross-validation and recursive feature elimination; (4) the best cluster features were chosen by the Mean Absolute Error (MAE) of the regression models trained on the heldout dataset and were later used for prediction on the final dataset. The main parameters of the resulting feature sets are described in Table 5.
Table 5

Best word cluster features

Scale | Regularization weight | Consensus clustering threshold | Infrequent words | No. of clusters | MAE
SWLS | 500 | 0.45 | | 28 | 0.1704
WHO-5 | 0 | 0.45 | + | 19 | 0.1525

Phone app categories and usage features

The phone app category and usage features are based on the one-week phone app usage history shared by the participants. App categories, or types, were obtained from the phone app dataset by using 53 app categories generated automatically from 28K app descriptions and by manually merging them into larger groups as described in [47, 49]. As a result, we identified eight app categories: Game, Education+Productivity, Tools, Entertainment, Personalization, Health+Medical, Social+Communication+Dating and Photography, covering 21.5K apps; the remaining 6.5K apps were assigned to a ninth category, Other. The main app usage features were calculated as the total time devoted to a certain app category (e.g., Game, Photography or Other) in each of the eight three-hour time slots of a day, averaged over all days of a given user (9 × 8 = 72 features), as well as the overall time spent on this category in the entire app usage history of an individual (9 features). Next, we constructed normalized versions of each time-slot feature: we normalized them by the total app usage time in this category, and by the total app usage logged in the current three-hour period. This resulted in 9 + 3 × 72 = 225 features. The phone app category features are exemplified in Table 6.
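The four feature types of Table 6 can be sketched with pandas. This is an illustration under assumptions: the session log layout (columns user, day, category, start_hour, seconds) is hypothetical, each session is attributed to the slot in which it starts, and the two ratios are computed from raw sums (the per-day averaging cancels out in a ratio). Undefined ratios (zero usage in the denominator) are set to 0.

```python
import pandas as pd


def app_category_features(log: pd.DataFrame) -> pd.DataFrame:
    """One row per user with features named like GAME, GAME_21-24,
    GAME_21-24/GAME and GAME_21-24/21-24."""
    log = log.copy()
    start = (log["start_hour"] // 3 * 3).astype(int)
    log["slot"] = start.astype(str) + "-" + (start + 3).astype(str)
    n_days = log.groupby("user")["day"].nunique()

    def total(columns):
        return log.pivot_table(index="user", columns=columns, values="seconds",
                               aggfunc="sum", fill_value=0)

    per_cat, per_slot, per_cat_slot = total("category"), total("slot"), total(["category", "slot"])

    feats = {cat: per_cat[cat] for cat in per_cat.columns}   # whole-history totals
    for cat, slot in per_cat_slot.columns:
        t = per_cat_slot[(cat, slot)]
        feats[f"{cat}_{slot}"] = t / n_days                   # daily average in the slot
        feats[f"{cat}_{slot}/{cat}"] = t / per_cat[cat]       # share of the category's time
        feats[f"{cat}_{slot}/{slot}"] = t / per_slot[slot]    # share of all usage in the slot
    return pd.DataFrame(feats).fillna(0.0)


log = pd.DataFrame({"user": [1, 1, 1, 2], "day": [1, 1, 2, 1],
                    "category": ["GAME", "GAME", "PHOTOGRAPHY", "GAME"],
                    "start_hour": [22.5, 10.0, 22.0, 1.0], "seconds": [300, 100, 100, 50]})
features = app_category_features(log)
print(features.loc[1, ["GAME", "GAME_21-24", "GAME_21-24/GAME", "GAME_21-24/21-24"]])
```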
Table 6

Phone app category features

Feature type | No. of features | Example feature name | Description
Total time logged in category by a user | 9 | GAME | Total time logged in Game apps by a user
Total time logged in category in time period by a user | 72 | GAME_21-24 | Total time logged in Game apps between 21 and 24 h by a user
Total time logged in category in time period / total time logged in category by a user | 72 | PHOTOGRAPHY_0-3/PHOTOGRAPHY | Ratio of time logged in Photography apps between 0 and 3 AM to total time logged in Photography apps by a user
Total time logged in category in time period / total time logged in time period by a user | 72 | EDUCATION + PRODUCTIVITY_15-18/15-18 | Ratio of time logged in Education+Productivity apps between 15 and 18 h to total time logged in apps between 15 and 18 h by a user

Machine learning experiments

We performed specific experiments for each of our two subtasks: prediction of the satisfaction-related and mental well-being scales, and prediction of the classes of the latter. As we aimed at interpretable results, our main experiments were based on classical regression models. At the same time, to make sure that we obtained the best possible prediction quality with the available contemporary methods, we also carried out extensive experiments with deep learning approaches (described in Appendix 4). However, they yielded inferior results. The two main possible reasons for that are the following: (1) our data are hard to obtain, and the obtained data are sparse and loosely intersect between users, which reduces the sample significantly; (2) our message data are hierarchically organized, with numerous alters with whom every participant communicates and numerous messages sent to every alter, while the numbers of alters and messages vary highly between participants and alters (see Table 3 above). Our experiment on the prediction of SWLS and WHO-5 scales was performed using a 10-fold cross-validation design with train, development and test sets (298/37/37 users, 80/10/10%). The non-overlapping train, development and test sets were constructed as follows: the sample was shuffled and sorted by the well-being values; the sorted sample was then divided into 10 bins of 37 users each, so that bin $B_i$ ($i = 1, \dots, 10$) consisted of the users at positions $10K + i$ of the sorted sample, where $K$ varied in the range $[0, 36]$. Thus, every bin was equally distributed in terms of the SWB values. For the $i$-th cross-validation fold, $B_i$ was used as the test set and another bin as the development set, and the remaining users belonged to the training set. Our evaluation metrics for regression include Mean Absolute Error (MAE), Pearson r and the R² score. Hyperparameter values were chosen inside the cross-validation loop based on the MAE obtained on the development set.
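The bin-based split described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: ties are broken by an initial random shuffle, the two users left over when 372 users are dealt into ten bins of 37 always stay in the training set (which reproduces the 298/37/37 split), and the development set for fold i is taken to be the next bin, i + 1, since the text does not say which bin was used. The function names are hypothetical.

```python
import numpy as np


def stratified_bins(y, n_bins=10, seed=0):
    """Shuffle, sort by the well-being score and deal users into n_bins bins:
    bin i receives the users at sorted positions i, i + n_bins, i + 2 * n_bins, ..."""
    y = np.asarray(y, dtype=float)
    perm = np.random.default_rng(seed).permutation(len(y))   # random tie-breaking
    order = perm[np.argsort(y[perm], kind="stable")]
    per_bin = len(y) // n_bins                               # 37 for 372 users
    return [order[i:per_bin * n_bins:n_bins] for i in range(n_bins)]


def train_dev_test_folds(y, n_bins=10, seed=0):
    """Yield (train, dev, test) index arrays; test = bin i, dev = bin i + 1 (assumed)."""
    bins = stratified_bins(y, n_bins, seed)
    everyone = np.arange(len(y))
    for i in range(n_bins):
        test, dev = bins[i], bins[(i + 1) % n_bins]
        train = np.setdiff1d(everyone, np.concatenate([test, dev]))
        yield train, dev, test


y = np.random.default_rng(1).uniform(size=372)   # synthetic normalized SWB scores
for train, dev, test in train_dev_test_folds(y):
    print(len(train), len(dev), len(test))       # 298 37 37 for every fold
```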
Recursive Feature Elimination (RFE) was performed on the development set to identify the informative features in each cross-validation fold. RFE was adopted because earlier experiments had shown that it increased model performance. Additionally, RFE allows us to select a small number of informative features, improving model interpretability. The selected best hyperparameters and features were used to evaluate the quality of prediction on the test set inside the cross-validation loop. In the end, the evaluation metrics were averaged across all 10 folds. SWLS and WHO-5 scores were predicted with seven regression models, including Linear Regression with various regularization techniques, Decision Tree, and two ensemble methods (see Appendix 5). WHO-5 classification was performed with three classification models chosen on the basis of our preliminary experiments (Appendix 6). Individual WHO-5 levels were classified in a binary mode with two classes (low vs. high well-being) and in a trinary mode with three classes (low vs. medium vs. high well-being). The models and hyperparameter values are described in Appendix 6. We report F1-macro and F1-weighted metrics over all the classes, as well as the F1 metric for the lowest and the highest classes separately. We additionally report True Positive and False Positive Rates for the low well-being class, as these measures are typically used for screening tests of various mental health conditions (cf. [38]). All calculations were performed in Python with the pandas, scipy and scikit-learn libraries.
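One way to implement RFE driven by development-set MAE is sketched below with scikit-learn's `RFE` and `ElasticNet`. It is a sketch under assumptions: the candidate numbers of retained features, the ElasticNet `alpha` and the synthetic data are illustrative, and the authors' exact search (Appendices 5 and 6) may differ. RFE ranks features by the coefficients fitted on the training set; the development set only decides how many features to keep.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error


def rfe_on_dev(X_train, y_train, X_dev, y_dev, sizes, alpha=0.01):
    """Fit RFE on the training set for each candidate number of features and keep
    the configuration with the lowest MAE on the development set."""
    best = None
    for k in sizes:
        rfe = RFE(ElasticNet(alpha=alpha), n_features_to_select=k).fit(X_train, y_train)
        mae = mean_absolute_error(y_dev, rfe.predict(X_dev))
        if best is None or mae < best[0]:
            best = (mae, k, rfe)
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(372, 10))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=372)  # synthetic target
mae, k, rfe = rfe_on_dev(X[:298], y[:298], X[298:335], y[298:335], sizes=[1, 2, 5, 10])
print(k, np.flatnonzero(rfe.support_))
# The chosen model would then be scored once on the held-out test rows, X[335:].
```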

Results

Prediction of well-being scale values

The continuous modeling results for the SWLS and WHO-5 well-being values are presented in Tables 7 and 8, respectively.
Table 7

SWLS value prediction results

Features | Best model | MAE | Pearson r | R²
Mean baseline | | 0.1853 | |
Median baseline | | 0.185 | |
Words | ElasticNet | 0.1744 | 0.3402 | 0.1022
RuLIWC | DecisionTree | 0.182 | 0.2168 | 0.0142
AppCats | ElasticNet | 0.1762 | 0.2737 | 0.0172
Behavior | DecisionTree | 0.1785 | 0.191 | 0.0195
Clusters | RandomForest | 0.1814 | 0.1709 | 0.026
Clusters + AppCats + Behavior + Words | ElasticNet | 0.1698 | 0.4024 | 0.1045
Clusters + AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.1681 | 0.3776 | 0.1164
Table 8

WHO-5 value prediction results

Features | Best model | MAE | Pearson r | R²
Mean baseline | | 0.1542 | |
Median baseline | | 0.1533 | |
Words | Lasso | 0.1441 | 0.3179 | 0.0817
RuLIWC | Lasso | 0.1529 | 0.1276 | 0.0197
AppCats | ElasticNet | 0.1511 | 0.2172 | 0.0329
Behavior | DecisionTree | 0.1497 | 0.2463 | 0.0096
Clusters | Lasso | 0.1516 | 0.1533 | 0.0241
Clusters + RuLIWC + Words | AdaBoost | 0.1436 | 0.3202 | 0.081
AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.1438 | 0.367 | 0.1193
The tables include the results for every individual feature set and for the best feature sets in terms of each evaluation metric. The full results for all feature set combinations are presented in Appendices 7 and 8. Overall, the best single feature set is the words written by the users in their messages, and the best model is ElasticNet.
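The regression metrics and the constant baselines reported in Tables 7 and 8 can be computed per fold as sketched below (the paper then averages them over the 10 folds). The function names are hypothetical, and computing the baselines from the training-fold mean and median is our assumption. Pearson r is undefined for a constant prediction, which is presumably why the baseline rows show MAE only.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, r2_score


def regression_scores(y_true, y_pred):
    """MAE, Pearson r and R² for one test fold."""
    return {"MAE": mean_absolute_error(y_true, y_pred),
            "r": pearsonr(y_true, y_pred)[0],
            "R2": r2_score(y_true, y_pred)}


def baseline_mae(y_train, y_test):
    """MAE of constant predictions equal to the training mean or median
    (Pearson r is undefined for a constant prediction)."""
    y_test = np.asarray(y_test, dtype=float)
    return {"mean": float(np.mean(np.abs(y_test - np.mean(y_train)))),
            "median": float(np.mean(np.abs(y_test - np.median(y_train))))}


print(regression_scores([0.1, 0.4, 0.5, 0.8], [0.2, 0.3, 0.6, 0.7]))
print(baseline_mae([0, 2, 4], [1, 3]))  # {'mean': 1.0, 'median': 1.0}
```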

Prediction of WHO-5 classes

The main classification results for the WHO-5 well-being are presented in Table 9. The full WHO-5 classification results are presented in Appendix 9.
Table 9

Best WHO-5 classification results

Classification | Threshold | N (classes) | Best model | Best features | F1-macro | F1-weighted | F1-low | F1-high | True Positive Rate (low) | False Positive Rate (low)
Binary | 0.51 | 221/151 | AdaBoost | Words + RuLIWC + AppCats | 0.692 | 0.706 | 0.768 | 0.616 | 0.792 | 0.404
Binary majority baseline | | | | | 0.378 | 0.456 | 0.373 | 0 | 1.0 | 1.0
Trinary | 0.35/0.59 | 111/158/103 | AdaBoost | Clusters + RuLIWC + Words | 0.483 | 0.493 | 0.502 | 0.433 | 0.450 | 0.161
Trinary majority baseline | | | | | 0.199 | 0.253 | 0.0 | 0.0 | |
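The classification metrics in Table 9 can be computed per fold with scikit-learn as sketched below. The class labels "low" and "high" and the function name are illustrative assumptions. Note that a majority-class baseline that always predicts "low" necessarily gets a True Positive Rate and a False Positive Rate of 1.0 for the low class, as in the binary baseline row.

```python
import numpy as np
from sklearn.metrics import f1_score


def screening_scores(y_true, y_pred, low="low", high="high"):
    """F1 scores plus True/False Positive Rates for the low well-being class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_low, pred_low = y_true == low, y_pred == low
    return {
        "F1-macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "F1-weighted": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        "F1-low": f1_score(y_true, y_pred, labels=[low], average="macro", zero_division=0),
        "F1-high": f1_score(y_true, y_pred, labels=[high], average="macro", zero_division=0),
        "TPR-low": float(np.sum(pred_low & true_low) / np.sum(true_low)),
        "FPR-low": float(np.sum(pred_low & ~true_low) / np.sum(~true_low)),
    }


truth = ["low", "low", "high", "high"]
print(screening_scores(truth, ["low", "high", "low", "high"]))
majority = screening_scores(truth, ["low"] * 4)
print(majority["TPR-low"], majority["FPR-low"])  # 1.0 1.0
```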

Significant features

The features in the best-performing continuous models of satisfaction-related well-being (SWLS) and mental well-being (WHO-5) are listed in Tables 10 and 11. Only the features selected by RFE in at least five out of ten cross-validation folds are included; two of them, AppUsage9-12Ratio and Negative_month, are significant in both SWLS and WHO-5 regression. All the significant features are listed in Appendices 10 and 11.
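The "at least five out of ten folds" filter above amounts to counting how often each feature survives RFE. A minimal sketch; the per-fold selections in the example are made up for illustration and the function name is hypothetical.

```python
from collections import Counter


def stable_features(selected_per_fold, min_folds=5):
    """Features chosen by RFE in at least `min_folds` cross-validation folds."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return sorted(f for f, c in counts.items() if c >= min_folds)


# Hypothetical RFE selections in 10 folds
folds = ([["Negative_month", "AppUsage9-12Ratio"]] * 6
         + [["AppUsage9-12Ratio", "Bio_RuLIWC"]] * 4)
print(stable_features(folds))  # ['AppUsage9-12Ratio', 'Negative_month']
```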
Table 10

Predictive features in SWLS scale. Slang, misspellings and unconventional word forms are shown with an asterisk (*). Errors in lemmatization are enclosed in brackets

Feature type | Feature | Translation/Description | Coefficient
Words | спать_[NOUN] | sleep_VERB | 41,086
Words | интим_NOUN | intimacy_NOUN (suggestive of ‘intercourse’) | −44,937
Words | орг_NOUN* | org(aniser)_NOUN | 23,978
Words | дропнуть_VERB* | quit_VERB | −64,677
Words | тратиться_VERB | spend_VERB | −24,593
Words | отл_UNKN* | fine_UNKN | 34,184
Words | пояснение_NOUN | explanation_NOUN | −22,499
Words | стебать_VERB* | bully_VERB (rude) | −28,898
Words | [вифя]_NOUN* | wifi_NOUN | −48,114
Words | спойлерить_VERB* | spoil_VERB | −48,530
Words | ооохнуть_VERB* | gasp_VERB | −44,864
Words | милый_COMP | nice_COMPARATIVE | 56,128
Words | [пиздёжа]_NOUN* | lie_NOUN (rude) | −22,727
Words | обжечь_VERB | burn_VERB | −40,019
Sentiment | Negative_month | Negative sentiment in the last month | −29
Activity | AppUsage9-12Ratio | Ratio of phone app usage time between 9 and 12 AM normalized by total app usage time | 10
Activity | AppUsage0-3Ratio | Ratio of phone app usage time between 0 and 3 AM normalized by total app usage time | −8
AppCats | SOCIAL + COMMUNICATION + DATING_0-3/SOCIAL + COMMUNICATION + DATING | Ratio of time logged in Social + Communication + Dating apps between 0 and 3 AM to total time logged in Social + Communication + Dating apps | 11
AppCats | PHOTOGRAPHY_18-21/18-21 | Ratio of time logged in Photography apps between 18 and 21 h to total time logged in apps between 18 and 21 h | 8
Table 11

Predictive features in WHO-5 scale

Feature type | Feature | Translation/Description | Coefficient
AppCats | GAME_3-6/GAME | Ratio of time logged in Game apps between 3 and 6 AM to total time logged in Game apps | −5
AppCats | ENTERTAINMENT_3-6/ENTERTAINMENT | Ratio of time logged in Entertainment apps between 3 and 6 AM to total time logged in Entertainment apps | 4
AppCats | HEALTH+MEDICAL_3-6/HEALTH+MEDICAL | Ratio of time logged in Health + Medical apps between 3 and 6 AM to total time logged in Health + Medical apps | 3
AppCats | PERSONALIZATION_0-3/0-3 | Ratio of time logged in Personalization apps between 0 and 3 AM to total time logged in apps between 0 and 3 AM | −4
AppCats | EDUCATION + PRODUCTIVITY_9-12/EDUCATION + PRODUCTIVITY | Ratio of time logged in Education + Productivity apps between 9 and 12 AM to total time logged in Education + Productivity apps | −3
AppCats | TOOLS_18-21/18-21 | Ratio of time logged in Tools apps between 18 and 21 h to total time logged in apps between 18 and 21 h | −3
AppCats | SOCIAL + COMMUNICATION + DATING_3-6/SOCIAL + COMMUNICATION + DATING | Ratio of time logged in Social + Communication + Dating apps between 3 and 6 AM to total time logged in Social + Communication + Dating apps | 7
AppCats | GAME_9-12/GAME | Ratio of time logged in Game apps between 9 and 12 AM to total time logged in Game apps | 2
AppCats | OTHER_3-6/OTHER | Ratio of time logged in Other apps between 3 and 6 AM to total time logged in Other apps | −2
AppCats | ENTERTAINMENT_9-12/ENTERTAINMENT | Ratio of time logged in Entertainment apps between 9 and 12 AM to total time logged in Entertainment apps | 2
AppCats | PHOTOGRAPHY_0-3/PHOTOGRAPHY | Ratio of time logged in Photography apps between 0 and 3 AM to total time logged in Photography apps | −2
AppCats | EDUCATION + PRODUCTIVITY_21-24/EDUCATION + PRODUCTIVITY | Ratio of time logged in Education + Productivity apps between 21 and 24 h to total time logged in Education + Productivity apps | −2
RuLIWC | Bio_RuLIWC | Words related to biological processes in RuLIWC | −20
Words | (face-blowing-a-kiss_emoji)_UNKN | (face-blowing-a-kiss_emoji) | 35
Words | но_CONJ | but_CONJ | −16
Activity | AppUsage9-12Ratio | Ratio of phone app usage time between 9 and 12 AM normalized by total app usage time | 7
Sentiment | Negative_month | Negative sentiment in the last month | −33
Sentiment | Negative_year | Negative sentiment in the last year | −29
Sentiment | Negative_all | Negative sentiment in all messages | −23

Discussion

In this paper, we have introduced a novel task of predicting mental well-being measured by the WHO-5 index with digital traces, as compared to the traditionally studied satisfaction-related SWLS, and performed it in both continuous modeling and classification designs. For the latter, we have shown that the selected WHO-5 thresholds are representative of three mental well-being-related conditions (depression, anxiety and stress), with high sensitivity and moderate to high specificity. Furthermore, the results obtained in mental well-being classification are highly promising (0.792 True Positive Rate and 0.404 False Positive Rate) in the binary task with our highly sensitive threshold. This threshold is very close to the one recommended by WHO for moderate depression screening (0.51 against 0.50). The classification result itself is similar to the performance of the best existing models that predict other mental conditions with digital traces [30, 38]. Likewise, our results of SWLS and WHO-5 scale prediction, with Pearson r = 0.402 and 0.367, respectively, improve the state-of-the-art metrics reported previously in similar tasks with cross-validation designs [51, 53]. Since, as mentioned earlier, prediction of internal states from observable behaviors has its limitations [29, 30], the obtained correlations may be considered high. As a result, we obtain a model that is highly sensitive and sufficiently specific for identifying low levels of subjective well-being requiring intervention in a high-risk population of mental health application users. Our model is unique not only in its accurate prediction of WHO-5 classes, which have a proven ability to detect depression risk, but also in its potential to develop into a tool for broader screening for mental health risks, not limited to the specific conditions reported in previous studies (see [28, 30, 48] for an overview). We have also performed a unique comparison of regression models predicting both SWLS and WHO-5 indices on the same sample.
Our best models for both indices show similar performance in terms of correlation and R² metrics, but WHO-5 is predicted better in terms of MAE across all feature combinations; however, this is likely an outcome of the different distributions of SWLS and WHO-5 in our sample (see Figs. 1 and 2 and Table 2 above).
Figure 2

Distribution of WHO-5 values

Our design also allows us to compare the features predictive of life satisfaction-related SWB and mental SWB. Although our experiments have revealed only two highly predictive features common to both SWLS and WHO-5, they are highly interpretable in terms of psychological theory. These two features are (1) phone app usage time between 9 and 12 AM normalized by total app usage time, and (2) negative sentiment expressed in private messages in the last month, which have positive and negative coefficients, respectively, in both the SWLS and WHO-5 tasks. Both findings confirm previous results obtained in various populations: participants affected by depression and other low-SWB conditions have been found less likely than average individuals to participate in online activities in the morning hours around 9–10 AM [74, 75], while their circadian rhythms have often been disrupted [7]. Such disruption usually accompanies insomnia or hypersomnia, a symptom of major depressive disorder listed in DSM-5 [83], the Diagnostic and Statistical Manual of Mental Disorders developed by the American Psychiatric Association. Negative sentiment has been shown to correlate negatively with life satisfaction [34, 53, 84] and subjective well-being [71]. Negative sentiment in written or oral speech may also sometimes, although not always, be a manifestation of depressed mood, another symptom of depressive disorder according to DSM-5. Thus, these two highly predictive features shared by the SWLS and WHO-5 prediction models can indicate different degrees of low SWB: from simple dissatisfaction with life, circumstances or personal achievements (relevant for SWLS) to a deterioration in mental or physical condition and serious symptoms of the depressive spectrum (relevant for WHO-5). They can be recommended for use across various SWB-prediction tasks.
Predictors unique to satisfaction-related well-being are much more dominated by verbal features related to affect-laden psychological and social content. Many of them are slang or even obscene lexemes, and they represent both negative and positive sentiment polarities (quit_VERB, spend_VERB, fine_UNKN, explanation_NOUN, bully_VERB, spoil_VERB, gasp_VERB, nice_COMPARATIVE). The association of positive lexica with SWB is consistent with Weismayer [85], who also finds a negative relation of SWB with lexica expressing anger and fear. Some of our predictive words are likely to express these emotions (e.g., bully [rude], burn, lie [rude], gasp). These lexica also fit well with some of the ontologies developed for depression detection [45]. The prevalence of lexical features among SWLS predictors suggests that this index indeed captures the subjective perception of well-being rather than symptoms of mental disorders such as depression. In contrast, in mental well-being prediction, phone app usage features take a clear lead, especially those related to the ratio of nighttime app usage (3–6 AM). Additionally, lexica related to biological processes are a distinctive marker of low WHO-5 levels. All this aligns well with the primary goal of WHO-5 to reveal depression and with its proven ability to differentiate between problematic mental health states and high levels of mental health-related well-being. Specifically, app usage rhythms and biological lexica are likely to be manifestations of such depression symptoms as an increase or decrease in weight or appetite, insomnia or hypersomnia, and fatigue or loss of energy [86]. At the same time, they can be markers of a poor physical condition, which is also detected by WHO-5 [18]. Finally, the significance of negative sentiment over long periods of messaging (1 year and longer) for WHO-5 levels suggests that mental SWB measured by this index might in fact have a more stable behavioral pattern than SWLS.
However, it is also possible that the stable component of SWLS is underrepresented in our features or subjects. At the same time, not only SWLS (as shown in [21]) but also WHO-5 may contain both stable and transient components explained by different factors. While the temporal stability of SWB may be expected to relate to constant individual features, such as the presence of a chronic disease, SWB volatility should rather be explained by short-term mood fluctuations and long-term meaningful changes in life, such as those listed in the introduction. Individual predictors of SWB stability and volatility may differ for SWLS and WHO-5, and the feature set in our sample may be skewed in favor of WHO-5 stability factors. In any case, our analysis of the overlapping and differing predictors of WHO-5 and SWLS suggests that satisfaction-related SWB and mental SWB share some of their transient factors rather than stable ones. These preliminary observations of the temporal dimension of SWB set a promising direction for future research.

Conclusions

The growing interest in tracking human mental states and in developing mindfulness has driven the growth of applications that screen for, or even diagnose, mental conditions and offer solutions for their correction, including solutions based on objective data. Our research has shown that machine learning models built on interpretable features can predict various aspects of subjective well-being at the state-of-the-art level. In doing so, we have performed the first study on predicting subjective well-being measured by WHO-5. We have demonstrated that certain WHO-5 thresholds are indicative of a range of mental health conditions prevalent in a sample at high risk of mental health problems. We have obtained promising results in classifying mental SWB into classes constructed from these thresholds. This approach has allowed us to identify individuals with low subjective well-being from their digital traces with high recall and reasonable false positive rates. Our study is also the first to compare the prediction performance and predictive features of mental SWB and satisfaction-related SWB. We show that several predictors are shared by well-being measured by WHO-5 and by SWLS, and that these digital traces are plainly indicative of overall (un)well-being. At the same time, the digital traces that distinguish WHO-5 from SWLS are closely related to the conceptual difference between the two indices. SWLS is characterized by expressions denoting affect-laden psychological and social content. WHO-5 levels, in contrast, are manifested in objective features reflecting physiological functioning and somatic conditions, i.e., lexica related to biological processes and circadian rhythm-related ratios of phone app usage.
To our knowledge, this is the first approach to subjective well-being prediction in a Russian-speaking population, and the first to combine language, social network and phone app usage features in well-being research. By leveraging phone app usage logs together with profile and message data from the Russian social network VKontakte, we have improved the prediction of satisfaction-related SWB (SWLS) and proposed the first predictive model for mental SWB (WHO-5). At the same time, our sample was very small and limited to a high-risk population, so the study needs replication on larger samples representative of wider social and psychological groups. The major obstacle is that VKontakte private message data are no longer available for download in any form, and other social media are even more restrictive. We therefore strongly recommend developing public policies and regulations that encourage private data-collecting companies to share portions of their data for public-good purposes.
Table 12

Distribution of the most common cities identified in the overall data sample

City | Percent
Moscow | 47.4
St. Petersburg | 36.9
Yekaterinburg | 8
Kazan | 6.2
Minsk | 5.7
Chelyabinsk | 5.7
Novosibirsk | 5.7
Nizhny Novgorod | 5
Krasnodar | 4.7
Rostov-on-Don | 4.2
Table 13

Distribution of the most common cities identified in the final dataset

City | Percent
Moscow | 41.6
St. Petersburg | 31.9
Yekaterinburg | 8
Nizhny Novgorod | 5.3
Voronezh | 4.4
Chelyabinsk | 4.4
Vladivostok | 4.4
Tyumen | 3.5
Kirov | 3.5
Yaroslavl | 3.5
Table 14

Total list of words used as features for the SWLS and WHO-5 prediction

SWLS: 1000_NUMB, 22_NUMB, https://ru.wikipedia.org/wiki/_LATN, t_LATN, адрес_NOUN, апрель_NOUN, ахуесть_VERB, ахуеть_VERB, ахуй_NOUN, бабочка_NOUN, баня_NOUN, бгод_NOUN, бесить_VERB, бланк_NOUN, бля_INTJ, блядь_INTJ, блятба_NOUN, блять_NOUN, бляять_GRND, большой_ADJ, борис_NOUN, будто_CONJ, бухать_GRND, василий_NOUN, ващий_ADJ, вечно_ADV, водный_ADJ, воскресение_NOUN, впустить_VERB, выглянуть_VERB, графика_NOUN, грубый_ADJ, даж_UNKN, делаться_VERB, день_NOUN, добрый_ADJ, договориться_VERB, долбиться_VERB, е_NOUN, ебал_NOUN, ебануться_VERB, ебать_VERB, еби_UNKN, еблана_NOUN, ебу_UNKN, ет_UNKN, жарко_ADV, жить_VERB, заебал_NOUN, заебок_NOUN, закрыться_VERB, замечание_NOUN, запасный_ADJ, запрещать_VERB, знач_NOUN, именно_PRCL, комиссия_NOUN, корея_NOUN, кофеёк_NOUN, критерий_NOUN, лана_NOUN, лариса_NOUN, лень_NOUN, ложь_NOUN, лях_NOUN, маман_NOUN, мамаша_NOUN, маркетинг_NOUN, маркус_NOUN, милах_NOUN, мразь_NOUN, мудак_NOUN, мёд_NOUN, набрать_VERB, научный_ADJ, нах_UNKN, нахуй_NOUN, нееет_UNKN, ненавидеть_VERB, несмотря_PREP, неудобный_ADJ, никто_NPRO, нихуй_NOUN, обжечь_VERB, окончание_NOUN, орало_NOUN, орг_NOUN, организация_NOUN, отвлечь_VERB, отвратительный_ADJ, отл_UNKN, отлично_ADV, отсталый_ADJ, передать_VERB, петух_NOUN, пизда_NOUN, пиздец_NOUN, пиздуть_VERB, пиздёжа_NOUN, подробный_ADJ, поебать_VERB, пока_ADV, показатель_NOUN, получить_VERB, пользователь_NOUN, помереть_VERB, помеха_NOUN, потерять_PRTS, похуй_NOUN, пояснение_NOUN, предать_VERB, предсказуемый_ADJ, признак_NOUN, приобрести_VERB, припереться_VERB, прогуливать_VERB, прогулять_VERB, равно_CONJ, разом_ADV, разреветься_VERB, разрывать_VERB, рамка_NOUN, растеряться_VERB, результат_NOUN, рил_NOUN, руководитель_NOUN, рушить_VERB, рэп_NOUN, свалить_VERB, скот_NOUN, скучно_ADV, смеяться_VERB, сосуд_NOUN, спока_NOUN, спорый_ADJ, ссылка_NOUN, стебать_VERB, сук_NOUN, съебывать_VERB, тиндёр_NOUN, тратиться_VERB, труп_NOUN, трус_NOUN, тэхен_NOUN, ущербный_ADJ, факультет_NOUN, херить_VERB, хит_NOUN, хм_INTJ, хрень_NOUN, хуй_NOUN, хуйня_NOUN, хула_NOUN, хы_UNKN, цель_NOUN, через_PREP, шава_NOUN, шеф_NOUN, шлюшка_NOUN, шуга_NOUN, эт_UNKN, эх_INTJ, я_NPRO, (glowing-star_emoji)_UNKN, (thinking-face_emoji)_UNKN

WHO-5: !_PNCT, 2000_NUMB, 2500_NUMB, r_LATN, аааа_NOUN, ааааа_NOUN, ааааааааааааааааа_NOUN, адрес_NOUN, анимешник_NOUN, арми_NOUN, ахахахи_NOUN, ахахахха_NOUN, байка_NOUN, бантан_NOUN, блестеть_VERB, блч_UNKN, блять_NOUN, бляяяяять_VERB, бляяяяяять_GRND, борис_NOUN, будто_CONJ, валя_NOUN, вежливый_ADJ, вифя_NOUN, вообще_ADV, воооот_NOUN, воскресение_NOUN, впервые_ADV, впустить_VERB, вскрыться_VERB, выпилиться_VERB, выставить_VERB, выступить_VERB, глупенький_ADJ, горе_NOUN, гуглить_VERB, даун_NOUN, дельфин_NOUN, демон_NOUN, дерьмо_NOUN, джон_NOUN, джуна_NOUN, дилемма_NOUN, добровольно_ADV, добрый_ADJ, доказательство_NOUN, дразнить_VERB, дропнуть_VERB, ебаный_ADJ, ебу_UNKN, ет_UNKN, ж_CONJ, жестокий_ADJ, животный_ADJ, загуглила_NOUN, заезжать_VERB, заехать_VERB, замуж_ADV, запереть_VERB, заржать_VERB, засиживаться_VERB, звонить_VERB, звёздочка_NOUN, инглихой_COMP, интим_NOUN, истинный_ADJ, как_CONJ, кальян_NOUN, камбэк_NOUN, капёс_NOUN, кб_NOUN, коль_CONJ, комикс_NOUN, кореец_NOUN, корея_NOUN, косплей_NOUN, кпоп_NOUN, ладить_VERB, листочек_NOUN, лосиный_ADJ, магнитный_ADJ, милах_NOUN, милый_COMP, монст_NOUN, мразь_NOUN, мррррра_NOUN, мутный_ADJ, мфц_UNKN, мэн_NOUN, набрать_VERB, наверна_NOUN, наехать_VERB, намджуна_NOUN, наорать_VERB, настолько_ADV, неинтересно_ADV, неловко_ADV, ненавидеть_VERB, несчастный_ADJ, нету_PRED, неудобно_ADV, никогда_ADV, но_CONJ, ноооо_NOUN, ноут_NOUN, обидный_ADJ, облизывать_VERB, объяснять_VERB, объёмный_ADJ, он_NPRO, оооо_NOUN, ооооооо_NOUN, оооохнуть_VERB, ооохнуть_VERB, орало_NOUN, останавливать_VERB, отбирать_VERB, отвлечься_VERB, отвратительный_ADJ, отвратный_ADJ, отлично_ADV, офф_UNKN, ох_INTJ, паника_NOUN, педик_NOUN, переключить_VERB, переписывать_VERB, пересматривать_VERB, пиздец_NOUN, пират_NOUN, писаться_VERB, подъехать_VERB, поебать_VERB, пожениться_VERB, покинуть_VERB, помнить_VERB, поплакать_VERB, порешать_VERB, поступок_NOUN, потерянный_ADJ, потерять_PRTS, поттер_NOUN, пошло_ADV, ппц_UNKN, предатель_NOUN, предать_VERB, привет_NOUN, пригонять_VERB, приобнять_VERB, продумать_VERB, прописать_VERB, псих_NOUN, психануть_VERB, пытаться_VERB, пялить_VERB, работа_NOUN, разреветься_VERB, разрывать_VERB, расплатиться_VERB, расстроить_PRTF, растягивать_VERB, реветь_VERB, репер_NOUN, репетиция_NOUN, риал_NOUN, рил_NOUN, рушить_VERB, саба_NOUN, сам_ADJ, свалить_VERB, серега_NOUN, серия_NOUN, слеза_NOUN, слишком_ADV, смеяться_VERB, спасать_VERB, спать_NOUN, спойлерить_VERB, спорый_ADJ, ссора_NOUN, старший_NOUN, стебать_VERB, страдать_VERB, страшно_ADV, стремный_ADJ, съездить_VERB, таак_NOUN, тони_NOUN, тренировка_NOUN, труп_NOUN, тц_UNKN, тэхен_NOUN, убивать_VERB, удовлетворение_NOUN, умирать_VERB, умыться_VERB, упад_NOUN, фандом_NOUN, ханна_NOUN, хардкор_NOUN, хдд_UNKN, хл_UNKN, хм_INTJ, хорошо_ADV, хотя_CONJ, худой_COMP, червь_NOUN, через_PREP, чертовый_ADJ, чонгук_NOUN, чувство_NOUN, чудом_ADV, чуть_ADV, шлюшка_NOUN, шов_NOUN, шуга_NOUN, ь_UNKN, это_NPRO, этот_ADJ, юнга_NOUN, я_NPRO, (medium-light-skin-tone_emoji)_UNKN, (face-blowing-a-kiss_emoji)_UNKN, (drooling-face_emoji)_UNKN
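Table 14 lists word features in lemma_POS form. The sketch below shows one plausible way to turn a user's lemmatized, POS-tagged messages into relative frequencies over such a vocabulary. It assumes that an upstream Russian lemmatizer and tagger, not shown here, has already produced (lemma, tag) pairs. It also assumes that features are normalized by the user's total token count. Both that normalization and the name `word_features` are assumptions, not the authors' pipeline.

```python
from collections import Counter


def word_features(tagged_messages, vocabulary):
    """Relative frequency of each 'lemma_POS' vocabulary token in one user's messages.

    tagged_messages: iterable of messages, each a list of (lemma, pos_tag) pairs
    from an assumed upstream tagger. Normalizing by the total token count is an assumption.
    """
    counts = Counter(
        f"{lemma}_{pos}" for message in tagged_messages for lemma, pos in message
    )
    total = sum(counts.values())
    # Tokens absent from the user's messages get 0.0 so every user has the same columns.
    return {token: (counts[token] / total if total else 0.0) for token in vocabulary}
```

For example, two messages tagged as [("день", "NOUN"), ("добрый", "ADJ")] and [("день", "NOUN"), ("хорошо", "ADV")] give день_NOUN a frequency of 0.5 and хорошо_ADV a frequency of 0.25.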
Table 15

Correlation between sentiment class and WHO score

Sentiment class | Correlation with WHO-5 score
negative | −0.14921
positive | 0.024321
neutral | 0.09399
speech | 0.152864
skip | −0.114221
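Table 15 relates how often each sentiment class occurs in a user's messages to that user's WHO-5 score. A minimal pure-Python sketch of this computation follows. It assumes that each message already carries one of the five class labels and that a class frequency is the share of a user's messages with that label. `class_frequencies` and `pearson` are illustrative names.

```python
import math
from collections import Counter

# The five sentiment classes listed in Table 15.
CLASSES = ("negative", "positive", "neutral", "speech", "skip")


def class_frequencies(labels):
    """Share of a user's messages assigned to each sentiment class (assumed definition)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: (counts[c] / n if n else 0.0) for c in CLASSES}


def pearson(x, y):
    """Pearson correlation coefficient; undefined (ZeroDivisionError) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In use, `class_frequencies` is applied to each user. The "negative" column across users is then passed to `pearson` together with the users' WHO-5 scores.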
Table 16

Results for linear regression model with sentiment class frequency features. Mean absolute error and Pearson correlation

Sentiment class combination | Mean absolute error | Pearson correlation
negative, positive | 0.1434 | 0.1243
negative, neutral, positive | 0.1445 | 0.1265
negative, neutral, positive, skip, speech | 0.1447 | 0.136
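Table 16 fits a linear regression on sentiment class frequencies and reports MAE and Pearson correlation. The sketch below shows ordinary least squares with an intercept, using NumPy. The toy data and the in-sample evaluation are for illustration only. Reported scores would normally come from held-out cross-validation folds, which are not shown.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def mae_and_pearson(y_true, y_pred):
    """Mean absolute error and Pearson correlation between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = float(np.mean(np.abs(y_true - y_pred)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return mae, r
```

Here each row of `X` would hold one user's frequencies for the chosen class combination, for example the negative and positive columns.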
Table 17

Models and hyperparameters used for SWLS and WHO-5 regression

Model | Hyperparameters
AdaBoostRegressor | 'loss': ['linear', 'square', 'exponential'], 'n_estimators': [10, 100]
DecisionTreeRegressor | 'criterion': ['mae'], 'max_depth': [2, 3], 'min_samples_leaf': [2], 'max_leaf_nodes': [3], 'splitter': ['best'], 'min_samples_split': [2], 'max_features': ['auto']
ElasticNet | 'alpha': [100, 10, 1, 0.1, 0.01, 0.001, 0.0001], 'normalize': [False, True], 'selection': ['cyclic', 'random'], 'max_iter': [500, 1000], 'l1_ratio': [0.25, 0.5, 0.75]
Lasso | 'alpha': [100, 10, 1, 0.1, 0.01, 0.001, 0.0001], 'normalize': [False, True], 'selection': ['cyclic', 'random'], 'max_iter': [500, 1000, 2000]
LinearRegression | 'normalize': [False, True]
RandomForestRegressor | 'n_estimators': [2, 5, 10, 20], 'max_depth': [2, 3], 'min_samples_split': [2], 'min_samples_leaf': [1], 'max_features': ['auto']
Ridge | 'alpha': [100, 10, 1, 0.1, 0.01, 0.001, 0.0001], 'normalize': [False, True]
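The grids in Table 17 use scikit-learn parameter names, but some of these values no longer run on current releases:

- The `normalize` option of the linear models was removed in scikit-learn 1.2.
- `criterion='mae'` is now spelled `'absolute_error'`.
- `max_features='auto'` is no longer accepted.

The sketch below reproduces the ElasticNet grid within those constraints. Replacing `normalize` with an optional `StandardScaler` step is an assumption and is not numerically equivalent to the old option. Selecting on MAE under shuffled k-fold cross-validation is also assumed, as is the helper name `build_search`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ElasticNet grid from Table 17 without the removed 'normalize' option.
# Toggling a scaling step ("passthrough" = no scaling) is an assumed stand-in.
PARAM_GRID = {
    "scale": ["passthrough", StandardScaler()],
    "model__alpha": [100, 10, 1, 0.1, 0.01, 0.001, 0.0001],
    "model__selection": ["cyclic", "random"],
    "model__max_iter": [500, 1000],
    "model__l1_ratio": [0.25, 0.5, 0.75],
}


def build_search(n_splits: int = 10, seed: int = 0) -> GridSearchCV:
    """Grid search over PARAM_GRID, selecting the candidate with the lowest CV MAE."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("model", ElasticNet(random_state=seed)),  # seed matters for selection='random'
    ])
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximizes scores, so MAE is exposed as its negative.
    return GridSearchCV(pipe, PARAM_GRID, scoring="neg_mean_absolute_error", cv=cv)
```

Calling `build_search().fit(X, y)` exposes the chosen settings in `best_params_` and the negated cross-validated MAE in `best_score_`.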
Table 18

Models and hyperparameters used for WHO-5 classification

Model | Hyperparameters
AdaBoostClassifier | "algorithm": ["SAMME.R"]
DecisionTreeClassifier | "criterion": ["gini", "entropy"], "max_depth": [None, 10, 50, 100]
RandomForestClassifier | "n_estimators": [10, 50, 100], "max_depth": [None, 10, 50, 100]
Table 19

SWLS regression results for all feature sets

Features | Best model | MAE | Pearson R | R²
Mean baseline | | 0.1853 | |
Median baseline | | 0.185 | |
Words | ElasticNet | 0.1744 | 0.3402 | 0.1022
RuLIWC | DecisionTree | 0.182 | 0.2168 | 0.0142
AppCats | ElasticNet | 0.1762 | 0.2737 | 0.0172
Behavior | DecisionTree | 0.1785 | 0.191 | 0.0195
Clusters | RandomForest | 0.1814 | 0.1709 | 0.026
AppCats + RuLIWC | ElasticNet | 0.1776 | 0.2478 | 0.0296
AppCats + Behavior | ElasticNet | 0.1784 | 0.2227 | 0.0248
AppCats + Words | Ridge | 0.1756 | 0.2992 | 0.0864
RuLIWC + Behavior | DecisionTree | 0.1818 | 0.1949 | 0.0133
RuLIWC + Words | ElasticNet | 0.1722 | 0.352 | 0.0988
Behavior + Words | ElasticNet | 0.1754 | 0.314 | 0.0752
clusters + AppCats | ElasticNet | 0.1786 | 0.2545 | 0.0129
clusters + RuLIWC | DecisionTree | 0.1769 | 0.2769 | 0.0507
clusters + Behavior | DecisionTree | 0.1765 | 0.2243 | 0.0368
clusters + Words | Lasso | 0.1715 | 0.3435 | 0.112
AppCats + RuLIWC + Behavior | ElasticNet | 0.1761 | 0.3093 | 0.0704
AppCats + RuLIWC + Words | Lasso | 0.1753 | 0.2913 | 0.0711
AppCats + Behavior + Words | ElasticNet | 0.1735 | 0.3004 | 0.0724
RuLIWC + Behavior + Words | ElasticNet | 0.1752 | 0.3506 | 0.0934
clusters + AppCats + RuLIWC | ElasticNet | 0.1778 | 0.2636 | 0.0314
clusters + AppCats + Behavior | ElasticNet | 0.1756 | 0.2341 | 0.0528
clusters + AppCats + Words | Lasso | 0.1712 | 0.2958 | 0.0932
clusters + RuLIWC + Behavior | DecisionTree | 0.1765 | 0.2275 | 0.038
clusters + RuLIWC + Words | ElasticNet | 0.1712 | 0.3673 | 0.1192
clusters + Behavior + Words | ElasticNet | 0.1712 | 0.3459 | 0.1228
clusters + AppCats + RuLIWC + Behavior | ElasticNet | 0.1748 | 0.2962 | 0.0048
clusters + AppCats + RuLIWC + Words | Ridge | 0.1751 | 0.2882 | 0.0811
clusters + AppCats + Behavior + Words | ElasticNet | 0.1698 | 0.4024 | 0.1045
clusters + RuLIWC + Behavior + Words | Lasso | 0.1776 | 0.294 | 0.0616
AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.1719 | 0.3255 | 0.096
clusters + AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.1681 | 0.3776 | 0.1164
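Tables 19 and 20 report MAE, Pearson R and R² for each model, but only MAE for the mean and median baselines. A NumPy sketch of these metrics follows; the function names are illustrative. A constant baseline prediction has zero variance, so Pearson R is undefined for it. This is consistent with the baselines having only an MAE entry. The median of the training targets is the constant that minimizes absolute error on the training data.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """MAE, Pearson R and coefficient of determination (R²) for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    r = np.corrcoef(y_true, y_pred)[0, 1]
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": float(mae), "Pearson R": float(r), "R2": float(1 - ss_res / ss_tot)}


def baseline_maes(y_train, y_test):
    """MAE of predicting the training mean or median for every test subject."""
    y_test = np.asarray(y_test, dtype=float)
    return {
        "Mean baseline": float(np.mean(np.abs(y_test - np.mean(y_train)))),
        "Median baseline": float(np.mean(np.abs(y_test - np.median(y_train)))),
    }
```

Because R² compares a model with predicting the mean, it turns negative when a model does worse than that constant. This is the case for two WHO-5 feature sets in Table 20.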
Table 20

WHO-5 regression results for all feature sets

Features | Best model | MAE | Pearson R | R²
Mean baseline | | 0.1542 | |
Median baseline | | 0.1533 | |
Words | Lasso | 0.1441 | 0.3179 | 0.0817
RuLIWC | Lasso | 0.1529 | 0.1276 | 0.0197
AppCats | ElasticNet | 0.1511 | 0.2172 | 0.0329
Behavior | DecisionTree | 0.1497 | 0.2463 | 0.0096
Clusters | Lasso | 0.1516 | 0.1533 | 0.0241
AppCats + RuLIWC | Ridge | 0.1505 | 0.2578 | 0.0371
AppCats + Behavior | Lasso | 0.1458 | 0.2934 | 0.0678
AppCats + Words | ElasticNet | 0.1458 | 0.3228 | 0.0772
RuLIWC + Behavior | DecisionTree | 0.1505 | 0.2399 | 0.0032
RuLIWC + Words | Ridge | 0.1445 | 0.3242 | 0.0964
Behavior + Words | AdaBoost | 0.1473 | 0.2813 | 0.0476
clusters + AppCats | ElasticNet | 0.1502 | 0.2537 | 0.0492
clusters + RuLIWC | AdaBoost | 0.1527 | 0.1822 | −0.007
clusters + Behavior | DecisionTree | 0.15 | 0.2343 | −0.0026
clusters + Words | ElasticNet | 0.1449 | 0.2628 | 0.0975
clusters + AppCats + RuLIWC | ElasticNet | 0.1493 | 0.2807 | 0.0786
clusters + AppCats + Behavior | ElasticNet | 0.1469 | 0.3013 | 0.0739
clusters + AppCats + Words | Ridge | 0.1444 | 0.338 | 0.0894
clusters + RuLIWC + Behavior | DecisionTree | 0.1505 | 0.2399 | 0.0032
clusters + Behavior + Words | Ridge | 0.1462 | 0.2389 | 0.0653
AppCats + RuLIWC + Behavior | ElasticNet | 0.145 | 0.3363 | 0.0835
AppCats + RuLIWC + Words | Ridge | 0.146 | 0.3222 | 0.0817
RuLIWC + Behavior + Words | ElasticNet | 0.1479 | 0.2531 | 0.0531
AppCats + Behavior + Words | ElasticNet | 0.1452 | 0.3152 | 0.0975
clusters + RuLIWC + Words | AdaBoost | 0.1436 | 0.3202 | 0.081
clusters + AppCats + RuLIWC + Behavior | ElasticNet | 0.1456 | 0.3394 | 0.0938
clusters + AppCats + RuLIWC + Words | ElasticNet | 0.1472 | 0.3088 | 0.0716
clusters + AppCats + Behavior + Words | Lasso | 0.1457 | 0.3339 | 0.0701
clusters + RuLIWC + Behavior + Words | ElasticNet | 0.1478 | 0.2961 | 0.072
AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.1438 | 0.367 | 0.1193
clusters + AppCats + RuLIWC + Behavior + Words | ElasticNet | 0.148 | 0.2952 | 0.0544
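As a worked example on the reported numbers, the snippet below computes how much the lowest-MAE model in Table 19 (SWLS) and in Table 20 (WHO-5) reduces MAE relative to the mean baseline. The values are copied from the tables; only the arithmetic is added.

```python
# Mean-baseline MAE and the lowest model MAE, copied from Tables 19 (SWLS) and 20 (WHO-5).
RESULTS = {
    "SWLS": {"baseline": 0.1853, "best": 0.1681},
    "WHO-5": {"baseline": 0.1542, "best": 0.1436},
}


def relative_mae_reduction(baseline: float, best: float) -> float:
    """Fraction by which the best model's MAE is below the baseline MAE."""
    return (baseline - best) / baseline


for scale, values in RESULTS.items():
    print(f"{scale}: {relative_mae_reduction(**values):.1%}")
# SWLS: 9.3%
# WHO-5: 6.9%
```

The lowest-MAE model is not always the model with the highest Pearson R. In Table 19, for instance, the highest correlation (0.4024) belongs to a different feature set than the lowest MAE.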
Table 21

WHO-5 classification results

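Classification | Threshold | N per class | Features | Best model | F1-macro | F1-weighted | F1-low | F1-high | TPR (low) | FPR (low)
binary | 0.51 | 221/151 | Words | AdaBoost | 0.56 | 0.581 | 0.669 | 0.452 | 0.697 | 0.57
binary | 0.51 | 221/151 | RuLIWC | DecisionTree | 0.571 | 0.582 | 0.631 | 0.512 | 0.611 | 0.457
binary | 0.51 | 221/151 | AppCats | AdaBoost | 0.58 | 0.602 | 0.694 | 0.466 | 0.738 | 0.57
binary | 0.51 | 221/151 | Behavior | DecisionTree | 0.543 | 0.559 | 0.63 | 0.456 | 0.638 | 0.55
binary | 0.51 | 221/151 | Clusters | RandomForest | 0.539 | 0.571 | 0.714 | 0.363 | 0.832 | 0.715
binary majority baseline | | | | | 0.378 | 0.456 | 0.373 | 0 | 1 | 1
trinary | 0.35/0.59 | 111/158/103 | Words | AdaBoost | 0.44 | 0.447 | 0.407 | 0.43 | 0.378 | 0.195
trinary | 0.35/0.59 | 111/158/103 | RuLIWC | AdaBoost | 0.381 | 0.399 | 0.413 | 0.238 | 0.405 | 0.241
trinary | 0.35/0.59 | 111/158/103 | AppCats | AdaBoost | 0.422 | 0.443 | 0.402 | 0.294 | 0.396 | 0.241
trinary | 0.35/0.59 | 111/158/103 | Behavior | AdaBoost | 0.425 | 0.438 | 0.427 | 0.329 | 0.414 | 0.23
trinary | 0.35/0.59 | 111/158/103 | Clusters | DecisionTree | 0.358 | 0.364 | 0.338 | 0.339 | 0.351 | 0.295
trinary | 0.35/0.59 | 111/158/103 | clusters + RuLIWC + Words | AdaBoost | 0.483 | 0.493 | 0.502 | 0.433 | 0.45 | 0.161
trinary majority baseline | | | | | 0.199 | 0.253 | 0 | 0 | |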
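Table 21 bins normalized WHO-5 scores into two classes at 0.51 or into three classes at 0.35 and 0.59. It then reports per-class F1 scores and the true and false positive rates for the low class. The sketch below shows these computations in plain Python. All function names are assumptions, and so is the boundary convention that a score equal to a threshold goes to the upper class.

```python
from __future__ import annotations


def to_class(score: float, thresholds: tuple[float, ...]) -> int:
    """Class index of a normalized WHO-5 score: 0 = low, len(thresholds) = high.

    Assumes a score equal to a threshold falls into the upper class.
    """
    return sum(score >= t for t in thresholds)


def low_class_rates(y_true, y_pred, low: int = 0) -> tuple[float, float]:
    """True and false positive rates, treating the low well-being class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == low and p == low for t, p in pairs)
    fn = sum(t == low and p != low for t, p in pairs)
    fp = sum(t != low and p == low for t, p in pairs)
    tn = sum(t != low and p != low for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def f1_per_class(y_true, y_pred) -> dict[int, float]:
    """F1 = 2TP / (2TP + FP + FN) for every class seen in y_true or y_pred."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred) -> float:
    """Unweighted mean of the per-class F1 scores."""
    per_class = f1_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)
```

For example, `to_class(0.4, (0.35, 0.59))` places a score of 0.4 in the middle class (index 1).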
Table 22

Features significant in SWLS regression

Feature | Mean importance | Count in 10-CV
спать_NOUN | 41,086.4144049898 | 5
интим_NOUN | −44,937.4613019008 | 5
орг_NOUN | 23,978.9614411828 | 5
дропнуть_VERB | −64,677.1586467715 | 5
тратиться_VERB | −24,593.5714641034 | 5
отл_UNKN | 34,184.2112504721 | 5
пояснение_NOUN | −22,499.9757533852 | 5
стебать_VERB | −28,898.951393906 | 5
вифя_NOUN | −48,114.1470241285 | 5
спойлерить_VERB | −48,530.1211086886 | 5
ооохнуть_VERB | −44,864.4233831708 | 5
милый_COMP | 56,128.262155605 | 5
пиздёжа_NOUN | −22,727.1849476408 | 5
Negative_month | −29.2652084171193 | 5
AppUsage9-12Ratio | 10.3365760075427 | 5
SOCIAL+COMMUNICATION+DATING_0/SOCIAL+COMMUNICATION+DATING | 11.9200141620517 | 5
AppUsage0-3Ratio | −8.02782185058373 | 5
обжечь_VERB | −40,019.2136897226 | 5
PHOTOGRAPHY_6/6 | 8.00565760998601 | 5
объёмный_ADJ | −22,927.1436115299 | 4
разрывать_VERB | −30,217.4675429819 | 4
AppUsage6-9Ratio | 6.14938364734453 | 4
Negative_year | −42.3845120787015 | 4
Negative_all | −31.9683341076574 | 4
(face-blowing-a-kiss_emoji)_UNKN | 30.6632155496334 | 4
упад_NOUN | −18,580.4570156265 | 4
чонгук_NOUN | −17,463.4313634737 | 4
дельфин_NOUN | 21,536.5345583292 | 4
пиздуть_VERB | −14,962.8346296741 | 4
продумать_VERB | 17,494.2319623544 | 4
PERSONALIZATION_3/3 | 7.00814836414381 | 4
385 | 24,539.3867498811 | 4
хл_UNKN | −16147.5539040561 | 4
TOOLS_2/2 | 6.32964889581622 | 4
блч_UNKN | −14,422.917489824 | 4
мразь_NOUN | −18,116.8461473664 | 4
ENTERTAINMENT_0/0 | 5.31480531809703 | 4
Percept_RuLIWC | −40.0983869406501 | 3
камбэк_NOUN | −11,161.9164871315 | 3
помеха_NOUN | 16,006.6097736205 | 3
неудобный_ADJ | 14,581.2712382097 | 3
байка_NOUN | −13,460.0791647949 | 3
но_CONJ | −33.8820301427059 | 3
бляяяяять_VERB | 16,679.7792973482 | 3
OTHER_6/6 | 6.70178805009534 | 3
OTHER_6/OTHER | −5.92779708025781 | 3
OTHER_5/5 | −6.64941501653413 | 3
OTHER_5/OTHER | 7.9587777321838 | 3
пожениться_VERB | 7984.73733958357 | 3
джуна_NOUN | −15,756.4913230161 | 3
хорошо_ADV | 27.4832618347005 | 3
расстроить_PRTF | 10,055.3069444193 | 3
предать_VERB | 9610.44789534047 | 3
критерий_NOUN | 13,168.233814062 | 3
офф_UNKN | −16,763.6305231621 | 3
грубый_ADJ | −9967.34916619578 | 3
съебывать_VERB | −14,161.996571157 | 3
фандом_NOUN | −7058.65855912861 | 3
бляяяяяять_GRND | −8683.89310820064 | 3
PHOTOGRAPHY_4/4 | −11.4270318169998 | 3
кореец_NOUN | −8536.07198679033 | 3
бантан_NOUN | 11,240.4372555059 | 3
разреветься_VERB | 9104.89644806333 | 3
GAME_0/GAME | 3.63645809844946 | 3
EDUCATION+PRODUCTIVITY_1/1 | −3.06771772638087 | 3
PERSONALIZATION_5/PERSONALIZATION | 4.18099320731029 | 3
HEALTH+MEDICAL_7/HEALTH+MEDICAL | 3.38255612173981 | 3
EDUCATION+PRODUCTIVITY_6/EDUCATION+PRODUCTIVITY | 3.6103446132533 | 3
GAME_5/GAME | −3.60075795654176 | 3
SOCIAL+COMMUNICATION+DATING_1/SOCIAL+COMMUNICATION+DATING | 6.3430323785416 | 3
HEALTH+MEDICAL_1/HEALTH+MEDICAL | 3.65748802084215 | 3
GAME_4/GAME | 6.1492563234405 | 3
PERSONALIZATION_6/6 | −7.74392289361535 | 3
HEALTH+MEDICAL_4/4 | −10.3948608039209 | 3
PERSONALIZATION_5/5 | −4.53626915989602 | 3
PERSONALIZATION_6/PERSONALIZATION | 5.56692243721861 | 3
EDUCATION+PRODUCTIVITY_3/EDUCATION+PRODUCTIVITY | −3.23127315840786 | 3
SOCIAL+COMMUNICATION+DATING_6/SOCIAL+COMMUNICATION+DATING | 4.04158603764638 | 3
GAME_1/GAME | −7.22900142005906 | 3
PERSONALIZATION_3/PERSONALIZATION | −5.34430996987665 | 3
ENTERTAINMENT_1/ENTERTAINMENT | 4.14113541946902 | 3
ENTERTAINMENT_2/2 | 4.31642268738505 | 3
PERSONALIZATION_0/PERSONALIZATION | −4.4149112724685 | 3
привет_NOUN | 23.4902884073796 | 2
EDUCATION+PRODUCTIVITY_4/EDUCATION+PRODUCTIVITY | 0.992124886908841 | 2
Positive_month | 25.7634204715238 | 2
EDUCATION+PRODUCTIVITY_5/EDUCATION+PRODUCTIVITY | 1.05862884395041 | 2
TOOLS_5/TOOLS | −2.00490447501796 | 2
EDUCATION+PRODUCTIVITY_7/EDUCATION+PRODUCTIVITY | 1.11603306580073 | 2
SOCIAL+COMMUNICATION+DATING_7/7 | 3.0908805432526 | 2
TOOLS_2/TOOLS | −6.49804253661374 | 2
TOOLS_4/TOOLS | −3.31388672845536 | 2
ENTERTAINMENT_2/ENTERTAINMENT | −2.54736564980811 | 2
SOCIAL+COMMUNICATION+DATING_7/SOCIAL+COMMUNICATION+DATING | −6.83408164115133 | 2
EDUCATION+PRODUCTIVITY_2/EDUCATION+PRODUCTIVITY | 2.43891633479369 | 2
выпилиться_VERB | −3338.05050867729 | 2
EDUCATION+PRODUCTIVITY_2/2 | 2.59731699527922 | 2
еби_UNKN | 19,201.5250764129 | 2
выглянуть_VERB | −7762.75345745476 | 2
гуглить_VERB | −1079.55071853441 | 2
растягивать_VERB | −5127.61602039587 | 2
жестокий_ADJ | −6724.2195734053 | 2
GAME_2/2 | −2.50496667340079 | 2
заржать_VERB | −9032.15201413262 | 2
мэн_NOUN | 18,667.8692410825 | 2
ENTERTAINMENT_4/4 | −0.6931462276039 | 2
долбиться_VERB | −14,770.1618041756 | 2
петух_NOUN | −7131.85541414396 | 2
подробный_ADJ | 6083.97042642484 | 2
оооохнуть_VERB | −12,538.581928229 | 2
загуглила_NOUN | −8903.85747472549 | 2
ущербный_ADJ | −10,188.6678026704 | 2
GAME_6/GAME | −0.557326373511503 | 2
EDUCATION+PRODUCTIVITY_0/EDUCATION+PRODUCTIVITY | 1.32014414545248 | 2
See_RuLIWC | −44.9066092959057 | 2
TOOLS_1/TOOLS | 0.246966591171979 | 2
SOCIAL+COMMUNICATION+DATING_0/0 | 0.145830646062444 | 2
HEALTH+MEDICAL_2/HEALTH+MEDICAL | −0.660450103454573 | 2
PHOTOGRAPHY_3/PHOTOGRAPHY | 1.22028192527858 | 2
PHOTOGRAPHY_2/PHOTOGRAPHY | −3.17674625105626 | 2
PHOTOGRAPHY_7/PHOTOGRAPHY | −1.47432690185411 | 2
PHOTOGRAPHY_1/PHOTOGRAPHY | 3.16110100642227 | 2
PHOTOGRAPHY_0/PHOTOGRAPHY | −2.35307771691311 | 2
OTHER_1/1 | −1.69717416262808 | 2
OTHER_2/2 | −1.36082437833807 | 2
OTHER_3/OTHER | 4.37604917806843 | 2
SOCIAL+COMMUNICATION+DATING_4/4 | −3.52025229911742 | 2
gender_merged | 0.843233408502276 | 2
PHOTOGRAPHY_4/PHOTOGRAPHY | −1.33192379668633 | 2
HEALTH+MEDICAL_7/7 | 9.42472510509919 | 2
HEALTH+MEDICAL_6/HEALTH+MEDICAL | 2.38143334031195 | 2
HEALTH+MEDICAL_4/HEALTH+MEDICAL | 0.0719895107619945 | 2
SOCIAL+COMMUNICATION+DATING_6/6 | 3.03202297494442 | 2
HEALTH+MEDICAL_1/1 | −12.9847665999778 | 2
GAME_0/0 | −0.867851376244603 | 2
HEALTH+MEDICAL_0/HEALTH+MEDICAL | −1.41730497804968 | 2
PERSONALIZATION_4/PERSONALIZATION | −3.12105445284181 | 2
PERSONALIZATION_2/PERSONALIZATION | −1.24127259627768 | 2
PERSONALIZATION_7/PERSONALIZATION | 1.51585802743145 | 2
TOOLS_4/4 | −9.92596640160934 | 2
PERSONALIZATION_0/0 | −3.19660012339845 | 2
ENTERTAINMENT_7/ENTERTAINMENT | −0.84084761762985 | 2
HEALTH+MEDICAL_5/HEALTH+MEDICAL | −0.129810361433915 | 1
шава_NOUN | −8941.24555908169 | 1
AppUsage15-18Ratio | −2.07642255991409 | 1
AppUsage21-24Ratio | −0.676863148867948 | 1
маркус_NOUN | 61,863.8448291371 | 1
ENTERTAINMENT_6/6 | −13.5275610189105 | 1
научный_ADJ | 4729.48292716799 | 1
ноооо_NOUN | 5259.91641964248 | 1
намджуна_NOUN | −510.55327160631 | 1
AppUsage12-15Ratio | −10.37076321492 | 1
HEALTH+MEDICAL_3/3 | 26.6170643844024 | 1
ENTERTAINMENT_7/7 | 17.4976861479316 | 1
GAME_1/1 | −0.0205129700590116 | 1
Alters_-9 | −0.129041531649635 | 1
GAME_2/GAME | −2.71200941173272 | 1
EDUCATION+PRODUCTIVITY_4/4 | 0.177772668811767 | 1
TOOLS_0/0 | −2.14285870492742 | 1
Alters_-7 | 0.153125018883583 | 1
TOOLS_6/TOOLS | 2.69914996314137 | 1
OTHER_1/OTHER | −2.35449675924834 | 1
ENTERTAINMENT_3/3 | 1.1328494815993 | 1
PHOTOGRAPHY_1/1 | 0 | 1
ENTERTAINMENT_5/ENTERTAINMENT | 0.149165800159887 | 1
ENTERTAINMENT_6/ENTERTAINMENT | −0.41391451812445 | 1
PERSONALIZATION_1/PERSONALIZATION | −0.330707466660351 | 1
HEALTH+MEDICAL_2/2 | 0.121502401677361 | 1
шов_NOUN | −4721.2176187302 | 1
бланк_NOUN | −4764.15988799968 | 1
GAME_4/4 | −3.65092628597215 | 1
EDUCATION+PRODUCTIVITY_3/3 | −6.80750290356738 | 1
ENTERTAINMENT_4/ENTERTAINMENT | −3.30081880604151 | 1
TOOLS_7/TOOLS | −4.04245464338458 | 1
TOOLS_5/5 | −4.14302286116742 | 1
PERSONALIZATION_4/4 | 10.7567609647953 | 1
TOOLS_3/TOOLS | −4.12230296427837 | 1
TOOLS_0/TOOLS | −5.53385037327718 | 1
HEALTH+MEDICAL_3/HEALTH+MEDICAL | 2.05274970971747 | 1
altersdiff | −0.844560300538125 | 1
HEALTH+MEDICAL_5/5 | 15.552182342051 | 1
EDUCATION+PRODUCTIVITY_0/0 | −1.51789074810355 | 1
GAME_6/6 | 9.22791978620134 | 1
OTHER_7/OTHER | 1.97270978786112 | 1
SOCIAL+COMMUNICATION+DATING_1/1 | −0.103568092714721 | 1
потерянный_ADJ | −11,345.6959433921 | 1
саба_NOUN | 2077.71458808275 | 1
SOCIAL+COMMUNICATION+DATING_5/5 | −1.05176654037101 | 1
припереться_VERB | −5406.87753364421 | 1
OTHER_4/4 | −5.5431060969613 | 1
OTHER_4/OTHER | 3.93086530165725 | 1
OTHER_0/0 | 3.27602573972759 | 1
HEALTH+MEDICAL_6/6 | 15.2161412609318 | 1
писаться_VERB | −5296.54331114917 | 1
OTHER_0/OTHER | −2.08967236336632 | 1
поплакать_VERB | −318.617179611988 | 1
рэп_NOUN | −4852.08132918677 | 1
ложь_NOUN | 6888.93905838351 | 1
PHOTOGRAPHY_5/PHOTOGRAPHY | 0.837922272744407 | 1
growth-2to-1weighted | 0.0682381400607952 | 1
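Tables 22 and 23 summarize each feature by a mean importance and by the number of cross-validation folds in which it was retained. Suppose that importance means the fitted linear coefficient, and that a feature counts as retained in a fold when its coefficient is non-zero. The aggregation could then look like the sketch below. The input format, the choice to average only over retaining folds, and the function name are hypothetical. Raw coefficients depend on feature scale, so their magnitudes are not directly comparable across feature types. This may explain why word features carry importances in the tens of thousands while app-usage ratios stay in single digits.

```python
from collections import defaultdict


def aggregate_importance(fold_coefs):
    """Summarize per-fold linear coefficients as (feature, mean, count) rows.

    fold_coefs: one dict per CV fold mapping feature name -> fitted coefficient
    (hypothetical format). A feature counts as selected in a fold when its
    coefficient is non-zero; the mean is taken over those folds only (an assumption).
    """
    selected = defaultdict(list)
    for coefs in fold_coefs:
        for feature, weight in coefs.items():
            if weight != 0:
                selected[feature].append(weight)
    rows = [(f, sum(ws) / len(ws), len(ws)) for f, ws in selected.items()]
    # Most consistently selected features first; the sort is stable, so ties keep first-seen order.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```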
Table 23

Features significant in WHO-5 regression

Feature | Mean importance | Count in 10-CV
GAME_1/GAME | −5.30288559374647 | 7
ENTERTAINMENT_1/ENTERTAINMENT | 4.48794365614162 | 7
HEALTH+MEDICAL_1/HEALTH+MEDICAL | 2.6216421331719 | 6
AppUsage9-12Ratio | 7.2634466399016 | 6
PERSONALIZATION_0/0 | −3.93650446203669 | 6
EDUCATION+PRODUCTIVITY_3/EDUCATION+PRODUCTIVITY | −2.75547290725553 | 6
TOOLS_6/6 | −3.38562106644281 | 5
SOCIAL+COMMUNICATION+DATING_1/SOCIAL+COMMUNICATION+DATING | 7.08554306182447 | 5
GAME_3/GAME | 2.11983623880978 | 5
OTHER_1/OTHER | −1.6572596556467 | 5
Bio_RuLIWC | −20.8118206754822 | 5
(face-blowing-a-kiss_emoji)_UNKN | 35.1292524225535 | 5
EDUCATION+PRODUCTIVITY_7/EDUCATION+PRODUCTIVITY | −1.52932660473865 | 5
Negative_month | −32.9859591887424 | 5
Negative_year | −28.7441191861823 | 5
Negative_all | −22.8213190036261 | 5
но_CONJ | −16.0358199801479 | 5
ENTERTAINMENT_3/ENTERTAINMENT | 1.89327664411053 | 5
PHOTOGRAPHY_0/PHOTOGRAPHY | −1.86907348608951 | 5
AppUsage6-9Ratio | 3.86122248368424 | 4
See_RuLIWC | −17.771085104379 | 4
Percept_RuLIWC | −16.0075235125978 | 4
PHOTOGRAPHY_4/4 | −11.8301205279096 | 4
SOCIAL+COMMUNICATION+DATING_7/SOCIAL+COMMUNICATION+DATING | 5.36396284427798 | 4
OTHER_6/OTHER | −2.72825023219845 | 4
PERSONALIZATION_2/PERSONALIZATION | −2.65288208258266 | 4
хорошо_ADV | 11.8899128397086 | 4
HEALTH+MEDICAL_1/1 | −12.7633565212712 | 4
PERSONALIZATION_0/PERSONALIZATION | −1.96354072113084 | 4
EDUCATION+PRODUCTIVITY_5/5 | −8.71922494241247 | 4
gender_merged | 1.72537700855949 | 4
PHOTOGRAPHY_4/PHOTOGRAPHY | −1.737331956631 | 4
EDUCATION+PRODUCTIVITY_4/EDUCATION+PRODUCTIVITY | 1.25522498498395 | 4
ENTERTAINMENT_0/ENTERTAINMENT | −1.36074073571704 | 4
SOCIAL+COMMUNICATION+DATING_6/6 | 2.22405894249721 | 4
ENTERTAINMENT_6/ENTERTAINMENT | 0.966349839431597 | 3
PHOTOGRAPHY_1/1 | −4.02406844554479 | 3
OTHER_4/4 | −2.7677868583523 | 3
OTHER_5/OTHER | 3.85343503542729 | 3
PHOTOGRAPHY_1/PHOTOGRAPHY | 3.71328277476559 | 3
PERSONALIZATION_1/PERSONALIZATION | −3.35252125337103 | 3
AppUsage15-18Ratio | −3.26882386884001 | 3
SOCIAL+COMMUNICATION+DATING_4/4 | −3.31997843801445 | 3
HEALTH+MEDICAL_4/HEALTH+MEDICAL | −0.9428147681181 | 3
PERSONALIZATION_3/3 | −3.84520578986224 | 3
HEALTH+MEDICAL_2/2 | 3.92987294169931 | 3
PHOTOGRAPHY_3/PHOTOGRAPHY | 0.40923079097447 | 3
PHOTOGRAPHY_6/PHOTOGRAPHY | −1.29996830780878 | 3
OTHER_1/1 | −0.784546722768581 | 3
altersdiff | −0.92538030647002 | 3
OTHER_0/OTHER | 2.25557936917862 | 3
PHOTOGRAPHY_6/6 | 9.01538392975499 | 2
SOCIAL+COMMUNICATION+DATING_4/SOCIAL+COMMUNICATION+DATING | 5.87503025553596 | 2
обжечь_VERB | −48,599.526427015 | 2
офф_UNKN | −28,194.9413170442 | 2
PHOTOGRAPHY_0/0 | 3.92198345485137 | 2
SOCIAL+COMMUNICATION+DATING_5/SOCIAL+COMMUNICATION+DATING | −2.50586082868358 | 2
жестокий_ADJ | −21,716.6197777305 | 2
потерянный_ADJ | −20,316.8257664484 | 2
тратиться_VERB | −17,634.6227724749 | 2
PHOTOGRAPHY_7/PHOTOGRAPHY | −1.20678360573034 | 2
OTHER_0/0 | −1.44570452070656 | 2
OTHER_3/OTHER | 1.37984322453403 | 2
пригонять_VERB | 21,225.458567208 | 2
дропнуть_VERB | −53,030.1050709822 | 2
OTHER_6/6 | 4.06504182009785 | 2
предать_VERB | 36,199.9239128262 | 2
AppUsage12-15Ratio | −4.38636786177102 | 2
червь_NOUN | 55,826.7353210136 | 2
ущербный_ADJ | −26,488.8457950415 | 2
ооохнуть_VERB | −20,570.0983686645 | 2
магнитный_ADJ | 19,900.8586557732 | 2
оооохнуть_VERB | −34,380.0974521155 | 2
блч_UNKN | −19,552.7345641462 | 2
приобнять_VERB | 49,086.8536049488 | 2
SOCIAL+COMMUNICATION+DATING_0/0 | 0.00300824222269163 | 2
вифя_NOUN | −22,048.0034645223 | 2
TOOLS_2/TOOLS | −1.55091403718346 | 2
growth-2to-1weighted | −0.506104865550831 | 2
вообще_ADV | −5.14238891529362 | 2
привет_NOUN | 12.8226744206666 | 2
он_NPRO | −18.0820059246943 | 2
GAME_0/GAME | 1.55931076688544 | 2
GAME_0/0 | −1.05587959441984 | 2
GAME_3/3 | −2.654246487825 | 2
PHOTOGRAPHY_5/5 | 17.2783798216844 | 2
HEALTH+MEDICAL_7/7 | 2.71003871145643 | 2
EDUCATION+PRODUCTIVITY_1/1 | −1.52282930897709 | 2
EDUCATION+PRODUCTIVITY_2/EDUCATION+PRODUCTIVITY | −1.07500201275638 | 2
HEALTH+MEDICAL_0/0 | 3.6972143104661 | 2
Social_RuLIWC | 21.8729634558609 | 2
TOOLS_3/TOOLS | 0.807683500445982 | 2
TOOLS_6/TOOLS | 1.50491110974576 | 2
ENTERTAINMENT_4/ENTERTAINMENT | −0.841292236238171 | 2
ENTERTAINMENT_7/ENTERTAINMENT | −0.53781828777184 | 2
PERSONALIZATION_1/1 | 2.05013057484557 | 2
PERSONALIZATION_2/2 | 2.41275356218853 | 2
PERSONALIZATION_4/PERSONALIZATION | −0.277615145486821 | 2
TOOLS_4/TOOLS | −0.332621565714857 | 2
PERSONALIZATION_6/PERSONALIZATION | 1.98331184812083 | 2
ханна_NOUN | −20,365.7841289252 | 1
отбирать_VERB | −16,422.664060777 | 1
шлюшка_NOUN | 7462.2668479906 | 1
интим_NOUN | −8415.75871614007 | 1
отл_UNKN | 18,240.2790750433 | 1
бабочка_NOUN | 22,242.8428202378 | 1
кпоп_NOUN | −22,706.902332252 | 1
объёмный_ADJ | −30,296.798373289 | 1
упад_NOUN | −17,378.1196878735 | 1
анимешник_NOUN | −8288.74761112379 | 1
хотя_CONJ | −2.32750500704937 | 1
критерий_NOUN | 37,329.1994647902 | 1
слишком_ADV | −2.67847601625444 | 1
AppUsage18-21Ratio | 3.02994634652915 | 1
EDUCATION+PRODUCTIVITY_7/7 | 3.4977040679328 | 1
выглянуть_VERB | −19,143.9898454013 | 1
хдд_UNKN | −2.41764183408381 | 1
PERSONALIZATION_3/PERSONALIZATION | −1.45062334734687 | 1
загуглила_NOUN | 38,940.8898266739 | 1
HEALTH+MEDICAL_2/HEALTH+MEDICAL | −1.70766506251123 | 1
SOCIAL+COMMUNICATION+DATING_1/1 | 0.214043849547984 | 1
SOCIAL+COMMUNICATION+DATING_7/7 | 0.118134328089427 | 1
GAME_6/GAME | 0.334957523880728 | 1
GAME_7/7 | 2.06442883253146 | 1
EDUCATION+PRODUCTIVITY_3/3 | 1.17072928106647 | 1
TOOLS_0/0 | 0.690890866736 | 1
TOOLS_5/5 | 1.77062991115663 | 1
TOOLS_7/TOOLS | 1.2692618820954 | 1
ENTERTAINMENT_3/3 | 4.91803252713477 | 1
PERSONALIZATION_5/PERSONALIZATION | −0.015250988462515 | 1
HEALTH+MEDICAL_5/HEALTH+MEDICAL | −0.949406780362511 | 1
Alters_-7 | 0.0266628614988393 | 1
SOCIAL+COMMUNICATION+DATING_2/SOCIAL+COMMUNICATION+DATING | −1.16690389645055 | 1
SOCIAL+COMMUNICATION+DATING_6/SOCIAL+COMMUNICATION+DATING | −3.05422385174122 | 1
PHOTOGRAPHY_2/2 | 0.335187657879573 | 1
PHOTOGRAPHY_5/PHOTOGRAPHY | 1.93680325337435 | 1
OTHER_4/OTHER | 2.1244521858398 | 1
OTHER_5/5 | −2.29647607260118 | 1
OTHER_7/7 | 2.51738861629993 | 1
AppUsage0-3Ratio | −2.10969417137118 | 1
GAME_1/1 | −1.06574716160273 | 1
PERSONALIZATION_7/7 | 0 | 1
джуна_NOUN | −45,463.0625569559 | 1
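The discussion contrasts the predictors shared by both scales with those unique to one scale. The snippet below takes a handful of feature names copied from Tables 22 and 23; they are not the full lists. With these names, the comparison reduces to set operations. `feature_type` is a hypothetical helper that groups names by the prefixes and suffixes used in the tables.

```python
# A few feature names copied from Table 22 (SWLS) and Table 23 (WHO-5); not the full lists.
SWLS_FEATURES = {
    "Negative_month", "Negative_year", "Negative_all", "Positive_month",
    "AppUsage0-3Ratio", "AppUsage6-9Ratio", "AppUsage9-12Ratio",
    "Percept_RuLIWC", "See_RuLIWC", "спать_NOUN",
}
WHO5_FEATURES = {
    "Negative_month", "Negative_year", "Negative_all",
    "AppUsage0-3Ratio", "AppUsage6-9Ratio", "AppUsage9-12Ratio",
    "Percept_RuLIWC", "See_RuLIWC", "Bio_RuLIWC", "Social_RuLIWC",
}


def feature_type(name: str) -> str:
    """Coarse feature family inferred from the naming patterns in Tables 22-23."""
    if name.startswith("AppUsage"):
        return "app usage rhythm"
    if name.endswith("_RuLIWC"):
        return "RuLIWC category"
    if name.startswith(("Negative_", "Positive_")):
        return "message sentiment"
    return "word"


shared = SWLS_FEATURES & WHO5_FEATURES
swls_only = SWLS_FEATURES - WHO5_FEATURES
who5_only = WHO5_FEATURES - SWLS_FEATURES

for label, group in (("shared", shared), ("SWLS only", swls_only), ("WHO-5 only", who5_only)):
    print(label, sorted(group, key=lambda n: (feature_type(n), n)))
```

On this subset, Bio_RuLIWC and Social_RuLIWC appear only on the WHO-5 side. This mirrors the discussion's point that lexica related to biological processes mark low WHO-5 levels.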