
First-onset major depression during the COVID-19 pandemic: A predictive machine learning model.

Daniela Caldirola¹, Silvia Daccò², Francesco Cuniberti³, Massimiliano Grassi², Alessandra Alciati⁴, Tatiana Torti⁵, Giampaolo Perna³.

Abstract

BACKGROUND: This study longitudinally evaluated first-onset major depression rates during the pandemic in Italian adults without any current clinician-diagnosed psychiatric disorder and created a predictive machine learning model (MLM) to be tested on subsequent independent samples.
METHODS: An online, self-reported survey was released during two pandemic periods (May to June and September to October 2020). Provisional diagnoses of major depressive disorder (PMDD) were determined with the DSM criteria-based diagnostic algorithm of the Patient Health Questionnaire-9, chosen to maximize specificity. Gradient-boosted decision trees were used to create the MLM, and the SHapley Additive exPlanations technique was used to estimate each variable's predictive contribution.
RESULTS: The study included 3532 participants. The final sample included 633 participants in the first wave (FW) survey and 290 in the second (SW). First-onset PMDD was found in 7.4% of FW participants and 7.2% of SW participants. The final MLM, trained on the FW, displayed a sensitivity of 76.5% and a specificity of 77.8% when tested on the SW. The main factors identified in the MLM were low resilience, being an undergraduate student, being stressed by pandemic-related conditions, low satisfaction with usual sleep before the pandemic, and low perceived support from relatives. Current smoking and taking medication for medical conditions also contributed, albeit to a lesser extent.
LIMITATIONS: Small sample size; self-report assessment; data covering 2020 only.
CONCLUSIONS: Rates of first-onset PMDD among Italians during the first phases of the pandemic were considerable. Our MLM displayed a good predictive performance, suggesting potential goals for depression-preventive interventions during public health crises.
Copyright © 2022 Elsevier B.V. All rights reserved.

Keywords:  COVID-19; Depression; First-onset; General population; Machine learning; Predictive model

Year:  2022        PMID: 35489559      PMCID: PMC9044654          DOI: 10.1016/j.jad.2022.04.145

Source DB:  PubMed          Journal:  J Affect Disord        ISSN: 0165-0327            Impact factor:   6.533


Introduction

Population studies of mental health during the COVID-19 pandemic have estimated rates of depression in the general population ranging from approximately 8% to 48%. These rates differ greatly across countries, likely because of differences in psychometric tools, phase of the pandemic, and cultural or social context; however, they were generally higher than pre-pandemic rates (Adu et al., 2021; Cénat et al., 2021; Ettman et al., 2020; Hao et al., 2020; Lakhan et al., 2020; Liu et al., 2020a, Liu et al., 2020b; Luo et al., 2020; Salari et al., 2020; Solomou and Constantinidou, 2020; Wu et al., 2021). A recent estimate of the global prevalence and burden of major depressive disorder (MDD) in 204 countries and territories found a 27.6% increase in MDD cases in 2020 due to the COVID-19 pandemic (Santomauro et al., 2021). Most surveys use self-report depressive symptom scales that identify only probable cases of major depression, making it difficult to distinguish a true major depressive episode from a transient physiological response to an unexpected global crisis. Nevertheless, it is conceivable that, during the pandemic, some vulnerable individuals developed depression worthy of clinical attention. Exposure to environmental and psychosocial stressors or traumatic events is well known to be associated with various mental health consequences, including first-onset depression and worsening of pre-existing depression (Ettman et al., 2020; Gilman et al., 2013; Goldmann and Galea, 2014). As MDD is a leading cause of disease burden and disability worldwide (GBD 2019 Mental Disorders, 2022), it is clinically important to identify the predictors that placed a portion of the general population at higher risk of developing depression due to the COVID-19 pandemic. Modifiable risk factors may be prime targets for public depression prevention programs.
Early mental health interventions may be suitable for certain at-risk groups (Meng et al., 2017), both during the ongoing pandemic and in future public health emergencies. Several possible predictors of self-reported depression in the general population during the pandemic have been identified, including sociodemographic factors, such as being female; employment status, such as job loss; pandemic-related stressful experiences; low social support; and personal characteristics, such as poorer coping with stressors (Adu et al., 2021; Arpino and Pasqualini, 2021; Bruno et al., 2020; Liu et al., 2020b; Prout et al., 2020; Rossi et al., 2020; Solomou and Constantinidou, 2020; Vindegaard and Benros, 2020). However, most results came from traditional statistical analyses, which are poorly suited to identifying the most relevant predictors of depression among large sets of interrelated variables. A few studies have predicted pandemic-related mental health outcomes using machine learning (ML) techniques (Eder et al., 2021; Flesia et al., 2020; Ge et al., 2020; Prout et al., 2020), which are especially suitable for identifying predictive models in extensive and complex data sets (Orrù et al., 2020; Perna et al., 2018; Wardenaar et al., 2021). Only one of these studies included an assessment of depressive symptoms (Prout et al., 2020). The present study analyzed data from two waves of an online survey that we disseminated among the general population of Italy in two periods of the first year of the pandemic. We longitudinally evaluated the rates of first-onset MDD in Italian adults without any current clinician-diagnosed psychiatric disorder (CPsyD) and created a predictive ML model of first-onset MDD to be applied to subsequent independent samples of Italians.
We aimed to include members of the general population not directly exposed to highly specific COVID-19-related risk factors for mental health, such as having contracted COVID-19 (Awan et al., 2021; De Berardis, 2020) or working as a health care worker (Awan et al., 2022; De Berardis et al., 2021) during the pandemic. For this reason, the survey targeted people who were not health care workers, and people who had contracted COVID-19 were excluded from this study. We employed a screening questionnaire based on the DSM diagnostic criteria to maximize the specificity of identifying a major depressive episode and to minimize the risk of classifying a physiological depressive response to an unexpected global crisis as pathological. However, because the depression screening was self-reported, we considered the diagnosis of MDD to be provisional (PMDD). To the best of our knowledge, no other study with these aims has been published.

Methods

Procedures

The detailed procedures have been described elsewhere (Caldirola et al., 2022). Briefly, we disseminated an online self-report survey among the general Italian population in two pandemic periods, from May 18 to June 20, 2020 (first wave survey) and from September 15 to October 20, 2020 (second wave survey). These two waves were part of an ongoing longitudinal study approved by the Ethics Committee of Humanitas Research Hospital, which monitored the mental health of the Italian general population for up to 2 years from the beginning of the pandemic through successive online surveys distributed approximately every three months. The survey was conducted through SurveyMonkey, an online survey platform (http://www.surveymonkey.com), and was advertised and shared via social media (Facebook, Instagram, LinkedIn, and WhatsApp). People who were ≥18 years old and were not health care workers were invited to complete the survey voluntarily and anonymously. Before the survey began, participants were required to provide written informed consent. At the beginning of each survey, participants were asked to enter a few letters and numbers in response to standardized prompts that were identical across surveys (e.g., “please enter the first two letters of your mother's name”), creating a unique anonymous identifier that we used to track respondents longitudinally. Moreover, at the beginning of the second wave survey, participants were asked whether they had participated in the first wave. Each participant's data were saved and managed in compliance with European regulations on privacy and protected health information. All relevant information is available in the SurveyMonkey Privacy Notice (www.surveymonkey.com/mp/legal/privacy/).

Participants

The study included 3532 participants who provided informed consent. For the present analyses, we included only those who met the following criteria: having completed the entire survey; having declared never having had a clinician-diagnosed mental disorder, or having had one only in the past (in the latter case, we excluded those who declared having had major depression); having declared never having contracted COVID-19; and, for participants in the second wave, not having participated in the first. Our final sample included 633 participants in the first wave and 290 participants in the second wave. Fig. 1 presents the participant selection.
Fig. 1

Flow diagram of the participant selection process for the aim of the study.

A previous study with different aims and inclusion/exclusion criteria included a different subsample of the entire group of 3532 participants (Caldirola et al., 2022). In that study, we included only participants with no psychiatric history, estimated the rates of new-onset psychiatric disorders throughout the pandemic, and developed an ML model predictive of at least one new-onset psychiatric disorder in subsequent independent samples.

Measures

The survey included two sections. The first consisted of a series of ad hoc questions collecting participants' sociodemographic data and certain personal information, such as lifestyle, personal relationships, medical and psychiatric history, occupation, and usual disposition toward multiple aspects of daily life. The second included several validated self-report screening questionnaires (Italian-language versions). Below we describe the two used to gather data for the present study, namely, the Depression Module of the Patient Health Questionnaire (PHQ-9) (Spitzer et al., 1999) and the six-item Brief Resilience Scale (BRS) (Pirro et al., 2020; Smith et al., 2008). The complete list of ad hoc questions and self-report questionnaires is available on request. The PHQ-9 is a screening tool for detecting a current major depressive episode. It consists of nine questions referring to the last two weeks, answered on a scale ranging from “not at all” (scored 0) to “nearly every day” (scored 3). Its DSM-IV criteria-based diagnostic algorithm identifies a current major depressive episode with a sensitivity and specificity (95% confidence interval [CI]) of 0.73 (0.59–0.87) and 0.98 (0.9–1.00), respectively (Spitzer et al., 1999); more recent pooled estimates from multiple studies place sensitivity and specificity (95% CI) at 0.61 (0.54–0.68) and 0.95 (0.93–0.96), respectively (He et al., 2020). Because of the self-report nature of the assessment, a PHQ-9-based diagnosis of MDD must be considered provisional (PMDD) and should be confirmed by direct clinical assessment (He et al., 2020; Spitzer et al., 1999). The six-item BRS defines resilience as “the ability to bounce back or recover from stress”; responses are given on a scale ranging from “strongly disagree” (scored 1) to “strongly agree” (scored 5).
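To make the scoring rules above concrete, the following is a minimal sketch of the widely published PHQ-9 algorithm for a major depressive syndrome and of standard BRS scoring. It assumes the standard PHQ-9 item order and the standard BRS wording (items 2, 4, and 6 negatively worded); the function names are hypothetical, and the text does not state whether the authors applied any additional rules (for example, the PHQ functional-difficulty question). Note that the predictive model used individual BRS items (e.g., BRS-item 3 and BRS-item 6) rather than the total score, so the mean score is shown only for context.

```python
# Hypothetical helpers: a sketch of PHQ-9 and BRS scoring, not the authors' code.
# PHQ-9 items assumed in standard order: 1 anhedonia, 2 depressed mood, 3 sleep,
# 4 fatigue, 5 appetite, 6 worthlessness/guilt, 7 concentration,
# 8 psychomotor change, 9 thoughts of death or self-harm; each scored 0-3.


def phq9_major_depressive_syndrome(items: list[int]) -> bool:
    """True if responses meet the published PHQ-9 algorithm for a major
    depressive syndrome (a provisional diagnosis only)."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine PHQ-9 item scores in the range 0-3")
    # Items 1-8 count if endorsed "more than half the days" (score >= 2);
    # item 9 counts if present at all (score >= 1).
    positive = [score >= 2 for score in items[:8]] + [items[8] >= 1]
    has_cardinal_symptom = positive[0] or positive[1]  # anhedonia or depressed mood
    return has_cardinal_symptom and sum(positive) >= 5


# BRS items 1-6 are scored 1-5; items 2, 4 and 6 are negatively worded
# and are reverse-scored (6 - score) before averaging.
BRS_REVERSED_ITEMS = {2, 4, 6}


def brs_mean_score(items: list[int]) -> float:
    """Mean BRS score in 1-5; higher values indicate greater resilience."""
    if len(items) != 6 or any(score not in range(1, 6) for score in items):
        raise ValueError("expected six BRS item scores in the range 1-5")
    adjusted = [
        6 - score if position in BRS_REVERSED_ITEMS else score
        for position, score in enumerate(items, start=1)
    ]
    return sum(adjusted) / len(adjusted)


print(phq9_major_depressive_syndrome([2, 2, 2, 2, 2, 0, 0, 0, 0]))  # True
print(brs_mean_score([5, 1, 5, 1, 5, 1]))  # 5.0
```

Requiring a cardinal symptom (anhedonia or depressed mood) plus at least five positive items mirrors the DSM structure, which is what gives the algorithm its high specificity relative to a simple sum-score cut-off.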

Statistical analyses

We compared categorical variables between the first and second waves with the chi-square (χ2) test or Fisher's exact test, and ordinal variables with the Mann-Whitney W test. For the χ2 test, we calculated the adjusted standardized residuals and corresponding p-values for each cell of the contingency tables to determine which cells contributed to a significant χ2 result, and we applied Bonferroni's correction to the p-values of the adjusted standardized residuals. The significance level was set at the conventional 0.05. All statistical analyses were performed in the R programming language version 3.6.3 (R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2020).
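The cell-level post hoc procedure described above can be sketched as follows. This is an illustrative pure-Python version under stated assumptions (no empty rows or columns; two-sided p-values from the standard normal; Bonferroni correction over all cells of the table). The residual formula corresponds to what R's `chisq.test()` returns as its `stdres` component, but the authors' exact R code is not given here, and the function names and example table are hypothetical.

```python
# Sketch: adjusted standardized residuals for an r x c contingency table,
# with two-sided normal p-values and a Bonferroni correction over all cells.
# Assumes every row and column total is > 0 and smaller than the grand total.
import math


def adjusted_residuals(table):
    """(O - E) / sqrt(E * (1 - row_total/N) * (1 - col_total/N)) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


def bonferroni_cell_pvalues(residuals):
    """Two-sided p = erfc(|z| / sqrt(2)), multiplied by the number of cells, capped at 1."""
    n_cells = sum(len(row) for row in residuals)
    return [
        [min(1.0, n_cells * math.erfc(abs(z) / math.sqrt(2))) for z in row]
        for row in residuals
    ]


# Hypothetical 2 x 2 table (not study data).
z = adjusted_residuals([[10, 20], [30, 40]])
p = bonferroni_cell_pvalues(z)
print(z)
print(p)
```

Each residual is approximately standard normal under independence, so a cell with |z| well above 2 (after correction) is one that drives a significant overall χ2 result.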

Machine learning methodology

ML methods use training examples to develop algorithms that produce the best possible predictions for new cases whose outcome is unknown. In this study, we included as potential predictors all personal information related to factors preceding the pandemic or representing its direct consequences, such as pandemic-related individual experiences, namely the 46 variables reported in Table 1, Table 2 (see Results) and the Supplementary material (Tables S1 and S2).
Table 1

Demographic characteristics and pandemic-related changes among the study participants.

Characteristics | First wave (N = 633): Prts with first-onset PMDD (N = 47; 7.4%) | First wave: Prts without first-onset PMDD (N = 586; 92.6%) | Second wave (N = 290): Prts with first-onset PMDD (N = 21; 7.2%) | Second wave: Prts without first-onset PMDD (N = 269; 92.8%)
Sex, female | 37 (78.7) | 423 (72.2) | 15 (71.4) | 211 (78.4)
Age, years (mean ± SD) | 35.34 ± 14.21 | 46.36 ± 15.5 | 31.57 ± 14.61 | 38.71 ± 14.69
Education, years (mean ± SD) | 14.43 ± 3.02 | 15.39 ± 3.5 | 14.14 ± 2.53 | 15.24 ± 3.39
Marital status
  Unmarried | 28 (59.6) | 212 (36.2) | 15 (71.4) | 142 (52.8)
  Married/common-law/civil union | 13 (27.7) | 312 (53.2) | 5 (23.8) | 106 (39.4)
  Separated/divorced/widowed | 6 (12.8) | 62 (10.6) | 1 (4.8) | 21 (7.8)
Living alone (yes) | 8 (17) | 69 (11.8) | 2 (9.5) | 20 (7.4)
Number of children
  No child | 35 (75) | 283 (48.3) | 15 (71.4) | 167 (62.1)
  1 child | 5 (10.6) | 146 (24.9) | 2 (9.5) | 36 (35.4)
  >1 child | 7 (14.9) | 157 (26.8) | 4 (19) | 66 (22.7)
Perceived changes in difficulty in looking after children during the pandemic
  Decreased compared with pre-pandemic | 1 (2.1) | 27 (4.6) | 1 (4.8) | 10 (3.7)
  Remained similar to pre-pandemic | 4 (8.5) | 176 (30) | 1 (4.8) | 54 (20.1)
  Increased compared with pre-pandemic | 7 (14.9) | 100 (17.1) | 4 (19) | 38 (14.1)
Perceived changes in quality of relationship with children during the pandemic
  Improved compared with pre-pandemic | 1 (2.1) | 60 (10.2) | 0 (0) | 9 (3.3)
  Remained similar to pre-pandemic | 21 (10.3) | 227 (30.7) | 5 (23.8) | 90 (33.5)
  Worsened compared with pre-pandemic | 1 (2.1) | 16 (2.7) | 1 (4.8) | 3 (1.1)
Employment status
  Unemployed | 2 (4.3) | 29 (4.9) | 2 (9.5) | 10 (3.7)
  Retired | 3 (6.4) | 85 (14.5) | 1 (4.8) | 18 (6.7)
  Employed | 16 (34) | 240 (41) | 2 (9.5) | 89 (33.1)
  Self-employed | 4 (8.5) | 134 (22.9) | 2 (9.5) | 62 (23)
  Homemaker | 3 (6.4) | 26 (4.4) | 1 (4.8) | 6 (2.2)
  Student (all undergraduate, attending University) | 19 (40.4) | 72 (12.3) | 13 (61.9) | 84 (31.2)
Cause of unemployment
  Due to the pandemic | 0 (0) | 5 (0.9) | 1 (4.8) | 2 (0.7)
  Preceding the pandemic | 2 (4.3) | 24 (4.1) | 1 (4.8) | 8 (3)
Other pandemic-related changes in employment status
  Job changed compared with previous employment | 0 (0) | 5 (0.9) | 0 (0) | 5 (1.9)
  Previous unemployment that turned into employment | 0 (0) | 2 (0.3) | 0 (0) | 1 (0.4)
Pandemic-related changes in the workplace
  Having continued to work only at one's own workplace | 4 (8.7) | 60 (10.9) | 1 (5) | 82 (30.9)
  Having worked only remotely | 12 (16.9) | 208 (37.7) | 2 (10) | 23 (8.7)
  Having worked in part at one's own workplace and in part remotely | 3 (6.5) | 72 (13) | 0 (0) | 42 (15.8)
Pandemic-related changes in job position (yes) | 1 (2.2) | 15 (2.7) | 0 (0) | 13 (4.9)
Pandemic-related changes in work hours
  Increased compared with pre-pandemic | 10 (21.7) | 117 (21.2) | 0 (0) | 29 (10.9)
  Remained similar to pre-pandemic | 2 (4.3) | 136 (24.6) | 3 (15) | 98 (37)
  Decreased compared with pre-pandemic | 7 (15.2) | 87 (15.8) | 0 (0) | 20 (7.5)
Pandemic-related changes in work shifts
  Increased compared with pre-pandemic | 2 (4.3) | 24 (4.3) | 0 (0) | 4 (1.5)
  Remained similar to pre-pandemic | 3 (6.5) | 63 (11.4) | 1 (5) | 47 (17.7)
  Decreased compared with pre-pandemic | 3 (6.5) | 27 (4.9) | 0 (0) | 6 (2.3)
Perceived changes in work performance during the pandemic
  Increased compared to pre-pandemic | 2 (4.3) | 88 (15.9) | 0 (0) | 22 (8.3)
  Remained similar to pre-pandemic | 9 (16.9) | 175 (31.7) | 2 (10) | 98 (37)
  Decreased compared to pre-pandemic | 8 (17.4) | 77 (13.9) | 1 (5) | 27 (10.2)
Perceived changes in work exertion during the pandemic
  Increased compared to pre-pandemic | 12 (26.1) | 163 (29.5) | 2 (10) | 66 (24.9)
  Remained similar to pre-pandemic | 4 (8.7) | 128 (23.2) | 0 (0) | 69 (26)
  Decreased compared to pre-pandemic | 3 (6.5) | 49 (8.9) | 1 (5) | 12 (4.5)
Adequate procedures for preventing the COVID-19 infection put in place in the workplace (judgment of Prts)
  Not at all | 0 (0) | 4 (0.7) | 0 (0) | 2 (0.8)
  A little | 6 (13) | 27 (4.9) | 1 (5.3) | 8 (3.3)
  Significantly | 0 (0) | 45 (8.1) | 0 (0) | 37 (15.4)
  A lot | 0 (0) | 63 (11.4) | 1 (5.3) | 41 (17.1)
  Very much | 3 (6.5) | 47 (8.5) | 0 (0) | 21 (8.8)
Judgment of one's own economic status during the pandemic
  Improved slightly or significantly | 0 (0) | 55 (9.4) | 0 (0) | 18 (6.7)
  Remained stable | 31 (66) | 333 (56.8) | 12 (57.1) | 164 (61)
  Worsened slightly | 11 (23.4) | 147 (25.1) | 8 (38.1) | 68 (25.3)
  Worsened significantly or very much | 5 (10.6) | 51 (8.7) | 1 (4.8) | 19 (7.1)

All the “Characteristics” are expressed as number (N) and %, except years of age and education that are expressed as mean and standard deviation (SD); in the column “Characteristics”: the variables included in the model as potential predictors are bolded and the possible levels of each variable are italicized; m: mean; N: number; PMDD: provisional diagnosis of major depressive disorder; Prts: participants; SD: standard deviation.

Table 2

Individual and clinical characteristics of the study participants.

Characteristics | First wave (N = 633): Prts with first-onset PMDD (N = 47; 7.4%) | First wave: Prts without first-onset PMDD (N = 586; 92.6%) | Second wave (N = 290): Prts with first-onset PMDD (N = 21; 7.2%) | Second wave: Prts without first-onset PMDD (N = 269; 92.8%)
Clinician-diagnosed current medical diseases* (yes) | 23 (48.9) | 257 (43.9) | 7 (33.3) | 86 (32)
Number of clinician-diagnosed current medical diseases
  1 disease | 24 (51) | 326 (55.6) | 0 (0) | 0 (0)
  2 diseases | 14 (29.8) | 190 (32.4) | 7 (33.3) | 69 (25.7)
  3 diseases | 7 (14.9) | 51 (8.7) | 0 (0) | 14 (5.2)
  4 diseases | 1 (2.1) | 19 (3.2) | 0 (0) | 6 (2.2)
  5 diseases | 0 (0) | 0 (0) | 0 (0) | 0 (0)
  6 diseases | 1 (2.1) | 0 (0) | 0 (0) | 0 (0)
Current medications for medical diseases (yes) | 17 (36.2) | 221 (37.7) | 3 (14.3) | 73 (27.1)
Smoking habit during the pandemic
  Having continued not smoking | 37 (78.7) | 484 (82.6) | 13 (68.4) | 197 (77.3)
  Having continued or started smoking | 9 (19.1) | 91 (15.5) | 5 (26.3) | 51 (20)
  Having quit smoking | 1 (2.1) | 11 (1.9) | 1 (5.3) | 7 (2.7)
Alcohol use during the pandemic
  Having continued not drinking alcohol | 18 (38.3) | 227 (38.7) | 2 (10.5) | 93 (36.5)
  Having continued or started drinking alcohol | 25 (53.2) | 348 (59.4) | 17 (89.5) | 160 (62.7)
  Having quit drinking alcohol | 4 (8.5) | 11 (1.9) | 0 (0) | 2 (0.8)
Recreational drug use during the pandemic**
  Having continued not using recreational drugs | 43 (91.5) | 572 (97.6) | 16 (84.2) | 241 (94.5)
  Having continued or started using recreational drugs | 3 (6.4) | 7 (1.2) | 3 (15.8) | 11 (4.3)
  Having quit using recreational drugs | 1 (2.1) | 7 (1.2) | 0 (0) | 3 (1.2)
Practicing physical activity during the pandemic
  Having continued not practicing physical activity | 15 (31.9) | 101 (17.2) | 4 (21.1) | 55 (21.6)
  Having continued or started practicing physical activity | 16 (34) | 280 (47.8) | 9 (47.4) | 151 (59.2)
  Having decreased or quit practicing physical activity | 16 (34) | 205 (35) | 6 (31.6) | 49 (19.2)
Satisfaction with the usual sleep before the pandemic
  Very satisfied | 3 (6.4) | 62 (10.6) | 3 (15.8) | 38 (14.9)
  Satisfied | 2 (4.3) | 200 (34.1) | 3 (15.8) | 74 (29)
  Neutral | 10 (21.3) | 100 (17.1) | 3 (15.8) | 43 (16.9)
  Not very satisfied | 14 (29.8) | 186 (31.7) | 5 (26.3) | 84 (32.9)
  Very dissatisfied | 18 (38.3) | 38 (6.5) | 5 (26.3) | 16 (6.3)
Having experienced fiduciary isolation/quarantine due to COVID-19-related risk conditions*** (yes) | 6 (2.8) | 32 (5.6) | 6 (28.6) | 28 (10.4)
Having experienced a loved one's hospitalization due to COVID-19 (yes) | 19 (40.4) | 209 (35.7) | 9 (42.9) | 78 (29)
Having experienced a loved one's death due to COVID-19 (yes) | 1 (2.1) | 51 (8.7) | 0 (0) | 17 (6.3)
Being scared of transmitting COVID-19 to others
  Not at all | 8 (17) | 141 (24.1) | 2 (11.8) | 29 (13.4)
  A little | 4 (8.5) | 221 (37.7) | 3 (17.6) | 97 (44.9)
  Significantly | 11 (23.4) | 133 (22.7) | 1 (5.9) | 44 (20.4)
  A lot | 10 (21.3) | 59 (10.1) | 8 (47.1) | 29 (13.4)
  Very much | 14 (29.8) | 32 (5.5) | 3 (17.6) | 17 (7.9)
Being stressed by the pandemic-related restrictions on activities and personal movement
  Not at all | 1 (2.1) | 111 (18.9) | 6 (28.6) | 70 (26)
  A little | 10 (21.3) | 234 (39.9) | 5 (23.8) | 147 (54.6)
  Significantly | 14 (29.8) | 149 (25.4) | 5 (23.8) | 29 (10.8)
  A lot | 11 (23.4) | 64 (10.9) | 3 (14.3) | 15 (5.6)
  Very much | 11 (23.4) | 28 (4.8) | 2 (9.5) | 8 (3)
History of trauma before the COVID-19 pandemic (yes) | 15 (31.9) | 115 (19.6) | 4 (19) | 39 (14.5)
History of past clinician-diagnosed psychiatric disorders (yes) | 7 (14.9) | 123 (21) | 3 (14.3) | 45 (16.7)

In the column “Characteristics”: the variables included in the model as potential predictors are bolded and the possible levels of each variable are italicized. * Cardiovascular diseases, diabetes, metabolic disorders, respiratory diseases, migraine/headache, oncological disorders/cancer, neurological disorders, others; ** considered illegal in Italy; *** e.g., contact with people who were diagnosed as having COVID-19; N: number; PMDD: provisional diagnosis of major depressive disorder; Prts: participants.

Among the several available ML techniques, we chose gradient-boosted decision trees (GBDT). In this approach, decision trees are trained sequentially, each new tree being fitted to reduce the errors of the previous trees. The final prediction is a weighted sum of the predictions of all trees, which can number in the hundreds (Friedman, 2001). This ML technique requires several hyper-parameters to be set during algorithm development. Because hyper-parameters govern the training process, different configurations can yield algorithms with different predictive performance. To identify the optimal configuration, we first evaluated 40 randomly sampled hyper-parameter configurations and then used a Bayesian optimization approach to propose 60 further configurations sequentially.
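The boosting idea described above can be illustrated with a deliberately small sketch: one-split trees (decision stumps), the logistic loss, and a fixed learning rate. This is not the authors' implementation; their software, tree depth, and the hyper-parameters explored by random and Bayesian search are not specified here, and names such as `train_gbdt` are hypothetical. Each round fits a stump to the residuals y − p (the negative gradient of the log-loss) and sets each leaf value with one Newton step.

```python
# Sketch: minimal gradient boosting with decision stumps and logistic loss.
import math


def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))


def leaf_value(idx, residuals, hessians):
    # One Newton step for the log-loss: sum of residuals / sum of p * (1 - p).
    return sum(residuals[i] for i in idx) / max(sum(hessians[i] for i in idx), 1e-12)


def fit_stump(X, residuals, hessians):
    """Choose the single split that best fits the residuals (least squares)."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [i for i, row in enumerate(X) if row[feature] <= threshold]
            right = [i for i, row in enumerate(X) if row[feature] > threshold]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                mean = sum(residuals[i] for i in idx) / len(idx)
                sse += sum((residuals[i] - mean) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, feature, threshold, left, right)
    if best is None:  # no feature can be split: one leaf for all cases
        everyone = list(range(len(X)))
        return 0, math.inf, leaf_value(everyone, residuals, hessians), 0.0
    _, feature, threshold, left, right = best
    return (feature, threshold,
            leaf_value(left, residuals, hessians),
            leaf_value(right, residuals, hessians))


def stump_output(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def train_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """y holds 0/1 labels and must contain both classes."""
    prior = sum(y) / len(y)
    base = math.log(prior / (1 - prior))  # start from the log-odds of the prevalence
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        probs = [sigmoid(f) for f in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient
        hessians = [pi * (1 - pi) for pi in probs]
        stump = fit_stump(X, residuals, hessians)
        stumps.append(stump)
        # Each new tree corrects the current ensemble, shrunk by the learning rate.
        scores = [f + learning_rate * stump_output(stump, row) for f, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_output(s, row) for s in stumps))


# Hypothetical toy data with one feature (not study data).
toy_model = train_gbdt([[0], [1], [2], [3], [4], [5]], [0, 0, 0, 1, 1, 1])
print(predict_proba(toy_model, [0]), predict_proba(toy_model, [5]))
```

The number of trees, the learning rate, and (in real implementations) tree depth and subsampling are exactly the kind of hyper-parameters whose configuration the text describes tuning.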
Because the aim is to identify a hyper-parameter configuration that yields the best possible performance on new cases not used during training, we applied stratified 10-fold cross-validation. The training sample is divided into 10 folds; in turn, each fold is held out, the algorithm is trained on the remaining folds, and the trained algorithm is then applied to the held-out cases. Stratification keeps the proportion of cases with first-onset PMDD similar across folds. The hyper-parameter configuration with the best average cross-validated area under the receiver operating characteristic curve (AUROC) was considered the best configuration and was used to train the final algorithm. An AUROC of 0.5 corresponds to effectively random predictions, and an AUROC of 1 to perfect discrimination between cases and non-cases. Selecting the predictive variables a priori can be expected to improve an algorithm's performance by retaining only relevant input variables and excluding irrelevant and redundant ones. To achieve this, we used the Minimum Redundancy, Maximum Relevance (mRMR) technique, which ranks all available predictive variables in order of importance by simultaneously considering their association with the output variable (maximum relevance) and their association with the other predictive variables (minimum redundancy). The hyper-parameter optimization and cross-validation procedure was repeated 46 times, each time using as predictors the top 1 to 46 variables of the mRMR ranking (the last run using all variables), and the best subset of the initial 46 predictive variables was defined based on average cross-validated AUROC. A more detailed description of the ML methodology is reported in the Supplementary material.
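The stratified fold assignment and the mRMR ranking described above can be sketched as follows. Two assumptions should be stated plainly: absolute Pearson correlation stands in for the relevance and redundancy measures (the original mRMR formulation uses mutual information, and the authors' exact measures are given only in their supplementary material), and redundancy is averaged over the already-selected variables (the "difference" form of the mRMR criterion). All function names are hypothetical.

```python
# Sketch: stratified k-fold assignment and a greedy mRMR ranking.
# Assumption: |Pearson r| replaces mutual information as the association measure.
import math
import random


def stratified_kfold(y, k=10, seed=0):
    """Return a fold label (0..k-1) per case; each class is shuffled and dealt
    round-robin so every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [0] * len(y)
    position = 0  # continues across classes so fold sizes stay balanced
    for label in sorted(set(y)):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for i in members:
            folds[i] = position % k
            position += 1
    return folds


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def mrmr_score(j, columns, relevance, selected):
    # Relevance minus mean redundancy with the variables already selected.
    if not selected:
        return relevance[j]
    redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected)
    return relevance[j] - redundancy / len(selected)


def mrmr_rank(columns, y):
    """Greedy mRMR: repeatedly add the variable with the highest score."""
    relevance = [abs(pearson(col, y)) for col in columns]
    remaining, ranking = list(range(len(columns))), []
    while remaining:
        best = max(remaining, key=lambda j: mrmr_score(j, columns, relevance, ranking))
        ranking.append(best)
        remaining.remove(best)
    return ranking


# The top-k variables of the ranking (k = 1 .. number of variables) would then
# each go through hyper-parameter search with cross-validation.
```

In this sketch an exact duplicate of an already-selected variable is pushed to the end of the ranking even though it is highly relevant on its own, which is the intended effect of the redundancy penalty.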

Training and testing protocol

We used observations from the first wave survey for training and cross-validation of the final algorithm. The observations from the second wave survey were used as an independent test set to investigate the predictive performance of the algorithm after its development was completed. The algorithm initially outputs a continuous prediction (range: 0–1; values closer to 1 show a higher predicted probability of having first-onset PMDD). A classification threshold is applied to obtain the final dichotomous prediction. Different threshold values result in different levels of sensitivity and specificity. This study chose a threshold value that minimized the difference between sensitivity and specificity of the cross-validated predictions in the training dataset. This value was then applied to obtain the final predictions in the test dataset.
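The threshold rule described above can be sketched as follows, assuming that a case is predicted positive when its continuous output is greater than or equal to the threshold and that candidate thresholds are the observed output values; the function names and toy data are hypothetical. Predictive values are included because they are reported in the Results; unlike sensitivity and specificity, they depend on the prevalence of first-onset PMDD, so a low positive predictive value is expected when only about 7% of participants are cases.

```python
# Sketch: choose the classification threshold minimizing |sensitivity - specificity|.
# Assumption: a case is predicted positive when its score is >= the threshold.


def confusion(y_true, scores, threshold):
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if label == 1 and predicted:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, scores, threshold):
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(y_true, scores):
    """Among observed scores, return the threshold minimizing |sens - spec|
    (ties go to the lowest threshold)."""
    def gap(t):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def predictive_values(y_true, scores, threshold):
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical cross-validated outputs (not study data). The chosen threshold
# would then be applied unchanged to the test-set outputs.
y_cv = [0, 0, 0, 0, 1, 1, 1, 1]
s_cv = [0.1, 0.2, 0.3, 0.7, 0.4, 0.8, 0.9, 0.95]
t = balanced_threshold(y_cv, s_cv)
print(t, sensitivity_specificity(y_cv, s_cv, t), predictive_values(y_cv, s_cv, t))
```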

Evaluating the importance of variables

As with most ML techniques, the inherent complexity of the GBDT model does not allow a direct interpretation of how the algorithm derives its output from the included features. Techniques that make ML algorithms more interpretable have been developed to overcome this limitation; in this study, we used the SHapley Additive exPlanations (SHAP) technique (Lundberg and Lee, 2017). A SHAP value is assigned to each variable for each prediction made by the algorithm: the larger the absolute SHAP value of a variable, the larger its contribution to that prediction for that specific case. Here, a positive SHAP value indicates a contribution toward increased risk of first-onset PMDD, whereas a negative SHAP value indicates a contribution toward reduced risk. The average of the absolute SHAP values across all cases in a dataset can be used to quantify each variable's overall importance for the algorithm. Plotting the values of a variable against the associated SHAP values shows the relationship between that variable and the risk of first-onset PMDD as modeled by the algorithm. The SHAP approach was applied separately to the observations collected in the first wave survey (training dataset) and the second wave survey (test dataset).
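The Shapley values underlying SHAP can be illustrated by brute-force enumeration of feature coalitions, as in the sketch below. This is a simplification, not the TreeSHAP algorithm that SHAP implementations typically use for tree ensembles: here a feature absent from a coalition is simply set to a single baseline value (an assumption), and enumeration is exponential in the number of features, so it is only practical for a handful of variables. The function names are hypothetical. The second function computes the mean absolute SHAP value per variable, the global importance measure described above.

```python
# Sketch: exact Shapley values by enumerating coalitions (not TreeSHAP).
# Assumption: features outside a coalition take a single baseline value.
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def mean_abs_importance(all_phi):
    """Global importance: mean |SHAP value| per feature across cases."""
    n_cases = len(all_phi)
    return [sum(abs(row[j]) for row in all_phi) / n_cases for j in range(len(all_phi[0]))]


# A pure interaction is split equally between the two features.
print(shapley_values(lambda z: z[0] * z[1], [1, 1], [0, 0]))  # [0.5, 0.5]
```

By construction the values for one case sum to the difference between the model output for that case and the output at the baseline, which is what makes the per-case attributions additive.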

Results

Descriptive statistics of the variables used as potential predictors are presented in Table 1, Table 2 and the Supplementary material (Tables S1 and S2). The geographical distribution of participants (Table S8), the distribution of past clinician-diagnosed psychiatric disorders (Tables S3 and S9), and the statistical comparisons of all collected variables between the two waves (Tables S4–S9) are presented in the Supplementary material. Participants in the first and second waves differed significantly in mean age, marital and employment status, number of participants living alone, number of children, perceived changes in quality of relationship with children, pandemic-related changes in the workplace, participants' judgment concerning adequate procedures for preventing the COVID-19 infection put in place in the workplace, burden of clinician-diagnosed current medical diseases and related medications, recreational drug use and physical activity during the pandemic, having experienced fiduciary isolation/quarantine, being scared of transmitting COVID-19 to others, being stressed by the pandemic-related restrictions on activities and personal movement, history of trauma before the pandemic, and perception of being supported by friends/colleagues or by religious convictions when facing difficulties. The direction of the significant differences is described in Tables S4, S5, and S7; no other significant differences were found. The criteria for first-onset PMDD were met by 47 participants (7.4%) in the first wave and 21 (7.2%) in the second, with no significant difference in distribution between the waves.

Performance of ML algorithm

Among the 46 variable subsets indicated by the mRMR procedure (from the single top-ranked variable to all variables), the subset yielding the best cross-validated AUROC in the training dataset after hyper-parameter optimization included 10 variables (AUROC = 0.9082). All 10 variables remained in the final model (Fig. 2). A threshold value of 0.084 minimized the difference between sensitivity and specificity in the cross-validated predictions. Applying this threshold to the cross-validated predictions yielded a sensitivity of 80.0%, a specificity of 80.6%, a positive predictive value of 26.4%, and a negative predictive value of 98.1%. This hyper-parameter configuration was then used to train the final model on the entire training set, without cross-validation.
Fig. 2

Variables included in the final ML predictive model and average of the absolute SHAP values for each variable, ordered by their relevance to the model (train dataset, first wave).

The larger the absolute SHAP value of a variable, the larger its contribution to the prediction in a specific case. Specifically, a higher risk of first-onset provisional diagnosis of major depressive disorder (PMDD) was associated with higher agreement with “BRS-item 6”; higher levels of “Being scared of transmitting COVID-19”; higher disagreement with “BRS-item 3”; lower levels of “satisfaction with the usual sleep before the pandemic”; higher levels of “Being stressed by pandemic-related restrictions on activities and personal movement”; being an undergraduate student (“Employment status”); higher disagreement with “perception of being supported..”; having continued or started smoking (“Smoking habit during the pandemic”); answering yes to “current medications for medical diseases”; and answering yes to “Having experienced a loved one's hospitalization”.

ML: machine learning; SHAP: SHapley Additive exPlanations technique.

We tested the final model using data from the second wave (test dataset). The AUROC was 0.856 (95% bootstrap CI 0.77–0.93). With the categorical predictions generated using the threshold identified above, the average sensitivity was 76.5% (95% bootstrap CI 52.9–94.1%), the average specificity 77.8% (95% bootstrap CI 72.4–82.85%), the average positive predictive value 19.7% (95% bootstrap CI 14.3–25.8%), and the average negative predictive value 97.9% (95% bootstrap CI 96.0–99.5%).
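The test-set AUROC and its bootstrap interval can be sketched as follows. The AUROC is computed in its rank form (the probability that a randomly chosen case receives a higher output than a randomly chosen non-case, with ties counted as one half). The bootstrap settings are assumptions, since the text does not state them: simple resampling of test cases with replacement, 2000 resamples, a percentile interval, and resamples containing a single class skipped. Because the text reports bootstrap averages, the sketch also returns the mean over resamples. Function names and data are hypothetical.

```python
# Sketch: rank-based AUROC and a percentile bootstrap confidence interval.
# Assumed settings: non-stratified resampling, 2000 resamples, percentile CI.
import random


def auroc(y_true, scores):
    """P(score of a random case > score of a random non-case), ties counted as 1/2."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, metric=auroc, n_boot=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) of the metric over bootstrap resamples."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes are needed")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_y = [y_true[i] for i in idx]
        if len(set(resampled_y)) < 2:  # skip resamples with only one class
            continue
        stats.append(metric(resampled_y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return sum(stats) / n_boot, lower, upper


# Hypothetical test-set outputs (not study data).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With only about 21 cases in the test set, each resample can contain very different numbers of cases, which helps explain why the reported interval for sensitivity is much wider than that for specificity.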

Relative importance of variables

The relative importance of the 10 variables included in the final model was analyzed using the SHAP technique with data from the first wave (training dataset) and the second wave (test dataset). Fig. 3 presents the average absolute SHAP value of each variable in the second wave, ordered by relevance to the model; that is, it visualizes each variable's average contribution to the estimated risk of first-onset PMDD in the sample. The largest contributions came from BRS-item 6, employment status, two variables concerning levels of perceived stress due to pandemic-related conditions, and BRS-item 3, followed by satisfaction with usual sleep before the pandemic and the dispositional perception of being supported by relatives or household members when facing difficulties. Smoking and current use of medications for medical diseases contributed less, while having experienced a loved one's hospitalization provided the smallest average contribution, playing only a marginal role in the model. The average contributions of the variables were similar in both waves, except for a partial increase in the contribution of employment status in the second wave, attributable to the higher proportion of undergraduate students in the second wave than in the first.
Fig. 3

Variables included in the final ML predictive model and average of the absolute SHAP values for each variable, ordered by their relevance to the model (test dataset, second wave).

The larger the absolute SHAP value of a certain variable, the larger the contribution of that variable in determining that prediction in a specific case. Specifically, a higher risk of first-onset provisional diagnosis of major depressive disorder (PMDD) was associated with higher agreement with “BRS-item 6”; higher levels of “Being scared of transmitting COVID-19”; being an undergraduate student (“Employment status”); higher disagreement with “BRS-item 3”; higher levels of “Being stressed by pandemic-related restrictions on activities and personal movement”; lower levels of “satisfaction with the usual sleep before the pandemic”; higher disagreement with “perception of being supported..”; having continued or started smoking (“Smoking habit during the pandemic”); yes (“current medications for medical diseases”); and yes (“Having experienced a loved one's hospitalization”).

ML: machine learning; SHAP: SHapley Additive exPlanations technique.

Fig. 2 presents the SHAP values obtained in the first wave. The levels of each variable plotted against the associated SHAP values in the second wave are presented in Fig. 4 to visually represent the relationship between each variable and the risk of having first-onset PMDD as modeled by the algorithm. Plots for the first wave are available on request.
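A SHAP value is a Shapley value: a variable's marginal contribution to one participant's prediction, averaged over all orders in which the variables could be added. The sketch below computes exact Shapley values by brute force for a toy scoring function, using a single reference participant to stand in for "absent" variables. That baseline convention is a simplifying assumption; the tree-specific SHAP algorithms used with gradient-boosted trees estimate absent-variable effects from the training data and are far more efficient. The `toy_risk` function, its coefficients, and all inputs are hypothetical.

```python
from collections.abc import Callable
from itertools import combinations
from math import factorial


def toy_risk(x: dict[str, float]) -> float:
    """Hypothetical log-odds score with one interaction term."""
    return (-2.0 + 1.2 * x["low_resilience"] + 0.8 * x["student"]
            + 0.5 * x["low_resilience"] * x["poor_sleep"])


def shapley_values(model: Callable[[dict[str, float]], float],
                   x: dict[str, float],
                   baseline: dict[str, float]) -> dict[str, float]:
    """Exact Shapley values; variables outside a coalition take baseline values."""
    features = list(x)
    p = len(features)

    def value(coalition: frozenset) -> float:
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(mixed)

    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


participant = {"low_resilience": 1.0, "student": 1.0, "poor_sleep": 1.0}
reference = {"low_resilience": 0.0, "student": 0.0, "poor_sleep": 0.0}
phi = shapley_values(toy_risk, participant, reference)
print(phi)
# Efficiency property: the values sum to f(participant) - f(reference).
print(sum(phi.values()), toy_risk(participant) - toy_risk(reference))
```

Here the interaction term is split equally between `low_resilience` and `poor_sleep`, and the three values add up to the gap between the participant's score and the reference score. Plotting such per-participant values against each variable's level gives the kind of display shown in Fig. 4.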
Fig. 4

Levels of the variables plotted against the associated SHAP values in the second wave.

SHAP: SHapley Additive exPlanations technique.


Discussion

The study data were obtained from the first (May 18 to June 20, 2020) and second (September 15 to October 20, 2020) waves of an online survey conducted among Italy's general population during the pandemic. Our study had two novel aims. First, we estimated first-onset PMDD over the first eight months of the pandemic in Italians with no clinician-diagnosed CPsyDs. Second, we developed an ML model, trained on first-wave data, to predict the emergence of first-onset PMDD in an independent sample of second-wave participants. The main advantage of the ML approach was its capability to identify the most relevant potential predictors of first-onset PMDD among a very large array of interrelated individual and psychosocial variables.

First-onset PMDD

We identified first-onset PMDD in 7.4% of participants in the first wave and 7.2% of those in the second. These rates were obtained by applying the DSM-IV criteria-based diagnostic algorithm to self-reported PHQ-9 item scores, which maximizes the specificity of depression screening compared with a score cutoff of ≥10 (He et al., 2020; Spitzer et al., 1999). Given pre-pandemic estimates of approximately 3–5% for 12-month MDD prevalence in Italy (Girolamo et al., 2006; National Centre for Disease Prevention and Health Promotion (CNaPPS), 2019; Osservatorio Nazionale sulla Salute nelle Regioni Italiane, 2019), our findings suggest an increase in MDD among the general population of Italy during the pandemic. Likewise, PHQ-based pre-pandemic screenings of depression in the general Italian population yielded prevalence rates of 2.5% (Osservatorio Nazionale sulla Salute nelle Regioni Italiane, 2019) and 6% (National Centre for Disease Prevention and Health Promotion (CNaPPS), 2019), lower than the rates we identified during the pandemic, although those studies did not use the PHQ-9. Moreover, the lack of epidemiological data on the pre-pandemic incidence of MDD in Italy prevented direct comparisons of incidence before and during the pandemic. Bearing in mind the shortcomings of self-report screening compared with the clinical interviews used to obtain pre-pandemic MDD rates (Girolamo et al., 2006), our findings point to a possible contribution of the pandemic to rising MDD rates among Italians. In most other Italian studies conducted during the pandemic, the proportion of the general population scoring above the clinical threshold for self-reported depression was higher than the rates we found, ranging from approximately 20% to 40% (Bruno et al., 2020; Mazza et al., 2020; Mencacci and Salvi, 2021; Roma et al., 2020).
This discrepancy may reflect the use of different psychometric tools or of the PHQ-9 cutoff score, which yields higher sensitivity but lower specificity for detecting a major depressive episode than the PHQ-9 diagnostic algorithm we used (He et al., 2020; Spitzer et al., 1999). In addition, because we were specifically interested in the first onset of primary MDD, we excluded participants with CPsyDs and past major depression, whereas the comparison studies were more inclusive. Therefore, the higher depression rates in previous Italian studies may have included depressive conditions that predated the pandemic, depression secondary to other current psychiatric disorders, or recurrent depression in previously remitted people. Similar factors, in addition to social and cultural differences, may explain the higher rates of self-reported depression found in most other studies worldwide (Adu et al., 2021; Cénat et al., 2021; Ettman et al., 2020; Hao et al., 2020; Lakhan et al., 2020; Liu et al., 2020a, Liu et al., 2020b; Luo et al., 2020; Salari et al., 2020; Solomou and Constantinidou, 2020; Wu et al., 2021). Bearing all these factors in mind, the rates of first-onset PMDD among Italians during the COVID-19 pandemic appear to deserve consideration.
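The contrast between the two PHQ-9 scoring rules can be made concrete. The sketch below follows the commonly published major-depressive-syndrome rule for the PHQ-9 (at least five of the nine symptoms present on "more than half the days" or more, one of which must be anhedonia or depressed mood, with the suicidal-ideation item counting if present at all) and compares it with the sum-score cutoff of ≥10. It is an assumption that this matches the authors' exact implementation: it ignores the functional-impairment question and does not model the first-onset exclusion criteria. The constant and function names are hypothetical.

```python
# PHQ-9 responses: 0 = not at all, 1 = several days,
# 2 = more than half the days, 3 = nearly every day.
ANHEDONIA, DEPRESSED_MOOD, SUICIDAL_IDEATION = 0, 1, 8  # zero-based item positions


def _validate(items: list[int]) -> None:
    if len(items) != 9 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("expected nine PHQ-9 item responses coded 0-3")


def meets_cutoff(items: list[int], threshold: int = 10) -> bool:
    """Sum-score screening rule (total >= threshold)."""
    _validate(items)
    return sum(items) >= threshold


def meets_algorithm(items: list[int]) -> bool:
    """DSM-based major-depressive-syndrome rule as commonly described for the PHQ-9."""
    _validate(items)
    # A symptom counts at >= 2, except suicidal ideation, which counts at >= 1.
    present = [
        r >= 1 if i == SUICIDAL_IDEATION else r >= 2
        for i, r in enumerate(items)
    ]
    core_symptom = present[ANHEDONIA] or present[DEPRESSED_MOOD]
    return core_symptom and sum(present) >= 5


# Many mild symptoms: positive on the cutoff, negative on the algorithm.
diffuse = [1, 1, 2, 1, 2, 1, 1, 1, 0]
# Five symptoms at >= "more than half the days", including depressed mood.
syndromal = [0, 2, 2, 2, 2, 2, 0, 0, 0]
for label, items in (("diffuse", diffuse), ("syndromal", syndromal)):
    print(label, meets_cutoff(items), meets_algorithm(items))
```

Apart from edge cases involving the suicidal-ideation item (which counts at a response of 1), any pattern that meets the algorithm totals at least 10, whereas many patterns totalling 10 or more do not meet it. This asymmetry makes the algorithm the stricter rule, consistent with its higher specificity.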

ML predictive model

Our final ML predictive model incorporated ten variables and displayed a sensitivity of 76.5% and a specificity of 77.8% when tested on second-wave data, suggesting good predictive performance for first-onset PMDD in independent samples of Italians during the pandemic. These results suggest that the variables that contributed to the emergence of first-onset PMDD at the beginning of the pandemic may have continued to affect Italians in the following months. Participants in the two waves differed significantly on some of the 46 variables used as potential predictors, possibly because pandemic-related conditions and situations changed over time. However, the good performance of the algorithm in the second wave makes it unlikely that these changes in the distribution of variables between waves substantially affected its predictive capability. The largest contributions to the prediction came from individual features that preceded the pandemic, namely a low “ability to bounce back or recover from” setbacks or stressful events, being an undergraduate student, and dissatisfaction with usual sleep before the pandemic. Two other major contributors were directly related to the pandemic: high perceived stress about possibly spreading the infection to others and about measures restricting personal activities and movement. A lower but still relevant contribution came from the pre-pandemic dispositional perception of being poorly supported by relatives or household members when facing difficulties. Finally, being an active smoker (whether continuing a pre-existing habit or starting during the pandemic) and taking medications for medical diseases contributed to the prediction, although to a more limited extent than the other predictors.
“Having experienced a loved one's hospitalization” displayed the smallest average contribution, playing only a marginal role in the model. Our results are consistent with findings from around the world that low resilience is a risk factor for poorer mental health outcomes in the general population, including depressive symptoms, after large-scale stressors such as natural disasters (Blackmon et al., 2017; Osofsky et al., 2011; Shenesey and Langhinrichsen-Rohling, 2015) and the COVID-19 pandemic (Landi et al., 2020; Lenzo et al., 2020; Liu et al., 2020b; Prout et al., 2020; Song et al., 2021). Our previous work identified low resilience as a predictor of various new-onset psychiatric disorders during the COVID-19 pandemic among Italians with no psychiatric history (Caldirola et al., 2022). Although different studies have used different resilience questionnaires, the overall results support the idea that low resilience may be a non-specific vulnerability factor for psychiatric disorders, including depression, following exposure to severe stressors such as the ongoing pandemic. However, resilience is currently conceptualized as a complex and dynamic quality encompassing multiple resilience factors, ranging from the individual's neurobiological and psychological features to the social context and relationship networks (Ayed et al., 2019; Kageyama et al., 2021; Kalisch et al., 2019; Perna et al., 2020; Roeckner et al., 2021). In line with this, we also found that low pre-pandemic support from relatives or household members in difficult situations was an additional significant predictor of first-onset PMDD. Because we used a single resilience questionnaire that explores only one aspect of this complex construct, future studies with broader resilience assessments may identify other resilience factors relevant to mental health during large-scale, long-lasting stressful events.
Being an undergraduate student was the only occupational status relevant to the predictive model. This finding supports and extends previous reports from different countries of more pandemic-related depressive symptoms among students than among other employment groups (González-Sanguino et al., 2020; Lei et al., 2020; Olagoke et al., 2020). It is also consistent with the many reports of substantial rates of psychiatric symptoms and disorders among university students published during the COVID-19 pandemic (Caldirola et al., 2022; Dogan-Sander et al., 2021; Li et al., 2021; McLafferty et al., 2021). Several pandemic-related changes affected students' mental well-being, including difficulty adjusting to online classes, self-regulated learning, and self-motivation, as well as daily physical isolation, concerns about reduced practical learning experience, perceived increases in university workload, and worry about their capacity to meet academic requirements (Conceição et al., 2021; Dogan-Sander et al., 2021; Guse et al., 2021; Matos Fialho et al., 2021). These student-specific stressors might have influenced the occurrence of first-onset PMDD in this population subgroup, given that younger people are usually more vulnerable to depression than older people (American Psychiatric Association, 2013). Therefore, universities should implement strategies to improve students' mental well-being during the ongoing pandemic or future crises, including psychological and educational support and resilience programs (Akeman et al., 2020). Low satisfaction with usual sleep before the pandemic also contributed to the prediction of first-onset PMDD.
This finding is consistent with longitudinal studies that identified sleep complaints or disorders in non-depressed people as a significant risk factor for later depression (Baglioni et al., 2011; Byrne et al., 2019; Fang et al., 2019; Li et al., 2016; Nutt et al., 2008). Hence, a personal sleep-related vulnerability to depression might have contributed to the emergence of first-onset PMDD in part of the population under pandemic-related stressful conditions. Public awareness campaigns on the importance of satisfactory sleep for mental health, including education on sleep hygiene and easy-to-implement strategies for developing good sleep habits and managing insomnia, might help prevent depression in vulnerable people during a global crisis. Two other important predictors of first-onset PMDD were higher levels of perceived stress about common pandemic-related issues, namely the possibility of transmitting COVID-19 to others and the restriction of personal autonomy. We previously showed that the same variables are significant predictors of various new-onset psychiatric disorders among Italians during the pandemic (Caldirola et al., 2022). Thus, subjective emotional responses to unexpected stressors may be non-specific risk factors for developing mental disorders, including MDD. Although reducing personal hyperreactivity to stressful issues would require psychological treatment, more vulnerable people might benefit from supportive public campaigns that help them manage related difficulties during public health crises. The contribution of active smoking to our predictive model is in line with evidence that smoking is a potential risk factor for depression through multiple possible mechanisms, including oxidative stress, chronic inflammation, neural damage, and impaired neurotransmission (Hahad et al., 2021).
Therefore, strengthening information campaigns on the detrimental effects of smoking on mental health and encouraging people to cut down or not to start smoking, especially under highly stressful conditions, could help counteract the development of depression during public emergencies. Finally, our finding that taking medication for medical diseases predicts first-onset PMDD is consistent with the well-known bidirectional association between physical illness and depression (Roohi et al., 2021; Thom et al., 2019). Although further confirmation is needed, self-reported medication use for physical diseases may be a more reliable proxy for true medical conditions than self-reported medical diagnoses. Our finding suggests that careful mental health monitoring may be warranted among people with medical diseases during large-scale stressful events. Sex did not contribute to the prediction of first-onset PMDD. This seems unexpected given the usually higher prevalence of MDD among women than among men (American Psychiatric Association, 2013). Although a large proportion of participants in our study were women, the ML technique we used can account for this sex imbalance, so the imbalance is not expected to have substantially affected whether sex was identified as a relevant predictor. Our result may instead be explained by the use of the MRMR technique, which maximizes each variable's relevance to the outcome of interest while minimizing its redundancy within a large array of interrelated variables. Therefore, being female may be individually associated with a higher risk of depression onset, yet it may have been excluded from the final predictive ML model because it was associated with, and redundant relative to, other variables highly relevant to the model.
In line with this, gender was also not relevant to predicting a PHQ-9 score at or above the cutoff of 10 during the pandemic in the only other study that used an ML approach (Prout et al., 2020). The same reasoning may partly explain our finding that having had a clinician-diagnosed psychiatric disorder only in the past did not predict first-onset PMDD. This finding may also suggest that a personal vulnerability to other psychiatric disorders does not necessarily confer an increased risk of first-onset PMDD in the general population, at least under the conditions analyzed in this study.
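The redundancy argument made for sex can be illustrated with a greedy mRMR selection on discrete variables. The sketch scores each candidate by its mutual information with the outcome minus its mean mutual information with the variables already selected (the difference form of mRMR); it is not necessarily the authors' exact variant or implementation, and the variable names and data are hypothetical, constructed so that `sex` nearly duplicates `strong_predictor`.

```python
from collections import Counter
from math import log2


def mutual_information(x: list, y: list) -> float:
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def mrmr_select(features: dict[str, list], target: list, k: int) -> list[str]:
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: list[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binary data: "sex" nearly duplicates "strong_predictor".
target = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "strong_predictor": [0, 0, 0, 0, 1, 1, 1, 1],
    "sex": [0, 0, 0, 0, 0, 1, 1, 1],
    "weak_predictor": [0, 0, 1, 1, 0, 0, 1, 1],
}
print({name: round(mutual_information(col, target), 3) for name, col in features.items()})
print(mrmr_select(features, target, k=2))
```

With `k=2`, the procedure keeps `strong_predictor` and `weak_predictor`, even though `sex` has the higher individual mutual information with the outcome of the two, because most of what `sex` carries is already supplied by `strong_predictor`.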

Strengths and limitations of this study

The strengths of this study include the use of an ML approach, which is particularly suitable for developing predictive models from large and complex data sets, and the application of the SHAP technique, which allowed us to quantify the importance of each variable for the prediction. These characteristics make the ML approach particularly promising for future studies in psychiatry, given that psychiatric disorders are highly complex conditions involving an interplay of multiple individual, environmental, and genetic features and risk factors. Finally, the longitudinal design of this study enabled us to identify a set of variables that continued to influence first-onset PMDD across two periods of the pandemic. This study also has some limitations. The sample size was limited because of the restrictive inclusion/exclusion criteria. Probably because official institutional sites were involved in recruitment, most participants were from north-western Italy; hence, we could not include geographical distribution as a potential predictor in the model. Given that sociodemographic and economic differences exist across the country, and that north-western Italy was particularly affected by COVID-19, especially at the beginning of the pandemic, our results may not apply to the general population of Italy. The rates of CPsyDs and past major depression, which were exclusion criteria in this study, were particularly high, suggesting that our participants may not be representative of the entire general Italian population. All participants in the survey were ≥18 years old, so no data were collected on younger people. Because the entire survey was self-reported, we cannot exclude inaccuracy or subjective bias among participants, even concerning the main inclusion/exclusion criteria, such as psychiatric history.
Indeed, we cannot exclude that people who reported not having CPsyDs or past major depression had undiagnosed psychiatric conditions, which is a relevant limitation of the study. Likewise, although we used the most conservative method of MDD self-assessment available, we cannot exclude that at least some of the PHQ-9-based first-onset PMDDs did not correspond to true clinical MDD diagnoses. Because our survey explored a large array of variables with multiple questionnaires, we simplified the evaluation of each variable, so in-depth details on several personal features, experiences, and behaviors were lacking. Moreover, because of the length of the survey and the high probability that information on some specific topics would be unreliable, we did not include questions on other known vulnerability factors, such as childhood trauma or family history of depression, which have predictive value for first-onset MDD in a pre-pandemic algorithm developed in the US general population (Wang et al., 2014). Finally, our predictive model is based on data covering only 2020, whereas the pandemic is still ongoing. The global scenario has partly changed, including the administration of vaccines and a substantial easing of restrictions on activities and personal movement in the general Italian population. Further prospective studies should use data from the later pandemic phases in 2021 to assess the course of depression, test the model's validity, and explore whether other predictive factors may have played a part in first-onset PMDD.

Conclusions

This study identified considerable rates of first-onset PMDD during the first eight months of the pandemic among Italian adults without CPsyDs and developed an ML model that, after training on one sample, showed good predictive capability in an independent sample. The model's predictive variables could inform goals for preventive interventions to decrease depression risk in the general population during the ongoing pandemic or future public health crises.

Disclosure

Daniela Caldirola, Silvia Daccò, Francesco Cuniberti, Massimiliano Grassi, Alessandra Alciati, and Giampaolo Perna are scientific consultants for Medibio LTD. Francesco Cuniberti has served as a consultant for Menarini Industrie Farmaceutiche Riunite. Giampaolo Perna has served as a consultant for Menarini Industrie Farmaceutiche Riunite, Lundbeck and Pfizer.

CRediT authorship contribution statement

DC: Conceptualization; Methodology; Investigation; Supervision; Writing-original draft; Writing-review and editing. SD: Conceptualization; Methodology; Investigation; Writing-original draft; Writing-review and editing. FC: Investigation; Formal analysis. MG: Methodology; Software; Formal analysis. AA: Investigation; Writing-review and editing. TT: Investigation; Writing-review and editing. GP: Supervision; Writing-review and editing. All authors have approved the final article.

Declaration of competing interest

The authors report no conflicts of interest in this work. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References

1. Howard J Osofsky, Joy D Osofsky, Tonya C Hansel. Deepwater horizon oil spill: mental health effects on residents in heavily affected areas. Disaster Med Public Health Prep. 2011-12.

2. Domenico De Berardis. How concerned should we be about neurotropism of SARS-Cov-2? A brief clinical consideration of the possible psychiatric implications. CNS Spectr. 2020-12-10.

3. Fenfen Ge, Di Zhang, Lianhai Wu, Hongwei Mu. Predicting Psychological State Among Chinese Undergraduate Students in the COVID-19 Epidemic: A Longitudinal Study Using a Machine Learning. Neuropsychiatr Dis Treat. 2020-09-17.

4. Giovanni Bruno, Anna Panzeri, Umberto Granziol, Fabio Alivernini, Andrea Chirico, Federica Galli, Fabio Lucidi, Andrea Spoto, Giulio Vidotto, Marco Bertamini. The Italian COVID-19 Psychological Research Consortium (IT C19PRC): General Overview and Replication of the UK Study. J Clin Med. 2020-12-25.

5. Ezgi Dogan-Sander, Elisabeth Kohls, Sabrina Baldofski, Christine Rummel-Kluge. More Depressive Symptoms, Alcohol and Drug Consumption: Increase in Mental Health Symptoms Among University Students After One Year of the COVID-19 Pandemic. Front Psychiatry. 2021-12-16.

6. Margaret McLafferty, Natasha Brown, Rachel McHugh, Caoimhe Ward, Ailis Stevenson, Louise McBride, John Brady, Anthony J Bjourson, Siobhan M O'Neill, Colum P Walsh, Elaine K Murray. Depression, anxiety and suicidal behaviour among college students: Comparisons pre-COVID-19 and during the pandemic. Psychiatry Res Commun. 2021-11-06.

7. Cristina Mazza, Eleonora Ricci, Silvia Biondi, Marco Colasanti, Stefano Ferracuti, Christian Napoli, Paolo Roma. A Nationwide Survey of Psychological Distress among Italian People during the COVID-19 Pandemic: Immediate Psychological Responses and Associated Factors. Int J Environ Res Public Health. 2020-05-02.

8. Catherine K Ettman, Salma M Abdalla, Gregory H Cohen, Laura Sampson, Patrick M Vivier, Sandro Galea. Prevalence of Depression Symptoms in US Adults Before and During the COVID-19 Pandemic. JAMA Netw Open. 2020-09-01.

9. Nina Vindegaard, Michael Eriksen Benros. COVID-19 pandemic and mental health consequences: Systematic review of the current evidence. Brain Behav Immun. 2020-05-30.
