Sung Yeon Sarah Han1, Jason D Cooper1,2, Sureyya Ozcan1,3, Nitin Rustogi1, Brenda W J H Penninx4, Sabine Bahn5.
Abstract
Individuals with subthreshold depression have an increased risk of developing major depressive disorder (MDD). The aim of this study was to develop a model that predicts the probability of MDD onset in subthreshold individuals from their proteomic, sociodemographic and clinical data. To this end, we analysed 198 features: 146 peptides representing 77 serum proteins, measured using multiple reaction monitoring mass spectrometry (MRM-MS); 22 sociodemographic factors; and 30 clinical features. These were analysed in 86 first-episode MDD patients (training set patient group), 37 subthreshold individuals who developed MDD within two or four years (extrapolation test set patient group), and 86 subthreshold individuals who did not develop MDD within four years (shared reference group). To ensure the development of a robust and reproducible model, we applied feature extraction and model averaging across a set of 100 models obtained from repeated application of group LASSO regression with ten-fold cross-validation on the training set. This resulted in a 12-feature prediction model consisting of six serum proteins (AACT, APOE, APOH, FETUA, HBA and PHLD), three sociodemographic factors (body mass index, childhood trauma and education level) and three depressive symptoms (sadness, fatigue and leaden paralysis). Importantly, the model demonstrated fair performance in predicting future MDD diagnosis of subthreshold individuals in the extrapolation test set (AUC = 0.75), a task that goes beyond the scope of the model because it was trained on first-episode MDD patients rather than on subthreshold individuals. These findings suggest that it may be possible to detect disease indications in subthreshold individuals up to four years prior to diagnosis, which has important clinical implications for the identification and treatment of high-risk individuals.
Year: 2019 PMID: 31699963 PMCID: PMC6838310 DOI: 10.1038/s41398-019-0623-2
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Sociodemographic and health characteristics of individuals in the training set patient group (first-episode MDD patients), the extrapolation test set patient group (subthreshold symptomatic individuals who developed MDD within two or four years) and the shared reference group (subthreshold symptomatic individuals who did not develop MDD within four years)
| Characteristic | Shared reference group | Patient group: training set | Patient group: extrapolation test set |
|---|---|---|---|
| Number of individuals | 86 | 86 | 37 |
| Sex % (male/female) | 35/65 | 48/52 | 32/68 |
| Age (years) | 37.8 (14.1) | 41.8 (12.2) | 38.5 (14) |
| Body mass index (kg/m²) | 23.8 (4.4) | 26.7 (6) | 25.6 (5) |
| Education, % (basic/intermediate/high) | 8/42/50 | 6/67/27 | 3/59/38 |
| Physical activity, % (low/moderate/high) | 23/48/29 | 30/44/26 | 27/41/32 |
| Smoking, % (yes/no) | 31/69 | 38/62 | 32/68 |
| Alcohol abuse, % (yes/no) | 21/79 | 40/60 | 19/81 |
| Weekly alcohol consumption (number of drinks per week) | 8 (11) | 6.8 (11) | 6.3 (7.4) |
| Recreational drug use (past month), % (yes/no) | 7/93 | 8/92 | 8/92 |
| Partner, % (yes/no) | 69/31 | 57/43 | 70/30 |
| Children, % (yes/no) | 44/56 | 48/52 | 51/49 |
| Employment, % (employed/unemployed/retired/occupationally disabled) | 77/17/2/3 | 60/17/2/20 | 65/30/0/5 |
| Absent from work due to health problems (past 6 months), % (yes/no/not applicable) | 43/35/22 | 41/21/38 | 35/32/32 |
| Childhood life event index score | 0.3 (0.5) | 0.2 (0.5) | 0.2 (0.5) |
| Childhood trauma index score | 0.5 (0.9) | 1 (1.3) | 0.8 (1.1) |
| Number of negative life events (past year) | 1 (1) | 1 (1.1) | 0.9 (1) |
| Family history, % (yes/no) | 73/27 | 87/13 | 84/16 |
| Heart disease, % (yes/no) | 1/99 | 5/95 | 5/95 |
| Diabetes, % (yes/no) | 3/97 | 6/94 | 11/89 |
| Other chronic disease, % (yes/no) | 26/74 | 40/60 | 22/78 |
| Anti-inflammatory drug, % (yes/no) | 2/98 | 8/92 | 8/92 |
| Heart medication, % (yes/no) | 10/90 | 23/77 | 8/92 |
| IDS30 total score | 14.9 (7.4) | 37.4 (11.5) | 20.2 (8.8) |
IDS inventory of depressive symptomatology, MDD major depressive disorder
Numerical features are shown as the mean (standard deviation)
Fig. 1 Feature selection across 100 models obtained from repeated application of group LASSO regression with ten-fold cross-validation on the training set.
a Analysis 1: model selection including IDS30 total score (198 features). b Analysis 2: model selection excluding IDS30 total score (197 features). (i) The number of features selected in each model. (ii) Selection fractions of each feature
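The selection fractions in panel (ii) can be obtained by counting how often each feature receives a non-zero coefficient across the repeated fits. The following is a minimal Python sketch under stated assumptions, not the authors' code: `selection_summary`, the toy feature names, the toy coefficients and the 0.9 threshold are all hypothetical. Averaging coefficients only over the models that selected a feature is also an assumption, since the source does not state how the average was taken.

```python
from collections import defaultdict


def selection_summary(models):
    """Summarise repeated sparse-model fits.

    models: list of dicts mapping feature name -> fitted coefficient.
    A feature counts as "selected" in a model if its coefficient is non-zero.
    """
    n_models = len(models)
    counts = defaultdict(int)
    sums = defaultdict(float)
    for coefs in models:
        for name, value in coefs.items():
            if value != 0.0:
                counts[name] += 1
                sums[name] += value
    return {
        name: {
            "selection_fraction": counts[name] / n_models,
            # Assumption: average over the models that selected the feature.
            "average_coefficient": sums[name] / counts[name],
        }
        for name in counts
    }


# Hypothetical coefficients from four repeated fits (the study used 100).
toy_models = [
    {"feature_a": 0.30, "feature_b": 0.00, "feature_c": -0.10},
    {"feature_a": 0.20, "feature_b": 0.05, "feature_c": -0.20},
    {"feature_a": 0.25, "feature_b": 0.00, "feature_c": 0.00},
    {"feature_a": 0.25, "feature_b": 0.00, "feature_c": -0.30},
]

summary = selection_summary(toy_models)

# Hypothetical stability threshold: keep features selected in >= 90% of fits.
stable_features = sorted(
    name for name, s in summary.items() if s["selection_fraction"] >= 0.9
)
print(stable_features)
```

Keeping only features with a high selection fraction is one way to favour features that are chosen consistently across cross-validation splits rather than by chance in a single fit.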
Features included in the two prediction models
| Feature | Model | Selection fraction | Average coefficient |
|---|---|---|---|
| AACT_ADLSGITGAR | 2 | 1.00 | 0.122 |
| APOE_ALMDETMK | 2 | 0.99 | −0.195 |
| APOH_EHSSLAFWK | 2 | 1.00 | 0.08 |
| FETUA_HTLNQIDEVK | 2 | 0.97 | 0.082 |
| HBA_MFLSFPTTK | 2 | 1.00 | 0.231 |
| PHLD_NQVVIAAGR | 2 | 1.00 | 0.286 |
| BMI | 2 | 1.00 | 0.291 |
| Childhood trauma | 2 | 1.00 | 0.115 |
| Education; intermediate | 2 | 0.93 | 0.065 |
| Education; high | 2 | 0.93 | −0.055 |
| Sadness; mild | 2 | 1.00 | −0.681 |
| Sadness; moderate | 2 | 1.00 | 0.819 |
| Sadness; severe | 2 | 1.00 | 0.369 |
| Fatigue; mild | 2 | 1.00 | −0.124 |
| Fatigue; moderate | 2 | 1.00 | 0.339 |
| Fatigue; severe | 2 | 1.00 | 0.085 |
| Leaden paralysis; mild | 2 | 1.00 | −0.145 |
| Leaden paralysis; moderate | 2 | 1.00 | 0.219 |
| Leaden paralysis; severe | 2 | 1.00 | 0.272 |
| IDS30 total score | 1 | 1.00 | 0.346 |
IDS inventory of depressive symptomatology, BMI body mass index, AACT alpha-1-antichymotrypsin, APOE apolipoprotein E, APOH apolipoprotein H, FETUA fetuin-A, HBA haemoglobin subunit alpha, PHLD glycoprotein phospholipase D
Model 1 (one feature) was based on the dominant unique model in Analysis 1 (model selection including IDS30 total score), and Model 2 (12 features) was developed by implementing feature extraction and model averaging in Analysis 2 (model selection excluding IDS30 total score) in the absence of a dominant unique model. The selection fraction and the average coefficient of the features are shown. Proteomic features are represented in a Protein_Peptide format. Categorical features (education, sadness, fatigue and leaden paralysis) are represented as sets of dummy variables
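To illustrate how dummy-coded categorical features enter a linear predictor, the sketch below encodes a symptom severity level as three 0/1 indicators and sums the matching Model 2 average coefficients from the table. It rests on several assumptions that the source does not confirm: "none" is treated as the reference level, the dummies are used as unscaled 0/1 values, and the intercept and all non-symptom features are omitted. The result is therefore only a partial contribution, not a predicted probability, and the function names are hypothetical.

```python
# Average coefficients copied from the table above (Model 2, symptom dummies only).
SYMPTOM_COEFS = {
    "sadness": {"mild": -0.681, "moderate": 0.819, "severe": 0.369},
    "fatigue": {"mild": -0.124, "moderate": 0.339, "severe": 0.085},
    "leaden_paralysis": {"mild": -0.145, "moderate": 0.219, "severe": 0.272},
}

# Assumption: "none" is the reference level and gets no dummy variable.
LEVELS = ("mild", "moderate", "severe")


def dummy_encode(level):
    """Return 0/1 indicators for one severity level."""
    if level != "none" and level not in LEVELS:
        raise ValueError(f"unknown severity level: {level!r}")
    return {name: int(name == level) for name in LEVELS}


def symptom_contribution(symptoms):
    """Sum coefficient * dummy over all symptoms (partial linear predictor)."""
    total = 0.0
    for symptom, level in symptoms.items():
        for name, flag in dummy_encode(level).items():
            total += flag * SYMPTOM_COEFS[symptom][name]
    return total


# Hypothetical individual: moderate sadness, mild fatigue, no leaden paralysis.
example = {"sadness": "moderate", "fatigue": "mild", "leaden_paralysis": "none"}
contribution = symptom_contribution(example)
print(round(contribution, 3))
```

Because each symptom has one coefficient per non-reference level, the contribution of a symptom need not increase monotonically with severity, which matches the pattern of coefficients in the table.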
Fig. 2 ROC curves showing model performance in predicting the probability of MDD outcome.
Model 1 consisted of IDS30 total score and Model 2 consisted of six proteins, three sociodemographic factors and three symptoms. The prediction models were applied to predict the probability of MDD outcome in: a the training set (86 first-episode MDD patients vs 86 subthreshold individuals who did not develop MDD within four years), and b the extrapolation test set (37 subthreshold individuals who developed MDD within two or four years vs 86 subthreshold individuals who did not develop MDD within four years). AUC area under the curve, IDS inventory of depressive symptomatology, MDD major depressive disorder, ROC receiver operating characteristic
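The AUC summarising a ROC curve equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen reference individual, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy scores are hypothetical and are not data from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, reference) pairs ranked correctly.

    scores_pos: predicted scores for individuals with the outcome.
    scores_neg: predicted scores for reference individuals.
    Ties contribute 0.5 to the count.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities for three cases and four references.
toy_cases = [0.9, 0.8, 0.4]
toy_references = [0.7, 0.3, 0.4, 0.1]

auc = roc_auc(toy_cases, toy_references)
print(auc)
```

An AUC of 0.5 corresponds to ranking cases and reference individuals at random, and 1.0 to ranking every case above every reference individual.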
Fig. 3 Disease indications of MDD represented by 12 features comprising Model 2.
The distribution of data for individuals in the training set patient group (first-episode MDD patients), the extrapolation test set patient group (subthreshold symptomatic individuals who developed MDD within two or four years), and the shared reference group (subthreshold symptomatic individuals who did not develop MDD within four years) is shown. Protein abundances are represented by the log2-transformed peptide abundance ratios. The severity of depressive symptoms is represented on a scale of 0 (none) to 3 (severe). Numeric features are illustrated using boxplots, and categorical features are illustrated using bar charts.