Yuta Takahashi1,2,3, Masao Ueki4,5, Makoto Yamada5, Gen Tamiya6,4,5, Ikuko N Motoike4,7, Daisuke Saigusa6,4, Miyuki Sakurai4, Fuji Nagami4, Soichi Ogishima6,4, Seizo Koshiba6,4, Kengo Kinoshita4,7,8, Masayuki Yamamoto6,4, Hiroaki Tomita9,10,11.
Abstract
To address major limitations of algorithms for the metabolite-based prediction of psychiatric phenotypes, a novel prediction model for depressive symptoms was developed. It is based on a nonlinear feature selection machine learning algorithm, the Hilbert–Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso), and was applied to a metabolomic dataset with the largest sample size to date. In total, 897 population-based subjects were recruited from communities affected by the Great East Japan Earthquake. Prediction models for depressive symptoms, as evaluated by the Center for Epidemiologic Studies-Depression Scale (CES-D), were built from 306 metabolite features: 37 metabolites identified by nuclear magnetic resonance measurements and 269 metabolites characterized by mass spectrometry intensities. Nested fivefold cross-validation was used to develop and evaluate the prediction models. The HSIC Lasso-based prediction model showed better predictive power than the other prediction models, including Lasso, support vector machine, partial least squares, random forest, and neural network. L-leucine, 3-hydroxyisobutyrate, and gamma-linolenyl carnitine frequently contributed to the prediction. We have demonstrated that the HSIC Lasso-based prediction model, which integrates nonlinear feature selection, showed improved predictive power for depressive symptoms based on metabolome data and identified risk metabolites based on nonlinear statistics in the Japanese population. Further studies should apply HSIC Lasso-based prediction models to populations of different ethnicities to investigate whether each risk metabolite generalizes as a predictor of depressive symptoms.
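The abstract names HSIC Lasso as the nonlinear feature-selection step. As a rough illustration only, the NumPy sketch below follows the commonly described formulation. Each feature and the response get a Gaussian Gram matrix that is double-centered and Frobenius-normalized. Non-negative weights are then fitted by ℓ1-penalized least squares between the output Gram matrix and a weighted sum of the feature Gram matrices. Several choices here are assumptions for this sketch and not the authors' implementation: the kernel width, the z-scoring, the penalty `lam`, the coordinate-descent solver, and the helper names `centered_gram` and `hsic_lasso`. The data are synthetic.

```python
import numpy as np


def centered_gram(x, sigma=1.0):
    """Gaussian Gram matrix of one variable, double-centered and Frobenius-normalized."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    x = (x - x.mean(axis=0)) / x.std(axis=0)          # z-score (assumption)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    kc = h @ k @ h
    return kc / np.linalg.norm(kc)                     # unit Frobenius norm


def hsic_lasso(X, y, lam=0.01, n_iter=200):
    """Hypothetical helper: non-negative Lasso on vectorized centered Gram matrices.

    Minimizes 0.5 * ||L - sum_k alpha_k K_k||_F^2 + lam * sum_k alpha_k, alpha >= 0.
    """
    d = X.shape[1]
    L = centered_gram(y)
    Ks = [centered_gram(X[:, k]) for k in range(d)]
    # Frobenius inner products stand in for the explicit (n^2 x d) design matrix.
    G = np.array([[np.sum(Ks[j] * Ks[k]) for k in range(d)] for j in range(d)])
    c = np.array([np.sum(K * L) for K in Ks])
    alpha = np.zeros(d)
    for _ in range(n_iter):
        for k in range(d):
            # Fit of feature k to the residual left by all other features.
            rho = c[k] - G[k] @ alpha + G[k, k] * alpha[k]
            alpha[k] = max(0.0, (rho - lam) / G[k, k])
    return alpha


# Synthetic demo: y depends on feature 0 (near-linear) and feature 1 (quadratic).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
alpha = hsic_lasso(X, y, lam=0.01)
print(np.round(alpha, 3))
```

Features with a nonzero weight are the selected ones. In a two-stage model such as HSIC Lasso + SVM/KR, only those features would then be passed to the downstream predictor.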
Year: 2020 PMID: 32427830 PMCID: PMC7237664 DOI: 10.1038/s41398-020-0831-9
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Model characteristics.
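| | Our model (HSIC Lasso + SVM/KR) | HSIC Lasso | Lasso | SVM/KR | Random forest | Partial least squares | Sparse partial least squares | Multiple linear/logistic regression |
|---|---|---|---|---|---|---|---|---|
| Prediction | **Performed** | Not performed | **Performed** | **Performed** | **Performed** | **Performed** | **Performed** | **Performed** |
| Feature selection | **Performed** | **Performed** | **Performed** | Not performed | Not performed | Not performed | **Performed** | Not performed |
| Nonlinear association between variables | **Detectable** | **Detectable** | Not detectable | **Detectable** | **Detectable** | Not detectable | Not detectable | Not detectable |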
Notes: Preferable properties for handling metabolomic data are in bold.
HSIC Lasso Hilbert–Schmidt Independence Criterion Lasso, SVM support vector machine, KR kernel regression.
Demographic information.
| Characteristic | High CES-D | Low CES-D | P valueᵃ |
|---|---|---|---|
| CES-D range | ≥16 | ≤15 | |
| CES-D, mean (SD) | 22.2 (6.4) | 9.8 (3.8) | 1.26 × 10⁻¹⁰⁶ |
| Subjects | 298 | 599 | |
| Percentage of females | 64.0% | 54.5% | 7.87 × 10⁻³ |
| Age, mean (SD) | 56.8 (11.7) | 58.2 (11.6) | 0.105 |
| BMI, mean (SD) | 23.50 (4.13) | 23.49 (3.31) | 0.972 |
| Marital status, n (%) | | | 7.28 × 10⁻³ |
| Married | 226 (75.83%) | 510 (85.14%) | |
| Widowed | 26 (8.72%) | 33 (5.50%) | |
| Divorced | 18 (6.04%) | 19 (3.17%) | |
| Single | 28 (9.39%) | 37 (6.17%) | |
| House damage from the GEJE, n (%) | | | 8.80 × 10⁻⁵ |
| Total collapse | 75 (25.2%) | 75 (12.5%) | |
| Large-scale damage | 36 (12.1%) | 74 (12.3%) | |
| Half-scale damage | 38 (12.8%) | 82 (13.6%) | |
| Small-scale damage | 99 (33.2%) | 239 (39.8%) | |
| No damage | 50 (16.8%) | 129 (21.5%) | |
| Medication, n (%) | | | |
| Antidepressants | 9 (3.0%) | 0 (0.0%) | 4.54 × 10⁻⁵ |
| Hypnotics | 57 (19.1%) | 18 (3.0%) | 2.43 × 10⁻¹⁵ |
| Anxiolytics | 94 (31.5%) | 21 (3.5%) | 1.22 × 10⁻³⁰ |
| LSNS-6 score, mean (SD) | 14.0 (5.8) | 16.23 (5.4) | 2.30 × 10⁻⁸ |
| Social capital score, mean (SD) | 5.7 (2.9) | 4.44 (2.4) | 5.27 × 10⁻¹² |
| Gap time between the GEJE and measurement of CES-D (months), mean (SD) | 27.3 (1.0) | 27.4 (1.0) | 0.111 |
| Self-reported PTSD symptomsᵇ, n (%) | | | |
| 1. Intrusive images or nightmares | 102 (34.2%) | 49 (8.1%) | 1.91 × 10⁻²¹ |
| 2. Emotionally upset when reminded of the GEJE | 102 (34.2%) | 50 (8.3%) | 5.63 × 10⁻²¹ |
| 3. Physiological reactions when reminded of the GEJE | 45 (15.1%) | 16 (2.6%) | 2.84 × 10⁻¹¹ |
| 4. Avoidance of reminders associated with the GEJE | 86 (28.8%) | 67 (11.1%) | 1.32 × 10⁻¹⁰ |
| 5. Interference with everyday life | 36 (12.0%) | 8 (1.3%) | 1.41 × 10⁻¹¹ |
CES-D Center for Epidemiologic Studies—Depression Scale, SD standard deviation, BMI body mass index, PTSD posttraumatic stress disorder, GEJE the 2011 Great East Japan Earthquake and Tsunami.
ᵃ P values were calculated using Student’s t tests for CES-D, age, BMI, LSNS-6 score, social capital score, and the gap time between the 2011 Great East Japan Earthquake and the CES-D measurement. P values were calculated using Fisher’s exact tests for the percentage of females, marital status, house damage from the 2011 Great East Japan Earthquake and Tsunami, medication, and self-reported PTSD symptoms.
ᵇ Self-reported PTSD symptoms indicate the number of subjects who answered “Yes” to the following questions in the questionnaire: “Below is a list of problems that people sometimes have after experiencing a traumatic event. Have you experienced the following problems two times or more within one week? 1. Unwanted upsetting memories about the GEJE or bad dreams or nightmares related to the GEJE. 2. Feeling very emotionally upset when reminded of the GEJE. 3. Having physical reactions when reminded of the GEJE (for example, sweating or heart racing). 4. Trying to avoid thoughts or feelings related to the GEJE or trying to avoid activities, situations, or places that remind you of the GEJE or that feel more dangerous since the GEJE. 5. The difficulties have been interfering with your everyday life.” These questions were based on the report by Itoh et al.[46], which validated a new short version of the Posttraumatic Diagnostic Scale[47] among Japanese people.
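Footnote a states that the categorical comparisons used Fisher’s exact tests. The pure-Python sketch below recomputes the antidepressant P value from the counts in the table. It assumes a common two-sided convention: sum the probabilities of all tables with the same margins that are no more likely than the observed one. `fisher_exact_two_sided` is a hypothetical helper name. Multi-category rows such as marital status would need a larger contingency table, which this 2 × 2 sketch does not cover.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k with all margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Antidepressant use: 9 of 298 subjects (high CES-D) vs 0 of 599 subjects (low CES-D).
p_antidep = fisher_exact_two_sided(9, 298 - 9, 0, 599 - 0)
print(f"{p_antidep:.3g}")
```

Python's exact integer arithmetic in `math.comb` avoids overflow for these sample sizes. The final true division of large integers still returns a correctly rounded float.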
Fig. 1 Predictive power for quantitative CES-D scores.
Boxplots show the predictive power of each prediction model across the fivefold cross-validation, with CES-D scores as the response variable and metabolites and other covariates as predictive variables. Abbreviations: CES-D Center for Epidemiologic Studies-Depression Scale, HSIC Hilbert–Schmidt independence criterion, Lasso least absolute shrinkage and selection operator, KR kernel regression, SPLS sparse partial least squares, KR P < 0.05 kernel regression using variables with P < 0.05, Lasso + KR kernel regression using variables selected by Lasso, MLR P < 0.05 multiple linear regression using variables with P < 0.05, PLS partial least squares, MLR all multiple linear regression using all variables, KR covariates kernel regression using covariates only, MLR covariates multiple linear regression using covariates only, PCC predictive correlation coefficient.
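The caption describes predictive correlation coefficients (PCC) collected over fivefold cross-validation, and the abstract specifies a nested scheme. The NumPy sketch below shows one way such a nested loop can be organized. It assumes ridge regression as a stand-in model and uses a hypothetical penalty grid `lams`. The inner five folds choose the penalty using only the outer-training data. Each outer test fold yields one PCC, so a model contributes five values to a boxplot. This is not the study’s HSIC Lasso + KR pipeline, and the data are synthetic.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return w, ym - xm @ w


def pcc(pred, obs):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])


def kfold_indices(n, k, rng):
    return np.array_split(rng.permutation(n), k)


def nested_cv_pcc(X, y, lams=(0.1, 1.0, 10.0, 100.0), k=5, seed=0):
    rng = np.random.default_rng(seed)
    outer_scores = []
    for test_idx in kfold_indices(len(y), k, rng):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        inner_folds = kfold_indices(len(ytr), k, rng)

        def inner_score(lam):
            # Inner loop: tune the penalty on the outer-training data only.
            scores = []
            for val in inner_folds:
                fit = np.setdiff1d(np.arange(len(ytr)), val)
                w, b = ridge_fit(Xtr[fit], ytr[fit], lam)
                scores.append(pcc(Xtr[val] @ w + b, ytr[val]))
            return np.mean(scores)

        best = max(lams, key=inner_score)
        w, b = ridge_fit(Xtr, ytr, best)
        # The outer test fold is never seen during tuning.
        outer_scores.append(pcc(X[test_idx] @ w + b, y[test_idx]))
    return outer_scores


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)
scores = nested_cv_pcc(X, y)
print([round(s, 2) for s in scores])
```

Keeping tuning inside the inner loop prevents the outer PCC from being optimistically biased by hyperparameter selection.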
Fig. 2 Predictive power for binary CES-D traits.
Boxplots show the predictive power of each prediction model across the fivefold cross-validation, with binary CES-D traits as the response variable and metabolites and other covariates as predictive variables. Abbreviations: CES-D Center for Epidemiologic Studies-Depression Scale, HSIC Hilbert–Schmidt independence criterion, Lasso least absolute shrinkage and selection operator, SVM support vector machine, SPLS sparse partial least squares, MLR P < 0.05 multiple logistic regression using variables with P < 0.05, SVM P < 0.05 support vector machine using variables with P < 0.05, Lasso + SVM support vector machine using variables selected by Lasso, PLS partial least squares, MLR all multiple logistic regression using all variables, SVM covariates support vector machine using covariates only, MLR covariates multiple logistic regression using covariates only, AUC area under the curve.
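For the binary traits (CES-D ≥16 versus ≤15), predictive power is summarized as the area under the ROC curve. The NumPy sketch below computes it using the equivalence between the AUC and the normalized Mann–Whitney statistic. The AUC is the fraction of (high, low) pairs in which the high-CES-D subject receives the larger score, with ties counted as one half. The helper name and the toy scores are hypothetical and are not study data.

```python
import numpy as np


def auc_rank(scores, labels):
    """AUC as P(random positive scores above random negative), ties counting 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Pairwise comparison; O(n_pos * n_neg) memory is fine for a few hundred subjects.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical decision scores for one outer test fold (1 = high CES-D).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.5, 0.1, 0.2]
print(auc_rank(scores, labels))
```

Here 11 of the 12 positive-negative pairs are ordered correctly. Only the pair (0.4, 0.5) is not, so the AUC is 11/12.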
Frequently selected metabolites in feature selection models by fivefold cross-validation and their P values and regression coefficients in multiple regression analyses.
| | CES-D score | | | | | Binary CES-D traits | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | HSIC Lasso + KR | Lasso | Multiple linear regression (P < 0.05) | Pᵃ | RCᵃ | HSIC Lasso + SVM | Lasso | Multiple logistic regression (P < 0.05) | Pᵃ | RCᵃ |
| Number of selected metabolites for prediction (mean ± standard deviation) | 9.6 ± 2.8 | 13.4 ± 6.4 | 16.0 ± 4.0 | NA | NA | 9.4 ± 6.2 | 23.4 ± 6.7 | 26.4 ± 8.0 | NA | NA |
| 3-Hydroxyisobutyrate | 6.67 × 10⁻⁴ | −0.832 | 1.78 × 10⁻⁴ | −0.320 | ||||||
| NMR | ||||||||||
| Gamma-linolenyl carnitine | 1.44 × 10⁻³ | −0.758 | 5.20 × 10⁻³ | −0.243 | ||||||
| MS C18 | ||||||||||
|
| 3/5 | 3.52 × 10⁻³ | −0.753 | 1.06 × 10⁻³ | −0.286 | |||||
| MS C18 | ||||||||||
| Uric acid | 0/5 | 3.03 × 10⁻⁴ | −1.024 | 0/5 | 9.61 × 10⁻³ | −0.239 | ||||
| MS C18 | ||||||||||
|
| 1/5 | 1/5 | 9.19 × 10⁻³ | −0.679 | 0/5 | 0/5 | 1.82 × 10⁻² | −0.206 | ||
| MS C18 | ||||||||||
Abbreviations: CES-D Center for Epidemiologic Studies-Depression Scale, HSIC Lasso Hilbert–Schmidt independence criterion lasso, KR kernel regression, RC regression coefficients, SVM support vector machine, NMR nuclear magnetic resonance spectroscopy, MS C18 mass spectrometry in C18 mode.
Frequencies are shown as n/5, meaning that the metabolite was selected for prediction in n of the five replicated feature selections.
ᵃ P values and regression coefficients are adjusted for sex, age, body mass index, marital status, the degree of damage from the Great East Japan Earthquake, antidepressant use, Lubben Social Network Scale 6 score, and social capital scale score in multiple linear/logistic regression.
ᵇ In the jMorp metabolomic database, L-gamma-glutamyl-L-leucine and L-gamma-glutamyl-L-isoleucine were not distinguished for one of the features selected by P values from multiple regression. Standard reagents, i.e., H-Glu(Leu-OH)-OH (BACHEM, Bubendorf, Switzerland) and L-gamma-glutamyl-L-isoleucine (Santa Cruz Biotechnology, Heidelberg, Germany), were therefore used to identify the detected metabolite for this feature as L-gamma-glutamyl-L-leucine in the current study.
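Footnote a describes metabolite P values and regression coefficients adjusted for covariates. For the linear (CES-D score) case, the covariates enter as additional columns of an ordinary least squares design, and the metabolite’s coefficient is read off that fit. The sketch below makes several assumptions. It uses NumPy and synthetic data with only two of the listed covariates (age and sex). It also uses a normal approximation to the t distribution for the P value, which is close when the residual degrees of freedom are in the hundreds. The logistic model for the binary traits is not shown, and `adjusted_coefficient` is a hypothetical helper.

```python
import math

import numpy as np


def adjusted_coefficient(y, metabolite, covariates):
    """OLS coefficient, standard error, and approximate two-sided P for `metabolite`,
    adjusted for the columns of `covariates` (hypothetical helper)."""
    n = len(y)
    X = np.column_stack([np.ones(n), metabolite, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = math.sqrt(cov[1, 1])
    t = beta[1] / se
    # Normal approximation to the t distribution (assumption; fine for large dof).
    p = math.erfc(abs(t) / math.sqrt(2))
    return float(beta[1]), se, p


rng = np.random.default_rng(2)
n = 897
age = rng.normal(57, 12, n)
sex = rng.integers(0, 2, n)
metab = rng.normal(size=n) + 0.02 * (age - 57)   # metabolite mildly confounded by age
cesd = 12 - 0.8 * metab + 0.05 * (age - 57) + 2 * sex + rng.normal(0, 3, n)
coef, se, p = adjusted_coefficient(cesd, metab, np.column_stack([age, sex]))
print(f"RC = {coef:.3f}, SE = {se:.3f}, P ~ {p:.2e}")
```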
Dependencies among CES-D score, metabolites, and covariates based on HSIC statistics.
| | Response variable | Metabolites selected in both nonlinear and linear models | | | Metabolites selected only in linear models | Metabolite selected only by |
|---|---|---|---|---|---|---|
| | CES-D score | 3-Hydroxyisobutyrate (NMR) | Gamma-linolenyl carnitine (MS C18) | (MS C18) | Uric acid (MS C18) | (MS C18) |
| Sex | 0.9 | 5.6 | 0.7 | 15.7 | 29.5 | 15.8 |
| Age | 1.0 | 1.3 | 1.5 | 1.5 | 1.7 | 3.0 |
| BMI | 1.0 | 2.3 | 0.3 | 3.6 | 4.7 | 2.8 |
| Marital status | 0.6 | 0.1 | 0.5 | 0.2 | 0.1 | 0.3 |
| Damage from the Great East Japan Earthquake | 0.9 | 0.5 | 0.1 | 0.6 | 0.5 | 0.5 |
| Antidepressants | 0.4 | 0.3 | 0.1 | 0.2 | 0.0 | 0.1 |
| LSNS-6 | 1.6 | 0.2 | 0.2 | 0.3 | 0.5 | 0.1 |
| Social capital | 1.4 | 0.6 | 0.4 | 0.3 | 0.2 | 0.2 |
| Sum | 7.2 | 11.1 | 4.1 | 22.7 | 37.6 | 23.2 |
| 3-Hydroxyisobutyrate (NMR) | 1.2 | NA | 0.6 | 6.9 | 2.7 | 5.7 |
| Gamma-linolenyl carnitine (MS C18) | 0.9 | 0.6 | NA | 1.0 | 0.7 | 2.3 |
| (MS C18) | 0.9 | 6.9 | 1.0 | NA | 6.1 | 24.4 |
| Uric acid (MS C18) | 0.3 | 2.7 | 0.7 | 6.1 | NA | 8.1 |
| (MS C18) | 0.7 | 5.7 | 2.3 | 24.4 | 8.1 | NA |
Abbreviations: HSIC Hilbert–Schmidt independence criterion, CES-D Center for Epidemiologic Studies-Depression Scale, BMI body mass index, LSNS-6 Lubben Social Network Scale 6, NMR nuclear magnetic resonance spectroscopy, MS C18 mass spectrometry in C18 mode.
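The table above reports pairwise dependencies as HSIC statistics. The excerpt does not state the estimator, kernel, bandwidth, or scaling behind the reported values, so the NumPy sketch below is only a generic illustration. It computes the biased empirical HSIC, tr(KHLH)/n², with Gaussian kernels on z-scored variables (an assumption). The example shows how HSIC can register a nonlinear dependence, here y = x², that a Pearson correlation close to zero would miss. The function names are hypothetical and the data are synthetic.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian Gram matrix of a 1-D variable after z-scoring (assumption)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, tr(K H L H) / n^2, with Gaussian kernels."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    k, l = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / n ** 2


rng = np.random.default_rng(3)
x = rng.normal(size=300)
y_dep = x ** 2 + 0.1 * rng.normal(size=300)  # nonlinear, nearly uncorrelated with x
y_ind = rng.normal(size=300)                 # independent of x
print(np.corrcoef(x, y_dep)[0, 1], hsic(x, y_dep), hsic(x, y_ind))
```

Because both centered Gram matrices are positive semidefinite, this estimator is never negative. For independent variables it shrinks toward zero roughly in proportion to 1/n.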