Kwang-Sig Lee, Byung-Joo Ham.
Abstract
This study reviews recent progress in machine learning for the early diagnosis of depression (major depressive disorder). The data source was 32 original studies in the Web of Science. The search terms were "depression" (title) and "random forest" (abstract). The eligibility criteria were: depression as the dependent variable; machine learning as the intervention (the decision tree, naïve Bayes, the random forest, the support vector machine and/or the artificial neural network); accuracy and/or the area under the receiver operating characteristic curve (AUC) for the early diagnosis of depression as the outcome; publication in 2000 or later; publication in English; and publication in an SCIE/SSCI journal. Different machine learning methods appear to be appropriate for different types of data for the early diagnosis of depression, e.g., logistic regression, the random forest, the support vector machine and/or the artificial neural network for numeric data, and the random forest for genomic data. Reported performance ranged from 60.1 to 100.0 for accuracy and from 64.0 to 96.0 for the AUC. Machine learning provides an effective, non-invasive decision support system for the early diagnosis of depression.
Keywords: Depression; Early diagnosis; Machine learning
Year: 2022 PMID: 36059048 PMCID: PMC9441463 DOI: 10.30773/pi.2022.0075
Source DB: PubMed Journal: Psychiatry Investig ISSN: 1738-3684 Impact factor: 3.202
Figure 1. Flow diagram.
Summary of review: methods, sample size, data type and performance measures
| ID | Methods | Sample size | Data type | Performance measures |
|---|---|---|---|---|
| 26 | Semisupervised RF | 115 | Numeric | RMSE 4.50 |
| 27 | RF | 1,549 | Numeric | Accuracy Validation 94.2 Test 93.3 |
| 28 | LR DT NB RF SVM ANN | 28,755 | Numeric | AUC RF 88.4 SVM 86.4 |
| 29 | AR EN-RF | 283 | Numeric | R2 EN-RF 0.25 AR 0.17 |
| 30 | LR RF | 22,131 | Numeric | Only Coefficient P-Values Reported |
| 31 | RF | 97 | Genomic | AUC 81.0 |
| 32 | Expert vs. RF (Feature Selection); SVM vs. RF (Classification) | 508 | Numeric | RF-RF AUC 78.0 Sensitivity 69.0 |
| 33 | RF | 126 | Genomic | Accuracy 87.3 |
| 34 | RF | 3,669 | Numeric | Accuracy 91.6 |
| 35 | Lasso RF SVM | 120 | Radiomic | Accuracy 95.0 90.0 100.0 |
| 36 | 10 Models | 620 | Numeric | RF Accuracy 89.0/91.0 Internal/External |
| 37 | 1 RF vs. 2 RFs | 135 | SNS | Early Risk Detection Error: 2 RFs 10.0% Lower vs. 1 RF |
| 38 | RF | 201 | Radiomic | Accuracy 82.4 |
| 39 | RF | 412 | Numeric | Accuracy 76.8/81.1 Imbalanced/Balanced Data |
| 40 | LR DT NB RF SVM ANN | 43 | EEG | Accuracy 90.24–97.56 |
| 41 | LR RF | 439 | Numeric | Only Coefficient P-Values Reported |
| 42 | LR RF ANN | 637 | Numeric | AUC 87.0–91.0 |
| 43 | LR RF | 150 | Genomic | Only Coefficient P-Values Reported |
| 44 | LR RF | 4,270 | Numeric | AUC 76.0–79.0 |
| 45 | RF | 111 | Numeric | Sensitivity 69.7, Specificity 76.8 |
| 46 | RF | 5,895 | Numeric | Sensitivity 86.7, Specificity 91.9 |
| 47 | EN (Feature Selection); RF (Classification) | 41 | Radiomic | Accuracy 85.4 |
| 48 | RF | 42 | Radiomic | Accuracy 71.0–78.0 |
| 49 | RF | 656 | Numeric | Sensitivity 95.0, Specificity 87.0 |
| 50 | LR RF SVM | 238 | Numeric | LR AUC 93.8 |
| 51 | Lasso RF SVM | 62 | Radiomic | Lasso SVM Accuracy 90.0 |
| 52 | RF SVM | 126 | Numeric | Accuracy RF 60.1 SVM 59.1 |
| 53 | LR RF SVM GB | 170 | Numeric | GB Accuracy 76.9 |
| 54 | LR RF GB | 573,634 | Numeric | Accuracy LR 71.7 RF 72.0 GB 72.2; AUC LR 71.7 RF 72.0 GB 72.2 |
| 55 | LR DT RF GB | 47 | Numeric | LR Accuracy 91.0 AUC 96.0 Sensitivity 92.9 Specificity 94.0 |
| 56 | RF | 84,317 | Numeric | AUC 78.9 Sensitivity 68.8–83.9 Specificity 76.0–92.2 |
| 57 | RF | 90 | Numeric | AUC 64.0 |
Different machine learning methods appear to be appropriate (i.e., show the best performance measures) for different types of data for the early diagnosis of depression: 1) logistic regression, the random forest, the support vector machine and/or the artificial neural network for numeric data; 2) the random forest for genomic data; 3) the random forest and/or the support vector machine for radiomic data; and 4) the random forest for social-network-service data. Reported performance ranged from 60.1 to 100.0 for accuracy, 68.8 to 95.0 for sensitivity, 76.0 to 94.0 for specificity, and 64.0 to 96.0 for the AUC.

ANN, artificial neural network; AR, autoregressive; AUC, area under the receiver operating characteristic curve; DT, decision tree; EEG, electroencephalogram; EN, elastic net; GB, gradient boosting; LR, logistic regression; NB, naïve Bayes; RF, random forest; RMSE, root mean squared error; SNS, social network service; SNP, single nucleotide polymorphism; SVM, support vector machine.
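The four performance measures summarized above can be illustrated with a minimal sketch. It assumes binary labels (1 = depression, 0 = no depression) and a 0 to 100 scale, which the table's values suggest but the source does not state explicitly. The toy labels, scores and the 0.5 decision threshold are hypothetical and are not taken from any reviewed study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, 1 = depression, 0 = no depression."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all cases classified correctly, in percent."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true depression cases that were detected, in percent."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return 100.0 * tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of non-depression cases correctly ruled out, in percent."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return 100.0 * tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Result is in percent.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted risk scores from some classifier.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
print("specificity:", round(specificity(y_true, y_pred), 1))
print("AUC:", auc(y_true, scores))
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas the AUC is computed from the ranking of the scores alone. This is one reason the measures in the table are not directly interchangeable across studies.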
Summary of review: important predictors and whether variable importance (VI) is reported
| ID | Important predictors | VI reported (1 = yes) | Participants/class/predictors |
|---|---|---|---|
| 26 | Cognitive-behavioral features | | Participants: 35 labeled, 80 unlabeled |
| 27 | Patient Health Questionnaire-9 items | | Participants: university students |
| 28 | Demographic, health-behavioral factors | | Participants: Pregnancy Risk Assessment Monitoring System enrollees |
| 29 | Comorbid psychopathology, symptom-related disability, treatment credibility, access to therapists, time spent using certain internet-intervention (deprexis) modules | 1 | |
| 30 | Pain-fatigue (symptom intensity scale), comorbidity | 1 | Participants: rheumatoid arthritis patients |
| 31 | 30 microbial markers (gut microbiota) | 1 | Predictors: 16S ribosomal RNA gene sequences |
| 32 | Psychological elasticity, depression during the third trimester, income level | 1 | Participants: women with delivery |
| 33 | Blood-derived methylome and transcriptome features | | |
| 34 | Upper body movements-postures | 1 | Participants: university students |
| 35 | 19 features of brain connectivity | | Participants: Parkinson’s disease patients |
| 36 | Demographic, health-behavioral factors | | Participants: 510/110 elders for internal/external validation |
| 37 | SNS-derived behavioral patterns | | |
| 38 | Brain connectivity within posterior cingulate cortex, within insula, between posterior cingulate cortex and insula/hippocampus-amygdala, between insula and precuneus, between superior parietal lobule and medial prefrontal cortex | 1 | Participants: 156 advanced Parkinson’s disease patients and 45 normal controls (predictors: 42 brain connectivity networks) |
| 39 | Fewer contacts, fewer calls, more messages | | |
| 40 | Higuchi’s fractal dimension, sample entropy | | |
| 41 | Fluoxetine more important than cognitive-behavioural therapy, two combined more important than one | | |
| 42 | Patient-reported immune-mediated inflammatory disease measures | | |
| 43 | SNPs rs12248560, rs878567, rs17710780 | 1 | Participants: 150 depression patients on 6-month regular therapy from the PsyCoLaus cohort (predictors: 44 SNPs in existing literature) |
| 44 | Psychometric properties of the General Health Questionnaire | | |
| 45 | Six cognitive-behavioral tasks | | Class: anxiety, depression or mixed vs. healthy |
| 46 | Motor activity recorded in a wearable device | | |
| 47 | Prefrontal cortical activation during working memory task anticipation | | Class: unipolar vs. bipolar depression |
| 48 | Cingulate isthmus asymmetry, pallidal asymmetry, ratio of the paracentral to precentral cortical thickness, ratio of lateral occipital to pericalcarine cortical thickness | 1 | Class: depression relapse after electroconvulsive therapy |
| 49 | 4–6 computerized-adaptive-diagnostic-test measures | | |
| 50 | Sex, age, medical insurance, marital status, education level, household income, pathological stage, psychosocial measures (Social Skills Rating System, Pittsburgh Sleep Quality Index, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire [QLQ-C30]) | | Participants: non-Hodgkin’s lymphoma patients with chemotherapy |
| 51 | Left precuneus, left precentral gyrus, left inferior frontal cortex (pars triangularis), left cerebellum | | |
| 52 | 120 behavioral patterns based on smartphone sensors including app adherence | | |
| 53 | Whole body kinematic cues | | |
| 54 | Age, race | | Participants: women with delivery |
| 55 | Physical activity and light exposure measured by a wearable device, sleep efficiency measured in a survey | | |
| 56 | Demographic, health-behavioral factors | | |
| 57 | Self-assessed cardiac-related fear, sex, number of words to answer the first homework assignment | 1 | Class: adherence to internet-delivered psychotherapy for myocardial infarction patients’ anxiety and depression |
The following predictors appear to be important variables for the early diagnosis of depression: comorbid psychopathology, symptom-related disability, treatment credibility, access to therapists, time spent using certain internet-intervention modules; pain-fatigue (symptom intensity scale), comorbidity; 30 microbial markers (gut microbiota); psychological elasticity, income level; upper body movements-postures; brain connectivity within posterior cingulate cortex, within insula, between posterior cingulate cortex and insula/hippocampus-amygdala, between insula and precuneus, between superior parietal lobule and medial prefrontal cortex; single-nucleotide polymorphisms (rs12248560, rs878567, rs17710780); cingulate isthmus asymmetry, pallidal asymmetry, ratio of the paracentral to precentral cortical thickness, ratio of lateral occipital to pericalcarine cortical thickness; self-assessed cardiac-related fear, sex, number of words to answer the first homework assignment for internet-delivered psychotherapy.

ANN, artificial neural network; AR, autoregressive; AUC, area under the receiver operating characteristic curve; DT, decision tree; EEG, electroencephalogram; EN, elastic net; GB, gradient boosting; LR, logistic regression; NB, naïve Bayes; RF, random forest; RMSE, root mean squared error; SNS, social network service; SNP, single nucleotide polymorphism; SVM, support vector machine.
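One common way to obtain a variable importance ranking of the kind flagged in the table is permutation importance: shuffle one predictor at a time and record how much the model's accuracy drops. The sketch below is a generic, model-agnostic illustration. The review does not state which importance measure each study used, so this is an assumed technique, not the reviewed authors' method. The `predict` rule, the feature names (`questionnaire_score`, `noise`) and the cut-off of 10 are all hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases classified correctly (0 to 1)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], int],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other predictors untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 is a questionnaire score, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 27), data_rng.uniform(0, 1)] for _ in range(200)]
y = [1 if row[0] >= 10 else 0 for row in X]


def predict(row: List[float]) -> int:
    """Stand-in for a trained classifier (hypothetical rule, cut-off 10)."""
    return 1 if row[0] >= 10 else 0


feature_names = ["questionnaire_score", "noise"]
for name, value in zip(feature_names, permutation_importance(predict, X, y)):
    print(f"{name}: {value:.3f}")
```

In this toy setting the noise column gets an importance of exactly zero because the stand-in model never reads it, while shuffling the questionnaire score degrades accuracy. With a real random forest, `predict` would be replaced by the fitted model's prediction function and the importance would ideally be measured on held-out data.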