Ruhai Dou, Weijia Gao, Qingmin Meng, Xiaotong Zhang, Weifang Cao, Liangfeng Kuang, Jinpeng Niu, Yongxin Guo, Dong Cui, Qing Jiao, Jianfeng Qiu, Linyan Su, Guangming Lu.
Abstract
Diagnosing pediatric bipolar disorder (PBD) by clinical assessment alone can lead to misdiagnosis. In recent years, machine learning (ML) methods have been introduced for classifying bipolar disorder (BD) and have proved helpful in its diagnosis. In this study, cortical thickness and subcortical volume were extracted from the magnetic resonance imaging (MRI) data of 33 PBD-I patients and 19 age- and sex-matched healthy controls (HCs) and used as classification features. A dimensionality-reduced feature subset, filtered by Lasso or f_classif, was passed to six classifiers (logistic regression (LR), support vector machine (SVM), random forest, naïve Bayes, k-nearest neighbor, and AdaBoost), which were then trained and tested. Among the classifiers, the two with the highest accuracy were LR (84.19%) and SVM (82.80%). Feature selection across the six algorithms identified the most important variables, including the right middle temporal gyrus and bilateral pallidum, consistent with the structural and functional abnormalities reported in these regions in PBD patients. These findings take the computer-aided diagnosis of BD a step forward.
Keywords: adaptive boosting classifier; k-nearest neighbor; logistic regression; machine learning; naïve Bayes; pediatric bipolar disorder; random forest; support vector machine
Year: 2022 PMID: 36082304 PMCID: PMC9445985 DOI: 10.3389/fncom.2022.915477
Source DB: PubMed Journal: Front Comput Neurosci ISSN: 1662-5188 Impact factor: 3.387
FIGURE 1. Workflow of machine learning in our work. The six classifiers are logistic regression (LR), support vector machine (SVM), random forest (RF), naïve Bayes (NB), k-nearest neighbor (kNN), and AdaBoost.
Demographic and clinical characteristics of the PBD-I and HC groups.
| Characteristics | PBD-I (n = 33) | Healthy controls (n = 19) | Statistic (χ²/t) | p value |
| Gender (M/F) | 18/15 | 9/10 | 1.51 | 0.21 |
| Age (years) | 15.12 ± 1.84 | 14.15 ± 1.57 | 1.90 | 0.06 |
| Education (years) | 8.30 ± 1.91 | 7.47 ± 2.22 | 1.42 | 0.16 |
| Onset age (years) | 13.69 ± 1.82 | – | | |
| Illness duration (months) | 18.69 ± 13.43 | – | | |
| Number of episodes | 3.57 ± 2.31 | – | | |
| SCWT1 | 52 ± 15.12 | 66 ± 12.25 | | 0.001 |
| SCWT2 | 67 ± 18.66 | 88 ± 9.07 | | <0.001 |
| SCWT3 | 31 ± 8.04 | 41 ± 9.42 | | <0.001 |
Data are shown as mean ± standard deviation. SCWT, Stroop color-word test.
#Pearson chi-square test.
∧Two-sample t-test.
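The t statistics in the table can be roughly checked from the reported means and standard deviations alone. The sketch below assumes the pooled-variance (Student) form of the two-sample t-test, which the source does not specify, and takes the group sizes (33 and 19) from the abstract; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic from summary statistics.

    Assumes equal variances in both groups (an assumption, not stated in the source).
    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Means and SDs as reported in the table; n = 33 (PBD-I) and n = 19 (HC).
t_age, df = pooled_t(15.12, 1.84, 33, 14.15, 1.57, 19)
t_edu, _ = pooled_t(8.30, 1.91, 33, 7.47, 2.22, 19)

print(f"age:       t = {t_age:.2f}, df = {df}")
print(f"education: t = {t_edu:.2f}, df = {df}")
```

Under these assumptions the education value comes out at about 1.42, matching the table, and the age value at about 1.93, close to the reported 1.90; the small gap may simply reflect rounding of the published means and standard deviations.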
FIGURE 2. Feature selection. (A) Eight features selected by Lasso; (B) eight features selected by f_classif.
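For intuition about the f_classif filter: it scores each feature with a one-way ANOVA F statistic between the class labels and that feature, and the highest-scoring features are kept. The sketch below computes that statistic in plain Python on a tiny synthetic data set. The feature names, values, and the `anova_f` helper are hypothetical and are not the study's data or code.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic example: 1 = patient, 0 = control (illustrative values only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feature_a": [2.1, 2.3, 2.2, 2.4, 3.0, 3.1, 2.9, 3.2],  # groups well separated
    "feature_b": [1.0, 2.0, 1.5, 2.5, 1.2, 2.2, 1.4, 2.4],  # groups overlap
}

f_scores = {name: anova_f(vals, labels) for name, vals in features.items()}
ranked = sorted(f_scores, key=f_scores.get, reverse=True)

top_k = 1  # the study kept eight features; one is enough for this toy example
selected = ranked[:top_k]
print(f_scores, selected)
```

With two classes, this F statistic equals the square of the pooled two-sample t statistic, so the filter effectively ranks features by how strongly their group means differ relative to within-group spread.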
Accuracy, sensitivity, specificity, and AUC of each classifier with different feature sets, calculated using two-fold cross-validation repeated four times.
| Algorithm | Features | Accuracy (%) | Sensitivity (%) | Specificity (%) | AUC |
| LR | A | 82.24 | 89.26 | 70.11 | 0.79 |
| LR | B | 84.19 | 93.91 | 67.32 | 0.80 |
| LR | C | 85.41 | 90.85 | 76.89 | 0.83 |
| SVM | A | 80.08 | 83.49 | 73.57 | 0.78 |
| SVM | B | 82.80 | 91.20 | 68.13 | 0.79 |
| SVM | C | 84.57 | 87.89 | 77.29 | 0.82 |
| RF | A | 79.53 | 90.90 | 57.94 | 0.74 |
| RF | B | 84.59 | 91.13 | 70.27 | 0.81 |
| RF | C | 81.86 | 89.17 | 68.61 | 0.78 |
| NB | A | 78.72 | 90.40 | 58.20 | 0.74 |
| NB | B | 83.56 | 91.20 | 71.01 | 0.80 |
| NB | C | 83.64 | 90.29 | 71.76 | 0.81 |
| kNN | A | 77.74 | 91.66 | 55.04 | 0.73 |
| kNN | B | 78.70 | 89.84 | 59.64 | 0.74 |
| kNN | C | 81.86 | 88.92 | 68.47 | 0.78 |
| AdaBoost | A | 77.66 | 85.79 | 64.10 | 0.74 |
| AdaBoost | B | 78.24 | 84.35 | 67.50 | 0.75 |
| AdaBoost | C | 77.33 | 84.96 | 64.81 | 0.74 |
Classification indices were obtained with eight features (A), six features (B), and the six features combined with Stroop color-word test (SCWT) scores (C); all indices except AUC are percentages.
*A: Classification using eight features, including cortical thickness of MTG.R, LOG.L, PosCG.R, bilateral TTG, and gray matter volume of AMG.R and bilateral pallidum.
B: Classification using six features, including cortical thickness of MTG.R, bilateral TTG, and gray matter volume of AMG.R and bilateral pallidum.
C: Classification using features combining the structural MRI indices of the six brain regions and SCWT scores.
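The evaluation scheme behind the table (two-fold cross-validation repeated four times, giving eight test evaluations per classifier) can be sketched as follows. Several parts are assumptions made for illustration: the folds are stratified by class, the data are synthetic, and the classifier is a hypothetical one-feature threshold rule standing in for the six real classifiers. Only the group sizes (33 and 19) come from the source.

```python
import random
from statistics import mean

def stratified_two_fold(labels, rng):
    """Split sample indices into two folds with a similar class ratio in each."""
    folds = ([], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        folds[0].extend(idx[:half])
        folds[1].extend(idx[half:])
    return folds

def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); class 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)

def fit_threshold(x, y):
    """Hypothetical stand-in classifier: cut halfway between the two class means."""
    m1 = mean(v for v, lab in zip(x, y) if lab == 1)
    m0 = mean(v for v, lab in zip(x, y) if lab == 0)
    cut, sign = (m1 + m0) / 2, (1 if m1 >= m0 else -1)
    return lambda v: 1 if sign * (v - cut) >= 0 else 0

rng = random.Random(0)
labels = [1] * 33 + [0] * 19  # 1 = PBD-I, 0 = HC
# Synthetic one-dimensional feature; NOT the study's data.
x = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

scores = []
for repeat in range(4):                      # four repetitions
    folds = stratified_two_fold(labels, rng)
    for test_fold in (0, 1):                 # each fold serves once as the test set
        test_idx, train_idx = folds[test_fold], folds[1 - test_fold]
        predict = fit_threshold([x[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        scores.append(confusion_metrics([labels[i] for i in test_idx],
                                        [predict(x[i]) for i in test_idx]))

accuracy, sensitivity, specificity = (mean(col) for col in zip(*scores))
print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
```

Sensitivity here is the fraction of patients correctly identified and specificity the fraction of controls correctly identified. With 33 patients and 19 controls, a classifier that favors the majority class can reach high sensitivity while specificity stays low, which is the pattern visible in several rows of the table.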
FIGURE 3. Importance values of the eight features selected by Lasso (A–D) and f_classif (E,F). The three most important features affecting classification accuracy (MTG.R, Pallidum.R, and Pallidum.L) are marked in red.
FIGURE 4. Feature importance ranking for the six features. (A–D) Six features selected by Lasso; (E,F) six features selected by f_classif. The three features marked in red (MTG.R, Pallidum.R, and Pallidum.L) have the largest effects on classification accuracy.