Eun Kyung Choe¹, Hwanseok Rhee², Seungjae Lee², Eunsoon Shin², Seung-Won Oh³, Jong-Eun Lee², Seung Ho Choi⁴.
Abstract
The prevalence of metabolic syndrome (MS) in the nonobese population is not low. However, identifying MS and mitigating its risk are not easy in this population. We aimed to develop an MS prediction model using genetic and clinical factors of nonobese Koreans through machine learning methods. A prediction model for MS was designed for a nonobese population using clinical and genetic polymorphism information with five machine learning algorithms, including naïve Bayes classification (NB). The analysis was performed in two stages (training and test sets). Model A was designed with only clinical information (age, sex, body mass index, smoking status, alcohol consumption status, and exercise status), and model B added genetic information (10 polymorphisms) to model A. Of the 7,502 nonobese participants, 647 (8.6%) had MS. In the test set analysis, under the maximum sensitivity criterion, NB showed the highest sensitivity: 0.38 for model A and 0.42 for model B. The specificity of NB was 0.79 for model A and 0.80 for model B. When models A and B were compared using NB, model B (area under the receiver operating characteristic curve [AUC] = 0.69, clinical and genetic information input) performed better than model A (AUC = 0.65, clinical information only). We designed a prediction model for MS in a nonobese population using clinical and genetic information. This model might help convince nonobese individuals with MS to undergo health checks and adopt a preventive lifestyle.
Keywords: genetic polymorphism; machine learning; metabolic syndrome
Year: 2018 PMID: 30602092 PMCID: PMC6440667 DOI: 10.5808/GI.2018.16.4.e31
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
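The abstract compares models A and B by AUC (0.65 vs. 0.69). As a reminder of what that number measures, the sketch below computes AUC from predicted scores as the probability that a randomly chosen MS case receives a higher score than a randomly chosen non-case (the Mann-Whitney formulation), counting ties as one half. The function name `auc_rank` and the toy scores are hypothetical; the source does not state which software computed its AUCs.

```python
from typing import Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels use 1 for an MS case and 0 for a non-case; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 8 of the 9 case/non-case pairs are ordered correctly.
print(round(auc_rank([0.9, 0.8, 0.3, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]), 3))
```

The pairwise loop costs O(P × N) comparisons, which is acceptable at this study's scale (424 cases × 1,827 non-cases in the test set); a rank-based computation would be preferable for much larger data.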
Fig. 1. Study flow pipeline. SNP, single nucleotide polymorphism; MLP, multilayer perceptron; NB, naïve Bayes classification; RF, random forest classification; CT, decision tree classification; SVM, support vector machine classification; BCR, balanced classification rate.
Demographic features of the training and test set populations
| Characteristic | Training set (n = 5,251) | Test set (n = 2,251) | p-value |
|---|---|---|---|
| Age (yr) | 49.3 ± 10.5 | 52.1 ± 9.9 | <0.001 |
| Male sex | 2,078 (39.6) | 1,676 (74.5) | <0.001 |
| Body mass index (kg/m²) | 20.9 ± 1.7 | 24.0 ± 0.5 | <0.001 |
| Waist circumference (cm) | 76.7 ± 6.1 | 85.6 ± 4.1 | <0.001 |
| Triglycerides (mg/dL) | 85.3 ± 49.4 | 118.7 ± 70.8 | <0.001 |
| HDL cholesterol (mg/dL) | 57.5 ± 12.5 | 51.1 ± 10.6 | <0.001 |
| Fasting glucose level (mg/dL) | 94.5 ± 14.3 | 100.4 ± 16.0 | <0.001 |
| Systolic blood pressure (mm Hg) | 111.6 ± 12.7 | 117.6 ± 12.5 | <0.001 |
| Diastolic blood pressure (mm Hg) | 72.8 ± 9.7 | 78.0 ± 9.4 | <0.001 |
| Smoking status (current smoker) | 1,581 (30.1) | 1,202 (53.4) | <0.001 |
| Alcohol consumption ≥ 140 g/wk | 708 (13.5) | 613 (27.2) | <0.001 |
| Exercise (physically active) | 263 (5.0) | 144 (6.4) | 0.017 |
| Metabolic syndrome present | 223 (4.2) | 424 (18.8) | <0.001 |
Values are presented as mean ± standard deviation or number (%).
HDL, high-density lipoprotein.
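The table reports p-values, but this excerpt does not name the tests used. As an illustration only, assuming a Pearson chi-square test without continuity correction for the categorical rows, the sketch below recomputes the comparison of MS prevalence between the two sets (223/5,251 vs. 424/2,251). The function name `chi2_2x2` is hypothetical. For one degree of freedom the upper-tail probability equals erfc(√(x/2)), so only the standard library is needed.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, n1: int, b: int, n2: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) comparing a/n1 with b/n2.

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    table = [[a, n1 - a], [b, n2 - b]]  # rows: groups; columns: yes / no
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [a + b, total - a - b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi2_2x2(223, 5251, 424, 2251)  # MS present: training vs. test set
```

Under this assumption the statistic is roughly 425 and the p-value is far below 0.001, in line with the table's <0.001; the continuous rows (mean ± SD) would need a different test, such as a t-test, which is not sketched here.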
Information on the SNPs used in the algorithm
| SNP | Chromosome | Position | Overlapped gene | Representative trait of association |
|---|---|---|---|---|
| rs3764261 | chr16 | 56959412 | | Metabolic syndrome [ |
| rs247617 | chr16 | 56956804 | | Metabolic syndrome [ |
| rs2266788 | chr11 | 116789970 | | Metabolic syndrome [ |
| rs964184 | chr11 | 116778201 | | Metabolic syndrome [ |
| rs10830963 | chr11 | 92975544 | | Obesity-related traits [ |
| rs1260326 | chr2 | 27508073 | | Metabolic traits [ |
| rs10830962 | chr11 | 92965261 | | Metabolic syndrome [ |
| rs1883025 | chr9 | 104902020 | | Metabolic syndrome [ |
| rs1919128 | chr2 | 27578892 | | Metabolic syndrome [ |
| rs11757661 | chr6 | 88473861 | | Adipose tissue [ |
SNP, single nucleotide polymorphism; HDL, high-density lipoprotein.
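Most classifiers need the SNP attributes as numbers. This excerpt does not say how genotypes were encoded; a common choice, assumed here, is additive coding (0, 1, or 2 copies of an effect allele). The letters in `EFFECT_ALLELE` are placeholders, not the study's risk alleles, and the helper names `additive_dosage` and `encode_snps` are hypothetical.

```python
from typing import Dict, List, Optional

# Placeholder effect alleles for illustration only; the table above does not
# list risk alleles, so these letters are NOT the study's values.
EFFECT_ALLELE: Dict[str, str] = {
    "rs3764261": "A",
    "rs1260326": "T",
}


def additive_dosage(genotype: str, effect_allele: str) -> Optional[int]:
    """Return 0, 1, or 2 copies of effect_allele in a genotype such as 'AC'.

    Missing or malformed calls (e.g. '--', '00') return None.
    """
    g = genotype.upper()
    if len(g) != 2 or any(base not in "ACGT" for base in g):
        return None
    return g.count(effect_allele.upper())


def encode_snps(calls: Dict[str, str], snps: List[str]) -> List[Optional[int]]:
    """Encode one participant's calls in a fixed SNP order; absent SNPs become None."""
    return [additive_dosage(calls.get(rs, "--"), EFFECT_ALLELE[rs]) for rs in snps]


row = encode_snps({"rs3764261": "AC", "rs1260326": "TT"}, ["rs3764261", "rs1260326"])
```

Here `row` is `[1, 2]`. Treating each genotype as an unordered category is an equally plausible alternative; for naive Bayes the raw genotype strings can serve directly as category values.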
Performance comparison among respective machine learning algorithms for predicting metabolic syndrome presence
| Algorithm / model | Training AC (%) | Training SP | Training SN | Training F1 | Training BCR | Test AC (%) | Test SP | Test SN | Test F1 | Test BCR |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | | | | | | | | | | |
| Model A | 95.75 | 1 | 0 | 0 | 0 | 81.16 | 1 | 0 | 0 | 0 |
| Model B | 97.81 | 0.99 | 0.515 | 0.672 | 0.717 | 78.90 | 0.93 | 0.12 | 0.167 | 0.333 |
| NB | | | | | | | | | | |
| Model A | 94.24 | 0.98 | 0.12 | 0.147 | 0.337 | 71.43 | 0.79 | 0.38 | 0.334 | 0.289 |
| Model B | 94.44 | 0.98 | 0.13 | 0.167 | 0.351 | 73.24 | 0.80 | 0.42 | 0.360 | 0.327 |
| RF | | | | | | | | | | |
| Model A | 98.78 | 1 | 0.71 | 0.832 | 0.844 | 78.80 | 0.95 | 0.08 | 0.122 | 0.272 |
| Model B | 99.71 | 1 | 0.93 | 0.966 | 0.967 | 82.14 | 0.99 | 0.01 | 0.014 | 0.083 |
| CT | | | | | | | | | | |
| Model A | 95.75 | 1 | 0 | 0 | 0 | 81.16 | 1 | 0 | 0 | 0 |
| Model B | 95.66 | 1 | 0 | 0 | 0 | 82.20 | 1 | 0 | 0 | 0 |
| SVM | | | | | | | | | | |
| Model A | 95.75 | 1 | 0 | 0 | 0 | 81.16 | 1 | 0 | 0 | 0 |
| Model B | 95.66 | 1 | 0 | 0 | 0 | 82.20 | 1 | 0 | 0 | 0 |
AC, accuracy; SP, specificity; SN, sensitivity; F1, F1 score; BCR, balanced classification rate; MLP, multilayer perceptron; NB, Naïve Bayes classification; RF, random forest classification; CT, decision tree classification; SVM, support vector machine classification; SNP, single nucleotide polymorphism.
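The footnote names the metrics but not their formulas. The sketch below computes them from a confusion matrix under standard definitions, with AC in percent to match the table. Because the excerpt does not define BCR, two common candidates are returned: balanced accuracy, (SN + SP)/2, and the geometric mean, √(SN × SP). The CT and SVM rows reporting SN = 0 and BCR = 0 fit the geometric mean rather than balanced accuracy (which would give 0.5), and RF model A in training gives √(0.71 × 1) ≈ 0.84 against a reported 0.844, but this is an inference from the numbers, not a stated definition. `Confusion` and `metrics` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # MS cases predicted as MS
    fp: int  # non-cases predicted as MS
    tn: int  # non-cases predicted as non-MS
    fn: int  # MS cases predicted as non-MS


def _ratio(num: float, den: float) -> float:
    # Report 0.0 instead of raising when a denominator is empty
    return num / den if den else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    n = c.tp + c.fp + c.tn + c.fn
    sn = _ratio(c.tp, c.tp + c.fn)  # sensitivity (recall)
    sp = _ratio(c.tn, c.tn + c.fp)  # specificity
    precision = _ratio(c.tp, c.tp + c.fp)
    f1 = _ratio(2 * precision * sn, precision + sn)
    return {
        "AC": 100 * _ratio(c.tp + c.tn, n),  # percent, as in the table
        "SP": sp,
        "SN": sn,
        "F1": f1,
        "BCR_balanced_accuracy": (sn + sp) / 2,
        "BCR_geometric_mean": math.sqrt(sn * sp),
    }


# A classifier that labels every training participant as non-MS
train_all_negative = metrics(Confusion(tp=0, fp=0, tn=5251 - 223, fn=223))
test_all_negative = metrics(Confusion(tp=0, fp=0, tn=2251 - 424, fn=424))
```

The all-negative baseline reproduces the AC values in the CT and SVM model A rows (about 95.75 in training and 81.16 in test). Those rows therefore predict no MS at all, which is why SN, F1, and BCR are more informative than AC at 4.2% to 18.8% prevalence.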
Attributes for each model.
Model A: age, sex, body mass index, smoking, alcohol consumption, exercise
Model B: Model A + rs3764261, rs247617, rs2266788, rs964184, rs10830963, rs1260326, rs10830962, rs1883025, rs1919128, rs11757661 SNPs.
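The abstract identifies naive Bayes as the algorithm with the highest test-set sensitivity. The sketch below is a from-scratch categorical naive Bayes with add-alpha (Laplace) smoothing, paired with the Model A and Model B attribute lists above. It is not the authors' implementation: the source does not say which software or likelihood model was used, continuous attributes such as age and body mass index are assumed to be pre-binned into categories, and the column labels in `MODEL_A_FEATURES` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

# Hypothetical column labels; continuous attributes are assumed to be pre-binned.
MODEL_A_FEATURES = ["age_group", "sex", "bmi_group", "smoking", "alcohol", "exercise"]
MODEL_B_FEATURES = MODEL_A_FEATURES + [
    "rs3764261", "rs247617", "rs2266788", "rs964184", "rs10830963",
    "rs1260326", "rs10830962", "rs1883025", "rs1919128", "rs11757661",
]


class CategoricalNB:
    """Naive Bayes over categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, X: List[Sequence[Hashable]], y: List[int]) -> "CategoricalNB":
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row: Sequence[Hashable]) -> Dict[int, float]:
        total = sum(self.class_counts.values())
        log_post = {}
        for c in self.classes:
            # log prior + sum of log smoothed likelihoods (naive independence)
            lp = math.log(self.class_counts[c] / total)
            for j, v in enumerate(row):
                k = len(self.values[j])
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * k
                lp += math.log(num / den)
            log_post[c] = lp
        # normalize in log space (log-sum-exp) to avoid underflow
        top = max(log_post.values())
        z = sum(math.exp(v - top) for v in log_post.values())
        return {c: math.exp(v - top) / z for c, v in log_post.items()}


# Toy data with two attributes: two MS cases ('M', 'yes'), two non-cases ('F', 'no')
X = [("M", "yes"), ("M", "yes"), ("F", "no"), ("F", "no")]
y = [1, 1, 0, 0]
proba = CategoricalNB().fit(X, y).predict_proba(("M", "yes"))
```

On this toy data `proba[1]` is 0.9. Working in log space matters once Model B multiplies 16 per-attribute likelihoods instead of Model A's 6.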