Hyejin Park, Kisok Kim.
Abstract
Lead, mercury, and cadmium are common environmental pollutants in industrialized countries, but their combined impact on hypercholesterolemia (HC) is poorly understood. The aim of this study was to compare the performance of various machine learning (ML) models in predicting the prevalence of HC associated with exposure to lead, mercury, and cadmium. A total of 10,089 participants of the Korea National Health and Nutrition Examination Surveys 2008-2013 were selected, and their demographic characteristics, blood metal concentrations, and total cholesterol levels were collected for analysis. For prediction, five ML models were constructed and their predictive performances compared: logistic regression (LR), k-nearest neighbors, decision trees, random forests, and support vector machines (SVM). Of the five ML models, the SVM model was the most accurate, and the LR model had the highest area under the receiver operating characteristic (ROC) curve, 0.718 (95% CI: 0.688-0.748). This study shows the potential of various ML methods to predict HC associated with exposure to metals using population-based survey data.
Keywords: cholesterol; heavy metals; machine learning; predictive model
Year: 2019 PMID: 31349672 PMCID: PMC6696126 DOI: 10.3390/ijerph16152666
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Demographic characteristics by categories of hypercholesterolemia (HC).
| Characteristic | Total | Non-HC | HC | p a |
|---|---|---|---|---|
| Sex, male (%) | 4681 (46.4) | 4162 (47.3) | 519 (40.1) | <0.001 |
| Mean age (SD) | 44.9 (15.6) | 43.5 (15.6) | 54.4 (12.4) | <0.001 |
| Income, US$/month (SD) | 2964 (2127) | 2997 (2112) | 2720 (2194) | <0.001 |
| BMI, kg/m² (SD) | 23.6 (3.4) | 23.4 (3.4) | 24.9 (3.3) | <0.001 |
| Waist circumference, cm (SD) | 80.5 (10.0) | 79.9 (9.9) | 84.9 (9.2) | <0.001 |
| Regular exercise (%) | 2344 (23.2) | 2092 (23.8) | 252 (19.5) | 0.001 |
| Energy intake, kcal/day (SD) | 2015.2 (881.8) | 2030.3 (887.4) | 1912.9 (835.5) | <0.001 |
| Lead, μg/dL (SD) | 2.32 (1.20) | 2.30 (1.22) | 2.45 (1.02) | <0.001 |
| Mercury, μg/L (SD) | 4.55 (3.90) | 4.49 (3.55) | 4.94 (5.71) | 0.006 |
| Cadmium, μg/L (SD) | 1.11 (0.68) | 1.08 (0.68) | 1.29 (0.67) | <0.001 |
a p determined by t-test or Mantel–Haenszel chi-square test between the HC and non-HC groups. HC, hypercholesterolemia; BMI, body mass index.
Adjusted odds ratios and 95% confidence intervals of hypercholesterolemia by blood metal level.
| Metal a | Low tertile | Middle tertile | High tertile | p b |
|---|---|---|---|---|
| Lead | 1.00 (reference) | 1.06 (0.92–1.21) | 1.29 (1.10–1.51) | 0.006 |
| Mercury | 1.00 (reference) | 1.22 (1.06–1.41) | 1.15 (0.99–1.33) | 0.020 |
| Cadmium | 1.00 (reference) | 1.36 (1.19–1.56) | 2.31 (1.97–2.71) | <0.001 |
a Adjusted for the other metals in the table; b p determined by linear contrast in the logistic regression model.
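The odds ratios in the table above can be related back to logistic regression coefficients. The sketch below is illustrative only: it assumes the reported intervals are standard Wald intervals (symmetric on the log-odds scale), which the source does not state, and the function names are hypothetical. It recovers an approximate coefficient and standard error from each high-tertile row, then rebuilds the interval as a consistency check.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_summary(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and an approximate standard error
    from a reported odds ratio and its 95% interval (Wald interval assumed)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se


def wald_interval(beta, se):
    """Rebuild the odds ratio and its 95% Wald interval from beta and SE."""
    return (
        math.exp(beta),
        math.exp(beta - Z95 * se),
        math.exp(beta + Z95 * se),
    )


# High-tertile rows copied from the table above: (OR, lower, upper).
reported = {
    "Lead": (1.29, 1.10, 1.51),
    "Mercury": (1.15, 0.99, 1.33),
    "Cadmium": (2.31, 1.97, 2.71),
}

for metal, (or_value, low, high) in reported.items():
    beta, se = log_odds_summary(or_value, low, high)
    rebuilt_or, rebuilt_low, rebuilt_high = wald_interval(beta, se)
    print(
        f"{metal}: beta={beta:.3f}, SE={se:.3f}, "
        f"OR={rebuilt_or:.2f} ({rebuilt_low:.2f}, {rebuilt_high:.2f})"
    )
```

Because the published values are rounded to two decimals, the rebuilt bounds only approximately match the reported ones. An interval whose lower bound falls below 1.00, as in the high-tertile mercury row, corresponds to a coefficient less than about 1.96 standard errors from zero.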
Figure 1. Box plot of the accuracy scores across each algorithm for normalized data. LR, logistic regression; KNN, k-nearest neighbor; DT, decision trees; RF, random forests; SVM, support vector machines.
Accuracy values for all prediction models.
| Dataset | LR | KNN | DT | RF | SVM |
|---|---|---|---|---|---|
| All | 0.870 | 0.851 | 0.787 | 0.865 | 0.872 |
| Train | 0.872 | 0.899 | 1.000 | 0.981 | 0.872 |
| Test | 0.872 | 0.832 | 0.788 | 0.864 | 0.872 |
LR, logistic regression; KNN, k-nearest neighbor; DT, decision trees; RF, random forests; SVM, support vector machines.
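One way to read the accuracy table above is to compare each model's training accuracy with its test accuracy; a large difference is a common sign of overfitting. The short sketch below computes that difference from the tabulated values. The helper name `generalization_gap` is hypothetical and the calculation is plain arithmetic on the table, not part of the authors' analysis.

```python
# Train and test accuracy values copied from the table above.
accuracy = {
    "LR": {"train": 0.872, "test": 0.872},
    "KNN": {"train": 0.899, "test": 0.832},
    "DT": {"train": 1.000, "test": 0.788},
    "RF": {"train": 0.981, "test": 0.864},
    "SVM": {"train": 0.872, "test": 0.872},
}


def generalization_gap(scores):
    """Return train-minus-test accuracy per model, rounded to 3 decimals."""
    return {
        model: round(values["train"] - values["test"], 3)
        for model, values in scores.items()
    }


gaps = generalization_gap(accuracy)

# List models from the largest gap to the smallest.
for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: train - test = {gap:.3f}")
```

On these values, the decision tree shows the largest gap (1.000 on training data against 0.788 on test data), while LR and SVM show none.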
Figure 2. Receiver operating characteristic (ROC) curve of LR, KNN, DT, RF, and SVM models. LR, logistic regression; KNN, k-nearest neighbor; DT, decision trees; RF, random forests; SVM, support vector machines.
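For readers unfamiliar with the area under the ROC curve, it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it that way using only the standard library. The labels and scores are hypothetical toy values, not data from the study, and `roc_auc` is an illustrative helper rather than the authors' implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = HC, 0 = non-HC, with made-up predicted scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.20, 0.30]

print(roc_auc(labels, scores))  # 13 of 16 pairs ranked correctly -> 0.8125
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for a dataset of survey scale.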