Davide Barbieri, Nitesh Chawla, Luciana Zaccagni, Tonći Grgurinović, Jelena Šarac, Miran Čoklo, Saša Missoni.
Abstract
Cardiovascular diseases are the main cause of death worldwide. The aim of the present study is to verify the performance of a data mining methodology in the evaluation of cardiovascular risk in athletes, and whether the results may be used to support clinical decision making. Anthropometric (height and weight), demographic (age and sex) and biomedical (blood pressure and pulse rate) data of 26,002 athletes were collected in 2012 during routine sport medical examinations, which included electrocardiography at rest. Subjects were involved in competitive sport practice, for which medical clearance was needed. Outcomes were negative for the large majority, as expected in an active population. Resampling was applied to balance the positive/negative class ratio. A decision tree and logistic regression were used to classify individuals as either at risk or not. The receiver operating characteristic curve was used to assess classification performance. Data mining and resampling improved cardiovascular risk assessment in terms of increased area under the curve. The proposed methodology can be effectively applied to biomedical data in order to optimize clinical decision making and, at the same time, minimize the number of unnecessary examinations.
Keywords: decision tree; logistic regression; machine learning; medical diagnostic
Year: 2020 PMID: 33126737 PMCID: PMC7662820 DOI: 10.3390/ijerph17217923
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Data mining process. DT: decision tree; LR: logistic regression; AUC: area under curve.
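The abstract states that resampling was applied to balance the positive/negative class ratio, but it does not say which scheme was used. As a minimal sketch only, the block below assumes random oversampling of the minority class; the function name `oversample_minority` and the toy data are hypothetical and are not taken from the study.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Assumption: labels are 0 (negative) or 1 (positive). This is one possible
    resampling scheme, not necessarily the one used in the study.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Pick the smaller class as the one to oversample.
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0

    # Draw with replacement from the minority class to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced = [(r, minority_label) for r in minority + extra]
    balanced += [(r, 1 - minority_label) for r in majority]
    rng.shuffle(balanced)

    new_rows = [r for r, _ in balanced]
    new_labels = [y for _, y in balanced]
    return new_rows, new_labels


# Toy data (made up): 9 positives out of 100 records.
toy_rows = [[i] for i in range(100)]
toy_labels = [1 if i < 9 else 0 for i in range(100)]
balanced_rows, balanced_labels = oversample_minority(toy_rows, toy_labels)
print(len(balanced_rows), sum(balanced_labels))
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated records do not leak into the data used to estimate the AUC.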
Descriptive statistics by sex and age classes.
| Variables | 6–10 Years | 11–14 Years | 15–18 Years | ≥19 Years | Total |
|---|---|---|---|---|---|
| Females | | | | | |
| Weight (kg) | 34.2 ± 8.8 | 51.7 ± 10.8 | 61.8 ± 8.8 | 65.1 ± 10.5 | 50.9 ± 15.1 |
| Height (cm) | 138.3 ± 10.0 | 160.6 ± 8.7 | 168.5 ± 7.0 | 168.9 ± 7.52 | 157.3 ± 14.9 |
| BMI (kg/m²) | 17.7 ± 2.8 | 19.9 ± 3.1 | 21.8 ± 2.6 | 22.8 ± 3.2 | 20.1 ± 3.5 |
| Pulse rate | 83.1 ± 13.1 | 77.1 ± 13.0 | 68.3 ± 11.1 | 65.4 ± 11.7 | 75.2 ± 14.1 |
| Systolic pressure (mm Hg) | 97.3 ± 9.7 | 105.8 ± 9.9 | 108.7 ± 10.4 | 113.2 ± 11.8 | 105.2 ± 11.6 |
| Diastolic pressure (mm Hg) | 62.8 ± 7.47 | 66.4 ± 7.8 | 68.6 ± 7.8 | 73.3 ± 8.4 | 66.9 ± 8.6 |
| ECG Ps, n (%) | 140 (10.2) | 160 (8.5) | 63 (6.5) | 65 (8.6) | 428 (8.6) |
| Males | | | | | |
| Weight (kg) | 33.6 ± 8.6 | 52.4 ± 13.4 | 71.4 ± 11.7 | 83.9 ± 12.7 | 61.3 ± 22.6 |
| Height (cm) | 136.9 ± 9.2 | 161.4 ± 11.6 | 178.6 ± 7.6 | 180.0 ± 7.3 | 164.8 ± 19.2 |
| BMI (kg/m²) | 17.7 ± 2.8 | 19.9 ± 3.3 | 22.3 ± 3.0 | 25.9 ± 3.4 | 21.6 ± 4.4 |
| Pulse rate | 79.0 ± 12.6 | 73.1 ± 12.6 | 66.8 ± 12.4 | 63.8 ± 11.9 | 70.4 ± 13.7 |
| Systolic pressure (mm Hg) | 97.1 ± 9.1 | 106.8 ± 11.1 | 118.2 ± 10.9 | 126.4 ± 11.8 | 112.7 ± 15.6 |
| Diastolic pressure (mm Hg) | 61.5 ± 7.8 | 65.2 ± 7.8 | 69.6 ± 8.2 | 77.9 ± 9.4 | 69.0 ± 10.5 |
| ECG Ps, n (%) | 379 (7.9) | 405 (7.0) | 397 (9.3) | 699 (11.3) | 1879 (8.9) |
BMI: body mass index; ECG Ps: electrocardiography positives.
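The BMI rows follow from the weight and height rows through the standard definition, weight in kilograms divided by the square of height in metres. The sketch below shows that conversion for a single record; the function name `bmi` and the example values are illustrative and are not drawn from the dataset. The BMI figures in the table are presumably averages of individual values, so they need not equal the BMI computed from the mean weight and mean height.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m², from weight in kg and height in cm."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # the table reports height in centimetres
    return weight_kg / height_m ** 2


# Illustrative record (made up): 65 kg, 169 cm.
print(round(bmi(65.0, 169.0), 1))
```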
Algorithm classification performances.
| Algorithm | TPR | TNR | J | AUC |
|---|---|---|---|---|
| DT (1st run) | 0.29 | 0.97 | 0.26 | 0.68 |
| LR (1st run) | 0 | 1 | 0.00 | 0.56 |
| DT (2nd run) | 0.68 | 0.82 | 0.50 | 0.76 |
| LR (2nd run) | 0.65 | 0.82 | 0.47 | 0.78 |
TPR: True positive rate; TNR: True negative rate; J: Youden index; AUC: Area under the ROC curve; DT: Decision tree; LR: Logistic regression.
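The J column can be checked against the standard definition of the Youden index, J = TPR + TNR − 1, using the TPR and TNR values in the table. The sketch below does that, and also shows one common way to compute the area under the ROC curve: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The function names and the toy scores are hypothetical; the source does not say how the AUC was computed.

```python
def youden_j(tpr: float, tnr: float) -> float:
    """Youden index: sensitivity + specificity - 1."""
    return tpr + tnr - 1.0


def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# (TPR, TNR) pairs copied from the table above.
table = {
    "DT (1st run)": (0.29, 0.97),
    "LR (1st run)": (0.00, 1.00),
    "DT (2nd run)": (0.68, 0.82),
    "LR (2nd run)": (0.65, 0.82),
}
for name, (tpr, tnr) in table.items():
    print(f"{name}: J = {youden_j(tpr, tnr):.2f}")

# Toy scores (made up) to exercise roc_auc: 3 of 4 pairs are ranked correctly.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The first-run logistic regression row illustrates why J is informative on imbalanced data: a classifier that labels every case negative has a TNR of 1 but a J of 0.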