Sylvia Aponte-Hao, Sabrina T Wong, Manpreet Thandi, Paul Ronksley, Kerry McBrien, Joon Lee, Mathew Grandy, Dee Mangin, Alan Katz, Alexander Singer, Donna Manca, Tyler Williamson.
Abstract
INTRODUCTION: Frailty is a medical syndrome that commonly affects people aged 65 years and over and is characterized by a greater risk of adverse outcomes following illness or injury. Electronic medical records contain a large amount of longitudinal data that can be used for primary care research. Machine learning can draw on this breadth of data to detect diseases and syndromes. A frailty case definition created using machine learning may facilitate early intervention, inform advanced screening tests, and allow for surveillance.
Keywords: Canada; case definition; electronic health records; electronic medical records; frailty; machine learning; primary care; supervised machine learning
Year: 2021 PMID: 34541337 PMCID: PMC8431345 DOI: 10.23889/ijpds.v6i1.1650
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
| Feature | Number of features | Data type |
|---|---|---|
| Patient age | 1 | Numeric |
| Patient sex | 1 | Binary |
| Patient Diagnoses Received in Last 2 Years (ICD-9 Codes) | 13 | Numeric |
| CPCSSN’s Detection of 6 Chronic Conditions | 6 | Binary |
| Medications Prescribed in Last 2 Years | 39 | Numeric |
| Patient Biometrics | 7 | Numeric |
| Province | 1 | Categorical |
| Missing Medication Indicator | 1 | Binary |
| Missing Height, Weight and BMI Indicators | 3 | Binary |
| Missing Chronic Conditions Indicator | 1 | Binary |
| Missing Patient Diagnoses Indicator | 1 | Binary |
| Missing Blood Pressure Indicator | 1 | Binary |
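The feature list above includes binary indicators that flag missing values (for example, one each for height, weight and BMI). A minimal sketch of how such flags could be derived is shown below. The field names, the helper `add_missing_indicators`, and the two toy patient records are hypothetical and are not taken from the study.

```python
import math


def add_missing_indicators(records, fields):
    """Return copies of `records` with a binary `<field>_missing` flag per field.

    A value counts as missing when it is None or a float NaN.
    Field names here are hypothetical, chosen only for illustration.
    """
    flagged = []
    for rec in records:
        new_rec = dict(rec)
        for field in fields:
            value = rec.get(field)
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            new_rec[field + "_missing"] = int(is_missing)
        flagged.append(new_rec)
    return flagged


# Toy records (illustrative values only, not study data).
patients = [
    {"age": 74, "height_cm": 165.0, "weight_kg": 79.6, "bmi": 29.2},
    {"age": 81, "height_cm": None, "weight_kg": 75.2, "bmi": None},
]

flagged = add_missing_indicators(patients, ["height_cm", "weight_kg", "bmi"])
for row in flagged:
    print(row)
```

Keeping the flag as a separate binary column lets a model use the fact that a measurement was never recorded, independently of whatever value is later imputed for it.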
Figure 1: Comparison of ROC curves for final models trained on original dataset
| Model | AUC | Accuracy | F1 score | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Elastic Net Logistic Regression | 81.58% | 85.42%* | 46.05% | 36.56% | 95.44% | 62.20% | 88.00% |
| SVM | 80.75% | 85.23% | 49.16%* | 41.94% | 94.12% | 59.39% | 88.77% |
| KNN | 66.48% | 83.40% | 21.84% | 13.62% | 97.72%* | 55.07% | 84.65% |
| Naïve Bayes | 74.72% | 70.23% | 43.52% | 67.38%* | 70.81% | 32.14% | 91.37%* |
| CaRT | 77.56% | 82.18% | 44.70% | 42.29% | 90.37% | 47.39% | 88.42% |
| Random Forest | 81.03% | 85.11% | 47.64% | 39.79% | 94.41% | 59.36% | 88.43% |
| XGBoost | 83.18%* | 84.87% | 47.68% | 40.50% | 93.97% | 57.95% | 88.50% |
| Feedforward NN | 78.20% | 84.87% | 35.32% | 24.37% | 97.28% | 64.76%* | 86.25% |
*Highest value achieved for each metric.
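The metrics in this table are linked through the confusion matrix. As a rough sketch, the standard definitions can be written as below; the function names are illustrative, and the toy counts in the usage line are not study data. The F1 score is the harmonic mean of sensitivity and PPV, which can be checked against the Elastic Net row (36.56% and 62.20% in the table).

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one case in each
    actual class and each predicted class).
    """
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def f1_from_sensitivity_ppv(sensitivity, ppv):
    """F1 as the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Toy counts, for illustration only.
toy = classification_metrics(tp=30, fp=10, tn=50, fn=10)
print(toy)

# Elastic Net row of the table: sensitivity 36.56%, PPV 62.20%.
print(round(100 * f1_from_sensitivity_ppv(0.3656, 0.6220), 2))  # 46.05
```

The last line reproduces the 46.05% listed for Elastic Net Logistic Regression, which is why a model with high specificity but low sensitivity still ends up with a modest F1 score.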
| Model | Sensitivity | Specificity | Threshold |
|---|---|---|---|
| Elastic Net Logistic Regression | 77.78% | 72.72% | 0.4730 |
| SVM | 74.55% | 73.38% | 0.1889 |
| KNN | 64.16% | 61.69% | 0.1000 |
| Naïve Bayes | 70.97% | 68.60% | 0.2777 |
| CaRT | 69.89% | 72.79% | 0.1228 |
| Random Forest | 75.27% | 71.99% | 0.3104 |
| XGBoost | 78.14%* | 74.41%* | 0.1851 |
| Feedforward NN | 73.84% | 68.82% | 0.2712 |
*Highest value achieved for each metric.
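Assuming the last column of this table is the probability cut-off applied to each model's predicted scores, the sketch below illustrates how sensitivity and specificity shift as that cut-off moves. The helper name and the toy labels and scores are hypothetical; the criterion used to select the cut-offs is not stated in this record.

```python
def sens_spec_at_threshold(y_true, y_prob, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.2, 0.7, 0.3, 0.2, 0.1, 0.05]

for cutoff in (0.5, 0.15):
    sens, spec = sens_spec_at_threshold(y_true, y_prob, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the cut-off labels more patients as positive, so sensitivity rises while specificity falls. This matches the pattern in the table, where cut-offs well below 0.5 go with higher sensitivity than a default 0.5 cut-off would typically give.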
Figure 2: ROC Curves of models trained on balanced data

| Model | AUC | Accuracy | F1 score | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Elastic Net Logistic Regression | 77.21% | 72.79% | 45.74% | 67.38% | 73.90% | 34.62% | 91.70% |
| SVM | 77.26% | 73.89% | 46.10% | 65.59%* | 75.59% | 35.53% | 91.46%* |
| KNN | 65.37% | 77.67% | 31.72% | 30.47% | 87.35% | 33.07% | 85.96% |
| Naïve Bayes | 71.70% | 70.47% | 52.36%* | 62.72% | 72.06% | 31.53% | 90.41% |
| CaRT | 71.27% | 76.69% | 46.20% | 58.78% | 80.37% | 38.05% | 90.48% |
| Random Forest | 80.90%* | 84.20%* | 44.30% | 36.92% | 93.80%* | 55.38%* | 87.89% |
| XGBoost | 80.53% | 83.83% | 44.44% | 37.99% | 93.24% | 53.54% | 87.99% |
| Feedforward NN | 77.76% | 83.28% | 38.01% | 30.11% | 94.19% | 51.53% | 86.79% |
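These results come from models trained on balanced data. This record does not say how the balancing was done, so the sketch below shows random undersampling of the majority class purely as one common option, not as the authors' method. The `frail` label key, the helper name, and the toy dataset are hypothetical.

```python
import random


def undersample_majority(records, label_key="frail", seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    `label_key` must hold 0/1 labels. This is one balancing strategy among
    several (oversampling and synthetic sampling are alternatives).
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    # Sample without replacement so no majority record is duplicated.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy dataset with 17 positive and 83 negative records (illustrative only).
data = [{"id": i, "frail": int(i < 17)} for i in range(100)]
balanced = undersample_majority(data)
print(len(balanced), sum(r["frail"] for r in balanced))
```

Balancing is normally applied to the training split only, so that the test set keeps the original class proportions and metrics such as PPV and NPV stay interpretable.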
| Model | Sensitivity | Specificity | Threshold |
|---|---|---|---|
| Elastic Net Logistic Regression | 74.19% | 67.79% | 0.4019 |
| SVM | 70.97% | 71.25% | 0.4429 |
| KNN | 65.95% | 61.18% | 0.0833 |
| Naïve Bayes | 62.01% | 73.75% | 0.6216 |
| CaRT | 58.78% | 80.37% | 0.5121 |
| Random Forest | 72.76% | 76.10% | 0.3525 |
| XGBoost | 77.42%* | 71.84%* | 0.2185 |
| Feedforward NN | 66.67% | 77.72% | 0.6636 |
| Characteristic | Overall | Frail | Not frail | p-value |
|---|---|---|---|---|
| Age (Median, [Q1-Q3]) | 74 [69–80] | 81 [74–88] | 72 [68–78] | |
| Sex (% Male) | 2,425 (44.4%) | 348 (34.6%) | 2,077 (46.6%) | |
| No Known Chronic Conditions | 732 (13.4%) | 52 (5.2%) | 680 (15.2%) | |
| COPD* | 534 (11.3%) | 152 (15.9%) | 382 (10.1%) | |
| Dementia* | 449 (9.5%) | 238 (24.9%) | 211 (5.6%) | |
| Depression* | 1,155 (24.4%) | 316 (33.1%) | 839 (22.2%) | |
| Diabetes Mellitus* | 1,866 (39.4%) | 374 (39.2%) | 1,492 (39.5%) | 0.909 |
| Epilepsy* | 94 (2.0%) | 24 (2.5%) | 70 (1.9%) | 0.237 |
| Hypertension* | 3,614 (76.3%) | 760 (79.7%) | 2,854 (75.5%) | 0.008 |
| Osteoarthritis* | 2,187 (46.2%) | 439 (46.2%) | 1,748 (46.2%) | 0.929 |
| Mean BMI (Median [Q1–Q3]) | 28.5 [25.31–32.49] | 28.34 [24.52–33.17] | 28.50 [25.40–32.40] | 0.501† |
| Missing BMI | 1,735 (45.3%) | 436 (60.2%) | 1,299 (41.9%) | |
| Mean Height (centimetres) (Median [Q1–Q3]) | 165.00 [157.47–173.15] | 160.00 [152.81–168.50] | 165.80 [158.15–174.00] | |
| Missing Height (centimetres) | 1,761 (46.0%) | 443 (61.2%) | 1,318 (42.5%) | |
| Mean Weight (kg) (Median [Q1–Q3]) | 79.60 [67.39–92.60] | 75.19 [64.21–90.00] | 80.32 [68.40–93.00] | |
| Missing Weight (kg) | 1,317 (34.4%) | 302 (42.7%) | 1,015 (32.7%) | |
| Missing Systolic Blood Pressure Measurement | 611 (16.0%) | 111 (15.3%) | 500 (16.1%) | 0.645 |
| Mean Systolic Blood Pressure (Median [Q1–Q3]) | 132.62 [124.50–141.28] | 133.00 [125.33–141.67] | 133.61 [123.95–142.00] | 0.546† |
| Number of Clinic Visits In Most Recent Calendar Year (Median [Q1–Q3]) | 5 [3–9] | 7 [4–11] | 5 [3–9] | |
| Missing Clinic Visits | 296 (17.1%) | 35 (4.8%) | 261 (8.4%) | 0.002 |
| Number of Unique Medications Prescribed In Last 2 Years (Median [Q1–Q3]) | 6 [3–10] | 5 [3–9] | 7 [4–11] | |
| Missing Medications | 249 (6.5%) | 27 (3.7%) | 222 (7.2%) | 0.001 |
*Proportions among those who have at least one known chronic condition.
†Tested using the Kruskal-Wallis test.