Jisoo Lee1, Sulyun Lee2, W Nick Street1, Linnea A Polgreen3.
Abstract
BACKGROUND: While multiple randomized controlled trials (RCTs) are available, their results may not be generalizable to older, unhealthier or less-adherent patients. Observational data can be used to predict outcomes and evaluate treatments; however, it is currently unclear which strategy should be used to analyze treatment outcomes with observational data. This study aimed to determine the most accurate machine learning technique for predicting 1-year survival after an initial acute myocardial infarction (AMI) in elderly patients and to identify the association of angiotensin-converting-enzyme inhibitors and angiotensin-receptor blockers (ACEi/ARBs) with survival.
Keywords: Acute myocardial infarction (AMI heart attack); Hyper-parameter optimization; Lasso logistic regression (LLR); Machine learning; Nested cross-validation (CV); Random forest (RF); Sampling methods
Year: 2022 PMID: 35488291 PMCID: PMC9052482 DOI: 10.1186/s12911-022-01854-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Schematic of model development for survival prediction. We optimized feature and hyper-parameter selection in the inner CV loop, while we evaluated the model performance with the optimal feature subsets and hyper-parameters in the outer CV loop. Both inner and outer layers consist of ten repeated folds (training/testing repetitions)
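The nested scheme in the caption can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`k_folds`, `auc`, `nested_cv`, `fit_one_feature`), the toy data, and the one-feature "model" are all hypothetical, and the demo uses fewer folds than the ten reported so that it stays small.

```python
import random
from statistics import mean, stdev

def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def auc(scores, labels):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def nested_cv(X, y, candidates, fit, k_outer=10, k_inner=10, seed=0):
    """Return one AUC per outer fold.

    fit(X_train, y_train, param) must return a function mapping a row to a score.
    """
    rng = random.Random(seed)
    outer_aucs = []
    for test_idx in k_folds(len(X), k_outer, rng):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]

        # Inner loop: pick the candidate with the best mean validation AUC,
        # using only the outer-training rows.
        def inner_score(param):
            aucs = []
            for val_pos in k_folds(len(train_idx), k_inner, rng):
                val = [train_idx[j] for j in val_pos]
                val_set = set(val)
                fit_idx = [i for i in train_idx if i not in val_set]
                model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], param)
                aucs.append(auc([model(X[i]) for i in val], [y[i] for i in val]))
            return mean(aucs)

        best = max(candidates, key=inner_score)

        # Outer loop: refit on all outer-training rows, score the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], best)
        outer_aucs.append(auc([model(X[i]) for i in test_idx], [y[i] for i in test_idx]))
    return outer_aucs

# Toy demo (hypothetical data): feature 0 carries signal, feature 1 is noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[2 * label + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for label in y]

def fit_one_feature(X_train, y_train, feature):
    """Learn only the direction in which the chosen feature predicts the label."""
    pos = [row[feature] for row, label in zip(X_train, y_train) if label == 1]
    neg = [row[feature] for row, label in zip(X_train, y_train) if label == 0]
    sign = 1.0 if mean(pos) >= mean(neg) else -1.0
    return lambda row: sign * row[feature]

outer_aucs = nested_cv(X, y, candidates=[0, 1], fit=fit_one_feature, k_outer=5, k_inner=4)
print(f"AUC {mean(outer_aucs):.4f} (SD {stdev(outer_aucs):.4f})")
```

Because the candidate is chosen inside each outer training set, the outer-fold AUCs are not contaminated by the selection step, which is the point of nesting the loops.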
Lasso logistic regression with one category subset
| One category subset | Sampling | Lambda | AUC (SD) |
|---|---|---|---|
| Demographics | Under | 0.000655 | 0.6798 (0.0077) |
| Cardiac events | Both | 0.001200 | 0.6359 (0.0086) |
| Complications | Under | 0.002252 | 0.5822 (0.0067) |
| Procedures | Under | 0.000900 | 0.7241 (0.0043) |
| Medications | Under | 0.001153 | 0.6369 (0.0055) |
| Insurance | Both | 0.000453 | 0.6196 (0.0064) |
| Utilization | Under | 0.001875 | 0.6243 (0.0065) |
The AUC and the SD columns show the average of outer AUC and its standard deviation, respectively. The most commonly selected sampling method and the average of lambda are reported under the Sampling and the Lambda columns
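The summary columns described in this note could be derived from per-fold results roughly as follows. The fold records below are made-up values for illustration only, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the source does not say which SD estimator was used.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical per-outer-fold results; not taken from the study.
fold_results = [
    {"auc": 0.68, "sampling": "under", "lambda": 0.0006},
    {"auc": 0.67, "sampling": "under", "lambda": 0.0007},
    {"auc": 0.69, "sampling": "both", "lambda": 0.0008},
]

aucs = [r["auc"] for r in fold_results]
summary = {
    "auc_mean": mean(aucs),                      # "AUC" column
    "auc_sd": stdev(aucs),                       # "(SD)" column, sample SD assumed
    "sampling": Counter(r["sampling"] for r in fold_results).most_common(1)[0][0],
    "lambda_mean": mean(r["lambda"] for r in fold_results),
}
print(summary)
```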
Random forest with one category subset
| One category subset | Sampling | mtry | ntree | AUC (SD) |
|---|---|---|---|---|
| Demographics | Under | 3 | 250 | 0.6695 (0.0059) |
| Cardiac events | Under | 3 | 2250 | 0.6327 (0.0093) |
| Complications | Under | 3 | 1500 | 0.5804 (0.0069) |
| Procedures | Under | 3 | 1250 | 0.7183 (0.0061) |
| Medications | Under | 3 | 1500 | 0.6317 (0.0055) |
| Insurance | Under | 3 | 1250 | 0.6127 (0.0068) |
| Utilization | Under | 3 | 2500 | 0.6224 (0.0055) |
The AUC and the SD columns show the average of outer AUC and its standard deviation respectively. The most frequently selected sampling method and parameters (mtry and ntree) are reported accordingly
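The Sampling column refers to class-rebalancing of the training data. The sketch below shows one plausible reading, and it is an assumption rather than the authors' procedure: "Under" is taken to mean random under-sampling of the majority class, and "Both" is taken to mean drawing an equal number of rows from each class with replacement, which over-samples the minority class and under-samples the majority class at the same time. The function names and toy rows are hypothetical.

```python
import random

def split_by_class(rows, labels):
    """Group rows by their binary label (0 or 1)."""
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    return by_class

def under_sample(rows, labels, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    by_class = split_by_class(rows, labels)
    n_min = min(len(members) for members in by_class.values())
    out = []
    for label, members in by_class.items():
        out += [(row, label) for row in rng.sample(members, n_min)]
    rng.shuffle(out)
    return out

def both_sample(rows, labels, rng):
    """Keep roughly the original size, drawing half from each class with replacement."""
    by_class = split_by_class(rows, labels)
    half = len(rows) // 2
    out = []
    for label, members in by_class.items():
        out += [(rng.choice(members), label) for _ in range(half)]
    rng.shuffle(out)
    return out

# Toy imbalanced data: 90 rows of class 0 and 10 rows of class 1.
rows = list(range(100))
labels = [0] * 90 + [1] * 10
rng = random.Random(0)

under = under_sample(rows, labels, rng)
both = both_sample(rows, labels, rng)
print(len(under), sum(label for _, label in under))
print(len(both), sum(label for _, label in both))
```

To avoid leaking information, resampling of this kind is normally applied only to the training portion of each fold, never to the held-out rows used for scoring.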
Lasso logistic regression with two category subset
| Two category subset | Sampling | Lambda | AUC (SD) |
|---|---|---|---|
| Comorbidities + demographics | Both | 0.000508 | 0.7785 (0.0049) |
| Comorbidities + cardiac events | Both | 0.000665 | 0.7568 (0.0054) |
| Comorbidities + complications | Both | 0.000581 | 0.7537 (0.0054) |
| Comorbidities + medications | Both | 0.000543 | 0.7553 (0.0053) |
| Comorbidities + insurance | Both | 0.000385 | 0.7563 (0.0056) |
| Comorbidities + utilization | Both | 0.000567 | 0.7573 (0.0053) |
Random forest with two category subset
| Two category subset | Sampling | mtry | ntree | AUC (SD) |
|---|---|---|---|---|
| Comorbidities + demographics | Under | 3 | 750 | 0.7705 (0.0044) |
| Comorbidities + cardiac events | Under | 3 | 750 | 0.7503 (0.0049) |
| Comorbidities + complications | Under | 3 | 1250 | 0.7467 (0.0046) |
| Comorbidities + medications | Under | 3 | 2000 | 0.7502 (0.0045) |
| Comorbidities + insurance | Under | 3 | 1750 | 0.7501 (0.0044) |
| Comorbidities + utilization | Under | 3 | 1000 | 0.7512 (0.0049) |
Lasso logistic regression with three category subset
| Three category subset | Sampling | Lambda | AUC (SD) |
|---|---|---|---|
| Comorbidities + procedures + cardiac events | Both | 0.000565 | 0.7851 (0.0039) |
| Comorbidities + procedures + complications | Both | 0.000481 | 0.7850 (0.0040) |
| Comorbidities + procedures + medications | Both | 0.000467 | 0.7860 (0.0040) |
| Comorbidities + procedures + insurance | Both | 0.000362 | 0.7860 (0.0039) |
| Comorbidities + procedures + utilization | Both | 0.000480 | 0.7862 (0.0040) |
Random forest with three category subset
| Three category subset | Sampling | mtry | ntree | AUC (SD) |
|---|---|---|---|---|
| Comorbidities + procedures + cardiac events | Under | 3 | 500 | 0.7810 (0.0038) |
| Comorbidities + procedures + complications | Under | 3 | 2000 | 0.7804 (0.0036) |
| Comorbidities + procedures + medications | Under | 3 | 2000 | 0.7829 (0.0039) |
| Comorbidities + procedures + insurance | Under | 3 | 1500 | 0.7827 (0.0038) |
| Comorbidities + procedures + utilization | Under | 3 | 2750 | 0.7825 (0.0040) |
Lasso logistic regression with four category subset
| Four category subset | Sampling | Lambda | AUC (SD) |
|---|---|---|---|
| Comorbidities + procedures + demographics + cardiac events | Under | 0.000527 | 0.7946 (0.0036) |
| Comorbidities + procedures + demographics + complications | Under | 0.000466 | 0.7944 (0.0036) |
| Comorbidities + procedures + demographics + medications | Under | 0.000919 | 0.7948 (0.0035) |
| Comorbidities + procedures + demographics + insurance | Both | 0.000409 | 0.7949 (0.0034) |
Random forest with four category subset
| Four category subset | Sampling | mtry | ntree | AUC (SD) |
|---|---|---|---|---|
| Comorbidities + procedures + demographics + cardiac events | Under | 6 | 1750 | 0.7902 (0.0040) |
| Comorbidities + procedures + demographics + complications | Under | 3 | 2000 | 0.7898 (0.0038) |
| Comorbidities + procedures + demographics + medications | Under | 6 | 2750 | 0.7901 (0.0031) |
| Comorbidities + procedures + demographics + insurance | Under | 6 | 3000 | 0.7907 (0.0036) |
Performance evaluation with final category subset (comorbidities + procedures + demographics + utilization)
| Model | Sampling | AUC (SD) | Accuracy (SD) | Sensitivity (SD) | Specificity (SD) |
|---|---|---|---|---|---|
| LLR | Both | 0.7104 (0.0039) | 0.7033 (0.0054) | ||
| RF | Under | 0.7911 (0.0037) | 0.5322 (0.0090) |
Coefficients of features selected by final model
| Category | Features | Time periods | Coef. |
|---|---|---|---|
| Intercept | Intercept | – | |
| ACEi/ARBs use | ACEi/ARBs (untreated) | Post-index | 0.1910 |
| Comorbidities | Charlson comorbidity index (CCI) | Index | 0.1102 |
| | Charlson comorbidity index (CCI) | Pre-index | 0.0932 |
| | Elixhauser comorbidity index (ECI) | Pre-index | 0.0420 |
| | Elixhauser comorbidity index (ECI) | Index | 0.0165 |
| | Number of comorbidities | Pre-index | 0.0027 |
| | Number of comorbidities | Index | 0.0005 |
| | Charlson comorbidity index (CCI) | Change | 0.0000 |
| | Elixhauser comorbidity index (ECI) | Change | 0.0000 |
| | Number of comorbidities | Change | 0.0000 |
| | Serious myopathy | Pre-index | 0.5637 |
| | General cancer | Index | 0.4801 |
| | Heart failure | Index | 0.3594 |
| | Metastatic cancer | Pre-index | 0.2553 |
| | Metastatic cancer | Index | 0.2453 |
| | Heart failure | Pre-index | 0.2102 |
| | Atrial fibrillation | Index | 0.1367 |
| | Serious myopathy | Index | 0.1205 |
| | COPD | Pre-index | 0.1183 |
| | Hypotension | Pre-index | 0.1108 |
| | Depression | Pre-index | 0.1020 |
| | Chronic kidney disease | Index | 0.0882 |
| | COPD | Index | 0.0842 |
| | Hyperkalemia | Pre-index | 0.0786 |
| | Atrial fibrillation | Pre-index | 0.0671 |
| | Hepatic events | Index | 0.0572 |
| | Hyperkalemia | Index | 0.0488 |
| | Depression | Index | 0.0464 |
| | Renal failure | Pre-index | 0.0264 |
| | Non-AMI ischemic heart disease | Pre-index | 0.0233 |
| | Hepatic events | Pre-index | 0.0115 |
| | Renal failure | Index | 0.0083 |
| | Chronic kidney disease | Pre-index | 0.0072 |
| | Hypotension | Index | |
| | Non-AMI ischemic heart disease | Index | |
| | Bradycardia | Pre-index | |
| | Hypertension (uncomplicated) | Pre-index | |
| | General cancer | Pre-index | |
| | Hypertension (complicated) | Index | |
| | Diabetes | Pre-index | |
| | Hypertension (complicated) | Pre-index | |
| | Non-serious myopathy | Index | |
| | Bradycardia | Index | |
| | Hypertension (uncomplicated) | Index | |
| | Diabetes | Index | |
| | Non-serious myopathy | Pre-index | |
| | Asthma | Index | |
| | Hyperlipidemia | Index | |
| | Asthma | Pre-index | |
| | Angioedema | Index | |
| | Hyperlipidemia | Pre-index | |
| | Angioedema | Pre-index | |
| Procedures | Echocardiography | Index | 0.3932 |
| | Percutaneous coronary intervention | Index | 0.3223 |
| | Stent | Index | 0.2612 |
| | Stent | Pre-index | 0.1310 |
| | Pacemaker implantation | Pre-index | |
| | CABG | Pre-index | |
| | Pacemaker implantation | Index | |
| | Stress test | Index | |
| | Cardiac catheterization | Index | |
| | CABG | Index | |
| Demographics | Age: 85+ | – | 0.9877 |
| | Age: 81–85 | – | 0.5803 |
| | Age: 76–80 | – | 0.3469 |
| | Age: 71–75 | – | 0.1136 |
| | Metro area: unknown | – | 0.7015 |
| | Metro area: non-metro | – | 0.0183 |
| | Dual eligibility | Steady [2] | 0.3618 |
| | Dual eligibility | Steady [1] | 0.0004 |
| | Dual eligibility | Steady [3] | 0.0000 |
| | Dual eligibility | Index | 0.2281 |
| | Dual eligibility | Pre-index | 0.1213 |
| | Low income subsidy | – | 0.1231 |
| | Low income area | – | 0.0445 |
| | Low high school diploma area | – | 0.0279 |
| | High poverty area | – | |
| | High immigrant area | – | |
| | No English speaker area | – | |
| | Race: black | – | 0.0386 |
| | Race: white | – | |
| | Race: unknown | – | |
| | Race: Asian | – | |
| | Race: hispanic | – | |
| | Race: others | – | |
| | Average life expectancy: 4th quartile | – | |
| | Average life expectancy: 2nd quartile | – | |
| | Average life expectancy: 3rd quartile | – | |
| | Gender: male | – | |
| Utilization | ER use | Index | 0.1240 |
| | Acute inpatient stay days | Index | 0.0215 |
| | Post-acute care use | Index | 0.0001 |
| | Transferred to another facility | Index | |
LLR coefficients of the variables in the final model: lasso with the four-category subset (comorbidities, procedures, demographics, and utilization) and "both" sampling
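A logistic regression coefficient is on the log-odds scale, so exponentiating it gives an odds ratio for the modeled outcome. The short sketch below applies that standard conversion to two coefficients from the table. It is offered with caveats: the excerpt does not state how the outcome or the features were coded, and lasso shrinks coefficients toward zero, so the resulting values are best read as rough, conservative indications rather than as the authors' reported effect sizes. The function name `odds_ratio` is hypothetical.

```python
import math

def odds_ratio(coef):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coef)

# Coefficients copied from the table above.
table_coefs = {
    "ACEi/ARBs (untreated), post-index": 0.1910,
    "Age: 85+": 0.9877,
    "Charlson comorbidity index (CCI), change": 0.0000,
}

for feature, coef in table_coefs.items():
    print(f"{feature}: coef={coef:.4f}, odds ratio={odds_ratio(coef):.3f}")
```

A coefficient of exactly zero maps to an odds ratio of 1, which is how the lasso expresses that a feature was dropped from the model.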