Rachel S Kim¹, Steven Simon², Brett Powers¹, Amneet Sandhu³, Jose Sanchez³, Ryan T Borne³, Alexis Tumolo³, Matthew Zipse³, J Jason West³, Ryan Aleong³, Wendy Tzou³, Michael A Rosenberg¹,³.
Abstract
BACKGROUND: Identifying an appropriate rhythm management strategy for patients diagnosed with atrial fibrillation (AF) remains a major challenge for providers. Although clinical trials have identified subgroups of patients in whom a rate- or rhythm-control strategy might improve outcomes, the wide range of presentations and risk factors among patients with AF makes such subgroup-based approaches difficult to apply. A strength of electronic health records (EHRs) is the ability to build in logic that guides management decisions, so that the system can automatically identify patients in whom a rhythm-control strategy is more likely to be pursued and can promote efficient referrals to specialists. However, as with any clinical decision support tool, there is a trade-off between interpretability and predictive accuracy.
Keywords: ablation; antiarrhythmia agents; artificial intelligence; atrial fibrillation; biostatistics; data science; machine learning; rhythm-control
Year: 2021; PMID: 34874889; PMCID: PMC8691402; DOI: 10.2196/29225
Source DB: PubMed; Journal: JMIR Med Inform
Population demographics.
| Demographics | Training set (n=31,517): rhythm control (n=2370) | Training set (n=31,517): rate control (n=29,147) | Testing set (n=10,505): rhythm control (n=785) | Testing set (n=10,505): rate control (n=9720) |
|---|---|---|---|---|
| Age (years), mean (SD) | 66.4 (12.0) | 72.1 (12.9) | 67.1 (11.6) | 72.3 (12.7) |
| Sex (female), n (%) | 779 (32.9) | 12,588 (43.2) | 265 (33.8) | 4115 (42.3) |
| HTNᵃ, n (%)ᵇ | 1036 (43.7) | 14,577 (50) | 372 (47.4) | 4870 (50.1) |
| Obesity, n (%)ᶜ | 366 (15.4) | 3877 (13.3) | 156 (19.9) | 1243 (12.8) |
| Diabetes, n (%)ᵈ | 343 (14.5) | 5305 (18.2) | 115 (14.7) | 1768 (18.2) |
| CADᵉ, n (%)ᶠ | 475 (20) | 7433 (24.5) | 164 (20.9) | 2497 (25.7) |
| Heart failure, n (%)ᵍ | 488 (20.6) | 5625 (19.3) | 142 (18.1) | 1874 (19.3) |
| Mitral valve disease, n (%)ʰ | 394 (16.6) | 4841 (16.6) | 124 (15.8) | 1687 (17.4) |
ᵃHTN: hypertension diagnosis.
ᵇInternational Classification of Diseases, Ninth Revision (ICD-9) code 401.X; International Classification of Diseases, Tenth Revision (ICD-10) code I10.X.
ᶜObesity diagnosis (ICD-9 278.X; ICD-10 E66.X).
ᵈDiabetes mellitus (ICD-9 250.X; ICD-10 E11.X).
ᵉCAD: coronary artery disease.
ᶠICD-9 414.X; ICD-10 I25.X.
ᵍHeart failure (ICD-9 428.X; ICD-10 I50.X).
ʰMitral valve disease (ICD-9 424.X or 394.X; ICD-10 I34.X).
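The footnotes define each comorbidity by ICD-9 and ICD-10 code families. The following is a minimal sketch of turning one patient's diagnosis codes into the binary comorbidity flags summarized in the table, assuming codes arrive as strings such as "401.9" or "I10" and that a ".X" family means any code sharing that prefix. The dictionary and function names are hypothetical; the study's actual extraction logic is not described here.

```python
# Hypothetical mapping from comorbidity to ICD-9/ICD-10 code prefixes,
# transcribed from the table footnotes (".X" is read as "any subcode").
COMORBIDITY_PREFIXES = {
    "hypertension": ("401", "I10"),
    "obesity": ("278", "E66"),
    "diabetes": ("250", "E11"),
    "cad": ("414", "I25"),
    "heart_failure": ("428", "I50"),
    "mitral_valve_disease": ("424", "394", "I34"),
}


def comorbidity_flags(codes):
    """Return {comorbidity: bool} for one patient's list of diagnosis codes."""
    # Normalize: drop the decimal point and uppercase any letter prefix.
    normalized = [c.replace(".", "").upper() for c in codes]
    # A comorbidity is present if any code starts with one of its prefixes.
    return {
        name: any(code.startswith(prefixes) for code in normalized)
        for name, prefixes in COMORBIDITY_PREFIXES.items()
    }


print(comorbidity_flags(["401.9", "I48.91", "E11.9"]))
```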
Figure 1. (A) Predictive margins for a rhythm-control strategy, based on logistic regression with age, age squared, and age-sex interaction terms. (B) Predictive margins for the type of rhythm-control strategy (ablation, antiarrhythmic drug, or external cardioversion), based on multinomial logistic regression for the first rhythm-control treatment applied, with age, age squared, and age-sex interaction terms. (C) Predictive margins for the effect of a hypertension diagnosis on the rhythm-control strategy, based on multinomial logistic regression for the first rhythm-control treatment applied, with age, age squared, and age-sex interaction terms. In all panels, error bars represent the 95% CIs for each age-sex combination.
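Predictive margins are model-based predicted probabilities evaluated at chosen covariate values. The sketch below assumes a logistic model in which sex interacts with both age and age squared; this is one reading of "age-sex interactions", and it would give age 4 model terms and sex 3, consistent with the degrees of freedom in the feature-importance table. The coefficients are made up for illustration and are not the study's estimates. Because age and sex are the only covariates here, the margin for an age-sex cell reduces to the predicted probability at that cell; with more covariates, one would average predictions over the sample while holding age and sex fixed. Confidence intervals (for example, via the delta method or bootstrap) are omitted.

```python
import math

# Hypothetical logit-scale coefficients, for illustration only.
COEFS = {
    "intercept": -4.0,
    "age": 0.06,
    "age_sq": -0.0007,
    "female": -0.5,
    "age_x_female": 0.004,
    "age_sq_x_female": -0.00005,
}


def predicted_probability(age, female, coefs=COEFS):
    """P(rhythm control) under an assumed age + age^2 + sex x (age, age^2) logit."""
    f = 1.0 if female else 0.0
    logit = (
        coefs["intercept"]
        + coefs["age"] * age
        + coefs["age_sq"] * age ** 2
        + coefs["female"] * f
        + coefs["age_x_female"] * age * f
        + coefs["age_sq_x_female"] * age ** 2 * f
    )
    return 1.0 / (1.0 + math.exp(-logit))


def predictive_margins(ages, coefs=COEFS):
    """Predicted probability for every (age, sex) cell on a grid of ages."""
    return {
        (age, sex): predicted_probability(age, sex == "female", coefs)
        for age in ages
        for sex in ("male", "female")
    }


for cell, prob in predictive_margins([50, 65, 80]).items():
    print(cell, round(prob, 3))
```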
Best supervised learning models.
| Modelᵃ | Resampling | F1 score | AUCᵇ | Accuracy | Recall | Precision |
|---|---|---|---|---|---|---|
| Random forestᶜ | SMOTEᵈ | 0.186 | 0.591 | 0.689 | 0.476 | 0.116 |
| Extreme gradient boostingᵉ | Random oversampling | 0.179 | 0.591 | 0.614 | 0.563 | 0.106 |
| K-nearest neighborsᶠ | Random undersampling | 0.181 | 0.605 | 0.541 | 0.682 | 0.105 |
| Naïve Bayesᵍ | SMOTE | 0.184 | 0.602 | 0.596 | 0.609 | 0.108 |
| Logistic regression | SMOTE | 0.185 | 0.608 | 0.570 | 0.654 | 0.108 |
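ᵃAll models except the neural network were applied to known predictors only.
ᵇAUC: area under the receiver operating characteristic curve.
ᶜRandom forest hyperparameters: number of estimators (trees)=200, maximum features=8, maximum leaf nodes=300.
ᵈSMOTE: synthetic minority oversampling technique.
ᵉExtreme gradient boosting hyperparameters: booster=gbtree, η (learning rate)=0.9, γ (minimum loss reduction required to split)=0, α (L1 regularization)=1, λ (L2 regularization)=0.
ᶠK-nearest neighbors hyperparameter: number of neighbors (k)=500.
ᵍNaïve Bayes hyperparameter: smoothing parameter α=0.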
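SMOTE, used by several of the models above, creates synthetic minority-class examples (here, rhythm control, which is far less common than rate control) by interpolating between a minority sample and one of its nearest minority neighbors. Below is a minimal pure-Python sketch of that interpolation step, assuming numeric feature vectors and Euclidean distance; the `smote` function is hypothetical and is not the implementation used in the study (libraries such as imbalanced-learn provide production versions).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample x, pick one of
    its k nearest minority neighbors nb (Euclidean distance), and return
    x + u * (nb - x) with u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbors)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


print(smote([[60.0, 1.0], [62.0, 0.0], [70.0, 1.0]], n_synthetic=3, k=2))
```

With binary predictors such as the comorbidity flags, plain interpolation produces fractional values; variants such as SMOTE-NC treat categorical features separately.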
Feature importance.
| Predictor | Random forest impurity reductionᵃ (%) | Logistic chi-square (df) | P value |
|---|---|---|---|
| Age (years) | 81.74 | 462.11 (4) | <.001 |
| CADᵇ | 3.25 | 21.28 (1) | <.001 |
| Sex | 3.01 | 60.61 (3) | <.001 |
| Mitral valve disease | 2.82 | 8.04 (1) | .01 |
| Diabetes mellitus | 2.78 | 18.46 (1) | <.001 |
| Heart failure | 2.43 | 17.59 (1) | <.001 |
| Hypertension | 2.36 | 4.03 (1) | .04 |
| Obesity | 1.62 | 2.61 (1) | .11 |
ᵃFor the random forest trained with synthetic minority oversampling technique (SMOTE) resampling.
ᵇCAD: coronary artery disease.
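Each chi-square statistic is paired with its degrees of freedom, and its P value is the upper-tail probability of the chi-square distribution. Below is a minimal pure-Python sketch of that conversion; the function name `chi2_sf` is hypothetical, and it evaluates the regularized incomplete gamma function by its power series, which is adequate for statistics of the size shown here. For df=1 it agrees with the closed form erfc(sqrt(x/2)).

```python
import math


def chi2_sf(x, df):
    """Upper-tail P value of a chi-square statistic x with df degrees of freedom.

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function, evaluated with its power series.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    # Series: sum_n z^n / (s (s+1) ... (s+n)); stop once terms are negligible.
    while term > total * 1e-16:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


for name, stat, df in [("Hypertension", 4.03, 1), ("Obesity", 2.61, 1), ("Age", 462.11, 4)]:
    print(name, f"{chi2_sf(stat, df):.4f}")
```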
Figure 2. Decision tree for rhythm-control strategy, built from known predictors to classify rate- versus rhythm-control strategy in the training data. Maximum depth=2; minimum samples required to split a node=50.
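A depth-limited classification tree like the one in Figure 2 can be grown greedily: at each node, choose the feature and threshold that most reduce impurity, and stop when the depth limit is reached, the node holds fewer than the minimum samples to split, or the node is pure. The pure-Python sketch below assumes numeric features and Gini impurity (a common default, for example in scikit-learn); the split criterion used for Figure 2 is not stated, and all function names are hypothetical. Impurity decreases of this kind, accumulated over the trees of a forest, are also the basis of random forest impurity-reduction importances.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, impurity_decrease) of the best split, or None."""
    n = len(labels)
    parent = gini(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            # Weighted child impurity; the gain is the impurity decrease.
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def build_tree(rows, labels, depth=0, max_depth=2, min_samples_split=50):
    """Grow a binary classification tree with depth and node-size stopping rules."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_samples_split or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None or split[2] <= 0.0:
        return {"leaf": majority}
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples_split),
    }


def predict(tree, row):
    """Follow thresholds from the root to a leaf and return its majority class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```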
Combined big data (BD) and known predictor models.
| Model | F1 score | AUCᵃ | Accuracy | Recall | Precision |
|---|---|---|---|---|---|
| Random forests combined | 0.258 | 0.643 | 0.807 | 0.451 | 0.181 |
| Neural network combined | 0.250 | 0.617 | 0.843 | 0.350 | 0.194 |
| Neural network (BD predictors) | 0.260 | 0.629 | 0.835 | 0.387 | 0.195 |
ᵃAUC: area under the receiver operating characteristic curve.
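F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), so each row of the combined-model table can be cross-checked from its own precision and recall columns. A short sketch; `f1_score` is a hypothetical helper, and the `reported` values are transcribed from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) transcribed from the combined-model table.
reported = {
    "Random forests combined": (0.181, 0.451, 0.258),
    "Neural network combined": (0.194, 0.350, 0.250),
    "Neural network (BD predictors)": (0.195, 0.387, 0.260),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.3f} (reported {f1:.3f})")
```

Recomputing from the three-decimal precision and recall values reproduces each reported F1 to within rounding (differences below 0.001).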
Figure 3. (A) Receiver operating characteristic curves for the prediction models. (B) Calibration curves (top) and histograms (bottom) for the prediction models. Both panels show the top five models: random forest combined and neural network combined (using big data and known inputs), random forest and logistic regression (using only known inputs), and neural network (using only big data inputs). ROC: receiver operating characteristic.
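The AUC summarizing an ROC curve equals the probability that a randomly chosen rhythm-control patient receives a higher predicted score than a randomly chosen rate-control patient, with ties counted as one half. A calibration curve compares the mean predicted probability with the observed event rate within probability bins, and a histogram shows how many predictions fall in each bin. The sketch below is a minimal pure-Python version of both, assuming labels coded 1 for rhythm control and equal-width bins; the binning used for Figure 3B is not stated, and the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def calibration_bins(y_true, probs, n_bins=10):
    """Return (mean predicted, observed fraction, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    out = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            out.append((mean_pred, observed, len(b)))  # count feeds the histogram
    return out
```

The pairwise AUC loop is quadratic in cohort size, which is fine for illustration; a rank-based formula scales better for cohorts of tens of thousands of patients.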
Figure 4. Decision curves for the prediction models, based on the proportions of appropriate and inappropriate referrals that would result from applying each model at different sensitivity levels (thresholds): (A) random forest combined, (B) neural network combined, (C) random forest, (D) logistic regression, and (E) neural network.
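Decision curves weigh appropriate referrals (true positives) against inappropriate ones (false positives) as the referral threshold varies. The sketch below computes both proportions at a given threshold and, assuming the standard net-benefit formulation of decision curve analysis, NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability. The caption does not specify whether Figure 4 plots net benefit or the two referral proportions directly, so treat the net-benefit function as an assumption; all function names are hypothetical.

```python
def referral_rates(y_true, probs, threshold):
    """Fractions of all patients referred appropriately (TP/n) and inappropriately (FP/n)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n, fp / n


def net_benefit(y_true, probs, threshold):
    """Assumed decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    appropriate, inappropriate = referral_rates(y_true, probs, threshold)
    return appropriate - inappropriate * threshold / (1.0 - threshold)


def decision_curve(y_true, probs, thresholds):
    """(threshold, appropriate, inappropriate, net benefit) for each threshold."""
    return [
        (t, *referral_rates(y_true, probs, t), net_benefit(y_true, probs, t))
        for t in thresholds
    ]
```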