Jean C Nuñez-Garcia, Antonio Sánchez-Puente, Jesús Sampedro-Gómez, Victor Vicente-Palacios, Manuel Jiménez-Navarro, Armando Oterino-Manzanas, Javier Jiménez-Candil, P Ignacio Dorado-Diaz, Pedro L Sánchez.
Abstract
BACKGROUND: The integrated approach to electrical cardioversion (EC) in atrial fibrillation (AF) is complex: candidates can revert to sinus rhythm (SR) spontaneously while waiting for EC, and post-cardioversion recurrence is high. It is therefore especially desirable to avoid scheduling EC in patients who would restore SR spontaneously or present early recurrence. We analyzed the whole elective EC process for AF using machine learning (ML) to enable a more realistic and detailed simulation of patient flow for decision-making purposes.
Keywords: atrial fibrillation; electrical cardioversion; machine-learning; pharmacologic cardioversion; rhythm control
Year: 2022 PMID: 35566761 PMCID: PMC9101912 DOI: 10.3390/jcm11092636
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.964
Figure 1. Overview of the phases followed to build and evaluate the machine-learning models.
Figure 2. Flow diagram of patients scheduled for planned electrical cardioversion, with the different outcomes at each pathway highlighted. A separate machine-learning model was built for each of these 5 circumstances: (i) spontaneous sinus rhythm restoration (conversion to sinus rhythm in the pre-scheduled electrical cardioversion period for non-antiarrhythmics-treated patients); (ii) pharmacologic cardioversion (conversion to sinus rhythm in the pre-scheduled electrical cardioversion period for antiarrhythmics-treated patients); (iii) direct-current cardioversion (conversion to sinus rhythm after direct-current shock application); (iv) atrial fibrillation recurrence (atrial fibrillation recurrence at 6-month follow-up for those patients whose sinus rhythm was restored spontaneously or by pharmacologic or direct-current cardioversion); and (v) rhythm control (maintenance in sinus rhythm at 6-month follow-up).
Hyperparameters tested during the tuning step. This table contains the different combinations of feature selection strategies and hyperparameters tested for each of the classification algorithms during training.
| Algorithm | Feature Selection | Hyperparameters |
|---|---|---|
| Boosted Trees | No selection | Number of trees: 25, 100, or 1000 |
| Random Forest | No selection | Number of trees: 100 or 1000 |
| Extremely Randomized Trees | No selection | Number of trees: 100 or 1000 |
| Logistic Regression | No selection | Regularization term: L1 or L2 |
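The tuning grid in the table above can be written down as a small data structure and enumerated. The following is a minimal sketch, not the authors' code: the dictionary keys (`n_trees`, `penalty`) and the helper `enumerate_configurations` are hypothetical names chosen for illustration, and only the values listed in the table are used.

```python
from itertools import product

# Hypothetical encoding of the tuning grid; key names are illustrative only.
TUNING_GRID = {
    "boosted_trees": {"n_trees": [25, 100, 1000]},
    "random_forest": {"n_trees": [100, 1000]},
    "extremely_randomized_trees": {"n_trees": [100, 1000]},
    "logistic_regression": {"penalty": ["L1", "L2"]},
}


def enumerate_configurations(grid):
    """Return every (algorithm, hyperparameters) combination in the grid."""
    configs = []
    for algorithm, params in grid.items():
        names = sorted(params)
        # Cartesian product over the candidate values of each hyperparameter.
        for values in product(*(params[name] for name in names)):
            configs.append((algorithm, dict(zip(names, values))))
    return configs


configs = enumerate_configurations(TUNING_GRID)
print(len(configs), "configurations to evaluate")
```

Under these assumptions the grid is small, so an exhaustive search over all combinations is feasible within each cross-validation repetition.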
Baseline characteristics of the study cohort. Continuous and categorical patient data used as input for ML model development. Continuous variables are expressed as mean ± standard deviation and categorical variables as n (%). LVEF was considered normal when greater than 50%, mild dysfunction from 49 to 40%, moderate dysfunction from 39 to 30%, and severe dysfunction when less than 30%. Paroxysmal AF was defined as AF with episodes recurring with variable frequency; persistent AF was defined as continuous AF sustained >7 days; long-standing persistent AF was defined as continuous AF >12 months in duration. LA volume index was considered normal when <35 mL/m2, mildly dilated from 35 to 41 mL/m2, moderately dilated from 42 to 48 mL/m2, and severely dilated when >48 mL/m2 [41].
| Variable | Development Dataset: Missing Values | Development Dataset: Mean ± SD or n (%) | Validation Dataset: Missing Values | Validation Dataset: Mean ± SD or n (%) |
|---|---|---|---|---|
| Age, years | 2 | 62.1 ± 11.9 | 4 | 63.0 ± 12.2 |
| Gender, male | - | 240 (75.9%) | - | 83 (73.5%) |
| Weight, kg | 30 | 84.0 ± 17.1 | 19 | 85.2 ± 20.1 |
| Height, cm | 39 | 170.0 ± 8.9 | 28 | 170.6 ± 9.9 |
| Body mass index, kg/m2 | 41 | 28.9 ± 5.0 | 28 | 29.0 ± 6.2 |
| | | | | |
| Hypertension | - | 172 (54.4%) | - | 62 (54.9%) |
| Dyslipidemia | - | 128 (40.5%) | - | 47 (41.6%) |
| Active smoking | - | 46 (14.6%) | - | 12 (10.6%) |
| Smoking history | - | 133 (42.1%) | - | 45 (39.8%) |
| Diabetes mellitus | - | 59 (18.7%) | - | 23 (20.4%) |
| | | | | |
| Heart failure | - | 100 (31.6%) | - | 24 (21.2%) |
| Coronary artery disease | - | 52 (16.5%) | - | 11 (9.7%) |
| Previous direct-current shock application attempt | - | 22 (7.0%) | - | 15 (13.3%) |
| Previous transient ischemic attack or stroke | - | 19 (6.0%) | - | 6 (5.3%) |
| History of oral anticoagulation treatment | - | 158 (50.0%) | - | 33 (29.2%) |
| Peripheral vascular disease | - | 16 (5.1%) | - | 9 (8.0%) |
| Rheumatic heart disease | - | 7 (2.2%) | - | 1 (0.9%) |
| | | | | |
| Chronic obstructive pulmonary disease | - | 66 (20.9%) | - | 15 (13.3%) |
| Prior cancer | - | 28 (8.9%) | - | 10 (8.8%) |
| Prior bleeding | - | 11 (3.5%) | - | 3 (2.7%) |
| Venous thromboembolism | - | 10 (3.2%) | - | 1 (0.9%) |
| Impaired physical mobility | - | 8 (2.5%) | - | 6 (5.3%) |
| | | | | |
| NYHA functional class >I | - | 106 (33.5%) | - | 39 (34.5%) |
| NYHA functional class >II | - | 35 (11.1%) | - | 10 (8.8%) |
| NYHA functional class >III | - | 9 (2.8%) | - | 0 (0%) |
| CHA2DS2-VASc score | - | 2.2 ± 1.7 | - | 2.1 ± 1.6 |
| HATCH score | - | 1.6 ± 1.5 | - | 1.4 ± 1.2 |
| HASBLED score | - | 2.3 ± 1.1 | - | 2.1 ± 0.9 |
| Anemia | - | 35 (11.1%) | - | 14 (12.4%) |
| Creatinine, mg/dL | - | 1.0 ± 0.4 | 1 | 1.0 ± 0.3 |
| Glomerular filtration rate, mL/min/1.73 m2 | - | 75.6 ± 17.1 | 1 | 77.0 ± 17.0 |
| | | | | |
| Paroxysmal | - | 61 (19.3%) | - | 24 (21.2%) |
| Persistent | - | 250 (79.1%) | - | 89 (78.8%) |
| Long-standing persistent | - | 5 (1.6%) | - | 0 (0%) |
| | | | | |
| LV mass index, g/m2 | 42 | 102.3 ± 32.4 | 53 | 97.7 ± 28.4 |
| LVEF < 50% | - | 73 (23.1%) | - | 26 (23.0%) |
| LVEF < 40% | - | 42 (13.3%) | - | 13 (11.5%) |
| LVEF < 30% | - | 21 (6.6%) | - | 6 (5.3%) |
| Tricuspid regurgitant jet velocity, cm/sec | 149 | 257.7 ± 47.1 | 67 | 247.0 ± 50.9 |
| At least moderate probability pulmonary hypertension | - | 43 (13.6%) | - | 9 (8.0%) |
| High probability pulmonary hypertension | - | 10 (3.2%) | - | 2 (1.8%) |
| LA volume index, mL/m2 | 37 | 43.6 ± 17.4 | 44 | 44.9 ± 16.7 |
| LA volume index ≥ 35 mL/m2 | - | 182 (57.6%) | - | 51 (45.1%) |
| LA volume index ≥ 42 mL/m2 | - | 138 (43.7%) | - | 34 (30.1%) |
| LA volume index > 48 mL/m2 | - | 98 (31.0%) | - | 27 (23.9%) |
| Significant valvular heart disease | - | 68 (21.5%) | - | 21 (18.6%) |
| Mitral stenosis | - | 2 (0.6%) | - | 1 (0.9%) |
| Mitral regurgitation | - | 39 (12.3%) | - | 15 (13.3%) |
| Aortic stenosis | - | 2 (0.6%) | - | 3 (2.7%) |
| Aortic regurgitation | - | 9 (2.8%) | - | 4 (3.5%) |
| Tricuspid regurgitation | - | 23 (7.3%) | - | 5 (4.4%) |
| Mechanical prosthetic valve | - | 10 (3.2%) | - | 1 (0.9%) |
| Biological prosthetic valve | - | 9 (2.8%) | - | 5 (4.4%) |
| | | | | |
| Time under anticoagulation, days | - | 30.9 ± 23.2 | - | 27.4 ± 17.2 |
| Vitamin K antagonist | - | 102 (32.3%) | 1 | 14 (12.5%) |
| Direct oral anticoagulants | - | 214 (67.7%) | 1 | 98 (87.5%) |
| Dabigatran | - | 23 (7.3%) | - | 11 (9.7%) |
| Rivaroxaban | - | 93 (29.4%) | - | 17 (15.0%) |
| Apixaban | - | 79 (25.0%) | - | 49 (43.4%) |
| Edoxaban | - | 19 (6.0%) | - | 20 (17.7%) |
| Low-molecular-weight heparin | - | 0 (0%) | - | 2 (1.8%) |
| | | | | |
| Antiarrhythmics before scheduled EC | - | 132 (41.8%) | - | 45 (39.8%) |
| Amiodarone before scheduled EC | - | 98 (31.0%) | - | 34 (30.1%) |
| Flecainide before scheduled EC | - | 30 (9.5%) | - | 11 (9.7%) |
| Dronedarone before scheduled EC | - | 4 (1.3%) | - | 0 (0%) |
| Antiarrhythmics after scheduled EC | - | 198 (62.7%) | - | 65 (57.5%) |
| Amiodarone after scheduled EC | - | 147 (46.5%) | - | 38 (33.6%) |
| Flecainide after scheduled EC | - | 45 (14.2%) | - | 27 (23.9%) |
| Dronedarone after scheduled EC | - | 6 (1.9%) | - | 0 (0%) |
| | | | | |
| Nonsteroidal anti-inflammatory drug | - | 5 (1.6%) | - | 0 (0%) |
| Aspirin | - | 39 (12.3%) | - | 7 (6.2%) |
| Dual antiplatelet therapy | - | 4 (1.3%) | - | 1 (0.9%) |
| Beta-blocker | - | 243 (76.9%) | - | 88 (77.9%) |
| ACE inhibitors/angiotensin II receptor blocker | - | 155 (49.1%) | - | 40 (35.4%) |
| Sacubitril-Valsartan | - | 2 (0.6%) | - | 1 (0.9%) |
| Calcium antagonist | - | 50 (15.8%) | - | 11 (9.7%) |
| Aldosterone receptor antagonist | 1 | 31 (9.8%) | - | 10 (8.8%) |
| Digoxin | - | 20 (6.3%) | - | 1 (0.9%) |
| | | | | |
| Number of shocks | 76 | 1.4 ± 0.7 | 30 | 1.3 ± 0.6 |
| Applied maximal energy, J | 105 | 176.2 ± 102.6 | 42 | 165.6 ± 36.5 |
ACE = angiotensin-converting enzyme; EC = electrical cardioversion; LA = left atrial; LVEF = left ventricular ejection fraction.
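The reference ranges given in the caption of this table translate directly into threshold rules. The sketch below illustrates them; the function names are hypothetical, and two boundary choices are assumptions because the caption leaves them open: an LVEF of exactly 50% is treated as normal (the table reports dysfunction as LVEF < 50%), and non-integer LA volume index values between the stated ranges are assigned using the table's own cut-offs (≥ 35, ≥ 42, > 48 mL/m2).

```python
def lvef_category(lvef_percent: float) -> str:
    """Classify left ventricular ejection fraction (%) into the caption's ranges."""
    if lvef_percent >= 50:  # assumption: exactly 50% counts as normal
        return "normal"
    if lvef_percent >= 40:
        return "mild dysfunction"
    if lvef_percent >= 30:
        return "moderate dysfunction"
    return "severe dysfunction"


def la_volume_category(lavi_ml_m2: float) -> str:
    """Classify left atrial volume index (mL/m2) into the caption's ranges."""
    if lavi_ml_m2 < 35:
        return "normal"
    if lavi_ml_m2 < 42:
        return "mildly dilated"
    if lavi_ml_m2 <= 48:
        return "moderately dilated"
    return "severely dilated"


# Cohort means from the development dataset, used only as example inputs.
print(la_volume_category(43.6))  # moderately dilated
print(lvef_category(55))         # normal
```

The nested binary variables in the table (for example LVEF < 50%, < 40%, and < 30%) are cumulative versions of the same thresholds.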
Performance of all prediction models at each clinical pathway in the cross-validation of the training data, measured in terms of the area under the ROC curve (AUC ROC) and area under the precision-recall curve (AUC PR). Both the CHA2DS2-VASc and HATCH risk scores were used as baseline models for the performance evaluation of each machine-learning developed model.
| Pathway | Predictions | Model | AUC-ROC | AUC-ROC Change vs. CHA2DS2-VASc | AUC-ROC Change vs. HATCH | AUC-PR | AUC-PR Change vs. CHA2DS2-VASc | AUC-PR Change vs. HATCH |
|---|---|---|---|---|---|---|---|---|
| Spontaneous SR restoration | 1840 | CHA2DS2-VASc | 0.62 (0.50–0.73) | Baseline model | −7% | 0.33 (0.23–0.44) | Baseline model | −2% |
| | | HATCH | 0.69 (0.58–0.80) | +7% | Baseline model | 0.35 (0.25–0.45) | +2% | Baseline model |
| | | Regularized logistic regression | 0.81 (0.71–0.92) | +19% | +12% | 0.68 (0.53–0.82) | +35% | +33% |
| | | Random forest | 0.82 (0.72–0.92) | +20% | +13% | 0.67 (0.53–0.81) | +34% | +32% |
| | | Extremely randomized trees | 0.81 (0.71–0.92) | +19% | +12% | 0.68 (0.54–0.83) | +35% | +33% |
| | | Boosted trees | 0.80 (0.70–0.91) | +18% | +11% | 0.68 (0.53–0.82) | +35% | +33% |
| Pharmacologic cardioversion | 1320 | CHA2DS2-VASc | 0.53 (0.39–0.67) | Baseline model | −2% | 0.29 (0.20–0.37) | Baseline model | +2% |
| | | HATCH | 0.55 (0.43–0.67) | +2% | Baseline model | 0.27 (0.21–0.33) | −2% | Baseline model |
| | | Regularized logistic regression | 0.74 (0.60–0.87) | +21% | +19% | 0.64 (0.47–0.80) | +35% | +37% |
| | | Random forest | 0.67 (0.49–0.85) | +14% | +12% | 0.60 (0.42–0.77) | +31% | +33% |
| | | Extremely randomized trees | 0.68 (0.51–0.84) | +15% | +13% | 0.58 (0.41–0.75) | +29% | +31% |
| | | Boosted trees | 0.68 (0.53–0.84) | +15% | +13% | 0.61 (0.45–0.78) | +32% | +34% |
| Direct-current cardioversion | 2550 | CHA2DS2-VASc | 0.52 (0.42–0.62) | Baseline model | −6% | 0.85 (0.81–0.89) | Baseline model | −1% |
| | | HATCH | 0.58 (0.47–0.68) | +6% | Baseline model | 0.86 (0.82–0.90) | +1% | Baseline model |
| | | Regularized logistic regression | 0.51 (0.40–0.62) | −1% | −7% | 0.85 (0.80–0.89) | 0% | −1% |
| | | Random forest | 0.48 (0.38–0.59) | −4% | −10% | 0.85 (0.80–0.89) | 0% | −1% |
| | | Extremely randomized trees | 0.47 (0.35–0.58) | −5% | −11% | 0.84 (0.79–0.88) | −1% | −2% |
| | | Boosted trees | 0.46 (0.38–0.55) | −6% | −12% | 0.84 (0.80–0.87) | −1% | −2% |
| 6-month AF recurrence | 2730 | CHA2DS2-VASc | 0.54 (0.47–0.61) | Baseline model | −4% | 0.40 (0.35–0.46) | Baseline model | +2% |
| | | HATCH | 0.58 (0.50–0.65) | +4% | Baseline model | 0.38 (0.33–0.43) | −2% | Baseline model |
| | | Regularized logistic regression | 0.63 (0.55–0.71) | +9% | +5% | 0.55 (0.47–0.63) | +15% | +17% |
| | | Random forest | 0.67 (0.59–0.75) | +13% | +9% | 0.61 (0.52–0.70) | +21% | +23% |
| | | Extremely randomized trees | 0.68 (0.61–0.75) | +14% | +10% | 0.61 (0.52–0.70) | +21% | +23% |
| | | Boosted trees | 0.63 (0.55–0.71) | +9% | +5% | 0.57 (0.48–0.65) | +17% | +19% |
| 6-month rhythm control | 3160 | CHA2DS2-VASc | 0.55 (0.48–0.62) | Baseline model | −4% | 0.58 (0.52–0.63) | Baseline model | −2% |
| | | HATCH | 0.59 (0.52–0.69) | +4% | Baseline model | 0.60 (0.54–0.66) | +2% | Baseline model |
| | | Regularized logistic regression | 0.63 (0.57–0.70) | +8% | +4% | 0.69 (0.63–0.74) | +11% | +9% |
| | | Random forest | 0.68 (0.62–0.74) | +13% | +9% | 0.71 (0.65–0.77) | +13% | +11% |
| | | Extremely randomized trees | 0.69 (0.62–0.75) | +14% | +10% | 0.72 (0.65–0.78) | +14% | +12% |
| | | Boosted trees | 0.57 (0.51–0.64) | +2% | −2% | 0.63 (0.58–0.68) | +5% | +3% |
Number of predictions = Number of samples × 10 repetitions.
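For readers unfamiliar with the two metrics reported above, the following sketch shows how they can be computed from labels and predicted scores using only the Python standard library. It is an illustration, not the authors' implementation: the record does not state which estimator was used for the AUC-PR, so average precision is an assumption here, and the toy labels and scores are invented for the example.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each recalled positive
    return precision_sum / total_positives


# Invented toy data: 1 = event (e.g., conversion to sinus rhythm), 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc_roc = auc_roc(toy_labels, toy_scores)
toy_auc_pr = average_precision(toy_labels, toy_scores)
print(f"AUC-ROC = {toy_auc_roc:.2f}, AUC-PR = {toy_auc_pr:.2f}")
```

An AUC-ROC of 0.5 corresponds to chance, whereas the chance level of the AUC-PR equals the event prevalence, which is why the AUC-PR values for direct-current cardioversion are high for every model including the baselines.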
Performance of all prediction models at each clinical pathway in the evaluation with testing data. Both the CHA2DS2-VASc and HATCH risk scores were used as baseline models for the performance evaluation of each machine-learning developed model.
| Pathway | Predictions | Model | AUC-ROC | AUC-ROC Change vs. CHA2DS2-VASc | AUC-ROC Change vs. HATCH | AUC-PR | AUC-PR Change vs. CHA2DS2-VASc | AUC-PR Change vs. HATCH |
|---|---|---|---|---|---|---|---|---|
| Spontaneous SR restoration | 68 | CHA2DS2-VASc | 0.57 (0.59–0.65) | Baseline model | −9% | 0.31 (0.24–0.39) | Baseline model | −7% |
| | | HATCH | 0.66 (0.59–0.73) | +9% | Baseline model | 0.38 (0.30–0.47) | +7% | Baseline model |
| | | Regularized logistic regression | 0.80 (0.75–0.86) | +23% | +14% | 0.52 (0.44–0.60) | +21% | +14% |
| | | Random forest | 0.72 (0.66–0.79) | +15% | +6% | 0.48 (0.39–0.56) | +17% | +10% |
| | | Extremely randomized trees | 0.79 (0.73–0.84) | +22% | +13% | 0.57 (0.49–0.64) | +26% | +19% |
| | | Boosted trees | 0.77 (0.71–0.83) | +20% | +11% | 0.56 (0.48–0.64) | +25% | +18% |
| Pharmacologic cardioversion | 45 | CHA2DS2-VASc | 0.45 (0.34–0.56) | Baseline model | −10% | 0.18 (0.09–0.27) | Baseline model | −5% |
| | | HATCH | 0.55 (0.45–0.66) | +10% | Baseline model | 0.23 (0.13–0.33) | +5% | Baseline model |
| | | Regularized logistic regression | 0.62 (0.52–0.72) | +17% | +7% | 0.43 (0.32–0.54) | +25% | +20% |
| | | Random forest | 0.66 (0.57–0.76) | +21% | +11% | 0.40 (0.29–0.51) | +22% | +17% |
| | | Extremely randomized trees | 0.71 (0.63–0.80) | +26% | +16% | 0.42 (0.31–0.53) | +24% | +19% |
| | | Boosted trees | 0.57 (0.46–0.67) | +12% | +2% | 0.30 (0.19–0.40) | +12% | +7% |
| Direct-current cardioversion | 87 | CHA2DS2-VASc | 0.57 (0.48–0.66) | Baseline model | +2% | 0.88 (0.81–0.94) | Baseline model | +1% |
| | | HATCH | 0.55 (0.46–0.65) | −2% | Baseline model | 0.87 (0.81–0.94) | −1% | Baseline model |
| | | Regularized logistic regression | 0.53 (0.44–0.62) | −4% | −2% | 0.87 (0.81–0.94) | −1% | 0% |
| | | Random forest | 0.41 (0.32–0.49) | −16% | −14% | 0.85 (0.77–0.92) | −3% | −2% |
| | | Extremely randomized trees | 0.48 (0.39–0.57) | −9% | −7% | 0.88 (0.81–0.94) | 0% | +1% |
| | | Boosted trees | 0.58 (0.48–0.67) | +1% | +3% | 0.91 (0.86–0.97) | +3% | +4% |
| 6-month AF recurrence | 101 | CHA2DS2-VASc | 0.52 (0.46–0.58) | Baseline model | +1% | 0.41 (0.35–0.47) | Baseline model | +1% |
| | | HATCH | 0.51 (0.45–0.56) | −1% | Baseline model | 0.40 (0.34–0.46) | −1% | Baseline model |
| | | Regularized logistic regression | 0.64 (0.59–0.70) | +12% | +13% | 0.49 (0.43–0.55) | +8% | +9% |
| | | Random forest | 0.61 (0.55–0.67) | +9% | +10% | 0.50 (0.44–0.56) | +9% | +10% |
| | | Extremely randomized trees | 0.62 (0.56–0.68) | +10% | +11% | 0.53 (0.47–0.59) | +12% | +13% |
| | | Boosted trees | 0.57 (0.51–0.63) | +5% | +6% | 0.48 (0.42–0.54) | +7% | +8% |
| 6-month rhythm control | 113 | CHA2DS2-VASc | 0.50 (0.45–0.56) | Baseline model | −1% | 0.54 (0.48–0.59) | Baseline model | 0% |
| | | HATCH | 0.51 (0.46–0.56) | +1% | Baseline model | 0.54 (0.49–0.60) | 0% | Baseline model |
| | | Regularized logistic regression | 0.66 (0.61–0.71) | +16% | +15% | 0.68 (0.63–0.73) | +14% | +14% |
| | | Random forest | 0.60 (0.54–0.65) | +10% | +9% | 0.62 (0.56–0.67) | +8% | +8% |
| | | Extremely randomized trees | 0.60 (0.55–0.65) | +10% | +9% | 0.61 (0.56–0.67) | +7% | +7% |
| | | Boosted trees | 0.58 (0.53–0.63) | +8% | +7% | 0.63 (0.58–0.68) | +9% | +9% |
Number of predictions = Number of samples × 10 repetitions.
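The "Change" columns in the two performance tables appear to be absolute differences in AUC, expressed in percentage points, between each model and the corresponding baseline score. The short sketch below reproduces one row under that reading; the helper names are hypothetical.

```python
def change_vs_baseline(model_auc: float, baseline_auc: float) -> int:
    """Absolute AUC difference in percentage points, rounded to an integer."""
    return round((model_auc - baseline_auc) * 100)


def format_change(points: int) -> str:
    """Render the difference with an explicit sign, as in the tables."""
    return "0%" if points == 0 else f"{points:+d}%"


# Test-set AUC-ROC for spontaneous SR restoration (values from the table above).
logistic_regression = 0.80
cha2ds2_vasc = 0.57
hatch = 0.66

print(format_change(change_vs_baseline(logistic_regression, cha2ds2_vasc)))  # +23%
print(format_change(change_vs_baseline(logistic_regression, hatch)))         # +14%
```

Because these are percentage-point differences rather than relative improvements, a change of +23% here means the AUC rose from 0.57 to 0.80, not that it grew by 23% of its baseline value.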
Figure 3. Illustration of the envisioned clinical use of the machine-learning predictions along the elective electrical cardioversion (EC) process. The predictions were made on an independent dataset of 113 patients, separate from the one used to build the machine-learning models. Patients in sinus rhythm (SR) are represented in yellow and patients in atrial fibrillation (AF) in red. Patients predicted by the machine-learning model to convert to or remain in SR are shown on a blue background. Panel (A) represents predictions (blue background) of spontaneous restoration of SR or pharmacological cardioversion (CV) and ground truth findings for each patient (yellow or red). Panel (B) represents predictions (blue background) of efficacy of direct-current shock application and ground truth findings for each patient (yellow or red). Panel (C) represents predictions (blue background) of AF recurrence at 6 months after SR restoration and ground truth findings for each patient (yellow or red). Panel (D) represents predictions (blue background) of SR control at 6 months and ground truth findings for each patient (yellow or red).
Classification analysis. The classification performance of the CHA2DS2-VASc and HATCH risk scores and of the best-performing machine-learning model was calculated for each electrical cardioversion pathway. The net increase in performance (number of patients and percentage) and the net reclassification index obtained with the developed machine-learning model are provided. The most competitive existing risk score, either CHA2DS2-VASc or HATCH, was used as the baseline model for evaluating the machine-learning model at each pathway.
| Pathway/Model | TP | FP | TN | FN | R (recall) | S (specificity) | P (precision) | NPV | Net Reclassification Index |
|---|---|---|---|---|---|---|---|---|---|
| Spontaneous SR restoration | | | | | | | | | |
| Extremely randomized trees | 12 | 16 | 35 | 5 | 70.6% | 68.6% | 42.9% | 87.5% | +5.9% |
| CHA2DS2-VASc ≤ 1 | 9 | 20 | 31 | 8 | 52.9% | 60.8% | 31% | 79.5% | −19.6% |
| HATCH ≤ 0 | 9 | 10 | 41 | 8 | 52.9% | 80.4% | 47.4% | 83.7% | Baseline model |
| Pharmacologic cardioversion | | | | | | | | | |
| Extremely randomized trees | 4 | 3 | 33 | 5 | 44.4% | 91.7% | 57.1% | 86.8% | +38.8% |
| CHA2DS2-VASc ≤ 2 | 6 | 25 | 11 | 3 | 66.7% | 30.6% | 19.4% | 78.6% | Baseline model |
| HATCH ≤ 2 | 8 | 36 | 0 | 1 | 88.9% | 0% | 18.2% | 0% | −8.4% |
| Direct-current cardioversion | | | | | | | | | |
| Extremely randomized trees | 73 | 10 | 2 | 2 | 97.3% | 16.7% | 88% | 50% | −0.6% |
| CHA2DS2-VASc ≤ 0 | 10 | 3 | 9 | 65 | 13.3% | 75% | 76.9% | 12.1% | −26.3% |
| HATCH ≤ 1 | 61 | 8 | 4 | 14 | 81.3% | 33.3% | 88.4% | 22.2% | Baseline model |
| 6-month AF recurrence | | | | | | | | | |
| Extremely randomized trees | 16 | 14 | 47 | 24 | 40% | 77% | 53.3% | 66.2% | +14.8% |
| CHA2DS2-VASc > 2 | 14 | 20 | 41 | 26 | 35% | 67.2% | 41.2% | 61.2% | Baseline model |
| HATCH > 1 | 15 | 23 | 38 | 25 | 37.5% | 62.3% | 39.5% | 60.3% | −2.4% |
| 6-month rhythm control | | | | | | | | | |
| Extremely randomized trees | 45 | 24 | 28 | 16 | 73.8% | 53.8% | 65.2% | 63.6% | +22.1% |
| CHA2DS2-VASc ≤ 2 | 41 | 32 | 20 | 20 | 67.2% | 38.5% | 56.2% | 50% | Baseline model |
| HATCH ≤ 1 | 38 | 31 | 21 | 23 | 62.3% | 40.4% | 55.1% | 47.7% | −2.8% |
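The derived columns of this table follow from the four confusion-matrix counts. The sketch below recomputes them for the first block of rows. The reading of the net reclassification index as the change in recall plus the change in specificity relative to the baseline score is an assumption; it reproduces the tabulated +5.9%, but the record does not spell out the formula. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def recall(self) -> float:  # R: sensitivity
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # S
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:  # P: positive predictive value
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)


def net_reclassification_index(model: Confusion, baseline: Confusion) -> float:
    """Assumed definition: gain in recall plus gain in specificity over the baseline."""
    return (model.recall - baseline.recall) + (model.specificity - baseline.specificity)


# Counts from the first three rows of the table (68 validation patients each).
extra_trees = Confusion(tp=12, fp=16, tn=35, fn=5)
hatch = Confusion(tp=9, fp=10, tn=41, fn=8)

nri = net_reclassification_index(extra_trees, hatch)
print(f"R={extra_trees.recall:.1%} S={extra_trees.specificity:.1%} "
      f"P={extra_trees.precision:.1%} NPV={extra_trees.npv:.1%} NRI={nri:+.1%}")
```

Read this way, a positive index means the gain in one of recall or specificity outweighs any loss in the other, as happens here where the model trades some specificity for a larger gain in recall.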
Feature importance. Variable ranking by their contribution to the predictions of the extremely randomized tree model at each pathway. The score represents the relative importance of that variable for the machine-learning model. The weight of the features is scaled from 0 to 1; thus, variables close to 1 show a higher impact on the predictive model.
| Pathway | Variable | Score |
|---|---|---|
| Spontaneous SR restoration | Paroxysmal atrial fibrillation | 1 |
| | History of oral anticoagulation treatment | 0.316 |
| | LA volume index ≥ 42 mL/m2 | 0.257 |
| | ACE inhibitors/angiotensin II receptor blockers | 0.150 |
| | LVEF < 50% | 0.065 |
| Pharmacologic cardioversion | Paroxysmal atrial fibrillation | 1 |
| | Heart failure | 0.111 |
| | Dyslipidemia | 0.085 |
| | Glomerular filtration rate | 0.066 |
| | Peripheral vascular disease | 0.064 |
| Direct-current cardioversion | Chronic obstructive pulmonary disease | 1 |
| | Long-standing persistent AF | 0.693 |
| | Heart failure | 0.411 |
| | Beta blockers | 0.297 |
| | LA volume index ≥ 35 mL/m2 | 0.277 |
| 6-month AF recurrence | Spontaneous SR restoration | 1 |
| | History of oral anticoagulation treatment | 0.857 |
| | Hypertension | 0.849 |
| | ACE inhibitors/angiotensin II receptor blockers | 0.827 |
| | NYHA functional class >II | 0.818 |
| 6-month rhythm control | LA volume index ≥ 35 mL/m2 | 1 |
| | Paroxysmal atrial fibrillation | 0.577 |
| | History of oral anticoagulation treatment | 0.468 |
| | LA volume index ≥ 48 mL/m2 | 0.446 |
| | Smoking history | 0.423 |
LA = left atrial; LVEF = left ventricular ejection fraction; SR = sinus rhythm.
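Since the top-ranked variable scores exactly 1 in every pathway, the scores look like raw importances divided by the largest importance in that model. The sketch below shows that scaling as an assumption, not as the authors' documented procedure; the raw values and variable names in the example are invented.

```python
def scale_importances(raw: dict) -> dict:
    """Divide each raw importance by the maximum so the top feature scores 1."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: value / top for name, value in raw.items()}


def top_k(scaled: dict, k: int = 5) -> list:
    """Return the k highest-scoring (name, score) pairs, best first."""
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:k]


# Invented raw importances, for illustration only.
raw_importances = {"feature_a": 0.20, "feature_b": 0.05, "feature_c": 0.10}
scaled = scale_importances(raw_importances)
for name, score in top_k(scaled, k=3):
    print(f"{name}: {score:.3f}")
```

Under this scaling, scores are comparable within a pathway but not across pathways, because each model is normalized by its own strongest predictor.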
Figure 4. Learning curve of the best model for each of the pathways, including hyperparameter tuning. The results are shown as the area under the ROC curve with its confidence interval, as measured in the external validation set. Note that the results of the models are not displayed until a certain percentage of the training set is used, because the hyperparameter tuning step performs a cross-validation that requires a minimum number of events to produce results.
TRIPOD Checklist: Prediction Model Development and Validation.
| Section/Topic | Item | Development (D) or Validation (V) | Checklist Item | Page |
|---|---|---|---|---|
| Title and abstract | | | | |
| Title | 1 | D;V | Identify the study as developing and/or validating a multivariable prediction model, the target population, and the outcome to be predicted. | 1 |
| Abstract | 2 | D;V | Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions. | 2 |
| Introduction | | | | |
| Background and objectives | 3a | D;V | Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models. | 3 |
| | 3b | D;V | Specify the objectives, including whether the study describes the development or validation of the model or both. | 3 |
| Methods | | | | |
| Source of data | 4a | D;V | Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable. | 2 |
| | 4b | D;V | Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up. | 3 |
| Participants | 5a | D;V | Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centers. | 3 |
| | 5b | D;V | Describe eligibility criteria for participants. | 3 |
| | 5c | D;V | Give details of treatments received, if relevant. | 4 |
| Outcome | 6a | D;V | Clearly define the outcome that is predicted by the prediction model, including how and when assessed. | 2, 3 |
| | 6b | D;V | Report any actions to blind assessment of the outcome to be predicted. | 3 |
| Predictors | 7a | D;V | Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured. | |
| | 7b | D;V | Report any actions to blind assessment of predictors for the outcome and other predictors. | 4 |
| Sample size | 8 | D;V | Explain how the study size was arrived at. | 4 |
| Missing data | 9 | D;V | Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method. | 4 |
| Statistical analysis methods | 10a | D | Describe how predictors were handled in the analyses. | 4 |
| | 10b | D | Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation. | 3, 4 |
| | 10c | V | For validation, describe how the predictions were calculated. | 4 |
| | 10d | D;V | Specify all measures used to assess model performance and, if relevant, to compare multiple models. | 4 |
| | 10e | V | Describe any model updating (e.g., recalibration) arising from the validation, if done. | NA |
| Risk groups | 11 | D;V | Provide details on how risk groups were created, if done. | 4 |
| Development vs. validation | 12 | V | For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors. | 4 |
| Results | | | | |
| Participants | 13a | D;V | Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. | |
| | 13b | D;V | Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and the outcome. | |
| | 13c | V | For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors, and outcome). | |
| Model development | 14a | D | Specify the number of participants and outcome events in each analysis. | |
| | 14b | D | If done, report the unadjusted association between each candidate predictor and outcome. | NA |
| Model specification | 15a | D | Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients and model intercept or baseline survival at a given time point). | 6 |
| | 15b | D | Explain how to use the prediction model. | 6 |
| Model performance | 16 | D;V | Report performance measures (with CIs) for the prediction model. | |
| Model-updating | 17 | V | If done, report the results from any model updating (i.e., model specification, model performance). | NA |
| Discussion | | | | |
| Limitations | 18 | D;V | Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data). | 14, 15 |
| Interpretation | 19a | V | For validation, discuss the results with reference to the performance of the development data and any other validation data. | 13, 14 |
| | 19b | D;V | Give an overall interpretation of the results, considering objectives, limitations, results from similar studies, and other relevant evidence. | 13, 14 |
| Implications | 20 | D;V | Discuss the potential clinical use of the model and implications for future research. | 13, 14 |
| Other information | | | | |
| Supplementary information | 21 | D;V | Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets. | 7, 14 |
| Funding | 22 | D;V | Give the source of funding and the role of the funders for the present study. | 15 |