Predicting recurrent atrial fibrillation after catheter ablation: a systematic review of prognostic models.

Janine Dretzke¹, Naomi Chuchu¹, Ridhi Agarwal¹, Clare Herd¹, Winnie Chua², Larissa Fabritz^2,3, Susan Bayliss¹, Dipak Kotecha^2,3, Jonathan J Deeks¹, Paulus Kirchhof^2,3,4, Yemisi Takwoingi¹.

Abstract

AIMS: We assessed the performance of models (risk scores) for predicting recurrence of atrial fibrillation (AF) in patients who have undergone catheter ablation.
METHODS AND RESULTS: Systematic searches of bibliographic databases were conducted (November 2018). Studies were eligible for inclusion if they reported the development, validation, or impact assessment of a model for predicting AF recurrence after ablation. Model performance (discrimination and calibration) measures were extracted. The Prediction Study Risk of Bias Assessment Tool (PROBAST) was used to assess risk of bias. Meta-analysis was not feasible due to clinical and methodological differences between studies, but c-statistics were presented in forest plots. Thirty-three studies developing or validating 13 models were included; eight studies compared two or more models. Common model variables were left atrial parameters, type of AF, and age. Model discriminatory ability was highly variable and no model had consistently poor or good performance. Most studies did not assess model calibration. The main risk of bias concern was the lack of internal validation, which may have resulted in overly optimistic and/or biased model performance estimates. No model impact studies were identified.
CONCLUSION: Our systematic review suggests that clinical risk prediction of AF after ablation has potential, but there remains a need for robust evaluation of risk factors and development of risk scores.

Keywords: Atrial fibrillation; Catheter ablation; Model performance; Prognostic model; Recurrence; Systematic review

Year: 2020; PMID: 32227238; PMCID: PMC7203634; DOI: 10.1093/europace/euaa041

Source DB: PubMed; Journal: Europace; ISSN: 1099-5129; Impact factor: 5.214

Several prognostic models have been developed to predict individual risk of recurrence of atrial fibrillation (AF) after catheter ablation. To the best of our knowledge this is the first comprehensive systematic review of such models to (i) include detailed risk of bias assessment of model development and validation studies and (ii) provide a descriptive summary of measures of model performance in forest plots. Model discriminatory ability based on the c-statistic was highly variable; no model had consistently poor or good discriminatory ability. Model calibration (i.e. how well predicted risk agrees with observed risk) was rarely reported. Thus overall assessment of model performance remains incomplete. Risks of bias were substantial and included a lack of internal validation in model development studies, flawed variable selection and weighting, low event rates and poor reporting of missing data. Robust evaluation of risk factors and development of clinically useful risk scores is still needed.

Introduction

Atrial fibrillation (AF) is the most common arrhythmia diagnosed in clinical practice, and its worldwide incidence and prevalence are increasing. Atrial fibrillation is predicted to affect between 1.3 and 1.8 million patients in the UK and 18 million people in Europe by 2060. Drivers for this increase include an ageing population, better survival from conditions such as ischaemic heart disease, and increasing multimorbidity. Atrial fibrillation is associated with increased morbidity and mortality, particularly cardiovascular related. Currently available treatments can reduce this, particularly via anticoagulation for stroke prevention, but many patients remain symptomatic even on optimal rate control therapy. Furthermore, these patients remain at high risk of cardiovascular complications, often manifesting as heart failure or sudden death. To mitigate this epidemic of AF-related disease, efforts are underway to improve primary and secondary prevention. Unfortunately, recurrent AF is common: approximately 70% of patients experience recurrence after a cardioversion. This proportion can be somewhat reduced with the use of antiarrhythmic drugs. Atrial fibrillation ablation, mainly via pulmonary vein isolation, is an effective and safe intervention to restore and maintain sinus rhythm. Recurrence of AF after catheter ablation is estimated to be between 20% and 45%. Catheter ablation seems to achieve a better quality of life than antiarrhythmic drug therapy. Furthermore, recent data suggest that AF ablation could have a positive effect on left ventricular function in patients with heart failure. These benefits are better sustained in patients who remain free of AF and need to be balanced against the discomfort and complication risk of AF ablation. Hence, there is a growing clinical need to identify patients at risk of developing recurrent AF after AF ablation.
Numerous risk factors are associated with the development of AF, including age, hypertension, diabetes mellitus, and heart failure. Less validated risk factors include subclinical hyperthyroidism, obesity, and sleep apnoea syndrome. Risk factors associated with recurrence are less well-established but likely include type of AF (chronic or paroxysmal) and echocardiographic parameters. Prognostic models, which combine several predictors to generate an individualized risk estimate, have been developed for AF prediction in different populations. We identified two systematic reviews on prognostic models for predicting recurrent AF after ablation; these reviews had limited search strategies and did not include formal risk of bias appraisal. We therefore performed a comprehensive systematic review of prognostic models for predicting recurrent AF in patients who underwent AF ablation.

Methods

The systematic review protocol was registered with PROSPERO (CRD42018111649). Full details of methods have been published.

Study eligibility criteria

Study design

Published or unpublished studies reporting (i) prediction model development with internal validation, (ii) prediction model development with external validation, (iii) external model validation with or without model updating, or (iv) model impact assessment were eligible for inclusion. Studies that developed a new model with no subsequent validation were recorded but not assessed. A prognostic model was defined as a combination of two or more predictors within a statistical model used to predict an individual’s risk of the outcome. An impact study quantifies the impact of the model on clinical decision-making and patient outcome.

Population

Patients undergoing single or repeat ablation using any method were eligible for inclusion. There were no restrictions on previous treatments.

Outcomes

The clinical outcome of interest was recurrent AF at any time post-ablation. We excluded models that were developed for predicting a different outcome (e.g. the CHADS2 score for stroke prediction). Model performance measures of interest were calibration measures (e.g. calibration slope, calibration-in-the-large), which indicate how well the predicted risk compares to the observed risk, and discrimination measures (e.g. c-statistic), which indicate how well the model differentiates between those with and without the outcome. Measures that quantify the added discriminative value of one model over another, such as the net reclassification index (NRI) and/or integrated discrimination index (IDI), were also extracted.
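As an illustration of the discrimination measure described above, the following is a minimal sketch of a c-statistic for a binary outcome (recurrence yes/no). It assumes no censoring, so it is not the time-to-event version of the statistic; the function name and the example data are hypothetical and do not come from any included study.

```python
from itertools import product


def c_statistic(predicted_risks, outcomes):
    """Proportion of (event, non-event) pairs in which the patient with the
    event has the higher predicted risk or score; ties count as one half."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores (points) and recurrence outcomes (1 = recurrent AF).
scores = [0, 1, 1, 2, 3, 3, 4, 5]
recurred = [0, 0, 1, 0, 1, 0, 1, 1]
print(c_statistic(scores, recurred))
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect separation of patients with and without recurrence.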

Search strategy

Bibliographic databases (MEDLINE, MEDLINE In-Process, Embase, and Cochrane CENTRAL) were searched from inception to November 2018 using combinations of text and index terms relating to AF and models (Supplementary material online, File S1). The ‘model’ component of the search strategy was informed by a validated search filter. There were no date or language restrictions. Reference lists of relevant articles were checked and subject experts consulted. ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform were searched for ongoing studies and the Conference Proceedings Citation Index for conference abstracts.

Study selection

A sample of records was screened by two reviewers to pilot the screening criteria. In a change from the protocol, the remainder of the title and abstract screening was undertaken by one reviewer only (J.D., N.C., or C.H.) to process the large volume of records retrieved (n = 16 023). Records for which eligibility for inclusion was unclear were discussed by a panel of reviewers (J.D., N.C., Y.T., and C.H.), and disagreements on study eligibility were resolved through discussion. Full texts (n = 150) were reviewed where a decision could not be made based on title and abstract.

Data extraction

Data extraction was undertaken by one reviewer (J.D.) using a pre-defined and piloted data extraction form (Excel 2016). Data items to extract were based on the CHARMS checklist, and included: participants (e.g. proportion with paroxysmal/persistent AF, ablation procedure); study design (e.g. prospective or retrospective cohort, sample size, length of follow-up); outcome measures (e.g. definition and frequency of outcome assessment); model development (e.g. method for selection of predictors, validation method); and model performance [e.g. c-statistic, ratio of expected to observed events (E/O)].

Assessment of risk of bias

Risk of bias was assessed using the Prediction Study Risk Of Bias Assessment Tool (PROBAST). This assesses criteria within five domains: participant selection; predictors; outcomes; sample size; and patient flow and analysis (Supplementary material online, File S2). Risk of bias assessment was performed by one reviewer (J.D.) and checked by a further two (Y.T. and R.A.).

Synthesis

All studies were narratively described, with key findings tabulated and results presented with confidence intervals (CIs) when reported. Several studies reported the c-statistic. However, quantitative pooling was not possible due to differences in populations (e.g. different approaches to ablation, single vs. repeat ablation), variable electrocardiogram (ECG) monitoring intensity for recurrent AF, length of follow-up, possible overlap between patient cohorts, and a lack of uncertainty measures such as CIs. The c-statistics, grouped by type of model or by study, were instead presented in forest plots; this included subgroup analyses. A c-statistic of ≥0.7 was considered good and ≥0.8 very good discriminative ability; values <0.7 were considered weak, and <0.5 as very weak. These cut-offs are arbitrary and intended as a rough guide only. Lack of meta-analysis precluded formal exploration of publication bias using funnel plots. The body of evidence identified was considered in the context of the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) domains (risk of bias, imprecision, inconsistency, indirectness, and publication bias). As there is no specific guidance on how to apply GRADE to systematic reviews of prognostic models, we did not produce a GRADE summary of findings table or generate a quality score. PRISMA guidelines were followed for the reporting of the systematic review.
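The descriptive cut-offs above can be written as a small helper. This is only a sketch of the rough guide stated in the text (the function name is hypothetical), and the categories are, as noted, arbitrary.

```python
def describe_discrimination(c_statistic):
    """Map a c-statistic to the descriptive categories used in this review:
    >=0.8 very good, >=0.7 good, <0.7 weak, <0.5 very weak."""
    if not 0.0 <= c_statistic <= 1.0:
        raise ValueError("c-statistic must lie between 0 and 1")
    if c_statistic >= 0.8:
        return "very good"
    if c_statistic >= 0.7:
        return "good"
    if c_statistic >= 0.5:
        return "weak"
    return "very weak"


# Example values of the kind reported across the included studies.
for value in (0.49, 0.66, 0.75, 0.83):
    print(value, describe_discrimination(value))
```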

Results

Search results

Thirty-three studies of 13 models were included (Figure ). Six studies included two separate cohorts. Studies that developed a model that was not validated (either in the same or another study) were documented but not analysed (Supplementary material online, File S3). One study (Kosiuk et al.) developed and externally validated a score (DR-FLASH) primarily to predict low-voltage areas rather than AF; this study was not included but its findings are presented in Supplementary material online, File S4.

Figure: PRISMA flow diagram. AF, atrial fibrillation.

Study characteristics

Twelve studies described the development (or modification) of a model, and 28 studies (including 31 patient cohorts) undertook external model validation. Seven of these studies undertook both model development and external validation. Twenty-five studies reported a single relevant model, and eight studies evaluated two or more. No model impact studies were identified. Most studies were retrospective analyses of consecutive patients; detailed study characteristics are provided in Supplementary material online, File S5.

Variables included in models

Twenty-five variables were included across 13 models (Table 1). Models included between three and six variables. The most common variables were left atrial parameters (nine models), type of AF (eight models), age (seven models), sex (four models), and estimated glomerular filtration rate (eGFR, four models).

Table 1 notes (model variables): √, variable included in model; AADs, antiarrhythmic drugs; AF, atrial fibrillation; APC, atrial premature contraction; BMI, body mass index; BNP, brain natriuretic peptide; CAD, coronary artery disease; COPD, chronic obstructive pulmonary disease; eGFR, estimated glomerular filtration rate; LAD, left atrial diameter; LVEF, left ventricular ejection fraction; MetS, metabolic syndrome; NLA, normalized left atrial area; OSA, obstructive sleep apnoea; PAF, paroxysmal atrial fibrillation; TIA, transient ischaemic attack. ^a Severe comorbidity defined as severe mitral regurgitation, moderate mitral stenosis, mitral valvotomy, mitral valve replacement, hypertrophic cardiomyopathy, or structural congenital heart disease.

Risk of bias

Population, predictors, and outcomes

There was poor reporting of whether AF recurrence was determined without knowledge of predictor information (97% of study cohorts; Figure ). Only one study specifically noted that treating physicians were not blinded to one of the variables [brain natriuretic peptide (BNP) status], which may have influenced frequency or intensity of screening. Studies did not always report how AF recurrence was assessed (28% of study cohorts), whether a standard outcome definition was used (18%), or whether predictors were assessed without knowledge of outcome information (26%). An assumption was made that single-centre studies would have a consistent approach to defining and assessing predictors, although this may not always be the case (e.g. for left atrial parameters). Studies used a combination of ECG and Holter monitoring for assessing recurrence, with around 60% of studies reporting that additional investigations were scheduled if patients reported symptoms. There was variation both within and between studies in intensity of monitoring, which can influence outcome detection (e.g. monitoring between two and four times in the first year). Only one study reported the proportion of patients who received Holter monitoring. Three studies had a proportion of patients with implantable recorders and one a proportion of patients with pacemaker data. Follow-up time was variable (6 months to >5 years, Supplementary material online, File S5).

Figure: Risk of bias summary. Chart shows percentage of study cohorts meeting/not meeting criteria: AS, all studies (39 study cohorts); EV, external validation studies (31 study cohorts); MD, model development studies (12 study cohorts); N, no; NI, no or insufficient information; PN, probably no; PY, probably yes; Y, yes. There are more evaluations than studies, as some studies included more than one cohort and/or analysis; the criterion ‘participants with missing data handled appropriately’ is only applicable where there was missing data.

Analysis—model development studies

Model development was subject to substantial risk of bias and/or poor reporting (Figure ; Supplementary material online, File S2). Three studies (25%) had an adequate (>10) number of events per candidate variable, and three studies used appropriate methods for selecting predictors (i.e. based on multivariable modelling). One study stated that a variable cut-off was chosen on the basis of prior research; the remaining studies appeared to dichotomize at least one variable based on study data. Two studies (17%) appeared to appropriately assign predictor weights based on regression coefficients; the remaining studies (83%) gave no information or used an incorrect method (such as simply assigning one point per variable). Time-to-event analysis (Cox model) was appropriately used in six (50%) studies. Eleven (92%) studies did not perform internal validation and thus failed to account for model overfitting and optimism in model performance; one study used a split sample approach, which is not thought to be an adequate method. Five studies (42%) modified existing scores, e.g. by adding another variable or changing a variable cut-off, but did not consider these as new models or perform internal validation. No analyses were performed of the added value of a modified model compared with the previous one.
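The events-per-candidate-variable criterion mentioned above is simple arithmetic, sketched below. The function names and the example numbers are hypothetical; the threshold of more than 10 is the one stated in the text.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """Number of outcome events divided by the number of candidate predictors
    considered during model development (not only those in the final model)."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def meets_epv_criterion(n_events, n_candidate_predictors, minimum=10):
    """True when there are more than `minimum` events per candidate variable."""
    return events_per_variable(n_events, n_candidate_predictors) > minimum


# Hypothetical development cohorts: 60 or 150 recurrences, 12 candidate predictors.
print(events_per_variable(60, 12), meets_epv_criterion(60, 12))
print(events_per_variable(150, 12), meets_epv_criterion(150, 12))
```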

Analysis—model evaluation studies

Twenty-eight studies (31 cohorts) externally validated a previously developed model. Fifteen cohorts (approximately 50%) included 100 or more events, whilst 16 were smaller and included fewer than 100 events. Most cohorts (90%) appeared to evaluate models using the same variable cut-offs as specified in the model development study. The exception was the group of studies relating to ALARMEc, where variable cut-offs were changed.

Analysis—all studies

For most analyses (70%), there was insufficient information on data completeness. Most studies were based on retrospective analyses, and eligibility criteria sometimes related to availability of model variable data and/or a minimum follow-up time, but this was not always made explicit. Around 60% of analyses presented a measure of model discrimination (c-statistic). Only two studies additionally considered model calibration. Neither discrimination nor calibration measures were reported in 30% of analyses.

Model performance

ALARMEc

Five studies were identified. Berkowitsch et al. developed a risk score [variables: type of AF, metabolic syndrome, eGFR, and normalized left atrial area (NLA)], and applied this to patients undergoing first ablation (Supplementary material online, File S5). Subsequent studies added a further variable (cardiomyopathy) and externally validated the score in first and/or repeat ablation populations. There was inconsistency in terms of variable cut-off for NLA. Recurrence rates after a first procedure varied between 27% and 47% (Supplementary material online, File S6). Four studies found that recurrence increased with increasing risk scores. Two studies reported a c-statistic of 0.66 (95% CI 0.58–0.73) and 0.49 (95% CI 0.42–0.56), respectively (Figure ). There was little difference in c-statistic for paroxysmal or persistent AF sub-groups.

Figure: Model c-statistics (by model). ALL, all patients; BH, B-HATCH score; C1, cohort 1; C2, cohort 2; CI, confidence interval; DEV, model development; EV, external model validation; H, HATCH score; PAF, paroxysmal AF sub-group; PER, persistent AF sub-group; RA, repeat ablation sub-group.

APPLE

Ten studies evaluated the APPLE score [variables: age, type of AF, eGFR, left atrial diameter (LAD), and left ventricular ejection fraction] in first and/or repeat ablation populations (Supplementary material online, File S5). A model development study for this risk score was not identified. One study was specifically interested in very late prediction of recurrence (>12 months). One study (Jud et al.) developed a new risk score by adding a variable (previous ablation) to the APPLE score; this new score (SUCCESS) was not internally validated and there was no attempt to quantify the added value of this score compared with APPLE. Recurrence rates ranged from 16% to 64%. Eight studies reported c-statistics ranging from 0.46 to 0.74 (Figure ), indicating very poor to good discriminative ability. The poorest discriminative ability was in a subgroup of patients with persistent AF. There was little difference in c-statistic between a repeat ablation subgroup and the total population, a paroxysmal AF subgroup and total population, or between the APPLE score and the modified APPLE (SUCCESS) score. One study (Jud et al.) reported a calibration measure and found no statistically significant difference between observed and expected events based on the Hosmer–Lemeshow test. This test has limited statistical power and is difficult to interpret as there is no indication of direction or magnitude of miscalibration. Other measures reported were proportions of recurrence for different scores and odds ratios (Supplementary material online, File S6).

ATLAS

One study (Mesquita et al.) developed and validated this score in patients undergoing first ablation [variables: age, sex, type of AF, current smoking, and indexed left atrial volume]. The recurrence rate was 27%. The c-statistic was 0.75 in both the development and validation cohorts. The calibration-in-the-large-statistic was 0.077 (P = 0.272) and the calibration slope 0.93 indicating that observed events were only slightly higher than predicted.
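To make the idea of calibration concrete, the sketch below compares predicted and observed risk in a simplified way: the difference between the observed event rate and the mean predicted risk, and the ratio of expected to observed events (E/O). This is an assumption-laden illustration on the risk scale; it is not the regression-based calibration-in-the-large or calibration slope reported by the study, and the function name and data are hypothetical.

```python
def calibration_summary(predicted_risks, outcomes):
    """Crude calibration summary for binary outcomes (1 = recurrent AF)."""
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(outcomes)
    expected = sum(predicted_risks)  # expected number of events
    observed = sum(outcomes)         # observed number of events
    if observed == 0:
        raise ValueError("E/O is undefined when no events are observed")
    return {
        "observed_rate": observed / n,
        "mean_predicted": expected / n,
        # Positive difference: more events observed than predicted.
        "difference": (observed - expected) / n,
        "e_over_o": expected / observed,
    }


# Hypothetical predicted risks and outcomes for ten patients.
risks = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.6]
recurrence = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
print(calibration_summary(risks, recurrence))
```

An E/O ratio below 1 (or a positive difference) indicates that the model under-predicts risk on average, which is the direction described for the score above.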

BASE-AF2

This score was developed by Canpolat et al. and validated in a further three studies. Included variables were type of AF, LAD, body mass index, current smoking, AF history, and early recurrence. Two studies had mixed populations in terms of single and repeat ablation. Recurrence rates varied between 15% and 27%. Studies reported c-statistics ranging from 0.61 to 0.94 (Figure ). Sub-group analysis in Bavishi et al. indicated slightly poorer discriminative ability in a persistent AF population [c-statistic 0.61 (95% CI 0.52–0.69)] compared with a paroxysmal AF population [c-statistic 0.69 (95% CI 0.59–0.78)]. A sensitivity of 80% and specificity of 91.6% (CI NR) were reported in the development study (threshold BASE-AF2 ≥ 3).

CAAP-AF

The score was developed and externally validated by Winkle et al. and validated in a further three studies. Included variables were age, sex, type of AF, LAD, coronary artery disease, and number of antiarrhythmic drugs failed. Most patients were undergoing first ablation. Recurrence rates varied between 8% and 59%. The studies reported a c-statistic between 0.59 and 0.71, suggesting weak to good discriminative ability (Figure ). Sensitivity and specificity were reported in two studies: 64% and 68% (Sanhoury et al.; threshold ≥ 5, CI NR) and 57.9% (95% CI 49.0–66.4) and 57% (95% CI 46.3–67.7) (Potpara et al.; threshold ≥ 6), respectively.
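Sensitivity and specificity at a score threshold, as reported above, can be derived as in the following minimal sketch. Patients with a score at or above the threshold are treated as predicted to have recurrence; the function name and the example data are hypothetical and are not taken from the cited studies.

```python
def sensitivity_specificity(scores, outcomes, threshold):
    """Sensitivity and specificity of the rule `score >= threshold`
    for a binary outcome (1 = recurrent AF, 0 = no recurrence)."""
    tp = fn = tn = fp = 0
    for score, outcome in zip(scores, outcomes):
        predicted_positive = score >= threshold
        if outcome == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores and outcomes for ten patients, threshold >= 5.
example_scores = [2, 3, 5, 6, 7, 4, 5, 8, 1, 6]
example_outcomes = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
sens, spec = sensitivity_specificity(example_scores, example_outcomes, threshold=5)
print(sens, spec)
```

Raising the threshold generally trades sensitivity for specificity, which is why results at different thresholds are not directly comparable between studies.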

HATCH

This score was developed for the prediction of progression from paroxysmal to persistent AF in patients who had not undergone ablation (de Vos et al.). It has subsequently been applied in 12 studies to predict recurrence of AF in post-ablation cohorts. Included variables are age, heart failure, hypertension, chronic obstructive pulmonary disease, and stroke/transient ischaemic attack. Patients were undergoing first ablation in most studies; three studies had a proportion with repeat ablation. In two studies, ablation was performed for atrial flutter rather than AF; however, the model was applied to predict post-ablation AF. One study used the score to predict very late recurrence (>12 months post-ablation). Shaikh et al. applied a modified version of HATCH (with obstructive sleep apnoea added as variable), and Shaikh et al. evaluated both the HATCH score and a modified version (HATCH + BNP as added variable); neither study performed internal validation of the modified score. Recurrence rates varied between 16% and 48%. Eight studies reported a c-statistic between 0.49 and 0.74 (Figure ), indicating very poor to good discriminative ability. The remaining studies reported proportion of recurrence according to score and/or mean scores in those with and without recurrence (Supplementary material online, File S6). There was no clear trend towards increasing recurrence with higher scores. At a threshold ≥2, the sensitivities and specificities were 25.0% and 92.4% (Miao et al.) and 51.8% and 84.7% (Chen et al.), respectively.

MB-LATER

This score was developed by Mujovic et al. for the prediction of very late recurrence (>12 months) and validated in a very small cohort (n = 39). Another five studies applied the score to post-ablation cohorts, with one study predicting very late recurrence. Included variables are sex, type of AF, LAD, early recurrence, and bundle branch block. Three studies included a proportion of repeat ablations. Recurrence rates were between 15% and 64%. Five studies reported c-statistics (Figure ) varying between 0.58 and 0.83, indicating weak to very good discriminative ability. Little difference in c-statistic was reported between paroxysmal [0.58 (95% CI 0.49–0.68)] and persistent AF [0.58 (95% CI 0.49–0.67)] populations. Two studies reported sensitivity and specificity of 42.9% (95% CI 34.3–51.7) and 74.2% (95% CI 64.1–82.7) (Potpara et al., threshold ≥ 2) and 75% and 72.6% (Mujovic et al., threshold ≥ 2). No calibration measures were reported.

Other models

Two additional studies were identified that developed and externally validated a model in separate cohorts, the FER2CI score [variables: sex, coupling interval of atrial premature contraction, and early recurrence] and a ‘risk score’ [variables: duration of persistent AF, eGFR, and presence of severe comorbidity]. Both studies were reported as a conference abstract only. Egami et al. aimed to predict very late recurrence. Jarman et al. included only patients with persistent AF. Recurrence rates were 21% in the development cohort in Egami et al. and not reported for the other cohorts. Both studies found an association between higher risk scores and recurrence but did not report model performance.

Studies comparing models

Eight studies compared two or more risk scores in the same population. There was no consistency across studies in terms of which models were compared, and no model consistently showed better discrimination based on the c-statistic (Figure ). Four studies reported risk reclassification measures such as NRI or IDI, albeit without CIs, and/or undertook decision curve analysis (Supplementary material online, File S6). Findings suggested that (i) adding BNP as a variable (to HATCH) may improve the model, (ii) MB-LATER may be able to better predict recurrence compared with APPLE, ALARMEc, BASE-AF2, and HATCH, (iii) MB-LATER, BASE-AF2, APPLE, and CAAP-AF showed similar clinical usefulness but are more useful than HATCH, and (iv) MB-LATER showed greater clinical usefulness compared with CAAP-AF.

Figure: Model c-statistics (by study). ALL, all patients; C1, cohort 1; C2, cohort 2; CI, confidence interval; DEV, model development; EV, external model validation; PAF, paroxysmal AF sub-group; PER, persistent AF sub-group.

Discussion

Main findings

This systematic review found 33 studies developing and/or validating 13 models to predict AF recurrence after ablation. Model discriminatory ability based on the c-statistic was reported for around 60% of analyses and was highly variable, ranging from very poor to very good. No model had consistently poor or good discriminatory ability across studies. Eight studies compared two or more models in the same population, again with no model showing consistently better discrimination compared with others. Model calibration was only reported by two studies, and assessment of overall model performance therefore remains incomplete. While our systematic review suggests that clinical risk prediction of recurrent AF after ablation has potential, there is a need for robust evaluation of risk factors and development of risk scores. The most common model variables were left atrial parameters, type of AF and age, and to a lesser extent sex and eGFR. Most model variables can be measured before ablation and therefore models could be used pre-procedurally to predict the likelihood of recurrence. The exceptions are those models (MB-LATER, BASE-AF2, and FER2CI) that include early recurrence (within 3 months after ablation) as a variable; these scores can hence only be used to predict late recurrence. Given the inconsistent and sometimes poor performance of the models to date, it is possible that incorporating other variables may improve model performance. There may be a role for biomarkers in assessing AF risk, including serum biomarkers such as BNP or fibroblast growth factor 23, imaging of atrial function, ECG-based parameters, and genetic factors. Some as yet unvalidated models (Supplementary material online, File S3) include additional variables. A large ongoing study from South Korea (NCT02138695) plans to develop a simulation model to predict recurrence based on clinical, electrophysiological, anatomical, imaging, and serological characteristics.
Clearly, these efforts would benefit from robust evaluation of clinical candidate predictors for recurrent AF after ablation.

Issues identified

A major risk of bias is that none of the development studies performed adequate internal validation, which may result in overly optimistic and/or biased model performance estimates. This is reflected in Figure , which shows that c-statistics reported for development studies are often higher than those of validation studies. Overestimation of model performance is more likely to occur when the number of events per candidate predictor is low, model variables are dichotomized based on study data, variables are selected by univariate analyses, and weights are incorrectly assigned to predictors. These were all commonly encountered issues. Whilst external validation studies mostly applied the models as originally developed and thus met this quality criterion, this does not mitigate the fact that models were often poorly developed in the first place. Furthermore, around half of studies undertaking external validation did not have a sufficiently large number of events to minimize bias in effect estimates. Risk of bias assessment was hampered by poor reporting, especially on completeness and handling of missing data, as well as predictor assessment. Poor reporting was not limited to conference abstracts but was also seen across full-text studies; this is a recognized issue in prognostic research, despite the existence of reporting guidelines. For comparisons of models, we note that interpretation of both the NRI and the IDI is considered problematic in terms of magnitude and clinical applicability and thus any inferences regarding superior model performance should be regarded as uncertain. In addition to risk of bias, we also considered the GRADE criteria of imprecision, inconsistency, indirectness, and publication bias. There were concerns regarding indirectness as some models were not applied in the population they were developed in, or for the purpose they were developed for.
For example, HATCH was developed to predict progression to persistent AF but is commonly used to predict AF recurrence after ablation. MB-LATER was developed to predict very late recurrence (>12 months post-ablation) but has been applied in studies to predict recurrence after 3 months. In terms of precision, CIs around c-statistics were often wide, and many encompassed values that spanned weak to good model performance; seven (33%) studies reporting a c-statistic did not report a CI. Heterogeneity could not be quantified since we did not perform a meta-analysis, but inconsistency in discriminatory ability is evident within groups of studies for individual models. Variability may stem from differences in populations, ablation procedure, length of follow-up, and intensity of outcome ascertainment. Publication bias was not assessed as no meta-analysis was performed; it is however known to be an issue in prognosis research.

Strengths of review and future directions

This systematic review used sensitive search strategies and identified more studies than reported in previous reviews. To the best of our knowledge, it is also the first systematic review in this area to conduct detailed risk of bias assessment using PROBAST. Whilst heterogeneity precluded meta-analysis, results have been presented where possible in forest plots. Screening of all references was performed by only one reviewer due to the large number of references retrieved; the potential for missed studies was mitigated by reference checking of relevant reviews and primary studies, searching in conference abstract databases and screening of a sub-set of references by more than one reviewer. Impact studies quantify the effect of using a model on decision-making and patient outcome. No studies were identified that looked at the impact of using risk categories based on model scores to influence clinical practice. Given the performance of the models to date, an impact study would likely be premature. Equally, a focus on developing ever more models may not be helpful unless these are more rigorously developed or validated. Future research could focus on revalidating existing models using more methodologically sound approaches particularly with regard to internal validation, variable selection and weighting, assessment of model calibration, and reporting of methods used. Future model development and validation studies may also want to consider pre-specifying sub-groups, e.g. patients with persistent and paroxysmal AF, or first or repeat ablation. Prospective measurement of model variables and outcomes would ensure that patients are not selected based on availability of variable or outcome data, whilst continuous assessment of outcome using implanted devices would be more effective for detecting the outcome. 
It is recognized that AF is caused by different mechanisms which are currently not targeted by treatment strategies. Research is ongoing to identify clinical markers related to potential causal mechanisms and to integrate these into prediction models; this may ultimately allow development of more tailored approaches to prevention and therapy. Future research on model development and validation will likely need to consider differences in underlying causal mechanisms to ensure that models are an appropriate fit to different patient groups.

Conclusions

Whilst our systematic review suggests that clinical risk prediction of recurrent AF after ablation has potential, there is a need for robust evaluation of risk factors and further development of risk scores to achieve clinical utility.

Table 1

Model variables

Variables	Risk score Berkowitsch 2012	ALARMEc	APPLE	SUCCESS	ATLAS	BASE-AF₂	CAAP-AF	HATCH	HATCH + OSA	B-HATCH	MB-LATER	FER2CI	Risk score Jarman 2012
Age			√	√	√		√	√	√	√
Sex					√		√				√	√
Type of AF	√	√	√	√	√	√	√				√
Duration of persistent AF													√
Previous ablations				√
MetS	√	√
eGFR	√	√	√	√
Left atrial parameters	√	√	√	√	√	√	√				√		√
Min coupling interval of APC												√
Cardiomyopathy		√
Heart failure								√	√	√
LVEF			√	√
BMI						√
Current smoking					√	√
AF history						√
Early recurrence						√					√	√
CAD							√
Antiarrhythmics failed							√
Hypertension								√	√	√
OSA									√
COPD								√	√	√
Stroke or TIA								√	√	√
BNP										√
Bundle branch block											√
Presence of severe comorbidity^a													√
Point range and cut-offs;1 point for each variable unless otherwise stated	Point range: 0–4; √ non-PAF; √ ≤68 mL/min eGFR; √ NLA >11.5	Point range: 0–5; √ non-PAF; √ ≤68 mL/min eGFR; √ NLA >11.5 or >10.25 depending on study	Point range: 0–5; √ >65 years; √ PAF; √ <60 mL/min eGFR; √ LAD ≥43 mm; √ <50% LVEF	Point range as APPLE score + additional point for each previous ablation	Possible points: 15+; 1 for age (>60); 4 for female gender; 7 for current smoker; 2 for non-PAF; 1 for each 10 mL/m² LAV indexed for body surface area	Point range: 0–6; √ non-PAF; √ LAD >40 mm; √ >28 kg/m² BMI; √ >6 years AF	Point range: 0–13; 1 for <50, 2 for 50 to <60, 3 for 60 to <70, and 4 for ≥70; √ female; 2 for persistent or long-standing AF; 0 for LAD <4; 1 for 4 to <4.5; 2 for 4.5 to <5; 3 for 5 to <5.5; and 4 for ≥5.5 cm; 0 for none, 1 for 1 or 2, 2 for >2 (AADs failed)	Point range: 0–7; √ >75 years; 2 for stroke and heart failure	No details on point range; √ >75 years; 2 for stroke and heart failure	Point range: 0–10; √ >75 years; 2 for stroke and heart failure; 3 points for BNP ≥100 pg/dL	Point range: 0–6; 1 for PAF and 2 for long-standing AF; √ male; √ LAD ≥47 mm	Point range: 0–4; 1 point for female; 2 for early recurrence of AF; 1 for coupling interval <49%	Point range: 0–7; √ duration of continuous AF >1 year (1 point); √ LAD 40–45 mm (1 point), 46–50 mm (2 points), >50 (3 points); √ any severe comorbidity^a 3 points

√, variable included in model; AADs, antiarrhythmic drugs; AF, atrial fibrillation; APC, atrial premature contraction; BMI, body mass index; BNP, brain natriuretic peptide; CAD, coronary artery disease; COPD, chronic obstructive pulmonary disease; eGFR, estimated glomerular filtration rate; LAD, left atrial diameter; LVEF, left ventricular ejection fraction; MetS, metabolic syndrome; NLA, normalized left atrial area; OSA, obstructive sleep apnoea; PAF, paroxysmal atrial fibrillation; TIA, transient ischaemic attack.

^a Severe comorbidity defined as severe mitral regurgitation, moderate mitral stenosis, mitral valvotomy, mitral valve replacement, hypertrophic cardiomyopathy, or structural congenital heart disease.
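As an illustration of how the point-based scores in Table 1 are applied, the sketch below computes the BASE-AF2 score using the variables and cut-offs listed in the table (point range 0–6, 1 point per variable). The cut-offs for BMI, left atrial diameter, AF history, and type of AF are taken from the table; current smoking and early recurrence are treated as yes/no items because the table gives no cut-off for them. The `Patient` class, its field names, and the function name are hypothetical, and this is a reading of the table rather than the original authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the six BASE-AF2 inputs."""
    bmi_kg_m2: float
    lad_mm: float            # left atrial diameter
    current_smoker: bool
    early_recurrence: bool
    af_history_years: float
    paroxysmal_af: bool


def base_af2_score(patient: Patient) -> int:
    """Sum 1 point for each criterion met, as described in Table 1."""
    criteria = [
        patient.bmi_kg_m2 > 28,        # BMI >28 kg/m2
        patient.lad_mm > 40,           # LAD >40 mm
        patient.current_smoker,        # current smoking
        patient.early_recurrence,      # early recurrence of AF
        patient.af_history_years > 6,  # AF history >6 years
        not patient.paroxysmal_af,     # non-paroxysmal AF
    ]
    return sum(criteria)


# Made-up patient for illustration only.
example = Patient(
    bmi_kg_m2=30.0,
    lad_mm=42.0,
    current_smoker=True,
    early_recurrence=False,
    af_history_years=7.0,
    paroxysmal_af=False,
)
print(base_af2_score(example))
```

The other scores in the table follow the same pattern, with weighted points (e.g. ATLAS, CAAP-AF) replacing the uniform 1 point per variable.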
