Validation of risk scores for ischaemic stroke in atrial fibrillation across the spectrum of kidney function.

Ype de Jong¹,², Edouard L Fu¹, Merel van Diepen¹, Marco Trevisan³, Karolina Szummer⁴, Friedo W Dekker¹, Juan J Carrero³, Gurbey Ocak¹,⁵.

Abstract

AIMS: The increasing prevalence of ischaemic stroke (IS) can partly be explained by the likewise growing number of patients with chronic kidney disease (CKD). Risk scores have been developed to identify high-risk patients, allowing for personalized anticoagulation therapy. However, predictive performance in CKD is unclear. The aim of this study is to validate six commonly used risk scores for IS in atrial fibrillation (AF) patients across the spectrum of kidney function.
METHODS AND RESULTS: Overall, 36 004 subjects with newly diagnosed AF from SCREAM (Stockholm CREAtinine Measurements), a healthcare utilization cohort of Stockholm residents, were included. Predictive performance of the AFI, CHADS2, Modified CHADS2, CHA2DS2-VASc, ATRIA, and GARFIELD-AF risk scores was evaluated across three strata of kidney function: normal kidney function [estimated glomerular filtration rate (eGFR) >60 mL/min/1.73 m2], mild CKD (eGFR 30-60 mL/min/1.73 m2), and advanced CKD (eGFR <30 mL/min/1.73 m2). Predictive performance was assessed by discrimination and calibration. During a median follow-up of 1.9 years, 3069 (8.5%) patients suffered an IS. Discrimination was dependent on eGFR: the median c-statistic in normal eGFR was 0.75 (range 0.68-0.78), but decreased to 0.68 (0.58-0.73) and 0.68 (0.55-0.74) for mild and advanced CKD, respectively. Calibration was reasonable and largely independent of eGFR. The Modified CHADS2 score showed good performance across kidney function strata, both for discrimination [c-statistic: 0.78 (95% confidence interval 0.77-0.79), 0.73 (0.71-0.74) and 0.74 (0.69-0.79), respectively] and calibration.
CONCLUSION: In the most clinically relevant stages of CKD, predictive performance of the majority of risk scores was poor, increasing the risk of misclassification and thus of over- or undertreatment. The Modified CHADS2 score performed well and consistently across all kidney function strata, and should therefore be preferred for risk estimation in AF patients.

Keywords: Atrial fibrillation; Chronic kidney disease; Ischaemic stroke; Risk score; SCREAM

Year: 2021 PMID: 33769473 PMCID: PMC8046502 DOI: 10.1093/eurheartj/ehab059

Source DB: PubMed Journal: Eur Heart J ISSN: 0195-668X Impact factor: 29.983

See page 1486 for the editorial comment on this article (doi:

Introduction

The prevalence of ischaemic stroke (IS) is increasing, and IS has become a leading cause of morbidity and mortality worldwide. Chronic kidney disease (CKD) is associated with an increased risk of IS through mechanisms specific to CKD (e.g. accelerated atherosclerotic vascular disease) and through general risk factors such as hypertension, diabetes mellitus, dyslipidaemia, and ageing. With an estimated prevalence of 10–15% in the general population, a number that is increasing steadily, CKD may partly explain the high number of strokes. Atrial fibrillation (AF), which is considered the main risk factor for IS both in the general population and in CKD patients, is more commonly reported in this fragile population, an observation that may be related to shared risk factors such as age, diabetes, and hypertension.

Risk scores for IS are essential to weigh the risk of IS against the risk of treatment-related bleeding and thus deliver patient-tailored therapy. In patients with CKD, this notion is highly relevant since these patients are at increased risk of treatment-related bleeding as well. Typically, risk scores use clinical parameters (e.g. disease history) in combination with patient-specific characteristics (e.g. age and sex) to compute a risk for IS within a given prediction timeframe. Although widely used risk scores, such as CHADS2 and its updated version CHA2DS2-VASc, are endorsed by current guidelines on IS, their predictive performance in patients with CKD is largely unknown, as these risk scores have been developed in general AF populations. For incident dialysis patients, however, external validation studies showed poor predictive performance for risk scores predicting IS as well as for those predicting bleeding. Despite these uncertainties, the use of these clinical decision aids is appealing as a seemingly objective tool to standardize the allocation of anticoagulation therapy within CKD care. However, because information is lacking on the validity of these risk scores in patients beyond the development cohorts of the original studies, their use comes with a risk of misclassification. The aim of the present study is therefore to externally validate multiple commonly used risk scores for IS in a cohort of patients with AF across the spectrum of kidney function.

Methods

This study was reported in line with the TRIPOD guideline.

Study population and baseline definition

We used data from the Stockholm CREAtinine Measurements (SCREAM) project, a healthcare utilization cohort from Stockholm, Sweden. SCREAM included all Stockholm residents aged ≥18 years who had a measurement of serum creatinine from in- or outpatient care between 2006 and 2011. It includes data from about 1.3 million adults, corresponding to 68% of the population of the region for that period. Information on demographics, disease history, vital status, pharmacy-dispensed medication, and healthcare use was obtained by linking to regional and national administrative databases. All subjects with new-onset AF from January 2007 to December 2012 were selected. New-onset AF was defined as the presence of ICD-10 code I48 in any diagnostic position in primary, outpatient specialist or hospital care, with no I48 code between 1997 (when ICD coding started) and 2007. Baseline was defined as the date of first occurrence of AF. Patients were censored at the end of follow-up (31 December 2012), when they moved outside the Stockholm region, or when they died from causes other than IS. Patients with missing data on creatinine were excluded. Since this study utilized only de-identified data, it was not deemed to require informed consent. The study was approved by the regional ethical review boards and the Swedish National Board of Welfare.

Outcome and predictor definitions

Study outcomes were ascertained via linkage with the government-run National Population Registry, which registers all deaths without loss to follow-up, and the National Patient Registry with diagnosis codes for essentially all (>99%) hospitalizations. The study outcome was defined as hospital admission for IS (ICD-10 codes I63x, I69.3, I69.4, and I69.8 in 1st or 2nd diagnostic position) or IS as main cause of death. Estimated glomerular filtration rate [eGFR; mL/min per 1.73 m2, calculated with the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula] was based on the most recent creatinine measurement prior to AF diagnosis (median 0.28 years). Creatinine was measured in plasma, with either an enzymatic or corrected Jaffe method (alkaline picrate reaction); both methods are traceable to isotope dilution mass spectrometry standards. Creatinine values <25 or >1500 μmol/L were considered outliers and were discarded. Proteinuria (measured a median of 0.87 years prior to AF diagnosis) was defined as either a urinary albumin-to-creatinine ratio >30 or a positive urine dipstick (range: negative, 1, 2, and 3; all positive values were regarded as proteinuria). Information on disease history, including previous stroke, previous bleeding, congestive heart failure, cardiovascular disease, hypertension, and diabetes, was obtained using ICD-10 codes (detailed in the Supplementary material online). The overall positive predictive value of these diagnoses in the register is about 85–95%. Medication use, including antihypertensive and anti-diabetic drugs, was defined by registered pharmacy dispensations in the 180 days prior to AF diagnosis; for vitamin K antagonist or direct oral anticoagulant usage, dispensations in the 120 days before or up to 60 days after AF diagnosis were evaluated.
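As an illustration of the eGFR step described above, the following is a minimal sketch rather than the study's actual code. It assumes the 2009 CKD-EPI creatinine equation without a race coefficient (the text names only "the CKD-EPI formula", so the version is an assumption) and a conversion factor of 88.4 μmol/L per mg/dL. The function names are hypothetical.

```python
# Sketch only: assumes the 2009 CKD-EPI creatinine equation, no race term.
UMOL_PER_MGDL = 88.4  # creatinine conversion factor, umol/L per mg/dL


def is_plausible_creatinine(creat_umol_l: float) -> bool:
    """Keep values inside the 25-1500 umol/L window; others are outliers."""
    return 25 <= creat_umol_l <= 1500


def egfr_ckd_epi_2009(creat_umol_l: float, age_years: float, female: bool) -> float:
    """Estimated GFR in mL/min/1.73 m2 from plasma creatinine, age and sex."""
    scr = creat_umol_l / UMOL_PER_MGDL          # convert to mg/dL
    kappa = 0.7 if female else 0.9              # sex-specific creatinine knot
    alpha = -0.329 if female else -0.411        # exponent below the knot
    ratio = scr / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    return egfr * 1.018 if female else egfr


def kidney_stratum(egfr: float) -> str:
    """Map eGFR to the three strata used in this study."""
    if egfr > 60:
        return "normal"
    if egfr >= 30:
        return "mild CKD"
    return "advanced CKD"
```

For example, a 60-year-old man with a plasma creatinine of 88.4 μmol/L (1.0 mg/dL) would receive an eGFR of roughly 81 mL/min/1.73 m2 under these assumptions and fall in the normal stratum.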

Risk scores

Risk scores to be validated were identified from a previous systematic review. Based on availability of predictors, the following risk scores were validated: AFI, CHADS2, Modified CHADS2, CHA2DS2-VASc, ATRIA, and GARFIELD-AF. Scores were validated within the designated timeframe if specified (i.e. the prediction timeframe as specified in the original article), or within the maximum follow-up of the development cohort if no timeframe was specified. We used the same predictor definitions as the original studies where possible. An overview of the included risk scores, the original predictor definitions, and those used in this validation study is presented in the Supplementary material online (‘Risk Scores’).
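To make concrete how a point-based score combines predictors, the sketch below computes CHADS2 and CHA2DS2-VASc points. The point values follow the original publications of these two scores and are not stated in this article; the `Patient` fields are hypothetical simplifications of the predictor definitions, which the study details in its Supplementary material. The other four scores are not shown.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical, simplified predictor set for illustration only.
    age: int
    female: bool
    heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_tia: bool
    vascular_disease: bool


def chads2(p: Patient) -> int:
    """CHADS2: 1 point each for CHF, hypertension, age >=75, diabetes; 2 for prior stroke/TIA."""
    return (int(p.heart_failure) + int(p.hypertension) + int(p.age >= 75)
            + int(p.diabetes) + 2 * int(p.prior_stroke_tia))


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc: adds vascular disease, age 65-74 and female sex; age >=75 scores 2."""
    age_points = 2 if p.age >= 75 else 1 if p.age >= 65 else 0
    return (int(p.heart_failure) + int(p.hypertension) + age_points
            + int(p.diabetes) + 2 * int(p.prior_stroke_tia)
            + int(p.vascular_disease) + int(p.female))
```

A 78-year-old woman with hypertension and diabetes, and none of the other predictors, would score 3 on CHADS2 and 5 on CHA2DS2-VASc in this sketch.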

Statistical analysis

The predictive performance of the included risk scores was assessed by their discrimination and calibration, stratified by CKD stage according to the estimated glomerular filtration rate (eGFR) classification of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria. Normal kidney function was defined as KDIGO G1-2 (eGFR >60 mL/min/1.73 m2), mild CKD as KDIGO G3 (eGFR 30–60 mL/min/1.73 m2), and advanced CKD as KDIGO G4-5 (eGFR <30 mL/min/1.73 m2).

Discrimination was assessed by the concordance index (c-index or c-statistic), which reflects how well the risk score distinguishes between patients with and without the outcome of interest. The c-statistic lies between 0.5 and 1.0, corresponding to pure chance and perfect discrimination, respectively. In general, a c-statistic <0.7 is considered poor to moderate, 0.8 is considered good, and >0.9 excellent. For logistic risk scores, the area under the receiver operating characteristic curve was calculated. For Cox models, Harrell’s c-statistic was calculated.

Calibration describes the agreement between the predicted and observed probabilities of the outcome. It is typically presented in a calibration plot or as calibration in the large (the population average observed frequency compared with the average predicted probability). In case of ideal calibration, the slope of the calibration curve would be 1 (i.e. a 45 degree line: predicted probability equals observed probability); for calibration in the large, the average observed and predicted probabilities would be equal. When risk scores presented an event rate instead of cumulative incidence, the cumulative incidence was approximated, as done in a previous study (method detailed in the Supplementary material online [‘Formulae’]).

To assess the effect of the prediction timeframe (i.e. the time between baseline and when the outcome can occur, e.g. 1 year for the CHA2DS2-VASc and GARFIELD-AF scores, and 5 years for the Modified CHADS2 score) on the predictive performance, the included risk scores were sequentially validated for different timeframes at monthly intervals, and c-statistics and calibration in the large were plotted over the entire follow-up duration of SCREAM. This analysis may provide insight into the stability of the performance of risk scores and its dependency on the prediction timeframe. We conducted this analysis because it is not uncommon for clinicians to extrapolate or interpolate the predicted risks over time, and we hypothesized that both discrimination and calibration would be highest at the timeframe for which the risk score was developed.
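The two performance measures described above can be sketched for a binary outcome as follows. This is an illustrative simplification, not the study's implementation: it ignores censoring (which Harrell’s c-statistic for Cox models accounts for), and the rate-to-incidence conversion assumes a constant hazard, which may differ from the approximation in the study's supplement. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def c_statistic(pred: Sequence[float], outcome: Sequence[int]) -> float:
    """Share of case/non-case pairs in which the case has the higher predicted risk.

    Ties count as half. For a binary outcome this equals the area under the ROC curve.
    The pairwise loop is O(n^2), which is fine for a sketch but slow for large cohorts.
    """
    cases = [p for p, y in zip(pred, outcome) if y == 1]
    controls = [p for p, y in zip(pred, outcome) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_in_the_large(pred: Sequence[float],
                             outcome: Sequence[int]) -> Tuple[float, float, float]:
    """Return (observed frequency, mean predicted risk, predicted/observed ratio)."""
    observed = sum(outcome) / len(outcome)
    mean_pred = sum(pred) / len(pred)
    if observed == 0:
        raise ValueError("no events observed; ratio is undefined")
    return observed, mean_pred, mean_pred / observed


def rate_to_cumulative_incidence(rate_per_person_year: float, years: float) -> float:
    """Approximate cumulative incidence from an event rate, assuming a constant hazard."""
    return 1.0 - math.exp(-rate_per_person_year * years)
```

With predicted risks `[0.02, 0.10, 0.05, 0.20]` and outcomes `[0, 0, 1, 1]`, three of the four case/non-case pairs are concordant, giving a c-statistic of 0.75; the predicted/observed ratio is well below one, which would indicate underprediction.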

Sensitivity analyses

Three sensitivity analyses were conducted. First, we repeated the analysis using a broader, composite outcome definition of IS that included transient ischaemic attacks (TIAs; ICD-10 codes detailed in the Supplementary material online). Second, we stratified the analysis by anticoagulation use, which included both vitamin K antagonists and direct oral anticoagulants. Lastly, we validated the included risk scores in subgroups with smaller eGFR ranges than the KDIGO stages to further explore the effect of eGFR on score performance. To compare the calibration in the large of the different risk scores, the mean squared error (MSE) of the average predicted and observed probabilities per eGFR cut-off (n = 11) was calculated. The MSE is the average of the squared differences between the predicted and observed risks. Lower values indicate good concordance between these risks, while higher values indicate over- or underprediction, or a combination of both. The methods used to approximate the cumulative incidence and calculate the MSE are further detailed in the Supplementary material online (‘Formulae’).
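The MSE summary described above can be written out in a few lines. This is a sketch under the assumption that the inputs are the average predicted and average observed probabilities, one pair per eGFR subgroup; the exact formula used in the study is given in its supplement, and the function name is hypothetical.

```python
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average squared difference between predicted and observed risk per subgroup."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
```

Because the differences are squared, over- and underprediction in different subgroups cannot cancel out, which is why a high value can reflect either direction of error or a mix of both.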

Results

Demographics

Of the 1 372 425 healthcare users in Stockholm included in SCREAM, 39 260 subjects were diagnosed with AF between 2007 and 2011, of whom 3256 were excluded because of missing information on eGFR, leaving a total of 36 004 subjects eligible for analysis (see the inclusion flow chart). At a median follow-up of 1.88 years, a total of 3069 (8.5%) IS occurred: 1946 (7.4%) of the 26 249 patients with normal kidney function, 1018 (11.8%) of the 8625 patients with mild CKD, and 105 (9.3%) of the 1130 patients with advanced CKD. The baseline characteristics of the included subjects, together with an overview of the study demographics of the validated risk scores, are presented in Table 1.

Figure. Flow chart of patient inclusion in SCREAM (Stockholm CREAtinine Measurements). AF, atrial fibrillation; eGFR, estimated glomerular filtration rate.

Notes to Table 1. CKD categories were defined based on the eGFR classification of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria. Normal kidney function was defined as KDIGO G1-2 (eGFR >60 mL/min/1.73 m2), mild CKD as KDIGO G3 (eGFR 30–60 mL/min/1.73 m2), and advanced CKD as KDIGO G4-5 (eGFR <30 mL/min/1.73 m2). AF, atrial fibrillation; CKD, chronic kidney disease; DOAC, direct oral anticoagulant; eGFR, estimated glomerular filtration rate; SD, standard deviation; VKA, vitamin K antagonist.
a: The AFI risk score was presented in four subgroups, with percentages instead of absolute numbers for baseline characteristics. These were estimated by calculating a weighted mean of these percentages.
b: Median follow-up only given per subgroup.
c: Presented as person-years.
d: Presented as rounded percentages; absolute number estimated.
e: Presented as event rates; no absolute numbers.
f: No information was provided on follow-up duration, but patients were censored after reaching this timepoint.
g: Range of mean follow-up of the included study populations.
h: Defined as CKD stage III-V or eGFR <60 mL/min/1.73 m2.

Discrimination

C-statistics of most risk scores were lower across worsening kidney function categories (see the performance overview table and figure). For the AFI score, c-statistics were 0.68 (95% confidence interval 0.67–0.69) in AF patients with normal kidney function, 0.58 (0.57–0.59) in those with mild CKD and 0.55 (0.51–0.59) in patients with advanced CKD. The c-statistics for the CHADS2 score were relatively stable. The Modified CHADS2 score showed the highest and most consistent discriminatory abilities in all kidney function groups [0.78 (0.77–0.79), 0.73 (0.71–0.74), and 0.74 (0.69–0.79), respectively]. The CHA2DS2-VASc score showed moderate discrimination in AF patients with normal kidney function, but poor discrimination in mild and advanced CKD. The ATRIA risk score showed good discrimination in AF patients with normal kidney function, but moderate discrimination in those with mild and advanced CKD, as did the GARFIELD-AF risk score.

Figure. Visualization of the predictive performance, stratified by three estimated glomerular filtration rate (eGFR) categories. CKD, chronic kidney disease. (Left) C-statistic (dot) with 95% confidence interval (bar). (Middle) Calibration in the large, showing the average observed (asterisk) and predicted (bar) probabilities of ischaemic stroke. (Right) The ratio of the predicted/observed risks; ratios above one indicate overprediction, ratios below one underprediction. Differences in the observed risks are due to the prediction timeframe of the validated risk scores and calculation methods (i.e. Cox or logistic).

Table. Overview of the predictive performance of the six included and externally validated risk scores (AFI, CHADS2, Modified CHADS2, CHA2DS2-VASc, ATRIA, and GARFIELD-AF): discrimination (c-statistic) and calibration in the large (observed vs. predicted), stratified by CKD stage. CKD categories were defined based on the eGFR classification of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria. Normal kidney function was defined as KDIGO G1-2 (eGFR >60 mL/min/1.73 m2), mild CKD as KDIGO G3 (eGFR 30–60 mL/min/1.73 m2), and advanced CKD as KDIGO G4-5 (eGFR <30 mL/min/1.73 m2). Risk scores were validated at the timeframe specified in the article, or if no specification was given, at the maximum follow-up. CKD, chronic kidney disease; eGFR, estimated glomerular filtration rate; HS, haemorrhagic stroke; IS, ischaemic stroke; NS, not specified; Obv., observed; Pred., predicted; SE, systemic embolus; TIA, transient ischaemic attack.

Calibration

Most risk scores showed modest calibration, largely independent of kidney function (see the calibration plots and the performance overview table). For the AFI score, the calibration in the large showed overprediction in all kidney function categories. The CHADS2 score underpredicted risks in the three kidney function categories. The Modified CHADS2 score showed good calibration for the normal eGFR and mild CKD groups, but slight overprediction in the advanced CKD group, especially for the higher-risk patients. The CHA2DS2-VASc score underpredicted risks. The ATRIA score underpredicted the risk of IS. Finally, the GARFIELD-AF score underpredicted the risk of IS in the normal eGFR category, but overpredicted the risks in mild and advanced CKD. The calibration plots illustrated the inaccuracy of most risk scores for patients with a high risk of IS, regardless of CKD stage, and also underlined the differences in the broadness of the prediction range (i.e. the range of possible predicted risks), which ran from 0–0.077 for CHA2DS2-VASc to 0.002–0.521 for GARFIELD-AF.

Figure. Calibration plots showing observed and predicted probabilities of ischaemic stroke in patients with chronic kidney disease (CKD) and atrial fibrillation. (A) AFI; (B) Modified CHADS2; (C) ATRIA; (D) CHADS2; (E) CHA2DS2-VASc; (F) GARFIELD-AF. eGFR, estimated glomerular filtration rate.

Effect of the prediction timeframe on predictive performance

C-statistics were relatively stable over time, with most risk scores showing only a mild decrease in c-statistic followed by stabilization (timeframe figure, upper panel; stratified by CKD stage in the Supplementary material online). For calibration in the large, the optimal prediction timeframe was shorter than in the development studies for CHADS2 (optimal timepoint at 6 months, developed for 12 months), CHA2DS2-VASc (1 and 12 months, respectively), ATRIA (optimal at 17 months, validated at 29 months) and only marginally so for GARFIELD-AF (optimal at 9 months, developed for 12 months), and longer for the AFI (optimal at 49 months, validated at 28 months). The Modified CHADS2 score (developed for 60 months) did not reach an optimal timepoint within 72 months (timeframe figure, lower panel; stratified by CKD stage in the Supplementary material online).

Figure. Effect of prolonging the prediction timeframe on the predictive performance in patients with atrial fibrillation, not stratified for CKD stage. (Upper panel) The effect on discrimination (c-statistic with confidence interval); (lower panel) the effect on calibration in the large. Risk scores were validated 72 times, each time prolonging the prediction timeframe by 1 month until the maximum follow-up of 72 months was reached. Dotted cross-lines indicate the prediction timeframe for which the risk score was developed and the corresponding predictive performance; optimal calibration in the large is indicated with T, followed by the time in months. Stratification by chronic kidney disease stage is presented in the Supplementary material online.

For discrimination, when validated for IS and TIA instead of IS only (sensitivity analysis 1, detailed in the Supplementary material online), outcomes were comparable to the main analysis. Stratification by anticoagulation use (sensitivity analysis 2, Supplementary material online) showed similar results, indicating independence from anticoagulation usage, but with broader confidence intervals due to smaller sample sizes. For most risk scores, there was a trend towards poorer discrimination in patients with lower eGFR compared with higher eGFR (sensitivity analysis 3, Supplementary material online). The Modified CHADS2 score showed consistently good discriminatory abilities, both in the main analysis and in sensitivity analyses.

For calibration, the findings of the main analysis were consistent with sensitivity analyses 1 and 2 (Supplementary material online). The difference between the mean observed and predicted probabilities (calibration in the large) over the eGFR strata (sensitivity analysis 3) was stable, as illustrated by the low MSE values, indicating that the accuracy of the risk scores was independent of eGFR (Supplementary material online). Modification of the outcome, using only ICD-10 I63x, showed similar predictive performance, though the number of events decreased to 2572, with correspondingly broader confidence intervals for the c-statistics (Supplementary material online).
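The sequential validation over monthly prediction timeframes reported in this section could be sketched as below. This is a deliberately naive illustration, not the study's method: patients censored before a given horizon without an event are simply dropped for that horizon, whereas a full analysis would handle censoring properly (for example with Harrell’s c-statistic). All names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def outcome_at_horizon(time_months: float, event: bool, horizon: float) -> Optional[int]:
    """1 = event by the horizon, 0 = event-free at the horizon, None = censored earlier."""
    if event and time_months <= horizon:
        return 1
    if time_months >= horizon:
        return 0
    return None


def c_statistic_over_time(scores: Sequence[float], times: Sequence[float],
                          events: Sequence[bool],
                          horizons: Sequence[float]) -> Dict[float, float]:
    """Re-compute a pairwise c-statistic once per prediction horizon (in months)."""
    results: Dict[float, float] = {}
    for h in horizons:
        labelled = [(s, outcome_at_horizon(t, e, h))
                    for s, t, e in zip(scores, times, events)]
        cases = [s for s, y in labelled if y == 1]
        controls = [s for s, y in labelled if y == 0]
        if not cases or not controls:
            continue  # c-statistic undefined at this horizon
        concordant = sum(1.0 if c > n else 0.5 if c == n else 0.0
                         for c in cases for n in controls)
        results[h] = concordant / (len(cases) * len(controls))
    return results
```

Calling `c_statistic_over_time(scores, times, events, range(1, 73))` would mirror the 72 monthly validations described in the figure legend, under the simplifying assumptions stated above.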

Discussion

In this cohort study of 36 004 patients with AF, we externally validated six commonly used risk scores for IS. Although most risk scores showed moderate to good discrimination in patients with normal kidney function, discrimination was less accurate in mild and advanced CKD. Calibration was largely independent of kidney function, and most risk scores either over- or underpredicted the risk of IS in one or more CKD categories. The broadness of the prediction range (i.e. the scores’ ability to differentiate between low and high risks given the range of possible predicted risks) differed greatly between risk scores. The prediction timeframe influenced the predictive performance: discrimination showed an initial decrease for the shorter timeframes, but stabilized thereafter, indicating that, with regard to discrimination, risk scores can be used to predict IS on a longer or shorter prediction timeframe than designed in the original studies. For calibration, the optimal prediction timepoint of most risk scores differed substantially from the timepoint in the original study. Our results support the use of the Modified CHADS2 score in clinical practice as it showed good and consistent discrimination and calibration in all three kidney function categories.

Given the increasing prevalence of CKD, and the frequent use of risk scores for IS in the care of patients with CKD, there is surprisingly little information on the predictive performance in this high-risk population. Except for GARFIELD-AF and ATRIA, none of the validated risk scores included patients with CKD in their development cohorts, or CKD-specific predictors in their risk score. Furthermore, external validation of these risk scores, the cornerstone for assessment of predictive performance in ‘real-world’ patients, is essential but seldom performed. So far only a few studies included patients with CKD in their validation cohorts, with conflicting results: one large study on 14 264 patients with AF and eGFR >30 mL/min validated both the CHA2DS2-VASc and CHADS2 scores, showing poor discrimination (c-statistic of 0.578 and 0.575, respectively), but did not present information on calibration. In another study on 307 351 patients with AF, these two risk scores performed considerably better and more in line with our results (c-statistics of 0.71 and 0.72, respectively). However, again no information on calibration was reported. Finally, several studies with substantially smaller sample sizes evaluated the same risk scores, yielding comparable results, but as with the previous studies, none calculated the agreement between observed and predicted risks.

Yet, from a clinical point of view, it could be argued that calibration, which indicates the precision of the predicted absolute risks, is more important than discrimination when weighing the risks of IS and severe bleeding due to anticoagulation. This is especially relevant for patients with AF and CKD, as both the risks of IS and severe bleeding are increased. For the clinician facing such a patient, using a risk score may seem an objective method to decide on anticoagulation therapy. However, as our study demonstrates, most of the validated risk scores for IS in this clinically relevant population either substantially over- or underpredict this risk. Although the Modified CHADS2 score showed reasonable performance and would currently be the preferred risk estimation tool for patients with AF and CKD, ideally new risk scores should be developed and validated in this high-risk population. Prediction of bleeding risk appears to be equally influenced by kidney function, though data are only available for patients on dialysis. This effect on predictive performance is not without consequence. Underprediction of IS risk, when weighed against bleeding risk, will result in fewer patients being treated with anticoagulation and consequently an increased IS incidence, while overprediction will result in overtreatment and increased bleeding incidence. Regardless, most clinical guidelines on IS prevention recommend using the CHA2DS2-VASc score, which showed poor predictive performance in patients with and without CKD alike. Finally, the AFI, Modified CHADS2, ATRIA, and GARFIELD-AF risk scores have not been validated in patients with CKD.

Predictive performance decreased in the more clinically relevant groups of mild and advanced CKD, especially so for discrimination. Two mechanisms may have contributed to this. First, most risk scores were developed in general AF patients, and most of these studies did not include patients with CKD in their development cohort. When risk scores that were developed in such heterogeneous populations are validated in a more homogeneous population, such as patients with CKD, predictive performance, and especially discrimination, may drop. While the included predictors may predict well in general AF cohorts, other more CKD-specific predictors of IS may better discriminate in this relatively homogeneous population. These include for example eGFR (which was used in ATRIA and GARFIELD-AF), proteinuria (used in ATRIA), primary kidney disease, presence of atherosclerotic vascular disease, or various biomarkers (e.g. myeloperoxidase or fibroblast growth factor-23, amongst others). Second, although we expected a comparable drop in c-statistics for the even more homogeneous patients with advanced CKD, the c-statistics of these groups were roughly equal. This may have been due to chance, however: the absolute number of events in the advanced CKD group was smaller, and the level of precision consequently lower.

Finally, while we ensured conformity between the predictor definitions of the original studies and our validation cohort, we deliberately validated all risk scores for the same outcome definition of IS. Indeed, most risk scores were developed to predict the probability of a composite outcome (e.g. CHA2DS2-VASc predicts a composite outcome of IS, TIA, peripheral and pulmonary embolisms). Although predictive performance might improve when each risk score is validated for its specific outcome, comparability of these risk scores would then become impossible, especially when the composite outcome includes counterintuitive components, such as IS and haemorrhagic stroke. Another reason for using this outcome definition is clinical usage: these risk scores are usually used for prediction of IS alone in the clinical setting. To test this effect, we added TIA, which was part of the outcome in most risk scores, to a composite outcome in a sensitivity analysis. As this did not alter the results, we do not expect the effect of this outcome definition to be substantial.

Strengths and limitations

Our study has several strengths, but also limitations. The main strength is our large and well-defined source population, which allowed for a head-to-head comparison of multiple risk scores in well-characterized participants. Our study also provides information on calibration, the agreement between the predicted and observed risks, which is essential for weighing IS and bleeding risk. Consistent with previous studies, patients with more severe CKD stages in our study had a markedly increased risk of stroke.

A first limitation of our study is the large proportion of anticoagulation users in our population. Stratification for anticoagulation users and non-users yielded similar results, although discrimination was slightly poorer in anticoagulation users. Second, the cut-offs for the CKD groups might have influenced the predictive performance. Although these cut-offs are of clinical relevance and in line with the KDIGO classification, we aimed to further explore the correlation between discrimination and kidney function. When stratified in smaller groups than the three kidney function groups, it was again shown that for most risk scores discrimination decreased with worsening of kidney function, while calibration remained relatively stable. Third, Swedish regulations do not allow the recording of ethnicity in registers, and we assumed our population to be primarily Caucasian. Disparities in IS risk may be explained by ethnicity: for example, Black adults have a two-fold increased risk of stroke compared with non-Hispanic white adults, and the predictive performance of two different scores (QRISK2 and Framingham scores) was indeed influenced by ethnicity. In line with recent debates on the adequacy of the correction factor for African American ethnicity in eGFR calculation, extrapolation of our results to other ethnicities should be done with caution. Fourth, because the prediction timeframes differed for the validated risk scores, we were unable to formally compare the predictive performance. Fifth, the use of routinely collected laboratory data may be a source of bias: for example proteinuria, a predictor used in one risk score (ATRIA), is not routinely measured, and measurements are performed in persons at risk. Finally, in daily clinical practice, it is not uncommon to categorize or dichotomize risk scores (e.g. CHA2DS2-VASc is often categorized in zero points, one point, or greater than one). Dichotomization results in loss of information, and our sensitivity analysis showed poor performance when the scores were validated in commonly used categories. Furthermore, most risk scores were updated many times after publication. We decided to validate the scores as intended by the authors of the original scores, instead of choosing one of the many updates or categorizations, although this may not represent the clinical use of these risk scores.

Implications and conclusion

Our study demonstrated moderate to poor predictive performance of various risk scores for IS in patients with AF and CKD and emphasizes how difficult this prediction is, underlining the statistical work that needs to be done in the field. For most risk scores, discriminatory abilities decreased in clinically relevant patients with mild and advanced CKD. Calibration, which is essential for weighing the risk of IS and treatment-related bleeding, was less dependent on kidney function, but most risk scores still either over- or underpredicted IS risk, or showed a combination of both. Prediction of IS risk should be accurate and weighed against the risk of treatment-related bleeding. To this aim, either new scores incorporating CKD-specific predictors should be developed, or alternatively, existing and externally validated scores should be combined to increase predictive performance in this clinically relevant population, for example using ensemble modelling. As most risk scores used different prediction timeframes, this was unfeasible in our study. By conducting a head-to-head comparison of multiple scores, this study provides the clinician with information on which risk scores perform well for different prediction timeframes. The Modified CHADS2 score showed the best and most consistent predictive performance in all CKD stages and we suggest it is the preferred risk score to apply in clinical practice. These findings can inform the choice of risk scores in clinical practice, particularly in patients with mild to severe forms of CKD, which have not always been considered when these scores were developed.

Supplementary material

Supplementary material is available at European Heart Journal online.

Table 1

Baseline characteristics of the validation cohort (SCREAM) and the validated risk scores (AFI, CHADS2, Modified CHADS2, CHA2DS2-VASc, ATRIA, and GARFIELD-AF)

Study	Validation cohort	AFI Investigators 1994²⁰	Gage 2001²¹	Rietbrock 2008²²	Lip 2010²³	Singer 2013²⁴	Fox 2017²⁵
Risk score	SCREAM	AFI	CHADS₂	Modified CHADS₂	CHA₂DS₂-VASc	ATRIA^c	GARFIELD-AF
Total participants	36 004	4253^a	1733	305 566	1084	10 927	38 935
Total events, n (%)	3069 (8.52)	106 (2.49)	94 (5.42)	19 925 (6.52)	25 (2.31)	685 (6.27)	473 (1.21)
Median follow-up, years	1.88	1.2–2.3^g	1.0	2.46–2.74^b	1^f	2.4	1^f
Age, years, mean (SD)	74.84 (12.80)	69.4^a	81	—	66	—^c	71.0
Male sex, n (%)	18 891 (52.5)	2977 (70)^a	728 (42)	157 202 (48.6)	642 (59.2)	—^c	21 628 (55.5)
Race (Caucasian), n (%)	—	3930 (92.4)^a	—	—	—	—	24 157 (62.0)
CKD, n (%)
Normal eGFR	26 249 (72.9)	—	—	—	—	—^c	—
Mild CKD	8625 (24.0)	—	—	—	—	—^c	—
Advanced CKD	1130 (3.1)	—	—	—	—	—^c	—
Other	—	—	—	—	—	—^c	4038 (12.0)^h
Proteinuria, n (%)	2411 (6.7)	—	—	—	—	—^c
Diabetes mellitus, n (%)	7000 (19.4)	595 (14)^a	399 (23)^d	—^e	187 (17.3)	—^c	8558 (22.0)
AF, n (%)	36 004 (100)	4253 (100)^a	1733 (100)	51 807 (17.0)	1084 (100)	10 927 (100)	38 935 (100)
Heart failure, n (%)	9 340 (25.9)	940 (22.1)^a	970 (56)^d	—^e	253 (23.5)	—^c	8752 (22.5)
Hypertension, n (%)	21 966 (61.0)	1927 (45.3)^a	970 (56)^d	—^e	729 (67.3)	—^c	30 435 (78.2)
Previous stroke, n (%)	4 608 (12.8)	264 (6.2)^a	433 (25)^d	—	45 (4.2)	—^c	3030 (7.8)
Peripheral vascular disease, n (%)	3 069 (8.5)	404 (9.5)^a	—	—	62 (5.8)	—^c	2212 (5.7)
Ischaemic heart disease, n (%)	9 814 (27.3)	519 (12.2)^a	—	—^e	412 (38.4)	—^c	—
Previous bleeding, n (%)	4 969 (13.8)	—	—	—	—	—	1024 (2.6)
Antithrombotic drug, n (%)
VKA usage	14 526 (40.3)	2 113 (49.7)	0 (0)	—	0 (0)	0 (0)	16 491 (42.4)
DOAC usage	217 (0.6)	—	—	—	—	—	8804 (22.6)
Antiplatelet	23 321 (64.8)	Uncertain number	529 (31)	—	802 (74.0)	—	14 084 (36.2)

CKD categories were defined based on the eGFR classification of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria. Normal kidney function was defined as KDIGO G1-2 (eGFR >60 mL/min/1.73 m2), mild CKD as KDIGO G3 (eGFR 30–60 mL/min/1.73 m2), and advanced CKD as KDIGO G4-5 (eGFR <30 mL/min/1.73 m2).

AF, atrial fibrillation; CKD, chronic kidney disease; DOAC, direct oral anticoagulant; eGFR, estimated glomerular filtration rate; SD, standard deviation; VKA, vitamin K antagonist.

^a The AFI risk score was presented in four subgroups, with percentages instead of absolute numbers for baseline characteristics. These were estimated by calculating a weighted mean of these percentages.

^b Median follow-up only given per subgroup.

^c Presented as person-years.

^d Presented as rounded percentages; absolute number estimated.

^e Presented as event rates; no absolute numbers.

^f No information was provided on follow-up duration, but patients were censored after reaching this timepoint.

^g Range of mean follow-up of the included study populations.

^h Defined as CKD stage III-V or eGFR <60 mL/min/1.73 m2.
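
The kidney function strata defined in the note to Table 1 can be written as a short classification rule. The sketch below follows the cut-offs exactly as stated there (eGFR >60, 30–60, and <30 mL/min/1.73 m2); assigning the boundary values 30 and 60 to mild CKD is an assumption that follows from reading those cut-offs literally, and the function name `kidney_function_stratum` is hypothetical.

```python
def kidney_function_stratum(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m2) to the three strata in the Table 1 note.

    Illustrative sketch only. Boundary handling (30 and 60 counted as
    mild CKD) is an assumption based on the cut-offs as written.
    """
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    if egfr > 60:
        return "normal eGFR"   # KDIGO G1-2 in the note
    if egfr >= 30:
        return "mild CKD"      # KDIGO G3 in the note
    return "advanced CKD"      # KDIGO G4-5 in the note


strata = [kidney_function_stratum(value) for value in (85.0, 60.0, 45.0, 30.0, 12.5)]
```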

Table 2

Overview of the predictive performance of the six included and externally validated risk scores

Study	Outcome	Timeframe (validated)	Original c-statistic	Normal eGFR: C-statistic	Normal eGFR: Obs.	Normal eGFR: Pred.	Mild CKD: C-statistic	Mild CKD: Obs.	Mild CKD: Pred.	Advanced CKD: C-statistic	Advanced CKD: Obs.	Advanced CKD: Pred.
AFI²⁰	IS, TIA, SE	NS (2.3 y)	—	0.68 (0.67–0.69)	0.076	0.114	0.58 (0.57–0.59)	0.130	0.147	0.55 (0.51–0.59)	0.127	0.150
CHADS₂²¹	IS, TIA	NS (1.0 y)	0.82	0.78 (0.77–0.80)	0.047	0.039	0.70 (0.68–0.72)	0.084	0.055	0.71 (0.66–0.76)	0.086	0.063
Modified CHADS₂²²	IS, HS	5 y (5 y)	0.72	0.78 (0.77–0.79)	0.124	0.150	0.73 (0.71–0.74)	0.204	0.231	0.74 (0.69–0.79)	0.225	0.238
CHA₂DS₂-VASc²³	IS, TIA, SE	1 y (1 y)	0.61	0.70 (0.69–0.71)	0.043	0.022	0.60 (0.58–0.62)	0.074	0.027	0.58 (0.52–0.64)	0.065	0.028
ATRIA²⁴	IS, SE	NS (2.4 y)	0.73	0.78 (0.76–0.79)	0.078	0.055	0.68 (0.66–0.70)	0.133	0.097	0.66 (0.60–0.72)	0.130	0.120
GARFIELD-AF²⁵	IS, TIA, SE	1 y (1 y)	0.69	0.76 (0.75–0.77)	0.047	0.029	0.67 (0.65–0.69)	0.084	0.104	0.70 (0.64–0.76)	0.086	0.108

Discrimination (c-statistic) and calibration in the large (observed vs. predicted) stratified by CKD stages. CKD categories were defined based on the eGFR classification of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria. Normal kidney function was defined as KDIGO G1-2 (eGFR >60 mL/min/1.73 m2), mild CKD as KDIGO G3 (eGFR 30–60 mL/min/1.73 m2), and advanced CKD as KDIGO G4-5 (eGFR <30 mL/min/1.73 m2). Risk scores were validated at the timeframe specified in the article, or if no specification was given, at the maximum follow-up.

CKD, chronic kidney disease; eGFR, estimated glomerular filtration rate; HS, haemorrhagic stroke; IS, ischaemic stroke; NS, not specified; Obs., observed; Pred., predicted; SE, systemic embolus; TIA, transient ischaemic attack.
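
The two performance measures reported in Table 2 can be sketched in a few lines. This is a simplified illustration, not the authors' analysis: it treats the outcome as binary at a fixed time horizon and ignores censoring, which a time-to-event analysis would need to handle. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def c_statistic(predicted: Sequence[float], event: Sequence[int]) -> float:
    """Share of (event, non-event) pairs in which the patient with the event
    has the higher predicted risk; tied predictions count as half."""
    cases = [p for p, e in zip(predicted, event) if e == 1]
    controls = [p for p, e in zip(predicted, event) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                concordant += 1.0
            elif risk_case == risk_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_in_the_large(
    predicted: Sequence[float], event: Sequence[int]
) -> Tuple[float, float]:
    """Return (observed event proportion, mean predicted risk)."""
    observed = sum(event) / len(event)
    mean_predicted = sum(predicted) / len(predicted)
    return observed, mean_predicted


# Toy data, invented for illustration only (not taken from SCREAM).
toy_predicted = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40]
toy_event = [0, 0, 1, 0, 1, 1]

toy_c = c_statistic(toy_predicted, toy_event)
toy_observed, toy_mean_predicted = calibration_in_the_large(toy_predicted, toy_event)
```

In this toy example the mean predicted risk is lower than the observed event proportion, which corresponds to underprediction in the sense used in the "Obs." and "Pred." columns.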

