
External Validation of Clinical Prediction Models in Unilateral Primary Aldosteronism.

Davis Sam¹, Gregory A Kline², Benny So³, Gregory L Hundemer⁴, Janice L Pasieka⁵, Adrian Harvey⁵, Alex Chin⁶,⁷, Stefan J Przybojewski³, Cori E Caughlin³, Alexander A Leung²,⁸.

Abstract

BACKGROUND: Targeted treatment of primary aldosteronism (PA) is informed by adrenal vein sampling (AVS), which remains limited to specialized centers. Clinical prediction models have been developed to help select patients who would most likely benefit from AVS. Our aim was to assess the performance of these models for PA subtyping.
METHODS: This external validation study evaluated consecutive patients referred for PA who underwent AVS at a tertiary care referral center in Alberta, Canada during 2006-2018. In alignment with the original study designs and intended uses of the clinical prediction models, the primary outcome was the presence of lateralization on AVS. Model discrimination was evaluated using the C-statistic. Model calibration was assessed by comparing the observed vs. predicted probability of lateralization in the external validation cohort.
RESULTS: The validation cohort included 342 PA patients who underwent AVS (mean age, 52.1 years [SD, 11.5]; 201 [58.8%] male; 186 [54.4%] with lateralization). Six published models were assessed. All models demonstrated low-to-moderate discrimination in the validation set (C-statistics; range, 0.60-0.72), representing a marked decrease compared with the derivation sets (range, 0.80-0.87). Comparison of observed and predicted probabilities of unilateral PA revealed significant miscalibration. Calibration-in-the-large for every model was >0 (range, 0.35-1.67), signifying systematic underprediction of lateralizing disease. Calibration slopes were consistently <1 (range, 0.35-0.87), indicating poor performance at the extremes of risk.
CONCLUSIONS: Overall, clinical prediction models did not accurately predict AVS lateralization in this large cohort. These models cannot be reliably used to inform the decision to pursue AVS for most patients.


Keywords: adrenal vein sampling; blood pressure; decision rules; hypertension; prediction models; primary aldosteronism

Substances: Aldosterone

Year: 2022 PMID: 34958097 PMCID: PMC8976177 DOI: 10.1093/ajh/hpab195

Source DB: PubMed Journal: Am J Hypertens ISSN: 0895-7061 Impact factor: 2.689

Primary aldosteronism (PA) is a common cause of hypertension, affecting at least 10% of all people with high blood pressure.[1] Distinguishing unilateral from bilateral disease guides treatment: adrenalectomy is preferred for patients with unilateral disease, whereas mineralocorticoid receptor antagonist therapy is recommended for patients with bilateral disease.[2-5] Subtyping is one of the most challenging diagnostic steps, owing to the unreliability of diagnostic imaging and the limited availability of adrenal vein sampling (AVS),[2,6-10] and this has created a large care gap.[11] Consequently, multiple clinical prediction models have been proposed to simplify the workup by helping to prioritize patients most likely to benefit from AVS.[12-19] These models frequently incorporate demographic, laboratory, and/or imaging characteristics to stratify patients by their probability of having lateralizing disease. However, their performance in clinical practice remains unclear, as previous evaluations have generally been limited by nonstandardized reporting and the absence of key measures of discrimination or calibration.[14,15,18-22] Because performance is commonly overestimated at the development stage, external validation of a prediction model is required to establish its generalizability before it can be incorporated into clinical practice.[23] To address this gap, we sought to externally validate multiple published clinical prediction models for subtyping in a large and diverse cohort of patients with PA, using AVS as the reference standard, as the models were intended. Specifically, we evaluated discrimination and calibration to determine whether each model provided discriminating and reliable predictions of AVS lateralization in an independent dataset.

METHODS

This study was reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement[24] and was approved by the University of Calgary Conjoint Health Research Ethics Board.

Validation study population

Data were prospectively collected. We identified all consecutive patients who underwent AVS in the workup of PA from January 2006 to May 2018 using the regional AVS database in Calgary, AB, Canada. This database has been used in numerous studies evaluating PA workup and contains detailed diagnostic information on all AVS performed in Southern Alberta.[9,25,26] The catchment includes Alberta, the interior of British Columbia, and Saskatchewan, representing an ethnically and sociodemographically diverse population in western Canada. Patients were included if they had a diagnosis of PA (based on an elevated aldosterone–renin ratio [ARR] >550 pmol/l per ng/ml/h or >60 pmol/l per mIU/l and high-probability features of PA [resistant hypertension, adrenal mass, spontaneous hypokalemia, and/or diuretic-induced hypokalemia]) and successful AVS, as defined by cannulation of both adrenal veins.[25,27] Prior to AVS, potassium-sparing diuretics were stopped for at least 4 weeks and other interfering medications were discontinued for 2 weeks; alpha blockers, calcium channel blockers, and/or hydralazine were instead used for blood pressure control.[2]
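The inclusion rule above pairs an assay-dependent ARR cut-off with at least one high-probability feature. The following is a minimal sketch that encodes only the thresholds stated in the text; the `Referral` record and its field names are hypothetical, and the separate requirement of successful bilateral cannulation is not modeled.

```python
from dataclasses import dataclass

# ARR cut-offs stated in the text, keyed by how renin was measured:
# aldosterone (pmol/l) per PRA (ng/ml/h), or per DRC (mIU/l).
ARR_THRESHOLDS = {"PRA": 550.0, "DRC": 60.0}


@dataclass
class Referral:
    """Hypothetical record layout, for illustration only."""
    aldosterone_pmol_l: float
    renin_value: float  # PRA in ng/ml/h or DRC in mIU/l; must be > 0
    renin_units: str    # "PRA" or "DRC"
    resistant_hypertension: bool = False
    adrenal_mass: bool = False
    spontaneous_hypokalemia: bool = False
    diuretic_induced_hypokalemia: bool = False


def meets_pa_criteria(r: Referral) -> bool:
    """Elevated ARR plus at least one high-probability feature of PA."""
    if r.renin_value <= 0:
        raise ValueError("renin must be positive to form a ratio")
    arr = r.aldosterone_pmol_l / r.renin_value
    elevated_arr = arr > ARR_THRESHOLDS[r.renin_units]
    high_probability = (
        r.resistant_hypertension
        or r.adrenal_mass
        or r.spontaneous_hypokalemia
        or r.diuretic_induced_hypokalemia
    )
    return elevated_arr and high_probability
```

For example, `meets_pa_criteria(Referral(600, 0.5, "PRA", adrenal_mass=True))` forms an ARR of 1,200 and returns `True`. A real implementation would also need a rule for renin values below the assay's detection limit, which the text does not specify.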

Outcome

The study outcome was the presence (or absence) of unilateral PA as defined by AVS, using a standardized protocol and interpretation criteria. Lateralization on AVS was selected to facilitate comparisons with other studies and to align with the original intention of these clinical prediction models. Post-treatment outcomes, such as biochemical or clinical response to adrenalectomy,[28] were not considered because they are not relevant to the intended use of the AVS prediction models studied. Such outcomes would only be useful for assessing whether models designed to predict postoperative success can prioritize patients for surgical referral, which is a different question and outside the scope of the present study. We have previously reported that our AVS interpretation criteria and subsequent surgical decisions are associated with excellent outcomes.[26] A selectivity index of ≥2:1 at baseline or ≥3:1 following cosyntropin stimulation (250 µg cosyntropin bolus followed by 6.25 µg infusion over 15 minutes) indicated successful cannulation during AVS.[26] Unilateral aldosterone hypersecretion was defined by a lateralization index of >3:1, comparing the dominant with the nondominant adrenal gland, using unstimulated aldosterone–cortisol ratios.[9,26]
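The cannulation and lateralization criteria reduce to two ratio checks. This sketch assumes the conventional definition of the selectivity index (adrenal vein cortisol divided by cortisol in a peripheral sample), which the text does not spell out; all function names are hypothetical and the sample values are made up.

```python
def selectivity_index(adrenal_cortisol: float, peripheral_cortisol: float) -> float:
    """Cortisol in an adrenal vein relative to a peripheral (non-adrenal) sample."""
    return adrenal_cortisol / peripheral_cortisol


def cannulation_successful(si: float, stimulated: bool) -> bool:
    """Thresholds from the text: >=2:1 at baseline, >=3:1 after cosyntropin."""
    return si >= (3.0 if stimulated else 2.0)


def lateralization_index(left_aldo: float, left_cort: float,
                         right_aldo: float, right_cort: float) -> float:
    """Dominant over nondominant aldosterone-cortisol ratio (unstimulated samples)."""
    left_ratio = left_aldo / left_cort
    right_ratio = right_aldo / right_cort
    return max(left_ratio, right_ratio) / min(left_ratio, right_ratio)


def is_unilateral(li: float) -> bool:
    """Unilateral aldosterone hypersecretion when the index exceeds 3:1."""
    return li > 3.0


# Illustrative (made-up) values, not study data
si_left = selectivity_index(500, 50)            # 10.0, cannulation confirmed
li = lateralization_index(5000, 500, 400, 400)  # ratios 10 and 1, index 10
```

Because both indices are ratios, any consistent units for aldosterone and cortisol give the same result.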

Clinical prediction models

We defined a “clinical prediction model” as a model that combined 2 or more independent variables to estimate the probability of the unilateral or bilateral PA subtype, with subtype assigned according to our institutional AVS interpretation practice, as above. In a sensitivity analysis, we instead applied the AVS selectivity index and lateralization index thresholds used in each model's original study to define successful AVS and lateralization, respectively, and reassessed model performance. We searched MEDLINE from inception to November 2020 using medical subject headings (“hyperaldosteronism” and “clinical decision rules”) and author-supplied keywords (“prediction score,” “prediction model,” “subtype prediction”). Reference lists of included articles were also hand-searched to identify other relevant studies. Studies were considered for inclusion if they reported model derivation using original data. Those that incorporated variables not routinely collected at our institution (e.g., ARR following captopril challenge) were excluded, as these models could not be externally validated with our data. We identified 6 models for inclusion.[12-14,16-18] We additionally assessed a prediction rule adopted by several societies, which holds that the combination of age under 35 years, hypokalemia, plasma aldosterone concentration (PAC) >30 ng/dl, and a unilateral adrenal adenoma >1.0 cm is specific for unilateral disease.[2,29]
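The society-endorsed rule is a conjunction of four criteria. A minimal sketch using the thresholds listed in the footnote to Table 2 (age <35 years, serum potassium <3.5 mmol/l, PAC >831 pmol/l, and a unilateral adenoma >1.0 cm with a normal contralateral gland); the function and argument names are hypothetical.

```python
from typing import Optional


def guideline_rule_positive(
    age_years: float,
    potassium_mmol_l: float,
    pac_pmol_l: float,
    unilateral_adenoma_cm: Optional[float],
    contralateral_normal: bool,
) -> bool:
    """True only when all four criteria are present.

    A negative result does not indicate bilateral disease; the rule is
    meant to flag a small, high-specificity subgroup.
    """
    adenoma_criterion = (
        unilateral_adenoma_cm is not None
        and unilateral_adenoma_cm > 1.0
        and contralateral_normal
    )
    return (
        age_years < 35
        and potassium_mmol_l < 3.5   # hypokalemia
        and pac_pmol_l > 831         # about 30 ng/dl
        and adenoma_criterion
    )
```

Because every criterion must hold, few patients test positive. This is consistent with the high specificity and very low sensitivity the rule shows in this cohort.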

Predictor variables

Based on the clinical prediction models identified, information on potential predictors of lateralization was retrieved from clinical records. These included age (at the time of AVS), sex, and baseline measures of serum potassium, renal function, PAC, and ARR. The lowest serum potassium on record was used to define hypokalemia. PAC was measured by either solid-phase radioimmunoassay (Siemens Coat-A-Count Aldosterone assay; Siemens Healthcare Diagnostics, Tarrytown, NY) or chemiluminescent immunoassay (Diasorin Liaison XL platform; Diasorin, Mississauga, ON). Both assays performed similarly across the range of measurements.[27] Renin was variably reported as plasma renin activity (PRA), using the GammaCoat PRA I125 assay (Diasorin, Stillwater, MN), or as direct renin concentration (DRC), using the Diasorin Liaison XL platform. To facilitate comparisons, DRC was converted to PRA (DRC of 14.1 mIU/l = PRA of 1 ng/ml/h).[27] The highest ARR prior to AVS (in the absence of confounding medications) was used. Finally, computed tomography and magnetic resonance imaging reports were reviewed to determine the presence of adrenal nodules. Where appropriate, we restricted the analysis to patients without visible adrenal nodularity >1.0 cm to better match the population where the clinical prediction models were originally derived.[16,17]
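The unit harmonization above can be sketched with a few helpers. The DRC-to-PRA factor (14.1 mIU/l per 1 ng/ml/h) comes from the text. The aldosterone conversion is an assumption based on the molar mass of aldosterone (about 360.4 g/mol, giving roughly 27.7 pmol/l per ng/dl); it lands close to, though not always exactly on, the paired thresholds in Table 1. The helper names are hypothetical.

```python
DRC_MIU_L_PER_PRA_NG_ML_H = 14.1  # from the text: DRC 14.1 mIU/l = PRA 1 ng/ml/h
ALDOSTERONE_G_PER_MOL = 360.44    # assumption: molar mass of aldosterone (C21H28O5)


def drc_to_pra(drc_miu_l: float) -> float:
    """Convert direct renin concentration (mIU/l) to a PRA equivalent (ng/ml/h)."""
    return drc_miu_l / DRC_MIU_L_PER_PRA_NG_ML_H


def aldosterone_ng_dl_to_pmol_l(pac_ng_dl: float) -> float:
    """1 ng/dl = 10 ng/l; dividing by g/mol gives nmol/l; x1000 gives pmol/l."""
    return pac_ng_dl * 10.0 / ALDOSTERONE_G_PER_MOL * 1000.0


def arr_per_pra(pac_pmol_l: float, renin_value: float, renin_units: str) -> float:
    """ARR in pmol/l per ng/ml/h, converting DRC to a PRA equivalent first."""
    pra = drc_to_pra(renin_value) if renin_units == "DRC" else renin_value
    return pac_pmol_l / pra


pac_25 = aldosterone_ng_dl_to_pmol_l(25)  # roughly 694 pmol/l
```

Converting all renin values to PRA equivalents lets a single ARR scale be used for the prediction models.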

Statistical analysis

We evaluated model performance in predicting unilateral PA.[23,30] For consistency, the outcome was reverse coded for models designed to predict bilateral disease.[16,18] For each model, we restricted the analysis to individuals with data for all predictors of interest (complete case analysis). Discrimination was examined by calculating the C-statistic (where 1.0 represents perfect discrimination and 0.5 reflects no discriminative ability). We considered a C-statistic <0.60 to indicate poor discrimination, 0.60–0.75 moderate discrimination, and >0.75 acceptable discrimination.[31] Calibration was assessed using locally estimated scatterplot smoothing (LOESS) plots comparing observed and predicted probabilities of lateralization (where perfect calibration is represented by a 45° straight line with a slope of one and an intercept of zero).[23,32] Whenever possible, the expected probability of the unilateral subtype was calculated using the published intercept and regression coefficients from the original model,[13] with the predicted probability given by $p = 1/(1 + e^{-\mathrm{LP}})$, where LP is the linear predictor.
Otherwise, the predicted probability was based on the published frequencies of the outcome in each risk stratum of the derivation dataset,[12,17,18] or was estimated from the sensitivities and specificities reported for at least 2 strata.[16] Calibration could not be assessed for 1 model because none of these data were available.[14] The calibration slope was calculated by fitting a logistic regression model with the linear predictor from the original model as the independent variable and estimating its regression coefficient.[33] The calibration intercept (“calibration-in-the-large”) was determined by fixing the calibration slope at one and estimating the corresponding model intercept.[33] Additionally, we recalibrated the widely cited Küpers model by adjusting the model intercept to our local dataset.[13,34] The model intercept reflects the baseline risk of the outcome, which may differ between populations.[35] Updating the intercept may improve calibration, provided that the original model was developed appropriately.[34,36] Recalibration was feasible only for the Küpers model because it was the only model reported in full (with regression coefficients and intercept). The regression coefficients were fixed at their original values while a new intercept was estimated as a free parameter.[36] After updating the intercept, calibration was reassessed. A P value <0.05 was considered statistically significant. Statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC).
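The calibration measures above can be sketched with Newton-Raphson logistic regression using only the Python standard library. This is an illustration, not the authors' SAS code. The function names are hypothetical, `lp` is each patient's linear predictor from the published model (including its intercept), and `y` is 1 for AVS lateralization and 0 otherwise. Intercept-only recalibration, as applied to the Küpers model, is the same computation as calibration-in-the-large: the coefficients stay fixed, and the updated intercept equals the original one plus the estimated offset. The sketch has no safeguards against non-convergence or perfect separation.

```python
import math


def expit(x: float) -> float:
    """Predicted probability from a linear predictor: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def calibration_in_the_large(lp, y, tol=1e-10, max_iter=100):
    """Estimate a in logit P(y=1) = a + lp (slope fixed at one, lp as offset)."""
    a = 0.0
    for _ in range(max_iter):
        p = [expit(a + x) for x in lp]
        score = sum(yi - pi for yi, pi in zip(y, p))  # d logL / da
        info = sum(pi * (1.0 - pi) for pi in p)       # -d2 logL / da2
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


def calibration_slope(lp, y, tol=1e-10, max_iter=100):
    """Fit logit P(y=1) = a + b * lp; b is the calibration slope."""
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        p = [expit(a + b * x) for x in lp]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * x for yi, pi, x in zip(y, p, lp))
        # Information matrix X'WX with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = gradient
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def recalibrated_intercept(original_intercept, lp, y):
    """Intercept-only update: keep the coefficients and shift the intercept."""
    return original_intercept + calibration_in_the_large(lp, y)
```

An intercept above zero corresponds to systematic underprediction of lateralization, which is how the calibration intercepts are interpreted in the Results. A slope of one means the spread of predicted risks matches the spread of observed risks.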

RESULTS

Study participants

In total, 354 patients underwent AVS for the investigation of PA during the study period. After excluding 12 patients where cannulation of both adrenal veins could not be confirmed, the validation cohort included 342 people (reflecting a 96.6% AVS success rate). Overall, the mean age was 52.1 (SD, 11.5) years and 201 (58.8%) participants were male (Table 1). All patients had cross-sectional adrenal imaging; 318 (93%) received computed tomography and 24 (7%) magnetic resonance imaging. A discrete unilateral adrenal nodule measuring >1.0 cm was detected in 45% of patients. Hypokalemia (serum potassium <3.5 mmol/l) was present in 68.8% (218 of 317 known values). The median PAC was 522 (interquartile range, 457) pmol/l and median ARR was 2,930 (interquartile range, 3,632) pmol/l per ng/ml/h (i.e., 5.3 times the upper limit of normal). There were 186 (54.4%) people with lateralization on AVS.

Table 1.

Characteristics of the validation cohort

Characteristicsᵃ	Total AVS group			Bilateral lesions excluded group			Normal imaging group
	All (n = 342)	Stratified by AVS result		All (n = 328)	Stratified by AVS result		All (n = 174)	Stratified by AVS result
		Unilateral (n = 186)	Bilateral (n = 156)		Unilateral (n = 180)	Bilateral (n = 148)		Unilateral (n = 70)	Bilateral (n = 104)
Age, mean (SD), years [n = 342; n = 328; n = 174]	52.1 (11.5)	52.3 (11.3)	52.0 (11.8)	51.8 (11.5)	52.0 (11.3)	51.7 (11.8)	52.1 (11.9)	54.3 (10.3)	50.6 (12.7)
Age <35 years, no. (%)	29 (8.5%)	13 (7.0%)	16 (10.3%)	29 (8.8%)	13 (7.2%)	16 (10.8%)	17 (9.8%)	2 (2.9%)	15 (14.4%)
Male sex, no. (%) [n = 342; n = 328; n = 174]	201 (58.8%)	119 (64.0%)	82 (52.6%)	189 (57.6%)	114 (63.3%)	75 (50.7%)	109 (62.6%)	52 (74.3%)	57 (54.8%)
Unilateral adrenal nodule >0.8 cm, no. (%) [n = 342; n = 328; n = 174]	173 (50.6%)	122 (65.6%)	51 (32.7%)	173 (52.7%)	122 (67.8%)	51 (34.5%)	23 (13.2%)	14 (20.0%)	9 (8.7%)
Unilateral adrenal nodule >1 cm, no. (%) [n = 342; n = 328; n = 174]	154 (45.0%)	110 (59.1%)	44 (28.2%)	154 (47.0%)	110 (61.1%)	44 (29.7%)	0 (0.0%)	0 (0.0%)	0 (0.0%)
Serum potassium, mean (SD), mmol/l [n = 317; n = 304; n = 159]	3.2 (0.6)	3.0 (0.6)	3.4 (0.6)	3.2 (0.6)	3.0 (0.6)	3.4 (0.6)	3.3 (0.5)	3.2 (0.5)	3.3 (0.5)
Serum potassium >3.9, no. (% of known)	33 (10.4%)	12 (6.7%)	21 (15.1%)	33 (10.9%)	12 (7.0%)	21 (15.9%)	19 (11.9%)	7 (10.4%)	12 (13.0%)
Serum potassium 3.5–3.9, no. (% of known)	66 (20.8%)	29 (16.3%)	37 (26.6%)	61 (20.1%)	26 (15.1%)	35 (26.5%)	47 (29.6%)	18 (26.9%)	29 (31.5%)
Serum potassium <3.5, no. (% of known)	218 (68.8%)	137 (77.0%)	81 (58.3%)	210 (69.1%)	134 (77.9%)	76 (57.6%)	93 (58.5%)	42 (62.7%)	51 (55.4%)
PAC, median (IQR), pmol/l [n = 276; n = 264; n = 139]	522 (457)	619 (555)	431 (329)	517 (457)	618 (562)	428 (321)	448 (346)	521 (435)	439 (297)
PAC <583 pmol/l (<210 pg/ml), no. (% of known)	156 (56.5%)	70 (46.1%)	86 (69.4%)	151 (57.2%)	68 (46.3%)	83 (70.9%)	95 (68.3%)	35 (61.4%)	60 (73.2%)
PAC >694 pmol/l (>25 ng/dl), no. (% of known)	91 (33.0%)	64 (42.1%)	27 (21.8%)	87 (33.0%)	62 (42.2%)	25 (21.4%)	31 (22.3%)	18 (31.6%)	13 (15.9%)
PAC >831 pmol/l (>30 ng/dl), no. (% of known)	62 (22.5%)	46 (30.3%)	16 (12.9%)	60 (22.7%)	45 (30.6%)	15 (12.8%)	17 (12.2%)	10 (17.5%)	7 (8.5%)
ARR, median (IQR), pmol/l per ng/ml/h [n = 289; n = 276; n = 148]	2,930 (3,632)	3,473 (4,436)	2,300 (2,864)	2,906 (3,671)	3,457 (4,581)	2,300 (2,829)	2,617 (2,932)	2,822 (3,275)	2,380 (2,781)
ARR <1,526 pmol/l per ng/ml/h (<550 pg/ml per ng/ml/h), no. (% of known)	70 (24.2%)	37 (23.4%)	33 (25.2%)	68 (24.6%)	37 (24.2%)	31 (25.2%)	40 (27.0%)	16 (26.7%)	24 (27.3%)
ARR <1,720 pmol/l per ng/ml/h (<620 pg/ml per ng/ml/h), no. (% of known)	77 (26.6%)	38 (24.1%)	39 (29.8%)	75 (27.2%)	38 (24.8%)	37 (30.1%)	45 (30.4%)	17 (28.3%)	28 (31.8%)
eGFR, mean (SD), ml/min/1.73 m² [n = 264; n = 252; n = 130]	83.7 (20.9)	84.0 (19.8)	83.1 (22.4)	83.7 (20.5)	84.4 (19.5)	82.7 (22.0)	83.2 (21.1)	80.2 (19.9)	85.7 (21.9)
eGFR >100, no. (% of known)	55 (20.8%)	34 (21.7%)	21 (19.6%)	52 (20.6%)	33 (21.9%)	19 (18.8%)	24 (18.5%)	7 (12.1%)	17 (23.6%)
eGFR 80–100, no. (% of known)	97 (36.7%)	60 (38.2%)	37 (34.6%)	93 (36.9%)	57 (37.7%)	36 (35.6%)	51 (39.2%)	23 (39.7%)	28 (38.9%)
eGFR <80, no. (% of known)	112 (42.4%)	63 (40.1%)	49 (45.8%)	107 (42.5%)	61 (40.4%)	46 (45.5%)	55 (42.3%)	28 (48.3%)	27 (37.5%)
AVS lateralization index, median (IQR) [n = 342; n = 328; n = 174]	3.6 (11.3)	11.8 (18.7)	1.5 (0.8)	3.6 (11.4)	11.8 (18.9)	1.5 (0.7)	2.1 (4.2)	8.4 (14.2)	1.5 (0.7)

Abbreviations: ARR, aldosterone–renin ratio; AVS, adrenal vein sampling; eGFR, estimated glomerular filtration rate; IQR, interquartile range; PAC, plasma aldosterone concentration. Number of patients with missing data (no.; % of total AVS group) included: serum potassium (25; 7.3%), PAC (66; 19.3%), ARR (53; 15.5%), and eGFR (78; 22.8%). Number of patients with missing data (no.; % of bilateral lesions excluded group) included: serum potassium (24; 7.3%), PAC (64; 19.5%), ARR (52; 15.9%), and eGFR (76; 23.2%). Number of patients with missing data (no.; % of normal imaging group) included: serum potassium (15; 8.6%), PAC (35; 20.1%), ARR (26; 14.9%), and eGFR (44; 25.3%).

ᵃn shown in brackets for the total AVS cohort, bilateral lesions excluded cohort, and normal imaging cohort, respectively.

Additional subgroups were defined to facilitate comparisons with specific prediction rules: one excluded patients with bilateral adrenal lesions (>1.0 cm)[17] and another included only patients with normal cross-sectional imaging (lesions absent or <1.0 cm),[16] comprising 328 and 174 patients, respectively. The baseline characteristics of these subgroups were broadly similar to those of the overall validation cohort (Table 1). Rates of lateralization were 54.8% and 40.2%, respectively, compared with 54.4% overall.

Prediction models

Six clinical prediction models were included, from Japan,[16-18] France,[13] Italy,[12] and the United Kingdom[14] (Supplementary Table S1 online). Most of the original studies were small, typically including <50 people with lateralizing disease.[12-14,16] Rates of lateralization varied from 10.5%[16] to 65.3%.[14] The reported demographic characteristics of the derivation cohorts were generally similar to those of the validation cohort. Variables incorporated into the models included sex, serum potassium, renal function, PAC and/or ARR levels, and presence of an adrenal nodule (Supplementary Figure S1 online). In the derivation sets, discrimination was frequently reported to be excellent, with C-statistics of 0.80–0.87.[13,16,18] None of the studies provided measures of calibration.

Model performances

When externally validated, the rates of lateralization generally increased with each additional point for the models that predicted the unilateral subtype[12-14,17] (and decreased for those designed to predict bilateral disease)[16,18] (Table 2). In many cases, however, differences between strata were small and the range of observed lateralization frequencies was narrow. Correspondingly, the models had low-to-modest discriminative ability overall (Figure 1), with C-statistics ranging from 0.60 (95% confidence interval [CI], 0.51, 0.69) to 0.72 (95% CI, 0.66, 0.78).
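Because each model assigns patients to a small number of ordered strata, the C-statistic can be computed directly from stratum counts such as those in Table 2. Each unilateral/bilateral pair counts as concordant when the unilateral patient falls in a higher-risk stratum, and as one half when both share a stratum. A minimal sketch follows, using a hypothetical function name. Applied to the Mulatero counts in Table 2, it gives about 0.62 by our own arithmetic (not a value stated in the text), which falls within the range reported above. For models built to predict bilateral disease, the strata would be listed in reverse order, mirroring the reverse coding described in the Methods.

```python
def c_statistic_from_strata(unilateral, bilateral):
    """C-statistic for a score that assigns patients to ordered strata.

    unilateral[k] and bilateral[k] are the numbers of patients with each AVS
    result in stratum k, with strata listed from lowest to highest predicted
    probability of the unilateral subtype. A unilateral/bilateral pair is
    concordant when the unilateral patient sits in a higher stratum and
    counts as one half when both share a stratum.
    """
    concordant = 0.0
    tied = 0.0
    for i, n_uni in enumerate(unilateral):
        for j, n_bi in enumerate(bilateral):
            if i > j:
                concordant += n_uni * n_bi
            elif i == j:
                tied += n_uni * n_bi
    total_pairs = sum(unilateral) * sum(bilateral)
    return (concordant + 0.5 * tied) / total_pairs


# Mulatero counts from Table 2, strata ordered Negative -> Positive
mulatero_c = c_statistic_from_strata(unilateral=[104, 48], bilateral=[111, 8])
```

With only two or three strata, most pairs are ties. This is one reason coarse scores struggle to reach high C-statistics.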

Table 2.

Frequencies of the unilateral subtype according to stratum for each prediction model and the guideline-based prediction rule

Model (no.)	Points	No. in category	No. with unilateral subtype (%)	No. with bilateral subtype (%)
Mulatero [n = 271]ᵃ	Negative	215	104 (48.4)	111 (51.6)
	Positive	56	48 (85.7)	8 (14.3)
Küpers [n = 263]	0	20	11 (55.0)	9 (45.0)
	1	23	10 (43.5)	13 (56.5)
	2	44	17 (38.6)	27 (61.4)
	3	34	14 (41.2)	20 (58.8)
	4	21	10 (47.6)	11 (52.4)
	5	52	36 (69.2)	16 (30.8)
	6	44	35 (79.6)	9 (20.4)
	7	25	24 (96.0)	1 (4.0)
Sze [n = 263]ᵇ	0	9	4 (44.4)	5 (55.6)
	1	21	10 (47.6)	11 (52.4)
	2	50	21 (42.0)	29 (58.0)
	3	38	17 (44.7)	21 (55.3)
	4	19	6 (31.6)	13 (68.4)
	5	54	39 (72.2)	15 (27.8)
	6	47	36 (76.6)	11 (23.4)
	7	25	24 (96.0)	1 (4.0)
Kamemura [n = 141]ᶜ	0	52	26 (50.0)	26 (50.0)
	1	60	26 (43.3)	34 (56.7)
	2	25	7 (28.0)	18 (72.0)
	3	4	0 (0)	4 (100)
Umakoshi [n = 304]ᵈ	Low	66	25 (37.9)	41 (62.1)
	Low-moderate	28	13 (46.4)	15 (53.6)
	High-moderate	93	42 (45.2)	51 (54.8)
	High	117	92 (78.6)	25 (21.4)
Kobayashi [n = 269]ᶜ	0	31	26 (83.9)	5 (16.1)
	1	27	24 (88.9)	3 (11.1)
	2	17	12 (70.6)	5 (29.4)
	3	44	27 (61.4)	17 (38.6)
	4	17	9 (52.9)	8 (47.1)
	5	24	12 (50.0)	12 (50.0)
	6	21	8 (38.1)	13 (61.9)
	7	22	10 (45.4)	12 (54.6)
	8	26	11 (42.3)	15 (57.7)
	9	21	7 (33.3)	14 (66.7)
	10	11	3 (27.3)	8 (72.7)
	11	7	2 (28.6)	5 (71.4)
	12	1	0 (0)	1 (100)
Endocrine Society and European Society of Hypertension [n = 271]ᵉ	Negative	267	148 (55.4)	119 (44.6)
	Positive	4	4 (100)	0 (0)

Data are no. (%).

ᵃBased on the combined presence of serum potassium <3.0 mmol/l and plasma aldosterone concentration >25 ng/dl (>694 pmol/l).

ᵇA modification of the clinical prediction model derived by Küpers et al.

ᶜThe models by Kamemura et al. and Kobayashi et al. were developed to predict bilateral disease (i.e., higher scores correlating with a lower frequency of the unilateral subtype).

ᵈAfter excluding patients with bilateral adrenal lesions seen on computed tomography (CT), patients were categorized into 4 categories in ascending frequencies of having a unilateral subtype (normal CT with normokalemia [serum potassium ≥3.5 mmol/l]; unilateral adenoma with normokalemia; normal CT with hypokalemia; and unilateral adenoma with hypokalemia).

ᵉBased on the combined presence of age <35 years, serum potassium <3.5 mmol/l, plasma aldosterone concentration >30 ng/dl (>831 pmol/l), and a unilateral adrenal adenoma >1.0 cm with a contralateral normal adrenal gland.

Figure 1.

Receiver operating characteristic curves of the clinical prediction models. Sensitivity and specificity were estimated for each category of risk of unilateral primary aldosteronism within each model in the validation cohort, using adrenal vein sampling as the reference. C-Statistics with 95% confidence intervals are shown.

For the 5 models where calibration assessment was possible, calibration plots were used to compare the observed and predicted probabilities of the unilateral subtype (Figure 2). Three models produced only a narrow range of predicted probabilities.[12,16,17] Specifically, the Kamemura model classified all patients as having a high probability of bilateral disease, whereas the Mulatero model assigned most patients at least a 50% chance of lateralization.
Calibration intercepts (calibration-in-the-large) were uniformly >0 (from 0.35 [95% CI, 0.07, 0.64] to 1.67 [95% CI, 1.32, 2.01]; i.e., showing systematic bias with underprediction of the unilateral subtype) and the corresponding slopes were all <1 (from 0.35 [95% CI, 0.21, 0.50] to 0.87 [95% CI, 0.18, 1.70]; i.e., indicating that lateralizing disease was generally underestimated and bilateral disease was overestimated at the extremes of risk). After recalibration, the Küpers model demonstrated some improvement, but substantial bias remained with significant variability at the extremes of the predicted probabilities (Supplementary Figure S2 online).

Figure 2.

Calibration plots of the clinical prediction models. The curves compare the observed (from adrenal vein sampling) and predicted (from each model) probabilities of unilateral primary aldosteronism in the validation cohort. Calibration intercept and slope are shown with 95% confidence intervals.

The guideline-based prediction rule was assessed separately, as it was designed to confirm unilateral disease in high-probability cases (rather than broadly differentiating between unilateral vs. bilateral subtypes).[2,29] The presence of all 4 predictors from the rule was highly specific for lateralization but lacked sensitivity (specificity, 100.0% [95% CI, 97.0%, 100.0%]; sensitivity, 2.6% [95% CI, 0.7%, 6.6%]).
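The guideline rule's sensitivity and specificity follow from the Table 2 counts: 4 of 152 patients with unilateral disease were rule-positive, and all 119 patients with bilateral disease were rule-negative. The text does not name its confidence interval method. The sketch below assumes exact (Clopper-Pearson) intervals, computed by bisection on the binomial tail so that only the standard library is needed. The function names are hypothetical.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def _solve_increasing(f, target: float, iterations: int = 200) -> float:
    """Bisection for p in [0, 1] with f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) interval for the proportion x / n."""
    # Lower bound: P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: prob_at_least(x, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2, i.e. P(X >= x + 1 | p) = 1 - alpha / 2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: prob_at_least(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


sensitivity = 4 / 152      # rule-positive among unilateral cases
specificity = 119 / 119    # rule-negative among bilateral cases
sens_ci = clopper_pearson(4, 152)
spec_ci = clopper_pearson(119, 119)
```

Under this assumption, the intervals should come out close to those reported above. When all 119 bilateral cases are rule-negative, the lower specificity bound reduces to 0.025 raised to the power 1/119.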

Sensitivity analysis

When the models were reexamined using the AVS interpretation criteria originally described in the derivation cohorts, performance improved incrementally across all models, although our overall findings remained similar (Supplementary Figures S3 and S4 online). The largest improvement was seen in the Kobayashi model, which had a C-statistic of 0.80 (95% CI, 0.74, 0.85), a calibration intercept whose 95% CI included zero (−0.21 [95% CI, −0.50, 0.08]), and a calibration slope whose 95% CI included one (0.89 [95% CI, 0.67, 1.13]). Calibration measures also improved for the Mulatero and Umakoshi models, but these remained limited by a narrow range of predictions and modest discrimination.

DISCUSSION

In this study, we attempted external validation of 6 clinical prediction models for detecting lateralization on AVS in a large and diverse cohort of patients with PA,[12-14,16-18] and consistently found low-to-modest discrimination and poor calibration. The range of predictions from these models was generally narrow, and the frequency of lateralization was commonly underestimated. Given the importance of accurate subtyping for informing treatment decisions (i.e., both to minimize missed opportunities for intervention in patients with unilateral PA and to avoid unnecessary surgery in those with bilateral disease), our findings suggest that clinical assessment alone cannot reliably determine whether AVS should be offered. A robust prediction tool with good discrimination that remains well calibrated across diverse settings is needed, but is presently lacking.

Previous evaluations of these models have been limited in scope.[23,30,32] Generally, these have confirmed modest discrimination, but none have examined calibration. The Küpers model has been studied most extensively, with external validations (C-statistic, where available) performed in German,[20] British,[14] Chinese (C-statistic, 0.64),[15] and Japanese (C-statistic, 0.80)[18] cohorts. A recent study evaluated the Küpers, Kamemura, and Kobayashi models in an Italian population[19]; although no C-statistics or measures of calibration were provided, the investigators reported only modest accuracy in distinguishing unilateral from bilateral subtypes (accuracy range, 72.7%–74.1%).
Consistent with the general principle that prognostic models generalize best to populations similar to the development population, a limited validation of the Kamemura model in an independent Japanese cohort, one that shared many characteristics with the derivation cohort, reported fair discrimination (C-statistic, 0.78).[18] Finally, the guideline-based prediction rule has only been evaluated in small groups, all showing limited sensitivity but high specificity for unilateral disease.[21,22] In contrast to previous validation studies, ours was the first to assess multiple dimensions of model performance, and it demonstrated global miscalibration with systematic underprediction of lateralization. Importantly, good discrimination, even if present, is insufficient to guide diagnostic decisions, as it only captures the relative ranking of subjects. Calibration, by contrast, is a crucial property of a prognostic model because it reflects absolute risk for individuals.[23] There were multiple possible reasons why the studied models did not perform well externally, including methodological factors (e.g., overfitting of the original models),[37] clinical heterogeneity (e.g., diversity in participant characteristics, laboratory assays, and interpretation criteria),[30] and incomplete data (e.g., omission of important predictors that were either unknown or unmeasured). First, including many variables in a model with comparatively few outcomes may yield overly optimistic measures of performance that only become apparent when the same model is evaluated in an independent dataset. Second, to address clinical heterogeneity, we partially accounted for differences in baseline risk using recalibration and also accounted for differences in AVS interpretation criteria,[6,13,36] but found minimal improvement in overall performance.
Admittedly, practice varies widely for patients with PA, and it is impossible to find a single “representative” population.[6] Indeed, many PA patients are never identified or referred for subtyping.[1,11] However, these facts simply reinforce the difficulty of generalizing clinical prediction models for PA subtyping. Third, incomplete data were likely the most significant limitation of every model. It is not surprising that the major determinants of lateralization are more complex than can be captured with currently available data. For instance, imaging abnormalities and hypokalemia were the strongest predictors of lateralization and were included in most models, but these still proved unreliable.[2,13,14,16-18,29] Specifically, an adrenal nodule may frequently be discordant with AVS,[9] and the absence of a visible nodule does not rule out unilateral disease.[38] Previous studies supporting the use of computed tomography alone were limited by small sample sizes and inconsistent verification with AVS.[10,39] Moreover, hypokalemia is frequently detected in patients with both unilateral and bilateral subtypes.[40] As such, the variables commonly purported to be associated with lateralization are not sufficiently sensitive to guide most clinical decisions.
While the highly specific guideline-based prediction rule may have a limited role in identifying a select group of patients who are likely to have unilateral disease, it cannot be used to decide on AVS for the majority of patients with PA.[2,29]
This study had several strengths. At our center, AVS has been verified to be a reliable reference standard with a high success rate and a low rate of false-positive lateralization; baseline characteristics, including mean age and the proportion of male patients, were generally similar to those in the derivation studies, allowing fair comparisons; a large number of patients had lateralization, providing sufficient power to evaluate each model; and we were able to assess performance across a range of AVS interpretation criteria.[6,26,35] This study also had limitations. First, data for the predictors of interest were occasionally missing in the validation cohort; to address this, we performed a complete-case analysis for each model, including only patients for whom all of that model's variables were known. Second, we were unable to evaluate models that used variables not routinely collected at our center.[18,19] Even so, we were able to assess the commonly referenced models in this study.
In conclusion, the clinical prediction models evaluated here lack generalizability and cannot reliably predict lateralization. Young age, hypokalemia, high aldosterone, and a unilateral adrenal nodule, even in combination, are not sufficiently accurate to distinguish between unilateral and bilateral disease in most patients. Conversely, the absence of these factors cannot justify withholding AVS, as doing so would likely miss many lateralizing cases. Therefore, whenever available, AVS should still be offered to guide diagnostic subtyping decisions for most patients with PA.

References (partial list; 40 in total)

1. Prognosis and prognostic research: validating a prognostic model.

Authors: Douglas G Altman; Yvonne Vergouwe; Patrick Royston; Karel G M Moons
Journal: BMJ Date: 2009-05-28

2. Development and validation of subtype prediction scores for the workup of primary aldosteronism.

Authors: Hiroki Kobayashi; Masanori Abe; Masayoshi Soma; Yoshiyu Takeda; Isao Kurihara; Hiroshi Itoh; Hironobu Umakoshi; Mika Tsuiki; Takuyuki Katabami; Takamasa Ichijo; Norio Wada; Takanobu Yoshimoto; Yoshihiro Ogawa; Junji Kawashima; Masakatsu Sone; Nobuya Inagaki; Katsutoshi Takahashi; Minemori Watanabe; Yuichi Matsuda; Hirotaka Shibata; Kohei Kamemura; Toshihiko Yanase; Michio Otsuki; Yuichi Fujii; Koichi Yamamoto; Atsushi Ogo; Kazutaka Nanba; Akiyo Tanabe; Tomoko Suzuki; Mitsuhide Naruse
Journal: J Hypertens Date: 2018-11

3. Discordance between imaging and immunohistochemistry in unilateral primary aldosteronism.

Authors: Aya T Nanba; Kazutaka Nanba; James B Byrd; James J Shields; Thomas J Giordano; Barbara S Miller; William E Rainey; Richard J Auchus; Adina F Turcu
Journal: Clin Endocrinol (Oxf) Date: 2017-09-04

4. Clinical Outcomes After Unilateral Adrenalectomy for Primary Aldosteronism.

Authors: Wessel M C M Vorselaars; Sjoerd Nell; Emily L Postma; Rasa Zarnegar; F Thurston Drake; Quan-Yang Duh; Stephanie D Talutis; David B McAneny; Catherine McManus; James A Lee; Scott B Grant; Raymon H Grogan; Minerva A Romero Arenas; Nancy D Perrier; Benjamin J Peipert; Michael N Mongelli; Tanya Castelino; Elliot J Mitmaker; David N Parente; Jesse D Pasternak; Anton F Engelsman; Mark Sywak; Gerardo D'Amato; Marco Raffaelli; Valerie Schuermans; Nicole D Bouvy; Hasan H Eker; H Jaap Bonjer; N M Vaarzon Morel; Els J M Nieveen van Dijkum; Otis M Vrielink; Schelto Kruijff; Wilko Spiering; Inne H M Borel Rinkes; Gerlof D Valk; Menno R Vriens
Journal: JAMA Surg Date: 2019-04-17

5. Assessing calibration of prognostic risk scores.

Authors: Cynthia S Crowson; Elizabeth J Atkinson; Terry M Therneau
Journal: Stat Methods Med Res Date: 2013-07-30

6. Significance of Computed Tomography and Serum Potassium in Predicting Subtype Diagnosis of Primary Aldosteronism.

Authors: Hironobu Umakoshi; Mika Tsuiki; Yoshiyu Takeda; Isao Kurihara; Hiroshi Itoh; Takuyuki Katabami; Takamasa Ichijo; Norio Wada; Takanobu Yoshimoto; Yoshihiro Ogawa; Junji Kawashima; Masakatsu Sone; Nobuya Inagaki; Katsutoshi Takahashi; Minemori Watanabe; Yuichi Matsuda; Hiroki Kobayashi; Hirotaka Shibata; Kohei Kamemura; Michio Otsuki; Yuichi Fujii; Koichi Yamamto; Atsushi Ogo; Toshihiko Yanase; Tomoko Suzuki; Mitsuhide Naruse
Journal: J Clin Endocrinol Metab Date: 2018-03-01

7. Outcomes of a Specialized Clinic on Rates of Investigation and Treatment of Primary Aldosteronism.

Authors: Yuan-Yuan Liu; James King; Gregory A Kline; Raj S Padwal; Janice L Pasieka; Guanmin Chen; Benny So; Adrian Harvey; Alex Chin; Alexander A Leung
Journal: JAMA Surg Date: 2021-06-01

8. Identifying unilateral disease in Chinese patients with primary aldosteronism by using a modified prediction score.

Authors: Ying Zhang; Wenquan Niu; Fangfang Zheng; Hua Zhang; Wenlong Zhou; Zhoujun Shen; Jianzhong Xu; Xiaofeng Tang; Jin Zhang; Ping-Jin Gao; Ji-Guang Wang; Limin Zhu
Journal: J Hypertens Date: 2017-12

9. High-probability features of primary aldosteronism may obviate the need for confirmatory testing without increasing false-positive diagnoses.

Authors: Gregory A Kline; Janice L Pasieka; Adrian Harvey; Benny So; Val C Dias
Journal: J Clin Hypertens (Greenwich) Date: 2014-05-27

10. Association of Adrenal Venous Sampling With Outcomes in Primary Aldosteronism for Unilateral Adenomas.

Authors: Jessica W Thiesmeyer; Timothy M Ullmann; Alexia T Stamatiou; Jessica Limberg; Dessislava Stefanova; Toni Beninato; Brendan M Finnerty; Timothée Vignaud; Julie Leclerc; Thomas J Fahey; Laurent Brunaud; Eric Mirallie; Rasa Zarnegar
Journal: JAMA Surg Date: 2021-02-01
