Performance of Confirmatory Tests for Diagnosing Primary Aldosteronism: a Systematic Review and Meta-Analysis.

Alexander A Leung¹,², Christopher J Symonds¹, Gregory L Hundemer³, Paul E Ronksley², Diane L Lorenzetti², Janice L Pasieka⁴,⁵, Adrian Harvey⁴,⁵, Gregory A Kline¹.

Abstract

BACKGROUND: Confirmatory tests are recommended for diagnosing primary aldosteronism, but the supporting evidence is unclear.
METHODS: We searched Medline, EMBASE, and the Cochrane Central Register of Controlled Trials. Studies evaluating any guideline-recommended confirmatory test (ie, saline infusion test, salt loading test, fludrocortisone suppression test, and captopril challenge test), compared with a reference standard were included. The Quality Assessment of Diagnostic Accuracy Studies-2 tool was used to assess the risk of bias. Meta-analyses were conducted using hierarchical summary receiver operating characteristic models.
RESULTS: Fifty-five studies were included, comprising 26 studies (3654 participants) for the recumbent saline infusion test, 4 studies (633 participants) for the seated saline infusion test, 2 studies (99 participants) for the salt loading test, 7 studies (386 participants) for the fludrocortisone suppression test, and 25 studies (2585 participants) for the captopril challenge test. Risk of bias was high, affecting more than half of studies, and across all domains. Studies with case-control sampling overestimated accuracy by 7-fold (relative diagnostic odds ratio, 7.26 [95% CI, 2.46-21.43]) and partial verification or use of inconsistent reference standards overestimated accuracy by 5-fold (5.12 [95% CI, 1.48-17.77]). There were large variations in how confirmatory tests were conducted, interpreted, and verified. Under most scenarios, confirmatory testing resulted in an excess of missed cases. The certainty of evidence underlying each test (Grading of Recommendations, Assessment, Development, and Evaluations) was very low.
CONCLUSIONS: Recommendations for confirmatory testing in patients with abnormal screening tests and high probability features of primary aldosteronism are based on very low-quality evidence and their routine use should be reconsidered.

Keywords: aldosterone; captopril; fludrocortisone; phenotype; prevalence

Substances: Captopril; Fludrocortisone

Year: 2022 PMID: 35652330 PMCID: PMC9278709 DOI: 10.1161/HYPERTENSIONAHA.122.19377

Source DB: PubMed Journal: Hypertension ISSN: 0194-911X Impact factor: 9.897

There are large variations in how confirmatory tests are conducted, interpreted, and verified. Verification bias and spectrum bias are very common, leading to overestimation of test accuracy by 5- to 7-fold. Under most clinical scenarios, use of confirmatory tests results in an excess of missed cases, such that many patients with primary aldosteronism would be overlooked for targeted treatment. The use of confirmatory tests in patients with abnormal screening and high probability features of primary aldosteronism is based on very low-quality evidence. Current recommendations for confirmatory testing may need revision.

Primary aldosteronism (PA) is the most common cause of remediable hypertension,[1] yet <1% of affected people are diagnosed and treated.[2] PA poses a major public health problem, not only because of its high prevalence, but also due to the excess risk of cardiovascular, metabolic, and kidney disease if untreated.[3-5] In the absence of an extreme phenotype, it is recommended that at least one of 4 confirmatory tests (ie, saline infusion test [SIT], oral salt loading test [SLT], fludrocortisone suppression test [FST], or captopril challenge test [CCT]) be used to confirm PA in individuals with positive screening before proceeding to further investigations or treatment.[6] An elevated aldosterone following any of these tests is purported to be diagnostic of PA, whereas a suppressed aldosterone is believed to rule out disease.

Most studies evaluating confirmatory testing have been limited by case-control selection and inconsistent use of a gold standard for verification (ie, individuals with normal testing did not proceed to surgery or targeted medical therapy), scenarios that bias towards inflated diagnostic yields.[7,8] In light of these limitations, the purpose of this study was to assess the characteristics of confirmatory tests for PA and to interpret these in the context of study design and potential risks of bias.

Methods

The authors declare that all supporting data are available within the article.

Data Sources and Searches

This study was registered with PROSPERO (CRD42021258919). We searched Medline, EMBASE, and the Cochrane Central Register of Controlled Trials (inception to June 1, 2021; see Supplemental Material for detailed search strategy and methods).

Study Selection

Original studies evaluating any guideline-recommended confirmatory test for PA were eligible if they included comparison to a reference standard. Two reviewers independently screened titles and abstracts for eligibility and selected articles for further review if the study reported original data on confirmatory testing for PA. Full-text articles were reviewed in duplicate. Studies were selected for final inclusion if a 2×2 diagnostic accuracy table could be extracted (or reconstructed). Discrepancies were resolved by consensus.

Data Extraction and Quality Assessment

We collected information related to testing conditions, reference standards, and study design (eg, single-gate design where the entire sample was drawn from a single clinical population suspected to have PA versus 2- or multi-gate designs where cases and controls were sampled from 2 or more distinct source populations, such that cases were known or strongly suspected to have PA, but controls had an alternative diagnosis, like essential hypertension, or were healthy participants never at risk of having PA).[8] The Quality Assessment of Diagnostic Accuracy Studies-2 tool was used to evaluate the risk of bias and concerns of applicability.[9] Data extraction and assessment of study quality were performed in duplicate.

Statistical Analysis

We used coupled forest plots and the receiver operating characteristic (ROC) space to visualize variation between studies. Meta-analyses were conducted using hierarchical summary ROC models.[10] We explored for sources of heterogeneity using meta-regression, considering differences in methodological quality and clinical characteristics between studies.[11] To quantify differences, we calculated the relative diagnostic odds ratio, which is a summary measure of the relative accuracy.[7] Because summary statistics are only interpretable when studies share a similar threshold (but thresholds varied considerably in our current review), we estimated the sensitivities at discrete points on the summary ROC curve corresponding to the lower quartile, median, and upper quartile of the reported specificities to facilitate comparisons.[11] We calculated the number of missed cases and over-diagnosed cases per 1000 patients and presented these in a summary of findings table.[12,13] Please see Supplemental Material for details.
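As a rough illustration of the quantities named above, the following Python sketch derives sensitivity, specificity, and the diagnostic odds ratio (DOR) from a single 2×2 table, and forms a crude ratio of two DORs. It is not the hierarchical summary ROC model or the meta-regression used in this review; the class name, the 0.5 continuity correction for empty cells, and all counts are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical helper)."""
    tp: float  # true positives
    fp: float  # false positives
    fn: float  # false negatives
    tn: float  # true negatives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def diagnostic_odds_ratio(self) -> float:
        cells = (self.tp, self.fp, self.fn, self.tn)
        # Assumption: add 0.5 to every cell when any cell is empty,
        # a common continuity correction that avoids division by zero.
        if 0 in cells:
            tp, fp, fn, tn = (c + 0.5 for c in cells)
        else:
            tp, fp, fn, tn = cells
        return (tp * tn) / (fp * fn)


# Illustrative counts only; these are not data from any included study.
consecutive = TwoByTwo(tp=80, fp=20, fn=20, tn=80)
case_control = TwoByTwo(tp=95, fp=10, fn=5, tn=90)

# Crude ratio of the two DORs; values above 1 mean the second design
# reports higher apparent accuracy than the first.
crude_relative_dor = (
    case_control.diagnostic_odds_ratio() / consecutive.diagnostic_odds_ratio()
)
print(consecutive.diagnostic_odds_ratio())   # 16.0
print(case_control.diagnostic_odds_ratio())  # 171.0
print(crude_relative_dor)                    # 10.6875
```

A pooled relative DOR from meta-regression additionally accounts for between-study variation and for the correlation between sensitivity and specificity, which this two-study ratio ignores.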

Results

Included Studies

There were 55 studies included (Supplemental Material), comprising 26 studies (3654 participants) for the recumbent SIT,[14-42] 4 studies (633 participants) for the seated SIT,[34,35,41,43-45] 2 studies (99 participants) for the SLT,[46,47] 7 studies (386 participants) for the FST,[28,48-53] and 25 studies (2585 participants) for the CCT (see Supplemental Material).[24,26,29,32,33,37,44,54-74]

Quality Assessment

Risk of bias was high, affecting more than half of studies, and across all domains (Supplemental Material). Nearly half of studies (47.3%) had 2- or multi-gate designs, and half (50.9%) had unclear sampling or case-control selection of patients, such that confirmatory tests were applied to people who were never suspected of having PA, leading to a high risk of spectrum bias. Fewer than two-thirds of studies were prospective (61.8%), and interpretation of tests was commonly performed post hoc without blinding (72.7%). In the majority of studies (67.3%), a confirmatory test was used as a reference standard,[18,22-25,28,29,31,32,34,36-42,44,47,50,53,55-59,61,63-67,69-74] and in nearly a quarter (23.6%), the same test was used as part of its own reference standard.[23,28,29,31,36-39,41,42,53,71,74] Few studies applied complete verification with the same reference standard to all participants (25.5%) and of those that did, only one study (1.8%) used an independent reference that was not contingent upon a confirmatory test (ie, adrenal vein sampling to identify unilateral aldosterone excess, and where it was assumed that patients without lateralization were negative).[30]

Indirect and Direct Comparisons

Sensitivities and specificities for each test were highly variable (Figure 1). There were visible differences in the ROC curves between tests (P=0.010 for global test; Figure 2). The FST appeared to have the best performance overall and its curve dominated across all specificities. There was considerable overlap between the curves corresponding to the recumbent SIT and CCT, suggesting comparable accuracies across testing thresholds. Few studies directly compared multiple confirmatory tests in the same patients against a common reference standard. When direct comparisons were made (Supplemental Material), the recumbent SIT was frequently more accurate than the CCT in the individual studies, but aggregate differences were not statistically significant (P=0.061).[24,29,32,33,37] The seated SIT appeared to be more accurate than the recumbent SIT in 2 studies[34,41]; other head-to-head comparisons were limited by few studies.[28,44]

Figure 1.

Coupled forest plots. CCT indicates captopril challenge test; FN, false negative; FP, false positive; FST, fludrocortisone suppression test; SIT, intravenous saline infusion test; SLT, oral salt loading test; TN, true negative; and TP, true positive.

Figure 2.

Summary receiver operating characteristics curves. The clear markers correspond to individual studies. The size of each marker reflects study size (with height proportional to the number diseased and the width with the number nondiseased). A summary curve could not be provided for the oral salt loading test because there were only 2 studies. CCT indicates captopril challenge test; FST, fludrocortisone suppression test; SIT, intravenous saline infusion test; and SLT, oral salt loading test.

Intravenous SIT

The SIT was examined according to recumbent and seated postures using indirect comparisons. The corresponding summary ROC curves were approximately symmetrical (P=0.061) with comparable accuracy (P=0.058), implying similar performance irrespective of posture across the range of observed thresholds. While the vast majority of studies used a similar protocol (ie, 2 L of 0.9% NaCl infused over 4 hours) with few exceptions,[18,19,21] there were large differences in diagnostic cut-offs (Supplemental Material). Comparisons were difficult as only 4 of 30 studies provided complete verification of all cases with a consistent reference standard,[22,28,30,40] usually with another confirmatory test,[22,28,40] and with the remaining studies applying partial or differential verification.

Oral SLT

Only 2 studies evaluated the SLT and these used highly different protocols.[46,47] The SLT was considered diagnostic of PA when the urinary aldosterone was >5 µg/d (13.9 nmol/d) in one study and >13 µg/d (36.0 nmol/d) in the other.[46,47] Using these cut-offs, the SLT had poor specificity (20% [95% CI, 10%–35%] to 50% [95% CI, 25%–75%]) and moderate to high sensitivity (85% [95% CI, 68%–95%] to 100% [95% CI, 48%–100%]). The verification standard was inconsistent in the first study[46] and based on the recumbent SIT in the second (using a low plasma aldosterone threshold of >100 pmol/L [3.6 ng/dL] to define disease).[47]

Fludrocortisone Suppression Test

There were significant differences in the dosing of fludrocortisone, ranging from 0.4[28,49,50,53] to 1.2 mg/d[48,51] for 3 to 4 days and with large variations in diagnostic cut-offs (ie, plasma aldosterone from 3.0 ng/dL [83 pmol/L] to 12.6 ng/dL [350 pmol/L]). Only one study described dexamethasone co-administration.[53] In most studies, inconsistent reference standards were used[48-51,53]; in some, the FST was both the index test and part of its own reference standard.[28,53] Complete verification with an independent reference was only provided in one of the 7 studies,[52] which also reported the lowest sensitivity (68% [95% CI, 51%–82%]) and specificity (90% [95% CI, 76%–97%]).

Captopril Challenge Test

There were very large variations in captopril dosage, timing of blood collection, and diagnostic thresholds. In one study, all participants had PA, such that the false positive and true negative rates could not be determined.[73] Complete verification of disease status with the same reference standard was only performed in 8 of the 25 studies, and in every case, another confirmatory test was the reference standard.[57,58,63,65,66,69,70,72,73] The remaining 17 studies were limited by partial verification and the use of inconsistent standards, usually based on selective testing with another confirmatory test, but in some studies, the reference standard was based on the results of the CCT itself.[29,37,67,74]

Meta-regression Analysis

In addition to differences in testing protocols and thresholds, other major drivers of heterogeneity were related to study quality (Figure 3). Studies with case-control sampling overestimated test accuracy by 7-fold compared with those enrolling consecutive or randomly selected patients in whom there was diagnostic uncertainty (relative diagnostic odds ratio, 7.26 [95% CI, 2.46–21.43]). Similarly, the use of 2- or multi-gate designs (eg, inclusion of patients never suspected to have PA) was associated with a 4-fold overestimation of the diagnostic odds ratio (3.92 [95% CI, 1.27–12.05]), while partial verification or use of inconsistent reference standards resulted in a 5-fold overestimation (5.12 [95% CI, 1.48–17.77]). Post hoc interpretation of test results with knowledge of disease status potentially overestimated accuracy by 3-fold (3.32 [95% CI, 0.94–11.79]). Apart from disease prevalence, none of the clinical characteristics examined were associated with changes in test accuracy. For each test, the direction and magnitude of effects were consistent with the overall estimates, though meta-regression was sometimes underpowered owing to the small number of studies (Supplemental Material).

Figure 3.

Meta-regression analysis for diagnostic test accuracy variability. Relative diagnostic odds ratios (DOR) with 95% CI according to the main study characteristics. The reference category for all comparisons was the absence of the characteristic. PA indicates primary aldosteronism.

Publication Bias

Deeks’ funnel plot appeared asymmetrical for the recumbent SIT, FST, and CCT, suggesting publication bias, but statistical tests were nonsignificant (Supplemental Material).

Clinical Implications

Each test was applied to a hypothetical cohort of 1000 patients to help contextualize the performance under different conditions (Table). With a fixed specificity of 95%, the number of missed cases (ie, patients with PA but normal results and therefore overlooked for opportunities for targeted treatment) exceeded the number over-diagnosed (ie, people without PA but who would potentially undergo unnecessary adrenal vein sampling) when disease prevalence ranged from 30% to 70% following positive aldosterone-to-renin ratio (ARR) screening. This was true for every test, except the FST when disease prevalence was the lowest; in this latter scenario, the numbers of false negatives and false positives were approximately equal. Otherwise, under most scenarios, missed cases were often several fold higher than those over-diagnosed, except when test specificity was low. Finally, owing to serious concerns related to the risks of bias, indirectness, inconsistency, and imprecision across the body of evidence, the certainty of evidence for each confirmatory test was graded very low.
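The arithmetic behind a per-1000 summary of this kind can be sketched in a few lines of Python. The specificity of 95% and the prevalence range of 30% to 70% come from the text; the sensitivity of 80% is a hypothetical value chosen only to show the calculation and is not an estimate from this review. The function name is likewise an assumption.

```python
def misclassified_per_cohort(prevalence, sensitivity, specificity, cohort=1000):
    """Return (missed cases, over-diagnosed cases) in a cohort of the given size."""
    diseased = cohort * prevalence
    non_diseased = cohort - diseased
    missed = diseased * (1 - sensitivity)              # false negatives
    over_diagnosed = non_diseased * (1 - specificity)  # false positives
    return round(missed), round(over_diagnosed)


# Hypothetical sensitivity of 80%; specificity fixed at 95% as in the text.
for prevalence in (0.30, 0.50, 0.70):
    missed, over = misclassified_per_cohort(prevalence, 0.80, 0.95)
    print(f"prevalence {prevalence:.0%}: {missed} missed, {over} over-diagnosed")
```

Under these assumed inputs, missed cases outnumber over-diagnosed cases at every prevalence shown, and the gap widens as prevalence rises, because false negatives scale with the number of diseased patients while false positives scale with the number without disease.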

Discussion

We found large variations in how confirmatory tests for PA were conducted, interpreted, and verified, along with global concerns about study quality that limit their application in clinical practice. Spectrum bias (ie, generated by selection of cases and controls) and verification bias (ie, using different verification standards for positive and negative results) posed the greatest threats to study validity. There were almost no studies that completely verified disease status with a valid reference standard that was not itself a confirmatory test. It was impossible to distinguish a single best test or to produce meaningful summary sensitivities or specificities with any certainty. Therefore, the general reliance on historical studies to inform confirmatory testing is highly problematic.

Previous reviews did not consider the impact of study quality on test performance, but rather focused on clinical characteristics, and therefore, were unable to identify major sources of statistical heterogeneity.[75,76] Moreover, these only included a subset of available studies and did not use recommended meta-analytic techniques to account for correlation between performance measures.[11] In contrast, we found that diagnostic accuracy was highly dependent on study design. Indeed, study quality is well-recognized to impact estimates of association and failure to incorporate quality assessments in the analysis can dramatically distort the results of any review.[7,77,78] Our results closely align with the work by Lijmer et al[7] who also showed that case-control selection, use of different reference standards, and absence of blinding were the most important factors leading to exaggerated test accuracy. To frame the magnitude of overestimation in practical terms, in our review, an influential study that used different reference standards (ie, composite standards with different ways of ascertaining the presence of disease) would on average overstate the diagnostic odds ratio by 5-fold. Supposing a test had a fixed specificity of 90%, this would be equivalent to reporting a sensitivity of 90% when in reality the sensitivity should be 64%.

In light of this, perhaps it is time to reconsider the long tradition of confirmatory testing given the paucity of empirical evidence supporting its use. Contrary to classic dogma, many patients with PA can have suppressed aldosterone concentrations well below what is commonly believed to be possible with this condition (eg, under 5.0 ng/dL [140 pmol/L]), independent of medication effect, hypokalemia, circadian timing, or postural variation.[30,79-82] Furthermore, testing can be dangerous (eg, customary drug washout can provoke symptomatic hypertension; volume expansion can lead to fluid overload or severe hypokalemia).[83-85] As we have shown, the added value of confirmatory testing is minimal when there is a high pretest probability of PA, as positive tests only prove what was likely already known. Conversely, normal results modestly rule out disease but with a high rate of false negative misclassification. Instead, relying on the basic ARR in combination with clinical characteristics (eg, multidrug hypertension, hypokalemia, and presence of adrenal nodule) appears to be sufficiently accurate for diagnosing PA in most instances,[86,87] even without a subsequent confirmatory test. This approach has been safely adopted by a number of centers with high rates of treatment success without inordinate risks of performing unnecessary procedures.[88,89]

There were many strengths of our study (ie, inclusion of more than double the number of studies compared with previous reviews; robust analyses; identification of the major reasons for between-study differences; and standardized grading of the evidence), but there were some limitations. First, there was no universal definition of PA, so each study may have measured a slightly different construct. Part of the challenge lies in changing definitions of disease with increased recognition that PA is a continuous spectrum, such that any dichotomous classification is arbitrary.[1] Second, it was impossible to include some studies because the 2×2 table could not be reconstructed (eg, AQUARR),[90] but their inclusion would not likely have affected the overall findings, nor resolved the observed heterogeneity. Conversely, the inclusion of a large number of diverse studies was also the reason we observed large amounts of heterogeneity due to significant differences in testing protocols, interpretation criteria, and populations between studies, thus limiting the ability to pool and compare results. Third, PA was commonly defined by surrogate reference standards, and therefore, misclassification was not only possible, but expected.[12] Patients with PA commonly have abnormal responses to one test, but not another.[91] Admittedly, there is no perfect gold standard for diagnosing PA, and only one study had complete verification with an independent reference standard that did not include a confirmatory test itself.[30] Addressing this, there is an ongoing trial assessing the SIT in consecutive patients suspected to have PA with complete verification using targeted treatment response as a reference standard (URL: https://www.clinicaltrials.gov; Unique identifier: NCT04422756).
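The worked example in the Discussion (a reported sensitivity of 90% corresponding to a true sensitivity of about 64% at a fixed specificity of 90%) can be checked with a short Python sketch. It relies only on the standard definition of the diagnostic odds ratio as the product of the sensitivity odds and the specificity odds; the function names are assumptions introduced for illustration.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def diagnostic_odds_ratio(sensitivity, specificity):
    # DOR = [sens / (1 - sens)] * [spec / (1 - spec)]
    return odds(sensitivity) * odds(specificity)


def sensitivity_at(dor, specificity):
    """Sensitivity implied by a given DOR when specificity is held fixed."""
    sens_odds = dor / odds(specificity)
    return sens_odds / (1 + sens_odds)


reported_dor = diagnostic_odds_ratio(0.90, 0.90)  # about 81

# Deflate the reported DOR by the rounded 5-fold overestimation and by the
# point estimate of 5.12 reported for inconsistent reference standards.
for inflation in (5.0, 5.12):
    corrected = sensitivity_at(reported_dor / inflation, 0.90)
    print(f"{inflation}-fold: sensitivity {corrected:.2f}")  # 0.64 in both cases
```

Both the rounded factor and the point estimate give a corrected sensitivity that rounds to 64%, matching the figure quoted in the text.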

Perspectives

Current recommendations for confirmatory testing in patients who have high probability features of PA are based on very low-quality evidence. The implication of our findings is that improvements in care may be realized by forgoing routine confirmatory tests if these add little to the diagnostic work-up, but present an unnecessary barrier in an already lengthy and complex diagnostic-care pathway. A potential future pathway may be to rely on the ARR together with clinical/biochemical characteristics to diagnose most cases of PA with early introduction of empirical mineralocorticoid receptor antagonist treatment in the majority, while reserving adrenal vein sampling for those who are most likely to benefit from potential surgery. Given that only a small fraction of patients with PA are ever diagnosed and treated,[2] a paradigm shift is needed to meaningfully close care gaps and to improve clinical outcomes.

Article Information

Sources of Funding

This work was supported by the Canadian Institutes of Health Research (PJT-159533). A.A. Leung is supported by the Heart and Stroke Foundation of Canada’s National New Investigator Award. G.L. Hundemer is supported by the Kidney Research Scientist Core Education and National Training (KRESCENT) Program New Investigator Award (2019KP-NIA626990).

Disclosures

None.

Table.

Summary of Findings
