
Blinded sample size re-estimation in a comparative diagnostic accuracy study.

Maria Stark¹, Mailin Hesse², Werner Brannath³, Antonia Zapf⁴.

Abstract

BACKGROUND: The sample size calculation in a confirmatory diagnostic accuracy study is performed for co-primary endpoints because sensitivity and specificity are considered simultaneously. The initial sample size calculation in an unpaired and paired diagnostic study is based on assumptions about, among others, the prevalence of the disease and, in the paired design, the proportion of discordant test results between the experimental and the comparator test. The choice of the power for the individual endpoints impacts the sample size and overall power. Uncertain assumptions about the nuisance parameters can additionally affect the sample size.
METHODS: We develop an optimal sample size calculation considering co-primary endpoints to avoid an overpowered study in the unpaired and paired design. To adjust assumptions about the nuisance parameters during the study period, we introduce a blinded adaptive design for sample size re-estimation for the unpaired and the paired study design. A simulation study compares the adaptive design to the fixed design. For the paired design, the new approach is compared to an existing approach using an example study.
RESULTS: Due to blinding, the adaptive design does not inflate type I error rates. The adaptive design reaches the target power and re-estimates nuisance parameters without any relevant bias. Compared to the existing approach, the proposed methods lead to a smaller sample size.
CONCLUSIONS: We recommend the application of the optimal sample size calculation and a blinded adaptive design in a confirmatory diagnostic accuracy study. They compensate for inefficiencies of the sample size calculation and help to reach the study aim.


Keywords: Adaptive design; Co-primary endpoints; Paired design; Sensitivity; Specificity; Unpaired design


Year: 2022 PMID: 35439947 PMCID: PMC9019976 DOI: 10.1186/s12874-022-01564-2

Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.612

Background

In a diagnostic accuracy trial the experimental test is compared to the reference standard, which defines the true disease status. Either the evaluation is limited to the comparison with the reference standard (single-test design) or another test is considered in addition (comparative design) [1]. The present article puts the focus on comparative study designs in which the experimental test is compared to an already evaluated comparator test. In the unpaired design, either the experimental test or the comparator test is assigned randomly to study participants in addition to the reference standard [2]. In contrast, in the paired design, participants undergo all three diagnostic procedures [3]. Due to the within-subject comparison of the diagnostic tests in the paired design, the variability of the study results will be diminished [4]. For this reason, the paired design is preferred to the unpaired design if technically feasible and ethically justifiable [4]. Hence, the focus of this article is especially on the paired design. Figure 1 gives an overview about the different designs.

Fig. 1

Study designs of a confirmatory diagnostic accuracy trial

Independent of the chosen study design, sensitivity and specificity are used as co-primary endpoints in a confirmatory diagnostic accuracy trial [4, 5]. Both endpoints are combined via a joint hypothesis which is evaluated by the Intersection-Union Test [6, 7]. In this context, Stark et al. [8] developed an approach to calculate the sample size considering the prevalence. The advantage of this optimal sample size calculation is that it avoids an overpowered study, as is often the case with the conventional approach. We will extend this approach to the unpaired and paired comparative study design. Here, the study might aim to show superiority, non-inferiority or a combination of both regarding the co-primary endpoints. To adjust the sample size during the course of the study, an adaptive design can be applied. Zapf et al. [9] reveal that adaptive designs, including group-sequential designs, are hardly developed and rarely applied in diagnostic studies. Stark et al. [8] introduce a blinded adaptive design for sample size re-estimation in the single-test design. Focusing on comparative study designs, Mazumdar et al. [10] propose a group-sequential design, but restricted to the area under the receiver operating characteristic curve as endpoint. McCray et al. [11] developed a blinded sample size re-estimation procedure in the paired study design regarding sensitivity and specificity. Their approach is based on the re-estimation of the proportion of concordant test results and the prevalence. To further develop the approaches of McCray et al. [11] and Stark et al. [8], we transfer the blinded adaptive design in the single-test design using the optimal sample size calculation to both comparative study designs. 
Hence, novel aspects in the present work are first, the development of the optimal sample size calculation in the unpaired as well as paired design aiming to show superiority, non-inferiority or a combination of both regarding the co-primary endpoints and second, the implementation of a blinded-sample size re-estimation procedure in the unpaired and paired design based on the optimal sample size calculation. The present article is structured the following way: at first, we introduce the optimal sample size calculation in the unpaired and paired study design aiming to show superiority, non-inferiority or a combination of both. Second, we describe the procedure of the blinded sample size re-estimation in the unpaired and paired study design. Third, we compare the blinded adaptive design in a paired trial to the approach of McCray et al. [11] using an exemplary trial. Then, we present the results of a simulation study investigating the blinded adaptive design compared to a fixed design in an unpaired and paired study. Finally, we discuss the results and offer a conclusion.

Methods

Sample size calculation in a comparative diagnostic study

In this section, we introduce the optimal sample size calculation for a comparative diagnostic study, which Stark et al. [8] already developed for the single-test design. In a comparative diagnostic study, sensitivity and specificity of the experimental test can be tested for superiority, non-inferiority or the combination of superiority and non-inferiority against the comparator test. For the motivation and application of the optimal sample size calculation, we focus on the paired design testing for superiority regarding both endpoints because the paired design is the more relevant design in comparative studies [4]. However, the advantages of the optimal sample size calculation are also valid in the unpaired design. Furthermore, we provide formulas for the optimal approach in the unpaired and paired design. In confirmatory diagnostic studies, sensitivity and specificity are combined as co-primary endpoints via the Intersection-Union test [8]. The null hypothesis of the Intersection-Union test is the union of the individual null hypothesis regarding sensitivity and the individual null hypothesis regarding specificity [6]. The overall power of this Intersection-Union test is calculated as the product of the power of the individual tests. To show superiority of the experimental test regarding sensitivity and specificity against the comparator test, the global null hypothesis for equality is given by:

$$H_{0_{global}}: \; H_{0_{Se}}: Se_E - Se_C = 0 \;\cup\; H_{0_{Sp}}: Sp_E - Sp_C = 0$$

SeE and SpE denote the sensitivity and specificity of the experimental test. SeC and SpC represent the sensitivity and specificity of the comparator test. $H_{0_{global}}$ is only rejected if both $H_{0_{Se}}$ and $H_{0_{Sp}}$ are rejected simultaneously. Superiority of the experimental test regarding sensitivity and specificity against the comparator test can be concluded from point estimates and p-values or confidence intervals. Sensitivity and specificity are success probabilities of binomial distributions, whose estimators are asymptotically normal in large samples [12]. For the analysis based on confidence intervals, we propose to use approximate 100 · (1 − α)% confidence intervals for the difference of two proportions.
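The decision rule of the Intersection-Union test and the resulting overall power can be illustrated with a minimal Python sketch. The function names, and the use of two-sided p-values together with the sign of the estimated difference, are assumptions made for illustration. The product rule for the power relies on sensitivity and specificity being estimated in disjoint groups (diseased and non-diseased individuals).

```python
def iut_rejects(diff_se, p_se, diff_sp, p_sp, alpha=0.05):
    """Reject the global null hypothesis only if both endpoints show superiority.

    diff_* are estimated differences (experimental minus comparator),
    p_* are the two-sided p-values of the individual tests.
    """
    se_superior = diff_se > 0 and p_se < alpha
    sp_superior = diff_sp > 0 and p_sp < alpha
    return se_superior and sp_superior


def overall_power(power_se, power_sp):
    """Overall power of the Intersection-Union test as product of the endpoint powers."""
    return power_se * power_sp


# Hypothetical results: sensitivity is significant, specificity is not.
print(iut_rejects(0.09, 0.01, 0.14, 0.20))  # global null hypothesis is not rejected
print(overall_power(0.9, 0.9))              # roughly 0.81
```

The second call shows why planning each endpoint with 90% power corresponds to an overall power of roughly 80%.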

Conventional sample size calculation

To motivate the advantage of the optimal sample size calculation, we show the problems related to the conventional sample size calculation in a confirmatory diagnostic study in the context of the paired design. The conventional sample size calculation consists of three steps: calculate the required number of diseased and non-diseased individuals, relate these numbers to the prevalence to obtain the total numbers needed to assess sensitivity and specificity, and choose the maximum of both as the final sample size [13-15]. We now perform these three steps for a paired diagnostic study mentioned in McCray et al. [11]. The example study compares the experimental combination of Positron Emission Tomography (PET) and computed tomography (CT) against CT alone to diagnose pancreatic cancer. The goal is to show superiority of the experimental test against the comparator test. The biopsy defines the true disease status. Table 1 shows the assumptions for sample size calculation used in this example. The disease prevalence π represents the proportion of diseased individuals among all individuals. Parameters ψD and ψND denote the proportions of discordant test results in the diseased and non-diseased population, that is, the proportions of individuals for whom both diagnostic tests lead to different test results. The conventional approach plans the sample size for each endpoint with a power of 90%, which theoretically leads to an overall target power of approximately 80% as the product of both. The significance level α is set to 5% per endpoint. The 1 − α/2 and 1 − β quantiles of the standard normal distribution are denoted by $z_{1-\alpha/2}$ and $z_{1-\beta}$. The individual steps are as follows:

Table 1

Assumptions of the paired diagnostic accuracy trial for the comparison of the experimental Positron Emission Tomography (PET) combined with computed tomography (CT) against the comparator test CT

General input parameters:
Significance level per endpoint: α = 0.05 (two-sided)
Overall power: Power_overall = 1 − β_overall = 0.8
Power per endpoint: Power_Se = Power_Sp = 1 − β_Se = 1 − β_Sp = 0.9
Prevalence: π = 0.47
	Comparator test (CT)	Experimental test (PET/CT)	Proportion of discordant test results
Diseased population	Se_C = 0.81	Se_E = 0.90	ψ_D = 0.09
Non-diseased population	Sp_C = 0.66	Sp_E = 0.80	ψ_ND = 0.14

Sample size of diseased individuals based on the formula of Miettinen et al. [16]:

$$n_{Se} = \frac{\left[z_{1-\alpha/2}\,\psi_D + z_{1-\beta}\sqrt{\psi_D^2 - \tfrac{1}{4}(Se_C - Se_E)^2(3+\psi_D)}\right]^2}{\psi_D\,(Se_C - Se_E)^2}$$

Sample size of non-diseased individuals:

$$n_{Sp} = \frac{\left[z_{1-\alpha/2}\,\psi_{ND} + z_{1-\beta}\sqrt{\psi_{ND}^2 - \tfrac{1}{4}(Sp_C - Sp_E)^2(3+\psi_{ND})}\right]^2}{\psi_{ND}\,(Sp_C - Sp_E)^2}$$

Total sample size including at least n_Se diseased individuals:

$$N_{Se} = \frac{n_{Se}}{\pi}$$

Total sample size including at least n_Sp non-diseased individuals:

$$N_{Sp} = \frac{n_{Sp}}{1-\pi}$$

The study recruits more individuals than would be necessary to show the specificity because the sensitivity determines the final sample size in this scenario. If the prevalence were smaller, the difference between N_Se and N_Sp would be even larger. Vice versa, if the prevalence were larger, N_Sp would determine the final sample size. These discrepancies between the sample sizes of both endpoints can result in an overpowered study. To address this problem, we propose the optimal sample size calculation explained in the next section.
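The three conventional steps can be traced with a short Python sketch using the assumptions of Table 1. The paired sample size formula coded here is a Miettinen-type formula for matched proportions and is an assumption of this sketch, as are the function names and the choice to round up at every step.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_paired(delta, psi, alpha=0.05, power=0.9):
    """Number of (non-)diseased individuals for a paired comparison.

    delta: assumed difference of the two proportions
    psi:   assumed proportion of discordant test results
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    root = sqrt(psi ** 2 - 0.25 * delta ** 2 * (3 + psi))
    return (z_a * psi + z_b * root) ** 2 / (psi * delta ** 2)


def conventional_sample_size(se_c, se_e, sp_c, sp_e, psi_d, psi_nd, prev,
                             alpha=0.05, power=0.9):
    # Step 1: required numbers of diseased and non-diseased individuals.
    n_se = ceil(n_paired(se_e - se_c, psi_d, alpha, power))
    n_sp = ceil(n_paired(sp_e - sp_c, psi_nd, alpha, power))
    # Step 2: relate both numbers to the prevalence.
    total_se = ceil(n_se / prev)
    total_sp = ceil(n_sp / (1 - prev))
    # Step 3: the larger total determines the final sample size.
    return {"n_se": n_se, "n_sp": n_sp,
            "N_se": total_se, "N_sp": total_sp,
            "N": max(total_se, total_sp)}


# Assumptions taken from Table 1.
result = conventional_sample_size(0.81, 0.90, 0.66, 0.80, 0.09, 0.14, 0.47)
print(result)
```

Under these assumptions the total driven by sensitivity is clearly larger than the one driven by specificity, which is the imbalance the optimal approach is meant to remove.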

Optimal sample size calculation

At first, we present the general idea of the optimal sample size calculation. Then, we extend the optimal sample size calculation in the single-test design developed by Stark et al. [8] to an unpaired and paired study. Furthermore, we provide formulas for testing superiority regarding both endpoints in the unpaired and paired design. In the additional materials, we show hypotheses and sample size formulas for testing non-inferiority or combinations of superiority and non-inferiority [see Additional file 1]. We also offer R code for the optimal sample size calculation considering superiority in both endpoints [see Additional file 2]. The general idea behind the optimal sample size calculation is to split the overall power (Power_overall) individually between both endpoints, so that N_Se and N_Sp are equal. In this case, we do not need to select a maximum from both sample sizes. Consequently, the final sample size is the smallest representative sample which allows the desired overall power to be reached. We calculate the final sample size with the following equation, in which the symbol “$\overset{!}{=}$” denotes that the terms on both sides must be equal:

$$N = \frac{n_{Se}}{\pi} \overset{!}{=} \frac{n_{Sp}}{1-\pi}$$

Under the condition:

$$1-\beta_{Sp} = \frac{1-\beta_{overall}}{1-\beta_{Se}}$$

In the following subsections, we plug the condition into the sample size calculation, noting that the resulting equations cannot be solved analytically with respect to β_Se.

Unpaired design

In the unpaired design, the optimal sample size calculation uses the formula for the comparison of two independent proportions following Zhou et al. [1], in which both sides are required to be equal:

$$N = \frac{\left[z_{1-\alpha/2}\sqrt{V_0(Se_C - Se_E)} + z_{1-\beta_{Se}}\sqrt{V(Se_C - Se_E)}\right]^2}{(Se_C - Se_E)^2\,\pi} \overset{!}{=} \frac{\left[z_{1-\alpha/2}\sqrt{V_0(Sp_C - Sp_E)} + z_{1-\beta_{Sp}}\sqrt{V(Sp_C - Sp_E)}\right]^2}{(Sp_C - Sp_E)^2\,(1-\pi)} \qquad (7)$$

where V0(SeC − SeE) and V(SeC − SeE) represent the variance of the difference between SeC and SeE under the null and alternative hypothesis, respectively. In the unpaired design, the variance V(SeC − SeE) is defined as [1]:

$$V(Se_C - Se_E) = Se_C(1 - Se_C) + Se_E(1 - Se_E)$$

The variance V(SpC − SpE) is calculated in analogy. Although the sample size formula in Eq. (7) corresponds to the Wald confidence interval for the difference of two independent proportions, we propose to analyse the unpaired design with the two-sided 1 − α Score confidence interval for the difference of two independent proportions [17]. The coverage probability of the Score confidence interval is closer to the nominal level compared to the Wald confidence interval [18-20].

Paired design

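In the paired design, the optimal sample size is based on the formula of Miettinen et al. [16], in which both sides are required to be equal:

$$N(\psi_D, \psi_{ND}) = \frac{\left[z_{1-\alpha/2}\,\psi_D + z_{1-\beta_{Se}}\sqrt{\psi_D^2 - \tfrac{1}{4}(Se_C - Se_E)^2(3+\psi_D)}\right]^2}{\psi_D\,(Se_C - Se_E)^2\,\pi} \overset{!}{=} \frac{\left[z_{1-\alpha/2}\,\psi_{ND} + z_{1-\beta_{Sp}}\sqrt{\psi_{ND}^2 - \tfrac{1}{4}(Sp_C - Sp_E)^2(3+\psi_{ND})}\right]^2}{\psi_{ND}\,(Sp_C - Sp_E)^2\,(1-\pi)} \qquad (9)$$

with ψ_D as the proportion of discordant test results in the diseased sample, which is bounded from below and above [16, 21] (Eq. (10)); its lower bound is $\psi_{D_{min}} = |Se_C - Se_E|$. The interval of the proportion of discordant test results in the non-diseased sample ψ_ND is calculated in analogy by considering SpC and SpE. For two different proportions of discordant test results in the diseased ($\psi_{D_1} \le \psi_{D_2}$) and non-diseased ($\psi_{ND_1} \le \psi_{ND_2}$) population, the total sample size N(ψ_D, ψ_ND) in Eq. (9) is monotone increasing:

$$N(\psi_{D_1}, \psi_{ND_1}) \le N(\psi_{D_2}, \psi_{ND_2})$$

In analogy to the unpaired design, we propose to analyse the paired design with the two-sided 1 − α Tango’s asymptotic score confidence interval for the difference of two matched proportions [22, 23]. We recommend this for the reason given above. Furthermore, the Wald confidence interval is not range preserving [24].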
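Since the power split cannot be obtained in closed form, a numerical root search is needed. The following Python sketch uses plain bisection on the power for sensitivity. The Miettinen-type formula as coded, the bisection itself and all function names are assumptions of this sketch and are not taken from the authors' R code in Additional file 2.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_paired(delta, psi, power, alpha=0.05):
    """Paired sample size (diseased or non-diseased) for a given endpoint power."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    root = sqrt(psi ** 2 - 0.25 * delta ** 2 * (3 + psi))
    return (z_a * psi + z_b * root) ** 2 / (psi * delta ** 2)


def optimal_sample_size(se_c, se_e, sp_c, sp_e, psi_d, psi_nd, prev,
                        overall_power=0.8, alpha=0.05):
    """Split the overall power so that both endpoints need the same total N."""

    def gap(power_se):
        # Condition: power_se * power_sp = overall_power.
        power_sp = overall_power / power_se
        total_se = n_paired(se_e - se_c, psi_d, power_se, alpha) / prev
        total_sp = n_paired(sp_e - sp_c, psi_nd, power_sp, alpha) / (1 - prev)
        return total_se - total_sp

    # gap() increases in power_se: negative near overall_power, positive near 1.
    low, high = overall_power + 1e-9, 1 - 1e-9
    for _ in range(200):
        mid = (low + high) / 2
        if gap(mid) < 0:
            low = mid
        else:
            high = mid
    power_se = (low + high) / 2
    power_sp = overall_power / power_se
    total = n_paired(se_e - se_c, psi_d, power_se, alpha) / prev
    return ceil(total), power_se, power_sp


# Assumptions of the example study (Table 1).
N, power_se, power_sp = optimal_sample_size(0.81, 0.90, 0.66, 0.80,
                                            0.09, 0.14, 0.47)
print(N, round(power_se, 3), round(power_sp, 3))
```

For the Table 1 assumptions this sketch returns a total of 133 individuals, with nearly all of the power loss allotted to sensitivity and a power close to 1 for specificity.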

Application of the optimal sample size calculation in the paired design

We apply the optimal sample size approach to the example study introduced in Table 1 and compare the results to those of the conventional approach. For this purpose, we simulate, based on 10,000 simulation runs, the empirical power of both approaches for a varying prevalence π and calculate the sample size. Figure 2 shows the results. In most cases, the conventional approach is highly overpowered due to the choice of the maximum sample size of both endpoints in the third step. If the prevalence is in the range between 0.5 and 0.75, the empirical power will be closer to the target power of 80%. The empirical power will be the closest to the target power, if the prevalence equals 0.6 as the discrepancy between N Se and N Sp is the smallest.

Fig. 2

Empirical power and sample size of the conventional and optimal sample size calculation. Simulations are based on the example study given in Table 1 with a varying prevalence π. The figure considering the sample size contains an enlarged image section so that the differences between both approaches are highlighted.

The optimal approach splits the overall power to both endpoints depending on the prevalence, so that the product of the empirical power of both endpoints comes close to the target power of 80%. Considering the sample size, the optimal approach will lead to a smaller sample size than the conventional approach if the prevalence is unbalanced. Figure 2 contains an enlarged image section of the sample size so that the differences between both approaches are highlighted.

Blinded sample size re-estimation

The procedure of a blinded sample size adjustment based on the re-estimation of nuisance parameters basically follows five phases named by Stark et al. [8]. In Fig. 3, these five steps are explained in the context of the unpaired and paired study design. The nuisance parameters re-estimated during the study are the prevalence and, in the paired design, additionally the proportions of discordant test results. The main difference between the adaptive designs in the unpaired and paired study design is the sample size for the interim analysis. In the unpaired design, the prevalence is estimated based on 50% of the initially calculated sample size. In the paired design, both the initial sample size and the sample size for the interim analysis equal the minimal sample size [11]. The minimal sample size is obtained with the minimal possible proportion of discordant test results in the diseased ($\psi_{D_{min}}$) and non-diseased population ($\psi_{ND_{min}}$). Assumptions about the sensitivity and the specificity of the comparator and experimental test determine the minimal possible proportion of discordant test results. Following Eq. (10), the minimal proportions of discordant test results are calculated with:

$$\psi_{D_{min}} = |Se_C - Se_E|, \qquad \psi_{ND_{min}} = |Sp_C - Sp_E|$$
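The role of the minimal discordance can be checked numerically: the smaller the proportion of discordant results, the smaller the required number of individuals, so planning with the minimal proportion gives the minimal sample size. The Python sketch below assumes a Miettinen-type paired formula and a fixed endpoint power of 90%; the function names and the grid of discordance values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def psi_min(acc_c, acc_e):
    """Smallest possible proportion of discordant results for two accuracies."""
    return abs(acc_c - acc_e)


def n_paired(delta, psi, alpha=0.05, power=0.9):
    """Paired sample size for difference delta and discordance psi."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    root = sqrt(psi ** 2 - 0.25 * delta ** 2 * (3 + psi))
    return (z_a * psi + z_b * root) ** 2 / (psi * delta ** 2)


# Sensitivities assumed in the example study: Se_C = 0.81, Se_E = 0.90.
lower = psi_min(0.81, 0.90)
# Hypothetical grid of discordance values starting at the minimum.
grid = [lower + 0.02 * k for k in range(6)]
sizes = [n_paired(0.09, psi) for psi in grid]

for psi, n in zip(grid, sizes):
    print(f"psi_D = {psi:.2f}: n_Se = {n:.1f}")
```

The printed values increase along the grid, which is why an interim analysis at the minimal sample size can only lead to an upward adjustment when the discordance turns out larger than assumed.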

Fig. 3

Procedure of the blinded adaptive design in an unpaired and paired diagnostic trial

Furthermore, the calculation of the minimal sample size requires assumptions about the prevalence. During interim analysis, the prevalence is estimated by the maximum likelihood estimator of a binomial proportion [25]:

$$\hat{\pi} = \frac{n_D}{n}$$

The number of diseased individuals involved in the interim analysis is represented by n_D, and the sample size used for interim analysis is denoted by n. In analogy, the proportion of discordant test results is estimated by the maximum likelihood estimator of a multinomial distribution [26]:

$$\hat{\psi}_D = \frac{n_{D_{10}} + n_{D_{01}}}{n_D}, \qquad \hat{\psi}_{ND} = \frac{n_{ND_{10}} + n_{ND_{01}}}{n_{ND}}$$

Table 2 shows the parameters needed to re-estimate the proportions of discordant test results.

Table 2

Results in a paired diagnostic study

Diseased n_D
		Comparator Test
		True Positive (TP_C)	False Negative (FN_C)
Experimental Test	True Positive (TP_E)	n_D11	n_D10
Experimental Test	False Negative (FN_E)	n_D01	n_D00
Non-diseased n_ND
		Comparator Test
		False Positive (FP_C)	True Negative (TN_C)
Experimental Test	False Positive (FP_E)	n_ND11	n_ND10
Experimental Test	True Negative (TN_E)	n_ND01	n_ND00

The estimation of nuisance parameters represents a blinded adaptive design because the sensitivity and the specificity of the experimental test are not revealed. Hence, the type I error rate will not be inflated by definition.
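The interim estimators only need the group sizes and the number of discordant pairs per group, not which of the two tests was correct. A minimal Python sketch with hypothetical interim counts (not data from the example study) and hypothetical function and argument names:

```python
def blinded_estimates(n_d, n_nd, discordant_d, discordant_nd):
    """Maximum likelihood estimates of the nuisance parameters at interim.

    n_d, n_nd:      numbers of diseased and non-diseased individuals
    discordant_d:   discordant pairs among the diseased (n_D10 + n_D01)
    discordant_nd:  discordant pairs among the non-diseased (n_ND10 + n_ND01)
    """
    if n_d <= 0 or n_nd <= 0:
        raise ValueError("both groups must contain at least one individual")
    if not (0 <= discordant_d <= n_d and 0 <= discordant_nd <= n_nd):
        raise ValueError("discordant counts must lie between 0 and the group size")
    n = n_d + n_nd
    return {
        "prevalence": n_d / n,
        "psi_d": discordant_d / n_d,
        "psi_nd": discordant_nd / n_nd,
    }


# Hypothetical interim data: 50 diseased, 75 non-diseased individuals.
estimates = blinded_estimates(n_d=50, n_nd=75, discordant_d=5, discordant_nd=12)
print(estimates)
```

Because only the sums n_D10 + n_D01 and n_ND10 + n_ND01 enter, the direction of the disagreement between the tests stays hidden at the interim analysis.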

Results

Application of the blinded sample size re-estimation in the example study

This section serves for illustration of the blinded sample size re-estimation in the paired study design. For this purpose, we compare the approach of McCray et al. [11] to the adaptive design procedure described in this article by taking up the example of a paired diagnostic accuracy study already introduced in Table 1. The main progress of our new approach compared to McCray et al. [11] is to implement the optimal sample size calculation. We reveal the advantage of the optimal sample size calculation in this context again. Table 3 compares the theoretical aspects and the results of both adaptive design procedures. They differ in the definition of endpoints, hypothesis and in the way the sample size calculation is performed. McCray et al. [11] work with the quotient of sensitivities and the quotient of specificities of both diagnostic tests as endpoints. They use sample size formulas which rely on the true-positive-positive rate (TPPR) and true-negative-negative-rate (TNNR) [27]. TPPR denotes the proportion of test results in which both, the comparator test and the experimental test correctly diagnose a diseased individual. Vice versa, TNNR represents the proportion of test results in which both tests correctly return a negative test result. For initial sample size calculation, TPPRmax and TNNRmax are used, which represent the maximal possible TPPR and TNNR, respectively.

Table 3

Comparison of the blinded adaptive design procedure with McCray et al. [11]

		McCray et al. (2017)	Our approach
General information	Endpoint	Se_E / Se_C and Sp_E / Sp_C	Se_E − Se_C and Sp_E − Sp_C
	H0_global	H0_Se: Se_E / Se_C = 1 ∪ H0_Sp: Sp_E / Sp_C = 1	H0_Se: Se_E − Se_C = 0 ∪ H0_Sp: Sp_E − Sp_C = 0
	Sample size calculation	Conventional approach; α per endpoint: 0.05 (two-sided); power per endpoint: 0.8	Optimal approach; α per endpoint: 0.05 (two-sided); overall power: 0.8
	Parameter of dependency between both tests	TPPR = n_D11 / n_D; TNNR = n_ND00 / n_ND	ψ_D = (n_D10 + n_D01) / n_D; ψ_ND = (n_ND10 + n_ND01) / n_ND
Initial sample size calculation	Size of internal pilot study	TPPR_max and TNNR_max correspond to ψ_Dmin and ψ_NDmin
	Parameter of dependency between both tests for initial sample size calculation	TPPR_max = Se_C = 0.81; TNNR_max = Sp_C = 0.66	ψ_Dmin = |Se_C − Se_E| = 0.09; ψ_NDmin = |Sp_C − Sp_E| = 0.14
	Initial sample size, size of internal pilot study	186	133
Sample size re-estimation	Estimation of nuisance parameters	$\hat{\pi}$ = 0.44; $\widehat{TPPR}$ = 0.80; $\widehat{TNNR}$ = 0.66	$\hat{\pi}$ = 0.44; $\hat{\psi}_D$ = 0.11; $\hat{\psi}_{ND}$ = 0.14
	Re-estimated sample size	242	200

McCray et al. [11] perform the sample size calculation based on the conventional three steps, planning with a power of 80% per endpoint. This leads to a theoretical overall power of 64%. In contrast to McCray et al. [11], our approach uses the optimal sample size calculation. It is based on sample size formulas considering the difference of sensitivities and the proportion of discordant test results in the diseased population or the difference of specificities of both tests and the proportion of discordant test results in the non-diseased population, respectively [1]. In contrast to McCray et al. [11], we choose the differences as endpoint measure because the guideline on clinical evaluation of diagnostic agents suggests this [4]. Furthermore, we perform the optimal sample size calculation to reach an overall power of 80%. Table 3 shows the initial sample size, the sample size for interim analysis and the re-estimated sample size of both adaptive design procedures. Due to the optimal approach, sample sizes resulting from our adaptive design are lower than those of McCray et al. [11]. The optimal sample size calculation prevents either co-primary endpoint from being overpowered, which leads to smaller sample sizes. The difference between both approaches regarding sample sizes will be even larger if the prevalence is unbalanced. 
A figure in additional materials, which depicts the simulated empirical overall power based on 10,000 simulations runs and the calculated sample size, illustrates this difference between both approaches for the initial sample size calculation based on and by varying π [see Additional file 3]. This figure reveals that the approach of McCray et al [11]. is highly overpowered although they plan with a power of 80% per endpoint. This theoretically leads to a theoretical overall power of 64%. In this example, the dependence between both diagnostic tests is almost maximal because ψ D and ψ ND are almost minimal. In this case, the underlying assumptions of sample size formulas and confidence intervals are not valid [11]. Hence, the approach of McCray et al. [11] is highly overpowered. In contrast, the optimal sample size calculation enables to reach an overall power of 80% independent of the prevalence.
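The theoretical overall power of 64% quoted above can be checked with a short calculation. The sketch below is in Python (the study itself used R) and assumes that the overall power of the two co-primary endpoints is the product of the individual powers, as stated for the Intersection-Union Test. The equal split computed at the end is only a reference value and is not the optimal allocation derived in this article.

```python
import math

# Power per endpoint planned in the conventional approach (from the text).
power_per_endpoint = 0.8

# Assumption: sensitivity and specificity are assessed in separate groups
# (diseased and non-diseased), so the overall power is the product.
overall_power = power_per_endpoint * power_per_endpoint

# Illustrative equal split that would yield an overall power of 0.8.
# Not the optimal allocation of the article, only a reference value.
equal_split_power = math.sqrt(0.8)

print(f"overall power at 0.8 per endpoint: {overall_power:.2f}")
print(f"per-endpoint power for overall 0.8 (equal split): {equal_split_power:.3f}")
```

Under this assumption, planning with 80% per endpoint targets an overall power of 0.64, whereas an overall power of 0.8 requires more than 80% power in at least one endpoint.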

Simulation study

We perform a simulation study to evaluate type I error rates, statistical power, sample sizes and bias of the adaptive design, which is based on re-estimated nuisance parameters, in the unpaired and paired study design. We compare the results of the adaptive design to those of the fixed design, which does not re-estimate the sample size. Table 4 shows the simulated scenarios testing for superiority in both endpoints. We choose one initial scenario based on the example of a paired diagnostic accuracy study used by McCray et al. [11]. Starting from the initial scenario, we vary one parameter in each further scenario. This results in 15 scenarios in the unpaired design and 19 scenarios in the paired design, each simulated with 10,000 simulation runs. Analogously, we perform simulations testing for non-inferiority in both endpoints and for combinations of superiority and non-inferiority. In this section, we focus on the scenarios testing for superiority in both endpoints because the other results are comparable. For completeness, the remaining simulated scenarios and their results are available in the online supplementary materials [see Additional files 4 and 5].

Table 4

Simulated scenarios in the unpaired and paired study design testing for superiority in both endpoints. The proportion of discordant test results is only relevant in the paired design.

	10,000 simulation runs per scenario
Nominal significance level α per endpoint	0.05 (two-sided)
Nominal overall target power	0.8
	Initial scenario	Variation of initial scenario
Sensitivity comparator test Se_C	0.8	0.6, 0.7
Specificity comparator test Sp_C	0.7	0.6, 0.8
True prevalence π_true	0.2	0.4, 0.6, 0.8
Assumed prevalence π_ass.	π_true + 0.1	π_true − 0.1, π_true + 0.2, π_true + 0.3
True discordant results diseased population ψ_D,true	0.11 (0.15, if: Se_E − Se_C = 0.15)	0.18, 0.26
Assumed discordant results diseased population ψ_D,ass.	0.18
True discordant results non-diseased population ψ_ND,true	0.14 (0.15, if: Sp_E − Sp_C = 0.15)	0.24, 0.38
Assumed discordant results non-diseased population ψ_ND,ass.	0.24
Sensitivity experimental test Se_E	≙ Se_C
Specificity experimental test Sp_E	≙ Sp_C
Sensitivity experimental test Se_E	Se_C + 0.1	Se_C + 0.05, Se_C + 0.15
Specificity experimental test Sp_E	Sp_C + 0.1	Sp_C + 0.05, Sp_C + 0.15
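The scenario counts of 15 (unpaired) and 19 (paired) follow from Table 4 when exactly one parameter is varied at a time. The following Python sketch reproduces this count; the dictionary keys are hypothetical names chosen for illustration, and the number of variations per parameter is read off the last column of Table 4.

```python
# Number of alternative values per parameter, read off Table 4.
variations = {
    "Se_C": 2,         # 0.6, 0.7
    "Sp_C": 2,         # 0.6, 0.8
    "pi_true": 3,      # 0.4, 0.6, 0.8
    "pi_ass": 3,       # pi_true - 0.1, pi_true + 0.2, pi_true + 0.3
    "psi_D_true": 2,   # 0.18, 0.26 (paired design only)
    "psi_ND_true": 2,  # 0.24, 0.38 (paired design only)
    "Se_E": 2,         # Se_C + 0.05, Se_C + 0.15
    "Sp_E": 2,         # Sp_C + 0.05, Sp_C + 0.15
}
paired_only = {"psi_D_true", "psi_ND_true"}

# One initial scenario plus one additional scenario per varied value.
n_paired = 1 + sum(variations.values())
n_unpaired = 1 + sum(
    count for name, count in variations.items() if name not in paired_only
)

print("unpaired scenarios:", n_unpaired)
print("paired scenarios:", n_paired)
```

The four additional scenarios of the paired design come from the two proportions of discordant test results, which play no role in the unpaired design.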

Table 5 shows the distributions involved in the data generation mechanism. We perform the simulations with the statistical software R, version 4.0.5, using the default random number generator Mersenne-Twister with R's own initialization method [28, 29].
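As a rough illustration of the data generation mechanism for the unpaired design, the following Python sketch draws the number of diseased individuals, the true positives and the true negatives for each arm. It is not the R code of the study: the seed, the value of N and all function names are hypothetical, and the paired design, which needs a joint distribution of both test results, is left out.

```python
import random


def draw_binomial(rng, k, p):
    """Draw from Bin(k, p) as the number of successes in k Bernoulli trials."""
    return sum(rng.random() < p for _ in range(k))


def simulate_unpaired_arm(rng, n_total, prevalence, sensitivity, specificity):
    """Return (n_D, TP, TN) for one arm of the unpaired design."""
    n_diseased = draw_binomial(rng, n_total, prevalence)
    true_pos = draw_binomial(rng, n_diseased, sensitivity)
    true_neg = draw_binomial(rng, n_total - n_diseased, specificity)
    return n_diseased, true_pos, true_neg


rng = random.Random(1)  # hypothetical seed, for reproducibility only
n_total = 200           # hypothetical value of N

# Initial scenario of Table 4: pi_true = 0.2, Se_C = 0.8, Sp_C = 0.7,
# Se_E = Se_C + 0.1 and Sp_E = Sp_C + 0.1.
experimental = simulate_unpaired_arm(rng, n_total, 0.2, 0.9, 0.8)
comparator = simulate_unpaired_arm(rng, n_total, 0.2, 0.8, 0.7)

print("experimental (n_D, TP, TN):", experimental)
print("comparator   (n_D, TP, TN):", comparator)
```

Each arm draws its own number of diseased individuals, which mirrors the separate distributions for the experimental and the comparator group in the unpaired design.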

Table 5

	Unpaired design	Paired design
Diseased individuals (n_D) according to reference standard	n_{D_E} ~ Bin(k = N, p = π_true); n_{D_C} ~ Bin(k = N, p = π_true)	n_D ~ Bin(k = N, p = π_true)
True Positive Results (TP)	TP_E ~ Bin(k = n_{D_E}, p = Se_E); TP_C ~ Bin(k = n_{D_C}, p = Se_C)	(TP_E, TP_C) ~ MVBin(k_E = n_{D_E}, k_C = n_{D_C}, p_E = Se_E, p_C = Se_C, ρ = TPPR)
True Negative Results (TN)	TN_E ~ Bin(k = N − n_{D_E}, p = Sp_E); TN_C ~ Bin(k = N − n_{D_C}, p = Sp_C)	(TN_E, TN_C) ~ MVBin(k_E = N − n_{D_E}, k_C = N − n_{D_C}, p_E = Sp_E, p_C = Sp_C, ρ = TNNR)


Description of the data generation mechanism of the unpaired and paired design in the simulation study (Bin: binomial distribution, MVBin: multivariate binomial distribution, k: number of trials, p: success probability, ρ: dependence between both tests, N: total sample size, n_{D_E}: diseased individuals in experimental group, n_{D_C}: diseased individuals in comparator group).

Figure 4 shows results for those scenarios of the paired study design that contain the minimal, medium and maximal ψ_D,true: type I error rates with the corresponding Monte Carlo errors due to simulation (1.96 × SE = 0.00098), power, and true sample sizes (N_true) with the root-mean-squared error (RMSE) of the re-estimated sample size under H1, and additionally the mean of the re-estimated sample sizes per scenario (N_mean). The depicted results show some characteristics which can be generalized to other scenarios in the paired and unpaired design. Referring to Fig. 4A, one important aspect is that the scenarios preserve type I error rates. In analogy to the overall power of the Intersection-Union Test explained in section 2, global type I error rates result as the product of the individual type I error rates of each endpoint (0.05 two-sided each). Due to the analysis with the score confidence interval in this scenario with small prevalence, results are conservative [24].
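The reported Monte Carlo error of 0.00098 can be reproduced with a short Python sketch. It assumes that the standard error is the binomial standard error of an estimated proportion, evaluated at the nominal global type I error rate, which is the product of the two per-endpoint levels of 0.05, with 10,000 simulation runs per scenario.

```python
import math

alpha_per_endpoint = 0.05  # nominal level per endpoint, as given in the text
n_runs = 10_000            # simulation runs per scenario (Table 4)

# Global type I error rate as the product of both per-endpoint rates.
global_alpha = alpha_per_endpoint ** 2

# Assumption: binomial standard error of a proportion estimated from n_runs.
standard_error = math.sqrt(global_alpha * (1 - global_alpha) / n_runs)
monte_carlo_error = 1.96 * standard_error

print(f"global alpha: {global_alpha:.4f}")
print(f"1.96 x SE:    {monte_carlo_error:.5f}")
```

Under these assumptions, the half-width of the Monte Carlo interval around the nominal global level of 0.0025 agrees with the value of 0.00098 given above.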

Fig. 4

Global type I error, overall power and sample sizes of the fixed and adaptive paired design. Simulations are based on the initial scenario and a variation of the true proportion of discordant test results in the diseased population (ψ_D,true). In Fig. 4A, black dotted lines mark the interval of Monte Carlo error due to simulations. In Fig. 4B, the target power equals 0.8.

Considering Fig. 4B and C, the overall power of the fixed design decreases with increasing ψ_D,true. The larger ψ_D,true is, the smaller the dependence between both tests, and the smaller the dependence between both tests, the larger N_true becomes. The discrepancy between N_true and N_mean in the fixed design increases as ψ_D,true increases. If ψ_D,true is medium, the assumption about this parameter in the fixed design equals the true parameter, but the assumed prevalence is larger than the true prevalence. Therefore, N_mean is smaller than N_true and the overall power is smaller than the target power of 80%.

The adaptive design compensates for wrong assumptions about nuisance parameters. The discrepancy between N_true and N_mean of the adaptive design is small. Hence, the overall power comes close to the target power. The adaptive design re-estimates ψ_D,true, ψ_ND,true and π_true without any relevant bias. In those scenarios based on the initial prevalence of 20%, the relative bias of the estimate of ψ_D,true is slightly higher than that of ψ_ND,true. Due to this prevalence, the sample contains only a small number of diseased patients who can be used for the re-estimation of ψ_D,true. The supplementary materials show the simulation results for the bias.

Figure 5 compares the overall power depending on the true prevalence π_true in the unpaired and paired design. If π_true is low, the power in both fixed designs is lowest, and it increases with increasing prevalence. In the depicted scenarios, the assumed prevalence is larger than the true prevalence. A low true prevalence implies a small number of diseased individuals. In this case, the number of diseased individuals required to show the sensitivity determines the sample size. In the fixed unpaired design, a higher number of diseased individuals is wrongly assumed, which results in a sample size and power that are too small. Vice versa, a high true prevalence leads to a sample size and power that are too large. The number of non-diseased individuals required to show the specificity now determines the sample size. Due to the wrongly assumed prevalence, too few non-diseased individuals are expected, so the calculated sample size is too large. The fixed paired design is highly overpowered, independent of π_true. Both proportions of discordant test results are assumed to be higher than they truly are, so the calculated sample size is too large.
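The statement that a larger proportion of discordant test results means a smaller dependence between both tests can be illustrated with the 2×2 table of joint test results in the diseased population. The Python sketch below assumes that the dependence is measured by the probability that both tests are positive in a diseased individual, which is how TPPR is read here, and uses the identity ψ_D = (Se_E − p11) + (Se_C − p11). The function name is hypothetical, and the parameter values are those of the initial scenario in Table 4.

```python
def joint_positive_probability(se_e, se_c, psi_d):
    """P(both tests positive | diseased) from both sensitivities and psi_D.

    Uses psi_D = P(E+, C-) + P(E-, C+) = (Se_E - p11) + (Se_C - p11).
    """
    return (se_e + se_c - psi_d) / 2


se_c, se_e = 0.8, 0.9  # initial scenario: Se_C = 0.8, Se_E = Se_C + 0.1

for psi_d in (0.11, 0.18, 0.26):  # minimal, medium and maximal psi_D,true
    p11 = joint_positive_probability(se_e, se_c, psi_d)
    print(f"psi_D = {psi_d:.2f} -> P(both positive | diseased) = {p11:.3f}")
```

With these values, the joint probability of two positive results falls from about 0.795 to 0.72 as ψ_D,true increases. The smallest possible ψ_D in this scenario is Se_E − Se_C = 0.10, which is why a value of 0.11 corresponds to almost maximal dependence.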

Fig. 5

Overall power of the fixed and adaptive design in an unpaired and paired diagnostic study. Simulations are based on the initial scenario and a variation of the true prevalence (π_true). The target power equals 0.8.

In contrast to the fixed designs, both adaptive designs reach a power closer to the target power of 80%. If π_true equals 80%, the overall power of the adaptive paired design stands out. In this scenario, the proportion of non-diseased individuals is initially assumed to be smaller than it truly is. Hence, the sample size used for the re-estimation of the nuisance parameters is already larger than the true sample size, and the overall power is higher than in scenarios with a lower π_true.

Discussion

In this article, we present an approach for blinded sample size re-estimation in a comparative diagnostic accuracy study. It allows the sample size to be corrected for incorrect assumptions during the course of the study, so that the study is neither over- nor underpowered. We use an example and a simulation study to show that the approach does not inflate type I error rates, reaches the target power and re-estimates nuisance parameters without any relevant bias.

One strength of our simulation study is that it is based on a realistic initial scenario. Therefore, the simulation study covers the results of realistic as well as extreme parameter combinations. However, it does not depict all possible parameter combinations. One general weakness of our proposed approach is that the sample size calculation and the confidence intervals used for evaluation are not based on the same formulas. McCray et al. [11] use a sample size calculation and an evaluation method which belong together.

Because the endpoints differ between the approach of McCray et al. [11] and our approach, we do not compare both approaches within an extensive simulation study. However, we compare both approaches within the example study. We show that our approach requires a smaller sample size and comes closer to the target power than the approach of McCray et al. [11] if the dependence between both diagnostic tests is maximal. In contrast to our work, McCray et al. [11] do not extend their approach to show non-inferiority or a combination of superiority and non-inferiority in both diagnostic tests.

We recommend applying blinded adaptive designs in comparative diagnostic accuracy studies, especially if the nuisance parameters are extremely small or large. The reason is that a blinded adaptive design can correct extremely small or large sample sizes that are based on wrong assumptions.

Our work leaves room for further research. One important open question concerns the consequences of re-estimating the prevalence for the blinding if predictive values are chosen as co-primary endpoints. Both the positive and the negative predictive value depend on the prevalence. Hence, the analysis is not blinded in the strong sense. Furthermore, it is of interest to develop unblinded adaptive designs in comparative diagnostic accuracy studies to allow for early stopping due to futility or efficacy [9].

Conclusions

A confirmatory diagnostic accuracy study can be performed either as a single-test or as a comparative study design. Comparative study designs are either unpaired or paired. Stark et al. [8] introduce the optimal sample size calculation and the blinded adaptive design to re-estimate the sample size in the single-test design. This approach avoids an overpowered diagnostic accuracy study by calculating the sample size for the two co-primary endpoints sensitivity and specificity depending on the prevalence of the disease. In this article, we transfer the optimal sample size calculation to both comparative study designs. Furthermore, we propose blinded adaptive designs for an unpaired and a paired diagnostic accuracy study. In the unpaired design, the adaptive design re-estimates the prevalence, whereas in the paired design it additionally re-estimates the proportions of discordant test results. After the re-estimation of these nuisance parameters, the sample size is re-calculated. Due to the blinded character of the adaptive designs, type I error rates are not inflated. Both approaches reach the target power and re-estimate nuisance parameters without any relevant bias. We recommend applying the optimal sample size calculation and a blinded adaptive design in a confirmatory diagnostic accuracy trial. Both approaches help to calculate the sample size necessary to achieve the targeted power without much additional effort.

Additional file 1. Formulas for the optimal sample size calculation.
Additional file 2. R-Code for the optimal sample size calculation testing for superiority in both endpoints in the unpaired and paired design.
Additional file 3. Figure containing the comparison of the optimal sample size calculation with the approach of McCray et al. [11].
Additional file 4. Simulation results of the blinded sample size re-estimation in the unpaired design.
Additional file 5. Simulation results of the blinded sample size re-estimation in the paired design.

