A test for treatment effects in randomized controlled trials, harnessing the power of ultrahigh dimensional big data.

Wen-Chung Lee, Jui-Hsiang Lin

Abstract

BACKGROUND: The randomized controlled trial (RCT) is the gold-standard research design in biomedicine. However, practical concerns often limit the sample size, n, the number of patients in a RCT. We aim to show that the power of a RCT can be increased by increasing p, the number of baseline covariates (sex, age, and socio-demographic, genomic, and clinical profiles, etc., of the patients) collected in the RCT (referred to as the 'dimension').
METHODS: The conventional test for treatment effects is based on testing the 'crude null' that the outcomes of the subjects do not differ between the two arms of a RCT. We propose a 'high-dimensional test' which is based on testing the 'sharp null' that the experimental intervention has no treatment effect whatsoever, for patients of any covariate profile.
RESULTS: Using computer simulations, we show that the high-dimensional test can become very powerful in detecting treatment effects for very large p, but not so for small or moderate p. Using a real dataset, we demonstrate that the P value of the high-dimensional test decreases as the number of baseline covariates increases, though it is still not significant.
CONCLUSION: In this big-data era, pushing p of a RCT to the millions, billions, or even trillions may someday become feasible. And the high-dimensional test proposed in this study can become very powerful in detecting treatment effects.

Year:  2019        PMID: 31651877      PMCID: PMC6824789          DOI: 10.1097/MD.0000000000017630

Source DB:  PubMed          Journal:  Medicine (Baltimore)        ISSN: 0025-7974            Impact factor:   1.817


This paper presents a test for treatment effects in randomized controlled trials that harnesses the power of ultrahigh dimensional big data. The proposed high-dimensional test increases the power of a RCT by increasing p, the number of baseline covariates (sex, age, and socio-demographic, genomic, and clinical profiles, etc., of the patients), rather than the usual n, the number of patients. The test can become very powerful in detecting treatment effects for large p, but not so for small or moderate p.

Introduction

The randomized controlled trial (RCT) is the gold-standard research design in biomedicine and provides the most rigorous way of determining whether a cause-effect relation exists between treatment and outcome. Randomization (random allocation of patients to intervention groups) and double blinding (neither the patients nor the investigators being aware of the treatment assignments until the study is completed) are the hallmarks of RCTs. A carefully conducted RCT should be free from the selection and confounding biases that otherwise plague most observational studies. RCTs are, however, more costly and time-consuming than other studies. A realistic RCT is therefore often limited in sample size (n, the number of patients participating in the study) to no more than a few thousand patients. The power of a study is, however, an increasing function of n; an investigator content with a small n will likely get a non-significant result despite all the effort put into conducting the trial. We are therefore faced with a dilemma: to recruit or not to recruit more patients.

We suggest a new avenue for future RCTs. In this paper, we develop a "high-dimensional test" for treatment effects. We will show that the power of the test is an increasing function of p, the number of baseline covariates (sex, age, and socio-demographic, genomic, and clinical profiles, etc., of the patients) collected in a RCT (p is also referred to as the "dimension", hence the name of the test). We will show that the high-dimensional test can become very powerful in detecting treatment effects if p can be made very large. We will also use a real dataset to demonstrate the methodology.

Methods

High-dimensional test

In a typical RCT comparing an experimental intervention and a suitable control for a continuous or binary end point, let $A$ denote the treatment assignment indicator ($A = 1$ for experimental intervention; $A = 0$ for control), $Y$, the outcome, and $z$, a vector of baseline covariates (with a dimension of $p$). We use the generic notation, $f(\cdot)$, to denote the (joint) probability density or mass function of a random variable (vector), where appropriate. The conventional test for treatment effects is based on testing the following 'crude null',

$$f(Y \mid A = 1) = f(Y \mid A = 0). \tag{1}$$

That is, the outcomes of the subjects do not differ between the 2 arms of a RCT. By contrast, the proposed high-dimensional test is based on testing the following 'sharp null',

$$f(Y \mid A = 1, z) = f(Y \mid A = 0, z) \quad \text{for all } z. \tag{2}$$

That is, the experimental intervention has no treatment effect whatsoever, for patients of any covariate profile.

In practice, we can dichotomize $Y$ into $Y^*$, such as 'favorable' ($Y^* = 1$) and 'unfavorable' ($Y^* = 0$) outcomes, based on some suitable criteria. ($Y$ may already be a binary variable, such as 'survival' ($Y = 1$) and 'death' ($Y = 0$); in this case, simply $Y^* = Y$.) The Supplementary Note shows that, alternatively, we can test the sharp null in a RCT based on the following two equalities:

$$f(z \mid A = 1, Y^* = 1) = f(z \mid A = 0, Y^* = 1) \tag{3}$$

and

$$f(z \mid A = 1, Y^* = 0) = f(z \mid A = 0, Y^* = 0). \tag{4}$$

This alternative sharp-null formulation implies no difference in the baseline covariates between the two arms of a RCT, separately for those with favorable outcomes (3) and those with unfavorable outcomes (4).

Assume that a RCT recruits a total of $n$ subjects ($i = 1, \ldots, n$). The data collected consist of the treatment assignment indicator, $A_i$, the outcome (and the dichotomized outcome), $Y_i$ (and $Y_i^*$), and a total of $p$ baseline covariates, $Z_{ij}$ ($j = 1, \ldots, p$), for $i = 1, \ldots, n$. To test the crude null (1), one can use the usual two-sample test,

$$T_{\text{crude}} = \frac{(\bar{Y}_1 - \bar{Y}_0)^2}{\left(\frac{1}{n_1} + \frac{1}{n_0}\right)\hat{\sigma}^2}, \tag{5}$$

where $\bar{Y}_1$ ($\bar{Y}_0$) is the mean outcome and $n_1$ ($n_0$) is the number of subjects receiving the experimental (control) intervention ($n_1 + n_0 = n$), and $\hat{\sigma}^2$ is an estimate of the variance of the outcome under the crude null. $T_{\text{crude}}$ in (5) is distributed asymptotically as a chi-squared distribution with one degree of freedom under the crude null. The same can be done for the dichotomized outcome, $Y^*$.

To test the sharp null using (3) and (4), we can construct a test statistic for the $j$th baseline covariate,

$$T_j = \frac{(\bar{Z}_{j,11} - \bar{Z}_{j,01})^2}{\left(\frac{1}{n_{11}} + \frac{1}{n_{01}}\right)\hat{\sigma}_{j1}^2} + \frac{(\bar{Z}_{j,10} - \bar{Z}_{j,00})^2}{\left(\frac{1}{n_{10}} + \frac{1}{n_{00}}\right)\hat{\sigma}_{j0}^2}, \tag{6}$$

where $n_{11}$ ($n_{01}$) and $n_{10}$ ($n_{00}$) are the numbers of subjects receiving the experimental (control) intervention and ultimately leading to, respectively, favorable and unfavorable outcomes ($n_{11} + n_{01} + n_{10} + n_{00} = n$), $\bar{Z}_{j,ay}$ is the mean of the $j$th baseline covariate among subjects with $A = a$ and $Y^* = y$, and $\hat{\sigma}_{j1}^2$ and $\hat{\sigma}_{j0}^2$ are the estimates of the variances of the $j$th baseline covariate under the sharp null among subjects with, respectively, favorable and unfavorable outcomes. The first term to the right of the equality sign in (6) is a test statistic based on (3), and the second term, one based on (4). These 2 terms involve different sets of subjects and are independent of one another. Under the sharp null, $T_j$ in (6) is therefore distributed asymptotically as a chi-squared distribution with 2 degrees of freedom.

Next, we sum up the statistics of all $p$ baseline covariates as our high-dimensional test,

$$T_{\text{HD}} = \sum_{j=1}^{p} T_j. \tag{7}$$

The ordinary chi-squared approximation may not apply to $T_{\text{HD}}$ in (7) because the baseline covariates themselves may not be independent of one another. We therefore propose performing Monte-Carlo permutations to obtain the sampling distribution of $T_{\text{HD}}$ under the sharp null. To be precise, we fix $z$ and shuffle $(A, Y^*)$ among the study subjects (or vice versa). The permutation-based high-dimensional test is a distribution-free test, suitable for use with normal or non-normal data and in large or small RCTs.
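The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names (`stratum_chi2`, `hd_statistic`, `hd_permutation_pvalue`) are hypothetical, the variance of each covariate is assumed to be estimated by pooling both arms within an outcome stratum, and the "+1" correction in the permutation P value is an assumed convention.

```python
import numpy as np

def stratum_chi2(x, a):
    """Squared two-sample statistic per covariate within one outcome stratum.

    x: (m, p) covariates of the subjects in the stratum; a: (m,) 0/1 arm labels.
    Returns a length-p array (zeros if one arm is empty in this stratum).
    """
    x1, x0 = x[a == 1], x[a == 0]
    n1, n0 = len(x1), len(x0)
    if n1 == 0 or n0 == 0:
        return np.zeros(x.shape[1])
    # Assumption: variance under the null is estimated by pooling both arms.
    var = x.var(axis=0)
    var = np.where(var > 0, var, np.inf)  # a constant covariate contributes 0
    diff = x1.mean(axis=0) - x0.mean(axis=0)
    return diff**2 / ((1.0 / n1 + 1.0 / n0) * var)

def hd_statistic(z, a, y):
    """Sum over all covariates of the favorable- and unfavorable-stratum terms."""
    total = 0.0
    for outcome in (1, 0):
        mask = (y == outcome)
        total += stratum_chi2(z[mask], a[mask]).sum()
    return float(total)

def hd_permutation_pvalue(z, a, y, n_perm=999, seed=0):
    """Monte-Carlo permutation P value: z stays fixed, (A, Y*) pairs are shuffled."""
    rng = np.random.default_rng(seed)
    observed = hd_statistic(z, a, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(a))  # one shuffle keeps each (A, Y*) pair intact
        if hd_statistic(z, a[idx], y[idx]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Calling `hd_permutation_pvalue(z, a, y)` with an n-by-p covariate matrix `z` and 0/1 vectors `a` and `y` returns a value in (0, 1]. Because only the row pairing between `z` and `(a, y)` is permuted, correlations among the covariates are preserved in every permuted dataset.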

Simulation study

We considered a small RCT with n = 50 and a large one with n = 250. Each patient is randomized to either the treatment or the control arm with equal probability. The outcome of the trial (survival or death) is recorded for each patient, and p baseline covariates are also collected for each patient. We assume a potential-outcome model for a particular disease: the experimental treatment is beneficial to 15% of patients (they will live if given the treatment and will die otherwise), is harmful to 5% of patients (they will die if given the treatment but will live otherwise), and has absolutely no effect on the rest (30% and 50% of patients are destined to live or die, respectively, regardless of the treatment given). We also consider a stochastic version of the model, in which those who will live or die as per the above deterministic model meet the same fates not with certainty but with a probability of 0.9. To check the validity of the high-dimensional test, we construct a sharp null of a deterministic potential-outcome model where no one is responsive to the treatment (assuming 40% of patients are destined to live and the other 60% to die, regardless of the treatment).

We assume that the baseline covariates are normally distributed with a constant variance of one, but with slightly different means for subjects of different potential-outcome types. To be precise, the type-specific means are randomly sampled from a N(0, Δ²) normal distribution. In the simulation, we consider 3 scenarios for the association between the measured baseline covariates and the assumed potential-outcome types: (i) weak-to-moderate association (Δ² = 0.03), (ii) weak association (Δ² = 0.01), and (iii) ultra-weak association (Δ² = 0.005). The baseline covariates are assumed to be independent of one another conditional on the potential-outcome types.
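The data-generating model described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' simulation code: the names `TYPES`, `ALT_PROBS`, `NULL_PROBS`, `simulate_trial`, and the `fidelity` parameter are hypothetical, and only the independent-covariate case is covered.

```python
import numpy as np

# Potential-outcome types as (outcome if treated, outcome if control); 1 = live.
TYPES = np.array([[1, 0],   # benefited: lives only if treated
                  [0, 1],   # harmed: lives only if not treated
                  [1, 1],   # destined to live
                  [0, 0]])  # destined to die
ALT_PROBS = [0.15, 0.05, 0.30, 0.50]   # sharp-alternative mix from the text
NULL_PROBS = [0.0, 0.0, 0.40, 0.60]    # sharp-null mix from the text

def simulate_trial(n, p, delta2, type_probs=ALT_PROBS, fidelity=1.0, seed=0):
    """Simulate one trial; fidelity=1.0 is deterministic, 0.9 is the stochastic version."""
    rng = np.random.default_rng(seed)
    types = rng.choice(4, size=n, p=type_probs)
    a = rng.integers(0, 2, size=n)                       # 1:1 randomization
    fated = np.where(a == 1, TYPES[types, 0], TYPES[types, 1])
    keeps_fate = rng.random(n) < fidelity                # fate realized w.p. fidelity
    y = np.where(keeps_fate, fated, 1 - fated)
    # Type-specific covariate means drawn from N(0, delta2); unit-variance noise.
    means = rng.normal(0.0, np.sqrt(delta2), size=(4, p))
    z = means[types] + rng.normal(size=(n, p))
    return z, a, y, types
```

For example, `simulate_trial(250, 1000, 0.01, fidelity=0.9)` would correspond to the large trial with 1000 weak covariates under the stochastic model.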
We also considered weakly and strongly correlated covariates, with a different assumed correlation coefficient between the ith and the jth baseline covariates in each case.

We simulate a hypothetical omniscient test to serve as an upper bound for what a real-world high-dimensional test can achieve. To be precise, an omniscient trial analyst knows the potential-outcome types of all patients and puts this information into the analysis; he/she creates four indicator variables, each indicating whether a subject belongs to a specific potential-outcome type, and then calculates a high-dimensional test treating these indicator variables as four "baseline covariates".

The "operating characteristic" (OC) of a test is its statistical power averaged over a uniformly distributed α-level between 0 and 1. The OC is a value between 0.5 (no power at all) and 1 (highest power possible). It can be converted to a power at a specified α-level, if the test statistic is normally distributed: $\text{Power}_\alpha = \Phi(\sqrt{2}\,Z_{\text{OC}} - Z_{1-\alpha})$, where $\Phi$ is the cumulative distribution function, and $Z_x$, the $x$th quantile of the standard normal distribution. In the simulation study, the OC is estimated as the proportion of the simulations that result in a test statistic larger than the same statistic under a random permutation of the data (as described before). If a test statistic happens to be equal to its permuted counterpart, a 0.5 count is tallied. A total of 1000 simulations were performed for each sharp-alternative scenario. To facilitate comparison, we estimated the OCs of the above omniscient test and the traditional test (testing the crude null) using the same simulation-permutation scheme as we used for the high-dimensional test. For each sharp-null scenario, we performed a total of 10,000 simulations to estimate the OC and the type I error rate at α = 0.05 (with 99 permutations to derive the null sampling distribution in each round of the simulation).
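The OC tally and the OC-to-power conversion can be sketched in a few lines of Python. The function names are hypothetical, and the conversion assumes a one-sided test whose statistic is normal with unit variance and a mean shift under the alternative; under that assumption the OC equals $\Phi(\delta/\sqrt{2})$ and the power at level α equals $\Phi(\delta - Z_{1-\alpha})$, where δ is the shift.

```python
from statistics import NormalDist

def estimate_oc(stats, permuted_stats):
    """Share of simulations whose statistic beats its permuted counterpart.

    Ties are tallied as 0.5, as described in the text.
    """
    score = 0.0
    for t, t_perm in zip(stats, permuted_stats):
        if t > t_perm:
            score += 1.0
        elif t == t_perm:
            score += 0.5
    return score / len(stats)

def oc_to_power(oc, alpha=0.05):
    """Convert an OC strictly between 0 and 1 to power at level alpha.

    Assumes a normally distributed statistic (see lead-in); inv_cdf raises
    an error for oc equal to exactly 0 or 1.
    """
    std = NormalDist()
    shift = 2 ** 0.5 * std.inv_cdf(oc)   # delta implied by the OC
    return std.cdf(shift - std.inv_cdf(1 - alpha))
```

A sanity check on the conversion: an OC of 0.5 (no power at all) maps to a power equal to α itself, and the power increases monotonically with the OC.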

Real data analysis

We re-analyzed a Gene Expression Omnibus dataset (GSE118657) to illustrate the methodology. The dataset comes from a Phase II randomized controlled trial assessing the effect of lactoferrin on critically ill patients undergoing mechanical ventilation (a total of 61 patients: 32 in the lactoferrin group and the remainder in the placebo group). Expression levels of 49,495 genes were measured on the first day of admission for each patient. The proposed high-dimensional test was used to test the effect of lactoferrin treatment using all gene expressions as the baseline covariates (p = 49,495). We also examined the effects of using reduced numbers of genes (p = 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000, and 20,000, respectively) randomly sampled from the total of 49,495 genes (100 random samples were taken and the results were averaged for each scenario). A total of 9999 permutations were performed to derive the null sampling distribution for the high-dimensional test.
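The gene-subsampling scheme described above can be sketched as a small helper. This is an assumption-laden illustration: `pvalue_by_dimension` is a hypothetical name, and `pvalue_fn` stands for any callable that takes `(z, a, y)` and returns a P value (for instance, a permutation-based implementation of the high-dimensional test).

```python
import numpy as np

def pvalue_by_dimension(z, a, y, pvalue_fn, sizes, n_draws=100, seed=0):
    """Average P value over random covariate subsets of each requested size.

    z: (n, p_total) covariate matrix; sizes: iterable of subset sizes p.
    Returns a dict mapping each p to the mean P value over n_draws subsets.
    """
    rng = np.random.default_rng(seed)
    p_total = z.shape[1]
    averaged = {}
    for p in sizes:
        draws = []
        for _ in range(n_draws):
            # Sample p distinct covariates (columns) without replacement.
            cols = rng.choice(p_total, size=p, replace=False)
            draws.append(pvalue_fn(z[:, cols], a, y))
        averaged[p] = float(np.mean(draws))
    return averaged
```

With 100 draws per size and 9999 permutations per draw, the full analysis is computationally heavy; the draws are independent of one another and could be run in parallel.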

Ethical review

This paper is a methodological study (computer simulation study) and does not involve the enrollment of patients. The real data used in this paper are from the public domain. Ethical approval is not necessary.

Results

Figure 1 presents the results when the outcomes follow the assumed deterministic potential-outcome model. For a small RCT (n = 50), the traditional test (testing the crude null) has a very low OC of 0.57, whereas the omniscient test can have a very high OC of 0.93. When the sample size increases to n = 250, the performance of the traditional test improves, though not by very much (OC = 0.78), whereas the omniscient test now functions impeccably (OC = 1.00).
Figure 1

Operating characteristics of the traditional test (dotted horizontal lines), the high-dimensional test (left solid curves: weak-to-moderate covariates; middle solid curves: weak covariates; right solid curves: ultra-weak covariates), and the omniscient test (dashed horizontal lines), under a deterministic potential-outcome model.

In a real-world RCT, the potential-outcome types of the patients are, of course, unknown. However, we found that the performance of the hypothetical omniscient test can be replicated using a real-world high-dimensional test (Fig. 1). With a large enough p (more than 10 weak-to-moderate covariates, more than 100 weak covariates, or more than 1000 ultra-weak covariates), the high-dimensional test outperforms the traditional test. For a large RCT (such as when n = 250) and with a fairly large p (such as when p > 10⁴), the high-dimensional test can also become impeccable (OC→1). The high-dimensional test is, as it should be, bounded above by the omniscient test in terms of its OC, no matter how strong the association is between the covariates used and the potential-outcome types, and no matter how many covariates there are (Supplementary Fig. 1). Under the sharp null, the high-dimensional test has an OC close to 0.5 (Table 1) and a type I error rate close to the nominal α level of 0.05 for all scenarios studied (Table 2).
Table 1

Operating characteristics of the high-dimensional test under the sharp null.

Table 2

Type I error rates at α = 0.05 of the high-dimensional test under the sharp null.

Figure 2 presents the results when the outcomes follow the stochastic potential-outcome model. Again, we see that the traditional test performs very poorly (OC = 0.55 when n = 50; OC = 0.67 when n = 250). With the stochasticity introduced, a perfect knowledge of the potential-outcome types no longer foretells a subject's fate exactly (only with an accuracy rate of 0.9 for the assumed model). Yet, the omniscient test still performs remarkably better than the traditional test in a small trial (OC = 0.84 when n = 50), and can even become impeccable in a large RCT (OC = 1.00 when n = 250).
Figure 2

Operating characteristics of the traditional test (dotted horizontal lines), the high-dimensional test (left solid curves: weak-to-moderate covariates; middle solid curves: weak covariates; right solid curves: ultra-weak covariates), and the omniscient test (dashed horizontal lines), under a stochastic potential-outcome model.

Again, the (real-world) high-dimensional test outperforms the traditional test with a large enough p (Fig. 2). It can also become impeccable in a large RCT (n = 250) with p > 10⁶, or with a smaller p if the covariates used are more strongly associated with the potential-outcome types (Supplementary Fig. 2). Table 3 compares the OCs of the high-dimensional test for independent, weakly correlated, and strongly correlated baseline covariates. With the same number of baseline covariates, the operating characteristic is lower if the baseline covariates are correlated with one another. To make up for the power loss in using correlated covariates, one can include more covariates in the high-dimensional test. For all scenarios studied, OC increases as p increases.
Table 3

Operating characteristics of the high-dimensional test for independent, weakly correlated, and strongly correlated, baseline covariates (n = 250; strength of the covariates: weak to moderate).

We also performed additional simulations for more complexly distributed baseline covariates (non-normal covariates, a mixed panel of binary and continuous variables, and a mixed panel of signals and noises; see Supplementary Table), and for a patient population with a different potential-outcome-type distribution from that assumed in this study (including 'monotonicity' scenarios where the experimental treatment can do only good and no harm). The basic conclusions are the same, though some scenarios may call for a larger p to achieve the same OC as in this paper. However, the high-dimensional test has no power whatsoever to test the sharp null if none of the baseline covariates collected is a signal, or if the signal-to-noise ratio tends to zero as p tends to infinity. The high-dimensional test is also ineffective if the treatment effect is homogeneous across covariate profiles [e.g., all patients are of the same stochastic potential-outcome type: they all have the same survival probability of, say, 0.7 (0.4) if given (not given) the treatment].

Figure 3 presents the P values for the lactoferrin treatment. The traditional test (testing the crude null) has a P value of .36. As the number of baseline covariates (genes) increases, the P value of the high-dimensional test decreases. With 20 genes used, the high-dimensional test has a P value of .32, which is smaller than that of the traditional test. With all 49,495 genes used, the high-dimensional test has a P value of .23, though it is still not significant. To achieve significance (if the sharp null is indeed false for this example), one could include more baseline covariates in the high-dimensional test for the 61 patients in the trial (as the power of the test is an increasing function of the number of covariates), ideally covariates of diverse types other than the gene expression data currently used (as the power of the test is compromised for highly correlated covariates such as gene expressions).
Figure 3

P values in the GSE118657 dataset analysis (high-dimensional test: solid line; traditional test: dotted horizontal line).


Discussion

The proposed high-dimensional test is based on testing the sharp null. The sharp-null formulation in (2) is self-explanatory: the experimental intervention has no treatment effect whatsoever, for patients of any covariate profile. However, the sharp-null formulation in (3) and (4) seems rather peculiar. A simple two-step conditionality argument (Supplementary Note) may help clarify what this alternative formulation means: (the first step) there shall be no association unconditionally between treatment assignment and each and every baseline covariate in a dutifully conducted RCT ($A \perp z$, where the sign $\perp$ indicates 'independence' or 'no association'), and (the second step) if the sharp null in (2) is also true ($A \perp Y \mid z$), then (the result) there shall furthermore be no association between treatment assignment and each and every baseline covariate, conditional on the outcome (the alternative sharp-null formulation, $A \perp z \mid Y^*$).

Conventional wisdom holds that testing many variables simultaneously incurs a penalty, and many researchers turn to dimension reduction methods to mitigate the problem. The "p"-based methods developed by previous researchers approached this multiple-testing problem differently, whereby the dimensionality is no longer a curse but in fact a blessing. For example, Hall et al and Ahn et al studied the geometric properties of high-dimension and low-sample-size data and showed that the group memberships of study subjects can be resolved almost perfectly using their pair-wise distances (in high dimension); Lo and Lee constructed a p-based test to detect weak associations (when p is very large); and Lee further developed a p-based adjustment method to correct for unmeasured confounding biases (again, when p is very large). In this paper, we extend the applicability of the "p"-based approach to RCT settings and show that the high-dimensional test can become very powerful in detecting treatment effects for very large p, the number of baseline covariates.
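The two-step conditionality argument mentioned at the start of this Discussion can be written out in a few lines. The derivation below is a sketch in the generic $f(\cdot)$ notation and is not a reproduction of the Supplementary Note; it uses the fact that the sharp null for $Y$ carries over to the dichotomized $Y^*$, because $Y^*$ is a function of $Y$.

```latex
\begin{align*}
&\text{Step 1 (randomization):} && f(A \mid z) = f(A), \\
&\text{Step 2 (sharp null):}    && f(Y^* \mid A, z) = f(Y^* \mid z), \\
&\text{Result (Bayes' rule):}   && f(A \mid z, Y^*)
   = \frac{f(Y^* \mid A, z)\, f(A \mid z)}{f(Y^* \mid z)} = f(A).
\end{align*}
```

Because the right-hand side does not depend on $z$, averaging over $z$ given $Y^*$ also gives $f(A \mid Y^*) = f(A)$, so $f(A \mid z, Y^*) = f(A \mid Y^*)$. Treatment assignment is thus independent of the covariates within each outcome stratum, which is the equality of covariate distributions between arms among those with favorable outcomes and among those with unfavorable outcomes.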
The current practice of RCTs follows the "n"-based paradigm; the power of a test is gauged by n, the number of study subjects. But this has a limit, as n is bounded above by the world population. By contrast, in this big-data era, pushing the p of a RCT to the billions, trillions, or even more may quickly become possible. The high-dimensional test we proposed in this paper thus provides a means to break the "n"-barrier and let ultrahigh dimensional big data generate new knowledge. But one needs to keep in mind that RCTs often have stringent inclusion and exclusion criteria. Even if an infinite number of baseline covariates were collected in a RCT, the results of the high-dimensional test would apply only to the (selected) patient population of that particular RCT and would not be directly generalizable to patients seen in the real world.

For small or moderate p, say, hundreds, thousands, or millions, the high-dimensional test by itself may be underpowered and is best used in conjunction with the traditional test. A possible solution is to combine the "p"-based sharp-null test in (7) and the "n"-based crude-null test in (5): $T_{\text{comb}} = w_p\,T_{\text{HD}} + w_n\,T_{\text{crude}}$, where $T_{\text{HD}}$ and $T_{\text{crude}}$ are the test statistics in (7) and (5), and $w_p$ and $w_n$ are the weights attached, respectively, to the 2 tests. Further work is needed to study how to set the weights and to examine the statistical properties of this combined test.

From our simulation study, the power of the high-dimensional test depends on many factors: the number of baseline covariates, the number of study subjects, the strength of the association between the baseline covariates and the potential-outcome types, the nature of the potential outcomes (deterministic or stochastic), the degree of the correlation between the baseline covariates, the distribution of the baseline covariates, the distribution of the potential-outcome types, etc. Further work is also needed to develop a power formula for the proposed high-dimensional test.

Conclusions

In this big-data era, pushing p of a RCT to the millions, billions, or even trillions may someday become feasible. And the high-dimensional test proposed in this study can become very powerful in detecting treatment effects.

Author contributions

Conceptualization: Wen-Chung Lee. Data curation: Wen-Chung Lee, Jui-Hsiang Lin. Formal analysis: Wen-Chung Lee, Jui-Hsiang Lin. Funding acquisition: Wen-Chung Lee. Investigation: Wen-Chung Lee. Methodology: Wen-Chung Lee. Project administration: Wen-Chung Lee. Resources: Wen-Chung Lee. Software: Wen-Chung Lee, Jui-Hsiang Lin. Supervision: Wen-Chung Lee. Validation: Wen-Chung Lee. Visualization: Wen-Chung Lee. Writing – original draft: Wen-Chung Lee. Writing – review & editing: Wen-Chung Lee.
References

1. Lillian S Kao, Jon E Tyson, Martin L Blakely, Kevin P Lally. Clinical research methodology I: introduction to randomized trials. J Am Coll Surg. 2008-02.
2. Yasutaka Chiba. Bounds on causal effects in randomized trials with noncompliance under monotonicity assumptions about covariates. Stat Med. 2009-11-20.
3. B Sibbald, M Roland. Understanding controlled trials. Why are randomised controlled trials important? BMJ. 1998-01-17.
4. Travis B Murdoch, Allan S Detsky. The inevitable application of big data to health care. JAMA. 2013-04-03.
5. William S Noble. How does multiple testing correction work? Nat Biotechnol. 2009-12.
6. Harlan M Krumholz. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff (Millwood). 2014-07.
7. John Muscedere, David M Maslove, J Gordon Boyd, Nicole O'Callaghan, Stephanie Sibley, Steven Reynolds, Martin Albert, Richard Hall, Xuran Jiang, Andrew G Day, Gwyneth Jones, Francois Lamontagne. Prevention of Nosocomial Infections in Critically Ill Patients With Lactoferrin: A Randomized, Double-Blind, Placebo-Controlled Study. Crit Care Med. 2018-09.
8. Akram Alyass, Michelle Turcotte, David Meyre. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics. 2015-06-27.
9. Min-Tzu Lo, Wen-Chung Lee. Detecting a weak association by testing its multiple perturbations: a data mining approach. Sci Rep. 2014-05-28.
10. Wen-Chung Lee. Detecting and correcting the bias of unmeasured factors using perturbation analysis: a data-mining approach. BMC Med Res Methodol. 2014-02-05.