Jennifer C Nelson1, Robert Wellman2, Onchee Yu2, Andrea J Cook1, Judith C Maro3, Rita Ouellet-Hellstrom4, Denise Boudreau1, James S Floyd5, Susan R Heckbert5, Simone Pinheiro4, Marsha Reichman4, Azadeh Shoaibi4.
Abstract
INTRODUCTION: The large-scale assembly of electronic health care data combined with the use of sequential monitoring has made proactive postmarket drug- and vaccine-safety surveillance possible. Although sequential designs have been used extensively in randomized trials, less attention has been given to methods for applying them in observational electronic health care database settings.
Keywords: adverse drug reaction reporting systems; drug-related side effects and adverse reactions; electronic health records; product surveillance, postmarketing; sequential analysis; vaccine/adverse effects
Year: 2016 PMID: 27713904 PMCID: PMC5051582 DOI: 10.13063/2327-9214.1219
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
FDA32 and PCORI33 Recommendations on Sequential Testing in Clinical Trials and Their Relevance to Observational Electronic Health Care Database Safety Surveillance Settings like Sentinel
| Recommendation | Application in clinical trials | Relevance to observational surveillance settings like Sentinel |
|---|---|---|
| Prespecify statistical design and primary analysis and document changes. | All statistical methods should be prespecified prior to obtaining information on treatment outcomes, including the schedule of interim analyses, stopping rules and their properties, primary hypotheses, underlying statistical model, use of one- versus two-sided tests, and designation of primary versus exploratory analyses. It is important to document protocol deviations as changes made to the original plans can weaken and even invalidate the results. | Yes. It is equally important in observational settings to prespecify analytic plans to the extent possible. However, observational surveillance is subject to many more unknowns and may need to flexibly accommodate some changes when plans cannot be implemented as initially expected. Such changes should be documented and explained so that appropriate interpretations may be made. |
| Evaluate statistical properties of the design in advance. | The statistical properties of the design should be evaluated a priori so that they are understood prior to implementation and in the context of the research question (e.g., adequate power for several assumed treatment effects). For complex designs, this might include evaluating properties over a range of assumptions relating to size of treatment effect, missing data, dropout rates, etc. Technical details should be included in an appendix (e.g., statistical models and significance thresholds for the primary analyses along with calculation details or software used, operating characteristics for the design along with methods and assumptions for computing them). | Yes. But it may not be as desirable or practical to conduct an extensive performance evaluation for surveillance applications because of the following: (1) Surveillance may be done for many exposure-outcome pairs at once, making it less feasible to conduct an extensive evaluation for each design, and (2) many unknowns can lead to changes in the actual versus designed implementation, which may downweight the need to understand the planned design’s performance in depth. It also may be helpful to use relatively simple designs that are well understood, can be reused, and can be scaled up. |
| Communicate and vet the design in advance. | The sequential design and analyses should be clearly communicated and vetted with those designing and interpreting the safety surveillance activity to assess acceptability to address the primary aims. | Yes. It is important that those designing and interpreting the safety surveillance activity (e.g., FDA) understand how the design will work in practice so any potential actions taken based on a safety signal are suitable. |
| Account for multiple testing. | The chance of making a Type 1 error will increase due to testing multiple outcomes, treatment comparisons, subgroups, or repeated analyses over time and should be addressed, potentially using frequentist Type 1 error adjustment methods. | Yes. However, the importance of strict accounting for random variation via multiple testing may be less in an observational surveillance setting since systematic variation will be (relatively) larger and sample sizes relatively larger. It is likely worth adjusting for sequential tests across multiple analysis time points, but it may be less necessary to adjust across multiple outcomes (since very few outcomes are targeted for surveillance) or subgroups (since these are already designated as exploratory). |
| Interpret exploratory analyses with caution. | Exploratory analyses (e.g., in subgroups) should be interpreted with caution and should generally not be used to make definitive conclusions regarding treatment effects. | Yes. In general, surveillance results are more exploratory than results from trials. However, when prespecified, surveillance may reasonably test specific hypotheses. Results of surveillance analyses that are |
| Ensure proper oversight and reporting. | Proper statistical oversight of trial conduct should be in place, and reporting of the results should be done in a consistent fashion. | Yes. Statistical oversight and reliable reporting are key components for surveillance, given the data and analysis complexities and the desire for transparent presentation. |
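The multiple-testing recommendation above can be illustrated with a small simulation. The following is a minimal sketch, not the method used in the paper: it assumes equally spaced analyses, a one-sided test on a standardized Z statistic, and no true effect, and it estimates how often an unadjusted threshold is crossed at least once. The function name `overall_type1_error` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def overall_type1_error(num_looks, z_threshold=1.645, num_sims=20000, seed=0):
    """Estimate the chance that a one-sided Z test exceeds z_threshold at
    any of num_looks equally spaced analyses when there is no true effect."""
    rng = random.Random(seed)
    false_signals = 0
    for _ in range(num_sims):
        cumulative = 0.0  # running sum of independent N(0, 1) increments
        for look in range(1, num_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(look)  # standardized statistic at this look
            if z > z_threshold:
                false_signals += 1
                break  # surveillance stops at the first signal
    return false_signals / num_sims


# Compare a single analysis with repeated, unadjusted analyses.
for looks in (1, 5, 12):
    print(looks, round(overall_type1_error(looks), 3))
```

With one analysis the estimate should sit near the nominal 0.05; with 5 or 12 unadjusted analyses it grows, which is the inflation that sequential thresholds are designed to control.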
Key Features of the Planned Sequential Designs Used in the VSD Collaboration and MS Pilot
| Surveillance start | As soon as uptake begins or delayed until a preset # of events occur | Delayed start until 1 year of uptake (for early conservatism) | Specified in doses (information time) based on power for specific RRs | Specified in new users (information time) based on power for specific HRs | Specified in new users (information time) based on power for specific HRs |
| Surveillance end | Specified in calendar time ∼2–3 years after the first dose | Specified in doses (information time) based on power for specific RRs; varied by event prevalence (N=72,000 doses if common, 150,000 if rare) | Specified in information time based on power to detect specific RRs; varied by adverse event prevalence | Specified in information time and based on power to detect specific HRs; resulted in last analysis ∼6 years after licensure | Specified in information time and based on power to detect specific HRs |
| Frequency of testing | Specified in calendar time as weekly | 12 total tests based on doses (information time); spacing between analyses depended on event prevalence: 3,500 or 10,500 doses | 12 total tests based on information time; spacing depended on event prevalence | 7 total tests, planned to be equally spaced based on information time | 5 total tests, planned based on information time to occur at 35, 47, 62, 80, and 100% of the total person-time |
| Duration of surveillance | Specified in calendar time as 2–3 years | Specified in information time; resulted in ∼2.5 years | Specified in information time; resulted in ∼2 years | Specified in information time; resulted in ∼6 years | Specified in information time |
| Shape of signaling threshold over time | Constant (flat) threshold on the scale of the LRT statistic | Constant (flat) threshold on the scale of the LRT statistic | O’Brien-Fleming threshold on the LRT scale, which is higher at earlier analyses | Constant (flat) threshold on the scale of the Wald statistic | Constant (flat) threshold on the scale of the Wald statistic |
| Test statistic | LRT | LRT | LRT | Wald | Wald |
| Test type | one-sided | one-sided | one-sided | one-sided | two-sided |
| Adjust thresholds? | No | Yes | No | No | No |
| Apply data lag so data are more complete? | 2–3 months | 2–3 months | 2–3 months | Varied by Data Partner (some lag by 6–9 months, others do not lag) | Varied by Data Partner (some lag by 6–9 months, others do not lag) |
| Freeze prior data? | Freeze results from prior analyses and add only new information. | Cumulatively refresh all data since start of surveillance at each new interim analysis | Cumulatively refresh all data since start of surveillance but preserve matches from prior analyses whenever feasible. | Cumulatively refresh data since start of surveillance. |
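To make the difference between a constant (flat) threshold and an O'Brien-Fleming threshold concrete, the sketch below calibrates both by simulation. It is an illustration under stated assumptions, not the collaboration's actual procedure: it works on a standardized Z scale rather than the LRT or Wald scale, uses a one-sided overall false-signal rate of 0.05 (an assumed value), and borrows the information fractions 35, 47, 62, 80, and 100% from the table. All function names are hypothetical.

```python
import math
import random


def simulate_null_paths(info_fractions, num_sims=20000, seed=1):
    """Simulate standardized Z statistics at each analysis under no effect,
    using independent normal increments indexed by information time."""
    rng = random.Random(seed)
    paths = []
    for _ in range(num_sims):
        score, prev_t, path = 0.0, 0.0, []
        for t in info_fractions:
            score += rng.gauss(0.0, math.sqrt(t - prev_t))  # Brownian increment
            path.append(score / math.sqrt(t))  # Z(t) = B(t) / sqrt(t)
            prev_t = t
        paths.append(path)
    return paths


def calibrate_thresholds(info_fractions, shape, alpha=0.05, paths=None):
    """Return one-sided Z thresholds c * shape(t), with c chosen so that
    about a fraction alpha of null paths ever exceed their threshold."""
    if paths is None:
        paths = simulate_null_paths(info_fractions)
    # A path signals when max_k Z_k / shape(t_k) > c, so c is the
    # empirical (1 - alpha) quantile of those per-path maxima.
    maxima = sorted(
        max(z / shape(t) for z, t in zip(path, info_fractions)) for path in paths
    )
    c = maxima[math.ceil((1 - alpha) * len(maxima)) - 1]
    return [c * shape(t) for t in info_fractions]


def flat_shape(t):
    return 1.0  # same threshold at every analysis


def obrien_fleming_shape(t):
    return 1.0 / math.sqrt(t)  # higher threshold at earlier analyses


fractions = [0.35, 0.47, 0.62, 0.80, 1.00]
null_paths = simulate_null_paths(fractions)
print("flat:", [round(x, 2) for x in calibrate_thresholds(fractions, flat_shape, paths=null_paths)])
print("OBF: ", [round(x, 2) for x in calibrate_thresholds(fractions, obrien_fleming_shape, paths=null_paths)])
```

The flat design spends its false-signal budget evenly, so it can signal earlier, while the O'Brien-Fleming shape is conservative early and ends with a final threshold closer to that of a single fixed-sample test.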
Maximum Sample Size for Logistic Regression Analysis by Number of Analyses
| % of total sample who are ACE users | RR | | | | |
|---|---|---|---|---|---|
| 25% | 1.5 | 902,285 | 1,084,340 | 1,153,941 | 1,213,358 |
| 25% | 2 | | | | |
| 25% | 3 | 122,903 | 147,701 | 157,182 | 165,275 |
| 50% | 1.5 | 676,714 | 813,255 | 865,456 | 910,019 |
| 50% | 2 | 231,559 | 278,281 | 296,143 | 311,392 |
| 50% | 3 | 92,178 | 110,776 | 117,887 | 123,957 |
Notes:
Assumptions: Binary outcome: Angioedema in 30 days after exposure; Comparator group: Beta blockers; Estimated rate of outcome among comparator group: 3.08/10,000 person-months; Boundary shape: Flat on standardized Z-statistic scale; Power: 90% to detect a given relative risk or risk difference.
Maximum sample size is defined as the number of new ACE inhibitor users that are required to achieve 90% power to detect a specified minimum RR or RD of interest if no signal is detected during the course of a sequential evaluation.
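As a rough aid to reading the table, the sketch below converts a maximum sample size into expected event counts. It rests on assumptions that are not stated in the source: the 30-day risk window is treated as one person-month, the ACE percentage is taken as ACE users divided by all users (ACE plus comparator), and the stated RR is assumed to hold. The function `expected_events` is a hypothetical helper, not part of any published tool.

```python
def expected_events(max_ace_users, ace_fraction, relative_risk,
                    comparator_rate_per_10k_pm=3.08, window_person_months=1.0):
    """Approximate expected angioedema counts at the maximum sample size.

    Assumes each user contributes window_person_months of follow-up
    (30 days treated as one person-month) and that the relative risk
    applies uniformly to ACE inhibitor users.
    """
    total_users = max_ace_users / ace_fraction
    comparator_users = total_users - max_ace_users
    base_risk = comparator_rate_per_10k_pm / 10_000 * window_person_months
    return {
        "comparator_users": comparator_users,
        "comparator_events": comparator_users * base_risk,
        "ace_events": max_ace_users * base_risk * relative_risk,
    }


# First table entry: 25% ACE users, RR = 1.5, maximum of 902,285 ACE users.
for name, value in expected_events(902_285, 0.25, 1.5).items():
    print(f"{name}: {value:,.1f}")
```

Because the outcome is rare, even designs with hundreds of thousands of new users rest on only a few hundred expected events per group, which helps explain why the required sample sizes rise so steeply as the target RR falls toward 1.5.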
Maximum Sample Size for Regression Analyses by Boundary Shape
| # of Analyses | % of total sample who are ACE users | RR | | | |
|---|---|---|---|---|---|
| 8 | 25% | 1.5 | 1,153,941 | 990,736 | 943,715 |
| 8 | 25% | 2 | | | |
| 8 | 25% | 3 | 157,182 | 134,951 | 128,546 |
| 8 | 50% | 1.5 | 865,456 | 743,052 | 707,786 |
| 8 | 50% | 2 | 296,143 | 254,259 | 242,191 |
| 8 | 50% | 3 | 117,887 | 101,214 | 96,410 |
| 16 | 25% | 1.5 | 1,213,358 | 1,003,258 | 951,930 |
| 16 | 25% | 2 | | | |
| 16 | 25% | 3 | 165,275 | 136,657 | 129,666 |
| 16 | 50% | 1.5 | 910,019 | 752,444 | 713,948 |
| 16 | 50% | 2 | 311,392 | 257,472 | 244,300 |
| 16 | 50% | 3 | 123,957 | 102,493 | 97,249 |
Notes:
Assumptions: Binary outcome: Angioedema in 30 days after exposure; Comparator group: Beta blockers; Estimated rate of outcome among comparator group: 3.08/10,000 person-months; Power: 90% to detect a given relative risk or risk difference.
Figure 1. Signaling Thresholds for a Design with Four Analyses
Notes: Assumptions: Binary outcome: Angioedema in 30 days after exposure; Proportion using ACE inhibitors (versus a comparator like beta blockers): 25%; Estimated rate of outcome among comparator group: 3.08/10,000 person-months; Power: 90% to detect a RR=2.
Figure 2. Signaling Thresholds for Designs with 8 (Top) or 16 (Bottom) Analyses
Notes: Assumptions: Binary outcome: Angioedema in 30 days after exposure; Proportion using ACE inhibitors (versus a comparator like beta blockers): 25%; Estimated rate of outcome among comparator group: 3.08/10,000 person-months; Power: 90% to detect a RR=2.