
Considerations in characterizing real-world data relevance and quality for regulatory purposes: A commentary.

Cynthia J Girman1, Mary E Ritchey2, Wei Zhou3, Nancy A Dreyer4.   

Abstract


Year:  2018        PMID: 30515910      PMCID: PMC6718007          DOI: 10.1002/pds.4697

Source DB:  PubMed          Journal:  Pharmacoepidemiol Drug Saf        ISSN: 1053-8569            Impact factor:   2.890


Guidance on use of RWD for regulatory purposes should outline circumstances and level of acceptable validation/reliability for outcome, population, and exposure definitions given the research question and intended regulatory use. Representativeness of a population of interest for RWD should be considered in the context of specific research questions for regulatory purposes. Producing interpretable results necessitates valid, relevant research definitions for disease, outcomes, and treatments. A framework for assessing the utility and usefulness of RWD with respect to a specific research question and the intended regulatory purpose is needed for use by agencies, companies, and researchers around the world.

INTRODUCTION

The 21st Century Cures Act of 2016 provided a framework to the US Food and Drug Administration (FDA) to rapidly move treatments to patients.1 The increased acceptability of real‐world data (RWD) sources allows for innovative ways to study products and has the potential to reduce trial costs. Published papers provide guidance regarding data quality issues, reproducibility, and validity assessment.2 The rapid evolution of electronic health records (EHRs) encourages greater consideration of their use in research.1, 2, 3, 4, 5, 6 For years, the FDA has relied on epidemiological studies of postapproval product safety using RWD5, 6 (eg, administrative claims and EHR) and for device effectiveness studies4; however, regulatory use for evaluating drug effectiveness has been rare. As part of the Prescription Drug User Fee Act (PDUFA VI),3 use of RWD is being considered for potential contributions to evaluating effectiveness and safety of new indications for approved products and to satisfy postapproval study requirements.

Recently, the Duke Margolis Center for Health Policy held workshops and issued two papers on this topic.5, 6 The first paper focused on defining RWD as data routinely collected pertinent to patient health status and/or delivery of care, and on the use of RWD in regulatory and clinical contexts.5 The second white paper, from the October 1, 2018, workshop, focused on data relevancy and quality, including cleaning, transforming, and linking RWD to characterize RWD sources as “fit for regulatory purpose.”6 These papers offer a practical, commonsense, high‐level view of primary data and methods considerations for RWD use from a regulatory perspective, facilitating discussion around regulatory uses of RWD within the research community and industry. However, salient points are missing from the papers and from the RWD discussions among FDA, researchers, and industry.
Here, we provide a commentary on the data considerations discussed in the white papers and highlight pertinent considerations with respect to RWD in the context of whether data are relevant, representative, and robust.

Data relevance

The recent white paper defines data relevance dimensions including representativeness of the population of interest, critical data field availability, accurate linking at the patient level with multiple data sources, and adequate sample size and follow‐up time to demonstrate expected treatment effects.6 Guidance from FDA on how to ensure RWD are fit for purpose and adequate to support regulatory decisions would be helpful on each dimension. Determining whether RWD are fit for regulatory purpose is a “contextual exercise” in which the specific research question, regulatory use, and data characteristics drive what meaningful conclusions can be drawn.6 Covariates may be critical for one research question but not another. Exposures and outcomes should be well defined when part of the research question but may not be critical for natural history studies. There is no “one‐size‐fits‐all” approach, and critical data components should be evaluated for each research question and regulatory use.7 A framework is needed to guide choice and evaluation of critical data elements for specific research questions for regulatory use.

Representativeness of the population of interest is gauged in many ways. Recent FDA guidance on Patient Focused Drug Development suggests a statistical sampling approach be used to obtain patient experience data representative of the target population.8 However, most US real‐world databases use administrative claims or EHR for patients seeking medical attention. These RWD sources should be considered broadly representative of the population eligible for using most, if not all, new products and services. “Representativeness” should be assessed broadly in the context of likely product users with some diversity in geography, health status, and health care system, as appropriate for the specific research question and regulatory context.
While data linkage is likely to limit the eligible sample, it may be needed to increase the informative nature of RWD, especially with increasing evaluations to support precision medicine. Sample size should be derived based on anticipated treatment effects for studies of treatment effectiveness or safety, whether comparative or not, to ensure appropriate precision of estimates. For rare diseases, there should be flexibility given data sparseness worldwide, as indicated in the FDA guidance on rare disease.8 Additional guidance would be useful regarding how “accurate linking” should be assessed since linking 100% of patients with administrative claims and EHR is impractical. Would FDA accept limited linked data if it were supplemental to cruder variables in the full dataset? Would a subset of 60% be adequate? In the context of probabilistic linkage, what level of certainty would constitute adequate linkage? Salience of linkable individuals to the specific research question should be considered in this determination, and pre‐specified sensitivity analyses should help assess robustness of results and conclusions.9, 10
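The probabilistic-linkage questions above can be made concrete. Below is a minimal sketch of Fellegi-Sunter match weights, with hypothetical m- and u-probabilities for three illustrative fields; the acceptance threshold and the resulting linkage rate are exactly the quantities a pre-specified linkage plan could report and justify.

```python
import math

# Hypothetical m- and u-probabilities for three illustrative linkage fields
# (m = P(field agrees | records are a true match),
#  u = P(field agrees | records are not a match)).
FIELDS = {
    "birth_year": (0.98, 0.05),
    "sex":        (0.99, 0.50),
    "zip3":       (0.95, 0.01),
}

def match_weight(agreements):
    """Fellegi-Sunter log2 match weight for one candidate record pair.

    `agreements` maps field name -> True/False for whether the pair
    agrees on that field. Agreement adds log2(m/u); disagreement adds
    log2((1-m)/(1-u))."""
    w = 0.0
    for field, (m, u) in FIELDS.items():
        if agreements[field]:
            w += math.log2(m / u)
        else:
            w += math.log2((1 - m) / (1 - u))
    return w

def linkage_rate(weights, threshold):
    """Share of candidate pairs accepted as links at a given weight
    threshold; reporting this rate alongside the threshold makes the
    'what level of certainty is adequate' question auditable."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w >= threshold) / len(weights)
```

Pairs agreeing on all three fields score strongly positive; pairs disagreeing on all three score strongly negative, so the threshold trades off false links against lost sample, which is the tension the questions above describe.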

Data quality

Data quality should be considered in terms of validity, conformance, plausibility, and consistency.11 The acceptability of various degrees of accuracy and completeness depends on the specific research question and regulatory purpose. The white paper refers to data verification procedures, minimizing missing data, and consistency with source records, steps that are often impractical given the anonymized nature of accessible data. RWD have proven valuable for specific purposes despite known limitations, when due attention is given to the adequacy of data elements, study design, and analysis. RWD used to support regulatory decisions must be of sufficient quality to ensure that they can be transformed into adequate and well‐controlled real‐world evidence. Evaluations of data quality should focus on fit‐for‐purpose design and methods, applying sensitivity analyses to support robustness and interpretation.9, 10 It is highly desirable to use a set of validated codes or algorithms (computable phenotypes) for critical fields, depending on study purpose. Decades of validation work in administrative claims have evaluated such algorithms relative to manual chart review.12 Now that the chart and the data for research may be the same (ie, EHR), we need to understand how and when such validation should be conducted.12 Even if all available processes and SOPs for cleaning, transforming, and linkage are followed, overall data adequacy in the context of study and regulatory purpose should be assessed, preferably by a researcher experienced with RWD sources for regulatory decision making.

Missing data should be considered in the context of their impact on the validity and generalizability of results. Whereas follow‐up data can be critical for certain purposes, such as use of RWD as a comparative arm or concurrent/historic control group, missingness may be less critical for other purposes (eg, missing health outcomes may be less likely to affect results of a product utilization study than an outcomes study).
That said, US RWD sources often are systematically missing follow‐up data due to turnover in health insurance plans and the US health care system's transient nature. Thus, a key consideration for any real‐world evidence research question is how much systematic loss of follow‐up data or other missing data would influence study conclusions.
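One common form of the pre-specified sensitivity analysis recommended above is a simple quantitative bias analysis over a range of assumed positive predictive values (PPVs) for an outcome-defining algorithm. A minimal sketch, assuming negligible false negatives and using hypothetical counts:

```python
def ppv_corrected_risk(observed_cases, n, ppv):
    """Risk after removing expected false-positive outcomes, assuming a
    (hypothetical) validated PPV for the outcome algorithm and negligible
    false negatives, a deliberately simple correction."""
    return observed_cases * ppv / n

def rr_sensitivity(cases_exp, n_exp, cases_ref, n_ref, ppv_values):
    """Risk ratio recomputed over a range of assumed PPVs.

    When outcome misclassification is nondifferential (same PPV in both
    arms), the PPV cancels out of the ratio; sweeping a range of PPVs
    makes that robustness, or its absence, explicit."""
    results = {}
    for ppv in ppv_values:
        risk_exp = ppv_corrected_risk(cases_exp, n_exp, ppv)
        risk_ref = ppv_corrected_risk(cases_ref, n_ref, ppv)
        results[ppv] = risk_exp / risk_ref
    return results
```

With hypothetical counts of 120/10,000 exposed vs 80/10,000 reference cases, the corrected risk ratio is unchanged across the PPV range, illustrating why nondifferential outcome misclassification is less threatening to relative estimates than differential misclassification would be.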

Research framework

A fit‐for‐purpose framework starts with a well‐defined research question and an assessment of relevance and quality of specific critical data elements within the RWD source (Table 1). This might include assessing whether the population, outcomes, and treatments, as part of the PICOT definition13 of a well‐defined research question, can be validly and reliably defined using structured data (eg, diagnosis and procedure codes, laboratory tests, and pharmacy data) contained in RWD. If the critical data elements for a specific research objective can be defined in the RWD source, researchers might consider sample size and follow‐up time given the expected effect size, whether validation is needed for critical data elements, and what level of missing data can be tolerated (Table 2), given the specific research question and regulatory use. With data linkage, these considerations would be applicable to the separate data sources and the linked data.
Table 1

Type of structured data in RWD sources possibly needed to define elements of research question, depending on research question

What is the research question?
In specifying the research question, include the relevant data elements13 such as population, intervention and comparator (treatments), and outcome, as applicable (eg, to assess effects of intervention compared with comparator treatment on the incidence of outcome over 2 years (timing) in a population of patients with disease).
For Research Involving This Data Element | Type of Structured Data in RWD Sources Possibly Needed to Define Data Elements*
Population | Diagnosis codes; procedure codes; laboratory values; pharmacy data (rarely)
Intervention and comparator (treatments: drug, biologic, or medical device) | Pharmacy data; procedure codes
Outcome | Diagnosis codes; procedure codes; laboratory values

Abbreviation: RWD, real‐world data.

* Specific type of data for each research data element depends on the research question.
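As an illustration only, the table's mapping from research-question elements to structured data can be sketched as code-list checks over a claims extract. All code lists below are hypothetical placeholders, not the validated algorithms (computable phenotypes) the text calls for:

```python
# Hypothetical, illustrative code lists; a real study would use validated
# computable phenotypes for each element.
POPULATION_DX = {"E11.9"}          # illustrative ICD-10 diagnosis code
TREATMENT_RX = {"00002-1433-80"}   # illustrative pharmacy (NDC-style) code
OUTCOME_DX = {"I21.9"}             # illustrative ICD-10 diagnosis code

def in_population(patient):
    """Patient enters the study population if any diagnosis record
    matches the population code list."""
    return any(dx in POPULATION_DX for dx in patient["diagnoses"])

def exposed(patient):
    """Exposure defined from pharmacy data, per the table."""
    return any(rx in TREATMENT_RX for rx in patient["pharmacy"])

def had_outcome(patient):
    """Outcome defined from diagnosis codes, per the table."""
    return any(dx in OUTCOME_DX for dx in patient["diagnoses"])

def classify(patients):
    """Count the population, exposed, and outcome elements in an extract,
    the inputs to the feasibility checks in Table 2."""
    cohort = [p for p in patients if in_population(p)]
    return {
        "n_population": len(cohort),
        "n_exposed": sum(exposed(p) for p in cohort),
        "n_outcome": sum(had_outcome(p) for p in cohort),
    }
```

The point of the sketch is structural: each PICOT element resolves to a specific kind of structured data, and each code list is a separate artifact whose validity must be assessed for the specific research question.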

Table 2

Considerations for choosing RWD sources for research studies

Key Considerations
Adequate sample size

RWD addresses the scientific question with sufficient confidence.

There are sufficient persons, follow‐up time, and relevant observations to address the scientific question.

Absent specific feasibility numbers, the crude prevalence can be applied to the total person‐lives in the database to crudely estimate sample size (without applying entry criteria).

Research data element definitions and validation

Essential data elements are coded consistently in the RWD health care system (codes capture the research data fields, eg, disease, outcome, treatment, critical covariates, if relevant, adequately).

Systematic errors (eg, downcoding or upcoding) in the study population and essential data element definitions are identified and minimized and pre‐specified sensitivity analyses can assess potential impact, if possible.

Definitions for essential data elements (eg, population and outcome) are unlikely to result from “screening” or “rule out” of a specific diagnosis in clinical practice.

Needed coding algorithms (eg, computable phenotypes) are available and validated for essential data elements.

If additional validation is needed, given the research purpose and regulatory decision, then it can be performed within the data source.

Covariates or confounders are available that are critical to the research question.

(If needed) variables that correlate highly with key missing confounders are available and can be used instead.

Missingness and completeness

Consideration has been made regarding essential elements of the research question that may be systematically missing, due to patients seeking care out of network or changes in insurance coverage, and regarding whether the outcome can be captured reliably over time within the RWD source.

Level of systematic error will not substantially affect study interpretation.

Discrepancies between different sources of linked data (claims and EHR) for the data elements needed for specific research question will not affect interpretation of the study results.

In combining data from multiple health care systems, different coverage policies or benefit designs do not affect ability to address the research question.

Abbreviation: RWD, real‐world data.

Preliminary data extraction may be performed to crudely determine the number of patients and median follow‐up time in the specific RWD source. Very small effect sizes may be difficult to address with precision in RWD sources due to the potential for bias; research with larger expected effect sizes can often be addressed with RWD, with careful attention to appropriate design and methods. At a very high level, one can apply the crude estimate of disease or exposure prevalence (whichever is smaller) to the number of lives covered in a database to better understand adequate sample size.

A framework to assess usefulness of RWD in the context of specific research questions and intended regulatory purpose, along with published reporting guidelines,9, 14, 15 could significantly help identify major components of well‐designed studies in RWD to support specific product effectiveness and safety research questions for regulatory purposes.
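The back-of-envelope feasibility arithmetic described above, paired with a standard two-proportion sample-size check against the expected effect, might look like the following sketch (all numbers illustrative):

```python
import math
from statistics import NormalDist

def crude_available_n(person_lives, disease_prev, exposure_prev):
    """Crude feasibility count from the text: apply the smaller of
    disease or exposure prevalence to total covered lives, before
    applying any entry criteria."""
    return int(person_lives * min(disease_prev, exposure_prev))

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Standard two-proportion sample size (normal approximation) per
    arm, to compare the expected effect against what the database can
    plausibly supply."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)
```

For example, a 2% disease prevalence applied to 10 million covered lives yields roughly 200,000 potentially eligible patients before entry criteria, which can then be weighed against the per-arm requirement for the anticipated effect size; small expected effects push the required sample, and the sensitivity to bias, sharply upward.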

CONCLUSIONS

Recent papers on use of RWD for regulatory purposes have initiated discussions among regulators, industry, and researchers on practical considerations of RWD relevance and quality. Beyond the availability of data fields, valid definitions of the components of research questions are crucial. More guidance is needed on what constitutes acceptable evidence of validation for critical data elements given the clinical research question and intended regulatory use. Besides FDA, other agencies are also exploring the appropriate use of RWD in regulatory decisions. Understanding how to use RWD and whether they are “fit for purpose” is helpful for regulatory agencies, industry, and researchers around the world.

1.  GRACE principles: recognizing high-quality observational studies of comparative effectiveness.

Authors:  Nancy A Dreyer; Sebastian Schneeweiss; Barbara J McNeil; Marc L Berger; Alec M Walker; Daniel A Ollendorf; Richard E Gliklich
Journal:  Am J Manag Care       Date:  2010-06       Impact factor: 2.229

2.  Standards and guidelines for observational studies: quality is in the eye of the beholder.

Authors:  Sally C Morton; Monica R Costlow; Jennifer S Graff; Robert W Dubois
Journal:  J Clin Epidemiol       Date:  2015-11-05       Impact factor: 6.437

3.  Pre-study feasibility and identifying sensitivity analyses for protocol pre-specification in comparative effectiveness research.

Authors:  Cynthia J Girman; Douglas Faries; Patrick Ryan; Matt Rotelli; Mark Belger; Bruce Binkowitz; Robert O'Neill
Journal:  J Comp Eff Res       Date:  2014-05       Impact factor: 1.744

4.  Consensus of recommendations guiding comparative effectiveness research methods.

Authors:  Jacob B Morton; Robert McConeghy; Kirstin Heinrich; Nicolle M Gatto; Aisling R Caffrey
Journal:  Pharmacoepidemiol Drug Saf       Date:  2016-06-30       Impact factor: 2.890

5.  The well-built clinical question: a key to evidence-based decisions.

Authors:  W S Richardson; M C Wilson; J Nishikawa; R S Hayward
Journal:  ACP J Club       Date:  1995 Nov-Dec

6.  Misclassification in administrative claims data: quantifying the impact on treatment effect estimates.

Authors:  Michele Jonsson Funk; Suzanne N Landi
Journal:  Curr Epidemiol Rep       Date:  2014-12

7.  A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data.

Authors:  Michael G Kahn; Tiffany J Callahan; Juliana Barnard; Alan E Bauck; Jeff Brown; Bruce N Davidson; Hossein Estiri; Carsten Goerg; Erin Holve; Steven G Johnson; Siaw-Teng Liaw; Marianne Hamilton-Lopez; Daniella Meeker; Toan C Ong; Patrick Ryan; Ning Shang; Nicole G Weiskopf; Chunhua Weng; Meredith N Zozus; Lisa Schilling
Journal:  EGEMS (Wash DC)       Date:  2016-09-11

8.  Harnessing the Power of Real-World Evidence (RWE): A Checklist to Ensure Regulatory-Grade Data Quality.

Authors:  Rebecca A Miksad; Amy P Abernethy
Journal:  Clin Pharmacol Ther       Date:  2017-12-06       Impact factor: 6.875

9.  Reporting to Improve Reproducibility and Facilitate Validity Assessment for Healthcare Database Studies V1.0.

Authors:  Shirley V Wang; Sebastian Schneeweiss; Marc L Berger; Jeffrey Brown; Frank de Vries; Ian Douglas; Joshua J Gagne; Rosa Gini; Olaf Klungel; C Daniel Mullins; Michael D Nguyen; Jeremy A Rassen; Liam Smeeth; Miriam Sturkenboom
Journal:  Pharmacoepidemiol Drug Saf       Date:  2017-09       Impact factor: 2.890

