Kanae Togo, Naohiro Yonemoto.
Abstract
Real-world data (RWD) have attracted growing interest in recent years, although they are not new. RWD analytics serve various purposes in medical research: evaluating the effectiveness and safety of medical treatment; epidemiology, such as the incidence and prevalence of disease; burden of disease; quality of life and activities of daily living; medical costs; and so on. RWD research in medicine is a mixture of digital transformation, statistics or data science, public health, and regulatory science. Most articles describing RWD or real-world evidence (RWE) in medical research cover only a portion of these specializations, which might lead to an incomplete understanding of RWD. This article summarizes the overview and challenges of RWD analysis in medical fields from methodological perspectives. As the first step of an RWD analysis, the data source should be understood. The progress of RWD is closely related to digitization, especially of medical administrative data and medical records. Second, the selection of appropriate statistical and epidemiological methods is more critical for an RWD analysis than for randomized clinical trials. This is because RWD contain a greater variety of biases, which should be controlled by balancing the underlying risk between treatment groups. Last, the future of RWD is discussed in terms of overcoming limited data by proxy confounders, using unstructured text data, linking multiple databases, using RWD or RWE for regulatory purposes, and evaluating the value and new aspects that RWD bring to medical research.
Keywords: Bias; Causal inference; Medical database; Observational study; Real world data
Year: 2022 PMID: 35437515 PMCID: PMC9007054 DOI: 10.1007/s42081-022-00156-0
Source DB: PubMed Journal: Jpn J Stat Data Sci ISSN: 2520-8756
Fig. 1 Use of real-world data during clinical development and post-launch in the pharmaceutical industry
Major data sources of secondary data (Nabhan et al., 2019)
| Data source | Description |
|---|---|
| Administrative claims database | A health insurance claim is a request for direct payment or reimbursement for medical services from hospitals, clinics, or pharmacies. Claims data are systematic and well structured. Large claims databases are available in many countries. However, claims are recorded to maximize reimbursement, so the recorded disease name sometimes might not represent the diagnosis in clinical practice |
| EHR database (Evans, | An EHR is an individual patient health record. A typical EHR may include a patient’s medical history, diagnoses, treatment plans, immunization dates, allergies, radiology images, pharmacy records, and laboratory and test results. Although EHR databases are more likely to capture important health information about patients than administrative data, most of that information is unstructured |
| Patient registry | A patient registry is defined as an organized system that collects data and information on a group of people defined by a particular disease or condition, and that serves a pre-determined scientific, clinical and/or public health (policy) purpose (European Medicines Agency, |
| Wearable, sensor | Sensors and/or software apps on smartphones and tablets that can collect health‐related data remotely, i.e., outside of the healthcare provider's office (Izmailova et al., |
Biases in the RWD analysis
| Bias | Description | Example of limitation due to secondary data |
|---|---|---|
| Selection bias | Bias that results from procedures used to select participants and from factors that influence study participation | Insurance claims data based on workers' associations consist of relatively young and healthy people. The population of an EHR database depends on the types of hospitals (Rothman et al., |
| Information bias (Measurement error, misclassification) | Systematic error such as self-reporting or recall bias, measurement error bias, confirmation bias (investigator belief) (Althubaiti, | Diagnosis for reimbursement, validity of defined outcome, misclassification of drug exposure especially time-varying exposure due to limited data |
| Confounding | A confounder is related independently to both the exposure and the outcome. It may create an apparent association or mask a real one (Strom et al., | Limited data on potential confounders |
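
The confounding row above can be made concrete with a small worked example. The following is a minimal sketch, not a method taken from the article: the counts are invented, "severity" is a hypothetical confounder, and direct standardization is used here as one simple way of balancing the underlying risk between treatment groups. Within each severity stratum the treated and untreated risks are identical, yet the crude comparison suggests an association because severe patients are treated more often.

```python
# Hypothetical aggregated counts, invented purely for illustration.
# Key: (severity stratum, treated flag) -> (patients, events)
counts = {
    ("mild", 1): (100, 10),
    ("mild", 0): (400, 40),
    ("severe", 1): (400, 160),
    ("severe", 0): (100, 40),
}


def crude_risk(treated):
    """Risk in one treatment group, ignoring the confounder."""
    patients = sum(n for (_s, t), (n, _e) in counts.items() if t == treated)
    events = sum(e for (_s, t), (_n, e) in counts.items() if t == treated)
    return events / patients


def standardized_risk(treated):
    """Risk in one treatment group, standardized to the severity mix of the whole cohort."""
    total = sum(n for n, _e in counts.values())
    risk = 0.0
    for stratum in sorted({s for s, _t in counts}):
        # Weight each stratum-specific risk by the stratum's share of all patients.
        stratum_size = sum(n for (s, _t), (n, _e) in counts.items() if s == stratum)
        n, e = counts[(stratum, treated)]
        risk += (stratum_size / total) * (e / n)
    return risk


crude_rd = crude_risk(1) - crude_risk(0)
adjusted_rd = standardized_risk(1) - standardized_risk(0)
print(f"crude risk difference:        {crude_rd:.2f}")
print(f"standardized risk difference: {adjusted_rd:.2f}")
```

With these invented counts the crude risk difference is 0.18, while the standardized risk difference is 0.00. The adjustment only works because severity is recorded; when a confounder is missing from the secondary data, as the table notes, no such correction is available.
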
Study designs using the RWD (Baumfeld et al., 2020; Strom et al., 2019)
| Study design | Description | Limitation |
|---|---|---|
| Epidemiological study design | ||
| Cohort studies of new-user design | Patients who start a new drug are identified, and follow-up begins after initiation. Those patients have been evaluated by physicians who concluded that the patients might benefit from a newly prescribed drug. This makes treatment groups highly similar in characteristics that might not have been observed in the database | Secondary data require the assumption that the first use of the drug after a certain period of no drug use is regarded as a new use |
| Nested case–control studies | All cases in the cohort are selected, and then one or more controls are randomly selected from the risk set for each case. The antecedent exposure is compared between cases and controls | Whenever the disease to be investigated is changed, the controls need to be re-selected, as well as the cases. On the other hand, the case–control study can easily consider many exposures |
| Self-controlled studies | This is a design where comparisons between exposures are made within subjects, thus significantly attenuating the problem of confounding | This design can only deal with acute adverse events and exposures with a short wash-out period, and it requires precise exposure data |
| Study designs for regulatory decision making in combination with clinical trials | ||
| External control arm for clinical trials | Participants are selected from an RWD cohort so that their background characteristics are balanced with those of the clinical trial subjects, in order to compare key efficacy or safety outcomes | Differences in unmeasured confounders between the clinical trial arm and the external control remain even after adjusting for all measured confounders |
| Pragmatic studies | A randomized clinical trial incorporating pragmatic design elements. The intervention should be delivered as in clinical practice (Ford & Norrie, | This is most feasible in health care systems with reliable and accessible electronic health records that capture the events of interest, which is at present challenging in many countries |
| Long‐term follow‐up studies | Post‐marketing requirement studies of safety and effectiveness outcomes of interest require longer follow‐up durations | Regulators or companies may prefer RCTs due to feasibility (e.g., level of measurement and/or monitoring) |
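
The new-user assumption in the table above (the first use after a certain period of no drug use counts as new use) can be sketched in code. This is an illustrative sketch only: the 180-day washout, the patient identifiers, the dates, and the function name `new_user_index_dates` are all assumptions made for the example, not values or names from the article. A patient qualifies as a new user if the first recorded dispensing is preceded by at least the washout period of observable enrollment.

```python
from datetime import date

WASHOUT_DAYS = 180  # assumed washout length; the article does not specify one

# Hypothetical enrollment start dates and dispensing records, invented for illustration.
enrollment_start = {
    "A": date(2020, 1, 1),
    "B": date(2020, 1, 1),
    "C": date(2020, 6, 1),
}
dispensings = [
    ("A", date(2020, 9, 1)),
    ("A", date(2020, 10, 1)),
    ("B", date(2020, 2, 1)),
    ("C", date(2021, 1, 15)),
]


def new_user_index_dates(dispensings, enrollment_start, washout_days):
    """Return {patient_id: index_date} for patients who qualify as new users."""
    # The earliest dispensing per patient is the candidate index date.
    first = {}
    for pid, dispensed_on in dispensings:
        if pid not in first or dispensed_on < first[pid]:
            first[pid] = dispensed_on

    index_dates = {}
    for pid, index_date in first.items():
        # Days of observable history without any recorded dispensing before the index date.
        lookback = (index_date - enrollment_start[pid]).days
        if lookback >= washout_days:
            index_dates[pid] = index_date
    return index_dates


new_users = new_user_index_dates(dispensings, enrollment_start, WASHOUT_DAYS)
for pid in sorted(new_users):
    print(pid, new_users[pid].isoformat())
```

In this toy data, patients A and C qualify, while B is excluded because only 31 days of history are observable before the first dispensing, so prior use cannot be ruled out. Follow-up for each qualifying patient would then start from the returned index date.
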