Rona Gini, Miriam C J Sturkenboom, Janet Sultana, Alison Cave, Annalisa Landi, Alexandra Pacurariu, Giuseppe Roberto, Tania Schink, Gianmario Candore, Jim Slattery, Gianluca Trifirò.
Abstract
Although postmarketing studies conducted in population-based databases often contain information on millions of patients, they can still be underpowered if the outcomes or exposures of interest are rare, or if the interest is in subgroup effects. Combining several databases might provide the statistical power needed. A multi-database study (MDS) uses at least two healthcare databases, which are not linked with each other at an individual person level, with analyses carried out in parallel across each database applying a common study protocol. Although many MDSs have been performed in Europe in the past 10 years, there is a lack of clarity on the peculiarities and implications of the existing strategies to conduct them. In this review, we identify four strategies to execute MDSs, classified according to specific choices in the execution: (A) local analyses, where data are extracted and analyzed locally, with programs developed by each site; (B) sharing of raw data, where raw data are locally extracted and transferred without analysis to a central partner, where all the data are pooled and analyzed; (C) use of a common data model with study-specific data, where study-specific data are locally extracted, loaded into a common data model, and processed locally with centrally developed programs; and (D) use of a general common data model, where all local data are extracted and loaded into a common data model, prior to and independent of any study protocol, and protocols are incorporated in centrally developed programs that run locally. We illustrate differences between strategies and analyze potential implications.
Year: 2020 PMID: 32243569 PMCID: PMC7484985 DOI: 10.1002/cpt.1833
Source DB: PubMed Journal: Clin Pharmacol Ther ISSN: 0009-9236 Impact factor: 6.875
Figure 1. Graphical representation of the four strategies. For simplicity, we did not graphically represent the possibility that in strategies A, C, and D analytic datasets or an aggregated version thereof may be shared. It is intended that the data transformation of raw data into the general CDM in strategy D happens independently of a specific study. CDM, common data model.
Comparison between the four strategies to execute a multi‐database study
| Strategy | First step of the strategy | Data extraction | Conversion to CDM | Data transformation and analysis | Level of data sharing | Examples |
|---|---|---|---|---|---|---|
| A: Local analysis | An agreed study protocol | Each site extracts a dataset specific for the study | Not done | Programmed locally by each site, not necessarily shared by design | Anonymized analytic dataset | CNODES, PROTECT, EU PE & PV research network |
| B: Sharing of raw data | As for A | As for A | Not done | Programmed by one site, not necessarily shared by design | Raw extracted data | Collaborative studies on similar countries, or when data are licensed from more than one vendor |
| C: CDM with study‐specific data | As for A | As for A | Once procedures for conversion to a CDM have been programmed, they can be re‐used for subsequent studies to load extracted data into the same CDM | Programmed by one site, existing standard programs can be re‐used, shared with sites | As for A | EU‐ADR, SOS, ARITMO, VAESCO, SOMNIA, EMIF, ADVANCE |
| D: General CDM | Periodically, local data are extracted and loaded to the CDM. An agreed protocol is then the starting point of each individual study. | Each site extracts and converts the entire database periodically; the program that executes the extraction from CDM for each individual study is then programmed centrally | Once procedures for conversion to a CDM have been programmed, they can be re‐used to refresh the data in the CDM | As for C | As for A | Sentinel, OHDSI, no multinational European study completed as yet |
ADVANCE, advancing collaborative vaccine benefits and safety research in Europe; ARITMO, arrhythmogenic potential of drugs; CDM, common data model; CNODES, Canadian Network for Observational Drug Effect Studies; EMIF, European Medical Information Framework; EU PE & PV research network, European Pharmacoepidemiology & Pharmacovigilance research network; EU‐ADR, exploring and understanding adverse drug reactions by integrative mining of clinical records and biomedical knowledge; OHDSI, Observational Health Data Sciences and Informatics; PROTECT, Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium; SOMNIA, Influenza Immunization Assessment; SOS, safety of non‐steroidal anti‐inflammatory drugs; VAESCO, Vaccine Adverse Event Surveillance and Communication.
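
To make the contrast between the strategies more concrete, the following is a minimal Python sketch of strategy C as the table describes it: each site converts its study-specific extract into a common data model, a centrally developed program runs locally, and only an aggregated result leaves the site. The record layout (`CdmRecord`), the function names, the local data formats, and the toy data are all hypothetical illustrations and are not taken from any of the networks listed. Summing counts at the coordinating center is a deliberate simplification of how site-level results might be combined.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CdmRecord:
    """Hypothetical, minimal common data model row: one record per person."""
    person_id: str
    exposed: bool
    outcome: bool


def site_a_to_cdm(rows: Iterable[dict]) -> List[CdmRecord]:
    # Site-specific conversion (assumed layout): values coded as "Y"/"N".
    return [CdmRecord(r["id"], r["drug"] == "Y", r["event"] == "Y") for r in rows]


def site_b_to_cdm(rows: Iterable[tuple]) -> List[CdmRecord]:
    # Different assumed local layout: (id, exposure flag 0/1, outcome flag 0/1).
    return [CdmRecord(str(pid), bool(exp), bool(out)) for pid, exp, out in rows]


def central_program(records: List[CdmRecord]) -> Dict[str, int]:
    """Centrally developed analysis, run locally; returns aggregated counts only."""
    counts = {"exposed_n": 0, "exposed_events": 0,
              "unexposed_n": 0, "unexposed_events": 0}
    for rec in records:
        group = "exposed" if rec.exposed else "unexposed"
        counts[f"{group}_n"] += 1
        counts[f"{group}_events"] += int(rec.outcome)
    return counts


def combine(site_results: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Coordinating center sums the aggregated counts shared by each site."""
    total: Dict[str, int] = {}
    for result in site_results:
        for key, value in result.items():
            total[key] = total.get(key, 0) + value
    return total


# Toy local extracts; person-level rows never leave the site in this sketch.
site_a_raw = [
    {"id": "a1", "drug": "Y", "event": "Y"},
    {"id": "a2", "drug": "Y", "event": "N"},
    {"id": "a3", "drug": "N", "event": "N"},
]
site_b_raw = [(1, 1, 0), (2, 0, 1), (3, 0, 0), (4, 1, 1)]

results = [
    central_program(site_a_to_cdm(site_a_raw)),
    central_program(site_b_to_cdm(site_b_raw)),
]
pooled = combine(results)
print(pooled)
```

In the same vocabulary, strategy A would have each site write its own equivalent of `central_program`, strategy B would send the raw rows to one partner for conversion and analysis, and strategy D would run the `*_to_cdm` conversion on the entire database periodically, before any protocol exists.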
Potential impact of the four strategies on relevant dimensions
| Strategy | Timeliness | Compliance with data regulations | Data processing quality control | Scientific independence and transparency |
|---|---|---|---|---|
| A: Local analysis | Time to protocol agreement: full involvement of the local expertise is supported by design. Time to approval of protocol: depends on each site. Time for local extraction: according to local resources. Time to create study variables and analytic dataset: according to local resources. Time to develop analytic procedures: done in all sites. | Each site has full control over who uses the data and for which purpose | Data extraction quality: does not allow common data quality checks. Data transformation quality: difficult to check. | Transparency: is not necessarily ensured by design. Scientific independence: control by local researchers is complete. |
| B: Sharing of raw data | Time to protocol agreement: full involvement of the local expertise is not ensured by this design, which may speed up the process. Time to approval of protocol: as for A. Time for local extraction: as for A. Time to create study variables and analytic dataset: lengthy when one site needs to harmonize all the different data. Time to develop analytic procedures: depends on whether programs are available or easily adapted, or study‐specific programs must be developed. | Local sites retain minimum control of data re‐use after data sharing: a specific agreement is needed between data controller and data processors | Data extraction quality: as for A. Data transformation quality: depends on the quality of a single site, full involvement of the local expertise is not ensured by design. | Transparency: as for A. Scientific independence: control of local researchers may be minimal, researchers from the coordinating center have control. |
| C: CDM with study‐specific data | Time to protocol agreement: as for B. Time to approval of protocol: as for A. Time for local extraction: as for A. Time to load the data into the CDM: depends on whether there is previous experience with the same CDM. Time to create study variables and analytic dataset: depends on whether standard tools are used (fast) or a specific refined strategy is developed (longer). Time to develop analytic procedures: as for B. | Local sites can require that the type of data to be shared (pseudonymized analytic dataset, aggregated data, or final results) complies with their local obligations/laws | Data extraction quality: checks may apply for each database. Data conversion quality: quality framework can be put in place to ensure accurate data conversion into the CDM. Data transformation quality: can be checked by all the partners. | Transparency: among partners is partly ensured by design, although not necessarily clear to the external audience. Scientific independence: control of local researchers is high. |
| D: General CDM | Time to protocol agreement: as for B. Time to approval of protocol: as for A. Time for local extraction: centralized as it builds on the CDM. Time to load the data into the CDM: happens periodically, prior to the study. Time to create study variables and analytic dataset: as for C. Time to develop analytic procedures: as for B. | As for C, except for some countries where strategy D cannot be applied because data cannot be accessed independently of a protocol | Data conversion quality: as for C. Data extraction quality: checks may apply centrally. Data transformation quality: as for C. | Transparency: as for C. Scientific independence: control of local researchers is high, but there is a dependence on a source of funding to periodically extract, transform, and load data into a CDM. |