Daniel Dedman, Melissa Cabecinha, Rachael Williams, Stephen J W Evans, Krishnan Bhaskaran, Ian J Douglas.
Abstract
OBJECTIVE: To identify observational studies which used data from more than one primary care electronic health record (EHR) database, and summarise key characteristics including: objective and rationale for using multiple data sources; methods used to manage, analyse and (where applicable) combine data; and approaches used to assess and report heterogeneity between data sources.
Keywords: epidemiology; public health; statistics & research methods
Year: 2020 PMID: 33055114 PMCID: PMC7559041 DOI: 10.1136/bmjopen-2020-037405
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Selection and inclusion of studies for systematic review (adapted from Moher et al [70]).
General characteristics of included studies, including objective, rationale and study design
| Characteristic | Study type: analytical | Study type: descriptive | Study type: other* | All | % |
| Number of studies | 62 | 45 | 2 | 109 | |
| Study objective | |||||
| Drug safety | 37 | 5 | | 42 | 38.5 |
| Drug utilisation | 3 | 24 | 1 | 28 | 25.7 |
| Disease epidemiology | 7 | 9 | | 16 | 14.7 |
| Disease risk prediction | 5 | 2 | | 7 | 6.4 |
| Drug comparative effectiveness | 6 | | | 6 | 5.5 |
| Methodology/data quality | 2 | 3 | 1 | 6 | 5.5 |
| Health services research | 2 | 2 | | 4 | 3.7 |
| Main rationale for using multiple data sources (stated or inferred) | |||||
| Describe trends and variation between countries or settings | 2 | 28 | | 30 | 27.5 |
| Increase study power | 24 | 2 | | 26 | 23.9 |
| Examine consistency of findings in different settings (using a standardised approach or common study protocol) | 19 | 3 | | 22 | 20.2 |
| Compare availability/quality of data in each source | 2 | 7 | 2 | 11 | 10.1 |
| Validation of findings in a second data source | 5 | 3 | | 8 | 7.3 |
| Not clearly stated | 10 | 2 | | 12 | 11.0 |
| Databases per study: primary care EHR only | |||||
| Mean | 2.4 | 2.9 | 2.5 | 2.6 | |
| Median (range) | 2 (2–4) | 3 (2–5) | 2.5 (2–3) | 2 (2–5) | |
| Databases per study: all types | |||||
| Mean | 3.1 | 4.3 | 5.0 | 3.7 | |
| Median (range) | 2 (2–8) | 4 (2–8) | 5 (2–8) | 3 (2–8) | |
| Database setting | |||||
| Single country | 25 | 9 | 1 | 35 | 31.8 |
| Multi-country | 37 | 36 | 1 | 75 | 68.2 |
| Study design† | |||||
| Cohort study | 33 | 38 | 1 | 72 | 66.1 |
| Case–control study | 23 | 0 | 0 | 23 | 21.1 |
| Cross-sectional | 1 | 6 | 1 | 8 | 7.3 |
| Self-controlled designs | 7 | 0 | 0 | 7 | 6.4 |
| Other | 0 | 1 | 1 | 2 | 1.8 |
| Interrupted time series | 1 | 0 | 0 | 1 | 0.9 |
| Data management and analysis model (stated or inferred) | |||||
| Centralised management and analysis: raw data shared | 32 | 11 | 1 | 44 | 40.4 |
| Distributed management and analysis: aggregated results shared | 11 | 12 | | 23 | 21.1 |
| Distributed management + centralised analysis: patient-level or partially aggregated data shared | 11 | 8 | 1 | 20 | 18.3 |
| Not described | 8 | 14 | | 22 | 20.2 |
| Study drug (ATC chapter) | |||||
| Nervous system | 11 | 8 | | 19 | 17.4 |
| Respiratory system | 9 | 5 | | 14 | 12.8 |
| Musculoskeletal system | 9 | 3 | | 12 | 11.0 |
| Multiple categories | 7 | 4 | | 11 | 10.1 |
| Alimentary tract and metabolism | 6 | 3 | 1 | 10 | 9.2 |
| Antiinfectives for systemic use | 4 | 4 | | 8 | 7.3 |
| Cardiovascular system | 2 | 2 | | 4 | 3.7 |
| Genito urinary system and sex hormones | 2 | 1 | | 3 | 2.8 |
| Blood and blood forming organs | | 2 | | 2 | 1.8 |
| Dermatologicals | | 1 | | 1 | 0.9 |
| N/A | 12 | 12 | 1 | 25 | 22.9 |
| Study condition (ICD-10 chapter) | |||||
| Diseases of the circulatory system (I00–I99) | 15 | 4 | | 19 | 17.4 |
| Diseases of the respiratory system (J00–J99) | 11 | 4 | | 15 | 13.8 |
| Diseases of the digestive system (K00–K95) | 9 | 4 | | 13 | 11.9 |
| Multiple categories | 3 | 7 | | 10 | 9.2 |
| Endocrine, nutritional and metabolic diseases (E00–E89) | 4 | 3 | 2 | 9 | 8.3 |
| Injury, poisoning and certain other consequences of external causes (S00–T88) | 6 | 1 | | 7 | 6.4 |
| Neoplasms (C00–D49) | 5 | 1 | | 6 | 5.5 |
| Diseases of the musculoskeletal system and connective tissue (M00–M99) | 4 | 1 | | 5 | 4.6 |
| Diseases of the nervous system (G00–G99) | 2 | 3 | | 5 | 4.6 |
| Pregnancy, childbirth and the puerperium (O00–O9A) | 1 | 3 | | 4 | 3.7 |
| Certain infectious and parasitic diseases (A00–B99) | 2 | 2 | 1.8 | ||
| Diseases of the skin and subcutaneous tissue (L00–L99) | 2 | 2 | 1.8 | ||
| Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50–D89) | 1 | 1 | 0.9 | ||
| Diseases of the genitourinary system (N00–N99) | 1 | 1 | 0.9 | ||
| Health status, including morbidity and/or mortality | 1 | 1 | 0.9 | ||
| Mental, behavioural and neurodevelopmental disorders (F01–F99) | 1 | 1 | 0.9 | ||
| N/A | 8 | 8 | 7.3 | ||
*Other study type category included: case definition validation by chart review (one study) and prescribing data quality assessment (one study).
†Six studies included multiple designs and were therefore counted in each relevant category: case–control and cohort (three studies); case-crossover and self-controlled case series (SCCS) (two studies); and cohort study and SCCS (one study).
ATC, Anatomical Therapeutic Chemical Classification; EHR, electronic health record; ICD-10, International Classification of Diseases 10th Revision.
Methodological aspects of analytical studies (N=62)
| Characteristic (by study design*) | Case–control studies | Cohort studies | Self-controlled studies | Other† | All | % |
| All studies | 23 | 34 | 7 | 2 | 62 | |
| Statistical methods‡ | ||||||
| Logistic regression | 23 | 8 | 2 | 1 | 34 | 54.8 |
| Poisson regression | 6 | 6 | 12 | 19.4 | ||
| Cox regression | 18 | 18 | 29.0 | |||
| Other§ | 9 | 1 | 1 | 11 | 17.7 | |
| Confounder control‡ | ||||||
| Multiple regression or Mantel-Haenszel test | 23 | 32 | 2 | 55 | 88.7 | |
| Matching | 23 | 9 | 1 | 29 | 46.8 | |
| Case only/self-controlled design | 7 | 7 | 11.3 | |||
| Propensity scores | 3 | 3 | 4.8 | |||
| Instrumental variables | 2 | 2 | 3.2 | |||
| None | 1 | 1 | 1.6 | |||
| Database comparisons/heterogeneity assessment‡ | ||||||
| Participant characteristics presented for each database | 17 | 24 | 4 | 2 | 45 | 72.6 |
| Effect estimates presented for each database | 18 | 19 | 5 | 2 | 41 | 66.1 |
| Formal test of effect heterogeneity | 10 | 4 | 3 | 17 | 27.4 | |
| I² | 6 | 3 | 3 | 12 | 19.4 | |
| Cochran’s Q | 2 | 1 | 3 | 4.8 | ||
| Other or not specified | 3 | 3 | 4.8 | |||
| No database comparisons (combined effect estimates only) | 5 | 11 | 2 | 17 | 27.4 | |
| Method for combining data or results‡ | ||||||
| Data not combined | 3 | 16 | 3 | 2 | 22 | 35.5 |
| Meta-analysis (two-stage) | 15 | 4 | 3 | 21 | 33.9 | |
| Random effects | 10 | 2 | 2 | 13 | 21.0 | |
| Fixed effects | 7 | 3 | 2 | 13 | 21.0 | |
| Method not specified | 1 | 1 | 1.6 | |||
| Pooled analysis (one-stage) | 12 | 15 | 3 | 29 | 46.8 | |
| Multiple: one-stage and two-stage | 7 | 1 | 2 | 10 | 16.1 | |
*Six studies contributed to multiple categories because they included multiple designs: case–control and cohort (three studies); case‐crossover and self‐controlled case series (SCCS) (two studies) and cohort study and SCCS (one study).
†One cross-sectional study and one interrupted time series study.
‡A single study could be included in more than one category.
§Other statistical methods included: negative binomial regression (two studies); Mantel-Haenszel test (two studies); two-stage instrumental variable (IV) models (two studies); 'data-mining methods' (two studies); generalised linear models (one study) and univariate tests (one study).
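The formal heterogeneity measures listed in the table (Cochran's Q and I²) can be computed directly from per-database effect estimates and their standard errors. The following is a minimal Python sketch under stated assumptions, not the method of any included study: the function name `heterogeneity` is hypothetical, estimates are assumed to be on the log scale, and the example relative risks are invented purely for illustration.

```python
import math


def heterogeneity(log_estimates, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a percentage).

    log_estimates: per-database effect estimates on the log scale (e.g. log RR)
    std_errors:    standard errors of those log estimates
    """
    if len(log_estimates) != len(std_errors) or len(log_estimates) < 2:
        raise ValueError("need matching inputs from at least two databases")

    # Inverse-variance weights, as used in a fixed-effect meta-analysis.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical relative risks from three databases (illustrative values only).
example_rr = [1.20, 1.45, 0.95]
example_se = [0.10, 0.15, 0.20]  # standard errors of log(RR), also invented
q, df, i2 = heterogeneity([math.log(rr) for rr in example_rr], example_se)
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

With only two to five databases per study, as was typical in the review, Q has low power and I² is imprecise, so these values are best read alongside the database-specific estimates rather than in place of them.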
Figure 2. Comparison of relative risk (RR) estimates reported in studies using two or more methods to combine data from multiple sources. [study] refers to the review database id number (see online supplemental table S1).
Recommendations
| Recommendation | Rationale |
| Studies should report clearly on all aspects of study design and conduct which impact on harmonisation of analyses across data sources. | Allows assessment of the relative importance of heterogeneity induced by data management and analysis decisions vs heterogeneity inherent in the data. |
| Participant characteristics and effect estimates (where applicable) should be reported for each data source. | Assessment of heterogeneity is essential for interpretation, but formal methods for quantifying heterogeneity are inefficient and possibly biased in multi-database settings. |
| Where one-stage methods are used, studies should report whether and how analyses accounted for clustering and between-database heterogeneity. | Interpretation requires understanding of extent to which heterogeneity might influence study results. |
| Where two-stage meta-analysis is used, studies should provide a clear rationale for choice of fixed effect (FE), random effects (RE) or other model. | Interpretation requires understanding of extent to which heterogeneity might influence study results. |
| Sensitivity analyses should include alternative methods for combining data. | Comparing the results of one-stage vs two-stage analyses, or FE vs RE models, provides information about potential impact of modelling assumptions. |
| Further research is needed to compare performance of one-stage and two-stage approaches for multi-database studies. | Relatively few studies have specifically addressed meta-analysis for multi-database studies. |
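The sensitivity analysis recommended above, comparing fixed effect (FE) and random effects (RE) models in a two-stage meta-analysis, might be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the DerSimonian-Laird estimator of between-database variance is an assumption made here (the review does not prescribe an estimator), and the input values are invented.

```python
import math


def fixed_effect(log_estimates, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects_dl(log_estimates, std_errors):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    Returns (pooled estimate, standard error, tau^2).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    fe, _ = fixed_effect(log_estimates, std_errors)

    # Method-of-moments estimate of between-database variance, floored at 0.
    q = sum(w * (y - fe) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each database by its within- plus between-database variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_estimates)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


# Hypothetical log relative risks from three databases (illustrative only).
log_rr = [math.log(1.20), math.log(1.45), math.log(0.95)]
se = [0.10, 0.15, 0.20]

fe_est, fe_se = fixed_effect(log_rr, se)
re_est, re_se, tau2 = random_effects_dl(log_rr, se)
print(f"FE RR = {math.exp(fe_est):.2f} (SE of log RR {fe_se:.3f})")
print(f"RE RR = {math.exp(re_est):.2f} (SE of log RR {re_se:.3f}), tau^2 = {tau2:.4f}")
```

When the estimated between-database variance is zero the two models coincide; when it is positive, the RE model gives smaller databases relatively more weight and a wider interval, which is the kind of difference the recommendation asks studies to report.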