Isabel Fortier1, Parminder Raina2, Edwin R Van den Heuvel3, Lauren E Griffith2, Camille Craig1, Matilda Saliba1, Dany Doiron1, Ronald P Stolk4, Bartha M Knoppers5, Vincent Ferretti6, Peter Granda7, Paul Burton8.
Abstract
Background: Data harmonization is widely accepted as crucial: in its absence, the co-analysis of major tranches of high-quality extant data is liable to inefficiency or error. However, despite its widespread practice, no formalized or systematic guidelines exist to ensure high-quality retrospective data harmonization.
Keywords: Data harmonization; data integration; data processing; individual participant data; meta-analysis; retrospective harmonization
Year: 2017 PMID: 27272186 PMCID: PMC5407152 DOI: 10.1093/ije/dyw075
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Figure 1. Flow chart describing selection of harmonization initiatives from literature search and references from key informants.
General characteristics of the harmonization initiatives surveyed
| Initiative (ref) | Countries | Number of studies | Study designs | Main topics |
|---|---|---|---|---|
| AirPROM | International | 4 | Cohort; Registry | Asthma and chronic pulmonary obstructive diseases |
| APCSC | International | 44 | Cohort | Cardiovascular risk factors and stroke, coronary heart disease and total cardiovascular diseases |
| BioSHaRE | International | 8 | Cohort; Cross-sectional | Metabolic risk factors and obesity |
| CHANCES | International | 15 | Cohort; Repeated cross-sectional | Cardiovascular diseases, diabetes mellitus, cancer, fractures and cognitive impairment |
| CLESA | International | 6 | Cohort | Predictors of institutionalization, hospitalization and mortality |
| CLOSER | UK | 9 | Cohort; Panel | Broad topics (interdisciplinary research across longitudinal studies) |
| COSMIC | International | 19 | Cohort | Cognitive measures and dementia |
| DYNOPTA | Australia | 9 | Cohort | Cognitive measures, dementia and functional disabilities |
| ENGAGE | International | 36 | Cohort; Cross-sectional | Cardiometabolic traits |
| ENRIECO | International | 19 | Cohort | Environmental risk factors in pregnancy and early childhood |
| EPIC | International | 23 | Cohort | Cancer and chronic diseases |
| EPOSA | International | 5 | Cohort | Osteoarthritis |
| ERFC | International | 121 | Cohort | Cardiovascular risk factors |
| EURALIM | International | 7 | Cross-sectional | Diet and cardiovascular risk factors |
| GENEVA | International | 16 | Observational study not specified; Clinical trial/intervention trial | Genetic and environmental risk factors for health and disease |
| GenomEUtwin | International | 8 | Registry | Genetic and environmental risk factors for health and disease |
| HALCyon | UK | 9 | Cohort | Physical capabilities |
| IALSA | International | 60 | Cohort | Cognitive and physical capabilities, health, personality and well-being |
| INHANCE | International | 35 | Case-control | Head and neck cancer |
| IDEFICS | International | 7 | Cohort; Cross-sectional | Childhood obesity |
| IPD Meta-Analysis | Canada | 3 | Cohort; Cross-sectional | Cognitive measures |
| LASA and NLSAA | International | 2 | Cohort | Methodological differences in the harmonization of two longitudinal studies |
| MAGGIC | International | 31 | Observational study not specified; Clinical trial/intervention trial | Survival of patients with heart failure with preserved or reduced left ventricular ejection fraction |
| MeRGE | International | 30 | Case-control; Nested case-control | Restrictive diastolic filling pattern and mortality in patients post-acute myocardial infarction and patients with chronic heart failure |
| MORGAM | International | 28 | Cohort; Repeated cross-sectional | Cardiovascular risk factors and outcomes |
| PAGE | USA | 8 | Cohort; Cross-sectional; Nested case-control; Clinical trial/intervention trial | Genetic and environmental risk factors for health and disease |
| PROG-IMT | International | 50 | Cohort; Clinical trial/intervention trial | Cardiovascular events and carotid intima-media thickness |
| PPPSDC | International | 28 | Case-control | Diet and cancer |
| PPSRH (59) | International | 12 | Cross-sectional | Self-rated health |
| RELATE | International | 14 | Cross-sectional; Panel | Early life conditions and older adult health |
| THLS | Finland | 3 | Cohort; Cross-sectional | Harmonization of clinical data between three studies |
| TLCS and HPHS | USA | 2 | Cohort | Personality and health |
| TSC | International | 11 | Cohort | Hypothyroidism, coronary heart disease and mortality risk |
| xTEND | Australia | 2 | Cohort | Health and well-being |
a This information was gleaned from the initiative’s website or sources other than published articles.
AirPROM, Airway Disease Predicting Outcomes through Patient Specific Computational Modeling; APCSC, Asia Pacific Cohort Studies Collaboration; BioSHaRE, Biobank Standardisation and Harmonisation for Research Excellence in the European Union; CHANCES, Consortium on Health and Ageing: Network of Cohorts in Europe and in the USA; CLESA, Comparison of Longitudinal European Studies on Aging; CLOSER, Cohort & Longitudinal Studies Enhancement Resources; COSMIC, Cohort Studies of Memory in an International Consortium; DYNOPTA, Dynamic Analyses to Optimise Ageing; ENGAGE, European Network for Genetic and Genomic Epidemiology; ENRIECO, European initiative Environmental Health Risks in European Birth Cohorts; EPIC, European Prospective Investigation into Cancer and Nutrition; EPOSA, European Project on Osteoarthritis; ERFC, Emerging Risk Factor Collaboration; EURALIM, EURope ALIMentation; GENEVA, Gene Environment Association Studies; GenomEUtwin, GenomEUtwin; HALCyon, Healthy Ageing across the Life Course; IALSA, Integrative Analysis of Longitudinal Studies on Aging; IDEFICS, Identification and prevention of Dietary and lifestyle-induced health Effects In Children and infants; INHANCE, International Head and Neck Cancer Epidemiology; IPD Meta-Analysis, Harmonization of Cognitive Measures In IPD meta-analysis; LASA and NLSAA, Longitudinal Aging Study Amsterdam and Nottingham Longitudinal Study of Activity and Ageing; MAGGIC, Meta-analysis Global Group in Chronic Heart Failure; MeRGE, Meta-analysis Research Group in Echocardiography; MORGAM, MOnica Risk, Genetics, Archiving and Monograph; PAGE, Population Architecture using Genetics and Epidemiology; PROG-IMT, PROGression of Carotid Intima Media Thickness study; PPPSDC, Pooling Project of Prospective Studies of Diet and Cancer; PPSRH, Pooling Project on Self-Rated Health; RELATE, Research on Early Life and Aging Trends and Effects; THLS, National Institute for Health and Welfare (THL) studies (FINRISK cohorts, Health 2000 cohort and Helsinki Birth Cohort Study); TLCS and HPHLS, Terman Life Cycle Study and Hawaii Personality and Health Longitudinal Study; TSC, Thyroid Studies Collaboration; xTEND, eXtending Treatments, Education, and Networks in Depression study.
Checklist helping to review the harmonization process
| Step | Item | Description |
|---|---|---|
| Step 0: define the questions and objectives | 1 | The research question is well defined in terms of population, exposure, comparator, outcome and timing |
| | 2 | The protocol takes into account questions related to feasibility (e.g. data access, realistic time-lines) and provides information required to guide the harmonization process |
| Step 1: assemble information and select studies | | |
| Step 1a: document individual study designs, methods and content | 3 | Study-specific information gathered allows understanding study designs, time-line, population characteristics, data contents, standard operating procedures and ethico-legal requirements to access data |
| Step 1b: select participant studies | 4 | Studies are selected based on explicit selection criteria |
| Step 2: define variables and evaluate harmonization potential | | |
| Step 2a: select and define the core variables to be harmonized (DataSchema) | 5 | The DataSchema variables are selected based on their relevance in answering the research question addressed, likelihood to be generated across a number of studies and, where relevant, input from experts |
| | 6 | The DataSchema variables are clearly defined, including their specific nature, format and acceptable level of heterogeneity |
| Step 2b: determine the potential to generate the DataSchema variables making use of study-specific data items | 7 | The potential (or not) for each study to create the DataSchema variables is assessed and documented |
| Step 3: process data | | |
| Step 3a: ensure access to adequate study-specific data items and establish the overall data processing infrastructure | 8 | If harmonization is possible, the study-specific data items required to generate the DataSchema variables are made available in a computing infrastructure allowing data processing |
| | 9 | Quality of study-specific data items is assessed and considered adequate |
| Step 3b: process study-specific data items under a common format to generate the harmonized dataset(s) | 10 | Data processing is achieved using appropriate statistical models or processing algorithms |
| | 11 | Harmonized data are generated and algorithms or models used to process data are documented |
| Step 4: estimate quality of the harmonized dataset(s) generated | 12 | Quality and consistency of the harmonized data are assessed. Where appropriate, statistical models are applied to evaluate heterogeneity and potential bias |
| Step 5: disseminate and preserve final harmonization products | 13 | Harmonized data are available to approved users |
| | 14 | All information required to understand harmonization procedures and to analyse the harmonized data is accessible |
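
Steps 2a to 3b of the checklist can be illustrated with a minimal Python sketch. It is not the method of any surveyed initiative: the study names (`study_a`, `study_b`, `study_c`), the source items (`height_m`, `height_in`) and the target variable `height_cm` are hypothetical, chosen only to show a DataSchema variable, a documented harmonization potential per study, and a processing algorithm applied under a common format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaVariable:
    """One DataSchema variable (Step 2a): name, unit and definition."""
    name: str
    unit: str
    definition: str


# Hypothetical DataSchema variable used for illustration only.
HEIGHT_CM = SchemaVariable("height_cm", "cm", "Standing height in centimetres")

# Step 2b: per study, the source data item and the processing algorithm,
# or None when the DataSchema variable cannot be generated.
# All study and item names are assumptions, not taken from the paper.
PROCESSING_RULES = {
    "study_a": ("height_m", lambda v: v * 100.0),   # metres -> centimetres
    "study_b": ("height_in", lambda v: v * 2.54),   # inches -> centimetres
    "study_c": None,                                # height not collected
}


def harmonization_potential(rules):
    """Document, per study, whether the variable can be generated (item 7)."""
    return {study: rule is not None for study, rule in rules.items()}


def harmonize(study, records, rules):
    """Step 3b: apply the study's algorithm to produce harmonized records."""
    rule = rules.get(study)
    if rule is None:
        return []  # harmonization judged impossible for this study
    source_item, algorithm = rule
    harmonized = []
    for rec in records:
        raw = rec.get(source_item)
        value = None if raw is None else round(algorithm(raw), 1)
        harmonized.append({"study": study, "id": rec["id"], HEIGHT_CM.name: value})
    return harmonized
```

Keeping the rules in one table means the algorithms used to process the data are documented alongside the harmonized output, which is what item 11 asks for.
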
Impact of the level of information that is available from each study on the harmonization process
| Level of information available | Location of study-specific individual participant data | Achievement of data processing | Application of the processing models (see |
|---|---|---|---|
| Individual participant data | Transferred to a central server or remain on individual study’s servers | Generally achieved centrally | All models |
| Aggregated data (e.g. means and frequencies) | Remain on individual study’s servers | Achieved by study-specific teams. Can be centralized if a federated infrastructure is implemented | Limited to some models |
| Final results of statistical analysis | Remain on individual study’s servers | Achieved by study-specific teams | Limited to some models |
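
The aggregated-data row can be illustrated with a short Python sketch, offered as an assumption-laden example and not as a description of any initiative's infrastructure. Each study computes a summary (count, mean, sample variance) on its own server, and only those summaries are combined centrally. The function names `local_summary` and `pool` are hypothetical.

```python
from statistics import mean, variance


def local_summary(values):
    """Run on a study's own server: return aggregated data only."""
    return {"n": len(values), "mean": mean(values), "var": variance(values)}


def pool(summaries):
    """Combine study-level summaries without individual participant data.

    The pooled sample variance adds the within-study and between-study
    sums of squares, so it equals the variance of the stacked data.
    """
    total_n = sum(s["n"] for s in summaries)
    pooled_mean = sum(s["n"] * s["mean"] for s in summaries) / total_n
    within = sum((s["n"] - 1) * s["var"] for s in summaries)
    between = sum(s["n"] * (s["mean"] - pooled_mean) ** 2 for s in summaries)
    return {"n": total_n, "mean": pooled_mean, "var": (within + between) / (total_n - 1)}
```

Means and variances pool exactly in this way, but many other quantities (medians, for instance) cannot be rebuilt from such summaries, which is one reason the table lists aggregated data as limited to some models.
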