Fernando Pires Hartwig1,2, Neil Martin Davies2,3, Gibran Hemani2,3, George Davey Smith2,3.
Year: 2016 PMID: 28338968 PMCID: PMC5722032 DOI: 10.1093/ije/dyx028
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Figure 1. Scatter plots of all empirical Mendelian randomization studies in PubMed from 1 January 2011 to 24 October 2016. Left panel: absolute number of one-sample (dotted line) and subsample and/or two-sample Mendelian randomization studies (solid line). Right panel: proportion of subsample and/or two-sample Mendelian randomization studies (among all one-sample and subsample and/or two-sample studies). The dotted line indicates the 50% value.
Overview of the data harmonization process for two-sample Mendelian randomization applications, based on the guidelines provided by Fortier and colleagues
| Harmonization step | Procedure in two-sample Mendelian randomization |
|---|---|
| 0) Define the research question, objectives and protocol | Prior to data collection, define exposure(s) and outcome(s) variables, data analysis methods, targeted variables, etc. Targeted variables typically include an identifier of the genetic variant, effect and other alleles, effect allele frequency and regression coefficient and standard error |
| 1) Assemble pre-existing data sources and select datasets | Identify potential sources of summary results (e.g. published reports, summary results from GWAS consortia or even individual-level data) and select the most appropriate ones given the research question |
| 2) Evaluate harmonization potential of the selected datasets | i. At minimum, the effect allele must be available in all datasets to be harmonized. Additional variables, such as the other allele, are also useful. ii. Missing exposure-associated variants in the variant-outcome dataset may be replaced by proxies available in the latter. iii. Consider whether the populations used to generate the datasets are sufficiently similar to harmonize them |
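| 3) Harmonize the data | Identify variants that do not share the same allele pair between datasets, and correct this if possible. Where the effect allele differs between datasets, flip the effect estimate and the effect allele frequency |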
| 4) Estimate quality of the harmonization process | Strong correlation between effect allele frequencies before and after harmonization, low number of proxy variants used and strong linkage disequilibrium between proxy and index variants suggest good quality of the harmonization process |
| 5) Preserve and disseminate the final harmonized datasets | Publish the harmonized datasets (typically as supplementary material) with all the necessary information to allow replicating the analysis directly from the datasets provided and verifying the quality of the data harmonization process |
Knowing the other allele is particularly useful for harmonization of palindromic variants.
Variants in high linkage disequilibrium with the index variant in the relevant ancestry group.
Not having the same allele pair could be a consequence of strand orientation differences between datasets. In this case, harmonizing strand orientation will result in shared allele pairs. Alternatively, if effect allele frequencies are available, they can be used to identify if the effect allele is the major or minor allele, and such classification can be used to check allele matching. Importantly, this strategy would only be reliable if the minor allele frequency is substantially below 50%.
To flip an effect estimate, multiply it by −1 in the case of additive effect estimates (e.g. linear regression coefficients, log(odds ratio), risk differences) or raise it to the power of −1 in the case of multiplicative effect estimates (e.g. odds ratios).
The flipped effect allele frequency is 1 (or 100%) minus the effect allele frequency in the raw dataset.
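The operations described in the notes above (checking for palindromic variants, switching strand, and flipping effect estimates and allele frequencies) can be sketched in Python. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, variants are assumed to be biallelic SNPs coded with A, C, G and T, and `max_maf` is a user-chosen cut-off for "substantially below 50%" that the source does not specify.

```python
# Complementary bases, used to move an allele to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele, other_allele):
    """True for A/T or C/G variants, whose allele pair looks the same on both strands."""
    return COMPLEMENT[effect_allele] == other_allele


def complement_pair(effect_allele, other_allele):
    """Return the allele pair as it would be coded on the opposite strand."""
    return COMPLEMENT[effect_allele], COMPLEMENT[other_allele]


def flip_additive(beta):
    """Flip an additive estimate (regression coefficient, log odds ratio, risk difference)."""
    return -beta


def flip_multiplicative(odds_ratio):
    """Flip a multiplicative estimate (e.g. an odds ratio)."""
    return 1.0 / odds_ratio


def flip_frequency(eaf, scale=1.0):
    """Frequency of the other allele; use scale=100 for percentages."""
    return scale - eaf


def frequency_is_informative(eaf, max_maf):
    """True when the minor allele frequency is at or below the chosen cut-off (proportion scale)."""
    return min(eaf, 1.0 - eaf) <= max_maf
```

For example, `flip_frequency(18, scale=100)` gives 82, and `flip_multiplicative(2.0)` gives 0.5.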
Figure 2. Schematic representation of chromosomes, DNA and genetic variants in a diploid cell.
Illustration of the process of data harmonization in two-sample Mendelian randomization using fictional data. Dataset 1 corresponds to instrument-exposure associations, and dataset 2 corresponds to instrument-outcome associations. It is assumed that both datasets are coded in the forward (5’→3’) strand
| Step | SNP | Dataset 1 EA | Dataset 1 OA | Dataset 1 Beta (SE) | Dataset 1 EAF (%) | Dataset 2 EA | Dataset 2 OA | Dataset 2 Beta (SE) | Dataset 2 EAF (%) |
|---|---|---|---|---|---|---|---|---|---|
| Obtain the raw data | rs1 | A | C | −0.1 (0.04) | 20 | A | C | −0.2 (0.04) | 18 |
| | rs2 | G | T | −0.2 (0.03) | 40 | T | G | 0.4 (0.03) | 58 |
| | rs3ᵃ | T | C | 0.2 (0.03) | 60 | NA | NA | NA | NA |
| | rs4 | G | A | 0.1 (0.04) | 80 | A | G | −0.2 (0.04) | 18 |
| Identify LD proxies | rs1 | A | C | −0.1 (0.04) | 20 | A | C | −0.2 (0.04) | 18 |
| | rs2 | G | T | −0.2 (0.03) | 40 | T | G | 0.4 (0.03) | 58 |
| | rs5ᵃ | A | G | 0.18 (0.03) | 58 | A | G | 0.36 (0.03) | 62 |
| | rs4 | G | A | 0.1 (0.04) | 80 | A | G | −0.2 (0.04) | 18 |
| Standardize the direction in dataset 1 | rs1 | C | A | 0.1 (0.04) | 80 | A | C | −0.2 (0.04) | 18 |
| | rs2 | T | G | 0.2 (0.03) | 60 | T | G | 0.4 (0.03) | 58 |
| | rs5 | A | G | 0.18 (0.03) | 58 | A | G | 0.36 (0.03) | 62 |
| | rs4 | G | A | 0.1 (0.04) | 80 | A | G | −0.2 (0.04) | 18 |
| Match the alleles in dataset 2 with those in dataset 1 | rs1 | C | A | 0.1 (0.04) | 80 | C | A | 0.2 (0.04) | 82 |
| | rs2 | T | G | 0.2 (0.03) | 60 | T | G | 0.4 (0.03) | 58 |
| | rs5 | A | G | 0.18 (0.03) | 58 | A | G | 0.36 (0.03) | 62 |
| | rs4 | G | A | 0.1 (0.04) | 80 | G | A | 0.2 (0.04) | 82 |
ᵃ The genetic instrument rs3 was not available in the outcome GWAS. Therefore, it was replaced by rs5, which was available in both the exposure and outcome GWAS. For this replacement to be valid, rs5 must be in high LD with rs3 in the relevant ancestry group.
LD, linkage disequilibrium; SNP, single nucleotide polymorphism; EA, effect allele; OA, other allele; EAF, effect allele frequency; NA, not available.
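The last two steps of the illustration above (standardizing the direction in dataset 1, then matching the alleles in dataset 2) can be reproduced with a short Python sketch. It starts from the fictional values after the LD proxy step and, like the table, assumes both datasets are coded on the forward strand, so no strand correction is attempted. The `Assoc` class and the `flip` and `harmonize` functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assoc:
    ea: str      # effect allele
    oa: str      # other allele
    beta: float  # additive effect estimate
    se: float    # standard error
    eaf: int     # effect allele frequency, in percent


def flip(assoc):
    """Swap effect and other allele; negate beta; take 100 minus the frequency."""
    return replace(assoc, ea=assoc.oa, oa=assoc.ea, beta=-assoc.beta, eaf=100 - assoc.eaf)


def harmonize(exposure, outcome):
    """Orient dataset 1 so that beta is positive, then align dataset 2 to it."""
    result = {}
    for snp, exp in exposure.items():
        out = outcome[snp]
        if exp.beta < 0:
            exp = flip(exp)
        if (out.ea, out.oa) == (exp.oa, exp.ea):
            out = flip(out)  # same allele pair, opposite effect allele
        elif (out.ea, out.oa) != (exp.ea, exp.oa):
            raise ValueError(f"{snp}: allele pairs differ between datasets")
        result[snp] = (exp, out)
    return result


# Fictional data from the table, after rs3 has been replaced by its proxy rs5.
EXPOSURE = {
    "rs1": Assoc("A", "C", -0.1, 0.04, 20),
    "rs2": Assoc("G", "T", -0.2, 0.03, 40),
    "rs5": Assoc("A", "G", 0.18, 0.03, 58),
    "rs4": Assoc("G", "A", 0.1, 0.04, 80),
}
OUTCOME = {
    "rs1": Assoc("A", "C", -0.2, 0.04, 18),
    "rs2": Assoc("T", "G", 0.4, 0.03, 58),
    "rs5": Assoc("A", "G", 0.36, 0.03, 62),
    "rs4": Assoc("A", "G", -0.2, 0.04, 18),
}

harmonized = harmonize(EXPOSURE, OUTCOME)
```

A real pipeline would also need to handle strand differences and palindromic variants, which this sketch deliberately leaves out.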
Odds ratio (95% confidence intervals) of schizophrenia per 1-unit increment in ln(C-reactive protein) based on Mendelian randomization analyses using the inverse variance weighting method, unless indicated otherwise
| | Hartwig | Inoshita | Prins |
|---|---|---|---|
| As originally presented in publications | NA | 1.10 (1.02; 1.19) | 0.86 (0.79; 0.94) |
| Using pre-harmonization datasets | 1.09 (0.98; 1.22) | 1.08 (0.96; 1.22) | 1.10 (0.99; 1.22) |
| Using post-harmonization datasets | 0.87 (0.79; 0.95) | 0.87 (0.79; 0.97) | 0.87 (0.79; 0.95) |
NA, not applicable.
Results computed using random effects meta-analysis.
Results computed using a method that approximates regressing the outcome on an additive weighted allele score.
Datasets were provided in Supplementary Tables 1-3, available as Supplementary data at IJE online.
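To illustrate why pre- and post-harmonization datasets can give different answers, the following Python sketch applies the standard fixed-effect inverse variance weighting estimator (a weighted mean of the per-variant ratios of outcome to exposure coefficients, with weights equal to the squared exposure coefficient divided by the squared outcome standard error). It uses the fictional coefficients from the harmonization illustration, not the C-reactive protein and schizophrenia datasets, and it is not necessarily the exact procedure behind the table above. The function names are hypothetical, and converting to an odds ratio assumes the outcome coefficients are log odds ratios.

```python
import math


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Exponentiate a log odds ratio and its 95% confidence limits."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


se_out = [0.04, 0.03, 0.03, 0.04]  # rs1, rs2, rs5, rs4 (fictional values)

# Before harmonization: effect alleles differ between datasets for rs2 and rs4.
est_raw, se_raw = ivw([-0.1, -0.2, 0.18, 0.1], [-0.2, 0.4, 0.36, -0.2], se_out)

# After harmonization: every coefficient refers to the same effect allele.
est_harm, se_harm = ivw([0.1, 0.2, 0.18, 0.1], [0.2, 0.4, 0.36, 0.2], se_out)
```

After harmonization every per-variant ratio in the fictional data equals 2, so the pooled estimate is 2. Before harmonization two of the four ratios have the wrong sign, and the pooled estimate is pulled away from that value.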