Sergio Villicaña, Jordana T Bell.
Abstract
Multiple recent studies highlight that genetic variants can have strong impacts on a significant proportion of the human DNA methylome. Methylation quantitative trait loci, or meQTLs, allow for the exploration of biological mechanisms that underlie complex human phenotypes, with potential insights for human disease onset and progression. In this review, we summarize recent milestones in characterizing the human genetic basis of DNA methylation variation over the last decade, including heritability findings and genome-wide identification of meQTLs. We also discuss challenges in this field and future areas of research geared to generate insights into molecular processes underlying human complex traits.
Keywords: DNA methylation; GWAS; Heritability; Methylation quantitative trait loci; meQTL
Year: 2021 PMID: 33931130 PMCID: PMC8086086 DOI: 10.1186/s13059-021-02347-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Glossary of commonly used terms
| Term | Definition |
|---|---|
| | A genetic mechanism where alleles at one or more loci have a cumulative contribution to the phenotype. In human |
| | Allelic asymmetry in DNA methylation status at a locus. ASM can be a consequence of several factors, such as genetic variation ( |
| | Region of the genome where the frequency of CpG sites is greater than that expected by chance. Different definitions of CGI have been proposed. CGIs are flanked by regions known as |
| | Computational analysis that aims to identify a statistically significant difference in mean DNA methylation levels across sample groups. The approach can be applied to individual CpG sites (differentially methylated sites or positions, |
| | Microarray-based technology that quantifies DNA methylation levels at a pre-specified set of CpG sites. Commonly used approaches typically apply bisulfite conversion of the DNA, followed by Illumina DNA methylation array profiling. Illumina methylation arrays include the Infinium HumanMethylation27 ( |
| | Analysis that systematically assesses the association between epigenetic marks (e.g., DNA methylation levels) at genetic loci across the genome and a phenotype or exposure of interest. |
| | Analysis that systematically assesses the association between genetic variation at genetic loci across the genome and a phenotype of interest. |
| | The proportion of variance in a phenotype that is attributed to the genetic variation. The |
| | A genetic locus at which genetic variation is associated with variation in DNA methylation at a specific CpG site. MeQTLs can form local associations in |
| | The DNA methylation profile of the genome. The methylome can be profiled at different levels of resolution, in single cells or in populations of cells, across different cells and tissues, and at a specific moment in time. It can be profiled using different technologies including |
| | When multiple simultaneous statistical tests are carried out, the probability of spurious discoveries increases. Different multiple testing correction procedures can be applied, including methods that control the false discovery rate ( |
| | A genetic variant that impacts multiple phenotypes. |
| | Follow-up analysis from |
| | A quantitative measure that describes the strength and direction of association between a genetic variant and its associated phenotype, estimated in genetic association analysis. |
| | Deep sequencing technology used to detect the methylation status of all sites in the |
Fig. 1 A typical workflow for meQTL identification. Step 1 is DNA methylation profiling. The most commonly applied methylation profiling technologies in meQTL studies are Illumina methylation arrays and whole genome bisulfite sequencing (WGBS). In both approaches, DNA is treated with bisulfite, converting unmethylated cytosines into uracils, and leaving methylated cytosines unchanged. DNA can then be profiled by sequencing or by Illumina array technologies, consisting of pre-designed probes. In step 2, DNA methylation levels at each CpG site are quantified, typically either as percentage (0–100%, e.g., in WGBS) or proportion methylation (0–1, e.g., in the Illumina technology methylation β-value). The example shows the distribution of methylation β-values for one CpG site (m1) across all profiled samples. Step 3 is the association of a set of genetic variants (coded as allele dosages at each locus) with methylation values at each CpG site, usually using linear models. In this example, after the association test at site m1 with a set of i genetic variants (shown in the Manhattan plot), g1 was found to be significantly associated with m1 (shown in the boxplot). Finally, step 4 represents the extension of the genetic association test to all profiled CpG sites genome-wide and the identification of genome-wide meQTLs after setting an appropriate threshold for statistical significance. The resulting meQTL associations can be either short-range, in cis (shown in heatmap for a few Mbp), or long-range or on different chromosomes, in trans (shown in Circos plot with all chromosomes).
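Step 3 of the workflow in Fig. 1 can be illustrated with a minimal sketch: a simple linear regression of methylation β-values at one CpG site on allele dosage at one variant, followed by a distance-based cis/trans label. This is not the pipeline of any study cited here. The function names, the toy data, the 1 Mb cis window, and the normal approximation used for the p-value are all assumptions made for illustration; published analyses typically add covariates and use dedicated software.

```python
import math


def meqtl_association(dosages, betas):
    """Regress methylation beta-values on allele dosage (0, 1, 2) by OLS.

    Returns (slope, standard_error, t_statistic, approx_p_value).
    Assumption: the two-sided p-value uses a normal approximation to the
    t distribution, which is only reasonable for large samples.
    """
    n = len(dosages)
    if n != len(betas) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    g_bar = sum(dosages) / n
    m_bar = sum(betas) / n
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - g_bar) * (m - m_bar) for g, m in zip(dosages, betas))
    slope = sxy / sxx                      # methylation change per allele
    intercept = m_bar - slope * g_bar
    rss = sum((m - (intercept + slope * g)) ** 2
              for g, m in zip(dosages, betas))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, se, t_stat, p_value


def classify_meqtl(snp_chrom, snp_pos, cpg_chrom, cpg_pos,
                   cis_window=1_000_000):
    """Label a SNP-CpG pair as cis or trans.

    Assumption: a single hypothetical 1 Mb window; studies use different
    maximum cis and minimum trans distances.
    """
    same_chrom = snp_chrom == cpg_chrom
    if same_chrom and abs(snp_pos - cpg_pos) <= cis_window:
        return "cis"
    return "trans"


# Toy data (invented): nine samples, dosage of variant g1, beta-value at m1.
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
betas = [0.20, 0.22, 0.18, 0.30, 0.31, 0.29, 0.41, 0.40, 0.39]

slope, se, t_stat, p_value = meqtl_association(dosages, betas)
print(f"slope={slope:.3f} se={se:.4f} t={t_stat:.1f} p~{p_value:.2e}")
print(classify_meqtl("chr1", 1_200_000, "chr1", 1_450_000))
```

Step 4 would repeat this test for every variant and CpG pair of interest, which is why the choice of significance threshold matters so much in genome-wide analyses.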
Blood-based genome-wide meQTL studies (sample size > 100) in whole blood or blood-derived cell samples
| Ref. | Sample size^a | Methylation assay | Genotyping assay | Significant CpGs^b | Significance threshold^c | Remarks |
|---|---|---|---|---|---|---|
| [ | 27,750 | Illumina 450K | 1000G^d | | | Study design: two-phase meta-analysis in a total of 36 cohorts. Additional analysis: replication of 188,017 meQTLs in external sample (76% for |
| [ | 4170 | Illumina 450K | Affymetrix 500K and 50K MIP, 1000G^d | | FWER^1 < 5% ( | Additional analysis: replication of effect direction in two external samples (81–99%). |
| [ | 429 | Illumina 450K | GOLDN study^e | | FWER^1 < 5% ( | Study design: response meQTL, with log-transformed post-/pre-treatment methylation values. |
| [ | 1111 | Illumina EPIC | Illumina CoreExome, 1000G^d | | FWER^1 < 5% ( | Effect size: mean methylation change per additional reference allele of 3.46% ( |
| [ | 156 [monocytes] (two samples) | Illumina EPIC | Illumina Omni, WGS, 1000G^d | | FDR^3 < 5% ( | Study design: separate meQTL discovery for the two samples, joint results reported. Additional analysis: |
| [ | 1980 (two samples) | Illumina 450K | Illumina 610-Quad | | FWER^1 < 5% ( | Study design: separate meQTL discovery for the two samples. Additional analysis: replication of CpGs in both samples (13.3% for |
| [ | 337 | Illumina 450K | Illumina CytoSNP, 1000G^d | | FDR^3 < 1% | |
| [ | 729 | Illumina 450K | Illumina Exome, Hap300 and Omni, 1000G^d | | FWER^1 < 5% ( | Additional analysis: variance meQTLs. |
| [ | 460 [cord blood and whole blood] (two samples) | Illumina 450K | Illumina Omni, 1000G^d | | FDR^3 < 5% | Study design: separate meQTL discovery for two samples with different parameters. Additional analysis: meQTL co-localization with results from two published studies. |
| [ | 3841 | Illumina 450K | CODAM34, LLD9, LLS38, NTR12, and RS studies^e, GoNL^d | | FDR^3 < 5% ( | Additional analysis: |
| [ | 744 [T cells and whole blood] (two samples) | MCC-seq, WGBS | WGS, Illumina Omni, inferred from WGBS, 1000G^d | | FDR^4 < 10% | Study design: separate meQTL discovery for the two samples. Additional analysis: meQTL in visceral adipose tissue samples, ASM analyses and genotype-independent tests, and validation on Illumina 450K. |
| [ | 525 [neutrophils, monocytes and T cells] (three samples) | Illumina 450K | WGS | | FWER^1 < 5% | Study design: separate meQTL discovery for the three samples, mean results reported. |
| [ | 3948 [cord blood and whole blood] (five samples) | Illumina 450K | Illumina Hap550 and 660W, 1000G^d | | FWER^1 < 0.2% ( | Study design: separate meQTL discovery for the five samples. Additional analysis: replication of CpGs in pairwise comparisons (83–98% for |
| [ | 850 | Illumina 450K | Illumina Hap550, Exon510, 1M and 1M-Duo | | FWER^1 < 5% ( | |
| [ | 1748 [lymphocytes] (cancer-case and control samples) | Illumina 450K | OFCCR^e, Affymetrix 500K | | FDR^4 < 5% ( | Study design: joint meQTL discovery for the two samples; |
| [ | 697 | MBD-seq | Affymetrix SNP 5.0 and 6.0, Illumina Omni, 1000G^d | | FDR^4 < 1% | Additional analysis: replication of findings in one sample of schizophrenia cases ( |
| [ | 264 [cord blood and whole blood] (three samples) | Illumina 27K | Illumina Omni, Affymetrix SNP 5.0 and 6.0, HapMap^d | | FWER^2 < 5% | Study design: separate meQTL discovery for the three samples. Additional analysis: replication of meQTL-CpG pairs in pairwise comparisons (17.8–69.5%); meQTL in four brain region samples. |
| [ | 177 [T cells and LCL] | Illumina 450K | Illumina Omni, 1000G^d | | FDR^3 < 10% | Study design: separate meQTL discovery for the two samples. Additional analysis: meQTL in fibroblast sample. |
| [ | 171 | Illumina 27K | Illumina Hap300, 610-Quad, 1M-Duo and 1.2M-Duo, HapMap^d | | FDR^3 < 5% ( | |

^a If not specified, the sample type is whole blood. If there is more than one sample per analysis, the pooled size and number of samples are reported
^b In parentheses, maximum or minimum distances are indicated for cis and trans analysis, respectively. The range of results is presented if more than one analysis was done (unless otherwise stated)
^c Multiple-testing criteria, with the corresponding p-value threshold for cis and trans meQTLs (where it differs). Different approaches to estimate FWER and FDR are as follows:
^1 FWER based on Bonferroni correction
^2 FWER based on Holm-Bonferroni correction
^3 FDR based on permutations
^4 FDR based on Benjamini-Hochberg correction
^d Reference panel for imputation
^e Database or biobank
FWER family-wise error rate, FDR false discovery rate, LCL lymphoblastoid cell lines, WGS whole genome sequencing, MCC-seq methylC-capture sequencing, WGBS whole genome bisulfite sequencing, MBD-seq methyl-CpG-binding domain sequencing, 1000G 1000 Genomes, GoNL Genome of the Netherlands, TF transcription factor, ASM allele-specific methylation
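Three of the correction procedures named in the table footnotes (Bonferroni, Holm-Bonferroni, and Benjamini-Hochberg) can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy p-values are invented, and permutation-based FDR, which several of the listed studies use, is not covered because it requires re-running the association tests on permuted data.

```python
def bonferroni(pvals, alpha=0.05):
    """Control the FWER: reject tests with p <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Control the FWER with the step-down Holm-Bonferroni procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):       # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                          # stop at the first failure
    return reject


def benjamini_hochberg(pvals, alpha=0.05):
    """Control the FDR with the step-up Benjamini-Hochberg procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0                                  # largest rank with p_(k) <= k/m * alpha
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Toy p-values (invented) for five SNP-CpG association tests.
pvals = [0.001, 0.011, 0.02, 0.03, 0.5]
print("Bonferroni:", sum(bonferroni(pvals)))
print("Holm:", sum(holm(pvals)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(pvals)))
```

On the same set of p-values, the FDR procedure rejects at least as many hypotheses as either FWER procedure, which is one reason the number of reported meQTLs is hard to compare across studies that use different criteria.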
Overview of published genome-wide DNA methylation quantitative trait loci studies in blood-independent sample types
| Sample type | Association analyses^a | Methylation assays | Sample sizes^b | Significant | Significant | Ref. |
|---|---|---|---|---|---|---|
| Brain | 18 | Illumina 450K, Illumina 27K | 18–468 | 0.1–13.6% | 0.1–5.1% | [ |
| Buccal | 2 | Illumina EPIC | 86–197 | 4.3–7.4% | | [ |
| Cancer | 23 | Illumina 450K | 103–664 | 1.7–6.4% | 0.1–24.8% | [ |
| Connective tissue (adipose, fibroblasts) | 4 | Illumina 450K, MCC-seq | 107–603 | 3.2–28.5% | 0.1% | [ |
| Epithelial | 1 | Illumina 450K | 111 | 4.4% | | [ |
| Lung | 2 | Illumina 450K | 126–210 | 10–10.1% | 0.2% | [ |
| Placenta | 2 | Illumina 450K | 37–303 | 0.2–0.9% | | [ |
| Skeletal muscle | 1 | Illumina EPIC | 282 | 20.6% | | [ |

^a Each association analysis is counted separately, even if several are published in the same paper
^b If more than one analysis is available, the range is presented
Fig. 2 Mechanisms underlying cis-meQTL effects. a Passive mechanism. Under normal conditions a sequence-specific binding protein (such as CTCF) can bind to its target and prevent methylation changes at surrounding CpG sites due to its occupancy. If a meQTL disrupts the site, the protein cannot bind successfully, and the CpG sites are prone to change in baseline methylation status. b Active mechanism. If a meQTL is located in a TFBS, lack of TF binding can promote the recruitment of DNMT or TET enzymes, and thus modify the methylation status of nearby CpG sites.
Fig. 3 Mechanisms underlying trans-meQTL effects. a eQTL-mediated mechanism. If a SNP acts as an eQTL for a gene that regulates DNA methylation, the SNP can have an indirect effect on multiple CpG sites in trans. b Cis-meQTL-mediated mechanism. If a SNP is a cis-meQTL for nearby CpG sites, which in turn impact the expression of genes involved in epigenetic regulatory processes, the SNP can ultimately alter DNA methylation levels at CpG sites in trans. c 3D organization mechanism. In the 3D genome, distal sites can come into close proximity, whereby a SNP can affect DNA methylation levels at CpG sites in trans, acting either through cis-meQTL mechanisms or by disrupting the formation of structural loops. d SNPs in the coding regions of methyl-specific binding proteins (such as MeCP2) can alter their specificity and function, and therefore passively or actively (by recruiting DNMTs or TETs) modify DNA methylation of their binding sites.