Corey T Watson, Paras Garg, Andrew J Sharp.
Year: 2013 PMID: 23468658 PMCID: PMC3585013 DOI: 10.1371/journal.pgen.1003332
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Multiple strong confounders contribute to artifactual associations between CNVs and hypomethylation.
(a) Hypomethylated regions of the human genome are highly enriched for satellite repeats. We observed a strong enrichment for satellite repeats in regions of the genome <1st percentile of mean methylation level. Satellites comprise a mean of 16.6% of the hypomethylated windows, compared to only 0.26% in the rest of the genome (∼64-fold enrichment, p = 1.4×10^−29, Mann-Whitney Rank Sum Test). Previous analysis has shown that satellites tend to be strongly hypomethylated in human sperm [10]. Furthermore, given their highly repetitive and dynamic nature, loci rich in satellites are enriched for CNVs (51.7% of windows containing satellites overlap HapMap CNVs [7], compared to 20.5% in the rest of the genome), creating an inherent confounder between CNVs and hypomethylation.
(b) No enrichment for CNVs in hypomethylated regions after removal of confounding genomic features. Li et al. reported significant enrichments for overlap with multiple CNV datasets in “methylation deserts” (those with the lowest 1% mean methylation) and regions of the genome with MI = 0 [9]. However, after excluding regions of extreme repeat content (all windows containing satellite repeats, and those >99th percentile by LINE, SINE, LTR, and total repeat content, n = 1,716), and/or windows in which only a minority of CpGs were sampled (n = 430), all reported CNV enrichments are reduced significantly and in most cases disappear entirely. The dashed grey line represents equal prevalence of CNVs in hypomethylated regions and in the rest of the genome.
(c) Bisulfite reads within “methylation deserts” preferentially map to CpG islands/shores. We observed that windows scored as “methylation deserts” by Li et al. (those with the lowest 1% mean methylation) show a strong bias for bisulfite reads to be mapped within ±2 kb of CGIs. As CGIs, especially those associated with the promoters of expressed genes, are typically unmethylated, this creates an underestimate of the mean methylation value in the wider region. Data shown represent the fraction of CpGs per window with at least one overlapping read that map within ±2 kb of CGIs, after first excluding all windows containing satellite repeats, or those >99th percentile based on LINE, SINE, LTR, or total repeat content.
(d) A huge reduction in SNP density in windows with MI = 0. We observed a massively reduced density of HapMap SNPs in windows with MI = 0 (mean, 25; median, 13) compared to the genome average (mean, 143; median, 137). As mSNPs represent only 8.2% of all SNPs in the genome and the formula used by Li et al. to calculate MI reports MI = 0 when no mSNPs are present, the use of a methylation index based on SNP content is inherently biased to score windows containing only a small number of SNPs as MI = 0. Because of stringent quality filtering, ∼98% of HapMap SNP assays map uniquely within the genome [20]. Therefore, a significant negative correlation exists between SNP density and segmental duplications (r = −0.337, p < 10^−323), a fraction of the genome that is highly enriched for structural variation [2], [3], [7].
(e) No enrichment for CNVs in regions with MI = 0 after removal of windows with low SNP density. Li et al. reported that windows with MI = 0 are enriched for CNVs identified in several different studies [9]. However, power calculations (Figure S4) show that at least 28 SNPs per window are required to achieve a <10% false discovery rate for MI = 0. After excluding windows containing <28 SNPs (n = 811), all enrichments for CNVs in the remaining regions with MI = 0 disappear, indicating that the conclusions of Li et al. are likely artifactual, resulting from low SNP density in many CNV regions.
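The bias described in panel (d) can be illustrated with a short calculation. The sketch below is a deliberately simplified model and is not the power calculation of Figure S4: it assumes each SNP in a window is independently an mSNP with probability 0.082 (the genome-wide fraction quoted above), and asks how often a window would contain no mSNPs, and so be scored MI = 0, by chance alone. The function name and the independence assumption are illustrative choices, not taken from the paper.

```python
# Genome-wide fraction of SNPs that are mSNPs, as quoted in the caption.
M_SNP_FRACTION = 0.082


def prob_no_msnps(n_snps: int, msnp_fraction: float = M_SNP_FRACTION) -> float:
    """Chance that a window with n_snps SNPs contains no mSNPs at all.

    Assumption (not from the paper): SNPs are independent draws, each an
    mSNP with probability msnp_fraction, so the answer is (1 - f) ** n.
    """
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    return (1.0 - msnp_fraction) ** n_snps


if __name__ == "__main__":
    # SNP counts quoted in the caption: MI = 0 windows (median 13, mean 25),
    # the 28-SNP cutoff, and the genome average (median 137, mean 143).
    for n in (13, 25, 28, 137, 143):
        print(f"{n:>3} SNPs per window: P(no mSNPs) = {prob_no_msnps(n):.4f}")
```

Under this simplification, a window with 13 SNPs has roughly a one-in-three chance of containing no mSNPs, whereas at 28 SNPs the chance is about 9% and at the genome-average density it is negligible.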
Figure 2. Global assessment of methylation levels and confounders contributing to hypomethylation in common CNV regions.
(a) Mean methylation levels and (b) mean CpG density per base within and flanking 5,360 nonredundant HapMap CNVs. To directly assess the relationship between DNA methylation and structural variation, we used published 15× bisulfite sequencing data [10] to calculate mean methylation per base both within and flanking a high-quality set of HapMap CNVs [7]. We first merged 8,599 CNVs defined by Conrad et al. into 6,142 nonredundant regions, and then removed those >20 kb in size to form a filtered set of 5,360 nonredundant regions (mean size, 3,789 bp). A 100 kb window was then centered on the midpoint of each CNV, and mean methylation levels and CpG count per base in these 100 kb windows were calculated using 15× sperm bisulfite sequencing data [10]. Each plot shows a 100 bp moving average. Although a small decrease in methylation level is evident within CNVs compared to flanking regions, overall mean methylation levels within CNV regions (69%) are very similar to the genome average (70%). Furthermore, this dip in methylation corresponds precisely with an increase in CpG density and an enrichment for CGIs within CNVs. As most CGIs are unmethylated in sperm [10], [17], this fact likely accounts for the small overall reduction in methylation levels associated with CNVs.
(c) Regions classified as “methylation deserts” by Li et al. represent an extremely nonrandom subset of the genome that is highly enriched for common repeats and preferential mapping of bisulfite reads to CpG islands. We classified all 100 kb windows defined by Li et al. based on their content of common repeats and the fraction of CpGs assayed that map within ±2 kb of CGIs. One hundred and eighty-three of the 285 (64%) windows that were classified as “methylation deserts” by Li et al. are >95th percentile based on satellite, LINE, or LTR content and/or >99th percentile based on total repeat content. A further 80 windows (28%) are >95th percentile based on the fraction of CpGs assayed within them that map to CGIs or shores. Overall, only 22 of 285 (8%) windows defined by Li et al. as “methylation deserts” do not show extremes of repeat content or highly biased sampling of CpG islands. In contrast, in the rest of the genome, 84% of windows do not overlap any of these categories. Furthermore, windows that overlap a high-quality dataset of HapMap CNVs [7] show a repeat content and proportion of reads mapping to CGIs similar to the genome average. Thus, the set of regions defined as “methylation deserts” by Li et al. represents an extreme fraction of the genome that is likely to be highly enriched for unusual epigenetic and structural features.
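The region preparation described for panels (a) and (b) (merge overlapping CNV calls, apply a size filter, center a 100 kb window on each midpoint) can be sketched as follows. This is an illustrative outline only, not the authors' pipeline. All function names are hypothetical, coordinates are assumed to be half-open (start, end) pairs in bp, abutting intervals are assumed not to merge, and the 20 kb cutoff is read as an upper bound, an assumption chosen to be consistent with the reported mean region size of 3,789 bp.

```python
from collections import defaultdict


def merge_cnvs(cnvs):
    """Merge overlapping (chrom, start, end) calls into nonredundant regions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:  # overlaps the region being built
                cur_end = max(cur_end, end)
            else:  # gap: close the current region and start a new one
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def filter_by_size(regions, max_size=20_000):
    """Keep regions no larger than max_size bp (assumed upper-bound cutoff)."""
    return [(c, s, e) for c, s, e in regions if e - s <= max_size]


def centered_window(region, width=100_000):
    """Return a window of the given width centered on the region midpoint.

    The window start is clipped at 0, so windows near a chromosome start
    are shorter than width.
    """
    chrom, start, end = region
    mid = (start + end) // 2
    return chrom, max(0, mid - width // 2), mid + width // 2


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    calls = [
        ("chr1", 1_000, 5_000),
        ("chr1", 4_000, 9_000),
        ("chr1", 200_000, 260_000),
        ("chr2", 50_000, 52_000),
    ]
    regions = filter_by_size(merge_cnvs(calls))
    for region in regions:
        print(region, "->", centered_window(region))
```

Mean methylation and CpG count per base would then be computed at each offset within these windows and averaged across regions; that step depends on the bisulfite data format and is left out here.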