Alexej Abyzov, Shantao Li, Daniel Rhee Kim, Marghoob Mohiyuddin, Adrian M Stütz, Nicholas F Parrish, Xinmeng Jasmine Mu, Wyatt Clark, Ken Chen, Matthew Hurles, Jan O Korbel, Hugo Y K Lam, Charles Lee, Mark B Gerstein.
Abstract
Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.
Year: 2015 PMID: 26028266 PMCID: PMC4451611 DOI: 10.1038/ncomms8256
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Deriving a confident set of breakpoints. A) Conceptual steps for the derivation. Breakpoints from local target assembly are filtered by mapping reads to putative junctions. B) Null model for breakpoint filtering. C) Comparison of different breakpoint sets. The pilot set[17] was included in the derivation as one of the call sets. The integrated set[21] was biased toward large non-repetitive deletions for the purpose of reliable genotyping, resulting in a strong underrepresentation of mobile element insertions. The overlap between the confident set and the pilot/integrated sets was roughly 50% (Supplementary Fig. 3).
Figure 2. Co-aggregation of SNPs with deletion breakpoints found in the analyzed samples. A) Normalized SNP densities increased while conservation decreased in 400 kbp regions around breakpoints of each class. B) Densities increase for substitutions of all types around NH and TEI breakpoints, but this is not the case for NAHR breakpoints. The increase of C to T substitutions around NAHR breakpoints is driven by SNPs in CpG motifs, as evident from the red bars. Furthermore, this is solely due to enrichment of CpG motifs (Supplementary Fig. 6). This is consistent with the common knowledge that NAHR events are associated with sites of recombination.
Figure 3. Relation of breakpoints of each class to epigenome and chromatin states. A) Overlap of breakpoints with hypomethylated regions in sperm. NAHR and TEI breakpoints show strong association. B) Breakpoint co-occurrence with chromatin states, defined by the corresponding eigenvector of Hi-C data (upper panel). The genome-wide co-occurrence is ordered by the value of the eigenvector (lower panel). Curves were smoothed using a sliding window of 3,000 bins. NAHR breakpoints are associated with open chromatin. This association cannot be explained by higher content of segmental duplications (SDs), repeats, or recombination rate (RR). C) Association with histone marks. NH breakpoints were depleted for all active marks and also for the H3K9me3 repressive mark (red lines). TEI breakpoints showed weak depletion of active marks. While the density of the repressive H3K27me3 mark decreases close to NAHR breakpoints, the density of all active marks increases.
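The sliding-window smoothing mentioned for panel B of Figure 3 can be illustrated with a minimal sketch. The caption gives only the window size (3,000 bins); the centered window, the truncation at the ends of the series, and the function name `sliding_mean` are assumptions made here for illustration, not details taken from the paper.

```python
from itertools import accumulate


def sliding_mean(values, window):
    """Smooth `values` with a moving average (hypothetical helper).

    Assumption: the window is centered on each bin and truncated at the
    ends of the series. With an even `window` the span covers
    `window + 1` bins, because `window // 2` bins are taken on each side.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    # Prefix sums make every window sum an O(1) subtraction.
    prefix = [0.0] + list(accumulate(values))
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# Toy usage: per-bin breakpoint counts, already ordered by eigenvector value.
counts = [0, 0, 3, 0, 0]
print(sliding_mean(counts, 3))
```

For the curves described in the caption, the input would be per-bin co-occurrence values ordered by the Hi-C eigenvector and `window` would be 3000.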
Figure 4. Analysis of micro-insertions (MI) at deletion junctions. A) MIs up to 10 bps in length could arise from technical ambiguities in breakpoint reporting when there are SNPs or indels close to breakpoints. Larger MIs are typically found for NH deletions. B) Length of micro-homology (MH) at deletion junctions. For deletions with MIs and an identified template site, MHs are calculated for the 5′-ends/3′-ends of the deletion and the template site (panel insert). Both ends show MH longer than expected by chance and similar to the distribution for blunt deletions. C) The distribution of the nearest distance from template sites to breakpoints, in log10 scale. The distribution is almost symmetrical and exhibits distinct peaks between 20 and 60 bps (adjacent sites) and between 2 and 6 kbps (distant sites). D) The difference in replication time between template sites and breakpoints reveals later replication of template sites. For template sites outside the deletion the effect is significant (p-value < 0.03 by binomial test). The effect is even more significant (p-value < 0.01) when excluding differences of up to 0.01, as such small values are comparable to measurement error.
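The binomial test in panel D of Figure 4 can be read as a sign test on replication-time differences; the sketch below shows one way such a test could be set up. The caption does not state whether the test was one- or two-sided, nor the sign convention of the difference, so the one-sided alternative, the convention that a positive difference means the template site replicates later, and the names `binomial_test_greater` and `sign_test_later` are all assumptions for illustration.

```python
from math import comb


def binomial_test_greater(successes, trials, p=0.5):
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def sign_test_later(diffs, tolerance=0.0):
    """Sign test on replication-time differences (hypothetical helper).

    Assumption: a positive difference means the template site replicates
    later than the breakpoint. Differences with absolute value at or below
    `tolerance` are dropped, mirroring the exclusion of values comparable
    to measurement error.
    """
    kept = [d for d in diffs if abs(d) > tolerance]
    later = sum(1 for d in kept if d > 0)
    return later, len(kept), binomial_test_greater(later, len(kept))


# Toy usage with made-up differences; these are not data from the paper.
later, total, p_value = sign_test_later([0.2, 0.3, 0.005, 0.4], tolerance=0.01)
print(later, total, p_value)
```

Setting `tolerance=0.01` corresponds to the exclusion threshold quoted in the caption; under the null hypothesis of no timing preference, each retained difference is equally likely to be positive or negative.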