Wigard P Kloosterman, Laurent C Francioli, Fereydoun Hormozdiari, Tobias Marschall, Jayne Y Hehir-Kwa, Abdel Abdellaoui, Eric-Wubbo Lameijer, Matthijs H Moed, Vyacheslav Koval, Ivo Renkens, Markus J van Roosmalen, Pascal Arp, Lennart C Karssen, Bradley P Coe, Robert E Handsaker, Eka D Suchiman, Edwin Cuppen, Djie Tjwan Thung, Mitch McVey, Michael C Wendl, André Uitterlinden, Cornelia M van Duijn, Morris A Swertz, Cisca Wijmenga, GertJan B van Ommen, P Eline Slagboom, Dorret I Boomsma, Alexander Schönhuth, Evan E Eichler, Paul I W de Bakker, Kai Ye, Victor Guryev.
Abstract
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
Year: 2015 PMID: 25883321 PMCID: PMC4448676 DOI: 10.1101/gr.185041.114
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Overview of study design. A total of 250 parent-offspring families were sequenced at 14.5× coverage. De novo indel and structural variant (SV) calling was performed using 11 algorithms combining gapped reads, split reads, discordant read-pairs, and read depth approaches to cover the entire mutation size spectrum. All candidate indels (1169 in 99 children) and SVs (601 in 258 children) were subjected to experimental validation, leading to 291 validated de novo indels and 41 de novo SVs.
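The per-generation rates quoted in the abstract can be checked against the Figure 1 counts. The sketch below assumes a naive estimate, validated events divided by the number of children screened, with no correction for detection sensitivity or callable genome; the published estimate may include such corrections. The "validation rate" is only the share of candidates that ended up validated, since some candidates may have failed their assays.

```python
# Counts taken from the Figure 1 caption.
CANDIDATE_INDELS, INDEL_CHILDREN = 1169, 99
CANDIDATE_SVS, SV_CHILDREN = 601, 258
VALIDATED_INDELS, VALIDATED_SVS = 291, 41


def per_child_rate(validated: int, children: int) -> float:
    """Naive rate: validated events per screened child.

    Assumption: no sensitivity or callable-genome correction is applied.
    """
    return validated / children


def validation_rate(validated: int, candidates: int) -> float:
    """Share of candidate calls that ended up experimentally validated."""
    return validated / candidates


if __name__ == "__main__":
    print(f"indels per child: {per_child_rate(VALIDATED_INDELS, INDEL_CHILDREN):.2f}")
    print(f"SVs per child:    {per_child_rate(VALIDATED_SVS, SV_CHILDREN):.2f}")
    print(f"total validated:  {VALIDATED_INDELS + VALIDATED_SVS}")
    print(f"indel validation: {validation_rate(VALIDATED_INDELS, CANDIDATE_INDELS):.1%}")
    print(f"SV validation:    {validation_rate(VALIDATED_SVS, CANDIDATE_SVS):.1%}")
```

Rounded to two decimals, 291/99 and 41/258 reproduce the 2.94 indels and 0.16 SVs per generation stated in the abstract.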
Figure 2. Frequency of de novo indels and SVs. (A) Size-frequency distribution of 332 validated de novo indels and SVs identified in this study. In addition, the frequency of de novo SNVs is shown (Francioli et al. 2014). The asterisk denotes a size bin containing one de novo tandem duplication and six de novo retrotransposon insertions. (B) Bar plot indicating the numbers of de novo indels and SVs on paternal and maternal haplotypes.
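One common way to ask whether phased events show a paternal excess, as in panel B, is a one-sided exact binomial test against equal parental contribution. This is a generic sketch, not necessarily the test used in the paper; the null probability of 0.5 and the example counts are assumptions, not the values plotted in Figure 2B.

```python
from math import comb


def paternal_excess_pvalue(paternal: int, maternal: int, p_paternal: float = 0.5) -> float:
    """One-sided exact binomial test for an excess of paternally phased events.

    Returns P(X >= paternal) with X ~ Binomial(paternal + maternal, p_paternal).
    The default p_paternal = 0.5 (equal parental contribution) is an assumption.
    """
    if paternal < 0 or maternal < 0:
        raise ValueError("counts must be non-negative")
    n = paternal + maternal
    return sum(
        comb(n, k) * p_paternal**k * (1 - p_paternal) ** (n - k)
        for k in range(paternal, n + 1)
    )


# Hypothetical counts for illustration only.
print(f"{paternal_excess_pvalue(15, 5):.4f}")
```

A different null (for example, one that accounts for paternal age) can be passed through `p_paternal`.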
Indel classes and mechanisms
Figure 3. Overview of de novo and inherited indel classes and their formation mechanisms. (A) Proportion of de novo and inherited indels by class. Compared with de novo indels, inherited indels are 2.3-fold enriched in homopolymer runs (HR) and tandem repeats (TR), suggesting lower selective pressure in these regions. (B) Outline of a plausible seven-step process that could account for the formation of a complex de novo indel by synthesis-dependent microhomology-mediated end joining (SD-MMEJ).
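The HR and TR classes in panel A depend on the sequence context of each indel. The caption does not give the paper's classification rules, so the sketch below uses hypothetical ones. It assumes left-normalized indels, reduces the indel sequence to its smallest repeat unit, and counts how many copies of that unit continue into the reference immediately 3' of the event. A single-base unit with enough copies is labeled HR, and a longer unit is labeled TR. The `min_copies=3` threshold is an assumption.

```python
def smallest_unit(seq: str) -> str:
    """Return the shortest unit u such that seq == u * k for some k >= 1."""
    if not seq:
        raise ValueError("indel sequence must be non-empty")
    n = len(seq)
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size]
    return seq  # not reached: size == n always matches


def classify_indel(indel_seq: str, right_flank: str, min_copies: int = 3) -> str:
    """Label an indel 'HR' (homopolymer run), 'TR' (tandem repeat) or 'other'.

    indel_seq   -- inserted or deleted bases of a left-normalized indel
    right_flank -- reference bases immediately 3' of the event
    min_copies  -- hypothetical threshold on total copies of the repeat unit
    """
    indel_seq, right_flank = indel_seq.upper(), right_flank.upper()
    unit = smallest_unit(indel_seq)
    copies = len(indel_seq) // len(unit)  # copies inside the indel itself
    pos = 0
    # Extend into the reference while the same unit keeps repeating.
    while right_flank.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    if copies < min_copies:
        return "other"
    return "HR" if len(unit) == 1 else "TR"


print(classify_indel("A", "AAAT"))     # single-base unit repeated in the flank
print(classify_indel("CA", "CACAG"))   # dinucleotide unit repeated in the flank
print(classify_indel("ACGT", "TTGC"))  # no repeat context
```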
Figure 4. Mechanisms contributing to the formation of de novo SVs. (A) Overview of four SV formation mechanisms, including examples and observed counts for each of these. (L) Left flank, (R) right flank, (J) junction. (B) Schematic structure of a complex de novo interchromosomal SV involving an insertion of DNA from Chromosomes 3 and 19 into Chromosome 4. (TSD) Target site duplication.
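Mechanism assignment usually starts by inspecting the junction between the left and right flanks for microhomology, meaning short sequence present at both breakpoints. The sketch below measures microhomology for a simple deletion, assuming 0-based half-open coordinates and a left-normalized event. The two-way "blunt" versus "microhomology" split is a hypothetical simplification and does not reproduce the four mechanism classes in panel A.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Bases shared by the 5' end of the deleted segment and the sequence just 3' of it.

    ref[start:end] is the deleted segment (0-based, half-open), assumed left-normalized.
    A non-zero result means the same short sequence flanks both breakpoints.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    deleted, after = ref[start:end].upper(), ref[end:].upper()
    n = 0
    while n < len(deleted) and n < len(after) and deleted[n] == after[n]:
        n += 1
    return n


def junction_type(mh_len: int) -> str:
    # Hypothetical two-way split; not the paper's mechanism classification.
    return "blunt" if mh_len == 0 else "microhomology"


ref = "TTTTACGAAAAACGCCC"  # deleting ref[4:11] leaves 'ACG' at the junction
mh = microhomology_length(ref, 4, 11)
print(mh, junction_type(mh))
```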
Figure 5. Effect of de novo SVs on protein-coding genes. (A) Deletion of six exons of PTPRM, resulting in an in-frame shortened gene. (B) Deletion of one exon of LYN, causing an out-of-frame effect at the transcript level. (C) Deletion of eight exons of UBR5, causing an out-of-frame effect at the transcript level. (D) Duplication of one exon of BANK1, possibly resulting in a premature stop. (E) Duplication of the entire PROC1 gene. (F) Duplication of three entire genes (GCNT3, GTF2A2, BNIP2). Duplications are shown in green and deletions in red. (A) Ancestral allele, (D) derived allele.
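Whether a deletion or tandem duplication of whole exons keeps the reading frame, as contrasted in panels A to D, follows from the total coding length affected. If that length is a multiple of 3, the downstream codons stay in frame. The sketch assumes fully coding exons, unchanged splicing of the remaining exons, and no involvement of the start or stop codon exons. The exon lengths are hypothetical and do not come from the genes shown.

```python
from collections.abc import Sequence


def frame_effect(coding_exon_lengths: Sequence[int]) -> str:
    """Predict the reading-frame effect of deleting or duplicating whole coding exons.

    Simplifying assumptions: exons are fully coding, splicing of the other exons
    is unchanged, and the exons holding the start/stop codons are not affected.
    """
    if any(length <= 0 for length in coding_exon_lengths):
        raise ValueError("exon lengths must be positive")
    shift = sum(coding_exon_lengths) % 3  # net frame shift in bases
    return "in-frame" if shift == 0 else "out-of-frame"


# Hypothetical exon lengths for illustration only.
print(frame_effect([120, 87]))  # 207 bp affected
print(frame_effect([100]))      # 100 bp affected
```

An out-of-frame result only indicates a frameshift. Whether it produces a premature stop, as suggested for BANK1, depends on the downstream sequence.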
Figure 6. Functional impact of de novo indels and SVs. (A) Average number of genomic bases affected by de novo SNVs, indels, and SVs per child. (B) Average number of coding bases affected by de novo SNVs, indels, and SVs per child. (C) Average number of genes affected by de novo SNVs, indels, and SVs per child. The relative frequencies of the effects of the variations on the gene are indicated. (D) Comparison of the footprint of de novo (blue bars) and inherited (brown bars) large SVs (>20 bp) relative to the footprint of SNVs. The footprint was computed genome-wide, in protein-coding regions, and in genomic regions marked by H3K4me1, H3K4me3, and H3K27ac based on data from the ENCODE Project (The ENCODE Project Consortium 2007). The y-axis shows the ratio of the average number of affected bases per offspring relative to SNVs.
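The footprint ratio on the panel D y-axis compares the mean number of bases affected per offspring with the SNV mean, where each SNV changes one base. The sketch below is a genome-wide version only. A region-specific ratio would count only bases overlapping the region, which is not shown. It also back-calculates the SNV baseline implied by the abstract's rounded figures (4.1 kbp and 29 coding bases, 91 and 52 times the SNV footprint). These implied values are approximate and are not reported directly in the source.

```python
from statistics import mean


def mean_affected_bases(event_sizes_per_child: list[list[int]]) -> float:
    """Mean bases affected per child; each inner list holds one child's event lengths (bp)."""
    return mean(sum(sizes) for sizes in event_sizes_per_child)


def footprint_ratio(event_sizes_per_child: list[list[int]],
                    snv_counts_per_child: list[int]) -> float:
    """Mean affected bases per child divided by the mean SNV count (1 bp per SNV)."""
    return mean_affected_bases(event_sizes_per_child) / mean(snv_counts_per_child)


# SNV baselines implied by the abstract's rounded numbers (approximate).
implied_snv_genomic = 4100 / 91  # about 45 bases per child
implied_snv_coding = 29 / 52     # about 0.56 coding bases per child
print(f"{implied_snv_genomic:.1f} {implied_snv_coding:.2f}")
```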