Heather Fairfield, Griffith J Gilbert, Mary Barter, Rebecca R Corrigan, Michelle Curtain, Yueming Ding, Mark D'Ascenzo, Daniel J Gerhardt, Chao He, Wenhui Huang, Todd Richmond, Lucy Rowe, Frank J Probst, David E Bergstrom, Stephen A Murray, Carol Bult, Joel Richardson, Benjamin T Kile, Ivo Gut, Jorg Hager, Snaevar Sigurdsson, Evan Mauceli, Federica Di Palma, Kerstin Lindblad-Toh, Michael L Cunningham, Timothy C Cox, Monica J Justice, Mona S Spector, Scott W Lowe, Thomas Albert, Leah Rae Donahue, Jeffrey Jeddeloh, Jay Shendure, Laura G Reinholdt.
Abstract
We report the development and optimization of reagents for in-solution, hybridization-based capture of the mouse exome. By validating this approach in multiple inbred strains and in novel mutant strains, we show that whole exome sequencing is a robust approach for discovery of putative mutations, irrespective of strain background. We found strong candidate mutations for the majority of mutant exomes sequenced, including new models of orofacial clefting, urogenital dysmorphology, kyphosis and autoimmune hepatitis.
Year: 2011 PMID: 21917142 PMCID: PMC3308049 DOI: 10.1186/gb-2011-12-9-r86
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Direct comparison of coverage statistics from exome re-sequencing (2 × 40 bp, Illumina) of four inbred strains with two exome probe pool designs, alpha and beta
| Sample | C57BL/6J | C57BL/6J | 129S1/SvImJ | 129S1/SvImJ | BALB/cJ | BALB/cJ | C3H/HeJ | C3H/HeJ |
|---|---|---|---|---|---|---|---|---|
| Exome version | Alpha | Beta | Alpha | Beta | Alpha | Beta | Alpha | Beta |
| Quantitative PCR | 161.81 | 168.53 | 129.43 | 95.75 | 168.92 | 165.08 | 168.38 | 92.00 |
| Target exons | 203,225 | 203,224 | 203,225 | 203,224 | 203,225 | 203,224 | 203,225 | 203,224 |
| Target bases | 54,367,346 | 54,367,244 | 54,367,346 | 54,367,244 | 54,367,346 | 54,367,244 | 54,367,346 | 54,367,244 |
| Target bases covered | 52,266,238 | 53,273,874 | 51,746,839 | 52,508,881 | 51,828,334 | 52,862,662 | 52,136,965 | 51,460,949 |
| Percentage target bases covered | 96.14 | 97.99 | 95.18 | 96.58 | 95.33 | 97.23 | 95.90 | 94.65 |
| Target bases not covered | 2,101,108 | 1,093,370 | 2,620,507 | 1,858,363 | 2,539,012 | 1,504,582 | 2,230,381 | 2,906,295 |
| Percentage target bases not covered | 3.86 | 2.01 | 4.82 | 3.42 | 4.67 | 2.77 | 4.10 | 5.35 |
| Median coverage | 18.45 | 20.74 | 17.93 | 16.37 | 18.05 | 20.75 | 18.76 | 7.86 |
| Total reads | 60,582,097 | 60,207,746 | 64,258,556 | 44,434,168 | 64,495,816 | 63,740,186 | 64,959,026 | 25,760,946 |
| NC80 | 0.28 | 0.37 | 0.25 | 0.33 | 0.25 | 0.31 | 0.29 | 0.32 |
| 1/NC80 | 3.53 | 2.71 | 4.03 | 3.02 | 3.96 | 3.27 | 3.50 | 3.13 |
1/NC80 is the fold 80 penalty, which represents the fold of over-sequencing necessary to move 80% of the below median bases to median.
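The derived rows of this table follow from the raw counts by simple arithmetic. The sketch below recomputes them for the C57BL/6J alpha capture; the function names are hypothetical and chosen only for illustration. It treats the fold 80 penalty as the reciprocal of NC80, as the note above states. Because the tabulated NC80 values are rounded to two decimals, the recomputed reciprocal only approximates the tabulated 1/NC80.

```python
def coverage_summary(target_bases: int, covered_bases: int) -> dict:
    """Recompute the derived coverage rows from the two raw base counts."""
    not_covered = target_bases - covered_bases
    return {
        "not_covered": not_covered,
        "pct_covered": round(100 * covered_bases / target_bases, 2),
        "pct_not_covered": round(100 * not_covered / target_bases, 2),
    }


def fold_80_penalty(nc80: float) -> float:
    """Fold 80 penalty, taken here as the reciprocal of NC80."""
    return 1.0 / nc80


# C57BL/6J, alpha design (values copied from the table above)
summary = coverage_summary(54_367_346, 52_266_238)
penalty = fold_80_penalty(0.28)
print(summary, round(penalty, 2))
```

The same two functions can be applied column by column to check the remaining samples.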
Figure 1. Graphical view (Integrative Genomics Viewer) of read distribution across a gene and an exon. (a,b) Gene (a) and exon (b) annotations shown are from the primary representative RefSeq annotations. The exome design encompasses a unified set of exon annotations from NCBI, Ensembl and VEGA; therefore, there are regions with high coverage, representing exons that are not shown in the primary RefSeq annotation (red arrow) but are represented in Ensembl and/or VEGA. Typical coverage across exons includes sufficient read depth to call single nucleotide variants in coding sequence and in neighboring splice acceptor and donor sites, as well as 20 to 50 bases of additional flanking intron sequence (b).
Representative coverage statistics from exome re-sequencing (2 × 100 bp) of six mutant strains
| Sample | 5330 ( | 6246 ( | 8568 ( | 12856 ( | 13782 ( | 13716 ( |
|---|---|---|---|---|---|---|
| Targeted exons | 203,224 | 203,224 | 203,224 | 203,224 | 203,224 | 203,224 |
| Final target bases | 54,367,244 | 54,367,244 | 54,367,244 | 54,367,244 | 54,367,244 | 54,367,244 |
| Target bases covered | 52,934,978 | 52,493,811 | 52,832,014 | 52,647,881 | 52,664,921 | 53,004,900 |
| Percentage target bases covered | 97.37 | 96.55 | 97.18 | 96.84 | 96.87 | 97.49 |
| Target bases not covered | 1,432,266 | 1,873,433 | 1,535,230 | 1,719,363 | 1,702,323 | 1,362,344 |
| Percentage target bases not covered | 2.63 | 3.45 | 2.82 | 3.16 | 3.13 | 2.51 |
| Total reads^a | 39,675,108 | 39,641,830 | 31,817,686 | 42,405,386 | 59,956,764 | 67,359,382 |
| Number of reads in target regions | 23,319,015 | 23,335,916 | 19,211,748 | 25,227,205 | 36,227,876 | 39,948,582 |
| Percentage reads in target regions | 58.77 | 58.87 | 60.38 | 59.49 | 60.42 | 59.31 |
| Average coverage | 32.72 | 32.59 | 26.75 | 35.32 | 50.78 | 56.31 |
| Median coverage | 30.33 | 30.02 | 23.23 | 33.02 | 46.61 | 50.02 |
| Coverage at 20× | 76.4 | 73.6 | 61.9 | 77.5 | 85.8 | 88.0 |
| Coverage at 10× | 92.1 | 89.3 | 87.1 | 90.7 | 92.9 | 94.5 |
| Coverage at 5× | 95.7 | 93.8 | 94.3 | 94.4 | 95.1 | 96.2 |
| Coverage at 1× | 97.4 | 96.6 | 97.2 | 96.8 | 96.9 | 97.5 |
| NC80 | 0.51 | 0.47 | 0.46 | 0.49 | 0.47 | 0.46 |
| 1/NC80 | 1.94 | 2.13 | 2.18 | 2.06 | 2.13 | 2.17 |
1/NC80 is the fold 80 penalty, which represents the fold of over-sequencing necessary to move 80% of the below median bases to median. Coverage statistics for all samples sequenced can be found in Additional file 3. ^a 2 × 100 bp, Illumina HiSeq.
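The "Percentage reads in target regions" row is the ratio of on-target reads to total reads. As a minimal sketch, assuming a hypothetical helper name and using two columns copied from the table above, the tabulated percentages can be recomputed as follows:

```python
def pct_on_target(total_reads: int, reads_in_target: int) -> float:
    """Percentage of sequenced reads that fall in target regions."""
    return round(100 * reads_in_target / total_reads, 2)


# (total reads, reads in target regions) for two of the six samples
samples = {
    "5330": (39_675_108, 23_319_015),
    "13716": (67_359_382, 39_948_582),
}

on_target = {name: pct_on_target(*counts) for name, counts in samples.items()}
print(on_target)
```

Roughly 59 to 60% of reads land on target in every sample, independent of sequencing depth.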
Analysis of annotated variant data from mutant exome sequencing
| Mutant number (allele) | Inheritance/phenotype | Mutation type: strain background | Variants called | In gene (introns, exons) | Overlap with map position | Non-synonymous coding variants, splice sites | Putative mutation | |||
|---|---|---|---|---|---|---|---|---|---|---|
| 12874 ( | Recessive/metabolic | Spontaneous: stock (mixed B6) | 134,205 | 116,120 | 35,469 | 350 | 155 | 29 | 1 | |
| 12724 ( | Dominant/craniofacial | ENU: C57BL/6J, C3HeB/FeJ | 49,367 | 36,037 | 10,873 | 83 | 53 | 19 | 2 | |
| | Recessive/reproductive | ENU: C57BL/6J, C3H/HeJ, Cast/EiJ | 410,333 | 185,999 | 87,568 | 799 | 47 | 7 | 1 | |
| 5330 ( | Recessive/skeletal | ENU: C57BL/6J | 8,516 | 6,167 | 4,589 | 35 | 3 | 2 | 2 | |
| 13716 ( | Recessive/reproductive | Spontaneous: C57BL/6J | 10,134 | 7,346 | 5,533 | 117 | 6 | 3 | 2 | |
| 8568 ( | Recessive/small ears | Spontaneous: C57BL/6J | 8,219 | 5,715 | 1,889 | 12 | 1 | 1 | 1 | |
| 12856 ( | Recessive/metabolic | Spontaneous: A/J | 164,116 | 59,067 | 16,930 | 454 | 177 | 83 | 1 | |
| | Recessive | ENU: B6, 129 | 230,896 | 52,628 | 14,448 | 344 | 37 | 4 | 2 | |
| 4235 ( | Dominant, craniofacial | Spontaneous: C57BL/6J, AKR/J | 134,207 | 116,122 | 35,471 | 346 | 310 | 121 | 1 | |
| | NA | None | 5,980 | 3,953 | 3,132 | NA | 538 | 17 | 3 | NA |
| 13716 ( | Recessive/reproductive | Spontaneous: C57BL/6J | 10,134 | 7,346 | 5,533 | NA | 940 | 97 | 38 | NA |
^a Compared to dbSNP. ^b > 0.95 for homozygous samples, > 0.2 for heterozygous samples. ^c Compared to unrelated exome data sets. NA, not available.
Figure 2. Examples of validated mutations discovered in mutant exome data. The bloodline mutation is a recessive mutation that causes a distinctive dorsal epidermal defect and tooth pulp necrosis. Exome sequencing revealed a G to A mutation in Map3K11 (mitogen-activated protein kinase kinase kinase 11). (a) PCR and sequencing of additional mutant (bloodline/bloodline) and unaffected (+/+ or +/-) animals provided additional support for this putative mutation. The 'Cleft' mutation is an ENU mutation that arose on C57BL/6J. The mutation causes a dominant craniofacial phenotype and recessive perinatal lethality with characteristic cleft palate. (b) Sanger sequencing confirmed the presence of two closely linked mutations in multiple cleft/+ and cleft/cleft samples and the absence of these mutations in +/+ littermate samples. (c) Of the two mutations found, the intron mutation has the potential to cause splicing defects, although it is less likely to contribute to the phenotype since RT-PCR shows no indication of defective splicing in mutant samples. The 'Sofa' mutation is a spontaneous mutation that arose on C57BL/6J, causing a dominant craniofacial phenotype and recessive perinatal lethality. (d) Sanger sequencing of heterozygous and control samples confirmed the presence of a 15-bp deletion in Pfas (FGAR amidotransferase). (e) Reads from the mutant, deletion-bearing allele successfully mapped to Pfas using BWA (Burrows-Wheeler alignment tool) and the deletion was called using SAMtools [25] with an allele ratio of 0.2.
In silico analysis of all induced or spontaneous alleles (4,984) with phenotypes reported in the Mouse Genomes Database [1]
| Mutation | Number of alleles |
|---|---|
| Unknown or uncharacterized | 3,105 |
| Introns, UTRs, regulatory regions (including instances where the lesion is not known but coding sequence has been sequenced), cryptic splice sites, inversions | 150 |
| Exons (single nucleotide substitutions, deletions, insertions) | 1,581 |
| Conserved splice acceptor or donor | 148 |
This analysis shows that the vast majority of induced or spontaneous alleles that have been characterized at the molecular level (1,729 of 1,879, about 92%) are mutations in coding sequence or conserved splice acceptor/splice donor sites.
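The totals quoted for this table can be checked directly from its four rows. The short tally below is only a sketch; the dictionary keys are shorthand labels introduced here and are not terms from the source.

```python
# Allele counts copied from the table above; keys are illustrative labels
allele_counts = {
    "unknown_or_uncharacterized": 3_105,
    "noncoding_or_structural": 150,
    "exonic": 1_581,
    "conserved_splice_site": 148,
}

total = sum(allele_counts.values())
characterized = total - allele_counts["unknown_or_uncharacterized"]
coding_or_splice = (
    allele_counts["exonic"] + allele_counts["conserved_splice_site"]
)
fraction = coding_or_splice / characterized

print(total, characterized, coding_or_splice, f"{fraction:.1%}")
```

The sum reproduces the 4,984 alleles in the caption and the 1,879 characterized alleles, of which the exonic and conserved splice site classes together account for about 92%.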
Validation of putative causative coding mutations in 15 mutant exomes
| Mutant number (allele) | Inheritance/phenotype | Strain background | Variants called | In gene (introns, exons) | Overlap with map position | Non-synonymous coding variants, splice sites | Validation of coding/splice variants | Variants in UTRs | |||
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Dominant/craniofacial | Spontaneous: C57BL/6J, 129S1/SvImJ | 13,453 | 3,271 | 1,821 | 200 | 129 | 55 | 3 | None | 3: |
| | Recessive/craniofacial | Spontaneous: C57BL/6J | 121,109 | 105,964 | 30,275 | 1,441 | 639 | 94 | 3 | None | 4: |
| | Recessive/skin, hair | Spontaneous: MRL/MpJ | 182,564 | 156,802 | 57,317 | 554 | 366 | 33 | 1 | None | 4: |
| | Recessive/size | Spontaneous: A/J | 164,053 | 60,051 | 16,508 | 693 | 303 | 25 | 0 | None | None |
| | Recessive/craniofacial | Spontaneous: C57BL/6J, A/J | 124,054 | 105,326 | 20,073 | 36 | 22 | 0 | 0 | None | None |
| | Recessive/craniofacial | Spontaneous: C57BL/6J | 7,523 | 3,079 | 2,338 | 13 | 7 | 0 | 0 | None | None |
In 6 of the 15 mutant exomes sequenced, candidate mutations in protein coding sequence or splice sites were either not found or could not be validated in additional samples; for three of these, however, candidate mutations in regions annotated as UTRs were identified. ^a Compared to dbSNP. ^b > 0.95 for homozygous samples, > 0.2 for heterozygous samples. ^c Compared to unrelated exome data sets.