Matthew J Meier, Marc A Beal, Andrew Schoenrock, Carole L Yauk, Francesco Marchetti
Abstract
The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse's whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High-coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million single nucleotide variants (SNVs) per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, the HC and Covance colonies accumulated 47,847 and 17,677 non-parental homozygous SNVs, respectively. We found no novel nonsense or missense mutations that would impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequencing technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight into the mechanisms of structural variant formation, and contributes a framework for analyzing aCGH results alongside NGS data.
Year: 2019 PMID: 31551502 PMCID: PMC6760142 DOI: 10.1038/s41598-019-50302-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Animals sequenced in this study and library/coverage statistics.
| Colony | Animal | Platform and library type | Mean fragment length (bp) | Total raw bases (bp) | Median depth of aligned reads |
|---|---|---|---|---|---|
| Health Canada MutaMouse | 1 | Illumina TruSeqᵃ | 480 | 160,972,859,100 | 55X |
| Health Canada MutaMouse | 2 | Illumina TruSeq and Nextera Mate Pair | 3,078 (Nextera Mate Pair); 480 (TruSeq) | 270,924,369,600 | 81X |
| Health Canada MutaMouse | 2 | PacBio RSIIᵇ | 20,000 | 11,900,000,000 | 3.5X |
| Health Canada MutaMouse | 3 | Illumina TruSeq | 450 | 159,512,702,850 | 55X |
| Health Canada MutaMouse | 4 | Illumina TruSeq | 450 | 159,691,600,650 | 54X |
| Health Canada MutaMouse | 5 | Illumina TruSeq | 444 | 196,932,569,400 | 71X |
| Covance MutaMouse | Male | Illumina Nextera Mate Pair | 3,849 | 144,615,033,150 | 32X |
| Covance MutaMouse | Female | Illumina Nextera Mate Pair | 3,414 | 131,111,604,600 | 28X |
ᵃ Libraries built by Génome Québec; sequencing carried out by Génome Québec and in-house at Health Canada.
ᵇ Sequencing and library construction performed by Génome Québec.
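
As a rough consistency check on the table above, raw fold-coverage can be approximated as total raw bases divided by genome size. This sketch assumes a mouse genome size of about 2.7 Gbp (an approximation for the GRCm38 assembly, not a value reported in this study), and `raw_coverage` is a hypothetical helper. Because the estimate ignores unmapped reads, duplicates and trimming, it should exceed the median aligned depth. With the table's numbers, the gap is small for the TruSeq libraries (about 59.6X raw vs 55X aligned for HC 1) and largest for the mate-pair-only Covance libraries.

```python
# Approximate mouse (GRCm38) genome size in bp; an assumption, not a value from the study.
MOUSE_GENOME_BP = 2.7e9


def raw_coverage(total_raw_bp: float, genome_bp: float = MOUSE_GENOME_BP) -> float:
    """Return raw fold-coverage: total sequenced bases divided by genome size."""
    return total_raw_bp / genome_bp


# Total raw bp and reported median aligned depth, copied from the table above.
libraries = {
    "HC 1 (TruSeq)": (160_972_859_100, 55),
    "HC 5 (TruSeq)": (196_932_569_400, 71),
    "Covance male (Nextera Mate Pair)": (144_615_033_150, 32),
}

for name, (total_bp, median_depth) in libraries.items():
    print(f"{name}: raw ~{raw_coverage(total_bp):.1f}X vs median aligned {median_depth}X")
```
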
Figure 1. Summary of the number of genetic variants identified in MutaMouse animals from the Health Canada colony sequenced for this study. (A) The number of small variants discovered in each animal and the relative proportions of SNVs, insertions, and deletions. The number of novel variants (i.e., those not found in dbSNP v142) is shown as a black bar. The number of small variants was consistent between animals, with an average pairwise overlap of 85%. Above right, a Venn diagram shows the number of variants (in millions) shared by two representative MutaMouse individuals; above left, a Venn diagram shows the overlap with the parental strains DBA/2 and BALB/c. (B) Structural variants (SVs) identified in MutaMouse from NGS data using two algorithms, Manta and CNVnator. Manta, which uses paired-end information to identify discordant read pairs, predicted nearly 10-fold more SVs than CNVnator, which was used to predict copy number variants longer than 250 bp. (C) The overlap in CNV calls between different methods, restricted to CNVs found in all five animals. Most of each call set does not overlap with the others, reinforcing the need to use multiple algorithms and methods when calling structural variants.
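
The caption reports an average pairwise overlap of 85% between animals but does not define the overlap metric. The sketch below assumes it is the Jaccard index (shared variants divided by all variants in either animal), with variants keyed by (chromosome, position, reference, alternate). The study may have used a different definition, such as the fraction of one animal's variants found in the other. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# A variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def jaccard(a: set[Variant], b: set[Variant]) -> float:
    """Shared variants as a fraction of all variants found in either call set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(call_sets: dict[str, set[Variant]]) -> float:
    """Average Jaccard overlap over every unordered pair of animals (needs >= 2 animals)."""
    return mean(jaccard(call_sets[x], call_sets[y]) for x, y in combinations(call_sets, 2))


# Toy call sets for three hypothetical animals.
toy = {
    "HC1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    "HC2": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "HC3": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
}
print(f"mean pairwise overlap: {mean_pairwise_overlap(toy):.2%}")
```

Here the pairs overlap by 2/3, 2/3 and 1/3, so the mean is 5/9 (about 56%).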
Figure 2. Circos plot[54] showing genetic variation among five MutaMouse animals from the colony maintained at Health Canada. Links represent structural translocations identified in all sequenced mice. Each outer track has five components, one per mouse (MutaMouse HC 1, 2, 3, 4, 5, from inside to outside). From inside to outside, the tracks show: (1) runs of homozygosity, immediately adjacent to the chromosome numbering (blue, homozygous; red, heterozygous); (2) red: SNV density per megabase for all SNVs; (3) blue: SNV density for novel SNVs; (4) orange: SNV density for variants inherited exclusively from DBA/2J; (5) green: SNV density for variants inherited exclusively from BALB/cJ; and (6) grey: SNV density for variants inherited from both parents. The next tracks show copy number variants for the parental strains DBA/2J (orange) and BALB/cJ (green), followed by copy number variants in MutaMouse predicted by CNVnator (blue). Finally, the line plots show aCGH probe log2 values averaged over 1-megabase intervals for each mouse, with deleted segments highlighted in red and duplicated segments highlighted in green.
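
The per-megabase density tracks amount to counting variants in fixed 1 Mb bins along each chromosome. A minimal sketch, assuming 1-based positions (as in VCF) and that each category (all, novel, DBA/2J-only, and so on) has already been filtered into its own list; `snv_density` is a hypothetical helper, not the plotting pipeline used for the figure.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, matching the per-megabase density tracks


def snv_density(positions: list[tuple[str, int]], bin_size: int = BIN_SIZE) -> Counter:
    """Count SNVs per (chromosome, bin index), treating positions as 1-based."""
    # Position 1..1,000,000 falls in bin 0, 1,000,001..2,000,000 in bin 1, and so on.
    return Counter((chrom, (pos - 1) // bin_size) for chrom, pos in positions)


snvs = [("chr1", 1), ("chr1", 999_999), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr2", 5)]
for (chrom, bin_index), count in sorted(snv_density(snvs).items()):
    print(chrom, f"{bin_index}-{bin_index + 1} Mb", count)
```
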
Figure 3. Predicted genomic locations of the variants identified in five sequenced MutaMouse animals. Most variants fall in intergenic regions, introns, or upstream and downstream non-coding sequences. Only 0.62% of variants (an average of 35,000 per animal) fall within exons.
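
The two figures in this caption can be cross-checked against each other: if about 35,000 exonic variants make up 0.62% of the total, each animal carries roughly

```latex
\frac{35\,000}{0.0062} \approx 5.6 \times 10^{6} \ \text{variants},
```

which is consistent in magnitude with the ~5 million SNVs per genome reported in the abstract.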
Variants (SNVs and short indels) identified in MutaMouse within genes from pathways important for genetic toxicology studies.
| Animal | Cancer progressionᵃ (642 genes): variants within genes | Cancer progression: non-synonymous | Cancer progression: SIFT deleteriousᶜ | Cell cycleᵇ (1,434 genes): variants within genes | Cell cycle: non-synonymous | Cell cycle: SIFT deleteriousᶜ | DNA repairᵇ (408 genes): variants within genes | DNA repair: non-synonymous | DNA repair: SIFT deleteriousᶜ | Xenobiotic metabolismᵇ (45 genes): variants within genes | Xenobiotic metabolism: non-synonymous | Xenobiotic metabolism: SIFT deleteriousᶜ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MutaMouse 1 | 93,708 (15,099)ᵈ | 301 (34) | 52 (8) | 133,383 (21,452) | 659 (47) | 98 (23) | 35,592 (5,567) | 244 (3) | 27 (0) | 3,483 (630) | 13 (1) | 2 (1) |
| MutaMouse 2 | 80,922 (14,455) | 313 (38) | 54 (10) | 135,318 (21,943) | 598 (47) | 96 (24) | 34,416 (5,475) | 222 (9) | 27 (4) | 3,295 (586) | 11 (1) | 2 (1) |
| MutaMouse 3 | 98,330 (15,681) | 354 (45) | 67 (20) | 147,108 (23,716) | 745 (58) | 117 (29) | 39,493 (6,055) | 246 (3) | 27 (1) | 3,469 (643) | 11 (1) | 2 (1) |
| MutaMouse 4 | 89,172 (14,906) | 324 (44) | 65 (20) | 136,668 (21,821) | 621 (51) | 102 (26) | 37,618 (5,802) | 238 (7) | 30 (3) | 3,326 (583) | 11 (1) | 2 (1) |
| MutaMouse 5 | 110,565 (16,748) | 327 (39) | 56 (11) | 138,297 (22,785) | 645 (49) | 98 (23) | 39,869 (6,156) | 258 (5) | 28 (1) | 3,278 (592) | 11 (1) | 2 (1) |
| MutaMouse Covance Female | 71,057 (9,512) | 2,379 (37) | 42 (20) | 111,611 (14,304) | 541 (31) | 91 (17) | 28,344 (3,663) | 198 (6) | 29 (3) | 2,932 (362) | 10 (1) | 2 (1) |
| MutaMouse Covance Male | 71,513 (9,820) | 242 (44) | 43 (22) | 112,389 (14,734) | 545 (36) | 90 (17) | 28,474 (3,478) | 199 (7) | 29 (3) | 2,934 (354) | 10 (1) | 2 (1) |
| Commonᵉ | 55,389 (7,682) | 219 (22) | 36 (5) | 95,155 (12,129) | 479 (35) | 68 (18) | 27,121 (3,309) | 184 (2) | 19 (0) | 2,979 (358) | 11 (1) | 2 (1) |
| Totalᶠ | 132,221 (24,470) | 407 (66) | 71 (26) | 180,472 (34,966) | 870 (74) | 129 (32) | 46,554 (8,808) | 280 (11) | 34 (5) | 3,938 (904) | 13 (1) | 2 (1) |
ᵃ COSMIC cancer genes (mouse homologs).
ᵇ Retrieved using BioMart.
ᶜ SIFT scores calculated using the Ensembl Variant Effect Predictor; only canonical transcripts are reported (i.e., only one consequence is reported per variant).
ᵈ Values in parentheses: variants not reported in dbSNP 142.
ᵉ Subset of variants present in all mice.
ᶠ Total count of variants (i.e., present in one or more mice).
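
The Common and Total rows correspond to the intersection and union of the per-animal call sets (footnotes e and f). A minimal sketch, assuming each animal's variants within a pathway's genes are already available as a set of variant keys; `common_and_total` and the toy variant labels are hypothetical.

```python
from functools import reduce


def common_and_total(call_sets: dict[str, set]) -> tuple[int, int]:
    """Return (variants present in every animal, variants present in at least one)."""
    sets = list(call_sets.values())  # must contain at least one animal
    common = reduce(set.intersection, sets)  # "Common": shared by all mice
    total = reduce(set.union, sets)  # "Total": present in one or more mice
    return len(common), len(total)


toy = {
    "HC 1": {"v1", "v2", "v3"},
    "HC 2": {"v1", "v2"},
    "Covance Female": {"v1", "v4"},
}
print(common_and_total(toy))
```

Only `v1` is shared by all three animals, while four distinct variants occur overall.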
Genes in MutaMouse affected by potentially deleterious single nucleotide variants (as determined by SIFT) that are relevant to genetic toxicology studies.
| Gene symbol | SNVs in gene, HC 1 | SNVs in gene, HC 2 | SNVs in gene, HC 3 | SNVs in gene, HC 4 | SNVs in gene, HC 5 | Chromosomal position(s) and nucleotide change(s) | Amino acid change (codon change) | Proteinᵃ | Functions |
|---|---|---|---|---|---|---|---|---|---|
| Ahr | 1 | 1 | 1 | 1 | 1 | chr12:35508182 T > C | I/V (Atc/Gtc) | Aryl hydrocarbon receptor | Xenobiotic metabolism, Cell cycle |
| Arpp19 | 1 | 1 | 1 | 1 | 1 | chr9:75056711 C > G | P/R (cCg/cGg) | cAMP-regulated phosphoprotein 19 | Cell cycle |
| Cbl | 0 | 0 | 0 | 1 | 2 | chr9:44151504 C > T, chr9:44151527 A > T | A/T (Gct/Act), I/N (aTt/aAt) | E3 ubiquitin-protein ligase CBL | Cancer progression |
| Ccnb2 | 0 | 0 | 0 | 1 | 0 | chr9:70410215 G > A | T/M (aCg/aTg) | G2/mitotic-specific cyclin-B2 | Cell cycle |
| Ccnb3 | 1 | 1 | 1 | 1 | 1 | chrX:7025679 T > A | N/I (aAc/aTc) | G2/mitotic-specific cyclin-B3 | Cell cycle |
| Cdc5l | 0 | 0 | 1 | 0 | 0 | chr17:45407947 G > A | T/M (aCg/aTg) | Cell division cycle 5-like protein | DNA repair, Cell cycle |
| Cdk4 | 0 | 0 | 1 | 0 | 1 | chr10:127064302 G > A | A/T (Gcc/Acc) | Cyclin-dependent kinase 4 | Cancer progression, Cell cycle |
| Cenpt | 0 | 0 | 0 | 1 | 0 | chr8:105845366 G > A | R/C (Cgc/Tgc) | Centromere protein T | Cell cycle |
| Eif4a2 | 0 | 0 | 2 | 2 | 0 | chr16:23113181 G > T, chr16:23113194 G > T | G/V (gGt/gTt), R/S (agG/agT) | Eukaryotic initiation factor 4A-II | Cancer progression |
| Hsp90ab1 | 0 | 1 | 9 | 9 | 1 | chr17:45568279 T > A, chr17:45568347 T > G, chr17:45568434 C > T, chr17:45568456 T > C, chr17:45568465 G > A, chr17:45568474 T > G, chr17:45568484 G > A, chr17:45568493 T > C, chr17:45568520 C > T, chr17:45568531 G > T, chr17:45568532 G > A | D/V (gAt/gTt), K/N (aaA/aaC), M/I (atG/atA), D/G (gAc/gGc), A/V (gCa/gTa), K/T (aAg/aCg), R/W (Cgg/Tgg), N/D (Aac/Gac), V/M (Gtg/Atg), P/H (cCc/cAc), P/S (Ccc/Tcc) | Heat shock protein HSP 90-beta | Cancer progression |
| Mapk6 | 1 | 1 | 1 | 1 | 1 | chr9:75388680 T > A | Q/L (cAg/cTg) | Mitogen-activated protein kinase 6 | Cell cycle |
| Mplkip | 2 | 2 | 2 | 2 | 2 | chr13:17695605 C > T, chr13:17695681 C > A | R/W (Cgg/Tgg), P/Q (cCg/cAg) | M-phase-specific PLK1-interacting protein | Cell cycle |
| Msi2 | 1 | 1 | 1 | 1 | 1 | chr11:88687461 C > A | V/F (Gtc/Ttc) | RNA-binding protein Musashi homolog | Cancer progression |
| Nabp2 | 0 | 1 | 0 | 0 | 1 | chr10:128408557 C > T | G/S (Ggc/Agc) | SOSS complex subunit B1 | DNA repair, Cell cycle |
| Nanog | 1 | 0 | 1 | 1 | 0 | chr6:122707814 T > G | L/W (tTg/tGg) | Homeobox protein | Cell cycle |
| Nek1 | 1 | 1 | 1 | 1 | 1 | chr8:61054575 C > A | R/S (Cgt/Agt) | Serine/threonine-protein kinase | Cell cycle |
| Rev1 | 0 | 2 | 0 | 2 | 0 | chr1:38088013 T > C, chr1:38088020 C > G | K/E (Aaa/Gaa), K/N (aaG/aaC) | DNA repair protein REV1 | DNA repair |
| Rprd1b | 0 | 1 | 0 | 0 | 0 | chr2:158047932 G > T | K/N (aaG/aaT) | Regulation of nuclear pre-mRNA domain-containing protein 1B | Cell cycle |
| Rrs1 | 13 | 12 | 13 | 13 | 13 | chr1:9545597 C > T, chr1:9545621 A > T, chr1:9545666 C > T, chr1:9545674 C > T, chr1:9545762 C > A, chr1:9545801 C > A, chr1:9545860 C > A, chr1:9545917 G > C, chr1:9545925 C > A, chr1:9546313 G > A, chr1:9546338 T > A, chr1:9546341 G > T, chr1:9546377 C > T | T/M (aCg/aTg), E/V (gAg/gTg), T/I (aCc/aTc), R/C (Cgc/Tgc), P/Q (cCg/cAg), P/Q (cCg/cAg), R/S (Cgc/Agc), V/L (Gtg/Ctg), D/E (gaC/gaA), E/K (Gag/Aag), L/H (cTt/cAt), R/L (cGa/cTa), T/M (aCg/aTg) | Ribosome biogenesis regulatory protein homolog | Cell cycle |
| Tdpoz2 | 1 | 1 | 1 | 1 | 1 | chr3:93652273 G > C | L/V (Ctc/Gtc) | TD and POZ domain-containing protein 2 | Cancer progression |
| Thrap3 | 3 | 1 | 3 | 3 | 3 | chr4:126165462 T > C, chr4:126165536 G > A, chr4:126165542 G > A | I/M (atA/atG), R/W (Cgg/Tgg), R/W (Cgg/Tgg) | Thyroid hormone receptor-associated protein 3 | Cancer progression |
| Tjp3 | 0 | 0 | 1 | 0 | 0 | chr10:81274478 T > G | I/L (Atc/Ctc) | Tight junction protein ZO-3 | Cell cycle |
| Total | 29 | 32 | 46 | 45 | 32 | ||||
ᵃ PANTHER protein family/subfamily or class.
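
Each amino acid change in this table follows from its codon change, where the upper-case base marks the substituted position. A minimal sketch that checks an entry against the standard genetic code (NCBI translation table 1), assuming codons are written on the coding strand as in the table; `translate_change` and `changed_position` are hypothetical helpers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_change(codon_change: str) -> str:
    """Translate a 'ref/alt' codon pair such as 'Atc/Gtc' into 'I/V'."""
    ref, alt = (codon.upper() for codon in codon_change.split("/"))
    return f"{CODON_TABLE[ref]}/{CODON_TABLE[alt]}"


def changed_position(codon_change: str) -> int:
    """1-based position within the codon of the substituted (upper-case) base."""
    ref = codon_change.split("/")[0]
    return next(i for i, base in enumerate(ref, start=1) if base.isupper())


for change in ["Atc/Gtc", "cCg/cGg", "aaG/aaT", "Atc/Ctc"]:
    print(change, translate_change(change), "position", changed_position(change))
```

For example, the Ahr entry `Atc/Gtc` translates to I/V with the substitution at codon position 1.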
Genes in MutaMouse affected by potentially detrimental deletions discovered with both CNVnator and Manta (NGS-based tools) that are relevant to genetic toxicology studies.
| Gene | Predicted effects of variant | Chromosomal coordinatesᵃ | Number of deleted bases | Protein | Functions | Copy numbers observed in MutaMouse population |
|---|---|---|---|---|---|---|
| Grk5 | intron variant | chr19:60925302–60934033 | 8,731 | G protein-coupled receptor kinase 5 | Cell cycle | 0 or 1 |
| Kif5b | downstream gene variant | chr18:6196144–6199729 | 3,585 | Kinesin-1 heavy chain | Cancer progression | 0 or 1 |
| Cdh11 | intron variant | chr8:102660453–102663326 | 2,873 | Cadherin-11 | Cancer progression | 0 |
| Tnks | intron variant | chr8:34882786–34885003 | 2,217 | Tankyrase-1 | Cell cycle | 0 |
| Dctn6 | downstream gene variant | chr8:34087155–34089063 | 1,908 | Dynactin subunit 6 | Cell cycle | 0 |
| Nrg1 | intron variant | chr8:31969110–31970943 | 1,833 | Neuregulin 1 | Cancer progression | 0 |
| Ncoa2 | upstream gene variant | chr1:13378446–13380234 | 1,788 | Nuclear receptor coactivator 2 | Cancer progression | 0 |
| Arhgap26 | intron variant, non-coding transcript variant | chr18:38663457–38665186 | 1,729 | Rho GTPase-activating protein 26 | Cancer progression | 0 or 1 |
| Rec114 | intron variant | chr9:58685759–58687011 | 1,252 | Meiotic recombination protein REC114 | Cell cycle | 0 or 1 |
| Wdr62 | intron variant | chr7:30275960–30276784 | 824 | WD repeat-containing protein 62 | Cell cycle | 0 |
| Prkdc | intron variant | chr16:15638729–15639323 | 594 | DNA-dependent protein kinase catalytic subunit | Cell cycle, DNA repair | 0 |
| Kmt2e | intron variant | chr5:23494281–23494730 | 449 | Histone-lysine N-methyltransferase 2E | Cell cycle | 0 |
| Fancc | intron variant | chr13:63480021–63480458 | 437 | Fanconi anemia group C protein homolog | Cancer progression, DNA repair | 0 |
ᵃ Coordinates reported by Manta are shown because they provide nucleotide resolution, whereas CNVnator reports only 250 bp bin locations.
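
The "Number of deleted bases" column equals the end coordinate minus the start coordinate (for example, Grk5 gives 60,934,033 − 60,925,302 = 8,731). The sketch below parses the table's coordinate strings and also shows, as a simplified illustration rather than CNVnator's actual algorithm, how snapping an interval outward to 250 bp bins coarsens its boundaries. `deleted_bases` and `to_bins` are hypothetical helpers.

```python
import re

# 'chrom:start–end', accepting an en dash (as in the table) or a plain hyphen.
COORD = re.compile(r"^(?P<chrom>\w+):(?P<start>\d+)[–-](?P<end>\d+)$")


def deleted_bases(coord: str) -> int:
    """Deletion length from a coordinate string, computed as end - start."""
    m = COORD.match(coord)
    if m is None:
        raise ValueError(f"unrecognised coordinate string: {coord!r}")
    return int(m["end"]) - int(m["start"])


def to_bins(start: int, end: int, bin_size: int = 250) -> tuple[int, int]:
    """Expand an interval outward to the enclosing bin boundaries."""
    floor_start = (start // bin_size) * bin_size
    ceil_end = -(-end // bin_size) * bin_size  # ceiling division
    return floor_start, ceil_end


print(deleted_bases("chr19:60925302–60934033"))  # Grk5
print(to_bins(60925302, 60934033))
```
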
Figure 4. Four examples of CNVs predicted using aCGH (top panels: probes shown as points; CNV call between dotted lines) overlaid with the corresponding NGS coverage (bottom panels: NGS coverage shown as points; CNVnator calls from NGS data shown as black lines below the coverage; Manta calls from NGS data shown as red lines below the coverage). Some CNVs predicted by aCGH are consistent with NGS coverage across the entire call region. However, many regions called as CNVs by aCGH actually span many small deletions and/or gains. (A) A duplication on chromosome 11 with congruent aCGH and NGS results. (B) A deletion on chromosome 14 with congruent aCGH and NGS results. (C) A deletion on chromosome 9 called as one large deletion by aCGH that NGS revealed to be composed of many small deletions. (D) A deletion on chromosome 6 called as one large deletion by aCGH that NGS revealed to be composed of many small deletions as well as several copy number gains.
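
Comparing the two panels in each example rests on placing aCGH log2 ratios and NGS depth on a common copy-number scale. The sketch below assumes a diploid reference with two copies at each locus, and estimates copy number from depth relative to the genome-wide median. Both functions are hypothetical helpers giving idealized values; observed aCGH ratios are typically compressed toward zero.

```python
import math


def copy_number_from_depth(window_depth: float, median_depth: float, ploidy: int = 2) -> float:
    """Estimate a window's copy number from its read depth relative to the genome median."""
    return ploidy * window_depth / median_depth


def expected_log2_ratio(copy_number: float, ploidy: int = 2) -> float:
    """Idealized aCGH log2(test/reference) for a given copy number vs a diploid reference."""
    if copy_number <= 0:
        return float("-inf")  # homozygous deletion: no test signal at all
    return math.log2(copy_number / ploidy)


# A window at half the median depth (55X genome) looks like a one-copy loss.
cn = copy_number_from_depth(27.5, 55)
print(cn, expected_log2_ratio(cn))
print(round(expected_log2_ratio(3), 3))  # single-copy gain
```
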
Figure 5. Assembly of the λgt10-lacZ transgene integration site using PacBio read data and NGS coverage data. There are approximately 29 copies of the λgt10 ori replication site[35]; however, we show here that at least one copy (at the right breakpoint, downstream of the integration site) is non-functional. Eighteen variants (14 SNVs and 4 small indels) are constitutional in every copy of the transgene (shown by blue arrows in the top panel for one transgene monomer, and listed in Table 5). The breakpoints near the integration site were resolved using long PacBio reads that map to both the λgt10 transgene sequence and mouse genomic sequence (bottom panel, colored text). The transgene integration involved an unusual combination of events: (1) deletion of ~465 kbp of mouse genomic sequence; (2) inversion of 370 bp of mouse genomic sequence from the downstream breakpoint to the upstream breakpoint; (3) insertion of two truncated transgene monomers, one at each breakpoint; and (4) insertion of shuffled transgene sequence at the downstream breakpoint and of several random nucleotides at the upstream breakpoint.
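
The approximate transgene copy number cited in this caption could, in principle, be cross-checked from short-read depth by comparing coverage over the transgene with genome-wide coverage. A minimal sketch, assuming reads map uniformly, that reads in the repetitive array are not filtered out, and that the reported genome depth is diploid depth. The depth values in the example are hypothetical, not measurements from this study.

```python
def transgene_copies_per_haploid(
    transgene_depth: float, genome_depth: float, homozygous: bool = True
) -> float:
    """Estimate tandem transgene copies per haploid genome from read depth."""
    # Each haploid copy of a single-copy locus contributes half of the diploid depth.
    per_haploid_copy_depth = genome_depth / 2
    carriers = 2 if homozygous else 1  # number of homologous chromosomes carrying the array
    return transgene_depth / (per_haploid_copy_depth * carriers)


# Hypothetical: 55X genome-wide depth and 1,595X mean depth over the transgene.
print(transgene_copies_per_haploid(1595, 55))
print(transgene_copies_per_haploid(797.5, 55, homozygous=False))
```

Both calls give 29 copies: a hemizygous array needs only half the depth to imply the same per-haploid copy number.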
Constitutional variants that are present in all copies of the MutaMouse transgene across all animals sequenced.
| Position | Reference allele | Alternate allele(s) |
|---|---|---|
| 137 | AG | A |
| 14266 | C | CG |
| 19670 | T | A |
| 19673 | A | C |
| 23445 | G | C |
| 23481 | A | G |
| 24583 | TA | T |
| 25143 | A | G |
| 27867 | T | C |
| 28598 | G | A |
| 28599 | T | A |
| 28817 | A | T |
| 30349 | CT | C |
| 31786 | A | G |
| 33016 | T | C |
| 34070 | A | G |
| 34331 | T | C |
| 44629 | T | C,A,G |
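
The table above uses VCF-style allele notation: a longer reference allele (AG → A) denotes a deletion, a longer alternate allele (C → CG) an insertion, and comma-separated alternates (C,A,G) a multi-allelic site. A minimal sketch that classifies the records by comparing allele lengths; `classify` is a hypothetical helper and, as a simplification, does not normalize or left-align indels.

```python
from collections import Counter

# (position, reference allele, alternate allele(s)), copied from the table above.
CONSTITUTIONAL = [
    (137, "AG", "A"), (14266, "C", "CG"), (19670, "T", "A"), (19673, "A", "C"),
    (23445, "G", "C"), (23481, "A", "G"), (24583, "TA", "T"), (25143, "A", "G"),
    (27867, "T", "C"), (28598, "G", "A"), (28599, "T", "A"), (28817, "A", "T"),
    (30349, "CT", "C"), (31786, "A", "G"), (33016, "T", "C"), (34070, "A", "G"),
    (34331, "T", "C"), (44629, "T", "C,A,G"),
]


def classify(ref: str, alts: str) -> list[str]:
    """Classify each alternate allele of a VCF-style record by allele length."""
    kinds = []
    for alt in alts.split(","):
        if len(ref) == len(alt) == 1:
            kinds.append("SNV")
        elif len(alt) > len(ref):
            kinds.append("insertion")
        elif len(alt) < len(ref):
            kinds.append("deletion")
        else:
            kinds.append("MNV")  # equal-length multi-base substitution
    return kinds


# One label per record; a multi-allelic SNV site still counts once.
tally = Counter("/".join(sorted(set(classify(ref, alt)))) for _, ref, alt in CONSTITUTIONAL)
print(tally)
```

The tally gives 14 SNV records (one of them multi-allelic), 3 deletions and 1 insertion, for 18 records in total.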
Figure 6. Comparison of variants discovered in two MutaMouse colonies that were geographically isolated for 23 years of breeding (78 generations). (A) More SNVs of parental origin were retained in the Health Canada colony than in the Covance colony, which experienced a population bottleneck circa 2006. The numbers of putative de novo SNVs are comparable, suggesting that germline mutation rates do not differ between the populations. (B) On a phylogenetic tree, the Covance MutaMouse colony forms a distinct branch, consistent with the finding that (C) animals from the Covance colony show higher genotype discordance with animals from Health Canada than with each other.
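
Genotype discordance, as compared in panel C, can be sketched as the fraction of sites genotyped in both animals at which their genotypes differ. This assumes genotypes are encoded as alternate-allele counts (0, 1 or 2) keyed by site, which sidesteps unphased "0/1" vs "1/0" notation. The study's exact definition and site filters are not given here, and `genotype_discordance` is a hypothetical helper.

```python
Site = tuple[str, int]  # (chromosome, 1-based position)


def genotype_discordance(gt_a: dict[Site, int], gt_b: dict[Site, int]) -> float:
    """Fraction of sites genotyped in both animals where alternate-allele counts differ."""
    shared = gt_a.keys() & gt_b.keys()
    if not shared:
        raise ValueError("no sites genotyped in both animals")
    return sum(gt_a[site] != gt_b[site] for site in shared) / len(shared)


animal_a = {("chr1", 100): 0, ("chr1", 200): 1, ("chr2", 50): 2, ("chr3", 10): 1}
animal_b = {("chr1", 100): 0, ("chr1", 200): 2, ("chr2", 50): 2}
print(genotype_discordance(animal_a, animal_b))
```

Three sites are genotyped in both animals and one of them differs, so the discordance is 1/3; the site only animal A carries is ignored.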