John Sebastian Sigmon1, Matthew W Blanchard2,3, Ralph S Baric4, Timothy A Bell2, Jennifer Brennan3, Gudrun A Brockmann5, A Wesley Burks6, J Mauro Calabrese7,8, Kathleen M Caron9, Richard E Cheney9, Dominic Ciavatta2, Frank Conlon10, David B Darr8, James Faber9, Craig Franklin11, Timothy R Gershon12, Lisa Gralinski4, Bin Gu9, Christiann H Gaines2, Robert S Hagan13, Ernest G Heimsath8,9, Mark T Heise2, Pablo Hock2, Folami Ideraabdullah2,8,14, J Charles Jennette15, Tal Kafri16,17, Anwica Kashfeen1, Mike Kulis6, Vivek Kumar18, Colton Linnertz2, Alessandra Livraghi-Butrico19, K C Kent Lloyd20,21,22, Cathleen Lutz18, Rachel M Lynch2,8, Terry Magnuson2,3,8, Glenn K Matsushima16,23, Rachel McMullan2, Darla R Miller2,8, Karen L Mohlke2, Sheryl S Moy24,25, Caroline E Y Murphy2, Maya Najarian1, Lori O'Brien9, Abraham A Palmer26, Benjamin D Philpot9,19, Scott H Randell9, Laura Reinholdt18, Yuyu Ren26, Steve Rockwood18, Allison R Rogala15,27, Avani Saraswatula2, Christopher M Sassetti28, Jonathan C Schisler7, Sarah A Schoenrock2, Ginger D Shaw2, John R Shorter2, Clare M Smith28, Celine L St Pierre26, Lisa M Tarantino2,29, David W Threadgill26,30, William Valdar2, Barbara J Vilen16, Keegan Wardwell18, Jason K Whitmire2, Lucy Williams2, Mark J Zylka9, Martin T Ferris31, Leonard McMillan1, Fernando Pardo Manuel de Villena2,3,8.
Abstract
The laboratory mouse is the most widely used animal model for biomedical research, due in part to its well-annotated genome, wealth of genetic resources, and the ability to precisely manipulate its genome. Despite the importance of genetics for mouse research, genetic quality control (QC) is not standardized, in part due to the lack of cost-effective, informative, and robust platforms. Genotyping arrays are standard tools for mouse research and remain an attractive alternative even in the era of high-throughput whole-genome sequencing. Here, we describe the content and performance of a new iteration of the Mouse Universal Genotyping Array (MUGA), MiniMUGA, an array-based genetic QC platform with over 11,000 probes. In addition to robust discrimination between most classical and wild-derived laboratory strains, MiniMUGA was designed to contain features not available in other platforms: (1) chromosomal sex determination, (2) discrimination between substrains from multiple commercial vendors, (3) diagnostic SNPs for popular laboratory strains, (4) detection of constructs used in genetically engineered mice, and (5) an easy-to-interpret report summarizing these results. In-depth annotation of all probes should facilitate custom analyses by individual researchers. To determine the performance of MiniMUGA, we genotyped 6899 samples from a wide variety of genetic backgrounds. The performance of MiniMUGA compares favorably with three previous iterations of the MUGA family of arrays, both in discrimination capabilities and robustness. We have generated publicly available consensus genotypes for 241 inbred strains including classical, wild-derived, and recombinant inbred lines. Here, we also report the detection of a substantial number of XO and XXY individuals across a variety of sample types, new markers that expand the utility of reduced complexity crosses to genetic backgrounds other than C57BL/6, and the robust detection of 17 genetic constructs. 
We provide preliminary evidence that the array can be used to identify both partial sex chromosome duplication and mosaicism, and that diagnostic SNPs can be used to determine how long inbred mice have been bred independently from the relevant main stock. We conclude that MiniMUGA is a valuable platform for genetic QC, and an important new tool to increase the rigor and reproducibility of mouse research.
Keywords: chromosomal sex; diagnostic SNPs; genetic QC; genetic background; genetic constructs; substrains
Year: 2020 PMID: 33067325 PMCID: PMC7768238 DOI: 10.1534/genetics.120.303596
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.402
Sample set
| Content | Chromosomal sex | Inbred | F1 | CC | Cross | Unclassified | Cell lines | Total |
|---|---|---|---|---|---|---|---|---|
| Preliminary | | 138 | 131 | 305 | 1383 | 817 | 87 | 2861 |
| | | 265 | 41 | 181 | 1236 | 907 | 74 | 2704 |
| | | 0 | 1 | 3 | 11 | 8 | 9 | 32 |
| | | 0 | 1 | 1 | 2 | 3 | 0 | 7 |
| Subtotal | | | | | | | | 5604 |
| Production | | 41 | 59 | 40 | 580 | 21 | 4 | 745 |
| | | 153 | 13 | 7 | 248 | 112 | 10 | 543 |
| | | 0 | 1 | 0 | 2 | 0 | 0 | 3 |
| | | 0 | 0 | 0 | 4 | 0 | 0 | 4 |
| Subtotal | | | | | | | | 1295 |
| Total | | 597 | 247 | 537 | 3466 | 1868 | 184 | 6899 |
The table provides the number of samples genotyped in the preliminary and production versions of the array, classified according to their chromosomal sex and type. CC, Collaborative Cross; Cross, experimental back- and intercrosses; Unclassified, samples provided by the coauthors that may be of any type.
Figure 1. Chromosomal sex determination in 6899 samples. Each circle or cross represents one genotyped sample. The x-axis value is the autosome-normalized median sample intensity at 269 sex-informative X chromosome markers, and the y-axis value is the autosome-normalized median sample intensity at 72 sex-informative Y chromosome markers. The point color denotes the assigned chromosomal sex: XX, red; XY, blue; XO, green; and XXY, purple. Potential mosaic samples are shown in gray and known errors in yellow. Samples with pd_stat lower than the threshold are shown as circles and samples with high pd_stat are shown as crosses.
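As a rough illustration of the two quantities plotted in Figure 1, the sketch below computes autosome-normalized median intensities and assigns a chromosomal sex from them. The function names, the toy intensities, and both cutoff values are assumptions made for illustration; the caption does not state the thresholds used, and the sketch does not model mosaic samples or pd_stat.

```python
from statistics import median


def normalized_median(marker_intensities, autosome_intensities):
    """Median intensity at the given markers divided by the autosomal median."""
    return median(marker_intensities) / median(autosome_intensities)


def call_chromosomal_sex(x_norm, y_norm, x_cut=0.8, y_cut=0.3):
    """Assign XX, XY, XO, or XXY from normalized X and Y intensities.

    x_cut and y_cut are hypothetical cutoffs, not values from the paper.
    """
    two_x = x_norm >= x_cut  # high X intensity suggests two X chromosomes
    has_y = y_norm >= y_cut  # high Y intensity suggests a Y chromosome
    if two_x and has_y:
        return "XXY"
    if two_x:
        return "XX"
    if has_y:
        return "XY"
    return "XO"


# Toy intensities for one sample (made up for this example).
autosomes = [1.0, 1.1, 0.9, 1.0, 1.05]
x_markers = [0.50, 0.55, 0.45]
y_markers = [0.60, 0.70, 0.65]

x_norm = normalized_median(x_markers, autosomes)
y_norm = normalized_median(y_markers, autosomes)
print(call_chromosomal_sex(x_norm, y_norm))
```

With real data, cutoffs of this kind would need to be chosen from the observed clusters rather than fixed in advance.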
Sequenced inbred mouse strains used to select the content of the genotyping array.
| Background | Strain group | Diagnostic type | Full | Partial | Reference |
|---|---|---|---|---|---|
| 129P2/OlaHsd | 129P | Substrain | 25 | 0 | |
| 129P3/J | 129P | Substrain | 54 | 0 | M. T. Ferris |
| 129S1/SvImJ | 129S | Substrain | 82 | 13 | |
| 129S2/SvHsd | 129S | Substrain | 7 | 1 | M. T. Ferris |
| 129S2/SvPasOrlRj | 129S | Substrain | 36 | 0 | M. T. Ferris |
| 129S4/SvJaeJ | 129S | Substrain | 45 | 0 | M. T. Ferris |
| 129S5/SvEvBrd | 129S | Substrain | 12 | 0 | |
| 129S6/SvEvTac | 129S | Substrain | 41 | 0 | M. T. Ferris |
| 129T2/SvEmsJ | 129T | Substrain | 38 | 0 | M. T. Ferris |
| 129X1/SvJ | 129X | Substrain | 39 | 0 | M. T. Ferris |
| A/J | A | Substrain | 58 | 7 | |
| A/JCr | A | Substrain | 53 | 0 | M. T. Ferris |
| A/JOlaHsd | A | Substrain | 38 | 0 | M. T. Ferris |
| BALB/cAnNCrl | BALB/c | Substrain | 36 | 2 | M. T. Ferris |
| BALB/cAnNHsd | BALB/c | Substrain | 109 | 4 | M. T. Ferris |
| BALB/cByJ | BALB/c | Substrain | 3 | 4 | M. T. Ferris |
| BALB/cByJRj | BALB/c | Substrain | 19 | 0 | M. T. Ferris |
| BALB/cJ | BALB/c | Substrain | 103 | 3 | |
| BALB/cJBomTac | BALB/c | Substrain | 47 | 0 | M. T. Ferris |
| C3H/HeJ | C3H/He | Substrain | 166 | 2 | |
| C3H/HeNCrl | C3H/He | Substrain | 39 | 0 | M. T. Ferris |
| C3H/HeNHsd | C3H/He | Substrain | 39 | 1 | M. T. Ferris |
| C3H/HeNRj | C3H/He | Substrain | 42 | 0 | M. T. Ferris |
| C3H/HeNTac | C3H/He | Substrain | 45 | 14 | M. T. Ferris |
| C57BL/6J | C57BL/6 | Substrain | 136 | 20 | |
| C57BL/6JBomTac | C57BL/6 | Substrain | 41 | 2 | M. T. Ferris |
| C57BL/6JOlaHsd | C57BL/6 | Substrain | 43 | 0 | M. T. Ferris |
| C57BL/6NJ | C57BL/6 | Substrain | 37 | 7 | |
| C57BL/6NRj | C57BL/6 | Substrain | 20 | 0 | M. T. Ferris |
| B6N-Tyr<c-Brd>/BrdCrCrl | C57BL/6 | Substrain | 21 | 10 | M. T. Ferris |
| DBA/1J | DBA/1 | Substrain | 70 | 0 | |
| DBA/1LacJ | DBA/1 | Substrain | 77 | 2 | M. T. Ferris |
| DBA/1OlaHsd | DBA/1 | Substrain | 32 | 0 | M. T. Ferris |
| DBA/2J | DBA/2 | Substrain | 112 | 0 | |
| DBA/2JOlaHsd | DBA/2 | Substrain | 39 | 0 | M. T. Ferris |
| DBA/2JRj | DBA/2 | Substrain | 30 | 0 | M. T. Ferris |
| DBA/2NCrl | DBA/2 | Substrain | 85 | 14 | M. T. Ferris |
| DBA/2NTac | DBA/2 | Substrain | 36 | 10 | M. T. Ferris |
| FVB/NCrl | FVB | Substrain | 47 | 0 | M. T. Ferris |
| FVB/NHsd | FVB | Substrain | 39 | 1 | M. T. Ferris |
| FVB/NJ | FVB | Substrain | 72 | 7 | |
| FVB/NRj | FVB | Substrain | 47 | 0 | M. T. Ferris |
| FVB/NTac | FVB | Substrain | 37 | 0 | M. T. Ferris |
| NOD/MrkTac | NOD | Substrain | 33 | 0 | M. T. Ferris |
| NOD/ShiLtJ | NOD | Substrain | 51 | 3 | |
| Subtotal | | | 2281 | 127 | |
| 129S | 129S | Strain group | 17 | 0 | |
| A | A | Strain group | 57 | 0 | |
| BALB/c | BALB/c | Strain group | 125 | 0 | |
| C3H/He | C3H/He | Strain group | 45 | 0 | |
| C57BL/10 | C57BL/10 | Strain group | 291 | 0 | |
| C57BL/6 | C57BL/6 | Strain group | 19 | 0 | |
| DBA/1 | DBA/1 | Strain group | 5 | 0 | |
| DBA/2 | DBA/2 | Strain group | 62 | 0 | |
| FVB/N | FVB/N | Strain group | 2 | 0 | |
| NZO | NZO | Strain group | 12 | 0 | |
| Subtotal | | | 635 | 0 | |
| Total | | | 2916 | 127 | |
The table provides the strain name and group, the number and type of both fully and partially diagnostic SNPs, and the source of the whole-genome sequencing data.
Validated constructs
| Name | Abbreviation | Number of probes | Number of distinct probes |
|---|---|---|---|
| “Greenish” Fluorescent Protein (EGFP, EYFP, and ECFP) | g_FP | 19 | 19 |
| SV40 large T antigen | SV40 | 18 | 18 |
| Cre recombinase | Cre | 16 | 12 |
| Tetracycline repressor protein | tTA | 14 | 14 |
| Diphtheria toxin | DTA | 11 | 11 |
| Human CMV enhancer | hCMV_b | 10 | 7 |
| Luciferase and firefly luciferase | Luc | 10 | 10 |
| Chloramphenicol acetyltransferase | chloR | 9 | 9 |
| Bovine growth hormone poly A signal sequence | bpA | 8 | 4 |
| iCre recombinase | iCre | 8 | 8 |
| Reverse improved tetracycline-controlled transactivator | rtTA | 8 | 4 |
| CRISPR associated protein 9 | cas9 | 7 | 7 |
| Blasticidin resistance | BlastR | 6 | 4 |
| Internal Ribosome Entry Site | IRES | 6 | 6 |
| hCMV enhancer | hCMV_a | 5 | 4 |
| “Reddish” fluorescent protein (tdTomato, mCherry) | r_FP | 6 | 6 |
| Herpesvirus TK promoter | hTK_pr | 2 | 2 |
| Total | | 163 | 145 |
The table lists the name, abbreviation shown in the report and the number of total and distinct probes for 17 constructs validated in the data set reported here. EGFP, enhanced green fluorescent protein; EYFP, enhanced yellow fluorescent protein; ECFP, enhanced cyan fluorescent protein; CMV, cytomegalovirus; hCMV, human cytomegalovirus; SV40, simian virus 40; TK, thymidine kinase.
Figure 2. Sex chromosome aneuploidy is due to paternal nondisjunction. The figure shows the parental sex chromosome and mitochondrial complement of the dam and sire for two types of crosses. Only the sex chromosomes and the mitochondria are shown. The X chromosomes are shown as long acrocentric, the Y chromosomes as shorter submetacentric, and the mitochondria as circles. The figure also shows the inferred parental origin of the sex chromosome aneuploidy and the actual number of cases observed in our data set. The sex chromosome configurations of the standard types of sex chromosome aneuploidy in the progeny of each type of cross are shown with the inferred parental origin of the X chromosomes.
Figure 3. Complex sex chromosome aneuploidy and mosaicism in an F1 male. (A) The panel shows the chromosomal sex and mitochondrial complement of the parents and the F1 individual. Blue denotes C57BL/6J and red denotes 129X1/SvJ. (B) This panel is a reprint of Figure 1 and was used to classify the F1 male, shown as a yellow circle, as an XXY based on the x- and y-axis intensities (two X chromosomes and a Y chromosome present). This panel also provides evidence of mosaicism for the presence and absence of the Y chromosome (based on the low Y chromosome intensity). (C) This panel provides evidence of mosaicism for the X chromosome and identifies the paternal origin (129X1/SvJ) of the chromosome lost in some cells. The plot presents the intensities of the two alternate alleles for 173 X chromosome markers that are informative between the two parents. Four individuals are shown: a C57BL/6J female in blue, a 129X1/SvJ male in red, a (C57BL/6Jx129X1/SvJ)F1 female in gray, and the F1 male case in yellow. The shapes denote the type of call made by the Illumina software: circles are homozygous A, T, C, or G calls; triangles are H calls; and squares are N calls. (D) This panel shows the proposed sex chromosome complement of the two types of cells present in this F1 male case. This solution explains the observations from the previous three panels.
Figure 4. Segmental chromosome Y duplications in laboratory strains. (A) Normalized median Y chromosome intensity in selected samples with C3H/He, DBA/1, and C57BL/6 Y chromosomes. Within the C3H/He group, samples with a C3H/HeJ Y chromosome are shown in orange while samples with any other C3H/He Y chromosome are shown in blue. For DBA/1, there are multiple technical replicates of a single sample with abnormally high intensity, shown in orange. For C57BL/6, there is only one sample with abnormally high intensity. The shape of the point reflects the type of mouse. (B) Range of normalized intensity distributions at 63 SNPs located on the short arm and the beginning of the long arm of the Y chromosome in the C3H/He samples shown in (A). The range of intensities (mean ± SD) in samples with a C3H/HeJ Y chromosome is shown in orange, and the range in samples with any other type of C3H/He Y chromosome is shown in blue. At the top of the panel, the potential duplication is shown in red, transition regions with uncertain copy number are shown in pink, and normal copy numbers are shown in black. The bottom of the panel shows the location of the MiniMUGA markers and genes. MUGA, Mouse Universal Genotyping Array.
Figure 5. Number of informative SNP calls in pairwise comparisons among classical inbred strains. Strains are ordered by similarity and colors represent the range of number of informative SNPs based on the consensus genotypes. Only homozygous base calls, at tier 1 and 2 markers, on the autosomes, X, and pseudoautosomal region are included.
Figure 6. Haplotype diversities. Haplotype diversities of the mitochondria (A) and chromosome Y (B). The trees are built based on the variation present in MiniMUGA and may not represent the real phylogenetic relationships. Colors denote the subspecies-specific origin of the haplotype in question: shades of blue represent M. m. domesticus haplotypes; shades of red represent M. m. musculus haplotypes; and shades of green represent M. m. castaneus haplotypes. The arrow in panel (A) identifies an M. spretus strain with an M. m. domesticus mitochondrial haplotype. MUGA, Mouse Universal Genotyping Array.
Dating the origin and fixation of diagnostic SNPs in five mouse inbred strains
| Substrain | Cohort | Year | Number of samples | Range of alleles sampled | Diagnostic allele: absent | Diagnostic allele: segregating | Diagnostic allele: fixed |
|---|---|---|---|---|---|---|---|
| C57BL/6J | BXD E1 | 1971 | 22 | 11 | 156 | 0 | 0 |
| | BXD E2 | 1996 | 4 | 0–4 | 84 | 72 | 0 |
| | BXD E3 | 2001–2009 | 24 (23) | 11.5 | 50 | 31 | 75 |
| | CC | 2004–2007 | 483 (72) | 4–18 | 8 | 30 | 118 |
| | Consensus | 2010–2016 | 15 (1) | 15 | 0 | 20 | 136 |
| DBA/2J | BXD E1 | 1971 | 22 | 11 | 105 | 7 | 0 |
| | BXD E2 | 1996 | 4 | 0–4 | 37 | 62 | 13 |
| | BXD E3 | 2001–2009 | 24 (23) | 11.5 | 24 | 75 | 13 |
| | Consensus | 2010–2016 | 3 (1) | 3 | 0 | 0 | 112 |
| A/J | CC | 2004–2007 | 483 (72) | 2–22 | 2 | 11 | 47 |
| | Consensus | 2010–2016 | 10 (1) | 10 | 0 | 5 | 55 |
| 129S1/SvImJ | CC | 2004–2007 | 483 (72) | 3–42 | 1 | 6 | 81 |
| | Consensus | 2010–2016 | 10 (1) | 10 | 0 | 4 | 84 |
| NOD/ShiLtJ | CC | 2004–2007 | 483 (72) | 4–43 | 1 | 2 | 34 |
| | Consensus | 2010–2016 | 8 (1) | 8 | 0 | 1 | 36 |
This table lists the name of the substrain, the cohorts used for dating the diagnostic SNPs, the approximate year(s) when these cohorts were derived from the main stock, the number of samples genotyped, and the range of alleles sampled. When the number of samples does not match the number of strains, the number of strains is shown in parentheses. Diagnostic alleles are classified as absent, segregating, and fixed for each substrain and cohort, and the table provides the total number in each category.
In this analysis, we excluded from the consensus cohorts samples purchased from The Jackson Laboratory over a decade ago (the sample names include the suffix jaxDNA). Details are provided in the text. CC, Collaborative Cross; BXD, recombinant inbred BXD panel.
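The absent, segregating, and fixed categories in the table can be sketched as follows. This is a minimal sketch that assumes a diagnostic allele is called absent when none of the alleles sampled in a cohort carry it, fixed when all of them do, and segregating otherwise. That rule, the function name, and the toy counts are assumptions for illustration; the table notes defer the actual procedure to the text.

```python
from collections import Counter


def classify_diagnostic_allele(n_diagnostic, n_sampled):
    """Classify one diagnostic SNP within one cohort.

    n_diagnostic: number of sampled alleles carrying the diagnostic allele.
    n_sampled: total number of alleles sampled at that SNP in the cohort.
    """
    if n_sampled <= 0:
        raise ValueError("no alleles sampled; the SNP cannot be classified")
    if not 0 <= n_diagnostic <= n_sampled:
        raise ValueError("n_diagnostic must lie between 0 and n_sampled")
    if n_diagnostic == 0:
        return "absent"
    if n_diagnostic == n_sampled:
        return "fixed"
    return "segregating"


# Toy cohort: (diagnostic alleles observed, alleles sampled) for four SNPs.
toy_cohort = [(0, 11), (4, 11), (11, 11), (0, 11)]
tally = Counter(classify_diagnostic_allele(d, n) for d, n in toy_cohort)
print(dict(tally))
```

Tallying these calls per substrain and cohort yields one row of three counts in the layout used by the table.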
Full dating of diagnostic alleles for the C57BL/6J substrain
| Earliest observation | Apparent fixation: BXD E1 | Apparent fixation: BXD E2 | Apparent fixation: BXD E3 | Apparent fixation: CC | Apparent fixation: Consensus | Not fixed |
|---|---|---|---|---|---|---|
| BXD E1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BXD E2 | NA | 0 | 67 | 1 | 4 | 0 |
| BXD E3 | NA | NA | 8 | 18 | 8 | 0 |
| CC | NA | NA | NA | 24 | 4 | 14 |
| Consensus | NA | NA | NA | NA | 2 | 6 |
The table classifies 156 diagnostic SNPs into one of 20 categories based on the earliest observation (origin) and apparent date of fixation, according to whether the diagnostic allele is observed in BXD and CC strains with the C57BL/6J haplotype at each locus. Temporally impossible cells are shown as NA. BXD, recombinant inbred BXD panel; CC, Collaborative Cross.
In this analysis, we excluded from the consensus cohorts samples purchased from The Jackson Laboratory over a decade ago (the sample names include the suffix jaxDNA). Details are provided in the text.
Figure 7. Sample dating and breeding history of mice with C57BL/6J background. Red bars denote the ancestral allele for diagnostic SNPs fixed by E3 in the BXD panel. Pink bars denote ancestral alleles for diagnostic SNPs fixed by the start of the CC. Light blue bars denote diagnostic alleles at diagnostic SNPs fixed by E3. Blue bars denote diagnostic alleles at diagnostic SNPs fixed by the start of the CC. Gray bars denote ancestral alleles at post-CC diagnostic SNPs. Black bars denote diagnostic alleles at post-CC diagnostic SNPs. Split bars denote heterozygosity. (A) Inbred Baff male in C57BL/6J background. (B) Inbred transgenic and IFNgR1 female in C57BL/6J background. (C) Inbred C57BL/6J male. The diagnostic allele always represents the derived allele, and the nondiagnostic allele is always the ancestral allele. CC, Collaborative Cross; BXD, recombinant inbred BXD panel; E, epoch.
Figure 8. Percent of the genome covered by MiniMUGA in RCC. Each of the 78 RCC is shown as a circle in ascending order. The order is independent for each of the six analyses. Coverage was based on the linkage distance to the nearest informative marker in a given RCC. MUGA, Mouse Universal Genotyping Array; RCC, reduced-complexity crosses.
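One way to read "coverage based on the linkage distance to the nearest informative marker" is as the fraction of a chromosome lying within some maximum distance of any informative marker. The sketch below computes that fraction by merging the intervals around each marker. The function name, the distance cutoff, and the toy positions are assumptions; the caption does not give the exact definition or the cutoffs behind the six analyses.

```python
def covered_fraction(marker_positions_cm, chrom_length_cm, max_distance_cm):
    """Fraction of [0, chrom_length_cm] within max_distance_cm of a marker."""
    if chrom_length_cm <= 0:
        raise ValueError("chromosome length must be positive")
    # One interval per marker, clipped to the chromosome ends.
    intervals = sorted(
        (max(0.0, p - max_distance_cm), min(chrom_length_cm, p + max_distance_cm))
        for p in marker_positions_cm
    )
    covered = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping interval: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / chrom_length_cm


# Toy example: three informative markers on a 100 cM chromosome.
fraction = covered_fraction([10, 15, 80], chrom_length_cm=100, max_distance_cm=10)
print(f"{fraction:.0%} covered")
```

A genome-wide figure would weight each chromosome by its length before averaging.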
Figure 9. Detection of genetic constructs validated in MiniMUGA. For each construct, samples are shown as dots and classified as negative controls (left), experimental (center), and positive controls (right). The dot color denotes whether the sample is determined to be negative (blue), positive (red), or questionable (gray) for the respective construct. For each construct, the gray horizontal lines represent data-driven ad hoc thresholds discriminating between presence and absence. Note that the y-axis scale is different for each construct. MUGA, Mouse Universal Genotyping Array; g_FP, ‘greenish’ fluorescent protein; SV40, SV40 large T antigen; Cre, Cre recombinase; tTA, tetracycline repressor protein; DTA, Diphtheria toxin; hCMV_b, Human CMV enhancer version b; Luc, Luciferase and firefly luciferase; chloR, Chloramphenicol acetyltransferase; bpA, Bovine growth hormone poly A signal sequence; iCre, iCre recombinase; rtTA, Reverse improved tetracycline-controlled transactivator; cas9, CRISPR associated protein 9; BlastR, Blasticidin resistance; IRES, Internal Ribosome Entry Site; hCMV_a, hCMV enhancer version a; r_FP, ‘reddish’ fluorescent protein; hTK_pr, Herpesvirus TK promoter.
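The three-way call described in this caption (negative, positive, or questionable) can be illustrated with a pair of per-construct cutoffs. This is a minimal sketch, assuming each construct has a lower cutoff below which it is called absent and an upper cutoff above which it is called present. The function name, the cutoff values, and the sample intensities are hypothetical; the source says only that the thresholds are data-driven and ad hoc.

```python
def call_construct(intensity, absent_below, present_above):
    """Return 'negative', 'positive', or 'questionable' for one construct."""
    if absent_below > present_above:
        raise ValueError("absent_below must not exceed present_above")
    if intensity < absent_below:
        return "negative"
    if intensity > present_above:
        return "positive"
    return "questionable"  # falls between the two cutoffs


# Hypothetical per-construct cutoffs: (absent_below, present_above).
thresholds = {"Cre": (0.5, 1.0), "Luc": (0.3, 0.8), "g_FP": (0.4, 0.9)}

# Hypothetical summarized probe intensities for one sample.
sample = {"Cre": 0.2, "Luc": 1.4, "g_FP": 0.6}

calls = {
    name: call_construct(sample[name], *cutoffs)
    for name, cutoffs in thresholds.items()
}
print(calls)
```

Keeping separate cutoffs per construct reflects the caption's note that each construct has its own y-axis scale.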
Figure 10. Background Analysis Report for the sample named MMRRC_UNC_F38673, from the line named B6.Cg-Cdkn2a/Mmnc. The genotype of this sample is of excellent quality. It is a close-to-inbred female that is congenic, with C57BL/6J as the primary background and multiple regions of a 129S secondary background. This sample is positive for the luciferase and firefly luciferase construct, and negative for 16 other constructs. g_FP, ‘greenish’ fluorescent protein; SV40, SV40 large T antigen; Cre, Cre recombinase; tTA, tetracycline repressor protein; DTA, Diphtheria toxin; hCMV_b, Human CMV enhancer version b; Luc, Luciferase and firefly luciferase; chloR, Chloramphenicol acetyltransferase; bpA, Bovine growth hormone poly A signal sequence; iCre, iCre recombinase; rtTA, Reverse improved tetracycline-controlled transactivator; cas9, CRISPR associated protein 9; BlastR, Blasticidin resistance; IRES, Internal Ribosome Entry Site; hCMV_a, hCMV enhancer version a; r_FP, ‘reddish’ fluorescent protein; hTK_pr, Herpesvirus TK promoter; PAR, pseudoautosomal region.