Daniel Oreper, Yanwei Cai, Lisa M Tarantino, Fernando Pardo-Manuel de Villena, William Valdar.
Abstract
The Collaborative Cross (CC) is a panel of recently established multiparental recombinant inbred mouse strains. For the CC, as for any multiparental population (MPP), effective experimental design and analysis benefit from detailed knowledge of the genetic differences between strains. Such differences can be directly determined by sequencing, but until now whole-genome sequencing was not publicly available for individual CC strains. An alternative and complementary approach is to infer genetic differences by combining two pieces of information: probabilistic estimates of the CC haplotype mosaic from a custom genotyping array, and probabilistic variant calls from sequencing of the CC founders. The computation for this inference, especially when performed genome-wide, can be intricate and time-consuming, requiring the researcher to generate nontrivial and potentially error-prone scripts. To provide standardized, easy-to-access CC sequence information, we have developed the Inbred Strain Variant Database (ISVdb). The ISVdb provides, for all the exonic variants from the Sanger Institute mouse sequencing dataset, direct sequence information for CC founders and, critically, the imputed sequence information for CC strains. Notably, the ISVdb also: (1) provides predicted variant consequence metadata; (2) allows rapid simulation of F1 populations; and (3) preserves imputation uncertainty, which will allow imputed data to be refined in the future as additional sequencing and genotyping data are collected. The ISVdb information is housed in an SQL database and is easily accessible through a custom online interface (http://isvdb.unc.edu), reducing the analytic burden on any researcher using the CC.
Keywords: Collaborative Cross; MPP; haplotype; inbred strain; multiparental populations; online GUI; variant imputation
Year: 2017 PMID: 28592645 PMCID: PMC5473744 DOI: 10.1534/g3.117.041491
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
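The inference described in the abstract, combining probabilistic haplotype estimates with probabilistic founder variant calls, can be illustrated with a minimal Python sketch. This is not the ISVdb implementation; the function name `impute_genotype`, the data layout, and the example probabilities are all hypothetical. The sketch assumes each founder is fully inbred (so a same-founder diplotype contributes a single shared allele) and that calls for different founders are independent.

```python
from collections import defaultdict


def impute_genotype(diplotype_probs, founder_allele_probs):
    """Marginalize over founder diplotypes to get a genotype distribution.

    diplotype_probs: {(founder_a, founder_b): probability} at one locus
    founder_allele_probs: {founder: {allele: probability}} at one variant
    Returns {(allele_1, allele_2): probability}, alleles sorted (unordered).
    All names and structures here are illustrative assumptions.
    """
    genotype_probs = defaultdict(float)
    for (f1, f2), p_dip in diplotype_probs.items():
        if f1 == f2:
            # Inbred founder: both haplotypes carry the same (uncertain) allele.
            for allele, p in founder_allele_probs[f1].items():
                genotype_probs[(allele, allele)] += p_dip * p
        else:
            # Different founders: treat their calls as independent.
            for a1, p1 in founder_allele_probs[f1].items():
                for a2, p2 in founder_allele_probs[f2].items():
                    genotype = tuple(sorted((a1, a2)))
                    genotype_probs[genotype] += p_dip * p1 * p2
    return dict(genotype_probs)


# Invented example: the strain is either homozygous for founder A/J or
# heterozygous A/J with C57BL/6J, and the C57BL/6J call is uncertain.
diplotype_probs = {("A/J", "A/J"): 0.5, ("A/J", "C57BL/6J"): 0.5}
founder_allele_probs = {
    "A/J": {"C": 1.0},
    "C57BL/6J": {"C": 0.5, "T": 0.5},
}
result = impute_genotype(diplotype_probs, founder_allele_probs)
print(result)
```

Because the output is a full distribution over genotypes rather than a single call, it can be recomputed when either input is updated, which is the sense in which preserved uncertainty allows later refinement.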
Figure 1. Breeding process for two Collaborative Cross (CC) strains. Both funnels begin by outcrossing the same eight founders, but the initial outcrossing order differs, resulting in completely independent populations per funnel. Animals are outcrossed for three generations, then inbred until genotyping reveals at least two animals with >90% consistent homozygosity by haplotype. These homozygous animals [a.k.a., the most recent common ancestors (MRCAs)] are chosen to become the obligate ancestors for the CC strains; all subsequent generations of a CC strain descend from a subset of the MRCAs. In (A), arrows show CC1 MRCA regions of inconsistent homozygosity (L1) and residual heterozygosity (L2 and L3). After further inbreeding, only L2 continues to segregate. In (B), the CC2 MRCA set includes three animals rather than two. After further inbreeding, only L1 continues to segregate, but a de novo mutation has become fixed at L2.
Figure 2. (A) Example workflow of the Inbred Strain Variant Database (ISVdb) online graphical user interface (GUI). A user has queried the genotype of CC012, on chromosome 19, from 6054740:6054749. The “Primary Query” panel also allows additional strains to be selected, and/or the region to be specified by genes instead. The user is interested in variants of all zygosities, all consequences, and all probabilities, and thus has applied no secondary restriction. After the user clicked “Submit!,” a URL to download the resulting table was generated, as well as an online version of the table. The first three rows of the table are shown here; notably, they all represent the same variant in the same strain. The difference between the rows is highlighted in the yellow box: the genotype per row and its associated probability. Collectively, the three rows give the probability of each of three possible genotypes at this variant in CC012. (B) The remaining wrapped columns of output from part (A), which was too wide to show in one piece. Note that each genotype has a different consequence, underscoring that accounting for only the most likely genotype would cause a nonnegligible loss of information. Also, note that this figure was pieced together from a screen capture to fit on a single page.
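The point made in Figure 2(B), that keeping only the most likely genotype discards information, can be shown with a small Python sketch. The rows below are invented for illustration and do not reproduce the values in the figure. The field names (`genotype`, `prob`, `consequence`) are hypothetical and are not the actual ISVdb column names.

```python
from collections import defaultdict

# Invented rows mimicking the shape of a query result: one variant in one
# strain, one row per candidate genotype. Values are NOT from the figure.
rows = [
    {"strain": "CC012", "genotype": ("A", "A"), "prob": 0.5,
     "consequence": "missense_variant"},
    {"strain": "CC012", "genotype": ("A", "G"), "prob": 0.25,
     "consequence": "synonymous_variant"},
    {"strain": "CC012", "genotype": ("G", "G"), "prob": 0.25,
     "consequence": "reference"},
]


def consequence_probabilities(rows):
    """Sum genotype probabilities per predicted consequence."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["consequence"]] += row["prob"]
    return dict(totals)


def most_likely_row(rows):
    """Collapse to the single highest-probability genotype row."""
    return max(rows, key=lambda row: row["prob"])


by_consequence = consequence_probabilities(rows)
best = most_likely_row(rows)
discarded_mass = 1.0 - best["prob"]

print(by_consequence)
print(best["genotype"], "discarded probability mass:", discarded_mass)
```

In this invented case, collapsing to the top row would report only a missense genotype and drop half of the probability mass, along with the two alternative consequences that go with it.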