Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse.

Abstract

The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes how the growing body of genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to ensure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided.

Keywords: database; genetics; genomics; human disease model; informatics; model organism; mouse; phenotypes

Year: 2017 PMID: 28838066 PMCID: PMC5886341 DOI: 10.1093/ilar/ilx013

Source DB: PubMed Journal: ILAR J ISSN: 1084-2020

Introduction

The laboratory mouse is an essential model for understanding human biology, health, and disease. Among the key advantages of the mouse as an experimental tool are its small size, its short gestation period with multiple young, its short lifespan, and its physiological and genetic similarity to humans as a fellow mammal. In addition, the mouse is genetically the best-studied mammalian model and is experimentally accessible at all life stages; its complete genome has been sequenced; and plentiful technologies for precisely manipulating its genome make it an exceptional animal for developing new living tools for scientific inquiry and human disease modeling. Further, the large number of inbred strains and special purpose strains that have been developed provides fertile ground for population studies and the potential for understanding multi-genic disease susceptibilities and testing therapeutics. Many factors contribute to the strength of the mouse as a model system, including the sequencing of its genome (Mouse Genome Sequencing Consortium 2002) and subsequent whole genome sequencing of other mouse strains (e.g., Doran et al. 2016; Keane et al. 2011; Nikolskiy et al. 2015; Yalcin et al. 2012); the discovery and analysis of single nucleotide polymorphisms (SNPs) and copy-number variation; the wide diversity of specialized strains (e.g., congenic, recombinant inbred, and humanized mice); the Collaborative Cross (Collaborative Cross Consortium 2012; Morgan and Welsh 2015; Threadgill and Churchill 2012) and Diversity Outcross (Churchill et al. 2012; Morgan and Welsh 2015; Svenson et al. 2012) populations that model human populations; and the International Knockout Mouse Consortium (IKMC) (Austin et al. 2004; Auwerx et al. 2004; Bradley et al. 2012; International Mouse Knockout Consortium 2007) and International Mouse Phenotyping Consortium (IMPC) (Brown and Moore 2012a,b), which are revealing new functions of the genome. 
These and other important factors that make the mouse an exceptional model organism are beyond the scope of this review. Rather, this review concentrates on some of the important inputs into the evolution of the mouse as a model genetic system. These include gene discovery, generation of defined mutations, and analysis of phenotypes that are core to the Mouse Genome Informatics (MGI) Resource (Mouse Genome Informatics 2016l). The continued advances in biological knowledge and computational technologies over the last 30 years, as well as the strong community input and participation, have together driven and shaped MGI into the international mouse knowledgebase it is today.

History

The Development of Mouse as an Experimental and Translational System

Mice have been with mankind for millennia, viewed as pests in early human grain stores, revered for bringing good luck in ancient Asia, and selectively bred for interesting physical attributes and docile characteristics by mouse fancier groups (Royer 2015). But the mouse did not become a tractable, exceptional experimental system until after Cuénot’s demonstration that Mendelian genetic principles apply to mammals (Cuénot 1902). Once it was understood that the fundamental rules of inheritance were applicable, studies were undertaken on gene discovery via observations on segregation of visible traits. By 1945 a linkage map including 26 genes in 10 linkage groups, some with more than one allelic form, was published by the staff of the Roscoe B. Jackson Memorial Laboratory (1945). Gene discovery and mapping continued at a steady pace until the 1970s and 1980s, when biochemical variants became tractable genetic markers and propelled a rapid expansion in gene discovery of many nonvisible, but physiologically important, molecular variants (e.g., Felder 1980; Hutton and Coleman 1969; Hutton and Roderick 1970; von Deimling et al. 1988). The advent of DNA technology, with the ability to detect polymorphisms based on restriction fragment length polymorphisms (RFLPs) (e.g., Berman et al. 1986; Dandoy et al. 1985; Mock et al. 1987), further increased the density of the genetic markers on the mouse linkage map, from 752 known mapped genes in 1985 (Roderick and Davisson 1985) to a map of 7377 genetic markers in 1996 (Dietrich et al. 1996). The Human Genome Project to sequence the human (and mouse) genomes (Collins et al. 1998) further accelerated the development of the mouse genome, its sequence map, and gene identification, which has settled at a protein-coding gene number of approximately 20,000 to 25,000 genes in both human and mouse. 
Figure 1 illustrates the growth of mouse gene identification, with clear inflection points caused by the advent of molecular markers and by the acceleration (to completion) of mouse genome sequencing in 2002 (Mouse Genome Sequencing Consortium 2002).

Figure 1

The number of protein-coding genes identified in mice over time. The earliest genes discovered in mouse were in genetic segregation studies of morphological and physiological characters. A significant uptick can be seen when biochemical and molecular measurements became widespread in the 1980s, followed by the advent of molecular markers (e.g., cloned genes/gene fragments, RFLPs, SSLPs) in the 1990s, and finally the sequencing of the mouse genome, which was published by the Mouse Genome Sequencing Consortium in 2002. The number of protein-coding genes for mouse has remained at 20,000 to 25,000 for the last several years. The functional noncoding RNAs and the expanding identification of regulatory elements will continue to greatly expand the functional map of the mouse for the foreseeable future.

Important to the growth and development of the mouse as a model system was the network of researchers working on mice, who not only used standard publication methods to disseminate their work, but also established many informal collaborations and open communication channels such as the Mouse News Letter and Chromosome Committees. Such cooperativeness is still observed through the International Committee for Standardized Genetic Nomenclature for Mice, the annual International Mammalian Genome Conferences, and the continued community contributions of data to the MGI resource.

More Data and Computer Advances Drive the Initiation and Expansion of MGI

The precursor of MGI, the “Encyclopedia of the Mouse Genome” software, was initiated in 1989 to develop an interactive tool to simultaneously visualize data from several investigator databases being maintained at The Jackson Laboratory. This was also a time when the World Wide Web was not in common use and most biologists accessed a computer, if at all, through command line input into a central server. The first versions of this “encyclopedia” were distributed to scientists via postal mail on floppy disks (Richardson et al. 1995). The genetic linkage map, having been carefully compiled from available data and constructed annually by personal efforts of many dedicated mouse geneticists over the years (M Lyon, R Meredith, S Hawkes, C Beechey, M Green, J Womack, TH Roderick, MT Davisson, Mouse News Letters 1965–1994), had reached a point of unsustainability. The last such map, produced by TH Roderick and MT Davisson in 1994, included nearly 800 genetically mapped loci. The emergence of molecular biology and our ability to distinguish gene variants by restriction fragment length polymorphisms (RFLPs) allowed the density of the genetic map of mouse markers to grow rapidly. Many more strain variants could now be identified, and the map of the mouse quickly expanded. In 1992 a map of 317 simple sequence length polymorphism (SSLP) markers covering 99% of the genome was published (Dietrich et al. 1992), which grew to over 7000 markers by 1996 (Dietrich et al. 1996). Additional thousands of SSLP markers were added using interspecific backcross panels of mice, which made it possible to map, with unprecedented speed, virtually any newly identified gene that had a molecular probe (e.g., Avner et al. 1988; Copeland and Jenkins 1991; Reeves et al. 1991; Rhodes et al. 1998; Rowe et al. 1994; Watson et al. 1992). In 1993, a review of the status of gene-specific RFLP mapping in the mouse was published (Copeland et al. 1993a), along with a wall chart (Copeland et al. 1993b). 
Fortuitously, information technology was also rapidly developing during this time. The growth and accessibility of the World Wide Web allowed MGI to develop a web interface that was much easier to use than previous command line interfaces and provided the flexibility to create multiple views and combinations of information that were useful to the scientific community. Ultimately, the direction and growth of MGI have been driven by biotechnology and major international projects, from sequencing the genome (and now multiple mouse strain genomes), to large-scale mutagenesis programs, to genome manipulation technologies, to developing mouse models of human disease and populations, and to large-scale phenotyping efforts to understand gene function. Today, we can precisely and quickly create a new genetically engineered mouse with ZFNs (zinc finger nucleases), TALENs (transcription activator-like effector nucleases), or CRISPR/Cas9 (Clustered regularly interspaced short palindromic repeats/CRISPR associated protein 9) technologies, use exome or whole genome sequencing to determine the causative change in a spontaneous or induced phenotypic mutant, or use comparative phenotyping to suggest a new mouse model for a human disease.

SOS: Communication Critical

Uniform application of authoritative nomenclatures, ontologies, and identifiers is essential to successful scientific discourse, to enabling experimental reproducibility, and to integrating and analyzing diverse data sets. The success of the MGI resources and our ability to provide high-quality integrated data that users can access across data types and data sets is critically dependent on development and application of these communication standards. Much of the work of MGI staff is devoted to critically assessing incoming data to resolve discrepancies in identifiers and harmonizing terminologies.

Common Language = Clear Communication

Language, and how it is used, can be precise or vague, with the same or similar thoughts expressed in very different words and contexts. For scientific communication, it becomes critical that precision is emphasized and meaning truly conveyed, as the enterprise of scientific advancement and the next discovery relies on the knowledge of what has been previously discovered. Thus, language used to describe scientific experiments, reagents, methods, outcomes and observations, and supported conclusions demands that words are chosen carefully to succinctly and accurately reflect the results. And, key to that communication is the use of standard vocabularies, nomenclatures, ontologies, and identifiers to uniquely define the elements of experimentation and their interrelationships. The result of failed communication through use of jargon or “common” laboratory terminology can be disastrous personally if one realizes the “gene” they thought they were studying is actually a different gene with extensive reported functional analysis already published. Use of conflicting terminology also causes significant confusion in the scientific literature when genes or other reagents are referred to by overlapping terms.

Nomenclature Standards for Gene Identity, Allele Specification, Genomic Changes, Genotypes, and Strains

Nomenclature standards describing genomic and genetic entities and mutations exist today for well-studied species where there are large active research communities and genomes have been entirely sequenced (Table 1). For species where biological studies are sparse, or where sequencing has been done for comparative genomic data only, the standard is to follow the nomenclature of the most closely related, highly studied species (e.g., for primates, the human genetic nomenclature, for fish, the zebrafish nomenclature).

Table 1

Nomenclature guidelines for human and major model species

Species	Nomenclature guide URL	Email or contact URL
Human	http://www.genenames.org/about/guidelines ¹	hgnc@genenames.org
Mouse	http://www.informatics.jax.org/nomen ²	nomen@jax.org
Rat	http://rgd.mcw.edu/nomen/nomen.shtml ³	http://rgd.mcw.edu/contact/index.shtml
Chicken	http://birdgenenames.org/cgnc/guidelines ⁴	agbase@igbb.msstate.edu
Xenopus	http://www.xenbase.org/gene/static/geneNomenclature.jsp ⁵	Joshua.Fortriede@cchmc.org
Zebrafish	https://wiki.zfin.org/display/general/ZFIN+Zebrafish+Nomenclature+Guidelines ⁶	nomenclature@zfin.org
Drosophila	http://flybase.org/wiki/FlyBase:Nomenclature ⁷	http://flybase.org/cgi-bin/mailto-fbhelp.html
C. elegans	http://wiki.wormbase.org/index.php/Nomenclature ⁸	genenames@wormbase.org
Arabidopsis	http://www.arabidopsis.org/portals/nomenclature/ ⁹	curator@arabidopsis.org
Saccharomyces	http://www.yeastgenome.org/help/community/gene-registry ¹⁰	sgd-helpdesk@lists.stanford.edu

¹ Human Gene Nomenclature Committee (HGNC). 2016. HGNC Guidelines. Available online (http://www.genenames.org/about/guidelines), accessed on December 1, 2016.

² Mouse Genome Informatics (MGI). 2016. Mouse Nomenclature Homepage. Available online (http://www.informatics.jax.org/nomen), accessed on December 1, 2016.

³ Rat Genome Database (RGD). 2016. Rat Nomenclature Guidelines. Available online (http://rgd.mcw.edu/nomen/nomen.shtml), accessed on December 1, 2016.

⁴ Chicken Gene Nomenclature Consortium. 2016. Gene Nomenclature Guidelines. Available online (http://birdgenenames.org/cgnc/guidelines), accessed on December 1, 2016.

⁵ Xenbase. 2016. Gene Nomenclature Guidelines. Available online (http://www.xenbase.org/gene/static/geneNomenclature.jsp), accessed on December 1, 2016.

⁶ Zebrafish Information Network (ZFIN). 2016. ZFIN Zebrafish Nomenclature Guidelines. Available online (https://wiki.zfin.org/display/general/ZFIN+Zebrafish+Nomenclature+Guidelines), accessed on December 1, 2016.

⁷ FlyBase. 2016. FlyBase:Nomenclature. Available online (http://flybase.org/wiki/FlyBase:Nomenclature), accessed on December 1, 2016.

⁸ WormBase. 2016. Nomenclature Homepage. Available online (http://wiki.wormbase.org/index.php/Nomenclature), accessed on December 1, 2016.

⁹ The Arabidopsis Information Resource (TAIR). 2016. Nomenclature Homepage. Available online (http://www.arabidopsis.org/portals/nomenclature/), accessed on December 1, 2016.

¹⁰ Saccharomyces Genome Database. 2016. SGD Help: Gene Registry. Available online (http://www.yeastgenome.org/help/community/gene-registry), accessed on December 1, 2016.

Gene identity for protein-coding genes of human and the major model organisms is largely established thanks to the Human Genome Sequencing Project. However, one must be aware that updates and changes to the sequence continue, as do revisions to annotations to the genome assemblies. 
Efforts, including those of the CCDS (Consensus Coding Sequence) project (Farrell et al. 2014; NCBI CCDS Database 2016; Pruitt et al. 2009), to curate consensus cDNAs and to analyze genomic regions where build algorithm results differ between the NCBI and EBI genome assemblies continue to improve the quality of the sequenced genomes of human, mouse, and other species. Tracking gene identities through common nomenclature and accession IDs is an effective way to ensure that the gene being discussed is truly the gene intended. For the mouse, the International Committee on Standardized Genetic Nomenclature for Mice, an elected body of the International Mammalian Genome Society, establishes, revises, and maintains guidelines for mouse genetic nomenclature for genes, genetic markers, alleles, and mutations, for mouse strains, and for chromosome aberrations. Guidelines are available online (MGI 2016p) and assistance with nomenclature is available by emailing nomen@jax.org.

Ontologies for Gene Function, Sequence Features, Anatomy Terms, Phenotypes, and Disease Terminology

Where nomenclature provides standard vocabularies of discrete entities (genes, microRNAs, noncoding RNAs, strain designations, chromosomes, chromosome aberrations, mutagenesis methods, etc.), ontologies provide both terminology and relationships among term elements. The structure of these ontological relationships, a directed acyclic graph (DAG), is hierarchical in nature, but a single term can have multiple parent and/or child terms. For example, the anatomical structure “limb bone” is a bone, but it is also part of the appendicular skeleton and part of the limb. By representing biological features and processes in a DAG structure, with various kinds of relationships among terms, the richness and complexity of biology can be better represented (Figure 2).
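
The DAG structure described above can be sketched in a few lines of Python. This is a minimal illustration and not MGI's data model: the “limb bone” relationships come from the example in the text, while the dictionary layout, the relationship labels `is_a` and `part_of`, the extra “skeleton” edge, and the helper `ancestors` are assumptions made for the sketch.

```python
# Toy ontology fragment: each term maps to a list of (relationship, parent) edges.
# A term may have several parents, which is what distinguishes a DAG from a tree.
ONTOLOGY = {
    "limb bone": [
        ("is_a", "bone"),
        ("part_of", "appendicular skeleton"),
        ("part_of", "limb"),
    ],
    "bone": [],
    # Hypothetical extra edge, added only so the traversal has a second level.
    "appendicular skeleton": [("part_of", "skeleton")],
    "limb": [],
    "skeleton": [],
}


def ancestors(term, ontology):
    """Return every term reachable from `term` by following parent edges."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in ontology.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


limb_bone_ancestors = ancestors("limb bone", ONTOLOGY)
print(sorted(limb_bone_ancestors))
```

Because “limb bone” reaches “bone”, “limb”, and “appendicular skeleton” by different paths, a search on any of those broader terms could, in a scheme like this, also retrieve data annotated to the more specific term.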

Figure 2

The Mammalian Phenotype Ontology (MP) Browser example. At the left, the MP browser is shown displaying the detail for the term “abnormal craniofacial bone morphology.” At the top are the term name, common synonyms, the MP ID, and the definition. Below are two term paths shown as hierarchical trees, with paths listed as multiple sequential hierarchies. The “abnormal craniofacial bone morphology” term is followed by a link to MGI genotypes and annotations associated with that term. On the right is a graphical representation (as a DAG) for the abnormal craniofacial bone morphology term. The MP Browser is accessed using the pull-down “Search” menu from the MGI homepage and either following the Phenotypes submenu or the Vocabularies submenu to select the Mammalian Phenotype (MP) Browser (http://www.informatics.jax.org/searches/MP_form.shtml) (Mouse Genome Informatics (MGI) 2016b).

Within the MGI system, the following ontologies are used for data representation and annotation: the Gene Ontology (GO) (Ashburner et al. 2000; The Gene Ontology Consortium 2017), the Mammalian Phenotype Ontology (MP) (Smith and Eppig 2012, 2015), the Mouse Anatomy Ontologies (Hayamizu et al. 2013, 2015), and the Sequence Ontology (SO) (Eilbeck et al. 2005). The Human Phenotype Ontology (Köhler et al. 2014, 2017) is used to integrate human phenotypes into the Human-Mouse: Disease Connection portal (see Section Overview of the current MGI resources, below). The Disease Ontology (Kibbe et al. 2015; Schriml and Mitraka 2015) will be integrated into MGI in 2017. By curating and loading data using these standards, MGI produces the highest quality biological data representation and maximizes the robustness of searching the resource.

Mouse Mutagenesis in Large-scale

Systematic mutagenesis projects have been used effectively for many years in organisms such as worm, yeast, and Drosophila. Researchers studying organisms with larger genomes and more complex body-plans, such as zebrafish, rodents, and primates, have long envied having access to similar tools that would improve genetic accessibility and make possible various genomic manipulations. In the last 20 years, more of these precise genetic tools have been developed for mammals, and for the mouse most of all.

Phenotype-Driven (Forward) Mutagenesis: Large-scale ENU Mutagenesis

N-Ethyl-N-nitrosourea (ENU) was demonstrated to be an effective mutagen, and ENU dosage studies (Justice et al. 1999; Russell et al. 1982a,b) led to the development of several systematic experimental ENU protocols. These, in general, consisted of mutagenizing males of different inbred strains and generationally screening their progeny (using same strain or alternate strain matings) for dominant or recessive mutants via broad or specific phenotype testing. Several variations of mating schemes were developed, but in essence, mating of the mutagenized G0 male to wild-type females would detect dominant mutations (or uncover recessive mutations if the female carried a defined deletion) in the G1(F1) offspring. A further generation of mating, either backcrossing to the G0 father or intercrossing siblings, was required to detect recessive mutations (Barbaric and Dear 2007; Silver et al. 2007). In the late 1990s, systematic, large-scale, chemical mutagenesis projects using ENU as the mutagen were proposed as a method for studying gene function and discovering new disease-related mutations (Brown and Nolan 1998; Hrabé de Angelis and Balling 1998). The first of these programs were developed in Germany (Hrabé de Angelis et al. 2000; Soewarto et al. 2000) and in the United Kingdom (Nolan et al. 2000a,b). Shortly thereafter ENU mutagenesis programs began in Australia (Nelms and Goodnow 2001) and at Japan’s RIKEN Center (Masuya et al. 2004). With the exception of the Australian program that concentrated on immunological phenotypes, these programs screened progeny of ENU-treated males using a wide range of standardized phenotype assays developed in their respective programs. ENU mutagenesis projects in the United States and Canada were wide ranging but tended to focus on specific sets of phenotypes (Clark et al. 2004; Nadeau et al. 2001). 
Among the projects were screens for neurological mutations at Northwestern University, The Jackson Laboratory, and the Tennessee Genome Consortium (Goldowitz et al. 2004); developmental/embryonic mutations at Baylor College of Medicine, Harvard Medical School, and the Sloan Kettering Institute (García-García et al. 2005; Herron et al. 2002); craniofacial mutations at the Stowers Institute (Sandell et al. 2011); cardiovascular and metabolic mutations at the Center for Modeling Human Disease, Toronto (Xie et al. 2007) and University of Pittsburgh (Yu et al. 2004); phenotypes associated with aging (e.g., cancer, obesity, Alzheimer’s) at the McLaughlin Institute and Case Western Reserve University; and a hypertension, obesity, sleep-centered program (Svenson et al. 2003) and reproductive genomics program (Lessard et al. 2004) at The Jackson Laboratory. Sperm from ENU-treated males carry multiple mutations, and these may or may not result in obvious phenotypic changes in the males’ immediate offspring. Thus, a new mutant discovered in an ENU screen is virtually certain to carry mutations unrelated to the novel phenotype identified. This is a drawback in that these additional (so-called “incidental”) mutations may make an unknown contribution to the phenotypic change observed. With further generations of mating, many incidental mutations will be lost while the mutation of interest continues to be selected for, and their loss may have unexpected effects on the variant phenotype that was initially identified. Conversely, frozen G0 sperm are a potential treasure of new mutations that can be identified via exome or genome sequencing as differing from the untreated G0 male, and their effect on the gene in which they occur can be predicted by analyzing sequence changes (Arnold et al. 2012; Bull et al. 2013; Moresco et al. 2013; Simon et al. 2015). Today, there are nearly 4000 identified mouse genes with defined ENU mutations and described phenotypes. 
In addition, there are over 145,000 associated incidental mutations representing potential mutations of interest for functional or disease-related studies. These numbers will continue to grow as ENU remains a popular and accessible mutagen for projects generating single base pair mutations.
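
The sequencing comparison described above amounts, at its simplest, to a set difference: variants seen in a mutagenized sample but absent from the untreated control are candidate ENU-induced mutations. The sketch below illustrates only that logic. The `Variant` tuple, the example calls, and the function name `candidate_induced_variants` are invented for illustration and do not come from any MGI data set or real variant-calling pipeline, which would also need to account for sequencing quality, coverage, and zygosity.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A single-nucleotide call (hypothetical minimal representation)."""
    chromosome: str
    position: int
    ref: str
    alt: str


def candidate_induced_variants(mutant: Set[Variant], untreated: Set[Variant]) -> Set[Variant]:
    """Return variants present in the mutant sample but not in the untreated control."""
    return mutant - untreated


# Hypothetical calls: two background variants shared with the untreated male,
# plus two variants seen only in the mutagenized sample.
untreated_calls = {Variant("1", 1000, "A", "G"), Variant("2", 2500, "C", "T")}
mutant_calls = untreated_calls | {Variant("5", 4200, "T", "C"), Variant("X", 900, "G", "A")}

candidates = candidate_induced_variants(mutant_calls, untreated_calls)
for variant in sorted(candidates):
    print(variant)
```

Only the two sample-specific calls survive the comparison; deciding which of them is causative rather than incidental still requires the genetic and phenotypic follow-up described in the text.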

Gene-Driven (Reverse) Mutagenesis: Knocking Out Each of the Mouse Protein Coding Genes

In contrast to forward mutagenesis, gene-driven or reverse mutagenesis targets specific genes for mutation whether or not there is any a priori knowledge about the phenotypic or physiological consequences. Attendees at a meeting at the Banbury Center at Cold Spring Harbor Laboratory in New York in September of 2003 endorsed creating, over a 5-year period, an international resource of ES cell lines containing knockout alleles for each of the ≈20,000 to 25,000 mouse protein coding genes, made on a common genetic background (C57BL/6N) using either gene trap or gene targeting techniques. Further, mutant ES cell lines (or mice created from them) would be made available to the entire research community as a resource for studying gene function and phenotype. The anticipated systematic phenotyping of these mutants would likewise be a coordinated international effort, with broad baseline data collected collaboratively and future specialized phenotyping done by individual research laboratories and specialists (Austin et al. 2004; Auwerx et al. 2004). This undertaking was not without controversy (see, e.g., Accili 2004): there was little evidence that there would be demand for this large array of mutants; although frozen mutant ES cells take up little space and expense, their usage would likely fade further with new technology developments; the systematic (but limited) phenotyping was not thought likely to yield enough substantive results to garner researcher interest; and a better investment could be made in providing access to less expensive facilities making tailored mutations for researchers (e.g., to study allelic series for specific genes). Despite these concerns, in 2007 this ambitious plan took shape in the form of the IKMC (International Mouse Knockout Consortium et al. 2007), initially consisting of NIH’s Knockout Mouse Project, the European Conditional Mouse Mutagenesis Program, and the North American (Canadian) Conditional Mouse Mutagenesis Program. Later, the Texas Institute for Genomic Medicine also joined the IKMC (Collins et al. 2007). Five years later, Bradley et al. (2012) reviewed the progress of the IKMC efforts; information about the IKMC is also available on the web (International Knockout Mouse Consortium 2016). At the time of that review, mouse ES cell lines with targeted or trapped alleles were available for 17,473 protein-coding genes, and nearly 1000 of these ES cell lines had been made into live mice for research. The EUCOMMtools project continued to expand the number of genes in the IKMC collection, as well as creating conditional-ready alleles for those genes where only deletion knockouts were created, and developing a set of cre recombinase-carrying knock-ins and transgenes to complement the IKMC conditional-ready alleles (Rosen et al. 2015). The next step in developing and utilizing the IKMC resources is now underway as the IMPC (see Phenotyping and functional discovery projects below).

Conditional Mutagenesis

Conditional mutagenesis is a powerful technique for interrogating gene function in a cell- or tissue-specific manner, potentially with a defined temporal component (Birling et al. 2009; Branda and Dymecki 2004; García-Otín and Guillou 2006; Nagy 2000). This technique is particularly useful to examine gene function temporally where a conventional gene knockout might result in embryonic lethality or to examine age-specific tissue expression. In principle, one mates a mouse carrying a “conditional-ready” allele, in which a recombinase recognition sequence has been inserted flanking the sequence of interest, to a mouse carrying an appropriate recombinase with a driver/promoter that directs its activation to a particular tissue at a particular developmental time. Offspring that inherit both the “conditional-ready” allele and the recombinase sequences will have a deletion of the sequence of interest, due to the activation of the recombinase protein and the resulting recombination between the two inserted recombinase sites (Nagy 2000; Sauer and Henderson 1988). The cre-loxP system is the most commonly used recombinase system. In this case, the genomic region to be manipulated is flanked by loxP sites, unique 34-base pair sequences recognized by the cre protein. Expression of the cre protein results in recombination between the loxP sites and, depending on the experimental design, can cause deletion, inversion, or translocation of the loxP flanked sequence (Friedel et al. 2011; Gierut et al. 2014; Schnütgen et al. 2003). Variations that have been developed for conditional mutagenesis include use of non-cre recombinases, such as flp, dre, and ΦC31, and various inducible forms of recombinases that are activated exogenously, such as by administration of tamoxifen (Anastassiadis et al. 2010). 
In evaluating which cre-bearing strain to use for experiments, the cre activity, tissue specificity, and expression timing of the cre driver must be known for the intended target tissue and any off-target sites (Heffner et al. 2012; Murray et al. 2012; Smedley et al. 2011). Without adequate knowledge of cre specificity, experimental results can be confounded and interpretation of observed phenotypes difficult due to cre excision in unanticipated tissues or at unexpected times. A plethora of new conditional-ready mutations have been generated by the IKMC (Bradley et al. 2012) and are available to the scientific community. To take advantage of the conditional character of these constructs, appropriate cre recombinase mice are needed to mate with these mice. As of December 2016, over 2620 cre-containing transgenes and knock-in alleles have been documented in the MGI CrePortal (see Overview of the current MGI resources below).
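
The outcome of cre-mediated recombination between two loxP sites can be illustrated with a toy string model. The 34-bp sequence used below is the commonly cited canonical loxP site (two 13-bp arms around an 8-bp spacer). The function name `cre_recombine`, the short flanking and “exon” sequences, and the restriction to two sites on a single molecule are assumptions of this sketch; translocation between molecules, recombination efficiency, and driver specificity are not modelled.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 13-bp arm + 8-bp spacer + 13-bp arm


def reverse_complement(seq):
    """Return the reverse complement of a DNA string containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


# A loxP site in the opposite orientation differs only in its asymmetric spacer.
LOXP_INVERTED = reverse_complement(LOXP)


def cre_recombine(seq):
    """Recombine the first loxP site with the next site found downstream.

    Same orientation  -> the flanked segment is deleted, leaving one loxP site.
    Opposite orientation -> the flanked segment is inverted in place.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq
    after_first = first + len(LOXP)
    same = seq.find(LOXP, after_first)
    inverted = seq.find(LOXP_INVERTED, after_first)
    if same != -1 and (inverted == -1 or same < inverted):
        return seq[:after_first] + seq[same + len(LOXP):]
    if inverted != -1:
        segment = seq[after_first:inverted]
        return seq[:after_first] + reverse_complement(segment) + seq[inverted:]
    return seq


# Hypothetical "floxed" allele: a short stand-in exon between two direct-repeat sites.
floxed = "GGGG" + LOXP + "ACGTTGCA" + LOXP + "CCCC"
deleted = cre_recombine(floxed)

# Hypothetical allele with the second site in the opposite orientation.
invertible = "GGGG" + LOXP + "AAAC" + LOXP_INVERTED + "CCCC"
flipped = cre_recombine(invertible)

print(deleted == "GGGG" + LOXP + "CCCC")
print(flipped == "GGGG" + LOXP + "GTTT" + LOXP_INVERTED + "CCCC")
```

In a real experiment the recombination happens only in cells where the cre driver is active, which is why the tissue specificity and timing of the chosen cre strain matter as much as the design of the conditional-ready allele itself.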

Genome Editing: Endonuclease-Mediated Mutagenesis (Zinc Finger Nucleases, TALENs, CRISPR/Cas9)

Newest among the tools for creating specific mutations are endonuclease-mediated methods, referred to as gene editing or genome editing techniques. These have burst into the mouse toolkit in three waves of technical developments, and much work continues to improve the technology and fidelity of results and to push the flexibility of genome editing systems. An overview of the methodologies can be found in Carroll (2014), Doudna and Charpentier (2014), and Menke (2013). ZFNs as targeting tools in mice were the first wave, reported at the beginning of this decade (Carbery et al. 2010; Connelly et al. 2010; Meyer et al. 2010). ZFNs consist of a series of zinc finger domains fused to the cleavage domain of the FokI restriction enzyme. ZFNs are targeted by designing pairs to bind sequences adjacent to the endogenous target site and, once bound, the FokI domains attached to each member of the ZFN pair dimerize and create double-strand breaks in the target DNA. Nonhomologous end joining then occurs, usually resulting in small deletions. Repeated application of the same constructs can produce multiple unique mutations (alleles) in the same target sequence. The double-strand breaks can instead be repaired via homology-directed repair to produce other types of mutations if DNA with homology to the target region is co-introduced with the ZFNs. Quick on their heels came the second wave, using TALENs, which have similar properties and method of action to the ZFNs. The distinct advantages of TALENs were a greater range of sequence specificities, higher efficiency, and better predictability of their action (Panda et al. 2013; Qiu et al. 2013). Thus, TALENs rapidly overtook ZFNs as the genome editing tool of choice. The third wave of gene editing technology is CRISPR/Cas9, which originates from a bacterial adaptive immune system. 
The CRISPR/Cas9 system, unlike the ZFNs and TALENs, does not require that unique endonuclease pairs be designed for each genomic target, thus making it a simpler and less time-consuming method (Wiles et al. 2015). In addition, its fidelity is higher, with the targeting to specific DNA sequences determined by the 5’ 20-nucleotide sequence of the synthetic single guide RNA, designed to be complementary to the genomic target sequence. Fewer off-target events are also observed with CRISPR/Cas9 than with ZFNs or TALENs (Li et al. 2013; Miano et al. 2016; Shen et al. 2013; Singh et al. 2015; Tycko et al. 2016). The IMPC has recently adopted use of CRISPR/Cas9 mutations in the mouse knockout phenotyping program, a move that will greatly reduce its reliance on the IKMC ES cell line mutations, which require recovery of ES cell lines into mice and subsequent breeding, either to remove the critical exon via cre excision in the case of conditional-ready alleles or to remove the neo cassette in the case of deletion alleles (Bradley et al. 2012). This move will enable quick and direct generation of mice carrying single gene knockout mutations and accelerate the ability to produce the mouse cohorts needed for the IMPC’s high-throughput phenotyping pipeline (KL Svenson, The Jackson Laboratory, 2017, personal communication). Illustrating the rapid adoption of gene editing technology in mice, a search of PubMed (Dec. 2016) lists over 1000 publications using ZFNs, TALENs, or CRISPR/Cas9 for genome editing in mice, with over 750 of these using the CRISPR/Cas9 technology. Further, the flexibility of the CRISPR/Cas9 system in its ability to create a variety of genomic changes in mice is illustrated by this sampling of outcomes: (1) to create conditional-ready alleles (“floxed” alleles) for use in creating mice for conditional mutagenesis experiments (Bishop et al. 2016; Lee and Lloyd 2014; Yang et al. 2013); (2) for developing humanized mice through gene replacement (Gennequin et al. 2013); (3) to model human disease (Lewis et al. 2016; Zhong et al. 2015); (4) to repair gene defects (Mianné et al. 2016; Nelson et al. 2016; Wu et al. 2013; Yin et al. 2014); (5) to use in multiplexing to simultaneously create mutations in multiple genes (Wang et al. 2013; Zhou et al. 2014); and (6) to create large deletions, insertions, and chromosomal rearrangements (Boroviak et al. 2016; Maddalo et al. 2014; Zhang et al. 2015).
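The role of the 20-nucleotide guide sequence described above can be illustrated with a minimal sketch that scans a DNA string for candidate target sites. Two points here are assumptions not stated in the text: the requirement for an NGG motif immediately 3' of the target (the protospacer-adjacent motif commonly associated with Streptococcus pyogenes Cas9), and the restriction to the forward strand for brevity. The function name `candidate_guides` is hypothetical, and no scoring of efficiency or off-target risk is attempted.

```python
import re


def candidate_guides(dna, guide_len=20):
    """List (position, guide DNA, guide RNA) for forward-strand candidates.

    A candidate is guide_len bases followed directly by an NGG motif
    (assumed PAM). The RNA form simply replaces T with U.
    """
    dna = dna.upper()
    # A lookahead keeps the match zero-width, so overlapping sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    hits = []
    for match in pattern.finditer(dna):
        guide = match.group(1)
        hits.append((match.start(), guide, guide.replace("T", "U")))
    return hits


# Invented example sequence: 4 flanking bases, a 20-nt target, TGG, 4 more bases.
example = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
guides = candidate_guides(example)
```

A practical design tool would also scan the reverse strand and screen each candidate against the genome for near matches, which this sketch omits.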

Phenotyping and Functional Discovery Projects: Large-scale Phenotyping

The discovery and creation of new mutations and variants in mouse have been accelerating thanks to the large-scale mutagenesis efforts using forward and reverse genetic technology and the continued development and improvement of new methods to create, validate, and preserve those mutants for the scientific community. Making these resources, however, is only the first step; the goal is to understand the function of genes and develop useful models for studying biology and disease. Thus, standardized, robust phenotyping must accompany our genetic knowledge and mutant mouse resources. Efforts over the last 20 years have made significant progress on collecting systematic phenotypic data, developing standard protocols and phenotyping pipelines, and making these data accessible online. Initially, each project with significant throughput, whether performing ENU mutagenesis or studying knockouts, developed its own phenotyping methods, many of which contributed to the large-scale phenotyping projects that came later (c.f., Ayala et al. 2010; Brommage et al. 2014; Lessard et al. 2004; Li et al. 2015; Pack et al. 2007). One of the earliest collaborative efforts attempting to establish globally standard phenotyping procedures and test phenotype assay robustness was the European Eumorphia project (Brown et al. 2006; Mallon et al. 2008). Baseline data from four inbred strains, C57BL/6J, C3HeB/FeJ, BALB/cByJ, and 129S2/SvPas (formerly 129/SvPas), were collected using identical protocols in multiple European laboratories, and differences were investigated to determine whether they were attributable to procedure modifications, equipment differences or calibrations, technician error, lack of robustness of the test, etc. The Eumorphia program was also responsible for the initial development of the Europhenome data repository (Mallon et al. 2008; Morgan et al. 2010) and the European Mouse Phenotyping Resource for Standardized Screens. 
The European Mouse Disease Clinic program, with the Sanger Institute Mouse Genetics Program, continued the collaborative work of Eumorphia, developing protocols and phenotyping over 650 mutant mouse lines, largely from the growing IKMC mutant ES cell lines (Ayadi et al. 2012; Gates et al. 2011; Hrabé de Angelis et al. 2015). In 2011, the IMPC was established, encompassing these groups and other participants worldwide (Brown and Moore 2012a,b). Its current members include the Medical Research Council, Harwell, UK; Wellcome Trust Sanger Institute, Cambridge, UK; Helmholtz-Zentrum Muenchen, Germany; PHENOMIN, Strasbourg, France; Australian Phenomics Network, Australia; RIKEN BioResource Center, Tsukuba, Japan; CNR Monterotondo, Italy; MARC Nanjing University, Nanjing, China; The Jackson Laboratory, Bar Harbor, USA; The Davis, Toronto, Charles River, and CHORI Consortium, USA and Canada; Korea Mouse Phenotype Consortium, South Korea; Baylor College of Medicine, Houston, USA; National Laboratory Animal Center, NARLabs, Taiwan; European Bioinformatics Institute, Hinxton, UK; Czech Centre for Phenogenomics, IMG, Czech Republic; and Universitat Autònoma de Barcelona, Spain (Koscielny et al. 2014). With a goal of systematically phenotyping cohorts of mice from each of the mutant lines produced from the IKMC resource within 10 years, the task ahead is significant. But these baseline data will be invaluable for researchers looking for appropriate model systems and for developing more sophisticated tools for refining phenotypes and exploring functional biological networks. Phenotyping data are accessible on the IMPC website (International Mouse Phenotyping Consortium (IMPC) 2016; Ring et al. 2015), and some initial global studies are being published (c.f., Dickinson et al. 2016; White et al. 2013).

Overview of the Current MGI Resources

The mission of the MGI (Mouse Genome Informatics 2016l) resource is to provide integrated genetic, genomic, and biological data about the laboratory mouse to facilitate the study of human health and disease. To fulfill this mission MGI must maintain a dynamic, evolving system that continues to grow and make directional changes as biological science changes and the computing tools supporting its infrastructure advance. MGI is a collection of integrated data resources and tools that can all be accessed via the central MGI homepage (Mouse Genome Informatics 2016l). Data originate from high-throughput data projects, direct investigator or collaborative group data submissions, and through curation of the scientific literature. Semantic and contextual integration is achieved through automated and semiautomated methods, coupled with expert curation and extensive quality controls. By promoting integration throughout, MGI has been able to consistently develop and maintain a unified genome feature catalog (Zhu et al. 2015); enforce nomenclature standards for genes, genome features, alleles, strains, and other mouse entities; and develop and apply ontologies. These unify metadata and vocabulary terminology across data sources, adding value and enabling robust data searching and superior accessibility to users. This review cannot thoroughly cover all aspects of MGI data content or address all possible use cases for MGI. Below, we discuss a few of the common access methods for MGI content. Table 2 provides a high-level look at more of the primary types of data found in MGI, and the value added that MGI provides in integrating and curating these data. Table 3 provides a snapshot of MGI statistics that are updated weekly on the MGI website (Mouse Genome Informatics 2016l). An expanded list of statistics can be found linked on the MGI homepage (Mouse Genome Informatics 2016o). 
Users are encouraged to explore MGI, review the tutorial and informational items on the MGI homepage (Mouse Genome Informatics 2016l), and contact MGI User Support at mgi-help@jax.org or use the “Contact Us” link in the navy blue navigation bar at the top of MGI webpages for assistance with using MGI or with questions about MGI content or methods. In addition, readers should consult two recent reviews that cover other aspects of MGI data and access methods (Eppig et al. 2015c, 2017) and a retrospective on MGI (Eppig et al. 2015b).

Table 2

Overview of types of data found in MGI

Data type	Description and URL link
Unified nonredundant genome feature catalog	MGI’s fjoin algorithm (Richardson 2006) is used to computationally compare genome assembly predictions from NCBI (NCBI Resource Coordinators 2016) and Ensembl (Aken et al. 2017) and from gene models curated by Havana/VEGA (Harrow et al. 2014). Results are integrated with MGI’s genome features. Curators from MGI and these groups collaboratively analyze and resolve conflicting data on an ongoing basis. The process of creating the unified non-redundant genome feature catalog is described in Zhu et al. (2015). Also see MGI Gene Query (http://www.informatics.jax.org/marker, Mouse Genome Informatics (MGI) 2016j) and the downloadable Marker Coordinates report (Mouse Genome Informatics (MGI) 2016m).
Strain-specific genome features	MGI captures instances of genes that are not present in the C57BL/6J reference sequence genome, but are found in other mouse strains (Eppig et al. 2012). A classical example is Ren2 (renin 2, Piccini et al. 1982). A number of inbred strains carry this duplicated gene, but it is absent in C57BL/6J (see MGI’s Ren2 gene page; Mouse Genome Informatics (MGI) 2016i).
Associating genes to sequences	MGI, NCBI, Ensembl, and Havana/VEGA co-curate the mapping of nucleotide sequences to their corresponding genes; for protein sequences the co-curation effort includes MGI, UniProt KB, and Protein Ontology groups. See NCBI’s information for CCDS (NCBI CCDS Database 2016).
Nomenclature for mouse (genes, genome features, strains, chromosome aberrations)	MGI maintains the website for the International Committee on Standardized Genetic Nomenclature for Mice and Rats online (Mouse Genome Informatics (MGI) 2016p). MGI implements the committee’s guidelines in assigning new or revised nomenclature. Nomenclature is coordinated with HGNC (Human Gene Nomenclature Committee, Gray et al. 2015) and RGD (Rat Genome Database, Shimoyama et al. 2015) to maximize co-assignment of orthologous gene names.
Mouse-Human Orthologs	MGI currently uses a hybrid orthology representation, where ortholog calls from NCBI’s Homologene (NCBI Resource Coordinators 2016) and HGNC’s Comparison of Ortholog Prediction (HCOP, Eyre et al. 2007) are compared by a rule-based algorithm to maximize human-mouse ortholog accuracy. These results are used widely in MGI where human-mouse data are compared (Dolan et al. 2015).
Gene function data—annotations to mouse genes and gene products	MGI is the authoritative source for Gene Ontology (GO) annotations for mouse (The Gene Ontology Consortium 2017). This dataset is available from the MGI website and downloadable from the server, and is provided to the Gene Ontology Consortium website for display with GO data from other species. MGI staff curate GO terms to mouse genes/gene products and integrate efforts from UniProt KB/GOA (The UniProt Consortium 2017). See the Gene Function Topic link (Mouse Genome Informatics (MGI) 2016i) and the GO download report (Mouse Genome Informatics (MGI) 2016k).
Gene Expression data	Data for endogenous gene expression during mouse development are emphasized, for both wild-type and mutant genotypes, covering a wide range of expression assay types. Supporting images are included. Integration with genomic and phenotype data in MGI provides powerful query capability (Ringwald and Eppig 2011). Key references for how best to use the expression data web interfaces include Finger et al. (2015) and Smith et al. (2015). See also the Gene Expression Topic link (Mouse Genome Informatics (MGI) 2016e).
Gene Expression Literature Index	For rapid user access, gene expression data for mouse development is indexed by curators to capture the genes, ages, and assay types described for each scientific publication. Users can view a high-level overview and rapidly access data details. See the Gene Expression Literature search (Mouse Genome Informatics (MGI) 2016f).
Recombinase/cre data	MGI maintains data on the recombinase-containing transgenes and knock-in alleles used for conditional mutagenesis, capturing the descriptions of those alleles, the genotypes used in experimentation, and the results (www.creportal.org, CrePortal 2016). Important for researchers is the integration of data on the site specificity for cre activity, as off-target expression can significantly affect experimental interpretation (Heffner et al. 2012).
Genetic variants and mutations	MGI maintains a comprehensive catalog of mutant alleles, including those spontaneously occurring, induced, or genetically engineered. Naturally occurring variants, including SNPs and QTL also are captured (Bult et al. 2013; Eppig et al. 2007, 2012, 2015a). Search forms for mutant alleles (Mouse Genome Informatics (MGI) 2016a) and SNPs (Mouse Genome Informatics (MGI) 2016r) allow flexible searching for a variety of parameters.
Phenotypes	Phenotypes are associated with the genotypes (allelic composition and strain background) in which they were observed. MGI gathers phenotypic data from large-scale programs such as Europhenome, German Mouse Clinic, Sanger Institute Mouse Genetics Program, the IMPC (and others), as well as from researchers’ data submissions and through curation of scientific publications (Bello et al. 2015; Bult et al. 2016; Eppig et al. 2015a, 2017). Phenotype data can be found in MGI by choosing the Phenotypes and Mutant Alleles topic area on the MGI Homepage (Mouse Genome Informatics (MGI) 2016q).
Mouse strains available worldwide, the International Mouse Strain Resource (IMSR)	MGI maintains a composite mouse strains catalog, the International Mouse Strain Resource (www.findmice.org, International Mouse Strain Resource (IMSR) 2016) that consolidates the holdings of mouse resource repositories worldwide, including mice available in various states (live, cryopreserved embryos and sperm, ES cell lines) (Eppig et al. 2015b). Data are released weekly to update repository holdings as they change. The website allows searching by strain name, gene or allele symbol or name, as well as by strain type (e.g., inbred strain, congenic strain), allele type (e.g., insertion, targeted mutation), specific repository or geographic region. More than 38,600 mouse resources and nearly 200,000 ES cell lines from 20 repositories and consortia, representing 47 individual repositories are included.
Tumor models	The tumor model data offering within the MGI Resource (Mouse Models of Human Cancer: The Mouse Tumor Biology Database (MTB) 2016) includes data on mouse models of cancer from studies on spontaneous and induced tumors in genetically defined mice (inbred or genetically modified), accompanied by the tumor frequencies observed, diagnoses and metastases, germline and somatic genome information, and pathology reports and images. Data on patient-derived xenograft (PDX) mice include diagnoses, images, and genomic properties of the tumors (Bult et al. 2015).
Human disease models	The Human-Mouse: Disease Connection (HMDC) displays integrated data curated in MGI on mouse genotypes and phenotypes and the human diseases that are modeled, together with the associations between human genes and human diseases from OMIM and human phenotypes from Human Phenotype Ontology. A grid view displays human and mouse orthologs with their comparative phenotype and disease associations, allowing users to view direct orthologous gene disease models, as well as to examine potential mouse models and potential human causative gene associations (Bello and Eppig 2016; Bello et al. 2015; Eppig et al. 2015d, 2017; www.diseasemodel.org, Human-Mouse: Disease Connection (HMDC) 2016).

Table 3

MGI content summary December 2016*

MGI data type (number of mouse…)	Number
Genes with protein sequences	24,193
Genes/genome features with nucleotide sequences	48,663
Genes with human orthologs	17,091
Genes with rat orthologs	18,513
Genes with Gene Ontology (GO) annotations	24,218
Gene Ontology (GO) annotations (Total)	301,832
Mutant alleles in mice	50,035
Genes with targeted mutations	16,841
Recombinase (cre) transgenes and knock-ins	2626
QTL	5672
Genotypes with Mammalian Phenotype Ontology (MP) annotations	59,133
Mammalian Phenotype Ontology (MP) annotations (Total)	306,913
Models (genotypes) associated with human diseases	5128
Genes with expression assay results	14,333
Expression assay results (Total)	76,990
Tumor frequency records	71,000
Tumor Images	>5800
Clinical, pathological, expression, and genomic data from PDX models	450
References in the MGI bibliography	231,595

*For additional MGI statistics, see Mouse Genome Informatics (MGI) 2016o.

Guide to the MGI Homepage (Mouse Genome Informatics (MGI) 2016l)

The MGI homepage (Figure 3) is a jumping-off platform to the various data sections in MGI. In addition, links are provided to “About” pages, “Help,” “FAQ,” tutorials and introductory materials, and MGI publications, as well as to “What’s new” in MGI and various community news items.

Figure 3

MGI homepage (www.informatics.jax.org) (Mouse Genome Informatics (MGI) 2016l) The MGI Homepage is the gateway to data, tools, news, and release notes. The major sections of the Homepage allow users to (1) do a “Quick Search” to broadly sweep their area of interest, (2) jump to topic area pages, (3) visit the tutorial and introduction areas, (4) use the navy blue navigation bar to find specific search forms or tools, and (5) read news about MGI and follow informational links for MGI statistics, publications, about pages, help documents, and FAQs.

Accessing MGI data via the “Quick Search”

Analysis of web logs for the MGI website clearly shows that the majority of users use the “Quick Search” over all other search tools that MGI provides (Figure 4). The Quick Search is accessible on virtually all MGI web pages and can initiate a user’s exploration of MGI or quickly allow the user to jump to a new data search of interest from any MGI page being viewed.

Figure 4

Quick Search results page example. The Quick Search tool is found at the top left of the MGI homepage and on other MGI pages in the upper right corner. In this example, leukemia was entered as the search term, resulting in 1295 results in the Genome Features section and 68 results in the Vocabulary Terms section. From the Quick Search Results page, the symbols in the Genome Features section link to the relevant MGI Gene Detail page or Mutant Allele Detail page. In the Vocabulary Terms section, the term links to the relevant term page: for phenotype terms, to the Mammalian Phenotype Ontology Browser; for disease terms, to the MGI Human Disease and Mouse Model Detail Page; and for a protein family term, to the MGI Protein Superfamily Detail page. The Associated Data column links to the underlying annotations and brings the user to the relevant annotation detail page. For new users, a Quick Search with a simple term (e.g., leukemia) provides a fast entry point into an MGI topical area without the need to be familiar with the MGI query forms that provide more precise results with multi-parameter searches. The Quick Search results are broad, including all genes and alleles where the term entered is included in the current or former nomenclature or associated to the gene via annotations. In this example, 1295 genome features were returned, including genome features with annotations to a phenotype term that “contains” leukemia, such as Alox15, as well as genome features whose names include “leukemia,” such as Pmv54, polytropic murine leukemia virus 54. In addition, 68 hits to leukemia appear in the vocabulary term section, which shows annotations to leukemia in MGI disease terms, phenotype terms, protein domain terms, and functional (GO) terms. For experienced users, the Quick Search can quickly identify the data of choice by using a simple but narrowly defined term (e.g., “Bloom Syndrome” using quotes for “exact match” or “Bmp6”). 
Each of these searches retrieves only six genome features and three vocabulary term hits (data as of Dec. 30, 2016). Keep in mind that many such simple terms may return a large number of results. For example, the term “anemia” returns 1067 genome feature matches and 129 vocabulary matches (as of Dec. 30, 2016), but the results can be considerably reduced by using the more specific term “macrocytic anemia,” with quotes, which returns 48 genome feature matches and 3 vocabulary terms.

Using Topic-Specific Searching and Tools

On the MGI homepage, in the left column, are a series of buttons providing access to topic areas within MGI, including Genes; Phenotypes and Mutant Alleles; Human-Mouse: Disease Connection; Gene Expression Database (GXD); Recombinase (cre); Function; Strains; Strains, SNPs and Polymorphisms; Vertebrate Homology; Mouse Models of Human Cancer; Pathways; Batch Data and Analysis Tools; and Nomenclature. Clicking on any of these buttons takes the user to a subsection of MGI optimized for information and searching for that particular topic. Each search form within a topic behaves in a similar manner and takes advantage of the integration of MGI data. Figure 5 shows a topic-specific search form and results for the genes and genome feature area of MGI. Note that within each topical area and almost all MGI pages, the header section of the page continues to provide full access to different sections of the database: to the Quick Search at the upper right; via the topical area tabs across the top of the page; and via pull-down menus available in the navy blue navigation bar.

Figure 5

Genes and markers query form and results example. Panels A and B show two methods for accessing the Genes and Markers Query Form and illustrate the general principle for accessing other Query Forms within MGI. Beginning on the MGI homepage (www.informatics.jax.org) (Mouse Genome Informatics (MGI) 2016l), either use the “Search” pull-down menu (panel A), navigating to the Genes submenu and the Genes & Markers Query (circled), or click on the “Genes” topical area button that leads to the Genes, Genome Features & Maps subpage (panel B), where the Genes & Markers Query (circled) also can be selected. On the Genes and Markers Query Form (Mouse Genome Informatics (MGI) 2016j) (panel C), users may specify one or more search parameters as desired. In this case, the following parameters were chosen: Feature type = protein coding gene AND Genome location = Chromosome 2 AND Mouse phenotype term = “dilated cardiomyopathy” (enclosed in quotation marks for an exact match against the two-word term). The Results page is shown in panel D, where six genes satisfied the query. Note the “You searched for…” feature at the top, which tells the user what parameters were used, and the “Export” utilities for downloading or forwarding the results for additional analysis.

Figure 6 illustrates access to phenotype and mutant allele data. The search form for Phenotypes and Mutant Alleles, with an example summary of results, is shown in Figure 6.1, with further access to finer data levels shown in Figure 6.2. While this figure does not show all the possible data details that can be accessed, it does illustrate the principle that one is able to drill down to multiple levels of finer data detail. The reader is referred to the “How to use MGI” link (http://www.informatics.jax.org/mgihome/other/homepage_usingMGI.shtml) on the MGI Homepage beneath the listing of topical areas for additional navigation methods (Mouse Genome Informatics (MGI) 2016g).
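The way the Genes and Markers Query Form in Figure 5 combines parameters with AND can be sketched on a small in-memory list. Everything in this example is invented for illustration: the `Feature` record, the `query` function, and the four placeholder genes are hypothetical and do not reflect MGI's schema or its actual results. The sketch only shows that each supplied parameter narrows the result set and that the phenotype term is matched as a whole phrase.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical genome feature record (not MGI's data model)."""
    symbol: str
    feature_type: str
    chromosome: str
    phenotypes: list = field(default_factory=list)


# Placeholder records; the symbols are deliberately not real gene names.
FEATURES = [
    Feature("GeneA", "protein coding gene", "2", ["dilated cardiomyopathy"]),
    Feature("GeneB", "protein coding gene", "2", ["cardiomyopathy"]),
    Feature("GeneC", "protein coding gene", "5", ["dilated cardiomyopathy"]),
    Feature("GeneD", "pseudogene", "2", ["dilated cardiomyopathy"]),
]


def query(features, feature_type=None, chromosome=None, phenotype=None):
    """Return symbols of features satisfying every parameter that was given."""
    hits = []
    for feature in features:
        if feature_type is not None and feature.feature_type != feature_type:
            continue
        if chromosome is not None and feature.chromosome != chromosome:
            continue
        if phenotype is not None:
            # Exact phrase match, ignoring case, against each annotated term.
            terms = {term.lower() for term in feature.phenotypes}
            if phenotype.lower() not in terms:
                continue
        hits.append(feature.symbol)
    return hits


all_three = query(FEATURES, "protein coding gene", "2", "dilated cardiomyopathy")
chromosome_only = query(FEATURES, chromosome="2")
```

With these placeholder records, supplying all three parameters leaves only `GeneA`, whereas restricting by chromosome alone keeps three of the four records.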

Figure 6.1 and 6.2

Phenotypes, alleles, and disease models search form and results example. For navigation to the Search Form, see the Genes and Markers access method illustrated in Figure 5, panels A and B. The same overall method, utilizing the Search pull-down menu or the topical area buttons on the MGI homepage, is used throughout the MGI system. Figure 6.1 shows the phenotypes, alleles, and disease model search form (panel A) where, in this example, only a single search parameter was entered into the gene/marker or allele field. The entry, Smoc*, uses a wildcard and will return all alleles beginning with Smoc…, as well as any synonyms or allele names containing a term beginning Smoc… Additional parameters may be specified in the search, including phenotype and disease terms, genome location, allele generation methods, allele attributes, and/or alleles that were created as part of large projects. There also is an option to exclude alleles if they only exist in cell lines. Panel B shows the results of this search, with 4 of the 16 alleles returned shown here (see upper right of the screen for the allele count). For each allele, its symbol, name, synonyms, chromosome assignment, category of mutant generation, attributes, systems showing abnormal phenotypes, and human disease models are provided. Links are provided from the allele symbol (circled) to the Allele Detail Page (Figure 6.2) and from the human disease to the Disease Ontology browser. In panel C the Smoc1 allele detail page is shown, containing data on the nomenclature and location of the mutant allele, cell lines that contain this mutation, whether it has been germline transmitted, its strain of origin, and the project collection that created it. This is followed by a brief description of the mutation itself and its molecular specificity, as known. Images (with additional links to more images and image captions and references) are provided for phenotypic images as well as mutation/vector images, if available. 
The Phenotypes section of this page shows various genotypes that have been studied with this allele; in this case there are both homozygous and heterozygous genotypes on a C57BL/6N genetic background. The full genotype is always presented, given that genetic background can have a significant effect on the phenotypic presentation of mutant genes. A table specifying the anatomical systems in which a phenotype was detected is presented, along with the source project and sex-specific phenotypes, if applicable. Note that for each anatomical system, a toggle opens to reveal finer phenotypic detail. Towards the bottom of this detail page are the disease models section, showing disease associations for this allele; the Find Mice (IMSR) section, showing mice available from repositories for the specific mutation being viewed or for all mutations in this gene and linking the user directly to IMSR for information and further access to holding repositories; and finally, a link to all references describing this mutation. Panel D illustrates one of the many links from this Allele Detail Page. Here the link is shown from the genotype “hm1” (homozygous Smoc1 allele on C57BL/6N) to the finer detail of the phenotypes observed in this genotype. Within panel D, links can be seen from the Mammalian Phenotype Ontology term (e.g., neonatal lethality, complete penetrance) to the Mammalian Phenotype Ontology page (see Figure 2) and to the reference from which these phenotypic data came (e.g., J:174198).
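The wildcard behavior described for the Smoc* entry in Figure 6.1 can be approximated with Python's standard `fnmatch` module. This is a sketch under stated assumptions, not MGI's search implementation: matching is assumed to be case-insensitive, names and synonyms are split into words on whitespace and simple punctuation, and the allele records are invented placeholders. The function names `matches` and `search` are hypothetical.

```python
import fnmatch
import re

# Invented allele records for illustration only.
ALLELES = [
    {"symbol": "Smoc1<ex1>", "name": "example allele one", "synonyms": []},
    {"symbol": "Smoc2<ex2>", "name": "example allele two", "synonyms": []},
    {"symbol": "Abc1<ex3>", "name": "example allele three",
     "synonyms": ["Smoc-like variant"]},
    {"symbol": "Xyz9<ex4>", "name": "unrelated allele", "synonyms": ["prosmoc"]},
]


def matches(record, pattern):
    """True if the symbol, or any word of the name or synonyms, fits pattern."""
    pat = pattern.lower()  # assumption: the search ignores case
    if fnmatch.fnmatchcase(record["symbol"].lower(), pat):
        return True
    words = []
    for text in [record["name"], *record["synonyms"]]:
        words.extend(re.split(r"[\s,;]+", text.lower()))
    return any(fnmatch.fnmatchcase(word, pat) for word in words)


def search(pattern, alleles=ALLELES):
    """Return the symbols of all records matching a wildcard pattern."""
    return [rec["symbol"] for rec in alleles if matches(rec, pattern)]


smoc_hits = search("Smoc*")
```

Because the pattern is anchored at the start of each word, the placeholder synonym "prosmoc" is not returned, while a synonym containing a word that begins with "Smoc" is.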

Batch Data and Analysis Tools

Many of the search results pages that are returned from a query include the ability to download the results you are viewing in either text or Excel format, or to forward your results for further analysis to the MGI Batch Query or MouseMine tools. In addition, over 75 data files are generated weekly and available for download, either from the Batch Data and Analysis Tools topical area page (Mouse Genome Informatics (MGI) 2016c) or using the pull-down menu labeled “Download” from the navy blue navigation bar at the top of MGI web pages. The Batch Query (Bult et al. 2008, 2010; Eppig et al. 2015c, 2017) allows the user to upload a file or enter a set of IDs or gene symbols and selectively retrieve other database IDs (making it an excellent ID translator) as well as selected gene attributes such as phenotypes, human disease, function (GO annotations), expression data, or RefSNP, UniProt, or RefSeq IDs (Figure 7).

Figure 7

MGI batch query and results example. The Batch Query (Mouse Genome Informatics (MGI) 2016d) is a quick way to translate IDs between databases and to pull selected data from MGI for further analysis. For navigation to the Batch Query Form, follow the method shown in Figure 5, panels A and B, and use the Search pull-down menu or the topical area buttons on the MGI homepage. In this Batch Query example, a list of gene symbols was entered into the Input box at the upper left: Atp7b, Fbn, Oca2, Pax8, Slc26a2. Note that various IDs (IDs from MGI, Ensembl, GenBank, UniProt, etc.) can also be entered, as can a file of gene symbols or IDs. In the Output box at the upper right, Nomenclature, Genome Location, and Human Disease (OMIM) were selected. The results returned are a tabular display of all data matching the request and can be downloaded as text or Excel files or forwarded to MouseMine (Motenko et al. 2015) for further analysis. Links in the Nomenclature Symbol column take the user to the Gene Detail page for that gene, and links in the Disease (OMIM) Term column take the user to the MGI Human Disease and Mouse Model Detail page.

MouseMine (Motenko et al. 2015), a mouse-specific instance using InterMine software (Smith et al. 2012), offers great flexibility for building datasets that are not possible to retrieve using the regular MGI web pages. This tool is particularly useful for bioinformaticians or advanced users for mining data with very specific parameters. MouseMine provides a number of common “prepared” queries as well as the ability for users to build unique queries against MGI data. The functionality available includes uploading, manipulating, and saving results lists, as well as downloading data or forwarding data to Galaxy (Afgan et al. 2016). MouseMine also includes utilities for doing enrichment analyses of your results. The web service API for MGI also resides within MouseMine (Kalderimis et al. 2014).
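The ID-translation idea behind the Batch Query can be illustrated offline with a small tab-delimited table. This sketch does not call MGI or MouseMine: the table, its column names, the `batch_query` function, and every identifier in it are placeholders invented for the example (the IDs are deliberately not in a real accession format). It shows only the general pattern of indexing several identifier columns so that any of them can serve as input, and of reporting inputs that match nothing.

```python
import csv
import io

# Invented lookup table; identifiers are placeholders, not real accessions.
TOY_TABLE = "\n".join("\t".join(row) for row in [
    ("symbol", "mgi_id", "ensembl_id"),
    ("Atp7b", "MGI:EXAMPLE1", "ENSMUSG_EXAMPLE1"),
    ("Oca2", "MGI:EXAMPLE2", "ENSMUSG_EXAMPLE2"),
    ("Pax8", "MGI:EXAMPLE3", "ENSMUSG_EXAMPLE3"),
    ("Slc26a2", "MGI:EXAMPLE4", "ENSMUSG_EXAMPLE4"),
])

ID_COLUMNS = ("symbol", "mgi_id", "ensembl_id")


def batch_query(inputs, output_fields, table_text=TOY_TABLE):
    """Translate each input symbol or ID into the requested output fields."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    # Index every identifier column so any identifier type is accepted.
    index = {}
    for row in rows:
        for column in ID_COLUMNS:
            index[row[column].lower()] = row
    results = []
    for term in inputs:
        row = index.get(term.strip().lower())
        if row is None:
            results.append({"input": term, "matched": False})
        else:
            selected = {name: row[name] for name in output_fields}
            results.append({"input": term, "matched": True, **selected})
    return results


translated = batch_query(["Pax8", "mgi:example2", "NotAGene"],
                         ["symbol", "ensembl_id"])
```

Keeping unmatched inputs in the output, rather than dropping them silently, mirrors the usual expectation that a batch tool accounts for every submitted term.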

The Human-Mouse: Disease Connection (HMDC)

HMDC (Human-Mouse: Disease Connection 2016) supports translational studies by providing access to published and potential mouse models of human disease, and it supports the discovery of candidate genes and the comparison of phenotypes in mouse models and human patients. HMDC combines the phenotype and disease model data in MGI with integrated human data from OMIM (Online Mendelian Inheritance in Man) (Amberger et al. 2015) and human phenotype data from the Human Phenotype Ontology (Köhler et al. 2014, 2017). The web interface allows users to search (using mouse or human data) by gene symbols, names, or IDs; phenotype or disease names or IDs; and genome locations. A query can use one or more of these parameters and will also accept a file of gene symbols or IDs. Figure 8 shows an example search and the result summary returned.

Figure 8

Human-Mouse Disease Connection (www.diseasemodel.org) (Human-Mouse: Disease Connection (HMDC) 2016) example. For navigation to the Human-Mouse Disease Connection page, follow the method shown in Figure 5, panels A and B, and use the Search pull-down menu or the topical area buttons on the MGI Homepage. In this figure, the top panel shows the HMDC homepage with its facile search form. Users select what they wish to search by from a pull-down list that includes: Gene symbol or ID; Gene name; Phenotype or Disease ID; Phenotype or Disease term; Genome location; or Gene File upload. Once a category and value are entered, the user may choose to add additional search parameters. In this example, the Phenotype/Disease term Osteogenesis Imperfecta Congenita; OIC was selected, and an additional search for Gene symbols was selected and entered as: Col1a1, col1a2. Finally, the selection was made to “or” these fields together. The lower panel shows the resulting grid display, where human and mouse orthologs are shown in rows and phenotypes and diseases are shown in columns. Blue indicates mouse data; orange indicates human data. The highlighted Osteogenesis Imperfecta column shows that both human COL1A1 and COL1A2 and mouse Col1a1 and Col1a2 are associated with the disease. In addition, mice mutant for Smpd3 have been used to model Osteogenesis Imperfecta, suggesting that the human orthologous gene might be a candidate gene for a patient without COL1A1 or COL1A2 mutations. Note also that the column for Caffey Disease shows an association to human COL1A1, but no mouse model has shown this association. In this case, a researcher might want to look at (or engineer) a Col1a1 mutation to create a potential model for Caffey Disease.
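The reasoning applied to the grid in Figure 8 can be sketched in a few lines of Python. This is only an illustration of the logic, not HMDC's implementation: the data structure is an assumption, the records are transcribed from the figure description, and the human symbol SMPD3 is supplied here as the ortholog of mouse Smpd3. Cells supported only by mouse data point to candidate human genes, and cells supported only by human data point to mouse models that do not yet exist.

```python
from collections import defaultdict

# Associations transcribed from the Figure 8 description. Each record is
# (human gene, mouse gene, disease, species of the supporting data).
# The human symbol SMPD3 is supplied for illustration.
ASSOCIATIONS = [
    ("COL1A1", "Col1a1", "Osteogenesis Imperfecta", "human"),
    ("COL1A1", "Col1a1", "Osteogenesis Imperfecta", "mouse"),
    ("COL1A2", "Col1a2", "Osteogenesis Imperfecta", "human"),
    ("COL1A2", "Col1a2", "Osteogenesis Imperfecta", "mouse"),
    ("SMPD3", "Smpd3", "Osteogenesis Imperfecta", "mouse"),
    ("COL1A1", "Col1a1", "Caffey Disease", "human"),
]


def build_grid(associations):
    """Rows are ortholog pairs, columns are diseases, and each cell holds
    the set of species whose data support the association."""
    grid = defaultdict(lambda: defaultdict(set))
    for human, mouse, disease, species in associations:
        grid[(human, mouse)][disease].add(species)
    return grid


def single_species_cells(grid, species):
    """Return (human gene, mouse gene, disease) for every cell that is
    supported by data from `species` only."""
    hits = []
    for (human, mouse), row in grid.items():
        for disease, sources in row.items():
            if sources == {species}:
                hits.append((human, mouse, disease))
    return sorted(hits)


if __name__ == "__main__":
    grid = build_grid(ASSOCIATIONS)
    print("Candidate human genes (mouse data only):",
          single_species_cells(grid, "mouse"))
    print("Potential models to engineer (human data only):",
          single_species_cells(grid, "human"))
```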

The Recombinase (cre) Portal

The CrePortal (CrePortal 2016) facilitates identification of the most suitable cre mouse lines for conditional mutagenesis experiments. It describes over 2620 recombinase-containing transgenes and knock-in alleles with detailed molecular information and tissue- and age-specific cre activity. Data on cre-expressing mice have been integrated from individual laboratories and large-scale programs including the NIH Neuroscience Blueprint Cre Driver Network (Tsien 2016), the Allen Institute for Brain Science (Madisen et al. 2010), the Pleiades Promoter project (Portales-Casamar et al. 2010), the JAX Cre Resources project (Heffner et al. 2012), GENSAT (Gerfen et al. 2013), and EUCOMMtools (Friedel et al. 2011). Cre activity data are annotated using the Mouse Anatomy ontologies with images of intended and off-target activity. The CrePortal can be searched by the anatomical structure in which recombinase activity was assayed and/or by the driver used to activate the cre recombinase. The search summary provides a list of drivers, the recombinase-containing allele symbols, associated gene and allele name, allele synonym, a list of tissues in which recombinase activity was detected or not detected, the inducible agent if required, links to references, and links to the International Mouse Strain Resource (IMSR) (International Mouse Strain Resource 2016) for locating those strains available through public repositories. Filters allow users to refine search results. Cre allele symbols are linked to the MGI (Mouse Genome Informatics (MGI) 2016l) phenotypic data pages, which provide a more detailed cre activity summary and phenotypic information for genotypes involving the cre transgenes and knock-ins. We encourage you to explore the CrePortal and to submit your laboratory’s cre line observations for inclusion in the CrePortal using the MGI online submission forms (Mouse Genome Informatics 2016n). A step-by-step tutorial is linked from the Recombinase (cre) topic area page.

The IMSR

The IMSR (Eppig et al. 2015a; International Mouse Strain Resource 2016) facilitates access to mouse resources and mouse models of human disease that are used for basic and translational research. The unique and pivotal role of IMSR is in unifying information about mouse resource holdings worldwide, including inbred, mutant, and genetically engineered mice maintained as breeding stocks, cryopreserved embryos and gametes, and ES (embryonic stem) cell lines. At the IMSR website (International Mouse Strain Resource 2016), users can search for mouse resources, locate strains for their research, learn details about a strain, order mice from a repository, contact a repository with questions, and link to phenotype and disease model descriptions (Figure 9).

Figure 9

International Mouse Strain Resource (www.findmice.org) (International Mouse Strain Resource (IMSR) 2016) example. The link to the IMSR search is found on the navy blue navigation bar appearing near the top of all MGI pages. Here, the results page for an IMSR search for Fgfr2 is shown. The search returned 41 strains, with each strain row providing links from the strain name to the repository’s strain page, links to email the repository or go to its ordering page, and links to information in MGI about the mutant allele carried and to the gene page. Note that new searches can be initiated from this same page by replacing this gene symbol with other gene symbols or strain designations. A new search including additional options (e.g., strain state (ES cell, embryo, live, etc.), strain type (coisogenic strain, congenic strain, etc.), specific repository of choice, or specific mutation type of choice) can be accessed from the IMSR homepage or by choosing “show options” next to the search box.

Mouse Models of Human Cancer: The Mouse Tumor Biology Database (MTB)

MTB (Mouse Models of Human Cancer: The Mouse Tumor Biology Database 2016) is a semi-independent resource under the MGI umbrella of resources. Developed initially under a National Cancer Institute contract and in conjunction with the Mouse Models for Human Cancer Consortium that was initiated in 1998 (Marks 2009), its goal was and continues to be to provide a comprehensive guide to the use of the mouse in understanding the genetic basis of human cancer and in informing investigations of novel targets for therapeutic intervention. Like other MGI resources, MTB is expertly curated and uses nomenclature and vocabulary standards. It includes data curated from the scientific literature, investigator-submitted data, data from large-scale research efforts, and curated metadata from community resources such as Gene Expression Omnibus (Barrett et al. 2013) and Expression Atlas (Petryszak et al. 2016). MTB includes data about spontaneous and induced tumors in inbred strains and genetically modified mouse strains, as well as data on PDX mouse studies (patient-derived xenografts created by implanting patient tumors into immunodeficient or humanized mouse hosts) (Bult et al. 2015). MTB contains data from spontaneous or endogenously induced tumors from genetically defined mice including tumor classification, incidence and latency, tumor-associated QTL, pathology reports, images and genetic changes in the tumor (somatic), and background strain (germline) genomes. The PDX resource enables searches based on tumor type, cancer diagnosis, and genomic properties of the engrafted tumors. The MTB (Mouse Models of Human Cancer: The Mouse Tumor Biology Database 2016) provides online query tools to facilitate cohesive searches and visualization of these varied data, thus enabling the identification of novel mouse models of human cancer and potential therapeutic treatments.

Summary and Future…

The Encyclopedia of the Mouse, the precursor project of what would become MGI, was originally funded by NIH in 1989, when molecular biology was a rising star and the audacious possibility of sequencing the human genome was being debated and planned (Robert 1989). The initial effort for MGI was to unify many nascent databases (reflecting the first steps in computerizing genetic data) and provide a visual interface that connected various data sources. The mouse/mammalian research community propelled MGI’s advancement by participating in the genome sequencing revolution and developing large-scale mutagenesis and phenotyping projects to enhance studies of gene function. Through many iterations and transformative changes, MGI has succeeded in its goal of providing an integrated gold-standard knowledgebase for basic researchers, translational investigators, and computational biologists. The National Human Genome Research Institute has released plans to reduce biomedical resource funding for model organism databases in the coming years (see articles by Kaiser (2016) in Science and Hayden (2016) in Nature). In response, MGI and the other primary model organism databases in the National Human Genome Research Institute’s portfolio have formed a new Alliance for Genome Resources (AGR). Members include MGI, FlyBase (Gramates et al. 2017), Saccharomyces Genome Database (Engel et al. 2016), WormBase (Harris et al. 2014), the Zebrafish Information Network (Howe et al. 2017), Rat Genome Database (Shimoyama et al. 2015), and The Gene Ontology Consortium (The Gene Ontology Consortium 2017). The immediate goals of the AGR are to study the content, relationships, and operating definitions for data in each resource to determine commonalities and differences that affect integration and cross-referencing between systems, and to develop a web portal that allows users to traverse multiple model organism data in a shared system.
Ultimately, a shared infrastructure will emerge for the common types of data. Species-specific data types and methods will need to be maintained so as not to compromise the functionality and data quality that each model organism database provides to its unique user community. An announcement of the AGR appears on the Genetics Society of America website (Genetics Society of America (GSA) 2016). AGR members have ongoing collaborations that have lasted for many years, for example, in developing GO, in striving for common nomenclatures for gene orthologs, in adopting common tools such as JBrowse (Skinner et al. 2009), and in participating in each other’s Advisory Boards. Nevertheless, there remain many significant challenges and differences to overcome, including curation policies, data acquisition infrastructure, and paradigms for user interactions for searching and visualization of data. MGI’s involvement in AGR will undoubtedly mean more change, but that is not a new paradigm; change is the name of the game in keeping MGI in tune with the ever-evolving science.

177 in total

Review 1. Cre recombinase: the universal reagent for genome tailoring.

Authors: A Nagy
Journal: Genesis Date: 2000-02 Impact factor: 2.487

2. Generation of gene-modified mice via Cas9/RNA-mediated gene targeting.

Authors: Bin Shen; Jun Zhang; Hongya Wu; Jianying Wang; Ke Ma; Zheng Li; Xueguang Zhang; Pumin Zhang; Xingxu Huang
Journal: Cell Res Date: 2013-04-02 Impact factor: 25.617

Review 3. Mammalian genome targeting using site-specific recombinases.

Authors: Angel Luis García-Otín; Florian Guillou
Journal: Front Biosci Date: 2006-01-01

4. In vivo genome editing improves muscle function in a mouse model of Duchenne muscular dystrophy.

Authors: Christopher E Nelson; Chady H Hakim; David G Ousterout; Pratiksha I Thakore; Eirik A Moreb; Ruth M Castellanos Rivera; Sarina Madhavan; Xiufang Pan; F Ann Ran; Winston X Yan; Aravind Asokan; Feng Zhang; Dongsheng Duan; Charles A Gersbach
Journal: Science Date: 2015-12-31 Impact factor: 47.728

Review 5. Large-scale mutagenesis of the mouse to understand the genetic bases of nervous system structure and function.

Authors: Dan Goldowitz; Wayne N Frankel; Joseph S Takahashi; Martha Holtz-Vitaterna; Carol Bult; Warren A Kibbe; Jay Snoddy; Yanxia Li; Stephanie Pretel; Jeana Yates; Douglas J Swanson
Journal: Brain Res Mol Brain Res Date: 2004-12-20

6. The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse.

Authors: Janan T Eppig; Judith A Blake; Carol J Bult; James A Kadin; Joel E Richardson
Journal: Nucleic Acids Res Date: 2011-11-10 Impact factor: 16.971

7. Using whole-genome sequences of the LG/J and SM/J inbred mouse strains to prioritize quantitative trait genes and nucleotides.

Authors: Igor Nikolskiy; Donald F Conrad; Sung Chun; Justin C Fay; James M Cheverud; Heather A Lawson
Journal: BMC Genomics Date: 2015-05-28 Impact factor: 3.969

8. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease.

Authors: Mary Shimoyama; Jeff De Pons; G Thomas Hayman; Stanley J F Laulederkind; Weisong Liu; Rajni Nigam; Victoria Petri; Jennifer R Smith; Marek Tutaj; Shur-Jen Wang; Elizabeth Worthey; Melinda Dwinell; Howard Jacob
Journal: Nucleic Acids Res Date: 2014-10-29 Impact factor: 19.160

9. EuroPhenome and EMPReSS: online mouse phenotyping resource.

Authors: Ann-Marie Mallon; Andrew Blake; John M Hancock
Journal: Nucleic Acids Res Date: 2007-09-28 Impact factor: 16.971

10. Current status and new features of the Consensus Coding Sequence database.

Authors: Catherine M Farrell; Nuala A O'Leary; Rachel A Harte; Jane E Loveland; Laurens G Wilming; Craig Wallin; Mark Diekhans; Daniel Barrell; Stephen M J Searle; Bronwen Aken; Susan M Hiatt; Adam Frankish; Marie-Marthe Suner; Bhanu Rajput; Charles A Steward; Garth R Brown; Ruth Bennett; Michael Murphy; Wendy Wu; Mike P Kay; Jennifer Hart; Jeena Rajan; Janet Weber; Catherine Snow; Lillian D Riddick; Toby Hunt; David Webb; Mark Thomas; Pamela Tamez; Sanjida H Rangwala; Kelly M McGarvey; Shashikant Pujar; Andrei Shkeda; Jonathan M Mudge; Jose M Gonzalez; James G R Gilbert; Stephen J Trevanion; Robert Baertsch; Jennifer L Harrow; Tim Hubbard; James M Ostell; David Haussler; Kim D Pruitt
Journal: Nucleic Acids Res Date: 2013-11-11 Impact factor: 16.971
