The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics.

Judith A Blake, Carol J Bult, James A Kadin, Joel E Richardson, Janan T Eppig.

Abstract

The Mouse Genome Database (MGD) is the community model organism database for the laboratory mouse and the authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) resource. MGD contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and a normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. Major improvements to the Mouse Genome Database include a comprehensive update of genetic maps, implementation of new classification terms for genome features, development of a recombinase (cre) portal and inclusion of all alleles generated by the International Knockout Mouse Consortium (IKMC).

Year:  2010        PMID: 21051359      PMCID: PMC3013640          DOI: 10.1093/nar/gkq1008

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The Mouse Genome Database (MGD) is an integrated database of genetic, genomic and phenotypic data for the laboratory mouse (1–3). MGD is a central component of the Mouse Genome Informatics (MGI) database resource (http://www.informatics.jax.org). Other MGI data resources that are integrated with MGD include the Gene Expression Database (GXD) (4), the Mouse Tumor Biology Database (MTB) (5), the Gene Ontology (GO) project (6) and the MouseCyc database of biochemical pathways (7). Data in MGD are updated daily. There are typically four to six major software releases per year to support access and display of new data types. All data and associated utilities are freely and openly available. The primary data maintained in MGD include mouse genes and other genome features along with their function and phenotype annotations, associations of genome features with nucleotide and protein sequences, genetic and physical maps, associations between human diseases and mouse models, SNPs and other polymorphisms, and mammalian homology data. A recent summary of MGD content is shown in Table 1.
Table 1. Summary of MGD data content (1 September 2010)

MGD data statistics                                 1 September 2010
Genes with nucleotide sequence data                 28 837
Genes with protein sequence data                    25 878
Genes with mutant alleles in mice                   12 900
Genes with experimentally based GO annotations      11 257
Mouse/human orthologs                               17 852
Genes with one or more mutant alleles (a)           19 063
Genes with one or more phenotypic alleles (b)       8766
Total mutant alleles, including gene traps (a)      570 982
Phenotypic alleles (b)                              24 997
Genes with targeted alleles                         11 940
Gene trapped alleles                                531 232
Human diseases with one or more mouse models        1033
QTLs                                                4473
Number of references                                157 509
Mouse RefSNPs                                       10 089 892

(a) Mutant alleles include those occurring in mice and/or in ES cell lines.
(b) Phenotypic alleles include only those mutant alleles present in mice.

MGI curatorial staff acquire data through direct loads from other databases, direct submissions from researchers and the published literature. To facilitate data integration, MGI employs recognized standards for genetic and genomic nomenclature, and provides functional and phenotypic annotations describing mouse genes, sequences, strains, expression data, alleles and phenotypes. All data associations in MGD are supported with evidence and citations. Researchers can access MGD data using keyword or ID-based searches, multi-value integrated queries and, programmatically, web services. MGD provides vocabulary browsers to support access to database content via GO annotations, Mammalian Phenotype (MP) (8) annotations and Human Disease Term annotations using OMIM (9). The MGI MouseBLAST server allows users to interrogate the MGI database using nucleotide and/or protein sequences. Access to data in MGD is also facilitated by a variety of tab-delimited database reports that are updated nightly and are available for download via FTP. MGD collaborates with other large genome informatics resources (e.g. NCBI, Ensembl, UniProt, HGNC) to curate and maintain a comprehensive catalog of mouse genes and other genome features, and to resolve inconsistencies in the representation of mouse genome features. Biological annotations for mouse genes based on MGD curation are incorporated into scores of external informatics resources and software products.

NEW IN 2010

Update genetic map positions

The genetic map (i.e. centiMorgan; cM) positions for genes and markers in MGI have been updated using the data and methods described in Cox et al. (10). The revised standard genetic map described in Cox et al. incorporates over 10 000 single nucleotide polymorphisms (SNPs) using a set of 47 families of a heterogeneous mouse population comprising over 3500 meioses. The revised map corrects errors in marker order in earlier consensus genetic maps for the laboratory mouse. The Cox map integrates simple sequence length polymorphism (SSLP) markers from other genetic maps with physical maps of the mouse genome. Linear interpolation was used to translate mouse genome coordinates (NCBI Build 37) for genes and markers in MGI to sex-averaged cM locations. The update to the Cox map resulted in the addition of cM locations for over 35 000 genes and genetic markers, almost doubling the number of markers with cM positions. Approximately 11 000 genes and markers in MGI that did not have genome coordinates were not updated to new cM positions; however, the original mapping data for these markers can still be found in the mapping experiment detail pages.
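
The coordinate-to-cM translation described above can be illustrated with a minimal sketch. It assumes a per-chromosome list of anchor markers whose base-pair and cM positions are both known; the function name, the anchor values and the choice to clamp positions outside the anchored range are illustrative assumptions, not the procedure actually used by MGD or by Cox et al.

```python
from bisect import bisect_right


def interpolate_cm(bp, anchors):
    """Estimate a cM position for a base-pair coordinate on one chromosome.

    anchors: list of (bp_position, cm_position) tuples, sorted by
    bp_position with strictly increasing bp values (hypothetical input).
    """
    if not anchors:
        raise ValueError("at least one anchor marker is required")
    positions = [pos for pos, _ in anchors]
    # Assumption: coordinates outside the anchored range are clamped
    # to the nearest anchor rather than extrapolated.
    if bp <= positions[0]:
        return anchors[0][1]
    if bp >= positions[-1]:
        return anchors[-1][1]
    # Locate the flanking anchors and interpolate linearly between them.
    i = bisect_right(positions, bp)
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    return cm0 + (cm1 - cm0) * (bp - bp0) / (bp1 - bp0)


# Made-up anchor markers for a single chromosome (bp, cM).
example_anchors = [(1_000_000, 0.5), (3_000_000, 1.5), (4_000_000, 3.5)]
print(interpolate_cm(2_000_000, example_anchors))
```

A feature without a genome coordinate cannot be placed this way, which is consistent with the markers that kept their original mapping data.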

Classification terms for genome features

We have implemented new classification terms for genome features that improve the user’s ability to search for specific categories of features (e.g. protein-coding gene, non-coding gene, heritable phenotype, etc.). The new genome classifications are accessible from the Genes and Markers Query Form (Figure 1) as well as from the MGI instance of BioMart. Most of the classification terms and definitions are derived from the Sequence Ontology (SO) (11) project.
Figure 1.

New classification terms for MGD markers and genome features. The definitions for the terms are displayed when a user ‘mouses over’ a term. Numbers following the term are the current number of entities in that class within MGD; these counts are updated nightly.

Represent mutant alleles generated by the International Knockout Mouse Consortium

The International Knockout Mouse Consortium (IKMC) (12–14) is composed of KOMP (KnockOut Mouse Project) in the USA, EUCOMM (EUropean Conditional Mouse Mutagenesis Program) in Europe, NorCOMM (North American Conditional Mouse Mutagenesis Project) in Canada and TIGM (the Texas Institute of Genomic Medicine) in the USA. The goal of IKMC is to use gene-targeting and gene-trapping technologies in mouse ES cells to mutate all protein-coding genes in the genome and to make these resources available to the scientific community. As new mutations are made in ES cells, alleles are created and accessioned in MGI. Additional information available includes a description of the molecular mutation and the ES cell line IDs associated with the allele. Currently over 74 000 alleles in 14 800 genes have been loaded into MGI from the IKMC projects. Plans are underway to incorporate data for those alleles that have been made into mice and phenotyped, so that comparative phenotype analysis can be done with these mutants in the context of all other known mouse phenotypic mutations.

Recombinase (cre) portal

Many of the new alleles being created by the IKMC are ‘conditional-ready’; that is, by mating a mouse carrying such an allele to a recombinase-bearing transgenic or knockin mouse, a conditional genotype can be produced. These conditional genotypes will have the gene of interest ‘knocked out’ in specific tissues or at specific developmental stages, thus allowing finer analysis of gene function and mitigating the potentially lethal effects of a null allele during development. Knowledge of the expression and specificity of the recombinase transgene or knockin allele is key to selecting the appropriate mouse to use in generating conditional genotypes. MGI has released a Recombinase (cre) Data Portal that specifically addresses this need (www.creportal.org). Through this portal, users can access information about all existing cre transgenes and knockins. Data include a molecular description of the cre transgene or knockin, the driver/promoter used, inducibility information, publications and availability of cre mice through the IMSR (www.findmice.org, Figure 2). Detailed data, including annotated images showing cre activity/expression for the tissues analyzed, are being added as available. Access to phenotypes displayed by cre-deleted mice is provided via integration with MGI’s phenotype data. Currently, there are over 1260 recombinase-containing transgenes and knockin alleles cataloged in the Recombinase (cre) portal.
Figure 2.

Details for the specificity of the recombinase-bearing knockin allele, Tgfb3<tm1(cre)Vk>, in sensory organs. Information shown includes molecular description, links to strain availability, other tissues showing recombinase activity and a gallery of images for Tgfb3<tm1(cre)Vk> in sensory organs. Arrow shows how images may be moved and enlarged to enable better inspection. The table in the lower portion shows detailed annotations for the sensory organ recombinase activities.

Other functional updates and changes

Several minor changes to MGD were incorporated this year, including a series of updates to the gene detail pages with regard to integration with other major providers of sequence and gene model data. For example, links are now provided to the underlying evidence that supports gene predictions from VEGA (15), Ensembl (16) and NCBI (17). In addition, if there is a discrepancy in the biotype classification for a gene prediction (e.g. gene versus pseudogene), a ‘biotype conflict’ note now appears on the gene detail page in MGI (Figure 3). The transcript and protein sequences for VEGA and Ensembl gene predictions were incorporated into MGI and can be downloaded from the sequence summary report for each gene record.
Figure 3.

Screenshot showing a biotype conflict note for the Cecr6 gene. In this instance, the Ensembl annotation pipeline has assigned a status of ‘pseudogene’ to Cecr6 and the NCBI annotation pipeline has assigned it a status of ‘protein-coding gene.’ MGI provides links to the underlying evidence for both gene predictions so that users can examine the evidence used to support the gene structure and biotype assignments by different annotation groups.
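
The biotype conflict check described above can be sketched as a comparison of the biotype each provider assigns to the same gene. This is only an illustration: the function names, the note wording and the plain-dictionary input are assumptions, and the Cecr6 values are taken from the Figure 3 caption rather than from any MGD data file.

```python
def has_biotype_conflict(calls):
    """Return True when providers assign different biotypes to one gene.

    calls: dict mapping a provider name to its biotype string; providers
    with no call (None or empty string) are ignored.
    """
    distinct = {biotype.strip().lower() for biotype in calls.values() if biotype}
    return len(distinct) > 1


def conflict_note(symbol, calls):
    """Build a short human-readable note, or None if providers agree."""
    if not has_biotype_conflict(calls):
        return None
    details = "; ".join(
        f"{provider}: {biotype}" for provider, biotype in sorted(calls.items()) if biotype
    )
    # The wording of this note is hypothetical, not MGI's actual text.
    return f"Biotype conflict for {symbol} ({details})"


# Provider calls as described in the Figure 3 caption.
cecr6_calls = {"Ensembl": "pseudogene", "NCBI": "protein-coding gene"}
print(conflict_note("Cecr6", cecr6_calls))
```

Normalizing case and whitespace before comparing avoids flagging purely cosmetic differences between provider vocabularies; mapping synonymous terms onto a shared vocabulary would be a further step not shown here.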

We now also supply links to Protein Ontology (PRO) (18) annotations. PRO provides an ID for each type of protein, including protein variants, isoforms and modified forms. As a member of the Protein Ontology Consortium, we are providing detailed annotations for mouse isoforms in particular. We are also working with the MouseCyc group and PRO to provide specific representations for protein complexes, including the exact descriptions and accession IDs for each protein form found in a protein complex. We envision that this approach will eventually support functional annotations to specific proteins and protein complexes rather than to the more generic ‘gene’.

As genome sequence data emerge for strains of mice other than the C57BL/6J reference genome, it becomes possible to identify strain-specific genes. MGI now provides a ‘strain specific genome feature’ note for these features. For example, the renin 2 (Ren2; MGI:97899) gene is not present in the reference genome but is found in the genomes of other strains of mice.

OTHER INFORMATION

Mouse gene, allele and strain nomenclature

MGD is the authoritative source of symbols and names for mouse genes, alleles and strains. The nomenclature in MGD follows the guidelines set by the ‘International Committee on Standardized Genetic Nomenclature for Mice’ (http://www.informatics.jax.org/nomen). This official nomenclature is widely disseminated through regular data exchange and curation of shared links between MGI and other bioinformatics resources. MGD staff members work with editors of journal publications to promote adherence to mouse nomenclature standards in publications. To support consistency of nomenclature across multiple mammalian species, members of the MGD nomenclature group coordinate gene names and symbols with nomenclature specialists from the HUGO Gene Nomenclature Committee (HGNC) (19) (http://www.genenames.org/) and the Rat Genome Database (RGD) (20) (http://rgd.mcw.edu). The MGD nomenclature coordinator can be contacted by email (nomen@informatics.jax.org).

Programmatic and bulk data access

Programmatic access is available to select portions of the database through two routes. First, the MGI Web Service accepts SOAP 1.1 and 1.2 requests. For details, see http://www.informatics.jax.org/mgihome/other/web_service.shtml. Second, the MGD BioMart (http://biomart.informatics.jax.org/) is accessible through MartServices. See http://www.biomart.org/martservice.html for information on MartServices. In addition, bulk data sets are available for download via FTP reports (ftp://ftp.informatics.jax.org) and via the MGI Batch Query (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=batchQF).
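
As a rough illustration of the MartServices route, the sketch below builds a BioMart-style XML query document and the URL that would carry it. It assumes the conventional MartService interface, in which the XML is passed as a `query` parameter; the dataset, filter and attribute names and the `/biomart/martservice` path are hypothetical placeholders that would need to be checked against the registry of the actual MGD BioMart. No request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_mart_query(dataset, attributes, filters=None):
    """Return a BioMart-style XML query document as a string."""
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",  # tab-separated output
            "header": "1",
            "uniqueRows": "1",
            "datasetConfigVersion": "0.6",
        },
    )
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(base, xml_query):
    """Attach the URL-encoded XML query to a MartService endpoint."""
    return base + "?" + urlencode({"query": xml_query})


# Hypothetical dataset, filter and attribute names, for illustration only.
xml_query = build_mart_query(
    "markers",
    attributes=["mgi_marker_id", "marker_symbol"],
    filters={"chromosome": "11"},
)
# Assumed endpoint path; verify against the server before use.
print(martservice_url("http://biomart.informatics.jax.org/biomart/martservice", xml_query))
```

Building the XML with an element tree rather than string concatenation keeps attribute values correctly escaped.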

Electronic data submission

MGD accepts contributed data sets from individuals and organizations for any type of data maintained by the database. The most frequent types of contributed data are mutant and phenotypic allele information originating with the large mouse mutagenesis centers and repositories that contribute to the International Mouse Strain Resource [IMSR, http://www.imsr.org, (21)]. Each electronic submission receives a permanent database accession ID. All data sets are associated with their source, either a publication or an electronic submission reference. Details about data submission procedures can be found at http://www.informatics.jax.org/mgihome/submissions/submissions_menu.shtml. Suggestions and corrections to the representation of data and information in MGD can be submitted using the ‘Your Input Welcome’ link, which appears in the upper right-hand corner of gene and allele detail pages.

Community outreach and user support

The MGD resource has full-time staff members who are dedicated to user support and training. Members of the User Support team can be contacted via e-mail, web requests, phone or fax. MGD User Support staff are available for on-site training on the use of MGD and other MGI data resources. The traveling tutorial program includes lectures, demos and hands-on tutorials that can be customized according to the research interests of the audience. MGI-LIST (http://www.informatics.jax.org/mgihome/lists/lists.shtml) is a moderated and active email bulletin board supported by the MGD User Support group. The MGI listserve has over 2100 subscribers and averages three posts per day.

HIGH LEVEL OVERVIEW OF THE MAIN COMPONENTS AND IMPLEMENTATION

MGD is implemented in the Sybase relational database management system with ∼180 tables within which the biological information is stored. BLAST-able databases and genome assembly files for sequence data are stored outside the relational database. An editing interface (EI) and automated load programs are used to input data into the MGD system. The EI is an interactive, graphical application used by curators. Automated load programs that integrate larger data sets from many sources into the database include quality control (QC) checks and processing algorithms that integrate the bulk of the data automatically and identify issues to be resolved by curators or the data provider. Thus, through the EI and automated loads, we acquire and integrate large amounts of data into a high-quality knowledgebase. Public data access to MGD is provided primarily through the web interface (WI), where users can interactively query and download our data through a web browser. MouseBLAST allows users to do sequence similarity searches against a variety of rodent sequence databases that are updated weekly from selected sequence databases from NCBI, UniProt and other providers. Mouse GBrowse allows users to visualize mouse data sets against the genome as a series of linear tracks. All MGD files and programs are openly and freely available. We continue to provide MGD BioMart, now with the addition of the new classification terms for genome features. MGD BioMart is updated on a weekly basis and supports chaining to several other BioMarts, including Ensembl, VEGA and RGD. Additional functionalities, such as the ability to filter by GO, MP and OMIM terms and the inclusion of additional information about alleles, are planned for future extensions.

CITING MGD

For a general citation of the MGI resource please cite this article. In addition, the following citation format is suggested when referring to data sets specific to the MGD component of MGI: MGD, MGI, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org) [Type in date (month, year) when you retrieved the data cited].

FUNDING

National Institutes of Health/National Human Genome Research Institute, The Mouse Genome Database (grant HG000330). Funding for open access charge: (grant HG000330). Conflict of interest statement. None declared.
• World wide web: http://www.informatics.jax.org/mgihome/support/support.shtml
• E-mail access: mgi-help@informatics.jax.org
• Telephone access: +1 207 288 6445
• Fax access: +1 207 288 6132
  21 in total

1.  A new standard genetic map for the laboratory mouse.

Authors:  Allison Cox; Cheryl L Ackert-Bicknell; Beth L Dumont; Yueming Ding; Jordana Tzenova Bell; Gudrun A Brockmann; Jon E Wergedal; Carol Bult; Beverly Paigen; Jonathan Flint; Shirng-Wern Tsaih; Gary A Churchill; Karl W Broman
Journal:  Genetics       Date:  2009-06-17       Impact factor: 4.562

2.  Finding a mouse: the International Mouse Strain Resource (IMSR).

Authors:  J T Eppig; M Strivens
Journal:  Trends Genet       Date:  1999-02       Impact factor: 11.639

Review 3.  The mammalian phenotype ontology: enabling robust annotation and comparative analysis.

Authors:  Cynthia L Smith; Janan T Eppig
Journal:  Wiley Interdiscip Rev Syst Biol Med       Date:  2009 Nov-Dec

4.  Ensembl Genomes: extending Ensembl across the taxonomic space.

Authors:  P J Kersey; D Lawson; E Birney; P S Derwent; M Haimel; J Herrero; S Keenan; A Kerhornou; G Koscielny; A Kähäri; R J Kinsella; E Kulesha; U Maheswari; K Megy; M Nuhn; G Proctor; D Staines; F Valentin; A J Vilella; A Yates
Journal:  Nucleic Acids Res       Date:  2009-11-01       Impact factor: 16.971

Review 5.  The Mouse Tumor Biology database.

Authors:  Debra M Krupke; Dale A Begley; John P Sundberg; Carol J Bult; Janan T Eppig
Journal:  Nat Rev Cancer       Date:  2008-04-24       Impact factor: 60.716

6.  The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium.

Authors:  Martin Ringwald; Vivek Iyer; Jeremy C Mason; Kevin R Stone; Hamsa D Tadepally; James A Kadin; Carol J Bult; Janan T Eppig; Darren J Oakley; Sebastien Briois; Elia Stupka; Vincenza Maselli; Damian Smedley; Songyan Liu; Jens Hansen; Richard Baldock; Geoff G Hicks; William C Skarnes
Journal:  Nucleic Acids Res       Date:  2010-10-06       Impact factor: 16.971

7.  genenames.org: the HGNC resources in 2011.

Authors:  Ruth L Seal; Susan M Gordon; Michael J Lush; Mathew W Wright; Elspeth A Bruford
Journal:  Nucleic Acids Res       Date:  2010-10-06       Impact factor: 16.971

8.  The mouse Gene Expression Database (GXD): 2007 update.

Authors:  Constance M Smith; Jacqueline H Finger; Terry F Hayamizu; Ingeborg J McCright; Janan T Eppig; James A Kadin; Joel E Richardson; Martin Ringwald
Journal:  Nucleic Acids Res       Date:  2006-11-27       Impact factor: 16.971

9.  MouseCyc: a curated biochemical pathways database for the laboratory mouse.

Authors:  Alexei V Evsikov; Mary E Dolan; Michael P Genrich; Emily Patek; Carol J Bult
Journal:  Genome Biol       Date:  2009-08-14       Impact factor: 13.583

10.  The vertebrate genome annotation (Vega) database.

Authors:  L G Wilming; J G R Gilbert; K Howe; S Trevanion; T Hubbard; J L Harrow
Journal:  Nucleic Acids Res       Date:  2007-11-14       Impact factor: 16.971
