Jennifer R Smith1, G Thomas Hayman1, Shur-Jen Wang1, Stanley J F Laulederkind1, Matthew J Hoffman1,2, Mary L Kaldunski1, Monika Tutaj1, Jyothi Thota1, Harika S Nalabolu1, Santoshi L R Ellanki1, Marek A Tutaj1, Jeffrey L De Pons1, Anne E Kwitek2, Melinda R Dwinell2, Mary E Shimoyama1.
Abstract
Formed in late 1999, the Rat Genome Database (RGD, https://rgd.mcw.edu) will be 20 in 2020, the Year of the Rat. Because the laboratory rat, Rattus norvegicus, has been used as a model for complex human diseases such as cardiovascular disease, diabetes, cancer, neurological disorders and arthritis, among others, for >150 years, RGD has always been disease-focused and committed to providing data and tools for researchers doing comparative genomics and translational studies. At its inception, before the sequencing of the rat genome, RGD started with only a few data types localized on genetic and radiation hybrid (RH) maps and offered only a few tools for querying and consolidating that data. Since that time, RGD has expanded to include a wealth of structured and standardized genetic, genomic, phenotypic, and disease-related data for eight species, and a suite of innovative tools for querying, analyzing and visualizing this data. This article provides an overview of recent substantial additions and improvements to RGD's data and tools that can assist researchers in finding and utilizing the data they need, whether their goal is to develop new precision models of disease or to more fully explore emerging details within a system or across multiple systems.
Year: 2020 PMID: 31713623 PMCID: PMC7145519 DOI: 10.1093/nar/gkz1041
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The Rat Genome Database at 20. RGD has evolved over twenty years from a simple data repository offering a few data types and a small number of tools, to a multispecies knowledgebase offering numerous data types, utilizing 15 ontologies, housing over 130 000 references used in-house for creating manual annotations across data types and species, and integrating a wealth of data imported by over 80 automated pipelines. In addition, RGD has developed a suite of innovative tools for searching, visualization and analysis.
Comparison of the number of data records in RGD from 2000 to 2019 by data type and species
| Data type | Rat 2000 | Rat 2019 | Human 2000 | Human 2019 | Mouse 2000 | Mouse 2019 | Chinchilla 2019 | Bonobo 2019 | Dog 2019 | Squirrel 2019 | Pig 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GENES | 1987 | 45 816 | 14 | 40 984 | 857 | 53 724 | 29 971 | 33 712 | 36 850 | 26 325 | 30 414 |
| MARKERS (SSLPS + ESTS) | 19 562 | 50 130 | 320 143 | 55 165 | |||||||
| STRAINS | 76 | 3740 | |||||||||
| QTLS | 2378 | 1911 | 6335 | ||||||||
| PROTEINS | | 36 262 | | 182 299 | | 88 294 | 99 | 43 648 | 29 696 | 25 485 | 333 908 |
| MAPS/ASSEMBLIES | 5 | 12 | 1 | 19 | 11 | 1 | 2 | 2 | 1 | 3 | |
| CELL_LINES | 41 | ||||||||||
| PROMOTERS | 12 720 | 63 992 | 57 546 | 7 545 | |||||||
| TRANSCRIPTS | | 164 769 | | 211 136 | | 178 349 | 75 934 | 62 481 | 98 852 | 50 117 | 75 517 |
| VARIANTS | >600 000 000 | 600 981 | |||||||||
| REFERENCES (species is not assigned for references) | 12 | 131 295 | |||||||||
In 2000, the major data type in RGD was rat markers, consisting of both expressed sequence tags (ESTs) and simple sequence length polymorphisms (SSLPs). The maps in 2000 included only genetic and RH maps and the cytogenetic map. In 2019, RGD stores and presents data across eight species, with genes, transcripts and proteins for all species, and variants for rat and human comprising the largest datasets. Note that RGD does not currently track rat variants across assemblies or between strains.
Figure 2. MOET: the Multi Ontology Enrichment Tool Result Page. Seventy-seven rat genes annotated to hypotension in RGD were submitted to the MOET tool. The result page shows the number of genes that matched the input list of symbols and, when the ‘Symbols Found’ button is clicked, the popup shows the list of gene symbols (A). This same list of genes can be submitted to other tools at RGD using the ‘All Analysis Tools’ link. Click on the tabs across the top of the page to switch the display between species and/or between ontologies. Here, the default selections of ‘Rat’ and ‘Disease Ontology’ have been made (B). The table on the left side of the display shows the enriched terms with the number of genes annotated to each term, its P-value and its Bonferroni-corrected P-value. The default is to show terms sorted by P-value, but options are given to sort on any of the columns in the table (C). The graph on the right side of the display shows the number of genes annotated to each term in blue and the corresponding P-values in orange. The P-values displayed in the graph are limited to 0.05 or less by default, but a dropdown at the top of the graph allows the user to change this value (D).
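The kind of table described in the Figure 2 caption (per-term gene counts, a P-value, a Bonferroni-corrected P-value, sorting by P-value and a 0.05 display cutoff) can be illustrated with a minimal Python sketch. The caption does not state which statistical test MOET uses; the one-sided hypergeometric test below is an assumption chosen because it is a common choice for enrichment analysis, and the function names, parameter names and input layout are hypothetical rather than part of any RGD API.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided P(X >= k): chance of seeing at least k term-annotated genes
    in an input list of n genes, when K of the N background genes carry the term.
    ASSUMPTION: a hypergeometric test; the test MOET actually uses is not stated."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p_values):
    """Bonferroni correction: multiply each P-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def enrich(term_counts, n_input, n_background, cutoff=0.05):
    """term_counts maps term -> (k, K), where k is the number of input genes
    annotated to the term and K the number of background genes annotated to it
    (hypothetical input layout). Returns (term, p, corrected_p) rows sorted by
    raw P-value, keeping only rows with p <= cutoff."""
    terms = list(term_counts)
    raw = [
        hypergeom_pvalue(term_counts[t][0], n_input, term_counts[t][1], n_background)
        for t in terms
    ]
    corrected = bonferroni(raw)  # corrected over all tested terms, before filtering
    rows = sorted(zip(terms, raw, corrected), key=lambda row: row[1])
    return [row for row in rows if row[1] <= cutoff]


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not RGD data.
    demo = {"term A": (12, 150), "term B": (3, 400), "term C": (8, 60)}
    for term, p, p_adj in enrich(demo, n_input=77, n_background=20000):
        print(f"{term}\t{p:.3g}\t{p_adj:.3g}")
```

The correction is applied across every tested term before the cutoff filter, so changing the display cutoff (as the dropdown in panel D does) changes only which rows are shown, not their corrected values.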
Figure 3. The OntoMate text mining-based literature search tool. The OntoMate tool uses text mining and natural language processing to find and tag abstracts with ontology terms, gene symbols and names, species terms and mutation designations. The search interface (A) gives options to specify any of these, as well as searching or filtering the results by date, PubMed ID, abstract title, author or keyword(s). The Query Result page (B) shows each abstract with its tags and provides the user with additional filtering options. See Supplementary Figure S6 for a more complete description.
Figure 4. The Developmental Disease Portal. The top of the page in each disease portal shows the name of the portal and indicates which species' data is being shown. The data shown in the portal can be changed by selecting a different data type, e.g. Phenotypes or Pathways versus Disease, using the buttons in the top panel (A), or a different species by selecting a picture in the second panel (B). For more information and a full view of a portal page, see Supplementary Figure S7.