Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 39 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Data Management, RefSeq Functional Elements, genome data download, variation services API, Magic-BLAST, QuickBLASTp, and Identical Protein Groups. Resources that were updated in the past year include the genome data viewer, a human genome resources page, Gene, virus variation, OSIRIS, and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Year: 2018 PMID: 29140470 PMCID: PMC5753372 DOI: 10.1093/nar/gkx1095
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
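The abstract describes the E-utilities as the programming interface to Entrez. As a minimal sketch of what a request looks like, the snippet below builds an ESearch URL without sending it. The helper name `esearch_url` and its default `retmax` are assumptions made for illustration; the base URL and the `db`, `term`, `retmax`, and `retmode` parameters follow the public E-utilities conventions.

```python
from urllib.parse import urlencode

# Public base address of the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL (hypothetical helper; no network call)."""
    params = {
        "db": db,           # Entrez database name, e.g. "pubmed" or "gene"
        "term": term,       # search query; urlencode escapes it
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Search the Gene database for the symbol that appears in Figure 2.
    print(esearch_url("gene", "TRIM15"))
```

Fetching the resulting URL with any HTTP client should return the matching record identifiers; building the URL separately keeps the sketch runnable offline.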
The Entrez Databases (as of 11 September 2017)
| Database | Records | Annual Growth | Description |
|---|---|---|---|
| PubMed Central | 4 527 796 | 11.35% | full-text journal articles |
| Books | 584 666 | 10.70% | books and reports |
| PubMed | 27 575 666 | 4.40% | scientific and medical abstracts/citations |
| MeSH | 272 224 | 2.58% | ontology used for PubMed indexing |
| NLM Catalog | 1 568 517 | 1.08% | index of NLM collections |
| ClinVar | 329 260 | 106.84% | human variations of clinical significance |
| dbGaP | 260 869 | 16.64% | genotype/phenotype interaction studies |
| PubMed Health | 69 322 | 10.05% | clinical effectiveness, disease and drug reports |
| GTR | 52 685 | 8.38% | genetic testing registry |
| MedGen | 296 696 | 1.49% | medical genetics literature and links |
| Genome | 25 433 | 49.94% | genome sequencing projects by organism |
| SRA | 4 464 466 | 44.37% | high-throughput DNA and RNA sequence read archive |
| Assembly | 128 427 | 41.55% | genome assembly information |
| BioSample | 6 921 928 | 32.50% | descriptions of biological source materials |
| SNP | 1 070 203 043 | 30.62% | short genetic variations |
| BioProject | 246 934 | 27.30% | biological projects providing data to NCBI |
| Nucleotide | 244 630 453 | 16.41% | DNA and RNA sequences |
| Taxonomy | 1 752 913 | 8.38% | taxonomic classification and nomenclature catalog |
| dbVar | 6 571 714 | 6.89% | genome structural variation studies |
| GSS | 39 972 895 | 0.90% | genome survey sequences |
| Clone | 38 325 051 | 0.63% | genomic and cDNA clones |
| Probe | 32 406 650 | 0.01% | sequence-based probes and primers |
| BioCollections | 7 246 | N/A* | culture collections, museums, and herbaria |
| Gene | 29 441 891 | 20.90% | collected information about gene loci |
| GEO DataSets | 2 299 715 | 14.51% | functional genomics studies |
| PopSet | 281 116 | 9.25% | sequence sets from phylogenetic and population studies |
| EST | 76 444 005 | 0.25% | expressed sequence tag sequences |
| GEO Profiles | 128 414 055 | -.- | gene expression and molecular abundance profiles |
| UniGene | 6 473 284 | -.- | clusters of expressed transcripts |
| HomoloGene | 141 268 | -.- | homologous gene sets for selected organisms |
| Protein | 430 457 706 | 39.85% | protein sequences |
| Structure | 132 219 | 8.86% | experimentally-determined biomolecular structures |
| Conserved Domains | 56 066 | 6.97% | conserved protein domains |
| Protein Clusters | 820 546 | 0.00% | sequence similarity-based protein clusters |
| Identical Protein Groups | 141 763 601 | N/A | groups of identical protein sequences |
| BioSystems | 983 968 | 11.82% | molecular pathways with links to genes, proteins, and chemicals |
| PubChem Substance | 229 830 586 | 2.99% | deposited substance and chemical information |
| PubChem BioAssay | 1 252 796 | 2.80% | bioactivity screening studies |
| PubChem Compound | 91 752 585 | 0.08% | chemical information with structures, information, and links |
*Database was first released in 2017
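To make the Annual Growth column concrete, the sketch below back-calculates an approximate record count from one year earlier. It assumes, and the table does not state this, that the column is the year-over-year percentage increase in record count; the function name `prior_year_records` is hypothetical.

```python
def prior_year_records(records: int, annual_growth_pct: float) -> float:
    """Estimate the record count one year earlier.

    Assumption (not stated in the table): growth is defined as
    (current - previous) / previous * 100.
    """
    return records / (1 + annual_growth_pct / 100)


if __name__ == "__main__":
    # ClinVar row from the table: 329 260 records, 106.84% annual growth.
    print(round(prior_year_records(329_260, 106.84)))
```

Under that assumption, a growth figure above 100% means the database more than doubled during the year, as in the ClinVar row.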
Figure 1. Graphical Depiction of Selected Entrez Links. Each cell in the matrix is shaded according to the log (base 10) of the number of records in the source database (rows) that have an Entrez link to the destination database (columns). Diagonal cells represent computational links (e.g. pubmed related articles) and off-diagonal cells assert biological relationships (e.g. nuccore to taxonomy). The matrix is not diagonal because an individual record in a source database may have many links to a destination database (e.g. genome to protein).
Figure 2. Expression profile data for human TRIM15 (gene ID 89870) as displayed on the Gene full report (www.ncbi.nlm.nih.gov/gene/89870).