Abstract
The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.
Year: 2011 PMID: 22139910 PMCID: PMC3245000 DOI: 10.1093/nar/gkr1178
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (a) Total growth of the taxonomy database. This includes formal and informal taxa at all levels, from unranked isolate-level taxids added for the influenza genome project to genera, families and higher taxa. (b) Valid species in the taxonomy database. This includes only valid binomial and trinomial species, subspecies, varietas and forma (infraspecific taxa with standing in the nomenclature). The viruses and bacteria are basically flat in this figure, since the rate-limiting step is the description of new species, not the sequencing.
Duplicated binomials in the sequence database
| Organism group | Accession(s) |
| wasp | AJ302786 |
| conifer | U96478 |
| angiosperm | AY398512 |
| cricket | |
| angiosperm | AY398526 |
| cricket | |
| copepod | AY015993 and CQ977721 |
| angiosperm | DQ227206 |
| angiosperm | AY145176, etc. |
| fly | FJ435902 |
| moss | AY039052 and AY039077 |
| land snail | HQ328315 and HQ328433 |
TAXON name types
| Scientific name | Exactly one per node |
| Synonym | |
| Acronym | |
| anamorph | Asexual fungal name |
| teleomorph | Sexual fungal name |
| misspelling | Data not shown on public pages |
| misnomer | |
| equivalent name | |
| Includes | |
| in-part | |
| blast name | |
| Common name | |
| genbank common name | At most one per node |
| Genbank synonym | At most one per node |
| Genbank acronym | At most one per node |
| Genbank anamorph | At most one per node |
| unpublished name | Data not shown on public pages |
| Authority | |
Unless otherwise specified, each name type may appear any number of times at a given node.
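The cardinality rules in the table can be expressed as a small consistency check. The following is a minimal sketch, not the curators' actual tooling: the `(name_type, name_text)` pair layout, the function name `check_node_names` and the example names are all hypothetical, and only the per-type limits come from the table above.

```python
from collections import Counter

# Cardinality rules taken from the name-type table; name types not listed
# here may appear any number of times at a node.
EXACTLY_ONE = {"scientific name"}
AT_MOST_ONE = {
    "genbank common name",
    "genbank synonym",
    "genbank acronym",
    "genbank anamorph",
}


def check_node_names(names):
    """Return a list of rule violations for a single taxonomy node.

    `names` is a list of (name_type, name_text) pairs. This layout is a
    hypothetical in-memory representation, not an NCBI file format.
    """
    # Compare name types case-insensitively, since the table mixes cases.
    counts = Counter(name_type.lower() for name_type, _ in names)
    problems = []
    for name_type in sorted(EXACTLY_ONE):
        if counts[name_type] != 1:
            problems.append(
                f"{name_type}: expected exactly 1, found {counts[name_type]}"
            )
    for name_type in sorted(AT_MOST_ONE):
        if counts[name_type] > 1:
            problems.append(
                f"{name_type}: expected at most 1, found {counts[name_type]}"
            )
    return problems


# Invented example node: two synonyms are allowed, one scientific name is required.
example_node = [
    ("scientific name", "Examplus unus"),
    ("synonym", "Examplus primus"),
    ("synonym", "Examplus prior"),
    ("genbank common name", "example beetle"),
]
print(check_node_names(example_node))
```

A node that lacks a scientific name, or that carries two GenBank synonyms, would yield one message per violated rule.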
Figure 2. The taxonomy portlet in Nucleotide Entrez. This particular display summarizes the taxonomic distribution of plant sequences released in 2011, given by the Entrez query viridiplantae[orgn] AND 2011[pdat] (http://www.ncbi.nlm.nih.gov/nuccore?term=viridiplantae[orgn]+AND+2011[pdat]). The taxonomy portlet toggles between a list of top taxa by entry count in the Entrez results list, and the taxonomic overview shown above.
Some useful Entrez queries
| all [filter] | Retrieves everything |
| Specified [property] | Formal binomial and trinomial |
| at or below species level [property] | |
| family [rank] | Rank-based query |
| taxonomy genome [filter] | Taxa with a direct link to a genome sequence |
| 2009/10/21:2020 [date] | Date-bounded query |
| mammalia [subtree] | All taxa within the Mammalia |
| extinct [property] | Extinct organisms |
| Terminal [property] | Terminal nodes in the tree |
| loprovencyclife [filter] | Entries with LinkOut links to the Encyclopedia of Life |
These can be combined in Boolean expressions, e.g. mammalia [subtree] AND specified [prop] AND subspecies [rank] AND 2009 [date].
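As an illustration of combining terms, the sketch below assembles the example expression and wraps it in a search URL without sending any request. It assumes NCBI's public E-utilities ESearch endpoint, which is not mentioned in the text above. The helper names `combine_terms` and `taxonomy_search_url` are hypothetical. Whether ESearch interprets every field tag exactly as the Entrez web interface does is also an assumption.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint (an assumption; the text only
# describes queries typed into the Entrez web interface).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_terms(terms, operator="AND"):
    """Join Entrez terms such as 'mammalia [subtree]' with a Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(terms)


def taxonomy_search_url(query, retmax=20):
    """Build (but do not send) an ESearch URL against the taxonomy database."""
    params = {"db": "taxonomy", "term": query, "retmax": retmax}
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = combine_terms([
    "mammalia [subtree]",
    "specified [prop]",
    "subspecies [rank]",
    "2009 [date]",
])
url = taxonomy_search_url(query)
print(query)
print(url)
```

Keeping the terms in a list makes it easy to add or drop a filter, for example appending `extinct [property]`, before the expression is joined.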
Figure 3. Taxonomy browser page for the Mammalia. Exploded and unexploded links to other Entrez databases are shown in ‘Entrez records’. LinkOut links to external databases are displayed below the Comments and References (data not shown).