Dennis A Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, Eric W Sayers.
Abstract
GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.
Year: 2012 PMID: 23193287 PMCID: PMC3531190 DOI: 10.1093/nar/gks1195
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of GenBank divisions (nucleotide base pairs)
| Division | Description | Release 191 (8/2012) | Annual increase (%) |
|---|---|---|---|
| Taxonomic divisions | | | |
| SYN | Synthetic | 928 200 038 | 494.2% |
| PHG | Phages | 84 079 451 | 34.4% |
| ENV | Environmental samples | 3 374 433 548 | 32.1% |
| VRL | Viruses | 1 429 464 786 | 21.1% |
| BCT | Bacteria | 8 439 854 434 | 21.0% |
| PLN | Plants | 5 481 470 133 | 15.6% |
| MAM | Other mammals | 863 036 872 | 6.9% |
| VRT | Other vertebrates | 2 886 594 595 | 6.7% |
| PRI | Primates | 6 317 656 773 | 3.3% |
| UNA | Unannotated | 127 803 | 1.5% |
| ROD | Rodents | 4 435 106 948 | 0.9% |
| INV | Invertebrates | 2 493 058 927 | −1.7% |
| Functional divisions | | | |
| TSA | Transcriptome shotgun data | 5 759 588 580 | 207.3% |
| WGS | Whole-genome shotgun data | 308 196 411 905 | 47.9% |
| PAT | Patented sequences | 12 118 622 726 | 8.6% |
| GSS | Genome survey sequences | 21 947 780 105 | 5.7% |
| EST | Expressed sequence tags | 40 888 051 100 | 4.8% |
| HTG | High-throughput genomic | 24 359 210 558 | 0.1% |
| STS | Sequence tagged sites | 636 262 446 | 0.1% |
| HTC | High-throughput cDNA | 639 165 410 | −3.5% |
| TOTAL | All GenBank sequences | 451 278 177 138 | 33.1% |
Annual increase is measured relative to Release 185 (8/2011).
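The arithmetic behind the table can be checked with a short script. The sketch below copies the base-pair counts from the table, parses the space-grouped figures, and confirms that the 20 divisions sum to the reported TOTAL row. The helper names (`parse_count`, `division_sum`, `implied_previous_size`) are illustrative and not part of any NCBI tool. The back-calculated earlier size is only an approximation implied by the rounded 33.1% figure, not a value reported here.

```python
# Base-pair counts copied from the division table above (Release 191, 8/2012).
DIVISION_BASE_PAIRS = {
    # Taxonomic divisions
    "SYN": "928 200 038",
    "PHG": "84 079 451",
    "ENV": "3 374 433 548",
    "VRL": "1 429 464 786",
    "BCT": "8 439 854 434",
    "PLN": "5 481 470 133",
    "MAM": "863 036 872",
    "VRT": "2 886 594 595",
    "PRI": "6 317 656 773",
    "UNA": "127 803",
    "ROD": "4 435 106 948",
    "INV": "2 493 058 927",
    # Functional divisions
    "TSA": "5 759 588 580",
    "WGS": "308 196 411 905",
    "PAT": "12 118 622 726",
    "GSS": "21 947 780 105",
    "EST": "40 888 051 100",
    "HTG": "24 359 210 558",
    "STS": "636 262 446",
    "HTC": "639 165 410",
}
REPORTED_TOTAL = "451 278 177 138"
REPORTED_ANNUAL_INCREASE_PCT = 33.1


def parse_count(text):
    """Convert a space-grouped count such as '928 200 038' to an int."""
    return int("".join(text.split()))


def division_sum(counts):
    """Add up the base-pair counts of all divisions."""
    return sum(parse_count(value) for value in counts.values())


def implied_previous_size(current, increase_pct):
    """Size one year earlier implied by a percentage increase (approximate)."""
    return current / (1 + increase_pct / 100)


if __name__ == "__main__":
    total = division_sum(DIVISION_BASE_PAIRS)
    print("Divisions sum to reported total:", total == parse_count(REPORTED_TOTAL))
    earlier = implied_previous_size(total, REPORTED_ANNUAL_INCREASE_PCT)
    print("Implied size one year earlier (bp, approximate):", round(earlier))
```

Because WGS alone accounts for roughly two thirds of the total, its 47.9% growth dominates the overall 33.1% increase.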
Top organisms in GenBank (Release 191)
| Organism | Non-WGS base pairs |
|---|---|
| | 16 310 774 187 |
| | 9 974 977 889 |
| | 6 521 253 272 |
| | 5 386 258 455 |
| | 5 062 731 057 |
| | 4 887 861 860 |
| | 3 120 857 462 |
| | 1 435 236 534 |
| | 1 256 203 101 |
| | 1 255 686 573 |
| | 1 249 938 611 |
| | 1 197 357 811 |
| | 1 144 226 616 |
| | 1 119 965 220 |
| | 1 008 323 292 |
| | 999 010 073 |
| | 951 238 343 |
| | 906 638 854 |
| | 899 631 338 |
| | 898 689 329 |
Retrieval databases containing GenBank data
| Division | Entrez database | BLAST database |
|---|---|---|
| BCT, ENV, INV, MAM, PHG, PLN, PRI, ROD, SYN, UNA, VRL, VRT | nucleotide | nr |
| EST | est | est |
| GSS | gss | gss |
| HTC | nucleotide | nr |
| HTG | nucleotide | htg |
| PAT | nucleotide | pat |
| STS | nucleotide | dbsts |
| TSA | nucleotide | tsa |
| WGS | nucleotide | wgs |
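The mapping in this table lends itself to a simple lookup. The following is a minimal sketch that encodes the rows exactly as listed and returns the Entrez and BLAST database names for a given division code. The names `RETRIEVAL_DATABASES` and `databases_for` are hypothetical, and the mapping reflects only the table as given for Release 191; current NCBI database names may differ.

```python
# (Entrez database, BLAST database) per GenBank division,
# transcribed from the table above.
RETRIEVAL_DATABASES = {
    division: ("nucleotide", "nr")
    for division in (
        "BCT", "ENV", "INV", "MAM", "PHG", "PLN",
        "PRI", "ROD", "SYN", "UNA", "VRL", "VRT",
    )
}
RETRIEVAL_DATABASES.update({
    "EST": ("est", "est"),
    "GSS": ("gss", "gss"),
    "HTC": ("nucleotide", "nr"),
    "HTG": ("nucleotide", "htg"),
    "PAT": ("nucleotide", "pat"),
    "STS": ("nucleotide", "dbsts"),
    "TSA": ("nucleotide", "tsa"),
    "WGS": ("nucleotide", "wgs"),
})


def databases_for(division):
    """Return (entrez_db, blast_db) for a division code, case-insensitively."""
    key = division.strip().upper()
    if key not in RETRIEVAL_DATABASES:
        raise KeyError(f"Unknown GenBank division: {division!r}")
    return RETRIEVAL_DATABASES[key]


if __name__ == "__main__":
    for code in ("BCT", "est", "WGS"):
        entrez_db, blast_db = databases_for(code)
        print(f"{code.upper()}: Entrez={entrez_db}, BLAST={blast_db}")
```

A flat dictionary keeps the table's one-row-per-division structure visible, so any later change to a single division's database touches only one entry.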