Dennis A Benson, Karen Clark, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, Eric W Sayers.
Abstract
GenBank is a comprehensive database that contains publicly available nucleotide sequences for over 280,000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.
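As an illustration of retrieval through Entrez, the sketch below builds a request URL for a single GenBank flat-file record. It assumes NCBI's E-utilities `efetch` endpoint, which is the programmatic interface to Entrez and is not named in the abstract itself. The helper name `genbank_flatfile_url` is hypothetical, and the accession `U49845` is only an illustrative value. The code constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint, not mentioned in the abstract.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_flatfile_url(accession: str) -> str:
    """Build an efetch URL asking for one nucleotide record in GenBank format."""
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession number assigned by GenBank staff
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Illustrative accession only; substitute the record of interest.
url = genbank_flatfile_url("U49845")
print(url)
```

The resulting URL could then be passed to any HTTP client, for example `urllib.request.urlopen`, subject to NCBI's usage policies.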
Year: 2013 PMID: 24217914 PMCID: PMC3965104 DOI: 10.1093/nar/gkt1030
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of GenBank divisions (nucleotide base pairs)
| Division | Description | Release 197 (8/2013) | Annual increase (%) |
|---|---|---|---|
| WGS | Whole-genome shotgun data | 500 420 412 665 | 62.4 |
| TSA | Transcriptome shotgun data | 8 633 123 935 | 49.9 |
| PHG | Phages | 119 812 712 | 42.5 |
| VRL | Viruses | 1 757 202 472 | 22.9 |
| BCT | Bacteria | 10 281 048 518 | 21.8 |
| ENV | Environmental samples | 3 743 277 434 | 10.9 |
| INV | Invertebrates | 2 737 140 646 | 9.8 |
| PAT | Patented sequences | 13 290 161 247 | 9.7 |
| PLN | Plants | 5 963 882 822 | 8.8 |
| GSS | Genome survey sequences | 23 726 384 753 | 8.1 |
| VRT | Other vertebrates | 3 068 956 026 | 6.3 |
| MAM | Other mammals | 911 342 025 | 5.6 |
| HTG | High-throughput genomic | 25 184 819 955 | 3.4 |
| HTC | High-throughput cDNA | 656 196 063 | 2.7 |
| UNA | Unannotated | 130 510 | 2.1 |
| EST | Expressed sequence tags | 41 665 629 009 | 1.9 |
| PRI | Primates | 6 425 093 034 | 1.7 |
| SYN | Synthetic | 941 078 074 | 1.4 |
| ROD | Rodents | 4 451 315 297 | 0.4 |
| STS | Sequence tagged sites | 636 326 479 | 0.0 |
| TOTAL | All GenBank sequences | 654 613 333 676 | 45.1 |
Annual increase is measured relative to Release 191 (8/2012).
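The annual increases in the table above can be turned back into approximate Release 191 sizes, since a current size equals the earlier size multiplied by (1 + increase/100). The sketch below does this for three rows copied from the table. The function names are hypothetical, and because the published percentages are rounded to one decimal place, the results are estimates rather than the actual Release 191 counts.

```python
# Base-pair counts and annual increases (%) copied from the table above.
divisions = {
    "WGS": ("500 420 412 665", 62.4),
    "TSA": ("8 633 123 935", 49.9),
    "TOTAL": ("654 613 333 676", 45.1),
}


def parse_bp(text: str) -> int:
    """Convert a space-grouped count such as '8 633 123 935' to an int."""
    return int(text.replace(" ", ""))


def implied_previous(current_bp: int, increase_pct: float) -> float:
    """Back out the earlier size from current = previous * (1 + pct / 100)."""
    return current_bp / (1 + increase_pct / 100)


for name, (bp_text, pct) in divisions.items():
    current = parse_bp(bp_text)
    previous = implied_previous(current, pct)
    print(f"{name}: {current:,} bp now, about {previous / 1e9:.1f} Gbp a year earlier")
```

The same helpers apply to any other row of the table. A row with a 0.0% increase, such as STS, simply returns its current size.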
Top organisms in GenBank (Release 197)
| Organism | Non-WGS base pairs |
|---|---|
| | 17 111 514 261 |
| | 9 982 065 736 |
| | 6 524 450 090 |
| | 5 389 432 575 |
| | 5 071 648 554 |
| | 4 889 229 566 |
| | 3 119 512 595 |
| | 1 463 247 509 |
| | 1 435 237 072 |
| | 1 263 872 842 |
| | 1 256 717 300 |
| | 1 249 741 450 |
| | 1 198 798 076 |
| | 1 152 899 341 |
| | 1 139 790 400 |
| | 1 127 199 957 |
| | 1 069 944 084 |
| | 1 008 818 677 |
| | 966 234 744 |
| | 952 526 510 |