Dennis A Benson, Ilene Karsch-Mizrachi, Karen Clark, David J Lipman, James Ostell, Eric W Sayers.
Abstract
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 250 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.
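As an illustration of programmatic retrieval through Entrez, the sketch below builds a request for a single GenBank flat file. It assumes the NCBI E-utilities `efetch` endpoint and its `db`, `id`, `rettype` and `retmode` parameters, none of which are named in the abstract. The helper names `build_efetch_url` and `fetch_genbank_flatfile` are hypothetical, and the accession `U49845` is only an example identifier.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public E-utilities EFetch endpoint (not named in the abstract).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Build an EFetch URL requesting one nucleotide record as a GenBank flat file."""
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession number or other record identifier
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_genbank_flatfile(accession: str, timeout: float = 30.0) -> str:
    """Download the record as text. Requires network access, so it is not called here."""
    with urlopen(build_efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Build (but do not send) a request for an example accession.
example_url = build_efetch_url("U49845")
print(example_url)
```

Calling `fetch_genbank_flatfile("U49845")` would send the request. Building the URL separately keeps the request easy to inspect and test offline.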
Year: 2011 PMID: 22144687 PMCID: PMC3245039 DOI: 10.1093/nar/gkr1202
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of GenBank divisions (nucleotide base pairs)
| Division | Description | Release 185 (8/2011) | Annual increase (%) |
|---|---|---|---|
| TSA | Transcriptome shotgun data | 1 874 047 448 | 370.1 |
| ENV | Environmental samples | 2 553 693 157 | 48.2 |
| PHG | Phages | 62 579 756 | 44.0 |
| PAT | Patented sequences | 11 154 487 762 | 30.9 |
| BCT | Bacteria | 6 975 597 755 | 30.8 |
| INV | Invertebrates | 2 535 336 197 | 24.5 |
| WGS | Whole-genome shotgun data | 208 315 831 132 | 23.1 |
| VRL | Viruses | 1 180 083 600 | 21.6 |
| MAM | Other mammals | 807 098 397 | 18.8 |
| PLN | Plants | 4 741 991 057 | 17.4 |
| GSS | Genome survey sequences | 20 770 772 329 | 12.6 |
| SYN | Synthetic | 156 218 063 | 9.6 |
| VRT | Other vertebrates | 2 705 250 711 | 6.8 |
| EST | Expressed sequence tags | 39 018 185 344 | 6.0 |
| UNA | Unannotated | 125 912 | 4.7 |
| PRI | Primates | 6 116 546 725 | 2.9 |
| ROD | Rodents | 4 396 957 541 | 2.3 |
| HTC | High-throughput cDNA | 662 320 919 | 0.4 |
| STS | Sequence tagged sites | 635 872 683 | 0.3 |
| HTG | High-throughput genomic | 24 324 068 445 | 0.2 |
| TOTAL | All GenBank sequences | 338 987 064 933 | 18.2 |
Annual increase is measured relative to Release 179 (8/2010).
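The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below sums the Release 185 division counts, compares the sum with the reported TOTAL row, computes the WGS share of all base pairs, and back-calculates the approximate Release 179 total from the 18.2% annual increase. The function name `implied_previous_size` is hypothetical. Because the published percentage is rounded to one decimal place, the back-calculated value is only an estimate.

```python
# Base-pair counts per division, copied from the Release 185 column above.
RELEASE_185 = {
    "TSA": 1_874_047_448, "ENV": 2_553_693_157, "PHG": 62_579_756,
    "PAT": 11_154_487_762, "BCT": 6_975_597_755, "INV": 2_535_336_197,
    "WGS": 208_315_831_132, "VRL": 1_180_083_600, "MAM": 807_098_397,
    "PLN": 4_741_991_057, "GSS": 20_770_772_329, "SYN": 156_218_063,
    "VRT": 2_705_250_711, "EST": 39_018_185_344, "UNA": 125_912,
    "PRI": 6_116_546_725, "ROD": 4_396_957_541, "HTC": 662_320_919,
    "STS": 635_872_683, "HTG": 24_324_068_445,
}
REPORTED_TOTAL = 338_987_064_933      # TOTAL row of the table
REPORTED_TOTAL_INCREASE_PCT = 18.2    # annual increase of the TOTAL row


def implied_previous_size(current: int, increase_pct: float) -> float:
    """Estimate the size one year earlier from the current size and the % increase."""
    return current / (1 + increase_pct / 100)


division_sum = sum(RELEASE_185.values())
wgs_share = RELEASE_185["WGS"] / REPORTED_TOTAL
implied_179_total = implied_previous_size(REPORTED_TOTAL, REPORTED_TOTAL_INCREASE_PCT)

print("Divisions sum to reported total:", division_sum == REPORTED_TOTAL)
print(f"WGS share of all base pairs: {wgs_share:.1%}")
print(f"Implied Release 179 total: {implied_179_total:.3e} bp")
```

The division counts add up exactly to the reported total, and WGS data alone account for well over half of all base pairs in the release.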
Top Organisms in GenBank (Release 185)
| Organism | Non-WGS base pairs |
|---|---|
| | 15 881 839 899 |
| | 9 118 049 806 |
| | 6 503 434 302 |
| | 5 381 235 474 |
| | 5 055 840 446 |
| | 4 793 300 236 |
| | 3 127 958 433 |
| | 1 352 948 327 |
| | 1 251 053 810 |
| | 1 194 842 997 |
| | 1 147 237 486 |
| | 1 138 511 865 |
| | 1 058 563 193 |
| | 1 003 309 475 |
| | 947 332 578 |
| | 915 431 680 |
| | 896 784 038 |
| | 895 052 594 |
| | 828 906 407 |
| | 778 132 243 |