Dennis A Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, James Ostell, Kim D Pruitt, Eric W Sayers.
Abstract
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 400 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects. Most submissions are made using BankIt, the National Center for Biotechnology Information (NCBI) Submission Portal, or the tool tbl2asn. GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to sequence identifiers, submission wizards for 16S and Influenza sequences, and an Identical Protein Groups resource. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Year: 2018 PMID: 29140468 PMCID: PMC5753231 DOI: 10.1093/nar/gkx1094
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
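The abstract notes that GenBank records are accessible through the NCBI Nucleotide database. One possible way to request a single record programmatically is through NCBI's E-utilities EFetch endpoint. The abstract does not name that service, so its use here is an assumption, as are the function name `build_efetch_url` and the example accession. The sketch below only constructs the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for record retrieval. The abstract does not
# mention it; it is assumed here as one route into the Nucleotide database.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "gb") -> str:
    """Return an EFetch URL for one Nucleotide record in plain-text form."""
    params = {
        "db": "nuccore",     # the NCBI Nucleotide database
        "id": accession,     # accession number of the record
        "rettype": rettype,  # "gb" = GenBank flat file, "fasta" = FASTA
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # U49845 is used purely as an illustrative accession.
    print(build_efetch_url("U49845"))
```

Passing `rettype="fasta"` to the same function would request the sequence alone instead of the annotated flat file.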
Growth of GenBank Divisions (nucleotide base-pairs)
| Division | Description | Release 221 (August 2017) | Annual increase (%)* |
|---|---|---|---|
| TSA | Transcriptome shotgun assembly | 167 045 663 417 | 61.55 |
| BCT | Bacteria | 39 102 455 601 | 47.70 |
| WGS | Whole genome shotgun data | 2 242 294 609 510 | 36.96 |
| VRT | Other vertebrates | 9 248 495 804 | 33.70 |
| PHG | Phages | 344 579 387 | 27.37 |
| VRL | Viruses | 3 482 143 321 | 17.09 |
| PLN | Plants | 16 782 598 904 | 14.12 |
| PAT | Patent sequences | 19 219 724 521 | 12.21 |
| SYN | Synthetic | 1 173 218 483 | 12.21 |
| ENV | Environmental samples | 5 590 106 999 | 7.12 |
| MAM | Other mammals | 3 872 932 998 | 6.18 |
| INV | Invertebrates | 17 226 520 457 | 6.07 |
| PRI | Primates | 8 024 647 559 | 2.85 |
| HTC | High-throughput cDNA | 696 583 486 | 2.08 |
| UNA | Unannotated | 208 576 | 1.75 |
| GSS | Genome survey sequences | 25 974 685 352 | 1.08 |
| ROD | Rodents | 4 520 933 672 | 0.42 |
| EST | Expressed sequence tags | 42 640 092 444 | 0.29 |
| HTG | High-throughput genomic | 27 646 512 131 | 0.06 |
| STS | Sequence tagged sites | 640 875 196 | 0.01 |
| TOTAL | All GenBank sequences | 2 635 527 587 818 | 35.52 |
* Measured relative to Release 215 (August 2016)
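The table's arithmetic can be checked with a short sketch. It parses the space-grouped base-pair counts, confirms that the divisions add up to the TOTAL row, and backs out the Release 215 size implied by a stated annual increase. The helper names are hypothetical, and the implied Release 215 figure is only approximate because the published percentages are rounded to two decimals.

```python
def parse_bp(text: str) -> int:
    """Convert a space-grouped count such as '39 102 455 601' to an int."""
    return int("".join(text.split()))


def percent_increase(previous: float, current: float) -> float:
    """Annual increase in percent, measured relative to the earlier release."""
    return (current - previous) / previous * 100.0


def implied_previous(current: float, pct: float) -> float:
    """Size one year earlier that is implied by a stated percent increase."""
    return current / (1.0 + pct / 100.0)


# Release 221 sizes copied from the table above.
DIVISIONS = {
    "TSA": "167 045 663 417", "BCT": "39 102 455 601",
    "WGS": "2 242 294 609 510", "VRT": "9 248 495 804",
    "PHG": "344 579 387", "VRL": "3 482 143 321",
    "PLN": "16 782 598 904", "PAT": "19 219 724 521",
    "SYN": "1 173 218 483", "ENV": "5 590 106 999",
    "MAM": "3 872 932 998", "INV": "17 226 520 457",
    "PRI": "8 024 647 559", "HTC": "696 583 486",
    "UNA": "208 576", "GSS": "25 974 685 352",
    "ROD": "4 520 933 672", "EST": "42 640 092 444",
    "HTG": "27 646 512 131", "STS": "640 875 196",
}
TOTAL = parse_bp("2 635 527 587 818")

sizes = {name: parse_bp(value) for name, value in DIVISIONS.items()}

# The twenty divisions sum exactly to the TOTAL row.
assert sum(sizes.values()) == TOTAL

if __name__ == "__main__":
    wgs_share = sizes["WGS"] / TOTAL
    print(f"WGS share of all bases: {wgs_share:.1%}")
    print(f"Implied Release 215 total: {implied_previous(TOTAL, 35.52):,.0f} bp")
```

Because WGS accounts for most of the bases in the release, the TOTAL growth rate is driven largely by the WGS row.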
Figure 1. Size in base pairs of the five GenBank divisions with the highest annual growth rates in 2017. The growth of GenBank as a whole is also shown as ‘TOTAL’.
Top Organisms in GenBank
| Organism | Base pairs* | WGS Genomes** | Non-WGS Genomes** |
|---|---|---|---|
| | 19 065 856 381 | 58 | 3 |
| | 10 233 714 809 | 21 | 1 |
| | 6 529 312 672 | 9 | 0 |
| | 5 429 768 145 | 2 | 0 |
| | 5 228 306 576 | 7 | 0 |
| | 5 072 476 333 | 15 | 0 |
| | 3 235 943 623 | 7 | 0 |
| | 3 191 032 985 | 3 | 1 |
| | 2 836 475 665 | 2 | 3 |
| | 2 590 574 434 | 0 | 1 |
| | 1 944 658 425 | 12 | 1 |
| | 1 836 551 064 | 1 | 1 |
| | 1 803 951 183 | 8768 | 457 |
| | 1 746 806 294 | 3 | 1 |
| | 1 642 593 575 | 18 | 4 |
| | 1 595 510 956 | 0 | 1 |
| | 1 436 165 842 | 1 | 0 |
| | 1 337 270 420 | 5 | 0 |
| | 1 264 448 364 | 0 | 1 |
| | 1 250 011 608 | 1 | 0 |
*Counts correspond to Release 221 and exclude sequences from chloroplasts, mitochondria, metagenomes, uncultured organisms, WGS, and TSA.
**Counts are as of 16 October 2017 and include all INSDC genomes.
Selected BLAST nucleotide databases*
| Database | Contents |
|---|---|
| nt | Taxonomic GenBank divisions |
| env_nt | ENV division |
| tsa_nt | TSA division |
| wgs | WGS sequences |
| 16SMicrobial | Bacterial and archaeal 16S rRNA |
*For more databases, see ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html
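To illustrate how the database names in the table might be used, the sketch below assembles a `blastn` command line for a chosen database. It assumes the NCBI BLAST+ command-line tools are installed, which the text does not state. The function name `blastn_command` and the query file name are hypothetical, and whether a given name is accepted by a remote search today is not something the table guarantees. The command is only built, not executed.

```python
import shlex

# Database names and contents as listed in the table above.
BLAST_NT_DATABASES = {
    "nt": "Taxonomic GenBank divisions",
    "env_nt": "ENV division",
    "tsa_nt": "TSA division",
    "wgs": "WGS sequences",
    "16SMicrobial": "Bacterial and archaeal 16S rRNA",
}


def blastn_command(query_fasta: str, db: str = "nt", remote: bool = True) -> list[str]:
    """Build (but do not run) a blastn argument list for one listed database."""
    if db not in BLAST_NT_DATABASES:
        raise ValueError(f"unknown database: {db!r}")
    cmd = ["blastn", "-query", query_fasta, "-db", db, "-outfmt", "6"]
    if remote:
        # Ask BLAST+ to search on NCBI's servers instead of a local copy.
        cmd.append("-remote")
    return cmd


if __name__ == "__main__":
    # "query.fasta" is a placeholder file name.
    print(shlex.join(blastn_command("query.fasta", db="env_nt")))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting problems.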