Dennis A Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, Eric W Sayers.
Abstract
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 370 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or the NCBI Submission Portal. GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to policies regarding sequence identifiers, an improved 16S submission wizard, targeted loci studies, the ability to submit methylation and BioNano mapping files, and a database of anti-microbial resistance genes. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2016 PMID: 27899564 PMCID: PMC5210553 DOI: 10.1093/nar/gkw1070
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of GenBank divisions (nucleotide base-pairs)
| Division | Description | Release 215 (August 2016) | Annual Increase (%)* |
|---|---|---|---|
| TSA | Transcriptome shotgun data | 103 399 724 586 | 49.1% |
| WGS | Whole genome shotgun data | 1 637 224 970 324 | 40.7% |
| BCT | Bacteria | 26 474 028 571 | 36.9% |
| PHG | Phages | 270 541 687 | 28.7% |
| PLN | Plants | 14 705 679 094 | 22.9% |
| VRL | Viruses | 2 973 938 989 | 19.2% |
| PRI | Primates | 7 802 428 126 | 14.6% |
| PAT | Patent sequences | 17 128 458 325 | 10.2% |
| UNA | Unannotated | 204 984 | 9.3% |
| ENV | Environmental samples | 5 218 628 157 | 7.7% |
| INV | Invertebrates | 16 241 123 317 | 5.4% |
| SYN | Synthetic | 1 045 567 653 | 4.4% |
| VRT | Other vertebrates | 6 917 600 814 | 4.1% |
| HTG | High-throughput genomic | 27 630 729 177 | 2.1% |
| MAM | Other mammals | 3 647 546 848 | 1.5% |
| HTC | High-throughput cDNA | 682 400 482 | 1.3% |
| ROD | Rodents | 4 502 193 236 | 0.4% |
| EST | Expressed sequence tags | 42 516 725 239 | 0.4% |
| GSS | Genome survey sequences | 25 696 517 526 | 0.3% |
| STS | Sequence tagged sites | 640 833 351 | 0.0% |
| TOTAL | All GenBank sequences | 1 944 719 840 486 | 36.8% |
*Measured relative to Release 209 (8/2015).
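The figures in the table can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original record: it parses the space-grouped base-pair counts, confirms that the division sizes add up to the TOTAL row, computes the WGS share of the release, and back-calculates an approximate Release 209 total from the 36.8% annual increase. The names `DIVISIONS`, `parse_bp`, and `implied_release_209` are illustrative, and the back-calculated figure is only an estimate because the percentage in the table is rounded.

```python
# Division sizes in base pairs, copied from the Release 215 column of the table.
DIVISIONS = {
    "TSA": "103 399 724 586",
    "WGS": "1 637 224 970 324",
    "BCT": "26 474 028 571",
    "PHG": "270 541 687",
    "PLN": "14 705 679 094",
    "VRL": "2 973 938 989",
    "PRI": "7 802 428 126",
    "PAT": "17 128 458 325",
    "UNA": "204 984",
    "ENV": "5 218 628 157",
    "INV": "16 241 123 317",
    "SYN": "1 045 567 653",
    "VRT": "6 917 600 814",
    "HTG": "27 630 729 177",
    "MAM": "3 647 546 848",
    "HTC": "682 400 482",
    "ROD": "4 502 193 236",
    "EST": "42 516 725 239",
    "GSS": "25 696 517 526",
    "STS": "640 833 351",
}
TOTAL = "1 944 719 840 486"
TOTAL_ANNUAL_INCREASE_PCT = 36.8  # from the TOTAL row, relative to Release 209


def parse_bp(text: str) -> int:
    """Convert a space-grouped count such as '26 474 028 571' to an int."""
    return int("".join(text.split()))


sizes = {name: parse_bp(value) for name, value in DIVISIONS.items()}
total = parse_bp(TOTAL)

# Check that the individual divisions account for the reported total.
divisions_sum = sum(sizes.values())

# Fraction of all GenBank bases held in the WGS division.
wgs_share = sizes["WGS"] / total

# Approximate size one year earlier: current = previous * (1 + p / 100).
implied_release_209 = total / (1 + TOTAL_ANNUAL_INCREASE_PCT / 100)

print("Divisions sum to total:", divisions_sum == total)
print(f"WGS share of release: {wgs_share:.1%}")
print(f"Implied Release 209 total: {implied_release_209:.3e} bp")
```

With the values as listed, the divisions sum exactly to the TOTAL row, and WGS alone accounts for roughly 84% of all bases in the release.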
Top organisms in GenBank (release 215)
| Organism | Base pairs* |
|---|---|
| | 18 313 373 647 |
| | 10 031 175 251 |
| | 6 528 259 145 |
| | 5 414 550 206 |
| | 5 207 478 336 |
| | 4 896 632 524 |
| | 3 235 262 275 |
| | 3 183 146 925 |
| | 2 590 574 434 |
| | 1 941 609 064 |
| | 1 836 265 727 |
| | 1 745 709 667 |
| | 1 641 275 254 |
| | 1 595 400 865 |
| | 1 436 120 930 |
| | 1 335 500 855 |
| | 1 264 190 782 |
| | 1 250 099 171 |
| | 1 204 613 994 |
| | 1 202 887 576 |
*Excludes sequences from chloroplasts, mitochondria, metagenomes, uncultured organisms, WGS and TSA.