Karen Clark¹, Ilene Karsch-Mizrachi¹, David J Lipman¹, James Ostell¹, Eric W Sayers²
Abstract
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species. Recent developments include a new starting page for submitters, a shift toward using accession.version identifiers rather than GI numbers, a wizard for submitting 16S rRNA sequences, and an Identical Protein Report to address growing issues of data redundancy. GenBank organizes the sequence data received from individual laboratories and large-scale sequencing projects into 18 divisions, and GenBank staff assign unique accession.version identifiers upon data receipt. Most submitters use the web-based BankIt or standalone Sequin programs. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the nuccore, nucest, and nucgss databases of the Entrez retrieval system, which integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2015 PMID: 26590407 PMCID: PMC4702903 DOI: 10.1093/nar/gkv1276
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
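The abstract mentions accession.version identifiers and access through the Entrez nuccore database. The following is a minimal sketch of how such an identifier might be split and used to build a retrieval URL. It assumes the public NCBI E-utilities `efetch` endpoint and its `db`, `id`, `rettype`, and `retmode` parameters, none of which are named in the abstract. The regular expression is a deliberate simplification of real accession formats, the function names are hypothetical, and `U49845.1` is used only as an example identifier. No network request is made.

```python
import re
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint (not named in the abstract).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Simplified pattern: a letter/underscore prefix, digits, a dot, an integer version.
# Real accession formats are more varied; this is for illustration only.
ACCESSION_VERSION = re.compile(r"^([A-Za-z_]+\d+)\.(\d+)$")


def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version as int)."""
    match = ACCESSION_VERSION.match(identifier)
    if match is None:
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return match.group(1), int(match.group(2))


def build_efetch_url(identifier, db="nuccore"):
    """Build (but do not fetch) a URL for a GenBank flat-file record."""
    split_accession_version(identifier)  # raises ValueError on malformed input
    params = {"db": db, "id": identifier, "rettype": "gb", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(split_accession_version("U49845.1"))
    print(build_efetch_url("U49845.1"))
```

Validating the identifier before building the URL rejects a bare number such as a GI, which fits the shift toward accession.version identifiers described in the abstract.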
Growth of GenBank Divisions (nucleotide base-pairs)
| Division | Description | Release 209 (8/2015) | Annual increase (%)ᵃ |
|---|---|---|---|
| INV | Invertebrates | 15 413 731 414 | 399.5% |
| MAM | Other mammals | 3 592 838 191 | 277.5% |
| VRT | Other vertebrates | 6 643 601 831 | 108.4% |
| WGS | Whole genome shotgun data | 1 163 275 601 001 | 50.3% |
| PHG | Phages | 210 143 517 | 43.1% |
| BCT | Bacteria | 19 331 233 520 | 40.9% |
| PLN | Plants | 11 966 142 676 | 32.8% |
| TSA | Transcriptome shotgun data | 11 171 215 516 | 19.8% |
| VRL | Viruses | 2 493 936 092 | 17.3% |
| ENV | Environmental samples | 4 845 868 034 | 12.8% |
| HTG | High-throughput genomic | 27 057 268 218 | 6.6% |
| PAT | Patented sequences | 15 549 880 984 | 6.2% |
| GSS | Genome survey sequences | 25 607 093 540 | 5.4% |
| SYN | Synthetic | 1 001 954 270 | 2.6% |
| PRI | Primates | 6 808 335 498 | 1.7% |
| EST | Expressed sequence tags | 42 333 093 845 | 0.6% |
| ROD | Rodents | 4 482 375 973 | 0.3% |
| HTC | High-throughput cDNA | 673 910 306 | 0.3% |
| UNA | Unannotated | 187 511 | 0.1% |
| STS | Sequence tagged sites | 640 833 351 | 0.0% |
| TOTAL | All GenBank sequences | 1 363 099 245 288 | 45.0% |
ᵃ Measured relative to Release 203 (8/2014).
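The arithmetic behind the table can be sketched as follows. This example parses the space-grouped base-pair counts, back-calculates the approximate Release 203 total implied by the 45.0% annual increase, and computes the share of all base pairs held in the WGS division. The helper names are hypothetical, and because the published percentages are rounded to one decimal place, the back-calculated figure is only an estimate.

```python
def parse_base_pairs(text):
    """Convert a space-grouped count such as '1 363 099 245 288' to an int."""
    return int("".join(text.split()))


def implied_previous_total(current, percent_increase):
    """Estimate the earlier value from the current value and its percent increase."""
    return current / (1 + percent_increase / 100)


# Values copied from the TOTAL and WGS rows of the table above.
total_release_209 = parse_base_pairs("1 363 099 245 288")
wgs_release_209 = parse_base_pairs("1 163 275 601 001")

# Approximate Release 203 total, given the reported 45.0% annual increase.
total_release_203_estimate = implied_previous_total(total_release_209, 45.0)

# Percentage of all GenBank base pairs that sit in the WGS division.
wgs_share_percent = 100 * wgs_release_209 / total_release_209

if __name__ == "__main__":
    print(f"Estimated Release 203 total: {total_release_203_estimate:,.0f} bp")
    print(f"WGS share of Release 209: {wgs_share_percent:.1f}%")
```

By this estimate the database held roughly 0.94 trillion base pairs a year earlier, and WGS data make up about 85% of Release 209.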
Top organisms in GenBank (release 203)
| Organism | Non-WGS base pairs |
|---|---|
| | 17 791 718 636 |
| | 10 004 995 614 |
| | 6 526 314 722 |
| | 5 412 338 175 |
| | 5 203 408 728 |
| | 4 895 555 549 |
| | 3 229 866 896 |
| | 3 151 064 646 |
| | 2 590 569 059 |
| | 1 937 727 565 |
| | 1 835 902 375 |
| | 1 744 606 771 |
| | 1 595 383 668 |
| | 1 435 471 103 |
| | 1 297 900 273 |
| | 1 267 676 263 |
| | 1 264 189 828 |
| | 1 249 270 365 |
| | 1 202 220 229 |
| | 1 200 826 354 |