Eric W Sayers, Mark Cavanaugh, Karen Clark, James Ostell, Kim D Pruitt, Ilene Karsch-Mizrachi.
Abstract
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 420 000 formally described species. Most GenBank submissions are made using BankIt, the NCBI Submission Portal, or the tool tbl2asn, and are obtained from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include an expansion of sequence identifier formats to accommodate expected database growth, submission wizards for ribosomal RNA, and the transfer of Expressed Sequence Tag (EST) and Genome Survey Sequence (GSS) data into the Nucleotide database.
Year: 2019 PMID: 30365038 PMCID: PMC6323954 DOI: 10.1093/nar/gky989
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth of GenBank divisions (nucleotide base-pairs)
| Division | Description | Release 227 (August 2018) | Annual increase (%) [a] |
|---|---|---|---|
| MAM | Other mammals | 6 214 774 850 | 60.47% |
| WGS | Whole genome shotgun data | 3 204 855 013 281 | 42.93% |
| UNA | Unannotated | 296 706 | 42.25% |
| PLN | Plants | 23 027 832 426 | 37.21% |
| BCT | Bacteria | 53 541 127 504 | 36.93% |
| TSA | Transcriptome shotgun data | 225 520 004 678 | 35.01% |
| PHG | Phages | 463 029 085 | 34.38% |
| VRL | Viruses | 4 073 816 676 | 16.99% |
| PAT | Patent sequences | 22 019 723 131 | 14.57% |
| VRT | Other vertebrates | 10 441 689 546 | 12.90% |
| ENV | Environmental samples | 5 818 999 756 | 4.09% |
| HTC | High-throughput cDNA | 721 454 983 | 3.57% |
| PRI | Primates | 8 262 441 252 | 2.96% |
| SYN | Synthetic | 1 192 279 390 | 1.62% |
| GSS | Genome survey sequences | 26 339 143 098 | 1.40% |
| EST | Expressed sequence tags | 42 988 632 150 | 0.82% |
| HTG | High-throughput genomic | 27 770 730 435 | 0.45% |
| ROD | Rodents | 4 534 815 151 | 0.31% |
| STS | Sequence tagged sites | 640 879 986 | 0.00% |
| INV | Invertebrates [b] | 8 597 126 159 | −50.09% |
| TOTAL | All GenBank sequences | 3 677 023 810 243 | 39.52% |
[a] Measured relative to Release 221 (August 2017).
[b] The decrease in INV data resulted from the suppression of 36 nematode-related genomes. See the release notes for Release 227 for more details (ftp.ncbi.nlm.nih.gov/genbank/release.notes/gb227.release.notes).
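The annual-increase column can be inverted to estimate how large each division was at Release 221. The following is a minimal Python sketch, assuming the percentage is defined as (current − previous) / previous × 100, which the table footnote implies but does not state outright. The helper names are hypothetical, and because the published percentages are rounded to two decimals, the results are approximations rather than the actual Release 221 counts.

```python
def parse_grouped_int(text: str) -> int:
    """Parse an integer written with whitespace-separated digit groups."""
    return int("".join(text.split()))


def parse_percent(text: str) -> float:
    """Parse a percentage such as '60.47%'; accepts the Unicode minus sign."""
    return float(text.strip().rstrip("%").replace("\u2212", "-"))


def implied_previous_size(current_bp: int, annual_increase_pct: float) -> float:
    """Assumed definition: pct = (current - previous) / previous * 100.

    Solving for previous gives current / (1 + pct / 100).
    """
    return current_bp / (1.0 + annual_increase_pct / 100.0)


# A few rows copied from the table above (Release 227 size, annual increase).
rows = {
    "TOTAL": ("3 677 023 810 243", "39.52%"),
    "MAM": ("6 214 774 850", "60.47%"),
    "INV": ("8 597 126 159", "\u221250.09%"),
}

for division, (size_text, pct_text) in rows.items():
    current = parse_grouped_int(size_text)
    pct = parse_percent(pct_text)
    previous = implied_previous_size(current, pct)
    print(f"{division}: about {previous:.3e} bp implied for Release 221")
```

A negative percentage, as for INV, yields an implied earlier size larger than the current one, consistent with the suppression of records described in footnote b.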
Top organisms in GenBank (Release 227)
| Organism | Base pairs [a] |
|---|---|
| | 19 752 523 722 |
| | 10 246 475 076 |
| | 6 530 046 440 |
| | 5 431 692 037 |
| | 5 245 788 885 |
| | 5 075 446 882 |
| | 3 237 283 130 |
| | 3 220 757 391 |
| | 3 191 415 637 |
| | 2 836 938 628 |
| | 2 682 391 941 |
| | 2 636 490 116 |
| | 2 590 574 434 |
| | 2 572 291 998 |
| | 2 290 216 303 |
| | 1 836 731 087 |
| | 1 727 115 789 |
| | 1 595 510 956 |
| | 1 456 386 736 |
| | 1 436 247 256 |
[a] Excludes sequences from chloroplasts, mitochondria, metagenomes, uncultured organisms, WGS, TSA, and the CON division.
Selected BLAST nucleotide databases [a]
| Database | Contents |
|---|---|
| nr/nt | Taxonomic GenBank divisions |
| env_nt | ENV division |
| tsa_nt | TSA division |
| wgs | WGS sequences |
| 16SMicrobial | Bacterial and archaeal 16S rRNA |
[a] For more databases, see ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html.