Eric W Sayers, Mark Cavanaugh, Karen Clark, Kim D Pruitt, Conrad L Schoch, Stephen T Sherry, Ilene Karsch-Mizrachi.
Abstract
GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 15.3 trillion base pairs from over 2.5 billion nucleotide sequences for 504 000 formally described species. Recent updates include resources for data from the SARS-CoV-2 virus, including a SARS-CoV-2 landing page, NCBI Datasets, NCBI Virus and the Submission Portal. We also discuss upcoming changes to GI identifiers, a new data management interface for BioProject, and advice for providing contextual metadata in submissions. Published by Oxford University Press on behalf of Nucleic Acids Research 2021.
Year: 2022; PMID: 34850943; PMCID: PMC8690257; DOI: 10.1093/nar/gkab1135
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. SARS-CoV-2 Data Hub in the NCBI Virus resource.
Growth of GenBank Divisions
| Division | Description | Base pairs (a) | Annual increase (b) |
|---|---|---|---|
| VRL | Viruses | 39 351 597 469 | 575.68% |
| UNA | Unannotated | 4 421 782 | 550.93% |
| INV | Invertebrates | 108 680 334 593 | 450.00% |
| ROD | Rodents | 23 336 550 435 | 93.02% |
| PRI | Primates | 15 165 437 356 | 72.97% |
| WGS | Whole genome shotgun data | 13 888 187 863 722 | 57.08% |
| TLS | Targeted Loci Studies | 39 930 167 315 | 43.50% |
| MAM | Other mammals | 28 568 850 588 | 37.06% |
| VRT | Other vertebrates | 85 320 979 451 | 34.22% |
| BCT | Bacteria | 130 518 385 589 | 32.07% |
| PLN | Plants | 350 590 744 188 | 30.12% |
| TSA | Transcriptome shotgun data | 454 757 992 932 | 19.31% |
| PHG | Phages | 935 884 237 | 19.59% |
| PAT | Patent sequences | 29 588 418 021 | 11.85% |
| ENV | Environmental samples | 7 394 414 660 | 9.46% |
| SYN | Synthetic | 7 994 601 379 | 0.78% |
| HTC | High-throughput cDNA | 737 423 641 | 0.57% |
| HTG | High-throughput genomic | 27 800 219 072 | 0.07% |
| EST | Expressed sequence tags | 43 324 455 796 | 0.05% |
| GSS | Genome survey sequences | 26 380 049 011 | 0.01% |
| STS | Sequence tagged sites | 640 923 137 | 0.00% |
| TOTAL | All GenBank sequences | 15 309 209 714 374 | 54.79% |
(a) Release 245 (8/2021).
(b) Relative to release 239 (8/2020).
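
The arithmetic behind the table can be checked with a short sketch. It assumes that the annual increase is computed as (current − previous) / previous × 100, which the footnote implies but does not state. The names `parse_bp`, `implied_previous`, `DIVISION_BP` and `TOTAL_BP` are illustrative only and are not part of any NCBI tool.

```python
# Illustrative consistency check for the division table (release 245).
# All helper names are hypothetical; the figures are copied from the table.

def parse_bp(text: str) -> int:
    """Convert a count written with space-separated digit groups to an int."""
    return int(text.replace(" ", ""))

# Base pairs per GenBank division, as listed in the table.
DIVISION_BP = {
    "VRL": parse_bp("39 351 597 469"),
    "UNA": parse_bp("4 421 782"),
    "INV": parse_bp("108 680 334 593"),
    "ROD": parse_bp("23 336 550 435"),
    "PRI": parse_bp("15 165 437 356"),
    "WGS": parse_bp("13 888 187 863 722"),
    "TLS": parse_bp("39 930 167 315"),
    "MAM": parse_bp("28 568 850 588"),
    "VRT": parse_bp("85 320 979 451"),
    "BCT": parse_bp("130 518 385 589"),
    "PLN": parse_bp("350 590 744 188"),
    "TSA": parse_bp("454 757 992 932"),
    "PHG": parse_bp("935 884 237"),
    "PAT": parse_bp("29 588 418 021"),
    "ENV": parse_bp("7 394 414 660"),
    "SYN": parse_bp("7 994 601 379"),
    "HTC": parse_bp("737 423 641"),
    "HTG": parse_bp("27 800 219 072"),
    "EST": parse_bp("43 324 455 796"),
    "GSS": parse_bp("26 380 049 011"),
    "STS": parse_bp("640 923 137"),
}

# The TOTAL row of the table.
TOTAL_BP = parse_bp("15 309 209 714 374")


def implied_previous(current_bp: int, pct_increase: float) -> float:
    """Estimate the earlier count, assuming
    pct_increase = (current - previous) / previous * 100."""
    return current_bp / (1 + pct_increase / 100)


if __name__ == "__main__":
    # Do the per-division counts add up to the TOTAL row?
    print("divisions sum to total:", sum(DIVISION_BP.values()) == TOTAL_BP)
    # Approximate release 239 total implied by the 54.79% annual increase.
    print(f"implied release 239 total: {implied_previous(TOTAL_BP, 54.79):.3e} bp")
```

With the figures as transcribed here, the division counts sum exactly to the TOTAL row. The implied release 239 total is only approximate, because the percentages in the table are rounded to two decimal places.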
Figure 2. Growth of SARS-CoV-2 sequence data in GenBank. Each data point represents the cumulative number of records (left axis) or base pairs (right axis) at each date.