Kim D Pruitt, Tatiana Tatusova, William Klimke, Donna R Maglott.
Abstract
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
Year: 2008 PMID: 18927115 PMCID: PMC2686572 DOI: 10.1093/nar/gkn721
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Annual growth of the RefSeq ftp release
| Release date | June 2003 | July 2004 | July 2005 | July 2006 | July 2007 | July 2008 |
|---|---|---|---|---|---|---|
| Release number | 1 | 6 | 12 | 18 | 24 | 30 |
| Number of species | 2005 | 2467 | 2969 | 3695 | 4511 | 5395 |
| Annual percent growth | 23 | 20 | 24 | 22 | 20 | |
| Number of proteins | 785 143 | 1 050 975 | 1 695 929 | 2 762 164 | 3 866 210 | 5 590 364 |
| Annual percent growth | 34 | 61 | 63 | 40 | 45 | |
(a) RefSeq statistics are reported in the release notes provided for each release (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-notes/) and archives are also available (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-notes/archive/).
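The "Annual percent growth" rows can be reproduced from the counts in the table. The sketch below assumes each growth value is the change from one column's release to the next, rounded to a whole percent; the function name `annual_percent_growth` is illustrative and not part of any RefSeq tooling.

```python
def annual_percent_growth(counts):
    """Return growth between consecutive counts, rounded to a whole percent."""
    return [
        round(100 * (later - earlier) / earlier)
        for earlier, later in zip(counts, counts[1:])
    ]


# Counts copied from the table above (releases 1, 6, 12, 18, 24, 30).
species = [2005, 2467, 2969, 3695, 4511, 5395]
proteins = [785143, 1050975, 1695929, 2762164, 3866210, 5590364]

print(annual_percent_growth(species))   # [23, 20, 24, 22, 20]
print(annual_percent_growth(proteins))  # [34, 61, 63, 40, 45]
```

Under this assumption each value sits in the column of the earlier release, which is why the July 2008 column has no growth figure.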
Annual growth of curated records
| | Total | Microbial | Mammalian |
|---|---|---|---|
| Release 24 (July 2007) | 210 503 | 113 640 | 27 069 |
| Release 30 (July 2008) | 265 002 | 156 834 | 34 027 |
| Annual growth (%) | 26 | 38 | 26 |
Percent curation of release 30 per taxonomic group
| Release node | Curated species | Curated proteins (%) |
|---|---|---|
| Complete | 41 | 5 |
| Fungi | 20 | 2 |
| Invertebrate | 89 | 20 |
| Microbial | 36 | 4 |
| Plant | 32 | 2 |
| Protozoa | 16 | 0 |
| Vertebrate_mammalian | 86 | 11 |
| Vertebrate_other | 92 | 10 |
| Viral | 17 | 10 |
(a) The total number of species per RefSeq release node is calculated by counting distinct NCBI tax_ids annotated for all RefSeq records available in that node.
(b) Records for Drosophila species are tracked as curated by FlyBase by default.
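The species count described in the first footnote (distinct NCBI tax_ids per release node) can be sketched as follows. This is a minimal illustration, not NCBI's actual pipeline: the `(node, tax_id)` record layout, the sample records and the function name `species_per_node` are all assumptions made for the example.

```python
from collections import defaultdict


def species_per_node(records):
    """Count distinct tax_ids per release node from (node, tax_id) pairs."""
    tax_ids_by_node = defaultdict(set)
    for node, tax_id in records:
        # A set keeps each tax_id once, however many records carry it.
        tax_ids_by_node[node].add(tax_id)
    return {node: len(ids) for node, ids in tax_ids_by_node.items()}


# Hypothetical records: several records may share one tax_id.
records = [
    ("vertebrate_mammalian", 9606),
    ("vertebrate_mammalian", 9606),
    ("vertebrate_mammalian", 10090),
    ("viral", 11676),
]

print(species_per_node(records))
```

Counting set members rather than records means a species with many RefSeq records still contributes one to its node's total.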