Kim D Pruitt, Tatiana Tatusova, Donna R Maglott.
Abstract
NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources when additional data are available from those sources, and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.
Year: 2006 PMID: 17130148 PMCID: PMC1716718 DOI: 10.1093/nar/gkl842
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Size of RefSeq release 19 per category
| Release 19, node | No. of species | Genomic records | RNA records | Protein records |
|---|---|---|---|---|
| Complete | 3774 | 725 746 | 686 689 | 2 879 860 |
| Fungi | 69 | 3957 | 114 598 | 121 302 |
| Invertebrate | 231 | 212 698 | 99 939 | 101 902 |
| Microbial | 917 | 35 877 | 0 | 2 109 125 |
| Mitochondrion | 969 | 977 | 0 | 14 486 |
| Plant | 67 | 313 | 66 512 | 76 503 |
| Plasmid | 475 | 908 | 0 | 45 744 |
| Plastid | 71 | 71 | 44 | 7079 |
| Protozoa | 70 | 63 473 | 110 128 | 119 128 |
| Vertebrate_mammalian | 180 | 344 435 | 260 668 | 244 632 |
| Vertebrate_other | 459 | 62 454 | 54 092 | 60 049 |
| Viral | 1743 | 2515 | 0 | 49 598 |
Figure 1. Entrez query results include records from RefSeq and GenBank (nucleotide queries) or GenPept (protein queries). (A) Users who register for MyNCBI can log on to access several services including customizing results displays. The display illustrates that user pruitt is logged in to MyNCBI. (B) Results are categorized into Tabs. The query for ‘adenylosuccinate lyase’ returns a total of 1545 records (first tab), 715 of which are RefSeq records (last tab). The display illustrates that additional tabs were added to the display to report result subsets for Bacteria and for proteins that have links to the NCBI Map Viewer. (C) Numerous links are calculated between records and can be accessed via the default ‘Links’ menu, or as shown here, the complete set of links can be shown for each record by selecting the option to display links as ‘Plain Links’ in MyNCBI. The link to ‘PubMed (RefSeq)’ returns all publications that are associated with the Entrez Gene record and thus may include a more comprehensive bibliography than that annotated on the RefSeq record.
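The total-versus-RefSeq split described in panel (B) can also be approximated programmatically. The sketch below is not part of the original figure: it assumes the NCBI E-utilities ESearch endpoint and the `srcdb_refseq[PROP]` Entrez filter, and the helper name `build_refseq_query` is hypothetical. It only constructs the query URLs and sends no request; counts returned today would differ from the release 19 figures quoted in the caption.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint (not mentioned in the caption).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_query(term, db="protein", refseq_only=True):
    """Build an ESearch URL for a term, optionally limited to RefSeq records.

    Hypothetical helper for illustration; it builds the URL and does not call NCBI.
    """
    query = term
    if refseq_only:
        # srcdb_refseq[PROP] restricts Entrez results to RefSeq records.
        query = f"({term}) AND srcdb_refseq[PROP]"
    # retmax=0 asks only for the hit count, not the record identifiers.
    params = {"db": db, "term": query, "retmode": "json", "retmax": 0}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Same query as in the figure: all records versus the RefSeq subset.
all_url = build_refseq_query("adenylosuccinate lyase", refseq_only=False)
refseq_url = build_refseq_query("adenylosuccinate lyase")
print(all_url)
print(refseq_url)
```

Fetching both URLs and comparing the reported counts would give the same kind of breakdown as the first and last tabs in the figure.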
Number of curated protein records for select subsets
| | Total | Bacteria | Plant | Viral | Coelomata^a (no.) | Human | Mouse |
|---|---|---|---|---|---|---|---|
| No. of records^b | 2 762 164 | 1 990 849 | 72 696 | 48 799 | 78 550 | 24 874 | 19 629 |
| No. of curated | 208 783 | 120 230 | 2398 | 6472 | 20 119 | 16 049 | 2390 |
| % Curated | 7.56 | 6.04 | 3.3 | 13.26 | 25.61 | 64.52 | 12.18 |
^a Transcript and protein records are curated independently of submitted annotated genomes by NCBI staff for the following organisms: Tribolium castaneum, Bombyx mori, Apis mellifera, Strongylocentrotus purpuratus, Ciona intestinalis, Danio rerio, Xenopus tropicalis, Gallus gallus, Macaca mulatta, Pan troglodytes, Homo sapiens, Canis familiaris, Felis catus, Sus scrofa, Bos taurus, Ovis aries, Mus musculus, Rattus norvegicus, Monodelphis domestica and Takifugu rubripes.
^b Curation counts per category (columns) reflect the total curation effort as contributed by either collaborating groups or NCBI staff. Curated records are annotated with a status of VALIDATED or REVIEWED.
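The "% Curated" row follows directly from the two count rows of the curated-records table. As a minimal check, assuming the percentage is simply curated records divided by total records per column, the counts can be recomputed as follows (the dictionary and function names are illustrative only):

```python
# Counts copied from the curated-records table (one entry per column).
records = {
    "Total": 2762164, "Bacteria": 1990849, "Plant": 72696, "Viral": 48799,
    "Coelomata": 78550, "Human": 24874, "Mouse": 19629,
}
curated = {
    "Total": 208783, "Bacteria": 120230, "Plant": 2398, "Viral": 6472,
    "Coelomata": 20119, "Human": 16049, "Mouse": 2390,
}


def percent_curated(n_curated, n_records):
    """Return curated records as a percentage of all records, to two decimals."""
    return round(100.0 * n_curated / n_records, 2)


pct = {name: percent_curated(curated[name], records[name]) for name in records}
for name, value in pct.items():
    print(f"{name}: {value:.2f}%")
```

The recomputed values agree with the published row to two decimal places, with human records showing by far the highest curated fraction.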