Catherine M Farrell, Nuala A O'Leary, Rachel A Harte, Jane E Loveland, Laurens G Wilming, Craig Wallin, Mark Diekhans, Daniel Barrell, Stephen M J Searle, Bronwen Aken, Susan M Hiatt, Adam Frankish, Marie-Marthe Suner, Bhanu Rajput, Charles A Steward, Garth R Brown, Ruth Bennett, Michael Murphy, Wendy Wu, Mike P Kay, Jennifer Hart, Jeena Rajan, Janet Weber, Catherine Snow, Lillian D Riddick, Toby Hunt, David Webb, Mark Thomas, Pamela Tamez, Sanjida H Rangwala, Kelly M McGarvey, Shashikant Pujar, Andrei Shkeda, Jonathan M Mudge, Jose M Gonzalez, James G R Gilbert, Stephen J Trevanion, Robert Baertsch, Jennifer L Harrow, Tim Hubbard, James M Ostell, David Haussler, Kim D Pruitt.
Abstract
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
Year: 2013 PMID: 24217909 PMCID: PMC3965069 DOI: 10.1093/nar/gkt1059
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. CCDS release information for human and mouse
| Species | CCDS release | NCBI annotation release | Ensembl annotation release | Assembly name | Assembly ID | CCDS release date (DD-MM-YYYY) |
|---|---|---|---|---|---|---|
| Human | 1 | 35.1 | 23 | NCBI35 | GCF_000001405.11 | 02-03-2005 |
| Human | 3 | 36.2 | 41 | NCBI36 | GCF_000001405.12 | 26-02-2007 |
| Human | 5 | 36.3 | 47 | NCBI36 | GCF_000001405.12 | 30-04-2008 |
| Human | 6 | 37.1 | 55 | GRCh37 | GCF_000001405.13 | 02-09-2009 |
| Human | 8 | 37.2 | 62 | GRCh37.p2 | GCF_000001405.14 | 20-04-2011 |
| Human | 9 | 37.3 | 64 | GRCh37.p5 | GCF_000001405.17 | 07-09-2011 |
| Human | 11 | 103 | 68 | GRCh37.p9 | GCF_000001405.21 | 25-10-2012 |
| Human | 12 | 104 | 71 | GRCh37.p10 | GCF_000001405.22 | 30-04-2013 |
| Mouse | 2 | 36.1 | 39 | MGSCv36 | GCF_000001635.15 | 10-10-2006 |
| Mouse | 4 | 37.1 | 47 | MGSCv37 | GCF_000001635.16 | 28-11-2007 |
| Mouse | 7 | 37.2 | 61 | MGSCv37 | GCF_000001635.18 | 24-01-2011 |
| Mouse | 10 | 38.1 | 68 | GRCm38 | GCF_000001635.20 | 14-08-2012 |
| Mouse | 13 | 103 | 72 | GRCm38.p1 | GCF_000001635.21 | 05-08-2013 |
Figure 1. CCDS release statistics for human and mouse. The Y-axis indicates counts of CCDS IDs or Gene IDs and the X-axis shows CCDS release dates. (A) Growth in the number of CCDS IDs at each release date (Table 1) compared with the number of Gene IDs with at least one protein isoform in the CCDS dataset. (B) Growth in the number of Gene IDs with more than one protein isoform in the CCDS dataset. All data used to generate the graphs are available in the CCDS Releases and Statistics page (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS) on the website.
Figure 2. CCDS database screenshot showing the partial report page for CCDS59435. This CCDS ID is associated with the ‘Inferred exon combination’ attribute, as explained in the Public Note. The exon locations table is partially visible in the Chromosomal Locations section. The blue ‘N’ icon circled in red links to a graphical view of the entire genomic region in NCBI’s Nucleotide database (boxed inset image). The blue ‘N’ icons boxed in red link to the sequences of each individual exon in the Nucleotide database.
Attribute types currently found in CCDS reports
| Attribute | Description | Count | CCDS example |
|---|---|---|---|
| CDS uses downstream AUG | The annotated start codon is not the first AUG found in-frame with the CDS | 482 | 816.1 |
| Contains selenocysteine | A UGA codon encodes a selenocysteine residue instead of resulting in translation termination | 56 | 42457.1 |
| Inferred exon combination | The CDS exon combination lacks full-length transcript support in INSDC databases | 160 | 44873.1 |
| NonAUG initiation codon | The annotated start codon is not AUG | 73 | 4907.2 |
| Nonsense-mediated decay (NMD) candidate | The transcript may escape NMD | 152 | 37108.1 |
| Ribosomal slippage (translational frameshift) | The CDS contains an experimentally verified translational frameshift due to ribosomal slippage | 5 | 58639.1 |
Counts reflect data available as of 30 August 2013.
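
Three of the attribute definitions in the table above are purely sequence-based, so they can be illustrated with a small sketch. The Python below is not the CCDS collaboration's method; the function names and toy sequences are hypothetical, sequences are written as DNA (ATG for AUG, TGA for UGA), and the upstream search assumes that an upstream ATG only counts if no in-frame stop codon separates it from the annotated start.

```python
# Illustrative only: toy checks for three attribute definitions from the table.
# Function names and example sequences are hypothetical, not CCDS code.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def has_non_aug_start(cds):
    """'NonAUG initiation codon': the annotated start codon is not ATG."""
    return codons(cds)[0] != "ATG"


def uses_downstream_aug(transcript, cds_start):
    """'CDS uses downstream AUG': an in-frame ATG lies upstream of the annotated start.

    cds_start is the 0-based offset of the annotated start codon in the transcript.
    Assumption: the upstream walk ends at the first in-frame stop codon.
    """
    transcript = transcript.upper()
    pos = cds_start - 3
    while pos >= 0:
        codon = transcript[pos:pos + 3]
        if codon in STOP_CODONS:
            return False
        if codon == "ATG":
            return True
        pos -= 3
    return False


def internal_tga_positions(cds):
    """Candidate selenocysteine sites: 0-based codon indices of in-frame TGA
    codons that occur before the final codon of the CDS."""
    return [i for i, c in enumerate(codons(cds)[:-1]) if c == "TGA"]


if __name__ == "__main__":
    print(has_non_aug_start("CTGGCCTAA"))
    print(uses_downstream_aug("ATGAAAATGGCCTAA", 6))
    print(internal_tga_positions("ATGTGAGCCTAA"))
```

An internal in-frame TGA only marks a candidate site; whether it encodes selenocysteine, like the other attributes in the table, is a curation decision that depends on evidence beyond the sequence itself.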