Aron Marchler-Bauer, John B Anderson, Farideh Chitsaz, Myra K Derbyshire, Carol DeWeese-Scott, Jessica H Fong, Lewis Y Geer, Renata C Geer, Noreen R Gonzales, Marc Gwadz, Siqian He, David I Hurwitz, John D Jackson, Zhaoxi Ke, Christopher J Lanczycki, Cynthia A Liebert, Chunlei Liu, Fu Lu, Shennan Lu, Gabriele H Marchler, Mikhail Mullokandov, James S Song, Asba Tasneem, Narmada Thanki, Roxanne A Yamashita, Dachuan Zhang, Naigong Zhang, Stephen H Bryant.
Abstract
NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either 'specific' (identifying molecular function with high confidence) or as 'non-specific' (identifying superfamily membership only).
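The distinction between 'specific' and 'non-specific' annotation, and the summary at the superfamily level, can be illustrated with a small Python sketch. It is not CDD's implementation: the `Hit` record, the per-model bit-score `thresholds` table used to decide specificity, and the rule that the lowest-E-value hit represents its superfamily are all assumptions made for illustration, and the model and superfamily names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One domain-model match on a query protein (illustrative fields only)."""
    model: str        # domain model identifier (hypothetical names below)
    superfamily: str  # superfamily the model is assumed to belong to
    start: int        # first aligned query residue, 1-based
    end: int          # last aligned query residue, inclusive
    evalue: float     # lower is better
    bitscore: float   # higher is better


def classify(hit: Hit, thresholds: dict[str, float]) -> str:
    """Flag a hit as 'specific' or 'non-specific'.

    Assumption: a model can give a specific hit only if it has a per-model
    bit-score cutoff and the hit reaches it; otherwise the hit is taken to
    indicate superfamily membership only.
    """
    cutoff = thresholds.get(hit.model)
    if cutoff is not None and hit.bitscore >= cutoff:
        return "specific"
    return "non-specific"


def summarize_by_superfamily(hits: list[Hit]) -> dict[str, Hit]:
    """Keep the best-ranked (lowest E-value) hit for each superfamily."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.superfamily)
        if current is None or hit.evalue < current.evalue:
            best[hit.superfamily] = hit
    return best


# Hypothetical hits on one query sequence.
hits = [
    Hit("modelA", "sf1", 10, 120, 1e-30, 110.0),
    Hit("modelB", "sf1", 12, 118, 1e-20, 80.0),
    Hit("modelC", "sf2", 150, 260, 1e-5, 40.0),
]
thresholds = {"modelA": 100.0, "modelC": 60.0}  # hypothetical cutoffs

labels = {h.model: classify(h, thresholds) for h in hits}
summary = summarize_by_superfamily(hits)
print(labels)
print({sf: h.model for sf, h in summary.items()})
```

With these made-up values, only modelA is flagged specific: modelB has no cutoff and modelC falls below its cutoff. The two redundant sf1 models collapse to a single superfamily entry represented by modelA.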
Year: 2008 PMID: 18984618 PMCID: PMC2686570 DOI: 10.1093/nar/gkn845
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. CDD-based annotation on a recently predicted protein sequence. This summary is the default concise version of the annotation view generated by CD-Search from precalculated alignment information. The view is divided into two panels: a graphical summary (items a through d) and a table detailing the matches (items e and f). The query sequence is represented as a gray bar at the top of the graphical summary, with a ruler indicating sequence length and coordinates. (a) ‘Specific hits’ to NCBI-curated domain models are shown in a separate area below the query sequence, and the corresponding balloons are rendered in bright colors. The extents of these hits also define the annotation with conserved domain ‘Superfamilies’, which is shown in the area below the ‘Specific hits’ and enclosed in boxes to indicate superfamily relationships. If the full display is selected, an area summarizing ‘Non-specific hits’ is shown as well, and the boxes are drawn to resolve superfamily relationships, with the highest-ranked match in each superfamily defining the extent of the corresponding box. ‘Non-specific hits’ and ‘Superfamilies’ balloons are rendered in pastel colors, and each homologous superfamily is assigned its own color. (b) If a region of the query has no ‘Specific hits’, only the ‘Superfamilies’ annotation is shown in the concise default display. If a match to a conserved domain model is incomplete, as in this case, the balloon is drawn with a jagged edge to indicate the missing region. (c) In the default concise display, matches to multi-domain models are rendered as gray balloons in a separate area of the summary graph. Only the best-ranked non-overlapping multi-domain models are shown. (d) Functional sites annotated on NCBI-curated domain models are mapped to the query sequence. Sites are mapped from the highest-ranked model only and are colored to match their source model.
When no ‘Specific hits’ are available, as in (b), sites may still be mapped if they have been annotated on the parent model of a hierarchy that gave a ‘Non-specific hit’. Both conserved domain balloons and site annotations are hot-linked: moving the mouse over an object displays a pop-up with additional information, and clicking on it opens a summary page for that domain model, with the user's query sequence embedded in the alignment for further analysis where applicable. (e) A table view summarizes the same matches as the graphical view, listing E-values, multi-domain status and various identifiers for the matching conserved domain models. The table rows can be expanded (f) to display the detailed sequence alignment between the query and the domain model's consensus sequence. An alignment of all sequences comprising a domain model, with or without the query sequence embedded, is accessible by clicking on the domain's balloon in the graphical summary or on its unique numerical identifier (PSSM-Id) in the tabular summary, respectively.
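The rule in item (c), that only the best-ranked non-overlapping multi-domain models are shown, can be sketched as a greedy interval selection. This is an illustrative assumption rather than CD-Search's actual code: matches are ranked by E-value alone, any shared query residue counts as an overlap, and the `Match` record and model names are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """A multi-domain model match on the query (illustrative fields only)."""
    model: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    evalue: float


def overlaps(a: Match, b: Match) -> bool:
    """True if the two query intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def best_nonoverlapping(matches: list[Match]) -> list[Match]:
    """Keep matches in order of increasing E-value, skipping any that
    overlap an already kept match; return the kept ones in query order."""
    kept: list[Match] = []
    for candidate in sorted(matches, key=lambda x: x.evalue):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda x: x.start)


# Hypothetical multi-domain matches on one query.
matches = [
    Match("multiA", 1, 300, 1e-50),
    Match("multiB", 200, 400, 1e-60),
    Match("multiC", 420, 600, 1e-10),
]
shown = best_nonoverlapping(matches)
print([m.model for m in shown])
```

Processing matches from best to worst E-value means a better-scoring match always wins a conflict. With the made-up values above, multiB displaces the overlapping multiA, while multiC is kept because it does not overlap multiB.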
| Resource | Description | URL |
|---|---|---|
| CDD | Database home page | http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml |
| CDD help | CDD help documentation | |
| CDD FTP | CD models, prebuilt search databases | ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd |
| CD-Search | Live and precomputed RPS-BLAST | http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi |
| CDTree/Cn3D | Domain hierarchy viewer and editor | |
| rpsblast | Stand-alone tool for searching databases of profile models, part of the NCBI toolkit distribution | ftp://ftp.ncbi.nlm.nih.gov/toolbox; executables can be obtained from: |