Aron Marchler-Bauer, Shennan Lu, John B Anderson, Farideh Chitsaz, Myra K Derbyshire, Carol DeWeese-Scott, Jessica H Fong, Lewis Y Geer, Renata C Geer, Noreen R Gonzales, Marc Gwadz, David I Hurwitz, John D Jackson, Zhaoxi Ke, Christopher J Lanczycki, Fu Lu, Gabriele H Marchler, Mikhail Mullokandov, Marina V Omelchenko, Cynthia L Robertson, James S Song, Narmada Thanki, Roxanne A Yamashita, Dachuan Zhang, Naigong Zhang, Chanjuan Zheng, Stephen H Bryant.
Abstract
NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
Year: 2010 PMID: 21109532 PMCID: PMC3013737 DOI: 10.1093/nar/gkq1189
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Conserved domain annotation on a well-characterized protein sequence. Shown here is the default concise view generated by the CD-Search tool, using pre-calculated alignment information. The view is divided into two panels: a graphical summary and a table detailing the individual matches. The query sequence coordinates are indicated on a gray bar in the top portion of the graphical summary. ‘Specific hits’ to NCBI-curated domain models are positioned in a separate area below the query sequence, with corresponding balloons rendered in saturated colors. The extent of the best-scoring hit for a region on the query also determines the annotation with the corresponding conserved domain ‘Superfamily’. ‘Superfamilies’ are positioned in the area below the ‘Specific hits’, and together these are enclosed in boxes to indicate superfamily membership of the NCBI-curated models. If the full (detailed) results display is selected, an area summarizing ‘Non-specific hits’ will be shown as well, and the corresponding boxes will be drawn so as to resolve their superfamily relationships; the highest ranked match for each superfamily defines the extents of the corresponding box. ‘Non-specific hits’ and ‘Superfamily’ balloons are rendered in pastel colors, with each superfamily being assigned a separate color. Matches to ‘multi-domain’ models are rendered as gray balloons in a separate area of the summary graph. Only the best-ranked non-overlapping multi-domain models are shown. Functional sites, as annotated on NCBI-curated domain models, are mapped to the query sequence and depicted as triangles. Sites are mapped from the highest ranked model only, and they are colored according to their source.
Both conserved domain balloons and site annotations are hot-linked, so that moving the mouse over the objects displays additional information, and so that clicking on the objects launches conserved domain summary pages for the particular domain model, embedding the user query sequence in the alignment for further analysis, if applicable. A tabular view below the graphical summary lists E-values, multi-domain status and various identifiers for the conserved domain models identified as matches. The table rows can be expanded to display a detailed pair-wise sequence alignment between the query sequence and the domain model’s consensus sequence. An alignment of all sequences comprising a domain model, with or without the query sequence embedded, is accessible by clicking on the domain’s balloon representation in the graphical summary or its unique accession in the tabular summary, respectively.
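The caption states that only the best-ranked non-overlapping multi-domain models are shown. A minimal sketch of one way such a selection could work is given below. It is not CDD's actual implementation: it assumes that hits are ranked by E-value (lower is better) and that two hits overlap if they share at least one query residue. The `Hit` class, its field names and the sample accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str  # hypothetical model identifier
    start: int      # first query residue covered (inclusive)
    end: int        # last query residue covered (inclusive)
    evalue: float   # assumed ranking criterion: lower is better


def overlaps(a: Hit, b: Hit) -> bool:
    # Assumption: any shared residue counts as an overlap.
    return a.start <= b.end and b.start <= a.end


def best_non_overlapping(hits):
    """Greedily keep the best-ranked hits that do not overlap a kept hit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    # Return the kept hits in query order for display.
    return sorted(kept, key=lambda h: h.start)


# Hypothetical hits on a single query sequence.
example_hits = [
    Hit("modelA", 10, 200, 1e-50),
    Hit("modelB", 150, 300, 1e-20),
    Hit("modelC", 210, 400, 1e-10),
]
selected = best_non_overlapping(example_hits)
print([h.accession for h in selected])
```

In this example `modelB` is dropped because it overlaps the better-ranked `modelA`, while `modelC` is kept because it starts after `modelA` ends. A different overlap threshold or ranking score would change the result.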
Figure 2. The web interface to Batch CD-Search. An input dialogue lets the user specify a set of protein queries or upload a corresponding file. The preliminary results page (not shown here) provides controls for downloading results in a variety of formats. The sample download format featured here lists one annotation per line, specifying the protein query, the type of domain hit (specific hit, superfamily or multidomain), from–to intervals on the query, E-value and score, and the domain model’s name and accession. The Batch CD-Search help document describes the additional download options and formats available.
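A one-annotation-per-line download like the one described in the caption is straightforward to load for further processing. The sketch below is illustrative only. It assumes a tab-delimited file whose columns follow the order in which the caption lists them (query, hit type, from, to, E-value, score, name, accession) and that comment lines start with `#`. The real column order and delimiter should be taken from the Batch CD-Search help document. The `Annotation` type and the sample rows are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Annotation(NamedTuple):
    query: str
    hit_type: str   # e.g. specific hit, superfamily or multidomain
    start: int      # "from" position on the query
    end: int        # "to" position on the query
    evalue: float
    score: float
    name: str
    accession: str


def parse_annotations(text: str):
    """Parse one annotation per line; column order is an assumption."""
    annotations = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and (assumed) '#' comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, hit_type, start, end, evalue, score, name, accession = row[:8]
        annotations.append(
            Annotation(query, hit_type, int(start), int(end),
                       float(evalue), float(score), name, accession)
        )
    return annotations


# Hypothetical sample rows, not real CDD output.
SAMPLE = (
    "# query\ttype\tfrom\tto\tevalue\tscore\tname\taccession\n"
    "query1\tspecific\t5\t120\t1e-30\t110.5\texample_domain\tACC_A\n"
    "query1\tsuperfamily\t5\t120\t1e-30\t110.5\texample_superfamily\tACC_B\n"
)
annotations = parse_annotations(SAMPLE)
for a in annotations:
    print(a.query, a.hit_type, a.start, a.end, a.accession)
```

Keeping each line as a typed record makes it easy to filter by hit type or group annotations by query afterwards.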
URLs and other resources associated with the CDD project
| CDD | Database home page | |
| CDD help | CDD help documentation | |
| CDD FTP | CD models and alignments, pre-built search databases | |
| CD-Search | Live and pre-computed RPS-BLAST | |
| Batch CD-Search | Live and pre-computed RPS-BLAST | |
| CDTree/Cn3D | Domain hierarchy viewer and editor | |
| rpsblast | Stand-alone tool for searching databases of profile models, part of the NCBI toolkit distribution | executables can be obtained from: |