Aron Marchler-Bauer, John B Anderson, Praveen F Cherukuri, Carol DeWeese-Scott, Lewis Y Geer, Marc Gwadz, Siqian He, David I Hurwitz, John D Jackson, Zhaoxi Ke, Christopher J Lanczycki, Cynthia A Liebert, Chunlei Liu, Fu Lu, Gabriele H Marchler, Mikhail Mullokandov, Benjamin A Shoemaker, Vahan Simonyan, James S Song, Paul A Thiessen, Roxanne A Yamashita, Jodie J Yin, Dachuan Zhang, Stephen H Bryant.
Abstract
The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models and to make computational annotation visible upon request. Protein-protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for the presence of conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace, these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.
Year: 2005 PMID: 15608175 PMCID: PMC540023 DOI: 10.1093/nar/gki069
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Pre-calculated or live CD-Search results are readily available for protein sequences in Entrez. Clicking on the colored bars will launch alignment displays that merge the query into the domain alignment model, for further analysis. Domain annotation bars with identical colors have been grouped into sets of ‘related’ domains, indicating that they share many of the sequence intervals hit with significant E-values. Annotation bars colored in gray have been classified as putative multi-domain models and are excluded from domain–domain neighboring. The lower half of the figure displays a graphical representation of a domain family hierarchy, giving the summary for one particular member (cd01366).
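The grouping of hits into sets of ‘related’ domains can be illustrated with a small sketch. The caption says only that related models share many of the sequence intervals hit with significant E-values; it does not give the exact rule. The E-value cutoff (`max_evalue=0.01`), the overlap measure (shared residues over the shorter interval), the `min_overlap=0.5` threshold and the names `DomainHit`, `overlap_fraction` and `group_related` are all hypothetical choices made for illustration, not CDD's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    model: str     # domain model identifier (hypothetical labels in the example)
    start: int     # first query residue covered by the hit (1-based, inclusive)
    end: int       # last query residue covered by the hit (inclusive)
    evalue: float  # E-value of the hit


def overlap_fraction(a: DomainHit, b: DomainHit) -> float:
    """Shared residues divided by the length of the shorter interval (assumed measure)."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def group_related(hits, max_evalue=0.01, min_overlap=0.5):
    """Group significant hits whose query intervals largely coincide (union-find).

    Both thresholds are illustrative assumptions, not values taken from CDD.
    """
    sig = [h for h in hits if h.evalue <= max_evalue]
    parent = list(range(len(sig)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Merge every pair of significant hits that overlap enough.
    for i in range(len(sig)):
        for j in range(i + 1, len(sig)):
            if overlap_fraction(sig[i], sig[j]) >= min_overlap:
                parent[find(i)] = find(j)

    groups = {}
    for i, h in enumerate(sig):
        groups.setdefault(find(i), []).append(h.model)
    # Sorted output so the result does not depend on input order.
    return sorted(sorted(g) for g in groups.values())
```

For example, two significant hits covering residues 10–100 and 15–105 end up in one set, a significant hit at 200–300 forms its own set, and a hit with E-value 0.5 is dropped before grouping. Union-find is used so that relatedness is transitive: if A overlaps B and B overlaps C, all three share one color even when A and C barely overlap.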
Figure 2. Subfamily hierarchy of the Myosin/Kinesin motor domains, the corresponding sequence tree and taxonomy display. One subfamily has been highlighted (KISc_C_terminal), and the highlights are reflected in both the sequence tree and taxonomy view. It is evident that members of this subfamily form a distinct node in this tree calculated by the neighbor-joining algorithm. It is also evident that members of this subfamily span a variety of taxa, suggesting that this particular type of domain was already present in their common ancestor's genome.
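The sequence tree in Figure 2 is described as calculated by the neighbor-joining algorithm. As a rough illustration of that technique, the sketch below is a textbook neighbor-joining implementation (the Saitou and Nei formulation) that returns an unrooted tree in Newick format. The caption does not state which distance measure or implementation CDD used, so the input matrix and the function name `neighbor_joining` are assumptions for illustration only.

```python
import itertools


def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor joining; return it as a Newick string.

    labels: taxon names; dist: symmetric matrix (list of lists) of distances.
    Textbook sketch only, not the implementation used to draw CDD trees.
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    trees = {i: name for i, name in enumerate(labels)}  # active cluster -> Newick fragment
    d = {i: {j: float(dist[i][j]) for j in range(len(labels))}
         for i in range(len(labels))}
    next_id = len(labels)

    while len(trees) > 3:
        active = list(trees)
        n = len(active)
        # Row sums of distances from each active cluster to all others.
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        f, g = min(itertools.combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from f and g to their new parent node u.
        lf = d[f][g] / 2 + (r[f] - r[g]) / (2 * (n - 2))
        lg = d[f][g] - lf
        u = next_id
        next_id += 1
        trees[u] = f"({trees[f]}:{lf:g},{trees[g]}:{lg:g})"
        d[u] = {}
        for k in active:
            if k not in (f, g):
                # Distance from the new node to every remaining cluster.
                d[u][k] = d[k][u] = (d[f][k] + d[g][k] - d[f][g]) / 2
        del trees[f], trees[g]

    # Three clusters remain: attach them to a single central node.
    i, j, k = trees
    li = (d[i][j] + d[i][k] - d[j][k]) / 2
    lj = (d[i][j] + d[j][k] - d[i][k]) / 2
    lk = (d[i][k] + d[j][k] - d[i][j]) / 2
    return f"({trees[i]}:{li:g},{trees[j]}:{lj:g},{trees[k]}:{lk:g});"
```

With a small five-taxon matrix, taxa that are mutually close (and far from the rest) are joined first, which is how a subfamily such as KISc_C_terminal can come to occupy a distinct node of the tree. The Q criterion, rather than raw distance, decides each join, which corrects for taxa that are far from everything.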