Maria Novatchkova, Michael Wildpaner, Dieter Schweizer, Frank Eisenhaber.
Abstract
The analysis of taxonomic distribution and lineage-specific variation of domains and domain combinations is an important step in the assessment of their functional roles and potential interoperability. In the study of eukaryote sequence sets with many multi-domain proteins, it can become laborious to evaluate the phylogenetic context of the many occurring domains and their mutual relationships. PhyloDome addresses that problem. It provides a fast overview of the taxonomic spread and potential interrelation of domains that are either given as a list of names and PFAM/SMART accessions or derived from a user-defined set of sequences. This taxonomic distribution analysis can be helpful in protein function and interaction assignment, as the comparative study of potential Hedgehog pathway members in C. elegans shows. An implementation of PhyloDome is accessible for public use as a WWW service at http://mendel.imp.univie.ac.at/phylodome/. Software components are available on request.
Year: 2005 PMID: 15980439 PMCID: PMC1160134 DOI: 10.1093/nar/gki373
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. PhyloDome representation of sequence sets exemplified by the analysis of Hedgehog pathway proteins in D. melanogaster and C. elegans. The Hint and Patched domains, which are implicated in cholesterol modification and sensing, are over-represented in worm, as indicated by the magenta fraction in the domain representation, and are detectable throughout metazoans. In contrast, the Hedgehog signaling (HH signal) and SUFU domains, present in coelomata, and the Ground domain in worm are found in taxonomically exclusive groups. This suggests that not Hedgehog signaling itself, but cholesterol signaling is shared between D. melanogaster and C. elegans Hedgehog and Patched homologs. Domains are shown as bars divided into colored areas proportional to their occurrence in the proteomes of fully sequenced eukaryote genomes. The color code is repeated in the tabulated phylogenetic profiles: Gt, Guillardia theta; Pf, Plasmodium falciparum; Ch, Cryptosporidium hominis; Tg, Toxoplasma gondii; Tb, Trypanosoma brucei; Tc, Trypanosoma cruzi; Lm, Leishmania major; Cm, Cyanidioschyzon merolae; Cr, Chlamydomonas reinhardtii; At, Arabidopsis thaliana; Os, Oryza sativa; Ec, Encephalitozoon cuniculi; Sp, Schizosaccharomyces pombe; En, Emericella nidulans; Nc, Neurospora crassa; Sc, Saccharomyces cerevisiae; Cg, Candida glabrata; Ce, Caenorhabditis elegans; Cb, Caenorhabditis briggsae; Am, Apis mellifera; Ag, Anopheles gambiae; Dm, Drosophila melanogaster; Ci, Ciona intestinalis; Mm, Mus musculus; Hs, Homo sapiens.
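The proportional bar representation described in the Figure 1 caption can be illustrated with a minimal sketch. The function name `occurrence_fractions`, the example counts, and the simple share-of-total normalization are assumptions made for illustration; the record does not state how PhyloDome itself normalizes domain occurrences.

```python
def occurrence_fractions(counts):
    """Return each species' share of all occurrences of one domain.

    `counts` maps a species abbreviation (as in the Figure 1 legend)
    to the number of proteins carrying the domain in that proteome.
    """
    total = sum(counts.values())
    if total == 0:
        # Domain not found anywhere: every share is zero.
        return {species: 0.0 for species in counts}
    return {species: n / total for species, n in counts.items()}


# Hypothetical counts for illustration only (not data from the paper).
example_counts = {"Ce": 6, "Cb": 4, "Dm": 1, "Mm": 3, "Hs": 2}
fractions = occurrence_fractions(example_counts)

# Each share would correspond to the width of one colored bar segment.
for species, share in fractions.items():
    print(f"{species}: {share:.3f}")
```

With counts like these, the two Caenorhabditis species together would occupy more than half of the bar, which is the kind of over-representation in worm that the caption describes.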
Figure 2. Taxonomic correlation and functional link between domain pairs. (a) The distribution of multi-domain proteins with physically linked domain pairs is shown with respect to the taxonomic correlation coefficient (cc) (only reliable physical links between non-homologous domains found in more than three sequences across all species have been considered). (b) Diagram showing the fraction of physically associated domain types among the taxonomically correlating domain pairs (with at least one domain from a multi-domain protein). (c) Average functional distance between correlating domain pairs estimated by the minimal number of vertices separating them within the GO tree. These data show that a functional relationship between domains is associated with high correlation of their respective taxonomic distributions. Although the performance of the various correlation coefficients is similar, the Pearson cc appears slightly more predictive and is, therefore, used by PhyloDome. Color code: dark blue, Pearson cc of taxonomic distribution; red, Pearson cc of taxonomic profile; gray, Spearman cc of taxonomic distribution.
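The three correlation measures compared in Figure 2 can be sketched as follows. This is not PhyloDome's implementation: it assumes that a "taxonomic distribution" is a vector of per-species domain counts and that a "taxonomic profile" is its presence/absence version, which the caption does not define. The function names and example vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # cc is undefined for a constant vector; 0.0 by convention here
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def to_profile(distribution):
    """Presence/absence profile derived from per-species counts (assumed definition)."""
    return [1 if count > 0 else 0 for count in distribution]


# Hypothetical per-species counts for three domains (not data from the paper).
domain_a = [0, 0, 3, 2, 1, 4]
domain_b = [0, 0, 6, 4, 2, 8]  # co-occurs with domain_a
domain_c = [2, 1, 0, 0, 0, 0]  # taxonomically exclusive of domain_a

print(pearson(domain_a, domain_b))
print(pearson(to_profile(domain_a), to_profile(domain_c)))
print(spearman(domain_a, domain_c))
```

Under these assumptions, a pair such as `domain_a` and `domain_b` would score close to 1 and be flagged as taxonomically correlating, while an exclusive pair such as `domain_a` and `domain_c` would score negatively.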