Fei Ji, Gracia Bonilla, Rustem Krykbaev, Gary Ruvkun, Yuval Tabach, Ruslan I Sadreyev.
Abstract
Proteins with similar phylogenetic patterns of conservation or loss across evolutionary taxa are strong candidates to work in the same cellular pathways or engage in physical or functional interactions. Our previously published tools implemented our method of normalized phylogenetic sequence profiling to detect functional associations between non-homologous proteins. However, many proteins consist of multiple protein domains subjected to different selective pressures, so using the protein domain as the unit of analysis improves the detection of similar phylogenetic patterns. Here we analyze sequence conservation patterns across the whole tree of life for every protein domain from a set of widely studied organisms. The resulting new interactive webserver, DEPCOD (DEtection of Phylogenetically COrrelated Domains), performs searches with either a selected pre-defined protein domain or a user-supplied sequence as a query to detect other domains from the same organism that have similar conservation patterns. Top similarities on two evolutionary scales (the whole tree of life or eukaryotic genomes) are displayed along with known protein interactions and shared complexes, pathway enrichment among the hits, and detailed visualization of sources of detected similarities. DEPCOD reveals functional relationships between often non-homologous domains that could not be detected using whole-protein sequences. The web server is accessible at http://genetics.mgh.harvard.edu/DEPCOD.
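The core idea of comparing phylogenetic profiles can be illustrated with a minimal Python sketch. It assumes that each domain is already represented by a vector of normalized similarity scores, one per species, and ranks candidate domains by their Pearson correlation with a query profile. The function names, domain labels, and toy numbers are hypothetical; the sketch does not reproduce DEPCOD's normalization procedure or its Z-score calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no pattern; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_profile_similarity(query, profiles):
    """Return (name, correlation) pairs sorted from most to least similar."""
    scores = [(name, pearson(query, prof)) for name, prof in profiles.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): one score per species, higher = more conserved.
query = [1.0, 0.9, 0.8, 0.0, 0.1]
profiles = {
    "domain_A": [0.9, 0.8, 0.7, 0.1, 0.0],  # conserved in the same species
    "domain_B": [0.1, 0.0, 0.2, 0.9, 1.0],  # opposite pattern
    "domain_C": [0.5, 0.5, 0.5, 0.5, 0.5],  # uniformly conserved
}

ranking = rank_by_profile_similarity(query, profiles)
for name, r in ranking:
    print(f"{name}\t{r:.3f}")
```

In this toy ranking the domain lost in the same species as the query comes first, while the domain with the opposite pattern comes last.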
Year: 2022 PMID: 35536332 PMCID: PMC9252791 DOI: 10.1093/nar/gkac349
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. (A) Example of DEPCOD output: the heatmap of top eukaryotic phylogenetic profiles most similar to that of Protein Kinase domain from human MAP2K6 protein as a query. Rows, top human domain hits, with query domain on top. Columns, individual species within a chosen evolutionary range (eukaryotes in this example), with taxonomic tree of these genomes shown on top. Hues of blue indicate normalized sequence similarity scores across all species to the human domain. Two yellow-brown columns on the left: Pearson correlation coefficient (left) and the corresponding statistical significance Z-score (second left) for the comparison between the given profile and the profile for the query domain (top row). The third column (‘Correlated and significant’) highlights the most confident hits that satisfy the cutoffs of both Pearson R and Z-score. Next three white-green columns on the left: BioGRID and Hu.Map scores for the physical interactions between the corresponding proteins and Hu.Map score for sharing the same protein complex. (B) The barplot of statistical significance of functional enrichment among the top domain hits (-log10 of Benjamini-Hochberg False Discovery Rate) based on functional gene sets from the KEGG database. (C) Evolutionary rearrangements of domain architecture between different species reduce the similarity of whole-protein phylogenetic profiles. PFAM domain architecture for human STARD9 protein (UniProt ID Q9P2P6) and corresponding proteins in mouse (Stard9, UniProt ID Q80TF6), fly (Klp98A, UniProt ID Q9VB25), and worm (unc-104, UniProt ID P23678). Kinesin domain at the N terminus is highlighted in orange. Changes in composition of other domains between species obstruct the detection of profile similarity using whole-protein sequences.
(D) As a result, our previous PhyloGene method based on whole-protein sequences was not able to produce strong correlation estimates between phylogenetic profiles of STARD9 and two functionally related non-homologous proteins EIF4A3 and RAN, whereas DEPCOD detected strong correlation between individual domains of these proteins. Heatmaps of Pearson correlation coefficients for whole-protein sequences (PhyloGene, left) compared to individual domain sequences (DEPCOD, right). (E) Example of increased correlation between phylogenetic profiles when these profiles were expanded from eukaryotic species to the whole tree of life. Heatmaps of all-to-all Pearson correlation coefficients between tRNA synthase 1 domain of human LARS protein as a query and domains of functionally associated proteins POLR3A, POLR1B and RIOK1. Phylogenetic profiles based on eukaryotes had only modest correlations (R < 0.5) for most domain pairs (left), which increased to much higher levels when species across the whole tree of life were used (right). (F) Precision/recall plots comparing the accuracy of detecting functional protein associations using phylogenetic profiles based on whole proteins (PhyloGene) and on protein domains (DEPCOD). KEGG pathways were used as a benchmarking reference, with the definition of a true positive hit based on sharing the same KEGG pathway with the query. DEPCOD has a higher accuracy than PhyloGene. DEPCOD mode with phylogenetic profiles based on the whole tree of life (DEPCOD All) has a higher accuracy than the mode based on eukaryotes only (DEPCOD Euk).
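The benchmarking logic described for panel F can be sketched as follows, assuming a ranked list of hits for one query and a set of domains that share a pathway with it (the true positives). The function name and the toy identifiers are hypothetical, and this is only an illustration of how precision and recall are computed at each rank cutoff, not the authors' evaluation pipeline.

```python
def precision_recall_curve(ranked_hits, positives):
    """Return (recall, precision) at every rank cutoff of a ranked hit list.

    ranked_hits: hit identifiers ordered from most to least similar.
    positives:   set of identifiers counted as true positives, for example
                 domains whose proteins share a pathway with the query.
    """
    if not positives:
        raise ValueError("at least one positive is required")
    points = []
    true_positives = 0
    for k, hit in enumerate(ranked_hits, start=1):
        if hit in positives:
            true_positives += 1
        recall = true_positives / len(positives)
        precision = true_positives / k
        points.append((recall, precision))
    return points


# Toy example (hypothetical identifiers): two of four hits share a pathway.
ranked_hits = ["d1", "d2", "d3", "d4"]
positives = {"d1", "d3"}

curve = precision_recall_curve(ranked_hits, positives)
for recall, precision in curve:
    print(f"recall={recall:.2f}\tprecision={precision:.2f}")
```

A method whose curve stays higher in precision at the same recall places more pathway-sharing hits near the top of its ranking.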
Figure 2. Comprehensive analysis of DEPCOD phylogenetic profiles among all human protein domains reveals functionally related clusters with specific patterns of evolutionary history. Heatmap of phylogenetic profiles for a subset of all human domains. Rows, domains clustered by the similarity of their phylogenetic profiles (hues of blue) across eukaryotic species (columns, with taxonomic tree of genomes shown on top). Functional protein categories enriched in these clusters are indicated on the right.