Kai Ye1, Gert Vriend, Adriaan P IJzerman. 1. Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden University, Einsteinweg 55, 2333CC, Leiden, The Netherlands. k.ye@lacdr.leidenuniv.nl
Abstract
MOTIVATION: Recent advances in sequencing techniques have yielded enormous amounts of protein sequence data from various species. This large dataset allows sequence comparison between paralogous and orthologous proteins to identify motifs or functional positions that account for the differences of functional subgroups ('specificity' positions). Algorithms such as SDPpred and the two-entropies analysis (TEA) have been developed to detect such specificity positions from a multiple sequence alignment (MSA) grouped into classes according to certain biological functions. Other algorithms such as TreeDet compute a classification and then predict specificity positions associated with it. However, there are still many unresolved questions: Was the optimal subdivision of a protein family achieved? Do the definitions at different levels of the phylogenetic tree affect the prediction of specificity positions? Can the whole phylogenetic tree be used instead of only one level in it to predict specificity positions? RESULTS: Here we present a novel method, TEA-O (Two-entropies analysis-Objective), to trace the evolutionary pressure from the root to the branches of the phylogenetic tree. At each level of the tree, a TEA plot is produced to capture the signal of the evolutionary pressure. A consensus TEA-O plot is composed from the whole series of plots to provide a condensed representation. Positions related to functions that evolved early (conserved) or later (specificity) are close to the lower-left or upper-left corner of the TEA-O plot, respectively. This novel approach allows an unbiased, user-independent, analysis of residue relevance in a protein family. We compared our TEA-O method with various algorithms using both synthetic and real protein sequences. The results show that our method is robust, sensitive to subtle differences in evolutionary pressure during evolution and comprehensive because all positions in the MSA are presented in the consensus plot. AVAILABILITY: All computer programs and datasets used in this work are available at http://nava.liacs.nl/kye/TEA-O/ for academic use.
MOTIVATION: Recent advances in sequencing techniques have yielded enormous amounts of protein sequence data from various species. This large dataset allows sequence comparison between paralogous and orthologous proteins to identify motifs or functional positions that account for the differences of functional subgroups ('specificity' positions). Algorithms such as SDPpred and the two-entropies analysis (TEA) have been developed to detect such specificity positions from a multiple sequence alignment (MSA) grouped into classes according to certain biological functions. Other algorithms such as TreeDet compute a classification and then predict specificity positions associated with it. However, there are still many unresolved questions: Was the optimal subdivision of a protein family achieved? Do the definitions at different levels of the phylogenetic tree affect the prediction of specificity positions? Can the whole phylogenetic tree be used instead of only one level in it to predict specificity positions? RESULTS: Here we present a novel method, TEA-O (Two-entropies analysis-Objective), to trace the evolutionary pressure from the root to the branches of the phylogenetic tree. At each level of the tree, a TEA plot is produced to capture the signal of the evolutionary pressure. A consensus TEA-O plot is composed from the whole series of plots to provide a condensed representation. Positions related to functions that evolved early (conserved) or later (specificity) are close to the lower-left or upper-left corner of the TEA-O plot, respectively. This novel approach allows an unbiased, user-independent, analysis of residue relevance in a protein family. We compared our TEA-O method with various algorithms using both synthetic and real protein sequences. The results show that our method is robust, sensitive to subtle differences in evolutionary pressure during evolution and comprehensive because all positions in the MSA are presented in the consensus plot. AVAILABILITY: All computer programs and datasets used in this work are available at http://nava.liacs.nl/kye/TEA-O/ for academic use.
Authors: Kathryn D Fenton; Kathleen M Meneely; Tiffany Wu; Tyler A Martin; Liskin Swint-Kruse; Aron W Fenton; Audrey L Lamb Journal: Protein Sci Date: 2021-11-12 Impact factor: 6.725
Authors: Filipa L Sousa; Daniel J Parente; David L Shis; Jacob A Hessman; Allen Chazelle; Matthew R Bennett; Sarah A Teichmann; Liskin Swint-Kruse Journal: J Mol Biol Date: 2015-09-25 Impact factor: 5.469