
CoPAP: Coevolution of presence-absence patterns.

Ofir Cohen, Haim Ashkenazy, Eli Levy Karin, David Burstein, Tal Pupko.

Abstract

Evolutionary analysis of phyletic patterns (phylogenetic profiles) is widely used in biology, representing presence or absence of characters such as genes, restriction sites, introns, indels and methylation sites. The phyletic pattern observed in extant genomes is the result of ancestral gain and loss events along the phylogenetic tree. Here we present CoPAP (coevolution of presence-absence patterns), a user-friendly web server that performs accurate inference of coevolving characters as manifested by co-occurring gains and losses. CoPAP uses state-of-the-art probabilistic methodologies to infer coevolution and allows for advanced network analysis and visualization. We developed a platform for comparing different algorithms that detect coevolution, which includes simulated data with pairs of coevolving sites and independent sites. Using these simulated data, we demonstrate that CoPAP outperforms alternative methods. We exemplify the utility of CoPAP by analyzing coevolution among thousands of bacterial genes across 681 genomes. Clusters of coevolving genes that were detected using our method largely coincide with known biosynthesis pathways and cellular modules, thus demonstrating the capability of CoPAP to infer biologically meaningful interactions. CoPAP is freely available for use at http://copap.tau.ac.il/.

Year:  2013        PMID: 23748951      PMCID: PMC3692100          DOI: 10.1093/nar/gkt471

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

A phyletic pattern (also termed phylogenetic profile) is a binary-coded data set in which presence (‘1’) versus absence (‘0’) of homologous characters is denoted across species. This 0/1 matrix is equivalent to a gap-free multiple sequence alignment, in which rows correspond to species and columns correspond to binary characters. Phyletic pattern representation is useful for evolutionary analysis of various types of data including gene families (1–3), restriction sites (4–6), indels (7,8), introns (9,10) and morphological characters [reviewed in (11)]. Methods for evolutionary analysis of phyletic patterns have progressed from the traditional parsimony (1) to likelihood models, in which the dynamics of gain (0→1) and loss (1→0) events are assumed to follow a continuous-time Markov process (9,10,12,13). Recently, we have implemented a stochastic-mapping approach that uses advanced evolutionary mixture models to accurately infer branch-site specific events (14). We have shown that our stochastic-mapping approach is more than twice as accurate in detecting branch-specific events as the prevalent maximum-parsimony approach (15).
Previous studies have shown that genomes evolve under various constraints, which are reflected in correlated evolutionary histories. Examples include coevolving sites within a protein (16–18) and coevolutionary interactions between different genes (19–27). Importantly, many of these studies have demonstrated that coevolutionary interactions between genes are highly suggestive of functional interactions [reviewed in (28)]. In the case of prokaryotic genomes, coevolutionary interactions between genes can be inferred from phyletic patterns by searching for co-occurrence of gene gain (resulting from horizontal gene transfer) and loss events.
Several evolutionary methods to infer coevolutionary interactions from phyletic patterns exist, ranging from maximum-parsimony methods (29,30) to methods that provide explicit models of coevolution (31). Recently, we developed a probabilistic method to infer coevolutionary interactions from phyletic patterns (32). In contrast to the maximum-parsimony approach, our method relies heavily on advanced probabilistic models for mapping gain and loss events along the tree. Moreover, unlike explicit models for pairwise coevolution (31), our method allows analyzing data sets with thousands of characters and hundreds of species. Here we present CoPAP (Coevolution of Presence–Absence Patterns), a user-friendly web server that is the first publicly available web server for coevolutionary analysis of phyletic data. The main features and novelties of our web server are as follows: (i) usage of efficient probabilistic methods, capable of analyzing evolutionary interactions across hundreds of genomes (see case study below); (ii) implementation of various evolutionary models including complex mixture models, which can accurately capture gain–loss dynamics; (iii) visualization and analysis of the inferred coevolutionary network using Cytoscape (33) with additional preloaded plug-ins to study clusters within the network (34); (iv) benchmark data sets of both coevolving and independently evolving genes; (v) phylogenetic visualization of the phyletic patterns using tree visualization applets; (vi) multiple advanced options for expert users, alongside a minimalistic interface for novice users, which enables fast and reliable results for typical inputs.

RESULTS

Input

The CoPAP input is a phyletic pattern provided as a 0/1 matrix. A phylogenetic tree is either provided as input by the user or estimated from the phyletic pattern by the neighbor joining (NJ) algorithm (35). For NJ, distances among genomes are computed using maximum likelihood (a two state model, in which the stationary frequencies are estimated by counting). CoPAP allows for an optional input with description and annotation of characters (e.g. gene information) to facilitate biological interpretation of the resulting coevolutionary network. While the method is suitable for analyzing various types of binary data, we will refer to genes throughout the manuscript to facilitate readability. We note that CoPAP can only analyze binary characters, and therefore cannot capture evolutionary events such as variation in gene copy number [see for example (29)].
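As a minimal sketch of the input described above, the following Python snippet holds a small presence-absence matrix, checks that it is a well-formed 0/1 matrix and estimates the frequencies of the two states by counting. The toy data, the dictionary layout and all function names are illustrative assumptions and do not reflect the file format expected by the CoPAP server.

```python
from collections import Counter

# Hypothetical toy phyletic pattern: one row per species, one column per gene.
# Species names and values are made up for illustration.
PATTERN = {
    "species_A": "1101",
    "species_B": "1001",
    "species_C": "0111",
}


def validate(pattern):
    """Check that all rows are binary strings of equal length; return the column count."""
    lengths = {len(row) for row in pattern.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of characters")
    for name, row in pattern.items():
        if set(row) - {"0", "1"}:
            raise ValueError(f"row {name!r} contains non-binary symbols")
    return lengths.pop()


def stationary_frequencies(pattern):
    """Estimate the frequencies of '0' and '1' by counting over the whole matrix."""
    counts = Counter("".join(pattern.values()))
    total = counts["0"] + counts["1"]
    return counts["0"] / total, counts["1"] / total


def columns(pattern):
    """Return one presence-absence string per character (gene), in row order."""
    return ["".join(col) for col in zip(*pattern.values())]
```

Each string returned by `columns` is the phyletic pattern of a single gene across the species, which is the unit compared in a coevolution analysis.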

Coevolution computation

CoPAP infers coevolutionary interactions and computes statistical significance using simulations. For methodological details see (32) as well as the ‘Overview’ section in the CoPAP web server. Parameters that can be adjusted by the user include, for example, controlling the minimal significance level of reported coevolutionary interactions and controlling for unobservable data (see the ‘Overview’ section within the CoPAP web server for more details).

Evolutionary model

The inference of coevolutionary interactions is dependent on ancestral mapping of gain and loss events along the tree. The accuracy of such mapping depends on the underlying evolutionary model (15). The simplest model assumes that a single evolutionary rate characterizes all characters and yields results in the shortest time. However, this model is typically extremely unrealistic, as different genes evolve at different rates. Thus, the default model allows for among-gene rate variation, by assuming that the rates are gamma distributed with an additional invariant category. A more advanced mixture model is additionally available, which allows both the gain rate and the loss rate to independently vary among genes (14). The free parameters of all evolutionary models are estimated from the data using maximum likelihood. Further details regarding all available parameters are provided in the ‘Overview’ section in the web server.
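To make the gain and loss dynamics concrete, the sketch below evaluates the standard closed-form transition probabilities of a two-state continuous-time Markov chain. This is a generic textbook formula rather than CoPAP's code; the function name and the `rate` multiplier, which stands in for a single category of among-gene rate variation, are illustrative assumptions.

```python
import math


def transition_probabilities(gain, loss, t, rate=1.0):
    """Return the 2x2 matrix P(t) for a two-state gain/loss Markov process.

    P[i][j] is the probability of being in state j after time t when starting
    in state i (0 = absent, 1 = present). `rate` is a per-gene multiplier.
    """
    if gain <= 0 or loss <= 0:
        raise ValueError("gain and loss rates must be positive")
    total = gain + loss
    pi1 = gain / total  # stationary probability of presence
    pi0 = loss / total  # stationary probability of absence
    decay = math.exp(-total * rate * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]
```

Setting `rate=0` leaves the state unchanged along the branch, which mimics an invariant category; a gamma model would average such matrices over several rate categories, and a mixture model would additionally let `gain` and `loss` differ between categories.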

A comparative platform for estimating performance of coevolution inference using simulations

Using simulations, we evaluated the CoPAP methodology and compared it with the explicit models for pairwise coevolution as implemented in BayesTraits (31) and with a phylogeny-independent approach, based on correlation between observed (extant) patterns of presence and absence, which we term ‘Observed Correlation’ (19). The area under the precision-recall curve was 0.527, 0.453 and 0.292 for CoPAP, BayesTraits and ‘Observed Correlation’, respectively. These results indicate that CoPAP infers coevolving characters more accurately than the other two methods. Notably, CoPAP’s run time was <1% of that of BayesTraits but much longer than that of ‘Observed Correlation’. Further details are provided in the ‘Benchmark’ section within the CoPAP web server.
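A phylogeny-independent baseline of the ‘Observed Correlation’ kind can be sketched as a plain correlation between two extant presence-absence patterns. The text does not specify which correlation coefficient is used, so the choice of Pearson correlation here, like the function name, is an assumption made for illustration.

```python
import math


def pearson_binary(x, y):
    """Pearson correlation between two equal-length 0/1 sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant pattern has no defined correlation
    return cov / math.sqrt(var_x * var_y)
```

Because this score treats every genome as an independent observation, closely related genomes that share a pattern through common ancestry inflate it, which is the limitation that tree-aware methods aim to address.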

Case study: the bacterial genes coevolutionary network

We used CoPAP to analyze 4258 bacterial clusters of orthologous genes (COGs) across 681 bacterial genomes. Phyletic patterns were retrieved from eggNOG (36) and the tree from Wu et al. (37). This is the first model-based coevolutionary analysis of such extensive data, substantially larger than the data previously analyzed with this method [282 species (32)], or a previous coevolutionary analysis based on the parsimony approach [163 species (29)]. CoPAP identified 5605 significant interactions (with a significance level of alpha = 0.01 and controlling for false discovery rate). Out of the 4258 COGs analyzed, almost 40% (1664) were found to be involved in strong coevolutionary interactions. CoPAP automatically produces graphical representations of the global properties of the coevolution network. Figure 1 includes examples of such graphical representations illustrating the distribution of the number of interactions (i.e. degree distribution among genes, Figure 1A), and the frequency of various significance levels of coevolutionary interactions (Figure 1B).
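Controlling the false discovery rate over many pairwise tests can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not name the procedure used by CoPAP, so this choice is an assumption, and the function below is a generic sketch rather than the server's implementation.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans marking which hypotheses are rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In a coevolution setting, each p-value would correspond to one pair of genes, and the pairs flagged `True` would form the edges of the reported network.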
Figure 1.

Global properties of the coevolutionary network. The global properties are illustrated with graphs that are automatically produced by CoPAP. (A) Distribution of the number of interactions. (B) Frequency of various significance levels of coevolutionary interactions. The high frequency for top interactions in this example is the result of a limited number of simulations. Thus, all the strongest coevolutionary interactions fall in the top-significance bin with P < 2.51e-09.

CoPAP allows users to easily inspect presence–absence patterns for genes of interest with respect to their underlying phylogeny using FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and Archaeopteryx (38). Figure 2 presents the patterns of two coevolving genes, COG4521 (ABC-type taurine transport system, periplasmic component) and COG4525 (ABC-type taurine transport system, ATPase component) using FigTree.
Figure 2.

Projecting the phyletic patterns of two coevolving genes onto the tree. CoPAP allows automatic visualization of the presence–absence pattern for a given pair of genes. The pattern for a given pair is mapped onto the tree with taxa names colored according to presence in both (‘11’, red), absence in both (‘00’, gray), presence in the first only (‘10’, green) or presence in the second only (‘01’, blue). Here, the patterns of COG4521 (ABC-type taurine transport system, periplasmic component) and COG4525 (ABC-type taurine transport system, ATPase component) are presented. In this case, the high similarity in their phyletic patterns (as seen by the dominant red and gray colors) is in line with CoPAP’s inference of a statistically significant coevolution.

The reconstructed coevolutionary network is available for download as a detailed text file. Additionally, CoPAP provides advanced network visualization and analysis by automatically loading the network into the Cytoscape platform (33). Figure 3A exemplifies network visualization using Cytoscape for our case study. Cytoscape further allows many functions for network analysis. The detection of groups of genes that coevolve with each other is of special interest, as it may provide valuable insights revealing modularity within bacterial genomes. For this purpose, Cytoscape was preloaded with plug-ins to analyze clusters within the network. In our case study, we clustered genes using the transitivity clustering plug-in (34) to reveal hundreds of clusters of coevolving genes. Coevolving clusters of genes show overwhelming agreement with known function annotation: >90% of the 54 largest clusters (with at least five members) consist of genes with a similar function. A cluster is considered as consisting of genes with similar function if at least 80% of its members share a function, such as members of the same metabolic pathway (e.g. B12 Synthesis, Figure 3B), genes having a similar function description or biological process (e.g. Type IV secretion/conjugation, Figure 3C), genes that contribute to the same phenotype or trait (e.g. motility-related genes, see ‘Gallery’ section in the web server), genes encoding subunits of a protein complex (e.g. NADH:ubiquinone oxidoreductase complex, see ‘Gallery’) or genes sharing the same COG functional category (e.g. ‘amino acid transport and metabolism’, see ‘Gallery’). The inferred coevolving clusters represent functional modules in bacterial genomes.
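The 80% criterion described above can be expressed as a short check on the annotations of one cluster. The sketch assumes that each gene carries a single functional label, which simplifies the several senses of shared function listed in the text; the function names and example labels are hypothetical.

```python
from collections import Counter


def dominant_function(annotations):
    """Return (label, fraction) for the most common annotation in a cluster."""
    if not annotations:
        raise ValueError("cluster has no members")
    label, count = Counter(annotations).most_common(1)[0]
    return label, count / len(annotations)


def is_functionally_coherent(annotations, min_fraction=0.8):
    """Apply the 'at least 80% of members share a function' rule to one cluster."""
    _, fraction = dominant_function(annotations)
    return fraction >= min_fraction
```

For example, a five-member cluster in which four genes are labeled with the same pathway passes the rule, whereas a cluster with only three of five genes sharing a label does not.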
Figure 3.

Visualization and analysis of the network using Cytoscape. The Cytoscape platform is deployed by CoPAP, allowing in-depth analysis of the coevolutionary network. (A) Global view of the bacterial coevolutionary network. Examples of clusters automatically detected using the transitivity clustering plug-in for Cytoscape: (B) B12 (Cobalamin) synthesis; (C) Type IV secretion/conjugation.

CONCLUSION

The observation that clusters of coevolving genes are, by and large, annotated with similar biological functions strongly supports the validity of this approach for extracting meaningful biological interactions. This observation also suggests a crucial role for coevolutionary analysis in uncovering dependencies and associations between evolving genes. The publicly available web server we present here is suitable for analyzing various binary-coded data and thus has the potential to facilitate further biological understanding through the discovery of additional coevolutionary networks.
  36 in total

1.  Modularity in the gain and loss of genes: applications for function prediction.

Authors:  T Ettema; J van der Oost; M Huynen
Journal:  Trends Genet       Date:  2001-09       Impact factor: 11.639

2.  Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.

Authors:  M Pellegrini; E M Marcotte; M J Thompson; D Eisenberg; T O Yeates
Journal:  Proc Natl Acad Sci U S A       Date:  1999-04-13       Impact factor: 11.205

3.  Using phylogenetic profiles to predict functional relationships.

Authors:  Matteo Pellegrini
Journal:  Methods Mol Biol       Date:  2012

4.  Inferring functional linkages between proteins from evolutionary scenarios.

Authors:  Yun Zhou; Rui Wang; Li Li; Xuefeng Xia; Zhirong Sun
Journal:  J Mol Biol       Date:  2006-04-24       Impact factor: 5.469

5.  Identification and analysis of evolutionarily cohesive functional modules in protein networks.

Authors:  Mónica Campillos; Christian von Mering; Lars Juhl Jensen; Peer Bork
Journal:  Genome Res       Date:  2006-01-31       Impact factor: 9.043

6.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

7.  Cytoscape 2.8: new features for data integration and network visualization.

Authors:  Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

8.  On the estimation of intron evolution.

Authors:  Miklós Csurös
Journal:  PLoS Comput Biol       Date:  2006-07-28       Impact factor: 4.475

9.  Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes.

Authors:  Boris G Mirkin; Trevor I Fenner; Michael Y Galperin; Eugene V Koonin
Journal:  BMC Evol Biol       Date:  2003-01-06       Impact factor: 3.260

10.  Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns.

Authors:  Galina V Glazko; Arcady R Mushegian
Journal:  Genome Biol       Date:  2004-04-27       Impact factor: 13.583

