MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life.

Uciel Chorostecki, Manuel Molina, Leszek P Pryszcz, Toni Gabaldón.

Abstract

Inferring homology relationships across genes in different species is a central task in comparative genomics. Therefore, a large number of resources and methods have been developed over the years. Some public databases include phylogenetic trees of homologous gene families which can be used to further differentiate homology relationships into orthology and paralogy. MetaPhOrs is a web server that integrates phylogenetic information from different sources to provide orthology and paralogy relationships based on a common phylogeny-based predictive algorithm and associated with a consistency-based confidence score. Here we describe the latest version of the web server, which includes major new implementations and provides orthology and paralogy relationships derived from ∼8.2 million gene family trees from 13 different source repositories across ∼4000 species with sequenced genomes. The MetaPhOrs server is freely available, without registration, at http://orthology.phylomedb.org/.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 32343307      PMCID: PMC7319458          DOI: 10.1093/nar/gkaa282

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Accurate prediction of orthology is central to comparative genomics. Although many orthology prediction tools have been developed, many users rely on pre-computed relationships for model and non-model organisms that are provided by a growing number of orthology databases (1). Homologous genes can be orthologs or paralogs, depending on whether they diverged from their common ancestor through speciation or duplication, respectively, and this is best inferred through phylogenetic analysis (2,3). Hence phylogenetic trees derived from sets of homologous genes (i.e. gene trees) can be used to infer orthology and paralogy relationships and can help uncover relevant evolutionary events. In contrast to popular similarity network-based approaches that build clusters of orthologous genes comprising orthologs and in-paralogs, phylogeny-based inference of orthology and paralogy provides pair-wise homology relationships and is able to reconstruct complex patterns of co-orthology (i.e. one-to-many or many-to-many relationships) (3). Different databases provide access to thousands of gene trees across different taxa, but they often do not provide specific orthology or paralogy information. When they do, they use different parameters and methods, which may produce incongruent results that are difficult for the end-user to compare. MetaPhOrs (Meta Phylogeny Based Orthologs) fills this gap by integrating phylogenetic information derived from different databases into a single framework for the inference of orthology and paralogy relationships (4). MetaPhOrs uses the species-overlap orthology prediction algorithm (2,5) over gene trees retrieved from heterogeneous sources and integrates the resulting orthology and paralogy pairwise relationships using a consistency-based approach (4). Hence, MetaPhOrs serves as a global repository of highly accurate, phylogeny-based orthology and paralogy predictions that is easily accessible for non-expert users.
The accuracy of orthology predictions provided by MetaPhOrs has been benchmarked alongside other methods, showing the superiority of the integrative approach over the individual methods or databases (4,6). Since its first release a decade ago, MetaPhOrs has been regularly updated and expanded. Here we describe the features and characteristics of the latest major update to the server, which provides significantly increased functionality.
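The species-overlap rule mentioned above can be illustrated with a short sketch. This is not the MetaPhOrs or ETE implementation; it assumes a rooted, binary gene tree and the common formulation of the rule, in which an internal node is treated as a duplication when its two child partitions share at least one species and as a speciation otherwise. The `Node` class, the `classify_pairs` function and the gene names are hypothetical and exist only for this example.

```python
from itertools import product


class Node:
    """Minimal rooted tree node; leaves carry a gene name and a species."""

    def __init__(self, children=(), gene=None, species=None):
        self.children = list(children)
        self.gene = gene
        self.species = species

    def leaves(self):
        """Return all leaf nodes below (or equal to) this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def classify_pairs(root):
    """Label each gene pair by the event inferred at its last common ancestor."""
    relations = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if len(node.children) != 2:
            continue  # leaves and non-binary nodes are skipped in this sketch
        left, right = (child.leaves() for child in node.children)
        shared = {leaf.species for leaf in left} & {leaf.species for leaf in right}
        # Species overlap between the two partitions suggests a duplication.
        event = "paralogs" if shared else "orthologs"
        for a, b in product(left, right):
            relations[frozenset((a.gene, b.gene))] = event
    return relations


# Toy tree: an ancestral duplication followed by a speciation in each copy.
tree = Node([
    Node([Node(gene="HsA", species="human"), Node(gene="MmA", species="mouse")]),
    Node([Node(gene="HsB", species="human"), Node(gene="MmB", species="mouse")]),
])
relations = classify_pairs(tree)
for pair, event in sorted(relations.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), event)
```

In this toy tree the root is called a duplication because both subtrees contain human and mouse genes, so every pair spanning the two subtrees is paralogous, while the pairs within each subtree are orthologous.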

MATERIALS AND METHODS

Web server

The MetaPhOrs web server is currently hosted at the Barcelona Supercomputing Center (BSC-CNS) and runs on Ubuntu Linux with 16 GB memory. Other significant software packages used on the backend include Apache (version 2.4.18, https://httpd.apache.org), PHP (version 7.0.15, http://www.php.net), MariaDB (version 10.2.22, https://mariadb.org/), ETE (version 3.1.1, (7)) and Python (version 3.7.4, https://www.python.org/). The client-side user interface was implemented using Drupal (version 7.69), SQLite (version 3.7.3) and JavaScript libraries, including jQuery (version 1.11.0). Page tracking is provided by Google Analytics (http://www.google.com/analytics/).

Overview of the consistency score

Combined orthology/paralogy assignment in MetaPhOrs is based on the consistency score (CS). CS measures the overall agreement of source gene trees about a given prediction and ranges from 0 to 1: the closer the value of CS to 1, the more confident the prediction. Further details on how CS is calculated can be found in the original MetaPhOrs publication (4). In addition, MetaPhOrs assigns an Evidence Level (EL) to each prediction, defined as the number of independent sources used for inferring this prediction.
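As a rough illustration of the two scores, the sketch below computes a CS and an EL for one gene pair. It assumes that CS for an orthology call is the fraction of trees containing both genes that support orthology; the exact definition is given in the original publication (4) and may differ in detail. EL follows the definition above (number of independent sources). The function name and the input layout are hypothetical.

```python
from collections import Counter


def consistency_and_evidence(calls):
    """Return (CS, EL) for one gene pair.

    calls: list of (source, call) tuples, one per gene tree that contains
    both genes, where call is "orthologs" or "paralogs".
    """
    if not calls:
        raise ValueError("no trees contain this gene pair")
    votes = Counter(call for _, call in calls)
    # Assumed definition: share of trees that agree with the orthology call.
    cs = votes["orthologs"] / len(calls)
    # EL: number of distinct source repositories contributing trees.
    el = len({source for source, _ in calls})
    return cs, el


# Four trees from three sources; three of the trees support orthology.
example_calls = [
    ("PhylomeDB", "orthologs"),
    ("PhylomeDB", "orthologs"),
    ("Ensembl", "orthologs"),
    ("TreeFam", "paralogs"),
]
cs, el = consistency_and_evidence(example_calls)
print(cs, el)
```

Under this assumed definition, a pair supported by three of four trees drawn from three repositories would receive CS = 0.75 and EL = 3.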

METAPHORS WEB SERVER DESCRIPTION

MetaPhOrs is a public repository of phylogeny-based orthologs and paralogs that were computed using phylogenetic trees available in several repositories. In addition, MetaPhOrs computes ∼77 000 gene trees for the OrthoMCL repository using the PhylomeDB pipeline (8) (Figure 1A). The MetaPhOrs web server has undergone several upgrades since its first release in 2010. The current version represents a major upgrade of the web interface and a significant expansion of the taxonomic scope covered by the web server. In this new version, orthology and paralogy predictions were computed from data available in 13 different large-scale databases (Figure 1A). These orthology and paralogy predictions for ∼117 million proteins (Figure 1A) were inferred from the analysis of ∼8.2 million maximum-likelihood (ML) trees covering ∼4000 different fully-sequenced species. For each prediction, MetaPhOrs provides a CS (measuring the overall agreement of source gene trees about the prediction) and an EL (the number of repositories from which the prediction is retrieved) describing its reliability, together with the number of trees and links to their source databases (Figure 1B).
Figure 1.

(A) Bar plot showing the contribution of the different source repositories for the total of ∼8.2 million gene family trees and for the total of ∼117 million proteins, included in the new MetaPhOrs release. Note the base-10 log scale used for the Y-axis. Ensembl databases including vertebrates, bacteria, fungi, Metazoa, Pan, plants and protists. (B) Overview of the CS, and EL used in MetaPhOrs for Orthology and Paralogy assignment. Trees in green are in agreement with gene trees about a given prediction, trees in red are not.

By dynamically interacting through a highly intuitive web interface, users can interrogate orthology and paralogy relationships for a given gene, set of genes or species, across a defined set of species or taxonomic range. The user can then visualize or explore evolutionary or functional information associated with the set of retrieved orthologs, as well as download homology tables, sequences or related information. Using the web interface, the orthology and paralogy information can be retrieved in different ways: (i) by searching for a particular gene, (ii) by providing a sequence as input or (iii) by selecting a pair of species and retrieving all predictions for this pair. Queries can be limited to a species, a set of species or a source database, and the output can be filtered by CS or EL values, depending on the level of stringency desired by the user.
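The kind of stringency filtering described above can be reproduced offline on a downloaded homology table. The sketch below is only an illustration: the `Prediction` record, its field names and the placeholder identifiers are hypothetical and do not reflect the actual column layout of MetaPhOrs downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One row of a homology table (hypothetical field names)."""
    query: str
    target: str
    species: str
    cs: float  # consistency score, between 0 and 1
    el: int    # evidence level, number of independent sources


def filter_predictions(predictions, min_cs=0.5, min_el=1, species=None):
    """Keep rows that meet the CS and EL thresholds and, optionally, a species."""
    return [
        p for p in predictions
        if p.cs >= min_cs
        and p.el >= min_el
        and (species is None or p.species == species)
    ]


# Placeholder rows, not real MetaPhOrs data.
rows = [
    Prediction("query_gene", "target_1", "species_a", 0.9, 3),
    Prediction("query_gene", "target_2", "species_b", 0.4, 2),
    Prediction("query_gene", "target_3", "species_a", 0.7, 1),
]
strict = filter_predictions(rows, min_cs=0.5, min_el=2)
same_species = filter_predictions(rows, species="species_a")
print([p.target for p in strict])
print([p.target for p in same_species])
```

Raising `min_cs` or `min_el` trades coverage for confidence, which mirrors the stringency choice offered by the web interface.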

IMPROVEMENTS IN THE NEW RELEASE

The latest update introduces major features that greatly improve the functions and usability of MetaPhOrs. Here we describe the most relevant that were implemented in this new release.

Redesigned user interface

The original MetaPhOrs web server was developed using a Drupal content management system and custom Python scripts to process the data. Advances in client-side languages have since rendered this design obsolete: it was slow for the user and incompatible with mobile devices. We have therefore redesigned the web interface for fast interactivity and full functionality with any of the popular browsers (e.g. Chrome, Edge, Firefox and Safari). The front end uses Responsive Web Design technologies, supporting desktop and laptop computers, smartphones and tablets, combined with a robust Drupal+JS application to manage the user interface. Furthermore, by incorporating modern software features such as asynchronous JavaScript, we have made MetaPhOrs more robust to errors and more intuitive for users, and we have improved data retrieval performance, which in turn makes more efficient use of the underlying database.

Multiple comparisons between species

The previous version of MetaPhOrs did not support whole-genome orthology comparisons in one search due to limitations in computational capacity and visualization methods. While the previous version of MetaPhOrs required that the user perform a different search for each gene, this new version allows retrieving genome-wide predictions of orthology or paralogy for a given pair of species.

New organisms and database update

The previous publication of MetaPhOrs (4) included orthology and paralogy predictions across ∼800 organisms. In this major update of the MetaPhOrs web server, we significantly expanded the taxonomic scope, which now covers ∼4000 different organisms with available genomes. In this update of the database, we combine different large-scale databases of phylogenetic information, including the latest releases of PhylomeDB (5), seven databases from Ensembl Compara (including vertebrates, bacteria, fungi, Metazoa, Pan, plants and protists, release 98) (9), EggNOG (version 4.5.1) (10), TreeFam (Release 9) (11), Evolclust (version 1) (12) and Hogenom (release 6) (13). Additionally, we have reconstructed ML trees from protein family alignments stored in OrthoMCL (version 5) (14) (Figure 1A). Then, a score is assigned to each orthology and paralogy prediction based on its level of consistency across the different sources (Figure 1B). The MetaPhOrs v2 server runs on the MareNostrum 4 supercomputer at the Barcelona Supercomputing Center (www.bsc.es), which provides us with access to state-of-the-art storage and computing facilities. In addition, we have recently improved the pipeline that retrieves the information from the different sources and computes the CS and EL, which will make it easier to schedule new releases.

Additional features

In addition to the main changes in this update, we have implemented other new features to provide users with an easy-to-use and fast website. To explore orthologous relationships, the user can inspect the results through several interactive tables, which can be filtered (e.g. to select only the results for a specific species, gene or CS) and re-ordered. The results can be copied to the clipboard or downloaded in several formats, such as PDF, comma-separated values and, importantly, the OrthoXML standard format (15), which has been adopted by the Quest for Orthologs consortium to facilitate interoperability across orthology databases (1). Users can also download a multi-FASTA file with the orthologous (or paralogous) proteins in different species. Other additions are a History tab, which shows the last 10 searches performed by the user, and improved error handling. We have also provided renewed help pages, accessible through the header menu. Finally, we added an FTP server that contains dumps of the data served through the MetaPhOrs portal, which provides an archive of current and past releases and facilitates reproducibility of studies based on MetaPhOrs data.
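To give a sense of what an OrthoXML export looks like, the sketch below builds a deliberately simplified document with Python's standard library. It is not the MetaPhOrs exporter and has not been validated against the OrthoXML schema; the element and attribute names follow the published OrthoXML 0.3 layout as we understand it, and the `origin`, database name and protein identifiers are placeholders.

```python
import xml.etree.ElementTree as ET

ORTHOXML_NS = "http://orthoXML.org/2011/"


def build_orthoxml(genes_by_species, ortholog_groups):
    """Serialize species, genes and ortholog groups as a minimal OrthoXML string.

    genes_by_species: {(species_name, ncbi_taxid): [protein_id, ...]}
    ortholog_groups: list of lists of protein_ids
    """
    root = ET.Element("orthoXML", {
        "xmlns": ORTHOXML_NS,
        "version": "0.3",
        "origin": "example",      # placeholder origin
        "originVersion": "0",
    })
    internal_ids = {}
    for (name, taxid), protein_ids in genes_by_species.items():
        species = ET.SubElement(root, "species",
                                {"name": name, "NCBITaxId": str(taxid)})
        database = ET.SubElement(species, "database",
                                 {"name": "example_db", "version": "0"})
        genes = ET.SubElement(database, "genes")
        for protein_id in protein_ids:
            # OrthoXML refers to genes through document-internal integer ids.
            internal_ids[protein_id] = str(len(internal_ids) + 1)
            ET.SubElement(genes, "gene",
                          {"id": internal_ids[protein_id], "protId": protein_id})
    groups = ET.SubElement(root, "groups")
    for members in ortholog_groups:
        group = ET.SubElement(groups, "orthologGroup")
        for protein_id in members:
            ET.SubElement(group, "geneRef", {"id": internal_ids[protein_id]})
    return ET.tostring(root, encoding="unicode")


xml_text = build_orthoxml(
    {("Homo sapiens", 9606): ["HsA"], ("Mus musculus", 10090): ["MmA"]},
    [["HsA", "MmA"]],
)
print(xml_text)
```

A real export would typically also carry scores (such as CS) and nested paralog groups, which are omitted here for brevity.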

USE CASES

With MetaPhOrs, users can take advantage of multiple phylogenetic evidence to derive orthology and paralogy predictions. To illustrate the usage of MetaPhOrs and some of the new features included in the latest release, we searched for orthologs and paralogs of the human TP53 gene (Figure 2). At the top of the result page, the MetaPhOrs website shows a description of the gene of interest. It shows that TP53 has 125 orthologs (CS > 0.5) in 91 species, as inferred from 106 phylogenetic trees retrieved from eight different sources (Figure 2). Below the description, there is a table of orthology and paralogy relationships for the TP53 gene, with one row for each of the 125 orthologs found (Figure 2). The homology table can be filtered by CS, EL values or species, and can be sorted by any column (Figure 2).
Figure 2.

Screenshot of MetaPhOrs web-server interface showing the orthologs and paralogs for TP53 gene in human. In the top, a description of the TP53 gene. In the bottom, a table of orthology and paralogy relationships for the TP53 gene.

Since its initial release in 2010, MetaPhOrs usage has continued to increase, with ∼290 visitors per month in 2019 according to usage data gathered using Google Analytics. The MetaPhOrs server constitutes a resource of broad interest and applicability and has been used in several relevant projects. For example, it was used to determine the evolutionary ages of the germ layers in Caenorhabditis elegans (16) and to understand the differentiation and regulation of tissue-specific gene products in plants (17), and it helped resolve early branches in the tree of life of modern birds (18).

BENCHMARKING

Orthology predictions have been benchmarked on the Quest for Orthologs reference proteomes. The Quest for Orthologs initiative aims to improve and standardize orthology predictions through collaboration and knowledge sharing among users and developers of algorithms and databases in this field (19,20). The accuracy of orthology predictions provided by MetaPhOrs has been benchmarked alongside other methods, showing the superiority of the integrative approach over the individual methods or databases (6), especially when looking at both precision and recall. Furthermore, MetaPhOrs was benchmarked using alternative approaches and datasets in the original MetaPhOrs publication (4). Most benchmarks show a good compromise between accuracy and sensitivity in the predictions provided by MetaPhOrs, comparable to the best scoring methods.

CONCLUSION

We describe the latest developments in MetaPhOrs, including new functions and a significant expansion of the database. The new MetaPhOrs web server offers a unique, modern interactive user interface providing phylogeny-based orthology and paralogy predictions.
  20 in total

1.  EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.

Authors:  Albert J Vilella; Jessica Severin; Abel Ureta-Vidal; Li Heng; Richard Durbin; Ewan Birney
Journal:  Genome Res       Date:  2008-11-24       Impact factor: 9.043

Review 2.  Functional and evolutionary implications of gene orthology.

Authors:  Toni Gabaldón; Eugene V Koonin
Journal:  Nat Rev Genet       Date:  2013-04-04       Impact factor: 53.242

3.  OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors:  Li Li; Christian J Stoeckert; David S Roos
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

4.  Mutational bias and translational selection shaping the codon usage pattern of tissue-specific genes in rice.

Authors:  Qingpo Liu
Journal:  PLoS One       Date:  2012-10-29       Impact factor: 3.240

5.  MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score.

Authors:  Leszek P Pryszcz; Jaime Huerta-Cepas; Toni Gabaldón
Journal:  Nucleic Acids Res       Date:  2010-12-11       Impact factor: 16.971

6.  Advances and Applications in the Quest for Orthologs.

Authors:  Natasha Glover; Christophe Dessimoz; Ingo Ebersberger; Sofia K Forslund; Toni Gabaldón; Jaime Huerta-Cepas; Maria-Jesus Martin; Matthieu Muffato; Mateus Patricio; Cécile Pereira; Alan Sousa da Silva; Yan Wang; Erik Sonnhammer; Paul D Thomas
Journal:  Mol Biol Evol       Date:  2019-10-01       Impact factor: 16.240

7.  Whole-genome analyses resolve early branches in the tree of life of modern birds.

Authors:  Erich D Jarvis; Siavash Mirarab; Andre J Aberer; Bo Li; Peter Houde; Cai Li; Simon Y W Ho; Brant C Faircloth; Benoit Nabholz; Jason T Howard; Alexander Suh; Claudia C Weber; Rute R da Fonseca; Jianwen Li; Fang Zhang; Hui Li; Long Zhou; Nitish Narula; Liang Liu; Ganesh Ganapathy; Bastien Boussau; Md Shamsuzzoha Bayzid; Volodymyr Zavidovych; Sankar Subramanian; Toni Gabaldón; Salvador Capella-Gutiérrez; Jaime Huerta-Cepas; Bhanu Rekepalli; Kasper Munch; Mikkel Schierup; Bent Lindow; Wesley C Warren; David Ray; Richard E Green; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Shengbin Li; Ning Li; Yinhua Huang; Elizabeth P Derryberry; Mads Frost Bertelsen; Frederick H Sheldon; Robb T Brumfield; Claudio V Mello; Peter V Lovell; Morgan Wirthlin; Maria Paula Cruz Schneider; Francisco Prosdocimi; José Alfredo Samaniego; Amhed Missael Vargas Velazquez; Alonzo Alfaro-Núñez; Paula F Campos; Bent Petersen; Thomas Sicheritz-Ponten; An Pas; Tom Bailey; Paul Scofield; Michael Bunce; David M Lambert; Qi Zhou; Polina Perelman; Amy C Driskell; Beth Shapiro; Zijun Xiong; Yongli Zeng; Shiping Liu; Zhenyu Li; Binghang Liu; Kui Wu; Jin Xiao; Xiong Yinqi; Qiuemei Zheng; Yong Zhang; Huanming Yang; Jian Wang; Linnea Smeds; Frank E Rheindt; Michael Braun; Jon Fjeldsa; Ludovic Orlando; F Keith Barker; Knud Andreas Jønsson; Warren Johnson; Klaus-Peter Koepfli; Stephen O'Brien; David Haussler; Oliver A Ryder; Carsten Rahbek; Eske Willerslev; Gary R Graves; Travis C Glenn; John McCormack; Dave Burt; Hans Ellegren; Per Alström; Scott V Edwards; Alexandros Stamatakis; David P Mindell; Joel Cracraft; Edward L Braun; Tandy Warnow; Wang Jun; M Thomas P Gilbert; Guojie Zhang
Journal:  Science       Date:  2014-12-12       Impact factor: 47.728

8.  PhylomeDB: a database for genome-wide collections of gene phylogenies.

Authors:  Jaime Huerta-Cepas; Anibal Bueno; Joaquín Dopazo; Toni Gabaldón
Journal:  Nucleic Acids Res       Date:  2007-10-25       Impact factor: 16.971

9.  TreeFam v9: a new website, more species and orthology-on-the-fly.

Authors:  Fabian Schreiber; Mateus Patricio; Matthieu Muffato; Miguel Pignatelli; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2013-11-04       Impact factor: 16.971

10.  Standardized benchmarking in the quest for orthologs.

Authors:  Adrian M Altenhoff; Brigitte Boeckmann; Salvador Capella-Gutierrez; Daniel A Dalquen; Todd DeLuca; Kristoffer Forslund; Jaime Huerta-Cepas; Benjamin Linard; Cécile Pereira; Leszek P Pryszcz; Fabian Schreiber; Alan Sousa da Silva; Damian Szklarczyk; Clément-Marie Train; Peer Bork; Odile Lecompte; Christian von Mering; Ioannis Xenarios; Kimmen Sjölander; Lars Juhl Jensen; Maria J Martin; Matthieu Muffato; Toni Gabaldón; Suzanna E Lewis; Paul D Thomas; Erik Sonnhammer; Christophe Dessimoz
Journal:  Nat Methods       Date:  2016-04-04       Impact factor: 28.547
