Adrian M Altenhoff, Brigitte Boeckmann, Salvador Capella-Gutierrez, Daniel A Dalquen, Todd DeLuca, Kristoffer Forslund, Jaime Huerta-Cepas, Benjamin Linard, Cécile Pereira, Leszek P Pryszcz, Fabian Schreiber, Alan Sousa da Silva, Damian Szklarczyk, Clément-Marie Train, Peer Bork, Odile Lecompte, Christian von Mering, Ioannis Xenarios, Kimmen Sjölander, Lars Juhl Jensen, Maria J Martin, Matthieu Muffato, Toni Gabaldón, Suzanna E Lewis, Paul D Thomas, Erik Sonnhammer, Christophe Dessimoz.
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
Year: 2016 PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. The Orthology Benchmark service facilitates assessment and comparison of orthology inference methods.
Orthology method developers run their methods on a reference proteome set and submit the inferred orthologs to the service. The predictions are subjected to a battery of phylogenetic and functional tests, and the results are returned to the method developer, who can choose to disclose them publicly.
Figure 2. The Generalized Species Tree Discordance test assesses the congruence of inferred orthologs with a trusted reference tree.
Benchmarking results are shown for eukaryotes. A trade-off can be observed between precision (measured as tree error, on the y-axis) and recall (measured as the number of completed tree samples, on the x-axis; Online Methods). Only high-confidence branches of the reference tree (L90, Online Methods) that are at least 10 million years long are considered. Error bars indicate 95% confidence intervals, and the line indicates the 'Pareto frontier'.
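The 'Pareto frontier' in this caption can be illustrated with a minimal sketch. It assumes each method is summarized by two numbers, a recall value (higher is better) and a tree error (lower is better), and it treats a method as being on the frontier when no other method is at least as good on both axes and strictly better on one. The class name, function name, method names and numbers below are hypothetical and are not taken from the benchmark service or its results.

```python
from typing import List, NamedTuple


class MethodResult(NamedTuple):
    name: str
    recall: float  # e.g. number of completed tree samples; higher is better
    error: float   # e.g. tree error; lower is better


def pareto_frontier(results: List[MethodResult]) -> List[MethodResult]:
    """Return the non-dominated methods, sorted by increasing recall."""
    frontier = []
    for r in results:
        # r is dominated if another method is no worse on both axes
        # and strictly better on at least one of them.
        dominated = any(
            o.recall >= r.recall
            and o.error <= r.error
            and (o.recall > r.recall or o.error < r.error)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.recall)


# Made-up illustrative numbers, not benchmark results.
results = [
    MethodResult("method_A", 1000, 0.10),
    MethodResult("method_B", 2000, 0.15),
    MethodResult("method_C", 1500, 0.20),
    MethodResult("method_D", 3000, 0.25),
]
frontier = pareto_frontier(results)
frontier_names = [r.name for r in frontier]
```

In this toy input, method_C is dominated by method_B, which has both higher recall and lower error, so only the other three methods remain on the frontier.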
Figure 3. Benchmark results using sets of reference gene trees.
Evolutionary gene relationships are predicted for the QfO reference proteomes by 15 different methods. From the results, pairs of orthologous relationships are determined for each method and compared to those obtained from the reference gene trees of (a) SwissTree and (b) TreeFam-A. Error bars indicate 95% confidence intervals.
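The pairwise comparison described in this caption can be sketched as a precision and recall computation over unordered gene pairs. This is a simplified illustration under stated assumptions, not the service's implementation: it assumes predictions are first restricted to pairs in which both genes are covered by the reference trees, and that every covered pair absent from the reference set counts as a non-ortholog. The function names and gene identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple, FrozenSet

Pair = Tuple[str, str]


def as_unordered(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    """Treat (a, b) and (b, a) as the same orthologous relationship."""
    return {frozenset(p) for p in pairs}


def pairwise_precision_recall(
    predicted: Iterable[Pair],
    reference: Iterable[Pair],
    covered_genes: Set[str],
) -> Tuple[float, float]:
    """Compare predicted ortholog pairs with reference pairs.

    Assumption: only predictions whose genes are both in covered_genes
    (genes present in the reference trees) can be judged.
    """
    ref = as_unordered(reference)
    pred = {p for p in as_unordered(predicted) if p <= covered_genes}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical gene identifiers for illustration only.
reference_pairs = [("A1", "B1"), ("A2", "B2"), ("A1", "C1")]
covered = {"A1", "A2", "B1", "B2", "C1"}
predicted_pairs = [("B1", "A1"), ("A2", "B2"), ("A2", "C1"), ("X9", "A1")]

precision, recall = pairwise_precision_recall(
    predicted_pairs, reference_pairs, covered
)
```

Here the pair involving X9 is ignored because X9 is outside the reference trees. Of the three remaining predictions two are correct, and two of the three reference pairs are recovered.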
Figure 4. Benchmarks of functional similarity between inferred orthologous gene pairs.
Two different types of functional annotations are used: (a) experimentally supported GO annotations and (b) Enzyme Commission (EC) numbers. Error bars indicate 95% confidence intervals.