Manuel Lafond1, Riccardo Dondi2, Nadia El-Mabrouk1. 1. Département d'informatique et de recherche opérationnelle, Université de Montréal, Montreal, QC Canada. 2. Dipartimento di Scienze Umane e Sociali, Università degli Studi di Bergamo, Via Donizetti 3, 24129 Bergamo, Italy.
Abstract
BACKGROUND: While tree-oriented methods for inferring orthology and paralogy relations between genes are based on reconciling a gene tree with a species tree, many tree-free methods are also available (usually based on sequence similarity). Recently, the link between orthology relations and gene trees has been formally considered from the perspective of reconstructing phylogenies from orthology relations. In this paper, we consider this link from a correction point of view. Indeed, a gene tree induces a set of relations, but the converse is not always true: a set of relations is not necessarily in agreement with any gene tree. A natural question is thus how to minimally correct an infeasible set of relations. Another natural question, given a gene tree and a set of relations, is how to minimally correct a gene tree so that the resulting gene tree fits the set of relations. RESULTS: We consider four variants of relation and gene tree correction problems, and provide hardness results for all of them. More specifically, we show that it is NP-Hard to edit a minimum of set of relations to make them consistent with a given species tree. We also show that the problem of finding a maximum subset of genes that share consistent relations is hard to approximate. We then demonstrate that editing a gene tree to satisfy a given set of relations in a minimum way is NP-Hard, where "minimum" refers either to the number of modified relations depicted by the gene tree or the number of clades that are lost. We also discuss some of the algorithmic perspectives given these hardness results.
BACKGROUND: While tree-oriented methods for inferring orthology and paralogy relations between genes are based on reconciling a gene tree with a species tree, many tree-free methods are also available (usually based on sequence similarity). Recently, the link between orthology relations and gene trees has been formally considered from the perspective of reconstructing phylogenies from orthology relations. In this paper, we consider this link from a correction point of view. Indeed, a gene tree induces a set of relations, but the converse is not always true: a set of relations is not necessarily in agreement with any gene tree. A natural question is thus how to minimally correct an infeasible set of relations. Another natural question, given a gene tree and a set of relations, is how to minimally correct a gene tree so that the resulting gene tree fits the set of relations. RESULTS: We consider four variants of relation and gene tree correction problems, and provide hardness results for all of them. More specifically, we show that it is NP-Hard to edit a minimum of set of relations to make them consistent with a given species tree. We also show that the problem of finding a maximum subset of genes that share consistent relations is hard to approximate. We then demonstrate that editing a gene tree to satisfy a given set of relations in a minimum way is NP-Hard, where "minimum" refers either to the number of modified relations depicted by the gene tree or the number of clades that are lost. We also discuss some of the algorithmic perspectives given these hardness results.
Entities:
Keywords:
Gene tree; NP-Hardness; Orthology; Paralogy; Species tree
Authors: Marc Hellmuth; Maribel Hernandez-Rosales; Katharina T Huber; Vincent Moulton; Peter F Stadler; Nicolas Wieseke Journal: J Math Biol Date: 2012-03-29 Impact factor: 2.259
Authors: Marcus Lechner; Sven Findeiss; Lydia Steiner; Manja Marz; Peter F Stadler; Sonja J Prohaska Journal: BMC Bioinformatics Date: 2011-04-28 Impact factor: 3.169
Authors: Manuela Geiß; Edgar Chávez; Marcos González Laffitte; Alitzel López Sánchez; Bärbel M R Stadler; Dulce I Valdivia; Marc Hellmuth; Maribel Hernández Rosales; Peter F Stadler Journal: J Math Biol Date: 2019-04-09 Impact factor: 2.259
Authors: Manuela Geiß; Marcos E González Laffitte; Alitzel López Sánchez; Dulce I Valdivia; Marc Hellmuth; Maribel Hernández Rosales; Peter F Stadler Journal: J Math Biol Date: 2020-01-30 Impact factor: 2.259
Authors: Nikolai Nøjgaard; Manuela Geiß; Daniel Merkle; Peter F Stadler; Nicolas Wieseke; Marc Hellmuth Journal: Algorithms Mol Biol Date: 2018-02-06 Impact factor: 1.405