BACKGROUND: A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? RESULTS: Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
BACKGROUND: A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? RESULTS: Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock Journal: Nat Genet Date: 2000-05 Impact factor: 38.330
Authors: Robert M Waterhouse; Evgeny M Zdobnov; Fredrik Tegenfeldt; Jia Li; Evgenia V Kriventseva Journal: Nucleic Acids Res Date: 2010-10-23 Impact factor: 16.971
Authors: Marcus Lechner; Sven Findeiss; Lydia Steiner; Manja Marz; Peter F Stadler; Sonja J Prohaska Journal: BMC Bioinformatics Date: 2011-04-28 Impact factor: 3.169
Authors: Manuela Geiß; Edgar Chávez; Marcos González Laffitte; Alitzel López Sánchez; Bärbel M R Stadler; Dulce I Valdivia; Marc Hellmuth; Maribel Hernández Rosales; Peter F Stadler Journal: J Math Biol Date: 2019-04-09 Impact factor: 2.259
Authors: Manuela Geiß; Marcos E González Laffitte; Alitzel López Sánchez; Dulce I Valdivia; Marc Hellmuth; Maribel Hernández Rosales; Peter F Stadler Journal: J Math Biol Date: 2020-01-30 Impact factor: 2.259
Authors: Nikolai Nøjgaard; Manuela Geiß; Daniel Merkle; Peter F Stadler; Nicolas Wieseke; Marc Hellmuth Journal: Algorithms Mol Biol Date: 2018-02-06 Impact factor: 1.405