Daniel Doerr (1,2), Metin Balaban (1), Pedro Feijão (2), Cedric Chauve (3)
1. School of Computer and Communication Sciences, EPFL, INJ211 Station 14, 1015 Lausanne, Switzerland.
2. Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
3. Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada.
Abstract
BACKGROUND: The gene family-free framework for comparative genomics aims to provide methods for gene order analysis that do not require prior gene family assignment but instead work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. METHODS: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity into the score of a gene adjacency. We study its computational complexity and describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, computing exact solutions remains intractable for sufficiently large instances. We then describe a heuristic method, FFAdj-AM, which performs well in practice. RESULTS: The developed methods compute accurate positional orthologs for genomes comparable in size to bacterial genomes, on both simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs as well as or better than the well-established gene family prediction tool MultiMSOAR. CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher-confidence positional orthologs.
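To make the classical problem that this model generalizes concrete, the following is a minimal sketch of the family-based breakpoint median of three genomes, not the authors' family-free objective, ILP, or FFAdj-AM. It assumes that all genomes share the same gene content and are linear and unichromosomal, given as lists of signed integers (a negative sign marks reverse orientation). It also uses one common variant of the breakpoint distance: the number of adjacencies of one genome missing from the other, with telomeres ignored. The helper names (`adjacencies`, `breakpoint_distance`, `median_score`, `brute_force_median`) are hypothetical and chosen for illustration only.

```python
from itertools import permutations, product


def adjacencies(genome):
    """Adjacency set of a linear, unichromosomal genome of signed genes.

    Each gene g has a tail extremity (g, 't') and a head extremity (g, 'h').
    An adjacency joins the right extremity of one gene to the left
    extremity of the next gene.
    """
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # Right extremity of a: head if forward, tail if reversed.
        right = (abs(a), 'h' if a > 0 else 't')
        # Left extremity of b: tail if forward, head if reversed.
        left = (abs(b), 't' if b > 0 else 'h')
        adj.add(frozenset((right, left)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are absent from g2 (assumed variant)."""
    return len(adjacencies(g1) - adjacencies(g2))


def median_score(candidate, genomes):
    """Sum of breakpoint distances from a candidate median to each input genome."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


def brute_force_median(genomes):
    """Exhaustively search signed gene orders for a minimum-score median.

    Candidates are restricted to linear, unichromosomal genomes over the
    shared gene set (a simplifying assumption). Runtime is exponential,
    so this is only usable for toy instances.
    """
    genes = sorted(abs(x) for x in genomes[0])
    best, best_score = None, None
    for perm in permutations(genes):
        for signs in product((1, -1), repeat=len(genes)):
            cand = [s * g for s, g in zip(signs, perm)]
            score = median_score(cand, genomes)
            if best_score is None or score < best_score:
                best, best_score = cand, score
    return best, best_score
```

For example, with `genomes = [[1, 2, 3, 4], [1, -3, -2, 4], [1, 2, 3, 4]]`, the second genome differs from the others by an inversion of genes 2 and 3, so `breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4])` is 2 and `brute_force_median(genomes)` returns `([1, 2, 3, 4], 2)`. In the family-free setting described above, genes are not grouped into families beforehand, so exact adjacency matching of this kind is replaced by a similarity-weighted score.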