Vladimir Makarenkov (1), François-Joseph Lapointe. (1) Département d'Informatique, Université du Québec à Montréal, C.P. 8888, Succ. Centre-Ville, Montréal, Québec, Canada H3C 3P8. makarenkov.vladimir@uqam.ca
Abstract
MOTIVATION: Phylogenetic inference from datasets that include incomplete or uncertain entries is among the most relevant issues in systematic biology. In this paper, we propose a new method for reconstructing phylogenetic trees from partial distance matrices. The method combines the four-point condition and the ultrametric inequality with a weighted least-squares approximation to solve the problem of missing entries. It can be applied to infer phylogenies from evolutionary data that include missing or uncertain information, for instance, when observed nucleotide or protein sequences contain gaps or missing entries.
RESULTS: In a number of simulations involving incomplete datasets, the proposed method outperformed the well-known Ultrametric and Additive procedures. In general, it also outperformed the other competing approaches, including Triangle and Fitch, the latter being the most popular least-squares method for reconstructing phylogenies. We illustrate the usefulness of the method by analyzing two well-known phylogenies derived from complete mammalian mtDNA sequences. We also establish theoretical results on the NP-hardness of ordinary and weighted least-squares fitting of a phylogenetic tree to a partial distance matrix.
AVAILABILITY: The T-Rex package including this method is freely available for download at http://www.info.uqam.ca/~makarenv/trex.html
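To make the two constraints named in the abstract concrete, the following is a minimal sketch of how the ultrametric inequality and the four-point condition each bound a missing distance from the known ones. It is an illustration only, not the authors' procedure: the function names, the use of `None` to mark missing entries, and the toy matrices are assumptions made here, and the weighted least-squares step of the proposed method is not shown.

```python
from itertools import combinations

def ultrametric_upper_bound(d, i, j):
    """Tightest bound on d[i][j] from d(i,j) <= max(d(i,k), d(j,k)) over all k."""
    candidates = []
    for k in range(len(d)):
        if k in (i, j):
            continue
        if d[i][k] is None or d[j][k] is None:
            continue  # skip triples that involve another missing entry
        candidates.append(max(d[i][k], d[j][k]))
    return min(candidates) if candidates else None

def four_point_upper_bound(d, i, j):
    """Tightest bound on d[i][j] from the four-point condition
    d(i,j) + d(k,l) <= max(d(i,k) + d(j,l), d(i,l) + d(j,k)) over all pairs k, l."""
    others = [k for k in range(len(d)) if k not in (i, j)]
    candidates = []
    for k, l in combinations(others, 2):
        needed = (d[i][k], d[j][l], d[i][l], d[j][k], d[k][l])
        if any(v is None for v in needed):
            continue  # skip quadruples that involve another missing entry
        candidates.append(max(d[i][k] + d[j][l], d[i][l] + d[j][k]) - d[k][l])
    return min(candidates) if candidates else None

# Toy additive (tree) distances for the quartet ((A,B),(C,D)) with pendant
# edge lengths 1, 2, 3, 4 and internal edge length 5; d(A,C) = 9 is hidden.
additive = [
    [0,    3,  None, 10],
    [3,    0,  10,   11],
    [None, 10, 0,    7],
    [10,   11, 7,    0],
]

# Toy ultrametric distances for (((A,B),C),D); d(A,C) = 6 is hidden.
ultra = [
    [0,    2,  None, 10],
    [2,    0,  6,    10],
    [None, 6,  0,    10],
    [10,   10, 10,   0],
]

bound_additive = four_point_upper_bound(additive, 0, 2)
bound_ultra = ultrametric_upper_bound(ultra, 0, 2)
```

In both toy matrices the bound coincides with the hidden value, but that is not guaranteed in general: the inequalities only give upper bounds, and with noisy or sparsely observed distances the bounds can be loose or mutually inconsistent.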