
Prediction of missing sequences and branch lengths in phylogenomic data.

Diego Darriba, Michael Weiß, Alexandros Stamatakis.

Abstract

MOTIVATION: Missing data in large-scale phylogenomic datasets have negative effects on the phylogenetic inference process. One effect of alignments with missing per-gene or per-partition sequences is that the inferred phylogenies may exhibit extremely long branch lengths. We investigate whether statistically predicting the missing sequences of an organism, using information from the genes/partitions that do have data for that organism, alleviates the problem and improves phylogenetic accuracy.
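The abstract does not describe how information from the other partitions is used. Purely as an illustration, and not as the ForeSeqs algorithm, one simple idea is to estimate a taxon's missing per-partition branch length by rescaling the branch lengths it has in partitions where it is present, using each pair of partitions' relative rate measured on the taxa they share. The table layout and the names `relative_rate` and `predict_branch_length` are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical layout: table[partition][taxon] is the terminal branch length
# of `taxon` in the tree inferred for `partition`, or None when the taxon
# has no sequence in that partition.
BranchTable = Dict[str, Dict[str, Optional[float]]]


def relative_rate(table: BranchTable, p: str, q: str) -> Optional[float]:
    """Ratio of total branch length in partition p to partition q,
    computed only over taxa that have data in both partitions."""
    shared = [t for t, bl in table[p].items()
              if bl is not None and table[q].get(t) is not None]
    if not shared:
        return None
    total_q = sum(table[q][t] for t in shared)
    if total_q == 0:
        return None
    return sum(table[p][t] for t in shared) / total_q


def predict_branch_length(table: BranchTable, p: str,
                          taxon: str) -> Optional[float]:
    """Predict the missing branch length of `taxon` in partition p by
    rescaling its branch lengths from every partition where it is present,
    then averaging the rescaled values."""
    estimates = []
    for q, row in table.items():
        if q == p or row.get(taxon) is None:
            continue  # skip the target partition and partitions lacking the taxon
        rate = relative_rate(table, p, q)
        if rate is not None:
            estimates.append(rate * row[taxon])
    return mean(estimates) if estimates else None


if __name__ == "__main__":
    table = {
        "geneA": {"t1": 0.10, "t2": 0.20, "t3": None},
        "geneB": {"t1": 0.05, "t2": 0.10, "t3": 0.08},
    }
    # geneA evolves about twice as fast as geneB on t1 and t2,
    # so t3's geneB branch length (0.08) is scaled up to about 0.16.
    print(round(predict_branch_length(table, "geneA", "t3"), 6))  # 0.16
```

Averaging over all informative partitions keeps the sketch simple. A weighted average, for example by the number of shared taxa, would be an equally plausible choice.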
RESULTS: We present several algorithms for correcting excessively long branch lengths induced by missing data, as well as methods for predicting (imputing) missing sequence data. We evaluate our algorithms by systematically removing sequence data from three empirical and 100 simulated alignments. We then compare the Maximum Likelihood trees inferred from the gappy alignments and from the alignments with predicted sequence data to the trees inferred from the original, complete datasets. Branch lengths of the trees inferred from the datasets with predicted sequences were one to two orders of magnitude more accurate than those of the trees inferred from the alignments with missing data. However, prediction did not affect the Robinson-Foulds (RF) distances between the trees.
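The evaluation compares trees both by topology (RF distance) and by branch lengths. The following is a minimal sketch, assuming each unrooted tree is given as a mapping from bipartitions (splits) to branch lengths. `rf_distance` counts the splits present in only one of the two trees. `branch_length_distance` is one possible branch-length error: the sum of absolute differences over all splits, where a split missing from a tree counts as length 0. The abstract does not state which branch-length measure the authors used, and all names here are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: FrozenSet[str], ref: str) -> Split:
    """Represent a bipartition by the side that does NOT contain `ref`,
    so both orientations of the same split compare equal."""
    s = frozenset(side)
    return taxa - s if ref in s else s


def rf_distance(splits_a: Iterable[Split], splits_b: Iterable[Split]) -> int:
    """Unnormalized Robinson-Foulds distance: the number of bipartitions
    found in exactly one of the two trees."""
    return len(set(splits_a) ^ set(splits_b))


def branch_length_distance(a: Dict[Split, float], b: Dict[Split, float]) -> float:
    """Sum of absolute branch-length differences over the union of splits,
    treating a split absent from one tree as having length 0."""
    return sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in set(a) | set(b))


if __name__ == "__main__":
    taxa, ref = frozenset("ABCDE"), "A"
    # Tree 1: ((A,B),C,(D,E))    Tree 2: ((A,C),B,(D,E))
    t1 = {normalize("AB", taxa, ref): 0.10, normalize("DE", taxa, ref): 0.20}
    t2 = {normalize("AC", taxa, ref): 0.30, normalize("DE", taxa, ref): 0.25}
    print(rf_distance(t1, t2))                        # 2
    print(round(branch_length_distance(t1, t2), 6))   # 0.45
```

Because both trees contain the same taxa, single-taxon (terminal) splits appear in both and cancel out of the RF count. Terminal branch lengths can therefore be added to the dictionaries and will affect only `branch_length_distance`. This mirrors the abstract's finding that branch lengths can improve while RF distances stay unchanged.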
AVAILABILITY AND IMPLEMENTATION: https://github.com/ddarriba/ForeSeqs
CONTACT: diego.darriba@h-its.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2016        PMID: 26733454     DOI: 10.1093/bioinformatics/btv768

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Lineage and role in integrative taxonomy of a heterotrophic orchid complex.

Authors:  Craig F Barrett; Mathilda V Santee; Nicole M Fama; John V Freudenstein; Sandra J Simon; Brandon T Sinn
Journal:  Mol Ecol       Date:  2022-07-22       Impact factor: 6.622

2.  Genomic characterization of three marine fungi, including Emericellopsis atlantica sp. nov. with signatures of a generalist lifestyle and marine biomass degradation.

Authors:  Ole Christian Hagestad; Lingwei Hou; Jeanette H Andersen; Espen H Hansen; Bjørn Altermark; Chun Li; Eric Kuhnert; Russell J Cox; Pedro W Crous; Joseph W Spatafora; Kathleen Lail; Mojgan Amirebrahimi; Anna Lipzen; Jasmyn Pangilinan; William Andreopoulos; Richard D Hayes; Vivian Ng; Igor V Grigoriev; Stephen A Jackson; Thomas D S Sutton; Alan D W Dobson; Teppo Rämä
Journal:  IMA Fungus       Date:  2021-08-09       Impact factor: 3.515

3.  Imputing missing distances in molecular phylogenetics.

Authors:  Xuhua Xia
Journal:  PeerJ       Date:  2018-07-24       Impact factor: 2.984

4.  Museomics Clarifies the Classification of Aloidendron (Asphodelaceae), the Iconic African Tree Aloes.

Authors:  Panagiota Malakasi; Sidonie Bellot; Richard Dee; Olwen M Grace
Journal:  Front Plant Sci       Date:  2019-10-15       Impact factor: 5.753

5.  Divergence time estimation of Galliformes based on the best gene shopping scheme of ultraconserved elements.

Authors:  Peter A Hosner; Donna L Dittmann; John P O'Neill; Sharon M Birks; Edward L Braun; Rebecca T Kimball
Journal:  BMC Ecol Evol       Date:  2021-11-22

6.  Maximize Resolution or Minimize Error? Using Genotyping-By-Sequencing to Investigate the Recent Diversification of Helianthemum (Cistaceae).

Authors:  Sara Martín-Hernanz; Abelardo Aparicio; Mario Fernández-Mazuecos; Encarnación Rubio; J Alfredo Reyes-Betancort; Arnoldo Santos-Guerra; María Olangua-Corral; Rafael G Albaladejo
Journal:  Front Plant Sci       Date:  2019-11-11       Impact factor: 5.753
