ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning.

Shiran Abadi, Oren Avram, Saharon Rosset, Tal Pupko, Itay Mayrose.

Abstract

Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task are computationally expensive and slow, are not always feasible, and sometimes depend on preliminary assumptions that do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks: topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as those obtained using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework and optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model, so predicting from features extracted from the sequence data substantially reduces running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, which concern the extent of sequence divergence, and features that are related to estimates of the model parameters and are important for the selection made by current criteria.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
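
The abstract refers to features "concerning the extent of sequence divergence" that are extracted from the sequence data and passed to a machine-learning model. As a rough illustration only, the sketch below computes a few simple divergence summaries from a small nucleotide alignment. The function names (`p_distance`, `divergence_features`) and the choice of features are assumptions made for this example; they are not taken from the paper and should not be read as ModelTeller's actual feature set or code.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous character
    are skipped. Hypothetical helper, not part of ModelTeller.
    """
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in NUCLEOTIDES and y in NUCLEOTIDES
    ]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_features(alignment):
    """Summarise an alignment given as {taxon_name: aligned_sequence}.

    Returns illustrative features only: alignment size, mean and
    maximum pairwise p-distance, and empirical base frequencies.
    """
    seqs = [s.upper() for s in alignment.values()]
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    counts = {c: sum(s.count(c) for s in seqs) for c in NUCLEOTIDES}
    total = sum(counts.values())
    return {
        "n_taxa": len(seqs),
        "n_sites": len(seqs[0]),
        "mean_p_distance": sum(dists) / len(dists),
        "max_p_distance": max(dists),
        "base_freqs": {c: counts[c] / total for c in NUCLEOTIDES},
    }


# Toy alignment, invented for this example.
toy_alignment = {
    "taxon_a": "ACGTACGT",
    "taxon_b": "ACGTACGA",
    "taxon_c": "ACGAACGA",
}
features = divergence_features(toy_alignment)
```

A feature dictionary like `features` could then be flattened into a numeric vector and given to a regression model (the keywords mention Random Forest for regression) that scores each candidate substitution model. That step is omitted here because the abstract does not specify it.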

Keywords:  Random Forest for regression; machine learning; model selection; nucleotide substitution models; phylogenetic reconstruction; simulations

Year:  2020        PMID: 32585030     DOI: 10.1093/molbev/msaa154

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  10 in total

1.  Excluding Loci With Substitution Saturation Improves Inferences From Phylogenomic Data.

Authors:  David A Duchêne; Niklas Mather; Cara Van Der Wal; Simon Y W Ho
Journal:  Syst Biol       Date:  2022-04-19       Impact factor: 9.160

2.  Felsenstein Phylogenetic Likelihood.

Authors:  David Posada; Keith A Crandall
Journal:  J Mol Evol       Date:  2021-01-13       Impact factor: 2.395

3.  Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty.

Authors:  Stephanie J Spielman; Molly L Miraglia
Journal:  BMC Ecol Evol       Date:  2021-11-29

4.  Embracing Green Computing in Molecular Phylogenetics.

Authors:  Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2022-03-02       Impact factor: 16.240

5.  Current progress and open challenges for applying deep learning across the biosciences. (Review)

Authors:  Nicolae Sapoval; Amirali Aghazadeh; Michael G Nute; Dinler A Antunes; Advait Balaji; Richard Baraniuk; C J Barberan; Ruth Dannenfelser; Chen Dun; Mohammadamin Edrisi; R A Leo Elworth; Bryce Kille; Anastasios Kyrillidis; Luay Nakhleh; Cameron R Wolfe; Zhi Yan; Vicky Yao; Todd J Treangen
Journal:  Nat Commun       Date:  2022-04-01       Impact factor: 14.919

6.  AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era.

Authors:  Nhan Ly-Trong; Suha Naser-Khdour; Robert Lanfear; Bui Quang Minh
Journal:  Mol Biol Evol       Date:  2022-05-03       Impact factor: 8.800

7.  Remarks on phylogeny and molecular variations of criconematid species (Nematoda: Criconematidae) with case studies from Vietnam.

Authors:  Huu Tien Nguyen; Thi Duyen Nguyen; Thi Mai Linh Le; Quang Phap Trinh; Wim Bert
Journal:  Sci Rep       Date:  2022-09-01       Impact factor: 4.996

8.  Human follicular mites: Ectoparasites becoming symbionts.

Authors:  Gilbert Smith; Alejandro Manzano Marín; Mariana Reyes-Prieto; Cátia Sofia Ribeiro Antunes; Victoria Ashworth; Obed Nanjul Goselle; Abdulhalem Abdulsamad A Jan; Andrés Moya; Amparo Latorre; M Alejandra Perotti; Henk R Braig
Journal:  Mol Biol Evol       Date:  2022-06-21       Impact factor: 8.800

9.  Incorporating Machine Learning into Established Bioinformatics Frameworks. (Review)

Authors:  Noam Auslander; Ayal B Gussow; Eugene V Koonin
Journal:  Int J Mol Sci       Date:  2021-03-12       Impact factor: 5.923

10.  Evolutionary Sparse Learning for Phylogenomics.

Authors:  Sudhir Kumar; Sudip Sharma
Journal:  Mol Biol Evol       Date:  2021-10-27       Impact factor: 16.240

