Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning.

Anton Suvorov, Joshua Hochuli, Daniel R Schrider.

Abstract

Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with its own strengths and weaknesses. Both simulation and empirical studies have identified several "zones" of parameter space where the accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e., to assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. In this study, we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. Although numerous practical challenges remain, these findings suggest that deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.
© The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
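
The abstract treats quartet inference as a classification problem: a four-taxon alignment, gapped or ungapped, goes in, and one of the possible unrooted topologies comes out. As a rough illustration of that framing only, the sketch below encodes a four-sequence alignment as a matrix of integers and lists the three unrooted topologies that four taxa admit. The integer codes, the function names `encode_alignment` and `quartet_topologies`, and the toy sequences are all assumptions made for this example; the abstract does not describe the authors' actual input encoding or network architecture.

```python
# Hypothetical integer codes for nucleotides and gaps; the source does not
# specify how alignments are encoded, so this mapping is an assumption.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "-": 4}


def encode_alignment(seqs):
    """Turn four aligned sequences into a 4 x L matrix of integer codes."""
    if len(seqs) != 4:
        raise ValueError("expected exactly four aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    # Characters outside CODES (e.g. ambiguity codes) raise KeyError here.
    return [[CODES[ch] for ch in s.upper()] for s in seqs]


def quartet_topologies(taxa):
    """List the three unrooted topologies for four taxa as pairs of sister pairs."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


# Toy gapped alignment (made up for illustration).
alignment = ["ACGT-A", "ACGTTA", "AC-TTA", "GCGTTA"]
matrix = encode_alignment(alignment)
classes = quartet_topologies(["t1", "t2", "t3", "t4"])
```

In this framing, `matrix` would be the input to a classifier and the index into `classes` would be the label it predicts; the classifier itself is deliberately left out because the abstract gives no details about it.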

Keywords:  Supervised machine learning; convolutional neuronal network; phylogenetics

Year:  2020        PMID: 31504938      PMCID: PMC8204903          DOI: 10.1093/sysbio/syz060

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


