The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment.

Nicola De Maio.

Abstract

Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The "cumulative indel model" approximates realistic evolutionary indel dynamics using differential equations. "Adaptive banding" reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block (≈530 kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods. [Evolutionary alignment; pairHMM; sequence evolution; statistical alignment; statistical genetics.].
© The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
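
Illustrative note (not part of the published abstract): the general idea of banding without prior knowledge of divergence can be sketched with the classic band-doubling trick for unit-cost edit distance. This is an assumption-laden toy, not the paper's adaptive banding and not a pairHMM; the function names `banded_edit_distance` and `edit_distance_doubling` are hypothetical. The dynamic programming table is restricted to cells within `k` of the main diagonal, and `k` is doubled until the banded result is no larger than `k`, at which point the result is provably exact.

```python
def banded_edit_distance(a, b, k):
    """Unit-cost edit distance using only cells with |i - j| <= k.

    Returns None when the final cell (len(a), len(b)) lies outside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i  # a[:i] against the empty prefix of `b`
                continue
            substitution = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
            deletion = prev.get(j, inf) + 1
            insertion = cur.get(j - 1, inf) + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[m]


def edit_distance_doubling(a, b, k0=1):
    """Double the band width until the banded distance d satisfies d <= k.

    An alignment of cost d contains at most d indels, so it never strays
    more than d cells from the diagonal; once d <= k the band is wide
    enough to contain an optimal path and d is the exact distance.
    Returns (distance, final_band_width).
    """
    k = max(k0, abs(len(a) - len(b)), 1)
    while True:
        d = banded_edit_distance(a, b, k)
        if d is not None and d <= k:
            return d, k
        k *= 2


if __name__ == "__main__":
    print(edit_distance_doubling("kitten", "sitting"))
```

For similar sequences the loop stops at a narrow band, so the work scales with sequence length times the divergence rather than with the product of the two lengths. A probabilistic aligner would need a different stopping criterion, which this sketch does not attempt to reproduce.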

Year:  2021        PMID: 32653921     DOI: 10.1093/sysbio/syaa050

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


  5 in total

1.  Correlations between alignment gaps and nucleotide substitution or amino acid replacement.

Authors:  Tae-Kun Seo; Benjamin D Redelings; Jeffrey L Thorne
Journal:  Proc Natl Acad Sci U S A       Date:  2022-08-16       Impact factor: 12.779

2.  Measuring Phylogenetic Information of Incomplete Sequence Data.

Authors:  Tae-Kun Seo; Olivier Gascuel; Jeffrey L Thorne
Journal:  Syst Biol       Date:  2022-04-19       Impact factor: 9.160

3.  A Model of Indel Evolution by Finite-State, Continuous-Time Machines.

Authors:  Ian Holmes
Journal:  Genetics       Date:  2020-10-05       Impact factor: 4.562

4.  Maximum likelihood pandemic-scale phylogenetics.

Authors:  Nicola De Maio; Prabhav Kalaghatgi; Yatish Turakhia; Russell Corbett-Detig; Bui Quang Minh; Nick Goldman
Journal:  bioRxiv       Date:  2022-03-22

5.  Tatajuba: exploring the distribution of homopolymer tracts.

Authors:  Leonardo de Oliveira Martins; Samuel Bloomfield; Emily Stoakes; Andrew J Grant; Andrew J Page; Alison E Mather
Journal:  NAR Genom Bioinform       Date:  2022-02-02