
Correlations between alignment gaps and nucleotide substitution or amino acid replacement.

Tae-Kun Seo, Benjamin D Redelings, Jeffrey L Thorne.

Abstract

To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments that are inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.
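The abstract does not spell out the test statistic, so the following is only an illustrative sketch of what a nonparametric test of independence between gap location and residue variability could look like. It is not the authors' test. It assumes an alignment given as equal-length strings with `-` for gaps, uses a hypothetical per-column diversity score (distinct non-gap residues minus one), and shuffles the gapped/ungapped column labels to build a permutation null distribution. The function names and the choice of statistic are assumptions made for illustration.

```python
import random


def column_diversity(column):
    """Distinct non-gap residues in a column, minus one (0 if conserved or empty)."""
    residues = {c for c in column if c != "-"}
    return max(len(residues) - 1, 0)


def gap_substitution_permutation_test(alignment, n_perm=2000, seed=0):
    """Permutation test: is column diversity independent of gap presence?

    Hypothetical statistic (an assumption, not taken from the paper):
    mean diversity of gapped columns minus mean diversity of ungapped columns.
    Returns (observed_statistic, two_sided_p_value).
    """
    rng = random.Random(seed)
    columns = list(zip(*alignment))
    has_gap = ["-" in col for col in columns]
    diversity = [column_diversity(col) for col in columns]

    n_gapped = sum(has_gap)
    if n_gapped == 0 or n_gapped == len(columns):
        raise ValueError("need both gapped and ungapped columns")

    def statistic(labels):
        gapped = [d for d, g in zip(diversity, labels) if g]
        ungapped = [d for d, g in zip(diversity, labels) if not g]
        return sum(gapped) / len(gapped) - sum(ungapped) / len(ungapped)

    observed = statistic(has_gap)

    # Under the null hypothesis, gap labels are exchangeable across columns.
    labels = list(has_gap)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(statistic(labels)) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    toy = ["AC-GT", "ACAGT", "TC-GA"]
    print(gap_substitution_permutation_test(toy, n_perm=500))
```

One caveat of this toy statistic is that gapped columns contain fewer residues, which by itself lowers their diversity score; a serious analysis would need to correct for that, and would also need to account for the phylogeny relating the sequences.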

Keywords:  deletion; gaps; insertion; substitution

Year: 2022. PMID: 35972964. PMCID: PMC9407537. DOI: 10.1073/pnas.2204435119

Source DB: PubMed. Journal: Proc Natl Acad Sci U S A. ISSN: 0027-8424. Impact factor: 12.779

