Measuring Phylogenetic Information of Incomplete Sequence Data.

Tae-Kun Seo, Olivier Gascuel, Jeffrey L Thorne.

Abstract

Widely used approaches for extracting phylogenetic information from aligned sets of molecular sequences rely upon probabilistic models of nucleotide substitution or amino-acid replacement. The phylogenetic information that can be extracted depends on the number of columns in the sequence alignment and will be decreased when the alignment contains gaps due to insertion or deletion events. Motivated by the measurement of information loss, we suggest assessment of the effective sequence length (ESL) of an aligned data set. The ESL can differ from the actual number of columns in a sequence alignment because of the presence of alignment gaps. Furthermore, the estimation of phylogenetic information is affected by model misspecification. Inevitably, the actual process of molecular evolution differs from the probabilistic models employed to describe this process. This disparity means the amount of phylogenetic information in an actual sequence alignment will differ from the amount in a simulated data set of equal size, which motivated us to develop a new test for model adequacy. Via theory and empirical data analysis, we show how to disentangle the effects of gaps and model misspecification. By comparing the Fisher information of actual and simulated sequences, we identify which alignment sites and tree branches are most affected by gaps and model misspecification. [Fisher information; gaps; insertion; deletion; indel; model adequacy; goodness-of-fit test; sequence alignment.].
© The Author(s) 2021. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
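
As a rough illustration of the idea in the abstract that alignment gaps reduce Fisher information, the sketch below treats the simplest possible case: two aligned sequences and the Jukes-Cantor (JC69) model, with the evolutionary distance d as the only parameter. This is a toy example under stated assumptions, not the ESL definition or the test developed in the paper, neither of which is reproduced in this record. The function names `jc69_site_information` and `toy_effective_length` are hypothetical, and the JC69 model is chosen here only for simplicity.

```python
import math


def jc69_site_information(d):
    """Fisher information about distance d from one ungapped column
    of a two-sequence alignment under JC69 (assumed toy setting)."""
    if d <= 0:
        raise ValueError("d must be positive")
    decay = math.exp(-4.0 * d / 3.0)
    p = 0.75 * (1.0 - decay)   # probability that the two residues differ
    dp = decay                 # derivative of p with respect to d
    # Fisher information of a Bernoulli(p(d)) observation about d
    return dp * dp / (p * (1.0 - p))


def toy_effective_length(seq_a, seq_b, d, gap="-"):
    """Total information in a gapped pair divided by the information
    of a single ungapped column (hypothetical toy measure)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    per_site = jc69_site_information(d)
    # Assumption: a column with a gap in either sequence carries no
    # information about d, so only fully observed columns contribute.
    total = sum(per_site for a, b in zip(seq_a, seq_b)
                if a != gap and b != gap)
    return total / per_site
```

For example, `toy_effective_length("ACGT-A", "AC-TTA", 0.2)` evaluates to about 4 for an alignment of 6 columns, because two columns contain a gap. In this two-sequence setting the ratio simply counts ungapped columns; on a tree with more sequences, a partly gapped column can still carry some information, which is why a measure based on information rather than on column counts is of interest.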

Year:  2022        PMID: 34469581      PMCID: PMC9226685          DOI: 10.1093/sysbio/syab073

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   9.160


  1 in total

1.  Correlations between alignment gaps and nucleotide substitution or amino acid replacement.

Authors:  Tae-Kun Seo; Benjamin D Redelings; Jeffrey L Thorne
Journal:  Proc Natl Acad Sci U S A       Date:  2022-08-16       Impact factor: 12.779
