Factors affecting the errors in the estimation of evolutionary distances between sequences.

D C Hoyle, P G Higgs.

Abstract

Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) give accurate results only when the initial estimates of the pairwise distances are accurate. For many models of sequence evolution, analytical formulae are known that estimate the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates grow as the time $t$ since divergence of the two sequences increases. For long times, log transform formulae applied to finite sequences can give divergent distance estimates. We show that these errors become significant when $t \approx \frac{1}{2}|\lambda_{\max}|^{-1}\log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and $N$ is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If the rate matrix parameters are known with reasonable accuracy, the maximum likelihood method can be used to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way become significant only when $t \approx \frac{1}{2}|\lambda_1|^{-1}\log N$, where $\lambda_1$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. Likelihood-based distance estimates are therefore much more accurate than those based on log transform formulae, particularly when the rate matrix involves a wide range of timescales (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
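To make the two timescales concrete, the following is a minimal sketch that assumes the Kimura two-parameter (K2P) model. The abstract does not say which models were studied; K2P is chosen here only because its rate matrix has eigenvalues $0$, $-4\beta$ and $-2(\alpha+\beta)$ (twice), so a large transition/transversion ratio $\kappa = \alpha/\beta$ pulls $|\lambda_{\max}|$ far away from $|\lambda_1|$. All function names are hypothetical, distances are expressed as expected substitutions per site $d$, and the golden-section search is a generic stand-in for whatever optimizer one prefers, not the authors' method. Kimura's log transform formula $d = -\tfrac12\ln(1-2P-Q) - \tfrac14\ln(1-2Q)$ fails once the sampled value of $1-2P-Q$, whose expectation decays exponentially at rate $|\lambda_{\max}|$, is swamped by sampling noise of order $N^{-1/2}$, which is the intuition behind the $\log N$ dependence.

```python
# Illustrative sketch only: K2P is an assumed example model and every
# function name here is hypothetical, not taken from the paper.
import math


def k2p_eigenvalues(alpha, beta):
    """Eigenvalues of the K2P rate matrix: transition rate alpha and rate
    beta to each of the two transversion targets."""
    return [0.0, -4.0 * beta, -2.0 * (alpha + beta), -2.0 * (alpha + beta)]


def error_onset_times(alpha, beta, n_sites):
    """Evaluate t ~ (1/2)|lambda|^-1 ln N for lambda_max (log transform)
    and lambda_1 (fixed-parameter ML), as quoted in the abstract."""
    nonzero = [abs(x) for x in k2p_eigenvalues(alpha, beta) if x != 0.0]
    lam_max, lam_1 = max(nonzero), min(nonzero)
    half_log_n = 0.5 * math.log(n_sites)
    return half_log_n / lam_max, half_log_n / lam_1


def k2p_log_distance(p_ts, q_tv):
    """Kimura's log transform distance; math.inf when a log argument <= 0."""
    a = 1.0 - 2.0 * p_ts - q_tv  # expectation decays at rate |lambda_max|
    b = 1.0 - 2.0 * q_tv         # expectation decays at rate |lambda_1|
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def k2p_expected(d, kappa):
    """Expected transition (P) and transversion (Q) proportions at distance d
    (substitutions per site) for a fixed ratio kappa = alpha / beta."""
    beta_t = d / (kappa + 2.0)  # because d = (alpha + 2 * beta) * t
    alpha_t = kappa * beta_t
    e_slow = math.exp(-4.0 * beta_t)              # slow mode, |lambda_1|
    e_fast = math.exp(-2.0 * (alpha_t + beta_t))  # fast mode, |lambda_max|
    p_ts = 0.25 + 0.25 * e_slow - 0.5 * e_fast
    q_tv = 0.5 - 0.5 * e_slow
    return p_ts, q_tv


def k2p_ml_distance(n_same, n_ts, n_tv, kappa, d_max=20.0, tol=1e-10):
    """ML distance with kappa held fixed, found by golden-section search
    on [0, d_max] (assumes a single minimum of the negative log-likelihood)."""
    def neg_log_lik(d):
        p_ts, q_tv = k2p_expected(d, kappa)
        total = 0.0
        for count, prob in ((n_same, 1.0 - p_ts - q_tv), (n_ts, p_ts), (n_tv, q_tv)):
            if count > 0:
                if prob <= 0.0:
                    return math.inf
                total -= count * math.log(prob)
        return total

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, d_max
    c, e = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fe = neg_log_lik(c), neg_log_lik(e)
    while hi - lo > tol:
        if fc < fe:  # minimum lies in [lo, e]
            hi, e, fe = e, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = neg_log_lik(c)
        else:        # minimum lies in [c, hi]
            lo, c, fc = c, e, fe
            e = lo + inv_phi * (hi - lo)
            fe = neg_log_lik(e)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical counts for N = 100 sites: 25 identical, 27 transitions, 48 transversions.
    print(k2p_log_distance(0.27, 0.48))             # log argument 1 - 2P - Q < 0
    print(k2p_ml_distance(25, 27, 48, kappa=10.0))  # still a finite estimate
    print(error_onset_times(10.0, 1.0, 100))
```

With these made-up counts the log transform argument $1-2\hat P-\hat Q$ is negative, so the formula has no finite answer, while the likelihood with $\kappa$ fixed still returns a finite distance. For K2P the ratio of the two onset times is $|\lambda_{\max}|/|\lambda_1| = (\kappa+1)/2$, so a large transition/transversion ratio widens the gap, in line with the abstract's remark.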

Year: 2003   PMID: 12519899   DOI: 10.1093/oxfordjournals.molbev.a004230

Source DB: PubMed   Journal: Mol Biol Evol   ISSN: 0737-4038   Impact factor: 16.240


  2 in total

1.  Family-Joining: A Fast Distance-Based Method for Constructing Generally Labeled Trees.

Authors:  Prabhav Kalaghatgi; Nico Pfeifer; Thomas Lengauer
Journal:  Mol Biol Evol       Date:  2016-07-19       Impact factor: 16.240

2.  VNTR polymorphism in the breakpoint region of ABL1 and susceptibility to bladder cancer.

Authors:  Min-Hye Kim; Gi-Eun Yang; Mi-So Jeong; Jeong-Yeon Mun; Sang-Yeop Lee; Jong-Kil Nam; Yung Hyun Choi; Tae Nam Kim; Sun-Hee Leem
Journal:  BMC Med Genomics       Date:  2021-05-05       Impact factor: 3.063

