Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock.

A Zharkikh, W H Li.

Abstract

The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value. Bootstrap estimation is formulated as a two-step sampling procedure: (1) sampling of sequences from the evolutionary process and (2) resampling of the original sequence sample. The probability that a bootstrap resampling of an original sequence sample will support the true tree is found to depend on the model tree, the sequence length, and the probability that a randomly chosen nucleotide site is an informative site. When a trifurcating tree is used as the model tree, the probability that one of the three bifurcating trees will appear in ≥ 95% of the bootstrap replicates is < 5%, even if the number of bootstrap replicates is only 50; therefore, the probability of accepting an erroneous tree as the true tree is < 5% if that tree appears in ≥ 95% of the bootstrap replicates and if more than 50 bootstrap replications are conducted. However, if a particular bifurcating tree is observed in, say, < 75% of the bootstrap replicates, then it cannot be claimed to be better than the trifurcating tree even if ≥ 1,000 bootstrap replications are conducted. When a bifurcating tree is used as the model tree, the bootstrap approach tends to overestimate P when the sequences are very short, but it tends to underestimate that probability when the sequences are long. Moreover, simulation results show that, if a tree is accepted as the true tree only if it has appeared in ≥ 95% of the bootstrap replicates, then the probability of failing to accept any bifurcating tree can be as large as 58% even when P = 95%, i.e., even when 95% of the samples from the evolutionary process will support the true tree. Thus, if the rate-constancy assumption holds, bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa.
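
The two-step sampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It relies on the fact that, for four taxa under maximum parsimony, every informative site supports exactly one of the three bifurcating trees, so each site is reduced to a category: 0 for an uninformative site, and 1, 2, or 3 for the tree it supports. The category probabilities, the sequence length, the function names, and the rule that a tie counts for no tree are all assumptions made for illustration.

```python
import random


def sample_sites(n_sites, probs, rng):
    """Step 1: draw site categories from the evolutionary process.

    Category 0 is an uninformative site; categories 1, 2, 3 mark an
    informative site supporting bifurcating tree 1, 2, or 3.
    `probs` holds hypothetical category probabilities.
    """
    return rng.choices([0, 1, 2, 3], weights=probs, k=n_sites)


def parsimony_winner(sites):
    """Return the tree supported by strictly the most informative sites.

    Returns None on a tie (an assumption of this sketch).
    """
    support = [sites.count(tree) for tree in (1, 2, 3)]
    best = max(support)
    if support.count(best) > 1:
        return None
    return support.index(best) + 1


def bootstrap_support(sites, n_replicates, rng):
    """Step 2: resample the original sites with replacement.

    Returns, for each tree, the fraction of bootstrap replicates in
    which that tree is the parsimony winner.
    """
    wins = {1: 0, 2: 0, 3: 0}
    n = len(sites)
    for _ in range(n_replicates):
        replicate = rng.choices(sites, k=n)
        winner = parsimony_winner(replicate)
        if winner is not None:
            wins[winner] += 1
    return {tree: count / n_replicates for tree, count in wins.items()}


rng = random.Random(0)
# Hypothetical model: tree 1 is the true tree and gets the most support.
sites = sample_sites(200, [0.85, 0.07, 0.04, 0.04], rng)
support = bootstrap_support(sites, 100, rng)
print(support)
```

Setting the three informative-site probabilities equal mimics a trifurcating model tree. Repeating step 1 many times then gives an estimate of how often any bifurcating tree reaches a chosen threshold such as 95% of replicates.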

Year:  1992        PMID: 1435238     DOI: 10.1093/oxfordjournals.molbev.a040782

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  40 in total

1.  Complexity of the simplest phylogenetic estimation problem.

Authors:  Z Yang
Journal:  Proc Biol Sci       Date:  2000-01-22       Impact factor: 5.349

2.  Evolution of genes and taxa: a primer. (Review)

Authors:  J J Doyle; B S Gaut
Journal:  Plant Mol Biol       Date:  2000-01       Impact factor: 4.076

3.  SWORDS: a statistical tool for analysing large DNA sequences. (Review)

Authors:  Probal Chaudhuri; Sandip Das
Journal:  J Biosci       Date:  2002-02       Impact factor: 1.826

4.  Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics.

Authors:  Yoshiyuki Suzuki; Galina V Glazko; Masatoshi Nei
Journal:  Proc Natl Acad Sci U S A       Date:  2002-11-25       Impact factor: 11.205

5.  Molecular evidence of HIV-1 transmission in a criminal case.

Authors:  Michael L Metzker; David P Mindell; Xiao-Mei Liu; Roger G Ptak; Richard A Gibbs; David M Hillis
Journal:  Proc Natl Acad Sci U S A       Date:  2002-10-18       Impact factor: 11.205

6.  Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences: II. Four taxa without a molecular clock.

Authors:  A Zharkikh; W H Li
Journal:  J Mol Evol       Date:  1992-10       Impact factor: 2.395

7.  Statistical measures of uncertainty for branches in phylogenetic trees inferred from molecular sequences by using model-based methods. (Review)

Authors:  Borys Wróbel
Journal:  J Appl Genet       Date:  2008       Impact factor: 3.240

8.  A tandemly repeated DNA family originated from SINE-related elements in the European plethodontid salamanders (Amphibia, Urodela).

Authors:  R Batistoni; G Pesole; S Marracci; I Nardi
Journal:  J Mol Evol       Date:  1995-06       Impact factor: 2.395

9.  Human T-lymphotropic virus type 1 in coastal natives of British Columbia: phylogenetic affinities and possible origins.

Authors:  F J Picard; M B Coulthart; J Oger; E E King; S Kim; J Arp; G P Rice; G A Dekaban
Journal:  J Virol       Date:  1995-11       Impact factor: 5.103

10.  Bacterial community structures of phosphate-removing and non-phosphate-removing activated sludges from sequencing batch reactors.

Authors:  P L Bond; P Hugenholtz; J Keller; L L Blackall
Journal:  Appl Environ Microbiol       Date:  1995-05       Impact factor: 4.792
