Konrad Scheffler, Ben Murrell, Sergei L Kosakovsky Pond.
Abstract
Evolutionary models that make use of site-specific parameters have recently been criticized on the grounds that parameter estimates obtained under such models can be unreliable and lack theoretical guarantees of convergence. We present a simulation study providing empirical evidence that a simple version of the models in question does exhibit sensible convergence behavior and that additional taxa, despite not being independent of each other, lead to improved parameter estimates. Although it would be desirable to have theoretical guarantees of this, we argue that such guarantees would not be sufficient to justify the use of these models in practice. Instead, we emphasize the importance of taking the variance of parameter estimates into account rather than blindly trusting point estimates. This is standardly done by using the models to construct statistical hypothesis tests, which are then validated empirically via simulation studies.
Year: 2014 PMID: 24722425 PMCID: PMC3983186 DOI: 10.1371/journal.pone.0094534
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Convergence of site-specific and branch-specific parameter estimates with increasing data set size.
A–B and D–E: estimated versus true parameter values for site-specific rate parameters estimated from small (A–B) and large (D–E) simulated data sets using the true tree topologies (A,D) and tree topologies inferred by the neighbor-joining algorithm (B,E). C and F: estimated versus true parameter values for branch-specific rate parameters estimated from small (C) and large (F) simulated data sets. See text for details.
Figure 2. Degree of normality depends on rate and branch length.
Each point on the scatter plot depicts the p-value of a Kolmogorov-Smirnov (KS) test for the normality of the maximum likelihood parameter estimates (MLEs). The p-value distribution corresponding to any particular rate or branch length value can be evaluated visually by considering a vertical slice through the plot. When MLEs are normally distributed, p-values will be uniformly distributed. The red curve displays the (kernel-weighted) local average of the p-values, which should be near 0.5 when normality is achieved, and lower when it is rejected. For some range of true parameter values, normality is achieved for both sites (A, using the true tree topologies, and B, using tree topologies inferred by the neighbor-joining algorithm) and branches (C). However, at lower rates and shorter branch lengths, the KS test identifies systematic departures from normality, indicating that the effective sample size is likely too small for the asymptotic distribution to be reached.
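The reasoning behind the red curve in Figure 2 can be illustrated with a toy sketch in Python. It is not the authors' pipeline: the "estimates" here are drawn directly from a known distribution rather than fitted under an evolutionary model, the x values are a hypothetical stand-in for the true rate or branch length and do not influence the draws, and the names `ks_pvalue`, `simulate`, and `local_average` are invented for this illustration. The KS p-value uses the standard asymptotic series with a small-sample correction and assumes a fully specified standard normal reference.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_pvalue(sample, cdf=normal_cdf):
    """One-sample KS test against a fully specified CDF (asymptotic p-value)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Largest gap between the empirical CDF and the reference CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    # Small-sample correction applied before the asymptotic Kolmogorov series.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the tail probability is indistinguishable from 1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


def simulate(draw, n_points=400, n_estimates=200, seed=1):
    """Return (x, p) pairs: one KS p-value per simulated scatter-plot point."""
    rng = random.Random(seed)
    xs, ps = [], []
    for _ in range(n_points):
        xs.append(rng.uniform(0.1, 2.0))  # hypothetical plotting coordinate
        ps.append(ks_pvalue([draw(rng) for _ in range(n_estimates)]))
    return xs, ps


def local_average(x0, xs, ys, bandwidth):
    """Gaussian-kernel-weighted average of ys around x0."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


# Normally distributed "estimates": p-values are roughly uniform,
# so the local average should sit near 0.5.
xs, ps = simulate(lambda rng: rng.gauss(0.0, 1.0))
print("normal:", local_average(1.0, xs, ps, bandwidth=0.2))

# Skewed "estimates" (shifted exponential, mean 0 and variance 1):
# normality is rejected and the local average drops toward 0.
xs, ps = simulate(lambda rng: rng.expovariate(1.0) - 1.0)
print("skewed:", local_average(1.0, xs, ps, bandwidth=0.2))
```

If the mean and variance of the reference normal were instead estimated from each sample, the p-values would no longer be uniform under normality, so the 0.5 benchmark applies only when the reference distribution is fixed in advance, as assumed in this sketch.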