Abstract
BACKGROUND: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, both the status and nature of UCA have recently been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set.
Year: 2011 PMID: 22114984 PMCID: PMC3314578 DOI: 10.1186/1745-6150-6-60
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Extreme value probability distribution for similarity scores. A standard extreme value distribution (EVD) is shown. In Karlin-Altschul statistics, similarity scores (S) for alignments of random sequences are assumed to follow an EVD. P-values are based on the tail probability, shown as the shaded blue area, corresponding to the probability of observing a similarity score greater than or equal to the observed similarity score (So) for a given alignment of interest.
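The tail probability described in this caption can be sketched numerically. The snippet below uses the standard Karlin-Altschul relations, E = K·m·n·exp(-λS) and P = 1 - exp(-E), to turn an observed score into an E-value and a P-value. The function name and the values chosen for `lam`, `k`, and the sequence lengths are illustrative placeholders, not parameters taken from the paper.

```python
import math


def evd_tail_probability(score, lam, k, m, n):
    """Return (E-value, P-value) for an observed similarity score.

    score : observed raw similarity score (So in the caption)
    lam, k: Karlin-Altschul scale parameters (placeholder values below)
    m, n  : lengths of the two sequences being compared
    """
    # Expected number of chance alignments scoring at least `score`.
    e_value = k * m * n * math.exp(-lam * score)
    # Upper tail of the extreme value distribution: P(S >= score) = 1 - exp(-E).
    # expm1 keeps precision when E is very small.
    p_value = -math.expm1(-e_value)
    return e_value, p_value


# Illustrative call with made-up parameters (assumptions, not from the source).
e_value, p_value = evd_tail_probability(score=50, lam=0.3, k=0.1, m=300, n=300)
print(f"E = {e_value:.3g}, P = {p_value:.3g}")
```

For small E the two quantities are nearly equal, which is why E-values and P-values are often used interchangeably for highly significant hits.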
Figure 2. Example phylogeny. A toy phylogeny of six sequences, represented by the letters A-F.
Significant similarity among four artificially constructed sequences.
| Sequence | E | M | P |
|---|---|---|---|
| B | 4e-50 | 1e-17 | 1e-27 |
| E | | 4e-31 | 1e-17 |
| M | | | 2e-36 |
E-values from pairwise BLAST comparisons between four protein sequences (called B, E, M, and P), as reported by the NCBI bl2seq utility (see Appendix 2, Additional File 1 for the sequences).
Model selection scores for four artificially constructed, significantly similar sequences.
| Model | ln Lik | LLR | K | ΔAIC |
|---|---|---|---|---|
| CA (BEMP) | -5411 | 0 | 25 | 0 |
| IA (BE+MP) | -5092 | -319 | 44 | -300 |
| IA (BM+EP) | -5009 | -402 | 44 | -383 |
| IA (BP+EM) | -5038 | -373 | 44 | -354 |
| IA (E+BMP) | -5115 | -296 | 42 | -279 |
| IA (M+BEP) | -5128 | -283 | 42 | -266 |
| IA (P+BEM) | -5060 | -351 | 42 | -334 |
| IA (B+EMP) | -5022 | -389 | 42 | -372 |
Common ancestry is compared with various independent ancestry models for four sequences with highly significant similarity to each other (see Appendix 2, Additional File 1 for the sequences). CA, common ancestry; IA, independent ancestry; LLR, log-likelihood ratio; K, number of parameters in model; AIC, Akaike information criterion. Model selection scores are relative to the common ancestry model; negative scores (LLR and ΔAIC) indicate better models. Independent ancestry model "BE+MP" indicates that B and E are homologous to each other, and that M and P are homologous to each other, yet B and E have an independent ancestry from M and P.
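The relative scores in the table can be reproduced from the ln Lik and K columns. The sketch below assumes the AIC is expressed on the log-likelihood scale, AIC = ln Lik - K, so that ΔAIC = LLR + (K_IA - K_CA). That convention is inferred from the tabulated numbers and is not stated in the caption; it differs from the more common form 2K - 2 ln Lik by a factor of -2. The function and variable names are illustrative.

```python
# (model, ln likelihood, number of parameters K), copied from the table above.
MODELS = [
    ("CA (BEMP)", -5411, 25),
    ("IA (BE+MP)", -5092, 44),
    ("IA (BM+EP)", -5009, 44),
    ("IA (BP+EM)", -5038, 44),
    ("IA (E+BMP)", -5115, 42),
    ("IA (M+BEP)", -5128, 42),
    ("IA (P+BEM)", -5060, 42),
    ("IA (B+EMP)", -5022, 42),
]


def relative_scores(models, reference="CA (BEMP)"):
    """Return {model: (LLR, delta_AIC)} relative to the reference model."""
    ref_lnl, ref_k = next((lnl, k) for name, lnl, k in models if name == reference)
    scores = {}
    for name, lnl, k in models:
        # Log-likelihood ratio: reference minus alternative.
        llr = ref_lnl - lnl
        # Assumed convention: AIC = ln Lik - K (log-likelihood scale).
        delta_aic = (ref_lnl - ref_k) - (lnl - k)
        scores[name] = (llr, delta_aic)
    return scores


for name, (llr, delta_aic) in relative_scores(MODELS).items():
    print(f"{name:12s} LLR = {llr:5d}  dAIC = {delta_aic:5d}")
```

Under this convention the parameter penalty only partly offsets the likelihood gain: the independent ancestry models carry 17 to 19 more parameters than common ancestry, yet every one of them still scores better for these artificial sequences.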
Model selection scores for Koonin and Wolf's artificial data.
| Model | ln marginal lik (± SEM) | ln BF |
|---|---|---|
| profile | -7521 (21) | 0 |
| CA | -7646 (20) | 125 |
| Star | -7813 (20) | 292 |
| IA | -8164 (14) | 643 |
Log marginal likelihoods and Bayes factors for Koonin and Wolf's artificial data. Values reported are averages and standard error of the mean (SEM) for the 100 simulated alignments. CA, common ancestry; IA, independent ancestry; BF, Bayes factor.
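The ln BF column is consistent with taking each model's log marginal likelihood and subtracting it from that of the best-scoring model, so the best model has ln BF = 0 and larger values indicate worse models. A minimal sketch of that arithmetic, using the mean values from the table above (the dictionary and function names are illustrative):

```python
# Mean ln marginal likelihoods from the table above.
LN_MARGINAL = {
    "profile": -7521,
    "CA": -7646,
    "Star": -7813,
    "IA": -8164,
}


def ln_bayes_factors(ln_marginals):
    """Return ln BF of the best model over each model (best model gets 0)."""
    best = max(ln_marginals.values())
    # A ratio of marginal likelihoods is a difference on the log scale.
    return {model: best - value for model, value in ln_marginals.items()}


for model, ln_bf in ln_bayes_factors(LN_MARGINAL).items():
    print(f"{model:8s} ln BF = {ln_bf}")
```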
Model selection scores for the universal protein data.
| Model | ln marginal lik | ln BF |
|---|---|---|
| CA (ABE) | -126,713 | 0 |
| IA (AE+B) | -133,602 | 6,889 |
| IA (AB+E) | -134,744 | 8,031 |
| IA (BE+A) | -135,201 | 8,488 |
| IA (ABE_M +M) | -138,899 | 12,186 |
| IA (A+B+E) | -140,578 | 13,865 |
| IA (ABE_H +H) | -140,713 | 14,001 |
| star | -148,883 | 22,170 |
| profile | -151,145 | 24,432 |
Log marginal likelihoods and Bayes factors for the real, universal protein data set.