Klara L Verbyla, Von Bing Yap, Anuj Pahwa, Yunli Shao, Gavin A Huttley.
Abstract
Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability: a process is non-embeddable if it cannot be embedded in a continuous-time, time-homogeneous Markov process. In this study, non-embeddability was shown to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and edges with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could affect the correct prediction of phylogenies, but only at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time-homogeneity assumption, and consequently its impact is likely to be minor.
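Under a continuous-time, time-homogeneous process with rate matrix Q, the substitution matrix for an edge of length t is P(t) = exp(Qt). Q has non-negative off-diagonal entries and rows that sum to zero. It follows that P(s)P(t) = P(s + t). A matrix P is embeddable when it can be written as exp(Q) for some such Q.

The sketch below illustrates this under assumptions that are not the paper's:

- It uses a Jukes–Cantor rate matrix, chosen only for illustration.
- It computes the matrix exponential in pure Python by scaling and squaring a truncated Taylor series.
- `expm`, `matmul` and `jukes_cantor_q` are hypothetical helper names.

```python
import math


def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, squarings=10, terms=20):
    """exp(Q t) via scaling and squaring of a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # term_k = a^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)  # undo the scaling: exp(A)^(2^s)
    return result


def jukes_cantor_q(alpha):
    """Illustrative generator: equal off-diagonal rates, rows sum to 0."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


q = jukes_cantor_q(0.1)
p2 = expm(q, 2.0)
# Closed form for Jukes-Cantor: P_ii(t) = 1/4 + 3/4 * exp(-4 * alpha * t)
print(p2[0][0], 0.25 + 0.75 * math.exp(-0.8))
# Time-homogeneity implies the semigroup property P(0.5) P(1.5) = P(2.0)
p_split = matmul(expm(q, 0.5), expm(q, 1.5))
print(max(abs(p_split[i][j] - p2[i][j]) for i in range(4) for j in range(4)))
```

In practice a library routine such as SciPy's `scipy.linalg.expm` would replace the hand-rolled series.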
Year: 2013 PMID: 23935949 PMCID: PMC3728303 DOI: 10.1371/journal.pone.0069187
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Rooted (a) and unrooted (b) phylogenetic trees with embeddable (single ) and non-embeddable (multiple ) edges.
Summary of Datasets.

| Data set | Taxa | Number of alignments | Sequence length (bp) | Total tree length |
|---|---|---|---|---|
| D1: Nuclear protein-coding genes | O,M,H | 8193 | >300 | 1.7081 |
|  | O,M,R | 8014 | >300 | 1.5622 |
|  | H,M,R | 8394 | >300 | 0.6890 |
| D2: Mitochondrial protein-coding genes | M,H,O | 11 | 67–598 | 3.7267 |
| D3: Primate introns | C,H,Ma | 62 | >50,000 | 0.0763 |
| D4: Microbial protein-coding genes | bad, bas, bba, bbu, bpn, bvu, cjk, dps, eca, ent, kra, lla, lre, mgi, mle, mta, ppe, pth, sma, wsu | 1 | 591–867 | 1.935 |

C: Chimpanzee, H: Human, M: Mouse, Ma: Macaque, O: Opossum, R: Rat, bad: Bifidobacterium adolescentis, bas: Buchnera aphidicola Sg, bba: Bdellovibrio bacteriovorus, bbu: Borrelia burgdorferi B31, bpn: Candidatus Blochmannia pennsylvanicus, bvu: Bacteroides vulgatus, cjk: Corynebacterium jeikeium, dps: Desulfotalea psychrophila, eca: Pectobacterium atrosepticum, ent: Enterobacter sp. 638, kra: Kineococcus radiotolerans, lla: Lactococcus lactis subsp. lactis IL1403, lre: Lactobacillus reuteri DSM 20016, mgi: Mycobacterium gilvum, mle: Mycobacterium leprae TN, mta: Moorella thermoacetica, ppe: Pediococcus pentosaceus, pth: Pelotomaculum thermopropionicum, sma: Streptomyces avermitilis, wsu: Wolinella succinogenes. Average length from consensus tree. All possible triads (1140).
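The 1140 triads for D4 match the number of ways to choose 3 of the 20 microbial species listed above: C(20, 3) = 1140. The following is a minimal check using only the Python standard library. The species codes are copied from the table and the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# The 20 microbial species codes from the D4 row of the dataset summary.
SPECIES = ["bad", "bas", "bba", "bbu", "bpn", "bvu", "cjk", "dps", "eca", "ent",
           "kra", "lla", "lre", "mgi", "mle", "mta", "ppe", "pth", "sma", "wsu"]

triads = list(combinations(SPECIES, 3))  # unordered triples, no repeats
print(len(SPECIES), len(triads), comb(len(SPECIES), 3))  # 20 1140 1140
```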
Markov Process Assumptions for an Edge.

| Assumption | Continuous | Discrete (BH) |
|---|---|---|
| Time-homogeneity | √ | X |
| Reversibility | X | X |
| Stationarity | X | X |
| Independent sites | √ | √ |
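Neither model assumes stationarity or reversibility (the X entries above). Both can be stated as checks on an edge's substitution matrix P:

- **Stationarity:** a base composition π is stationary when πP = π.
- **Reversibility:** P is reversible when it satisfies detailed balance, π_i P_ij = π_j P_ji.

The sketch below rests on these assumptions:

- P is irreducible and aperiodic, so power iteration finds π.
- The two example matrices are made up for illustration and are not fitted to the paper's data.
- The function names are hypothetical.

```python
def stationary(p, iterations=10_000, tol=1e-14):
    """Stationary distribution of row-stochastic P by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        new = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def is_stationary(p, comp, tol=1e-9):
    """True if base composition `comp` is unchanged along the edge: comp P == comp."""
    n = len(p)
    return all(abs(sum(comp[i] * p[i][j] for i in range(n)) - comp[j]) < tol
               for j in range(n))


def is_reversible(p, pi, tol=1e-9):
    """Detailed balance: pi_i * P_ij == pi_j * P_ji for every pair of states."""
    n = len(p)
    return all(abs(pi[i] * p[i][j] - pi[j] * p[j][i]) < tol
               for i in range(n) for j in range(n))


# Hypothetical example matrices (not fitted to any data in the paper).
jc = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]
cyclic = [[0.7, 0.2, 0.05, 0.05],
          [0.05, 0.7, 0.2, 0.05],
          [0.05, 0.05, 0.7, 0.2],
          [0.2, 0.05, 0.05, 0.7]]
for name, p in (("jc", jc), ("cyclic", cyclic)):
    pi = stationary(p)
    print(name, is_stationary(p, pi), is_reversible(p, pi))
```

The cyclic matrix has a uniform stationary distribution but violates detailed balance. A process can therefore be stationary without being reversible.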
Summary of the Two Markov Models.

| Edge | Tested | Mixed model | Discrete model |
|---|---|---|---|
| 1 | Yes | Continuous | Discrete |
| 2 | No | Discrete | Discrete |
| 3 | No | Discrete | Discrete |

Tested for non-embeddability. Assumption of local time-homogeneity.
Figure 2. Testing scheme.
Non-Embeddability – D1 Human, Rat, Mouse Triad (8394 Alignments).

| Edge | Codon position | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|---|
| Human | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 16 | 16 | 0 | 3 | 91 | 107 | 6 (5.6) |
| Mouse | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| Rat | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 0 | 0 | 0 | 0 | 2 | 2 | 0 |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value (percentage of total tests). 1 alignment failed to find stable estimates.
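Two of the five checks can be sketched without an eigenvalue solver. The block below rests on these assumptions:

- **Step 1:** taken to be the standard condition det(P) ≤ 0.
- **Step 5:** taken to flag a principal matrix logarithm log(P) whose smallest off-diagonal entry falls below the stated −0.1 threshold.
- **Steps 2 to 4:** omitted, because they need the eigenvalues of P.
- **Logarithm:** computed with the series log(I + A) = A − A²/2 + A³/3 − …, where A = P − I. The series converges to the principal logarithm only when every eigenvalue of A has modulus below 1. This is assumed here, not checked.
- **Names:** all function names are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # each row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def principal_log(p, tol=1e-15, max_terms=5000):
    """log(P) = sum_{k>=1} (-1)^(k+1) (P - I)^k / k  (Mercator series)."""
    n = len(p)
    a = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in a]  # holds (P - I)^k, starting at k = 1
    log_p = [[0.0] * n for _ in range(n)]
    for k in range(1, max_terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                log_p[i][j] += sign * power[i][j] / k
        if max(abs(v) for row in power for v in row) / k < tol:
            break  # remaining terms are negligible
        power = matmul(power, a)
    return log_p


def non_embeddable_steps(p, threshold=-0.1):
    """True in a flag means 'non-embeddable according to this step'."""
    flags = {"step1_det": det(p) <= 0.0, "step5_log": False}
    if not flags["step1_det"]:  # the log series is not attempted when det(P) <= 0
        q = principal_log(p)
        n = len(q)
        min_off = min(q[i][j] for i in range(n) for j in range(n) if i != j)
        flags["step5_log"] = min_off < threshold
    return flags


# Hypothetical 4x4 examples (not taken from the paper's alignments).
jc = [[0.8 if i == j else 0.2 / 3 for j in range(4)] for i in range(4)]
swap = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
cyclic = [[0.6, 0.4, 0, 0], [0, 0.6, 0.4, 0], [0, 0, 0.6, 0.4], [0.4, 0, 0, 0.6]]
for name, p in (("jc", jc), ("swap", swap), ("cyclic", cyclic)):
    print(name, non_embeddable_steps(p))
```

The results for the three examples are:

- **Swap matrix:** fails step 1, because its determinant is −1.
- **Cyclic matrix:** passes step 1 (det ≈ 0.104), but its logarithm has an off-diagonal entry near −0.24, so step 5 flags it.
- **Jukes–Cantor-style matrix:** passes both checks.

A production version would use a library matrix logarithm (for example SciPy's `scipy.linalg.logm`) and an eigenvalue routine for steps 2 to 4.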
Non-Embeddability – D4 Microbial Protein-Coding Gene (1140 Triads).

| Codon position | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 2 | 2 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 574 | 591 | 0 | 470 | 1052 | 1122 | 27 |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value . 584 alignments failed to find stable estimates.
Non-Embeddability – D1 Opossum, Mouse, Human Triad (8194 Alignments).

| Edge | Codon position | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|---|
| Opossum | 1 | 3 | 3 | 0 | 3 | 4 | 7 | 1 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 73 | 76 | 0 | 40 | 478 | 547 | 40 (7.3) |
| Human | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 24 | 24 | 0 | 14 | 75 | 99 | 12 (12.1) |
| Mouse | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 20 | 20 | 0 | 19 | 87 | 108 | 5 (4.6) |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value (percentage of total tests). 1 alignment failed to find stable estimates.
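The bracketed percentages in the NE processes columns are consistent with dividing the number of bootstrap rejections by the NE count in the same row. For example, 40 / 547 ≈ 7.3% for the opossum third codon position.

The following small sketch reproduces them. It rounds halves up so that 5 / 16 = 31.25% comes out as the 31.3 reported for the D3 macaque edge (Python's built-in `round` would give 31.2). The helper name `percent` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(rejections, ne_total):
    """Percentage to one decimal place, rounding halves up (31.25 -> 31.3)."""
    value = Decimal(100 * rejections) / Decimal(ne_total)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


# (bootstrap rejections, NE count) for the third codon position,
# copied from the D1 Opossum, Mouse, Human table
third_position = {"Opossum": (40, 547), "Human": (12, 99), "Mouse": (5, 108)}
for edge, (rejected, ne) in third_position.items():
    print(edge, percent(rejected, ne))
```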
Non-Embeddability – D2 (Opossum) Mitochondrial Protein-Coding Genes (11 Alignments).

| Edge | Codon position | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|---|
| Opossum | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 1 | 1 | 0 | 1 | 8 | 9 | 0 |
| Mouse | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 2 | 2 | 0 | 0 | 5 | 7 | 1 |
| Human | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 1 | 1 | 0 | 0 | 8 | 9 | 0 |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value (percentage of total tests). 3 alignments failed to find stable estimates.
Non-Embeddability – D3: Primate Introns Dinucleotide Model (62 Alignments).

| Edge | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|
| Macaque | 0 | 0 | 0 | 0 | 16 | 16 | 5 (31.3) |
| Human | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Chimpanzee | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value (percentage of total tests).
Phylogenetic reconstruction results.

| Species or gene | Codon position 1 | Codon position 2 | Codon position 3 | Total possible tetrads or alignments |
|---|---|---|---|---|
|  | 135 | 69 | 910 | 4845 |
|  | 85 | 67 | 135 | 4845 |
| Mammalian | 3 | 1 | 8 | 8005 |

Microbial tetrads for 20 species (numA: N utilization substance protein A; IF2: translation initiation factor IF-2). 4845: total number of tetrads. 8005: total number of alignments.
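The 4845 tetrads correspond to the C(20, 4) = 4845 ways of choosing four of the 20 microbial species. Each tetrad admits exactly three unrooted binary topologies, so reconstructing a tetrad's tree amounts to choosing one of three splits.

The following minimal sketch uses the Python standard library. `quartet_topologies` is a hypothetical helper name, and species are represented by index.

```python
from itertools import combinations
from math import comb


def quartet_topologies(tetrad):
    """The three unrooted binary trees on four taxa, written as splits (ab|cd)."""
    a, b, c, d = tetrad
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


n_species = 20  # microbial species in D4
tetrads = list(combinations(range(n_species), 4))
print(len(tetrads), comb(n_species, 4))  # 4845 4845
print(quartet_topologies(("w", "x", "y", "z")))
```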
Figure 3. Average difference between the matrices produced by the continuous and discrete models for alignments with non-embeddable matrices that were found to be (a) non-embeddable or (b) embeddable using the parametric bootstrap.
Where represents a larger transition probability in (e.g. ) and ▪ indicates a larger transition probability in (e.g. ).
Figure 4. GC percentage for the non-embeddable () vs embeddable (▪) matrices for the Mouse, Human and Opossum triad at the third codon position.
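The following sketch computes the GC percentage at the third codon position (GC3), the quantity compared in Figure 4. It is illustrative only: the text does not state how the paper computed this value (per sequence or per alignment, or how gaps were handled). The sketch makes these assumptions:

- The sequence is an in-frame coding sequence starting at codon position 1.
- A trailing partial codon is ignored.
- Any non-G/C character counts in the denominator.

`gc3_percent` is a hypothetical helper name.

```python
def gc3_percent(seq):
    """GC percentage over the third position of each complete codon."""
    seq = seq.upper()
    thirds = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


print(gc3_percent("ATGGCCTTA"))  # third positions G, C, A
```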
Non-Embeddability – D1 Opossum, Rat, Mouse Triad (8014 Alignments).

| Edge | Codon position | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | NE | NE processes |
|---|---|---|---|---|---|---|---|---|
| Opossum | 1 | 0 | 0 | 0 | 0 | 4 | 4 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 117 | 119 | 0 | 26 | 638 | 777 | 43 (5.5) |
| Mouse | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 1 | 1 | 0 | 1 | 2 | 2 | 0 |
| Rat | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |

Steps to identify non-embeddability: 1. ; 2. negative eigenvalues have odd algebraic multiplicity; 3. complex eigenvalues occur in non-conjugate pairs; 4. the set of eigenvalues, , lies outside the region in the complex plane; 5. negative off-diagonals, threshold −0.1. NE = non-embeddable. NE processes = number of rejections of from the parametric bootstrap scheme with a p-value (percentage of total tests). 1 alignment failed to find stable estimates.