Harold W Schranz, Von Bing Yap, Simon Easteal, Rob Knight, Gavin A Huttley.
Abstract
BACKGROUND: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature and considered the suitability of different algorithms for their computation.
Year: 2008 PMID: 19099591 PMCID: PMC2639438 DOI: 10.1186/1471-2105-9-550
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
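The eigendecomposition route described in the abstract can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation; the function name `exp_eig`, the rate matrix `Q` and the time `t` are hypothetical values chosen for the example.

```python
import numpy as np

def exp_eig(Q, t=1.0):
    """Return P(t) = exp(Q t) via the eigendecomposition Q = V diag(w) V^-1."""
    w, V = np.linalg.eig(Q)                      # eigenvalues may be complex for non-reversible Q
    P = (V * np.exp(w * t)) @ np.linalg.inv(V)   # V diag(exp(w t)) V^-1
    return P.real                                # imaginary parts are rounding noise for real Q

# Hypothetical non-reversible nucleotide rate matrix (assumption, not from the study):
# off-diagonal rates are non-negative and each row sums to zero.
Q = np.array([
    [0.0, 0.3, 0.1, 0.2],
    [0.1, 0.0, 0.4, 0.1],
    [0.2, 0.1, 0.0, 0.3],
    [0.4, 0.2, 0.1, 0.0],
])
np.fill_diagonal(Q, -Q.sum(axis=1))

P = exp_eig(Q, t=0.5)
print(P.sum(axis=1))   # each row of a valid P should sum to 1
```

The inversion of `V` is the step that becomes unreliable when the eigenvectors are nearly linearly dependent, which is one way a rate matrix can be 'pathological' for this approach.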
Exponentiation of matrices from microbes.
| Nuc | (0.062, 0.59) | (1.4, 4.6) | (7.1e-06, 2e-05) | (7.1e-06, 2e-05) | (7.1e-06, 2e-05) | 0 | 0 | 272 |
| Dinuc 1+2 | (0.082, 1.9) | (4.7, inf) | (0.00012, 0.0032) | (0.00012, inf) | (0.00012, 0.0032) | 0 | 3 | 272 |
| Dinuc 2+3 | (0.19, 3.1) | (3, 1e+02) | (0.00015, 0.0079) | (0.00015, 0.0079) | (0.00015, 0.0079) | 0 | 1 | 272 |
| Trinuc | (3.2e+02, 4.4e+02) | (19, 3.5e+02) | (3.7e+84, 1.5e+136) | (0.22, 0.23) | (0.22, 0.23) | 256 | 0 | 257 |
1 – median and maximum values; 2 – Number of P matrices for the indicated algorithm with an invalid probability; 3 – total number of matrices; inf – an infinite difference, typically arising from an exponentiation error.
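The count of P matrices with an invalid probability could be obtained with a check along these lines. This is a sketch that assumes "invalid" means an entry that is not finite or that falls outside [0, 1]; the function names and the tolerance `tol` are hypothetical.

```python
import math

def has_invalid_probability(P, tol=1e-9):
    """True if any entry of P is non-finite or lies outside [0, 1] (within tol)."""
    for row in P:
        for p in row:
            if not math.isfinite(p) or p < -tol or p > 1.0 + tol:
                return True
    return False

def count_invalid(matrices, tol=1e-9):
    """Number of matrices that contain at least one invalid probability."""
    return sum(has_invalid_probability(P, tol) for P in matrices)

# Toy matrices (assumptions for illustration only).
good = [[0.9, 0.1], [0.2, 0.8]]
negative = [[1.1, -0.1], [0.2, 0.8]]
overflow = [[float("inf"), 0.0], [0.2, 0.8]]

print(count_invalid([good, negative, overflow]))  # prints 2
```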
Exponentiation of matrices from primate intron sequences.
| Nuc | (0.015, 0.047) | (1.5, 56) | (6.8e-06, 1.7e-05) | (6.8e-06, 1.7e-05) | (6.8e-06, 1.7e-05) | 0 | 0 | 2158 |
| Dinuc 1+2 | (0.074, 0.21) | (10, 3.3e+02) | (0.00016, 0.00058) | (0.00016, 0.00058) | (0.00016, 0.00058) | 0 | 0 | 2158 |
| Dinuc 2+3 | (0.074, 0.2) | (11, 1.2e+03) | (0.00016, 0.00085) | (0.00016, 0.00085) | (0.00016, 0.00085) | 0 | 0 | 2158 |
| Trinuc | (0.24, 0.95) | (1.8e+02, 1.1e+07) | (0.001, 0.017) | (0.001, 7.5) | (0.001, 0.017) | 0 | 188 | 2080 |
1 – median and maximum values; 2 – Number of P matrices for the indicated algorithm with an invalid probability; 3 – total number of matrices.
Figure 1. Eigenvector matrix condition number increases with the dimension of the substitution model. Data are from primate introns.
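The quantity plotted in Figure 1, the condition number of the eigenvector matrix, can be computed as in the sketch below. Both example matrices are hypothetical and chosen only to contrast a well-conditioned case with a nearly defective one; they are not taken from the study's data.

```python
import numpy as np

def eigenvector_condition(Q):
    """2-norm condition number of the eigenvector matrix of Q."""
    _, V = np.linalg.eig(Q)
    return np.linalg.cond(V)

# Symmetric rate matrix: orthogonal eigenvectors, condition number near 1.
Q_good = np.array([[-3.0, 1.0, 2.0],
                   [1.0, -4.0, 3.0],
                   [2.0, 3.0, -5.0]])

# Nearly defective rate matrix: two eigenvalues differ by only eps,
# so their eigenvectors are almost parallel.
eps = 1e-8
Q_bad = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -(1.0 + eps), 1.0 + eps],
                  [0.0, 0.0, 0.0]])

cond_good = eigenvector_condition(Q_good)
cond_bad = eigenvector_condition(Q_bad)
print(cond_good, cond_bad)
```

A large condition number signals that inverting the eigenvector matrix will amplify rounding error in an eigendecomposition-based exponential.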
Exponentiation of matrices from primate protein coding exons.
| Dinuc 1+2 | (0.021, 0.16) | (42, 1.4e+04) | (3.1e-05, 0.00044) | (3.1e-05, 0.00044) | (3.1e-05, 0.00044) | 0 | 17 | 206 |
| Dinuc 2+3 | (0.051, 0.18) | (23, 1.1e+04) | (7.1e-05, 0.00041) | (7.1e-05, 0.00041) | (7.1e-05, 0.00041) | 0 | 3 | 206 |
| Trinuc | (2.7e+02, 3.4e+02) | (2.9e+02, 2.3e+04) | (6.4e+65, 3.6e+89) | (0.22, 0.22) | (0.22, 0.22) | 206 | 0 | 206 |
| Codon | (0.15, 0.42) | (4.7e+02, inf) | (0.00049, 0.0029) | (0.0005, 0.087) | (0.00049, 0.0029) | 0 | 62 | 206 |
1 – median and maximum values; 2 – Number of P matrices for the indicated algorithm with an invalid probability; 3 – total number of matrices.
Exponentiation of individual matrices from protein coding exons from a triad of microbial species.
| expPADÉ | Dinuc 1+2 | (0.13, 3.7e+02) | (10, 1.3e+18) | (6.2e-06, 1.2e+137) | (6.2e-06, 86) | (6.2e-06, 0.96) | 85 | 141 | 0 | 5580 |
| expPADÉ | Dinuc 2+3 | (0.35, 5.8e+02) | (15, 9.4e+16) | (0.0017, 5.2e+141) | (0.0017, inf) | (0.0017, 0.97) | 89 | 140 | 0 | 5580 |
| expEIG | Dinuc 1+2 | (0.13, 2.6e+02) | (15, inf) | (1.4e-05, 1.5e+70) | (1.4e-05, 0.96) | (1.4e-05, 0.96) | 86 | 137 | 0 | 5580 |
| expEIG | Dinuc 2+3 | (0.36, 5.4e+02) | (17, 2e+17) | (0.0026, 1.6e+133) | (0.0025, 1) | (0.0026, 1) | 99 | 56 | 0 | 5580 |
The species were Anabaena variabilis, Anabaena sp. PCC 7120 and Thermosynechococcus elongatus.
1 Fit – the algorithm used for the constrained estimation of Q; 2 – median and maximum values; 3 – Number of P matrices for the indicated algorithm with an invalid probability; 4 – total number of matrices; inf – an infinite difference, typically arising from an exponentiation error.
Exponentiation algorithm compute times.
| Nucleotide | expEIG | 0.2 |
| Nucleotide | expTAYL | 0.4 |
| Nucleotide | expPADÉ | 0.2 |
| Dinucleotide | expEIG | 0.8 |
| Dinucleotide | expTAYL | 0.6 |
| Dinucleotide | expPADÉ | 0.4 |
| Trinucleotide | expEIG | 11.2 |
| Trinucleotide | expTAYL | 7.6 |
| Trinucleotide | expPADÉ | 3.7 |
Mean compute time (milliseconds) from 100 different matrices.
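A mean compute time per matrix, as reported in the table above, could be measured with a harness like the following. This is a rough sketch: it times only a NumPy eigendecomposition-based exponential on randomly generated rate matrices, which are an assumption and not the fitted matrices from the study, so the numbers it produces are not comparable to the table. The helper names are hypothetical.

```python
import time
import numpy as np

def exp_eig(Q, t=1.0):
    """exp(Q t) via eigendecomposition (illustrative, not the authors' code)."""
    w, V = np.linalg.eig(Q)
    return ((V * np.exp(w * t)) @ np.linalg.inv(V)).real

def random_rate_matrix(n, rng):
    """Random n x n rate matrix: non-negative off-diagonals, rows sum to zero."""
    Q = rng.random((n, n))
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def mean_time_ms(func, matrices):
    """Mean wall-clock time per matrix, in milliseconds."""
    start = time.perf_counter()
    for Q in matrices:
        func(Q)
    return 1000.0 * (time.perf_counter() - start) / len(matrices)

rng = np.random.default_rng(0)
timings = {}
for n in (4, 16, 64):  # nucleotide, dinucleotide and trinucleotide state counts
    matrices = [random_rate_matrix(n, rng) for _ in range(100)]
    timings[n] = mean_time_ms(exp_eig, matrices)

print(timings)
```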