Abstract
BACKGROUND: The study of discrete characters is crucial for understanding evolutionary processes. Although great advances have been made in the analysis of nucleotide sequences, computer programs for non-DNA discrete characters are often dedicated to specific analyses and lack flexibility. Discrete characters often have different transition rate matrices and variable rates among sites, and they sometimes contain unobservable states. To model the evolution of such a variety of discrete characters accurately, programs with sophisticated methodologies and flexible settings are needed.
Year: 2014 PMID: 25260628 PMCID: PMC4261585 DOI: 10.1186/1471-2105-15-320
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
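The abstract notes that discrete characters may follow different transition rate matrices, and the table below compares several of them. The Python sketch that follows builds the instantaneous rate matrix Q for k states and counts its free rates. It assumes the usual meanings of ER (one rate shared by all changes), SYM (q_ij = q_ji) and ARD (all rates different). It is not DiscML's R code. DiscML's own parameterization and rate scaling (for example, how the overall rate μ is defined) may differ, and the sketch does not cover the GTR and BD models. The function names are hypothetical.

```python
from itertools import permutations

def n_free_rates(model, k):
    """Number of free rate parameters for a k-state model (before any scaling)."""
    counts = {"ER": 1, "SYM": k * (k - 1) // 2, "ARD": k * (k - 1)}
    if model not in counts:
        raise ValueError(f"unknown model: {model}")
    return counts[model]

def rate_matrix(model, k, params):
    """Build a k x k instantaneous rate matrix Q whose rows sum to zero."""
    if len(params) != n_free_rates(model, k):
        raise ValueError("wrong number of rate parameters")
    ordered = list(permutations(range(k), 2))  # all (i, j) pairs with i != j
    if model == "ER":
        rates = {pair: params[0] for pair in ordered}  # one shared rate
    elif model == "SYM":
        unordered = [(i, j) for i, j in ordered if i < j]
        shared = dict(zip(unordered, params))  # q_ij == q_ji
        rates = {(i, j): shared[(min(i, j), max(i, j))] for i, j in ordered}
    else:  # "ARD": every ordered pair gets its own rate
        rates = dict(zip(ordered, params))
    q = [[0.0] * k for _ in range(k)]
    for (i, j), r in rates.items():
        q[i][j] = float(r)
    for i in range(k):
        # diagonal makes each row sum to zero
        q[i][i] = -sum(q[i][j] for j in range(k) if j != i)
    return q

# Three states, e.g. 0 = absent, 1 = single copy, 2 = two or more copies
for model in ("ER", "SYM", "ARD"):
    print(model, n_free_rates(model, 3))  # ER 1, SYM 3, ARD 6
```

With three states, ARD has six free rates against one for ER. This is one reason the ARD-type models in the timing table take longer to fit.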
Table 1. DiscML estimates from the gene family data in the Bacillaceae (B1, B2, B3) clades
| Models | Parameters | B1 | B2 | B3 |
|---|---|---|---|---|
| ER (1s/0s only) | μ | 3.073 | 0.677 | 0.540 |
| | Ln | -15150 | -16467 | -22229 |
| ER+0 (1s/0s only) | μ | 1.887 | 0.463 | 0.388 |
| | Ln | -13682 | -15268 | -21207 |
| BDER | μ | 2.490 | 0.590 | 0.485 |
| | Ln | -20901 | -22196 | -29127 |
| BDISYM | μ | 2.669 | 0.556 | 0.438 |
| | Ln | -19684 | -20973 | -27811 |
| BDARD | μ | 5.746 | 1.369 | 1.450 |
| | Ln | -18254 | -20073 | -26578 |
| ER | μ | 2.940 | 0.638 | 0.459 |
| | Ln | -21411 | -23273 | -31405 |
| SYM | μ | 2.635 | 0.546 | 0.427 |
| | Ln | -19615 | -20947 | -27801 |
| ARD | μ | 5.601 | 1.345 | 1.314 |
| | Ln | -18143 | -19678 | -26239 |
| GTR (SYM+π REV) | μ | 3.731 | 0.739 | 0.632 |
| | Ln | -17753 | -19337 | -25381 |
| ER+0 | μ | 2.339 | 0.531 | 0.395 |
| | Ln | -20595 | -22586 | -30753 |
| ER+ | μ | 2.935 | 0.624 | 0.454 |
| | Ln | -20070 | -21783 | -28771 |
| ER+ | μ | 3.205 | 0.638 | 0.459 |
| | Ln | -21398 | -23273 | -31405 |
| ER+0+ | μ | 1.358 | 0.236 | 0.240 |
| | Ln | -18719 | -19960 | -26712 |
| ER+0+ | μ | 3.630 | 0.379 | 0.387 |
| | Ln | -16839 | -17960 | -23398 |
The parameter μ is the estimated evolutionary rate of the characters, and Ln is the log-likelihood of each model. “1s/0s only” indicates a binary analysis in which all non-zero characters are converted to 1s using simplify=TRUE. ‘+0’ indicates the correction for unobservable data using zerocorrection=TRUE. ‘+Γ’ indicates a discrete Γ distribution of rates among sites using alpha=TRUE. ‘+π’ indicates the estimation of prior root probabilities using rootprobability=TRUE. ‘+π REV’ indicates the estimation of prior root probabilities with forced reversibility using rootprobability=TRUE and reversible=TRUE.
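The ‘+0’ option corrects for data that cannot be observed. In gene family data, for example, a family absent from every genome in a clade never appears in the data set. A common form of this correction conditions each site's likelihood on the site not being all zeros, dividing L_i by 1 − P0, where P0 is the model probability of the all-zero pattern. We assume, without having checked DiscML's source, that zerocorrection=TRUE works this way. The Python sketch below applies the correction to per-site log-likelihoods. The function name is hypothetical, and P0 is taken as an input rather than computed on a tree.

```python
import math

def zero_corrected_loglik(site_logliks, p_all_zero):
    """Log-likelihood conditioned on no site being an all-zero pattern.

    site_logliks: per-site log-likelihoods ln L_i under the model.
    p_all_zero:   model probability P0 of the all-zero pattern (assumed to be
                  computed elsewhere on the same tree and parameters).
    """
    if not 0.0 <= p_all_zero < 1.0:
        raise ValueError("p_all_zero must lie in [0, 1)")
    # sum of ln(L_i / (1 - P0)) over n sites = sum ln L_i - n * ln(1 - P0)
    return sum(site_logliks) - len(site_logliks) * math.log1p(-p_all_zero)

print(zero_corrected_loglik([-1.0, -2.0], 0.5))
```

Because 1 − P0 < 1, the correction can only raise the log-likelihood for a fixed set of parameters.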
Figure 1. Phylogenetic relationships of three Bacillaceae clades (B1, B2, B3), on which the evolutionary rates of gene families are estimated using DiscML. A, a constant rate is estimated on each phylogeny; B, separate rates are estimated for external branches (μ₁) and internal branches (μ₂) on each phylogeny. These three clades were analyzed in our previous study of gene presence, absence, and fragments [20]. Gene families are recategorized: gene absence and fragments are coded as character state 0, single-copy genes as state 1, and gene families with two or more members as state 2.
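The recategorization in the Figure 1 caption caps the gene copy number at 2, and simplify=TRUE then collapses states 1 and 2 into a single "present" state. The Python sketch below makes both steps concrete. It assumes the input is a per-genome count of full-length copies in which fragments are not counted, so a genome containing only fragments has a count of 0. The function names are hypothetical and are not DiscML's R interface.

```python
def recode_state(full_copies):
    """Map the number of full-length copies in one genome to state 0, 1 or 2.

    Assumes fragments are not counted as copies, so a genome with only
    fragments (or no gene at all) has full_copies == 0 and gets state 0.
    """
    if full_copies < 0:
        raise ValueError("copy number cannot be negative")
    # 0 = absent or fragment only, 1 = single copy, 2 = two or more copies
    return min(full_copies, 2)

def simplify_states(states):
    """Binary recoding in the spirit of simplify=TRUE: every non-zero state becomes 1."""
    return [1 if s != 0 else 0 for s in states]

row = [recode_state(c) for c in (0, 1, 2, 7)]
print(row, simplify_states(row))  # [0, 1, 2, 2] [0, 1, 1, 1]
```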
Computational time needed to generate the results in Table 1 on a Dell desktop (Intel Core i7, 3.4 GHz; 16 GB RAM)
| Models | B1(5453) | B2(5614) | B3(6813) |
|---|---|---|---|
| ER (1s/0s only) | 0 m 49 s | 1 m 00 s | 1 m 26 s |
| ER+0 (1s/0s only) | 1 m 39 s | 2 m 01 s | 3 m 03 s |
| BDER | 0 m 48 s | 1 m 06 s | 1 m 36 s |
| BDISYM | 1 m 58 s | 2 m 20 s | 3 m 01 s |
| BDARD | 7 m 54 s | 6 m 58 s | 8 m 28 s |
| ER | 1 m 04 s | 1 m 15 s | 1 m 17 s |
| SYM | 3 m 14 s | 4 m 47 s | 5 m 31 s |
| ARD | 9 m 53 s | 9 m 12 s | 16 m 59 s |
| GTR (SYM+π REV) | 9 m 04 s | 9 m 54 s | 11 m 44 s |
| ER+0 | 1 m 36 s | 2 m 34 s | 2 m 21 s |
| ER+ | 2 m 41 s | 3 m 13 s | 4 m 40 s |
| ER+ | 12 m 00 s | 39 m 01 s | 45 m 23 s |
| ER+0+ | 82 m 22 s | 81 m 20 s | 178 m 27 s |
| ER+0+ | 80 m 13 s | 67 m 33 s | 91 m 42 s |
The number of gene families is shown in parentheses for each clade. The time is shown in minutes (m) and seconds (s).
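Converting the timings to seconds makes relative costs easier to compare. The Python sketch below uses a hypothetical helper and assumes the "X m Y s" format of the table. For clade B3, it shows that ARD took about 13 times as long as ER (1019 s versus 77 s).

```python
import re

def to_seconds(label):
    """Convert a timing label such as '16 m 59 s' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*m\s*(\d+)\s*s\s*", label)
    if match is None:
        raise ValueError(f"unrecognised time label: {label!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds

# Clade B3 timings for ER and ARD, copied from the table above
er_b3 = to_seconds("1 m 17 s")
ard_b3 = to_seconds("16 m 59 s")
print(er_b3, ard_b3, round(ard_b3 / er_b3, 1))  # 77 1019 13.2
```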
Separate rates on external and internal branches, estimated from the gene family data in the Bacillaceae (B1, B2, B3) clades
| Models | Parameters | B1 | B2 | B3 |
|---|---|---|---|---|
| Single rate (Figure 1A) | μ | 2.940 | 0.638 | 0.459 |
| | Ln | -21411 | -23273 | -31405 |
| Separate rates (Figure 1B) | μ₁ | 4.430 | 0.674 | 0.477 |
| | μ₂ | 0.306 | 0.526 | 0.344 |
| | Ln | -21045 | -23267 | -31395 |
| | 2ΔLnL | 732 ∗∗∗ | 14 ∗∗∗ | 20 ∗∗∗ |
μ₁ is the rate for external branches and μ₂ the rate for internal branches on each tree, as illustrated in Figure 1B.
∗∗∗ P < 0.001 (df = 1), as 2ΔLnL approximately follows a χ² distribution.
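The significance test in the note above can be reproduced in a few lines of Python. 2ΔLnL is twice the difference between the log-likelihoods of the two-rate (μ₁, μ₂) and one-rate models. With one extra parameter, the χ² (df = 1) tail probability of a statistic x equals erfc(√(x/2)). The function name below is hypothetical.

```python
import math

def lrt_chi2_df1(lnl_null, lnl_alt):
    """Likelihood-ratio statistic 2*(lnL_alt - lnL_null) and its chi-square (df = 1) p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0
    return stat, p

# Clade B1: one rate (Ln = -21411) versus separate mu1, mu2 (Ln = -21045)
stat, p = lrt_chi2_df1(-21411, -21045)
print(stat, p < 0.001)  # 732.0 True
```

For B2, the rounded log-likelihoods (-23273 and -23267) give 12 rather than the tabulated 14, probably because the table was computed from unrounded values. Either statistic gives P < 0.001.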
Figure 2. Phylogenetic relationships of the yeast strains in the complex, on which the rates of mitochondrial intron gain and loss are estimated using DiscML. The phylogeny was reconstructed from the concatenated sequences of all mitochondrial protein-coding genes, excluding the var1 gene.
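Intron presence or absence is a two-state character. Under a general two-state continuous-time Markov model with gain rate g (0 → 1) and loss rate l (1 → 0), the transition probabilities over a branch of length t have a closed form. For example, P01(t) = g/(g+l) · (1 − e^(−(g+l)t)). The Python sketch below computes the full 2 × 2 matrix. The rates are hypothetical inputs, not the estimates from the yeast data, and DiscML's own parameterization may differ.

```python
import math

def two_state_p(gain, loss, t):
    """Transition probabilities P(t) for a 0/1 (absent/present) character.

    Row i, column j is the probability of being in state j after time t,
    starting from state i, with rates gain (0 -> 1) and loss (1 -> 0).
    """
    total = gain + loss
    if total <= 0:
        raise ValueError("gain + loss must be positive")
    decay = math.exp(-total * t)
    pi1 = gain / total  # equilibrium frequency of state 1 (present)
    pi0 = loss / total  # equilibrium frequency of state 0 (absent)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]

print(two_state_p(0.2, 0.5, 1.0))
```

As t grows, both rows converge to the equilibrium frequencies (l/(g+l), g/(g+l)). At t = 0 the matrix is the identity.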
Figure 3. Individual turnover rates of the 17 mitochondrial introns in yeast. The ten introns in the cox1 gene are shown as sites 1-10, the six introns in the cob gene as sites 11-16, and the single intron in the 21S rRNA gene as site 17.