Le Bao, Hong Gu, Katherine A Dunn, Joseph P Bielawski.
Abstract
BACKGROUND: Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that every codon site be assigned to one of several partitions, each of which may have independent parameters for selection pressure, evolutionary rate, transition-to-transversion ratio, or codon frequencies. For single-gene analysis, partitions might be defined according to protein tertiary structure; for multiple-gene analysis, they might be defined according to a gene's functional category. Given a set of related fixed-effect models, selecting the model that best fits the data is not trivial.
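Because every site's partition is fixed in advance, the log-likelihood of a fixed-effect model separates into a sum over partitions. The notation below is ours, not the paper's: S_k is the set of codon sites assigned to partition k (k = 1, …, g), x_h is the alignment column at site h, and θ_k is the parameter vector used for partition k, some components of which may be shared across partitions. The second expression is our reading of "different but proportional branch lengths": branch j in partition k has length r_k t_j, with one rate fixed so that the model is identifiable.

```latex
\ell(\theta_1, \dots, \theta_g) = \sum_{k=1}^{g} \sum_{h \in S_k} \log f\left(x_h \mid \theta_k\right),
\qquad t_{jk} = r_k \, t_j \quad (r_1 = 1)
```

Forcing a component of θ_k to be equal across partitions yields a simpler model nested within the more general one. This nesting is what makes the model-selection comparisons in the tables below meaningful.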
Year: 2007 PMID: 17288578 PMCID: PMC1796614 DOI: 10.1186/1471-2148-7-S1-S5
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fixed-effect models implemented by Yang and Swanson [4].
| Model code | Parameters for partitions | Number of Parameters |
| A | same branch lengths, | |
| B | different but proportional branch lengths, same | |
| C | different but proportional branch lengths, same | |
| D | different but proportional branch lengths, different | |
| E | different but proportional branch lengths, different | |
| F | different branch lengths, | |
The number of parameters is computed under the F3 × 4 method of estimating codon frequencies; b denotes the number of branches in the tree and g the number of site classes. When models instead use empirical estimates of each codon frequency (the F61 method), the number of model parameters increases by 51 for models with homogeneous π and by 51 × g for models with π heterogeneous among partitions.
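The value 51 is the difference between the number of free codon-frequency parameters under the two methods. F3 × 4 estimates nucleotide frequencies at each of the three codon positions (3 × 3 = 9 free values), whereas F61 estimates all 61 sense-codon frequencies (60 free values). The sketch below checks that arithmetic and builds F3 × 4 codon frequencies as normalized products of position-specific nucleotide frequencies. It assumes the universal genetic code, and the function names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "TCAG"
# Stop codons of the universal genetic code (assumed; other genetic codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(triplet)
    for triplet in product(NUCLEOTIDES, repeat=3)
    if "".join(triplet) not in STOP_CODONS
]

def free_frequency_params(method: str) -> int:
    """Free codon-frequency parameters for one set of pi (one partition)."""
    if method == "F3x4":
        return 3 * (len(NUCLEOTIDES) - 1)  # 3 codon positions x 3 free nucleotide frequencies
    if method == "F61":
        return len(SENSE_CODONS) - 1       # 61 sense-codon frequencies that sum to 1
    raise ValueError(f"unknown method: {method}")

def f3x4_codon_frequencies(position_freqs):
    """position_freqs: three dicts (codon positions 1-3) mapping nucleotide -> frequency.
    Returns pi over sense codons as normalized products of the position frequencies."""
    raw = {
        codon: position_freqs[0][codon[0]] * position_freqs[1][codon[1]] * position_freqs[2][codon[2]]
        for codon in SENSE_CODONS
    }
    total = sum(raw.values())  # less than 1 because stop codons are excluded
    return {codon: value / total for codon, value in raw.items()}

extra_per_pi = free_frequency_params("F61") - free_frequency_params("F3x4")
print(extra_per_pi)  # 51
```

With g = 2 partitions and π heterogeneous between them, the increase is therefore 51 × 2 = 102. With uniform nucleotide frequencies at every position, each sense codon receives a frequency of 1/61.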
An expanded set of fixed-effect models.
| New code | Parameters heterogeneous among partitions | | | | Number of parameters |
| 1 (E) | Yes | Yes | Yes | Yes | |
| 2 (D) | Yes | Yes | Yes | No | |
| 3 | Yes | Yes | No | Yes | |
| 4 | Yes | Yes | No | No | |
| 5 | Yes | No | Yes | Yes | |
| 6 | Yes | No | Yes | No | |
| 7 | Yes | No | No | Yes | |
| 8 | Yes | No | No | No | |
| 9 | No | Yes | Yes | Yes | |
| 10 | No | Yes | Yes | No | |
| 11 | No | Yes | No | Yes | |
| 12 | No | Yes | No | No | |
| 13 (C) | No | No | Yes | Yes | |
| 14 (B) | No | No | Yes | No | |
| 15 | No | No | No | Yes | |
| 16 (A) | No | No | No | No | |
Number of parameters is for the F3 × 4 method of estimating codon frequencies. b and g denote the number of branches and the number of site classes, respectively. Letters in parentheses indicate the model codes formerly used by Yang and Swanson [4].
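The row order of the expanded table follows a regular pattern. If Yes is read as "heterogeneous" and No as "homogeneous", FE1 to FE16 enumerate all 2^4 = 16 combinations of the four parameters, counting down from "all heterogeneous" to "none". The short sketch below regenerates that Yes/No pattern. The four column headers were lost from this copy of the table, so the labels param_1 to param_4 are placeholders, not the paper's parameter names.

```python
from itertools import product

# Placeholder labels: the table's four column headers are not recoverable here.
PARAMS = ("param_1", "param_2", "param_3", "param_4")

# Former Yang and Swanson codes, as given in parentheses in the table.
LEGACY = {1: "E", 2: "D", 13: "C", 14: "B", 16: "A"}

def fixed_effect_models():
    """Map model number -> tuple of heterogeneity flags (True = heterogeneous).

    product((True, False), repeat=4) yields (True, True, True, True) first and
    (False, False, False, False) last, matching the Yes/No order of the rows."""
    return {i + 1: flags for i, flags in enumerate(product((True, False), repeat=4))}

models = fixed_effect_models()
for number, flags in models.items():
    heterogeneous = [name for name, flag in zip(PARAMS, flags) if flag]
    code = f" ({LEGACY[number]})" if number in LEGACY else ""
    print(f"FE{number}{code}: heterogeneous = {heterogeneous or 'none'}")
```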
Figure 1. Relationships among fixed-effect codon models. The most complex model (FE1) is at the top and the completely homogeneous model (FE16) is at the bottom. The parameters that are heterogeneous among partitions in each model are listed after the model name. Lines between models indicate "1-step" differences in complexity.
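Suppose a "1-step" difference means that two models differ in whether exactly one parameter is heterogeneous. Under that assumption, the lines in Figure 1 form the edges of a four-dimensional hypercube: each model has four neighbors, and there are 16 × 4 / 2 = 32 lines in total. The sketch below lists those pairs. The helper names are hypothetical.

```python
from itertools import combinations, product

def fixed_effect_models():
    """FE1..FE16 as tuples of heterogeneity flags, in the row order of the expanded table."""
    return {i + 1: flags for i, flags in enumerate(product((True, False), repeat=4))}

def one_step_pairs(models):
    """Pairs (a, b), a < b, whose flags differ in exactly one position (assumed '1-step')."""
    pairs = []
    for a, b in combinations(sorted(models), 2):
        if sum(x != y for x, y in zip(models[a], models[b])) == 1:
            pairs.append((a, b))
    return pairs

models = fixed_effect_models()
pairs = one_step_pairs(models)
print(len(pairs))                       # 32
print([b for a, b in pairs if a == 1])  # [2, 3, 5, 9]
```

The four models one step simpler than FE1 are FE2, FE3, FE5 and FE9. The four models one step more complex than FE16 are FE8, FE12, FE14 and FE15.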
Accuracy of model selection under backward elimination, AIC and AICc. Letters in parentheses indicate the model codes formerly used by Yang and Swanson [4].
| Model | Heterogeneous parameters | Backward elimination | | AIC | AICc |
| 1 (E) | | 100% | 96% | 91.7% | 91.7% |
| 2 (D) | | 92% | 100% | 91.7% | 100.0% |
| 3 | | 61% | 67% | 38.9% | 33.3% |
| 4 | | 94% | 100% | 77.8% | 83.3% |
| 5 | | 88% | 92% | 62.5% | 66.7% |
| 6 | | 79% | 100% | 75.0% | 75.0% |
| 7 | | 67% | 67% | 44.4% | 44.4% |
| 8 | | 83% | 100% | 55.6% | 55.6% |
| 9 | | 71% | 83% | 58.3% | 58.3% |
| 10 | | 79% | 96% | 75.0% | 75.0% |
| 11 | | 50% | 67% | 38.9% | 38.9% |
| 12 | | 89% | 100% | 66.7% | 66.7% |
| 13 (C) | | 83% | 88% | 62.5% | 58.3% |
| 14 (B) | | 63% | 88% | 58.3% | 58.3% |
| 15 | | 56% | 61% | 38.9% | 38.9% |
| 16 (A) | none | 83% | 100% | 38.9% | 55.6% |
| overall | | 78% | 88% | 63% | 64% |
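The table compares backward elimination with AIC and AICc, but this page does not spell out the elimination rule. What follows is a minimal sketch of one plausible reading based on likelihood-ratio tests, and every step of it is an assumption. Start at FE1. Test each 1-step simpler model using 2Δℓ against a χ² distribution with Δp degrees of freedom. Move to the accepted simpler model with the largest p-value, and stop when every simpler model is rejected. The names chi2_sf and backward_elimination, the α = 0.05 default and the synthetic fits are all hypothetical.

```python
import math
from itertools import product

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_k x^{k-1/2} / (1*3*...*(2k-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * k + 1)
    return total

# FE1..FE16 as heterogeneity flags, in the row order of the expanded table.
MODELS = {i + 1: flags for i, flags in enumerate(product((True, False), repeat=4))}
MODEL_BY_FLAGS = {flags: number for number, flags in MODELS.items()}

def backward_elimination(fits, alpha=0.05):
    """fits: model number -> (maximized log-likelihood, number of free parameters).
    Hypothetical rule: LRT each 1-step simpler model against the current one,
    move to the accepted model with the largest p-value, stop when all are rejected."""
    current = 1
    while True:
        lnl, p = fits[current]
        accepted = []
        for i, heterogeneous in enumerate(MODELS[current]):
            if not heterogeneous:
                continue
            flags = MODELS[current][:i] + (False,) + MODELS[current][i + 1:]
            simpler = MODEL_BY_FLAGS[flags]
            lnl0, p0 = fits[simpler]
            p_value = chi2_sf(2.0 * (lnl - lnl0), p - p0)
            if p_value > alpha:
                accepted.append((p_value, simpler))
        if not accepted:
            return current
        current = max(accepted)[1]

# Synthetic example: only the fourth parameter truly differs between partitions.
fits = {
    number: (-1000.0 - (0.0 if flags[3] else 50.0) + 0.1 * sum(flags[:3]), 20 + sum(flags))
    for number, flags in MODELS.items()
}
print(backward_elimination(fits))  # 15
```

In the synthetic example, dropping any of the first three heterogeneities costs only 0.1 log-likelihood units per parameter, so those tests are accepted. Dropping the fourth costs 50 units and is always rejected. The search therefore ends at FE15, the model in which only the fourth parameter is heterogeneous.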
Likelihood scores, parameter estimates, and ΔAIC and ΔAICc scores for the abalone sperm lysin gene under codon models with two fixed partitions.
| Model | ℓ | Parameter estimates | ΔAIC (ΔAICc) |
| 1 (E) | -4473.88 | ( | 0.3 (6.9) | ||
| 2 (D) | -4532.12 | ( | 98.7 (54.2) | ||
| 3 | -4535.64 | ( | 121.8 (121.8) | ||
| 4 | -4603.44 | ( | 239.4 (190.2) | ||
| 5 | -4486.60 | ( | 23.7 (23.7) | ||
| 6 | -4548.86 | ( | 130.2 (81.0) | ||
| 7 | -4550.36 | ( | 149.2 (142.9) | ||
| 8 | -4622.36 | ( | 275.2 (221.6) | ||
| 9 | -4474.75 | ( | 0 (0) | ||
| 10 | -4532.38 | ( | 97.3 (48.1) | ||
| 11 | -4538.32 | ( | 125.1 (118.8) | ||
| 12 | -4604.91 | ( | 240.3 (186.6) | ||
| 13 (C) | -4490.07 | ( | 28.6 (22.3) | ||
| 14 (B) | -4549.99 | ( | 130.5 (76.8) | ||
| 15 | -4561.36 | ( | 169.2 (156.7) | ||
| 16 (A) | -4627.03 | ( | 282.6 (224.6) | ||
Parameters in parentheses are fixed. Partition 1 contained the buried sites and partition 2 the solvent-exposed sites. ΔAIC = AIC − min AIC and ΔAICc = AICc − min AICc.
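The footnote defines ΔAIC and ΔAICc relative to the best-scoring model. The minimal sketch below computes both using the standard definitions AIC = −2ℓ + 2p and AICc = AIC + 2p(p + 1)/(n − p − 1), where p is the number of free parameters and n is the sample size. Which n the authors used (for example, the number of codons) is not stated here and is treated as an assumption. The example fits are made up.

```python
def aic(lnl: float, p: int) -> float:
    """Akaike information criterion from a maximized log-likelihood and p free parameters."""
    return -2.0 * lnl + 2.0 * p

def aicc(lnl: float, p: int, n: int) -> float:
    """Small-sample corrected AIC; n is the sample size (assumed: number of codons)."""
    if n - p - 1 <= 0:
        raise ValueError("AICc is undefined when n <= p + 1")
    return aic(lnl, p) + 2.0 * p * (p + 1) / (n - p - 1)

def deltas(scores: dict) -> dict:
    """Difference of each score from the minimum, e.g. dAIC = AIC - min AIC."""
    best = min(scores.values())
    return {model: s - best for model, s in scores.items()}

# Hypothetical fits: model -> (log-likelihood, number of free parameters).
fits = {"A": (-100.0, 3), "B": (-97.0, 5)}
n_codons = 50  # hypothetical sample size
print(deltas({m: aic(l, p) for m, (l, p) in fits.items()}))  # {'A': 2.0, 'B': 0.0}
print(deltas({m: aicc(l, p, n_codons) for m, (l, p) in fits.items()}))
```

The model with ΔAIC = 0 is the one preferred by AIC. In the lysin table above, that model is FE9 under both AIC and AICc.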
Likelihood scores, parameter estimates, and ΔAIC and ΔAICc scores for genes located in the regions of the Listeria genome encoding putative flagellar-related proteins.
| Model | ℓ | Parameter estimates | ΔAIC (ΔAICc) |
| 1 (E) | -64965.25 | ( | 22.2 (28.2) | ||
| 2 (D) | -65076.39 | ( | 0.5 (0.6) | ||
| 3 | -65000.40 | ( | 88.5 (94.3) | ||
| 4 | -65107.87 | ( | 59.5 (59.5) | ||
| 5 | -65106.46 | ( | 300.7 (306.5) | ||
| 6 | -65219.99 | ( | 283.7 (283.7) | ||
| 7 | -65162.07 | ( | 407.9 (413.5) | ||
| 8 | -65269.12 | ( | 378.0 (377.9) | ||
| 9 | -64969.44 | ( | 26.6 (32.4) | ||
| 10 | -65078.13 | ( | 0 (0) | ||
| 11 | -65007.34 | ( | 98.4 (104.1) | ||
| 12 | -65111.31 | ( | 62.4 (62.4) | ||
| 13 (C) | -65124.90 | ( | 333.6 (339.2) | ||
| 14 (B) | -65235.39 | ( | 310.5 (310.5) | ||
| 15 | -65190.76 | ( | 461.5 (466.8) | ||
| 16 (A) | -65292.90 | ( | 421.5 (421.4) | ||
Parameters in parentheses are fixed. Partition 1 contained genes known to encode components of the flagellar machinery (7973 codons). Partition 2 contained genes encoding proteins of unknown function (2308 codons). Partition 3 contained genes encoding proteins with non-flagellar functions (1427 codons). ΔAIC = AIC − min AIC and ΔAICc = AICc − min AICc.
Likelihood scores and parameter estimates obtained under the model selected by backward elimination for subsets of the genes located in the regions of the Listeria genome encoding putative flagellar-related proteins.
| Dataset & numbered partitions | Model | ℓ | Parameter estimates |
| Rapidly evolving non-flagellar genes [4 genes] | FE9 | -8528.24 | |||
| 1. Chemotaxis regulatory protein | ( | ||||
| 2. Chemotaxis-related sensory protein | |||||
| 3. Cell surface protein | |||||
| 4. Phage-related, similar to transglycosylase | |||||
| Flagellar-related genes [24 genes] | FE13 | -42796.43 | |||
| 1. | ( | ||||
| 2. 23 other genes | |||||
Number within square brackets is the number of genes in the dataset. Parentheses indicate a model parameter with a fixed value.
Parameter values used in simulations of two-partition datasets.
| Homogeneous parameter values | Heterogeneous parameter values | |||
| partition 1 | partition 2 | partition 1 | partition 2 | |
| Rates | ||||
| Selection pressure | ||||
| Ts/Tv ratio | ||||
| Codon frequencies | ||||
The lysin gene was used to obtain empirical estimates of codon frequencies; this is indicated in the table by π = lysin.