Wendy S W Wong, Raazesh Sainudiin, Rasmus Nielsen.
Abstract
BACKGROUND: Statistical methods for identifying positively selected sites in protein coding regions are among the most commonly used tools in evolutionary bioinformatics. However, they have been limited by not taking the physicochemical properties of amino acids into account.
Year: 2006 PMID: 16542458 PMCID: PMC1431568 DOI: 10.1186/1471-2105-7-148
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Mixture models of ω and γ
| Model | Site classes* | Parameters | Constraints |
| A1 | (i), (ii), (iii), (iv) | 7 | |
| A2 | (i), (iii) | 4 | |
* With regard to the site classes listed in Equation (2)
** P = number of parameters in the ω and γ distributions
Figure 1. The 15-taxon tree used in the simulation study.
Summary of the results from data simulated under Model A1. Likelihood ratio tests (LRTs) of Model A1 against Model A2 were performed using the volume partition at the 5% significance level. The table also shows the percentage of sites predicted to be in each site class.
| Data Set | Partition | % of significant LRT tests* | % of sites in each category | |||
| Set 1 ( | Volume | 100 | 0.00 | 0.00 | 0.01 | |
| Hydrophobicity | 100 | 0.27 | 0.41 | 0.08 | 0.24 | |
| Set 2 ( | Volume | 0 | 0.00 | 0.00 | 0.03 | |
| Hydrophobicity | 100 | 0.08 | 0.11 | 0.17 | 0.64 | |
| Set 3 ( | Volume | 4 | 0.00 | 0.00 | 0.01 | |
| Hydrophobicity | 1 | 0.00 | 0.01 | 0.00 | ||
| Set 4 ( | Volume | 100 | 0.00 | 0.00 | 0.00 | |
| Hydrophobicity | 100 | 0.00 | 0.00 | 0.00 | ||
* Bold numbers indicate the correct category
Figure 2. Distribution of the posterior probabilities for the correct site class classification in each of the simulated data sets. Each data set has 100 replicates, which were analyzed using Model A1 and the volume partition.
Log-likelihood values and parameter estimates for the different physicochemical properties in the MHC data.
| Data set/Model | loglikelihood | proportions | |||
| MHC Class I | |||||
| Model A1 | -2463.35 | 2.86 | 0.22, 8.74 | 0.00, 3.47 | Prob ( |
| Prob ( | |||||
| Prob ( | |||||
| Prob ( | |||||
| Model A2 | -2474.61 | 2.84 | 0.23, 9.30 | 0.53 | Prob ( |
| Prob ( | |||||
| Model A1 | -2464.89 | 2.81 | 0.14, 9.94 | 0.12, 5.48 | Prob ( |
| Prob ( | |||||
| Prob ( | |||||
| Prob ( | |||||
| Model A2 | -2505.65 | 2.59 | 0.09, 9.12 | 0.62 | Prob ( |
| Prob ( | |||||
| Model A1 | -2470.99 | 2.75 | 0.00, 5.46 | 0.19, 5.48 | Prob ( |
| Prob ( | |||||
| Prob ( | |||||
| Prob ( | |||||
| Model A2 | -2515.343693 | 2.59 | 0.37, 18.77 | 0.65 | Prob ( |
| Prob ( | |||||
| Model A1 | -2470.149113 | 2.93 | 0.17, 6.99 | 0.02, 3.85 | Prob ( |
| Prob ( | |||||
| Prob ( | |||||
| Prob ( | |||||
| Model A2 | -2491.533373 | 2.89 | 0.18, 7.75 | 0.61 | Prob ( |
| Prob ( | |||||
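The log-likelihoods in the table above can be turned into likelihood ratio test statistics, 2(lnL_A1 − lnL_A2). The following is a minimal sketch, not the authors' procedure: it assumes a chi-square reference distribution with 3 degrees of freedom (taken from the 7 versus 4 parameters listed for Models A1 and A2), which may not be the null distribution used in the paper. The function names are hypothetical, and the pairs are listed in table order because the property labels of the four blocks are not recoverable here.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form available for df = 3, so no external library is needed.
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def lrt_statistic(lnl_alt: float, lnl_null: float) -> float:
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)


# (Model A1, Model A2) log-likelihoods, in the order the blocks appear in the table.
PAIRS = [
    (-2463.35, -2474.61),
    (-2464.89, -2505.65),
    (-2470.99, -2515.343693),
    (-2470.149113, -2491.533373),
]

# ASSUMPTION: df = 7 - 4 = 3, chi-square approximation.
results = []
for lnl_a1, lnl_a2 in PAIRS:
    stat = lrt_statistic(lnl_a1, lnl_a2)
    results.append((stat, chi2_sf_df3(stat)))

for i, (stat, p) in enumerate(results, start=1):
    print(f"block {i}: 2*dlnL = {stat:.2f}, p = {p:.3g}")
```

Under these assumptions, every block's statistic exceeds 7.815, the 5% critical value of the chi-square distribution with 3 degrees of freedom.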
Posterior probabilities of being in each site class for the previously identified positively selected sites in the MHC data in [11].
| Previously identified sites | Posterior probabilities in the 4 categories with the same partition | ||||
| ( | ( | ||||
| Volume-altering | 63 | 0.00 | 0.00 | 0.72 | 0.28 |
| 67 | 0.00 | 0.03 | 0.60 | 0.37 | |
| 97 | 0.00 | 0.00 | 0.01 | 0.99 | |
| Polarity | 116 | 0.00 | 0.00 | 0.01 | 0.99 |
| Charge | 45 | 0.00 | 0.28 | 0.00 | 0.72 |
| 114 | 0.00 | 0.39 | 0.00 | 0.61 | |
| 156 | 0.00 | 0.36 | 0.00 | 0.64 | |
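As an illustration of how such posterior probabilities can be read, the sketch below assigns each site in the table above to its most probable category and flags the assignments that clear a cutoff. It makes several assumptions: the categories are numbered 1 to 4 by column order because the column headers are not recoverable here, the 0.95 cutoff is borrowed from the footnote of the lysin table rather than stated for the MHC data, and the helper `classify` is hypothetical.

```python
# Posterior probabilities copied from the table above, keyed by site number.
# Category indices 1..4 follow column order (an assumption; headers are missing).
POSTERIORS = {
    63: (0.00, 0.00, 0.72, 0.28),
    67: (0.00, 0.03, 0.60, 0.37),
    97: (0.00, 0.00, 0.01, 0.99),
    116: (0.00, 0.00, 0.01, 0.99),
    45: (0.00, 0.28, 0.00, 0.72),
    114: (0.00, 0.39, 0.00, 0.61),
    156: (0.00, 0.36, 0.00, 0.64),
}


def classify(post, threshold=0.95):
    """Return (most probable category, its posterior, whether it exceeds threshold)."""
    best = max(range(len(post)), key=post.__getitem__)
    return best + 1, post[best], post[best] > threshold


# Sites whose most probable category exceeds the cutoff.
confident = [site for site, post in POSTERIORS.items() if classify(post)[2]]

for site, post in POSTERIORS.items():
    category, prob, ok = classify(post)
    flag = "above cutoff" if ok else "below cutoff"
    print(f"site {site}: category {category} (posterior {prob:.2f}, {flag})")
```

With this cutoff only sites 97 and 116 are assigned confidently; the remaining sites have their posterior mass split between two categories.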
Abalone sperm lysin data [23] analyzed with Model A1 and four partitions (hydrophobicity, volume, polarity and charge): sites that have high posterior probabilities in each site class.
| Property | Sites identified | |||
| hydrophobicity | 16, | none | ||
| volume | 13, | 17,25,27,30,40, | none | 11, |
| polarity | 16,18,20,23,26, 28,31,34,35,38, 39,46,48,50,52, 55,56,57,58,59, 60,62,65,66,76, 77,78,84,85,89, 90,91,92,93,94, 95,102,104,112, 118,128,130 | none | 43 | |
| charge | 30,33, | |||
Sites listed have posterior probabilities >0.95 of being in the indicated site class. Those in bold have posterior probabilities >0.99.
Figure 3. Lysin crystal structure from the red abalone Haliotis rufescens ([24], PDB ID 1ILS). Sites in color are in the (ω≤1, γ≥1) category. Sites that are blue (68,69,96,129) are from the volume and charge partitions. Sites that are red (17,25,27,30,40,73,80,99,101,114,127,131) are from the volume partition only. Finally, sites that are green (97–98) are from the charge partition only.
Figure 4. Lysin crystal structure from the red abalone Haliotis rufescens ([24], PDB ID 1ILS). Sites in color are in the (ω≥1, γ≤1) category. The site that is blue (47) is from both the charge and hydrophobicity partitions. Sites that are green (69, 129) are from the hydrophobicity partition only. Sites that are red (30,33,63,64,71,75,79,80,81,99,113,116,121,124,127) are from the charge partition only. Finally, the site that is hot pink (43) is from the polarity partition only.