Saheli Datta, Abel Rodriguez, Raquel Prado.
Abstract
BACKGROUND: Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments.
Year: 2012 PMID: 23107360 PMCID: PMC3577475 DOI: 10.1186/1471-2105-13-278
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Stylized representation of our model. Each sub-table at the second level of clustering shares a common value for the regression coefficient β. Rows correspond to properties, while columns correspond to sites.
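The two-level structure described in the Figure 1 caption can be illustrated with a small sketch. It assumes, for illustration only, that properties (rows) are clustered first and that sites (columns) are clustered separately within each property cluster; the function name `fill_coefficients` and all labels and values below are hypothetical and are not taken from the paper.

```python
import numpy as np

def fill_coefficients(row_cluster, col_cluster_within, beta_values):
    """Build a properties-by-sites matrix in which every sub-table shares one beta.

    row_cluster: outer cluster label for each property (row).
    col_cluster_within: maps an outer label to the inner cluster label of each site (column).
    beta_values: maps (outer label, inner label) to the shared coefficient.
    """
    n_rows = len(row_cluster)
    n_cols = len(next(iter(col_cluster_within.values())))
    beta = np.zeros((n_rows, n_cols))
    for i, outer in enumerate(row_cluster):
        inner_labels = col_cluster_within[outer]
        for j, inner in enumerate(inner_labels):
            beta[i, j] = beta_values[(outer, inner)]
    return beta

# Hypothetical toy configuration: 3 properties, 4 sites, 2 property clusters.
row_cluster = [0, 0, 1]
col_cluster_within = {0: [0, 0, 1, 1], 1: [0, 1, 1, 1]}
beta_values = {(0, 0): 0.0, (0, 1): 1.5, (1, 0): -2.0, (1, 1): 0.0}

beta = fill_coefficients(row_cluster, col_cluster_within, beta_values)
print(beta)
```

Properties in the same outer cluster get identical rows, and within a row the coefficient changes only where the inner site cluster changes.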
Figure 2. Image plots for true values (left panel) and posterior means of the βs (right panel).
Figure 3. Marginal posterior probabilities of each pair of columns belonging to the same cluster.
Figure 4. Marginal posterior probabilities of each pair of rows belonging to the same cluster for two different clusters of columns.
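Pairwise probabilities like those plotted in Figures 3 and 4 are commonly estimated as the fraction of posterior samples in which two items share a cluster label. The sketch below assumes the sampler output is available as an array of labels with one row per sample; the function name and the toy labels are hypothetical, and this is not necessarily how the authors computed their figures.

```python
import numpy as np

def coclustering_probabilities(label_samples):
    """Estimate P(item a and item b share a cluster) from sampled labels.

    label_samples: array of shape (n_samples, n_items); entry [s, a] is the
    cluster label of item a in posterior sample s. Label values only need
    to be comparable within a sample, so label switching is harmless.
    """
    labels = np.asarray(label_samples)
    # same[s, a, b] is True when items a and b share a label in sample s.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Hypothetical toy output: 4 posterior samples of labels for 3 items.
samples = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 0],
    [1, 1, 1],
])
probs = coclustering_probabilities(samples)
print(probs)
```

The result is a symmetric matrix with ones on the diagonal, which can be displayed directly as an image plot.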
Figure 5. Marginal posterior probabilities of any two properties being in the same cluster for the data simulated under a biological model.
Figure 6. Posterior means of the βs for the three clusters in Figure 5 for the data simulated under a biological model. The sites are sorted by increasing posterior mean.
Figure 7. Marginal posterior probabilities of any two sites for the simulated data being grouped together in the first cluster in Figure 5. The sites are sorted by increasing posterior mean of the βs.
Comparing results between the models in [10] and the new semiparametric model, for the data simulated under a biological model
| 30 sites with largest posterior mean |
| 30 sites with lowest posterior mean |
Sites marked in bold are the ones in the region of interest: for h, this is where radical changes were encouraged, and for M, where small changes were encouraged, while generating the sequences. Underlined sites are identified by both methods.
Sites identified as significant by TreeSAAP for the different properties for the simulation study based on a biological model
| 5, 59, | 36, 83 | None | ||
| 21, 24, 37, | None | 7, 18, 36, 49, 55 | None | |
| 10, 33, 66 | None | 5, 18, | None | |
| 10, 13, 33, 66 | None | 18, | None | |
| 39, 55, 72 | None | 11, 64, 72 | None |
Values in parentheses denote the cut-off values for the z-test statistic. Sites marked in bold are in the region of interest.
Sites that have high posterior probabilities (>0.95) of belonging to each site class for the different partitions for EvoRadical for the simulated data
| None | None | None | 1, 2, 5, 7, 10, 11, 12, 13, 14, 18, 19, 20, 26, 27, 30, 32, 33, 34, 36, 37, 42, 43, 47, 53, 57, 59, | |
| None | None | None | 2, 7, 9, 18, 19, 20, 22, 27, 31, 32, 36, 38, 53, 55, 61, 62, 64, 67, 72, 74, 86 |
Sites marked in bold are in the region of interest.
List of 32 amino acid properties used in the analysis
| ID | Property | ID | Property |
| KYTJ820101 | Hydropathy | ∗ | Helical contact area |
| GRAR740103 | Molecular volume | ZIMJ680104 | Isoelectric point |
| MANP780101 | Surrounding hydrophobicity | OOBM770103 | Long-range non-bonded energy |
| ZIMJ680103 | Polarity (Zimmerman) | ∗ | Mean r.m.s. fluctuation displacement |
| CHOP780201 | Alpha-helical tendencies | FASG760101 | Molecular weight |
| GRAR740102 | Polarity (Grantham) | ∗ | Normalized consensus hydrophobicity |
| PONP800108 | Average number of surrounding residues | COHE430101 | Partial specific volume |
| ∗ | Power to be at the C-terminal | WOEC730101 | Polar requirement |
| GRAR740101 | Composition | ∗ | Power to be at the middle of alpha-helix |
| ∗ | Compressibility | ∗ | Power to be at the N-terminal |
| FAUJ880113 | Equilibrium constant (ionization of COOH) | MCMT640101 | Refractive index |
| CHOP780202 | Beta-structure tendencies | OOBM770102 | Short and medium range non-bonded energy |
| ZIMJ680102 | Bulkiness | PONP800107 | Solvent accessible reduction ratio |
| ∗ | Buriedness | ∗ | Thermodynamic transfer hydrophobicity |
| ∗ | Chromatographic index | OOBM770101 | Total non-bonded energy |
| CHAM830101 | Coil tendencies | CHOP780101 | Turn tendencies |
Properties marked by ∗ are from [36].
Figure 8. Marginal posterior probabilities of any two properties being in the same cluster for the lysin data.
Figure 9. Posterior means of the βs for the four clusters (denoted by representative properties) in Figure 8 for lysin. The sites are sorted by increasing posterior mean.
Figure 10. Posterior summaries of the βs different from zero for sites 82, 99, 120 and 127 in the lysin data. The first four properties on the x-axis belong to four different clusters, and the next two do not consistently belong to any specific cluster. The vertical lines are 90% posterior intervals of the βs that are different from 0; the medians (filled circles) and the 25th and 75th percentiles (stars) are highlighted.
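The summaries shown in Figure 10 (a 90% interval, the median, and the quartiles of the nonzero βs) can be computed from posterior draws along the following lines. This is a minimal sketch, assuming an equal-tailed interval (5th to 95th percentile) and assuming that draws equal to exactly zero are dropped before summarizing; the function name and the toy draws are hypothetical.

```python
import numpy as np

def summarize_nonzero(draws):
    """Summarize the posterior draws of one coefficient that differ from zero."""
    draws = np.asarray(draws, dtype=float)
    nonzero = draws[draws != 0.0]  # assumption: exact zeros are excluded
    lower, q25, median, q75, upper = np.percentile(nonzero, [5, 25, 50, 75, 95])
    return {
        "interval_90": (lower, upper),  # equal-tailed 90% interval (assumed)
        "q25": q25,
        "median": median,
        "q75": q75,
    }

# Hypothetical draws: 20 exact zeros followed by the values 1, 2, ..., 101.
draws = np.concatenate([np.zeros(20), np.arange(1.0, 102.0)])
summary = summarize_nonzero(draws)
print(summary)
```

A highest-posterior-density interval would be an equally reasonable choice; the caption does not say which kind of interval was used.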
Strongly conserved sites () for lysin data for different clusters
| Cluster | Sites |
| 1 | 96 |
| 2 and 3 | 22, 28, 35, 51, 111, 117, 128 |
| 4 | 11, 17, 18, 19, 24, 25, 27, 29, 33, 35, 42, 43, 47, 49, 51, 53, 58, 64, 66, 68, 69, 71, 73, 79, 81, 88, 94, 96, 98, 100, 101, 104, 105, 110, 111, 114, 115, 117, 121, 122, 129, 131 |