Francisco M Codoñer, Shirley O'Dea, Mario A Fares.
Abstract
BACKGROUND: The strength of selective constraints operating on amino acid sites of proteins has a multifactorial nature. In fact, amino acid sites within proteins coevolve due to their functional and/or structural relationships. Different methods have been developed that attempt to account for the evolutionary dependencies between amino acid sites, and researchers have invested significant effort in increasing the sensitivity of such methods. However, the difficulty in disentangling functional co-dependencies from historical covariation has fuelled scepticism over their power to detect biologically meaningful results. In addition, the biological parameters connecting linear sequence evolution to structure evolution remain elusive. For these reasons, most evolutionary studies aimed at identifying functional dependencies among protein domains have focused on the structural properties of proteins rather than on the information extracted from linear multiple sequence alignments (MSAs). Non-parametric methods to detect coevolution have been reported to be especially susceptible to producing false positive results, depending on the properties of the MSAs. However, no formal statistical analysis has been performed to definitively test the differential effects of these properties on the sensitivity of such methods.
Year: 2008 PMID: 18402697 PMCID: PMC2362121 DOI: 10.1186/1471-2148-8-106
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Identification of pairs of parsimony-informative coevolving amino acid sites. The figure represents a multiple sequence alignment together with the phylogenetic relationships of its constituent sequences. The phylogenetic tree is asymmetric because one of the sequences evolves at an accelerated rate (long branch). As a result, most of the evolutionary signal (amino acid variability) in the multiple protein sequence alignment (right-hand side) is due to amino acid differences at sites contributed by the accelerated sequence. Parsimony-informative coevolving sites (those sites showing at least two different amino acid states with each one represented by at least one sequence of the alignment) are shown, and the variability in their states is colour coded. Non-parsimony-informative coevolving sites are also highlighted as an example of phylogenetic coevolution.
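The parsimony filter described in the Figure 1 caption can be sketched as a per-column check. This is a minimal illustration, not the authors' implementation: the function names, the gap characters and the toy alignment are hypothetical. The `min_per_state` parameter is also an assumption. Its default of 2 follows the usual phylogenetic convention for parsimony-informative sites, whereas `min_per_state=1` corresponds to the literal wording of the caption.

```python
from collections import Counter

# Characters treated as missing data (an assumption for this sketch).
GAP_CHARS = {"-", "?", "X"}


def is_parsimony_informative(column, min_per_state=2):
    """Return True if at least two states each occur min_per_state times."""
    counts = Counter(res for res in column if res not in GAP_CHARS)
    supported = [state for state, n in counts.items() if n >= min_per_state]
    return len(supported) >= 2


def informative_sites(alignment, min_per_state=2):
    """Return the 0-based indices of the columns that pass the filter."""
    n_sites = len(alignment[0])
    return [
        i
        for i in range(n_sites)
        if is_parsimony_informative([seq[i] for seq in alignment], min_per_state)
    ]


# Toy alignment of four sequences and four sites (hypothetical data).
alignment = ["ACDA", "ACEA", "AGDA", "AGEK"]
strict = informative_sites(alignment)                    # convention: 2 per state
literal = informative_sites(alignment, min_per_state=1)  # caption wording
```

In the toy alignment the last column differs only in a single sequence, so it passes the filter only under the looser `min_per_state=1` setting. This is the kind of site that a long branch tends to contribute.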
Figure 2. Testing the effect of different parameters on the percentage of positive values (PPV, plots a to c) and sensitivity (SN, plots d to f) of Mutual Information Criterion based methods to detect coevolution when parsimony filtering is applied. We tested the effect on PPV and SN of variations in the size of the multiple sequence alignments (MSAs, with sizes ranging between 20 and 100 sequences), the mean pairwise amino acid distance (with distances ranging between 0.1 and 2 amino acid substitutions per site) and the strength of coevolution (the level of coevolution ranged between a minimum of 10% and a maximum of 25%). Coevolution levels indicate the distribution of patterns of coevolution at a particular pair of amino acid sites. For example, a level of 10% indicates that 10% of the sequences at that particular pair of sites in the multiple sequence alignment correlate in one amino acid state pattern, whereas the remaining 90% correlate in a different state. The strongest levels of coevolution (those showing the highest MI values) are presented by pairs of sites showing 25% coevolution strength.
Figure 3. Test of the improvement in the percentage of positive values (PPV) and sensitivity (SN) when filters are added to the detection of coevolution. The X-axis represents the different filters (1 = parsimony, 2 = hydrophobicity and 3 = molecular weight). The Y-axis represents PPV (a) or SN (b). The comparison of each filter against filtering by parsimony was always significant (* indicates P < 0.01).
Figure 4. Testing the effect of different parameters on the percentage of positive values (PPV, plots a to c) and sensitivity (SN, plots d to f) of Mutual Information Criterion based methods to detect coevolution when filtering by the correlation in hydrophobicity is applied. We tested the effect on PPV and SN of variations in the size of the multiple sequence alignments (MSAs, with sizes ranging between 20 and 100 sequences), the mean pairwise amino acid distance (with distances ranging between 0.1 and 2 amino acid substitutions per site) and the strength of coevolution (the level of coevolution ranged between a minimum of 10% and a maximum of 25%).
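One way to picture a filter based on the correlation in hydrophobicity is sketched below. The record does not state which hydrophobicity scale, correlation statistic or cut-off were used, so everything here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale, a Pearson correlation between the two columns, and a hypothetical `threshold` of 0.8.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values (choice of scale is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def hydrophobicity_correlation(col_a, col_b):
    """Correlate the hydropathy values of two gap-free alignment columns."""
    xs = [KYTE_DOOLITTLE[res] for res in col_a]
    ys = [KYTE_DOOLITTLE[res] for res in col_b]
    return pearson(xs, ys)


def passes_filter(col_a, col_b, threshold=0.8):
    """Keep a pair only if its hydropathy values covary strongly (hypothetical cut-off)."""
    return abs(hydrophobicity_correlation(col_a, col_b)) >= threshold


# Toy columns: hydrophobic and charged residues switch together at both sites.
r = hydrophobicity_correlation("IIKKII", "VVDDVV")
```

A molecular weight filter would follow the same pattern with a table of residue masses in place of the hydropathy scale.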
Figure 5. Testing the effect of different parameters on the percentage of positive values (PPV, plots a to c) and sensitivity (SN, plots d to f) of Mutual Information Criterion based methods to detect coevolution when filtering by the correlation in molecular weight is applied. We tested the effect on PPV and SN of variations in the size of the multiple sequence alignments (MSAs, with sizes ranging between 20 and 100 sequences), the mean pairwise amino acid distance (with distances ranging between 0.1 and 2 amino acid substitutions per site) and the strength of coevolution (the level of coevolution ranged between a minimum of 10% and a maximum of 25%).
Amino acid sites in Escherichia coli detected to be important for the function of GroEL.
| Amino acid site | Function Description | Literature |
|---|---|---|
| 30–33, 51–53, 87–91, 150–151, 398, 414–416, 454, 478–481, 493, 495 | ATP/ADP and Mg2+ binding residues (ATP/Mg-B) | [48] |
| 199, 201, 203–204, 234, 237, 259, 263–264 | Substrate binding (SubB) | [48] |
| 230, 238, 241, 257, 260, 261, 265, 268, 270, 271 | Substrate binding (SubB) | [49] |
| 234, 237, 238, 241, 242, 257, 261, 265, 270 | GroES binding (ESB) | [48] |
| 241, 257, 261, 265, 270 | GroES binding (ESB) | [49] |
| 4, 41–42, 58–59, 61, 63, 75–76, 80, 83, 178–179, 188, 196–197, 224–226, 252, 253, 255, 257, 277, 283, 286, 303, 304, 308, 327, 328, 359, 361, 363, 364, 367, 368, 371, 380, 386, 390, 393, 397, 404, 408, 523 | Charged residues exposed to the central cavity in the cis ring, probably contacting substrates (CER) | [50] |
Intra-molecular coevolution analysis in the heat-shock protein GroEL.
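| Group of coevolution | Amino acid sites (a) | Mean uncertainty, first site (b) | Mean uncertainty, second site (c) | Mean joint uncertainty (d) | Mean Mutual Information (e) |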
|---|---|---|---|---|---|
| G1 | D11, V94, C458, K171, E172, I379, E191, A251, V254, A293, G306, E354, Q366, V369 | 0.223 | 0.238 | 0.244 | 0.217 |
| G2 | S154, E354 | 0.231 | 0.319 | 0.319 | 0.231 |
| G3 | A312, E354 | 0.324 | 0.319 | 0.439 | 0.204 |
| G4 | D334, A530 | 0.202 | 0.202 | 0.202 | 0.202 |
| G5 | V336, Q432 | 0.473 | 0.394 | 0.624 | 0.243 |
| G6 | M544, G545 | 0.231 | 0.231 | 0.231 | 0.231 |
a Amino acid sites included in each coevolution group. All sites within the same group coevolve with each other. Numbers indicate the position of the site, taking as reference the sequence of the bacterium Escherichia coli K12.
b Measure of mean uncertainty for the first site of the coevolving pair.
c Measure of mean uncertainty for the second site of the coevolving pair.
d Measure of the mean joint uncertainty for the pairs of sites within each group.
e Mutual Information mean value for all the possible coevolving pairs within the group.
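The quantities in footnotes b to e can be computed from two alignment columns as Shannon entropies and their combination. The sketch below is illustrative only: the function names are hypothetical, and the logarithm base is an assumption (base 2 here), since the record does not state which base produced the tabulated values.

```python
from collections import Counter
from math import log


def entropy(symbols, base=2.0):
    """Shannon entropy (uncertainty) of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log(c / n, base) for c in counts.values())


def mutual_information(col_a, col_b, base=2.0):
    """Return (H(a), H(b), H(a,b), MI) for two equally long columns."""
    h_a = entropy(list(col_a), base)
    h_b = entropy(list(col_b), base)
    # Joint uncertainty: entropy of the paired states observed per sequence.
    h_ab = entropy(list(zip(col_a, col_b)), base)
    return h_a, h_b, h_ab, h_a + h_b - h_ab


# Perfectly covarying toy columns: the states change together.
linked = mutual_information("AAVV", "DDEE")
# Independent toy columns: every combination of states occurs once.
unlinked = mutual_information("AAVV", "DEDE")
```

With the perfectly covarying columns the joint uncertainty equals the uncertainty of either site, so the Mutual Information equals that uncertainty. With the independent columns the joint uncertainty is the sum of the two, and the Mutual Information is zero.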
Figure 6. Example of the effect of biological filters in coevolution analyses. In this case study we have used the heat-shock protein GroEL, whose functional domains are well characterised. The different domains (apical, intermediate and equatorial) are identified in the cartoon representing the linear GroEL sequence in red, green and yellow, respectively. Blocks G1 to G6 show the sites under coevolution belonging to each coevolution group as coloured bars. Sites belonging to the same group of coevolution are shown in the crystal structure of one of the GroEL subunits (PDB accession number: 1svt.pdb) and are coloured following the same pattern as in the blocks of the groups of coevolution.
Coevolution analysis in the heat-shock protein 90.
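| Group of coevolution | Amino acid sites (a) | Mean uncertainty, first site (b) | Mean uncertainty, second site (c) | Mean joint uncertainty (d) | Mean Mutual Information (e) |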
|---|---|---|---|---|---|
| G1 | E244, T458 | 0.4617 | 0.3993 | 0.6336 | 0.2275 |
| G2 | E244, K541 | 0.4617 | 0.5573 | 0.7610 | 0.2581 |
| G3 | E244, Q578 | 0.4617 | 0.3602 | 0.5588 | 0.2631 |
| G4 | V260, A287 | 0.3247 | 0.3585 | 0.4589 | 0.2243 |
| G5 | V260, I520 | 0.3247 | 0.3038 | 0.3605 | 0.2680 |
| G6 | V260, N673 | 0.3247 | 0.3220 | 0.4138 | 0.2328 |
| G7 | A287, K541 | 0.3585 | 0.5573 | 0.6573 | 0.2585 |
| G8 | L450, T688 | 0.5628 | 0.4966 | 0.8183 | 0.2411 |
| G9 | A477, D499 | 0.3524 | 0.2251 | 0.3524 | 0.2251 |
| G10 | A477, N673 | 0.3524 | 0.3220 | 0.4489 | 0.2256 |
| G11 | P482, G555 | 0.3452 | 0.4028 | 0.5027 | 0.2453 |
| G12 | F492, K541 | 0.3942 | 0.5573 | 0.7139 | 0.2376 |
| G13 | I520, G555 | 0.3038 | 0.4028 | 0.4768 | 0.2298 |
a Amino acid sites included in each coevolution group. All sites within the same group coevolve with each other. Numbers indicate the position of the site, taking as reference the sequence of Saccharomyces cerevisiae.
b Measure of mean uncertainty for the first site of the coevolving pair.
c Measure of mean uncertainty for the second site of the coevolving pair.
d Measure of the mean joint uncertainty for the pairs of sites within each group.
e Mutual Information mean value for all the possible coevolving pairs within the group.
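The four numeric columns of this table are consistent with the standard identity relating Mutual Information to the single-site and joint uncertainties. The symbols $H$ and $MI$ below are notation introduced here for illustration. The worked line uses the values reported for group G8 (L450, T688):

```latex
MI(X, Y) = H(X) + H(Y) - H(X, Y)

MI_{\mathrm{G8}} = 0.5628 + 0.4966 - 0.8183 = 0.2411
```

The other rows reproduce the reported Mutual Information in the same way, up to rounding in the fourth decimal place (for example G1: 0.4617 + 0.3993 - 0.6336 = 0.2274, reported as 0.2275).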
Coevolution analysis in the env protein of the human immunodeficiency virus type 1 (HIV-1).
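| Group of coevolution | Amino acid sites (a) | Mean uncertainty, first site (b) | Mean uncertainty, second site (c) | Mean joint uncertainty (d) | Mean Mutual Information (e) |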
|---|---|---|---|---|---|
| G1 | L21, S640, L641, T723 | 0.5560 | 0.5527 | 0.8858 | 0.2228 |
| G2 | | 0.5391 | 0.5510 | 0.8858 | 0.1926 |
| G3 | T232, S640, L641 | 0.5930 | 0.6066 | 0.8858 | 0.2385 |
a Amino acid sites included in each coevolution group. All sites within the same group coevolve with each other. Numbers indicate the position of the site, taking as reference the sequence of Saccharomyces cerevisiae.
b Measure of mean uncertainty for the first site of the coevolving pair.
c Measure of mean uncertainty for the second site of the coevolving pair.
d Measure of the mean joint uncertainty for the pairs of sites within each group.
e Mutual Information mean value for all the possible coevolving pairs within the group.
Figure 7. Representation of the different coevolution strengths. In this example we represent a phylogeny of 20 related sequences; the amino acid state of each sequence is represented by one of four symbols (circle, star, square and triangle). Coevolution in the simulated data set was set at three different strengths. At 10%, 10% of the sequences (2 sequences) share the same amino acid state (circle, for example), a different 10% of the sequences share another state (triangle), 40% share a third state (star) and the remaining 40% share the final state (square). The same rationale applies to the 20% and 25% coevolution strengths. These levels of coevolution are considered low (10%, yielding low Mutual Information values), medium (20%, greater MI values) and strong (25%, the greatest possible MI values).
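The three coevolution strengths in the Figure 7 caption can be mimicked with a pair of perfectly correlated columns, which shows why Mutual Information grows from the 10% to the 25% pattern. This is a rough sketch rather than the simulation used in the study. The function names and residue letters are hypothetical, the logarithm is taken in base 2, and the split of the 20% and 25% cases into 20/20/30/30 and 25/25/25/25 is an assumption read from "the same rationale".

```python
from collections import Counter
from math import log2


def coevolving_pair(n_sequences, strength):
    """Build two perfectly correlated columns for one coevolution strength."""
    minor = round(n_sequences * strength)        # size of each small class
    major = (n_sequences - 2 * minor) // 2       # size of each large class
    sizes = [minor, minor, major, n_sequences - 2 * minor - major]
    # Four arbitrary states per site, standing in for the four symbols.
    col_a = "".join(state * k for state, k in zip("ACDE", sizes))
    col_b = "".join(state * k for state, k in zip("KLMN", sizes))
    return col_a, col_b


def mutual_information(col_a, col_b):
    """MI in bits from the single-site and joint state frequencies."""
    n = len(col_a)

    def h(items):
        return -sum((c / n) * log2(c / n) for c in Counter(items).values())

    return h(col_a) + h(col_b) - h(list(zip(col_a, col_b)))


# MI for the three strengths in a 20-sequence alignment.
mi = {s: mutual_information(*coevolving_pair(20, s)) for s in (0.10, 0.20, 0.25)}
```

Under these assumptions the Mutual Information increases from the 10% to the 20% to the 25% pattern. The 25% case reaches 2 bits, the maximum for four states, because all four state pairs are equally frequent.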