Romain A Studer, Marc Robinson-Rechavi.
Abstract
Functional divergence between homologous proteins is expected to affect amino acid sequences in two main ways, which can be considered as proxies of biochemical divergence: a "covarion-like" pattern of correlated changes in evolutionary rates, and switches in conserved residues ("conserved but different"). Although these patterns have been used in case studies, a large-scale analysis is needed to estimate their frequency and distribution. We use a phylogenomic framework of animal genes to answer three questions: 1) What is the prevalence of such patterns? 2) Can we link such patterns at the amino acid level with selection inferred at the codon level? 3) Are patterns different between paralogs and orthologs? We find that covarion-like patterns are more frequently detected than "constant but different," but that only the latter are correlated with signal for positive selection. Finally, there is no obvious difference in patterns between orthologs and paralogs.
Year: 2010 PMID: 20551039 PMCID: PMC2955734 DOI: 10.1093/molbev/msq149
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Tree topologies studied. Boxes represent speciation events and circles represent duplication events. The branches tested are the 3R genome duplication at the base of teleost fishes (A), the speciation between fishes and tetrapods (B), the 1R/2R genome duplications at the origin of vertebrates (C), and the speciation between insects and chordates (D).
Figure: Correlation of sites under positive selection with covarions and CBDs. (A) The histogram shows the values of the chi-square score per site for covarions. The curves are the chi-square values for different BEB intervals (posterior probability for a site to be under positive selection): black for BEB < 50%, blue for 50% ≤ BEB < 95%, green for 95% ≤ BEB < 99%, and red for BEB ≥ 99%. (B) The histogram shows the values of BAD scores per site for CBD. The curves are the BAD values for the same BEB intervals: black for BEB < 50%, blue for 50% ≤ BEB < 95%, green for 95% ≤ BEB < 99%, and red for BEB ≥ 99%. The dashed curve represents BAD scores of nearly neutral simulated sequences. The dashed line at 3.5 is the 99th percentile of BAD scores for BEB < 50%.
Figure: Percentage of CBDs per family under the RAS model and in real data. Histograms of the proportion of CBD sites with BAD score > 3.5 per subtree in data simulated under a neutral model of RAS (A) and in real data (B). The dashed line at 3.9% is the 99th percentile under the neutral model.
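The captions above define cutoffs as the 99th percentile of a neutral distribution. A minimal sketch of that idea follows, assuming simulated and real per-subtree proportions are available as plain lists; the function names and all numbers are hypothetical and are not taken from the study, and the interpolation rule is one common choice, not necessarily the one used by the authors.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation between ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def flag_subtrees(neutral_props, real_props, q=99.0):
    """Derive a cutoff from neutral simulations and flag real subtrees above it."""
    cutoff = percentile(neutral_props, q)
    flagged = [i for i, p in enumerate(real_props) if p > cutoff]
    return cutoff, flagged

# Hypothetical proportions (%) of high-scoring sites per subtree.
neutral = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
real = [0.2, 5.0, 1.1, 8.0]
cutoff, flagged = flag_subtrees(neutral, real)
print(f"cutoff = {cutoff:.2f}%, flagged subtrees = {flagged}")
```

Deriving the cutoff from simulations rather than fixing it in advance ties the false-positive rate to the neutral model, which is the design choice the captions describe.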
Evaluation of the Accuracy and Power of the Test for Covarions.
| Event Studied | Number of Families | Average Number of Taxa | Average Sequence Length | Accuracy under Neutral Evolution (= 1 − Percentage Covarion) (%) | Power under Covarion Process (= Percentage Covarion) (%) |
| Duplication 3R | 2,745 | 9.1 | 390.7 | 99.6 | 75.2 |
| Speciation tetrapods–fishes | 20,120 | 16.2 | 353.8 | 99.4 | 94.5 |
| Duplication 2R | 5,070 | 31.3 | 229.5 | 99.4 | 99.2 |
| Speciation chordates–insects | 6,170 | 28.0 | 254.9 | 99.2 | 94.5 |
Five simulations per set of parameters derived from one family of real data.
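The two derived columns of the table above follow from simple counts: accuracy is one minus the fraction of neutral simulations flagged as covarion, and power is the fraction of covarion simulations flagged. A small sketch, assuming each simulation yields a boolean detection call; the counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def detection_rate(calls):
    """Percentage of simulations in which covarions were detected."""
    if not calls:
        raise ValueError("calls must be non-empty")
    return 100.0 * sum(calls) / len(calls)

# Hypothetical detection calls (True = covarions detected).
neutral_calls = [True] * 5 + [False] * 995      # simulated without covarions
covarion_calls = [True] * 900 + [False] * 100   # simulated with covarions

accuracy = 100.0 - detection_rate(neutral_calls)  # = 1 - percentage covarion
power = detection_rate(covarion_calls)            # = percentage covarion
print(f"accuracy = {accuracy:.1f}%, power = {power:.1f}%")
```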
Results of the Detection of Covarions.
| Event Studied | Number of Families | Average Number of Sequences | Average Number of Sites | Families without Significant Covarions: Number of Families (%) | Families without Significant Covarions: Mean Branch Length | Families with Significant Covarions: Number of Families (%) | Families with Significant Covarions: Percentage of Covarion Sites (%) | Families with Significant Covarions: Mean Branch Length |
| Duplication 3R | 549 | 9.1 | 391.8 | 547 (99.6) | 0.134 | 2 (0.4) | 3.2 | 0.201 |
| Speciation tetrapods–fishes | 4,024 | 16.2 | 355.2 | 3,991 (99.2) | 0.247 | 33 (0.8) | 3.5 | 0.445 |
| Duplication 2R | 1,014 | 31.3 | 231.0 | 928 (91.5) | 0.290 | 86 (8.5) | 4.8 | 0.522 |
| Speciation chordates–insects | 1,234 | 28.0 | 256.2 | 1,161 (94.1) | 0.426 | 73 (5.9) | 4.4 | 0.667 |
Significance: P value threshold at 1% and Q value threshold at 10%.
Mean branch lengths are in amino acid substitutions.
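As a consistency check on the table above, the percentages in parentheses can be recomputed from the family counts. The sketch below uses only the counts reported in the table; the variable names are illustrative.

```python
# (event, total families, families without, families with significant covarions)
rows = [
    ("Duplication 3R", 549, 547, 2),
    ("Speciation tetrapods-fishes", 4024, 3991, 33),
    ("Duplication 2R", 1014, 928, 86),
    ("Speciation chordates-insects", 1234, 1161, 73),
]

def pct(count, total):
    """Percentage rounded to one decimal, as reported in the table."""
    return round(100.0 * count / total, 1)

summary = {}
for event, total, without, with_sig in rows:
    # The two groups should partition the families tested.
    assert without + with_sig == total
    summary[event] = (pct(without, total), pct(with_sig, total))

for event, (p_without, p_with) in summary.items():
    print(f"{event}: {p_without}% without, {p_with}% with")
```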
Results of the Detection of CBD.
| Event Studied | Number of Families | Average Number of Sequences | Average Number of Sites | Families without Significant CBD: Number of Families (%) | Families without Significant CBD: Mean Branch Length | Families with Significant CBD: Number of Families (%) | Families with Significant CBD: Percentage of BADASP Sites (%) | Families with Significant CBD: Mean Branch Length |
| Duplication 3R | 549 | 9.1 | 391.8 | 549 (100) | 0.134 | 0 (0) | NA | NA |
| Speciation tetrapods–fishes | 4,024 | 16.2 | 355.2 | 4,003 (99.5) | 0.245 | 21 (0.5) | 9.7 | 0.900 |
| Duplication 2R | 1,014 | 31.3 | 231.0 | 1,010 (99.6) | 0.305 | 4 (0.4) | 12.7 | 1.592 |
| Speciation chordates–insects | 1,234 | 28.0 | 256.2 | 1,217 (98.6) | 0.430 | 17 (1.4) | 10.3 | 1.149 |
Significance: cutoff at 4%, based on simulated data, and Q value threshold at 10%.
Mean branch lengths are in amino acid substitutions.
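The Q value threshold of 10% in the notes above controls the false discovery rate across the families tested. The source does not state which procedure produced the Q values, so the sketch below substitutes Benjamini-Hochberg adjusted P values as a stand-in; the function name and the P values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-family P values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_like = bh_adjust(p_values)
kept = [i for i, q in enumerate(q_like) if q <= 0.10]
print(q_like, kept)
```

Other estimators of the Q value can give smaller values than this adjustment, so a family list obtained this way would be, if anything, conservative.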
Effect of Potential Confounding Factors on the Detection of Covarions.
| Variable | Duplication 3R: Variance Explained (%) | Duplication 3R: P Value | Speciation Tetrapods–Fishes: Variance Explained (%) | Speciation Tetrapods–Fishes: P Value | Duplication 2R: Variance Explained (%) | Duplication 2R: P Value | Speciation Vertebrates–Insects: Variance Explained (%) | Speciation Vertebrates–Insects: P Value |
| Number of genes | | | | | | | | |
| Number of sites | 0 | 1.5 × 10^−01 | 0 | 2.1 × 10^−01 | 0 | 8.8 × 10^−01 | 0 | 7.2 × 10^−01 |
| Branch length separating subtrees | | | | | | | | |
| Number of branches in subtree alpha | 0 | 8.3 × 10^−01 | 0 | 2.9 × 10^−01 | | | | |
| Sum of branch lengths in subtree alpha | | | | | | | | |
| Number of branches in subtree beta | NA | NA | 0 | 6.1 × 10^−01 | 0 | 3.7 × 10^−01 | NA | NA |
| Sum of branch lengths in subtree beta | | | | | | | | |
| Median_diff | 0 | 1.3 × 10^−02 | | | | | | |
| Residuals | 74 | | 72 | | 79 | | 84 | |
NOTE: NA, not available.
Italic values indicate significance after a Bonferroni correction (α = 0.05/4 = 0.0125).
NA values were removed from the analysis of variance because the number of branches in subtree beta is identical to the number of branches in subtree alpha.
Median_diff is the difference between the medians of all branch lengths of the two trees.
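The Bonferroni threshold quoted in the notes can be applied mechanically. This sketch uses the four "Number of sites" P values and the single Median_diff P value that survive in the table above; the function names are illustrative, and the strict inequality is an assumption about how significance was called.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests

def is_significant(p_value, alpha=0.05, n_tests=4):
    """True when p_value falls below the corrected threshold (assumed strict)."""
    return p_value < bonferroni_threshold(alpha, n_tests)

threshold = bonferroni_threshold(0.05, 4)  # 0.0125, one test per event

# P values for "Number of sites" across the four events, from the table.
sites_p = [1.5e-01, 2.1e-01, 8.8e-01, 7.2e-01]
sites_calls = [is_significant(p) for p in sites_p]

# Median_diff for Duplication 3R: 0.013 is just above 0.0125.
median_diff_call = is_significant(1.3e-02)

print(threshold, sites_calls, median_diff_call)
```

The Median_diff value shows why the correction matters: 0.013 would pass an uncorrected 5% level but does not pass the corrected one.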
Effect of Potential Confounding Factors on the Detection of CBD Sites.
| Variable | Duplication 3R: Variance Explained (%) | Duplication 3R: P Value | Speciation Tetrapods–Fishes: Variance Explained (%) | Speciation Tetrapods–Fishes: P Value | Duplication 2R: Variance Explained (%) | Duplication 2R: P Value | Speciation Vertebrates–Insects: Variance Explained (%) | Speciation Vertebrates–Insects: P Value |
| Number of genes | | | | | | | | |
| Number of sites | 0 | 5.3 × 10^−01 | 0 | 2.1 × 10^−02 | 0 | 5.2 × 10^−01 | 0 | 4.6 × 10^−01 |
| Branch length separating subtrees | | | | | | | | |
| Number of branches in subtree alpha | 0 | 9.6 × 10^−01 | 0 | 8.0 × 10^−01 | 0 | 5.7 × 10^−01 | 0 | 2.4 × 10^−01 |
| Sum of branch lengths in subtree alpha | 0 | 2.3 × 10^−01 | 0 | 6.6 × 10^−02 | 0 | 2.6 × 10^−02 | 0 | 5.7 × 10^−01 |
| Number of branches in subtree beta | NA | NA | 0 | 6.8 × 10^−01 | 0 | 2.1 × 10^−01 | NA | NA |
| Sum of branch lengths in subtree beta | 0 | 9.0 × 10^−02 | 0 | 2.4 × 10^−01 | | | | |
| Median_diff | 0 | 5.2 × 10^−01 | | | | | | |
| Residuals | 44 | | 44 | | 59 | | 55 | |
NOTE: NA, not available.
Italic values indicate significance after a Bonferroni correction (α = 0.05/4 = 0.0125).
NA values were removed from the ANOVA because the number of branches in subtree beta is identical to the number of branches in subtree alpha.
Median_diff is the difference between the medians of all branch lengths of the two trees.