Vivak Soni, Adam Eyre-Walker.
Abstract
The rate of amino acid substitution has been shown to be correlated with a number of factors, including the rate of recombination, the age of the gene, the length of the protein, mean expression level, and gene function. However, the extent to which these correlations are due to adaptive and nonadaptive evolution has not been studied in detail, at least not in hominids. We find that the rate of adaptive evolution is significantly positively correlated with the rate of recombination, protein length, and gene expression level, and negatively correlated with gene age. These correlations remain significant when each factor is controlled for in turn, except when controlling for expression in an analysis of protein length; they also generally remain significant when biased gene conversion is taken into account. However, the positive correlations could be an artifact of population size contraction. We also find that the rate of nonadaptive evolution is negatively correlated with each factor, and all these correlations survive controlling for each other and for biased gene conversion. Finally, we examine the effect of gene function on rates of adaptive and nonadaptive evolution; we confirm that virus-interacting proteins (VIPs) have higher rates of adaptive and lower rates of nonadaptive evolution, but we also demonstrate that there is significant variation in the rate of adaptive and nonadaptive evolution between GO categories when VIPs are removed. We estimate that the VIP/non-VIP axis explains about 5- to 8-fold more of the variance in evolutionary rate than GO categories.
Keywords: adaptive evolution; chimpanzees; gene age; humans; recombination rate
Year: 2022; PMID: 35166775; PMCID: PMC8882387; DOI: 10.1093/gbe/evac028
Source DB: PubMed; Journal: Genome Biol Evol; ISSN: 1759-6653; Impact factor: 3.416
Fig. 1. Estimates of ωa and ωna plotted against (a) mean RR, (b) gene age, (c) mean gene expression, and (d) mean protein length. The significance of each correlation, for both ωa and ωna, is shown in the plot legend (*P < 0.05; **P < 0.01; ***P < 0.001; “.” 0.05 ≤ P < 0.10). The line of best fit, from an unweighted regression fitted to the estimates of ωa and ωna, is also shown.
The Correlation between Gene Age, Gene Expression, Protein Length, and RR
| | Gene Expression | Protein Length | RR | CV | CV of Near Modal Values |
|---|---|---|---|---|---|
| Gene age | 0.87 | 0.86 | −0.62 | 1.4 | 0.38 |
| Gene expression | | 0.44 | −0.035 | 1.5 | 0.41 |
| Protein length | | | 0.10 | 1.7 | 0.50 |
| RR | | | | 1.1 | 0.33 |
Note.—Logs were taken of all variables. The CV column is the coefficient of variation of the factor for all the data. The final column is the CV of the restricted data (i.e., when we control for the factor in question by restricting the analysis to genes with the modal value ±0.5 SDs).
*P < 0.05, **P < 0.01, ***P < 0.001.
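The note above describes two quantities: the coefficient of variation (CV) of a factor, and the CV after restricting to genes within ±0.5 SDs of the modal value. The following is a minimal sketch of that restriction, not the authors' code. It assumes the mode is estimated from the fullest histogram bin of the log-transformed factor and that the SD is also taken on the log scale; the function names, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    values = np.asarray(values, dtype=float)
    return np.std(values, ddof=1) / abs(np.mean(values))

def near_modal_mask(log_values, half_width_sd=0.5, bins=20):
    """Boolean mask of entries within half_width_sd SDs of the estimated mode.

    Assumption: the mode is the midpoint of the histogram bin with the
    highest count; the source does not say how the modal value was found.
    """
    log_values = np.asarray(log_values, dtype=float)
    counts, edges = np.histogram(log_values, bins=bins)
    top = np.argmax(counts)
    mode = 0.5 * (edges[top] + edges[top + 1])
    sd = np.std(log_values, ddof=1)
    return np.abs(log_values - mode) <= half_width_sd * sd

# Synthetic stand-in for a per-gene factor such as expression level.
rng = np.random.default_rng(0)
factor = rng.lognormal(mean=2.0, sigma=0.5, size=5000)
keep = near_modal_mask(np.log(factor))
cv_all = coefficient_of_variation(factor)
cv_restricted = coefficient_of_variation(factor[keep])
```

On data like this the restricted CV comes out smaller than the full CV, which is the pattern the last two columns of the table report for the real factors.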
The Observed Correlation between Y and X Controlling for a Covariate, Z, and the Observed and Predicted Correlation between Y and X Assuming the Relationship Is Solely due to the Correlation between Each Variable and a Third Factor Z
| Y Variate | X Variate | Observed (Uncontrolled) | Z Variate | Observed (Controlling for Z) | Predicted | Predicted/Observed>1 |
|---|---|---|---|---|---|---|
| ωa | RR | 0.74 | Age | 0.25 | 0.15 | 0 |
| ωa | RR | 0.74 | Length | 0.43 | 0.086 | 0 |
| ωa | Age | −0.40 | RR | −0.58 | −0.093 | 0.02 |
| ωa | Expression | 0.64 | Length | 0.00 | 0.38 | 0.03 |
| ωa | Length | 0.60 | RR | 0.64 | 0.091 | 0 |
| ωa | Length | 0.60 | Expression | 0.25 | 0.37 | 0.13 |
| ωna | RR | −0.73 | Length | −0.54 | −0.34 | 0 |
| ωna | Age | −0.91 | Expression | −0.76 | −0.76 | 0 |
| ωna | Age | −0.91 | Length | −0.87 | −0.75 | 0 |
| ωna | Expression | −0.98 | Age | −0.74 | −0.90 | 0 |
| ωna | Expression | −0.98 | Length | −0.61 | −0.95 | 0.01 |
| ωna | Length | −0.94 | RR | −0.91 | −0.42 | 0 |
| ωna | Length | −0.94 | Age | −0.49 | −0.88 | 0 |
| ωna | Length | −0.94 | Expression | −0.71 | −0.89 | 0 |
Note.—The final column gives the proportion of 100 bootstrap replicates in which the predicted correlation divided by the observed correlation is greater than 1—that is, the predicted correlation is larger in magnitude.
*P < 0.05, **P < 0.01, ***P < 0.001.
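The final column of the table is a bootstrap proportion: the share of replicates in which the predicted correlation divided by the observed correlation exceeds 1. The sketch below shows how such a proportion can be tallied. It is illustrative only: it resamples paired observations directly, whereas the source does not describe the resampling unit, and the `predict` callable is a hypothetical hook. The example predictor, `product_rule_prediction`, is the textbook identity for a zero partial correlation (r_YX = r_YZ · r_XZ), used here as a stand-in; it is not claimed to be the estimator behind the tabulated values.

```python
import numpy as np

def proportion_exceeding(predicted, observed):
    """Share of replicates with predicted / observed > 1."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(predicted / observed > 1))

def product_rule_prediction(y, x, z):
    """Stand-in predictor: r_YZ * r_XZ (zero partial correlation identity)."""
    r_yz = np.corrcoef(y, z)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    return r_yz * r_xz

def bootstrap_proportion(y, x, z, predict, n_boot=100, seed=0):
    """Resample rows with replacement; compare predicted to observed r(Y, X)."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    rng = np.random.default_rng(seed)
    n = len(y)
    predicted, observed = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap replicate
        observed.append(np.corrcoef(y[idx], x[idx])[0, 1])
        predicted.append(predict(y[idx], x[idx], z[idx]))
    return proportion_exceeding(predicted, observed)
```

A proportion near 0, as in most rows of the table, means the prediction from Z alone almost never reaches the observed correlation, so Z cannot fully account for the relationship between Y and X.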
Fig. 2. Estimates of ωa (top) and ωna (bottom) for GO categories that contain >100 VIP and non-VIP genes.
Top Ten GO Categories, Ranked by Rate of Adaptive Substitution
| GO Category | ωa | ωa 95% CIs |
|---|---|---|
| Ubiquitin protein ligase binding | 0.0843 | 0.0702–0.0995 |
| Protein kinase binding | 0.0804 | 0.0698–0.0914 |
| Sequence-specific DNA binding | 0.0735 | 0.0633–0.0842 |
| DNA-binding transcription factor activity | 0.0719 | 0.0628–0.0812 |
| Transcription factor complex | 0.0682 | 0.0496–0.0883 |
| Transcription by RNA polymerase II | 0.0673 | 0.0518–0.0836 |
| Negative regulation of apoptotic process | 0.0671 | 0.0552–0.0796 |
| Chromatin organization | 0.0669 | 0.0567–0.0775 |
| DNA-binding transcription activator activity | 0.0649 | 0.0524–0.078 |
| Transcription coactivator activity | 0.0648 | 0.0519–0.0786 |
Top Ten GO Categories, Ranked by Rate of Nonadaptive Substitution
| GO Category | ωna | ωna 95% CIs |
|---|---|---|
| Immune system process | 0.297 | 0.283–0.310 |
| Innate immune response | 0.264 | 0.248–0.279 |
| Chromosome | 0.262 | 0.249–0.274 |
| Protein C-terminus binding | 0.246 | 0.228–0.264 |
| Centrosome | 0.243 | 0.232–0.253 |
| DNA repair | 0.236 | 0.223–0.249 |
| Signal transduction | 0.225 | 0.219–0.231 |
| Neutrophil degranulation | 0.218 | 0.206–0.229 |
| Extracellular region | 0.217 | 0.211–0.223 |
| Proteolysis | 0.204 | 0.195–0.214 |
Fig. 3. Correlation between the log of the mean strength of selection against deleterious mutations and (a) gene age, (b) RR, (c) gene length, and (d) gene expression. A linear regression has been fitted to each data set.