Asif U Tamuri1,2, Mario Dos Reis3.
Abstract
We use first principles of population genetics to model the evolution of proteins under persistent positive selection (PPS). PPS may occur when organisms are subjected to persistent environmental change, during adaptive radiations, or in host-pathogen interactions. Our mutation-selection model indicates protein evolution under PPS is an irreversible Markov process, and thus proteins under PPS show a strongly asymmetrical distribution of selection coefficients among amino acid substitutions. Our model shows the criterion ω > 1 (where ω is the ratio of nonsynonymous over synonymous codon substitution rates) for detecting positive selection is conservative and indeed arbitrary, because in real proteins many mutations are highly deleterious and are removed by selection even at positively selected sites. We use a penalized-likelihood implementation of the PPS model to successfully detect PPS in plant RuBisCO and influenza HA proteins. By directly estimating selection coefficients at protein sites, our inference procedure bypasses the need for using ω as a surrogate measure of selection and improves our ability to detect molecular adaptation in proteins.
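The claim that ω > 1 is a conservative criterion can be illustrated with a small numerical sketch. It assumes the standard mutation-selection form for the relative fixation rate of a mutation with population-scaled selection coefficient S, h(S) = S / (1 − e^(−S)); this excerpt does not state that the authors' PPS model uses exactly this form. The function names, the equal weighting of mutations, and the example coefficients are hypothetical and chosen only for illustration.

```python
import math


def relative_fixation_rate(S):
    """Relative fixation rate h(S) = S / (1 - exp(-S)); h(0) = 1 (neutral).

    S is a population-scaled selection coefficient (assumed parameterisation).
    """
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0      # first-order expansion around S = 0
    if S < -700.0:
        return 0.0                # effectively never fixed; avoids overflow
    return S / -math.expm1(-S)


def site_omega(selection_coeffs):
    """Site-level omega as the unweighted mean of h(S) over nonsynonymous
    mutations. Equal weights are an illustrative assumption; a full model
    would weight by mutation rates and codon frequencies.
    """
    rates = [relative_fixation_rate(S) for S in selection_coeffs]
    return sum(rates) / len(rates)


# Hypothetical site: eight strongly deleterious mutations, one beneficial.
coeffs = [-10.0] * 8 + [5.0]
print(site_omega(coeffs))  # below 1 despite one positively selected mutation
```

In this toy site the single beneficial mutation fixes about five times faster than a neutral one, yet the deleterious mutations pull the site average below 1, so a threshold on ω alone would miss it.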
Keywords: RuBisCO; cytochrome b; distribution of fitness effects; influenza; mutation–selection model; positive selection
Year: 2022 PMID: 34694387 PMCID: PMC8760937 DOI: 10.1093/molbev/msab309
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Performance of the LRT for Detecting PPS Sites in Simulated Data after FDR Correction (5%).
| True Model | | | |
|---|---|---|---|
| swMutSel | FPR at 0.05 Significance | | |
| | 0.066 | 0.066 | 0.066 |
| swMutSel+PPS | TPR at 0.05 Significance | | |
| ( | 0.441 | 0.452 | 0.449 |
| ( | 0.952 | 0.952 | 0.947 |
| ( | 0.965 | 0.963 | 0.960 |
Note.—FPR, false-positive rate; TPR, true-positive rate.
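The table title refers to an FDR correction at 5% but this excerpt does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, per-site LRT p-values could be corrected as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg procedure at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical per-site p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))
```

With these ten example p-values only the two smallest pass their rank-dependent thresholds (0.005 and 0.010), so only those two sites would be reported.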
Number of Sites Estimated to be under PPS in Three Real Data Sets.
| Data Set | # Taxa | # Sites | # | ( |
|---|---|---|---|---|
| Plant rbcL | 478 | 466 | 65 (55) | 50 (40) |
| Influenza HA | 466 | 589 | 18 (18) | 17 (14) |
| Mammal CYTB | 418 | 407 | 0 (0) | — |
Fig. 1. Analysis of proteins under the PPS mutation–selection model. (A–A″) Estimates of ω at protein sites. (B–B″) Distribution of selection coefficients among nonsynonymous substitutions. (C–C″) Relationship between ω and average selection at protein sites. Sites under PPS () are indicated in red in A–A″ and C–C″, and their contribution to the distribution of selection coefficients is indicated in red in B–B″. In C–C′, the solid line is equation (1). The penalty on Z is .
Fig. 2. Relationship between Z and evolutionary parameters for PPS sites in HA and rbcL. (A) Irreversibility index, I, versus Z. The index is normalized to give the expected excess number of substitutions from detailed balance. (B) Site substitution rate, r, versus Z. Note that the r are scaled so that they give the relative rate with respect to a neutral sequence (Tamuri et al. 2014). Thus, if r = 1, then the site evolves at the same rate as, say, a pseudogene. (C) Nonsynonymous rate, ω, versus Z. The penalty on Z is in all cases.
Fig. 3. Pattern of amino acid substitution in PPS sites of human influenza (H1N1) HA protein between 1918 and 2009. The penalty on Z is . Each colored dot represents a particular amino acid as indicated in the legend.