Ravi Patel1,2, Laura B Scheinfeldt1,2,3, Maxwell D Sanderford1, Tamera R Lanham1, Koichiro Tamura4, Alexander Platt1,2,5, Benjamin S Glicksberg6, Ke Xu6, Joel T Dudley6, Sudhir Kumar1,2,7.
Abstract
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested, but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
Year: 2018 PMID: 29846678 PMCID: PMC6063297 DOI: 10.1093/molbev/msy107
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Known Adaptive Missense Polymorphisms and Their Candidate Adaptive Polymorphism (CAP) Status with Empirical Probability (Pneu).
| Protein | SNP Identifier | CAP? | Pneu |
|---|---|---|---|
| ALMS1 | rs10193972 | Yes | <0.02 |
| ALMS1 | rs2056486 | Yes | <0.02 |
| ALMS1 | rs3813227 | Yes | <0.02 |
| ALMS1 | rs6546837 | Yes | <0.02 |
| ALMS1 | rs6546838 | Yes | <0.02 |
| ALMS1 | rs6546839 | Yes | <0.02 |
| ALMS1 | rs6724782 | Yes | <0.02 |
| APOL1 | rs73885319 | No | n/a |
| DARC | rs12075 | Yes | <0.02 |
| EDAR | rs3827760 | Yes | <0.03 |
| G6PD | rs1050828 | Marginal | n/a |
| G6PD | rs1050829 | Yes | <0.03 |
| HBB | rs334 | Marginal | n/a |
| MC1R | rs1805007 | No | n/a |
| MC1R | rs1805008 | No | n/a |
| MC1R | rs885479 | Yes | <0.03 |
| SLC24A5 | rs1426654 | Yes | <0.02 |
| SLC45A2 | rs16891982 | Yes | <0.02 |
| TLR4 | rs4986790 | Yes | <0.04 |
| TLR4 | rs4986791 | Marginal | n/a |
| TLR5 | rs5744174 | No | n/a |
| TRPV6 | rs4987657 | Yes | <0.01 |
| TRPV6 | rs4987667 | Yes | <0.01 |
| TRPV6 | rs4987682 | Yes | <0.01 |
Note: A candidate adaptive polymorphism (CAP) is an amino acid polymorphism with an evolutionary probability (EP) < 0.05 and a population allele frequency (AF) > 5%. n/a marks alleles for which at least one of these two conditions was not met. Marginal status is given to alleles with EP < 0.05 and global AF > 2%. Supplementary table 1, Supplementary Material online, presents more details on each of these polymorphisms and the source references.
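The thresholds in the note can be written as a small classifier. This is only a sketch of the stated cutoffs, not the authors' pipeline: the function name `cap_status`, its arguments, and the example values are hypothetical, and the empirical probability (Pneu) step is not modeled.

```python
# Thresholds taken from the table note; everything else here is illustrative.
EP_CUTOFF = 0.05           # evolutionary probability (EP) must be below this
AF_CUTOFF = 0.05           # population allele frequency (AF) must exceed 5%
MARGINAL_AF_CUTOFF = 0.02  # global AF must exceed 2% for "Marginal" status


def cap_status(ep: float, population_af: float, global_af: float) -> str:
    """Return 'Yes', 'Marginal', or 'No' using only the cutoffs in the note.

    Assumption: 'Marginal' applies when the allele has EP < 0.05, misses the
    5% population AF cutoff, but has global AF > 2%.
    """
    if ep >= EP_CUTOFF:
        return "No"            # allele is not evolutionarily unexpected
    if population_af > AF_CUTOFF:
        return "Yes"           # unexpected allele that is common
    if global_af > MARGINAL_AF_CUTOFF:
        return "Marginal"      # unexpected allele at intermediate frequency
    return "No"


# Hypothetical inputs, not values from the study.
print(cap_status(ep=0.01, population_af=0.30, global_af=0.10))
print(cap_status(ep=0.01, population_af=0.03, global_af=0.03))
```

Checking EP first mirrors the note's wording: an allele with a high EP is never a candidate, whatever its frequency.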
Fig. 1. Evolutionary Probability Approach. The evolutionary probabilities (EPs) and their application to discover candidate adaptive polymorphisms (CAPs). (a) Timetree of 46 vertebrates (Hedges et al. 2015), which was used along with alignments of orthologous amino acid sequences for all human proteins (Kent et al. 2002) to compute the probability of observing each amino acid residue at a given position. Under neutral theory, we expect a strong relationship between EP and allele frequency (AF) such that evolutionarily unexpected alleles (EP < 0.05) will be rare. (b) Relationship between EP and AF. Average EP (y axis) was calculated for AF bins of width 0.05 (x axis) for all polymorphic missense alleles in the 1000 Genomes Project Phase 3 whole genome sequencing data, which confirms that the general relationship between EP and AF is consistent with neutral expectations. The standard deviation is shown with gray lines (averages are in blue); it is expected to be large because contemporary AFs are a product of the time of origin, natural selection, and genetic drift experienced by a mutation. (c) Distribution of empirical P values (−log10) generated from the empirical framework (AF | EP < 0.05). The cutoff used to identify CAPs is shown with a dashed red line and is more extreme than a false positive rate of 0.05.
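The binning described for panel (b) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `mean_ep_by_af_bin`, the input layout (pairs of AF and EP), and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def mean_ep_by_af_bin(alleles, bin_width=0.05):
    """Group (af, ep) pairs into AF bins of fixed width.

    Returns {bin_start: (mean EP, SD of EP, number of alleles)}.
    An AF of exactly 1.0 is placed in the last bin.
    """
    n_bins = round(1 / bin_width)
    bins = {}
    for af, ep in alleles:
        idx = min(int(af / bin_width), n_bins - 1)  # clamp AF == 1.0
        bins.setdefault(idx, []).append(ep)
    return {
        round(idx * bin_width, 10): (mean(eps), pstdev(eps), len(eps))
        for idx, eps in sorted(bins.items())
    }


# Hypothetical (AF, EP) pairs, not data from the study.
example = [(0.01, 0.02), (0.04, 0.04), (0.52, 0.60), (1.0, 0.90)]
for bin_start, (avg, sd, n) in mean_ep_by_af_bin(example).items():
    print(f"AF bin starting at {bin_start:.2f}: mean EP={avg:.3f}, SD={sd:.3f}, n={n}")
```

Floating-point division can shift an AF that sits exactly on a bin boundary into the neighboring bin, so a production version would likely bin on integer allele counts instead.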
Fig. 2. Chromosomal distribution of CAPs. (a) The distribution of candidate adaptive alleles (CAPs) across autosomal chromosomes (red points). Chromosomal banding patterns are also visualized for reference. (b) A plot of −log10(Pneu) generated from the Evolutionary Probability Approach (y axis) against chromosome position (x axis) for the MHC region of chromosome 6. CAPs are shaded red and non-CAPs are shaded gray. The CAP Pneu cutoff is shown with a dashed red line. Notable HLA genes with >20 CAPs are indicated.
Fig. 3. Properties of candidate adaptive alleles. Distribution of all (red bars) and phenotype-associated (pink bars) (a) CAP counts across proteins, and (b) number of CAPs found per amino acid position in each protein-coding gene.
Fig. 4. Functional distribution of CAPs. The top 75 GO-slim biological process categories with the most CAPs per amino acid position (red bars). The number of proteins found in each biological process annotation is in parentheses. The number of CAPs found in each biological process is listed next to the corresponding bar. Additional information for all PANTHER GO-slim biological process categories can be found in the Supplementary Material.
Fig. 5. Selection model fits to observed CAPs. Site frequency spectra (SFS) for SNPs with AF > 5%. SFS were scaled to have the same number of sites for AF > 5%. Black bars represent all EP < 0.05 alleles observed in 1000G Phase 3 individuals. (a) Observed and fitted SFS for all candidate adaptive polymorphisms (CAPs). A neutral model (blue) does not explain the preponderance of alleles found at very high AF and does not fit the observed data well (lnL = −4,124). (b) Observed and fitted SFS for all CAPs. A model with weakly deleterious (purple) and beneficial (green) components showed the best fit (lnL = −3,080) and was significantly better than any other combination of models (LRT P ≪ 10^−10). All CAP alleles shared with great apes (5%) were excluded from the observed SFS.
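As a worked illustration of the likelihood ratio test (LRT) mentioned in the caption, the two reported log-likelihoods can be compared directly. This assumes the neutral model is nested within the best-fitting model and that the usual chi-square reference distribution applies; the caption does not state the degrees of freedom, so k below is left unspecified.

```latex
D = 2\left(\ln L_{\text{best}} - \ln L_{\text{neutral}}\right)
  = 2\left[-3080 - (-4124)\right]
  = 2088,
\qquad D \sim \chi^2_k \text{ under the null, with } k = \text{number of additional parameters.}
```

For any small k, a statistic of this size lies far in the tail of the chi-square distribution, which agrees with the very small P value reported.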