Jun Chen, Thomas Bataillon, Sylvain Glémin, Martin Lascoux.
Abstract
The distribution of fitness effects (DFE) of new mutations is a key parameter of molecular evolution. The DFE can in principle be estimated by comparing the site frequency spectra (SFS) of putatively neutral and functional polymorphisms. Unfortunately, the DFE is intrinsically hard to estimate, especially for beneficial mutations because these tend to be exceedingly rare. There is therefore a strong incentive to find out whether conditioning on properties of mutations that are independent of the SFS could provide additional information. In the present study, we developed a new measure based on SIFT scores. SIFT scores are assigned to nucleotide sites based on their level of conservation across a multispecies alignment: the more conserved a site, the more likely mutations occurring at this site are deleterious, and the lower the SIFT score. If one knows the ancestral state at a given site, one can assign a value to new mutations occurring at the site based on the change of SIFT score associated with the mutation. We called this new measure δ. We show that properties of the DFE as well as the flux of beneficial mutations across classes covary with δ and, hence, that SIFT scores are informative when estimating the fitness effect of new mutations. In particular, conditioning on SIFT scores can help to characterize beneficial mutations.
Keywords: DFE; SIFT; beneficial mutations
Year: 2022 PMID: 34180988 PMCID: PMC8743036 DOI: 10.1093/gbe/evab151
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Conceptual overview of the approach developed in the present study and of the steps (1–5) we take for conditioning SFS data on genomic features. Here our genomic feature is the change in SIFT scores, δ.
Sequences for the “Toy Example” Three-Codon Sequence
| Ancestral Seq. | C | C | A | G | G | T | C | A | G |
|---|---|---|---|---|---|---|---|---|---|
| Ind 1 | — | — | G | — | — | — | — | — | — |
| Ind 2 | A | — | — | — | — | — | — | C | — |
| Ind 3 | — | — | G | — | — | — | — | C | — |
| Ind 4 | A | — | — | — | T | — | — | — | — |
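As an illustration of how such an alignment feeds into an SFS, the sketch below counts derived alleles per site in the toy example and tallies them into an unfolded spectrum. It assumes that each individual contributes one haploid sequence and that a site carries at most one derived allele; the function names `derived_counts` and `unfolded_sfs` are hypothetical and not taken from the study.

```python
from collections import Counter

# Toy alignment from the table above; "-" means "same as the ancestral base".
ancestral = "CCAGGTCAG"
individuals = [
    "--G------",  # Ind 1
    "A------C-",  # Ind 2
    "--G----C-",  # Ind 3
    "A---T----",  # Ind 4
]

def derived_counts(ancestral, individuals):
    """Return {1-based position: derived-allele count} for polymorphic sites.

    Assumes haploid sequences and at most one derived allele per site.
    """
    counts = {}
    for pos in range(len(ancestral)):
        n_derived = sum(1 for seq in individuals if seq[pos] != "-")
        if 0 < n_derived < len(individuals):
            counts[pos + 1] = n_derived
    return counts

def unfolded_sfs(counts, n):
    """Number of sites observed at each derived-allele count 1..n-1."""
    tally = Counter(counts.values())
    return [tally.get(k, 0) for k in range(1, n)]

counts = derived_counts(ancestral, individuals)
sfs = unfolded_sfs(counts, len(individuals))
print(counts)  # derived-allele count at each polymorphic position
print(sfs)     # sites with 1, 2, and 3 derived copies
```

In the toy example, positions 1, 3, 5, and 8 are polymorphic, which matches the four SNPs classified in the note to the next table.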
Sequences and SIFT Scores for the “Toy Example” Three-Codon Sequence
| | Codon 1 | | | Codon 2 | | | Codon 3 | | |
|---|---|---|---|---|---|---|---|---|---|
| Nucleotides | C | C | A | G | G | T | C | A | G |
| Degeneracy | 0 | 0 | 4 | 0 | 0 | 4 | 0 | 0 | 4 |
| SIFT for A | ***TOL*** | DEL | **TOL** | TOL | DEL | TOL | TOL | **DEL** | TOL |
| SIFT for C | **TOL** | | TOL | TOL | DEL | DEL | | ***TOL*** | TOL |
| SIFT for G | DEL | DEL | ***TOL*** | | **TOL** | TOL | TOL | DEL | |
| SIFT for T | DEL | TOL | TOL | TOL | ***DEL*** | | DEL | DEL | TOL |
Note.—For each position, the SIFT score of the four possible nucleotides is given. The nucleotides present in the alignment are in bold, with the score in italics corresponding to the derived alleles. From this, each polymorphism can be assigned to a degeneracy category (0 or 4) and a delta SIFT score category (TOL→TOL, TOL→DEL, DEL→TOL, DEL→DEL). In the example, SNPs are thus classified as follows: 0-TOL→TOL (pos. 1), 4-TOL→TOL (pos. 3), 0-TOL→DEL (pos. 5), and 0-DEL→TOL (pos. 8). Each position also contributes to the length of the eight possible categories, depending on which mutations are possible at that site. For example, at position 1, starting from the ancestral nucleotide C (TOL), one possible mutation is TOL→TOL and the two others are TOL→DEL, so this position contributes 1/3 to the length of the 0-TOL→TOL category and 2/3 to the length of the 0-TOL→DEL category. The contributions of all positions are then summed across the ancestral sequence to obtain the total length of each category.
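The length bookkeeping described in the note can be sketched as follows. This is a minimal illustration, assuming each of the three possible mutations away from the ancestral base is weighted equally (1/3 each), as in the position 1 example; the function name `category_contributions` and the category label format are hypothetical.

```python
from fractions import Fraction

def category_contributions(ancestral, degeneracy, sift):
    """Split one site's length across delta-SIFT categories.

    `sift` maps each nucleotide to "TOL" or "DEL". Each of the three
    possible mutations away from the ancestral base adds 1/3 of the site
    to the category "<degeneracy>-<ancestral label>→<derived label>".
    """
    contributions = {}
    for base, label in sift.items():
        if base == ancestral:
            continue  # not a mutation
        category = f"{degeneracy}-{sift[ancestral]}→{label}"
        contributions[category] = contributions.get(category, Fraction(0)) + Fraction(1, 3)
    return contributions

# Position 1 of the toy example: the ancestral C is TOL; of the three
# alternative bases, A is TOL and G and T are DEL.
pos1 = category_contributions(
    "C", 0, {"A": "TOL", "C": "TOL", "G": "DEL", "T": "DEL"}
)
print(pos1)
```

Summing these per-site dictionaries over all positions of the ancestral sequence would give the total length of each category.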
as a Function of the Change in SIFT Score, δ.
| Fold | δ | 25% | 50% | 75% |
|---|---|---|---|---|
| 0 | −3 | 0.043 | 0.078 | 0.10 |
| 0 | −2 | 0.062 | 0.10 | 0.14 |
| 0 | −1 | 0.14 | 0.18 | 0.29 |
| 0 | 0 | 0.20 | 0.34 | 0.51 |
| 0 | 1 | 0.59 | 0.92 | 1.50 |
| 0 | 2 | 1.55 | 3.75 | 8.25 |
| 0 | 3 | 31.44 | 53.74 | 131.70 |
Fig. 2. Log(P0/P4) as a function of the change in SIFT scores, δ: the orange line denotes a least-squares regression, the blue curve a local regression (loess). Data points are jittered horizontally for graphical convenience. Shaded gray areas around the curves denote confidence bands around each regression line. Point size is proportional to the sample size of each SFS (number of nonsynonymous SNPs).
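For readers who want to see what a least-squares trend against δ looks like numerically, here is a rough sketch. It uses the 50% column of the 0-fold table above purely as illustrative input; treating those medians as the plotted quantity, and using a base-10 logarithm, are assumptions made here. The figure itself is fitted to individual SFS data points, which are not available in this text.

```python
import math

# delta values and the 50% column of the 0-fold table above
# (assumed here to stand in for the ratio plotted in the figure).
delta = [-3, -2, -1, 0, 1, 2, 3]
median = [0.078, 0.10, 0.18, 0.34, 0.92, 3.75, 53.74]

def least_squares(x, y):
    """Ordinary least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Base-10 logarithm is an assumption; the caption only says "Log".
log_median = [math.log10(v) for v in median]
slope, intercept = least_squares(delta, log_median)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope corresponds to the upward trend of the tabulated values with increasing δ.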
Distribution of the DFE Categories, N, as a Function of Site (0-fold vs. 4-fold) and Changes in SIFT Score, δ
| Fold | δ | | | | | |
|---|---|---|---|---|---|---|
| 0 | −3 | 2.3e-6 | 3e-2 | 6.4e-2 | 0.18 | 0.70 |
| 0 | −2 | 3.2e-5 | 4.8e-2 | 8.5e-2 | 0.18 | 0.66 |
| 0 | −1 | 1.5e-4 | 0.12 | 0.13 | 0.21 | 0.42 |
| 0 | 0 | 0.11 | 2.3e-3 | 1.8e-2 | 9.2e-2 | 0.52 |
| 0 | 1 | 0.60 | 1.5e-12 | 5.8e-9 | 9.9e-6 | 0.30 |
| 0 | 2 | 0.99 | 9.3e-3 | 2.8e-6 | 3.3e-5 | 2.3e-4 |
| 0 | 3 | 0.99 | 9.8e-3 | 8.6e-5 | 5.4e-6 | 1.3e-8 |
Note.—p is the proportion of beneficial mutations.
Fig. 3. Overview of the proportion of DFE classes versus δ. Shown are the local regression (loess) curves depicting the trend in the observed proportion of mutations falling in each N class of the inferred DFE versus δ: in orange, beneficial mutations; in red, strongly and very strongly deleterious mutations (N below −10); in light gray, slightly deleterious mutations (N within (−1, 0)); and in darker gray, mildly deleterious mutations (N within (−10, −1)). The data points underlying the fitted curves are not shown.
Fig. 4. The proportion (p) (A) and flux (B) of beneficial mutations covary with δ. The curve in (B) is a loess regression line indicating the local trend in the data. The gray-shaded area represents the 95% confidence interval around the regression line. Point size is proportional to the sample size of each SFS (number of nonsynonymous SNPs) used for estimating DFE parameters.