Kristof Theys, Alison F Feder, Maoz Gelbart, Marion Hartl, Adi Stern, Pleuni S Pennings.
Abstract
HIV has a high mutation rate, which contributes to its ability to evolve quickly. However, we know little about the fitness costs of individual HIV mutations in vivo, their distribution and the different factors shaping the viral fitness landscape. We calculated the mean frequency of transition mutations at 870 sites of the pol gene in 160 patients, allowing us to determine the cost of these mutations. As expected, we found high costs for non-synonymous and nonsense mutations as compared to synonymous mutations. In addition, we found that non-synonymous mutations that lead to drastic amino acid changes are twice as costly as those that do not and mutations that create new CpG dinucleotides are also twice as costly as those that do not. We also found that G→A and C→T mutations are more costly than A→G mutations. We anticipate that our new in vivo frequency-based approach will provide insights into the fitness landscape and evolvability of not only HIV, but a variety of microbes.
Year: 2018 PMID: 29953449 PMCID: PMC6023119 DOI: 10.1371/journal.pgen.1007420
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Different frequency patterns of synonymous, non-synonymous and nonsense mutations.
As expected, in the HIV pol gene, synonymous mutations occurred more frequently than non-synonymous mutations, which occurred more frequently than nonsense mutations, which were not observed at all. A) First row: Single-site frequency spectrum for three sites in the HIV protease protein (sites 172, 173 and 174). Second row: simulated data based on estimated selection coefficients. B) Mean mutation frequencies for all sites, ordered by mutation frequency.
Predictors of frequencies for mutations in the pol gene, estimated using a generalized linear model (GLM).
| | Predictor | Estimate | Std. Error | z value | Pr(>\|z\|) | Effect |
|---|---|---|---|---|---|---|
| 1 | (Intercept) | -5.199* | 0.035 | -147.037 | 0.000 | 0.0055** |
| 2 | In reverse transcriptase | 0.096 | 0.023 | 4.223 | 0.000 | +10% |
| 3 | SHAPE | 0.168 | 0.037 | 4.556 | 0.000 | +18% |
| 4 | T→C | 0.013 | 0.039 | 0.339 | 0.734 | +1% |
| 5 | C→T | 0.104 | 0.054 | 1.940 | 0.052 | +11% |
| 6 | G→A | 0.720 | 0.040 | 18.134 | 0.000 | +105% |
| 7 | CpG-forming | -0.664 | 0.058 | -11.520 | 0.000 | −49% |
| 8 | T→C:CpG-forming | 0.029 | 0.093 | 0.315 | 0.753 | +3% |
| 9 | Non-syn | -0.345 | 0.037 | -9.460 | 0.000 | −29% |
| 10 | T→C:Non-syn | -0.375 | 0.062 | -6.017 | 0.000 | −31% |
| 11 | C→T:Non-syn | -1.036 | 0.083 | -12.456 | 0.000 | −65% |
| 12 | G→A:Non-syn | -1.124 | 0.058 | -19.496 | 0.000 | −65% |
| 13 | Non-syn:CpG-forming | 0.358 | 0.090 | 3.995 | 0.000 | +43% |
| 14 | T→C:Non-syn:CpG-forming | 0.330 | 0.153 | 2.156 | 0.031 | +39% |
| 15 | Drastic amino acid change | -0.691 | 0.034 | -20.394 | 0.000 | −50% |
The intercept (*) is estimated for synonymous, non-CpG-forming A→G mutations in protease with SHAPE value 0. The predicted frequency for such mutations is therefore $e^{-5.2}$, which equals 0.0055, as indicated in the last column (**). Rows 2-15 of the table list the effects of changing attributes of the mutation, which is why A→G mutations are not explicitly listed in the table.
To estimate predicted frequencies for a particular class of mutations from the table, the relevant coefficient estimates must be summed, then exponentiated. For example, the predicted frequency of a synonymous A→G mutation in protease with SHAPE value 0 that would create a CpG site is $e^{-5.2-0.664}$ (estimate taken from row 7); alternatively, one could calculate this predicted frequency as $0.0055 \times (1 - 0.49) = 0.0028$. For a site that is CpG-forming and non-synonymous, we also have to add the estimates from rows 9 and 13 to get $e^{-5.2-0.664-0.345+0.358}$, or approximately $0.0055 \times (1 - 0.49) \times (1 - 0.29) \times (1 + 0.43)$, which gives 0.0029. For the continuous SHAPE parameter, the SHAPE value for a given site should be multiplied by 0.168 (row 3) and added to the sum before exponentiating; e.g., for a SHAPE value of 0.5, the predicted frequency is $e^{-5.2+0.5 \times 0.168} = 0.0060$.
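The sum-then-exponentiate recipe can be checked with a short script. This is a minimal sketch, not the authors' code: the coefficient values are copied from the table, while the dictionary keys and the function names `predicted_frequency` and `percent_effect` are hypothetical names chosen for illustration.

```python
import math

# Coefficient estimates copied from the GLM table (rows 1, 3, 7, 9, 13).
# The dictionary keys are illustrative labels, not names used in the paper.
COEF = {
    "intercept": -5.199,
    "shape": 0.168,
    "cpg_forming": -0.664,
    "non_syn": -0.345,
    "non_syn:cpg_forming": 0.358,
}


def predicted_frequency(terms=(), shape=0.0):
    """Sum the intercept, the chosen terms and the SHAPE contribution, then exponentiate."""
    log_freq = COEF["intercept"] + shape * COEF["shape"]
    log_freq += sum(COEF[term] for term in terms)
    return math.exp(log_freq)


def percent_effect(estimate):
    """Convert a coefficient into the percent change shown in the Effect column."""
    return 100 * (math.exp(estimate) - 1)


baseline = predicted_frequency()
cpg = predicted_frequency(["cpg_forming"])
cpg_nonsyn = predicted_frequency(["cpg_forming", "non_syn", "non_syn:cpg_forming"])
shape_half = predicted_frequency(shape=0.5)

print(f"baseline:             {baseline:.4f}")
print(f"CpG-forming:          {cpg:.4f}")
print(f"CpG-forming, non-syn: {cpg_nonsyn:.4f}")
print(f"SHAPE = 0.5:          {shape_half:.4f}")
print(f"CpG-forming effect:   {percent_effect(COEF['cpg_forming']):+.0f}%")
```

Interaction terms such as `non_syn:cpg_forming` have to be listed explicitly whenever both of their parent attributes apply; the sketch does not add them automatically.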
Fig 2. Predicted and observed mutation frequencies for different mutation classes.
Mutation frequencies as predicted by the generalized linear model (large dots) and observed frequencies (small dots). The horizontal lines show the standard errors from the GLM. The graph shows the model predictions for synonymous and non-synonymous mutations that do not involve a drastic amino acid change and either form CpG sites (blue) or do not (green). In addition, for non-synonymous mutations, predictions are shown for mutations that involve a drastic amino acid change and either form CpG sites (light red) or do not (yellow).
Fig 3. Distribution of estimated selection coefficients by amino acid replacements.
Many of the most costly mutations are concentrated at a few amino acids (e.g., P (proline) and G (glycine)). The selection coefficients shown are calculated directly from mean mutation frequencies and mutation rates using the mutation-selection balance formula, f = u/s.
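Rearranging the mutation-selection balance formula f = u/s gives s = u/f, where f is the mean mutation frequency, u the mutation rate and s the selection coefficient. The sketch below illustrates that rearrangement under stated assumptions: the function name is hypothetical, the mutation rate of 1e-5 is a placeholder rather than a value from the paper, and capping s at 1 when f ≤ u is an assumption that follows the document's description of such mutations as lethal.

```python
def selection_coefficient(mean_frequency, mutation_rate):
    """Estimate s from f = u/s, i.e. s = u/f.

    Mutations whose mean frequency is at or below the mutation rate are
    treated as lethal (s = 1); this cap is an assumption of the sketch.
    """
    if mean_frequency <= mutation_rate:
        return 1.0
    return mutation_rate / mean_frequency


# Placeholder mutation rate for illustration only, not taken from the paper.
u = 1e-5

# 0.0055 is the baseline predicted frequency from the GLM table.
print(selection_coefficient(0.0055, u))

# A mutation that is never observed is classified as lethal.
print(selection_coefficient(0.0, u))
```

Because the lethal case is handled first, a frequency of zero never reaches the division.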
Fig 4. Estimated selection coefficients for transition mutations along the pol gene.
Point estimates for the selection coefficients for each transition mutation along the pol gene. Synonymous mutations are shown in yellow, non-synonymous mutations are shown in light red (C→T or G→A mutations) and purple (T→C or A→G mutations), nonsense mutations are shown in black. This plot illustrates that estimated selection coefficients do not appear to be affected by location in the gene. Note that these histograms include mutations that create CpG sites and those that don’t, which means that the effect that G→A and C→T mutations are more costly than non-CpG forming A→G mutations is not visible in this figure.
Fig 5. Distribution of fitness costs as estimated from mutation frequencies using the mutation-selection balance formula (f = u/s).
Most synonymous mutations (left panel) have very low selection coefficients. For non-synonymous and nonsense mutations (right panel), selection coefficients are higher, especially for C→T and G→A mutations. Dashed vertical lines indicate median selection coefficients. Note that the scales of the y-axes differ between the individual plots.
Parameters for the gamma distribution of fitness effects for transition mutations in pol in 160 HIV-infected patients from the Bacheler et al. dataset, reflecting scale (κ) and shape (θ).
| Num. sites | Fraction lethal | κ (mut. rates from Abram 2010) | θ (mut. rates from Abram 2010) | κ (mut. rates from Zanini 2017) | θ (mut. rates from Zanini 2017) |
|---|---|---|---|---|---|
| 870 | 0.082 | 0.334 | 0.275 | 0.327 | 0.333 |
The ‘fraction lethal’ is the fraction of the mutations that had a mean frequency smaller than or equal to the mutation rate, so that they are estimated to be lethal. Sites are resampled with replacement and gamma distributions are fit 1000 times to create 95% confidence intervals via bootstrapping (shown in parentheses).
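The resampling procedure can be sketched as follows. This is an illustration, not the authors' implementation: it swaps in a simple method-of-moments gamma fit (the fitting method used in the paper is not stated here), it runs on synthetic selection coefficients rather than the Bacheler et al. data, and all function names are hypothetical. The code uses the conventional terms "shape" and "scale" as defined by Python's `random.gammavariate(alpha, beta)`.

```python
import random


def fit_gamma_moments(values):
    """Method-of-moments gamma fit (an assumed stand-in); returns (shape, scale)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean * mean / var, var / mean


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Resample sites with replacement, refit each time, return percentile intervals."""
    rng = random.Random(seed)
    fits = [
        fit_gamma_moments(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    ]

    def interval(column):
        ordered = sorted(column)
        low = ordered[int((alpha / 2) * n_boot)]
        high = ordered[int((1 - alpha / 2) * n_boot) - 1]
        return low, high

    shapes, scales = zip(*fits)
    return interval(shapes), interval(scales)


# Synthetic selection coefficients for 870 sites; NOT data from the paper.
data_rng = random.Random(1)
synthetic = [data_rng.gammavariate(0.3, 0.3) for _ in range(870)]

shape_ci, scale_ci = bootstrap_ci(synthetic)
print("shape 95% CI:", shape_ci)
print("scale 95% CI:", scale_ci)
```

A maximum-likelihood fit could replace `fit_gamma_moments` without changing the resampling loop; the method-of-moments version is used here only to keep the sketch dependency-free.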