Zhengting Zou, Jianzhi Zhang.
Abstract
It has been suggested that, due to the structure of the genetic code, nonsynonymous transitions are less likely than transversions to cause radical changes in amino acid physicochemical properties and are therefore on average less deleterious. This view was supported by some but not all mutagenesis experiments. Because laboratory measures of fitness effects have limited sensitivities and relative frequencies of different mutations in mutagenesis studies may not match those in nature, we here revisit this issue using comparative genomics. We extend the standard codon model of sequence evolution by adding the parameter η that quantifies the ratio of the fixation probability of transitional nonsynonymous mutations to that of transversional nonsynonymous mutations. We then estimate η from the concatenated alignment of all protein-coding DNA sequences of two closely related genomes. Surprisingly, η ranges from 0.13 to 2.0 across 90 species pairs sampled from the tree of life, with 51 incidences of η < 1 and 30 incidences of η > 1 that are statistically significant. Hence, whether nonsynonymous transversions are overall more deleterious than nonsynonymous transitions is species-dependent. Because the corresponding groups of amino acid replacements differ between nonsynonymous transitions and transversions, η is influenced by the relative exchangeabilities of amino acid pairs. Indeed, an extensive search reveals that the large variation in η is primarily explainable by the recently reported among-species disparity in amino acid exchangeabilities. These findings demonstrate that genome-wide nucleotide substitution patterns in coding sequences have species-specific features and are more variable among evolutionary lineages than are currently thought.
Keywords: amino acid exchangeability; codon substitution model; natural selection; sequence evolution; transition bias; transition/transversion ratio
Year: 2021 PMID: 32805043 PMCID: PMC7783172 DOI: 10.1093/molbev/msaa200
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
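The extended codon model summarized in the abstract can be illustrated with a minimal sketch. It assumes a Goldman-Yang-style parameterization in which a mutational transition bias, a nonsynonymous/synonymous factor, and η multiply the target codon frequency; the names kappa, omega, eta, and pi_j, and the function relative_rate, are illustrative assumptions and are not taken from the record itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def relative_rate(codon_i, codon_j, kappa, omega, eta, pi_j=1.0):
    """Relative substitution rate from codon_i to codon_j (assumed model form).

    kappa: transition bias at the mutational level
    omega: nonsynonymous transversions relative to synonymous transversions
    eta:   nonsynonymous transitions relative to nonsynonymous transversions
    pi_j:  equilibrium frequency of the target codon
    """
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    # Only single-nucleotide changes between sense codons are allowed.
    if len(diffs) != 1 or "*" in (CODE[codon_i], CODE[codon_j]):
        return 0.0
    a, b = diffs[0]
    is_transition = (a in PURINES) == (b in PURINES)
    is_nonsyn = CODE[codon_i] != CODE[codon_j]
    rate = pi_j
    if is_transition:
        rate *= kappa
    if is_nonsyn:
        rate *= omega
    if is_transition and is_nonsyn:
        rate *= eta  # the extra parameter added to the standard model
    return rate
```

Under this sketch, η < 1 lowers the rate of nonsynonymous transitions relative to nonsynonymous transversions, and η = 1 recovers the standard codon model.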
Fig. 1. Simulations according to equation (1) show that the inferred η’s are unbiased when compared with the true values and are uncorrelated with (a) the genetic distance (d) between the two species in the clade, (b) transition bias at the mutational level (κ), and (c) the fixation probability of nonsynonymous transversions relative to that of synonymous transversions (ω). In each panel, only the parameter shown on the x-axis varied. Each dot is one η estimate plotted against the true value of another parameter used in the simulation. The true value of an η estimate is indicated by its color, and the dotted lines correspond to the true η values for easy comparison. Genetic distance is defined by the number of nucleotide substitutions per codon between the two sequences. In each plot, except for the parameter varied, the other parameters used in the simulation are d = 1 substitution per codon, κ = 2, and ω = 0.06.
Fig. 2. The estimated η varies among 90 clades sampled across the tree of life. Each clade is represented by an alignment of genome-wide orthologous coding sequences of two closely related species/strains. Statistical significance of η’s deviation from 1 is determined by an adjusted P value of <0.05 (likelihood ratio test followed by Bonferroni correction for multiple testing). Clade indices on the x-axis refer to those in supplementary table S1, Supplementary Material online.
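The significance criterion in the Fig. 2 caption (likelihood ratio test followed by Bonferroni correction) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the null model fixes η = 1, so the test has one degree of freedom and the statistic is compared against a chi-square distribution; the function names and the log-likelihood inputs are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P value of a 1-d.f. likelihood ratio test (assumed chi-square null)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(pvalues):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def significant(pvalues, alpha=0.05):
    """True where the adjusted P value falls below alpha."""
    return [p < alpha for p in bonferroni(pvalues)]
```

With 90 clades, a raw P value would need to fall below 0.05 / 90 (about 0.00056) for a clade to be called significant under this correction.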
Fig. 3. Variations in κ, ω, π, d, or their combination are insufficient to explain the large among-clade variation in the estimated η. Here, η’s estimated from sequence alignments simulated under equation (2) are plotted against the true values of (a) κ, (b) ω, (c) π, (d) d, or (e) combination of κ, ω, π, and d used in the simulations. In (a), (b), and (d), only the parameter on the x-axis varied in the simulations. In (c) and (e), the observed codon frequencies and combination of κ, ω, π, and d of the 90 clades are respectively used in the simulations, and the insets plot the η estimated from the real sequence alignment of each clade against that estimated from the alignment simulated. For each parameter value, the η estimates from ten replicate simulations are shown as dots (in panels a, b, and d) or boxplots (in panels c and e). In each boxplot, the lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), and the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1). In the insets of panels (c) and (e), the mean estimate from the ten replicate simulations is shown on the y-axis. In each plot, except for the parameter varied, the other parameters used in the simulation are d = 1 substitution per codon, κ = 2, and ω = 0.06.
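The boxplot convention stated in the Fig. 3 caption, with inner fences at md ± 1.5(qu3 − qu1), can be written out as a short sketch. The quartile interpolation rule is an assumption (the default of Python's statistics.quantiles), since the caption does not specify one, and the function name is hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, inner fences, and whisker ends as described in the caption."""
    data = sorted(values)
    qu1, md, qu3 = statistics.quantiles(data, n=4)  # assumed quartile method
    spread = 1.5 * (qu3 - qu1)
    lower_fence, upper_fence = md - spread, md + spread
    # Whiskers reach the most extreme observations inside the inner fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "qu1": qu1,
        "md": md,
        "qu3": qu3,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
    }
```

For ten replicate η estimates, boxplot_summary would return the box edges and whisker ends; any estimate outside the fences would be drawn as an individual outlier.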
Fig. 4. Variation in relative exchangeabilities (REs) of amino acid pairs among clades can explain η variation. (a) Simulations with RE values based on the Grantham matrix (see Materials and Methods). For each of the ten new RE sets at a given level of deviation from or shuffled from the original values, five replicate sequence evolution simulations are conducted and the corresponding η estimates are plotted. Different RE sets at each deviation level and from each independent shuffle are distinguished by different (randomly assigned) colors. (b) The η’s estimated from the 90 clades simulated using the corresponding RE values of the real clades are plotted against the η’s estimated from the real clades. Dots are colored by the corresponding taxonomic group of the clades, as shown in the legend. The dashed red line indicates y = x. The y-axis value of each dot is the mean estimate from ten replicate simulations. (c) The expected η computed from the estimated REs and codon frequencies of each clade is plotted against the η estimated by the likelihood method from the alignment of the clade. The dashed red line indicates y = x. The y-axis value of each dot is the mean estimate from ten replicate simulations.