XianMing Wu, Laurence D Hurst.
Abstract
The nearly neutral theory predicts that small effective population size provides the conditions for weakened selection. This is postulated to explain why our genome is more "bloated" than that of, for example, yeast, ours having large introns and large intergenic spacers. If a bloated genome is also an error-prone genome, might it, however, be the case that selection for error-mitigating properties is stronger in our genome? We examine this notion using splicing as an exemplar, not least because large introns can predispose to noisy splicing. We thus ask whether, owing to genomic decay, selection for splice error-control mechanisms is stronger, not weaker, in species with large introns and small populations. In humans much information defining splice sites is in cis-exonic motifs, most notably exonic splice enhancers (ESEs). These act as splice-error control elements. Here then we ask whether within- and between-species intron size is a predictor of the commonality of exonic cis-splicing motifs. We show that, as predicted, the proportion of synonymous sites that are ESE-associated and under selection in humans is weakly positively correlated with the size of the flanking intron. In a phylogenetically controlled framework, we observe, also as expected, that mean intron size is both predicted by Ne.μ and is a good predictor of cis-motif usage across species, this usage coevolving with splice site definition. Unexpectedly, however, across taxa intron density is a better predictor of cis-motif usage than intron size. We propose that selection for splice-related motifs is driven by a need to avoid decoy splice sites that will be more common in genes with many and large introns. That intron number and density predict ESE usage within human genes is consistent with this, as is the finding of intragenic heterogeneity in ESE density. As intronic content and splice site usage across species are also well predicted by Ne.μ, the result also suggests an unusual circumstance in which selection (for cis-modifiers of splicing) might be stronger when population sizes are smaller, as here splicing is noisier, resulting in a greater need to control error-prone splicing.
Keywords: exonic splice enhancer; intron density; purifying selection; synonymous mutation
Year: 2015 PMID: 25771198 PMCID: PMC4476162 DOI: 10.1093/molbev/msv069
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Rate of synonymous evolution in ESE and non-ESE sequences at exon ends as a function of the Log of flanking intron size for two ESE data sets (A: INT3, B: INT3_400). In addition to Ks of ESE and non-ESE we also show Ks of exon core domains and pseudo-ESE, that is, hexamers of the same underlying nucleotide content as ESEs but not necessarily identified as being functional ESEs. We consider 20 intron size bins apportioned so that all bins contain the same number of exon ends for concatenation, the numbers given reflecting the upper intron size limit of each bin.
Fig. 2. The degree of selective constraint on ESE sequences at exon ends as a function of the Log of flanking intron size for two ESE data sets (A: INT3, B: INT3_400). For definition of constraint, see main text. For intron size definition, see figure 1. Note that in all cases constraint appears stronger when intron sizes are larger, although using 20 bins the trends are not significant.
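The equal-count binning described in the figure 1 legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input is assumed to be one flanking intron size per exon end, and base 10 is assumed for the "Log" of intron size because the legend does not state the base.

```python
import math


def equal_count_bins(intron_sizes, n_bins=20):
    """Split intron sizes into n_bins bins holding (nearly) equal numbers of items.

    Returns a list of (upper_limit, members) tuples ordered from the smallest
    to the largest introns, where upper_limit is the largest size in the bin.
    Function name and return layout are illustrative assumptions.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(intron_sizes)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one intron size per bin")
    bins = []
    for i in range(n_bins):
        # Integer arithmetic keeps bin sizes within one item of each other.
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        members = ordered[start:end]
        bins.append((members[-1], members))
    return bins


def log_upper_limits(bins):
    """Log10 of each bin's upper size limit (log base is an assumption)."""
    return [math.log10(upper) for upper, _ in bins]
```

Note that with this simple slicing, introns of identical size can fall on either side of a bin boundary; a real analysis would need to decide how to handle such ties.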
Evidence for Phylogenetically Controlled Correlation between Amino Acid/Codon Usage Trends and the Genomic Traits.
| | All Exons (AA) | All Exons (codon) | Random 5,000 Exons (AA) | Random 5,000 Exons (codon) |
|---|---|---|---|---|
| Log BF ( | 48.241 | 39.394 | 31.923 | 42.027 |
| Log BF ( | 37.484 | 29.202 | 24.055 | 32.294 |
| Log BF ( | 20.145 | 15.018 | 12.214 | 18.410 |
Note.—We employ two metrics of skew at exon ends: the number of codons showing a significant skew and the number of amino acids showing a significant skew. For each metric we report two sets of results: one in which all relevant exons are used for each species, and one in which the input sample size is the same for all species (5,000 randomly chosen exons). In the latter case, we take the mean number of significant trends over multiple samplings of 5,000 randomly chosen exons. Y, proportion of amino acids/codons showing significant trends; X, mean CDS length/gene length; N, introns per kb exon; M, mean intron size.
ᵃLog BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]) is the BayesTraits test statistic for correlated evolution: weak evidence (<2), positive evidence (>2), strong evidence (5–10), very strong evidence (>10). All Log BF values in the table are greater than 10, so the evidence for every correlation is very strong.
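The Log BF statistic and the evidence scale given in the footnote can be expressed as a short sketch. The function names are hypothetical, the inputs are assumed to be the log harmonic-mean values already reported for the complex (dependent) and simple (independent) models, and the handling of values falling exactly on a boundary (2, 5, or 10) is an assumption because the footnote leaves it unspecified.

```python
def log_bayes_factor(log_hm_complex, log_hm_simple):
    """Log BF = 2 * (log harmonic mean, complex model - log harmonic mean, simple model)."""
    return 2.0 * (log_hm_complex - log_hm_simple)


def evidence_category(log_bf):
    """Map a Log BF value onto the scale quoted in the table footnote.

    Boundary handling is an assumption: exactly 10 is treated as "strong",
    and exactly 2 as "weak".
    """
    if log_bf > 10:
        return "very strong"
    if log_bf >= 5:
        return "strong"
    if log_bf > 2:
        return "positive"
    return "weak"
```

For example, `evidence_category(48.241)` returns `"very strong"`, matching the reading of the first row of the table above.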
Cis-Motif Usage Correlates Significantly with Usage of “AGgt” and “agGT” Splice Sites.
| | All Exons (AA) | All Exons (codon) | Random 5,000 Exons (AA) | Random 5,000 Exons (codon) |
|---|---|---|---|---|
| Log BF ( | 39.0359 | 31.7091 | 26.4632 | 33.7518 |
| Log BF | 52.1594 | 64.0366 | 76.8355 | 58.1153 |
Note.—Y, proportion of amino acids/codons showing significant trends; P1, proportion of AGgt (Capital letter: exon, small letter: intron); P2, proportion of agGT.
ᵃLog BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]). All Log BF values in the table are greater than 10, so the evidence for every (positive) correlation is very strong.
Spearman’s Correlation Analysis Results for Ne.μ Values of This Study and the Prior Study of Lynch and Conery.
| rho | rho² | P | |
|---|---|---|---|
| 0.093 | 0.009 | 0.765 | |
| 0.165 | 0.027 | 0.591 | |
| 0.093 | 0.009 | 0.765 | |
| 0.996 | 0.991 | 0.000 | |
| 0.970 | 0.941 | 0.000 | |
| 0.975 | 0.951 | 0.000 |
Note.—We compare our three different estimators for Ne.μ (Eta, Pi, and S) and Lynch’s single estimate.
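For readers unfamiliar with the statistic in the table above, Spearman's rho is the Pearson correlation of the ranks of the two variables. The following is a minimal pure-Python sketch with hypothetical function names; it does not compute the P value, and it is not the software the authors used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

Squaring the returned value gives the second column of the table; for instance, a rho of 0.165 corresponds to a squared value of about 0.027.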
Evidence for Phylogenetically Controlled Correlation between Ne.μ Values and Splice-Related Genomic Traits.
| | X | N | M |
|---|---|---|---|
| Log BF ( | 15.762 | 23.424 | 41.057 |
| Log BF ( | 14.572 | 22.590 | 39.944 |
| Log BF ( | 13.988 | 22.695 | 40.367 |
| Log BF ( | 5.290 | 0.989 | −0.587 |
Note.—We employ our three different estimators for Ne.μ (Eta, Pi, and S) and Lynch’s single estimate. X, mean CDS length/gene length; N, introns per kb exon; M, mean intron size.
ᵃLog BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]) is the BayesTraits test statistic for correlated evolution: weak evidence (<2), positive evidence (>2), strong evidence (5–10), very strong evidence (>10). The Log BF values in the first three rows are all greater than 10, so the evidence for those correlations is very strong; the values in the fourth row (5.290, 0.989, and −0.587) are not.
ᵇThis Ne.μ value is from a previous study (Lynch and Conery 2003).
Evidence for Phylogenetically Controlled Correlations between Ne.μ Values and Usage of “AGgt” (very strong) and “agGT” (weak) Splice Sites Using Three Estimators of Ne.μ, Namely Pi, S, and Eta.
| Log BF ( | 22.7225 | 19.1016 | 20.6161 |
| Log BF ( | 1.6456 | −0.1762 | 0.6543 |
Note.—P1, proportion of AGgt (Capital letter: exon, small letter: intron); P2, proportion of agGT; Log BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]).
ᵃThree estimators of Ne.μ (Ne.μ_Pi, Ne.μ_S, Ne.μ_Eta).
Little Evidence for a Phylogenetically Controlled Correlation between Ne.μ Values and Amino Acid/Codon Usage Trends (Y).
| | All Exons (AA) | All Exons (codon) | Random 5,000 Exons (AA) | Random 5,000 Exons (codon) |
|---|---|---|---|---|
| Log BF ( | −0.486 | −2.065 | −2.693 | −4.436 |
| Log BF ( | −1.383 | −0.206 | 1.514 | 0.728 |
| Log BF ( | 0.534 | −0.520 | −2.038 | 1.079 |
Note.—We employ our three different estimators for Ne.μ (Eta, Pi, and S) and four metrics of k-mer usage. Y, proportion of amino acids/codons showing significant trends.
ᵃLog BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]) is the BayesTraits test statistic for correlated evolution: weak evidence (<2), positive evidence (>2), strong evidence (5–10), very strong evidence (>10). All Log BF values in the table are less than 2, so the evidence for every correlation is weak.
Evidence for Correlation between Alternative Splicing Rates and Splice-Related Genomic Traits.
| | X | N | M |
|---|---|---|---|
| Log BF (ASL1 ∼ Splice-related Genomic Traits) | 5.259 | 2.782 | 7.299 |
| Log BF (ASL2 ∼ Splice-related Genomic Traits) | 8.714 | 4.589 | 9.500 |
ASL1, average number of ASEs per gene (residual of the polynomial regression between number of ESTs [col. O] and ASL [col. U]); ASL2, average number of ASEs per gene (residual of the linear regression between the log-transformed number of ESTs [col. O] and ASL [col. U]); X, mean CDS length/gene length; N, introns per kb exon; M, mean intron size.
ᵃLog BF (log Bayes factor) = 2*(log [harmonic mean (complex model)] − log [harmonic mean (simple model)]) is the BayesTraits test statistic for correlated evolution: weak evidence (<2), positive evidence (>2), strong evidence (5–10), very strong evidence (>10).
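The ASL2 correction described in the note above (residuals of a linear regression involving log-transformed EST counts) might be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, ASL is assumed to be the response and log EST count the predictor, and ordinary least squares is assumed. The natural log is used here, but the choice of base does not change the residuals because it only rescales the predictor.

```python
import math


def linear_residuals(x, y):
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def est_corrected_asl(est_counts, asl):
    """Hypothetical ASL2-style metric: ASL residuals after regressing on log(ESTs).

    Assumes every EST count is positive so that the logarithm is defined.
    """
    log_ests = [math.log(c) for c in est_counts]
    return linear_residuals(log_ests, asl)
```

A species with a positive residual shows more alternative splicing events per gene than its EST coverage alone would predict under this fit, and a negative residual shows fewer.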