Stephen J Bush, Paula X Kover, Araxi O Urrutia.
Abstract
Rapidly evolving proteins can aid the identification of genes underlying phenotypic adaptation across taxa, but functional and structural elements of genes can also affect evolutionary rates. In plants, the 'edges' of exons, flanking intron junctions, are known to contain splice enhancers and to have a higher degree of conservation compared to the remainder of the coding region. However, the extent to which these regions may be masking indicators of positive selection or account for the relationship between dN/dS and other genomic parameters is unclear. We investigate the effects of exon edge conservation on the relationship of dN/dS to various sequence characteristics and gene expression parameters in the model plant Arabidopsis thaliana. We also obtain lineage-specific dN/dS estimates, making use of the recently sequenced genome of Thellungiella parvula, the second closest sequenced relative after the sister species Arabidopsis lyrata. Overall, we find that the effect of exon edge conservation, as well as the use of lineage-specific substitution estimates, upon dN/dS ratios partly explains the relationship between the rates of protein evolution and expression level. Furthermore, the removal of exon edges shifts dN/dS estimates upwards, increasing the proportion of genes potentially under adaptive selection. We conclude that lineage-specific substitutions and exon edge conservation have an important effect on dN/dS ratios and should be considered when assessing their relationship with other genomic parameters.
Keywords: Arabidopsis thaliana; dN/dS; lineage-specific evolution; splice enhancer
Year: 2015 PMID: 25930165 PMCID: PMC4480654 DOI: 10.1111/mec.13221
Source DB: PubMed Journal: Mol Ecol ISSN: 0962-1083 Impact factor: 6.185
Correlation strength of dN/dS and NI with different variables in A. thaliana, after alignment against A. lyrata, T. parvula or both
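| Variable | dN/dS (A. thaliana vs A. lyrata) | NI (A. thaliana vs A. lyrata) | dN/dS (A. thaliana vs T. parvula) | NI (A. thaliana vs T. parvula) | dN/dS (A. thaliana vs A. lyrata and T. parvula) | NI (A. thaliana vs A. lyrata and T. parvula) |
|---|---|---|---|---|---|---|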
| Average exon length | 0.103 | −0.026 | 0.045 | −0.141 | −0.043 | |
| Average intron length | −0.070 | 0.043 | −0.052 | 0.061 | − | 0.088 |
| Gene length | −0.243 | 0.092 | −0.067 | −0.047 | −0.169 | 0.044 |
| Primary transcript length | −0.243 | 0.092 | −0.067 | −0.047 | −0.170 | 0.043 |
| Protein length | −0.124 | 0.050 | − | −0.060 | −0.186 | −0.034 |
| Total exon length | −0.203 | 0.075 | −0.066 | −0.039 | −0.200 | |
| Total intron length | −0.228 | 0.086 | −0.056 | −0.041 | − | 0.089 |
| UTR length (5′) | −0.183 | 0.032 | −0.131 | − | 0.035 | |
| UTR length (3′) | −0.122 | 0.053 | −0.070 | 0.040 | −0.051 | 0.086 |
| Expression breadth | −0.399 | 0.120 | −0.284 | 0.117 | −0.130 | 0.232 |
| Exp. level (RNA-seq) | −0.415 | 0.145 | −0.285 | 0.117 | −0.143 | 0.217 |
| Protein abundance | −0.302 | 0.078 | −0.241 | 0.095 | −0.086 | 0.194 |
| Tissue specificity (τ) | 0.277 | −0.088 | 0.210 | −0.092 | 0.128 | −0.175 |
| Effective number of codons | 0.059 | −0.016 | 0.065 | −0.035 | 0.064 | −0.043 |
| Frequency of optimal codons | −0.194 | 0.065 | −0.187 | 0.116 | −0.069 | 0.176 |
| GC (%) | − | 0.036 | −0.057 | 0.081 | −0.110 | 0.038 |
| Intron density | −0.158 | 0.048 | −0.022 | −0.052 | 0.026 | 0.064 |
| Total no. of introns | −0.212 | 0.071 | −0.038 | −0.069 | − | 0.062 |
| Multifunctionality | −0.132 | − | −0.137 | − | −0.045 | − |
| No. of protein–protein interactions | −0.060 | 0.031 | −0.084 | 0.069 | −0.113 | 0.152 |
| Recombination rate | −0.058 | − | − | − | ||
All values shown are correlation strengths, as Spearman's rho. All values are statistically significant at P < 0.05, except for those underlined.
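The record does not define NI. Assuming it is the standard neutrality index of the McDonald–Kreitman framework, NI = (Pn/Ps)/(Dn/Ds), where Pn and Ps count nonsynonymous and synonymous polymorphisms within A. thaliana and Dn and Ds count fixed nonsynonymous and synonymous differences from the comparison species, a minimal sketch is below. The function name is hypothetical and the counts in the example are invented.

```python
def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """Neutrality index NI = (Pn/Ps) / (Dn/Ds) (assumed standard definition).

    pn, ps: nonsynonymous / synonymous polymorphisms within the species.
    dn, ds: nonsynonymous / synonymous fixed differences between species.
    Written as (pn * ds) / (ps * dn) so that ds is never a divisor.
    """
    if ps == 0 or dn == 0:
        # NI is undefined without synonymous polymorphism or nonsynonymous divergence
        raise ValueError("NI is undefined when Ps or Dn is zero")
    return (pn * ds) / (ps * dn)


# Invented counts: Pn/Ps = 6/10 = 0.6 and Dn/Ds = 4/20 = 0.2, so NI = 3.0
print(neutrality_index(6, 10, 4, 20))  # 3.0
```

Under this definition, NI > 1 indicates an excess of nonsynonymous polymorphism relative to divergence (often read as segregating slightly deleterious variants), and NI < 1 an excess of nonsynonymous divergence, consistent with adaptive substitution.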
Partial correlations between dN/dS and 11 evolutionary rate predictors in A. thaliana, after controlling for expression level
| Variable | Alignments of A. thaliana and A. lyrata | Alignments of A. thaliana and T. parvula | Alignments of A. thaliana, A. lyrata and T. parvula |
|---|---|---|---|
| Average exon length | 0.077 | −0.151 | |
| Average intron length | −0.037 | −0.037 | − |
| Gene length | −0.155 | −0.051 | −0.156 |
| Protein length | −0.093 | − | −0.196 |
| Total exon length | −0.124 | −0.052 | −0.191 |
| Total intron length | −0.148 | −0.039 | |
| Total no. of introns | −0.136 | −0.021 | 0.007 |
| Frequency of optimal codons | −0.130 | −0.121 | −0.045 |
| Expression breadth | −0.220 | −0.148 | −0.055 |
| Protein abundance | −0.126 | −0.112 | − |
| No. of protein–protein interactions | −0.108 | −0.118 | −0.118 |
All values shown are partial correlation strengths, as Spearman's rho. All values are statistically significant at P < 0.05, except those underlined.
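The record does not say how the partial correlations were computed. One common approach, sketched here under that assumption, ranks each variable (averaging tied ranks), takes Spearman's rho as the Pearson correlation of those ranks, and then applies the first-order partial correlation formula rho_xy·z = (rho_xy − rho_xz·rho_yz) / sqrt((1 − rho_xz²)(1 − rho_yz²)), with z as expression level. All function names are hypothetical, and no significance test is included.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after controlling for z (first-order partial)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

In this setting x would hold per-gene dN/dS, y one predictor such as gene length, and z expression level. The result is undefined if either x or y is perfectly rank-correlated with z.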
Fig. 1 dN, dS, dN/dS and NI after exon edge removal. dN/dS (a), dN (b), dS (c) and NI (d) for a sample of 1443 genes with at least one fully alignable exon between A. thaliana and A. lyrata, after removing one codon at a time from exon edges (black), to a maximum of 30. The effects of random codon removal are shown in red. Distributions significantly differ when 30 codons are removed sequentially, but not randomly, compared to when no codons are removed. For sequential removal vs. no removal, Kruskal–Wallis P = 0.02 (dN/dS) and P < 2.2 × 10⁻¹⁶ (NI). For random removal vs. no removal, Kruskal–Wallis P = 0.08 (dN/dS) and 0.49 (NI).
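The caption contrasts removing codons sequentially from exon edges with removing the same number of codons at random. The sketch below illustrates the two trimming schemes on one exon's aligned codons. The record does not state the order in which edge codons are taken, so alternating between the 5′ and 3′ edges is an assumption, as are the function names; re-estimating dN, dS and NI on the trimmed alignment is not shown.

```python
import random


def remove_edge_codons(codons, k):
    """Remove k codons from the exon edges, alternating 5' then 3' (assumed order)."""
    if not 0 <= k <= len(codons):
        raise ValueError("k must be between 0 and the number of codons")
    from_start = (k + 1) // 2  # codons dropped at the 5' edge
    from_end = k // 2          # codons dropped at the 3' edge
    return codons[from_start:len(codons) - from_end]


def remove_random_codons(codons, k, rng):
    """Control: remove k codons chosen uniformly at random, preserving order."""
    if not 0 <= k <= len(codons):
        raise ValueError("k must be between 0 and the number of codons")
    dropped = set(rng.sample(range(len(codons)), k))
    return [c for i, c in enumerate(codons) if i not in dropped]


exon = ["ATG", "GCT", "AAA", "CTG", "GAT", "TGG"]
print(remove_edge_codons(exon, 3))  # ['AAA', 'CTG', 'GAT']
print(len(remove_random_codons(exon, 3, random.Random(1))))  # 3
```

Repeating this for k = 0 to 30 and comparing the resulting per-gene dN/dS distributions against the untrimmed ones (for example with a Kruskal–Wallis test) mirrors the comparison described in the caption.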
Exon edge removal shifts dN/dS values towards a range indicative of either stronger positive or relaxed purifying selection, increasing the proportion of genes potentially under adaptive selection
| Dataset | Max. no. of codons removed from each gene | No. of genes | % of genes with dN/dS > 1 (no codons removed) | % of genes with dN/dS > 1 (after sequential codon removal) | % of genes with dN/dS > 1 (after random codon removal) | χ² (chi-square test) | P (chi-square test) |
|---|---|---|---|---|---|---|---|
| Alignments of A. thaliana and A. lyrata | 10 | 3213 | 1.81 | 2.4 | 1.81 | 11.25 | 7.96 × 10⁻⁴ |
| | 20 | 2041 | 1.62 | 2.45 | 1.71 | 6.43 | 0.011 |
| | 30 | 1443 | 1.39 | 2.43 | 1.39 | 6.22 | 0.013 |
| Alignments of | 10 | 779 | 0.64 | 1.67 | 0.77 | 8.17 | 4.27 × 10⁻³ |
| | 20 | 350 | 0.29 | 1.43 | 0.29 | 16.00 | 6.33 × 10⁻⁵ |
| | 30 | 174 | 0 | 2.87 | 0 | NA | NA |
Fig. 2 Variables that have a significantly different correlation with dN/dS after the sequential removal of 30 codons from exon edges, compared to random codon removal. The four variables shown – expression breadth, expression level, tau and GC content – are those which have significantly different estimates of rho for their correlation with dN/dS before and after codon removal. Two criteria are met for each variable: that rho is significantly different after sequential, compared to random codon removal, and that rho is significantly different after sequential, compared to no codon removal. Estimates of dN/dS are made using alignments of A. thaliana against A. lyrata. Data for this figure, including P-values and sample sizes, are shown in Table S6 (Supporting information).
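The caption identifies correlations that differ significantly but does not state the test used (details are in Table S6). For illustration only, one standard way to compare two correlation coefficients is Fisher's z-transformation. This sketch assumes two independent samples and uses the approximate variance 1.06/(n − 3) commonly applied to Spearman's rho; because the before- and after-removal estimates come from the same genes, independence is only an approximation here. The function name is hypothetical.

```python
import math


def compare_spearman(rho1, n1, rho2, n2):
    """Two-sided Fisher z test for a difference between two Spearman rhos.

    Assumes independent samples; var(atanh(rho)) is taken as 1.06 / (n - 3).
    Returns (z, p).
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    se = math.sqrt(1.06 / (n1 - 3) + 1.06 / (n2 - 3))
    z = (math.atanh(rho1) - math.atanh(rho2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return z, p
```

For example, compare_spearman(-0.40, 1443, -0.30, 1443) would compare two hypothetical rho values estimated on the 1443-gene sample.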
Correlates of dN/dS using estimates derived from codons common to the alignment of A. thaliana, A. lyrata and T. parvula
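| Variable | Alignments of A. thaliana and A. lyrata | Alignments of A. thaliana and T. parvula | Alignments of A. thaliana, A. lyrata and T. parvula (lineage-specific) |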
|---|---|---|---|
| Average exon length | − | − | −0.106 |
| Average intron length | −0.080 | − | − |
| Gene length | −0.146 | −0.115 | −0.154 |
| Primary transcript length | −0.146 | −0.115 | −0.154 |
| Protein length | −0.107 | −0.089 | −0.177 |
| Total exon length | −0.139 | −0.114 | −0.182 |
| Total intron length | −0.088 | −0.072 | − |
| UTR length (5′) | − | − | |
| UTR length (3′) | −0.066 | −0.056 | − |
| Expression breadth | −0.286 | −0.317 | −0.182 |
| Exp. level (RNA-seq) | −0.256 | −0.284 | −0.144 |
| Protein abundance | −0.198 | −0.225 | −0.102 |
| Tissue specificity (τ) | 0.214 | 0.239 | 0.124 |
| Effective number of codons | 0.116 | 0.123 | 0.051 |
| Frequency of optimal codons | −0.142 | −0.189 | −0.051 |
| GC (%) | −0.054 | −0.076 | −0.087 |
| Intron density | −0.050 | −0.056 | −0.020 |
| Total no. of introns | −0.079 | −0.067 | −0.045 |
| Multifunctionality | −0.060 | −0.038 | −0.036 |
| Protein–protein interactions | −0.127 | −0.153 | −0.098 |
| Recombination rate | −0.001 | −0.063 | 0.041 |
Correlation strengths are shown as Spearman's rho. All values are statistically significant at P < 0.05, except those underlined. The rightmost column shows lineage-specific dN/dS estimates.
Significantly different correlation strength when using lineage-specific dN/dS estimates compared to pairwise estimates.
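The lineage-specific estimates in the rightmost column use T. parvula as an outgroup to place substitutions on the A. thaliana branch. A minimal parsimony sketch of that idea follows, assuming aligned, in-frame coding sequences of equal length: a codon difference is assigned to the A. thaliana lineage when A. thaliana differs from A. lyrata while A. lyrata matches T. parvula, and it is classed as synonymous or nonsynonymous using the standard genetic code. It only counts codon changes; an actual lineage-specific dN/dS estimate would also normalize by the numbers of synonymous and nonsynonymous sites and correct for multiple hits, typically with a codon substitution model. The names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def thaliana_lineage_changes(thaliana, lyrata, parvula):
    """Count (synonymous, nonsynonymous) codon changes on the A. thaliana branch.

    A change is assigned to A. thaliana when its codon differs from A. lyrata
    while A. lyrata and the outgroup T. parvula agree (parsimony).
    Codons containing gaps or ambiguous bases are skipped.
    """
    if not len(thaliana) == len(lyrata) == len(parvula):
        raise ValueError("sequences must be aligned to the same length")
    thaliana, lyrata, parvula = thaliana.upper(), lyrata.upper(), parvula.upper()
    synonymous = nonsynonymous = 0
    for i in range(0, len(thaliana) - 2, 3):  # complete codons only
        t, l, p = thaliana[i:i + 3], lyrata[i:i + 3], parvula[i:i + 3]
        if not all(codon in CODON_TABLE for codon in (t, l, p)):
            continue
        if t != l and l == p:
            if CODON_TABLE[t] == CODON_TABLE[l]:
                synonymous += 1
            else:
                nonsynonymous += 1
    return synonymous, nonsynonymous


# AAA->AAG (Lys, synonymous) and TTG->GTG (Leu->Val) fall on the A. thaliana branch;
# GGG/GGA/GGC is skipped because the outgroup disagrees with A. lyrata.
print(thaliana_lineage_changes("ATGAAAGTGGGG---", "ATGAAGTTGGGAATG", "ATGAAGTTGGGCATG"))  # (1, 1)
```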