Tanja Slotte, Thomas Bataillon, Troels T Hansen, Kate St Onge, Stephen I Wright, Mikkel H Schierup.
Abstract
Recent results from Drosophila suggest that positive selection has a substantial impact on genomic patterns of polymorphism and divergence. However, species with smaller population sizes and/or stronger population structure may not be expected to exhibit Drosophila-like patterns of sequence variation. We test this prediction and identify determinants of levels of polymorphism and rates of protein evolution using genomic data from Arabidopsis thaliana and the recently sequenced Arabidopsis lyrata genome. We find that, in contrast to Drosophila, there is no negative relationship between nonsynonymous divergence and silent polymorphism at any spatial scale examined. Instead, synonymous divergence is a major predictor of silent polymorphism, which suggests variation in mutation rate as the main determinant of silent variation. Variation in rates of protein divergence is mainly correlated with gene expression level and breadth, consistent with results for a broad range of taxa, and map-based estimates of recombination rate are only weakly correlated with nonsynonymous divergence. Variation in mutation rates and the strength of purifying selection seem to be major drivers of patterns of polymorphism and divergence in Arabidopsis. Nevertheless, a model allowing for varying negative and positive selection by functional gene category explains the data better than a homogeneous model, implying the action of positive selection on a subset of genes. Genes involved in disease resistance and abiotic stress display high proportions of adaptive substitution. Our results are important for a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.
Year: 2011 PMID: 21926095 PMCID: PMC3296466 DOI: 10.1093/gbe/evr094
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Relative Importance of Genomic Factors for Nonsynonymous Divergence and dN/dS
| Response Variable | Adj. R² | Scale (kb) | log(1 + | Rec (cM/Mb) | Exonic GC | Exon Density | Chromosome |
| log(1 + | 0.24 | 20 | 0 | 0 | |||
| 0.32 | 50 | 0.01 | 0 | ||||
| 0.50 | 200 | 0.01 | 0.05 | 0.01 | |||
| log(1 + | 0.03 | 20 | 0.03 | 0.05 | |||
| 0.05 | 50 | 0.06 | 0.05 | ||||
| 0.16 | 200 | 0.04 | 0.04 | 0.02 |
NOTE: Results are presented for analyses at three spatial scales (windows of 20, 50, and 200 kb). The proportion of variance explained by each model is presented (Adj. R²) as well as the estimate of the relative portion of the variance explained by each predictor variable in the model (these may not sum exactly to 1 due to rounding of proportions). Entries in bold denote factors with associated significance level <0.001.
Relative Importance of Genomic Factors for Non-exonic, Synonymous, and Nonsynonymous Nucleotide Diversity (π)
| Response Variable | Adj. R² | Scale (kb) | log(1 + | log(1 + | Rec (cM/Mb) | Exonic GC | Exon Density | Chr |
| πNonexonic | 0.03 | 20 | 0.08 | 0.08 | ||||
| 0.05 | 50 | 0.16 | 0.03 | 0.06 | ||||
| 0.07 | 200 | 0.16 | 0.07 | 0.14 | 0.09 | |||
| πSyn | 0.01 | 20 | 0.12 | 0.08 | 0.05 | |||
| 0.02 | 50 | 0.18 | 0.10 | 0.04 | 0.10 | |||
| 0.08 | 200 | 0.07 | 0.04 | 0.06 | 0.07 | |||
| πNonsyn | 0.01 | 20 | 0.06 | 0.01 | 0.04 | |||
| 0.03 | 50 | 0.13 | 0.07 | 0.07 | 0.06 | |||
| 0.07 | 200 | 0.05 | 0.04 | 0.14 | 0.06 |
NOTE: Results are presented for analyses at three spatial scales (windows of 20, 50, and 200 kb). The proportion of variance explained by each model is presented (Adj. R²) as well as the estimate of the relative portion of the variance explained by each predictor variable in the model (these may not sum exactly to 1 due to rounding of proportions). Entries in bold denote factors with associated significance level <0.001. Chr, chromosome.
The relationship between nonsynonymous divergence and synonymous polymorphism in Arabidopsis thaliana does not show a signature of recurrent hitchhiking. Levels of synonymous polymorphism (measured as nucleotide diversity, π) were corrected for the joint effect of dS and exon density per window using a linear model (n = 515 windows of 200 kb). The dotted line is the ordinary least-squares regression line through the data points, and the continuous line, almost superimposed on it, is a local robust "lowess" regression.
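The correction described in this caption can be sketched as a two-step regression: first regress synonymous diversity on the covariates and keep the residuals, then regress those residuals on nonsynonymous divergence. The sketch below is a minimal illustration, not the authors' code. All per-window values are hypothetical, the helper names `ols` and `residuals` are made up for this example, and treating the divergence covariate as synonymous divergence is an assumption. Under recurrent hitchhiking one would expect a clearly negative slope in the second step.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X, where X already holds an intercept column."""
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Solve by Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            factor = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= factor * A[c][k]
            b[r] -= factor * b[c]
    coef = [0.0] * p
    for c in reversed(range(p)):
        tail = sum(A[c][k] * coef[k] for k in range(c + 1, p))
        coef[c] = (b[c] - tail) / A[c][c]
    return coef


def residuals(X, y):
    """Observed minus fitted values for the least-squares fit of y on X."""
    coef = ols(X, y)
    return [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]


# Hypothetical per-window values (NOT data from the paper).
d_s = [0.10, 0.12, 0.15, 0.11, 0.14, 0.13]           # assumed synonymous divergence
exon_density = [0.30, 0.25, 0.40, 0.35, 0.20, 0.45]
d_n = [0.020, 0.015, 0.018, 0.025, 0.012, 0.022]     # nonsynonymous divergence
pi_syn = [0.0060, 0.0070, 0.0090, 0.0065, 0.0080, 0.0078]

# Step 1: remove the joint effect of divergence and exon density.
X = [[1.0, s, e] for s, e in zip(d_s, exon_density)]
corrected_pi = residuals(X, pi_syn)

# Step 2: regress corrected diversity on nonsynonymous divergence.
intercept, slope = ols([[1.0, n] for n in d_n], corrected_pi)
print("slope of corrected pi on dN:", slope)
```

The pure-Python solver keeps the example dependency-free. For real window counts a numerical library and a lowess smoother would be the more practical choice.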
(A) Levels of synonymous divergence, dN/dS, recombination rate, and synonymous heterozygosity plotted in 200-kb windows across Arabidopsis thaliana chromosome 1. There is no reduction in synonymous divergence at a previously identified swept region in A. thaliana (indicated by arrow at ∼20 Mb). (B) A close-up of the putatively swept region shows that there is no evidence for a reduction in synonymous divergence (20-kb windows).
Strength of Purifying Selection (1 − f) and Levels of Adaptive Evolution (α) in the Arabidopsis thaliana versus Arabidopsis lyrata Divergence. Chromosome-Wide Estimates of Constraint and Adaptive Evolution
| Chromosome | α | 1 − f |
| 1 | 0.05 ± 0.04 | 0.76 ± 0.008 |
| 2 | −0.01 ± 0.05 | 0.75 ± 0.010 |
| 3 | −0.08 ± 0.05 | 0.73 ± 0.013 |
| 4 | 0.07 ± 0.04 | 0.76 ± 0.010 |
| 5 | 0.10 ± 0.04 | 0.77 ± 0.008 |
NOTE: Estimates are given ± 2 standard errors (SEs). Approximate SEs were estimated from 50 stratified bootstrap samples. 1 − f quantifies the intensity of purifying selection as the fraction of new amino acid-changing mutations subject to strong purifying selection. α is the fraction of divergence attributable to adaptive evolution (driven by positive selection on new amino acid-changing mutations).
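One way to read the chromosome-wide α estimates is to ask whether each ± 2 SE interval excludes zero. The short sketch below does this with the values from the table. Treating ± 2 SE as an approximate 95% interval is an assumption of this example (it relies on a normal approximation), and the helper name `excludes_zero` is made up here.

```python
# alpha estimate and its +/- 2 SE half-width per chromosome, copied from the table above.
chrom_alpha = {
    1: (0.05, 0.04),
    2: (-0.01, 0.05),
    3: (-0.08, 0.05),
    4: (0.07, 0.04),
    5: (0.10, 0.04),
}


def excludes_zero(estimate, half_width):
    """True when the interval estimate +/- half_width lies entirely on one side of zero."""
    lower, upper = estimate - half_width, estimate + half_width
    return lower > 0 or upper < 0


flags = {chrom: excludes_zero(*values) for chrom, values in chrom_alpha.items()}
for chrom, (est, hw) in chrom_alpha.items():
    print(chrom, round(est - hw, 2), round(est + hw, 2), flags[chrom])
```

Note that an interval entirely below zero (as for chromosome 3) does not indicate adaptive evolution. Negative α estimates are commonly attributed to segregating slightly deleterious mutations.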
Strength of Purifying Selection (1 − f) and Levels of Adaptive Evolution (α) in the Arabidopsis thaliana versus Arabidopsis lyrata Divergence. Fit of Alternative Models for the 44 Most Abundant GO Categories
| Model Name and Description | f | α | LogL | AIC |
| M0: Strict selective neutrality | 1 | 0 | −419,123 | 838,336 |
| M1: Homogeneous purifying selection + no adaptive evolution | 0.27 | 0 | −57,637.4 | 115,367 |
| M2: Purifying selection with variable intensity + no adaptive evolution | | 0 | −6,132.64 | 12,443 |
| M3: Homogeneous purifying selection + homogeneous levels of adaptive evolution | 0.22 | 0.2 | −56,682.4 | 113,459 |
| M4: Purifying selection with variable intensity + homogeneous levels of adaptive evolution | | 0.1 | −5,945.6 | 12,071 |
| α |
NOTE: α is the fraction of divergence attributable to adaptive evolution (driven by positive selection on new amino acid-changing mutations). The model with the best (lowest) AIC, here M5, is highlighted in bold.
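Model comparison by AIC amounts to ranking models by AIC = 2k − 2 log L (k being the number of free parameters) and looking at the differences from the lowest value. The sketch below ranks only the models whose AIC values are listed in the table above (M0 to M4), so the "best" model it reports is the best among those five rows, not M5. The variable names are chosen for this example.

```python
# AIC values copied from the rows of the table above (M0 to M4 only).
aic = {
    "M0": 838336,
    "M1": 115367,
    "M2": 12443,
    "M3": 113459,
    "M4": 12071,
}

# Lower AIC means a better trade-off between fit and number of parameters.
best = min(aic, key=aic.get)
delta = {model: value - aic[best] for model, value in aic.items()}

for model in sorted(aic, key=aic.get):
    print(model, aic[model], "delta AIC =", delta[model])
```

In these rows, allowing the intensity of purifying selection to vary (M2, M4) lowers AIC far more than adding a homogeneous level of adaptive evolution (M1 to M3, or M2 to M4).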
Estimates of Constraint (1 − f) and Proportion of Adaptive Evolution, α, in the Abundant GO Categories Undergoing Most Adaptation
| GOs | GO Term | | α | 1 − f |
| GO:0006869 | Lipid transport | 137 | 0.60 ± 0.36 | 0.71 ± 0.10 |
| GO:0009414 | Response to water deprivation | 221 | 0.57 ± 0.34 | 0.86 ± 0.02 |
| GO:0045087 | Innate immune response | 135 | 0.49 ± 0.10 | 0.76 ± 0.02 |
| GO:0009737 | Response to abscisic acid stimulus | 312 | 0.41 ± 0.10 | 0.86 ± 0.02 |
| GO:0006508 | Proteolysis | 460 | 0.35 ± 0.18 | 0.79 ± 0.04 |
| GO:0006281 | DNA repair | 140 | 0.34 ± 0.36 | 0.69 ± 0.07 |
| GO:0006457 | Protein folding | 269 | 0.30 ± 0.10 | 0.78 ± 0.04 |
| GO:0005975 | Carbohydrate metabolic process | 437 | 0.24 ± 0.23 | 0.81 ± 0.02 |
| GO:0006950 | Response to stress | 133 | 0.23 ± 0.36 | 0.83 ± 0.07 |
| GO:0008152 | Metabolic process | 939 | 0.13 ± 0.15 | 0.80 ± 0.02 |
| GO:0009753 | Response to jasmonic acid stimulus | 171 | 0.08 ± 0.16 | 0.81 ± 0.03 |
| GO:0009409 | Response to cold | 298 | 0.19 ± 0.14 | 0.87 ± 0.04 |
NOTE: Estimates are given ± 2 SEs. Approximate SEs were obtained from 100 bootstrap samples (resampling stratified by GO category).
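A stratified bootstrap of the kind mentioned in the note can be sketched as follows: resample with replacement within each stratum (here, each GO category), recompute the statistic on every replicate, and take the standard deviation of the replicates as an approximate SE. This is a minimal sketch under stated assumptions. The per-gene values, the category labels, and the statistic `pooled_mean` are placeholders for illustration, and they stand in for the model-based estimates of α and 1 − f used in the source.

```python
import random
import statistics


def stratified_bootstrap_se(strata, statistic, n_boot=100, seed=0):
    """Approximate SE of `statistic` by resampling within each stratum separately."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Each stratum keeps its original size, so the category composition is preserved.
        resampled = {
            name: [rng.choice(values) for _ in values]
            for name, values in strata.items()
        }
        replicates.append(statistic(resampled))
    return statistics.stdev(replicates)


def pooled_mean(strata):
    """Placeholder statistic: mean over all values in all strata."""
    values = [v for group in strata.values() for v in group]
    return sum(values) / len(values)


# Hypothetical per-gene values grouped by a made-up category label.
toy = {"category_A": [0.1, 0.3, 0.2, 0.4], "category_B": [0.5, 0.7, 0.6]}
se = stratified_bootstrap_se(toy, pooled_mean, n_boot=100, seed=0)
print("approximate SE:", se)
print("half-width of the +/- 2 SE interval:", 2 * se)
```

Fixing the seed makes the replicate set reproducible. With only 50 or 100 replicates, as in the tables above, the resulting SEs are themselves approximate.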
The upper panel shows partial correlation coefficients for an analysis aimed at understanding the genomic factors correlated with variation in d, and the lower panel shows results from an analysis of genomic factors correlated with codon bias. Partial correlation coefficients (below diagonal) are color coded according to sign and degree of correlation, and P values (above diagonal) are color coded by significance level. In the lower panel, the upper value in each cell comes from analysis of codon bias in Arabidopsis thaliana, whereas the lower value corresponds to results for Arabidopsis lyrata. FOP is the frequency of optimal codons.
GO Biological Process Categories That Were Overrepresented (≤5% false discovery rate) among Sets of Genes in the Tails of the d Distribution
| Gene Set | GO Term | Description | | Fold Enrichment | P Value | FDR |
| High | GO:0006355 | Regulation of transcription, DNA dependent | 57 | 2.1 | 9.80 × 10^−08 | 1.49 × 10^−04 |
| | GO:0051252 | Regulation of RNA metabolic process | 57 | 2.1 | 1.16 × 10^−07 | 1.76 × 10^−04 |
| | GO:0045449 | Regulation of transcription | 86 | 1.7 | 2.97 × 10^−07 | 4.51 × 10^−04 |
| | GO:0006350 | Transcription | 58 | 1.8 | 1.51 × 10^−05 | 2.29 × 10^−02 |
| Low | GO:0007264 | Small GTPase mediated signal transduction | 33 | 5.4 | 1.43 × 10^−17 | 2.35 × 10^−14 |
| | GO:0006412 | Translation | 84 | 2.3 | 2.50 × 10^−14 | 4.09 × 10^−11 |
| | GO:0045184 | Establishment of protein localization | 66 | 2.3 | 1.17 × 10^−10 | 1.92 × 10^−07 |
| | GO:0015031 | Protein transport | 66 | 2.3 | 1.17 × 10^−10 | 1.92 × 10^−07 |
| | GO:0006091 | Generation of precursor metabolites and energy | 59 | 2.4 | 1.88 × 10^−10 | 3.08 × 10^−07 |
| | GO:0008104 | Protein localization | 66 | 2.2 | 6.94 × 10^−10 | 1.14 × 10^−06 |
| | GO:0010038 | Response to metal ion | 57 | 2.3 | 2.88 × 10^−09 | 4.72 × 10^−06 |
| | GO:0046686 | Response to cadmium ion | 52 | 2.3 | 7.86 × 10^−09 | 1.29 × 10^−05 |
| | GO:0010035 | Response to inorganic substance | 66 | 2.0 | 4.62 × 10^−08 | 7.57 × 10^−05 |
| | GO:0031497 | Chromatin assembly | 19 | 4.0 | 1.65 × 10^−07 | 2.71 × 10^−04 |
| | GO:0051258 | Protein polymerization | 12 | 6.1 | 2.90 × 10^−07 | 4.75 × 10^−04 |
| | GO:0006323 | DNA packaging | 19 | 3.7 | 6.01 × 10^−07 | 9.85 × 10^−04 |
| | GO:0034728 | Nucleosome organization | 18 | 3.9 | 6.47 × 10^−07 | 1.06 × 10^−03 |
| | GO:0006334 | Nucleosome assembly | 18 | 3.9 | 6.47 × 10^−07 | 1.06 × 10^−03 |
| | GO:0065004 | Protein–DNA complex assembly | 18 | 3.8 | 9.87 × 10^−07 | 1.62 × 10^−03 |
| | GO:0034622 | Cellular macromolecular complex assembly | 32 | 2.5 | 1.04 × 10^−06 | 1.70 × 10^−03 |
| | GO:0034621 | Cellular macromolecular complex subunit organization | 34 | 2.5 | 1.08 × 10^−06 | 1.77 × 10^−03 |
| | GO:0006333 | Chromatin assembly or disassembly | 20 | 3.4 | 1.62 × 10^−06 | 2.65 × 10^−03 |
| | GO:0009628 | Response to abiotic stimulus | 122 | 1.5 | 2.04 × 10^−06 | 3.34 × 10^−03 |
| | GO:0043933 | Macromolecular complex subunit organization | 37 | 2.1 | 9.54 × 10^−06 | 1.56 × 10^−02 |
| | GO:0065003 | Macromolecular complex assembly | 35 | 2.2 | 1.08 × 10^−05 | 1.76 × 10^−02 |
| | GO:0006119 | Oxidative phosphorylation | 16 | 3.5 | 1.30 × 10^−05 | 2.13 × 10^−02 |
| | GO:0030244 | Cellulose biosynthetic process | 11 | 4.8 | 2.47 × 10^−05 | 4.05 × 10^−02 |
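Fold enrichment in a table like this is conventionally the share of a GO term in the gene set divided by its share in the background. The sketch below shows that ratio together with a plain hypergeometric upper-tail probability. It is an illustration only: the counts are hypothetical, the function names are made up, and the source does not state which enrichment test or FDR procedure produced the tabulated values, so the hypergeometric test here is a swapped-in stand-in.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """(k / n) / (K / N): term frequency in the gene set relative to the background.

    k: genes in the set annotated with the term, n: genes in the set,
    K: background genes annotated with the term, N: genes in the background.
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N, K of them annotated."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total


# Hypothetical counts (NOT taken from the table above).
k, n, K, N = 10, 100, 50, 1000
print("fold enrichment:", fold_enrichment(k, n, K, N))
print("upper-tail P:", hypergeom_upper_tail(k, n, K, N))
```

Exact integer binomials from `math.comb` avoid overflow and rounding problems in the sum, at the cost of speed for genome-scale backgrounds.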