Slavica Dimitrieva, Maria Anisimova.
Abstract
In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore not to be subject to natural selection. Yet cases of non-neutral evolution at certain synonymous sites have increasingly been reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play an important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with a larger number of interactions or forming a larger number of structures, those located in intracellular components, and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
Year: 2014 PMID: 24896293 PMCID: PMC4045579 DOI: 10.1371/journal.pone.0095034
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Bootstrap distribution of the differences in A) the mean ω-ratio, B) tRNA reusage, measured through the tRNA Pairing Index (TPI), C) number of interactions and D) number of structures, between protein groups having site-to-site variation in synonymous rates (SRV+) and protein groups having constant synonymous rates (SRV−).
Plots B), C) and D) also show the bootstrap distributions of the corresponding differences between protein groups showing evidence for positive selection (PS+) and those failing to show such evidence (PS−). All differences (except for TPI in PS+/PS− data) are significant, since 95% of the histogram area does not include zero.
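The bootstrap comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the number of replicates, the percentile-interval rule, and the toy values are all assumptions made for the example.

```python
import random
from statistics import mean

def bootstrap_mean_diff(group_a, group_b, n_boot=2000, seed=0):
    """Return bootstrap replicates of mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    return diffs

def central_interval(values, level=0.95):
    """Return the central `level` percentile interval of `values`."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int(round((1.0 - tail) * (len(ordered) - 1)))]
    return lower, upper

def excludes_zero(interval):
    """True when the interval lies entirely on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

# Hypothetical attribute values (e.g. number of interactions per protein group).
srv_plus = [12, 15, 14, 18, 20, 17, 16, 19, 21, 15]
srv_minus = [5, 7, 6, 9, 8, 4, 6, 7, 5, 8]

diffs = bootstrap_mean_diff(srv_plus, srv_minus)
interval = central_interval(diffs)
print(interval, excludes_zero(interval))
```

With a real data set, `srv_plus` and `srv_minus` would hold the attribute values of the SRV+ and SRV− groups, and a histogram of `diffs` would correspond to one panel of the figure.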
Overrepresentation (+) and underrepresentation (−) of SRV and PS in different data categories.
| Pfam type | SRV representation | SRV P-value | PS representation | PS P-value |
|---|---|---|---|---|
| Protein Domains | + | 10⁻³³ | − | 10⁻⁹ |
| Protein Families | − | 10⁻²⁸ | + | 10⁻¹⁰ |
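The record does not state which test produced these P-values. One common way to assess over- or under-representation of a property within a category is a one-sided hypergeometric test, sketched below. The function names and all counts are hypothetical, and the ASCII `-` stands in for the table's minus sign.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def representation_test(k, population, successes, draws):
    """Return ('+' or '-', one-sided p-value) for an observed count k.

    population: all groups analysed
    successes:  groups with the property (e.g. SRV+)
    draws:      groups in the category of interest
    k:          groups in the category that have the property
    """
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    # Probability of seeing at least k, and of seeing at most k.
    p_over = sum(hypergeom_pmf(i, population, successes, draws)
                 for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, population, successes, draws)
                  for i in range(lowest, k + 1))
    return ("+", p_over) if p_over <= p_under else ("-", p_under)

# Hypothetical counts: 1000 groups, 300 with SRV, 200 in the category,
# of which 90 show SRV (60 would be expected by chance).
sign, p_value = representation_test(k=90, population=1000, successes=300, draws=200)
print(sign, p_value)
```

A two-sided Fisher exact test or a chi-square test would be reasonable alternatives; the one-sided form is used here only because the table reports a direction together with each P-value.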
Differences between the mean values of each attribute (#interactions, #structures, codon bias and tRNA reusage) in SRV+ and SRV− data, and in PS+ and PS− data, respectively.
| Attribute | Difference between attribute means in SRV+ and SRV− data (median [IQR]) | Difference between attribute means in PS+ and PS− data (median [IQR]) |
|---|---|---|
| | 0.50 [0.48, 0.53] | −0.42 [−0.44, −0.39] |
| | 17.72 [16.65, 18.80] | −11.38 [−10.12, −12.59] |
| | 0.02 [0.019, 0.022] | −0.01 [−0.018, −0.011] |
| | −1.3 [−1.39, −1.22] | 1.0 [0.84, 1.18] |
| | 0.14 [0.13, 0.15] | −0.01 [−0.02, 0.005] |
| | 0.042 [0.04, 0.043] | −0.041 [−0.038, −0.043] |
| | 0.08 [0.078, 0.083] | −0.08 [−0.085, −0.076] |
All p-values are <10⁻¹⁶, except for the difference in mean tRNA reusage (TPI) between PS+ and PS− data, which was not significant. This table corresponds to Figure 1 and Figure S2.
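The "median [IQR]" summaries in this table can be computed from a list of bootstrap differences as in the sketch below. The helper names are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the record does not say how the quartiles were obtained.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return the median and the (first quartile, third quartile) pair."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)

def format_median_iqr(values, digits=2):
    """Format a summary in the table's 'median [Q1, Q3]' style."""
    med, (q1, q3) = median_iqr(values)
    return f"{med:.{digits}f} [{q1:.{digits}f}, {q3:.{digits}f}]"

# Hypothetical differences between group means from five bootstrap replicates.
example_diffs = [1, 2, 3, 4, 5]
summary = format_median_iqr(example_diffs)
print(summary)  # 3.00 [2.00, 4.00]
```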
Over/under-representation of selective forces in GO categories for Cellular Component.
| GO category | SRV: over(+)/under(−) represent. | SRV: signif. | PS: over(+)/under(−) represent. | PS: signif. | #pfam |
|---|---|---|---|---|---|
| intracellular | | | | | 872 |
| intracellular part | | | | | 773 |
| virion | | | | | 151 |
| virion part | | | | | 141 |
| viral capsid | | | | | 98 |
| viral envelope | | | | | 35 |
| macromolecular complex | | | | | 346 |
| ribosome | | | | | 98 |
| MHC protein complex | | | + | ** | 4 |
Notation: Significance levels are at the 5% (*), 1% (**), or 0.1% (***). Boldface indicates overrepresentation of SRV; italics indicates underrepresentation of SRV.
Over/under-representation of selective forces in GO categories for Molecular Function.
| GO category | SRV: over(+)/under(−) represent. | SRV: signif. | PS: over(+)/under(−) represent. | PS: signif. | #pfam |
|---|---|---|---|---|---|
| transferase activity | | | | | 444 |
| transferase activity, transferring one-carbon groups | | | | | 76 |
| structural molecule activity | | | | | 220 |
| structural constituent of ribosome | | | | | 98 |
| protein binding | | | | | |
| DNA binding | | | | | 368 |
| ion binding | | | | | 270 |
| cation binding | | | | | 269 |
Notation: Significance levels are at the 5% (*), 1% (**), or 0.1% (***). Boldface indicates overrepresentation of SRV; italics indicates underrepresentation of SRV.
Over/under-representation of selective forces in GO categories for Biological Processes.
| GO category | SRV: over(+)/under(−) represent. | SRV: signif. | PS: over(+)/under(−) represent. | PS: signif. | #pfam |
|---|---|---|---|---|---|
| oxidation reduction | | | | | 98 |
| regulation of biosynthetic process | | | | | 231 |
| regulation of metabolic process | | | | | 260 |
| nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | | | | | 696 |
| immune system process | | | | | 27 |
| immune response | | | | | 26 |
| viral reproductive process | | | | | |
| viral assembly, maturation, egress, and release | | | | | 25 |
| virion assembly | | | | | 20 |
| regulation of cellular process | | | | | 332 |
| cellular localization | | | | | 85 |
| developmental process | | | | | 89 |
| response to stimulus | | | | | 202 |
| response to stress | | | | | 132 |
| response to wounding | | | | | |
| immune response | | | | | 26 |
| macromolecule localization | | | | | 104 |
| cellular localization | | | | | 85 |
| pathogenesis | | | | | 71 |
| biological regulation | | | | | 384 |
| regulation of biological process | | | | | 356 |
| regulation of metabolic process | | | | | 260 |
| regulation of cellular process | | | | | 332 |
Notation: Significance levels are at the 5% (*), 1% (**), or 0.1% (***). Boldface indicates overrepresentation of SRV; italics indicates underrepresentation of SRV.
Over/under-representation of selective forces in KEGG Pathways.
| KEGG pathway | SRV: over(+)/under(−) represent. | SRV: signif. | SRV: #genes | PS: over(+)/under(−) represent. | PS: signif. | PS: #genes |
|---|---|---|---|---|---|---|
| Metabolism | | | 1434 | | ** | 1484 |
| Pentose and glucuronate interconversions | | | 25 | + | *** | 25 |
| Ascorbate and aldarate metabolism | | | 26 | + | *** | 26 |
| Lipid Metabolism | | | 317 | | | 330 |
| Glycosaminoglycan degradation | | | 18 | + | * | 18 |
| Porphyrin and chlorophyll metabolism | | | 41 | + | *** | 41 |
| Xenobiotics Biodegradation and Metabolism | | | 156 | + | *** | 160 |
| Drug metabolism - other enzymes | | | 52 | + | *** | 52 |
| Translation | | | 143 | | | 143 |
| Cell Growth and Death | | | 210 | | * | 225 |
| Adipocytokine signaling pathway | | | 61 | | * | 66 |
| Immune System | | | 519 | + | *** | 547 |
| Natural killer cell mediated cytotoxicity | | | 132 | + | *** | 139 |
| Asthma | | | 30 | + | *** | 30 |
| Autoimmune thyroid disease | | | 53 | + | *** | 53 |
| Allograft rejection | | | 38 | + | *** | 38 |
| Graft-versus-host disease | | | 42 | + | *** | 42 |
| Neurodegenerative Diseases | | | 275 | | | 297 |
| Type I diabetes mellitus | | | 42 | + | *** | 44 |
Notation: Significance levels are at the 5% (*), 1% (**), or 0.1% (***). Boldface indicates overrepresentation of SRV; italics indicates underrepresentation of SRV.
Figure 2. Hierarchical clustering of human disease and environmental information processing pathways with respect to the SRV+ genes shared between the pathways.
The bars next to the pathways denote the number of SRV+ genes (red) and SRV− genes (green) in the corresponding pathways. Cancer-related pathways are marked in blue; metabolic disease pathways are in purple. Note that the ABC transporters and Type II diabetes mellitus pathways are composed exclusively of SRV+ genes.
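Clustering pathways by the SRV+ genes they share can be sketched as below. The record does not specify the distance measure or linkage rule behind the figure, so the Jaccard distance and average linkage used here are assumptions, as are the function names, pathway labels and gene identifiers.

```python
from itertools import combinations

def jaccard_distance(genes_a, genes_b):
    """1 - |intersection| / |union|; 0 means identical gene sets."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)

def average_linkage(pathway_genes):
    """Agglomerative clustering of pathways by shared genes.

    Returns the merges in order as (cluster_1, cluster_2, distance),
    where each cluster is a tuple of pathway names.
    """
    clusters = [(name,) for name in sorted(pathway_genes)]
    merges = []

    def cluster_distance(c1, c2):
        # Average pairwise Jaccard distance between members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(pathway_genes[x], pathway_genes[y])
                    for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges

# Hypothetical pathways and the SRV+ genes assigned to each.
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g7", "g8"},
    "pathway_D": {"g8", "g9"},
}
merges = average_linkage(toy_pathways)
for first, second, distance in merges:
    print(first, second, round(distance, 3))
```

The ordered merges are what a dendrogram would draw: pathways that share most of their SRV+ genes join first at small distances, and unrelated groups join last.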
Figure 3. Distribution of the expression levels in A) SRV− genes (blue) and SRVEXT genes (red), and B) PS− genes (green) and PS+ genes (purple), for different tissues.
SRVEXT genes (those with extreme SRV) show higher expression levels than SRV− genes; PS+ genes show lower expression levels than PS− genes.