
The relation of codon bias to tissue-specific gene expression in Arabidopsis thaliana.

Salvatore Camiolo¹, Lorenzo Farina, Andrea Porceddu.

Abstract

The codon composition of coding sequences plays an important role in the regulation of gene expression. Herein, we report systematic differences in the usage of synonymous codons among Arabidopsis thaliana genes that are expressed specifically in distinct tissues. Although we observed that both regionally and transcriptionally associated mutational biases were associated significantly with codon bias, they could not explain the observed differences fully. Similarly, given that transcript abundances did not account for the differences in codon usage, it is unlikely that selection for translational efficiency can account exclusively for the observed codon bias. Thus, we considered the possible evolution of codon bias as an adaptive response to the different abundances of tRNAs in different tissues. Our analysis demonstrated that in some cases, codon usage in genes that were expressed in a broad range of tissues was influenced primarily by the tissue in which the gene was expressed maximally. On the basis of this finding we propose that genes that are expressed in certain tissues might show a tissue-specific compositional signature in relation to codon usage. These findings might have implications for the design of transgenes in relation to optimizing their expression.

Substances:
Codon
RNA, Transfer

Year: 2012 PMID: 22865738 PMCID: PMC3454886 DOI: 10.1534/genetics.112.143677

Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562

SYNONYMOUS codons encode the same amino acid, but occur at different frequencies in genes (Duret 2002; Plotkin ; Chamary and Parmley 2006; Plotkin 2011). Two models have been proposed to explain this phenomenon, which is known as codon usage bias. Whereas the neutralist model postulates that the observed pattern of codon bias is determined by local differences in mutational processes, the selective model proposes that synonymous codons coadapt to the abundances of tRNAs to optimize the efficiency and accuracy of translation. Theoretical considerations and simulation studies have suggested that the two models are not mutually exclusive, and that codon usage might reflect a balance between selective and mutational pressures (Bulmer 1991). Recent studies have suggested that this balance can differ substantially among species, and that its nature is highly dynamic within species (Rocha ; Plotkin 2011). The coadaptation model was formulated initially on the basis of the significant correlation between the usage of synonymous codons in highly expressed genes and the copy number of the genes encoding the isoacceptor tRNAs in unicellular organisms, such as Escherichia coli and Saccharomyces cerevisiae (Sharp and Li 1987). The model hypothesizes that the number of copies of a tRNA gene in a genome is a reliable proxy for the availability of that tRNA in the cell, and that the composition of the cellular pool of tRNAs is rather invariable. Direct measurement of tRNA abundances in yeast cells has, in fact, demonstrated a strong correlation between tRNA-gene copy number and transcript abundance (Dittmar ; Tuller ). However, recent studies have challenged the assumption that the availability of cellular isoacceptor tRNAs is constant, and instead propose that it depends on particular conditions or developmental stage (Najafabadi ). 
As a consequence, the translational efficiencies of genes would also not be constant, but instead would change in response to alterations in the availabilities of isoacceptor tRNAs. Consequently, genes with a very similar usage of synonymous codons should have similar expression patterns. Accordingly, using a wide variety of organisms, Najafabadi have shown a significant positive correlation between the level of coexpression of genes and the similarity in their codon usage. On the basis of this observation, they proposed that codon usage might be selected during evolution to synchronize the efficiency of translation with the functional requirements for the expression of specific proteins at certain times (Najafabadi and Salavati 2008; Najafabadi ). Codon usage bias is more complex in multicellular organisms than in unicellular organisms (Plotkin 2011). Given the presence of diverse cell types in an organism, there may be differences in codon bias among distinct tissues or organs. Studies in various multicellular eukaryotic organisms have indicated that both mutational bias and selective forces impact codon usage (Plotkin 2011). However, consensus on the relative contributions of these effects has yet to be reached. Plotkin have shown that human genes that are expressed specifically in organs as different as the brain and vulva have different patterns of synonymous codon usage, as do the orthologous genes in mouse. Waldman reported that different levels of coadaptation between codon usage and tRNA availability can be observed not only in different human tissues, but also at different developmental stages (e.g., adult human tissues show higher levels of coadaptation than fetal tissues). Together, these findings suggest that the expression pattern of a gene is a key determinant of its codon usage. Nonetheless, other studies have presented alternative explanations (Sémon ). 
For example, the unequal nucleotide composition in many eukaryotic genomes has been associated with the observed codon bias. Sémon argued that differences in codon bias among tissue-specific genes were driven by mutational biases that act on the GC content of genes, rather than coadaptation to pools of tRNA of different abundances. In addition, mutational biases that are mediated by gene expression are known to influence codon usage. For example, the transcription process, which distinguishes between the two complementary strands of DNA, might lead to differences in mutation rate between the two strands (Green ). Whereas the antisense strand is stabilized by the transcription machinery, the complementary strand is exposed and prone to mutation events, such as deamination (Green ). Such effects, in addition to bias of the transcription-coupled machinery, may result in a higher GC content of the coding strand (Green ; Majewski 2003). Indeed, Comeron (2004) confirmed the association between GC content at the third codon position and the pattern of gene expression, pointing to transcription-associated mutational bias (TAMB) as a possible force that contributes to such a correlation. Plants present several interesting features for the study of codon usage. Many plant cells are totipotent and they show outstanding developmental and physiological plasticity throughout their life span. This suggests that the structural features and composition of plant genes should be highly plastic to enable adaptation to many different conditions. Indeed, at least two lines of evidence have indicated a strong relationship between the structure and composition of plant genes and their patterns of expression in different tissues and organs. Seoighe demonstrated that pollen-specific genes in Arabidopsis have shorter and fewer introns than genes that are expressed in other organs. 
The authors claimed that this characteristic was a strong signature of gametophytic selection because it was common to all genes expressed in pollen, regardless of whether they were also expressed in other tissues. Whittle have provided additional evidence of an association between codon bias and expression in plant reproductive organs. The results of the study reported herein show that the expression pattern of Arabidopsis tissue-specific genes is an important factor in relation to their synonymous codon usage. Regionally and transcriptionally associated mutational biases were associated significantly with bias in synonymous codon usage but none of these factors could explain the differences fully. We present several lines of evidence to support the hypothesis that the evolution of codon bias is an adaptive response to different availabilities of tRNAs in distinct tissues.

Materials and Methods

Sequences and expression data

Coding sequences from Arabidopsis thaliana were downloaded from The Arabidopsis Information Resource (TAIR) website (ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datsets/TAIR8_blastsets/TAIR8_cds_20080412). The list of markers proposed by Schmid was used as an initial source of tissue-specific genes (http://www.weigelworld.org/resources/microarray/AtGenExpress/) and was complemented with expression data from eight A. thaliana tissues or organs, which were retrieved from the Gene Expression Omnibus repository at the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/gds/). Each microarray experiment was replicated three times, and in a given experiment, genes were considered to be expressed when the corresponding probeset was detected significantly in all replicates and the hybridization signal was never below 75 (technical threshold). In cases in which all the above conditions were met, the expression level of a gene in a given experiment was calculated as the mean value of the hybridization signal of the corresponding probeset in the three experimental replicates. For consistency with other articles on the same topic, the word “tissue” is used here to indicate either a tissue or an organ, and therefore does not correspond to a strict histological meaning. Microarray experiments were assigned to tissues on the basis of the attached experimental description (Schmid ). The complete list of microarray experiments analyzed is reported in Supporting Information, Table S1. The expression level of each gene in a given tissue corresponded to its highest level of expression in the experiments that were classified as belonging to that tissue. A gene was classified as being expressed specifically in a given tissue when its level of expression was >100 in that tissue and below the technical threshold in all other tissues. 
Only tissue-specific genes that were expressed in one of the five tissues that had >45 tissue-specific genes (pollen, flower, root, seed, and shoot apex) were considered for further analyses. The complete list of tissue-specific genes used in the present study is provided in Table S2a. Genes that were expressed in more than one tissue were classified according to the tissue in which they were expressed at the highest level. The expression breadth (EB) index indicates the number of tissues in which a gene was expressed. For example, the root EB2 dataset included all genes that were expressed in two tissues, but at a higher level in roots. Complete lists of EB2 and EB_5-6-7 locus names are provided in Tables S2b and S2c, respectively. Datasets of EB1 genes that were expressed at high or low levels were generated by applying the internal double randomization procedure (see below) to sample 45 genes from subdatasets that included either the 70 most highly or the 70 least expressed EB1 genes of each tissue.
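
The classification rules described in this section can be illustrated with a short sketch. It assumes that the per-tissue expression level of a gene (the highest replicate-averaged signal among the experiments assigned to that tissue) is already available as a dictionary; the function names and input layout are hypothetical and are not taken from the original analysis.

```python
TECHNICAL_THRESHOLD = 75.0   # signals below this count as "not expressed"
SPECIFIC_MIN_LEVEL = 100.0   # a tissue-specific gene must exceed this level


def expression_breadth(levels):
    """Number of tissues in which the signal reaches the technical threshold."""
    return sum(1 for v in levels.values() if v >= TECHNICAL_THRESHOLD)


def classify_gene(levels):
    """Return (tissue of maximal expression, EB), or None if unclassifiable.

    levels: dict mapping tissue name -> expression level in that tissue.
    """
    eb = expression_breadth(levels)
    if eb == 0:
        return None
    top_tissue = max(levels, key=levels.get)
    # EB1 (tissue-specific) genes must also exceed 100 in their single tissue.
    if eb == 1 and levels[top_tissue] <= SPECIFIC_MIN_LEVEL:
        return None
    return top_tissue, eb
```

Under these assumptions, a gene with levels `{"root": 300, "seed": 120, "pollen": 20}` would fall in the root EB2 dataset.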

Measure of codon bias

The synonymous codon usage bias was measured using the relative synonymous codon usage (RSCU) parameter, which estimates the extent to which the use of a given synonymous codon deviates from the frequency of its use if all codons for each amino acid were used equally (Sharp 1987).
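
As an illustration of the RSCU definition, the following minimal sketch computes RSCU values for a single coding sequence as the observed count of a codon divided by the count expected if all codons of its synonymous family were used equally. It assumes the standard genetic code and an in-frame DNA sequence; the function name `rscu` is hypothetical, and this is not the software used in the study.

```python
from collections import Counter

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (DNA)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        # Met and Trp have a single codon; unused amino acids give no estimate.
        if len(family) < 2 or total == 0:
            continue
        expected = total / len(family)  # count under equal usage
        for c in family:
            values[c] = counts[c] / expected
    return values
```

An RSCU of 1 indicates no bias for that codon, and values above or below 1 indicate over- or under-representation within the synonymous family.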

Nonparametric multivariate analysis of variance and multivariate analysis of covariance

Nonparametric multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA) were performed on each dataset using the PERMANOVA software (Anderson 2001), which considers the sum of squared distances between points and their centroids to be equal to the sum of squared interpoint distances divided by the number of points. An additive partitioning of the sum of squares was obtained from distances measured directly from the distance matrix, without calculating the central locations of points. This provided a pseudo-F ratio with which the multivariate hypothesis could be tested. Significance was calculated using a permutation approach that shuffled the observations among classification levels and recalculated the pseudo-F statistic after each permutation. The resulting distribution of pseudo-F values was used as a reference to test the significance of the observed pseudo-F value. The only assumption of this test is that the observations are exchangeable under a true null hypothesis. In all cases analyzed, this assumption was verified previously using the PERMDISP software package (Anderson 2001).
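
The partitioning and permutation test described above can be sketched for a single-factor design as follows. This is a minimal illustration that assumes squared Euclidean distances and a one-way layout; it is not the PERMANOVA program itself, and the function names are hypothetical.

```python
import random
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Pseudo-F computed from interpoint distances only (no centroids)."""
    n = len(points)
    groups = sorted(set(labels))
    # Total SS: sum of squared interpoint distances divided by N.
    ss_total = sum(sq_euclid(points[i], points[j])
                   for i, j in combinations(range(n), 2)) / n
    # Within-group SS: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(sq_euclid(points[i], points[j])
                         for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permutation_p(points, labels, n_perm=999, seed=0):
    """Shuffle labels among observations and compare pseudo-F values."""
    rng = random.Random(seed)
    observed = pseudo_f(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(points, shuffled) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

For univariate data with Euclidean distances, this pseudo-F coincides with the classical ANOVA F ratio, which provides a simple check of the sketch. The sketch does not guard against a within-group sum of squares of zero.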

Dataset construction by internal double randomization sampling

An internal double randomization scheme that involved both the genes within each tissue-specific list (rows) and RSCU variables within each synonymous codon family (columns) was used for dataset construction. Briefly, to reduce the intrinsic correlation between RSCU variables of synonymous codons, one randomly selected codon for each amino acid was omitted from the analysis. Thus, each combination analyzed consisted of 41 variables, which included 9 [= (2 − 1) × 9] variables from the nine twofold degenerate amino acids, 2 [= (3 − 1) × 1] variables from the one threefold degenerate amino acid, 15 [= (6 − 1) × 3] variables from the three sixfold degenerate amino acids, and 15 [= (4 − 1) × 5] variables from the five fourfold degenerate amino acids. The second randomization concerned the rows. For each RSCU combination, we extracted balanced datasets by randomly sampling an equal number of genes from each tissue-specific list.

The intergenic GC content of EB1 genes was calculated as the frequency of guanine and cytosine in nontranscribed genomic sequences that flanked tissue-specific genes. TAMB is expected to cause strand asymmetries in which the proportion of G relative to C and T relative to A is increased in the coding strand (Green ). The effect of TAMB was measured as the G + T content of the transcribed strand of introns, because these are unconstrained regions of genes (Green ; Comeron 2004).

As a measure of the codon adaptation of a gene, we used the metric Fop (Stenico ), which represents the proportion of “optimal” codons in a gene. The optimal codons were identified on the basis of the copy number of the corresponding tRNA genes in accordance with Wright .

The level of expression of EB1 genes was measured by the expression peak (pE) and average expression level (avgEL). The pE of an EB1 gene was the highest level of signal hybridization in experiments assigned to a given tissue. The avgEL of EB1 genes was the average of the hybridization signals of experiments that showed a value of hybridization signal >75 (technical threshold).
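
The internal double randomization scheme described in this section can be sketched as follows. The sketch assumes the standard genetic code and represents each tissue-specific list as a list of gene identifiers; the function names and data layout are hypothetical. Dropping one codon per synonymous family reproduces the count of 41 RSCU variables given above.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def synonymous_families():
    """Map each amino acid with two or more codons to its sorted codon list."""
    families = {}
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            for k, c in enumerate(BASES):
                aa = AMINO[16 * i + 4 * j + k]
                if aa != "*":
                    families.setdefault(aa, []).append(a + b + c)
    return {aa: sorted(cs) for aa, cs in families.items() if len(cs) > 1}


def double_randomized_dataset(gene_lists, n_genes, rng):
    """Return (RSCU variables kept, genes sampled per tissue)."""
    # Column randomization: omit one randomly chosen codon per family.
    kept = []
    for _, codons in sorted(synonymous_families().items()):
        dropped = rng.choice(codons)
        kept.extend(c for c in codons if c != dropped)
    # Row randomization: sample the same number of genes from every tissue.
    sampled = {tissue: rng.sample(sorted(genes), n_genes)
               for tissue, genes in gene_lists.items()}
    return kept, sampled
```

Calling `double_randomized_dataset` repeatedly with the same `random.Random` instance yields different column and row combinations, which is how a collection of balanced datasets could be built under these assumptions.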

Post hoc comparisons

Pairwise comparisons between tissues:

Post hoc comparisons among levels of the classification variable (tissues) were performed using PERMANOVA software (Anderson 2001). The level of significance (P) for each test was obtained by using separate sets of permutations across the compared tissue pair.

Mutual information:

Mutual information (MI) between codon usage and expression pattern was calculated in accordance with Najafabadi . In brief, given a variable γ and a cluster α (a list of genes expressed specifically in a given tissue), we were interested in determining whether the distribution of γ in α was random. The MI was used to represent such nonrandomness. To calculate MI, the genes in α + α′ (where α′ contains all the genes not contained in α) are sorted on the basis of the value of γ for each gene and are divided into m equally populated bins. A 2 × m table is formed in which the element e1,i shows the number of genes in the ith bin that are in α, and the element e2,i shows the number of genes in the ith bin that are in α′ (1 ≤ i ≤ m). The value of MI across this table is then calculated as described previously (Elemento ). To examine whether the MI thus obtained is significantly higher than would be expected from a random distribution, the gene-cluster assignments are shuffled randomly n times, MI is calculated each time, and the probability of observing a random MI that is equal to or larger than the original MI is calculated. In the work described herein, m (the number of bins) was set to five for the analysis of the MI. Gene-cluster assignments were shuffled 10,000 times to assess the significance of MI. The variable γ was the normalized frequency of a synonymous codon in a given gene. This was calculated as the usage of that codon divided by the usage of the corresponding amino acid in the encoded protein. This statistic was calculated only when the corresponding amino acid was encoded more than five times in the open reading frame. Gene clusters were defined on the basis of the list of tissue-specific genes. The MI of each codon for the five tissues was calculated using the MI RSCU package of the ICodPack suite (Najafabadi ).
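
The binning and shuffling procedure can be sketched as follows. This illustration assumes the standard definition of mutual information over the 2 × m contingency table, with base-2 logarithms chosen for convenience; the exact estimator of the cited method is not specified in this section, and the function names are hypothetical rather than part of ICodPack. The default number of shuffles reads the shuffle count in the text as 10^4.

```python
import math
import random


def mutual_information(values, in_cluster, m=5):
    """MI between cluster membership and the rank bin of the variable."""
    n = len(values)
    order = sorted(range(n), key=lambda g: values[g])
    table = [[0] * m for _ in range(2)]  # row 0: in cluster, row 1: not
    for rank, g in enumerate(order):
        table[0 if in_cluster[g] else 1][rank * m // n] += 1
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][i] + table[1][i] for i in range(m)]
    mi = 0.0
    for r in range(2):
        for i in range(m):
            c = table[r][i]
            if c:  # empty cells contribute nothing
                mi += (c / n) * math.log2(c * n / (row_tot[r] * col_tot[i]))
    return mi


def mi_p_value(values, in_cluster, m=5, n_shuffles=10_000, seed=0):
    """Fraction of shuffled assignments whose MI is >= the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(values, in_cluster, m)
    flags = list(in_cluster)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(flags)
        if mutual_information(values, flags, m) >= observed:
            hits += 1
    return observed, hits / n_shuffles
```

Here `values` would hold the normalized frequency of one codon in each gene and `in_cluster` would flag the genes that belong to one tissue-specific list.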

Tissue signature

The resemblance between genes in terms of codon usage was calculated as the interpoint distance in multivariate space. Accordingly, the distance between datasets was calculated as the average interpoint distance of all points in one dataset from all points in the other dataset. Regarding the definitions of these distances, we defined “cognate” distances as distances between datasets that shared the same classification variable (for example, the root-specific dataset and root EB2 dataset), and “noncognate” distances as distances between datasets that did not share the classification variable. The significance of dataset distances was estimated by constructing a parameter reference distribution by shuffling tissue-specific genes among datasets. As a measure of distance, we used the pseudo-F statistic as calculated using PERMANOVA software (Anderson 2001).

Results

We used nonparametric MANOVA to analyze differences in codon usage among genes that were expressed in Arabidopsis in a tissue-specific manner (Anderson 2001). Each gene was represented as a point in a multidimensional space that was defined by the RSCU variables (Sharp and Li 1987), and the degree of “resemblance” between genes was extrapolated from their interpoint Euclidean distances. In cases in which the expression pattern had a significant effect on codon usage, the mean distance between genes expressed in different tissues was expected to exceed the mean distance between genes expressed in the same tissue. All distance measures calculated indicated that there was a significant association between the pattern of gene expression and codon usage (Table 1). The robustness of this conclusion was tested using 500 datasets that were generated by the internal double randomization strategy (Materials and Methods). Briefly, for each of the 50 RSCU combinations that we extracted, we generated 10 datasets by sampling an equal number of genes from each list of genes expressed specifically in the root, pollen, seed, flower, and shoot apex. Nonparametric MANOVA was performed for each dataset, and the significance of the calculated pseudo-F statistic was estimated as described in Materials and Methods. The results are reported in Table 1.

Table 1

Results generated using PERMANOVA by the analysis of 500 datasets that were produced following the internal double randomization scheme

	Average pseudo-F	No. of datasets with significance at P < 0.05 (%)	No. of datasets with significance at P < 0.01 (%)
EB1	1.57	483 (96.6)	423 (84.6)
EB2	1.48	482 (95.4)	420 (84.0)
EB_5-6-7	1.41	383 (76.6)	233 (46.6)

PERMANOVA used the pseudo-F to test the null hypothesis that there was no difference among the levels of the classification variable. The number of permutations used to assess the significance of F was set to 4999. The last two columns represent the number (and percentage) of the 500 analyses that indicated a significant effect at P < 0.05 or P < 0.01, respectively (see Materials and Methods for more detail). EB, expression breadth.

In total, 483 of the 500 subdatasets (96.6%) showed significant differences in codon bias among tissues (P < 0.05). Thirty-nine of the 50 RSCU combinations (78%) showed significant differences (P < 0.05) in all 10 analyzed subdatasets, six combinations were significantly different in 9 of 10 datasets, and the others had a minimum of 6 datasets that showed significant differences in codon usage bias among tissues. Sémon have pointed out recently that differences in codon usage can be attributed to local mutational biases acting on clusters of human tissue-specific genes. Given that Arabidopsis genes with similar expression patterns tend to cluster within the genome (Williams and Bowles 2004), we investigated whether the same principle applies to the Arabidopsis genome by introducing the average GC content of intergenic regions as a covariate in the analysis. The effect of this introduction, although significant, did not perturb substantially the significant association between expression pattern and codon usage (Table 2). A similar picture emerged when the measure of TAMB was analyzed as a covariate. Although the G + T content of introns, which is used as an index of TAMB, correlated with the codon bias, its effect did not abolish the influence of expression pattern.

Table 2

Summary of results generated by using PERMANCOVA through the analysis of 500 datasets that were produced following the internal double randomization scheme

	Effect of the main factor (tissue)		Effect of the covariate
Covariate	Average pseudo-F	No. of datasets with significance at P < 0.05 (%)	Average pseudo-F	No. of datasets with significance at P < 0.05 (%)
GC intergenic	1.58	482 (96.4)	1.41	137 (27.4)
Expr level (pE)	1.35	337 (67.4)	3.03	490 (98.0)
Expr level (avgEL)	1.32	310 (62.0)	3.04	500 (100.0)
Fop (tRNA)	1.54	471 (94.2)	12.39	500 (100.0)
G_i + T_i	1.34	318 (63.6)	1.67	292 (58.5)

The covariables were the GC content of intergenic sequences, the G + T content of introns, the Fop, and the expression level measured as either pE or avgEL. The number of permutations used to assess the significance of both the covariable and the independent variable pseudo-F was set to 999 (see Materials and Methods for more detail).

Finally, we tested whether the observed association between expression pattern and codon usage was determined predominantly by the level of gene expression. If this was the case, the effect of expression pattern would simply reflect differences in the outcome of translational selection (Plotkin 2011). Following previous approaches, the expression levels of tissue-specific genes were measured either as the maximum level of transcript accumulation (pE) or the average level of transcript accumulation (avgEL) and were introduced as a covariate in the analysis. However, again this did not affect the outcome substantially (Table 2). An as yet untested possibility is that different tissues are subject to different strengths of selection on codon usage, which results in some tissues showing greater codon usage bias than others. We tested this hypothesis using the metric Fop (Stenico ), a measure of codon adaptation to tRNA gene copy number, as a covariate instead of the expression level. Although the effect of Fop was significant, it did not abolish the association between expression pattern and codon usage bias (Table 2). A plausible explanation for the above findings is that tissue-specific genes have a synonymous codon composition that is adapted to the compositional abundance of tRNA pools, which might be different in distinct tissues. If so, we could speculate that EB1 genes that are expressed at high levels in certain tissues should be compositionally more diverse than EB1 genes that are expressed at low levels in the same tissues. 
To test such a hypothesis, first we analyzed the compositional diversity between high and low EB1 genes of different tissues. These analyses were performed only for root, seed, and pollen, which all had an adequate number of tissue-specific genes. For root and seed EB1 genes, the expression level was associated significantly with synonymous codon bias (Table 3). Pollen EB1 genes did not show compositional differences that were associated with expression level. The differences among EB1 genes of different tissues that were highly or poorly expressed were then analyzed. Interestingly, highly expressed genes were differentiated significantly more than poorly expressed ones (P < 0.01; see Table 3). The complete data on pairwise comparisons of EB1 genes are reported in Table S3.

Table 3

Summary of results generated by PERMANOVA on 500 datasets that were produced by applying the double randomization scheme to the datasets of EB1 genes that were expressed at high or low levels

	Average pseudo-F	No. of datasets with significance at P < 0.05 (%)	No. of datasets with significance at P < 0.01 (%)
Root, high vs. low	1.74	298 (59.6)	148 (29.6)
Seed, high vs. low	1.70	315 (63)	138 (27.6)
Pollen, high vs. low	0.91	7 (1.45)	2 (0.4)
EB1 low	1.285	153 (30.6)	43 (8.6)
EB1 high	2.06	492 (98.4)	476 (95.2)

PERMANOVA results for comparisons between high- and low-expression EB1 genes within root, seed, or pollen are summarized, together with PERMANOVA results for comparisons among EB1 genes of different tissues that were expressed at either high or low levels. In each case, the significance was calculated by a permutation approach that involved 4999 permutations.

We analyzed pairwise comparisons between tissues as a first step toward dissecting the differences in codon usage among tissues. Again, the analyses were conducted by using the internal double randomization sampling strategy, which generated 500 datasets. The degree of similarity between any pair of tissues was measured by the number of pairwise comparisons that showed a significant association between expression pattern and codon usage. The analysis of the complete pattern of pairwise similarities suggested the presence of two main groups of tissues (Figure 1, Table S4). One group corresponded to the shoot apex and pollen, which were differentiated in only 5.2% of the comparisons analyzed. The other cluster was formed by roots and flowers, which were differentiated in 28% of the comparisons and were both highly differentiated from shoot apex and pollen. Seed tissue was in an intermediate position between the two main clusters.

Figure 1

Differentiation between tissue-specific datasets. Dendrogram representing the average differentiation among tissue-specific genes in terms of synonymous codon usage.

The MI of the usage of each codon in each tissue was then analyzed. A high MI value indicates nonrandom usage of the corresponding codon among genes that are expressed in the same tissue. Figure 2 shows that the use of several codons was not random in a number of tissues. This suggests that certain synonymous codons are preferred in genes that are expressed preferentially in those tissues. The frequencies of synonymous codons within each tissue are reported in Table S5.

Figure 2

Mutual information of synonymous codon usage. The mutual information (MI) of synonymous codon usage in tissue-specific genes is significantly higher than that expected from a random distribution. Each row represents a cluster of genes expressed in a tissue-specific manner, whereas each column represents a codon. Statistical significance is expressed as −log(P).

Having demonstrated an association between the tissues in which genes were expressed specifically and codon usage, next we investigated the frequencies of synonymous codons in more widely expressed genes. Genes that were expressed in more than one tissue were grouped into datasets according to the tissue in which they were expressed at the highest level. For example, the root EB2 dataset included all genes with maximal expression in roots that were also expressed in only one other tissue. Multivariate analysis of variance, which was carried out as described previously for tissue-specific genes (EB1 datasets), also revealed differential codon usage among EB2 datasets (Table 1). On the basis of this finding, the distances between EB2 and EB1 datasets were analyzed to determine whether genes that were expressed in a given tissue/organ had a particular signature in terms of codon usage (referred to as the tissue signature). We expected that the genes in a given EB2 dataset and cognate EB1 dataset would be more similar in relation to codon usage than the same set of EB2 genes and a randomly selected set of EB1 genes. In terms of distances in multivariate space, this means that the centroid of the EB2 dataset for a given tissue should be closer to the centroid of the cognate EB1 genes than to the centroid of genes selected randomly from the complete EB1 gene list. To test this hypothesis, we used the pseudo-F statistic, calculated using nonparametric MANOVA, as a measure of the distance between centroids. 
Interestingly, the pollen and shoot apex EB2 datasets were closer to their cognate EB1 genes than to the set of randomly chosen EB1 genes, whereas the root, flower, and seed EB2 genes were closer to the random EB1 centroid (Table 4). Thus, the pollen- and shoot apex-specific EB2 genes showed a tissue signature, but the root-, flower-, and seed-specific genes did not.

Table 4

The tissue signature is the ratio between the average distance of EB2 (or EB_5-6-7) genes from a randomly selected set of EB1 genes and the average distance of the same genes from the cognate EB1 genes

	Cognate EB1 distance	Random EB1 distance	Tissue signature	P
EB2 dataset
Flower	1.19	1.16	0.98	<0.001
Pollen	1.04	1.17	1.12	<0.0001
Root	1.03	0.99	0.96	<0.0001
Seed	1.00	0.97	0.96	<0.0001
Shoot apex	0.88	1.26	1.43	<0.0001
EB_5-6-7 dataset
Flower	1.05	0.97	0.92	<0.0001
Pollen	1.22	1.24	1.02	0.343
Root	1.14	1.10	0.96	0.011
Seed	1.30	1.29	0.99	0.420
Shoot apex	1.25	1.44	1.16	<0.0001

Distances were calculated as the average pseudo-F, which in turn was calculated using nonparametric MANOVA.
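
The tissue signature column of Table 4 can be recomputed from the two distance columns, as in the following sketch. It uses the rounded EB2 distances printed in the table, so the ratios may differ slightly from the published values, and it does not reproduce the permutation-based P values; the names used are hypothetical.

```python
# (cognate EB1 distance, random EB1 distance) for the EB2 datasets in Table 4.
EB2_DISTANCES = {
    "flower": (1.19, 1.16),
    "pollen": (1.04, 1.17),
    "root": (1.03, 0.99),
    "seed": (1.00, 0.97),
    "shoot apex": (0.88, 1.26),
}


def tissue_signature(cognate_distance, random_distance):
    """Ratio of the random EB1 distance to the cognate EB1 distance."""
    return random_distance / cognate_distance


signatures = {tissue: tissue_signature(cognate, rand)
              for tissue, (cognate, rand) in EB2_DISTANCES.items()}

# A ratio above 1 means the EB2 genes are closer to their cognate EB1 genes
# than to randomly chosen EB1 genes.
with_signature = sorted(t for t, s in signatures.items() if s > 1.0)
```

With the tabulated values, only pollen and shoot apex yield a ratio above 1.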

Datasets of genes expressed in the range of five to seven tissues (EB_5-6-7 genes) were generated as described above, using the tissue in which the genes were expressed maximally as the classification variable. The tissue signature in the shoot apex and in pollen was confirmed in the EB_5-6-7 datasets, although with weaker statistical support (Table 4). The three other tissues showed no tissue signature.

Discussion

The results of the study reported herein suggest that Arabidopsis genes that are expressed in a tissue-specific manner show distinct patterns of synonymous codon usage depending on the tissue in which they are expressed predominantly. Measures of bias that were estimated for each synonymous family of codons enabled us to eliminate the possibility of bias in amino acid usage as a major confounding effect (Sémon ). An alternative explanation is that the choice of synonymous codons in tissue-specific genes is influenced by local differences in genomic or functional features. The classical view of the relationship between synonymous codon usage and gene expression assumes the presence of a constant pool of tRNAs in an organism throughout different life conditions and physiological stages. All genes cope with the same pool of tRNAs, and their preference for some synonymous codons over others would be solely a function of their average level of expression. Studies involving plants have provided several pieces of evidence that are consistent with this classical view. For instance, Wright have shown that the choice of synonymous codons in Arabidopsis genes correlates with the average level of expression of the genes. A similar situation has been reported by Ingvarsson (2007) for Populus tremula, and by Wang and Roossinck (2006) for several other species. One might envisage that if our observations are translated into this conceptual framework, the different codon biases of tissue-specific genes should reflect differences in the extent of translational selection mediated by the expression levels. However, our results demonstrated a significant association between the expression pattern and codon usage bias even after the effect of expression level, measured as maximum or average level of transcript accumulation, had been eliminated (Table 2). 
In addition, the hypothesis that the intensity of selection on codon usage differs among tissues cannot explain the observed association between synonymous codon usage and expression pattern. Indeed, the association between expression pattern and codon usage remained significant even after the effect of the adaptation of synonymous codon usage to tRNA gene copy number was controlled for. Another possibility is that the reported differences result from differences in the mutational biases that act on clusters of tissue-specific genes. A similar idea was advanced by Sémon et al. to explain different codon usage in human tissue-specific genes. Given that Arabidopsis genes with similar breadths of expression tend to cluster at close genomic positions (Williams and Bowles 2004), it is possible that regional mutational bias could explain a substantial portion of the differences in codon usage. However, in a nonparametric MANCOVA, the GC content of intergenic regions did not weaken the effect of the expression pattern substantially (Table 2). Similar conclusions regarding the limited importance of local mutational bias for synonymous codon usage were drawn by Morton and Wright (2007) from analyses of a large gene pool with a wide range of expression profiles. In addition, our observation that highly expressed tissue-specific genes were more diverse than poorly expressed ones points to an adaptive mechanism for the regulation of codon usage, which is presumably mediated by selection. Recent articles on multicellular organisms have provided direct evidence that the relative availability of isoacceptor tRNAs can vary among cell types (Dittmar ). The central role of expression level in the choice of synonymous codons in genes has been questioned (Dittmar ; Najafabadi ). Currently, the more popular view is that codon usage in genes should reflect the availabilities of tRNAs when these genes are expressed most actively.
In support of this idea, Plotkin (2011) reported the presence of a tissue-specific codon bias in human genes. More recently, Najafabadi and Salavati (2008) and Najafabadi extended this concept by providing evidence that codon usage is correlated universally with gene function. A series of experiments that were conducted using several unicellular and multicellular organisms demonstrated that the pattern of gene expression could explain codon usage bias in genes more adequately than the average expression level (Najafabadi ). More importantly, the same study provided experimental data that showed that variation in the tRNA content of a cell can alter the response to environmental changes in terms of the regulation of protein expression and cell phenotypes (Najafabadi ). On the basis of the above-mentioned theory, the observed differences in codon usage among Arabidopsis tissue-specific genes might be associated with differences in the availabilities of tRNAs in distinct tissues. Of course, causality can be hypothesized in both directions. Whereas the first hypothesis holds that synonymous codons are selected in tissue-specific genes on the basis of tRNA availabilities, the second hypothesis holds that tRNA availabilities differ among tissues to meet the requirements of the genes that are expressed in those tissues. Unfortunately, the two hypotheses cannot be investigated independently because, unless tRNA availability is determined by other means, the relative synonymous codon usage of tissue-specific genes is the only proxy for it. Regarding the first hypothesis mentioned above, there is no a priori reason to think that tissue-specific genes are the only genes that are adapted to the locally available pool of tRNAs. Similarly, it is possible that genes that are expressed in multiple tissues have also been subject to selection to adapt to the tRNA pools on which they depend for optimal expression.
In other words, their synonymous codon usage should reflect the balance between the responses to the adaptive selection that they have experienced. If adaptation for expression in a given tissue is particularly important for the function of a coding sequence, then a gene would preferentially use the codons “typical” for that tissue, which in our construction are inferrable from the EB1 gene sequences. Put differently, codon usage in the EB2 datasets should be closer to that of the cognate EB1 genes than to that of EB1 genes selected randomly from the “all tissues” list. Indeed, such characteristics were observed for EB2 genes that were expressed maximally in pollen and shoot apices. As a consequence, these two tissues were said to display a “tissue signature” in relation to codon usage. Given that both of these organs are involved in reproduction, it is possible that the observed signature is caused by transcription-associated mutational bias (TAMB). Accordingly, Comeron (2004) demonstrated that human genes that are expressed in testis have a synonymous codon usage that is influenced by the mutagenic effect of transcription. However, the results of our assessment of the impact of TAMB on synonymous codon usage in tissue-specific genes have ruled out the possibility that this factor could account for the observed tissue effect (Table 2). Moreover, we have verified that the mutational bias that is associated with transcription is not responsible for the theorized pollen and shoot apex tissue signatures (S. Camiolo and A. Porceddu, unpublished results). Sémon et al. have cautioned against overinterpretation of the biological relevance of statistically significant differences in codon bias among human tissues. In this regard, we appreciate that the observed tissue effect, although supported statistically, is very weak. It is important to note that several technical issues might affect the strength of such an effect.
Indeed, the threshold of expression by which genes were assigned to the EB1 and EB2 datasets was arbitrary, and the choice of this threshold could have inflated the magnitude of the tissue signature effect. However, the randomization scheme that was used should have minimized the confounding effect of this factor, which should increase the robustness of the conclusions drawn. Moreover, the tissue signature of the shoot apex and pollen could be reproduced for the EB_5-6-7 datasets of these tissues, albeit with weaker statistical support in the latter case. This finding deserves particular attention, because we predict that the number of variables that influence codon usage is correlated positively with the breadth of expression of a gene. Time, through its influence on either physiology or development, is the most obvious variable that has not been considered in our models. Given that the pool of available tRNAs could vary over time in a given cell type, genes that are expressed at different times in a tissue could differ somewhat in their codon compositions. We did not perform dedicated tests to verify this hypothesis, but it should be emphasized that genes that were grouped in clusters on the basis of their expression pattern, without any regard to the timing of their expression, showed a significant level of similarity in relation to codon usage. Genome-wide analyses of gene expression during Arabidopsis development have revealed that, despite the large overlap in expressed genes, organ systems have distinct transcriptional signatures (Schmid ). By applying a principal component analysis (PCA) to the transcriptional profiles detected throughout the development of various Arabidopsis organs, Schmid et al. have found that the PCA distance between organ systems reflects well the overall morphological similarities between the organs analyzed, with developmental stage and environmental conditions being only minor contributors.
If we extend the analogy between morphological and transcriptional similarities to the availability of different tRNAs, we should expect that the compositional similarities between EB1 genes should also fit a morphological criterion. However, our analyses of the compositional resemblance between Arabidopsis tissue-specific genes provided only slight support for this prediction. In fact, whereas the intermediate position of seeds between pollen and roots in terms of codon usage showed a reasonable correspondence with the PCA distances reported in the Schmid article, we have no plausible explanation for the relative compositional similarities of flower and root or shoot apex and pollen EB1 genes. These findings suggest that the regulation of codon usage of tissue-specific genes is a complex phenomenon that goes beyond a simple transcriptional control of tRNA genes. In conclusion, we expect the relationship between codon usage and tRNA availability to be highly dynamic, with both temporal and spatial factors playing important roles. The multivariate approach used in the present study demonstrated that the tissue plays an important role, at least for genes specifically or preferentially expressed in pollen or the shoot apex. This finding should be considered when optimizing coding sequences for use in biotechnological applications.
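The tissue-signature reasoning discussed above (codon usage in EB2 genes should lie closer to that of the cognate EB1 genes than to that of EB1 genes drawn at random from all tissues) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: it pools relative synonymous codon usage (RSCU) over each gene set and compares sets by Euclidean distance, whereas the distances reported in the study were average pseudo-F values. The table `FAMILIES` is deliberately truncated to three synonymous families, and the names `rscu`, `distance`, and `signature_test` are hypothetical.

```python
import random
from collections import Counter

# Illustrative subset of synonymous codon families (standard genetic code).
# A real analysis would list every degenerate amino acid family.
FAMILIES = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(sequences):
    """RSCU per codon, pooled over in-frame, uppercase coding sequences.

    RSCU is the observed codon count divided by the count expected if all
    codons of the family were used equally. Unobserved families get 1.0.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 1.0
    return values


def distance(u, v):
    """Euclidean distance between two RSCU profiles (an assumed metric)."""
    return sum((u[c] - v[c]) ** 2 for c in u) ** 0.5


def signature_test(eb2_seqs, cognate_eb1, all_eb1, n_draws=1000, seed=0):
    """Compare the EB2-to-cognate-EB1 distance with random EB1 draws.

    Returns the observed distance and the fraction of equally sized random
    EB1 samples that are at least as close to the EB2 profile.
    """
    rng = random.Random(seed)
    target = rscu(eb2_seqs)
    observed = distance(target, rscu(cognate_eb1))
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(all_eb1, len(cognate_eb1))
        if distance(target, rscu(sample)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)
```

A small returned fraction would indicate that few random EB1 samples resemble the EB2 genes as closely as the cognate EB1 set does, which is the pattern described here for pollen and the shoot apex. Drawing random samples of the same size as the cognate set keeps the comparison from being driven by differences in set size.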

29 in total

Review 1. Evolution of synonymous codon usage in metazoans.

Authors: Laurent Duret
Journal: Curr Opin Genet Dev Date: 2002-12 Impact factor: 5.578

2. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors: P M Sharp; W H Li
Journal: Nucleic Acids Res Date: 1987-02-11 Impact factor: 16.971

3. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias.

Authors: P M Sharp; W H Li
Journal: Mol Biol Evol Date: 1987-05 Impact factor: 16.240

Review 4. Synonymous but not the same: the causes and consequences of codon bias.

Authors: Joshua B Plotkin; Grzegorz Kudla
Journal: Nat Rev Genet Date: 2010-11-23 Impact factor: 53.242

5. Tissue-specific codon usage and the expression of human genes.

Authors: Joshua B Plotkin; Harlan Robins; Arnold J Levine
Journal: Proc Natl Acad Sci U S A Date: 2004-08-16 Impact factor: 11.205

6. Exploring the regulation of tRNA distribution on the genomic scale.

Authors: Kimberly A Dittmar; Evelyn M Mobley; Agnes Jancso Radek; Tao Pan
Journal: J Mol Biol Date: 2004-03-12 Impact factor: 5.469

7. Dependence of mutational asymmetry on gene-expression levels in the human genome.

Authors: Jacek Majewski
Journal: Am J Hum Genet Date: 2003-07-24 Impact factor: 11.025

8. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata.

Authors: Stephen I Wright; C B Kenneth Yau; Mark Looseley; Blake C Meyers
Journal: Mol Biol Evol Date: 2004-06-16 Impact factor: 16.240

9. Coexpression of neighboring genes in the genome of Arabidopsis thaliana.

Authors: Elizabeth J B Williams; Dianna J Bowles
Journal: Genome Res Date: 2004-06 Impact factor: 9.043

10. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence.

Authors: Josep M Comeron
Journal: Genetics Date: 2004-07 Impact factor: 4.562
