Unusual codon usage bias in low expression genes of Vibrio cholerae.

Surajit Basak, Indranuj Mukherjee, Mayukh Choudhury, Santasabuj Das.

Abstract

A positive correlation between gene expression and synonymous codon usage bias is well documented in the literature. However, in the present study of the Vibrio cholerae genome, we have identified a group of genes that have unusually high codon usage bias despite having low potential expressivity. Our results suggest that codon usage in lowly expressed genes might also be under selection, favoring non-optimal codons to maintain a low cellular concentration of the encoded proteins. This would predict that lowly expressed genes are also biased in codon usage, but in a way that is opposite to the bias of highly expressed genes.

Keywords:  CAI; RSCU; correspondence analysis; effective number of codons; tRNA copy number

Year:  2008        PMID: 19255636      PMCID: PMC2646191          DOI: 10.6026/97320630003213

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

In most species, synonymous codons are not used with equal frequencies, a phenomenon known as codon usage bias. Codon bias is generally governed by a balance between mutation, genetic drift and natural selection [1-5]. In various organisms, such as Escherichia coli and Saccharomyces cerevisiae, synonymous codon usage bias has been shown to be correlated with the abundance of isoaccepting tRNA [6]. An optimal codon is thought to increase translation rate [7-9]. On the other hand, the presence of non-optimal codons has been postulated to reduce translation rate [10], probably due to a relative scarcity of cognate tRNA species. Non-optimal codons can thus have a selective advantage in maintaining a low cellular concentration of the proteins that they encode [11]. It was reported previously that non-optimal codons occur at high frequency in the signal sequence of secretory genes in Escherichia coli [12]. A high occurrence of non-optimal codons in the signal sequence of secretory proteins has also been observed in the gram-positive bacterium Streptomyces coelicolor [13].

Apart from gene expression level, gene length also plays an important role in shaping synonymous codon usage bias, and several earlier studies have documented strong effects of gene length on codon bias in a variety of organisms. The level of synonymous codon usage bias has been shown to be positively correlated with gene length in Escherichia coli [14]. In the Drosophila genome, longer genes had lower codon usage bias [15]. However, Hou and Yang [16] reported that in the S. pneumoniae genome, longer genes had higher expression levels and higher codon usage bias.

Cholera remains a heavy burden to human health in some developing countries, including India, where sanitation is poor and health care is limited [17-20].
The publication of the complete genome sequence of Vibrio cholerae [21], the etiological agent of cholera, has opened up previously unavailable possibilities for understanding the genetic organization of this organism. The present study demonstrates an unusual trend in the synonymous codon usage pattern of lowly expressed genes of the Vibrio cholerae genome. Contrary to the usual expectation, we have identified 138 genes that are highly biased yet lowly expressed. Moreover, the usage pattern of non-optimal codons in lowly expressed genes depends on gene length. Our results suggest that translational selection has a significant influence on the codon usage pattern of lowly expressed genes, depending on gene length.

Methodology

The complete genome sequence of Vibrio cholerae was downloaded from ftp://ftp.ncbi.nih.gov/genbank/genomes and the coding sequences were extracted. To minimize sampling errors [22], only coding sequences of at least 30 amino acids were retained for our analysis. Correspondence analysis [23], available in CodonW 1.4.2 (J. Peden, 2000; http://www.molbiol.ox.ac.uk/cu/), was used to investigate the major trend in relative synonymous codon usage variation among the genes. We also used CodonW to calculate Relative Synonymous Codon Usage (RSCU) values and gene length. Synonymous codon usage bias was measured by calculating the 'effective number of codons used in a gene' (Nc) [22,24]. The values of Nc range from 20 (when one codon is used per amino acid) to 61 (when all the codons are used with equal probability). In the present study, a gene is designated as highly biased if Nc < 36, and lowly biased if Nc > 44. We used the Codon Adaptation Index (CAI) to estimate gene expressivity. CAI is widely accepted as an effective measure of the potential level of gene expression [25]. The CAI of individual genes was calculated using a reference gene set of all the ribosomal proteins, which are known to be highly expressed in most bacterial genomes [26-28]. We sorted our dataset according to the CAI values and took the genes from the extreme 20% of the population at both ends of the sorted dataset. Using these criteria, a gene is considered lowly expressed if its CAI < 0.318 and highly expressed if its CAI > 0.502. The transfer RNA gene copy numbers were taken from the tRNA scan database (http://lowelab.cse.ucsc.edu/GtRNAdb/Vibr_chol/). Student's t-test was used to evaluate the significance of all the pairwise differences. Correlation coefficients and their statistical significance were determined using SPSS (13.0).
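To make the two codon-level measures concrete, the following is a minimal Python sketch of how RSCU values and a CAI score can be computed from a reference gene set. It is not the CodonW implementation used in the study. The function names, the toy reference sequence, and the floor of 0.01 applied to codons absent from the reference set are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons into synonymous families keyed by amino acid.
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)


def codons(seq):
    """Split a coding sequence (DNA or RNA) into complete codons."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(sequences):
    """RSCU = observed codon count / count expected under equal synonymous usage."""
    counts = Counter(c for s in sequences for c in codons(s))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Assumption: families absent from the input are given RSCU = 1.
            values[c] = counts[c] * len(family) / total if total else 1.0
    return values


def relative_adaptiveness(reference_sequences, floor=0.01):
    """w = RSCU of a codon / highest RSCU in its family, from the reference set."""
    values = rscu(reference_sequences)
    weights = {}
    for family in FAMILIES.values():
        best = max(values[c] for c in family)
        for c in family:
            # Assumption: unseen codons get a small floor to avoid log(0).
            weights[c] = max(values[c] / best, floor)
    return weights


def cai(seq, weights):
    """Geometric mean of w over codons, skipping Met, Trp and stop codons."""
    logs = [math.log(weights[c]) for c in codons(seq)
            if CODON_TABLE.get(c, "*") not in "*MW"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set standing in for the ribosomal protein genes.
reference = ["CUGCUGCUGCUU"]
weights = relative_adaptiveness(reference)
print(rscu(reference)["CUG"], cai("CUGCUG", weights), cai("CUUCUU", weights))
```

In this toy reference, CUG accounts for three of the four leucine codons, so a gene using only CUG scores the maximum CAI while one using only CUU scores lower.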

Results and Discussion

Codon usage bias and gene expression level

Many studies have demonstrated a positive correlation between the degree of codon bias and the level of gene expression [25,29,30]. As a result, it is generally expected that lowly expressed genes should have lower codon bias and highly expressed genes should have higher codon bias. When the analysis was performed on all the genes in the V. cholerae genome, we also observed a negative correlation (r = -0.2994, P < 0.01) between Nc and CAI, which indicates that the degree of codon bias increases with gene expression level. However, careful inspection of the plot of Nc against CAI (Figure 1) reveals that although lowly biased genes are lowly expressed, not all highly biased genes are highly expressed. We identified 138 genes that show this unusual pattern of codon usage (i.e., high codon usage bias yet low expression).
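The selection of the unusual genes can be sketched in Python as follows. The thresholds are those given in the Methodology, read with a low Nc meaning high bias and a low CAI meaning low expression. The gene names and their Nc and CAI values are hypothetical, and the Pearson coefficient is computed by hand rather than with SPSS.

```python
import math

# Thresholds from the Methodology section.
NC_HIGH_BIAS = 36.0   # Nc below this value: highly biased
NC_LOW_BIAS = 44.0    # Nc above this value: lowly biased
CAI_LOW = 0.318       # CAI below this value: lowly expressed
CAI_HIGH = 0.502      # CAI above this value: highly expressed


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def classify(nc, cai):
    """Return (bias class, expression class) for one gene."""
    if nc < NC_HIGH_BIAS:
        bias = "high"
    elif nc > NC_LOW_BIAS:
        bias = "low"
    else:
        bias = "intermediate"
    if cai < CAI_LOW:
        expression = "low"
    elif cai > CAI_HIGH:
        expression = "high"
    else:
        expression = "intermediate"
    return bias, expression


def unusual_genes(genes):
    """Names of genes that are highly biased yet lowly expressed."""
    return [name for name, nc, cai in genes
            if classify(nc, cai) == ("high", "low")]


# Hypothetical (name, Nc, CAI) records, for illustration only.
genes = [
    ("geneA", 33.1, 0.29),
    ("geneB", 50.2, 0.30),
    ("geneC", 34.0, 0.61),
    ("geneD", 47.5, 0.40),
]
print(unusual_genes(genes))
print(pearson_r([g[1] for g in genes], [g[2] for g in genes]))
```

Only geneA falls in the highly biased, lowly expressed class here; geneC is the conventional case of high bias with high expression.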
Figure 1

Variation of Effective Number of Codons (Nc) against Codon Adaptation Index (CAI).

Correspondence analysis on RSCU: Identification of translationally non-optimal codons

We performed correspondence analysis (CoA) on the set of highly and lowly expressed genes. Since codon usage is by its very nature multivariate, correspondence analysis is one of the most popular multivariate methods for studying codon usage variation [23]. Correspondence analysis identifies the major trends in the variation of the synonymous codon usage data and distributes genes along continuous axes in accordance with these trends. Correspondence analysis on relative synonymous codon usage (RSCU) detected one major trend of codon usage variation, on the first axis of inertia. The first axis accounted for 17.48% of the total variation and no other axis accounted for more than 6.97% of the total variation. As expected, the position of the genes along the first major axis is significantly correlated with the corresponding CAI values (r = -0.9648, P < 0.01). By looking at the distribution of codons along the first two major axes (Figure 2), we identified the five most preferred codons in lowly expressed genes, situated at the most extreme positions on the positive side of Axis 1. These codons are AGG, CGA, CGG, AUA, and AGA. Non-optimal codons are defined by their low usage in the genome and the low abundance of their corresponding tRNA [8,31]. Comparing the RSCU values of these five codons with those of their synonymous alternatives (Table 1 in supplementary material) shows that they are used less frequently than their synonyms. Moreover, we used tRNA gene copy number data (Table 1 in supplementary material) to assess the abundance of their corresponding tRNA. Table 1 also suggests that the five most preferred codons in lowly expressed genes have either no corresponding tRNA gene or the lowest tRNA gene copy number. Thus, these five codons can also be considered non-optimal codons.
Figure 2

The distribution of codons of all Vibrio cholerae genes along the first and second axes of the correspondence analysis.

Correspondence analysis on RSCU of low expression genes

We identified a set of 138 genes showing an unusual pattern of synonymous codon usage, i.e., they are highly biased (low Nc) but lowly expressed (low CAI). Therefore, correspondence analysis was performed on the RSCU of lowly expressed genes to analyze the differential nature of the selective constraints acting on their synonymous codon usage pattern. CoA detected a single explanatory axis of major synonymous codon usage variation. The first major axis accounted for 9.58% of the total variation in codon usage and no other axis accounted for more than 5.34% of the total variation. More importantly, the position of the genes along the first major axis is significantly correlated with Nc (r = -0.3395, P < 0.01) and gene length (r = -0.2818, P < 0.01). Thus genes placed on the positive side of Axis 1 are highly biased and shorter. We also compared the average length of the lowly biased and highly biased groups of lowly expressed genes. The average length of highly biased genes is 50.15 and the average length of lowly biased genes is 372.08. This difference in gene length between the two groups is statistically significant at P < 0.001. We also analyzed the codon distribution along the first major axis generated from CoA on the RSCU of lowly expressed genes (data not shown). The most preferred codons at the extreme of the positive side of Axis 1 are AGG, AUA, UCA, ACA, and AGA, and those at the extreme of the negative side of Axis 1 are CGC, CGG, CCG, CGU, and CUG. One interesting observation is that among the five most preferred codons on the positive side of Axis 1, three are non-optimal codons (see the section "Correspondence analysis on RSCU: Identification of translationally non-optimal codons"). On the other hand, there is no non-optimal codon among the most preferred codons on the negative side of Axis 1.
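The comparison of gene lengths between the two groups relies on Student's t-test. A minimal sketch of the pooled-variance t statistic is given below. The gene lengths are hypothetical and the function name is an assumption. Converting the statistic to a P value requires the t distribution, which a statistics package supplies and which is omitted here.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical gene lengths, for illustration only.
highly_biased = [42, 48, 51, 55, 60]
lowly_biased = [310, 355, 372, 401, 420]
t, df = student_t(highly_biased, lowly_biased)
print(t, df)
```

A large negative t with these inputs reflects that the first group is much shorter on average than the second.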

Non-optimal codon usage: Shorter gene length and greater codon biasness of lowly expressed genes

From the above results it is clear that the frequencies of non-optimal codons are greater in the highly biased group of lowly expressed genes, and that the average length of this group of genes is significantly smaller than that of the lowly biased lowly expressed genes. The presence of non-optimal codons has been postulated to reduce translation rate [10], probably due to a relative scarcity of cognate tRNA species. Considering these facts, it is reasonable to argue that selective constraints on the usage of non-optimal codons are greater in the highly biased group of lowly expressed genes than in the lowly biased group. If the synonymous codon usage pattern among the lowly expressed genes is explained by selection to reduce translation rate, is this consistent with the length effect? Several earlier studies have documented the influence of gene length on codon bias in a variety of organisms [32,33]. Powell and Moriyama [34] hypothesized that the length effect could be explained by selection for translation rate: in a short gene with 100 codons, a mutation that slows translation would increase translation time by 1%, whereas the same mutation in a gene with 1000 codons would increase translation time by only 0.1%. In the present study, among the lowly expressed genes, mutations of non-optimal codons will have a greater relative effect in smaller genes than in larger genes. Thus such mutations are more likely to be counter-selected in short genes than in long genes.
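The arithmetic behind the length effect can be checked with a short Python sketch. It assumes, for illustration only, that every codon takes one unit of time to translate and that the mutation adds a delay equal to one such unit; the function and parameter names are hypothetical.

```python
def relative_slowdown_percent(n_codons, extra_delay=1.0):
    """Percent increase in translation time when one codon change adds
    `extra_delay` time units to a gene of `n_codons` unit-time codons."""
    return 100.0 * extra_delay / n_codons


for length in (100, 1000):
    print(length, relative_slowdown_percent(length))
```

Under these assumptions the relative cost of a single codon change scales as the inverse of gene length, which reproduces the 1% and 0.1% figures quoted above.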

Conclusion

In summary, the present study focuses on the unusual trends in the synonymous codon usage pattern of lowly expressed genes of the V. cholerae genome. Selection forces governing synonymous codon usage in bacterial genes usually vary across or within genomes. One might, therefore, expect to observe species-specific and/or gene-specific trends in synonymous codon usage pattern. This study finds that selective preference for non-optimal codons in shorter lowly expressed genes has made them highly biased, and that this preference might play a greater role in translational pausing to allow correct folding of proteins. The unusual pattern of synonymous codon usage observed in a subset of lowly expressed genes of the Vibrio cholerae genome may provide a new starting point for the study of the organism's environmental and pathobiological characteristics. It will be interesting to see whether the synonymous codon usage pattern influences the expression of genes that are unique to its survival and replication during human infection [35] as well as in the environment [17,36].

References

1.  Towards a resolution on the inherent methodological weakness of the "effective number of codons used by a gene".

Authors:  T Banerjee; S K Gupta; T C Ghosh
Journal:  Biochem Biophys Res Commun       Date:  2005-05-20       Impact factor: 3.575

2.  Secretory signal sequence non-optimal codons are required for expression and export of beta-lactamase.

Authors:  Yaramah M Zalucki; Karlee L Gittins; Michael P Jennings
Journal:  Biochem Biophys Res Commun       Date:  2007-11-29       Impact factor: 3.575

3.  On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes.

Authors:  Surajit Basak; Sujata Roy; Tapash Chandra Ghosh
Journal:  FEBS Lett       Date:  2007-11-29       Impact factor: 4.124

4.  The selection-mutation-drift theory of synonymous codon usage.

Authors:  M Bulmer
Journal:  Genetics       Date:  1991-11       Impact factor: 4.562

5.  Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy?

Authors:  A Eyre-Walker
Journal:  Mol Biol Evol       Date:  1996-07       Impact factor: 16.240

6.  Regulation and temporal expression patterns of Vibrio cholerae virulence genes during infection.

Authors:  S H Lee; D L Hava; M K Waldor; A Camilli
Journal:  Cell       Date:  1999-12-10       Impact factor: 41.582

7.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1987-02-11       Impact factor: 16.971

8.  Environmental reservoir of Vibrio cholerae. The causative agent of cholera.

Authors:  R R Colwell; A Huq
Journal:  Ann N Y Acad Sci       Date:  1994-12-15       Impact factor: 5.691

9.  Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system.

Authors:  T Ikemura
Journal:  J Mol Biol       Date:  1981-09-25       Impact factor: 5.469

10.  DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae.

Authors:  J F Heidelberg; J A Eisen; W C Nelson; R A Clayton; M L Gwinn; R J Dodson; D H Haft; E K Hickey; J D Peterson; L Umayam; S R Gill; K E Nelson; T D Read; H Tettelin; D Richardson; M D Ermolaeva; J Vamathevan; S Bass; H Qin; I Dragoi; P Sellers; L McDonald; T Utterback; R D Fleishmann; W C Nierman; O White; S L Salzberg; H O Smith; R R Colwell; J J Mekalanos; J C Venter; C M Fraser
Journal:  Nature       Date:  2000-08-03       Impact factor: 49.962
