Comparison of correspondence analysis methods for synonymous codon usage in bacteria.

Haruo Suzuki¹, Celeste J Brown, Larry J Forney, Eva M Top.

Abstract

Synonymous codon usage varies both between organisms and among genes within a genome, and arises from differences in G + C content, replication strand skew, or gene expression levels. Correspondence analysis (CA) is widely used to identify major sources of variation in synonymous codon usage among genes and provides a way to identify horizontally transferred or highly expressed genes. Four CA methods have been developed: three that differ in their input data, namely absolute codon frequency, relative codon frequency, and relative synonymous codon usage (RSCU), and a fourth, within-group CA (WCA). Although different CA methods have been used in the past, no comprehensive comparative study has evaluated their effectiveness. Here, the four CA methods were evaluated by applying them to 241 bacterial genome sequences. The results indicate that WCA is more effective than the other three methods in generating axes that reflect variations in synonymous codon usage. Furthermore, WCA reveals sources of variation that were previously unnoticed in some genomes; e.g. synonymous codon usage related to replication strand skew was detected in Rickettsia prowazekii. Although CA based on RSCU is widely used, our evaluation indicates that this method does not perform as well as WCA.

Year: 2008; PMID: 18940873; PMCID: PMC2608848; DOI: 10.1093/dnares/dsn028

Source DB: PubMed; Journal: DNA Res; ISSN: 1340-2838

Introduction

Most amino acids are encoded by more than one codon, and these synonymous codons usually differ by one nucleotide in the third position. Generally, alternative synonymous codons are not used with equal frequency; their usage varies among different species, and often among genes within the same genome.[1] Three principal factors have been proposed to account for the intragenomic variation in synonymous codon usage. First, intragenomic variation in G + C content is mostly related to the existence of regions with unusual base composition, so-called genomic islands, that may be the result of recent horizontal DNA transfer.[2-4] Secondly, the excess of G over C in the leading strand of DNA replication relative to the lagging strand is observed in many bacteria, and this is thought to reflect strand-specific mutational bias.[5,6] Thirdly, genes expressed at high levels in fast-growing bacteria tend to preferentially use translationally optimal codons that are recognized by the most abundant tRNAs. This presumably reflects natural selection for synonymous codons that are translated more efficiently and accurately.[7,8] Thus, the use of synonymous codons in any gene can be the result of a mixture of these different evolutionary factors, and their relative contributions may vary among different species depending on their life history.[9-11] It follows that information on synonymous codon usage can be used to identify certain kinds of genes, e.g. those that have been horizontally transferred[12-14] or are highly expressed.[15-18] To reliably detect and quantify synonymous codon usage patterns, it is necessary to employ appropriate statistical methods. 
One such method is correspondence analysis (CA), a multivariate statistical method that can be used to summarize high-dimensional data, such as codon counts, by reducing them to a limited number of variables, called axes.[19,20] The axes retain much of the information about the variability in codon usage among genes, but in a form that makes those differences easier to interpret. This method is widely used to identify major sources of variation in synonymous codon usage among genes. A common issue in synonymous codon usage analysis is that variation in amino acid composition among proteins confounds the assessment of variation in synonymous codon usage among nucleotide sequences. Different approaches have been taken to remove such amino acid composition effects. Most commonly, CA is performed on modified codon usage data that have been adjusted for the frequency of the amino acids they encode. The resulting relative codon frequency (RF) and relative synonymous codon usage (RSCU) values are used instead of the original codon count data, which are also referred to as the absolute codon frequency (AF). However, previous studies showed that for some genomes the use of RF and RSCU to remove amino acid composition effects introduced a bias associated with the low frequency of cysteine in proteins.[21,22] To validate their findings, some researchers compared the results of CA using different input data (termed here CA-AF, CA-RF, and CA-RSCU).[21,23,24] Within-group CA (WCA) has been proposed as an alternative method to dissociate the effects of differing amino acid composition from effects directly related to synonymous codon usage.[25] This method adjusts the value for each codon by the average value of all the codons encoding the same amino acid, an adjustment that differs from those used in CA-RF and CA-RSCU. These four CA methods have all been used for studying synonymous codon usage, but it remains unclear which one is the most effective.
In spite of the lack of rigorous testing, CA-RSCU remains the most popular method.[26-37] In this paper, we have evaluated and compared four CA methods for the analysis of synonymous codon usage (CA-AF, CA-RF, CA-RSCU and WCA) by applying them to 241 bacterial genomes for which complete genome sequences were available. Our results indicate that WCA is more effective than the other three methods in generating axes corresponding to variation in synonymous codon usage.

Materials and methods

Sequences

Complete genome sequences of bacterial species in GenBank format[38] were retrieved from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria). For species with multiple sequenced strains, one representative was selected at random. An exception was made for the genomes of the following 10 strains, which were specifically selected as species representatives because they have been previously analyzed by CA: Borrelia burgdorferi B31 (B. burgdorferi B31),[21,39,40] Chlamydia trachomatis D/UW-3/CX (C. trachomatis D/UW-3/CX),[41] Clostridium perfringens 13 (C. perfringens 13),[42] Escherichia coli K12 MG1655 (E. coli K12 MG1655),[21,23,43] Haemophilus influenzae Rd KW20 (H. influenzae Rd KW20),[44] Helicobacter pylori 26695 (H. pylori 26695),[45] Mycoplasma genitalium G37 (M. genitalium G37),[21,46] Rickettsia prowazekii Madrid E (R. prowazekii Madrid E),[47] Thermotoga maritima MSB8 (T. maritima MSB8)[22] and Treponema pallidum Nichols (T. pallidum Nichols).[39] In addition, genomes were excluded when any of the genes used in the analysis (Section 2.4) were missing. The final data set included 241 genomes (see Supplementary Table S1 or S2 for a comprehensive list). All protein-coding sequences were included in the analysis, except those containing letters other than A, C, G, or T. Because methionine and tryptophan are generally encoded by a single codon, their codons were excluded. Start and stop codons were also eliminated.
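The sequence filtering described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard genetic code (bacterial translation table 11 has the same sense-codon assignments), treats the first codon of each coding sequence as the start codon, removes only a terminal stop codon, and ignores trailing bases when the length is not a multiple of three. The names `CODON_TABLE`, `ANALYZED_CODONS`, and `count_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 11 sense codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(p): aa for p, aa in zip(product(BASES, repeat=3), AMINO)}

# The 59 analyzed codons: all sense codons except Met (ATG) and Trp (TGG).
ANALYZED_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa not in "*MW")


def count_codons(cds):
    """Return {codon: count} over the 59 analyzed codons, or None if the CDS is rejected."""
    cds = cds.upper()
    if set(cds) - set("ACGT"):
        return None  # contains letters other than A, C, G, or T
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    body = codons[1:]  # drop the start codon
    if body and CODON_TABLE[body[-1]] == "*":
        body = body[:-1]  # drop the terminal stop codon
    counts = dict.fromkeys(ANALYZED_CODONS, 0)
    for codon in body:
        if codon in counts:  # skips Met, Trp and any internal stop codons
            counts[codon] += 1
    return counts
```

For example, `count_codons("ATGAAAAAGTGGTTTTAA")` counts AAA, AAG, and TTT once each; the start ATG, the tryptophan codon TGG, and the TAA stop are not counted.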

Definitions of codon usage data

We computed the original codon count data, i.e. the AF, and two kinds of modified codon usage data that are normalized for each individual amino acid. The latter are the RF, defined as the ratio of the number of occurrences of a codon to the total number of occurrences of all its synonymous codons,[21,48] and the RSCU, defined as the ratio of the observed number of occurrences of a codon to the number expected if all synonymous codons were used with equal frequency.[49] The values of AF, RF, and RSCU of the cth codon for the ath amino acid ($\mathrm{AF}_{ac}$, $\mathrm{RF}_{ac}$, and $\mathrm{RSCU}_{ac}$, respectively) were calculated as follows:

$$\mathrm{AF}_{ac} = n_{ac} \tag{1}$$

$$\mathrm{RF}_{ac} = \frac{n_{ac}}{\sum_{c'=1}^{d_a} n_{ac'}} \tag{2}$$

$$\mathrm{RSCU}_{ac} = \frac{n_{ac}}{\frac{1}{d_a}\sum_{c'=1}^{d_a} n_{ac'}} \tag{3}$$

where $n_{ac}$ is the number of occurrences of the cth codon for the ath amino acid, and $d_a$ is the degree of codon degeneracy for the ath amino acid. RF equals $1/d_a$ (e.g. 1/2 for cysteine and 1/6 for arginine) when alternative synonymous codons are used with equal frequency, and reaches its maximum value of 1 when only one of the synonymous codons is used and all others have a value of 0. RSCU equals 1 when alternative synonymous codons are used with equal frequency, and attains its maximum value of $d_a$ (e.g. 2 for cysteine and 6 for arginine) when only one of the synonymous codons is used for the amino acid.
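Equations (1) to (3) translate directly into code. The sketch below works under stated assumptions: codon counts for one gene arrive as a dictionary, synonymous codon groups come from the standard genetic code with Met and Trp removed, and RF and RSCU are set to 0 for an amino acid that does not occur in the gene (the text does not say how such cases were handled). The names `SYNONYMS` and `usage_measures` are hypothetical.

```python
from itertools import product

# Synonymous codon groups from the standard genetic code (TCAG order),
# with Met (M), Trp (W) and stop codons (*) removed: 18 amino acids, 59 codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for codon, aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    if aa not in "*MW":
        SYNONYMS.setdefault(aa, []).append(codon)


def usage_measures(counts, synonyms=SYNONYMS):
    """Return AF, RF and RSCU dicts (Equations 1-3) for one gene's codon counts."""
    af, rf, rscu = {}, {}, {}
    for aa, codons in synonyms.items():
        d = len(codons)                                # degeneracy d_a
        total = sum(counts.get(c, 0) for c in codons)  # sum of n_ac' over synonyms
        for c in codons:
            n = counts.get(c, 0)                       # n_ac
            af[c] = n                                  # Equation (1)
            # Assumption: RF and RSCU are 0 when the amino acid is absent.
            rf[c] = n / total if total else 0.0        # Equation (2)
            rscu[c] = n * d / total if total else 0.0  # Equation (3)
    return af, rf, rscu
```

With three TGT and one TGC codon, cysteine's TGT gets RF = 0.75 and RSCU = 1.5; an arginine count concentrated on a single codon gives that codon RF = 1 and RSCU = 6, matching the bounds stated above.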

Implementation of CA

CA was implemented using the ‘dudi.coa’ and ‘within’ functions in the ‘ade4’[50] library of R.[51] CA takes multivariate data and combines them into a small number of variables (axes) that explain most of the variation among the original variables.[19,21,25] In our study, the variables are the 59 codons, and the CA yields the coordinates of each gene on each new axis. We briefly explain our implementation of CA for analyzing synonymous codon usage. For each genome, the input data table is a matrix $X = [x_{ij}]$ whose N rows correspond to genes and whose 59 columns correspond to codons, so that each row holds the codon usage information for one gene. For CA-AF, CA-RF, CA-RSCU, and WCA, the cells contain AF, RF, RSCU, and AF values, respectively. We denote the sum of values for the ith gene as $x_{i+} = \sum_j x_{ij}$, the sum for the jth codon as $x_{+j} = \sum_i x_{ij}$, and the sum of all of the data in X as $x_{++} = \sum_i \sum_j x_{ij}$. The weight of the ith gene is defined as $p_{i+} = x_{i+}/x_{++}$, and that of the jth codon as $p_{+j} = x_{+j}/x_{++}$. The matrix $Y = [y_{ij}]$ has elements

$$y_{ij} = \frac{p_{ij}}{p_{i+}\,p_{+j}} - 1$$

where $p_{ij} = x_{ij}/x_{++}$ is the weight of each cell. The matrix Y for WCA is obtained by replacing each element $y_{ij}$ of the matrix Y for CA-AF by

$$y_{ij} - \frac{\sum_{j' \in a} p_{+j'}\, y_{ij'}}{\sum_{j' \in a} p_{+j'}}$$

where the sums extend over all codons $j'$ encoding amino acid a, the amino acid encoded by codon j. This subtraction centers each cell on the weighted average of the values of all codons that encode the same amino acid. In other words, the y values for WCA are the differences between the y values for CA-AF and their codon-weighted average within each amino acid. The matrix $Z = [z_{ij}]$ with elements

$$z_{ij} = \sqrt{p_{i+}}\, y_{ij}\, \sqrt{p_{+j}}$$

is submitted to singular value decomposition, producing three matrices: $Z = USV^{t}$. S is a diagonal matrix whose diagonal elements $s_k$ are the singular values, and the matrices U and V have elements $u_{ik}$ and $v_{jk}$, respectively (the superscript t denotes transposition).
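The construction of Y, the WCA centering, and the SVD of Z can be written as a plain NumPy transcription of the equations. This is a sketch, not the ade4 code used in the study: the function `ca_svd` is hypothetical, it assumes X has no all-zero rows or columns, and WCA is selected by passing one amino-acid label per codon column.

```python
import numpy as np


def ca_svd(X, groups=None):
    """Build Y and Z for CA (or WCA when `groups` is given) and return the SVD of Z.

    X      : genes x codons table (AF, RF or RSCU values); no all-zero rows/columns.
    groups : optional amino-acid label for each codon column; triggers WCA centering.
    Returns gene weights r, codon weights w, Z, and U, s, V with Z = U diag(s) V^t.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                # cell weights p_ij
    r = P.sum(axis=1)              # gene weights p_i+
    w = P.sum(axis=0)              # codon weights p_+j
    Y = P / np.outer(r, w) - 1.0   # y_ij = p_ij / (p_i+ p_+j) - 1
    if groups is not None:
        groups = np.asarray(groups)
        for a in np.unique(groups):
            cols = groups == a
            # p_+j-weighted average of y_ij over the codons of amino acid a
            avg = (Y[:, cols] * w[cols]).sum(axis=1) / w[cols].sum()
            Y[:, cols] -= avg[:, None]
    Z = np.sqrt(r)[:, None] * Y * np.sqrt(w)[None, :]  # z_ij
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return r, w, Z, U, s, Vt.T
```

A useful check on the WCA branch: after centering, each gene's weighted values sum to zero within every amino acid, so Z loses one dimension per amino acid and WCA on 59 codons for 18 amino acids has at most 41 nonzero singular values.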
The coordinates of the ith gene and the jth codon on the kth axis ($g_{ik}$ and $c_{jk}$, respectively) are calculated as follows:

$$g_{ik} = \frac{s_k\, u_{ik}}{\sqrt{p_{i+}}}, \qquad c_{jk} = \frac{s_k\, v_{jk}}{\sqrt{p_{+j}}} \tag{4}$$

The $g_{ik}$ scores are the values that are correlated with other gene features in the subsequent analyses (see Section 2.4). The contribution of the jth codon to the kth axis is given by $v_{jk}^2$, which equals $p_{+j} c_{jk}^2 / s_k^2$. Because each column of V has unit length, the contributions of all 59 codons to each axis sum to one; that is, $\sum_{j=1}^{59} v_{jk}^2 = 1$. We compared the summed contributions of the 18 codons with twofold degeneracy (those coding for asparagine, aspartic acid, cysteine, glutamic acid, glutamine, histidine, lysine, phenylalanine, and tyrosine) with the summed contributions of the 18 codons with sixfold degeneracy (those coding for arginine, leucine, and serine). Supplementary Table S1 shows the percentage of total variance explained by the first 10 axes generated by the four CA methods for the 241 bacterial genomes. Because the axes beyond the third explained only a small percentage of the variance overall, our subsequent analyses focused on the first three axes.
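Given the SVD factors, Equation (4), the codon contributions, and the twofold/sixfold comparison reduce to a few array operations. In this sketch, `axis_outputs`, `degeneracy_split`, and `CODON_AA` are hypothetical names; the codon columns are assumed to follow the standard-code TCAG ordering used to build `CODON_AA` (Met, Trp, and stop codons removed), and the percentage of variance for an axis is taken as its squared singular value divided by the sum of all squared singular values, the usual CA share of total inertia.

```python
import numpy as np

# Amino-acid label for each of the 59 analyzed codon columns (TCAG order, assumption).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_AA = [aa for aa in _AMINO if aa not in "*MW"]
TWOFOLD = set("NDCEQHKFY")  # Asn, Asp, Cys, Glu, Gln, His, Lys, Phe, Tyr
SIXFOLD = set("RLS")        # Arg, Leu, Ser


def axis_outputs(U, s, V, r, w, n_axes=3):
    """Gene scores g, codon coordinates c (Equation 4), contributions, % variance."""
    g = U[:, :n_axes] * s[:n_axes] / np.sqrt(r)[:, None]  # g_ik
    c = V[:, :n_axes] * s[:n_axes] / np.sqrt(w)[:, None]  # c_jk
    contrib = V[:, :n_axes] ** 2                          # v_jk^2, columns sum to 1
    pct_var = 100 * s**2 / np.sum(s**2)
    return g, c, contrib, pct_var


def degeneracy_split(contrib_axis, codon_aa=CODON_AA):
    """Summed contributions of the 18 twofold and the 18 sixfold codons to one axis."""
    labels = np.array(codon_aa)
    two = contrib_axis[np.isin(labels, sorted(TWOFOLD))].sum()
    six = contrib_axis[np.isin(labels, sorted(SIXFOLD))].sum()
    return two, six
```

Calling `degeneracy_split(contrib[:, 0])` gives the (twofold, sixfold) pair that places one genome on a Fig. 2 panel.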

Interpretation of axes generated by CA

To identify major sources of variation among genes on the axes generated by CA of codon usage data, we conducted two analyses that considered four commonly used features of protein-coding genes: GRAVY, GC3content, GC3skew, and Expression.[22,52] First, we tested for correlation between the scores on each of the three axes [Equation (4)] and the values of GRAVY, GC3content, or GC3skew. GRAVY (grand average of hydropathy) is the sum of the hydropathy index values of all amino acids in the protein divided by the protein length, and thus reflects amino acid composition.[53] GC3content is the relative frequency of guanine and cytosine, (G + C)/(A + T + G + C), at the third codon position in the nucleotide sequence, and GC3skew is the deviation from equal amounts of guanine and cytosine, (G − C)/(G + C), at the third codon position. Pearson’s product-moment correlation coefficient (r) between the axis scores and gene feature values was calculated. The square of r measures the proportion of variance explained; e.g. r = 0.70 indicates that 49% of the variance in the axis scores is explained by the variance in the gene feature values. For each axis, a gene feature with an absolute r value (|r|) > 0.70 was identified as the main source of variation among genes on that axis. At lower |r| thresholds, different gene features were detected on the same axis and/or the same gene feature was detected on more than one axis, making the axes difficult to interpret. Additionally, with very large sample sizes, low |r| values may be statistically significantly different from zero even though such weak correlations may have no biological meaning.
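The gene features and the |r| > 0.70 rule can be sketched as follows. Assumptions: the hydropathy index is taken to be the Kyte-Doolittle scale (the values are not listed in the text), GC3 statistics are computed over whatever codon list the caller passes, and GC3skew is set to 0 for a gene with no G or C at third positions. All function names are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index (assumed scale for GRAVY).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def gravy(protein):
    """Mean hydropathy index over the residues of a protein sequence."""
    return sum(KD[aa] for aa in protein) / len(protein)


def gc3_features(codons):
    """GC3content = (G+C)/(A+T+G+C) and GC3skew = (G-C)/(G+C) at third positions."""
    third = [codon[2] for codon in codons]
    g, c, a, t = (third.count(b) for b in "GCAT")
    gc3content = (g + c) / (a + t + g + c)
    gc3skew = (g - c) / (g + c) if g + c else 0.0  # assumption: 0 if no G/C
    return gc3content, gc3skew


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_main_source(axis_scores, feature_values, threshold=0.70):
    """Apply the |r| > 0.70 rule to one axis and one gene feature."""
    return abs(pearson_r(axis_scores, feature_values)) > threshold
```

For instance, third positions G, C, T, G give GC3content = 0.75 and GC3skew = 1/3.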
Secondly, to analyze the relationship between the scores on each of the three axes [Equation (4)] and levels of gene expression (Expression), we examined the distribution of the axis scores for 40 genes expected to be expressed constitutively at high levels.[10] This set included the genes encoding translation elongation factors Tu (tuf), Ts (tsf), and G (fus), and 37 of the larger ribosomal proteins (encoded by genes rplA-rplF, rplI-rplT, and rpsB-rpsT). On each axis, the score for each gene was standardized by subtracting the mean and dividing by the standard deviation of the scores for all protein-coding genes. Expression was identified as the main source of variation among genes on an axis when the mean absolute standard score of the 40 highly expressed genes on that axis exceeded 1.644854, the 95th percentile of the standard normal distribution (theoretically, only 5% of all protein-coding genes have standard scores above this value).
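The Expression criterion can be sketched as below. The marker list mirrors the 40 genes named above; matching them to GenBank annotations (for example `tufA`/`tufB` or `fusA` rather than `tuf` and `fus`) is assumed to have been done beforehand, and the standard deviation is the sample version, as computed by R's `sd()`, which is an assumption about the original analysis. `HIGHLY_EXPRESSED` and `expression_detected` are hypothetical names.

```python
from statistics import mean, stdev

# The 40 marker genes: tuf, tsf, fus and 37 ribosomal-protein genes.
HIGHLY_EXPRESSED = (["tuf", "tsf", "fus"]
                    + ["rpl" + x for x in "ABCDEF"]
                    + ["rpl" + x for x in "IJKLMNOPQRST"]
                    + ["rps" + x for x in "BCDEFGHIJKLMNOPQRST"])


def expression_detected(scores, markers=HIGHLY_EXPRESSED, threshold=1.644854):
    """scores: {gene name: axis score} for all protein-coding genes of one genome."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)  # assumption: sample standard deviation
    z = [abs(scores[g] - mu) / sd for g in markers if g in scores]
    if not z:
        raise ValueError("none of the marker genes were found")
    return mean(z) > threshold
```

The function is applied once per axis, with that axis's scores for every protein-coding gene.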

Results and discussion

Performance of different CA methods

CA summarizes high-dimensional data, such as codon counts, by reducing them to a limited number of variables (axes). We tested the ability of the four CA methods, CA-AF, CA-RF, CA-RSCU, and WCA, to generate axes that correspond to variation in synonymous codon usage. We considered two commonly used gene features: GC3content, the G + C content at the third codon position, and GC3skew, which reflects the excess of G over C at the third codon position. We investigated how often these two gene features were correlated with one of the first three axes in the 241 bacterial genomes (Table 1). To illustrate our method, Fig. 1 shows scatter plots of the axis 1 scores obtained by the four methods, plotted against GC3skew for R. prowazekii Madrid E genes. At the threshold |r| value of 0.70, GC3skew values were significantly correlated with the axis 1 scores of WCA (|r| = 0.84), but not with those of CA-AF (|r| = 0.46), CA-RF (|r| = 0.32), or CA-RSCU (|r| = 0.04). Thus, in R. prowazekii Madrid E, GC3skew was detected on axis 1 of WCA, but not on axis 1 of CA-AF, CA-RF, or CA-RSCU. GC3content was detected in 191 genomes when WCA was used, more than when CA-AF (150), CA-RF (143), or CA-RSCU (145) was used (Table 1A). Likewise, the total number of genomes in which GC3skew was detected was greater with WCA (108) than with CA-AF (46), CA-RF (30), or CA-RSCU (53) (Table 1B). Thus, WCA detected GC3content and GC3skew more often than CA-AF, CA-RF, and CA-RSCU.

Table 1

Numbers of genomes where the gene feature GC3content, GC3skew, or GRAVY was significantly correlated with one of three axes generated by different CA methods, CA-AF, CA-RF, CA-RSCU, and WCA, in 241 bacterial genomes

Method	Axis 1	Axis 2	Axis 3
A. GC3content
CA-AF	121	17	12
CA-RF	129	9	5
CA-RSCU	134	11	0
WCA	150	34	7
B. GC3skew
CA-AF	26	7	13
CA-RF	26	4	0
CA-RSCU	25	25	3
WCA	38	57	13
C. GRAVY
CA-AF	20	69	55
CA-RF	0	0	0
CA-RSCU	0	0	0
WCA	0	0	0
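The genome totals quoted in the text are the row sums of Table 1, as the short check below shows. Treating a row sum as a genome count assumes that each feature was detected on at most one of the three axes in any genome, which is consistent with the totals reported in the text. `TABLE_1` and `totals` are illustrative names.

```python
# Table 1 counts: genomes per axis (1, 2, 3) for each gene feature and CA method.
TABLE_1 = {
    "GC3content": {"CA-AF": (121, 17, 12), "CA-RF": (129, 9, 5),
                   "CA-RSCU": (134, 11, 0), "WCA": (150, 34, 7)},
    "GC3skew": {"CA-AF": (26, 7, 13), "CA-RF": (26, 4, 0),
                "CA-RSCU": (25, 25, 3), "WCA": (38, 57, 13)},
    "GRAVY": {"CA-AF": (20, 69, 55), "CA-RF": (0, 0, 0),
              "CA-RSCU": (0, 0, 0), "WCA": (0, 0, 0)},
}

# Sum over the three axes for every feature and method.
totals = {feature: {method: sum(axes) for method, axes in rows.items()}
          for feature, rows in TABLE_1.items()}
```

The sums reproduce the figures in the text: GC3content 191 (WCA) versus 150, 143, and 145; GC3skew 108 (WCA) versus 46, 30, and 53; and GRAVY 144 for CA-AF.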

Figure 1

Scatter plot showing axis 1 scores obtained by different CA methods, CA-AF (A), CA-RF (B), CA-RSCU (C), and WCA (D), plotted against GC3skew for R. prowazekii Madrid E genes. Each point represents a gene.

It is important to note that these results remained similar when all complete bacterial genome sequences available from the NCBI repository in August 2008 were included (data not shown). Similar results were obtained when only long sequences (>300 codons) were used (data not shown). We also verified that the results were consistent at detection thresholds below |r| = 0.70 (data not shown). Thus we conclude that WCA is more effective than the other three methods in generating axes that correspond to variation in synonymous codon usage, across all the data sets and detection thresholds we examined. WCA may have performed best because it removes the effects of amino acid composition without introducing effects associated with codon degeneracy. CA-AF may have performed worse because it is confounded by amino acid composition. CA-RF and CA-RSCU did not perform as well as WCA, possibly because their input data depend on the degree of codon degeneracy, which differs among amino acids [d in Equations (2) and (3) in Section 2.2].[54] We demonstrate these effects on the four CA methods in the next section.

Effect of amino acid composition and codon degeneracy in different CA methods

To determine the effect of amino acid composition, we tested the ability of the four CA methods, CA-AF, CA-RF, CA-RSCU, and WCA, to generate axes that correspond to variation in amino acid composition. The protein feature GRAVY, which represents the global hydrophobicity of proteins, can be used to measure the variation in amino acid composition among proteins.[55] We investigated how often GRAVY was correlated with one of the first three axes in the 241 bacterial genomes. CA-AF detected a correlation between GRAVY and one of the first three axes in 144 genomes, whereas CA-RF, CA-RSCU, and WCA detected none (Table 1C). This result suggests that CA-AF can generate axes corresponding to variation in amino acid composition as well as in synonymous codon usage, whereas CA-RF, CA-RSCU, and WCA did not generate such axes in any genome because they compensate for differences in amino acid composition. However, the use of RF and RSCU to remove the confounding effects of amino acid composition introduces other effects associated with the degree of codon degeneracy, which may be pronounced for rare amino acids. To determine the effect of differences in the degree of codon degeneracy between amino acids, we compared, for each of the four CA methods, the summed contribution to axis 1 of the 18 codons for the nine amino acids with low (twofold) degeneracy with that of the 18 codons for the three amino acids with high (sixfold) degeneracy. Fig. 2 shows scatter plots of the contribution of twofold degenerate codons (y-axis) plotted against that of sixfold degenerate codons (x-axis) for the 241 bacterial genomes. The scatter plots for CA-AF and WCA (Fig. 2A and D) displayed genome distributions less biased toward twofold or sixfold degenerate codons than the scatter plots for CA-RF and CA-RSCU (Fig. 2B and C). For CA-RF, 208 (86%) of the 241 genomes fell above the line y = x, indicating that twofold degenerate codons contributed more to the axis than sixfold degenerate codons in most genomes (Fig. 2B).
For CA-RSCU, 238 (99%) of the 241 genomes were below the line y = x, indicating that sixfold degenerate codons contributed more to the axis than twofold degenerate codons in most genomes (Fig. 2C). Thus, CA-RF and CA-RSCU tend to generate axes corresponding to variation in low (twofold) and high (sixfold) degenerate codons, respectively. This observation can be explained by the dependence of their input data on the degree of codon degeneracy [d in Equations (2) and (3) in Section 2.2]. Thus, the use of RF and RSCU to remove effects of amino acid usage introduces other effects associated with the degree of codon degeneracy, whereas WCA does not. In spite of these shortcomings, these methods, in particular CA-RSCU, are still frequently used.[26-37] We recommend using WCA for analyzing synonymous codon usage.
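The counts behind Fig. 2 amount to comparing each genome's two summed contributions and rounding the resulting fractions. In this sketch the names `diagonal_split` and `percent` are hypothetical; x is the sixfold and y the twofold contribution for one genome, as on the axes of Fig. 2.

```python
def diagonal_split(points):
    """Count genomes above and below the line y = x.

    points: (x, y) pairs, one per genome, where x is the summed contribution of the
    18 sixfold degenerate codons and y that of the 18 twofold degenerate codons.
    """
    above = sum(1 for x, y in points if y > x)  # twofold codons contribute more
    below = sum(1 for x, y in points if y < x)  # sixfold codons contribute more
    return above, below


def percent(count, total):
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


print(percent(208, 241), percent(238, 241))  # prints: 86 99
```

Genomes lying exactly on y = x are counted in neither group.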

Figure 2

Contributions of twofold and sixfold degenerate codons to axis 1, obtained by different CA methods, CA-AF (A), CA-RF (B), CA-RSCU (C), and WCA (D), for 241 bacterial genomes. Each point represents a genome.

Sources of intragenomic variation in synonymous codon usage among genes

We applied WCA to the genomes of 241 bacterial species to identify major sources of intragenomic variation in synonymous codon usage among genes. In addition to the two gene features described earlier (GC3content and GC3skew), gene expression level (Expression) was also considered. In 57 genomes, WCA detected one of the three gene features (GC3content, GC3skew, or Expression) on axis 1 but none of the features on axes 2 and 3 (Supplementary Table S2). In 97 other genomes, WCA detected two of the three gene features on axes 1 and 2 but none on axis 3. All three features were detected on the first three axes in 40 genomes, and in only nine genomes were no gene features detected on the first three axes. These results demonstrate that all three gene features can contribute to intragenomic variation in synonymous codon usage among genes, and that their relative contributions vary among genomes. In some cases, however, CA of codon usage data generated axes on which no gene feature was detected. There are three possible explanations for this observation. First, the axis may be moderately correlated with one of the gene features considered here, but the correlation may not reach the detection threshold. For example, in Shewanella putrefaciens CN-32, the |r| value between axis 1 of WCA and GC3content (0.68) was just below the threshold of 0.70. Secondly, although the axis was not correlated with any of the gene features considered here, it may be correlated with other relevant gene features that can be determined computationally or experimentally, e.g. protein abundance[56] and mRNA half-life.[57] Thirdly, variation among genes on the axis may have no biological meaning, even if the axis accounts for the largest fraction of the total variation among genes. These possibilities should be kept in mind when interpreting the axes generated by CA of codon usage data.
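The per-axis labels reported in Table 2 follow from combining the two detection rules of Section 2.4. The sketch below is a hypothetical helper, not the authors' code: it takes the |r| values of GC3content and GC3skew on each axis plus the mean absolute standard score of the 40 marker genes on each axis, and returns the detected features or "nd".

```python
def label_axes(r_by_feature, expr_mean_abs_z, r_threshold=0.70, z_threshold=1.644854):
    """Return the detected gene features for each axis, or ["nd"] if none.

    r_by_feature    : {feature name: [r on axis 1, r on axis 2, ...]}
    expr_mean_abs_z : mean |standard score| of the 40 marker genes, one per axis
    """
    labels = []
    for k, z in enumerate(expr_mean_abs_z):
        found = [f for f, rs in r_by_feature.items() if abs(rs[k]) > r_threshold]
        if z > z_threshold:
            found.append("Expression")
        labels.append(found or ["nd"])
    return labels
```

For instance, a GC3skew |r| of 0.84 on axis 1 (as for R. prowazekii Madrid E) yields the label GC3skew, whereas a GC3content |r| of 0.68 (as for S. putrefaciens CN-32) falls below the threshold and, without another feature, yields "nd".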
For the 10 genomes in our study that were previously analyzed by CA (Table 2), we compared our findings with previous conclusions. First, GC3content was detected as a primary source of synonymous codon usage variation among genes in E. coli K12 MG1655, M. genitalium G37, T. maritima MSB8, and H. pylori 26695. G + C content had previously been detected in the first three of these genomes (the previous analysis of H. pylori is not directly comparable). Intragenomic variation in G + C content mostly reflects the existence of regions with anomalous nucleotide composition, putatively acquired by horizontal transfer.[2] The exception is M. genitalium, in which intragenomic G + C variation is continuous along the genome.[58] Thus, if the WCA axis clearly separates anomalous gene clusters from other genes, the axis scores can be used to predict genes that have been recently transferred.

Table 2

Gene features that are significantly correlated with one of three axes generated by WCA in 10 bacterial genomes

Bacterial strain	Axis 1	Axis 2	Axis 3	References^a
B. burgdorferi B31	GC3skew	nd^b	nd	[21,39,40]
C. trachomatis D/UW-3/CX	GC3skew	Expression	nd	[41]
C. perfringens 13	Expression	nd	nd	[42]
E. coli K12 MG1655	GC3content	Expression	nd	[21,23,43]
H. influenzae Rd KW20	Expression	GC3content	nd	[44]
H. pylori 26695	GC3content	GC3skew	nd	[45]
M. genitalium G37	GC3content	nd	nd	[21,46]
R. prowazekii Madrid E	GC3skew	GC3content	nd	[47]
T. maritima MSB8	GC3content	nd	nd	[22]
T. pallidum Nichols	GC3skew	GC3content	nd	[39]

^a Previous studies, whose results do not necessarily agree with those shown here. See Section 3.3 for conflicts.

^b nd, none of the gene features considered here were detected.

The second feature, GC3skew, was detected as a primary source of synonymous codon usage variation among genes in B. burgdorferi B31, C. trachomatis D/UW-3/CX, R. prowazekii Madrid E, and T. pallidum Nichols (Table 2 and Fig. 1). Intragenomic variation in GC3skew presumably reflects differences in mutational bias between the leading and lagging strands of replication.[5,6] This mutational bias had previously been detected in each of these genomes except R. prowazekii.[47] Thus, in genomes where GC3skew is detected on axis 1 of WCA, the axis scores can be used to predict whether a gene is located on the leading or the lagging strand. The third feature, Expression, was detected as a major source of synonymous codon usage variation among genes in C. trachomatis D/UW-3/CX, C. perfringens 13, E. coli K12 MG1655, and H. influenzae Rd KW20, which is consistent with previous findings (Table 2). The relative contribution of Expression varies among genomes; e.g. Expression is a primary source in H. influenzae, but a secondary source in E. coli. The anomalous codon usage of highly expressed genes presumably reflects natural selection for optimal codons that are translated more efficiently and accurately, so-called translational selection.[7,8] In B. burgdorferi and M. genitalium, conflicting conclusions regarding the presence or absence of translational selection on synonymous codon usage have been reported.[21] In the present analysis, Expression was not detected in these two genomes, providing no evidence for translational selection.
This is in agreement with conclusions drawn using a different statistical method.[10] Thus, in genomes where Expression is detected by WCA, the axis scores can be used to predict gene expression levels, and these predictions can be compared with experimental expression data obtained by DNA microarrays (transcriptomes) and 2D gel electrophoresis (proteomes).

Conclusion

Of the four CA methods, WCA was found to be the most useful for the analysis of synonymous codon usage. Using WCA, it may be possible to find new factors that explain variation in synonymous codon usage among genes, and to improve the accuracy of identifying genes that have been horizontally transferred or are highly expressed.

Availability

All analyses are implemented using G-language Genome Analysis Environment version 1.8.3,[59,60] available at http://www.g-language.org/.

Supplementary Data

Supplementary data are available online at www.dnaresearch.oxfordjournals.org.

Funding

This project was supported by the Microbial Genome Sequencing Program of the National Science Foundation (EF-0627988), and by the National Institutes of Health grant R01 GM073821 from the National Institute of General Medical Sciences, and COBRE and INBRE grants P20RR016454 and P20RR16448 from the National Center for Research Resources, National Institutes of Health.

56 in total

1. E Lerat, C Biémont, P Capy. Codon usage and the origin of P elements. Mol Biol Evol. 2000-03.

2. H Ochman, J G Lawrence, E A Groisman. Lateral gene transfer and the nature of bacterial innovation [Review]. Nature. 2000-05-18.

3. S Karlin. Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes [Review]. Trends Microbiol. 2001-07.

4. K Fellenberg, N C Hauser, B Brors, A Neutzner, J D Hoheisel, M Vingron. Correspondence analysis applied to microarray data. Proc Natl Acad Sci U S A. 2001-09-04.

5. Russell J Grocock, Paul M Sharp. Synonymous codon usage in Pseudomonas aeruginosa PA01. Gene. 2002-05-01.

6. L B Koski, R A Morton, G B Golding. Codon bias and base composition are poor indicators of horizontally transferred genes. Mol Biol Evol. 2001-03.

7. H Romero, A Zavala, H Musto. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res. 2000-05-15.

8. S Karlin, J Mrázek, A Campbell, D Kaiser. Characterizations of highly expressed genes of four fast-growing bacteria. J Bacteriol. 2001-09.

9. Alejandro Zavala, Hugo Naya, Héctor Romero, Héctor Musto. Trends in codon and amino acid usage in Thermotoga maritima. J Mol Evol. 2002-05.

10. Sheng Zhao, Qin Zhang, Xiaolin Liu, Xuemin Wang, Huilin Zhang, Yan Wu, Fei Jiang. Analysis of synonymous codon usage in 11 human bocavirus isolates. Biosystems. 2008-02-21.
