Abstract
Most genomes are heterogeneous in codon usage, so a codon usage study should start by defining the codon usage that is typical of the genome. Although this is commonly taken to be the genomewide average, we propose that the mode (the codon usage that matches the most genes) provides a more useful approximation of the typical codon usage of a genome. We provide a method for estimating the modal codon usage, which uses a continuous approximation to the number of matching genes and a simplex optimization. In a survey of bacterial and archaeal genomes, as many as 20% more of the genes in a given genome match the modal codon usage than match the average codon usage. We use the mode to examine the evolution of the multireplicon genomes of Agrobacterium tumefaciens C58 and Borrelia burgdorferi B31. In A. tumefaciens, the circular and linear chromosomes are characterized by a common "chromosome-like" codon usage, whereas both plasmids share a distinct "plasmid-like" codon usage. In B. burgdorferi, in addition to the different codon-usage biases on the leading and lagging strands of DNA replication found by McInerney (McInerney JO. 1998. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci USA. 95:10698-10703), we also detect a codon-usage similarity between linear plasmid lp38 and the leading strand of the chromosome and a high similarity among the cp32 family of plasmids.
Year: 2009 PMID: 20018979 PMCID: PMC2839124 DOI: 10.1093/molbev/msp281
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Some Examples of Genomes Where There Is a Large Difference between the Average and the Mode.
| Organism | CDS | G + C (%) | G + C3 (%) | Average (%) | Mode (%) | Difference |
| | 2525 | 55.8 | 58.4 | 45.5 | 65.2 | 19.6 |
| | 2075 | 27.4 | 12.5 | 66.5 | 83.1 | 16.6 |
| | 681 | 26.0 | 12.0 | 62.3 | 78.6 | 16.3 |
| | 1523 | 29.1 | 13.5 | 51.1 | 65.5 | 14.3 |
| | 1978 | 49.5 | 53.0 | 39.3 | 52.6 | 13.3 |
| | 2732 | 29.4 | 16.4 | 57.0 | 69.9 | 12.9 |
| | 1866 | 35.2 | 25.0 | 44.4 | 56.1 | 11.7 |
| | 2618 | 33.5 | 22.5 | 59.9 | 70.7 | 10.8 |
| | 2162 | 40.6 | 35.9 | 38.2 | 48.2 | 10.0 |
| | 1688 | 28.9 | 21.2 | 50.3 | 58.0 | 7.7 |
CDS: number of coding sequences in the genome (all replicons combined).
G + C: percentage G + C for protein-encoding genes.
G + C3: mean G + C content for nucleotides in the third codon position.
Average, Mode: percentage of genes matching the average or modal codon usage for the entire genome of that organism.
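As a rough illustration of the G + C and G + C3 columns, the following Python sketch computes both percentages for a single coding sequence. The function names `gc_percent` and `gc3_percent` and the toy sequence are assumptions made for illustration and do not come from the study. The sketch assumes the sequence is in frame, and it counts every character that is not G or C (including ambiguity codes) as non-G/C.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G or C among all nucleotides of a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_percent(cds: str) -> float:
    """Percentage of G or C at the third position of each complete codon."""
    # Slicing from index 2 in steps of 3 picks the third base of every
    # complete codon; a trailing partial codon contributes nothing.
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return 100.0 * sum(base in "GC" for base in third_positions) / len(third_positions)


# Toy in-frame coding sequence (hypothetical, for illustration only).
example_cds = "ATGGCCAAATAA"
overall = gc_percent(example_cds)
third = gc3_percent(example_cds)
```

For a genome-level figure, one option would be to concatenate or pool the coding sequences before calling these functions; whether the study pooled nucleotides or averaged per-gene values is not stated in this record.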
Percentage of Genes Matching the Modal Codon Usages of Replicons in Agrobacterium tumefaciens C58.
| Genes of Replicon | CDS | Matching Mode of Circular Chromosome | Matching Mode of Linear Chromosome | Matching Mode of pAt | Matching Mode of pTi |
| Circular chromosome | 2,765 | 62.2 | 61.4 | 34.9 | 30.0 |
| Linear chromosome | 1,851 | 61.2 | 62.3 | 32.5 | 27.6 |
| pAt | 542 | 26.8 | 30.1 | 64.4 | 61.3 |
| pTi | 197 | 20.8 | 24.4 | 55.3 | 59.9 |
CDS: number of coding sequences in the replicon.
The Distance between the Modal Codon Usages of Agrobacterium tumefaciens Replicons.
| Replicon 1 | Replicon 2 | Distance between Replicon Modes | Distance between Shuffled Replicons |
| Circular chromosome | Linear chromosome | 0.062 | 0.035 ± 0.006 |
| Circular chromosome | pAt | 0.423 | 0.049 ± 0.009 |
| Circular chromosome | pTi | 0.469 | 0.068 ± 0.009 |
| Linear chromosome | pAt | 0.390 | 0.053 ± 0.012 |
| Linear chromosome | pTi | 0.430 | 0.069 ± 0.008 |
| pAt | pTi | 0.106 | 0.067 ± 0.007 |
Distance between Shuffled Replicons: average ± standard deviation of distances between codon-usage modes of simulated replicons with a random partitioning of the combined set of genes.
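The shuffled-replicon baseline described in the note above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `shuffled_distance_null`, the vector representation of a gene, and the choice to keep the simulated replicons the same sizes as the originals are all assumptions. The default `mean_usage` summary and the Euclidean `math.dist` distance are placeholders; the study uses the modal codon usage and a distance defined in its Materials and Methods, neither of which is reproduced here.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Usage = List[float]  # hypothetical: one codon-frequency vector per gene


def mean_usage(genes: Sequence[Usage]) -> Usage:
    """Placeholder summary: component-wise mean (NOT the modal codon usage)."""
    return [statistics.fmean(column) for column in zip(*genes)]


def shuffled_distance_null(
    genes_a: Sequence[Usage],
    genes_b: Sequence[Usage],
    summarize: Callable[[Sequence[Usage]], Usage] = mean_usage,
    distance: Callable[[Usage, Usage], float] = math.dist,
    n_shuffles: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of distances between randomly partitioned gene sets."""
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a standard deviation")
    if not genes_a or not genes_b:
        raise ValueError("both gene sets must be non-empty")
    rng = random.Random(seed)
    pooled = list(genes_a) + list(genes_b)
    n_a = len(genes_a)  # assumption: simulated replicons keep the original sizes
    distances = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random partition of the combined gene set
        sim_a, sim_b = pooled[:n_a], pooled[n_a:]
        distances.append(distance(summarize(sim_a), summarize(sim_b)))
    return statistics.fmean(distances), statistics.stdev(distances)
```

The observed distance between two real replicons can then be compared with the returned mean and standard deviation, in the same way that the two distance columns of the table are read side by side.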
Figure: Gene-by-gene plot of the codon usage of Agrobacterium tumefaciens C58. From outside to inside: circular chromosome, linear chromosome, pAt, and pTi. Each wedge represents a gene. Orange genes match the codon usage of the combined chromosomes, magenta genes match the codon usage of the combined plasmids, teal genes match both the chromosomes and plasmids, and black genes match neither.
Figure: Neighbor-Joining tree of codon usage of Borrelia burgdorferi replicons containing more than 30 genes. Each tip in the tree represents the modal codon usage of a replicon, genome, or set of genes (see text). For each B. burgdorferi modal codon usage, the modal codon usage of the three most similar genomes is also included to help visualize the significant groupings. The tree is shown arbitrarily rooted at its midpoint. The reference bar represents a codon usage distance of 0.1 (Materials and Methods).