Patrick C Y Woo, Beatrice H L Wong, Yi Huang, Susanna K P Lau, Kwok-Yung Yuen.
Abstract
Using the complete genome sequences of 19 coronaviruses, we analyzed codon usage bias, dinucleotide relative abundance and cytosine deamination in coronavirus genomes. Of the eight codons that contain CpG, six were markedly suppressed. The mean NNU/NNC ratio of the six amino acids that use either NNC or NNU as their codon is 3.262, suggesting cytosine deamination. Among the 16 dinucleotides, CpG was the most markedly suppressed (mean relative abundance 0.509). No correlation was observed between CpG abundance and the mean NNU/NNC ratio. Among the 19 coronaviruses, CoV-HKU1 showed the most extreme codon usage bias and an extremely high NNU/NNC ratio of 8.835. Cytosine deamination and the selection of CpG-suppressed clones by the immune system are the two major independent biochemical and biological selective forces that shape codon usage bias in coronavirus genomes. The mechanism underlying the extreme codon usage bias, cytosine deamination and low G+C content in CoV-HKU1 warrants further study.
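The NNU/NNC ratio compares U-ending and C-ending synonymous codons for the six amino acids encoded only by such pairs: Phe, Tyr, His, Asn, Asp and Cys. A ratio above 1 means the U-ending codon is used more often. That pattern is consistent with C-to-U deamination, though it does not prove it.

The abstract does not say how the "mean" is formed. The minimal sketch below therefore assumes one particular scheme: compute one count ratio per amino acid, then average the ratios. It also assumes a single in-frame coding sequence. The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical helper names. The six amino acids encoded only by NNU/NNC codons
# (two-fold degenerate, pyrimidine-ending) in the standard genetic code.
NNU_NNC_PAIRS = {
    "Phe": ("UUU", "UUC"),
    "Tyr": ("UAU", "UAC"),
    "His": ("CAU", "CAC"),
    "Asn": ("AAU", "AAC"),
    "Asp": ("GAU", "GAC"),
    "Cys": ("UGU", "UGC"),
}


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    # Step three bases at a time; a trailing partial codon is ignored.
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def mean_nnu_nnc_ratio(cds: str) -> float:
    """Average of per-amino-acid NNU/NNC count ratios (assumed averaging scheme)."""
    counts = codon_counts(cds)
    ratios = [
        counts[nnu] / counts[nnc]
        for nnu, nnc in NNU_NNC_PAIRS.values()
        if counts[nnc] > 0  # skip amino acids whose NNC codon never occurs
    ]
    if not ratios:
        raise ValueError("no NNC codons of the six amino acids were found")
    return sum(ratios) / len(ratios)


# Toy sequence UUU UUU UUC UAU UAC: Phe ratio 2/1, Tyr ratio 1/1.
print(mean_nnu_nnc_ratio("UUUUUUUUCUAUUAC"))  # prints 1.5
```

A real genome would be split into its annotated open reading frames first. Pooling the counts across all six amino acids before dividing is an equally plausible alternative, and it can give a different value.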
Year: 2007 PMID: 17881030 PMCID: PMC7103290 DOI: 10.1016/j.virol.2007.08.010
Source DB: PubMed Journal: Virology ISSN: 0042-6822 Impact factor: 3.616
Coronavirus genomes used in the present study
| Coronavirus | Host | Genome size (bases) | G + C content (%) | GC skew | G (%) | A (%) | U (%) | C (%) | Nc (effective number of codons) |
|---|---|---|---|---|---|---|---|---|---|
| Group 1a | | | | | | | | | |
| TGEV | Pig | 28,586 | 37.5 | 0.097 | 20.6 | 29.5 | 32.9 | 17.0 | 44.737 |
| FIPV | Cat | 29,355 | 38.1 | 0.102 | 21.0 | 29.2 | 32.7 | 17.1 | 46.150 |
| PRCV | Pig | 27,550 | 37.4 | 0.107 | 20.7 | 29.3 | 33.2 | 16.7 | 44.406 |
| Group 1b | | | | | | | | | |
| HCoV-229E | Human | 27,317 | 38.2 | 0.129 | 21.6 | 27.2 | 34.6 | 16.7 | 44.281 |
| HCoV-NL63 | Human | 27,553 | 34.4 | 0.161 | 20.0 | 26.3 | 39.2 | 14.4 | 37.275 |
| PEDV | Pig | 28,033 | 42.0 | 0.086 | 22.8 | 24.7 | 33.2 | 19.2 | 48.424 |
| BtCoV | Bat | 28,203 | 40.1 | 0.102 | 22.1 | 26.2 | 33.7 | 18.0 | 46.905 |
| Bat-CoV HKU2 | Bat | 27,164 | 38.9 | 0.140 | 22.2 | 24.9 | 35.1 | 16.8 | 43.342 |
| Group 2a | | | | | | | | | |
| HCoV-OC43 | Human | 30,738 | 36.8 | 0.176 | 21.7 | 27.6 | 35.6 | 15.2 | 43.791 |
| CoV-HKU1 | Human | 29,926 | 32.0 | 0.188 | 19.0 | 27.8 | 40.1 | 13.0 | 35.671 |
| BCoV | Cattle | 31,028 | 37.1 | 0.174 | 21.8 | 27.4 | 35.5 | 15.3 | 43.856 |
| PHEV | Pig | 30,480 | 37.2 | 0.164 | 21.7 | 27.3 | 35.4 | 15.6 | 44.380 |
| MHV | Mouse | 31,357 | 41.7 | 0.142 | 23.9 | 26.0 | 32.3 | 17.9 | 51.237 |
| Group 2b | | | | | | | | | |
| SARS-CoV | Human | 29,751 | 40.7 | 0.020 | 20.8 | 28.5 | 30.7 | 20.0 | 49.423 |
| Bat-SARS-CoV HKU3 | Bat | 29,728 | 41.1 | 0.027 | 21.1 | 28.4 | 30.5 | 20.0 | 49.882 |
| Group 2c | | | | | | | | | |
| Bat-CoV HKU4 | Bat | 30,286 | 37.8 | 0.093 | 20.7 | 27.6 | 34.6 | 17.1 | 44.585 |
| Bat-CoV HKU5 | Bat | 30,488 | 42.9 | 0.004 | 21.6 | 26.6 | 30.4 | 21.4 | 53.230 |
| Group 2d | | | | | | | | | |
| Bat-CoV HKU9 | Bat | 29,114 | 41.0 | 0.138 | 23.3 | 25.3 | 33.7 | 17.7 | 46.162 |
| Group 3 | | | | | | | | | |
| IBV | Chicken | 27,608 | 37.9 | 0.144 | 21.7 | 28.9 | 33.2 | 16.2 | 45.777 |
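The GC skew column agrees with the common definition GC skew = (G − C)/(G + C), applied to the G and C frequencies in the same row. For CoV-HKU1, for example, (19.0 − 13.0)/(19.0 + 13.0) = 0.1875 ≈ 0.188.

The sketch below recomputes both columns. It assumes that definition and uses hypothetical helper names. Working from the rounded one-decimal frequencies can shift the last digit for a few genomes; TGEV, for instance, gives 37.6% rather than 37.5%. This is presumably because the table was computed from unrounded counts.

```python
def gc_content_and_skew(g: float, c: float) -> tuple[float, float]:
    """Return (G + C content, GC skew) from G and C frequencies given in percent."""
    gc = g + c
    skew = (g - c) / gc  # positive when G outnumbers C
    return gc, skew


def base_frequencies(seq: str) -> dict[str, float]:
    """Mononucleotide frequencies (%) of G, A, U and C in a genome sequence."""
    seq = seq.upper().replace("T", "U")
    return {b: 100 * seq.count(b) / len(seq) for b in "GAUC"}


# G and C frequencies (%) copied from the table above.
examples = {"FIPV": (21.0, 17.1), "CoV-HKU1": (19.0, 13.0)}
for name, (g, c) in examples.items():
    gc, skew = gc_content_and_skew(g, c)
    print(f"{name}: G + C = {gc:.1f}%, GC skew = {skew:.3f}")
```

For a full genome, feed the output of `base_frequencies` into `gc_content_and_skew`.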
Codon usage fractions in coronaviruses
ᵃ Codons containing CpG are shown in red, and codons of amino acids that use either NNC or NNU as the codon are shown in green. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)
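The colour coding depends only on the genetic code, so the two highlighted codon sets can be rebuilt directly. This minimal sketch uses hypothetical constant names. It lists:

- the codons that contain CpG at positions 1-2 or 2-3 (the eight codons mentioned in the abstract);
- the 12 NNC/NNU codons of Phe, Tyr, His, Asn, Asp and Cys.

CpG dinucleotides that span two adjacent codons (position 3 of one codon to position 1 of the next) are not captured by a codon-level list like this.

```python
from itertools import product

# All 64 codons in the RNA alphabet.
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]

# Codons with a CpG at positions 1-2 or 2-3 (the "red" codons).
CPG_CODONS = sorted(c for c in CODONS if "CG" in c)

# Codons of the six amino acids encoded only by NNC/NNU (the "green" codons):
# Phe (UU), Tyr (UA), His (CA), Asn (AA), Asp (GA) and Cys (UG).
NNY_CODONS = sorted(
    c for c in CODONS
    if c[:2] in {"UU", "UA", "CA", "AA", "GA", "UG"} and c[2] in "UC"
)

print(CPG_CODONS)
print(NNY_CODONS)
```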
Codon usage fractions in different hosts of coronaviruses
ᵃ Codons containing CpG are shown in red, and codons of amino acids that use either NNC or NNU as the codon are shown in green. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)
Relative abundance of the 16 dinucleotides in the 19 coronavirus species with complete genomes available
ᵃ Values > 1.23 (over-represented) and < 0.78 (under-represented) are shown in red and green, respectively. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)
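Dinucleotide relative abundance is usually defined as the odds ratio ρ_XY = f_XY / (f_X · f_Y). Here f_XY is the frequency of dinucleotide XY, and f_X and f_Y are the frequencies of its component bases. Values near 1 indicate no preference. The 0.78 and 1.23 cut-offs in the legend flag under- and over-representation.

The sketch below makes two assumptions. First, it counts overlapping dinucleotides on the single genomic strand, whereas Karlin's original formulation symmetrizes over both DNA strands. Second, it is not necessarily the exact procedure used for this table. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq: str) -> dict[str, float]:
    """rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides on a single strand."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))  # overlapping pairs
    rho = {}
    for x, y in product("GAUC", repeat=2):
        f_x, f_y = mono[x] / n, mono[y] / n
        f_xy = di[x + y] / (n - 1)
        # Undefined (NaN) when either base is absent from the sequence.
        rho[x + y] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return rho


def classify(rho: float, low: float = 0.78, high: float = 1.23) -> str:
    """Apply the cut-offs used in the table legend."""
    if math.isnan(rho):
        return "undefined"
    if rho < low:
        return "under-represented"
    if rho > high:
        return "over-represented"
    return "within expected range"


rho = dinucleotide_relative_abundance("AUAUAUAU")
print(rho["AU"], classify(rho["AU"]))  # AU pairs are enriched in this toy sequence
print(classify(0.509))  # mean CpG value reported in the abstract
```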
Fig. 1. Correlation between CpG dinucleotide abundance and NNU/NNC ratio in the 19 coronavirus genomes.
Fig. 2. Mean frequencies of the 64 trinucleotides in the 19 coronavirus genomes. The dots and bars represent the mean frequencies and the 95% confidence intervals of the trinucleotides. The dotted line marks the frequency expected for each trinucleotide (1/64 = 0.015625) if bases were distributed randomly with equal frequencies. CpG-containing trinucleotides are shown in red.
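The 1/64 reference line assumes the four bases are equally frequent and independent. A trinucleotide's frequency can therefore be compared with that line directly, or expressed as an observed-to-expected ratio.

The caption does not state how trinucleotides were counted. This sketch assumes overlapping windows across the whole genome, in all frames. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

EXPECTED_UNIFORM = 1 / 64  # 0.015625, the dotted line in the figure


def trinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Frequencies of all 64 trinucleotides from overlapping windows (assumed scheme)."""
    seq = seq.upper().replace("T", "U")
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    counts = Counter(windows)
    total = len(windows)
    return {"".join(t): counts["".join(t)] / total for t in product("GAUC", repeat=3)}


def cpg_trinucleotides() -> list[str]:
    """The eight CpG-containing trinucleotides highlighted in red."""
    return sorted(t for t in ("".join(p) for p in product("GAUC", repeat=3)) if "CG" in t)


freqs = trinucleotide_frequencies("ACGUUUACGU")
for tri in cpg_trinucleotides():
    # Observed frequency and its ratio to the uniform expectation.
    print(tri, round(freqs[tri], 3), freqs[tri] / EXPECTED_UNIFORM)
```

Because coronavirus genomes are U-rich and C-poor (see the genome table), some departure from 1/64 is expected even without any CpG-specific suppression.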
Fig. 3. Correlations among mononucleotide frequencies in the 19 coronavirus genomes. The symbols for the coronaviruses are the same as in Fig. 1.