Wanjun Gu, Tong Zhou, Jianmin Ma, Xiao Sun, Zuhong Lu.
Abstract
In this study, we calculated the codon usage bias in severe acute respiratory syndrome Coronavirus (SARSCoV) and performed a comparative analysis of synonymous codon usage patterns in SARSCoV and 10 other evolutionarily related viruses in the order Nidovirales. Although codon usage bias varies significantly among SARSCoV genes, the overall bias in SARSCoV is slight and is mainly determined by the base composition at the third codon position. By comparing synonymous codon usage patterns in different viruses, we observed that the synonymous codon usage pattern in these virus genes was virus specific and phylogenetically conserved, but not host specific. Phylogenetic analysis based on codon usage pattern suggested that SARSCoV diverged far from all three known groups of Coronavirus. Compositional constraints could explain most of the variation in synonymous codon usage among these virus genes, while gene function is also correlated with synonymous codon usage to a certain extent. However, translational selection and gene length have no effect on the variation in synonymous codon usage in these virus genes.
Year: 2004 PMID: 15041183 PMCID: PMC7127446 DOI: 10.1016/j.virusres.2004.01.006
Source DB: PubMed Journal: Virus Res ISSN: 0168-1702 Impact factor: 3.303
Identified ORFs (length > 150 bp) in the SARSCoV (TOR2 isolate) genome [a, b]
| Gene product | L (bp) | ENC | GC3S (%) | f1′ |
| Putative orf1ab polyprotein | 21222 | 48.47 | 32.20 | −0.60 |
| Orf1a polyprotein | 13149 | 48.24 | 33.10 | −0.57 |
| Putative spike glycoprotein | 3468 | 45.73 | 28.30 | −0.85 |
| Putative uncharacterized protein | 825 | 47.66 | 34.50 | −0.37 |
| Putative uncharacterized protein | 465 | 42.80 | 45.10 | 1.34 |
| Putative small envelope protein E | 231 | 59.06 | 38.70 | 0.34 |
| Putative protein M | 666 | 59.04 | 42.50 | 0.51 |
| Putative uncharacterized protein | 192 | 42.19 | 28.80 | −1.08 |
| Putative uncharacterized protein | 269 | 43.05 | 30.60 | −0.55 |
| Putative nucleocapsid protein | 1269 | 54.16 | 37.60 | 0.49 |
| Putative uncharacterized protein | 297 | 46.62 | 58.10 | 1.87 |
[a] L represents the length of the identified ORF.
[b] f1′ represents the first axis value of each gene in CA.
Phylogenetic breakdown, accession number, GC3S and the first two axis values in CA of 11 selected viruses in order Nidovirales [a, b]
| Organism | Accession number | GC3S (%) | f1′ | f2′ |
| HCoV 229E | NC_002645 | 30.89 | −0.84 | −0.16 |
| PEDV | NC_003436 | 37.32 | −0.04 | 0.42 |
| TGV | NC_002306 | 27.02 | −0.99 | −0.08 |
| BCoV | NC_003045 | 29.43 | −0.75 | 0.48 |
| MHV | NC_001846 | 38.30 | −0.16 | 0.27 |
| AIBV | NC_001451 | 26.09 | −0.90 | −1.30 |
| SARSCoV | NC_004718 | 37.23 | 0.05 | 0.36 |
| EAV | NC_002532 | 47.28 | 0.80 | 0.47 |
| LDEV | NC_002534 | 45.18 | 0.53 | 0.43 |
| PRRSV | NC_001961 | 53.76 | 1.31 | 0.55 |
| SHFV | NC_003092 | 48.43 | 1.09 | −0.14 |
[a] Organism abbreviations: HCoV 229E, human Coronavirus 229E; PEDV, porcine epidemic diarrhea virus; TGV, transmissible gastroenteritis virus; BCoV, bovine Coronavirus; MHV, murine hepatitis virus; AIBV, avian infectious bronchitis virus; SARSCoV, SARS Coronavirus; EAV, equine arteritis virus; LDEV, lactate dehydrogenase elevating virus; PRRSV, porcine reproductive and respiratory syndrome virus; SHFV, simian hemorrhagic fever virus.
[b] f1′ and f2′, respectively, represent the first axis mean value and the second axis mean value in CA of each genome.
Synonymous codon usage in SARSCoV [a, b, c]
| AA | Codon | RSCU | N | AA | Codon | RSCU | N |
| Ala | **GCU** | 2.08 | 531 | Ile | **AUU** | 1.72 | 410 |
| | GCC | 0.58 | 147 | | AUC | 0.67 | 159 |
| | GCA | 1.13 | 288 | | AUA | 0.62 | 148 |
| | GCG | 0.22 | 55 | Cys | **UGU** | 1.27 | 280 |
| Gly | GGG | 0.17 | 37 | | UGC | 0.73 | 160 |
| | GGA | 0.85 | 182 | Thr | **ACU** | 1.66 | 427 |
| | GGC | 0.95 | 202 | | ACC | 0.59 | 153 |
| | **GGU** | 2.02 | 431 | | ACG | 0.18 | 46 |
| Val | **GUU** | 1.71 | 479 | | ACA | 1.57 | 406 |
| | GUC | 0.67 | 188 | Asn | **AAU** | 1.24 | 449 |
| | GUA | 0.83 | 232 | | AAC | 0.76 | 277 |
| | GUG | 0.78 | 219 | Gln | **CAA** | 1.16 | 298 |
| Leu | UUA | 1.04 | 238 | | CAG | 0.84 | 214 |
| | UUG | 1.10 | 251 | Tyr | **UAU** | 1.12 | 345 |
| | **CUU** | 1.79 | 409 | | UAC | 0.88 | 270 |
| | CUC | 0.83 | 191 | His | **CAU** | 1.29 | 187 |
| | CUA | 0.64 | 147 | | CAC | 0.71 | 103 |
| | CUG | 0.60 | 138 | Asp | **GAU** | 1.24 | 463 |
| Phe | UUC | 0.77 | 260 | | GAC | 0.76 | 282 |
| | **UUU** | 1.23 | 414 | Glu | **GAA** | 1.04 | 354 |
| Pro | **CCU** | 1.74 | 247 | | GAG | 0.96 | 326 |
| | CCC | 0.40 | 57 | Lys | **AAA** | 1.04 | 421 |
| | CCA | 1.70 | 241 | | AAG | 0.96 | 388 |
| | CCG | 0.16 | 22 | Arg | CGU | 1.77 | 153 |
| Ser | **UCU** | 1.96 | 310 | | CGC | 0.72 | 62 |
| | UCC | 0.42 | 67 | | CGA | 0.44 | 38 |
| | UCA | 1.70 | 270 | | CGG | 0.09 | 8 |
| | UCG | 0.23 | 36 | | **AGA** | 2.08 | 180 |
| | AGU | 1.17 | 186 | | AGG | 0.90 | 78 |
| | AGC | 0.52 | 82 | | | | |
[a] AA is the abbreviation of amino acid.
[b] N represents the number of occurrences of each sense codon.
[c] The preferentially used codons for each amino acid are displayed in bold.
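The relation between the N and RSCU columns can be checked with a short calculation. The sketch below assumes the standard definition of relative synonymous codon usage (the observed count of a codon divided by the mean count over all synonymous codons of the same amino acid); the source does not spell out its formula, so this is an assumption. The Ala counts are taken from the table above, and assigning the count 531 to GCU (the only Ala codon not otherwise listed) is an inference. The function name `rscu` is illustrative.

```python
from typing import Dict


def rscu(counts: Dict[str, int]) -> Dict[str, float]:
    """RSCU for one synonymous codon family.

    Each value is the observed count divided by the count expected
    if all codons of the family were used equally often.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the codon family has no observations")
    family_size = len(counts)
    return {codon: n * family_size / total for codon, n in counts.items()}


# Ala counts from the table; the GCU assignment for N = 531 is an inference.
ala_counts = {"GCU": 531, "GCC": 147, "GCA": 288, "GCG": 55}
ala_rscu = rscu(ala_counts)

for codon, value in ala_rscu.items():
    print(f"{codon}: {value:.2f}")
```

By construction, the RSCU values of a family sum to the number of synonymous codons in it, so a value above 1 marks a codon used more often than expected under equal usage.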
Fig. 1. A plot of the values of the first axis and the second axis of each gene in CA. Abbreviations of the viruses: AIBV, avian infectious bronchitis virus; BCoV, bovine Coronavirus; EAV, equine arteritis virus; HCoV 229E, human Coronavirus 229E; LDEV, lactate dehydrogenase elevating virus; MHV, murine hepatitis virus; PEDV, porcine epidemic diarrhea virus; PRRSV, porcine reproductive and respiratory syndrome virus; SARSCoV, SARS Coronavirus; SHFV, simian hemorrhagic fever virus; TGV, transmissible gastroenteritis virus. f1′ and f2′, respectively, represent the values of the first and the second axis of each gene in CA.
Summary of linear regression analysis between the first two axes in CA and the nucleotide contents on the third codon position in all selected virus genes [a]
| Base composition | f1′ | f2′ |
| A3S | 0.791 | 0.085 |
| T3S | 0.239 | 0.444 |
| G3S | 0.484 | 0.082 |
| C3S | 0.720 | 0.0001 (NS) |
| GC3S | 0.936 | 0.018 (NS) |
NS denotes non-significant.
Each value in this table is the R2 value of the corresponding linear regression analysis.
f1′ and f2′, respectively, represent the values of the first and the second axis of each gene in CA.
P-value <0.01.
P-value <0.00001.
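As an illustration of how such an R2 value is obtained, the sketch below computes the coefficient of determination of a simple linear regression, which equals the squared Pearson correlation. It uses the 11 genome-level GC3S and f1′ values from the virus table above, not the gene-level data behind the regression table, so its result is not expected to reproduce the values reported there. The function name `r_squared` is illustrative.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R2 of a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For one predictor, R2 is the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)


# Genome-level GC3S (%) and f1' values, in the row order of the virus table.
gc3s_pct = [30.89, 37.32, 27.02, 29.43, 38.30, 26.09, 37.23,
            47.28, 45.18, 53.76, 48.43]
f1 = [-0.84, -0.04, -0.99, -0.75, -0.16, -0.90, 0.05,
      0.80, 0.53, 1.31, 1.09]

r2 = r_squared(gc3s_pct, f1)
print(f"R2 between genome-level GC3S and f1': {r2:.3f}")
```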
Fig. 2. A dot plot of the first axis value in correspondence analysis and GC3S of each gene (f1′ denotes the first axis value in correspondence analysis of each gene, and GC3S denotes the G+C content on the third synonymous codon position of each gene).
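GC3S as described in this caption can be computed directly from a coding sequence. The following is a minimal sketch under a common convention, which is an assumption here: the universal genetic code, with Met (AUG), Trp (UGG) and the three stop codons excluded because they have no synonymous alternative. The source does not state which codons it excluded, and the function name `gc3s` is illustrative.

```python
# Codons left out of GC3S: Met and Trp have a single codon each,
# and stop codons do not encode an amino acid (universal code assumed).
NON_SYNONYMOUS = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def gc3s(cds: str) -> float:
    """Fraction of synonymous codons with G or C at the third position."""
    seq = cds.upper().replace("T", "U")
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable_length, 3)]
    # Keep only unambiguous codons that have synonymous alternatives.
    usable = [c for c in codons
              if set(c) <= set("ACGU") and c not in NON_SYNONYMOUS]
    if not usable:
        raise ValueError("no synonymous codons found in the sequence")
    gc_third = sum(1 for c in usable if c[2] in "GC")
    return gc_third / len(usable)


# Toy sequence: GCU GCC AUG UGG. Only GCU and GCC are counted.
toy_gc3s = gc3s("GCUGCCAUGUGG")
print(f"GC3S of the toy sequence: {toy_gc3s:.2f}")
```

Multiplying the result by 100 gives the percentage form used in the tables.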
Fig. 3. ENC vs. GC3S plot of all virus genes (ENC denotes the effective number of codons of each gene, and GC3S denotes the G+C content on the third synonymous codon position of each gene. The solid line represents the relationship between GC3S and ENC under the assumption of random codon usage).
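The solid line in such a plot is usually drawn from Wright's (1990) approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3S expressed as a fraction. Whether the figure used exactly this curve is an assumption. The sketch below evaluates it for three genes from the ORF table above, so the observed ENC can be compared with the value expected from base composition alone; the function name `expected_enc` is illustrative.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage depends only on GC3S (fraction, 0 to 1).

    Wright's (1990) approximation; its use for the figure is assumed.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


# (gene product, GC3S in %, observed ENC) copied from the ORF table.
observed = [
    ("spike glycoprotein", 28.30, 45.73),
    ("small envelope protein E", 38.70, 59.06),
    ("nucleocapsid protein", 37.60, 54.16),
]

for name, gc3s_pct, enc in observed:
    expected = expected_enc(gc3s_pct / 100.0)
    print(f"{name}: observed ENC {enc:.2f}, expected {expected:.2f}, "
          f"difference {enc - expected:+.2f}")
```

A gene lying well below the curve has stronger codon bias than its GC3S alone would predict, while genes close to the curve are consistent with bias driven mainly by compositional constraints.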