Wanyi Huang, Yaqiong Guo, Na Li, Yaoyu Feng, Lihua Xiao.
Abstract
Since 2002, the world has witnessed major outbreaks of acute respiratory illness caused by three zoonotic coronaviruses (CoVs) that differ in pathogenicity. The reasons SARS-CoV-2 is less pathogenic than the other two zoonotic coronaviruses, SARS-CoV and MERS-CoV, are not well understood. We compared the codon usage patterns of the three zoonotic CoVs causing severe acute respiratory syndromes and of four human-specific CoVs (NL63, 229E, OC43, and HKU1) causing mild diseases. The seven viruses differed in codon usage, and SARS-CoV-2 had the lowest effective number of codons (ENC) among the zoonotic CoVs. Human codon adaptation index (CAI) analysis showed that SARS-CoV-2 also had the lowest CAI value among the zoonotic CoVs. The ENC and CAI values of SARS-CoV-2 were more similar to those of the less pathogenic human-specific CoVs. To further investigate adaptive evolution within SARS-CoV-2, we examined codon usage patterns in 3573 SARS-CoV-2 genomes collected over the first 4 months of the pandemic. Both the ENC and CAI values of SARS-CoV-2 decreased over this period. The low ENC and CAI values could be responsible for the lower pathogenicity of SARS-CoV-2. While mutational pressure appears to shape codon adaptation in the genomes of SARS-CoV-2 and other zoonotic CoVs overall, the E gene of SARS-CoV-2, which has the highest codon usage bias, appears to be under strong natural selection. These data contribute to our understanding of the pathogenicity and evolution of SARS-CoV-2 in humans.
Keywords: Adaptive evolution; Codon usage bias; Human host adaptation; Pathogenicity; SARS-CoV-2
Year: 2021 PMID: 33516969 PMCID: PMC7843097 DOI: 10.1016/j.meegid.2021.104736
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
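The abstract compares viruses by their effective number of codons (ENC), which ranges from 20 (each amino acid always encoded by one codon) to 61 (all synonymous codons used equally). The sketch below follows Wright's (1990) formula, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the k-fold degenerate amino acids. It is an illustration, not the authors' pipeline: this record does not say which software they used, and the choices here (standard genetic code, amino acids with fewer than two observed codons skipped, a missing Ile class replaced by the mean of F2 and F4, result capped at 61) are assumptions. The names `split_codons`, `enc`, `SYN`, and `AA_PER_CLASS` are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Sense codons grouped by amino acid (stop codons excluded).
SYN = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYN[aa].append(codon)

# Number of amino acids per degeneracy class: {2: 9, 3: 1, 4: 5, 6: 3}.
AA_PER_CLASS = Counter(len(s) for s in SYN.values() if len(s) > 1)


def split_codons(seq):
    """In-frame triplets of a coding sequence; U is read as T, non-ACGT triplets dropped."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if c in CODON_TABLE]


def enc(seq):
    """Effective number of codons after Wright (1990), standard code only (a sketch)."""
    counts = Counter(split_codons(seq))
    f_by_class = defaultdict(list)  # degeneracy k -> homozygosity F of each amino acid
    for syn in SYN.values():
        k = len(syn)
        if k == 1:
            continue  # Met and Trp: covered by the fixed "2 +" term
        n = sum(counts[c] for c in syn)
        if n < 2:
            continue  # F is undefined with fewer than two codons
        sum_p2 = sum((counts[c] / n) ** 2 for c in syn)
        f_by_class[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # stand-in when Ile is absent (assumption)
    nc = 2.0
    for k, n_aa in sorted(AA_PER_CLASS.items()):
        if mean_f.get(k, 0.0) <= 0.0:
            raise ValueError(f"sequence too short to estimate F for {k}-fold amino acids")
        nc += n_aa / mean_f[k]
    return min(nc, 61.0)  # ENC is conventionally capped at 61


# Usage: each amino acid always encoded by the same codon gives the minimum ENC.
extreme = "".join(syn[0] * 3 for syn in SYN.values())
print(enc(extreme))  # 20.0
```

Lower ENC means stronger codon usage bias, which is the sense in which the abstract reports SARS-CoV-2 as having the lowest ENC among the zoonotic CoVs.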
The nucleotide compositions and the codon usage indices of the RdRp, S, E, M, and N genes in different CoV groups.
| CoV group | GC | GC1S | GC2S | GC3S | GC12S | AT | AT3S | A3S | T3S | C3S | G3S |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clade 1 (HCoV-229E) | 39.1 ± 0 | 45.9 ± 0.1 | 39 ± 0.1 | 32.4 ± 0.1 | 42.4 ± 0 | 60.9 ± 0 | 67.6 ± 0.1 | 20.7 ± 0 | 47 ± 0.1 | 16.4 ± 0.1 | 16 ± 0.1 |
| Control 1 (229ERC-Camel) | 39.1 ± 0 | 46.3 ± 0 | 39.3 ± 0 | 31.9 ± 0 | 42.8 ± 0 | 60.9 ± 0 | 68.1 ± 0 | 21.2 ± 0 | 46.9 ± 0 | 16.4 ± 0 | 15.5 ± 0 |
| Clade 2 (HCoV-NL63) | 35.9 ± 0.1 | 44.7 ± 0.1 | 38.3 ± 0.1 | 24.7 ± 0.2 | 41.5 ± 0.1 | 64.1 ± 0.1 | 75.3 ± 0.2 | 18.6 ± 0.1 | 56.7 ± 0.2 | 11.6 ± 0.1 | 13.1 ± 0.1 |
| Clade 3 (HCoV-HKU1) | 32.6 ± 0.1 | 41.1 ± 0.1 | 37.8 ± 0.1 | 18.9 ± 0.3 | 39.5 ± 0 | 67.4 ± 0.1 | 81.1 ± 0.3 | 20.4 ± 0.2 | 60.7 ± 0.2 | 8.2 ± 0.2 | 10.7 ± 0.1 |
| Clade 4 (HCoV-OC43) | 37.1 ± 0.1 | 44.8 ± 0.1 | 38.4 ± 0.1 | 28.3 ± 0.1 | 41.6 ± 0.1 | 62.9 ± 0.1 | 71.7 ± 0.1 | 22.3 ± 0.1 | 49.4 ± 0.1 | 13 ± 0.1 | 15.3 ± 0.1 |
| Clade 5 (MERS-CoV) | 41.6 ± 0.1 | 48.7 ± 0.1 | 40.5 ± 0.1 | 35.6 ± 0.1 | 44.6 ± 0.1 | 58.4 ± 0.1 | 64.4 ± 0.1 | 19.9 ± 0.1 | 44.5 ± 0.1 | 20.1 ± 0.1 | 15.5 ± 0 |
| Clade 6 (SARS-CoV-2) | 39.2 ± 0 | 47.3 ± 0 | 40 ± 0 | 30.1 ± 0 | 43.7 ± 0 | 60.8 ± 0 | 69.9 ± 0 | 27.1 ± 0 | 42.7 ± 0 | 17.9 ± 0 | 12.2 ± 0 |
| Control 2 (SARS2RC-Pan) | 39 ± 0 | 46.7 ± 0 | 40.1 ± 0 | 30.1 ± 0 | 43.4 ± 0 | 61 ± 0 | 69.9 ± 0 | 27.9 ± 0 | 42 ± 0 | 18.3 ± 0 | 11.8 ± 0 |
| Control 3 (SARS2RC-Bat) | 39.3 ± 0 | 47.5 ± 0 | 40 ± 0 | 30.5 ± 0 | 43.7 ± 0 | 60.7 ± 0 | 69.5 ± 0 | 27.4 ± 0 | 42.1 ± 0 | 18.4 ± 0 | 12.1 ± 0 |
| Clade 7 (Bat-CoV) | 41.7 ± 0.2 | 48.9 ± 0.1 | 40 ± 0.2 | 36.3 ± 0.4 | 44.4 ± 0.1 | 58.3 ± 0.2 | 63.7 ± 0.4 | 25.5 ± 0.5 | 38.2 ± 0.5 | 21.6 ± 0.5 | 14.7 ± 0.5 |
| Clade 8 (SARS-CoV) | 40.9 ± 0 | 48.8 ± 0 | 40.1 ± 0 | 33.8 ± 0.1 | 44.4 ± 0 | 59.1 ± 0 | 66.2 ± 0.1 | 24.7 ± 0 | 41.5 ± 0.1 | 19.9 ± 0 | 13.9 ± 0.1 |
| Control 4 (SARSRC-Bat) | 41.1 ± 0 | 49.1 ± 0 | 40.1 ± 0 | 34 ± 0 | 44.6 ± 0 | 58.9 ± 0 | 66 ± 0 | 24.6 ± 0 | 41.4 ± 0 | 20 ± 0 | 14 ± 0 |
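The table's columns are mutually consistent with plain positional base composition: in every row GC3S equals C3S + G3S, AT and AT3S are 100 minus GC and GC3S, GC12S is the mean of GC1S and GC2S, and GC is the mean of GC1S, GC2S, and GC3S (all to within rounding). The sketch below computes the same percentages under that reading. Which codons the authors counted is not stated in this record; restricting to codons that have synonyms (dropping stop codons, ATG, and TGG) is an assumption suggested by the "S" suffix, and `composition` and `SYNONYMOUS` are hypothetical names.

```python
from collections import Counter

# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Codons with synonyms: drop stop codons and the single-codon Met (ATG) and Trp (TGG).
AA_SIZE = Counter(CODON_TABLE.values())
SYNONYMOUS = {c for c, aa in CODON_TABLE.items() if aa != "*" and AA_SIZE[aa] > 1}


def composition(seq):
    """Percentages in the style of the table columns (assumed definitions, see text)."""
    seq = seq.upper().replace("U", "T")
    cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    cods = [c for c in cods if c in SYNONYMOUS]
    if not cods:
        raise ValueError("no synonymous codons in sequence")
    n = len(cods)

    def pct(pos, bases):
        # Share (%) of counted codons whose base at position `pos` (0, 1 or 2) is in `bases`.
        return 100.0 * sum(c[pos] in bases for c in cods) / n

    out = {"GC1S": pct(0, "GC"), "GC2S": pct(1, "GC"), "GC3S": pct(2, "GC"),
           "A3S": pct(2, "A"), "T3S": pct(2, "T"), "C3S": pct(2, "C"), "G3S": pct(2, "G")}
    out["GC"] = (out["GC1S"] + out["GC2S"] + out["GC3S"]) / 3  # GC over the same codons
    out["GC12S"] = (out["GC1S"] + out["GC2S"]) / 2
    out["AT"] = 100.0 - out["GC"]
    out["AT3S"] = 100.0 - out["GC3S"]
    return out


# Usage: GGG (Gly) and AAC (Asn) are counted; ATG (Met) and TAA (stop) are skipped.
print(composition("GGGAACATGTAA")["GC3S"])  # 100.0
```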
Fig. 1. Genome composition and phylogeny of zoonotic CoVs (SARS-CoV, MERS-CoV, and SARS-CoV-2), human-specific CoVs (HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1), and related viruses. (A) Maximum likelihood tree of CoVs constructed in PhyML under a GTR + G + I model, with 1000 bootstrap replicates. The eight observed clades and four controls, i.e., Clade 1 (HCoV-229E), Clade 2 (HCoV-NL63), Clade 3 (HCoV-HKU1), Clade 4 (HCoV-OC43), Clade 5 (MERS-CoV), Clade 6 (SARS-CoV-2), Clade 7 (Bat-CoV), Clade 8 (SARS-CoV), Control 1 (229ERC-Camel), Control 2 (SARS2RC-Pangolin), Control 3 (SARS2RC-Bat), and Control 4 (SARSRC-Bat), are shown in orange, yellow, green, light green, sky blue, purple, violet, dark red, brown, light blue, blue, and red, respectively. (B) Principal component analysis (PCA) of relative synonymous codon usage (RSCU) data, with clades and controls colored as in A. The PCA was performed on the RSCU values of each CoV represented as a 59-dimensional vector. (C) Genome composition of different CoVs. The 5′-terminal two-thirds of the genome encodes polyprotein 1ab, which is cleaved into nonstructural proteins, including the RNA-dependent RNA polymerase (RdRp) involved in genome transcription and replication. The remaining 3′-terminal one-third encodes structural proteins, including the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins.
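Panel B summarizes each virus as a 59-dimensional vector of relative synonymous codon usage (RSCU): a codon's count divided by the mean count of all codons for the same amino acid, so 1.0 means no preference. The 59 dimensions are the 61 sense codons minus ATG and TGG, which have no synonyms. The sketch computes such vectors and projects them with a mean-centred PCA via NumPy's SVD. Whether the authors centred or scaled the data, and how they handled amino acids absent from a sequence (set to 0 here), is not stated in this record; `rscu_vector`, `pca`, and `RSCU_CODONS` are hypothetical names.

```python
from collections import Counter, defaultdict

import numpy as np

# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

SYN = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYN[aa].append(codon)
# The 59 informative codons: sense codons of amino acids with two or more synonyms.
RSCU_CODONS = sorted(c for syn in SYN.values() if len(syn) > 1 for c in syn)


def rscu_vector(seq):
    """RSCU per codon: observed count / mean count of that amino acid's codons."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    rscu = {}
    for syn in SYN.values():
        if len(syn) < 2:
            continue
        total = sum(counts[c] for c in syn)
        for c in syn:
            # Amino acids absent from the sequence get 0.0 (a choice of this sketch).
            rscu[c] = counts[c] * len(syn) / total if total else 0.0
    return [rscu[c] for c in RSCU_CODONS]


def pca(rows, n_components=2):
    """Mean-centred PCA via SVD; returns component scores and explained-variance ratios."""
    x = np.asarray(rows, dtype=float)
    x = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = x @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Usage: three toy "genomes" that differ only in their alanine codons.
vectors = [rscu_vector("GCT" * 4), rscu_vector("GCC" * 4), rscu_vector("GCA" * 2 + "GCG" * 2)]
scores, ratios = pca(vectors)
print(len(vectors[0]), scores.shape)  # 59 (3, 2)
```

Each row of `scores` would be one point in a plot like panel B.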
Fig. 2. Codon usage bias in CoVs and related viruses. (A) Mean ENC values of the genomes and major genes of CoVs. Error bars indicate standard deviations. Clade 1 (HCoV-229E), Clade 2 (HCoV-NL63), Clade 3 (HCoV-HKU1), Clade 4 (HCoV-OC43), Clade 5 (MERS-CoV), Clade 6 (SARS-CoV-2), Clade 7 (Bat-CoV), Clade 8 (SARS-CoV), Control 1 (229ERC-Camel), Control 2 (SARS2RC-Pangolin), Control 3 (SARS2RC-Bat), and Control 4 (SARSRC-Bat) are shown in orange, yellow, green, light green, sky blue, purple, violet, dark red, brown, light blue, blue, and red, respectively. (B) ENC-GC3S plots of the genomes and five major genes (E, M, N, RdRp, and S) of CoVs. Colored dots represent the observed ENC and GC3S values of the individual groups (colored as in A). (C) Human codon adaptation indices (CAI) of the genomes and the five major genes of CoVs. Error bars indicate standard deviations (colored as in A).
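In an ENC-GC3S plot such as panel B, observed points are usually compared with Wright's expected curve, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3S as a fraction. Points on or near the curve are consistent with codon bias driven mainly by nucleotide composition (mutational pressure), while points well below it suggest additional constraints such as selection, the kind of distinction the abstract draws. The sketch computes the curve and the commonly used ratio (ENCexp - ENCobs) / ENCexp; `expected_enc` and `enc_ratio` are hypothetical names, and the paper's exact plotting procedure is not given in this record.

```python
def expected_enc(gc3s):
    """Wright's expected ENC if codon bias reflects GC3 composition alone.

    gc3s is a fraction in [0, 1]; divide the table's percentages by 100.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(observed_enc, gc3s):
    """(ENC_expected - ENC_observed) / ENC_expected; larger values mean the observed
    point lies further below the composition-only curve."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# Usage: points of the expected curve for plotting, and the expected ENC at the
# SARS-CoV-2 GC3S listed in the table (30.1%).
curve = [(s / 100, expected_enc(s / 100)) for s in range(0, 101, 5)]
print(round(expected_enc(0.301), 1))  # 52.4
```

The curve peaks at 60.5 when s = 0.5 and falls to 31 and 32 at s = 0 and s = 1, so a GC3S near 30% already implies some compositional bias before any selection is invoked.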
Fig. 3. Codon usage of SARS-CoV-2 over a 4-month period in 2020. (A) Mean ENC values of SARS-CoV-2 genomes collected in each month. Error bars indicate standard deviations. (B) Human codon adaptation indices (CAI) of the genomes and five genes (E, M, N, RdRp, and S) of SARS-CoV-2.
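The codon adaptation index (Sharp and Li, 1987) reported in these figures is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set divided by the count of the most frequent synonymous codon. CAI approaches 1 when a gene uses the reference set's preferred codons. The human reference table used by the authors is not given in this record, so the sketch accepts any reference codon counts. The toy counts, the 0.5 pseudocount for codons absent from the reference, the exclusion of ATG, TGG, and stop codons, and the names `relative_adaptiveness` and `cai` are all assumptions.

```python
import math
from collections import defaultdict

# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

SYN = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYN[aa].append(codon)


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """w = reference count / max reference count among synonymous codons."""
    w = {}
    for syn in SYN.values():
        if len(syn) < 2:
            continue  # Met and Trp carry no choice and are left out of CAI
        # Codons missing from the reference get a pseudocount (assumption of this sketch).
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in syn}
        top = max(counts.values())
        for c in syn:
            w[c] = counts[c] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the scorable codons of an in-frame sequence."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    logs = [math.log(w[c]) for c in triplets if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Usage with toy reference counts (hypothetical, not human data).
ref = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 10, "AAG": 30, "AAA": 10}
w = relative_adaptiveness(ref)
print(cai("GCCAAG", w))            # 1.0 (both codons are the preferred ones)
print(round(cai("GCTAAA", w), 3))  # 0.408 (sqrt(0.5 * 1/3))
```

Applied per genome and per gene against a human reference table, values like these would be what panel B tracks from month to month.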