Haogao Gu, Daniel K W Chu, Malik Peiris, Leo L M Poon.
Abstract
Coronavirus disease 2019 (COVID-19) is a global health concern as it continues to spread within China and beyond. The causative agent of this disease, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), belongs to the genus Betacoronavirus, which also includes severe acute respiratory syndrome-related coronavirus (SARSr-CoV) and Middle East respiratory syndrome-related coronavirus (MERSr-CoV). The codon usage of viral genes is believed to be subject to different selection pressures in different host environments. Previous studies of codon usage in influenza A viruses helped identify viral host origins and evolutionary trends; however, similar studies of coronaviruses are lacking. In this study, we compared codon usage bias using global correspondence analysis (CA), within-group CA (WCA) and between-group CA (BCA). We found that the bat RaTG13 virus best matched the overall codon usage pattern of SARS-CoV-2 in the orf1ab, spike and nucleocapsid genes, whereas the pangolin P1E virus had more similar codon usage in the membrane gene. The amino acid usage pattern of SARS-CoV-2 was generally similar to that of bat and human SARSr-CoVs. However, we found greater synonymous codon usage differences between SARS-CoV-2 and its phylogenetic relatives in the spike and membrane genes, suggesting that these two genes of SARS-CoV-2 are subject to different evolutionary pressures.
Keywords: SARS-CoV-2; WCA; codon usage analysis; coronavirus
Year: 2020; PMID: 32431949; PMCID: PMC7223271; DOI: 10.1093/ve/veaa032
Source DB: PubMed; Journal: Virus Evol; ISSN: 2057-1577
Figure 1. Histograms of (A) G + C content and (B) GRAVY (grand average of hydropathy) score for each gene in Betacoronavirus. The G + C content values of SARS-CoV-2 are plotted separately in red. The dashed line marks a G + C content of 50 per cent.
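As a rough illustration of the two quantities in Figure 1, the sketch below computes G + C content as the percentage of G and C among unambiguous bases, and the GRAVY score as the mean Kyte-Doolittle hydropathy value over the residues of a protein. The function names are hypothetical, and skipping ambiguous bases and non-standard residues is an assumption; the source does not say how such characters were handled.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982), one per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T/U).

    Assumption: ambiguous symbols such as N are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Assumption: stop symbols and non-standard residues are skipped.
    """
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("protein contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

A positive GRAVY score indicates an overall hydrophobic protein and a negative score an overall hydrophilic one.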
Figure 2. Codon usage in Betacoronavirus (Cleveland dot plot). Green points show the codon counts in a sample SARS-CoV-2 genome (MN908947).
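Codon counts such as those plotted for MN908947 can be tallied with a short in-frame scan. This is a minimal sketch that assumes the coding sequence has already been extracted and starts in frame. `count_codons` is a hypothetical helper, and discarding codons with ambiguous bases is an assumption rather than the authors' documented procedure.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence, reading from the first base.

    U is treated as T. Codons containing anything other than A/C/G/T
    (e.g. N) are skipped, and a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    counts = Counter()
    full_length = len(cds) - len(cds) % 3  # drop an incomplete final codon
    for i in range(0, full_length, 3):
        codon = cds[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts
```

Counting each gene separately gives the per-gene codon tables on which the correspondence analyses operate.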
Figure 3. Factorial maps of the first and second factors of the global CA for each gene, coloured by viral host. Data points for SARS-CoV-2 and related reference viruses are labelled. The seven clusters identified by k-means clustering are circled with dashed lines.
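Global CA treats the sequences x codons count table as a contingency table. One common way to compute it, sketched below with NumPy, is a singular value decomposition of the standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j), where p_ij are relative frequencies and r_i, c_j are the row and column masses. The row principal coordinates on the first two axes give a factorial map, which is then clustered with plain Lloyd's k-means. The helper names, the random initialisation, the fixed seed and the choice to cluster on the first two axes are all assumptions for illustration; the source does not state which software or settings were used.

```python
import numpy as np


def ca_row_coordinates(counts, n_axes=2):
    """Correspondence analysis of a sequences x codons count table.

    Assumes every sequence (row) has at least one counted codon.
    Returns row principal coordinates on the first n_axes axes and the
    principal inertias (squared singular values) of all axes.
    """
    N = np.asarray(counts, dtype=float)
    N = N[:, N.sum(axis=0) > 0]             # drop codons never observed
    P = N / N.sum()                         # correspondence matrix
    r = P.sum(axis=1)                       # row masses (sequences)
    c = P.sum(axis=0)                       # column masses (codons)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    coords = U[:, :n_axes] * s[:n_axes] / np.sqrt(r)[:, None]
    return coords, s ** 2


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, with initial centres drawn from the points."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centres.
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers
```

With a hypothetical `counts` array for one gene, `coords, inertias = ca_row_coordinates(counts)` followed by `labels, _ = kmeans(coords, k=7)` would give seven cluster labels. Because Lloyd's algorithm depends on its starting centres, several seeds are usually tried and the lowest within-cluster sum of squares kept.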
Percentage of variability explained at the synonymous codon usage level and at the amino acid level, by gene.
| Analysis | Orf1ab (%) | Spike (%) | Membrane (%) | Nucleocapsid (%) |
|---|---|---|---|---|
| WCA (synonymous codon level) | 90.36 | 85.29 | 83.71 | 84.07 |
| BCA (amino acid level) | 9.64 | 14.71 | 16.29 | 15.93 |
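In each column of the table above the two rows sum to 100 per cent. This is consistent with splitting the total CA inertia into a within-amino-acid part (synonymous codon usage) and a between-amino-acid part (amino acid usage). Let p_ij be the relative frequency of codon j in sequence i, r_i and c_j the row and column masses, and p_ik and c_k the same quantities summed over the codons of amino acid k. The total inertia, the sum of (p_ij - r_i c_j)^2 / (r_i c_j), then equals a within part built on p_ij - p_ik c_j / c_k plus a between part built on p_ik c_j / c_k - r_i c_j, because the cross products cancel inside each amino acid group. The pure-Python sketch below computes the two parts. The function name is hypothetical, and dropping stop codons and using the standard genetic code are assumptions.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC,
# TTA, ..., GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Assumption: stop codons are excluded from the analysis.
STANDARD_CODE = {codon: aa for codon, aa in zip(CODONS, AMINO_ACIDS) if aa != "*"}


def inertia_split(tables, codon_to_aa):
    """Split total CA inertia into within- and between-amino-acid parts.

    tables: one dict {codon: count} per sequence.
    codon_to_aa: maps codon -> amino acid; codons not in the map are ignored.
    Returns (within, between); their sum is the total CA inertia.
    """
    codons = sorted({c for t in tables for c, k in t.items() if k > 0 and c in codon_to_aa})
    n = sum(t.get(c, 0) for t in tables for c in codons)
    P = [[t.get(c, 0) / n for c in codons] for t in tables]
    r = [sum(row) for row in P]                                   # sequence masses
    col = [sum(row[j] for row in P) for j in range(len(codons))]  # codon masses
    groups = {}
    for j, c in enumerate(codons):
        groups.setdefault(codon_to_aa[c], []).append(j)
    within = between = 0.0
    for i, row in enumerate(P):
        if r[i] == 0:
            continue  # a sequence with no counted codons carries no inertia
        for cols in groups.values():
            p_ik = sum(row[j] for j in cols)  # usage of amino acid k in sequence i
            c_k = sum(col[j] for j in cols)   # overall usage of amino acid k
            for j in cols:
                indep = r[i] * col[j]          # expected under full independence
                aa_only = p_ik * col[j] / c_k  # expected if only amino acid usage varied
                within += (row[j] - aa_only) ** 2 / indep
                between += (aa_only - indep) ** 2 / indep
    return within, between
```

Given per-sequence codon counts for one gene, `w, b = inertia_split(tables, STANDARD_CODE)` gives the two shares as `100 * w / (w + b)` and `100 * b / (w + b)`. Amino acids encoded by a single codon (Met and Trp in the standard code) contribute nothing to the within part.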
Figure 4. Factorial maps of the first and second factors of the WCA and BCA for each gene, coloured by viral host. Data points for SARS-CoV-2 and related reference viruses are labelled. The seven clusters identified by the k-means algorithm are circled with dashed lines.
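Factorial maps for WCA and BCA can be obtained by running a separate SVD on each part of the inertia decomposition: the within residual p_ij - p_ik c_j / c_k for WCA and p_ik c_j / c_k - r_i c_j for BCA, both scaled by 1 / sqrt(r_i c_j) as in ordinary CA. The NumPy sketch below assumes every sequence and every codon has a non-zero count and that `groups` gives the amino acid of each codon column; the function names are hypothetical and this is not necessarily the authors' implementation. The BCA part has the same singular values as a CA of the sequences x amino acids table, since each amino acid block is that table's residual spread over its codons.

```python
import numpy as np


def _row_coordinates(residual, r, c, n_axes):
    """SVD of a residual matrix scaled as in CA; returns row coordinates and inertias."""
    S = residual / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :n_axes] * s[:n_axes] / np.sqrt(r)[:, None], s ** 2


def wca_bca(counts, groups, n_axes=2):
    """Row coordinates and inertias for within-group (WCA) and between-group (BCA) CA.

    counts: sequences x codons table with no all-zero rows or columns.
    groups: amino acid label for each codon column.
    Returns ((wca_coords, wca_inertias), (bca_coords, bca_inertias)).
    """
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()
    groups = np.asarray(groups)
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    # p_ik * c_j / c_k: expected frequencies if only amino acid usage varied.
    aa_only = np.empty_like(P)
    for g in np.unique(groups):
        cols = groups == g
        p_ik = P[:, cols].sum(axis=1, keepdims=True)
        aa_only[:, cols] = p_ik * c[cols] / c[cols].sum()
    wca = _row_coordinates(P - aa_only, r, c, n_axes)
    bca = _row_coordinates(aa_only - np.outer(r, c), r, c, n_axes)
    return wca, bca
```

The summed WCA and BCA inertias reproduce the total global CA inertia, so their ratios give percentages of the kind reported in the table for each gene.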