Yong Wang, Jun-Ming Mao, Guang-Dong Wang, Zhi-Peng Luo, Liu Yang, Qin Yao, Ke-Ping Chen.
Abstract
The outbreak of COVID-19 poses a great threat to human health. Its causative agent is a severe acute respiratory syndrome-related coronavirus that has been officially named SARS-CoV-2. Here we report the discovery of extremely low CG abundance in its open reading frames. We found that CG reduction in SARS-CoV-2 is achieved mainly by mutating C/G into A/T, and that CG is the best target for mutation. Meanwhile, the 5′-untranslated region of SARS-CoV-2 has a high CG content and is capable of forming an internal ribosome entry site (IRES) to recruit host ribosomes for translating its RNA. These features allow SARS-CoV-2 to reproduce efficiently in host cells, because less energy is consumed in disrupting the stem-loops formed by its genomic RNA. Notably, genomes of cellular organisms also have very low CG abundance, suggesting that mutation of C/G into A/T occurs universally in all life forms. Moreover, CG is the dinucleotide associated with CpG islands, mutational hotspots and single nucleotide polymorphisms in cellular organisms. The relationship between these features is worthy of further investigation.
Year: 2020 PMID: 32704018 PMCID: PMC7378049 DOI: 10.1038/s41598-020-69342-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Odds ratios of dinucleotides in open reading frames of SARS-CoV-2. (a) Odds ratios of dinucleotides at all codon positions. (b–d) Odds ratios of dinucleotides at codon positions 1 and 2, 2 and 3, and 3 and 1, respectively. The value shown in the figure is the weighted average odds ratio of each dinucleotide. The odds ratio of each dinucleotide is first calculated separately for each of the ten ORFs of SARS-CoV-2 (i.e. ORF1ab and ORFs 2–10). A weighted average odds ratio is then obtained based on the length of each ORF.
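The calculation described in this caption can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the common definition of the dinucleotide odds ratio, ρ(XY) = f(XY) / (f(X) · f(Y)), with single-nucleotide frequencies taken over the whole ORF, which the caption does not state. The function names and the `offset` parameter are hypothetical.

```python
def dinucleotide_odds_ratio(orf, dinuc, offset=None):
    """Odds ratio of `dinuc` in one ORF.

    offset=None counts dinucleotides at all positions;
    offset=0, 1, 2 restrict the count to codon positions 1 and 2,
    2 and 3, or 3 and 1, respectively.
    Assumed definition: f(XY) / (f(X) * f(Y)).
    """
    orf = orf.upper()
    if offset is None:
        starts = range(len(orf) - 1)
    else:
        starts = range(offset, len(orf) - 1, 3)
    pairs = [orf[i:i + 2] for i in starts]
    if not pairs:
        return 0.0
    f_xy = pairs.count(dinuc) / len(pairs)
    f_x = orf.count(dinuc[0]) / len(orf)
    f_y = orf.count(dinuc[1]) / len(orf)
    if f_x == 0 or f_y == 0:
        return 0.0
    return f_xy / (f_x * f_y)


def weighted_average_odds_ratio(orfs, dinuc, offset=None):
    """Average the per-ORF odds ratios, weighting each ORF by its length."""
    total_length = sum(len(orf) for orf in orfs)
    weighted_sum = sum(
        len(orf) * dinucleotide_odds_ratio(orf, dinuc, offset) for orf in orfs
    )
    return weighted_sum / total_length
```

Under this definition, a value below 1 indicates that a dinucleotide occurs less often than expected from the frequencies of its two nucleotides.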
Figure 2. Percentages of codon usage in open reading frames of SARS-CoV-2. Usage of synonymous codons for eighteen amino acids (all except methionine and tryptophan) and of the three stop codons is shown in the figure. Percentages of codons with A, T, G and C at codon position 3 are shown on yellow, brown, green and aqua blue backgrounds, respectively. The total number of codons for each amino acid is indicated at the top of the percentage bar. Arrows indicate the four codons that contain CG at positions 2 and 3.
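Percentages of this kind can be computed with a short script. The sketch below is illustrative only: it assumes the standard genetic code (NCBI translation table 1) and an ORF sequence that starts in frame; the function name `codon_usage_percentages` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_percentages(orf):
    """Return {codon: percentage among codons for the same amino acid}."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore an incomplete trailing codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    totals = defaultdict(int)
    for codon, n in counts.items():
        if codon in CODE:  # skip codons with ambiguous bases
            totals[CODE[codon]] += n
    return {
        codon: 100.0 * n / totals[CODE[codon]]
        for codon, n in counts.items()
        if codon in CODE
    }
```

For example, in the toy sequence `ATGGCTGCCGCT` the alanine codons GCT and GCC account for about 66.7% and 33.3% of alanine usage, respectively.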
Figure 3. Odds ratios of dinucleotides in open reading frames of coronaviruses and cellular organisms. (a) Odds ratios of dinucleotides at all codon positions. (b–d) Odds ratios of dinucleotides at codon positions 1 and 2, 2 and 3, and 3 and 1, respectively. Data for coronaviruses are from Table S1 and are shown on a blue background. Data for cellular organisms are from our previous work[15]. A filled triangle or filled inverted triangle indicates that the odds ratio of a dinucleotide in coronaviruses is significantly higher or lower, respectively, than that in cellular organisms at the p = 0.05 level. An open triangle or open inverted triangle indicates that the odds ratio of a dinucleotide in coronaviruses is higher or lower, respectively, than that in cellular organisms, but not significantly so.
Figure 4. Secondary structure formed by the 5′-UTR of poliovirus (a) and SARS-CoV-2 (b). The secondary structure is based on the 200 nucleotides immediately upstream of the translation start site. The sequence accession number of poliovirus is MG212486; that of SARS-CoV-2 is NC_045512. Both structures and their free energies (indicated in the centre of each structure) were drawn and calculated using RNAstructure (version 5.7)[27].
Stability of secondary structure formed by genome of coronavirus.
| Genus | Virus | 5′-UTR* free energy (kcal/mol) | 5′-UTR* stability index | TSS-to-end free energy (kcal/mol) | TSS-to-end stability index | Virulence grade |
|---|---|---|---|---|---|---|
| Alphacoronavirus | Bat CoV CDPHE15 | −66.8 | 92 (H) | −8,803.5 | 99 (H) | 6 |
| Alphacoronavirus | Bat CoV HKU10 | −61.3 | 84 (M) | −8,029.1 | 90 (H) | 5 |
| Alphacoronavirus | Cat CoV1 | −68.8 | 95 (H) | −7,963.0 | 89 (M) | 5 |
| Alphacoronavirus | Rat CoV | −59.5 | 82 (M) | −8,615.0 | 97 (H) | 5 |
| Alphacoronavirus | Mink CoV1 | −71.2 | 98 (H) | −7,790.4 | 88 (M) | 5 |
| Alphacoronavirus | Bat CoV1 | −59.6 | 82 (M) | −8,153.4 | 92 (H) | 5 |
| Alphacoronavirus | Bat CoV Sax2011 | −66.7 | 92 (H) | −8,815.5 | 99 (H) | 6 |
| Alphacoronavirus | Bat CoV SC2013 | −57.3 | 79 (L) | −8,712.4 | 98 (H) | 4 |
| Alphacoronavirus | PEDV | −62.4 | 86 (M) | −8,671.9 | 97 (H) | 5 |
| Alphacoronavirus | Bat CoV HKU2 | −64.3 | 88 (M) | −8,313.0 | 93 (H) | 5 |
| Alphacoronavirus | Human CoV NL63 | −58.9 | 81 (M) | −7,223.3 | 81 (M) | 4 |
| Alphacoronavirus | Human CoV 229E | −55.6 | 76 (L) | −7,982.5 | 90 (H) | 4 |
| Betacoronavirus | Human CoV HKU1 | −43.1 | 59 (L) | −6,864.6 | 77 (L) | 2 |
| Betacoronavirus | Human MERS-CoV | −72.8 | 100 (H) | −8,436.5 | 95 (H) | 6 |
| Betacoronavirus | Human SARS-CoV | −63.2 | 87 (M) | −8,054.1 | 91 (H) | 5 |
| Betacoronavirus | Human SARS-CoV-2 | −62.4 | 86 (M) | −7,860.4 | 88 (M) | 4 |
| Betacoronavirus | Bat CoV ZJ2013 | −58.9 | 81 (M) | −8,328.2 | 94 (H) | 5 |
| Betacoronavirus | Bat CoV HKU9 | −55.2 | 76 (L) | −8,897.8 | 100 (H) | 4 |
| Deltacoronavirus | Wigeon CoV HKU20 | −51.8 | 71 (L) | −8,273.2 | 93 (H) | 4 |
| Deltacoronavirus | Bulbul CoV HKU11 | −54.3 | 75 (L) | −8,387.6 | 94 (H) | 4 |
| Deltacoronavirus | Heron CoV HKU19 | −54.9 | 75 (L) | −7,687.2 | 86 (M) | 3 |
| Deltacoronavirus | Moorhen CoV HKU21 | −51.2 | 70 (L) | −8,140.4 | 91 (H) | 4 |
| Gammacoronavirus | Whale CoV SW1 | −62.8 | 86 (M) | −8,161.3 | 92 (H) | 5 |
| Gammacoronavirus | Turkey CoV | −59.0 | 81 (M) | −8,195.4 | 92 (H) | 5 |
*Free energy of the 5′-UTR (untranslated region) was obtained by using the 200 nucleotides immediately upstream of the TSS (translation start site) for secondary structure prediction. Free energy of the TSS-to-end region is the actual accumulated free energy of a specific genome normalized to the average genome size (28,085 nt) of all surveyed coronaviruses (Table S2). The 5′-UTR of human MERS-CoV and the TSS-to-end region of bat CoV HKU9 have the lowest free energy in their respective columns and are therefore given the highest stability index (100). H (high), M (medium) and L (low) indicate a stability index of ≥ 90, 80 to 89, and ≤ 79, respectively. Virulence grade is based on the stability of both the 5′-UTR and the TSS-to-end region, with H, M and L stability given 3, 2 and 1 points, respectively. For example, human SARS-CoV has M stability in the 5′-UTR and H stability in the TSS-to-end region, so its virulence is of grade 5 (i.e. 2 + 3). Grades of virulence are interpreted as follows: 6 = very high, 5 = high, 4 = medium, 3 = low and 2 = very low. MERS: Middle East respiratory syndrome. SARS: severe acute respiratory syndrome. PEDV: porcine epidemic diarrhea virus. The viruses listed in the table were selected to represent different subgenera of coronaviruses.
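The scoring scheme in this footnote can be expressed compactly. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it assumes that the stability index is the free energy expressed as a percentage of the lowest free energy in the same column, rounded to the nearest integer, and that an index of 79 counts as low (as the table marks it). The function names are hypothetical.

```python
def stability_index(free_energy, lowest_free_energy):
    """Free energy as a percentage of the lowest (most negative) value.

    Rounding to the nearest integer is an assumption.
    """
    return round(100 * free_energy / lowest_free_energy)


def stability_class(index):
    """H for an index of 90 or more, M for 80 to 89, L otherwise."""
    if index >= 90:
        return "H"
    if index >= 80:
        return "M"
    return "L"


# Points per stability class, as given in the table footnote.
POINTS = {"H": 3, "M": 2, "L": 1}


def virulence_grade(utr_index, tss_to_end_index):
    """Sum of the points for the 5'-UTR and TSS-to-end stability classes."""
    return POINTS[stability_class(utr_index)] + POINTS[stability_class(tss_to_end_index)]
```

With the values listed for human SARS-CoV-2 (−62.4 against a column minimum of −72.8, and −7,860.4 against −8,897.8), this gives indices of 86 and 88, both medium, and therefore grade 4, in agreement with the table.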
Figure 5. Correlation between RNA stability and nucleotide composition in the viral genome. Shown here are correlation coefficients of RNA stability with (a) content of nucleotide(s), (b) content of dinucleotides and (c) odds ratios of dinucleotides in the genomes of 24 coronaviruses. Only the TSS-to-end region of the viral genome is included in the analysis (TSS: translation start site). * and ** above a data bar indicate that the correlation reaches a significant (0.01 < p < 0.05) or extremely significant (p < 0.01) level, respectively. Detailed data for the correlation analysis are listed in rows 67 to 103 of Table S2.
Number of silent mutations of each dinucleotide at various codon positions.
| Dinucleotide | Codon positions 1 and 2 | Codon positions 2 and 3 | Codon positions 3 and 1 | Total |
|---|---|---|---|---|
| GT | 0 | 8 | 30 | 38 |
| GA | 0 | 8 | 30 | 38 |
| GC | 0 | 8 | 30 | 38 |
| GG | 0 | 7 | 30 | 37 |
| AG | 4 | 4 | 32 | 40 |
| AT | 0 | 4 | 32 | 36 |
| AC | 0 | 4 | 32 | 36 |
| AA | 0 | 5 | 32 | 37 |
| TG | 1 | 7 | 33 | 41 |
| TA | 1 | 9 | 33 | 43 |
| TC | 2 | 9 | 33 | 44 |
| TT | 2 | 9 | 33 | 44 |
| CG | 2 | 12 | 33 | 47 |
| CA | 0 | 12 | 33 | 45 |
| CT | 2 | 12 | 33 | 47 |
| CC | 0 | 12 | 33 | 45 |
When a dinucleotide is located at codon positions 1 and 2 or at codon positions 2 and 3, there are four codons that contain this dinucleotide. Theoretically, they can be mutated into any of the remaining 60 codons. When a dinucleotide is located at codon positions 3 and 1, only the nucleotide at position 3 is considered to mutate. There are 16 codons containing this nucleotide at position 3. Theoretically, they can be mutated into any of the remaining 48 codons. Therefore, the values in the table are the numbers of silent mutations out of 60, 60 and 48 possible mutations for a dinucleotide at codon positions 1 and 2, 2 and 3, or 3 and 1, respectively.
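The counting rule can be illustrated with a short enumeration. This is a sketch under assumptions, not the authors' script: it uses the standard genetic code, reads "the remaining 60 codons" as the 4 codons × 15 alternative dinucleotides at the same two positions, and treats a change from one stop codon to another as silent. Entries that involve stop codons may therefore differ from the table depending on how such changes were handled. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def silent_within_codon(dinuc, start):
    """Silent mutations of a dinucleotide at positions 1-2 (start=0) or 2-3 (start=1).

    Each of the 4 codons containing `dinuc` is changed to each of the
    15 other dinucleotides at the same positions (60 mutations in total).
    """
    count = 0
    for free_base in BASES:
        codon = dinuc + free_base if start == 0 else free_base + dinuc
        for other in DINUCLEOTIDES:
            if other == dinuc:
                continue
            mutant = other + free_base if start == 0 else free_base + other
            if CODE[mutant] == CODE[codon]:
                count += 1
    return count


def silent_third_position(base):
    """Silent mutations when only codon position 3 mutates (48 mutations in total)."""
    count = 0
    for first, second in product(BASES, repeat=2):
        codon = first + second + base
        for other in BASES:
            if other != base and CODE[first + second + other] == CODE[codon]:
                count += 1
    return count
```

For CG, the enumeration gives 2 silent mutations at positions 1 and 2, 12 at positions 2 and 3, and 33 for C at position 3, matching the CG row of the table.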