Yanping Zhang, Xiaojie Jin, Haiyan Wang, Yaoyao Miao, Xiaoping Yang, Wenqing Jiang, Bin Yin.
Abstract
During SARS-CoV-2 proliferation, translation of viral RNAs is usually the rate-limiting step. Understanding the molecular details of this step helps uncover the origin and evolution of SARS-CoV-2 and may even help control the pandemic. To date, it is unclear how SARS-CoV-2 competes with host mRNAs for ribosome binding and efficient translation. We retrieved the coding sequences (CDSs) of all human genes and SARS-CoV-2 genes and systematically profiled the GC content and folding energy of each CDS. Because fixed and polymorphic mutations exist in both the SARS-CoV-2 and human genomes, all algorithms and analyses were applied to both the pre-mutation and post-mutation versions of the sequences. In SARS-CoV-2 but not in human, the 5-prime end of the CDS had lower GC content and less RNA structure than the 3-prime part, which is favorable for ribosome binding and efficient translation initiation. Globally, the fixed and polymorphic mutations in SARS-CoV-2 have created an even lower GC content at the 5-prime end of the CDS. In contrast, no similar pattern was observed for the fixed and polymorphic mutations in the human genome. Compared with human RNAs, SARS-CoV-2 RNAs have less RNA structure at the 5-prime end and are thus more favorable for fast translation initiation. The fixed and polymorphic mutations in SARS-CoV-2 further amplify this advantage. This might serve as a strategy for SARS-CoV-2 to adapt to the human host.
Keywords: CDS; Human genome; Mutation; RNA structure; SARS-CoV-2; Translation
Year: 2021 PMID: 34655422 PMCID: PMC8520108 DOI: 10.1007/s13353-021-00665-w
Source DB: PubMed Journal: J Appl Genet ISSN: 1234-1983 Impact factor: 3.240
Fig. 1. Molecular basis of mRNA translation initiation. (A) The absence of RNA structure near the start codon favors efficient translation initiation, whereas strong RNA structure near the start codon usually leads to low initiation efficiency. (B) A G:C base pair is biochemically more stable than an A:U base pair. (C) A local RNA sequence with high GC content leads to strong local structure, and vice versa.
Fig. 2. GC content at the 5-prime end of CDS. Each CDS is divided into 10 equal bins, and the GC content is calculated within each bin. (A) For SARS-CoV-2, the GC% in the first bin (the first 1/10 at the 5-prime end of the CDS) is relatively low. Error bars show the S.E.M. across all genes. (B) In SARS-CoV-2, the GC% in the first bin is significantly lower than in the other parts of the CDS. The p-value was calculated with a t-test. (C) For human, the GC% in the first bin is very high. Error bars show the S.E.M. across all human genes. (D) In human, the GC% in the first bin is significantly higher than in the other parts of the CDS. The p-value was calculated with a t-test.
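The binning described in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `gc_by_bin`, the toy sequence, and the rule for placing bin boundaries when the CDS length is not a multiple of 10 are all assumptions made for illustration.

```python
def gc_by_bin(cds: str, n_bins: int = 10) -> list[float]:
    """Return the GC percentage of each of n_bins consecutive bins of a CDS."""
    seq = cds.upper()
    if len(seq) < n_bins:
        raise ValueError("sequence is shorter than the number of bins")
    # Assumed convention: bin i covers [i*L//n, (i+1)*L//n), so bin sizes
    # differ by at most one base when L is not a multiple of n.
    edges = [i * len(seq) // n_bins for i in range(n_bins + 1)]
    percentages = []
    for start, end in zip(edges, edges[1:]):
        chunk = seq[start:end]
        gc = chunk.count("G") + chunk.count("C")
        percentages.append(100.0 * gc / len(chunk))
    return percentages


# Toy sequence (not a real gene): AT-rich 5-prime half, GC-rich 3-prime half.
toy_cds = "ATGAAATTTA" * 5 + "GCCGGCGGCC" * 5
print(gc_by_bin(toy_cds))
```

For the toy sequence, the first five bins each contain one G in ten bases and the last five bins contain only G and C, so the profile rises from the 5-prime to the 3-prime end, which is the direction of the pattern the caption reports for SARS-CoV-2.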
Fig. 3. Rank of the minimum free energy (MFE) of each bin of CDS. "1" represents the lowest MFE and "10" represents the highest MFE. (A) For SARS-CoV-2, the MFE rank of the first bin (the first 1/10 at the 5-prime end of the CDS) is relatively high. Error bars show the S.E.M. across all genes. (B) In SARS-CoV-2, the MFE rank of the first bin is significantly higher than those of the other parts of the CDS. The p-value was calculated with a t-test. (C) For human, the MFE rank of the first bin is very low. Error bars show the S.E.M. across all human genes. (D) In human, the MFE rank of the first bin is significantly lower than those of the other parts of the CDS. The p-value was calculated with a t-test.
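The ranking convention in the Fig. 3 caption ("1" for the lowest MFE, "10" for the highest) can be sketched as follows. The MFE values below are hypothetical numbers chosen for illustration, not results from the paper, and the folding step itself is not shown because the record does not name the folding tool. The function name `rank_bins` and the tie-breaking rule are also assumptions.

```python
def rank_bins(mfe_values: list[float]) -> list[int]:
    """Rank values so that 1 marks the lowest MFE and len(values) the highest."""
    # Sort bin indices by MFE; tied bins keep their 5-prime to 3-prime order
    # (an assumption, the caption does not say how ties are handled).
    order = sorted(range(len(mfe_values)), key=lambda i: mfe_values[i])
    ranks = [0] * len(mfe_values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical MFE values (kcal/mol) for the 10 bins of one CDS.
toy_mfe = [-2.1, -8.4, -7.9, -9.3, -6.5, -10.2, -8.8, -7.1, -9.9, -6.0]
print(rank_bins(toy_mfe))  # → [10, 5, 6, 3, 8, 1, 4, 7, 2, 9]
```

Because a less negative MFE means weaker predicted structure, the first bin in this toy example receives rank 10, the situation the caption describes as a "relatively high" rank for SARS-CoV-2.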
Fig. 4. Fixed and polymorphic mutations in SARS-CoV-2 or human could change the folding MFE of RNAs. Each CDS is divided into 10 equal bins as previously described. Comparisons are made between the 1st bin and the other parts. The MFE is calculated by folding the 100 bp around the focal mutation (from −50 to +50 bp). (A) For the fixed mutations from RaTG13 to SARS-CoV-2, four mutation types are the most abundant (denoted as "from > to"): A > G, G > A, T > C, and C > T. The total number of each type of mutation is counted within each bin. Red lines (G > A and C > T) are the mutations that decrease the GC content; they are highly abundant in the 1st bin of the CDS. Blue lines (A > G and T > C) are the mutations that increase the GC content; they are depleted in the 1st bin of the CDS. (B) For each of the fixed mutations in SARS-CoV-2, we calculated the MFE before and after mutation, so that N mutations give N pairs of observations. The p-value was calculated with a t-test. (C) For each of the polymorphic mutations in SARS-CoV-2, we calculated the MFE before and after mutation, so that N mutations give N pairs of observations. The sites in the 1st bin of the CDS and those in the remaining parts are compared separately. The p-value was calculated with a t-test. (D) For each of the fixed or polymorphic sites in the human genome, the MFE before and after mutation was calculated. The sites in the 1st bin of the CDS are shown and compared.
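The bookkeeping behind Fig. 4 can be sketched in Python under stated assumptions. The sketch covers two steps: classifying each mutation by its effect on GC content and counting it per bin, and extracting the pre- and post-mutation windows that would be passed to a folding program. All function names, the 0-based coordinates, and the toy mutation list are hypothetical. The window here includes the focal base plus 50 bases on each side (up to 101 nt), which is one reading of the caption's "100 bp ... from −50 to +50".

```python
from bisect import bisect_right
from collections import Counter


def gc_effect(ref: str, alt: str) -> str:
    """Classify a substitution by its effect on GC content."""
    ref_gc, alt_gc = ref in "GC", alt in "GC"
    if ref_gc and not alt_gc:
        return "decrease"  # e.g. G>A, C>T
    if alt_gc and not ref_gc:
        return "increase"  # e.g. A>G, T>C
    return "neutral"


def count_effects_by_bin(mutations, cds_length, n_bins=10):
    """Count GC-decreasing/increasing/neutral mutations in each CDS bin.

    mutations: iterable of (position, ref, alt), position 0-based in the CDS.
    """
    edges = [i * cds_length // n_bins for i in range(n_bins + 1)]
    counts = [Counter() for _ in range(n_bins)]
    for pos, ref, alt in mutations:
        if not 0 <= pos < cds_length:
            raise ValueError(f"position {pos} lies outside the CDS")
        bin_idx = bisect_right(edges, pos) - 1  # bin 0 is the 5-prime bin
        counts[bin_idx][gc_effect(ref, alt)] += 1
    return counts


def pre_post_windows(seq, pos, alt, flank=50):
    """Return (pre, post) windows around pos, clipped at the sequence ends."""
    start, end = max(0, pos - flank), min(len(seq), pos + flank + 1)
    pre = seq[start:end]
    offset = pos - start
    post = pre[:offset] + alt + pre[offset + 1:]
    return pre, post


# Toy mutations on a hypothetical 200-nt CDS (not data from the paper).
toy_mutations = [(3, "G", "A"), (7, "C", "T"), (55, "A", "G"),
                 (120, "T", "C"), (199, "G", "C")]
toy_counts = count_effects_by_bin(toy_mutations, cds_length=200)
print(toy_counts[0])  # first bin: two GC-decreasing mutations
```

Each (pre, post) pair would then be folded to give one paired observation of MFE before and after mutation, matching the "N mutations, N pairs" design described in panels B and C.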
Fig. 5. Comparison of the CDS lengths of ~20,000 human coding genes and 12 SARS-CoV-2 genes. The p-value was obtained with a t-test. The length of each SARS-CoV-2 gene is labeled in the graph.