Martin Bartas1, Adriana Volná2, Christopher A Beaudoin3, Ebbe Toftgaard Poulsen4, Jiří Červeň1, Václav Brázda5, Vladimír Špunda2,6, Tom L Blundell3, Petr Pečinka1.
Abstract
SARS-CoV-2 is a novel positive-sense single-stranded RNA virus from the Coronaviridae family (genus Betacoronavirus), which has been established as the cause of the COVID-19 pandemic. The genome of SARS-CoV-2 is one of the largest among known RNA viruses, comprising at least 26 known protein-coding loci. Studies thus far have outlined the coding capacity of the positive-sense strand of the SARS-CoV-2 genome, which can be used directly for protein translation. However, it has recently been shown that negative-sense viral RNA intermediates, which arise during genome replication of positive-sense viruses, can also code for proteins. No studies have yet explored the potential for negative-sense SARS-CoV-2 RNA intermediates to contain protein-coding loci. Thus, using sequence- and structure-based bioinformatics methodologies, we have investigated the presence and validity of putative negative-sense ORFs (nsORFs) in the SARS-CoV-2 genome. Nine nsORFs were discovered to contain strong eukaryotic translation initiation signals and high codon adaptability scores, and several of the nsORFs were predicted to interact with RNA-binding proteins. Evolutionary conservation analyses indicated that some of the nsORFs are deeply conserved among related coronaviruses. Three-dimensional protein modeling revealed the presence of higher-order folding among all putative SARS-CoV-2 nsORFs, and subsequent structural mimicry analyses suggest similarity of the nsORFs to DNA/RNA-binding proteins and proteins involved in immune signaling pathways. Altogether, these results suggest the potential existence of still undescribed SARS-CoV-2 proteins, which may play an important role in the viral lifecycle and COVID-19 pathogenesis.
Keywords: Kozak sequence; ORFs; RNA; SARS-CoV-2; proteomics; structures
Year: 2022 PMID: 35229157 PMCID: PMC9116216 DOI: 10.1093/bib/bbac045
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 13.994
Sequence position and analysis of nsORFs
| nsORF | Frame | Identity to Kozak rule (A/GXXATGG) | Start (bp) | Finish (bp) | Length (nuc/aa) | ATGpr score | TISrover score | CAI |
|---|---|---|---|---|---|---|---|---|
| nsORF1 | 2 | AXXATGc | 562 | 694 | 132 / 44 | 0.1 | 0.861 | 0.717 |
| nsORF2 | 2 | tXXATGt | 2899 | 3097 | 198 / 66 | 0.06 | 0.008 | 0.712 |
| nsORF3 | 3 | cXXATGa | 5792 | 5975 | 183 / 61 | 0.09 | 0.028 | 0.693 |
| nsORF4 | 2 | tXXATGa | 6466 | 6703 | 237 / 79 | 0.16 | 0.102 | 0.806 |
| nsORF5 | 1 | AXXATGa | 8865 | 9057 | 192 / 64 | 0.09 | 0.194 | 0.654 |
| nsORF6 | 1 | GXXATGt | 10 047 | 10 188 | 141 / 47 | 0.11 | 0.909 | 0.739 |
| nsORF7 | 3 | AXXATGt | 23 414 | 23 714 | 300 / 100 | 0.22 | 0.015 | 0.682 |
| nsORF8 | 1 | cXXATGa | 29 211 | 29 385 | 174 / 58 | 0.14 | 0.232 | 0.705 |
| nsORF9 | 2 | AXXATGG | 29 236 | 29 479 | 243 / 81 | 0.47 | 0.889 | 0.776 |
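
The notation in the Kozak column and the length column can be reproduced with a few lines of Python. This is only a minimal sketch, not the authors' pipeline: the function names `kozak_pattern` and `orf_lengths` are hypothetical, the 7-nt window layout (position -3 to +4 around ATG) is inferred from the table header `A/GXXATGG`, and the length arithmetic simply assumes that the nucleotide length equals Finish minus Start, as the tabulated values suggest.

```python
from typing import Tuple


def kozak_pattern(context: str) -> str:
    """Summarise a 7-nt window (positions -3..+4, ATG at index 3..5).

    Positions that agree with the rule A/GXXATGG are written in upper
    case, mismatches in lower case, mirroring the table notation.
    (Hypothetical helper; the window layout is an assumption.)
    """
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a 7-nt window of the form NNNATGN")
    minus3, plus4 = context[0], context[6]
    left = minus3 if minus3 in "AG" else minus3.lower()   # -3 should be A or G
    right = plus4 if plus4 == "G" else plus4.lower()      # +4 should be G
    return f"{left}XXATG{right}"


def orf_lengths(start: int, finish: int) -> Tuple[int, int]:
    """Return (nucleotides, amino acids), assuming nuc = finish - start."""
    nuc = finish - start
    if nuc <= 0 or nuc % 3 != 0:
        raise ValueError("coordinates do not span a whole number of codons")
    return nuc, nuc // 3


if __name__ == "__main__":
    print(kozak_pattern("ACCATGG"))   # full match at both scored positions
    print(orf_lengths(562, 694))      # nsORF1 coordinates from the table
```

Under these assumptions only nsORF9 (`AXXATGG`) matches the rule at both scored positions, while nsORF1, nsORF5, nsORF6 and nsORF7 match at position -3 only.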
Figure 1. Localization and synteny of nsORFs in SARS-CoV-2 and related coronaviruses. (A) Localization of all identified nsORFs within the SARS-CoV-2 genome. The upper part of the scheme depicts positively encoded ORFs annotated on the NCBI reference SARS-CoV-2 genome, together with additional ORFs described in the literature (indicated in italics): ORF3c, ORF3d, ORF3b (which span particular genomic regions of ORF3a) and ORF9b and ORF9c (which span particular genomic regions of N). (B) Synteny of SARS-CoV-2 nsORFs in representative species of SARS-like coronaviruses. At least two SARS-CoV-2 nsORFs (nsORF2 and nsORF9) are more or less conserved in most of the inspected SARS-CoV-2-related coronaviruses, including the older SARS-CoV Tor 2003. In MERS-CoV and human-CoV-OC43, no homologs of the SARS-CoV-2 nsORFs were found. The synteny plot was constructed using the SimpleSynteny web server [56] and redrawn in this schematic figure.
Figure 2. Structural characterization and similarity comparisons of nsORFs. Residues of putative nsORFs in A (nsORF7), B (nsORF5), C (nsORF1) and E (nsORF9) are depicted with amino acid colouration: red for acidic (D and E), blue for basic (H, R and K), light teal for polar noncharged (S, N, T and Q), dirty violet for hydrophobic (A, V, I, L, M, F, W, P, G and Y), and lime green for cysteine residues. Putative transmembrane protein nsORF7 is shown with the predicted transmembrane region inside a representative cell membrane (A). Extensive O-linked glycosylation of nsORF5 is shown with gray stick configurations (B). The structural similarity of nsORF1 (C) to a homologous protein of T-cell leukemia homeobox protein 2 (which was predicted by RUPEE, but shown without DNA) bound to DNA (PDB: 3a01; both homeobox protein structures are published by [76]) is depicted with nsORF1 in cyan, homeobox protein in blue, and DNA in orange (D). The predicted protein–protein interaction of nsORF9 (E) and interferon alpha/beta receptor 2 using HMI-PRED is compared to the interaction between interferon alpha/beta receptor 2 and interferon omega-1 (PDB: 3se4) with nsORF9 in cyan, interferon omega-1 in blue, and interferon alpha/beta receptor 2 in orange (F).
Structural characterization of nsORFs
| nsORF | Isoelectric Point | NetNGlyc Residue # | NetOGlyc Residue # | # HMI-PRED Hits | Selected RUPEE hit: Superfamily | Selected RUPEE hit: Structure name | Selected RUPEE hit: PDB (chain) | Selected RUPEE hit: TM-score |
|---|---|---|---|---|---|---|---|---|
| nsORF1 | 12 | 0 | Homeodomain-like | T-cell leukemia homeobox protein 2 | 3a03(a) | 0.51 | ||
| Histone-fold | Histone h4 | 4z2m(h) | 0.5 | |||||
| FF domain | Formin-binding protein 3 | 2cqn(a1) | 0.49 | |||||
| nsORF2 | 9.78 | 3,8,12,21 | 16 | UBA-like | Ubiquitin carboxyl-terminal hydrolase 5 | 2dag(a1) | 0.39 | |
| Insulin-like | Insulin-like growth factor II | 1igl(a) | 0.35 | |||||
| nsORF3 | 8.61 | 14,29,29,31,32,33,34,44,50 | 0 | RING/U-box | E3 ubiquitin-protein ligase AMFR | 2lxp(c) | 0.36 | |
| Viral DNA-binding domain | Regulatory protein E2 from human papillomavirus | 1f9f(b1) | 0.35 | |||||
| nsORF4 | 9.46 | 18 | A DNA-binding domain in eukaryotic transcription factors | Mouse MafG | 1k1v(a) | 0.53 | ||
| Phosphoprotein XD domain | RNA polymerase alpha from measles virus | 2k9d(a) | 0.42 | |||||
| nsORF5 | 11 | 25,38 | 0 | YegP-like | nmb1088 protein from | 3bid(f2) | 0.44 | |
| Complement control module/SCR domain | Complement receptor type 1 | 2mcz(a2) | 0.41 | |||||
| Signal recognition particle (SRP) complex | Signal recognition particle 9 kDa protein | 1ry1(c) | 0.4 | |||||
| Scorpion toxin-like | Hongotoxin 1 | 1hly(a) | 0.36 | |||||
| nsORF6 | 6 | 0 | WW domain | NEDD4-like E3 ubiquitin-protein ligase WWP1 | 2op7(a) | 0.34 | ||
| Immunoglobulin | Obscurin | 2edf(a1) | 0.31 | |||||
| nsORF7 | 7.71 | 47 | PX domain | Sorting nexin-17 | 3fog(a1) | 0.46 | ||
| Histone-fold | Histone h4 | 3nqu(b) | 0.45 | |||||
| nsORF8 | 7.87 | 0 | Bowman-Birk inhibitor, BBI | Bowman–Birk type proteinase inhibitor | 2iln(i) | 0.38 | ||
| Complement control module/SCR domain | Complement control protein from vaccinia virus | 1rid(b3) | 0.34 | |||||
| nsORF9 | 9.22 | 51 | 21 | GAT-like domain | ADP-ribosylation factor-binding protein GGA1 | 1x79(a) | 0.65 | |
| MIT domain | Vacuolar protein sorting-associating protein 4B | 2jqh(a) | 0.56 | |||||
| BAG domain | BAG-family molecular chaperone regulator-4 | 1m62(a1) | 0.56 | |||||
| tRNA-binding arm | | 1lrz(a) | 0.54 | |||||
| S15/NS1 RNA-binding domain | Nonstructural protein 1 from influenza virus | 1ns1(a) | 0.53 | |||||
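
To make the TM-score column easier to read, the top RUPEE hit for each nsORF can be screened against a cut-off. The sketch below is illustrative only: the 0.5 threshold is a commonly used rule of thumb for "likely the same fold" and is an assumption here, not a value stated in the source, and the names `RupeeHit`, `TOP_HITS` and `same_fold_candidates` are hypothetical. The scores are copied from the first hit listed for each nsORF in the table.

```python
from typing import List, NamedTuple


class RupeeHit(NamedTuple):
    nsorf: str
    superfamily: str
    tm_score: float


# First (highest-scoring) hit listed for each nsORF in the table.
TOP_HITS: List[RupeeHit] = [
    RupeeHit("nsORF1", "Homeodomain-like", 0.51),
    RupeeHit("nsORF2", "UBA-like", 0.39),
    RupeeHit("nsORF3", "RING/U-box", 0.36),
    RupeeHit("nsORF4", "A DNA-binding domain in eukaryotic transcription factors", 0.53),
    RupeeHit("nsORF5", "YegP-like", 0.44),
    RupeeHit("nsORF6", "WW domain", 0.34),
    RupeeHit("nsORF7", "PX domain", 0.46),
    RupeeHit("nsORF8", "Bowman-Birk inhibitor, BBI", 0.38),
    RupeeHit("nsORF9", "GAT-like domain", 0.65),
]


def same_fold_candidates(hits: List[RupeeHit], threshold: float = 0.5) -> List[RupeeHit]:
    """Keep hits whose TM-score reaches the (assumed) same-fold cut-off."""
    return [hit for hit in hits if hit.tm_score >= threshold]


if __name__ == "__main__":
    for hit in same_fold_candidates(TOP_HITS):
        print(f"{hit.nsorf}: {hit.superfamily} (TM-score {hit.tm_score})")
```

With this assumed cut-off, only the top hits of nsORF1, nsORF4 and nsORF9 pass. The remaining nsORFs have best scores between 0.34 and 0.46, so their structural similarities should be read more cautiously.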