Seán I O'Donoghue, Andrea Schafferhans, Neblina Sikta, Christian Stolte, Sandeep Kaur, Bosco K Ho, Stuart Anderson, James B Procter, Christian Dallago, Nicola Bordin, Matt Adcock, Burkhard Rost.
Abstract
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is (and is not) known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
Keywords: COVID-19; SARS-CoV-2; bioinformatics; data visualization; structural biology
Year: 2021 PMID: 34519429 PMCID: PMC8438690 DOI: 10.15252/msb.202010079
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 13.068
Figure 1. SARS‐CoV‐2 structural coverage map
Integrated visual summary showing 79 distinct states found in 2,060 structural models derived by systematically comparing the SARS‐CoV‐2 proteome against all experimentally determined 3D structures. Viral proteins are shown as arrows scaled by sequence length, ordered by genomic location, and divided into three groups: (i) polyprotein 1a (top); (ii) polyprotein 1b (middle); and (iii) virion and accessory proteins (bottom). Above polyprotein 1a and 1b, a ruler indicates residue numbering from polyprotein 1ab; above selected accessory proteins, numbering indicates sequence length. Sequence regions with unknown structure are indicated with dark coloring. Regions that have matching structures are indicated with green coloring and with representative structures positioned below. Dark colored residues on the structure indicate amino acid substitutions, while conserved residues are colored to highlight secondary structure. Below the representative structures, graphs indicate three distinct states revealed in the matching structures: (i) viral protein hijacking of human proteins (gray coloring; Fig 3), (ii) human proteins that the viral protein may mimic (orange; Fig 2), or (iii) binding to antibodies, HLA, inhibitory peptides, RNA, or to other viral proteins (green; Fig 4). Bindings between viral proteins form two disjoint teams: (i) NSP7, NSP8, NSP9, NSP12, and NSP13 (parts of the viral replication and translation complex); and (ii) NSP10, NSP14, and NSP16. Nine viral proteins (called “suspects”) had no structural evidence for interactions with other viral proteins, or for mimicry or hijacking of human proteins; seven of these (NSP2, NSP6, matrix glycoprotein, ORF6, ORF7b, ORF9c, and ORF10) are structurally dark proteins, i.e., have no significant similarity to any experimentally determined 3D structure. Representative structures for each state shown are given in Table 1; the complete list of matching structures is provided in the accompanying datasets.
Made using Aquaria and Keynote.
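The distinction between covered (green) and dark regions in the coverage map can be illustrated with a small calculation. The following is a minimal sketch, not the authors' implementation: it assumes each matching structure is reduced to a 1-based, inclusive residue range, and the function name `covered_fraction` and the example ranges are hypothetical.

```python
def covered_fraction(length, regions):
    """Fraction of residues 1..length covered by at least one region.

    regions: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping regions are counted only once.
    """
    covered = 0
    last_end = 0  # highest residue position already counted
    for start, end in sorted(regions):
        start = max(start, last_end + 1, 1)  # skip residues already counted
        end = min(end, length)               # clip to the sequence length
        if start <= end:
            covered += end - start + 1
            last_end = end
    return covered / length


# Hypothetical 100-residue protein with three matching regions (two overlap).
example_regions = [(1, 30), (20, 50), (80, 90)]
fraction = covered_fraction(100, example_regions)
print(f"covered: {fraction:.0%}, dark: {1 - fraction:.0%}")
```

Applied to a whole proteome, the same union-of-intervals idea would yield the kind of covered versus dark percentages reported in the abstract, although the exact procedure used in the paper is not described in this section.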
Figure 2. Viral mimicry of human proteins
Lists domain topology for seven human proteins potentially mimicked by the macro domain of NSP3. The list was ranked by alignment significance (HHblits E‐value) and includes a summary of potentially mimicked functions. Each macro domain is numbered to indicate its CATH functional family. The top‐ranked proteins (MACROD2 and MACROD1) remove ADPr from proteins, reversing the effect of ADPr writers (PARP14 and PARP9), and affecting ADPr readers (GDAP2, MACROH2A1, and MACROH2A2). For PARP9 and PARP14, the table indicates the best alignment of the NSP3 sequence onto the available structures corresponding to each macro domain.
Lists three human helicase proteins potentially mimicked by NSP13. The list was ranked by alignment significance (HHblits E‐value) and includes a summary of potentially mimicked functions. We found stronger evidence for mimicry by NSP13 than by NSP3. For each human protein, the 3D structure is shown with Aquaria’s default coloring scheme, in this case indicating the region of alignment with NSP13 (Fig 1, Dataset EV4). For UPF1 (https://aquaria.ws/P0DTD1/2wjv), the structure coloring reveals that UPF2 binds to a region not matched by NSP13, suggesting that NSP13 may not bind UPF2. For IGHMBP2 (https://aquaria.ws/P0DTD1/4b3g), the structure coloring reveals that RNA binds to the region matched by NSP13, suggesting that NSP13 binds RNA. For AQR (https://aquaria.ws/P0DTD1/6jyt), the structure coloring reveals that the spliceosome binds to a region not matched by NSP13, suggesting that NSP13 may not bind the spliceosome.
Data information: Made using Aquaria, Photoshop, and Keynote.
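The Aquaria links in these captions appear to follow the pattern `https://aquaria.ws/<UniProt accession>/<PDB ID>`. The sketch below builds and parses such links; the pattern is inferred only from the URLs quoted in this document, not from any documented Aquaria API, and the helper names `aquaria_url` and `parse_aquaria_url` are hypothetical.

```python
import re

AQUARIA_BASE = "https://aquaria.ws"

# Assumption: <accession> is uppercase alphanumeric and <PDB ID> is four
# lowercase alphanumeric characters, as in the URLs quoted in the captions.
_AQUARIA_PATTERN = re.compile(r"https://aquaria\.ws/([A-Z0-9]+)/([0-9a-z]{4})")


def aquaria_url(uniprot_accession, pdb_id):
    """Build a link of the form seen in the captions."""
    return f"{AQUARIA_BASE}/{uniprot_accession}/{pdb_id.lower()}"


def parse_aquaria_url(url):
    """Return (accession, pdb_id), or None if the URL has a different shape."""
    match = _AQUARIA_PATTERN.fullmatch(url)
    return match.groups() if match else None


# The three helicase examples from this caption (NSP13 is part of P0DTD1).
for name, pdb_id in [("UPF1", "2wjv"), ("IGHMBP2", "4b3g"), ("AQR", "6jyt")]:
    print(name, aquaria_url("P0DTD1", pdb_id))
```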
Figure 3. Viral hijacking of human proteins
Summarizes all structural evidence for viral hijacking; collectively, the regions shown cover 7% of the SARS‐CoV‐2 proteome. The structures are shown with Aquaria’s default coloring scheme which, for viral proteins, highlights secondary structure as well as any amino acid substitutions from the SARS‐CoV‐2 sequence; human proteins and RNA are rendered as semi‐transparent.
Hijacking of ribosomal complexes is shown in 14 matching structures, most of which were determined using the full‐length sequence of NSP1 (180 residues); however, only a ~36‐residue fragment was ordered enough to appear in the structures. The coloring scheme highlights the location of this fragment within the ribosome (https://aquaria.ws/P0DTC1/6zlw), revealing how NSP1 blocks host mRNA translation (Thoms et al, 2020).
Hijacking of PAIP1 (a.k.a. “PABP‐interacting protein 1”) is shown in only one matching structure, which was determined using the SUD‐N region of NSP3 from SARS‐CoV (Nikulin et al, 2021). The structure (https://aquaria.ws/P0DTC1/6yxj) shows strong overall sequence similarity to SARS‐CoV‐2 and reveals that, of the 15 residues contacting PAIP1, 13 are identical in SARS‐CoV‐2.
Hijacking of ubiquitin‐like (Ubl) domains is shown in 10 matching structures, of which only one showed simultaneous binding to two Ubl domains (shown above). The structure (https://aquaria.ws/P0DTC1/5e6j) was determined using NSP3 from SARS‐CoV (Békés et al, 2016), which has strong overall sequence similarity to SARS‐CoV‐2; of the 31 residues contacting UBB or UBC, 27 are identical in SARS‐CoV‐2.
Hijacking of ACE2 is shown in 46 matching structures; however, only two also show binding to SLC6A19 (Yan et al, 2020). In the structure shown here (https://aquaria.ws/P0DTC2/6m17), spike glycoprotein does not directly bind to SLC6A19.
Hijacking of NRP1 (a.k.a. neuropilin‐1) is shown in only one matching structure (https://aquaria.ws/P0DTC2/7jjc), which includes only a three‐residue region from spike glycoprotein (Daly et al, 2020).
Hijacking of MPP5 (a.k.a. PALS1, “protein associated with Lin‐7 1”) is shown in only one matching structure (https://aquaria.ws/P0DTC4/7m4r), which includes only a nine‐residue region from envelope protein (Liu & Chai, 2021).
Hijacking of TOMM70 (a.k.a. “translocase of outer mitochondrial membrane protein 70”) is shown in only one matching structure (https://aquaria.ws/P0DTD2/7kdt), which includes only a 38‐residue region from ORF9b protein (Gordon et al, 2020).
Data information: Made using Aquaria and Keynote.
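The contact-residue counts quoted in this caption can be expressed as percentages to make the two SARS‐CoV‐derived interfaces easier to compare. This is simple arithmetic on the numbers given above; the dictionary layout and the function name `percent_identical` are illustrative only.

```python
def percent_identical(identical, total):
    """Percentage of contact residues that are identical in SARS-CoV-2."""
    return 100 * identical / total


# (identical residues, contacting residues), as stated in the caption.
contacts = {
    "PAIP1": (13, 15),
    "UBB or UBC": (27, 31),
}

for partner, (identical, total) in contacts.items():
    pct = percent_identical(identical, total)
    print(f"{partner}: {identical}/{total} contact residues identical ({pct:.1f}%)")
```

Both interfaces come out at roughly 87% identity among contacting residues.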
Figure 4. Viral protein interaction teams
For each team, an assembly matrix is used to show all observed heteromeric states. For both teams, only a small subset of all combinatorially possible heteromeric states was observed; by highlighting possible transitions between observed states, the matrices suggest the order in which heteromers may assemble. Collectively, the regions shown cover 29% of the SARS‐CoV‐2 proteome.
In team 1, NSP7 (red), NSP8 (cyan), NSP9 (purple), NSP12 (yellow), and NSP13 (green) assemble into the replication and translation complex (RTC). NSP12 alone (top row, left) can replicate RNA (top row, right). NSP8 binds NSP12 at two sites: (i) at the NSP12 core (2nd row, left); and (ii) via NSP7‐mediated cooperative interactions with NSP12 (4th row, center), greatly enhancing RNA replication (4th row, right). NSP7 + NSP8 alone form a dimer in most structures (4th row, left), but can also form a tetramer (e.g., https://aquaria.ws/P0DTD1/7jlt) or hexadecamer (e.g., https://aquaria.ws/P0DTD1/2ahm). Replication is also enhanced by NSP13 (5th row, right) and NSP9 (bottom row, right).
In team 2, NSP10 monomers (2nd row) can either self‐assemble into a spherical dodecamer (top), dimerize with NSP14 (bottom row), or dimerize with NSP16 (third row). The NSP10 + NSP16 heterodimer was also seen bound to a three‐residue RNA segment (fourth row). Residue coloring is used to show that NSP10, NSP14, and NSP16 appear to interact competitively, as noted in previous studies. In the structures shown, nine NSP10 residues (shown in red on the monomer) formed common intermolecular contacts in all three oligomers. Within each oligomer, most NSP10 residues involved in intermolecular contacts were shared (red) with at least one other oligomer; very few NSP10 residues formed contacts specific to that oligomer (blue).
Data information: For brevity, we omitted NSP9, NSP13, and NSP16 monomers, as well as the interaction between NSP4 and NSP5 (see Table 1). Made using Aquaria and Keynote.
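The idea behind the assembly matrix, comparing observed heteromers against all combinatorially possible ones and looking for single-step transitions, can be sketched as follows. This is a simplification under stated assumptions: states are reduced to sets of protein names read from the state names in Table 1, while RNA, stoichiometry, and binding sites are ignored. The function names are hypothetical and this is not the authors' procedure.

```python
from itertools import combinations


def possible_heteromers(members):
    """All sets of two or more distinct team members."""
    return [
        frozenset(combo)
        for size in range(2, len(members) + 1)
        for combo in combinations(members, size)
    ]


def single_step_transitions(states):
    """Pairs (smaller, larger) where larger adds exactly one member."""
    return [
        (a, b) for a in states for b in states
        if a < b and len(b) - len(a) == 1
    ]


team1 = ["NSP7", "NSP8", "NSP9", "NSP12", "NSP13"]

# Assumption: protein-only compositions taken from Table 1 state names.
observed_team1 = [
    frozenset({"NSP7", "NSP8"}),
    frozenset({"NSP8", "NSP12"}),
    frozenset({"NSP7", "NSP8", "NSP12"}),
    frozenset({"NSP7", "NSP8", "NSP12", "NSP13"}),
    frozenset({"NSP7", "NSP8", "NSP9", "NSP12", "NSP13"}),
]

print(len(possible_heteromers(team1)), "possible compositions")
print(len(observed_team1), "observed compositions")
for smaller, larger in single_step_transitions(observed_team1):
    print(sorted(smaller), "->", sorted(larger), "adds", sorted(larger - smaller))
```

Under these assumptions, only 5 of the 26 possible protein compositions for team 1 are observed, consistent with the caption's statement that a small subset of possible states was seen.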
Table 1. SARS‐CoV‐2 minimal models used in Fig 1.
| State | 3D Model | Identity | E‐value | Source |
|---|---|---|---|---|
| NSP1 (NTR) | | 100% | – | SARS‐CoV‐2 (Semper |
| NSP1 (CTR) hijacks 40S, 43S, and 80S | | 100% | – | SARS‐CoV‐2 (Thoms |
| NSP3 (Ubl1) | | 77% | 10^-21 | SARS‐CoV (Serrano |
| NSP3 (macro) | | 100% | – | SARS‐CoV‐2 ( |
| NSP3 (macro) mimics GDAP2 | | 20% | 10^-15 | Human ( |
| NSP3 (macro) mimics MACROD1 | | 27% | 10^-160 | Human (Chen |
| NSP3 (macro) mimics MACROD2 | | 28% | 10^-16 | Human (Jankevicius |
| NSP3 (macro) mimics MACROH2A1 | | 19% | 10^-13 | Human (Kustatscher |
| NSP3 (macro) mimics MACROH2A2 | | 18% | 10^-12 | Human ( |
| NSP3 (macro) mimics PARP9 | | 23% | 10^-10 | Human ( |
| NSP3 (macro) mimics PARP14 | | 29% | 10^-12 | Human (Forst |
| NSP3 (SUD‐N) + PAIP1 | | 69% | 10^-21 | SARS‐CoV ( |
| NSP3 (SUD‐M) | | 80% | 10^-23 | SARS‐CoV (Chatterjee |
| NSP3 (SUD‐C) | | 78% | 10^-34 | SARS‐CoV (Johnson |
| NSP3 (PL‐Pro) | | 100% | – | SARS‐CoV‐2 ( |
| NSP3 (PL‐Pro) hijacks ISG15 | | 100% | – | SARS‐CoV‐2 (Klemm |
| NSP3 (PL‐Pro) hijacks UBA52 | | 31% | 10^-31 | MERS‐CoV (Bailey‐Elkin |
| NSP3 (PL‐Pro) hijacks UBB | | 30% | 10^-30 | MERS‐CoV (Lei & Hilgenfeld, |
| NSP3 (PL‐Pro) hijacks UBC | | 83% | 10^-30 | SARS‐CoV (Ratia |
| NSP3 (PL‐Pro) hijacks UBB + UBC | | 82% | 10^-30 | SARS‐CoV (Békés |
| NSP3 (PL‐Pro) binds inhibitory peptides | | 99% | – | SARS‐CoV‐2 (Rut |
| NSP3 (NAB) | | 82% | 10^-19 | SARS‐CoV (Serrano |
| NSP4 | | 59% | 10^-37 | MHV‐A59 (Xu |
| NSP4 binds NSP5 | | 99% | – | SARS‐CoV‐2 ( |
| NSP5 (3CL‐Pro) | | 100% | – | SARS‐CoV‐2 ( |
| NSP5 binds inhibitory peptides | | 100% | – | SARS‐CoV‐2 (Jin |
| NSP7 | | 98% | 10^-33 | SARS‐CoV (Johnson |
| NSP7 binds HLA | | 100% | – | SARS‐CoV‐2 ( |
| NSP7 binds NSP8 | | 100% | – | SARS‐CoV‐2 ( |
| NSP7 binds NSP8 + NSP12 | | 100% | – | SARS‐CoV‐2 (Gao |
| NSP7 binds NSP8 + NSP12 + vRNA | | 100% | – | SARS‐CoV‐2 (Naydenova |
| NSP7 binds NSP8 + NSP12 + vRNA + NSP13 | | 95% | – | SARS‐CoV‐2 (Chen |
| NSP7 binds NSP8 + NSP12 + vRNA + NSP13 + NSP9 | | 100% | – | SARS‐CoV‐2 (Yan |
| NSP8 | | 100% | – | SARS‐CoV‐2 ( |
| NSP8 binds NSP12 | | 97% | 10^-76 | SARS‐CoV (Kirchdoerfer & Ward, |
| NSP8 binds HLA | | 100% | – | SARS‐CoV‐2 ( |
| NSP9 | | 98% | – | SARS‐CoV‐2 ( |
| NSP10 | | 96% | 10^-72 | SARS‐CoV (Su |
| NSP10 binds NSP14 | | 95% | 10^-68 | SARS‐CoV (Ma |
| NSP10 binds NSP16 | | 99% | – | SARS‐CoV‐2 ( |
| NSP12 | | 100% | – | SARS‐CoV‐2 (Hillen |
| NSP12 binds vRNA | | 15% | 10^-14 | FMDV (Ferrer‐Orta |
| NSP13 | | 100% | 10^-63 | SARS‐CoV (Jia |
| NSP13 mimics AQR | | 20% | 10^-27 | Human (De |
| NSP13 mimics AQR + spliceosome | | 20% | 10^-27 | Human (Zhang |
| NSP13 mimics UPF1 | | 24% | 10^-53 | Human (Clerici |
| NSP13 mimics UPF1 + UPF2 | | 24% | 10^-53 | Human (Clerici |
| NSP13 mimics IGHMBP2 | | 25% | 10^-32 | Human (Lim |
| NSP13 mimics IGHMBP2 + hRNA | | 26% | 10^-31 | Human (Lim |
| NSP13 binds vRNA | | 21% | 10^-19 | Arterivirus (Deng |
| NSP13 binds HLA | | 100% | – | SARS‐CoV‐2 ( |
| NSP14 | | 95% | 10^-142 | SARS‐CoV (Ferron |
| NSP15 | | 97% | – | SARS‐CoV‐2 (Kim |
| NSP15 binds vRNA | | 97% | – | SARS‐CoV‐2 (Kim |
| NSP16 | | 99% | – | SARS‐CoV‐2 (Rosas‐Lemus |
| NSP16 mimics CMTR1 | | 14% | 10^-11 | Human (Smietanski |
| NSP16 mimics MRM2 | | 22% | 10^-11 | Human ( |
| NSP16 mimics CMTR1 + hRNA | | 14% | 10^-11 | Human (Smietanski |
| NSP16 binds vRNA + NSP10 | | 100% | – | SARS‐CoV‐2 ( |
| Spike glycoprotein | | 97% | – | SARS‐CoV‐2 (Walls |
| Spike glycoprotein hijacks ACE2 | | 100% | – | SARS‐CoV‐2 (Guo |
| Spike glycoprotein hijacks ACE2 + SLC6A19 | | 100% | – | SARS‐CoV‐2 (Yan |
| Spike glycoprotein hijacks NRP1 | | 100% | – | SARS‐CoV‐2 (Daly |
| Spike glycoprotein binds antibodies | | 100% | – | SARS‐CoV‐2 (Yuan |
| Spike glycoprotein binds inhibitory peptides | | 88% | 10^-33 | SARS‐CoV (Xia |
| ORF3a | | 100% | – | SARS‐CoV‐2 (preprint: Kern |
| ORF3a binds APOA1 | | 100% | – | SARS‐CoV‐2 (preprint: Kern |
| Envelope protein | | 85% | 10^-35 | SARS‐CoV (Surya |
| Envelope protein hijacks MPP5 | | 100% | – | SARS‐CoV‐2 ( |
| ORF7a | | 100% | – | SARS‐CoV‐2 ( |
| ORF8 | | 99% | – | SARS‐CoV‐2 (Flower |
| Nucleocapsid protein (NTD) | | 96% | – | SARS‐CoV‐2 ( |
| Nucleocapsid protein (NTD) binds antibody | | 100% | – | SARS‐CoV‐2 (Daly |
| Nucleocapsid protein (NTD) binds HLA | | 100% | – | SARS‐CoV‐2 (Szeto |
| Nucleocapsid protein (NTD) binds vRNA | | 96% | – | SARS‐CoV‐2 (Dinesh |
| Nucleocapsid protein (CTD) | | 98% | – | SARS‐CoV‐2 (Zinzula |
| Nucleocapsid protein (CTD) binds HLA | | 100% | – | SARS‐CoV‐2 (Szeto |
| ORF9b | | 100% | – | SARS‐CoV‐2 ( |
| ORF9b hijacks TOMM70 | | 100% | – | SARS‐CoV‐2 (Gordon |
This table lists 79 distinct protein structural states found in this work, each with details on one representative minimal model, indicated using an Aquaria identifier. The indicated models correspond to those used to generate representative images and hyperlinks in the online version of Fig 1.
In cases showing potential mimicry, the identity scores and E‐values indicate similarity between the SARS‐CoV‐2 viral protein and a human protein.
Indicates the organism used to derive the corresponding PDB structure as well as the publication associated with the PDB entry; where no publication is yet available, the DOI for the dataset is given. Organism names are abbreviated as follows: FMDV (foot‐and‐mouth disease virus); MERS‐CoV (Middle East respiratory syndrome coronavirus); MHV‐A59 (mouse hepatitis virus A59); SARS‐CoV (severe acute respiratory syndrome coronavirus); SARS‐CoV‐2 (severe acute respiratory syndrome coronavirus 2).
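The state names in Table 1 follow a regular wording ("hijacks", "mimics", "binds") that matches the three kinds of state distinguished in Fig 1. As a rough sketch, and assuming this wording is applied consistently, the states can be grouped by keyword; the function `classify_state` and the category labels are hypothetical, and only a few sample rows are used here.

```python
from collections import Counter


def classify_state(state):
    """Assign a Table 1 state name to a category based on its verb."""
    if " hijacks " in state:
        return "hijacking"
    if " mimics " in state:
        return "mimicry"
    if " binds " in state:
        return "binding"
    return "other"  # e.g. a protein or domain listed on its own


# A few state names copied from Table 1.
sample_states = [
    "NSP1 (CTR) hijacks 40S, 43S, and 80S",
    "Spike glycoprotein hijacks ACE2",
    "NSP13 mimics UPF1",
    "NSP10 binds NSP14",
    "NSP9",
]

counts = Counter(classify_state(state) for state in sample_states)
print(dict(counts))
```

States whose names use other wording fall into the "other" category and would need manual review.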
| Resource | Reference or source | Identifier or version number |
|---|---|---|
| Polyprotein 1a | UniProt | P0DTC1 |
| Polyprotein 1ab | UniProt | P0DTD1 |
| Spike glycoprotein | UniProt | P0DTC2 |
| ORF3a protein | UniProt | P0DTC3 |
| Envelope protein | UniProt | P0DTC4 |
| Matrix glycoprotein | UniProt | P0DTC5 |
| ORF6 protein | UniProt | P0DTC6 |
| ORF7a protein | UniProt | P0DTC7 |
| ORF7b protein | UniProt | P0DTD8 |
| ORF8 protein | UniProt | P0DTC8 |
| Nucleocapsid protein | UniProt | P0DTC9 |
| ORF9b protein | UniProt | P0DTD2 |
| ORF9c protein | UniProt | P0DTD3 |
| ORF10 protein | UniProt | A0A663DJA2 |
| HH‐suite | | 3.3.0 (ac765987bd) |
| HMMER | | 3.3 |
| cath‐resolve‐hits | | v0.16.2‐0‐ga9f860c |
| CATH API | | 4.3 |
| PredictProtein API | | |
| SNAP2 API | | |
| PSSH2 tools | | |
| Jolecule | | |
| Aquaria | | |
| UniRef30 | | 2020_03 |
| PDB | | 27 March, 2021 |
| CATH‐Gene3D FunFamsHMM library | | 4.3 |
| CATH nr40 | | 11 September, 2019 |
| PSSH2 | | 27 June, 2020 |
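When reusing the UniProt accessions listed above, a quick format check can catch transcription errors before any lookup. The sketch below uses the accession-number pattern published in the UniProt help pages; the dictionary simply restates the table, and the helper name `is_valid_accession` is hypothetical. The check validates format only and cannot confirm that an accession exists or maps to the named protein.

```python
import re

# Accession format as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Accessions restated from the resource table above.
sars_cov_2_proteins = {
    "Polyprotein 1a": "P0DTC1",
    "Polyprotein 1ab": "P0DTD1",
    "Spike glycoprotein": "P0DTC2",
    "ORF3a protein": "P0DTC3",
    "Envelope protein": "P0DTC4",
    "Matrix glycoprotein": "P0DTC5",
    "ORF6 protein": "P0DTC6",
    "ORF7a protein": "P0DTC7",
    "ORF7b protein": "P0DTD8",
    "ORF8 protein": "P0DTC8",
    "Nucleocapsid protein": "P0DTC9",
    "ORF9b protein": "P0DTD2",
    "ORF9c protein": "P0DTD3",
    "ORF10 protein": "A0A663DJA2",
}


def is_valid_accession(accession):
    """True if the string has the shape of a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(accession) is not None


invalid = [name for name, acc in sars_cov_2_proteins.items()
           if not is_valid_accession(acc)]
print("malformed accessions:", invalid)
```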