Vaishali P Waman, Neeladri Sen, Mihaly Varadi, Antoine Daina, Shoshana J Wodak, Vincent Zoete, Sameer Velankar, Christine Orengo.
Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It poses a worldwide challenge to human health, as no effective treatment is currently available. The severity of the disease has led to unprecedented collaborative initiatives to find therapeutic solutions against COVID-19. Studies applying structure-based drug design to COVID-19 are numerous and show good promise. Structural biology provides key insights into the 3D structures of SARS-CoV-2 proteins and into critical residues and mutations implicated in infectivity, molecular recognition and susceptibility across a broad range of host species. A detailed understanding of viral proteins and of their complexes with host receptors and candidate epitopes or lead compounds is key to structure-guided therapeutic design. Since the discovery of SARS-CoV-2, structures of several of its proteins have been determined experimentally at unprecedented speed and deposited in the Protein Data Bank. In addition, specialized structural bioinformatics tools and resources have been developed for theoretical models, protein dynamics data from computer simulations, the impact of variants and mutations, and molecular therapeutics. Here, we provide an overview of ongoing efforts to develop structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and of structure-based studies on understanding various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding the differences between SARS-CoV-2 and SARS-CoV that lead to the increased infectivity of SARS-CoV-2, (ii) deciphering key residues in SARS-CoV-2 proteins involved in receptor and antibody recognition, (iii) analyzing variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
Keywords: SARS-CoV-2; mutation/variation; protein 3D structures; structural bioinformatics; structure prediction; therapeutics
Year: 2021 PMID: 33348379 PMCID: PMC7799268 DOI: 10.1093/bib/bbaa362
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
wwPDB consortium members providing access to experimentally determined macromolecular structures and EMPIAR providing access to raw EM data
| Data resource | Landing page | Example SARS-CoV-2 entry |
|---|---|---|
| Protein Data Bank in Europe (PDBe) | | |
| Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) | | |
| Protein Data Bank Japan (PDBj) | | |
| Electron Microscopy Data Bank (EMDB) | | |
| Electron Microscopy Public Image Archive (EMPIAR) | | |
Figure 1. Structural information on SARS-CoV-2, from single proteins to organelles. Structural information on SARS-CoV-2 ranges from high-resolution single-protein structures in the PDB to lower-resolution EM maps in EMDB and image data of organelles and cells in EMPIAR.
Figure 2. Aggregated overview of structural data for the SARS-CoV-2 3C-like proteinase. Collating all the available structural information at the protein level can yield valuable insights. This is demonstrated in the aggregated view of the SARS-CoV-2 3C-like proteinase at PDBe-KB (https://pdbekb.org/proteins/PRO_0000449623), which collates data from over 160 PDB entries (panel A). The residue-level interactions between the protein and over 140 distinct small molecules can be visualized both in a 2D sequence feature viewer (panel B) and in a molecular graphics viewer (panel C), which superposes every small molecule and highlights residues that are consistently involved in binding to various small molecules.
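The aggregation idea in Figure 2, finding residues that are consistently involved in binding across many entries, can be illustrated with a minimal Python sketch. It assumes per-entry sets of ligand-contacting residues are already available; the entry names, residue numbers and the 0.6 threshold are hypothetical, and this is not PDBe-KB's actual data model or method.

```python
from collections import Counter

# Hypothetical input: for each PDB entry, the set of protein residue numbers
# observed in contact with a bound small molecule.
ContactSets = dict[str, set[int]]


def binding_frequency(contacts: ContactSets) -> dict[int, float]:
    """Fraction of entries in which each residue contacts a ligand."""
    counts = Counter()
    for residues in contacts.values():
        counts.update(residues)  # a set, so each residue counts once per entry
    n_entries = len(contacts)
    return {res: counts[res] / n_entries for res in sorted(counts)}


def consistent_binders(contacts: ContactSets, threshold: float = 0.5) -> list[int]:
    """Residues that contact ligands in at least `threshold` of the entries."""
    return [res for res, f in binding_frequency(contacts).items() if f >= threshold]


if __name__ == "__main__":
    # Made-up contact sets for three hypothetical 3CLpro entries.
    example = {
        "entry_1": {41, 142, 145, 166},
        "entry_2": {41, 145, 163, 166},
        "entry_3": {145, 166, 189},
    }
    print(consistent_binders(example, threshold=0.6))  # [41, 145, 166]
```

With real data, the contact sets would come from an interaction analysis of each deposited entry; the threshold controls how "consistent" a binding residue must be.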
Data resources providing overviews of the available structural information on SARS-CoV-2 proteins
| Service name | Access URL |
|---|---|
| PDBe COVID-19 Portal | |
| RCSB PDB COVID-19 Page | |
| PDBj COVID-19 Page | |
| BMRB COVID-19 Page | |
| EMBL-EBI COVID-19 Data Portal | |
| SWISS-MODEL COVID-19 Page | |
| Coronavirus3D | |
| Complex Portal COVID-19 Page | |
| InterPro COVID-19 Page | |
| UniProt COVID-19 Entry Pages | |
| 3DBioNotes-WS COVID-19 Page | |
| PDBsum COVID-19 Page | |
| Proteopedia COVID-19 Page | |
Figure 3. Screenshots from the SWISS-MODEL SARS-CoV-2 web resource: (A) model of the viral NSP14; (B) model of the host interactor procollagen galactosyltransferase. The models are shown together with their quality estimates, template alignments, etc.
Figure 4. Screenshots from the Aquaria web resource: (A) homepage showing all SARS-CoV-2 proteins; (B) RNA polymerase complex colored by UniProt chain features.
Details of the various resources containing structures/models of SARS-CoV-2 proteins
| Name, URL and resource leader | Presence of experimental structures | Presence of theoretical structures | Information on human proteins | Type of modeling technique used | Brief description of the modeling technique | Criteria used to decide the model quality | Model refinement technique used | Additional comments |
|---|---|---|---|---|---|---|---|---|
| SWISS-MODEL Repository | No | Yes | Yes | Homology modeling | SWISS-MODEL is a fully automated protein structure homology-modeling server that uses template-based modeling to build 3D models of proteins as well as homo- and heteromeric complexes. | The model quality estimation tool QMEAN is used to estimate model confidence. | | Contains a manually curated set of 3D homology models and experimental structures of SARS-CoV-2 proteins and complexes and of host proteins. Host proteins are annotated with information from InterPro, STRING, UniProt, variant data, metal-binding sites, etc. |
| Aquaria | Yes | Yes | No | Homology modeling | Homology models were built by searching for sequence homologs of protein regions with a machine learning-based search method. | | | Contains additional information from CATH, UniProt, SNAP2 and PredictProtein, as well as on subcellular localization, function, interacting partners, similar proteins, etc. |
| Protein Structure Modeling for SARS-CoV-2 at Kiharalab | Yes | Yes | No | | Inter-residue distances, H-bonds and angles were first predicted with a deep neural network; Rosetta was then used to model the protein structure in … | | MD and short coarse-grained simulations | |
| Coronavirus3D | Yes | Yes | No | Homology modeling | MODELLER/SWISS-MODEL equivalent | Sequence similarity | | Also contains variant data. |
| Structural genomics and interactomics of SARS-CoV-2 novel coronavirus | Yes | Yes | No | Homology modeling | MODELLER | | | Also contains functional site mapping and a model of the viral interactome. |
| CoV3D | Yes | Yes | No | Glycan modeling on the spike | N-glycan modeling and refinement using Rosetta | | | |
| COVID-19 molecular structure and therapeutic hub | Yes | Yes | Yes | | | | | Hub collating crystal structures, models, docking results, MD studies and therapeutics from various groups. |
| covid-19.bioreproducibility | Yes | No | Yes | Modeling in electron density using Coot; model refinement in Refmac | | MolProbity; ligand validation in Twilight; expert evaluation | Maximum-likelihood structure-factor refinement | The resource contains refined experimental structures. |
| Coronavirus Structural Taskforce | Yes | No | No | Modeling in the electron density map | | MolProbity-based validation | | The resource contains refined experimental structures. |
| GOL COVID-19 | Yes | No | No | Modeling in the electron density map | | | Buster | The resource contains refined experimental structures. |
| | Yes | No | No | Modeling in the electron density map | | | ISOLDE | The resource contains refined experimental structures. |
| C-I-TASSER on COVID-19 | No | Yes | No | Deep learning, threading and … | Integration of deep-learning contact maps with I-TASSER fragment-assembly simulations | C-score (based on threading scores and the convergence of simulation decoys) | Fragment-guided MD simulations | Contains a structure of a host-viral complex. |
| AlphaFold | No | Yes | No | Deep learning | A neural network was trained to predict distances between pairs of residues. These predictions were used to construct a potential of mean force describing the shape of the protein, which was then optimized by a simple gradient descent algorithm to generate structures. | | | Best-performing method in the CASP13 experiment. |
| Protein structure models for COVID-19 proteins | No | Yes | No | | trRosetta builds the protein structure by direct energy minimization with restrained Rosetta. The restraints include inter-residue distance and orientation distributions predicted by a deep residual neural network. | Estimated TM-score | Rosetta fast relax | |
| SARS-CoV-2 Protein Structure Models | No | Yes | No | | | | MD-based protein structure refinement | The resource contains refined theoretical structures. |
| SARS-CoV-2 EVcouplings: mutations, function and structure | No | Yes | Yes | Evolutionary couplings contact prediction | Direct interactions were predicted from co-evolution in natural sequences and used to predict 3D contacts, from which the fold was solved. The interaction data were also used to predict the effect of mutations on sequence fitness. | 3D structure predictions are compared with experimental structures of SARS-CoV-2 proteins and/or potential homologs. | | Contains predicted contacts between residues and the predicted effects of mutations; modeling of the proteins and their complexes is planned. Also contains a comparison of SARS-CoV-2 with SARS-CoV and the closest bat coronavirus, RaTG13. |
| CASP_Commons | No | Yes | No | | | Various model accuracy estimates, such as ProQ3D, QMEANDisCo, etc. | | CASP initiative for template-free modeling of SARS-CoV-2 proteins that lack templates in the PDB. Contains structure details, comparisons between models and model scores from different scoring schemes. |
| CAPRI Docking | No | Yes | Yes | | | Various established model accuracy estimates | | CAPRI competition to predict models of SARS-CoV-2 complexes. |
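Several resources in the table above assess models with TM-score (for example, the estimated TM-score reported for the trRosetta models). As a sketch of what that score measures, assuming a superposition has already been chosen: TM = (1/L) Σ 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8, where L is the target length and d_i the distance between the i-th aligned residue pair. The 0.5 Å floor for short chains is an assumed convention, and the reference program also searches over superpositions, which this sketch does not.

```python
def tm_d0(target_length: int) -> float:
    """Distance scale d0 (Å) for a target of `target_length` residues."""
    if target_length <= 21:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for a single, already-chosen superposition.

    `distances` are the distances (Å) between aligned residue pairs after
    superposition; unaligned target residues contribute nothing. The full
    TM-score maximises this value over superpositions, not attempted here.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical 100-residue target: 80 residues aligned within 1 Å, 20 unaligned.
    print(round(tm_score([1.0] * 80, 100), 3))
```

Normalizing by the target length, rather than by the number of aligned residues, is what penalizes models that cover only part of the protein.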
Details about repositories for MD simulation data
| Name of resource and group/team | URL | Force field | Description |
|---|---|---|---|
| CHARMM COVID library | | CHARMM, NAMD, GROMACS, Amber, GENESIS and OpenMM | Simulation systems for running MD on COVID-19 proteins with CHARMM, NAMD, GROMACS, Amber, GENESIS and OpenMM. This resource does not contain MD trajectories for the proteins or complexes. The group also developed an all-atom model of the SARS-CoV-2 spike protein with all glycans attached and built a membrane system for spike protein simulations. |
| DE Shaw group | | Variants of Amber, run on the Anton 2 supercomputer | Atomistic MD trajectories of SARS-CoV-2 proteins and their complexes (with other viral or host proteins and with therapeutics), including simulations of 128 FDA-approved drugs with viral targets. |
| The SIRAH-CoV-2 initiative | | SIRAH | Coarse-grained trajectories for all SARS-CoV-2 proteins reported in the PDB |
| COVID-19 molecular structure and therapeutic hub | | Multiple force fields | Hub collating crystal structures, models, docking results, MD studies and therapeutics from various groups |
| BioExcel COVID19 | | Multiple force fields | Hub containing atomistic MD trajectories from different groups |
3D structure based resources focusing on variation/mutation data
| Name of resource | URL and reference | Description | Group/team |
|---|---|---|---|
| Coronavirus3D | | Maps SARS-CoV-2 genomic variants from CNCB onto experimental structures as well as models | Adam Godzik |
| COVID-3D | | Uses both viral variant data (from 45 000 SARS-CoV-2 genomes in GISAID) and population variant data (from a wide range of population variation resources), together with experimental and predicted structures of SARS-CoV-2 proteins, to annotate, analyze and visualize mutations | David Ascher |
| Viral Integrated Structural Evolution Dynamic Database | | Combines predicted models with missense variants (from the gnomAD database), MD simulations and evolutionary mapping to gain insights into the variation and dynamics of the SARS-CoV-2 proteome | Jeremy Prokop |
| SWISS-MODEL | | Provides models of SARS-CoV-2 proteins, annotations from UniProt and site information from dbSNP | Torsten Schwede |
| EVCouplings (SARS-CoV-2: mutations, function and structure) | | For SARS-CoV-2 proteins, provides in silico deep mutational scans, visualization of mutations on 3D structures and structure prediction from coevolution | Debora Marks and Chris Sander |
| EACoV server | | Provides evolutionary analysis of SARS-CoV-2 proteins using the Evolutionary Trace (ET) method. For every protein, it identifies variants (unique, most frequent and all variants) and epitopes with an ET-based approach and maps them onto the corresponding 3D structure. | Olivier Lichtarge |
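Resources such as Coronavirus3D map genomic variants onto protein structures. The first step, converting a genome coordinate into a residue number, can be sketched for the spike gene, assuming the Wuhan-Hu-1 reference genome (GenBank NC_045512.2), in which the S open reading frame spans nucleotides 21563-25384. `spike_residue` is a hypothetical helper, not code from any listed resource; genes within ORF1ab, which involves a ribosomal frameshift, would need separate handling.

```python
# Assumed reference genome: SARS-CoV-2 Wuhan-Hu-1 (GenBank NC_045512.2).
# In that sequence the spike (S) ORF spans nucleotides 21563..25384
# (1-based, inclusive, stop codon included).
S_START = 21563
S_END = 25384


def spike_residue(genome_pos: int) -> tuple[int, int]:
    """Map a 1-based genome position in S to (residue number, position within codon)."""
    if not S_START <= genome_pos <= S_END:
        raise ValueError(f"position {genome_pos} is outside the spike ORF")
    offset = genome_pos - S_START
    # Three nucleotides per codon; residue and codon position are 1-based.
    # Positions in the final codon map to 1274, i.e. the stop codon.
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    print(spike_residue(23403))  # (614, 2): the site of the D614G substitution
```

Once the residue number is known, it can be matched to the residue numbering of an experimental structure or model, which may itself differ from the UniProt numbering and need a separate mapping.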
Structural bioinformatics resources for therapeutics
| Name of resource | URL and reference | Description | Group/team |
|---|---|---|---|
| HADDOCK | | A virtual screening platform to guide drug repurposing; thousands of chemical compounds have been screened against key SARS-CoV-2 proteins | Alexandre Bonvin group |
| Diamond Light Source (COVID-19 Moonshot) | | Diamond has initiated crystallographic fragment screening against the main protease | Diamond group and collaborators |
| Postera.ai | | | PostEra, San Francisco, USA |
| Fragalysis browser | | 3D visualization and analysis of hit fragments from the XChem initiative | XChem group, Diamond, UK |
| FOLDING@HOME | | Volunteer distributed computing power for further computer-aided drug design (CADD) on compounds from the COVID Moonshot initiative | FOLDING@HOME consortium |
| Exscalate4COV | | High-performance computing resources for virtual screening of user-submitted compounds | Exscalate4COV consortium |
| GtoPdb (Guide to PHARMACOLOGY) | | Curated data on 64 ligands associated with SARS-CoV-2 | GtoPdb curation team |
| Chemical Checker-based expansion of drugs | | The group regularly collects suggested COVID-19 drugs from the literature, with different levels of supporting evidence; to date, 307 literature-derived candidates have been identified. The Chemical Checker tool is used to find small molecules with bioactivity and chemical features similar to those of the reported drugs | Patrick Aloy group |
| FMODB | | The database stores fragment molecular orbital (FMO) calculations performed on crystal structures (from the PDB) of several SARS-CoV-2 proteins | FMODD consortium |
| D3Targets-2019-nCoV | | A molecular docking-based web server for predicting drug targets and for virtual screening against COVID-19 | Zhijian Xu and Weiliang Zhu |
| COVID-19 docking server | | A docking server for predicting the binding modes between targets and their ligands, such as peptides, small molecules and antibodies (using the docking tools CoDockPP and AutoDock Vina) | Shan Chang group |
| SARS-CoV-2 data at ChEMBL | | Links to a summary of SARS-CoV-2-related data in ChEMBL | ChEMBL |
| Virus Chemogenomics resource | | Provides 3D structures, antiviral drugs, docking utilities, and gene and protein annotations | Feng lab |
| CoV-AbDab: The Coronavirus Antibody Database | | The first database compiling antibodies known to bind SARS-CoV-2 and other betacoronaviruses (SARS-CoV-1, MERS-CoV); contains data on >380 published or patented antibodies and nanobodies that bind at least one betacoronavirus | The Oxford Protein Informatics Group |
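Several of the resources above look for compounds similar to known candidates. The Chemical Checker uses its own bioactivity signatures; as a simpler, generic stand-in, similarity between binary chemical fingerprints is commonly measured with the Tanimoto coefficient. The sketch below represents fingerprints as hypothetical sets of on-bit indices and assumes no cheminformatics library; the compound names are made up.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(query: set[int], library: dict[str, set[int]], top: int = 3) -> list[tuple[str, float]]:
    """Return the `top` library compounds most similar to `query`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
    library = {"cmpd_1": {1, 4, 9, 12}, "cmpd_2": {2, 3}, "cmpd_3": {1, 4, 9}}
    print(rank_by_similarity({1, 4, 9, 15}, library, top=2))
    # [('cmpd_3', 0.75), ('cmpd_1', 0.6)]
```

Ranking a library against a reported drug in this way gives a list of structurally similar candidates; bioactivity-based signatures, as used by the Chemical Checker, can also capture similarity that chemical fingerprints miss.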
Figure 5. An account of structure-based studies on the spike (S) protein of SARS-CoV-2. Some applications involving the spike protein are illustrated as follows. (I) The crystal structure of the SARS-CoV-2 receptor-binding domain (RBD; purple) in complex with the human ACE2 receptor (gray), depicted using Chimera [PDB ID: 6M0J]. The direct contact residues (red), the residues in the secondary shell (blue) and the key hotspot positions 31 and 353 (circled in orange) have been studied by various groups [6, 7]. (II) The impact of mutations at hotspot residue 353 on the stability of the RBD-ACE2 complex in various hosts (A, human; B, horseshoe bat; C, cat and dog) (source of images (I) and (II), and more details, in Lam et al. [124]). (III) The crystal structure of the RBD (purple) in complex with the human antibody CR3022 (heavy chain, blue; light chain, cyan) (PDB ID: 6W41). (IV) Structure-based design of a prefusion-stabilized spike, the vaccine candidate HexaPro, whose high-resolution cryo-EM structure was solved by Hsieh et al. [18] (source of image: Hsieh et al. [18]).
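The distinction in Figure 5 between direct contact residues and a secondary shell can be illustrated with a distance-based sketch. The 4.0 Å cutoff, the definition of the second shell (receptor residues close to a direct-contact residue but not touching the partner) and the toy coordinates are assumptions for illustration, not the criteria used in the cited studies.

```python
import math

Coord = tuple[float, float, float]
Residues = dict[int, list[Coord]]  # residue number -> atom coordinates in Å


def _min_dist(a: list[Coord], b: list[Coord]) -> float:
    """Smallest Euclidean distance between any atom in `a` and any atom in `b`."""
    return min(math.dist(p, q) for p in a for q in b)


def direct_contacts(receptor: Residues, partner: Residues, cutoff: float = 4.0) -> set[int]:
    """Receptor residues with at least one atom within `cutoff` Å of the partner."""
    partner_atoms = [atom for atoms in partner.values() for atom in atoms]
    if not partner_atoms:
        return set()
    return {res for res, atoms in receptor.items()
            if atoms and _min_dist(atoms, partner_atoms) <= cutoff}


def second_shell(receptor: Residues, partner: Residues, cutoff: float = 4.0) -> set[int]:
    """Receptor residues outside the direct interface but within `cutoff` Å of it (assumed definition)."""
    direct = direct_contacts(receptor, partner, cutoff)
    interface_atoms = [atom for res in direct for atom in receptor[res]]
    if not interface_atoms:
        return set()
    return {res for res, atoms in receptor.items()
            if res not in direct and atoms and _min_dist(atoms, interface_atoms) <= cutoff}


if __name__ == "__main__":
    # Toy one-atom residues on a line; not real ACE2 or RBD coordinates.
    receptor = {1: [(0.0, 0.0, 0.0)], 2: [(3.0, 0.0, 0.0)], 3: [(20.0, 0.0, 0.0)]}
    partner = {10: [(-3.0, 0.0, 0.0)]}
    print(direct_contacts(receptor, partner), second_shell(receptor, partner))  # {1} {2}
```

With real data, the coordinates would come from a parsed complex such as PDB entry 6M0J, with the ACE2 chain as `receptor` and the RBD chain as `partner`.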
Figure 6. Antiviral drugs repurposed against COVID-19 for which 3D structures of the ligand-protein complex have been determined experimentally. Boceprevir and telaprevir, both approved against chronic hepatitis C, inhibit the SARS-CoV-2 main protease (3CLpro) and are being evaluated clinically in different combinations. GC376, a veterinary molecule against feline coronavirus infection, is a prodrug that generates an irreversible nanomolar 3CLpro inhibitor and will probably enter clinical trials. Remdesivir, a drug in late-stage development against Ebola virus, is a strong inhibitor of the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) and has received emergency or conditional authorization for COVID-19 in the USA and Europe. Favipiravir, used against influenza infection, is an RdRp inhibitor under investigation against SARS-CoV-2 infection.
Figure 7. SARS-CoV-2 main protease (3CLpro): the earliest and most advanced structure-based drug discovery routes (PDB entry, resolution, release date). Proteins are displayed as beige cartoons and ligands as colored ball-and-sticks. (A) The first crystal structure of 3CLpro enabled the discovery of disulfiram and carmofur as potential drugs for repurposing through structure-based virtual and high-throughput screening [10], as well as the design of peptidomimetic covalent ligands [159]. (B) An apo crystal structure was used for the structure-based design of covalent ligands [148] and more recently provided a rationale for repurposing a non-covalent small molecule initially developed as a kinase inhibitor. (C) The XChem initiative started by solving an apo structure suitable for soaking experiments, which led to more than 80 co-crystal structures with covalent and non-covalent fragments, most of them located in the active site.