Lim Heo, Michael Feig.
Abstract
Protein structures are crucial for understanding their biological activities. Since the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there has been an urgent need to understand the biological behavior of the virus and to provide a basis for developing effective therapies. Once the proteome of the virus was determined, some of the protein structures could be determined experimentally, and others were predicted via template-based modeling approaches. However, tertiary structures for several proteins are still not available from experiment, nor could they be accurately predicted by template-based modeling because of a lack of close homolog structures. Previous efforts to predict structures for these proteins include those by DeepMind and the Zhang group via machine learning-based structure prediction methods, i.e., AlphaFold and C-I-TASSER. However, the predicted models vary greatly and have not yet been subjected to refinement. Here, we report new predictions from our in-house structure prediction pipeline. The pipeline takes advantage of inter-residue contact predictions from trRosetta, a machine learning-based method. The predicted models were further improved by applying molecular dynamics simulation-based refinement. We also took the AlphaFold models and refined them by applying the same refinement method. Models based on our structure prediction pipeline and the refined AlphaFold models were analyzed and compared with the C-I-TASSER models. All of our models are available at https://github.com/feiglab/sars-cov-2-proteins.
Year: 2020 PMID: 32511334 PMCID: PMC7239069 DOI: 10.1101/2020.03.25.008904
Source DB: PubMed Journal: bioRxiv
Summary of the modeled proteins and comparisons of predicted residues with other available models
| Protein name | RefSeq | FeigLab | AlphaFold | Zhang |
|---|---|---|---|---|
| nsp2 | YP_009725298.1 | 1–638 | 1–345, 438–638 | 1–638 |
| nsp4 | YP_009725300.1 | 1–500 | 1–489 | 1–500 |
| nsp6 | YP_009725302.1 | 1–290 | 1–278 | 1–290 |
| PL-PRO (nsp3) | YP_009725299.1 | 1260–1945 | 1571–1927 | 1–1945 |
| ORF3a | YP_009724391.1 | 1–275 | 38–233 | 1–275 |
| Membrane glycoprotein | YP_009724393.1 | 1–222 | 11–203 | 1–222 |
| ORF6 | YP_009724394.1 | 1–61 | N/A | 1–61 |
| ORF8 | YP_009724396.1 | 1–121 | N/A | 1–121 |
| ORF10 | YP_009725255.1 | 1–38 | N/A | 1–38 |
| ORF7b | YP_009725296.1 | 1–43 | N/A | N/A |
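The residue ranges in this table can be turned into residue counts to see how much of each protein a given model covers. The following is a minimal sketch, not part of the original work; the function names `parse_ranges`, `count_residues`, and `coverage` are hypothetical, and the only inputs are range strings written as they appear in the table (inclusive ranges, `N/A` for no model).

```python
def parse_ranges(spec):
    """Parse a range string such as '1–345, 438–638' into (start, end) tuples.

    'N/A' (no model available) yields an empty list.
    """
    if spec.strip().upper() == "N/A":
        return []
    ranges = []
    for part in spec.split(","):
        # The table uses en dashes; normalize them to ASCII hyphens first.
        start, end = part.strip().replace("–", "-").split("-")
        ranges.append((int(start), int(end)))
    return ranges


def count_residues(spec):
    """Count residues covered by a range string (ranges are inclusive)."""
    return sum(end - start + 1 for start, end in parse_ranges(spec))


def coverage(model_spec, reference_spec):
    """Fraction of the reference range that the model range covers."""
    return count_residues(model_spec) / count_residues(reference_spec)


if __name__ == "__main__":
    # nsp2: AlphaFold ranges compared with the full-length range 1–638.
    print(count_residues("1–345, 438–638"))
    print(round(coverage("1–345, 438–638", "1–638"), 3))
```

For nsp2, the two AlphaFold segments together cover 546 of the 638 residues, roughly 86% of the sequence.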
Structure change of AlphaFold models upon refinement, measured in Cα-RMSD
| Protein name | Residues | Structure change upon refinement [Å] |
|---|---|---|
| nsp2 | 1–345 | 1.50 |
| nsp2 | 438–638 | 1.86 |
| nsp4 | 1–273 | 1.79 |
| nsp4 | 274–399 | 1.90 |
| nsp4 | 400–489 | 0.80 |
| nsp6 | 1–278 | 2.01 |
| PL-PRO (nsp3) | 1571–1762 | 1.12 |
| PL-PRO (nsp3) | 1763–1927 | 1.13 |
| ORF3a | 38–233 | 2.26 |
| Membrane glycoprotein | 11–203 | 1.27 |
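As an illustration of the quantity reported in this table, the sketch below computes a Cα-RMSD between two sets of Cα coordinates. It assumes the two structures have already been superimposed and that the atoms are listed in the same residue order; the source does not state how the superposition was done, and the function name `ca_rmsd` and the example coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes both lists hold C-alpha coordinates in Angstroms, in the same
    residue order, and already superimposed. No fitting is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom displaced by exactly 1.0 Angstrom along x.
    before = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    after = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
    print(ca_rmsd(before, after))
```

In practice an optimal superposition (for example with the Kabsch algorithm) would normally precede this step; that step is omitted here to keep the sketch dependency-free.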
Figure 1. Protein models for nsp2: FeigLab (A), Zhang group (B), and AlphaFold models and their refined models for residues 1–345 (C) and 438–638 (D). Structures are shown in cartoon representation and colored in rainbow from blue (N-terminal) to red (C-terminal). (C and D) Refined AlphaFold models are shown in rainbow, while AlphaFold models are shown in grey. Significantly changed regions after refinement are indicated by red arrows.
Figure 2. Protein models for nsp4: FeigLab (A), Zhang group (B), and AlphaFold models and their refined models for residues 1–273 (C), 274–399 (D), and 400–489 (E). See Figure 1.
Figure 7. Protein models from FeigLab (A) and Zhang group (B). From left to right, protein models for ORF6, ORF8, ORF10, and ORF7b are shown. See Figure 1.
MolProbity scores for the modeled proteins
| Protein name | FeigLab | AlphaFold | Refined AlphaFold | Zhang |
|---|---|---|---|---|
| nsp2 | 1.34 | 1.68 | 0.55 | 4.41 |
| nsp4 | 1.13 | 1.30 | 0.73 | 4.20 |
| nsp6 | 0.94 | 1.28 | 0.66 | 5.12 |
| PL-PRO (nsp3) | 1.04 | 1.25 | 0.77 | 4.18 |
| ORF3a | 1.20 | 2.40 | 1.09 | 4.10 |
| Membrane glycoprotein | 0.74 | 1.38 | 0.50 | 4.55 |
| ORF6 | 0.50 | N/A | N/A | 4.02 |
| ORF8 | 1.38 | N/A | N/A | 4.45 |
| ORF10 | 1.00 | N/A | N/A | 3.50 |
| ORF7b | 0.50 | N/A | N/A | N/A |
The detailed geometric features underlying the MolProbity score (clash score, rotamer outliers, and residues with backbone torsions in Ramachandran-favored regions) are shown in parentheses. The clash score is the number of clashes per 1,000 atoms. Rotamer outliers and Ramachandran favored are the percentages of residues with rotamer outliers and with favored Ramachandran angles, respectively.
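To show how the three geometric features combine into a single MolProbity score, here is a minimal sketch based on the formula published by the MolProbity developers (Chen et al., 2010). The coefficients are reproduced here as an assumption and should be checked against the MolProbity documentation before use; the function name `molprobity_score` is hypothetical, and the inputs are the clash score (clashes per 1,000 atoms) and the two percentages defined above.

```python
import math


def molprobity_score(clashscore, rotamer_outliers_pct, rama_favored_pct):
    """Combine clash score, rotamer outliers (%), and Ramachandran favored (%).

    Assumed formula (lower is better):
      0.426 * ln(1 + clashscore)
      + 0.33 * ln(1 + max(0, rotamer_outliers_pct - 1))
      + 0.25 * ln(1 + max(0, (100 - rama_favored_pct) - 2))
      + 0.5
    """
    rama_not_favored_pct = 100.0 - rama_favored_pct
    return (
        0.426 * math.log(1.0 + clashscore)
        + 0.33 * math.log(1.0 + max(0.0, rotamer_outliers_pct - 1.0))
        + 0.25 * math.log(1.0 + max(0.0, rama_not_favored_pct - 2.0))
        + 0.5
    )


if __name__ == "__main__":
    # A model with no clashes, no rotamer outliers, and all residues in
    # favored Ramachandran regions receives the minimum possible score.
    print(molprobity_score(0.0, 0.0, 100.0))
```

Under this formula the lowest attainable score is 0.5, which is consistent with the 0.50 entries in the table.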