Minfei Ma1, Yanqing Yang1, Leyun Wu1, Liping Zhou1, Yulong Shi1, Jiaxin Han2, Zhijian Xu3, Weiliang Zhu4.
Abstract
There are 7 known human pathogenic coronaviruses: HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, MERS-CoV, SARS-CoV and SARS-CoV-2. While SARS-CoV-2 is currently causing a severe epidemic, experts believe that new pathogenic coronaviruses will emerge in the future. Therefore, developing broad-spectrum anti-coronavirus drugs is of great significance. In this study, we performed protein sequence and three-dimensional structure analyses of all 20 virus-encoded proteins across all 7 coronaviruses, with the aim of identifying highly conserved proteins and binding sites for developing pan-coronavirus drugs. We found that nsp5, nsp10, nsp12, nsp13, nsp14, and nsp16 are highly conserved both in protein sequence (average identity percentage higher than 52%, average amino acid conservation score higher than 5.2) and in binding pockets (average amino acid conservation score higher than 5.8). We also compared these 6 proteins with all human proteins and found that each has a similarity of less than 25%, indicating that drugs targeting the 6 proteins should interfere little with human protein function. Accordingly, we suggest that nsp5, nsp10, nsp12, nsp13, nsp14, and nsp16 are potential targets for pan-coronavirus drug development.
Keywords: Broad-spectrum drugs; Drug design; Medicinal chemistry; Pharmaceutical informatics; Target discovery
Year: 2022 PMID: 35364304 PMCID: PMC8957316 DOI: 10.1016/j.compbiomed.2022.105455
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 6.698
Fig. 1 Genomic sequence and protein structures of SARS-CoV-2. (A) Genome organization of the full SARS-CoV-2 sequence (GenBank: OL518896.1). (B) Genomic organization of SARS-CoV-2 ORF1a and structures of the ORF1a protein products (nsp1-nsp11). (C) Genomic organization of SARS-CoV-2 ORF1b and structures of the ORF1b protein products (nsp12-nsp16). (D) Structures of the SARS-CoV-2 structural proteins. Proteins with experimentally determined structures are marked with their PDB ID; structures of proteins without experimental structures were predicted with AlphaFold v2.0.
Information on the non-structural and structural proteins of the 7 human pathogenic coronaviruses.a
| Protein | Alternative name | ORF | Length range (a.a.) | HCoV-229E UniProt ID | HCoV-229E PDB ID | HCoV-OC43 UniProt ID | HCoV-OC43 PDB ID | SARS-CoV UniProt ID | SARS-CoV PDB ID | HCoV-NL63 UniProt ID | HCoV-NL63 PDB ID | HCoV-HKU1 UniProt ID | HCoV-HKU1 PDB ID | MERS-CoV UniProt ID | MERS-CoV PDB ID | SARS-CoV-2 UniProt ID | SARS-CoV-2 PDB ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-structural proteins | | | | | | | | | | | | | | | | | |
| nsp1 | Leader protein | ORF1a | 110–246 | P0C6X1 | | P0C6X6 | | P0C6X7 | 2HSX | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | 7K3N |
| nsp2 | | ORF1a | 587–788 | P0C6X1 | | P0C6X6 | | P0C6X7 | | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | 7MSW |
| nsp3 | Papain-like proteinase | ORF1a | 1564–1979 | P0C6X1 | | P0C6X6 | | P0C6X7 | 5TL7 | P0C6X5 | | P0C6X4 | | K9N7C7 | 4RNA | P0DTD1 | |
| nsp4 | | ORF1a | 477–508 | P0C6X1 | | P0C6X6 | | P0C6X7 | | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | |
| nsp5 | 3C-like proteinase | ORF1a | 302–311 | P0C6X1 | 2ZU2 | P0C6X6 | | P0C6X7 | | P0C6X5 | 7E6R | P0C6X4 | 3D23 | K9N7C7 | 4RSP | P0DTD1 | |
| nsp6 | | ORF1a | 279–294 | P0C6X1 | | P0C6X6 | | P0C6X7 | | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | |
| nsp7 | | ORF1a | 83–92 | P0C6X1 | | P0C6X6 | | P0C6X7 | 2KYS | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | 6YHU |
| nsp8 | | ORF1a | 194–201 | P0C6X1 | | P0C6X6 | | P0C6X7 | | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | |
| nsp9 | | ORF1a | 109–114 | P0C6X1 | 2J97 | P0C6X6 | | P0C6X7 | 1UW7 | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | 6W9Q |
| nsp10 | Growth factor-like peptide | ORF1a | 135–141 | P0C6X1 | | P0C6X6 | | P0C6X7 | 3R24 | P0C6X5 | | P0C6X4 | | K9N7C7 | 5YN5 | P0DTD1 | 6W4H |
| nsp11 | | ORF1a | 13–17 | P0C6U2 | | P0C6U7 | | P0C6U8 | | P0C6U6 | | P0C6U5 | | K9N638 | | P0DTD1 | |
| nsp12 | RNA-directed RNA polymerase | ORF1b | 927–947 | P0C6X1 | | P0C6X6 | | P0C6X7 | 6NUR | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | |
| nsp13 | Helicase | ORF1b | 597–611 | P0C6X1 | | P0C6X6 | | P0C6X7 | 6JYT | P0C6X5 | | P0C6X4 | | K9N7C7 | 5WWP | P0DTD1 | |
| nsp14 | Proofreading exoribonuclease | ORF1b | 518–535 | P0C6X1 | | P0C6X6 | | P0C6X7 | 5C8S | P0C6X5 | | P0C6X4 | | K9N7C7 | | P0DTD1 | 7N0B |
| nsp15 | Uridylate-specific endoribonuclease | ORF1b | 343–375 | P0C6X1 | 4S1T | P0C6X6 | | P0C6X7 | 2H85 | P0C6X5 | | P0C6X4 | | K9N7C7 | 5YVD | P0DTD1 | 6X4I |
| nsp16 | 2′-O-methyltransferase | ORF1b | 298–303 | P0C6X1 | | P0C6X6 | | P0C6X7 | | P0C6X5 | | P0C6X4 | | K9N7C7 | 5YN5 | P0DTD1 | |
| S protein | | ORF2 | 1173–1356 | | 7CYD | | 6OHW | | 6ACD | | 7KIP | | 5I08 | K9N5Q8 | 5X5F | A0A679G9E9 | 6VXX |
| ORF4 | 67–84 | S5YAG7 | A0A166ZLT5 | A0A6C0QFP9 | |||||||||||||
| M protein | | ORF5 | 219–230 | | | | | | | | | | | K9N7A1 | | A0A6V7AL93 | |
| N protein | | ORF9a | 377–448 | | | | | | | | | | | K9N4V7 | | A0A6C0T6Z7 | |
An underlined PDB ID indicates that the structure includes a resolved ligand.
The conservation ranking result of the 20 proteins among the 7 human coronaviruses.a
| Order (identity) | Protein | Average identity percentage | Variance | Order (conservation) | Protein | Average amino acid conservation score | Variance |
|---|---|---|---|---|---|---|---|
| 1 | nsp13 | 67.443% | 0.012 | 1 | nsp13 | 5.800 | 10.483 |
| 2 | nsp12 | 66.107% | 0.012 | 2 | nsp16 | 5.648 | 10.235 |
| 3 | nsp16 | 63.623% | 0.011 | 3 | nsp12 | 5.579 | 9.471 |
| 4 | nsp14 | 59.489% | 0.013 | 4 | nsp14 | 5.440 | 8.706 |
| 5 | nsp10 | 57.715% | 0.015 | 5 | nsp11 | 5.385 | 6.391 |
| 6 | nsp5 | 52.181% | 0.019 | 6 | nsp5 | 5.288 | 8.368 |
| 7 | nsp9 | 51.273% | 0.020 | 7 | nsp15 | 5.277 | 8.628 |
| 8 | nsp15 | 51.086% | 0.015 | 8 | nsp7 | 5.217 | 7.134 |
| 9 | nsp7 | 50.476% | 0.028 | 9 | nsp10 | 5.209 | 8.554 |
| 10 | nsp11 | 49.625% | 0.031 | 10 | nsp9 | 5.168 | 8.140 |
| 11 | nsp8 | 41.812% | 0.045 | 11 | M protein | 5.090 | 6.055 |
| 12 | M protein | 40.653% | 0.026 | 12 | nsp8 | 5.076 | 5.545 |
| 13 | nsp4 | 39.339% | 0.025 | 13 | nsp4 | 5.074 | 5.545 |
| 14 | N protein | 36.773% | 0.023 | 14 | nsp6 | 5.024 | 4.348 |
| 15 | S protein | 34.602% | 0.028 | 15 | N protein | 5.022 | 4.863 |
| 16 | nsp6 | 34.503% | 0.027 | 16 | nsp1 | 5.022 | 2.822 |
| 17 | nsp3 | 31.174% | 0.020 | 17 | nsp3 | 5.017 | 3.365 |
| 18 | E protein | 29.884% | 0.031 | 18 | E protein | 5.013 | 3.154 |
| 19 | nsp1 | 23.391% | 0.031 | 19 | S protein | 5.009 | 4.322 |
| 20 | nsp2 | 22.641% | 0.017 | 20 | nsp2 | 4.970 | 2.358 |
The ranking result based on the average of sequence identity percentages is displayed on the left, while the ranking result based on the average of each amino acid's conservation score is displayed on the right.
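The text here does not spell out how the identity percentages were computed, so the following is only a minimal sketch of one common convention: pairwise identity over aligned columns, averaged over all virus pairs. The toy sequences, the function names, the choice to skip columns where both sequences have a gap, and the use of population variance are all assumptions made for illustration. The variances in the table look consistent with identities expressed as fractions rather than percentages, which is how the sketch computes them.

```python
from itertools import combinations
from statistics import mean, pvariance


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned sequences.

    Columns where both sequences carry a gap ('-') are ignored (assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # nothing to compare in this column
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def identity_summary(alignment: dict[str, str]) -> tuple[float, float]:
    """Mean and population variance of identity over all sequence pairs."""
    values = [pairwise_identity(a, b) for a, b in combinations(alignment.values(), 2)]
    return mean(values), pvariance(values)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real coronavirus sequences.
    toy = {"virus_A": "MKT-LLV", "virus_B": "MKTALLV", "virus_C": "MRT-LIV"}
    avg, var = identity_summary(toy)
    print(f"average identity = {avg:.3%}, variance = {var:.3f}")
```

With 7 viruses there are 21 pairs per protein, so the average and variance for each protein would summarize 21 pairwise values under this convention.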
Fig. 2 Conservation analysis of coronavirus proteins. The 3D structures of (A) nsp5, (B) nsp10, (C) nsp12, (D) nsp13, (E) nsp14 and (F) nsp16 are shown as cartoon models (pink indicates variable residues, blue indicates conserved residues). The potential ligand binding pockets are displayed as dots. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Pocket conservation ranking results of the 18 proteins among the 7 human coronaviruses.
| Order | Protein Name | Average of Protein Pocket Amino Acid Conservation Scores | Variance of Protein Pocket Amino Acid Conservation Scores |
|---|---|---|---|
| 1 | nsp16 | 6.638 | 8.679 |
| 2 | nsp14 | 6.333 | 7.944 |
| 3 | M protein | 6.294 | 4.590 |
| 4 | nsp13 | 6.110 | 10.149 |
| 5 | nsp1 | 6.000 | 2.625 |
| 6 | nsp5 | 5.975 | 8.124 |
| 7 | nsp12 | 5.958 | 8.895 |
| 8 | nsp10 | 5.875 | 7.026 |
| 9 | S protein | 5.815 | 4.062 |
| 10 | N protein | 5.584 | 5.126 |
| 11 | nsp3 | 5.571 | 3.959 |
| 12 | nsp4 | 5.562 | 4.837 |
| 13 | nsp9 | 5.500 | 7.625 |
| 14 | nsp15 | 5.338 | 7.764 |
| 15 | nsp8 | 5.227 | 4.903 |
| 16 | nsp2 | 5.218 | 2.507 |
| 17 | nsp7 | 4.905 | 6.086 |
| 18 | nsp6 | 4.676 | 3.705 |
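The abstract quotes three cut-offs for the selected targets: average identity above 52%, average sequence conservation score above 5.2, and average pocket conservation score above 5.8. As a check on the arithmetic, the sketch below applies those cut-offs to the averages listed in the two ranking tables. The function name `select_targets` and the rule that a protein without a pocket score is excluded are assumptions for illustration, not the authors' stated procedure.

```python
# Average values copied from the sequence and pocket ranking tables.
IDENTITY = {
    "nsp13": 67.443, "nsp12": 66.107, "nsp16": 63.623, "nsp14": 59.489,
    "nsp10": 57.715, "nsp5": 52.181, "nsp9": 51.273, "nsp15": 51.086,
    "nsp7": 50.476, "nsp11": 49.625, "nsp8": 41.812, "M protein": 40.653,
    "nsp4": 39.339, "N protein": 36.773, "S protein": 34.602, "nsp6": 34.503,
    "nsp3": 31.174, "E protein": 29.884, "nsp1": 23.391, "nsp2": 22.641,
}
CONSERVATION = {
    "nsp13": 5.800, "nsp16": 5.648, "nsp12": 5.579, "nsp14": 5.440,
    "nsp11": 5.385, "nsp5": 5.288, "nsp15": 5.277, "nsp7": 5.217,
    "nsp10": 5.209, "nsp9": 5.168, "M protein": 5.090, "nsp8": 5.076,
    "nsp4": 5.074, "nsp6": 5.024, "N protein": 5.022, "nsp1": 5.022,
    "nsp3": 5.017, "E protein": 5.013, "S protein": 5.009, "nsp2": 4.970,
}
POCKET = {
    "nsp16": 6.638, "nsp14": 6.333, "M protein": 6.294, "nsp13": 6.110,
    "nsp1": 6.000, "nsp5": 5.975, "nsp12": 5.958, "nsp10": 5.875,
    "S protein": 5.815, "N protein": 5.584, "nsp3": 5.571, "nsp4": 5.562,
    "nsp9": 5.500, "nsp15": 5.338, "nsp8": 5.227, "nsp2": 5.218,
    "nsp7": 4.905, "nsp6": 4.676,
}


def select_targets(identity, conservation, pocket,
                   min_identity=52.0, min_conservation=5.2, min_pocket=5.8):
    """Return proteins above all three cut-offs, ordered by average identity."""
    selected = [
        name for name in identity
        if identity[name] > min_identity
        and conservation.get(name, float("-inf")) > min_conservation
        # Proteins without a pocket score are excluded (assumed rule).
        and pocket.get(name, float("-inf")) > min_pocket
    ]
    return sorted(selected, key=identity.get, reverse=True)


if __name__ == "__main__":
    print(select_targets(IDENTITY, CONSERVATION, POCKET))
    # → ['nsp13', 'nsp12', 'nsp16', 'nsp14', 'nsp10', 'nsp5']
```

The identity cut-off alone already narrows the list to these six proteins; the two conservation cut-offs then confirm them rather than removing any.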
Fig. 3 Docking scores (kcal/mol) of SARS-CoV-2 ligands against the homologous proteins of the different coronaviruses. The 2D structures show the FDA-approved SARS-CoV-2 nsp5 inhibitor Nirmatrelvir and the nsp12 inhibitor Remdesivir.
Fig. 4 Ratios of the 20 amino acids that form the pockets of coronavirus (A) nsp5, (B) nsp10, (C) nsp12, (D) nsp13, (E) nsp14 and (F) nsp16. Amino acid ratios of HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1, MERS-CoV and SARS-CoV-2 are shown in rose pink, pale orange, light mustard, olive green, light teal, sky blue and pale violet, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Information on the three human proteins with the smallest expect value (E value) in the BLAST results for SARS-CoV-2 nsp5 (3C-like proteinase), nsp10 (Growth factor-like peptide), nsp12 (RNA-directed RNA polymerase), nsp13 (Helicase), nsp14 (Proofreading exoribonuclease) and nsp16 (2′-O-methyltransferase).
| Protein | Human-derived protein description | Total Score | Query Cover | E Value | Per. Similarity |
|---|---|---|---|---|---|
| nsp5 | Cadherin-12 isoform 3 [Homo sapiens] | 26.6 | 29% | 125 | 20.64% |
| nsp5 | Cadherin-12 isoform 2 precursor [Homo sapiens] | 26.2 | 29% | 135 | 21.84% |
| nsp5 | Cadherin-12 isoform 1 preproprotein [Homo sapiens] | 26.2 | 29% | 138 | 21.58% |
| nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X4 [Homo sapiens] | 28.5 | 10% | 8.6 | 22.48% |
| nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X17 [Homo sapiens] | 28.5 | 10% | 8.7 | 22.48% |
| nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X11 [Homo sapiens] | 28.5 | 10% | 8.7 | 22.48% |
| nsp12 | WASH complex subunit 2C isoform 25 [Homo sapiens] | 29.6 | 9% | 45 | 20.86% |
| nsp12 | WASH complex subunit 2A isoform X14 [Homo sapiens] | 29.6 | 4% | 46 | 20.86% |
| nsp12 | WASH complex subunit 2A isoform X16 [Homo sapiens] | 29.6 | 4% | 46 | 20.86% |
| nsp13 | Protein ZGRF1 isoform X10 [Homo sapiens] | 38.1 | 34% | 0.073 | 22.55% |
| nsp13 | Protein ZGRF1 isoform X9 [Homo sapiens] | 37.7 | 34% | 0.11 | 22.55% |
| nsp13 | Protein ZGRF1 isoform X8 [Homo sapiens] | 37.4 | 34% | 0.12 | 22.55% |
| nsp14 | Alstrom syndrome protein 1 isoform 2 [Homo sapiens] | 28.9 | 5% | 55 | 19.79% |
| nsp14 | Alstrom syndrome protein 1 isoform 1 [Homo sapiens] | 28.9 | 5% | 55 | 19.79% |
| nsp14 | Betaine--homocysteine S-methyltransferase 1 [Homo sapiens] | 26.6 | 3% | 206 | 12.87% |
| nsp16 | Potassium voltage-gated channel subfamily C member 2 isoform X4 [Homo sapiens] | 24.3 | 6% | 514 | 25.00% |
| nsp16 | Sphingomyelin phosphodiesterase 3 [Homo sapiens] | 24.3 | 27% | 525 | 24.01% |
| nsp16 | Sphingomyelin phosphodiesterase 3 isoform X2 [Homo sapiens] | 23.9 | 27% | 701 | 25.73% |
Per. Similarity was calculated by Clustal W via the webserver Clustal Omega [24].
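To read the BLAST table per viral protein, it can help to reduce each group of three hits to its smallest E value and its largest percent similarity. The sketch below does that with the values copied from the table. The assignment of rows to viral proteins follows the order in which the table caption lists them, which is an assumption, and the function name `summarize` is hypothetical. An E value is the number of hits with at least that score expected by chance, so smaller values indicate a less random match.

```python
from collections import defaultdict

# (viral protein, human hit, E value, percent similarity), copied from the table.
# Protein labels follow the order given in the table caption (assumed grouping).
HITS = [
    ("nsp5", "Cadherin-12 isoform 3", 125, 20.64),
    ("nsp5", "Cadherin-12 isoform 2 precursor", 135, 21.84),
    ("nsp5", "Cadherin-12 isoform 1 preproprotein", 138, 21.58),
    ("nsp10", "E3 ubiquitin-protein ligase MYCBP2 isoform X4", 8.6, 22.48),
    ("nsp10", "E3 ubiquitin-protein ligase MYCBP2 isoform X17", 8.7, 22.48),
    ("nsp10", "E3 ubiquitin-protein ligase MYCBP2 isoform X11", 8.7, 22.48),
    ("nsp12", "WASH complex subunit 2C isoform 25", 45, 20.86),
    ("nsp12", "WASH complex subunit 2A isoform X14", 46, 20.86),
    ("nsp12", "WASH complex subunit 2A isoform X16", 46, 20.86),
    ("nsp13", "Protein ZGRF1 isoform X10", 0.073, 22.55),
    ("nsp13", "Protein ZGRF1 isoform X9", 0.11, 22.55),
    ("nsp13", "Protein ZGRF1 isoform X8", 0.12, 22.55),
    ("nsp14", "Alstrom syndrome protein 1 isoform 2", 55, 19.79),
    ("nsp14", "Alstrom syndrome protein 1 isoform 1", 55, 19.79),
    ("nsp14", "Betaine--homocysteine S-methyltransferase 1", 206, 12.87),
    ("nsp16", "Potassium voltage-gated channel subfamily C member 2 isoform X4", 514, 25.00),
    ("nsp16", "Sphingomyelin phosphodiesterase 3", 525, 24.01),
    ("nsp16", "Sphingomyelin phosphodiesterase 3 isoform X2", 701, 25.73),
]


def summarize(hits):
    """Return {protein: (smallest E value, largest percent similarity)}."""
    grouped = defaultdict(list)
    for protein, _description, e_value, similarity in hits:
        grouped[protein].append((e_value, similarity))
    return {
        protein: (min(e for e, _ in rows), max(s for _, s in rows))
        for protein, rows in grouped.items()
    }


if __name__ == "__main__":
    for protein, (best_e, top_similarity) in summarize(HITS).items():
        print(f"{protein}: smallest E value = {best_e}, "
              f"highest similarity = {top_similarity:.2f}%")
```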