Conserved protein targets for developing pan-coronavirus drugs based on sequence and 3D structure similarity analyses.

Minfei Ma1, Yanqing Yang1, Leyun Wu1, Liping Zhou1, Yulong Shi1, Jiaxin Han2, Zhijian Xu3, Weiliang Zhu4.   

Abstract

There are 7 known human pathogenic coronaviruses: HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, MERS-CoV, SARS-CoV and SARS-CoV-2. While SARS-CoV-2 is currently causing a severe epidemic, experts believe that new pathogenic coronaviruses will emerge in the future. Therefore, developing broad-spectrum anti-coronavirus drugs is of great significance. In this study, we performed protein sequence and three-dimensional structure analyses for all 20 virus-encoded proteins across all 7 coronaviruses, with the aim of identifying highly conserved proteins and binding sites for developing pan-coronavirus drugs. We found that nsp5, nsp10, nsp12, nsp13, nsp14, and nsp16 are highly conserved both in protein sequence (average identity percentage higher than 52%, average amino acid conservation score higher than 5.2) and in binding pockets (average amino acid conservation score higher than 5.8). We also compared these 6 proteins with all human proteins and found that all 6 have similarity of less than 25%, indicating that drugs targeting them should interfere little with human protein function. Accordingly, we suggest that nsp5, nsp10, nsp12, nsp13, nsp14, and nsp16 are potential targets for pan-coronavirus drug development.
Copyright © 2022 Elsevier Ltd. All rights reserved.

Keywords:  Broad-spectrum drugs; Drug design; Medicinal chemistry; Pharmaceutical informatics; Target discovery

Year:  2022        PMID: 35364304      PMCID: PMC8957316          DOI: 10.1016/j.compbiomed.2022.105455

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   6.698


Introduction

Coronaviruses, named for the crown-like spikes on the virus surface, have attracted a great deal of attention worldwide because of their severe impact on human health and the global economy. Among the various coronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV, year 2003), Middle East respiratory syndrome coronavirus (MERS-CoV, year 2012) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, year 2019) are known as highly transmissible pathogens that cause significant human morbidity and mortality [1,2]. Recent research suggests that human exposure to and spillover of SARS-related coronaviruses may be substantially underestimated; the researchers estimated that around 400,000 people are infected with SARS-related coronaviruses annually in South and Southeast Asia [3]. It is reasonable to infer that the harm coronaviruses cause to humans may not stop with the outbreak of COVID-19. In addition, the multiple mutations in the SARS-CoV-2 Omicron variant may affect current therapies [4]. Although the Therapeutic Target Database (http://db.idrblab.net/ttd/) lists some drugs for the treatment of COVID-19 as approved or as having come out of clinical trials, finding broad-spectrum treatments against the coronaviruses that have emerged or that may appear in the future remains a crucial task [5]. Therefore, identifying potential target proteins for pan-coronavirus drugs is of great importance.
Coronaviruses, which mainly encode 20 proteins, are enveloped viruses containing a positive-sense, single-stranded RNA genome [6,7,8]. The genome organization of a coronavirus is 5′-leader-UTR-replicase (ORF1ab)-Spike (S)-Envelope (E)-Membrane (M)-Nucleocapsid (N)-3′UTR-poly A [9]. The open reading frame 1ab occupies the first two thirds of the genome and is translated into 2 polyproteins, pp1a and pp1ab, which are cleaved into nsp1 to nsp16 [10,11]. These non-structural proteins play significant roles in the regulation of viral RNA replication and transcription [12,13]. The later reading frames encode four structural proteins: spike (S) protein, envelope (E) protein, membrane (M) protein, and nucleocapsid (N) protein (Fig. 1) [14]. These structural proteins are vital for viral assembly and for the release of virus-like particles (VLPs) by transfected cells [12,15,16].
Fig. 1

Genomic sequence and protein structures of SARS-CoV-2. (A) The genome organization of the SARS-CoV-2 full sequence (GenBank: OL518896.1). (B) Genomic organization of SARS-CoV-2 ORF1a and structures of ORF1a protein products (nsp1-nsp11). (C) Genomic organization of SARS-CoV-2 ORF1b and structures of ORF1b protein products (nsp12-nsp16). (D) Structures of SARS-CoV-2 structural proteins. Proteins with experimentally determined structures are marked with their PDB ID; the structures of proteins without experimental structures were predicted by AlphaFold v2.0.

Human coronaviruses were first identified in 1965 [17,18]. The seven coronaviruses that cause disease in humans include 4 common human coronaviruses (HCoVs), namely HCoV-229E, HCoV-OC43, HCoV-NL63 and HCoV-HKU1, which circulate globally in the human population, and 3 coronaviruses (SARS-CoV, MERS-CoV, SARS-CoV-2) that have caused major outbreaks of deadly pneumonia in the 21st century [19,20,21]. In this study, we collected the sequence and structural information of 20 proteins encoded by the 7 known pathogenic human coronaviruses, including 4 structural proteins (spike protein, envelope protein, membrane protein, nucleocapsid protein) and 16 non-structural proteins (nsp1-nsp16). After protein multiple sequence alignment (MSA), we evaluated the conservation of the 20 proteins among the human coronaviruses. Subsequently, we focused on the potential ligand-binding pockets of the proteins, analyzed pocket conservation and the amino acid composition of the pockets, and carried out molecular docking to check whether the pocket conservation results are reasonable. Our results should be helpful in the search for broad-spectrum drugs against the currently known human coronaviruses as well as coronaviruses that could emerge in the future.

Methods

Data collection

Sequence information for the 4 structural proteins and 16 non-structural proteins was collected from the Universal Protein Resource (UniProt) for each of the 7 known human coronaviruses. Subsequently, experimentally determined structures of the 140 proteins were searched for in the Protein Data Bank (PDB). Resolved structures were downloaded from PDB and processed into monomers, while the structures of proteins without resolved structures were predicted using AlphaFold v2.0 [22].

Protein multiple sequence alignment and conservation analysis

MSA was performed by Clustal W via the webserver Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) [23,24]. The MSA results were then used for conservation analysis with the ConSurf Server (https://consurf.tau.ac.il/) [25]. The rate of evolution at each site was calculated using the empirical Bayesian method, and the continuous conservation scores were then divided into nine discrete grades in ConSurf (from grade 1 for the most variable positions to grade 9 for the most conserved positions) [26]. The discrete grades were projected onto the protein sequences and structures of SARS-CoV-2 for visualization. To compare the conservation of the 20 coronavirus proteins, two measures were used to rank the proteins: one is the average of the sequence similarity percentages between the same protein of different viruses, and the other is the average of the per-residue conservation scores calculated by the ConSurf Server.
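The two ranking measures can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the toy alignment, the choice to compare all virus pairs, the rule of skipping columns where either sequence has a gap, and the use of population variance are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def identity_summary(alignment: dict[str, str]) -> tuple[float, float]:
    """Mean and population variance of identity over all pairs of viruses."""
    values = [
        pairwise_identity(alignment[u], alignment[v])
        for u, v in combinations(sorted(alignment), 2)
    ]
    return mean(values), pvariance(values)


def grade_summary(grades: list[int]) -> tuple[float, float]:
    """Mean and population variance of per-residue conservation grades (1 to 9)."""
    if any(g < 1 or g > 9 for g in grades):
        raise ValueError("grades must lie between 1 and 9")
    return mean(grades), pvariance(grades)


# Hypothetical toy data, not taken from the study.
toy_alignment = {
    "virus_A": "MKT-LLA",
    "virus_B": "MKTALLA",
    "virus_C": "MRT-LIA",
}
toy_grades = [9, 7, 9, 1, 8, 5, 9]

identity_mean, identity_var = identity_summary(toy_alignment)
grade_mean, grade_var = grade_summary(toy_grades)
```

Other gap conventions (for example, dividing by the full alignment length) would give different percentages, so the convention should be fixed before proteins are ranked against each other.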

Protein pocket generation

All the potential ligand binding pockets of the 140 proteins were predicted by D3Pockets (http://www.d3pharma.com/D3Pocket/index.php) [27]. The pockets for further analysis in this study were selected according to the following criteria: pockets with an endogenous or reported ligand, then pockets with a druggability score of 1, then the top-ranked pocket from D3Pockets. That is, if there are ligands in the experimentally determined structure, the pocket in which the ligand is located was selected. For proteins without a reported ligand, the pockets predicted to be druggable by D3Pockets were chosen preferentially. If a protein met neither of these two criteria, the pocket ranked first by D3Pockets was selected.
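The selection rule above can be sketched as a small function. The `Pocket` record and its field names are hypothetical and do not reflect the D3Pockets output format; the sketch only encodes the stated order of preference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pocket:
    rank: int                   # 1 = ranked first by the pocket predictor
    druggability_score: float   # hypothetical field; 1 means predicted druggable
    has_ligand: bool            # True if an endogenous or reported ligand sits here


def select_pocket(pockets: list[Pocket]) -> Optional[Pocket]:
    """Pick one pocket per protein: ligand pocket, else druggable, else top-ranked."""
    if not pockets:
        return None  # e.g. a short peptide with no predicted pocket
    by_rank = sorted(pockets, key=lambda p: p.rank)
    for pocket in by_rank:
        if pocket.has_ligand:
            return pocket
    for pocket in by_rank:
        if pocket.druggability_score >= 1.0:
            return pocket
    return by_rank[0]


# Hypothetical example: the ligand-bound pocket wins even though it is ranked third.
example = [
    Pocket(rank=1, druggability_score=0.4, has_ligand=False),
    Pocket(rank=2, druggability_score=1.0, has_ligand=False),
    Pocket(rank=3, druggability_score=0.2, has_ligand=True),
]
chosen = select_pocket(example)
```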

Pocket conservation assessment and molecular docking verification

Pocket conservation was assessed using the conservation scores calculated by the ConSurf Server for each amino acid in the protein pockets. To check whether the predicted pocket conservation results are reasonable, we carried out molecular docking for the SARS-CoV-2 proteins with ligands in their resolved structures (nsp3, nsp5, nsp12, nsp13 and nsp16) using the docking program Glide SP of Schrödinger Release 2020 [28,29,30].
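As a rough illustration of the pocket score, the sketch below averages per-residue conservation grades over the residues that line a pocket. The residue numbering, the grade values and the use of population variance are assumptions for the example only.

```python
from statistics import mean, pvariance


def pocket_conservation(
    grades: dict[int, int], pocket_residues: set[int]
) -> tuple[float, float]:
    """Mean and population variance of conservation grades over pocket residues.

    grades maps residue number -> conservation grade (1 to 9);
    pocket_residues holds the residue numbers that line the pocket.
    """
    if not pocket_residues:
        raise ValueError("pocket has no residues")
    missing = pocket_residues - set(grades)
    if missing:
        raise KeyError(f"no conservation grade for residues {sorted(missing)}")
    values = [grades[i] for i in sorted(pocket_residues)]
    return mean(values), pvariance(values)


# Hypothetical grades for a five-residue protein and a three-residue pocket.
toy_grades = {1: 9, 2: 3, 3: 8, 4: 1, 5: 7}
toy_pocket = {1, 3, 5}
pocket_mean, pocket_var = pocket_conservation(toy_grades, toy_pocket)
```

In this toy case the pocket (mean grade 8) is more conserved than the protein as a whole (mean grade 5.6), which is the kind of contrast the pocket ranking is meant to expose.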

Exploration of pocket amino acid composition

The amino acid composition of a protein pocket determines the interactions available for ligand binding, and understanding the composition of a potential binding site is of great importance for structure-based drug development. To explore the properties of the potential ligand binding pockets of the 20 coronavirus proteins, and to find the conserved protein pockets whose amino acid composition is consistent among the 7 human coronaviruses, we counted the frequency of each amino acid around the pocket.
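A frequency count of this kind can be sketched in a few lines of Python. The input here is a hypothetical list of three-letter residue names for one pocket; how residues "around the pocket" were collected in this study is not modeled.

```python
from collections import Counter

AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)


def pocket_composition(residue_names: list[str]) -> dict[str, float]:
    """Ratio of each of the 20 standard amino acids among the pocket residues."""
    counts = Counter(name.upper() for name in residue_names)
    unknown = set(counts) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residue names: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical pocket residues, not taken from any structure in the study.
toy_pocket = ["CYS", "CYS", "HIS", "ASP"]
composition = pocket_composition(toy_pocket)
```

Computing one such dictionary per virus for the same protein gives the per-virus ratios that can then be compared side by side.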

Sequence similarity between highly conserved coronavirus proteins and human proteins

To assess the potential off-target problem, we performed sequence similarity searches between the 6 highly conserved coronavirus proteins and all human proteins (using the SARS-CoV-2 protein sequences as representatives of each coronavirus protein family) with the search tool BLASTp provided by NCBI (https://www.ncbi.nlm.nih.gov/BLAST/). The amino acids that are identical to those of human proteins are shown in the 3D protein structure, and those in binding pockets are highlighted to facilitate the design of highly selective anti-coronavirus drugs in the future.
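Picking the most significant human hits from a BLASTp run can be sketched as follows. This assumes the results were exported in BLAST tabular format 6 with its default twelve columns; the searches in this study were run through the NCBI web interface, so the export step and the sample rows below are assumptions for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def top_hits(tabular_text: str, n: int = 3) -> list[dict]:
    """Return the n hits with the smallest E value from tabular BLAST output."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])[:n]


# Hypothetical rows; the hit names and alignment coordinates are made up.
sample = "\n".join(
    "\t".join(fields)
    for fields in [
        ("query", "hit_A", "22.5", "200", "150", "5", "1", "200", "10", "210", "0.12", "37.4"),
        ("query", "hit_B", "21.0", "180", "140", "2", "5", "185", "3", "183", "55", "28.9"),
        ("query", "hit_C", "23.0", "205", "152", "6", "1", "205", "8", "212", "0.073", "38.1"),
        ("query", "hit_D", "22.8", "203", "151", "5", "1", "203", "9", "211", "0.11", "37.7"),
    ]
)
best_three = top_hits(sample)
```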

Results and discussion

Sequence alignment and conservation assessment

The protein information of the 7 coronaviruses is shown in Table 1. The sequence information of the 140 proteins was obtained from UniProt, but only 42 proteins were found in PDB. The 3D structures of the remaining 98 proteins were modeled by AlphaFold v2.0. As examples, the 20 proteins of SARS-CoV-2 are depicted in Fig. 1B and C (for the 16 non-structural proteins) and 1D (for the 4 structural proteins).
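Table 1

Information of the non-structural and structural proteins of the 7 human pathogenic coronaviruses. (a)

Each virus column gives UniProt ID / PDB ID; "-" marks an empty cell.

Protein | Alternative name | ORF | Length range (a.a.) | HCoV-229E | HCoV-OC43 | SARS-CoV | HCoV-NL63 | HCoV-HKU1 | MERS-CoV | SARS-CoV-2
Non-structural proteins
nsp1 | Leader protein | ORF1a | 110–246 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 2HSX | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 7K3N
nsp2 | - | ORF1a | 587–788 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / - | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 7MSW
nsp3 | Papain-like proteinase | ORF1a | 1979–1564 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 5TL7 | P0C6X5 / - | P0C6X4 / - | K9N7C7 / 4RNA | P0DTD1 / 7CMD
nsp4 | - | ORF1a | 477–508 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / - | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / -
nsp5 | 3C-like proteinase | ORF1a | 302–311 | P0C6X1 / 2ZU2 | P0C6X6 / - | P0C6X7 / 2ZU5 | P0C6X5 / 7E6R | P0C6X4 / 3D23 | K9N7C7 / 4RSP | P0DTD1 / 6M2N
nsp6 | - | ORF1a | 279–294 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / - | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / -
nsp7 | - | ORF1a | 83–92 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 2KYS | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 6YHU
nsp8 | - | ORF1a | 194–201 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / - | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / -
nsp9 | - | ORF1a | 109–114 | P0C6X1 / 2J97 | P0C6X6 / - | P0C6X7 / 1UW7 | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 6W9Q
nsp10 | Growth factor-like peptide | ORF1a | 135–141 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 3R24 | P0C6X5 / - | P0C6X4 / - | K9N7C7 / 5YN5 | P0DTD1 / 6W4H
nsp11 | - | ORF1a | 13–17 | P0C6U2 / - | P0C6U7 / - | P0C6U8 / - | P0C6U6 / - | P0C6U5 / - | K9N638 / - | P0DTD1 / -
nsp12 | RNA-directed RNA polymerase | ORF1b | 927–947 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 6NUR | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 6XEZ
nsp13 | Helicase | ORF1b | 597–611 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 6JYT | P0C6X5 / - | P0C6X4 / - | K9N7C7 / 5WWP | P0DTD1 / 6XEZ
nsp14 | Proofreading exoribonuclease | ORF1b | 518–535 | P0C6X1 / - | P0C6X6 / - | P0C6X7 / 5C8S | P0C6X5 / - | P0C6X4 / - | K9N7C7 / - | P0DTD1 / 7N0B
nsp15 | Uridylate-specific endoribonuclease | ORF1b | 343–375 | P0C6X1 / 4S1T | P0C6X6 / - | P0C6X7 / 2H85 | P0C6X5 / - | P0C6X4 / - | K9N7C7 / 5YVD | P0DTD1 / 6X4I
nsp16 | 2′-O-methyltransferase | ORF1b | 298–303 | P0C6X1 / - | P0C6X6 / 7NH7 | P0C6X7 / 3R24 | P0C6X5 / - | P0C6X4 / - | K9N7C7 / 5YN5 | P0DTD1 / 6W4H
Structural proteins
S protein | - | ORF2 | 1173–1356 | P15423 / 7CYD | P36334 / 6OHW | P0DTC2 / 6ACD | Q6Q1S2 / 7KIP | Q0ZME7 / 5I08 | K9N5Q8 / 5X5F | A0A679G9E9 / 6VXX
E protein | - | ORF4 | 67–84 | S5YAG7 / - | Q4VID3 / - | P59637 / - | Q5SBN7 / - | Q5MQC8 / - | A0A166ZLT5 / - | A0A6C0QFP9 / -
M protein | - | ORF5 | 219–230 | P15422 / - | Q4VID2 / - | P59596 / - | Q6Q1R9 / - | Q5MQC7 / - | K9N7A1 / - | A0A6V7AL93 / -
N protein | - | ORF9a | 377–448 | P15130 / - | P33469 / - | P59595 / - | Q6Q1R8 / - | Q5MQC6 / - | K9N4V7 / - | A0A6C0T6Z7 / -

(a) The underlined PDB ID indicates that the structure has resolved ligand structure.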

The MSA results and conservation scores were used to rank the degree of conservation of the 20 types of coronavirus proteins. The rankings by the average of the sequence similarity percentages and by the average of the per-residue conservation scores are summarized in Table 2. The 10 proteins with the highest average identity percentages between SARS-CoV-2 and the other coronaviruses (49.6% to 67.4%) are also the 10 top-ranked proteins by amino acid conservation score (Table 2). Among these 10 proteins, the nsp11 monomer has no potential binding site because of its short sequence. This suggests that the remaining 9 proteins deserve further assessment as potential targets for pan-coronavirus drug development: nsp13 (Helicase), nsp12 (RNA-directed RNA polymerase), nsp16 (2′-O-methyltransferase), nsp14 (Proofreading exoribonuclease), nsp10 (Growth factor-like peptide), nsp5 (3C-like proteinase), nsp9, nsp15 (Uridylate-specific endoribonuclease) and nsp7. We projected the discrete grades onto the protein sequences (Fig. S1) and onto the cartoon and surface structures of the 20 proteins of SARS-CoV-2 (Fig. S2). As examples, we show the cartoon structures of the 6 relatively conserved proteins in Fig. 2.
Table 2

The conservation ranking result of the 20 proteins among the 7 human coronaviruses. (a)

Order | Protein | Average identity percentage | Variance | Order | Protein | Average amino acid conservation score | Variance
1 | nsp13 | 67.443% | 0.012 | 1 | nsp13 | 5.800 | 10.483
2 | nsp12 | 66.107% | 0.012 | 2 | nsp16 | 5.648 | 10.235
3 | nsp16 | 63.623% | 0.011 | 3 | nsp12 | 5.579 | 9.471
4 | nsp14 | 59.489% | 0.013 | 4 | nsp14 | 5.440 | 8.706
5 | nsp10 | 57.715% | 0.015 | 5 | nsp11 | 5.385 | 6.391
6 | nsp5 | 52.181% | 0.019 | 6 | nsp5 | 5.288 | 8.368
7 | nsp9 | 51.273% | 0.020 | 7 | nsp15 | 5.277 | 8.628
8 | nsp15 | 51.086% | 0.015 | 8 | nsp7 | 5.217 | 7.134
9 | nsp7 | 50.476% | 0.028 | 9 | nsp10 | 5.209 | 8.554
10 | nsp11 | 49.625% | 0.031 | 10 | nsp9 | 5.168 | 8.140
11 | nsp8 | 41.812% | 0.045 | 11 | M protein | 5.090 | 6.055
12 | M protein | 40.653% | 0.026 | 12 | nsp8 | 5.076 | 5.545
13 | nsp4 | 39.339% | 0.025 | 13 | nsp4 | 5.074 | 5.545
14 | N protein | 36.773% | 0.023 | 14 | nsp6 | 5.024 | 4.348
15 | S protein | 34.602% | 0.028 | 15 | N protein | 5.022 | 4.863
16 | nsp6 | 34.503% | 0.027 | 16 | nsp1 | 5.022 | 2.822
17 | nsp3 | 31.174% | 0.020 | 17 | nsp3 | 5.017 | 3.365
18 | E protein | 29.884% | 0.031 | 18 | E protein | 5.013 | 3.154
19 | nsp1 | 23.391% | 0.031 | 19 | S protein | 5.009 | 4.322
20 | nsp2 | 22.641% | 0.017 | 20 | nsp2 | 4.970 | 2.358

(a) The ranking result based on the average of sequence identity percentages is displayed on the left, while the ranking result based on the average of each amino acid's conservation score is displayed on the right.
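The agreement between the two rankings can be checked directly from the top 10 entries of Table 2. The short Python sketch below only transcribes those entries; the variable names are illustrative.

```python
# Top 10 proteins by average identity percentage (left half of Table 2).
identity_top10 = [
    "nsp13", "nsp12", "nsp16", "nsp14", "nsp10",
    "nsp5", "nsp9", "nsp15", "nsp7", "nsp11",
]

# Top 10 proteins by average amino acid conservation score (right half of Table 2).
conservation_top10 = [
    "nsp13", "nsp16", "nsp12", "nsp14", "nsp11",
    "nsp5", "nsp15", "nsp7", "nsp10", "nsp9",
]

# The two measures order the proteins differently but select the same 10.
same_set = set(identity_top10) == set(conservation_top10)

# nsp11 is dropped because its monomer has no potential binding site.
candidates = [p for p in identity_top10 if p != "nsp11"]
```

Here `same_set` is true and `candidates` holds the 9 proteins carried forward to the pocket analysis.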

Fig. 2

Conservative analysis of coronavirus proteins. The 3D structures of (A) nsp5, (B) nsp10, (C) nsp12, (D) nsp13, (E) nsp14, (F) nsp16 presented by cartoon models (pink indicates the variable while blue indicates the conserved residues). The potential ligand binding pockets are displayed in dots. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)


Pocket conservation assessment and verification

With D3Pockets, we found that nsp11 of HCoV-HKU1 and SARS-CoV-2 and the E protein of MERS-CoV and SARS-CoV-2 have no possible ligand binding pockets. Therefore, the monomer structures of these two types of proteins are not potential targets. Accordingly, we only scored and ranked the pocket conservation of the other 18 types of protein. Table 3 shows that 6 of the above 9 conserved proteins are ranked within the top 10 by pocket amino acid conservation score, namely nsp16 (2′-O-methyltransferase), nsp14 (Proofreading exoribonuclease), nsp13 (Helicase), nsp5 (3C-like proteinase), nsp12 (RNA-directed RNA polymerase) and nsp10 (Growth factor-like peptide), indicating that these 6 proteins are potential targets for pan-coronavirus drug development. We also used Fpocket to calculate the pockets of these six proteins and compared them with those predicted by D3Pockets (Fig. S3, Table S1) [31,32].
Table 3

Pocket conservation ranking results of the 18 proteins among the 7 human coronaviruses.

Order | Protein Name | Average of Protein Pocket Amino Acid Conservation Scores | Variance of Protein Pocket Amino Acid Conservation Scores
1 | nsp16 | 6.638 | 8.679
2 | nsp14 | 6.333 | 7.944
3 | M protein | 6.294 | 4.590
4 | nsp13 | 6.110 | 10.149
5 | nsp1 | 6.000 | 2.625
6 | nsp5 | 5.975 | 8.124
7 | nsp12 | 5.958 | 8.895
8 | nsp10 | 5.875 | 7.026
9 | S protein | 5.815 | 4.062
10 | N protein | 5.584 | 5.126
11 | nsp3 | 5.571 | 3.959
12 | nsp4 | 5.562 | 4.837
13 | nsp9 | 5.500 | 7.625
14 | nsp15 | 5.338 | 7.764
15 | nsp8 | 5.227 | 4.903
16 | nsp2 | 5.218 | 2.507
17 | nsp7 | 4.905 | 6.086
18 | nsp6 | 4.676 | 3.705
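The selection of the 6 target proteins can be reproduced from the Table 3 averages: rank the 18 proteins by pocket score, keep the top 10, and intersect with the 9 sequence-conserved proteins. The sketch below transcribes the table values; the variable names are illustrative.

```python
# Average pocket amino acid conservation scores from Table 3.
pocket_scores = {
    "nsp16": 6.638, "nsp14": 6.333, "M protein": 6.294, "nsp13": 6.110,
    "nsp1": 6.000, "nsp5": 5.975, "nsp12": 5.958, "nsp10": 5.875,
    "S protein": 5.815, "N protein": 5.584, "nsp3": 5.571, "nsp4": 5.562,
    "nsp9": 5.500, "nsp15": 5.338, "nsp8": 5.227, "nsp2": 5.218,
    "nsp7": 4.905, "nsp6": 4.676,
}

# The 9 proteins found to be conserved at the sequence level.
sequence_conserved = {
    "nsp13", "nsp12", "nsp16", "nsp14", "nsp10", "nsp5", "nsp9", "nsp15", "nsp7",
}

# Rank by pocket score (highest first) and keep the top 10.
top10_by_pocket = sorted(pocket_scores, key=pocket_scores.get, reverse=True)[:10]

# Proteins that are conserved both in sequence and in their pockets.
selected = [p for p in top10_by_pocket if p in sequence_conserved]
lowest_selected_score = min(pocket_scores[p] for p in selected)
```

The result is nsp16, nsp14, nsp13, nsp5, nsp12 and nsp10, with a lowest pocket score of 5.875, consistent with the threshold of 5.8 quoted in the abstract.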
There are 5 ligand-protein structures of SARS-CoV-2 available from PDB: nsp3, nsp5, nsp12, nsp13 and nsp16. To test whether a ligand of a protein from one coronavirus, e.g. SARS-CoV-2, is also a good pan-ligand for the homologous proteins of other coronaviruses, we docked the 5 ligands to the homologous proteins of the other coronaviruses. The docking results are shown in Fig. 3 and Table S2. The variances of the docking scores of nsp12, nsp5, nsp13 and nsp16 are all within 0.5 kcal/mol; from this perspective, we consider that the ligand binding pockets of these four proteins are highly similar across the different coronaviruses. In addition, as shown in Fig. 3, these four types of proteins have good binding ability to ligand molecules, especially nsp14 and nsp16. The successful marketing of SARS-CoV-2 nsp5 inhibitors (such as Nirmatrelvir, an antiviral medication developed by Pfizer that is part of the nirmatrelvir/ritonavir combination sold under the brand name Paxlovid) and nsp12 inhibitors (such as Remdesivir, an antiviral nucleotide analogue developed by Gilead Sciences) demonstrates the druggability of nsp5 and nsp12. Therefore, we have reason to believe that nsp14 and nsp16 are also target proteins with high research value. In contrast, the docking scores for the different nsp3 (PL-PRO) proteins differ considerably, with a variance of 1.4 kcal/mol, suggesting that the ligand has different binding affinities for the nsp3 proteins of different coronaviruses. Taken together, these results indicate that the strategy used in this study for identifying potential pan-coronavirus targets is reasonable.
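The variance criterion used above can be sketched as follows. The docking scores in the example are hypothetical (the actual values are in Fig. 3 and Table S2), and the use of population rather than sample variance is an assumption, since the text does not say which was used.

```python
from statistics import pvariance


def is_consistent(scores: list[float], threshold: float = 0.5) -> bool:
    """True if the variance of one ligand's docking scores across viruses is within the threshold."""
    if len(scores) < 2:
        raise ValueError("need docking scores for at least two viruses")
    return pvariance(scores) <= threshold


# Hypothetical docking scores (kcal/mol) of one ligand against the same
# protein from seven coronaviruses.
similar_pockets = [-7.1, -6.8, -7.4, -6.9, -7.2, -7.0, -7.3]
divergent_pockets = [-5.0, -7.5, -4.2, -6.8, -3.9, -6.1, -7.0]

similar_ok = is_consistent(similar_pockets)
divergent_ok = is_consistent(divergent_pockets)
```

With these made-up numbers the first series passes the 0.5 threshold and the second does not, mirroring the contrast drawn between the four consistent proteins and nsp3.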
Fig. 3

The docking scores of SARS-CoV-2 ligands to the homology proteins of different coronaviruses (kcal/mol). The 2D structures represent the FDA approved SARS-CoV-2 nsp5 inhibitor Nirmatrelvir and nsp12 inhibitor Remdesivir.


Pocket amino acids analysis

We counted the frequency of each amino acid around the pocket to investigate the pocket properties of the 20 coronavirus proteins (Fig. 4, Fig. S4). As shown in Fig. S4I and Fig. S4L, the sequences of nsp11 and the E protein are short, and the pockets calculated for the monomer proteins are of little reference value, as mentioned above. From the statistics of the pocket amino acids, we can summarize several characteristics: A) the pocket amino acid residues of SARS-CoV and SARS-CoV-2 are more consistent with each other than those of the other viruses (Fig. 4); B) the frequency of CYS in the pocket of nsp10 is significantly higher than in all other protein pockets (Fig. 4B), indicating that covalent inhibitors targeting nsp10 could be designed; C) the acidic amino acid ASP has a higher content in the nsp12 pockets of the various coronaviruses (Fig. 4C), so molecules that target nsp12 are preferably positively charged; D) the amino acid frequencies of the nsp13 pocket are essentially the same among the seven coronaviruses (Fig. 4D), with the smallest difference and the best conservation, followed by nsp5, nsp14 and nsp16 (Fig. 4A, 4E, 4F).
Fig. 4

Ratio of 20 amino acids that form the pockets of coronavirus (A) nsp5, (B) nsp10, (C) nsp12, (D) nsp13, (E) nsp14, (F) nsp16. Amino acid ratios of HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1, MERS-CoV, SARS-CoV-2 are shown in rose pink, pale orange, light mustard, olive green, light teal, sky blue, pale violet respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)


The similarity between coronavirus nsp5, nsp10, nsp12, nsp13, nsp14, nsp16 and human-derived proteins

As mentioned above, nsp5 (3C-like proteinase), nsp10 (Growth factor-like peptide), nsp12 (RNA-directed RNA polymerase), nsp13 (Helicase), nsp14 (Proofreading exoribonuclease) and nsp16 (2′-O-methyltransferase) are relatively conserved proteins. Therefore, we made the preliminary assumption that these types of proteins can be the preferred targets when designing broad-spectrum drugs against coronaviruses. To further test this hypothesis, we performed sequence similarity searches for each of these six types of proteins with BLASTp. For each protein, Table 4 lists the three human-derived proteins with the smallest E values in the search results. The table shows that nsp5, nsp10, nsp12, nsp14, and nsp16 have very low similarity to human proteins, while nsp13 gives a lower expect value against the ZGRF1 isoform family.
Table 4

Information on the three proteins with the smallest expect value in the BLAST results of SARS-CoV-2 nsp5 (3C-like proteinase), nsp10 (Growth factor-like peptide), nsp12 (RNA-directed RNA polymerase), nsp13 (Helicase), nsp14 (Proofreading exoribonuclease) and nsp16 (2′-O-methyltransferase).

Protein | Human-derived protein description | Total Score | Query Cover | E Value | Per. Similarity (a)
nsp5 | Cadherin-12 isoform 3 [Homo sapiens] | 26.6 | 29% | 125 | 20.64%
nsp5 | Cadherin-12 isoform 2 precursor [Homo sapiens] | 26.2 | 29% | 135 | 21.84%
nsp5 | Cadherin-12 isoform 1 preproprotein [Homo sapiens] | 26.2 | 29% | 138 | 21.58%
nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X4 [Homo sapiens] | 28.5 | 10% | 8.6 | 22.48%
nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X17 [Homo sapiens] | 28.5 | 10% | 8.7 | 22.48%
nsp10 | E3 ubiquitin-protein ligase MYCBP2 isoform X11 [Homo sapiens] | 28.5 | 10% | 8.7 | 22.48%
nsp12 | WASH complex subunit 2C isoform 25 [Homo sapiens] | 29.6 | 9% | 45 | 20.86%
nsp12 | WASH complex subunit 2A isoform X14 [Homo sapiens] | 29.6 | 4% | 46 | 20.86%
nsp12 | WASH complex subunit 2A isoform X16 [Homo sapiens] | 29.6 | 4% | 46 | 20.86%
nsp13 | Protein ZGRF1 isoform X10 [Homo sapiens] | 38.1 | 34% | 0.073 | 22.55%
nsp13 | Protein ZGRF1 isoform X9 [Homo sapiens] | 37.7 | 34% | 0.11 | 22.55%
nsp13 | Protein ZGRF1 isoform X8 [Homo sapiens] | 37.4 | 34% | 0.12 | 22.55%
nsp14 | Alstrom syndrome protein 1 isoform 2 [Homo sapiens] | 28.9 | 5% | 55 | 19.79%
nsp14 | Alstrom syndrome protein 1 isoform 1 [Homo sapiens] | 28.9 | 5% | 55 | 19.79%
nsp14 | Betaine--homocysteine S-methyltransferase 1 [Homo sapiens] | 26.6 | 3% | 206 | 12.87%
nsp16 | Potassium voltage-gated channel subfamily C member 2 isoform X4 [Homo sapiens] | 24.3 | 6% | 514 | 25.00%
nsp16 | Sphingomyelin phosphodiesterase 3 [Homo sapiens] | 24.3 | 27% | 525 | 24.01%
nsp16 | Sphingomyelin phosphodiesterase 3 isoform X2 [Homo sapiens] | 23.9 | 27% | 701 | 25.73%

(a) Per. Similarity was calculated by Clustal W via the webserver Clustal Omega [24].

Therefore, we ran MSA between nsp13 (represented by the nsp13 sequence of SARS-CoV-2) and ten proteins in the ZGRF1 isoform family. The sequence similarity percentage between SARS-CoV-2 nsp13 and protein ZGRF1 isoform X10 is 22.55%. Based on the MSA result, the 111 amino acids (about 18.5% of the total length of nsp13) that completely matched the human protein were marked on the structure of SARS-CoV-2 nsp13 (Fig. S5A). Among them, 82 amino acid residues are located around the ligand binding pocket of SARS-CoV-2 nsp13 (Fig. S5B). However, even for nsp13, which gives the most significant hit against human-derived proteins (smallest E value in Table 4), the similarity does not reach 30%. In fact, as shown in the last column of Table 4, these six conserved proteins have similarity of less than 25% to human-derived proteins. Therefore, we still believe that these six proteins are potential targets for pan-coronavirus drug development.
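Marking the matched residues and finding those that fall in the pocket amounts to a position count and a set intersection. The sketch below uses a hypothetical toy alignment and pocket; the length of 601 residues for SARS-CoV-2 nsp13 is an assumption not stated in the text (it lies within the 597–611 range of Table 1) and is used only to reproduce the quoted 18.5%.

```python
def identical_positions(query_aln: str, subject_aln: str) -> set[int]:
    """1-based query residue numbers whose aligned subject residue is identical."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    positions = set()
    residue_number = 0
    for q, s in zip(query_aln, subject_aln):
        if q == "-":
            continue  # a gap in the query does not consume a residue number
        residue_number += 1
        if q == s:
            positions.add(residue_number)
    return positions


# Hypothetical toy alignment and pocket, not taken from the study.
matched = identical_positions("MKT-LLA", "MRTALIA")
toy_pocket = {3, 4, 5}
matched_in_pocket = matched & toy_pocket

# Arithmetic check of the figure quoted in the text, assuming 601 residues.
matched_percent = round(100 * 111 / 601, 1)
```

Residues in `matched_in_pocket` are the ones to watch when designing selective ligands, since they are shared with the human protein and also line the binding site.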

Conclusion

Although the first human coronavirus was isolated as early as the 1960s, HCoV-229E and HCoV-OC43 have low pathogenicity and low infectivity, and large-scale serious infections caused by coronaviruses did not occur until the 21st century. It was only after the global COVID-19 pandemic that people truly realized how much harm coronaviruses can cause to humans. To deal with the current pathogenic human coronaviruses, and with coronaviruses that may harm humans in the future, research on pan-coronavirus drug discovery will be a key issue for researchers. In this study, we performed MSA for the major proteins of the currently known human coronaviruses and evaluated the sequence conservation of these proteins among the seven human coronaviruses. We found that nsp13, nsp12, nsp16, nsp14, nsp10, nsp5, nsp9, nsp15 and nsp7 are highly conserved across the 7 coronaviruses. Among them, 6 proteins, viz., nsp16, nsp14, nsp13, nsp5, nsp12 and nsp10, have more conserved ligand binding pockets than the other proteins. We also found that the 6 highly conserved proteins differ significantly from all human-derived proteins in both protein sequence similarity and ligand binding pocket structure. Overall, we believe that this work can help people understand the structural and non-structural proteins of human coronaviruses and, more importantly, can provide information for the future development of pan-coronavirus drugs.

Declaration of competing interest

The authors declare no competing financial interest.

References (9 of 30 shown)

1.  Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes.

Authors:  Richard A Friesner; Robert B Murphy; Matthew P Repasky; Leah L Frye; Jeremy R Greenwood; Thomas A Halgren; Paul C Sanschagrin; Daniel T Mainz
Journal:  J Med Chem       Date:  2006-10-19       Impact factor: 7.446

2.  Covid-19: Coronavirus was first described in The BMJ in 1965.

Authors:  Elisabeth Mahase
Journal:  BMJ       Date:  2020-04-16

Review 3.  Human coronaviruses: what do they cause?

Authors:  Lia van der Hoek
Journal:  Antivir Ther       Date:  2007

4.  The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles.

Authors:  Y L Siu; K T Teoh; J Lo; C M Chan; F Kien; N Escriou; S W Tsao; J M Nicholls; R Altmeyer; J S M Peiris; R Bruzzone; B Nal
Journal:  J Virol       Date:  2008-08-27       Impact factor: 5.103

5.  ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.

Authors:  Haim Ashkenazy; Shiran Abadi; Eric Martz; Ofer Chay; Itay Mayrose; Tal Pupko; Nir Ben-Tal
Journal:  Nucleic Acids Res       Date:  2016-05-10       Impact factor: 16.971

6.  The EMBL-EBI search and sequence analysis tools APIs in 2019.

Authors:  Fábio Madeira; Young Mi Park; Joon Lee; Nicola Buso; Tamer Gur; Nandana Madhusoodanan; Prasad Basutkar; Adrian R N Tivey; Simon C Potter; Robert D Finn; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

7.  Fpocket: an open source platform for ligand pocket detection.

Authors:  Vincent Le Guilloux; Peter Schmidtke; Pierre Tuffery
Journal:  BMC Bioinformatics       Date:  2009-06-02       Impact factor: 3.169

8.  Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage.

Authors:  Eric J Snijder; Peter J Bredenbeek; Jessika C Dobbe; Volker Thiel; John Ziebuhr; Leo L M Poon; Yi Guan; Mikhail Rozanov; Willy J M Spaan; Alexander E Gorbalenya
Journal:  J Mol Biol       Date:  2003-08-29       Impact factor: 5.469

Review 9.  Coronavirus biology and replication: implications for SARS-CoV-2.

Authors:  Philip V'kovski; Annika Kratzel; Silvio Steiner; Hanspeter Stalder; Volker Thiel
Journal:  Nat Rev Microbiol       Date:  2020-10-28       Impact factor: 60.633
