Vasileios Pierros1, Evangelos Kontopodis1,2, Dimitrios J Stravopodis2, George Th Tsangaris1. 1. Proteomics Research Unit, Biomedical Research Foundation of the Academy of Athens, 11527, Athens, Greece. 2. Section of Cell Biology and Biophysics, Department of Biology, School of Science, National and Kapodistrian University of Athens, 15701, Athens, Greece.
Abstract
The SARS-CoV-2 pandemic has necessitated the identification of sequence regions in the viral proteome that can serve as antigenic sites and treatment targets. In the present study, we applied a novel approach to mechanistically illuminate virus-host interactions: we analyzed the Unique Peptides (UPs) of the virus that have the minimum amino acid sequence length, defined as Core Unique Peptides (CrUPs), not within the virus proteome per se but against the entire proteome of the host organism. This approach identifies CrUPs of the virus that cannot be found in the host proteome. We thereby analyzed the SARS-CoV-2 proteome for CrUPs against the human proteome, defined here as C/H-CrUPs. We reveal that SARS-CoV-2 includes 7,503 C/H-CrUPs, with SPIKE_SARS2 being the protein with the highest density of C/H-CrUPs. Extensive analysis indicated that the critical P681R mutation produces new C/H-CrUPs around the R685 cleavage site, while the L452R mutation causes loss of antigenicity of the NF9 peptide and stronger binding of the virus to its ACE2 receptor protein. The simultaneous presence of these mutations in detrimental variants such as Delta leads to immune escape of the virus, its massive entrance into the host cell, a notable increase in virus formation, and its massive release, and thus to elevated infectivity of human target cells.
The Covid-19 pandemic has created an urgent need to identify sequence sites in the SARS-CoV-2 viral proteome that can serve as appropriate treatment targets and as antigenic positions suitable for the production of therapeutic vaccines.
As we have recently described, a Unique Peptide (UP) is defined as a peptide whose amino acid sequence appears in only one of all the proteins in a particular proteome. In this direction, our team has also introduced, for the first time, the concept of the Core Unique Peptide (CrUP): the peptide of minimum amino acid sequence length that resides solely in one of all the proteins in a profiled proteome, which makes it a unique signature for the identification and differential recognition of a given protein (Alexandridou et al., 2009; Kontopodis et al., 2019). Hence, to thoroughly map the UP-specific landscape of a proteome of interest, we have developed a novel bioinformatics tool based on advanced algorithms dedicated to big-data analysis. Applying it to the 20,430 reviewed Homo sapiens (human) proteins led to the identification of more than 7 × 10^6 CrUPs, which represent the backbone of the human Uniquome, described mainly as the voluminous collection of UPs shaping the human proteome (Kontopodis et al., 2022 and Kontopodis et al., manuscript in preparation).
Most importantly, to further illuminate the mechanisms controlling virus-host interactions, we have recently developed a novel, dynamic and advanced bioinformatics platform to thoroughly analyze and compare virus-derived CrUPs against host-organism proteome(s). This collection differs notably from the virus-specific CrUPs themselves, each of which is a peptide of minimum length whose amino acid sequence occurs exclusively in one of all the proteins of the viral proteome.
These virus-against-host CrUPs bear two cardinal properties: first, they are unique in the virus proteome and, second, they do not exist in the host-organism proteome. Therefore, the virus-against-host CrUPs can advance our knowledge and understanding of virus-host interactions and of the dynamics of virus infectiveness and pathogenicity. Furthermore, they can be used as diagnostic and antigenic peptides, and likely as therapeutic targets as well. Altogether, these CrUPs seem to represent a completely new entity of peptides that can significantly improve our view and comprehension of the structuring, functioning and mapping of the virus and human Uniquomes, and of their proteomic “cross-talk” towards immune escape and infectiveness (Kontopodis et al., 2022).
Since human cells can host the SARS-CoV-2 virus, we have herein used our novel bioinformatics platform not only to profile CrUPs in the SARS-CoV-2 proteome per se but, most importantly, to identify them against the human proteome (C/H-CrUPs). Remarkably, C/H-CrUPs can likely serve as targets for the immune response upon infection, and as antigenic sites with major pharmaceutical and diagnostic potential, for the successful clinical management of the Covid-19 pandemic.
Results and discussion
SARS-CoV-2 core unique peptides against human proteome
The SARS-CoV-2 proteome is structurally quite simple. The UNIPROT database (version 7/2021) includes 16 reviewed and 75,714 unreviewed proteins (Jungreis et al., 2021). For the present study, only the 16 reviewed proteins were examined, since the unreviewed proteome components contain (among others) duplicate registrations, unverified sequences and protein fragments, which could lead to unreliable data regarding the uniqueness of a protein sequence.
To recognize all the CrUPs of the SARS-CoV-2 proteome against the human proteome, we constructed in silico a new, artificial “hybrid proteome” that contained all the reviewed human proteins (20,430 proteins) plus one protein from the SARS-CoV-2 viral proteome (20,431 proteins in total). In this way, 16 “hybrid proteomes” were constructed, one for each of the 16 SARS-CoV-2 proteins. These “hybrid proteomes” were then bioinformatically searched one by one to identify SARS-CoV-2-specific CrUPs in human protein sequence environments (C/H-CrUPs).
Strikingly, 7,503 C/H-CrUPs were detected: 4,213 of them appear once in the SARS-CoV-2 proteome, 3,289 appear twice, and only one peptide (“VNNATN”), with a length of 6 amino acids, appears three times (Table 1). Data processing and analysis showed that C/H-CrUPs range in length from 4 to 9 amino acids, while longer peptides could not be identified in the SARS-CoV-2 proteome. The length distribution showed that the majority of C/H-CrUPs are 6 amino acids long, whereas only one C/H-CrUP with 4 amino acids and only two with 9 amino acids were observed (Figure 1).
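The hybrid-proteome search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' platform: the function names (`kmers`, `core_unique_peptides`), the `max_len` cutoff and the toy sequences are all hypothetical, and the sketch only tests absence from the host proteins (it does not count how often a peptide recurs within the viral proteome). A peptide is reported when it is absent from every host protein while both of its one-residue-shorter sub-peptides are present in the host, which is one way to express the minimum-length condition.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def core_unique_peptides(viral_protein, host_proteins, max_len=9):
    """List (1-based start, peptide) pairs for minimal peptides absent from the host.

    Assumption: a peptide is 'core' when it does not occur in any host protein
    but its prefix and suffix of length k-1 both do, so no shorter sub-peptide
    is already absent from the host.
    """
    found = []
    for k in range(1, max_len + 1):
        host_k = set().union(*(kmers(p, k) for p in host_proteins))
        host_shorter = set().union(*(kmers(p, k - 1) for p in host_proteins))
        for i in range(len(viral_protein) - k + 1):
            peptide = viral_protein[i:i + k]
            if peptide in host_k:
                continue  # present in the host, so not unique against it
            if k > 1 and (peptide[:-1] not in host_shorter
                          or peptide[1:] not in host_shorter):
                continue  # a shorter sub-peptide is already absent: not minimal
            found.append((i + 1, peptide))
    return found


# Toy data only: two made-up "host" proteins and one made-up "viral" protein.
toy_host = ["ACDEFG", "GHIKLM"]
toy_viral = "ACDKLM"
print(core_unique_peptides(toy_viral, toy_host))
```

With this toy input the only peptide reported is `DK` at position 3; every longer window that contains it is skipped because it is not minimal. A real run over 20,430 human proteins would need an indexed data structure rather than in-memory sets rebuilt per length.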
Table 1
Viral CrUPs against Human proteome (C/H-CrUPs).
| Virus | Proteins (number) | Total number of AA | Total C/H-CrUPs (number) | C/H-CrUPs appeared 1 time (number) | C/H-CrUPs appeared 2 times (number) | C/H-CrUPs appeared 3 times (number) | C/H-CrUPs Density |
|---|---|---|---|---|---|---|---|
| SARS-CoV-2 | 16 | 14,401 | 7,503 | 4,213 | 3,289 | 1 | 75% |
| SARS-CoV | 15 | 14,396 | 7,534 | 4,236 | 3,298 | 0 | 75% |
| MERS | 10 | 14,216 | 7,413 | 4,077 | 3,336 | 0 | 76% |
Viral proteomes of the β coronavirus group SARS-CoV-2, SARS-CoV and MERS-CoV were analyzed for core unique peptides (CrUPs) against the human proteome. The identified CrUPs of each virus against the human proteome are presented (C/H-CrUPs). C/H-CrUPs were further analyzed for the number of times they appear in each viral proteome. C/H-CrUP density is defined as the percentage of total amino acids contained in C/H-CrUPs of each virus relative to the total number of amino acids of the virus.
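As a consistency check on Table 1, the short sketch below recomputes the density column from the other columns. It rests on an observation about the tabulated numbers rather than on a documented procedure: weighting each C/H-CrUP by the number of times it appears and dividing by the total number of amino acids reproduces the reported percentages after rounding. Whether this is how the authors computed density is an assumption; the function names are illustrative.

```python
# Values copied from Table 1:
# (total amino acids, CrUPs seen once, twice, three times, reported density %)
TABLE_1 = {
    "SARS-CoV-2": (14401, 4213, 3289, 1, 75),
    "SARS-CoV": (14396, 4236, 3298, 0, 75),
    "MERS-CoV": (14216, 4077, 3336, 0, 76),
}


def occurrences(once, twice, thrice):
    """Total number of CrUP occurrences, counting repeated peptides each time."""
    return once + 2 * twice + 3 * thrice


def density_percent(total_aa, once, twice, thrice):
    """Occurrences per amino acid, as a percentage (assumed reading of 'density')."""
    return 100.0 * occurrences(once, twice, thrice) / total_aa


for virus, (aa, once, twice, thrice, reported) in TABLE_1.items():
    computed = density_percent(aa, once, twice, thrice)
    print(f"{virus}: {occurrences(once, twice, thrice)} occurrences, "
          f"{computed:.1f}% computed, {reported}% reported")
```

The occurrence totals obtained this way (10,794, 10,832 and 10,749) also equal the sums of the per-protein counts listed in Table 2 for each virus.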
Figure 1
Amino acid length distribution of virus Core Unique Peptides (CrUPs) against human proteome. A) Set of CrUPs derived from SARS-CoV-2, SARS-CoV and MERS-CoV viruses against the human proteome. The CrUPs were identified, listed and grouped according to their amino acid length. B) Graphical presentation of CrUPs amino acid length across β coronavirus group.
The distribution of C/H-CrUPs across SARS-CoV-2 proteins showed that Replicase polyprotein 1ab (R1AB_SARS2), the longest viral protein with 7,096 amino acids, produces almost half of the identified C/H-CrUPs (5,334; 49.3%) (Table 2). On the other hand, the Putative ORF3b protein (ORF3B_SARS2), with a length of 22 amino acids, produces only 15 C/H-CrUPs, corresponding to a density of 68%. Notably, the Spike glycoprotein (SPIKE_SARS2) shows the highest C/H-CrUP density (78%), meaning that it carries the highest number of C/H-CrUPs (987) relative to its length, as opposed to the ORF3c protein (ORF3C_SARS2), which has a density of only 56% (Table 2). A typical example of C/H-CrUP construction is the peptide “PDEDEEEGD”. This 9 amino acid C/H-CrUP belongs to Replicase polyprotein 1a (R1A_SARS2), starting at position 927 and ending at position 935 (Figure 2). Around this peptide, 8 C/H-CrUPs with a length of 5–7 amino acids were recognized.
Table 2
Virus detailed analysis.
SARS-CoV-2
| Entry ID | Entry Name | Protein Name | Length (AA number) | C/H-CrUPs (number) | C/H-CrUPs Density |
|---|---|---|---|---|---|
| P0DTD1 | R1AB_SARS2 | Replicase polyprotein 1ab | 7,096 | 5,334 | 75% |
| P0DTC1 | R1A_SARS2 | Replicase polyprotein 1a | 4,405 | 3,294 | 75% |
| P0DTC2 | SPIKE_SARS2 | Spike glycoprotein | 1,273 | 987 | 78% |
| P0DTC9 | NCAP_SARS2 | Nucleoprotein | 419 | 308 | 74% |
| P0DTC3 | AP3A_SARS2 | ORF3a protein | 275 | 210 | 76% |
| P0DTC5 | VME1_SARS2 | Membrane protein | 222 | 171 | 77% |
| P0DTC7 | NS7A_SARS2 | ORF7a protein | 121 | 90 | 74% |
| P0DTC8 | NS8_SARS2 | ORF8 protein | 121 | 82 | 68% |
| P0DTD2 | ORF9B_SARS2 | ORF9b protein | 97 | 69 | 71% |
| P0DTD3 | ORF9C_SARS2 | Putative ORF9c protein | 73 | 50 | 68% |
| P0DTC4 | VEMP_SARS2 | Envelope small membrane protein | 75 | 48 | 64% |
| P0DTC6 | NS6_SARS2 | ORF6 protein | 61 | 44 | 72% |
| P0DTG0 | ORF3D_SARS2 | Putative ORF3d protein | 57 | 40 | 70% |
| P0DTD8 | NS7B_SARS2 | ORF7b protein | 43 | 29 | 67% |
| P0DTG1 | ORF3C_SARS2 | ORF3c protein | 41 | 23 | 56% |
| P0DTF1 | ORF3B_SARS2 | Putative ORF3b protein | 22 | 15 | 68% |
SARS-CoV
| Entry ID | Entry Name | Protein Name | Length (AA number) | S/H-CrUPs (number) | S/H-CrUPs Density |
|---|---|---|---|---|---|
| P0C6X7 | R1AB_SARS | Replicase polyprotein 1ab | 7,073 | 5,346 | 76% |
| P0C6U8 | R1A_SARS | Replicase polyprotein 1a | 4,382 | 3,301 | 75% |
| P59594 | SPIKE_SARS | Spike glycoprotein | 1,275 | 970 | 76% |
| P59595 | NCAP_SARS | Nucleoprotein | 422 | 319 | 76% |
| P59632 | AP3A_SARS | ORF3a protein | 274 | 208 | 76% |
| P59596 | VME1_SARS | Membrane protein | 221 | 162 | 73% |
| P59633 | NS3B_SARS | ORF3b protein | 154 | 113 | 73% |
| P59635 | NS7A_SARS | ORF7a protein | 122 | 93 | 76% |
| P59636 | ORF9B_SARS | ORF9b protein | 98 | 71 | 72% |
| Q80H93 | NS8B_SARS | ORF8b protein | 84 | 59 | 70% |
| P59637 | VEMP_SARS | Envelope small membrane protein | 75 | 47 | 63% |
| Q7TLC7 | Y14_SARS | Uncharacterized protein 14 | 70 | 45 | 64% |
| P59634 | NS6_SARS | ORF6 protein | 63 | 44 | 70% |
| Q7TFA1 | NS7B_SARS | Protein non-structural 7b | 44 | 27 | 61% |
| Q7TFA0 | NS8A_SARS | ORF8a protein | 39 | 27 | 69% |
MERS
| Entry ID | Entry Name | Protein Name | Length (AA number) | M/H-CrUPs (number) | M/H-CrUPs Density |
|---|---|---|---|---|---|
| K9N7C7 | R1AB_MERS1 | Replicase polyprotein 1ab | 7,078 | 5,364 | 76% |
| K9N638 | R1A_MERS1 | Replicase polyprotein 1a | 4,391 | 3,338 | 76% |
| K9N5Q8 | SPIKE_MERS1 | Spike glycoprotein | 1,353 | 1,024 | 76% |
| K9N4V7 | NCAP_MERS1 | Nucleoprotein | 411 | 301 | 73% |
| K9N643 | ORF4B_MERS | Non-structural protein ORF4b | 246 | 185 | 75% |
| K9N7D2 | ORF5_MERS1 | Non-structural protein ORF5 | 224 | 169 | 75% |
| K9N7A1 | VME1_MERS1 | Membrane protein | 219 | 158 | 72% |
| K9N4V0 | ORF4A_MERS1 | Non-structural protein ORF4a | 109 | 77 | 71% |
| K9N796 | ORF3_MERS1 | Non-structural protein ORF3 | 103 | 74 | 72% |
| K9N5R3 | VEMP_MERS1 | Envelope small membrane protein | 82 | 59 | 72% |
Analysis of the SARS-CoV-2, SARS-CoV and MERS-CoV viruses is presented. All viral proteins were analyzed in silico, and each protein is shown by its Entry ID, Entry Name and Protein Name according to the UNIPROT database. The amino acid length of each protein, and the number and density of CrUPs per protein against the human proteome, are shown. Density is defined as the percentage of total amino acids contained in CrUPs of each protein relative to the total number of amino acids of the protein.
Figure 2
Identification of C/H-CrUPs around amino acid position 925-942 of the SARS-CoV-2 protein R1A_SARS2 (P0DTC1). In between these amino acid positions one of the two C/H-CrUPs with a 9 amino acid length is included (927–935). A) Schematic representation of the C/H-CrUPs included in that peptide (925-942), B) Table of C/H-CrUPs.
Comparative analysis of SARS-CoV-2, SARS-CoV and MERS-CoV core unique peptides against human proteome
To illuminate the mechanisms orchestrating the differential pathologies of SARS-CoV-2 compared to other coronavirus family members, we next applied the same strategy to two other similar viruses, the Severe Acute Respiratory Syndrome CoronaVirus (SARS-CoV) and the Middle East Respiratory Syndrome-related CoronaVirus (MERS-CoV). Among human viruses, SARS-CoV-2 (C) together with SARS-CoV (S) and MERS-CoV (M) constitute the β coronavirus group, and they use the same cellular receptor, the Angiotensin-Converting Enzyme 2 (ACE2), with SARS-CoV-2 sharing approximately 80% and 70% amino acid sequence identity with SARS-CoV and MERS-CoV, respectively (Saputri et al., 2020; Walls et al., 2020). The SARS-CoV viral proteome includes 15 reviewed proteins, while MERS-CoV contains 10 reviewed proteins in the UNIPROT database. Our findings confirm the strong similarities among these three coronaviruses at the level of CrUP structure and architecture against the human proteome. Interestingly, a more comprehensive analysis of CrUPs per protein revealed significant differences between them. The density of M/H-CrUPs per protein ranges between 71% and 76% (a range of 5 percentage points), the density of S/H-CrUPs per protein varies between 61% and 76% (15 percentage points) and the density of C/H-CrUPs per protein fluctuates between 56% and 78% (22 percentage points) (Table 2), indicating a comparatively more heterogeneous CrUP density in the SARS-CoV-2 proteome.
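The three density ranges quoted above follow directly from the per-protein density column of Table 2. The sketch below recomputes them; the percentages are copied from Table 2 in row order, while the dictionary and function names are only illustrative.

```python
# Per-protein CrUP densities (percent) copied from Table 2, in row order.
DENSITY_PERCENT = {
    "C/H (SARS-CoV-2)": [75, 75, 78, 74, 76, 77, 74, 68, 71, 68, 64, 72, 70, 67, 56, 68],
    "S/H (SARS-CoV)": [76, 75, 76, 76, 76, 73, 73, 76, 72, 70, 63, 64, 70, 61, 69],
    "M/H (MERS-CoV)": [76, 76, 76, 73, 75, 75, 72, 71, 72, 72],
}


def density_spread(values):
    """Return (minimum, maximum, maximum - minimum) for a list of densities."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


for label, values in DENSITY_PERCENT.items():
    lowest, highest, spread = density_spread(values)
    print(f"{label}: {lowest}-{highest}% ({spread} percentage points, "
          f"{len(values)} proteins)")
```

The spread is a simple max-minus-min measure, so it is sensitive to the very short proteins at the bottom of each sub-table (for example the 41-residue ORF3c protein at 56%).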
Comparative analysis of viruses spike protein
Among all SARS-CoV-2 proteins, SPIKE_SARS2 (P0DTC2; Spike) has received the greatest attention as a key element for virus attachment to the host cell, and as such it has become a principal target for therapeutic vaccine development (Papa et al., 2021; Xia 2021). To mechanistically couple the protein's molecular features with virus pathology at the level of C/H-CrUPs, we comparatively analyzed the Spike proteins of the three coronaviruses and then projected the findings onto the SPIKE_SARS2 mutation map. The Spike glycoprotein has a length of 1,273 amino acids in SARS-CoV-2, 1,275 amino acids in SARS-CoV and 1,353 amino acids in MERS-CoV (Agrawal et al., 2021; Table 2). Their CrUP densities against the human proteome are 78%, 76% and 76%, respectively, the highest CrUP density values among all proteins of each virus studied here (Table 2). Amino acid sequence alignment of SPIKE_SARS2 (P0DTC2), SPIKE_SARS (P59594) and R9UQ53_MERS (R9UQ53) showed that these three viral Spike proteins share a group of 12 regions, herein defined as Universal Peptides (UnPs) (Figure 3 and Table 3). The majority of coronaviral UnPs are clustered in the S2 domain of each Spike protein, with a critical one of these UnPs containing the Furin cleavage site 3 (R815↓S).
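The idea of Universal Peptides as stretches shared by all aligned Spike proteins can be illustrated with a small Python sketch. It is a simplified reading, not the procedure used for Figure 3: it assumes the sequences are already aligned to equal length with `-` for gaps, it requires strict identity in every column (whereas Table 3 marks some differing positions with "∗"), and the `min_len` threshold, function name and toy sequences are hypothetical.

```python
def universal_peptides(aligned, min_len=4):
    """Return (0-based alignment column, peptide) for runs identical in all rows.

    Assumptions: all aligned sequences have equal length, '-' marks a gap,
    and a run must be at least min_len columns long to be reported.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    for col in range(length + 1):  # one extra step to close a trailing run
        identical = (
            col < length
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                runs.append((start, aligned[0][start:col]))
            start = None
    return runs


# Toy alignment only (made-up sequences, not real Spike fragments).
toy_alignment = [
    "MKTAYIAK-QR",
    "MKTAYLAKWQR",
    "MRTAYIAKWQR",
]
print(universal_peptides(toy_alignment, min_len=3))
```

For the toy alignment, only the run `TAY` starting at column 2 meets the three-column threshold. Mapping alignment columns back to residue numbers in each protein would require subtracting the gaps that precede the run in that row.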
Figure 3
Alignment of the SARS-CoV-2, SARS-CoV and MERS-CoV Spike proteins. The amino acid sequences of Spike proteins P0DTC2, P59594 and R9UQ53, derived from the SARS-CoV-2, SARS-CoV and MERS-CoV viruses, respectively, were obtained from the UNIPROT database and subsequently aligned using an online bioinformatic tool available in that database. Green blocks with red outline mark the peptide sequences that are identical between the aligned sequences. The identical peptide sequences are considered as Universal Peptides (UnPs). Red arrows indicate the cleavage sites of the SARS-CoV-2 Spike protein.
Table 3
Spike-derived Universal Peptides (UnPs) and their residing CrUPs against the human proteome.
Collection of the Universal Peptides of the SARS-CoV-2, SARS-CoV and MERS-CoV Spike proteins according to the Figure 4B alignment. The position in each protein sequence and the peptide sequence are shown. The "∗" symbol indicates positions with different amino acid residues among the examined proteins. CrUPs contained in Universal Peptides (UnPs) are recorded, followed by the domain of the Spike protein to which they belong. Yellow blocks indicate complete-sequence CrUPs that appear in the Universal Peptides (UnPs) in the alignment of all Spike proteins.
Figure 4
Alignment of the SARS-CoV-2 Spike protein (SPIKE_SARS2, P0DTC2) of the 25 sub-variants belonging to the major 9 virus variants, together with the native (wild-type) Spike protein. A) N-terminal and C-terminal areas of the native (wild-type) Spike protein and the 25 sub-variants are presented. B) Complete Spike protein sequence alignment. Purple blocks mark the point mutation sites in variants; green color indicates the Universal Peptides (UnPs) of the Spike proteins from Figure 3; yellow color denotes the Receptor-Binding Domain (RBD) of Spike protein to ACE2; pink color indicates the Receptor-Binding Motif (RBM); cyan color marks the NF9 peptide; light blue color indicates the bridge between S1 and S2 domains; red arrows denote the cleavage sites. Different domains of the Spike protein are marked with different colors on the upper side of the alignment. C) The Spike protein alignment around the bridge domain (light blue color) between the S1 and S2 domains is presented. Red arrow denotes the Furin cleavage site R685↓S. Purple blocks mark the point mutations around this position, while red outline indicates the Delta and Kappa variants carrying the critical mutation P681R.
Analysis of SARS-CoV-2 variants spike protein
Most importantly, the SARS-CoV-2 Spike protein shows significant mutational diversity (Sanches et al., 2021; Tzou et al., 2020). To date, 9 main variants with adaptive mutations and high spread in human populations, named from Alpha to Lambda, have been thoroughly mapped and characterized. These 9 variants are divided into 39 sub-variants, while 32 other sporadic variants have also been described (Tzou et al., 2020). To investigate the association between the mutational profile and the C/H-CrUP landscape of the SARS-CoV-2 Spike protein, the 39 sub-variants were aligned together with the wild-type Spike protein (SPIKE_SARS2, P0DTC2) (Figure 4). This multiple alignment shows all the Universal Peptides (UnPs) identified here (Table 3) and all the mutations previously reported for each isolated variant (Figure 4B). Notably, almost all the mutations characterized so far lie in regions outside the UnPs. Most of them are clustered in the S1 domain of the Spike protein, with two critical mutations detected in the S1–S2 bridge region at amino acid residue 681, which lies close to the first Furin cleavage position, between the 685th and 686th amino acid residues (Figure 4C) (Davidson et al., 2020; Coutard et al., 2020).
Remarkably, the mutations examined here prove to create new CrUPs against the human proteome compared to the wild-type Spike protein, thus indicating that the mutant virus strains need novel clinical treatments. This is an important finding, since these new C/H-CrUPs do not exist in the human proteome but are observed exclusively in the mutant virus proteomes, thereby justifying the great attention that the Alpha, Delta, Kappa, Lambda and Mu variants have recently received worldwide (Tzou et al., 2020). Table 4 lists all the novel C/H-CrUPs created by the mutations reported so far in coronavirus variants. These variants include 25 mutations, which produce 44 new CrUPs against the human proteome. It may be these novel C/H-CrUPs that give rise to new Intrinsically Disordered Regions (IDRs) and Short Linear Motifs (SLiMs) in the mutant versions of the SARS-CoV-2 Spike protein (van der Lee et al., 2014; Hraber et al., 2020).
Table 4
New C/H-CrUPs of SARS-CoV-2 Spike protein in Alpha, Delta, Kappa and Lambda variants.
The new C/H-CrUPs of the SARS-CoV-2 Spike protein (SPIKE_SARS2, P0DTC2) across the Alpha, Delta, Kappa and Lambda variants are presented. The first column shows the position of each mutation in the Spike protein sequence. The second column records the mutation. The third column records the SARS-CoV-2 main variant in which each mutation appears. The fourth column shows the position of the first amino acid residue of the new C/H-CrUP created by each mutation. The last column records the new C/H-CrUPs created by each mutation. Each mutant amino acid residue in the new C/H-CrUPs is denoted by red color. Mutations that do not create new C/H-CrUPs are indicated by the symbol ‘-’. Some mutations produce multiple new C/H-CrUPs, while 4 new C/H-CrUPs are created in more than one variant.
The molecular mechanism of the Spike protein's proteolytic activation has been shown to play a crucial role in the selection of host species, virus binding to the ACE2 receptor, virus-cell fusion, and viral infection of human lung cells (Peacock et al., 2021; Whittaker 2021; Shang et al., 2020a). The Spike protein contains three cleavage sites: the R685↓S and R815↓S positions, which serve as direct targets of the Furin protease, and the T696↓M position, which can be recognized by the TMPRSS2 protease (Hoffmann et al., 2020a, 2020b; Takeda, 2021). Analysis of the wild-type C/H-CrUPs and the newly formed, mutation-induced C/H-CrUPs in the Spike protein showed that, among these cleavage sites, novel mutation-driven peptides are created exclusively around the critical R685↓S site, by the two pathogenic mutations P681H and P681R (Table 5).
Table 5
New C/H-CrUPs around the SARS-CoV-2 Spike protein cleavage sites.
The new C/H-CrUPs created by the mutations around the cleavage sites of the SARS-CoV-2 Spike protein (SPIKE_SARS2, P0DTC2) are identified. First column: the cleavage site of the SARS-CoV-2 Spike protein. Second column: the mutation identified around the cleavage site. Third column: the virus variants in which the mutation appears. Fourth column: the position in the SARS-CoV-2 Spike protein sequence at which the first amino acid of the C/H-CrUP appears. Fifth column: the sequence of the new C/H-CrUP. The "↓" symbol indicates the cleavage site within this peptide.
Analysis of C/H-CrUPs around the R685↓S cleavage site
Notably, among these four new peptides (Table 5), the only one that contains Furin's cleavage site is the “SRRRAR↓S” C/H-CrUP, which is generated solely by the P681R mutation carried by the Delta and Kappa coronavirus variants. At the same time, the “PRRARSV” peptide maintains its uniqueness even after the replacement of Proline (P) with Arginine (R) and its transformation to “RRRARSV” (Figure 5A,B).
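How a single substitution reshapes the peptides around position 681 can be sketched as follows. The example uses the eight-residue window S680 to V687 (“SPRRARSV”) that this paper gives for the core of the Furin cleavage motif; the helper names and the window length `k=7` are illustrative assumptions. The sketch only enumerates candidate peptides that cover the mutated residue; deciding which of them are C/H-CrUPs would still require the comparison against the human proteome.

```python
import re


def apply_substitution(fragment, first_position, mutation):
    """Apply a point substitution such as 'P681R' to a numbered fragment.

    first_position is the residue number of fragment[0] in the full protein.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation format: {mutation}")
    reference, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_position
    if not 0 <= index < len(fragment):
        raise ValueError("mutated position lies outside the fragment")
    if fragment[index] != reference:
        raise ValueError("reference residue does not match the fragment")
    return fragment[:index] + replacement + fragment[index + 1:]


def windows_covering(fragment, first_position, position, k):
    """Return (start residue number, peptide) for all k-mers covering a position."""
    index = position - first_position
    lowest = max(0, index - k + 1)
    highest = min(index, len(fragment) - k)
    return [(first_position + i, fragment[i:i + k]) for i in range(lowest, highest + 1)]


wild_type = "SPRRARSV"  # residues 680-687 as quoted in the text
delta_like = apply_substitution(wild_type, 680, "P681R")
print(delta_like)
print(windows_covering(wild_type, 680, 681, k=7))
print(windows_covering(delta_like, 680, 681, k=7))
```

Within this short window, the two 7-residue peptides covering position 681 change from `SPRRARS` and `PRRARSV` to `SRRRARS` and `RRRARSV`, the two mutant peptides named in the paragraph above.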
Figure 5
C/H-CrUPs residing around the R685↓S cleavage site and belonging to the NF9 peptide of Spike protein (SPIKE_SARS2, P0DTC2). A) Amino acid sequences of Spike protein between positions 671 and 700 in wild-type, Alpha and Delta variants of SARS-CoV-2 virus are shown. In each variant, the identified C/H-CrUPs are marked. Blue lines indicate C/H-CrUPs derived from wild-type protein around the R685↓S cleavage site. Red lines denote C/H-CrUPs produced by the P681H and P681R mutations. Green lines indicate the newly created mutant C/H-CrUPs that derive from the P681H and P681R mutations in Alpha and Delta variants, respectively. B) Set of C/H-CrUPs generated around the R685↓S cleavage site of wild-type and mutant Spike protein forms. C) Amino acid sequences of the NF9 peptide between positions 448 and 456 in wild-type Spike protein, before and after creation of the L452R and L452Q mutations. Blue lines indicate C/H-CrUPs that belong to the NF9 peptide. Red lines denote C/H-CrUPs that are produced by the L452R and L452Q mutations. Green lines indicate the newly generated mutant collection of C/H-CrUPs derived from the L452R and L452Q mutations. D) Set of C/H-CrUPs residing in the NF9 peptide in wild-type, and L452R and L452Q mutated protein forms.
The Furin cleavage site R685↓S has been characterized as a 20 amino acid sequence motif that corresponds to the amino acid sequence A672-S691 of the Spike protein (Figure 4A,B) (Wu and Zhao, 2020). The 8 amino acid peptide “SPRRAR↓SV” (S680–V687) serves as the core region of the motif, flanked by two solvent-accessible regions of 8 amino acids (A672-N679) and 4 amino acids (A688-S691), respectively (Takeda, 2021; Wu and Zhao, 2020).
Pro-protein Convertase (PC) Furin and/or Furin-like PCs act as sequence-specific proteases and can cleave the Spike protein at a position defined by the unique motif “R-x-x-R↓S”, which is positively charged because of its Arginine residues (Wu and Zhao, 2020).
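The “R-x-x-R↓S” pattern can be located with a short regular expression. The sketch below is a plain pattern search on the core window quoted above (“SPRRARSV”, S680–V687) and on the same window carrying P681R; it says nothing about cleavage efficiency, and the function name is illustrative.

```python
import re

# Lookahead so that overlapping candidates are all reported:
# R, any two residues, R, immediately followed by S.
FURIN_LIKE = re.compile(r"(?=(R..R)S)")


def furin_like_sites(fragment, first_position):
    """Return residue numbers of the Arginine preceding each R-x-x-R|S cut."""
    sites = []
    for match in FURIN_LIKE.finditer(fragment):
        last_arginine_index = match.start() + 3
        sites.append(first_position + last_arginine_index)
    return sites


print(furin_like_sites("SPRRARSV", 680))  # wild-type core region
print(furin_like_sites("SRRRARSV", 680))  # same window with P681R
```

Both windows yield a single hit whose last Arginine is residue 685, consistent with the R685↓S notation: the P681R substitution lies upstream of the four-residue pattern and leaves the pattern itself unchanged.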
Since Furin and/or Furin-like PCs are secreted from host cells and bacteria in the airway epithelium, while other PCs, such as PC5/6A and PACE4, exhibit widespread tissue distribution, it is likely that their activities are critically implicated in the SARS-CoV-2-induced damage and pathology of multiple infected organs (Örd et al., 2020). It seems that Furin's cleavage site essentially contributes to the infection process and disease progression, and offers a powerful target for immunogenetic, antigenic and therapeutic interventions, as strongly supported by the recently developed antibody against Furin's cleavage site (Braun and Sauter, 2019; Zahradník et al., 2021; Wu et al., 2020).
Most importantly, the SARS-CoV-2 Delta variant, which carries the critical mutation P681R, seems to be more infectious and pathogenic than the wild-type virus form, and the importance of this mutation has only very recently begun to be recognized (Wu et al., 2020). Replacement of Proline (P) with Arginine (R) at position 681 causes the loss of the amino acid sequence uniqueness that characterizes the wild-type “PRRARSV” C/H-CrUP and likely stabilizes the conformation of the core region of Furin's cleavage site, thus facilitating a more efficient cleavage of the Spike protein by the Furin protease (Whittaker, 2021; Callaway, 2021).
In the same direction, novel SLiMs, such as “SRRR”, “RRR”, “RRRAR” and “RRRARS”, can be produced by the mutant C/H-CrUPs. These may act as specific targets of PCs other than Furin, thereby enabling the stronger (and quicker) binding of the mutant virus to its host ACE2 receptor, which likely leads to a comparatively more generalized infection and massive mutant virus production (Table 6) (Shorthouse and Hall, 2021; Davey et al., 2015). This finding is supported by the remarkable increase in the total number of motifs created by the P681R mutation that are identified within the human proteome (Table 6).
Of note, the mutant C/H-CrUP-derived new SLiMs in the SARS-CoV-2 Delta variant could render the Spike protein antigenically weak or defective, causing it to lose its capacity to serve as an antibody target and thus promoting the immune escape of the virus (Davey et al., 2015; Almehdi et al., 2021).
Table 6
Small Linear Motifs (SLiMs) of wild-type C/H-CrUPs and of C/H-CrUPs created by the critical mutation P681R, as detected in the human proteome.
The SLiMs of wild-type and mutant C/H-CrUPs produced by the critical mutation P681R in SPIKE_SARS2 that are detected in the human proteome are listed. The green block indicates the C/H-CrUP in the wild-type protein; the blue block denotes the mutant C/H-CrUP derived from the P681R mutation; the yellow block describes the C/H-CrUP newly created by the same mutation. X (in red) marks the position within the peptide that is varied to create the motif. In the third column, the detected motif is recorded, followed by the Protein Entry-ID and the name of the protein in which it is detected. "Total" summarizes the number of times the motifs related to the C/H-CrUP are recorded in the human proteome.
Analysis of C/H-CrUPs around the ACE2 receptor site
An important issue for viral infectivity and pathogenesis is receptor recognition and the binding of the virus to the host cell surface. SARS-CoV-2 belongs to the β-coronavirus group and, like SARS-CoV, uses the Angiotensin-Converting Enzyme 2 (ACE2) as its cellular receptor (Walls et al., 2020; Wang et al., 2020). The SARS-CoV-2 Spike protein attaches to the ACE2 receptor through a Receptor-Binding Domain (RBD) that spans positions F318 to F541 of the Spike protein (Shang et al., 2020b). This region has received great attention, as it seems to be the target of antibodies against the virus and of other therapeutic interventions (Chen et al., 2021; Zahradník et al., 2021; Hastie et al., 2021). Additional studies have shown that, from residue W436 to residue Q506, the RBD contains the Receptor-Binding Motif (RBM), which carries 12 contact positions with ACE2 (Hatmal et al., 2020). Mutation analysis revealed that 13 mutations have been described at 10 positions of the RBD region (Figure 4 and Table 7). In the RBM, 10 mutations at 6 sequence positions were reported for different virus variants (Table 7), while among the 10 contact positions only N501Y, in the Alpha, Beta, Gamma and Mu variants, was found to be mutated.
Table 7
C/H-CrUPs of wild-type and mutant Receptor-Binding Domain (RBD) of SARS-CoV-2 Spike protein.
Novel C/H-CrUPs created by critical mutations in the Receptor-Binding Domain (RBD) of the SARS-CoV-2 wild-type and mutant Spike protein (SPIKE_SARS2, P0DTC2) amino acid sequence are identified. Peptide number/peptide length is the number of C/H-CrUPs of a given length around the position. Red color marks the amino acids in wild-type C/H-CrUPs that will be modified, as well as the mutated amino acids in the new C/H-CrUPs. Light blue color indicates the peptides that disappear from the wild-type viral proteome due to the mutation; yellow color shows the C/H-CrUPs that are newly created by the mutation.
C/H-CrUPs around the NF9 peptide
The most important region in the RBM is the peptide “NYNYLYRLF” (positions 448 to 456). This Tyrosine (Y)-enriched peptide contains two contact sites (Y449 and Y453) and is known as the NF9 peptide (Motozono et al., 2021). It seems to affect antigen recognition, being an immunodominant HLA-A*24:02-restricted epitope recognized by CD8+ T-cells. Furthermore, NF9 stimulation increases the production of cytokines, such as IFN-γ, TNF-α and IL-2, by CD8+ T-cells (Kared et al., 2021). Analysis of the C/H-CrUPs associated with the NF9 peptide showed that it contains 3 UPs (Figure 5D,E, and Table 7). Mutation analysis indicated that, in the NF9 peptide, the mutation L452R is carried by the variants Alpha, Delta, Lambda and Kappa, while the mutation L452Q appears in the variant Lambda. Further analysis unveiled that these mutations affect the amino acid at position 5, exactly in the middle of the peptide, creating 3 and 4 new C/H-CrUPs, respectively (Table 8). These mutations have a dramatic effect on the uniqueness of the NF9 peptide(s). Namely, the 6-amino-acid C/H-CrUP “NYNYLY” loses its uniqueness against the human proteome, while only the L452Q mutation surprisingly creates a new CrUP of 5 amino acids in length (Figure 5D,E). The loss of uniqueness of this peptide, which notably is located at the beginning of the NF9 peptide, seems to be crucial, as it leads to the loss of the antigenic capacity of the NF9 peptide, thus evading the HLA-A24-restricted immunity and inducing the immune escape of the virus. Interestingly, related studies have shown that the L452R mutation (and subsequently the newly created C/H-CrUPs herein characterized) increases the infectivity of SARS-CoV-2 by strengthening the electrostatic interactions of this region of the Spike protein with the ACE2 virus receptor (Motozono et al., 2021).
Table 8
NF9-specific C/H-CrUPs.
The C/H-CrUPs in the wild-type and mutant NF9 peptide are listed. Mutant amino acids are marked in red.
To date, epidemiological data have indicated that the dominant variant of SARS-CoV-2 is the Delta variant (Micochova et al., 2021). In light of the aforementioned findings, the enhanced pathogenicity of this variant seems to be the outcome of the simultaneous presence (accumulation) of two critical mutations, L452R and P681R. The mutation L452R, through the loss of NF9 peptide uniqueness, causes virus immune escape and strong(er) binding of the virus to its cognate receptor, while at the same time the mutation P681R facilitates the cleavage of the Spike protein by different proteases, inducing a generalized infection and a massive virus release. Therefore, the Delta variant gains a significant advantage in escaping from the immune system per se, as well as from vaccination-induced immunity, together with an increased infectivity that results from enhanced virus entrance into the host cell, increased virus formation and massive virus release.
Conclusion
Since mutations outside the Spike protein locus in the SARS-CoV-2 genome have not yet been completely and systematically mapped, our study reveals novel and useful information on all the remaining, Spike protein-independent, C/H-CrUPs, which seem to hold strong promise and open new therapeutic windows for the Covid-19 pandemic. Finally, the approach of virus-host UP-specific signature identification could prove a useful tool for the elucidation of virus infectivity, prevention of virus immune escape, domination of pathogenic variants, and identification of new antigenic and pharmacological targets.
Materials and methods
Methods
A new bioinformatics tool, built on an advanced big-data algorithm, was developed herein to extract CrUP collections from proteomes of interest and, thereby, create organism-specific Uniquomes. The user can specify the minimum (min) and maximum (max) peptide lengths that the tool will analyze. The tool splits each protein into all possible peptides of lengths from min to max, thus generating a very large set of peptides (for a protein of length L and a window of size W, a set of C = L - W + 1 peptides is generated). In the next step, all these peptides, from the smallest to the largest, are searched against the rest of the proteome to decide whether each peptide exists in another protein or not. Since the search targets the smallest possible unique peptide (Core Unique Peptide: CrUP), the tool first makes sure that the peptide under examination does not already contain a smaller CrUP. This is ensured by examining whether any of the already identified CrUPs of the protein is contained within the peptide under examination. All peptides that conform to these two rules are considered CrUPs. Figure 6 describes the algorithm developed and used herein to recognize these novel CrUPs.
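The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the tool itself: the function name `core_unique_peptides`, the dictionary-based proteome representation and the naive substring search are all hypothetical choices made for readability, and a real proteome would require an indexed search.

```python
from typing import Dict, List


def core_unique_peptides(proteome: Dict[str, str],
                         min_len: int, max_len: int) -> Dict[str, List[str]]:
    """Return, per protein, the shortest peptides absent from every other protein.

    Hypothetical helper: `proteome` maps a protein ID to its sequence.
    """
    result: Dict[str, List[str]] = {}
    for pid, seq in proteome.items():
        # "The rest of the proteome": every sequence except the current one.
        others = [s for other_id, s in proteome.items() if other_id != pid]
        found: List[str] = []
        # Windows are processed from the smallest to the largest size.
        for w in range(min_len, max_len + 1):
            # A protein of length L yields L - W + 1 peptides of size W.
            for start in range(len(seq) - w + 1):
                peptide = seq[start:start + w]
                # Rule 1: skip peptides that already contain a known CrUP.
                if any(crup in peptide for crup in found):
                    continue
                # Rule 2: the peptide must not occur in any other protein.
                if all(peptide not in other for other in others):
                    found.append(peptide)
        result[pid] = found
    return result


toy = {"P1": "AAB", "P2": "ABA"}
print(core_unique_peptides(toy, 1, 3))  # {'P1': ['AA'], 'P2': ['BA']}
```

In the toy example no single residue is unique, so the search moves to window size 2, where "AA" and "BA" are found; the 3-residue peptides are then skipped because each already contains a smaller CrUP.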
Figure 6
Schematic presentation of the algorithm herein developed for the identification of Core Unique Peptides (CrUPs).
In Figure 7, a sliding window of 9 amino acids is applied on the O00400 ACATN_HUMAN protein, generating the candidate peptides “VYVKNFGRR” and “YVKNFGRRK”. These peptides are searched against the rest of the proteome to determine their uniqueness, once it has been ensured that they do not already contain a smaller CrUP. The latter is determined by examining whether an already defined CrUP is included within the peptide.
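As a small illustration of the sliding window, the following Python sketch generates the candidate peptides for a given window size. The function name `sliding_window` is hypothetical, and the 10-residue fragment is an assumption: it is simply assembled from the two overlapping peptides quoted above, not copied from the full O00400 sequence.

```python
from typing import List


def sliding_window(sequence: str, window: int) -> List[str]:
    """All contiguous peptides of the given window size, in sequence order."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


# Illustrative fragment (assumption): the two 9-mers overlap by 8 residues.
fragment = "VYVKNFGRRK"
peptides = sliding_window(fragment, 9)
print(peptides)       # ['VYVKNFGRR', 'YVKNFGRRK']
print(len(peptides))  # L - W + 1 = 10 - 9 + 1 = 2
```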
Figure 7
Presentation of the bioinformatic process developed for the identification of CrUPs, performed residue by residue.
To address the question of the present study, the aforementioned tool was expanded with a new feature, whereby the user can supply a reference and a target proteome. This feature allows the tool to search all the peptides of the target proteome against the reference proteome, thus creating a set of CrUPs of the target versus the reference proteome. To this end, the tool (as in the initial implementation) splits all proteins in the target proteome into all possible peptides of lengths from min to max. However, instead of testing the uniqueness of each peptide within the same proteome, it performs the search against the reference proteome. As before, the peptide under examination must not contain any smaller peptide already identified as a CrUP. The algorithm employed to identify these CrUPs is described diagrammatically in Figure 6.
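A minimal Python sketch of the target-versus-reference search is given below. It is an illustration under assumptions rather than the actual implementation: the names `kmers` and `crups_against_reference` are hypothetical, and the per-window set of reference peptides is a design choice made here so that each lookup is a constant-time set membership test instead of a scan over all reference sequences.

```python
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Set of all contiguous peptides of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def crups_against_reference(target: Dict[str, str], reference: Dict[str, str],
                            min_len: int, max_len: int) -> Dict[str, List[str]]:
    """CrUPs of each target protein, judged against the reference proteome only."""
    result: Dict[str, List[str]] = {pid: [] for pid in target}
    for w in range(min_len, max_len + 1):
        # Index every reference peptide of the current window size once.
        ref_index: Set[str] = set()
        for ref_seq in reference.values():
            ref_index |= kmers(ref_seq, w)
        for pid, seq in target.items():
            found = result[pid]
            for start in range(len(seq) - w + 1):
                peptide = seq[start:start + w]
                # Skip peptides that already contain a smaller CrUP.
                if any(crup in peptide for crup in found):
                    continue
                # Unique only if absent from the whole reference proteome.
                if peptide not in ref_index:
                    found.append(peptide)
    return result


viral = {"V1": "MKTAY"}
human = {"H1": "MKTGG", "H2": "AYAY"}
print(crups_against_reference(viral, human, 1, 3))  # {'V1': ['TA']}
```

The trade-off of the index is memory: for large reference proteomes and long windows, the set of reference peptides can become very large.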
Motifs and SLiMs search
For motif and SLiM identification, the tool offers the user the ability to perform a motif search to identify putative SLiMs. The user provides a peptide of a given length, as well as the number N of amino acids that can vary in the given peptide. The tool then creates all possible combinations of peptides that can be produced by considering, in each combination, exactly N amino acid(s) as unknown. Once these combinations are produced, an exhaustive search using regular expressions is performed against the reference proteome, to locate all proteins containing such peptides. To illustrate the process, if the user provides the peptide “TQYILG” and N = 2, the following combinations will be generated:
??YILG, ?Q?ILG, ?QY?LG, ?QYI?G, ?QYIL?, T??ILG, T?Y?LG, T?YI?G, T?YIL?, TQ??LG, TQ?I?G, TQ?IL?, TQY??G, TQY?L?, TQYI??
The user receives a list of all proteins containing peptides that match the criteria, including the motif against which each peptide was matched and all the positions within the protein sequence where that peptide can be found. All proteomes were taken from the UNIPROT database.
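The motif generation and the regular-expression search can be sketched as follows in Python. This is a hedged illustration, not the tool's code: the function names `wildcard_motifs` and `search_motifs`, the tuple layout of the hits and the 1-based position convention are all assumptions made here.

```python
import re
from itertools import combinations
from typing import Dict, List, Tuple


def wildcard_motifs(peptide: str, n_unknown: int) -> List[str]:
    """All motifs obtained by masking exactly n_unknown positions with '?'."""
    motifs = []
    for positions in combinations(range(len(peptide)), n_unknown):
        chars = list(peptide)
        for p in positions:
            chars[p] = "?"
        motifs.append("".join(chars))
    return motifs


def search_motifs(peptide: str, n_unknown: int,
                  proteome: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (protein ID, motif, 1-based position) for every match."""
    hits = []
    for motif in wildcard_motifs(peptide, n_unknown):
        # '?' becomes the regex wildcard '.'; fixed residues are escaped.
        body = "".join("." if c == "?" else re.escape(c) for c in motif)
        # A lookahead is used so that overlapping matches are also reported.
        pattern = re.compile("(?=" + body + ")")
        for pid, seq in proteome.items():
            for match in pattern.finditer(seq):
                hits.append((pid, motif, match.start() + 1))
    return hits


print(len(wildcard_motifs("TQYILG", 2)))  # 15
print(search_motifs("TQYILG", 2, {"X1": "AATQAILGAA"}))
```

For a peptide of length 6 and N = 2, the number of motifs is the binomial coefficient C(6, 2) = 15, which agrees with the list given in the text.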
Algorithm's application to the identification of virus CrUPs against human proteome
To recognize all the CrUPs contained in a virus proteome against the human proteome, we constructed in silico a new, artificial “hybrid proteome” that contained all the reviewed human proteins (20,430 proteins) plus one protein derived from the viral proteome (20,431 proteins in total). In this way, n “hybrid proteomes”, one for each of the n viral proteins, were constructed, with n representing the number of viral proteins. These “hybrid proteomes” were then bioinformatically searched one by one for the identification of virus-specific CrUPs in human protein sequence environments.
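The construction of the hybrid proteomes can be sketched as below. This is an illustrative Python sketch only: the generator name `hybrid_proteomes`, the dictionary representation and the toy sequences are hypothetical, and it is assumed that viral protein IDs never collide with human protein IDs.

```python
from typing import Dict, Iterator, Tuple


def hybrid_proteomes(human: Dict[str, str],
                     viral: Dict[str, str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (viral ID, hybrid proteome): all human proteins plus one viral protein."""
    for viral_id, viral_seq in viral.items():
        hybrid = dict(human)          # copy, so the human proteome stays untouched
        hybrid[viral_id] = viral_seq  # assumes no ID collision with human entries
        yield viral_id, hybrid


# Toy data (assumption): placeholder IDs and sequences, not real proteins.
human = {"H1": "MKTGG", "H2": "AYAY"}
viral = {"V1": "MKTAY", "V2": "GGAYA", "V3": "TTTTT"}
for viral_id, hybrid in hybrid_proteomes(human, viral):
    print(viral_id, len(hybrid))  # each hybrid holds len(human) + 1 proteins
```

With n viral proteins the generator yields n hybrid proteomes, each of which can then be searched for the CrUPs of its single viral member.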
Databases
All proteomes and proteins were obtained from UNIPROT [https://www.uniprot.org]. SARS-CoV-2 wild-type and variant/mutated sequences were derived from the Stanford COVID database [https://covdb.stanford.edu/page/mutation-viewer/]. Motifs were taken from the Eukaryotic Linear Motif resource for Functional Sites in Proteins [http://elm.eu.org/index.html] and KEGG/GenomeNet/MOTIF2 [https://www.genome.jp/tools/motif/MOTIF2.html]. SLiM-containing proteins were taken from the Davey lab SLiM servers (The Institute of Cancer Research {ICR}, UK) [http://slim.icr.ac.uk/slimsearch/] and [http://slim.icr.ac.uk/index.php?page=tools].
Declarations
Author contribution statement
Vasileios Pierros: Conceived and designed the experiments; Performed the experiments; Contributed reagents, materials, analysis tools or data.
Evangelos Kontopodis: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.
Dimitrios J. Stravopodis, George Th. Tsangaris: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Data availability statement
Data included in article/supp. material/referenced in article.
Declaration of interest’s statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.
Authors: Guido Papa; Donna L Mallery; Anna Albecka; Lawrence G Welch; Jérôme Cattin-Ortolá; Jakub Luptak; David Paul; Harvey T McMahon; Ian G Goodfellow; Andrew Carter; Sean Munro; Leo C James Journal: PLoS Pathog Date: 2021-01-25 Impact factor: 6.823
Authors: Ma'mon M Hatmal; Walhan Alshaer; Mohammad A I Al-Hatamleh; Malik Hatmal; Othman Smadi; Mutasem O Taha; Ayman J Oweida; Jennifer C Boer; Rohimah Mohamud; Magdalena Plebanski Journal: Cells Date: 2020-12-08 Impact factor: 6.600