Since the beginning of the COVID-19 pandemic caused by SARS-CoV-2, millions of patients have been diagnosed and many of them have died from the disease worldwide. The identification of novel therapeutic targets is of utmost significance for prevention and treatment of COVID-19. SARS-CoV-2 is a single-stranded RNA virus with a 30 kb genome packaged into a membrane-enveloped virion, encoding several tens of proteins. The belief that the amino acid sequence of proteins determines their 3D structure which, in turn, determines their function has long been a central principle of molecular biology. Recently, however, it has been increasingly realized that there is a large group of proteins that lack a fixed or ordered 3D structure yet exhibit important biological activities, the so-called intrinsically disordered proteins and protein regions (IDPs/IDRs). Disordered regions in viral proteins are generally associated with viral infectivity and pathogenicity because they endow the viral proteins with the ability to easily and promiscuously bind to host proteins; therefore, the proteome of SARS-CoV-2 has been thoroughly examined for intrinsic disorder. It has been recognized that, in fact, the SARS-CoV-2 proteome exhibits significant levels of structural order, with only the nucleocapsid (N) structural protein and two of the nonstructural proteins being highly disordered. The spike (S) protein of SARS-CoV-2 exhibits significant levels of structural order, yet its predicted percentage of intrinsic disorder is still higher than that of the spike protein of SARS-CoV. Notably, however, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the existing ones play major roles in the functioning and virulence of the virus and are thus promising drug targets for rational antiviral drug design.
Presented here is a COVID-19 perspective on the intrinsically disordered proteins, summarizing recent results on the SARS-CoV-2 proteome disorder features, their physiological and pathological relevance, and their prominence as prospective drug target sites.
Keywords:
COVID-19; SARS-CoV-2; drug design; intrinsically disordered protein; nucleocapsid protein; proteome; spike protein
For a long time, one of the central principles of molecular biology has been the belief
that the amino acid sequence of each protein determines its three-dimensional structure
which, in turn, determines its function. Recently, it has been increasingly realized,
however, that there is a large group of proteins and protein regions that lack a fixed or
ordered 3D structure, yet they exhibit biological activities—so-called intrinsically
disordered proteins and intrinsically disordered regions (IDPs/IDRs) (Figure 1).[1−3] The highly
dynamic disordered regions of these proteins have been linked to important phenomena such as
enzyme catalysis and allosteric regulation and vital physiological functions such as cell
signaling and transcription. They are also key players in the cellular liquid–liquid
phase separation driving the formation of membraneless organelles, allowing the
concentration of biomolecules to increase biochemical reaction efficiency and protection of
nucleic acids or proteins to promote cell survival under stress.[4,5] In viral proteins, disordered regions
have been strongly correlated to the viral infectivity and pathogenicity because they
provide the viral proteins with the ability to easily and promiscuously bind to host
proteins. With the COVID-19 disease caused by infection with the severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2) rapidly spreading around the world, the search for
correlations between the viral protein disorder and the infection pathway has been
intensified, hoping to find ways to mitigate the effects of the viral infection.
Furthermore, the practice of rational drug design has so far largely ignored the presence of
intrinsic disorder in target proteins. Understanding the structure of these regions in
the SARS-CoV-2 proteome would be a valuable asset for high-throughput screening in drug design
and development for the treatment of COVID-19.
Figure 1
Schematic presentation of (A) intrinsically disordered proteins (IDPs), (B)
intrinsically disordered regions (IDRs), and (C) structured proteins.
In fact, the SARS-CoV-2 proteome has been found to exhibit significant levels of structural
order. Except for the nucleocapsid (N) structural protein and two of the nonstructural
proteins (ORF6 and ORF9b), the majority of SARS-CoV-2 proteins are predominantly highly
ordered proteins, with a few disordered regions.[6] The spike (S) protein
of SARS-CoV-2 exhibits significant levels of structural order, yet its predicted percentage
of intrinsic disorder is still higher than that of the spike protein of SARS-CoV.
Furthermore, the IDRs located in the SARS-CoV-2 proteins appear to be of high functional
importance. Most of the SARS-CoV-2 proteins contain intrinsic disorder-based
protein–protein interaction sites utilized for molecular recognition and interaction
with certain partner proteins. The accumulated knowledge on the SARS-CoV-2 proteome is
important for understanding the virulence of coronaviruses and for finding ways to alleviate the
effects of the viral infections.

Here, along with a concise review of the common knowledge on IDPs, we present a COVID-19
perspective on the intrinsically disordered proteins, summarizing and analyzing recent
results on the SARS-CoV-2 proteome disorder features and their physiological and
pathological relevance. As IDPs/IDRs typically undergo structural transitions upon attaching
to their physiological associates, such knowledge generates an important base for better
understanding the activity of these proteins, their interactions with host proteins, and
their prominence as prospective drug target sites.
Intrinsically Disordered Proteins
IDPs Lack a Fixed or Ordered 3D Structure
IDPs/IDRs do not exhibit a fixed three-dimensional structure. Instead, they fold
dynamically into a series of conformations depending on the surrounding
conditions.[7−9] This allows them to have
a wide range of binding associates and thus serve significant roles in critical biological
processes such as cell signaling and transcription.[2,10] It has been clear for quite some time now that
IDPs/IDRs are functionally important and abundant in proteins implicated across the
disease spectrum.[11]

Whereas ordered proteins usually have a single, well-defined conformation representing a
global free-energy minimum with the potential to bind small molecules with high affinity,
the free-energy landscape of disordered proteins is characterized by a large number of
local minima. These minima correspond to the many conformations within the structural
ensemble populated by disordered proteins, which can transiently bind small molecules with
weak affinity. Examples exist of disordered proteins interacting with other proteins or
nucleic acids in which they undergo disorder-to-order transitions, resulting in low
affinity complexes.[12−15] Due to the difference in their structure, ordered and
disordered proteins exhibit different degrees of hydration. Hydration is significantly
higher for IDPs than for globular proteins of similar size.[16] IDPs also exhibit a high propensity for binding to charged solute
ions.[17,18]

IDPs are abundant in living systems. Eukaryotes generally have the highest amount of
IDPs in their proteomes, estimated between 30 and 45%.[19,20] More than half of eukaryotic proteins have
long regions of disorder. Intrinsic disorder is particularly enriched in proteins
implicated in cell signaling and transcription and is subject to tight control by the
organisms.[21] IDPs play key roles in many biological processes. They
have numerous crucial functions that complement the functionality of ordered proteins.
Highly dynamic disordered proteins have been related to functionally important processes
such as allosteric regulation and enzyme catalysis. For many disordered proteins, the
binding affinity to their receptors is regulated by post-translational modifications. The
flexibility of disordered proteins facilitates the conformational requirements for binding
enzymes and their receptors. IDPs are involved in regulation of transcription and
translation, cellular signal transduction, protein phosphorylation, and the regulation of
the self-assembly of large multiprotein complexes.

Many intrinsically disordered proteins undergo transitions to more ordered states or fold
into stable secondary or tertiary structures upon binding to their targets—i.e., a
coupled folding and binding route. Coupled folding and binding often produces a complex
with high specificity and relatively low affinity, which is suitable for signal
transduction proteins that must not only associate specifically to initiate the signaling
process but also be capable of dissociation upon signaling completion. Another advantage
of a system that undergoes coupled folding and binding is that the conformational
flexibility expedites the post-translational modification of important transcription
factors.

The difference between ordered and disordered proteins starts at the level of their amino
acid sequences. There are noticeable differences between ordered and disordered proteins
in terms of their amino acid compositions, charge, flexibility, and hydrophobicity.[22] Thus, the tendency of a protein to stay intrinsically disordered is
programmed in its amino acid sequence.[23,24] Indeed, IDPs are notably depleted in bulky hydrophobic
(Ile, Leu, and Val) and aromatic amino acid residues (Trp, Tyr, Phe), which normally
create the hydrophobic core of a folded globular protein, and also have a low content of the
hydrophobic and uncharged Cys and Asn residues. These depleted residues are considered
order-promoting amino acids. On the other hand, natively unfolded proteins are
substantially enriched in polar, hydrophilic, disorder-promoting amino acids: Ala, Arg,
Gly, Gln, Ser, Pro, Glu, and Lys. Many IDPs have a substantial excess of basic or acidic
amino acids and are thus highly charged at neutral pH. The charge destabilizes a compact
structure in such proteins. The relationship between amino acid composition and sequence
and protein order or disorder has been carefully explored in an effort to develop
predictors of intrinsic disorder. At present, a multitude of such predictors exist,
based on a variety of amino acid attributes.[25−31]

IDPs are suggested to be relevant to the origin of life. Scientists have long hypothesized
that the early genetic code was simpler than it is now. The number of protein-building
amino acids is thought to have been 12 or 14 rather than 20.[32]
The last of the 20 amino acids to evolve was tryptophan, the largest amino acid and also
the most structure-promoting one. It has been speculated that the earliest proteins were
disordered and may have played a unique and crucial role in the origin of life. Moreover,
it has been suggested that the amino acids of the earliest proteins lacked aromatic
rings, which are required to stabilize the active site of structured proteins; thus,
the later-evolving amino acids may have been key to developing
structure.[32] The evolution of intrinsic disorder has been supposed to
exhibit a wavy pattern, with disordered primordial proteins having mainly chaperone
activities gradually substituted by highly ordered enzymes and protein intrinsic disorder
reinvented at subsequent evolutionary steps along with development of more complex
organisms,[33−35] playing substantial
roles in the evolution of multifunctionality.[36−38] Moreover, the relationship between structural disorder and organism
complexity, along with proteome size, has been discussed, suggesting that structural
disorder may effectively increase the complexity of the species.[39,40]

Interest in IDPs in protein science has been rapidly increasing in the years since 2000,
as demonstrated by a search in the CAS Content Collection.[41] Currently,
there are over 6000 publications on IDP-related topics (Figure 2).
Figure 2
Annual number and accumulated number of IDP-related publications in CAS Content
Collection.[41]
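The composition bias described in this section (depletion of order-promoting residues, enrichment of disorder-promoting residues, and a high net charge) can be illustrated with a minimal Python sketch. The residue classes are taken from the lists given in the text; the function names, the crude charge rule (+1 for Lys/Arg, -1 for Asp/Glu), and the two toy sequences are assumptions made for illustration only. This is not one of the published disorder predictors.

```python
# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("ILVWYFCN")     # Ile, Leu, Val, Trp, Tyr, Phe, Cys, Asn
DISORDER_PROMOTING = set("ARGQSPEK")  # Ala, Arg, Gly, Gln, Ser, Pro, Glu, Lys


def composition_bias(sequence):
    """Return (fraction order-promoting, fraction disorder-promoting)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_order = sum(1 for aa in seq if aa in ORDER_PROMOTING)
    n_disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    return n_order / len(seq), n_disorder / len(seq)


def net_charge_per_residue(sequence):
    """Crude net charge at neutral pH (assumed rule): +1 for K/R, -1 for D/E."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(1 for aa in seq if aa in "KR")
    negative = sum(1 for aa in seq if aa in "DE")
    return (positive - negative) / len(seq)


# Hypothetical toy sequences, not real proteins.
toy_disordered = "SRGSPEKQSRGGSPAEKR"
toy_ordered = "ILVWYFCNILVFWYLIVC"

if __name__ == "__main__":
    print(composition_bias(toy_disordered))
    print(composition_bias(toy_ordered))
    print(net_charge_per_residue(toy_disordered))
```

Real predictors combine many more sequence attributes than these two fractions, so the sketch should be read only as a way to make the stated residue classes concrete.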
IDPs in Diseases
IDPs have numerous crucial biological functions that complement the functionality of
ordered proteins. However, when malfunction occurs (e.g., misexpression, misprocessing, or
misregulation), IDPs/IDRs tend to engage in undesirable interactions and become
involved in the development of various pathological states. As a matter of fact, many
proteins which are associated with neurodegeneration, diabetes, cardiovascular disease,
amyloidosis, and genetic diseases, as well as the majority of the human cancer-related
proteins, are either IDPs or contain long IDRs.[42−48] Recently, IDPs were suggested to be responsible
for exothermic events observed in cerebrospinal fluid and brain proteome. These reversible
exothermic transitions vary depending on neurodegenerative pathologies and possibly
reflect processes of protein fibrillization and/or aggregation.[49,50] Further, it has become clear that
IDPs such as α-synuclein and tau protein are common among neurodegenerative
diseases, especially Parkinson’s disease. It has been recognized that
α-synuclein can form amyloid fibrils, which are implicated in the pathogenesis of
Parkinson’s disease and other neurodegenerative conditions, collectively termed
synucleinopathies.[51] IDPs, such as α-synuclein, tau protein,
p53, and BRCA1, are appropriate targets for drugs modulating protein–protein
interactions. From these and other examples, novel strategies for drug discovery based on
IDPs have been developed.[52]

The COVID-19 pandemic has been ongoing for almost 2 years. Although various treatments
have been explored, efficient antiviral drugs are currently still in short supply.
Targeting the IDPs/IDRs in the SARS-CoV-2 proteome could be an alternative strategy for
rational antiviral drug design.
Intrinsically Disordered Proteins in SARS-CoV-2 Proteome and Their Role in COVID-19 Infection
Since the beginning of the COVID-19 pandemic caused by the SARS-CoV-2 virus, millions of
patients have been diagnosed and many of them have died from the disease worldwide. The
identification of novel therapeutic targets is of utmost significance for the prevention
and treatment of COVID-19. SARS-CoV-2 is a single-stranded RNA virus with a 30 kb
genome packaged into a membrane-enveloped virion 80–90 nm in diameter, encoding
several tens of proteins (Figure 3).[53]
Figure 3
Schematic diagram of the coronavirus particle. The viral genomic RNA is assembled with
the nucleocapsid (N) protein, which is enclosed by the lipid bilayer membrane. The viral
surface proteins—spike (S), envelope (E), and membrane (M) proteins—are
embedded in the lipid bilayer envelope.
SARS-CoV-2 forms a virion including its genomic RNA bundled in a particle comprising four
structural proteins—spike (S) glycoprotein that binds to human angiotensin converting
enzyme 2 (ACE2) receptor to mediate the entry of the virus into the host cell,[54] the membrane (M) protein facilitating viral assembly in the endoplasmic
reticulum,[55,56] the
ion channel small envelope (E) protein,[57] and the nucleocapsid (N)
protein,[58−60] which assembles with viral
RNA to form a ribonucleoprotein complex—the nucleocapsid.[61,62] A recent extensive NMR exploration
provided thorough knowledge on the structure of the SARS-CoV-2 proteins, which is required
for understanding the basic principles of the viral life cycle and processes underlying
viral infection and transmission.[63] Moreover, protocols for the
large-scale production of more than 80% of all SARS-CoV-2 proteins were provided, which are highly
valuable for further explorations.

Because viral proteomes are known to typically contain noticeable levels of intrinsically
disordered proteins and viral proteins utilize intrinsic disorder during host cell invasion,
it is important to examine the intrinsic disorder of the viral proteins associated with the
SARS-CoV-2 infection.[64] Intrinsically disordered proteins and
intrinsically disordered regions play key roles in vital biological processes, including
DNA/RNA and protein binding. The RNA–protein recognition often needs conformational
changes in both RNA and protein, which is facilitated by the structural flexibility of
disordered residues.[65] Knowledge related to coronavirus proteomes from
the viewpoint of the intrinsic disorder propensities can provide a useful outlook for the
viral pathogenicity.

Disordered regions in viral proteins are generally associated with viral infectivity
and pathogenicity; thus, it is of interest to evaluate the intrinsic disorder within
the SARS-CoV-2 proteome. As a matter of fact, the SARS-CoV-2 proteome has been found to
exhibit significant levels of structural order: except for the nucleocapsid (N) protein among
the structural proteins (plus Nsp8 and ORF6 among the nonstructural proteins[6]), the majority of SARS-CoV-2 proteins are highly ordered proteins containing
very few intrinsically disordered regions.[6] Notably,
however, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the existing ones
contribute significantly to the functioning and virulence of the virus and are thus
promising drug targets for antiviral drug discovery.[6,66,67]
Nucleocapsid (N) Protein Exhibits Large IDRs
The RNA-binding nucleocapsid (N) protein stabilizes the genomic RNA inside the virus
particle and regulates the viral genome transcription, replication, and packaging. It is
composed of two structural domains: N-terminal RNA-binding domain (NTD, residues
45–181) and C-terminal dimerization domain (CTD, residues 248–365), bordered
by three large intrinsically disordered regions (Figure 4). The primary role of the RNA-binding N protein in the coronavirus life cycle
is to assemble with the genomic RNA into the viral RNA–protein complex and to
mediate its packaging into 80–90 nm virions.[71]
Figure 4
(A) Structure of the nucleocapsid (N) protein comprising three intrinsically
disordered regions, IDR1, IDR2, and IDR3. (B) N protein disorder prediction using the
protein disorder prediction system (PrDOS)[28,29] (similar disorder predictions were obtained with
other disorder prediction tools).
The N protein is highly disordered: its average percentage of predicted intrinsic
disorder according to the various disorder prediction tools[27] has been
estimated as 65%.[6] The IDRs in SARS-CoV-2 nucleocapsid protein comprise
three segments: residues 1–44 (IDR1), 182–247 (IDR2), and
366–422 (IDR3) (Figure 4A).[66] Analysis of the N protein disorder propensity using the protein disorder
prediction system (PrDOS)[28,29] indicates a significant degree of disorder in these three regions
(Figure 4B). The highly flexible intrinsically
disordered linker region, which connects the NTD and CTD, is rich in serine and arginine
residues, both of which are highly disorder-promoting.[24] The middle IDR
is known to be responsible for its RNA-binding activity.[68] An IDR that
borders the CTD (C-terminal tail peptide) plays a significant role in dimer–dimer
association in human coronaviruses, whereas nucleotide binding is largely mediated by the
central NTD core.[69] In fact, molecular recognition domains have been
identified in all three IDRs of the N protein. The RNA–protein recognition requires
conformational changes in both RNA and protein, which is enabled by the structural
flexibility of disordered regions. These findings support the key function of the N
protein in a ribonucleoprotein core formation via interaction with the genomic RNA, which
is a crucial step for RNA encapsulation and virus life cycle. Thus, the IDRs appear as
appropriate targets to inhibit the interaction of N protein with viral genomic RNA.[70]

Many IDPs are known to undergo liquid–liquid phase separation into dense
intracellular organelles with a multitude of important cellular functions.[71] Liquid–liquid phase separation is a process by which biomolecules,
such as proteins and/or nucleic acids, condense into a dense phase that often resembles
liquid droplets.[72,73]
It typically occurs within the cell, forming compartments termed membraneless
organelles.[73] Eukaryotes contain numerous such membraneless
compartments, and they are highly dynamic structures that can rapidly form on demand, thus
concentrating proteins and biochemical reactions at distinct locations as needed.
Virtually all such membraneless compartments contain a large proportion of IDPs. Their
lack of defined secondary or tertiary structure and high conformational flexibility match
the dynamic behavior of the membraneless compartments. One type of such membraneless
aggregates—the stress granules—are composed of proteins and RNAs and form
when the cell is under stress and the translation initiation is
limited.[74−77] Here again, it would be worthwhile to examine whether and
how stress enhances or diminishes the degree of intrinsic disorder of the IDPs involved in
the stress granules.

It has been shown recently that the N protein of the SARS-CoV-2 virus undergoes
liquid–liquid phase separation into stress granules through its N-terminal
intrinsically disordered region IDR1.[58,78,79] The condensation of the N protein
into stress granules is possibly a way for SARS-CoV-2 to inhibit host cell innate
immunity. A model has been developed in which phase separation of the SARS-CoV-2 N protein
contributes both to suppression of the host immune response and to packaging genomic RNA
during virion assembly.[78] Disruption of the N protein
liquid–liquid phase separation process holds promise for antiviral intervention and
offers new targets and strategies for the development of drugs to combat COVID-19.

Recent NMR and crystallographic studies have confirmed the disordered nature of the N protein
domains, including the NTD[80,81] and CTD[82,83] domains, whereas fluorescence recovery after photobleaching (FRAP)
experiments confirmed the N protein liquid–liquid phase separation,[79] which concentrates components of the SARS-CoV-2 replication machinery,
thus providing a means for enhanced viral transcription and replication.

A key to holding back the COVID-19 pandemic is to understand how SARS-CoV-2 manages to
overcome host antiviral defense mechanisms. Stress granules, which are assembled
throughout viral infection and function to sequester host and viral mRNAs and proteins,
are part of the antiviral responses. It has been shown that the SARS-CoV-2 nucleocapsid
(N) protein, an RNA binding protein essential for viral production, interacts with
Ras-GTPase-activating protein SH3-domain-binding protein (G3BP) and disrupts stress
granule assembly, both of which require the intrinsically disordered region 1 (IDR1) in
the N protein. The N protein segregates into stress granules through liquid–liquid
phase separation with G3BP and obstructs the interaction of G3BP with other
stress-granule-related proteins. Furthermore, the N protein IDR domains important for
phase separation with G3BP and stress granule disassembly are found to be essential for
SARS-CoV-2 viral production. It has been suggested that N-protein-mediated stress granule
disassembly is crucial for SARS-CoV-2 production.[84] It is thus implied
that inhibition of the RNA-induced phase separation of the N protein provides a viable new
strategy for the design of COVID-19 therapeutics.[79]
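The domain layout of the N protein described in this section can be written down as a small lookup table, which makes it easy to check whether a residue of interest falls in a structured domain or in one of the three IDRs. The sketch below uses the residue boundaries exactly as quoted in the text; the names `N_PROTEIN_REGIONS`, `region_of`, and `is_disordered` are hypothetical helpers introduced here for illustration.

```python
# Region boundaries (1-based, inclusive) as quoted in the text.
N_PROTEIN_REGIONS = [
    ("IDR1", 1, 44),
    ("NTD", 45, 181),    # N-terminal RNA-binding domain
    ("IDR2", 182, 247),  # serine/arginine-rich linker
    ("CTD", 248, 365),   # C-terminal dimerization domain
    ("IDR3", 366, 422),
]


def region_of(residue):
    """Return the name of the region containing a 1-based residue number."""
    for name, start, end in N_PROTEIN_REGIONS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside the annotated regions")


def is_disordered(residue):
    """True if the residue lies in one of the annotated IDRs."""
    return region_of(residue).startswith("IDR")


if __name__ == "__main__":
    for position in (10, 100, 200, 300, 400):
        print(position, region_of(position), is_disordered(position))
```

A table like this only records the annotated boundaries; it does not replace a per-residue disorder prediction such as the PrDOS profile mentioned above.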
Spike (S) Protein Exhibits Small but Critical IDRs
The spike (S) protein is a large multidomain 1273 amino acid fusion protein creating the
exterior of the CoV particles.[85] It protrudes from the virion and
ornaments the viral surface like a crown. It is anchored in the viral membrane and
mediates fusion of the viral membrane with the host cell membrane.[86]
The spike protein is critical for the entry of the coronaviruses into the host, so it is
an attractive antiviral target. It contains two distinct domains, S1 and S2, each of which
consists of a number of functional subunits (Figure 5A).
Figure 5
(A) Structure of the SARS-CoV-2 spike protein comprising S1 and S2 subunits;[102] domain arrangement of spike protein: SS, signal sequence; NTD,
N-terminal transactivation domain; RBD, receptor-binding domain; SD, subdomain; FP,
fusion peptides; HR1, heptad repeat 1; CH, central helix; CD, connector domain; HR2,
heptad repeat 2; S1/S2 and S2′, protease cleavage sites; TM, transmembrane
domain; CT, cytoplasmic tail.[70] (B) Spike protein disorder
prediction using the PONDR-VLXT[97,98] (similar disorder predictions were obtained with other disorder
prediction tools).
The spike protein is cleaved by host proteases into the S1 and S2 subunits, which are
responsible for receptor recognition and membrane fusion, respectively: subunit S1
activates viral infection by binding to host cell receptors, and S2 mediates the fusion of
the virion and cellular membranes, thus promoting viral entry into the host
cells.[62,86] The
spike protein binds to a specific surface receptor angiotensin converting enzyme 2 (ACE2)
on the host cell plasma membrane via its N-terminal receptor-binding domain (RBD).[87] The S1 subunit consists of NTD and RBD; the S2 subunit contains a fusion
peptide (FP), a heptad repeat 1 (HR1), a central helix (CH), a connector domain (CD), a
heptad repeat 2 (HR2), a transmembrane domain (TM), and a cytoplasmic tail (CT) (Figure 5A).[88] Recent experimental
studies confirmed that human ACE2 (hACE2) mediates S-protein-dependent SARS-CoV-2 entry
into cells, establishing it as a functional entry receptor for this newly emerged
coronavirus.[54]

The site at the border between the S1 and S2 subunits, the S1/S2 protease cleavage
site, is where the spike protein is cleaved into S1 and S2 subunits during entry into
the infected cells. The attachment of the spike protein to the host cell is activated by
the host cell enzymes trypsin, cathepsin L, furin, and TMPRSS2. Comparison of the sequence
of SARS-CoV-2 against other coronaviruses indicates that a unique amino acid sequence
pattern RRAR (Arg-Arg-Ala-Arg) is present at the S1/S2 junction of the spike protein,
which is cleaved by the furin enzyme but is absent in other coronaviruses of the same
clade, including SARS-CoV-1.[70,89] Moreover, the structure reported for SARS-CoV-2 spike protein (PDB
code: 6VSB) shows that the S1/S2
junction is in a disordered, solvent-exposed loop,[90] which is
hypothesized to be responsible for the effective viral transmission.[91,92]

Another cleavage event at an additional site inside the S2 subunit, the S2′ site
(Figure 5A), brings the viral and cellular
membranes together, eventually creating a fusion pore that lets the viral genome reach the
cell cytoplasm. ACE2 binding by the virus exposes the S2′ site. The S2′ site
cleavage by transmembrane protease serine 2 (TMPRSS2) results in the release of the fusion
peptide.[86]

In the SARS-CoV-2 S protein, the first S1/S2 cleavage site is at residue
R685[89,93] (or R684
from a different study[64]), whereas the second cleavage site generating
the S2′ subunit is located at residue R816[94] (or
R815[64,95]),
bordering the FP located at residues 816–855.[96] An intrinsic
disorder profile generated for the spike protein of the SARS-CoV-2 virus by the disorder
predictor PONDR-VLXT[97,98] (Figure 5B) indicates that
both the S1/S2 cleavage site and the FP are located within IDRs. Considering the notion
that proteolytic digestion is considerably faster in unstructured relative to
structured protein regions,[45,99] this structural specificity of the SARS-CoV-2 spike protein might be
of high functional importance (Figure 5B).

The S1 subunit of the S protein contains an RBD and an amino-terminal (N-terminal) domain.
The RBD (residues 333–526[100]) has a β-sheet core, bordered
on either side by a short helix, and contains a receptor-binding motif (RBM, residues
438–508[101]), which makes extensive contact with the ACE2
receptor, thus accounting for the interaction with ACE2. The tyrosine-rich RBM, which is
stabilized by disulfide bonds, is also characterized by recognizable structural
flexibility:[101] it does not possess a regular secondary
structure except for two small β-sheets.[62] During SARS-CoV-2
infection, intrinsically disordered regions are detected at the interface of the
spike protein and the ACE2 receptor, providing a shape match in the complex. The key residues
of the spike protein have strong binding affinity for ACE2, which is a likely reason
for the higher transmission rate of SARS-CoV-2.[70]
Thus, receptor binding and membrane fusion are the initial and important steps in
coronavirus infection and serve as primary targets for inhibiting viral entry. Both
involve regions of substantial intrinsic disorder, supposedly responsible for the viral
cycle and pathogenicity.[70] Altogether, even though the spike protein of
SARS-CoV-2 exhibits significant levels of structural order, its predicted percentage of
intrinsic disorder is still higher than that of the spike protein of SARS-CoV (cf. 1.41 for
SARS-CoV-2 vs 1.12 for SARS-CoV[6]), which may correlate with the higher
infectivity and pathogenicity of SARS-CoV-2. Moreover, it was recently shown that the
SARS-CoV-2 S glycoprotein exhibits a furin cleavage site at the borderline between the
S1/S2 subunits, which sets this virus apart from other SARS-CoVs, supposedly enhancing its
transmissibility due to the ubiquitous distribution of furin-like proteases in host
cells.[54]
A recent study advanced anti-COVID-19 drug discovery by
specifically targeting disordered protein regions in the virus
proteome.[67] It was demonstrated how these IDRs can be targeted
through molecular docking. As a result, 11 new drug candidates were identified that
exhibited high binding and activity scores, as well as good antiviral properties.[67]
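The statement above that the S1/S2 cleavage site and the FP lie within IDRs amounts to checking residue positions against a per-residue disorder profile. The following Python sketch illustrates such a check under stated assumptions: the input is a list of per-residue scores between 0 and 1, of the kind a predictor such as PONDR-VLXT produces; residues scoring at or above 0.5 are treated as disordered; and a minimum segment length of 5 is chosen arbitrarily. The function names and the toy scores are hypothetical and are not actual predictions for the spike protein.

```python
from typing import List, Tuple


def find_idrs(scores: List[float], cutoff: float = 0.5,
              min_len: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of residues scoring >= cutoff.

    Both cutoff and min_len are assumptions of this sketch, not values
    taken from the text.
    """
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= cutoff:
            if start is None:
                start = i  # a disordered run begins at this residue
        elif start is not None:
            # the run ended at residue i - 1; keep it if it is long enough
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # close a run that extends to the C-terminus
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


def site_in_idr(position: int, regions: List[Tuple[int, int]]) -> bool:
    """True if a 1-based residue position lies inside any predicted IDR."""
    return any(first <= position <= last for first, last in regions)


# Toy profile (hypothetical scores, not real predictor output):
# residues 1-10 ordered, 11-18 disordered, 19-30 ordered.
toy_scores = [0.1] * 10 + [0.8] * 8 + [0.2] * 12
toy_regions = find_idrs(toy_scores)
```

With a real profile for the S protein, a call such as `site_in_idr(685, regions)` would express the cleavage-site check described in the text.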
M Protein Is Highly Ordered, Which May Contribute to the Greater Virulence of SARS-CoV-2
The membrane (M) protein, one of the four structural proteins in the coronavirus virion,
is a major transmembrane protein that is found in large numbers in the virion. It is a
part of the protective proteinaceous layer responsible for the virus survival upon its
transmission between hosts. Estimates of intrinsic disorder indicate that
SARS-CoV-2 has one of the hardest protective outer shells among coronaviruses (Figure 6): the
percentage of intrinsic disorder
of its M protein is only 6% (cf. 8% for SARS-CoV, 9% for MERS-CoV, 11% for HCoV-NL63,
etc.).[103] Structural studies have confirmed the highly ordered
organization of the M protein.[104] Thus, the SARS-CoV-2 virus might be
anticipated to be highly resistant to antimicrobial substances in saliva and/or other
bodily fluids, as well as in the environment outside of the body. It is therefore
expected to remain active for a longer time, which may account for the greater contagiousness
of SARS-CoV-2. Indeed, a correlation has been confirmed between the virulence of various
viruses and the percentage of intrinsic disorder of their M protein, with the less
disordered viruses being more contagious.[105]
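The percentage of intrinsic disorder quoted above is, in essence, the share of residues that a predictor scores as disordered. The Python sketch below shows that arithmetic and then orders the four M protein values given in the text. The 0.5 cutoff, the function name, and the toy score list are assumptions made for illustration; only the percentages in the dictionary come from the paragraph above.

```python
from typing import Dict, List


def percent_disorder(scores: List[float], cutoff: float = 0.5) -> float:
    """Percentage of residues whose disorder score is >= cutoff (assumed threshold)."""
    if not scores:
        raise ValueError("empty score list")
    disordered = sum(1 for score in scores if score >= cutoff)
    return 100.0 * disordered / len(scores)


# Toy profile: 3 of 50 residues predicted disordered (hypothetical scores).
toy_scores = [0.9] * 3 + [0.1] * 47
toy_pid = percent_disorder(toy_scores)

# M protein disorder percentages as quoted in the text above.
m_protein_pid: Dict[str, float] = {
    "SARS-CoV-2": 6.0,
    "SARS-CoV": 8.0,
    "MERS-CoV": 9.0,
    "HCoV-NL63": 11.0,
}

# Order the viruses from the most ordered (hardest) outer shell to the least.
ranked = sorted(m_protein_pid, key=m_protein_pid.get)
```

Sorting by this single number only reproduces the ordering stated in the text; it is not a substitute for the published shell-disorder models.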
Figure 6
M protein (A) structure and (B) disorder prediction using the protein disorder
prediction system (PrDOS)[28,29]
A model that estimates the percentage of intrinsic disorder of viral proteins
demonstrates that the degree of shell disorder in coronaviruses correlates with the levels
of their fecal–oral and respiratory transmission.[106] The N
protein is also important for the model, as it exhibits the greater disorder in the inner
shell, which has been related to the mode of infection and virulence in other viruses.[103]
In fact, it seems reasonable that the performance of the virus depends on a delicate
balance between the rigidity of its protective outer shell on one side, with the M protein
as a major constituent, responsible for the virus survival, and the flexibility and
adaptability of the inner core on the other side, especially the N nucleoprotein, upon RNA
encapsulation throughout the virus life cycle, promoting binding to host proteins. Thus,
the M protein orderliness and the N protein disorder seem to contribute to the high
virulence and infectivity of SARS-CoV-2. It is also worth noting that, although the spike
protein of SARS-CoV-2 exhibits significant levels of structural order, its predicted
percentage of intrinsic disorder is still higher than that of the spike protein of
SARS-CoV,[6] which may also contribute to the higher infectivity and
pathogenicity of SARS-CoV-2.
In the group of the nonstructural SARS-CoV-2 proteins, several moderately disordered
proteins (Nsp8, ORF6, and ORF9b) have been identified as well.[6] The C-terminal region
of the nonstructural protein Nsp2 has been predicted
to be disordered.[107] Because it interacts with host proteins that
regulate translation initiation and endosome vesicle sorting, compounds that block these
interactions could be valuable candidates for drug development. Although the other
proteins exhibit lower disorder content, nearly all contain at least one IDR. Moreover,
intrinsic disorder has been reported at the cleavage sites of replicase protein 1ab of
SARS-CoV-2.[6]
Generally, there is a large variation in the distribution of disordered residue fractions
in the viral proteomes, with certain viral species highly enriched in intrinsic
disorder.[40] Examples of viral proteins with a high degree of
intrinsic disorder (>80%) according to the DisProt database[108] are
shown in Table 1. Indeed, the high content of
intrinsic disorder predicted in viruses agrees with a recent study showing that viral
proteins are significantly enriched in polar residues and depleted in hydrophobic
residues compared with those of archaea and bacteria,[109] which
correlates with their disorderedness.[110,111] The high intrinsic disorder in viral proteins
supposedly has important functional implications, helping the viruses to hijack
various pathways of the host cells and/or to adapt to their hostile
habitats.[110,111]
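The compositional argument above (enrichment in polar residues, depletion in hydrophobic ones) can be made concrete by counting residue classes in a sequence. The Python sketch below does this for a one-letter amino acid string. The grouping of residues into "polar" and "hydrophobic" sets is one common convention chosen here as an assumption; the cited study may use a different classification, and the toy sequence is invented for illustration.

```python
from typing import Dict

# Assumed residue classes (one of several conventions in use).
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("STNQDEKRHY")  # includes charged residues in this sketch


def residue_fractions(sequence: str) -> Dict[str, float]:
    """Fractions of polar and hydrophobic residues in a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return {
        "polar": sum(aa in POLAR for aa in seq) / n,
        "hydrophobic": sum(aa in HYDROPHOBIC for aa in seq) / n,
    }


# Invented toy sequence: six polar and two hydrophobic residues.
toy_fractions = residue_fractions("SSKKEEAV")
```

Residues outside both sets (glycine and proline here) count toward the length but toward neither fraction, so the two values need not sum to 1.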
The roles of IDPs/IDRs in certain viral proteins during viral infection have been explored
but are not yet completely understood. Examples include the membrane-binding protein λN of
bacteriophage, the hordeivirus protein TGBp1, influenza virus nonstructural protein 2, the basic
protein δAg of hepatitis B virus, and human adenovirus type 5.[1]
Table 1
Examples of Viral Proteins with >80% Intrinsic Disorder According to the
DisProt Database[108]
protein name                      organism
antitermination protein N         Escherichia phage lambda
protein E7                        human papillomavirus type 16
antitoxin phd                     Escherichia phage P1
phosphoprotein                    human respiratory syncytial virus A
nuclear export protein            influenza A virus
latent membrane protein 2A        Epstein–Barr virus (strain GD1)
protein Tat                       human immunodeficiency virus 1
early E1A protein                 human adenovirus A serotype 2
transcriptional repressor arc     Salmonella phage P22
Outline of the Prospective Drug Target Sites
As discussed above, even though IDPs/IDRs are not common in the SARS-CoV-2 proteome, the
existing ones contribute significantly to the functioning and virulence of the virus and are
thus promising drug targets for antiviral drug discovery.[6,66,67] Therefore, it might be
productive to focus on targeting those intrinsically disordered protein regions that lack a
stable structure instead of following the common drug discovery approach based on the
conventional protein structure–function paradigm, in which drugs are designed to bind to
fixed 3D structures. Moreover, such an approach has already proven valuable in designing new
SARS-CoV-2 drug candidates targeting IDRs.[67]
The nucleocapsid (N) protein plays a key role in ribonucleoprotein core
formation via interaction with the genomic RNA, a critical step in the virus life
cycle. Thus, the three IDRs in the N protein appear to be appropriate targets for inhibiting
its interaction with the viral genomic RNA.[70]
The N protein segregates into stress granules through liquid–liquid phase
separation with Ras-GTPase-activating protein SH3-domain-binding protein (G3BP) and
obstructs the interaction of G3BP with other stress-granule-related proteins. The
process requires active involvement of the intrinsically disordered region 1 (IDR1) of
the N protein. Disruption of the N protein liquid–liquid phase separation process
holds promise for antiviral intervention and offers new targets and strategies for the
development of drugs to combat COVID-19.
Because of the vital function of the spike (S) protein for the entry of coronaviruses
into the host, it is a reasonable target for inhibition by neutralizing antibodies,
and characterization of the spike protein structure provides strategic information for
rational vaccine design. Receptor binding and membrane fusion are the initial
and important steps in coronavirus infection; both involve specific
intrinsically disordered regions of the protein as key players and serve as primary
targets for inhibiting viral entry. Neutralizing antibodies targeting the
SARS-CoV-2 S trimer have conferred protection from viral infection in animal models
and are currently being assessed as therapeutics in humans.[62] Such
antibodies include human monoclonal antibodies isolated from convalescent COVID-19
donors and single-domain nanobodies. Although some neutralizing antibodies target the
NTD or the S2 domain, most of them bind to the RBD, producing steric hindrance and
thus blocking ACE2 attachment.[84] Recent biophysical modeling
suggested that the SARS-CoV-2 spike protein may provide an adaptable allosteric
mechanism utilized to fine-tune the response to antibody binding, which may be useful
for therapeutic intervention by targeting specific hotspots of allosteric
interactions.[112]
Research consortia have been established with the goal of bringing together scientists
with different expertise to advance our understanding of COVID-19 and speed up drug
discovery.[113,114] A COVID-19 NMR research consortium was set up in 2020, aiming to
support the search for antiviral drugs using NMR-based screening and establishing
protocols for large-scale production of all druggable SARS-CoV-2 proteins and RNAs for
rational drug design and fast mapping of compound binding sites.[63]
Conclusions and Outlook
The notion of the abundance of IDPs and IDRs in proteomes is altering protein science. The
concept of protein intrinsic disorder provides answers for many scientific problems that
cannot be easily explained based on the classic structure–function paradigm.
Functions of IDPs/IDRs can be controlled by multiple processes, such as post-translational
modifications, interaction with various other molecules, etc. The multilevel structural and
functional complexity of IDPs/IDRs makes them particularly sensitive to the environment.
Furthermore, intrinsic disorder may be related to the emergent behavior of several systems
characterized by the presence of specific patterns and can be used to explain the
biochemical reaction–diffusion processes. Understanding how structured and disordered
proteins work in concert is crucial for understanding protein functions.
The emergence of novel viruses and associated epidemics around the globe is currently a major
concern. Knowledge of the structures and functions of the viral proteins is of utmost
significance for the identification of novel therapeutic targets for the prevention and
treatment of viral diseases.[63] In this Perspective, we summarized the information
available on the SARS-CoV-2 proteome with regard to the occurrence of intrinsic disorder in
SARS-CoV-2 proteins. It has been recognized that the SARS-CoV-2 proteome exhibits
substantial levels of structural order. Within the SARS-CoV-2 proteome, only the nucleocapsid
(N) protein is highly disordered, with an average percentage of predicted intrinsic disorder
of 65%. The spike (S) protein of SARS-CoV-2 exhibits significant levels of structural order,
yet its predicted percentage of intrinsic disorder is still higher than that of the spike
protein of SARS-CoV. Furthermore, although the other structural proteins of SARS-CoV-2 apart
from the N protein are characterized by a low degree of disorder, their existing IDRs
contribute significantly to the functioning and virulence of the virus and are thus
promising drug targets for antiviral drug design. As these IDRs typically undergo structural
transitions upon binding to their physiological partners, such knowledge provides an
important basis for better understanding the activity of these proteins and their
interactions with host proteins under different physiological conditions. The association of
viral and host proteins needs to be explored to find means suitable for limiting
replication, maturation, and ultimately pathogenesis of the viruses. Although structural
biology techniques can be utilized in drug development, the practice of rational drug design
has so far essentially ignored the presence of intrinsic disorder in target proteins.
Understanding the structure of these regions in the SARS-CoV-2 proteome would be a valuable
asset for high-throughput screening in drug development for the disease.