Samantha A Yost¹, Joseph Marcotrigiano. ¹ Center for Advanced Biotechnology and Medicine, Department of Chemistry and Chemical Biology, Rutgers University, 679 Hoes Lane West, Piscataway, NJ 08854, United States.
Abstract
Many viruses use a replication strategy that involves translating a large polyprotein, which is then cleaved by viral and/or cellular proteases. Several of these viruses severely impact human health around the globe, including HIV, HCV, Dengue virus, and West Nile virus. This mode of genome organization benefits the virus in several ways, such as condensing genetic material and allowing temporal and spatial regulation of protein activity according to the cleavage state of the polyprotein. Studying polyprotein precursors is necessary to fully understand viral infection and to identify possible new drug targets; however, few atomic structures are currently available. Presented here are four recently determined structures of polyprotein precursors from viruses with a positive-sense RNA genome.
The regulatory mechanisms of viral gene expression vary throughout the virus world. One widely used mechanism, employed by evolutionarily divergent RNA and DNA viruses alike, organizes multiple genes into a single open reading frame. Most DNA viruses use extensive alternative splicing to express viral proteins as needed, although there are exceptions (e.g., African swine fever virus [1]). Almost all retroviruses and RNA viruses instead translate an open reading frame as a large precursor polyprotein that is then cleaved by viral or cellular proteases in a highly regulated manner.
Viruses that regulate gene expression by polyprotein processing include several major human pathogens, such as HIV, poliovirus, rhinovirus, Dengue virus, hepatitis C virus, West Nile virus, Chikungunya virus, and SARS coronavirus. Encoding and translating large polyprotein precursors benefits the viral lifecycle in several ways. First, polyproteins allow a more compact genome by eliminating genetic elements, such as promoters or enhancers, that would otherwise be needed to express each protein individually. Second, differential use of cleavage sites allows protein activity to be regulated over the course of infection. Finally, some proteins perform different functions in their precursor and mature forms.
Given how widespread polyprotein processing is across viral families, surprisingly few structures of uncleaved polyprotein intermediates are available. Structures of viral proteins both before and after cleavage are essential for understanding the regulation, function, and organization of the viral genome. Here we review four recent atomic structures of polyprotein precursors.
Precursors from viral genome replication proteins
Alphavirus P23
Alphaviruses are enveloped viruses of the Togaviridae family whose genome consists of a positive-sense RNA of ~9–11 kb with a 5′ cap structure and a 3′ polyadenosine tail (Figure 1a). The genome contains two cistrons: the first encodes the viral replication machinery, or nonstructural proteins (nsPs), and the second encodes the structural proteins that form the virion particle. The alphavirus replication machinery is composed of four nonstructural proteins (nsP1 to nsP4), which are expressed as one of two polyproteins, P123 or P1234; P1234 is produced by read-through of an opal codon at the end of nsP3. The nonstructural precursor polyproteins are cleaved in a highly regulated manner by a protease located in the C-terminal region of nsP2 [2, 3]. Shortly after translation of P1234, the P3/4 junction is cleaved in either cis or trans, followed by the P1/2 junction, which is cleaved in cis only [3, 4]. The cleavage intermediates P123 + nsP4 and nsP1 + P23 + nsP4 preferentially use the genomic RNA as a template to synthesize a negative-sense intermediate RNA [5, 6]. The final cleavage, at the P2/3 junction, yields fully mature nsPs and switches the template to the negative-sense RNA, which is used to synthesize positive-sense genomic and subgenomic RNAs.
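The cleavage order and template preferences described above can be restated as a short, purely illustrative Python sketch. It encodes only what the text states: P3/4 is cleaved first, then P1/2, then P2/3; intermediates that still carry an intact P2/3 junction favor the genomic RNA, and P2/3 cleavage switches synthesis to the negative-sense template. The tuple representation and the helper names (`cleave`, `preferred_template`, `process`) are hypothetical, and the sketch does not model cis versus trans cleavage, kinetics, or the P123 polyprotein made without opal read-through.

```python
# Illustrative sketch only: polyproteins are tuples of nsP numbers, e.g. (1, 2, 3) = P123.

def name(parts):
    """(1, 2, 3) -> 'P123'; a single mature protein (2,) -> 'nsP2'."""
    if len(parts) == 1:
        return f"nsP{parts[0]}"
    return "P" + "".join(str(p) for p in parts)

def cleave(species, junction):
    """Cut whichever polyprotein spans `junction` = (n, n + 1); leave the rest unchanged."""
    left, right = junction
    products = []
    for parts in species:
        if left in parts and right in parts:
            cut = parts.index(right)
            products.extend([parts[:cut], parts[cut:]])
        else:
            products.append(parts)
    return products

def preferred_template(species):
    """Template preference as stated in the text for the cleavage intermediates."""
    p23_intact = any(2 in parts and 3 in parts for parts in species)
    return "genomic (+) RNA" if p23_intact else "negative-sense (-) RNA"

# Order from the text: P3/4 (cis or trans), then P1/2 (cis only), then P2/3 (trans).
CLEAVAGE_ORDER = [(3, 4), (1, 2), (2, 3)]

def process(start=((1, 2, 3, 4),)):
    """Apply the cleavages in order, recording products and template after each step."""
    species = [tuple(parts) for parts in start]
    history = []
    for junction in CLEAVAGE_ORDER:
        species = cleave(species, junction)
        history.append((junction, [name(p) for p in species], preferred_template(species)))
    return history

for (left, right), products, template in process():
    print(f"P{left}/{right} cleaved: {' + '.join(products)} -> {template}")
```

Running the script prints one line per cleavage event, ending with the four mature nsPs and the switch to the negative-sense template.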
Figure 1
Alphavirus genome organization and P23 structure. (a) RNA genome organization of Sindbis virus with 5′ cap and 3′ poly(A) tail. Polyprotein cleavage sites are denoted with an asterisk (nsP2 protease), solid black arrow (Sindbis virus capsid autoprotease), solid black arrowheads (cellular signalase), and hollow arrow (cellular furin). (b) P23 structure (PDB ID: 4GUA) highlighting the nsP2 protease (blue) and methyltransferase-like (cyan) domains and the nsP3 macro (yellow) and zinc-binding (red) domains. The protease active site is denoted with a green asterisk, the zinc ion with a gray sphere, and the P2/3 cleavage site with an arrow. (c) nsP3 macro and zinc-binding domains of the P23 structure shown in ribbon format, colored as in panel b. The extended linker between the two nsP3 domains creates a ring-like structure that fits around nsP2.
The structure of an uncleaved Sindbis virus P23 precursor, spanning the nsP2 protease domain through the central zinc-binding domain (ZBD) of nsP3 (P23pro-zbd), was recently determined [7]. In P23pro-zbd, the P2/3 cleavage site is solvent-exposed but inaccessible, lying at the base of a cleft 11–13 Å wide (Figure 1b) [7]. The nsP2 protease active site is 40 Å from the P2/3 cleavage site, which supports a trans-only cleavage mechanism [4, 7••, 8, 9]. Attempts to model the nsP2 protease onto the P2/3 cleavage site produced steric clashes, suggesting that access to the cleavage site is regulated by a mechanism that remains unknown. P23 buries an extensive surface area of 3000 Å² because nsP3 forms a ring-shaped structure that encircles nsP2 (Figure 1c). Several previously described nsP2 mutations that confer a noncytopathic phenotype [10, 11, 12, 13, 14] map to the nsP2 surface that contacts nsP3, implicating this interaction in viral pathogenesis.
During the alphavirus lifecycle, nsP2 localizes to the nucleus; however, the mechanism by which nsP2 dissociates from nsP3 after cleavage is unknown. A potential RNA-binding surface that extends across both nsP2 and nsP3 has been proposed. It has been hypothesized that cleavage alters this surface, which could account for the RNA template switch that follows P23 cleavage.
Poliovirus 3CD
Poliovirus is a nonenveloped virus of the Picornaviridae family with a positive-sense RNA genome of ~7–9 kb. Poliovirus RNA lacks a 5′ cap structure and instead uses an internal ribosome entry site (IRES) for translation initiation [15, 16]. The genome contains a single open reading frame encoding both structural and nonstructural genes as a large polyprotein [17, 18, 19] (Figure 2a). For simplicity, the proteins are grouped into the viral capsid proteins (P1) and the proteins involved in viral genome replication and protein processing (P2 and P3). The polyprotein is cleaved by the virus-encoded proteases 2A and 3C and by the 3CD precursor. The 3CD precursor consists of the 3C protease and the viral RNA polymerase (3D) responsible for genome replication. The activities of the 3C protease and the 3D polymerase depend on the cleavage state: 3CD cleaves the P1 precursor more efficiently than mature 3C alone [20, 21], whereas 3D polymerase activity is inhibited in the precursor [22]. Furthermore, 3CD promotes RNA synthesis by redistributing cellular Arf proteins involved in membrane trafficking, an activity not triggered by the cleaved forms [23, 24•]. These activities highlight the importance of polyprotein cleavage as a mechanism of gene regulation.
Figure 2
Organization of the poliovirus genome and 3CD structure. (a) RNA genome organization of poliovirus with the 5′ covalently linked viral protein VPg and 3′ poly(A) tail. Polyprotein cleavage sites are denoted by an asterisk (autoprotease), solid black arrow (3C or 3CD protease), and hollow arrow (2A protease). (b) Ribbon representation of the 3CD structure (PDB ID: 2IJD), with a green asterisk indicating the 3C protease active site and an arrow showing the cleavage site.
The atomic structure of 3CD resembles beads on a string, with 3C and 3D connected by a flexible, solvent-exposed linker that contains the cleavage site [25] (Figure 2b). Overall, the structures of isolated 3C and 3D closely resemble their counterparts in the 3CD precursor. In sharp contrast to alphavirus P23, 3C and 3D make no intramolecular contacts. The relative positions of the protease active site and the cleavage site in 3CD support a trans cleavage mechanism [7••, 25••, 26]. In the 3CD structure the N-terminal residues of 3D are extended, whereas in the structure of cleaved 3D they fold inward and insert into a binding pocket, where they form several hydrogen bonds. This rearrangement of the 3D N-terminus after cleavage has been proposed as the mechanism that activates 3D polymerase activity [25].
Precursors from viral structural proteins
Flavivirus prM-E
Flaviviruses are enveloped viruses with a positive-sense RNA genome of ~11 kb. The flavivirus structural and nonstructural genes lie within the same open reading frame and are translated as a large polyprotein (Figure 3a). Both virus- and host-encoded proteases cleave the polyprotein at specific sites. The surface of the virion is composed of the envelope (E) and membrane (M) proteins [27]. E interacts with cellular receptors and contains a loop responsible for membrane fusion [27, 28]. M is expressed as a precursor protein, prM, which prevents premature fusion by E; during egress, prM is cleaved by furin to release the pr fragment, yielding a mature particle. In addition to its role in virion morphogenesis, the prM precursor may assist in proper folding of the E protein [29, 30].
Figure 3
Organization of the flavivirus genome and structure of prM-E. (a) Genome organization of flaviviruses, shown with a 5′ cap and 3′ OH. Polyprotein cleavage sites are denoted by an asterisk (unknown protease), solid black arrow (NS2B-NS3 protease), solid black arrowhead (cellular furin), and hollow arrow (cellular signalase). (b) Structure of prM-E (PDB ID: 3C5X) highlighting pr (red) and E domains DI (blue), DII (green), and DIII (yellow). The fusion loop of the E protein is shown in black. The M protein and linker region, which are missing from the structure, are represented as a dotted line.
The structure of a dengue virus prM-E polyprotein was published in 2008 (Figure 3b) [31]. The construct included prM residues 1 through 91, an eight-amino-acid linker in place of the transmembrane region, E residues 1 through 394, and a mutated furin cleavage site between pr and M. E contains three distinct domains, termed DI, DII, and DIII, with the fusion loop located at the tip of DII [32, 33, 34, 35, 36, 37]. The pr peptide adopts a novel β-barrel fold positioned at the distal end of DII, where it blocks the fusion loop, consistent with its role in preventing membrane fusion [31••, 38]. A similarly inaccessible fusion loop in immature virus has been described for the Chikungunya virus structural precursor polyprotein p62 (discussed by Vaney, Duquerroy, and Ray in the current issue of Current Opinion in Virology) [39]. Cryoelectron microscopy shows that immature flaviviruses at neutral pH have a ‘spiky’ appearance, whereas mature flaviviruses have a ‘smooth’ surface [40, 41, 42, 43]. These appearances depend on the oligomeric state of the immature prM-E or mature M and E proteins. The hinge angle between DI and DII of E alone is very similar to that of the prM-E heterodimer [31••, 37]. However, the angle between DI and DII differs by 23° between immature and mature virus, a difference that may determine the oligomeric state of E.
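The 23° figure is a difference between two inter-domain angles. As a rough sketch of that arithmetic, assume each domain is reduced to a single axis vector (for example, a principal axis fitted to its Cα atoms; this is an assumption, not necessarily how the cited work defined the hinge). The hinge angle is then the angle between the DI and DII axes, θ = arccos(u·v / (|u||v|)), and the change between states is the difference of the two angles. The axis vectors and helper names below are hypothetical, chosen only so that the difference comes out to 23°; they are not taken from the prM-E or mature E coordinates.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3D vectors, via the dot-product formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))

def hinge_difference(immature, mature):
    """Absolute change in DI/DII hinge angle between two conformations.

    Each argument is a hypothetical (DI axis, DII axis) pair for one state.
    """
    return abs(angle_between(*immature) - angle_between(*mature))

def in_plane_unit(deg):
    """Unit vector in the xy-plane at `deg` degrees (illustrative helper)."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r), 0.0)

# Hypothetical axes: DI along x in both states, DII at 150 and 127 degrees.
immature = (in_plane_unit(0), in_plane_unit(150))
mature = (in_plane_unit(0), in_plane_unit(127))

print(f"{hinge_difference(immature, mature):.1f}")  # 23.0
```

Comparing two scalar angles captures only the magnitude of the hinge change, not its direction; a fuller analysis would first superpose the DI domains of the two states.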
Fitting the X-ray structure into cryo-EM density shows that, at neutral pH, the immature prM-E arrangement would prevent furin cleavage of prM through steric hindrance [31]. During egress, immature virions are transported through the trans-Golgi network, where exposure to more acidic conditions causes a conformational change in prM-E that allows furin access to the cleavage site [44].
HIV Gag
Retrovirus genomes consist of two copies of positive-sense RNA, which are reverse-transcribed early during infection. The resulting DNA is inserted into the host genome and used to produce viral RNA and proteins as infection progresses [45]. Retroviruses vary in the number of genes they encode, but all have at least four open reading frames (gag, pro, pol, and env), which may or may not be in the same reading frame. HIV is a retrovirus whose genome encodes nine (sometimes 10) genes. The gag gene is located at the 5′ end of all retrovirus genomes and is translated as a polyprotein precursor containing three distinct domains: matrix (MA), capsid (CA), and nucleocapsid (NC) (Figure 4a) [46]. The Gag precursor associates with the inner surface of the cell's plasma membrane, where it participates in forming the immature virion and selectively packaging viral RNA [45, 46, 47]. Cleavage of the precursor causes mature CA proteins to organize into cone-like structures and ultimately leads to maturation of the virus [48].
Figure 4
HIV Gag polyprotein organization and structure. (a) Schematic representation of the domain organization of the HIV-1 Gag polyprotein. Cleavage sites for the viral protease (PR) are denoted with a solid black arrow. (b) N-terminal 283 residues of the Gag polyprotein (PDB ID: 1L6N), including MA (red) and the N-terminal portion of CA (blue). The post-cleavage form of CA (PDB ID: 1GWP) is shown in gray, with a box highlighting the N-terminal β-hairpin formed after proteolytic cleavage. The cleavage site is indicated with an arrow.
To discern differences between immature and mature virion structures, the NMR structure of the N-terminal portion of immature HIV-1 Gag, including MA and the N-terminal portion of the CA domain (CAN), was determined (Figure 4b) [48]. Comparison of CAN in the immature polyprotein with the mature, cleaved form shows that the N-terminus undergoes a disorder-to-order transition upon cleavage, forming a β-hairpin, whereas the immature and mature MA structures show no significant changes. Mutations of key residues in the CAN hairpin region have previously been shown to prevent proper capsid core formation, producing noninfectious virus [49, 50, 51]. Cleavage-triggered formation of the β-hairpin in CAN may therefore be responsible for proper assembly of the capsid core particle.
Conclusions
This review has presented four recent atomic structures of viral polyprotein precursors. Several underlying mechanisms of viral gene regulation can be gleaned from these structures. The first is control of cleavage-site accessibility, which regulates the activity, function, and localization of viral proteins at specific times after infection. Cleavage of these polyproteins is regulated in different ways. The alphavirus P2/3 cleavage site is solvent-exposed but not readily accessible to the protease, and the requirement for its cleavage is still unknown. Access to the poliovirus 3CD cleavage site has been postulated to depend on the intermolecular scaffolding thought to form between 3CD molecules [25]. Cleavage of flavivirus prM-E is regulated by the low-pH environment encountered late in viral egress, which prevents premature exposure of the fusion loop. Second, cleavage itself can cause conformational changes in the newly formed termini of the mature proteins, enabling new functions after cleavage. 3D polymerase activity is thought to be activated only after cleavage, when the N-terminus of the mature protein inserts into a hydrophobic pocket. The N-terminus of HIV CA forms a β-hairpin after cleavage, which is associated with maturation of the core particle. Further structures related to virus particle maturation and the involvement of precursor proteins not discussed here are available for Chikungunya virus p62 [39], poliovirus VP0 [52], influenza HA0 [53], and the human parainfluenza virus 3 fusion protein [54].
Viral polyproteins represent a largely unexplored structural aspect of the viral lifecycle, especially those involved in viral genome replication. Despite the prevalence of these proteins across virus families, very few structures of viral precursor polyproteins are currently available. Given the diversity of the four structures and the modes of cleavage-site regulation presented here, many more regulatory mechanisms are likely yet to be described.
Because their regulatory roles range from genome replication to virion maturation, viral polyproteins may prove valuable for the design of uniquely targeted antiviral drugs.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as: • of special interest; •• of outstanding interest.
I-Mei Yu, Wei Zhang, Heather A Holdaway, Long Li, Victor A Kostyuchenko, Paul R Chipman, Richard J Kuhn, Michael G Rossmann, Jue Chen. Science, 2008-03-28.
N Kitamura, B L Semler, P G Rothberg, G R Larsen, C J Adler, A J Dorner, E A Emini, R Hanecak, J J Lee, S van der Werf, C W Anderson, E Wimmer. Nature, 1981-06-18.
Rachel Van Duyne, Lillian S Kuo, Phuong Pham, Ken Fujii, Eric O Freed. Proc Natl Acad Sci U S A, 2019-04-11.