
Cryo-EM structure of the HIV-1 Pol polyprotein provides insights into virion maturation.

Jerry Joe E K Harrison^1,2,3, Dario Oliveira Passos^4, Jessica F Bruhn^4,5, Joseph D Bauman^1,2, Lynda Tuberty^1,2, Jeffrey J DeStefano^6,7, Francesc Xavier Ruiz^1,2, Dmitry Lyumkis^4,8, Eddy Arnold^1,2.

Abstract

Key proteins of retroviruses and other RNA viruses are translated and subsequently processed from polyprotein precursors by the viral protease (PR). Processing of the HIV Gag-Pol polyprotein yields the HIV structural proteins and enzymes. Structures of the mature enzymes PR, reverse transcriptase (RT), and integrase (IN) aided understanding of catalysis and design of antiretrovirals, but knowledge of the Pol precursor architecture and function before PR cleavage is limited. We developed a system to produce stable HIV-1 Pol and determined its cryo-electron microscopy structure. RT in Pol has a similar arrangement to the mature RT heterodimer, and its dimerization may draw together two PR monomers to activate proteolytic processing. HIV-1 thus may leverage the dimerization interfaces in Pol to regulate assembly and maturation of polyprotein precursors.


Year: 2022 PMID: 35857464 PMCID: PMC9258950 DOI: 10.1126/sciadv.abn9874

Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.957

INTRODUCTION

Many of the gene products of RNA viruses are initially translated as precursor polyproteins that are subsequently cleaved to yield mature proteins that play either structural or enzymatic roles. Early association events can affect the functions of the polyproteins during viral assembly and maturation. The spatial and temporal regulation of assembly and processing can modulate enzymatic functions; association with other proteins, nucleic acids, cofactors, and substrates; and the subcellular localizations of the components during infection and virus particle formation. In the case of HIV-1 and other retroviruses, premature processing can lead to incomplete assembly and maturation of virions. Given that polyproteins can be long-lived and have distinct sequences at their processing junctions, they may form inhibitor-binding sites distinct from those of the mature proteins that could be targeted by drugs, as exemplified by bevirimat blockage of capsid (CA)–spacer peptide 1 (SP1) processing within Gag. In HIV-1, the structural proteins are translated initially as part of the Gag polyprotein, while the enzymes that enable the virus to multiply and spread are synthesized as part of the Gag-Pol precursor polyprotein. Both polyproteins are cleaved by the viral protease (PR), which is synthesized as part of Gag-Pol, during maturation. Gag is processed to yield the structural proteins of the mature virion: matrix (MA), CA, and nucleocapsid (NC). Pol is processed to produce the viral enzymes PR, reverse transcriptase (RT), and integrase (IN). Gag-Pol is synthesized by translational readthrough; different retroviruses use different readthrough mechanisms. HIV-1 uses a translational frameshift mechanism. The efficiency of the frameshifting creates a Gag:Gag-Pol ratio of 20:1, making Gag-Pol a relatively minor constituent of the virion (~100 copies per virion). Proteolytic maturation is initiated by activation of the viral PR while it is still embedded in the Gag-Pol polyprotein.
PR is an obligate homodimer, and activation requires dimerization of at least the PR portion of two Gag-Pol molecules. In virions, the mechanism of PR activation and polyprotein processing has been challenging to analyze, in large part because purified virions mature asynchronously. In the absence of in virio data, studies with in vitro–translated polyprotein suggest that activation of PR, which appears to be a slow step in virion maturation, requires cleavage events that occur at the C-terminal end of Gag and within the trans-frame region (TFR or p6*). The TFR/p6* is an unstructured linker of variable length (56 to 68 amino acid residues depending on type or clade) that connects Gag to Pol in the Gag-Pol polyprotein. These studies show that the PR embedded in the liberated Pol continues processing polyprotein precursors with similar sluggishness until the mature PR is released, suggesting an architectural similarity of the PR that is embedded in Pol to that in Gag-Pol. It is worth noting that HIV-1 Pol expressed separately or as a PR-RT fusion [similar to the mature form of the protein found in prototype foamy virus (PFV)] retains PR and RT enzymatic activities both in vitro and in virio. Thus, Pol provides the simplest model system with which to understand the early events of PR activation and virion maturation. A more thorough understanding of Pol and its architectural configuration would yield valuable insights into a poorly characterized aspect of the retroviral replication cycle. We have a detailed understanding of the atomic-level organization of mature processed forms for both the HIV-1 structural proteins (MA, CA, and NC) and the enzymes (PR, RT, and IN), including wild-type (WT) and drug-resistant variants, and we have a good understanding of how they interact with other proteins, nucleic acid substrates, and inhibitors. These structures have helped in the development of antiretroviral drugs.
There is also useful information about the structures of the Gag polyprotein and the immature virion. However, much less is known about the molecular organization of the Pol portion of the Gag-Pol polyprotein or the immature forms of the viral enzymes. This is due, at least in part, to the absence of systems that can be used to efficiently produce and purify large amounts of the HIV-1 Gag-Pol or Pol polyproteins for structural and other biophysical studies. To address these limitations, we have developed a system for efficient production and purification of HIV Pol polyprotein and have determined the structure of the dimeric HIV-1 Pol. The architectural organization of the constituent enzymes helps to explain prior data on Pol activity and provides important insights into viral maturation.
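The stoichiometry quoted in the Introduction can be checked with a few lines of arithmetic. The sketch below uses only the two figures stated in the text (a 20:1 Gag:Gag-Pol ratio and ~100 Gag-Pol copies per virion). It assumes, for illustration only, that every translation event yields either Gag or Gag-Pol and that both are packaged in proportion to their synthesis; the function names are hypothetical, and the derived values are not measurements.

```python
def frameshift_fraction(gag_per_gagpol: float) -> float:
    """Fraction of translation events that yield Gag-Pol, given a Gag:Gag-Pol ratio.

    Assumes each event produces exactly one of the two products.
    """
    return 1.0 / (gag_per_gagpol + 1.0)


def implied_gag_copies(gagpol_copies: float, gag_per_gagpol: float) -> float:
    """Gag copies per virion implied by the Gag-Pol copy number and the ratio."""
    return gagpol_copies * gag_per_gagpol


RATIO = 20.0            # Gag:Gag-Pol ratio stated in the text
GAGPOL_COPIES = 100.0   # approximate Gag-Pol copies per virion stated in the text

print(f"Implied frameshift fraction: {frameshift_fraction(RATIO):.1%}")
print(f"Implied Gag copies per virion: {implied_gag_copies(GAGPOL_COPIES, RATIO):.0f}")
```

Under these assumptions, a 20:1 ratio corresponds to roughly one frameshift event in 21, which is why Gag-Pol is a minor virion component.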

RESULTS

Expression and purification of the Pol polyprotein

Proteolytic degradation of heterologously expressed proteins in bacteria is a common problem that has been difficult to mitigate with genetic approaches and/or media engineering. We have discovered that proteolytic degradation of HIV-1 Pol (and other proteins) expressed in Escherichia coli cells was highly attenuated in the presence of 50 mM or higher concentrations of Mg²⁺ and synergistically complemented by low pH (6.0 and below) using a modified recipe of the original lysogeny broth (see Materials and Methods, fig. S1, and table S1 for details). Degradation of the HIV-1 Pol polyprotein construct was monitored by Western blot analysis with the anti-IN monoclonal antibody 8E5 (fig. S1) while varying the composition of the growth medium. An HIV-1 Pol construct from the BH10 strain that was found to have favorable properties for biochemical and structural analyses was further optimized through the incorporation of multiple mutations. First, we incorporated an inactivating D25A mutation in the PR, which was necessary to avoid autocatalytic cleavage of the polyprotein. Second, amino acids RT (L560) and IN (F1) were mutated to D/D at the RT/IN cleavage junction to reduce proteolytic degradation and improve solubility. Last, we incorporated an Sso7d tag and an HRV14 3C PR cleavage site at the N terminus of the p6* region (designated HIV-1 Pol-D25A and subsequently referred to as HIV-1 Pol) (Fig. 1A). HIV-1 Pol was purified using nickel affinity chromatography followed by HiTrap heparin chromatography and, lastly, gel filtration using a Superose 6 Increase 10/300 GL column (see Materials and Methods for details). The gel filtration elution peak contained a shoulder, suggesting the presence of structural heterogeneity (which could be either conformational or compositional) in the polyprotein (Fig. 1B). SDS–polyacrylamide gel electrophoresis (SDS-PAGE) analysis revealed a single band (Fig. 1C), and dynamic light scattering (DLS) of the pooled fractions showed that 99.9% of the sample by mass was accounted for by a single species with a hydrodynamic radius of 6 nm, consistent with a single-chain multimeric protein (Fig. 1D). Thus, our optimized bacterial expression conditions and the multistep purification procedure yielded suitable amounts (>3 mg/liter) of pure protein amenable for structural and biochemical studies.
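DLS reports a hydrodynamic radius by converting a measured translational diffusion coefficient through the Stokes-Einstein relation, D = k_B·T / (6π·η·R_h). The sketch below shows that conversion for the 6-nm radius reported above. The temperature (20°C) and the viscosity of water are assumptions chosen for illustration, since the measurement conditions are not given in this section, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def diffusion_coefficient(radius_m, temp_k=293.15, viscosity_pa_s=1.002e-3):
    """Stokes-Einstein diffusion coefficient (m^2/s) of a sphere of given radius.

    temp_k and viscosity_pa_s default to water at 20 degrees C (an assumption).
    """
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def hydrodynamic_radius(diff_m2_s, temp_k=293.15, viscosity_pa_s=1.002e-3):
    """Inverse relation: hydrodynamic radius (m) from a diffusion coefficient."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_m2_s)


d = diffusion_coefficient(6e-9)  # 6 nm radius from the DLS result in the text
print(f"Diffusion coefficient for R_h = 6 nm: {d:.2e} m^2/s")
print(f"Round trip back to radius: {hydrodynamic_radius(d) * 1e9:.1f} nm")
```

The relation treats the particle as a sphere, so for an elongated or flexible polyprotein the radius is an effective value and not a direct measure of shape.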

Fig. 1.

Expression and biophysical characterization of HIV-1 Pol.

(A) Annotated schematic of the HIV-1 Pol construct used for this study. (B) Superose 6 Increase gel filtration profile of purified HIV-1 Pol indicating dimeric protein in solution. mAU, milli Absorbance Units. (C) SDS-PAGE analysis of fractions from the gel filtration profile of HIV-1 Pol. (D) Dynamic light scattering (DLS) analysis of HIV-1 Pol. Values refer to hydrodynamic radius (R), percent polydispersity (%Pd), molecular weight (MW-R), percent intensity (%Int) and percent mass (%Mass). The majority of the sample (99.9% mass) is characterized by a radius of ~6 nm with some polydispersity. Peaks 2-3 were detected but are not apparent in the plot.


Cryo–electron microscopy structure determination of HIV-1 Pol

Having optimized the expression and purification of HIV-1 Pol, we used cryo–electron microscopy (cryo-EM) for high-resolution structure determination. Cryo-EM excels at deciphering structures of dynamic macromolecules and their assemblies, and it is capable of distinguishing different structural states from a single dataset. We collected 901 cryo-EM movies of frozen hydrated HIV-1 Pol using a Thermo Fisher Scientific Titan Krios microscope equipped with a K2 direct electron detector (table S2). We processed the dataset using an iterative classification procedure, which yielded a final stack of 27,555 particles and a map that was resolved to 3.8 Å within the homogeneous central region composed of globular protein density (figs. S2 and S3). The cryo-EM map enabled the building of an atomic model that was consistent with the experimental density and showed good statistics (Fig. 2, A and B, fig. S3, and table S2).

Fig. 2.

Cryo-EM structure of the HIV-1 Pol dimer.

(A) The high-resolution cryo-EM map of HIV-1 Pol reveals an asymmetric dimer of RT with no additional density for PR and IN at this display threshold. RTp51L is in gray, and RTp66L is colored and labeled by functional domains. (B) Refined atomic coordinate model labeled and colored accordingly. (C) Domain architecture for both subunits of RT colored according to (A) and (B). Full bars represent the first and last residues resolved in the model. Dashed regions represent the regions of the full-length RT that were not resolved in the model. (D) Root mean square deviation (RMSD) of the high-resolution cryo-EM model compared with mature apo RT (PDBID 1DLO). Blue regions represent regions of the model that are structurally conserved between both models, whereas red regions depict portions of the model that are displaced in Angstroms (Å). Functional domains of RTp66L are labeled, and regions indicated by dashed circles show the greatest variation. The first and last resolved residues and the buried PR cleavage site (F440/Y441) in the RNase H domain of RTp66L are labeled. (E) Alignments of the individual subdomains of RTp66L in Pol compared to mature RT p66 show low RMSD within each subdomain.


Architecture of RT in the Pol polyprotein

The central core of HIV-1 Pol showed an architecture that is notably similar to mature HIV-1 RT (Fig. 2, A and B). Mature HIV-1 RT is an asymmetric heterodimer containing a p66 subunit, which includes a ribonuclease (RNase) H domain, and a p51 subunit, from which the RNase H domain has been cleaved by HIV-1 PR between positions 440 and 441. Although the entirety of Pol was present in the sample frozen for cryo-EM analysis, as was confirmed by SDS-PAGE (Fig. 1C), only the p66/p51 heterodimer-like RT portion could be resolved within the high-resolution map. This suggested that the remaining polypeptide regions in Pol outside the p66/p51 heterodimer were positionally and/or conformationally variable. The fact that Pol is dimeric in the reconstructed map is consistent with both gel filtration and DLS experiments, both of which suggested that Pol is dimeric in solution (Fig. 1, B and D). In Pol, the N-terminal residues of the “RT p66-like” (RTp66L) and “RT p51-like” (RTp51L) subunits are displaced from their usual positions observed in crystal structures of RT (Fig. 2C) and extend away from the body of the RT. As will be discussed below, the displacement of these residues can be attributed to the presence of PR upstream of the N termini of RTp66L and RTp51L. The uncleaved RNase H domain extending from the Pol RTp51L subunit was not visible in our globally refined structures, suggesting that this component has high conformational flexibility. The ordered density for RTp51L ends after residue 428 of the C terminus of the connection subdomain (Fig. 2D). Many of the available crystal structures of HIV-1 RT p66/p51 show weak density in this region, and the structure is not usually modeled beyond p51 Q428 (www.rcsb.org/uniprot/P03366). The presence of homogeneous density consistent with a heterodimeric p66/p51 structure in Pol suggests that this configuration is produced early in maturation. 
RT dimerization would stabilize the structure of the RNase H domain in the RTp66L, sequestering the F440/Y441 cleavage site in the folded polyprotein and making it inaccessible for cleavage (Fig. 2D). Conversely, the F440/Y441 site in RTp51L remains exposed to solvent, as suggested by the lack of density corresponding to this region within the high-resolution cryo-EM reconstruction. This exposure likely facilitates the cleavage between F440 and Y441 that creates the C terminus of p51. Whether the cleavage that frees the IN domain occurs before the RNase H domain of the RTp51L subunit is cleaved is unknown. The structure of the RT dimer in Pol implies that only one RNase H domain is available for cleavage. This suggests a plausible mechanism for maintaining the stoichiometry of the p66/p51 subunits of RT in virions. Although the density following Q428 in RTp51L does not permit additional residues to be modeled reliably, there is an extension of weak but continuous density that threads through a hole in the p66/p51 dimer interface to the side opposite the nucleic acid–binding cleft and into a region that shows unresolved but significant density that could represent the RNase H domain of RTp51L. This density provides the first glimpse of the second RNase H domain in RTp51L (cl2 in figs. S8 and S9).

HIV-1 Pol is active for reverse transcription and RNase H cleavage

Given that RT resembling the mature p66/p51 heterodimer forms the core scaffold of Pol, we examined whether Pol exhibits either of the enzymatic activities of the mature RT protein: The enzyme (i) copies RNA or DNA using its polymerase domain and (ii) degrades the RNA strand of the RNA/DNA hybrids using its RNase H domain. We found that the preformed RT embedded within the Pol polyprotein has both polymerase and RNase H activity on an RNA/DNA template-primer and yields products similar to those of mature p66/p51 RT (Fig. 3, A and B). These data broadly suggest that RT within Pol can carry out reverse transcription. However, when a double-stranded DNA hairpin template-primer aptamer was used as a polymerase substrate, the pause patterns during DNA synthesis differed between mature HIV-1 RT p66/p51 and the RT embedded in Pol (Fig. 3C), suggesting that there are differences in the way the aptamer is engaged by these two RTs. HIV-1 virions containing a PR-RT fusion have been shown to retain PR and RT activities in COS-7 cells, leading to the production of mature infectious virions that were morphologically indistinguishable from WT virions. In the RNase H assays that we performed, no off-target cleavages were seen, although two RNase H domains are present in Pol (Fig. 3B). These data suggest that the two main enzymatic activities of mature RT are fully recapitulated in the context of Pol.

Fig. 3.

The RT component of HIV-1 Pol is enzymatically active.

(A) Primer extension and processivity assays using a ~4-kb genomic RNA template (T) derived from the pBKBH10S molecular clone and hybridized to a 5′-P–labeled DNA primer (P) (see Materials and Methods for details). Heparin was included as a trap for the enzymes during the processivity tests. Pol + MBP-IBD indicates the use of a complex between Pol and the IN-binding domain (IBD) of LEDGF expressed as a maltose-binding protein (MBP) fusion protein. Lanes: 1: no RT; 2: trap control, 100 nM RT; 3: full extension, 100 nM RT, 20 min; 4: trapped, 100 nM RT, 10 min; 5: trap control, 280 nM Pol; 6 to 8: full extension, 140, 280, and 560 nM Pol, respectively; 9 to 11: trapped, 140, 280, and 560 nM Pol, respectively; 12: trap control, 280 nM Pol + MBP-IBD; 13 to 15: full extension, 140, 280, and 560 nM Pol + MBP-IBD, respectively; and 16 to 18: trapped, 140, 280, and 560 nM Pol + MBP-IBD, respectively. (B) Analysis of RNase H activity. Positions for the 60-nt starting material and primary and secondary RNA cleavages are indicated. (C) Extension of DNA aptamer.


Architecture of PR in the Pol polyprotein

Class averages derived from the cryo-EM data revealed the presence of indistinct and weak density surrounding the homogeneous region attributed to the core RT p66/p51 heterodimer (Fig. 4A). This weak density was present in most of the two-dimensional (2D) class averages but was poorly defined in the final reconstructed map (Fig. 2A and fig. S3). Some of this weak density was located proximal to the N termini of the fingers subdomains of both RTp66L and RTp51L. This density was poorly resolved compared to the RT core in this high-resolution map and was only evident at lower map threshold values. Notably, a global classification approach failed to improve the quality of this weak and likely heterogeneous density. To better resolve this region, we instead used a focused classification approach that is implemented within the cisTEM processing package. This approach works by performing 3D classification on a region of the map constrained by a spherical mask, while the Euler angles and translational shifts remain fixed and aligned with the established orientations used in the reconstruction. The advantage of this approach is that it enables recovery of density for structured regions that are delineated by the boundary of the mask, obviating the need for signal subtraction, a process that can introduce errors if the subtracted density is large (as would be expected if the density assigned to RT was subtracted). The workflow and applications of such approaches have been previously described. Using focused classification applied to the region outlined by a spherical mask of radius 35 Å, we found that ~70% of particles used to generate the high-resolution map contained a mass of density in the expected region for PR, as shown in fig. S4. As a control, we also applied a mask with the same radius to a region not expected to contain additional protein mass. As expected, only a negligible amount of density appeared in the mask applied in the control region.
The additional density recovered through focused classification within the predominant class (40% of particles, class 2) was improved compared to the density that was recovered using global classification alone (fig. S5) and was consistent with the size and shape of a PR dimer, which could be unambiguously rigid body docked into the resulting map (Fig. 4, B to D). The remaining two classes had either partial or low occupancy and were omitted in the subsequent analysis.
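The geometric core of the focused classification described above is a spherical mask that selects which voxels of the map take part in classification. The sketch below builds such a mask on a small grid using only the Python standard library. It is not the cisTEM implementation: the pixel size, box size, and mask center are hypothetical values chosen for illustration, and only the 35-Å radius comes from the text.

```python
import math


def spherical_mask(box, center, radius_vox):
    """Return a box x box x box nested list holding 1 inside the sphere, 0 outside."""
    cz, cy, cx = center
    r2 = radius_vox ** 2
    return [[[1 if (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2 <= r2 else 0
              for x in range(box)]
             for y in range(box)]
            for z in range(box)]


def masked_voxel_count(mask):
    """Number of voxels that fall inside the mask."""
    return sum(v for plane in mask for row in plane for v in row)


PIXEL_SIZE_A = 2.0        # hypothetical sampling in angstroms per voxel
BOX = 64                  # hypothetical box edge in voxels
CENTER = (32, 32, 32)     # hypothetical mask center (z, y, x) in voxels
RADIUS_A = 35.0           # mask radius used for the PR region in the text

mask = spherical_mask(BOX, CENTER, RADIUS_A / PIXEL_SIZE_A)
volume_a3 = masked_voxel_count(mask) * PIXEL_SIZE_A ** 3
ideal_a3 = 4.0 / 3.0 * math.pi * RADIUS_A ** 3
print(f"Masked volume: {volume_a3:.0f} A^3 (ideal sphere: {ideal_a3:.0f} A^3)")
```

In a real workflow the mask would be centered on the region of interest (here, near the N termini of the fingers subdomains) and a control mask of the same radius placed over empty solvent, as the text describes.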

Fig. 4.

RT dimerization within HIV-1 Pol positions PR into a configuration competent for activation.


(A) 2D class averages of HIV-1 Pol show weak density proximal to RT, depicted by red arrows and suggestive of structural heterogeneity. (B) A map obtained from focused classification was used for rigid-body docking of mature apo PR dimer (PDB 2HB4) and building the residues linking the PR and RT regions of Pol. (C) Superposition of Pol and apo RT (PDBID 1DLO, magenta). (D) Close-up view of the PR-RT linkers in Pol. Comparison between the N termini of RT (PDBID 1DLO) and the RT portion of Pol shows distinct conformational changes that accompany bringing the PR and RT dimers into proximity. The displacements of the N-terminal residues of RTp51L (~17 Å) and RTp66L (~24 Å) compared with the same residues in mature RT (PDB 1DLO) are depicted by dashed lines.

The PR component of Pol is directly adjacent to the RT core, connected to the N-terminal residues of the fingers subdomains of both RTs (PRp66L connects to RTp66L and PRp51L to RTp51L) of the Pol dimer. The resolution of the PR in our structures (see fig. S4) is lower than the resolution of RT (3.5 Å), which is suggestive of an orientationally dynamic protein. We could unambiguously identify the orientation of the PR dimer within the cryo-EM density, because the linkers that connect the C-terminal residues of PR to RT (see fig. S4) are located directly adjacent to the N termini of the two fingers subdomains within the dimeric RT core. These linkers were also apparent within the cryo-EM density after focused classification (Fig. 4, C and D). The tethering of C-terminal residues of PR to RT pulled the N termini of RT slightly away from their usual positions in crystal structures of the mature enzyme. The N terminus of RT contains a proline-rich sequence (with six prolines in the first 25 residues) that facilitates the repositioning of the C termini of PR.
The two Pro100 residues of Pol (which correspond to Pro1 of mature RT) are pulled 23.5 and 16.9 Å from their positions in the p66 and p51 subunits of mature RT, respectively. Pro103 (which corresponds to Pro4 of mature RT) is displaced ~6 Å away from the position of the equivalent residue in both mature RT subunits, while Pro108 (which corresponds to Pro9 of mature RT) is in approximately the same position in both subunits of Pol and mature RT (Fig. 4D). PR is therefore held at “arm’s length,” away from RT, by its own C-terminal residues and the N-terminal residues of RT. There is relatively little buried surface area at the PR-RT junction (253 Å²). This somewhat loose association between PR and RT may contribute to the conformational mobility of the PR relative to RT that gives rise to the weak density that we observed (in the focused classification class 2). This is likely to be functionally relevant, because the PR and RT must eventually be liberated from Pol during maturation. This configuration exposes the PR/RT cleavage site to solvent (and to PR), making it accessible to cleavage during maturation. Thus, the formation of the highly stable RT dimers appears to be a prelude to the activation of the PR by bringing the monomeric PRs into close proximity, which will, in turn, promote PR dimerization and may relieve some of the destabilizing effects known to be associated with upstream p6* residues, as observed in in vitro studies. The heterogeneity observed for PR within the cryo-EM map of Pol may help to explain the sluggishness of the initial cleavage events during maturation and the markedly reduced inhibitory activity of active site PR inhibitors against the immature PR in the context of Gag-Pol. Together, our results reveal that, in the context of Pol, PR is connected by two flexible linkers that loosely tether PR and RT, ensuring that the PR can efficiently dimerize during the maturation stage of the viral replication cycle.
It is worth noting that the PR in our HIV-1 Pol structure occupies a location similar to that of the monomeric PR in apo PFV PR-RT, which has an inactive RT conformation, and in marmoset FV PR-RT–nucleic acid complexes. In those structures, PR is likewise joined to RT by flexible linkers that provide room for conformational variability, although these PRs are not cleaved from their respective RTs during maturation. The HIV-1 Pol PR-RT structure reveals a retroviral polyprotein with a dimeric PR in a conformation suitable for proteolysis (fig. S6).
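Residue displacements such as those quoted for Pro100, and the RMSD values used to compare Pol with mature RT, are distance calculations between equivalent atoms once two models have been superposed. The sketch below shows both calculations under that assumption. The coordinates are made-up placeholders, not values from the Pol model or PDB 1DLO, and the function names are hypothetical.

```python
import math


def displacement(a, b):
    """Euclidean distance (in the coordinate units) between two 3D points."""
    return math.dist(a, b)


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized, pre-superposed sets."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total_sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total_sq / len(coords_a))


# Placeholder C-alpha positions in angstroms, assumed to be already superposed.
mature_rt = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
pol_rt = [(3.0, 4.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 0.0)]

per_residue = [displacement(a, b) for a, b in zip(mature_rt, pol_rt)]
print("Per-residue displacement (A):", [round(d, 2) for d in per_residue])
print(f"RMSD over all residues (A): {rmsd(mature_rt, pol_rt):.2f}")
```

The toy data mimic the pattern described in the text, where the displacement is largest at the first residue and decays to zero a few residues into the chain.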

Arrangement of IN in the Pol polyprotein

The high-resolution cryo-EM map of Pol at reasonable contour levels only contained density for the RT portion of Pol, despite the protein sample containing the additional PR and IN domains. Focused classification in the region expected to contain PR revealed additional protein density, so we applied this same strategy to find additional density extending from the C-terminal ends of RT within Pol. First, a 21-Å radius mask was applied to the region directly downstream of the RNase H domain of RTp66L, into which the IN domain should extend. As a control, a mask with the same radius was applied onto a region that is not expected to be occupied by protein mass. This approach recovered the N-terminal domain (NTD) of IN (residues 1 to 49) in one class corresponding to ~20% of the particles from the high-resolution reconstruction (fig. S7, FC1). No density appeared after focused classification in the control experiment (fig. S7, FC2). Despite using larger masks and trying different mask placements, we could not determine the remaining density for IN, which should contain the catalytic core domain (CCD) and C-terminal domain (CTD) of IN. This is not unexpected, because IN contains two long flexible linkers—one connecting the NTD to the CCD and the other connecting the CCD to the CTD. As a result, most of the IN density is not visible in our cryo-EM maps, consistent with conformational variability imparted by the presence of flexible domain linkers within IN. The low-resolution cryo-EM density belonging to the IN NTD, which was seen with our constructs, emanated from the C terminus of the RTp66L subunit (Fig. 2C and fig. S7, FC1). RT and IN are connected by a loop that serves as the hinge for IN (Fig. 1A). The flexibility of the RT/IN junction, which is solvent-exposed, makes it easily accessible by the PR; cleavage at the junction is required to produce the mature IN protein. 
We next performed focused classification downstream of Polp51L in the hopes of locating the second RNase H domain of the RTp51L chain that will be eventually cleaved to form p51, as well as any additional IN density. Again, a 21-Å radius mask was applied, but this time, the mask was placed proximal to the last visible residue of RTp51L and near the last residue of p51 that is modeled in most crystal structures of mature RT. Focused classification revealed some additional density extending in this region, but it was small and not defined well enough to model any high-resolution structures of the missing protein via rigid-body docking (fig. S8). Ultimately, focused classification could only localize the NTD of IN extending from the Polp66L chain and hint at where the missing RNase H domain of Polp51L sits within Pol. The remaining domains (the CCD and CTD of IN extending from Polp66L, and the RNase H and IN domains connected to Polp51L) must be either disordered or connected by flexible linkers. The Pol structure is dimeric, and we only observed a single band on the SDS-PAGE gel after purification (Fig. 1, A and C, respectively), which suggested that no cleavage occurred during the preparation of the Pol polyprotein. Thus, both IN protomers must be present in each Pol dimer. Ultimately though, only a single NTD of one of the IN protomers could be resolved by focused classification, and we were unable to establish the multimeric state for IN based on the cryo-EM map. This raised the question of how IN is arranged within Pol. To address this question and investigate the multimeric state of IN, we relied on the fact that the IN-binding domain (IBD) of lens epithelium–derived growth factor (LEDGF)/p75 binds specifically to dimeric and tetrameric HIV-1 IN multimers. 
We found that, when coexpressed with the IBD of LEDGF/p75 fused to the maltose-binding protein (MBP–IBD), Pol forms a complex with IBD and remains stably bound even after a 1.0 M salt wash on a dextrin sepharose column (fig. S10, A and B). Because the IBD is known to bind to highly conserved residues at the CCD/CCD dimer interface, we infer that the IN, as part of uncleaved Pol, can form CCD-mediated dimers, at least in the presence of IBD. As dimerization of RT is evident in Pol based on our cryo-EM structure and focused classification of PR revealed some degree of dimerization for PR, at least some dimerization of IN within Pol seems plausible. Overall, our data suggest that RT and IN dimerization are intricately linked to PR activation.

DISCUSSION

Collectively, our structural, biophysical, and biochemical data demonstrate that a p66/p51 asymmetric dimer resembling mature RT forms the core scaffold of HIV-1 Pol before cleavage by PR and that this complex is enzymatically active in terms of reverse transcription and cleavage of an RNA/DNA duplex. In addition, our cryo-EM structure indicates that PR can form dimers while part of Pol. Our constructs contain the D25A mutation in the PR, which renders even the mature PR dimer less stable than WT because of disruption of a hydrogen-bonding network that stabilizes the dimer at the fireman’s grip. These observations suggest that RT plays a positive role in holding the PR subunits in close proximity, enhancing their ability to form enzymatically active dimers. Consistent with this, it has been reported that (i) while a minimal TFR-PR D25N construct fails to dimerize, the addition of the first seven amino acids of RT leads to dimerization; and (ii) inhibition of RT dimerization is deleterious to Gag and Gag-Pol processing during maturation. Conversely, stabilization of RT dimers using molecules such as efavirenz leads to premature PR activation, presumably by enhancing the formation of enzymatically competent PR dimers within the polyprotein. We propose a model of PR activation occurring in three general stages (Fig. 5). While the first stage has yet to be visualized by solving the Gag-Pol structure (where the Gag portion could play a role in dimerization propensity), the Pol structure presented here may represent an intermediate stage. Hence, the propensity of PR to dimerize, regulated by the Gag:Gag-Pol 20:1 molar ratio and the sequences surrounding PR in the polyprotein, tunes the dimerization and activation of PR and viral maturation.

Fig. 5.

Model of PR activation during HIV maturation.


The maturation process is divided into three stages according to the dimerization propensity and activity of PR. SU, surface glycoprotein; TM, transmembrane protein. (I) Initially, most Gag-Pol molecules are associated with multimers of Gag, and dimers of Gag-Pol may be infrequent. In addition, dimers of PR within Gag-Pol may be unstable and necessitate extensive sampling to adopt a catalytically active conformation. Accordingly, the PR activity of Gag-Pol is low. (II) In an intermediate stage, once PR (most likely within a Pol-containing polyprotein precursor) has cleaved Pol from Gag, free Pol predominantly forms dimers through dimerization of the RT portion of Pol as observed in these cryo-EM studies. This increases the dimerization propensity of PR (~40% particles analyzed by cryo-EM contain PR dimers according to the focused classification results), driven by RT and/or IN domains of Pol. PR activity within Pol may still be suboptimal but is likely higher than within Gag-Pol. (III) The mature PR dimer should be fully active after p6* removal from the PR N terminus, increasing processing activity and rapidly completing viral maturation. Created using BioRender.

Notably, the assembly of this dimeric Pol entity could also be affected by the formation of IN dimers. The initial site of nucleation for the formation of the Pol dimer remains unknown. This extremely stable association brings two PR monomers, and their respective N termini, into close proximity, which enables them to form enzymatically active dimers. Thus, RT and IN homodimerization partially offsets the dimerization inhibitory effect of the p6* residues at the N termini of PR, which might not be relieved on its own.
Consistent with this observation, Hoyte et al. report that not only do IN T124N/T174I mutations weaken the binding of an allosteric inhibitor at the CCD dimer interface, but these mutations also inhibit maturation of HIV-1 virions by impairing Gag and Gag-Pol processing, presumably by inhibiting IN dimerization. In the case of PFV, IN has been shown to be required for the dimerization of Pol and PR activation. Studies of the proteolytic processing of Gag-Pol in cells suggest that stabilization or destabilization of RT dimerization can lead to either enhanced or diminished PR activity. Our structures imply that RT dimerization will enhance PR dimerization and activation. The two PR monomers are drawn together by the proximity of the N termini of the two RTs. The IN domain is largely not observed in our structures, likely because of flexibility between RT and IN and within the subdomains of IN; however, the ability of Pol to bind the IBD of LEDGF/p75 suggests that at least the IBD-binding site in IN is dimeric in Pol. Together, our results show how HIV-1 maturation leverages the dimerization interface in the RT portion of Pol to regulate the maturation of polyprotein precursors. This regulation controls the timing of the dimerization and activation of PR, which is slow for isolated PR monomers. The coupling of PR activation to the dimerization of RT helps to delineate a key role for RT in PR activation that was initially suggested by biochemical and virological studies. The observation that RT adopts a p66/p51 heterodimer-like conformation in Pol explains why only one RNase H domain is available for cleavage and offers a plausible explanation for the maintenance of the strict 1:1 stoichiometry of p66/p51 in virions.

MATERIALS AND METHODS

Protein expression

HIV-1 Pol constructs of the BH10 strain (gift from M. Parniak) were cloned into a pET28a vector and transformed into BL21-CodonPlus (DE3)-RIL cells (Agilent Technologies) alone or cotransformed with MBP-IBD. The MBP-IBD construct was cloned into a pCDFDuet vector, compatible with cotransformation with the pET28a HIV-1 Pol vector. In both instances, colonies were selected and inoculated into 100 ml of overnight culture using Jerry Joe Harrison (JJH) medium (table S1), composed of 1.5% tryptone, 1.0% yeast extract, 1.5% NaCl, 1.5% NZ-Amine, and 50 mM MgSO4 or MgCl2 at pH 6.5, in a 500-ml Erlenmeyer flask containing kanamycin (50 μg/ml) and chloramphenicol (34 μg/ml) [and streptomycin (50 μg/ml) when cotransforming with MBP-IBD]. The culture was shaken at 230 to 250 rpm overnight at 37°C, then diluted into 1 liter of JJH medium with 5% glycerol at pH 6.0 in a 2.5-liter Erlenmeyer flask, and supplemented with kanamycin (50 μg/ml) [and streptomycin (50 μg/ml) when coexpressing with MBP-IBD]. The cells were allowed to grow to an optical density of 2 to 2.5 at 37°C and then for at least 1 hour at 15°C. Subsequently, protein expression was induced with 1 mM isopropyl-β-d-thiogalactopyranoside, and the culture was grown overnight at 15°C. Phosphate buffer (50 mM) at pH 6.0 was added to the medium as a buffering agent. Cells were harvested by centrifugation at 4000g for 30 min, and the cell pellet was resuspended in 100 to 150 ml of lysis buffer [100 mM tris-Cl buffer at pH 8.0, 600 mM NaCl, 0.5% Triton X-100, 10% glycerol, 30 mM imidazole, and 2 mM TCEP (tris(2-carboxyethyl)phosphine)] with 1 mM phenylmethylsulfonyl fluoride, 1 μM pepstatin A, and 1 μM leupeptin. The cells were sonicated for 10 min on ice, and the cellular debris was spun down at 38,000g for 30 min.

Protein purification

The clarified supernatant was loaded onto a nickel gravity column preequilibrated with the lysis buffer and then washed with (i) 5 to 10 column volumes (CV) of lysis buffer, (ii) 5 to 10 CV of high-salt buffer wash (1.5 M NaCl) in lysis buffer, (iii) chaperone wash of 5 CV (containing 5 mM adenosine 5′-triphosphate, 5 mM MgCl2, 50 mM imidazole, and lysis buffer), and (iv) 3-CV wash with the lysis buffer. Protein was eluted with 4 CV of 80 mM tris (pH 8.0), 600 mM NaCl, 500 mM imidazole, 10% glycerol, and 2 mM TCEP and diluted twofold with water before loading onto a 5-ml HiTrap heparin column preequilibrated with 30 mM tris-Cl (pH 8.0), 300 mM NaCl, 5% glycerol, and 1 mM TCEP. The column was washed with loading buffer until the background ultraviolet absorption was negligible. Elution of protein was carried out with the loading buffer containing 1 M NaCl, and protein was concentrated and injected onto a Superose 6 Increase 10/300 GL gel filtration column preequilibrated with 20 mM tris-Cl (pH 8.0) and 250 to 300 mM NaCl. Fractions containing pure protein were pooled, concentrated to 0.2 to 0.5 mg/ml, flash-frozen, and stored at −80°C. For coexpression with MBP-IBD, glycerol was removed from the nickel column elution buffer, while elution from the heparin column was done using 20 mM tris-Cl (pH 8.0) and 1.0 M NaCl, followed by an MBP Trap HP step, washed with the previous buffer, and eluted with 20 mM tris-Cl (pH 8.0), 600 mM NaCl, and 10 mM maltose. Gel filtration chromatography suggested a dimeric molecule of Pol in solution at concentrations between 0.2 and 0.5 mg/ml in both cases (Fig. 1, B and C, and fig. S10, A and B, respectively), consistent with DLS experiments (Fig. 1D and fig. S10C, respectively).
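The twofold dilution of the nickel eluate before the heparin step can be checked with simple arithmetic. The sketch below is illustrative only: the dictionary keys and the `dilute` helper are hypothetical names, and the concentrations are those listed in the text. It shows that the diluted eluate matches the heparin equilibration buffer in NaCl, glycerol, and TCEP, while tris (40 mM versus 30 mM) differs slightly and 250 mM imidazole remains.

```python
# Nickel-column elution buffer components, as listed in the text.
elution = {
    "tris_mM": 80,
    "NaCl_mM": 600,
    "imidazole_mM": 500,
    "glycerol_pct": 10,
    "TCEP_mM": 2,
}

# Heparin-column equilibration buffer, as listed in the text.
heparin_equilibration = {
    "tris_mM": 30,
    "NaCl_mM": 300,
    "glycerol_pct": 5,
    "TCEP_mM": 1,
}


def dilute(buffer, fold):
    """Return component concentrations after a `fold`-fold dilution with water."""
    return {name: conc / fold for name, conc in buffer.items()}


diluted = dilute(elution, 2)

# Compare the diluted eluate with the heparin equilibration buffer.
for name, target in heparin_equilibration.items():
    print(f"{name}: diluted eluate = {diluted[name]}, heparin buffer = {target}")
print(f"imidazole_mM carried over: {diluted['imidazole_mM']}")
```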

Cryo-EM specimen preparation

A 2.5-μl sample of HIV-1 Pol protein at ~0.4 mg/ml in gel filtration chromatography buffer [20 mM tris-Cl (pH 8.0) and 250 to 300 mM NaCl] was applied onto a freshly plasma-treated (7 s, 50 W, Gatan Solarus plasma cleaner) holey gold UltrAuFoil grid (Quantifoil), adsorbed for ~1 min, and plunged into liquid ethane using a manual cryo-plunger operated inside the cold room (~4°C).

Cryo-EM data collection

Data were acquired using the Leginon software (), installed on an FEI/Thermo Fisher Scientific Titan Krios electron microscope at the Scripps Research Institute, operating at 300 keV and using a K2 Summit direct electron detector operating in counting mode. All data collection statistics are summarized in table S2.

Cryo-EM data processing

Movies were corrected for beam-induced movement using MotionCor2 (), implemented within the Appion platform (). Individual frames were gain-corrected, aligned, and summed with the application of an exposure filter (). The generated sums excluding the first frame (frames 2 to 60) were used as input for contrast transfer function (CTF) estimation and particle selection performed in Warp (). The particle stack generated in Warp was used as input for 2D and 3D classifications in cryoSPARC (), followed by nonuniform (NU) refinement (). 2D classification was used to remove particles that did not produce clean class averages. 3D classification using two classes, followed by NU refinement of the particles giving rise to the higher-resolution reconstruction of the two classes, was performed iteratively until no further improvement in resolution and map quality was observed. This reconstruction was then used as input to another round of heterogeneous 3D classification, with the entire stack as input. After another iterative workflow entailing 3D classification and NU refinement, one round of CTF refinement and a final round of NU refinement were performed (fig. S2). The resolution of the RT portion of Pol was assessed using the conventional Fourier shell correlation (FSC) analysis (), for both masked and unmasked half-map and map-to-model curves, using the “mtriage” tool implemented within the Phenix package (). The masks for resolution calculation were automatically generated with mtriage. Directional resolution volumes were generated using the 3D FSC tool (), whereas the local resolution was generated using sxlocres.py (, ). The final 3D reconstruction was used as input for focused classification performed in cisTEM (, ). Different sphere dimensions were tested for focused classification, and the final reconstructions used spheres of 35-Å (PR) and 21-Å (RNase Hp51L, IN, and negative controls) radius. The focused classification jobs were performed in manual refinement node without adjustment of either angles or shifts for 30 cycles. For all jobs, we asked for three classes, and a global mask of 100 Å was applied.
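For readers unfamiliar with the FSC analysis mentioned above, the following is a minimal NumPy sketch of a half-map FSC curve. It is not the mtriage implementation used in this work: the function name `fourier_shell_correlation`, the synthetic 16-voxel test maps, and the integer-radius shell binning are all assumptions made for illustration, and no masking is applied.

```python
import numpy as np


def fourier_shell_correlation(half1, half2):
    """Return the FSC per integer-radius Fourier shell for two cubic maps."""
    assert half1.shape == half2.shape and len(set(half1.shape)) == 1
    n = half1.shape[0]

    # Fourier transforms with the zero-frequency voxel shifted to index n // 2.
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))

    # Integer shell index: rounded distance of each voxel from zero frequency.
    axis = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    shell = np.rint(np.sqrt(xx**2 + yy**2 + zz**2)).astype(int).ravel()

    # Keep only shells that lie fully inside the box (radius below n // 2).
    n_shells = n // 2
    keep = shell < n_shells
    shell = shell[keep]

    cross = np.real(f1 * np.conj(f2)).ravel()[keep]
    power1 = (np.abs(f1) ** 2).ravel()[keep]
    power2 = (np.abs(f2) ** 2).ravel()[keep]

    # Sum each quantity over the voxels of every shell.
    numerator = np.bincount(shell, weights=cross, minlength=n_shells)
    denominator = np.sqrt(
        np.bincount(shell, weights=power1, minlength=n_shells)
        * np.bincount(shell, weights=power2, minlength=n_shells)
    )
    return numerator / denominator


# Synthetic "half-maps": a shared signal plus independent noise (illustrative only).
rng = np.random.default_rng(0)
signal = rng.normal(size=(16, 16, 16))
half_a = signal + 0.5 * rng.normal(size=signal.shape)
half_b = signal + 0.5 * rng.normal(size=signal.shape)
fsc = fourier_shell_correlation(half_a, half_b)
print(np.round(fsc, 3))
```

A shell index k corresponds to a spatial frequency of k divided by the box edge length in ångströms, so the resolution at which the curve crosses a chosen threshold follows from the box size and pixel size of the reconstruction.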

Model building and refinement

The model of the RT portion of Pol was built using the initial coordinates from the crystal structure of the apo RT [Protein Data Bank ID (PDB ID) 1DLO]. The model was aligned to the cryoSPARC sharpened map, and 500 Rosetta_Relax jobs were performed for the relaxation of the backbone bond geometry to improve the protein energy landscape for the next step of modeling (). Individual regions that diverged from the high-resolution crystal structures were adjusted manually as necessary in Coot (). The model of the PR-RT portion of Pol was built using the refined RT structure within Pol, and linkers were manually modeled (N-terminal residues 1 to 7 of each RT domain) to connect to the C termini of the PR dimer that was positioned by rigid-body docking using the apo HIV-1 PR structure (PDB ID 2HB4). Final models were generated by iterative model building using Coot and refinement with phenix.real_space_refine (with secondary structure restraints) (). The geometry of the final models and other validation statistics were reported by MolProbity (, ). The relevant refinement statistics are summarized in table S2. High-resolution structural figures were prepared for publication using UCSF Chimera ().

Primer extension and processivity assays

For polymerization assays, reactions were performed using a ~4-kb RNA template made by runoff transcription with T7 RNA polymerase using plasmid pBKBH10S that was cleaved with Eco RI. Plasmid pBKBH10S was obtained through the National Institutes of Health (NIH) HIV Reagent Program, Division of AIDS, National Institute of Allergy and Infectious Diseases, NIH: Human Immunodeficiency Virus 1 (HIV-1) BH10 Non-Infectious Molecular Clone (pBKBH10S DNA), ARP-194, contributed by J. Rossi. After treatment with deoxyribonuclease I to remove the plasmid DNA, isolated RNA was hybridized to a 5′-32P–labeled DNA primer (5′-GCTTGATTCCCGCCCACCAA-3′) at a 3:1 ratio of primer:template. All reactions were initiated by the addition of deoxynucleoside triphosphates (dNTPs) and MgCl2 to other components that were preincubated at 37°C in a final reaction volume of 12.5 μl. For reactions testing processivity, heparin (final concentration 1 μg/μl) was included as a “trap” during the initiation step to sequester enzyme molecules that were free in solution or had dissociated from the primer-template. Trapped reactions were incubated for 10 min and nontrapped for 20 min before terminating with 2× gel loading buffer [90% formamide, 10 mM EDTA (pH 8), and 0.025% each bromophenol blue and xylene cyanol]. Samples were run on a 6% denaturing polyacrylamide gel and visualized with a phosphorimager.

For analysis of RNase H activity, reactions were performed in the buffer conditions described above (without dNTPs). The substrate was a 60-nt 5′-32P–labeled RNA (5′-GGGCGAAUUCGAGCUCGGUACCCGGGGAUCCUCUAGAGUCGACCUGCAGGCAUGCAAGCU-3′) hybridized to a 23-nt DNA (5′-AGGATCCCCGGGTACCGAGCTCG-3′) at a 1:2 RNA:DNA ratio (final concentration of 10 nM RNA in all assays). Assays were carried out in an 80-μl reaction volume at 37°C and were initiated by the addition of 100 nM (final concentration) p66/p51 RT, 85 nM Pol, or 70 nM Pol + MBP-IBD. Calculation of the concentrations of the Pol assumed a Pol dimer and a 1:1 ratio of Pol:MBP-IBD. Aliquots were removed at the indicated time points and terminated with 2× gel running buffer, then run on a 10% denaturing gel, and visualized with a phosphorimager.

For extension of the DNA aptamer, reactions were performed in the buffer conditions described in the polymerization assays using a 5′-32P–labeled 38-nt primer-template mimicking aptamer (final concentration of 5 nM) that binds HIV-1 RT with pM affinity as the substrate (). The loop-back aptamer has a 5-nt 5′ overhang, allowing full extension to 43 nt. Reactions were carried out in a 20-μl reaction volume at 37°C and were initiated by the addition of 10 nM (final concentration) of either p66/p51 RT, Pol-D25A, or Pol-D25A + MBP-IBD and incubated for 10 min at 37°C. Terminated reactions were resolved on a 16% denaturing gel and visualized with a phosphorimager.
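As a quick consistency check on the RNase H substrate described above, the short sketch below confirms the stated oligonucleotide lengths and locates where the 23-nt DNA anneals on the 60-nt RNA. The helper name `reverse_complement_as_rna` is hypothetical; the sequences are copied from the text, and the aptamer arithmetic uses only the 38-nt length and 5-nt overhang given there.

```python
# Sequences as given in the text (written 5' to 3').
RNA = "GGGCGAAUUCGAGCUCGGUACCCGGGGAUCCUCUAGAGUCGACCUGCAGGCAUGCAAGCU"
DNA = "AGGATCCCCGGGTACCGAGCTCG"
PRIMER = "GCTTGATTCCCGCCCACCAA"


def reverse_complement_as_rna(dna):
    """Return the RNA sequence (5' to 3') that base-pairs with a DNA oligo."""
    pairs = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


# The RNA stretch that pairs with the DNA oligo, and its 0-based start index.
target = reverse_complement_as_rna(DNA)
start = RNA.find(target)

print(f"RNA length: {len(RNA)} nt, DNA length: {len(DNA)} nt")
print(f"DNA anneals to RNA positions {start + 1} to {start + len(DNA)} (1-based)")

# Loop-back aptamer: a 38-nt oligo with a 5-nt 5' overhang to be filled in.
aptamer_length, overhang = 38, 5
full_extension = aptamer_length + overhang
print(f"Fully extended aptamer: {full_extension} nt")
```

The DNA oligo pairs with an internal stretch of the RNA, leaving single-stranded RNA on both sides of the hybrid region.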

59 in total

1. Proteolytic processing of an HIV-1 pol polyprotein precursor: insights into the mechanism of reverse transcriptase p66/p51 heterodimer formation.

Authors: Nicolas Sluis-Cremer; Dominique Arion; Michael E Abram; Michael A Parniak
Journal: Int J Biochem Cell Biol Date: 2004-09 Impact factor: 5.085

Review 2. The structural biology of HIV assembly.

Authors: Barbie K Ganser-Pornillos; Mark Yeager; Wesley I Sundquist
Journal: Curr Opin Struct Biol Date: 2008-04-09 Impact factor: 6.809

3. Characterization of ribosomal frameshifting in HIV-1 gag-pol expression.

Authors: T Jacks; M D Power; F R Masiarz; P A Luciw; P J Barr; H E Varmus
Journal: Nature Date: 1988-01-21 Impact factor: 49.962

4. Stock-based detection of protein oligomeric states in jsPISA.

Authors: Eugene Krissinel
Journal: Nucleic Acids Res Date: 2015-04-23 Impact factor: 16.971

5. Crystal Structure of a Retroviral Polyprotein: Prototype Foamy Virus Protease-Reverse Transcriptase (PR-RT).

Authors: Jerry Joe E K Harrison; Steve Tuske; Kalyan Das; Francesc X Ruiz; Joseph D Bauman; Paul L Boyer; Jeffrey J DeStefano; Stephen H Hughes; Eddy Arnold
Journal: Viruses Date: 2021-07-29 Impact factor: 5.818

6. Features and development of Coot.

Authors: P Emsley; B Lohkamp; W G Scott; K Cowtan
Journal: Acta Crystallogr D Biol Crystallogr Date: 2010-03-24

7. Appion: an integrated, database-driven pipeline to facilitate EM image processing.

Authors: Gabriel C Lander; Scott M Stagg; Neil R Voss; Anchi Cheng; Denis Fellmann; James Pulokas; Craig Yoshioka; Christopher Irving; Anke Mulder; Pick-Wei Lau; Dmitry Lyumkis; Clinton S Potter; Bridget Carragher
Journal: J Struct Biol Date: 2009-04 Impact factor: 2.867

8. Initial cleavage of the human immunodeficiency virus type 1 GagPol precursor by its activated protease occurs by an intramolecular mechanism.

Authors: Steven C Pettit; Lorraine E Everitt; Sumana Choudhury; Ben M Dunn; Andrew H Kaplan
Journal: J Virol Date: 2004-08 Impact factor: 5.103

9. Addressing preferred specimen orientation in single-particle cryo-EM through tilting.

Authors: Yong Zi Tan; Philip R Baldwin; Joseph H Davis; James R Williamson; Clinton S Potter; Bridget Carragher; Dmitry Lyumkis
Journal: Nat Methods Date: 2017-07-03 Impact factor: 28.547

Review 10. Strategies to optimize protein expression in E. coli.

Authors: Dana M Francis; Rebecca Page
Journal: Curr Protoc Protein Sci Date: 2010-08