Julia Grawenhoff, Alan N Engelman, Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02215, United States.
Abstract
Retroviral replication proceeds through the integration of a DNA copy of the viral RNA genome into the host cellular genome, a process that is mediated by the viral integrase (IN) protein. IN catalyzes two distinct chemical reactions: 3'-processing, whereby the viral DNA is recessed by a di- or trinucleotide at its 3'-ends, and strand transfer, in which the processed viral DNA ends are inserted into host chromosomal DNA. Although IN has been studied as a recombinant protein since the 1980s, detailed structural understanding of its catalytic functions awaited high resolution structures of functional IN-DNA complexes or intasomes, initially obtained in 2010 for the spumavirus prototype foamy virus (PFV). Since then, two additional retroviral intasome structures, from the α-retrovirus Rous sarcoma virus (RSV) and β-retrovirus mouse mammary tumor virus (MMTV), have emerged. Here, we briefly review the history of IN structural biology prior to the intasome era, and then compare the intasome structures of PFV, MMTV and RSV in detail. Whereas the PFV intasome is characterized by a tetrameric assembly of IN around the viral DNA ends, the newer structures harbor octameric IN assemblies. Although the higher order architectures of MMTV and RSV intasomes differ from that of the PFV intasome, they possess remarkably similar intasomal core structures. Thus, retroviral integration machineries have adapted evolutionarily to utilize disparate IN elements to construct convergent intasome core structures for catalytic function.
Core tip: This review examines the history of retroviral integrase structural biology and covers the currently available high-resolution structures of retroviral intasomes in detail. We focus in particular on the similarities and differences among the intasome structures of prototype foamy virus, Rous sarcoma virus and mouse mammary tumor virus.
INTRODUCTION
Retroviral replication requires the incorporation of the viral genetic information into the host cellular genome, which occurs via two main steps: (1) the reverse transcription of single-stranded viral RNA into linear double-stranded DNA; and (2) the integration of this DNA into a host chromosome. These steps occur in the context of two subviral nucleoprotein complexes: The reverse transcription complex (reviewed in[1]) and the pre-integration complex (PIC)[2], each of which consists of a variety of cellular and viral proteins including reverse transcriptase (RT) and integrase (IN)[3-7]. In the cytoplasm, RT mediates the synthesis of a linear viral DNA (vDNA) molecule that harbors a copy of the viral long-terminal repeat (LTR) at each end[8-10]. In the confines of the PIC, vDNA is trafficked toward the nucleus, where its integration into host cell target DNA (tDNA) is promoted by IN. Here, we discuss the current knowledge of IN structural determinants and intasome function, highlighting both key similarities and differences among the retroviruses.
REACTIONS CATALYZED BY IN
Retroviral IN performs two biochemically and temporally distinct bimolecular nucleophilic substitution (SN2) reactions[11]: 3’-processing and strand transfer (Figure 1). During 3’-processing, a di- or trinucleotide is hydrolytically cleaved from each 3’ vDNA end[12-14], exposing reactive hydroxyl groups of invariant CA dinucleotides. These groups act as nucleophiles for subsequent strand transfer whereby the newly processed 3’ vDNA ends are covalently inserted into a major groove of tDNA in a staggered fashion. The product of the second reaction is an integration intermediate that is characterized by unjoined 5’ vDNA overhangs[15,16]. Following disassembly of the associated strand transfer complex (STC, Figure 1), a DNA polymerase, 5’ flap endonuclease, and DNA ligase are required to fill in the single-strand gap regions in tDNA, excise 5’ vDNA overhangs, and join the vDNA 5’ ends to host DNA strands, respectively (reviewed in[17]). During this process, short target site duplications are generated, which flank the integrated provirus. Depending on the genus of retrovirus, the size of these target site duplications ranges from 4 to 6 base pairs (bp). Whereas spumavirus prototype foamy virus (PFV)[18,19] and lentivirus human immunodeficiency virus type 1 (HIV-1)[20,21] integration yield 4 bp and 5 bp target site duplications, respectively, mouse mammary tumor virus (MMTV)[22] and Rous sarcoma virus (RSV)[23,24] INs cleave tDNA phosphodiester bonds that are separated by 6 bp, yielding 6 bp duplications.
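The relationship between the staggered tDNA cleavage and the resulting target site duplication can be illustrated with a minimal Python sketch. The function name `integrate`, the toy target sequence, and the placeholder provirus string are hypothetical and serve only as illustration; the stagger values are those cited above. The sketch assumes that the two vDNA 3’ ends join tDNA strands at positions offset by the stagger and that gap repair copies the intervening bases onto both flanks.

```python
# Stagger (bp) between the two tDNA cleavage sites, as cited in the text.
STAGGER_BP = {"PFV": 4, "HIV-1": 5, "MMTV": 6, "RSV": 6}


def integrate(target_top, cut, stagger, provirus_top):
    """Return (repaired top strand, target site duplication).

    target_top   -- top strand of the target DNA, written 5'->3'
    cut          -- index at which the top strand is cleaved
    stagger      -- bp separating the top- and bottom-strand cleavage sites
    provirus_top -- top strand of the inserted viral DNA (placeholder here)
    """
    if cut < 0 or cut + stagger > len(target_top):
        raise ValueError("staggered cut does not fit within the target")
    # The bases between the two cleavage sites are single-stranded on each
    # flank after strand transfer; gap repair fills them in, so they end up
    # on both sides of the provirus.
    duplication = target_top[cut:cut + stagger]
    left_flank = target_top[:cut + stagger]
    right_flank = target_top[cut:]
    return left_flank + provirus_top + right_flank, duplication


if __name__ == "__main__":
    toy_target = "ACGTTGCAAGCTTACG"   # hypothetical sequence
    toy_provirus = "TGxxxxCA"          # placeholder, not a real provirus
    for virus, stagger in STAGGER_BP.items():
        product, tsd = integrate(toy_target, 5, stagger, toy_provirus)
        print(virus, stagger, tsd, product)
```

In each product the duplicated bases appear immediately upstream and downstream of the placeholder provirus, and the length of the duplication equals the stagger.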
Figure 1
Integrase catalytic functions and intasome complexes. A multimer of integrase (IN) (depicted simply by blue oval) engages the end regions of the linear vDNA molecule (yellow), forming the stable synaptic complex (SSC). During 3’-processing, IN hydrolyzes the vDNA ends adjacent to invariant CA dinucleotides, revealing a set of reactive 3’-hydroxyl groups in the confines of the cleaved donor complex (CDC). After nuclear localization, the target capture complex (TCC) is formed upon tDNA (black) capture. Strand transfer, whereby IN employs the 3’ hydroxyl groups as nucleophiles to attack the tDNA, marks the transition to the strand transfer complex (STC).
RETROVIRAL IN DOMAIN ORGANIZATION
Retroviral IN proteins consist of approximately 275-470 amino acid residues. The INs to be discussed in detail in this review amount to 288 (HIV-1), 392 (PFV), 286 (RSV), and 319 (MMTV) residues[25-27]. Retroviral INs comprise three domains common to all genera: The N-terminal domain (NTD), the catalytic core domain (CCD), and the C-terminal domain (CTD)[28-32], which connect to one another via flexible linkers that vary in length across the viruses (Figure 2). The NTD adopts a helix-turn-helix fold and harbors two pairs of Zn2+-coordinating histidine and cysteine residues (HHCC motif), which are additionally conserved in retrotransposon INs and are involved in the recognition of the viral LTRs[30,31,33-35]. Accordingly, Zn2+ binding triggers HIV-1 IN multimerization and increases its catalytic activity[36,37]. The CCD adopts an RNase H fold and coordinates two Mg2+ ions via the invariant Asp and Glu amino acid residues that comprise the D,DX35E catalytic triad motif[28,29,38-40]. The coordination of Mg2+ ions chemically activates the nucleophiles for 3’-processing (water) as well as for strand transfer (3’-OH groups of vDNA) and destabilizes the respective scissile phosphodiester bonds[41-43]. The CTD is the least conserved among the shared IN domains; however, the tertiary structures of resolved CTDs show similar characteristics: They adopt a Src homology 3 fold[44,45], are involved in DNA binding[46], and promote IN multimerization[47,48]. Some retroviruses, namely spuma-, ε- and γ-retroviruses, harbor an additional NTD extension domain (NED) that precedes the NTD[26,40,49] and engages vDNA in the context of the intasome structure[40]. These IN proteins accordingly are larger than their lenti-, α-, β- and δ-retroviral cousins that lack the NED.
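The spacing encoded by the D,DX35E motif name (a first Asp, then a second Asp followed by exactly 35 residues and a Glu) can be searched for in a protein sequence with a short Python sketch. The function name `find_ddx35e` is hypothetical, and the permitted window between the two Asp residues (`min_gap`, `max_gap`) is an assumption chosen for illustration, since the motif itself does not fix that distance. The test sequence is a poly-Gly placeholder with acidic residues placed at positions 64, 116 and 152 (the HIV-1 IN triad numbering), not a real IN sequence.

```python
import re


def find_ddx35e(seq, min_gap=40, max_gap=60):
    """Return 1-based (Asp, Asp, Glu) positions matching D,DX35E.

    min_gap/max_gap bound the number of residues between the two Asp
    residues; these bounds are illustrative assumptions.
    """
    hits = []
    # The lookahead lets overlapping D-X35-E candidates be reported.
    for match in re.finditer(r"(?=D.{35}E)", seq):
        d2 = match.start()      # second Asp (0-based)
        e = d2 + 36             # Glu lies 35 residues past the second Asp
        first = max(0, d2 - max_gap - 1)
        for d1 in range(first, d2 - min_gap):
            if seq[d1] == "D":
                hits.append((d1 + 1, d2 + 1, e + 1))
    return hits


if __name__ == "__main__":
    # Placeholder sequence: Gly everywhere except the three acidic residues.
    placeholder = "G" * 63 + "D" + "G" * 51 + "D" + "G" * 35 + "E" + "G" * 20
    print(find_ddx35e(placeholder))
```

A hit reports candidate positions only; whether the residues actually form a catalytic triad depends on the fold of the surrounding domain.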
Figure 2
Integrase domain organization and representative secondary structures. Starting and ending residues for integrase (IN) domains are indicated above the boxes, and interdomain linker lengths as well as C-terminal tail lengths are indicated below the lines. Crystal structures of N-terminal domains (NTDs), catalytic core domains (CCDs), and C-terminal domains (CTDs) are provided underneath the corresponding schematic IN representation. Crystal structures in the absence of DNA are not available for the PFV NTD extension domain (NED), NTD, or CTD, as well as for the RSV NTD. PDB accession codes: HIV-1 (NTD, 1K6Y; CCD, 1BIU; CTD, 1EX4), PFV (CCD, 3DLR), RSV (CCD, 1C0M; CTD, 1C0M) and MMTV (NTD, 5CZ2; CCD, 5CZ1; CTD, 5D7U). HIV: Human immunodeficiency virus; PFV: Prototype foamy virus; RSV: Rous sarcoma virus; MMTV: Mouse mammary tumor virus.
MULTIMERIZATION OF IN
Numerous biochemical studies revealed that the active form of retroviral IN is a multimer that engages vDNA and tDNA in the confines of a nucleoprotein complex[50-58]. Bacteriophage Mu-mediated PCR footprinting of PICs extracted from infected cells revealed the protection of several hundred bp at the vDNA ends, and the associated complex was termed “intasome” to distinguish it from the larger PICs[59,60]. Subsequently, the intasome term was adopted by structural biologists who constructed and purified distinct, functional IN-DNA complexes capable of efficient concerted integration of two synapsed oligonucleotide vDNA ends, the structures of which were solved by X-ray crystallography[40,43,61-63] or single particle cryo-electron microscopy (EM)[26]. Although retroviral INs have been studied for decades, the 3-dimensional structures of PFV intasomes greatly aided the elucidation of the details of 3’-processing and strand transfer reaction mechanisms[40,43,62].
The “intasome” term today applies to the family of nucleoprotein complexes that are known to mediate retroviral DNA integration (Figure 1), which encompasses the stable synaptic complex (SSC)[43], the cleaved donor complex (CDC) or cleaved intasome[40,43], the target capture complex (TCC)[43,62], and the STC[62,63]. The SSC forms upon IN binding to the vDNA ends[5,40,55,57-60,64,65]. The hydrolytic cleavage of a di- or trinucleotide from each 3’-end marks the transition to the CDC. The TCC forms when the CDC engages tDNA, whereas the STC is formed when the vDNA ends are inserted into tDNA and thus strand transfer is completed[57,58,61-64,66]. Importantly, the PFV system has afforded high-resolution structures for each of these complexes[40,43,62,63].
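The ordered progression of intasome complexes described above can be summarized as a simple lookup of states and the events that connect them. The following Python sketch is only a bookkeeping aid; the names `Intasome`, `TRANSITIONS` and `pathway` are hypothetical, and the model deliberately ignores reaction reversibility and complex disassembly.

```python
from enum import Enum


class Intasome(Enum):
    SSC = "stable synaptic complex"
    CDC = "cleaved donor complex"
    TCC = "target capture complex"
    STC = "strand transfer complex"


# Each state maps to (event that leaves it, state that follows).
TRANSITIONS = {
    Intasome.SSC: ("3'-processing", Intasome.CDC),
    Intasome.CDC: ("tDNA capture", Intasome.TCC),
    Intasome.TCC: ("strand transfer", Intasome.STC),
}


def pathway(start=Intasome.SSC):
    """List (state, event, next state) triples from start to the STC."""
    steps = []
    state = start
    while state in TRANSITIONS:
        event, following = TRANSITIONS[state]
        steps.append((state.name, event, following.name))
        state = following
    return steps


if __name__ == "__main__":
    for state, event, following in pathway():
        print(f"{state} --{event}--> {following}")
```

Starting the walk at a later state, for example `pathway(Intasome.CDC)`, mirrors assembling a complex from pre-processed vDNA.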
APPROACHES TO STUDY THE THREE-DIMENSIONAL STRUCTURES OF RETROVIRAL IN PROTEINS
Mechanistic studies of retroviral DNA integration began in earnest in the late 1980s and early 1990s when PICs partially purified from infected cells were shown to promote correct DNA integration in vitro[5,14,60,67-70]. Nearly parallel biochemical studies, which utilized purified avian, murine, and human retroviral proteins, importantly showed that IN alone was sufficient to catalyze both the 3’-processing and strand transfer of recombinant vDNA substrates[71-77]. The results of these experiments opened up a new field dedicated to the structural and functional analysis of retroviral INs. Initial work on HIV-1 IN quickly revealed the relatively poor solubility of the full-length enzyme in vitro, which limited further biochemical and thus structural characterization[48]. To date, there is no high-resolution structure of a full-length retroviral IN in the absence of DNA, which is likely attributable to the inherent flexibility of the interdomain linkers[78-80]. In essence, the intertwined architecture of IN with vDNA in the context of the intasome complex necessarily “locks down” the inherently flexible enzymes, which afforded platforms for their detailed structural analyses.
STRUCTURES OF INDIVIDUAL DOMAIN AND TWO-DOMAIN IN CONSTRUCTS
Determining initial structures of individual IN domains or two-domain constructs in the absence of DNA (reviewed in[81,82]) (Figure 2) proved challenging in many cases, and was only possible for the HIV-1 IN protein with the help of solubility-enhancing mutations. By systematically replacing hydrophobic residues in the HIV-1 IN CCD, Phe185 was identified as a solubility-limiting residue[83]: Substituting either Lys or His for Phe185 dramatically improved HIV-1 IN protein solubility[48,84]. Although the F185K change enabled the determination of the HIV-1 CCD X-ray structure[85], it conferred a lethal viral phenotype due to deficiencies in virus particle assembly and reverse transcription in addition to defective vDNA integration[48]. Indeed, the vast majority of IN mutations elicit such pleiotropic defects on HIV-1 replication (reviewed in[86]). In contrast, the F185H change was tolerated by the virus, and importantly enabled crystallization of the CCD[84,87]. In both cases, the positively charged substituent residue reaches across the CCD dimerization interface and makes a hydrogen bond contact with the main chain carbonyl of Ala105 of the partner IN monomer. The F185K and F185H changes are therefore likely to dramatically increase HIV-1 IN solubility by removing a surface exposed aromatic residue that may nucleate protein aggregation, as well as by enhancing multimerization by adding two hydrogen bonds per IN dimer. Even greater solubility of the HIV-1 CCD was achieved by mutating the tryptophan at position 131 to glutamic acid in the context of the F185K change[88]. Simultaneous to the work on the HIV-1 IN CCD, the X-ray structure of the avian sarcoma virus (ASV) IN CCD was solved[89]. Of note, ASV IN harbors His at the position analogous to Phe185 in HIV-1 IN, and solubility-enhancing mutations accordingly were not required to crystallize the ASV IN CCD.
Subsequent structures of HIV-1[88,90-93] and ASV[94-97] IN CCDs elucidated binding sites for metal ion cofactors as well as for early inhibitors. NTD and CTD structures of HIV-1 IN and HIV-2 IN, which were solved by nuclear magnetic resonance spectroscopy[34,35,44,45,98,99], suggested functions of the NTDs in metal ion coordination and the binding of the CTD to vDNA, each of which is required for integration.
Two-domain IN constructs were initially studied as an approach to understand how individual IN domains might interact to form an active nucleoprotein complex. Three different CCD-CTD two-domain structures for HIV-1, simian immunodeficiency virus (SIV), and avian sarcoma-leukosis virus (ASLV) were reported in 2000[100-102]. Each construct harbored at least one solubility-enhancing mutation: C56S, W131D, F139D, F185K, and C280S for HIV-1, F185H for SIV, and F199K for ASLV, the latter two of which are analogous to the F185K change in HIV-1. The HIV-1 CCD-CTD structure revealed an extended α-helix for the CCD-CTD linker region[100]. Whereas the SIV IN CCD-CTD linker could not be traced[101], the RSV IN CCD-CTD linker had a rather extended, non-helical form compared to the HIV-1 IN CCD-CTD linker[102]. The appearance of three different linker configurations in three different IN CCD-CTD constructs led some to suggest that such configurations may result from crystal packing and therefore have limited physiological relevance[82].
Crystallization of an HIV-1 IN NTD-CCD construct was achieved in 2001 by including W131D, F139D and F185K solubility-enhancing mutations[103]. The resulting X-ray structure revealed possible interactions between two of the NTDs with two CCDs of opposing NTD-CCD molecules, which was of potential physiological relevance because prior biochemical studies had shown that the NTD functions in trans with the CCD[51-53]. However, the inability to trace the NTD-CCD linker regions limited the confidence of this interpretation.
Importantly, the domain sharing arrangement suggested by this structure was later confirmed by additional NTD-CCD structures and mutagenesis[104], and ultimately through the elucidation of intasome structures (see below).
LENTIVIRAL IN-LEDGF CO-CRYSTAL STRUCTURES
In the early 2000s, a host factor implicated in the nuclear retention of HIV-1 IN, lens epithelium-derived growth factor/transcriptional co-activator p75 (LEDGF/p75), was reported to increase the solubility of HIV-1 IN through its tight binding interaction[105-107]. LEDGF/p75 is a lentiviral-specific IN-binding protein[105,108,109] that tethers vDNA integration to transcriptionally active regions of the host genome (reviewed in[110,111]). LEDGF/p75 engages lentiviral IN via its C-terminally located IN-binding domain (IBD)[112]. Although the HIV-1 IN CCD was sufficient for LEDGF/p75 binding, the NTD was required for the high efficiency interaction[107]. LEDGF/p75 binding stabilizes lentiviral IN tetramerization[104,113], which is likely related to its ability to enhance the solubility of the viral proteins.
Crystal structures of lentiviral INs in complex with LEDGF/p75 include an HIV-1 IN F185K CCD construct[114] as well as HIV-2[115] and maedi-visna virus (MVV) IN NTD-CCD two-domain fragments[104]. Though HIV-2 and MVV INs harbor hydrophobic residues at the positions analogous to Phe185 in HIV-1 IN (Phe and Ile, respectively), the favorable solubility properties of lentiviral IN-LEDGF/p75 complexes likely obviated the need for solubility-enhancing mutations for the crystallization of these constructs. The LEDGF/p75 IBD is a PHAT domain composed of two helix-hairpin-helix motifs[116], with Asp366 at the tip of the N-terminal hairpin nestling into a binding cleft at the HIV-1 IN CCD dimerization interface and contacting the main chain amides of IN residues Glu170 and His171 via hydrogen bonds[114]. Potent anti-HIV compounds of a novel class, known as LEDGINs (LEDGF-IN inhibitors) or ALLINIs (allosteric IN inhibitors), structurally mimic the role of Asp366 in their binding to HIV-1 IN, which accounts for their ability to compete with LEDGF/p75 for binding to IN (reviewed in[117]).
The two-domain NTD-CCD constructs revealed the structural basis for the IN NTD interaction with the LEDGF/p75 IBD, which was ionic in nature. Interestingly, the polarities of the participating salt bridges were functionally reversible, such that HIV-1 particles carrying NTD reverse charge substitutions that were otherwise dead regained partial activity in the presence of the complementary reverse charge LEDGF/p75 partner protein[115,118].
PFV INTASOME STRUCTURE
Although the aforementioned individual and two-domain constructs provided insight into retroviral IN function, the field sorely required the structural determination of a functional intasome. The sole class of clinically approved HIV-1 IN inhibitors, known as IN strand transfer inhibitors (INSTIs), displays little if any binding affinity for free IN protein; their clinical target is the IN-vDNA complex[119]. Fortuitously, INSTIs are active against most types of retroviruses[120-122], so intasome structures derived from essentially any retroviral genus would in theory have provided a backdrop for understanding the structural basis for INSTI action and the clinical emergence of drug resistance.
Due to the poor solubility of HIV-1 and other early studied retroviral INs, the search for an enzyme with more favorable biochemical properties for in vitro experimentation and crystallography was initiated. Though early work had revealed that relatively short oligonucleotide substrates, which modeled the vDNA ends, supported IN 3’-processing and strand transfer activities[72-77], not all enzymes behaved similarly. Most critical for intasome structural biology was the ability of the IN multimer to coordinate the binding of two vDNA ends, and insert these in concerted fashion into opposing strands of tDNA. Critically, PFV IN was discovered to promote efficient concerted integration of oligonucleotide vDNA ends[121]. By contrast, HIV-1 IN had shown a tendency to insert just one vDNA end at a time[77]. Subsequent modifications of HIV-1 IN expression systems, including protein purification under relatively dilute conditions to prevent IN aggregation[123], or fusion of the small Sso7d DNA binding domain from Sulfolobus solfataricus to the IN N-terminus to mimic the NED that naturally exists in PFV IN[124], yielded proteins that supported efficient concerted integration activity.
Such modifications might eventually prove useful to characterize HIV-1 intasomes structurally[123,124].
Functional PFV-vDNA complexes assembled by differential salt dialysis migrated as a distinct species on gel filtration columns, and remained intact and active following challenge with high salt concentrations[40]. The initial X-ray crystal structure of the PFV intasome, representing the CDC, was reported in 2010[40] (Figure 3). To date, 37 PFV intasome structures composed of wild-type IN or mutant variants that contain clinically relevant amino acid substitutions have been solved by X-ray crystallography or cryo-EM, representing complexes in the presence of divalent metal ion cofactors, tDNA/nucleosomes, and INSTIs[40,43,62,63,125-130]. The INSTI-bound structures elucidated the mechanism of drug action: The halo-benzyl chemical group common to these compounds assumes the position of the invariant 3’ deoxyadenylate in vDNA with its critical hydroxyl group, thus ejecting the strand transfer nucleophile from the enzyme active site and disarming the nucleoprotein complex[40].
Figure 3
Comparison of prototype foamy virus, Rous sarcoma virus and mouse mammary tumor virus intasomes. A: The PFV intasome comprises two catalytic inner subunits (green and pink) and two outer supportive INs (cyan and light grey). Only the CCDs of the outer subunits are discernable in crystallographic electron density maps. RSV and MMTV share the PFV intasome core architecture and employ two additional flanking IN dimers (orange-purple and yellow-dark pink) to complete the intasome structures; B: Three-dimensional alignment of RSV (grey, PDB accession code 5EJK), PFV (yellow, PDB code 3L2Q), and MMTV (red, PDB code 3JCA) intasome structures was performed using Chimera. For the MMTV intasome, flanking dimers were unambiguously positioned into the intasome core of the cryo-EM map via rigid-body docking. The alignment reveals a high degree of flexibility (approximately 30-40 Å) for the flanking RSV and MMTV dimers relative to the common intasome core structures (arrows). PFV: Prototype foamy virus; RSV: Rous sarcoma virus; MMTV: Mouse mammary tumor virus.
The PFV intasome consists of a tetrameric assembly of IN arranged around a dimer-of-dimers architecture[40] (Figure 3). The inner dimer is composed of two inner monomers (green and pink in Figure 3A), whereas each outer dimer is composed of an inner IN monomer and an outer IN monomer (cyan-green and pink-light grey in Figures 3 and 4). The inner IN monomers make all contacts with vDNA and thus are the catalytic subunits, with each of their constitutive domains mediating vDNA in addition to IN-IN contacts. As previously alluded to, the catalytic subunits are established via a domain sharing mechanism whereby the NTD of each inner IN monomer interacts intimately with the CCD of the opposing IN monomer. The outer IN dimers center around the extensive CCD dimeric interface observed in prior retroviral IN CCD crystal structures[85,87-97].
The CTDs, NTDs, and NEDs of the outer IN monomers are not resolved in the electron density maps, and it is currently unclear what precise role(s) they may play in the catalysis of vDNA integration[131]. It is generally thought that the outer IN monomers mainly play a supportive architectural role to truss the inner IN monomers and vDNA together. As the outer CTDs can contribute to nucleosome binding in vitro[130], it seems possible they might play a role during vDNA integration into chromatinized templates, as occurs during virus infection. The NTD-CCD and CCD-CTD linkers, which are only visible for the inner IN monomers in the crystal structures, adopt extended conformations and contact the vDNA[40].
Figure 4
Integrase domain organizations within the prototype foamy virus, Rous sarcoma virus and mouse mammary tumor virus intasome structures. Separate integrase (IN) domains are labeled, with IN monomer coloring code retained from Figure 3. The green IN1 and pink IN3 monomers donate their active sites for catalysis of 3’ processing and strand transfer across the structures. Circled areas represent similarly positioned CTDs. While these emanate from inner IN1 and IN3 monomers in the PFV structure, they originate from flanking MMTV and RSV IN monomers IN6 and IN8. PDB accession codes same as in Figure 3. PFV: Prototype foamy virus; RSV: Rous sarcoma virus; MMTV: Mouse mammary tumor virus; CCD: Catalytic core domain; NTD: N-terminal domain; CTD: C-terminal domain.
MMTV AND RSV INTASOME STRUCTURES
MMTV and RSV intasome structures were recently solved using single-particle cryo-EM[26] and X-ray crystallography[61], respectively. The MMTV intasome was assembled using pre-processed 22 bp vDNA, and thus represents the CDC[26]. The RSV intasome structure by contrast is the STC, which was assembled using a so-called X-mer disintegration substrate[132] in which three oligonucleotide strands were annealed together to yield a synapsed complex composed of two 22 bp vDNA branches covalently linked through a central 6 bp stagger to a 38 bp palindromic tDNA[61]. The crystal structure of the PFV STC assembled with its analogous X-mer DNA substrate[63] was virtually identical to the structure that was solved when the CDC integrated into tDNA during crystallogenesis[62], validating the X-mer substrate design approach for RSV STC crystallography.
Although the tetrameric IN4-to-vDNA2 stoichiometry represented by the PFV intasome was generally thought to be evolutionarily conserved across the retroviruses[54,55,57,58], the intasome structures of MMTV and RSV strikingly revealed octameric IN assemblies[26,61] (Figures 3 and 4). The MMTV and RSV intasomes comprise a core density region consisting of IN dimers A and B, as well as flanking density regions that consist of IN dimers C and D (Figure 3). Analogous to the PFV structure, core inner IN monomers IN1 and IN3 intimately contact the vDNA ends and are catalytically active, with their NTDs reaching out and contacting the CCD of the opposing inner monomer (Figure 4). The core structure moreover is primarily supported through CCD dimerization interfaces with outer IN monomers IN2 and IN4. In contrast to PFV, flanking structures in the MMTV and RSV intasomes constitute additional IN dimers, which on their own multimerize primarily through the familiar CCD dimer interface[26,61]. The NTD-CCD linker that is extended in IN1 and IN3 to contact the opposing CCD in trans is contracted in the other IN monomers[26].
This observation highlights the necessity for NTD-CCD linker flexibility: Though principally contracted, it must also possess the ability to extend when situated at the IN1 and IN3 positions to support IN catalytic function. In hindsight, it is not surprising that the linker regions in the original HIV-1 IN NTD-CCD structure, which lacked LEDGF/p75 or DNA binding partners, were untraceable[103]. Of note, whereas the crystallographic PFV intasome structure is rather rigid, the flanking dimer regions in the MMTV and RSV intasome structures reveal significant flexibility (Figure 3B). As small angle X-ray scattering analysis of the PFV intasome revealed significant conformational flexibility for the outer subunit NEDs, NTDs and CTDs of the IN tetramer[79], it is tempting to describe retroviral intasomes as common rigid core structures surrounded by extraneous elements that, although likely to play physiologically relevant roles during virus infection, display marked movement as purified biochemical entities.
The most striking difference between the tetrameric and octameric IN assemblies is the unique function attributed to the CTDs of the flanking dimers in the MMTV and RSV intasome structures. While contacts with vDNA in the PFV structure are restricted to the inner IN1 and IN3 subunits, MMTV and RSV INs donate their CTDs in trans to the core region of the intasome[26,61]. The locations of six CTDs, including those of the flanking IN dimers, are conserved in the MMTV and RSV intasomes (Figure 4). The exclusive conformation of the CTDs allows them to tightly associate near the vDNA and assume positions resembling those of the inner PFV CTDs (Figure 4). Biochemical complementation assays revealed that the flanking MMTV IN dimers are crucial for IN catalytic function[26]. Intriguingly, the length of the CCD-CTD linker is quite variable among retroviral INs, amounting to about 50 residues for PFV, but only to 8 residues for MMTV and RSV IN[26] (Figure 2).
The extended conformation of the PFV IN 49-mer CTD-CCD linker affords the positioning of IN1 and IN3 CTDs to enable critical contacts with vDNA and tDNA during integration[40,62]. The analogous HIV-1 IN linker, composed of 20 residues (Figure 2), could be stretched to similarly position inner IN monomer CTDs in a molecular model of the HIV-1 intasome based on the PFV structure[133]. However, it is physically impossible for an 8 amino acid region to span the required distance. MMTV and RSV accordingly solve this conundrum by employing additional IN molecules to donate their CTDs to the required positions in the intasome core structure (Figure 4). Hence, retroviral IN CTD-CCD linker length is suggested to be a key determinant for the higher-order architecture of the respective intasome structures[26,61].
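The linker-length argument can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from the cited structural studies: it assumes a rise of about 3.6 Å per residue for a fully stretched polypeptide chain (values of roughly 3.3 to 3.8 Å are commonly quoted) and uses the approximate linker lengths given in the text (about 50 residues for PFV, 20 for HIV-1, 8 for MMTV and RSV). The names `EXTENDED_RISE_PER_RESIDUE_A`, `max_linker_span` and `linker_spans` are hypothetical. The result is only an upper bound on reach, since real linkers are rarely fully extended, and the actual distance to be spanned in each intasome is not computed here.

```python
# Upper-bound estimate of how far a CCD-CTD linker could reach if fully stretched.
# ASSUMPTION: ~3.6 Angstrom rise per residue for a fully extended chain;
# this is an illustrative value, not a number from the intasome structures.
EXTENDED_RISE_PER_RESIDUE_A = 3.6

# Approximate CCD-CTD linker lengths (residues) as stated in the text.
LINKER_LENGTHS = {"PFV": 50, "HIV-1": 20, "MMTV/RSV": 8}


def max_linker_span(n_residues, rise=EXTENDED_RISE_PER_RESIDUE_A):
    """Return the maximum end-to-end span (Angstrom) of a fully extended linker."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * rise


def linker_spans(lengths=LINKER_LENGTHS):
    """Map each virus label to the upper-bound span of its linker in Angstrom."""
    return {name: max_linker_span(n) for name, n in lengths.items()}


if __name__ == "__main__":
    for name, span in linker_spans().items():
        print(f"{name}: at most ~{span:.0f} Angstrom when fully extended")
```

Under these assumptions, the 8-residue linker can reach less than one sixth of the distance accessible to the PFV linker, which is consistent with the need for additional IN molecules to supply CTDs to the intasome core.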
TARGET DNA BINDING AND STRAND TRANSFER
The integration of vDNA into tDNA does not occur randomly in the host genome, with integration site selection preferences varying among retroviruses (recently reviewed in[134]). Whereas HIV-1 and other lentiviruses favor integration into highly expressed genes that are rich in introns[135-139], the spumavirus PFV avoids active gene regions[130,140,141]. Interestingly, α- and β-retroviruses such as RSV and MMTV are the least selective in integration site selection, displaying patterns that much more closely approach random[138,142,143].

The co-crystallization of the PFV intasome with an oligonucleotide tDNA[62] derived from the PFV consensus integration sequence[121,140] elucidated the mechanism of strand transfer. Whereas crystallization in the absence of Mg2+ or the presence of a dideoxy viral 3’ end led to high-resolution TCC structures, the addition of Mg2+ with the normal vDNA end afforded integration during crystallogenesis, yielding the first high-resolution structure of a retroviral or bacterial transposon STC[62] (Figure 5). The tDNA adopted a highly bent conformation at the PFV DNA insertion sites, with the major groove widened to 26.3 Å and the minor groove compressed to 9.6 Å[62]. This conformation enables the accommodation of the inner IN1 and IN3 D,DX35E catalytic triads to the scissile phosphodiester bonds of tDNA, thus promoting integration[62]. As SN2 transesterification reactions are iso-energetic, they have the potential to reverse direction if chemical leaving groups remain associated with the catalytic active site. Following strand transfer, the newly formed phosphodiester bonds are displaced by 2.3 Å from the IN active sites, effectively suppressing the probability for strand transfer reversal[62]. A similar displacement is described for the tDNA in the RSV intasome[61].
Figure 5
Superposition of prototype foamy virus strand transfer complex, Rous sarcoma virus strand transfer complex, and mouse mammary tumor virus cleaved donor complex structures with respect to vDNA and tDNA. For simplicity, integrase content is either partially transparent or omitted. Color-coding is as follows: PFV DNA: Black; RSV DNA: Grey; MMTV DNA: Purple. 90° rotations show different angles of the intasomes. PDB accession codes: PFV STC: 3OS0, RSV STC: 5EJK, MMTV CDC: 3JCA. PFV: Prototype foamy virus; RSV: Rous sarcoma virus; MMTV: Mouse mammary tumor virus; STC: Strand transfer complex; CDC: Cleaved donor complex.
Early studies revealed that retroviral INs prefer chromatinized tDNA templates over naked DNA for integration in vitro[144-147], and subsequent work revealed the propensity to similarly target nucleosomes during virus infection[135,148]. PFV IN prefers relatively flexible tDNA sequences for integration[62], and a cryo-EM structure of an intasome-nucleosome complex revealed the same degree of local tDNA distortion can occur on the nucleosome surface during PFV integration[130]. Since spumaviral INs cleave tDNA with a 4 bp stagger, it has been suggested that the degree of tDNA kink has to be greater for spumaviral integration than for viruses that cleave target DNA with a 6 bp stagger, such as the α- and β-retroviruses[149]. Although the IN1 and IN3 catalytic triads of the PFV and MMTV intasomes are superimposable, modeling of tDNA into the MMTV CDC revealed a relatively unbent conformation to accommodate a 6 bp staggered cut as compared to the highly bent tDNA conformation for the 4 bp stagger in the PFV TCC[26] (Figure 5).

The RSV STC harbors a highly bent tDNA conformation despite the fact that RSV IN cleaves the DNA with a 6 bp stagger (Figure 5). Kinks located at the vDNA/tDNA junctions of the RSV intasome provoke a 20 Å shift in the helical axis, leading to an overall tDNA twist[61].
Hence, the tDNA conformation in the RSV STC differs significantly from the tDNA conformation in the PFV structure (Figure 5) and also from the relatively unbent tDNA conformation in the MMTV TCC model[26]. These observations suggest potentially different modes of integration into nucleosomal DNA among the studied viruses. Whereas PFV and RSV are predicted to target chromatinized DNA during virus infection, MMTV was unique in a study of 10 exogenous retroviruses for its apparent avoidance of nucleosomal DNA in vivo[149]. Elucidation of high-resolution MMTV STC/TCC structures will help to illuminate the degree of tDNA bending that occurs during MMTV integration.

Based on the variety of tDNA structural properties that influence target site selectivity, including bendability[62,149-152], major groove widening, and nucleosomal packaging[144-147,153,154], retroviral INs can be classified as shape-readout DNA binding proteins[155]. DNA minicircles, which mimic nucleosome-induced tDNA circularization in the absence of histones, represent a relatively new tool to tease out physiologically relevant influences of tDNA structure on integration site selectivity in vitro, and the roles of IN-binding cofactors such as LEDGF/p75[156].
CONCLUSION
The study of retroviral integration has come a long way since its beginnings in the late 1970s. The relatively large repertoire of individual and two-domain retroviral IN structures that were solved initially has since expanded to a set of high-resolution intasome structures, including those from the spumavirus PFV, β-retrovirus MMTV, and α-retrovirus RSV. To date, the plethora of PFV intasome structures represents a remarkable advance for the field of retroviral integration. Not only have they elucidated the mechanism of INSTI action, they also provide high-resolution structures of the entire set of complexes (SSC, CDC, TCC and STC) that mediate retroviral DNA integration[40,43,62,63].

The recently emerged octameric intasome structures of MMTV[26] and RSV[61] reveal an unexpected evolutionary diversity among retroviruses. As the intasomal core is conserved among the three studied retroviruses, the utilization of flanking dimers to complete the functional MMTV and RSV intasome structures represents a remarkable example of convergent evolution of the DNA integration apparatus[26,61]. Considering CCD-CTD linker length as a predictor of the state of IN multimerization within functional intasomes[26], it remains to be investigated whether retroviral INs with intermediary linker lengths, including those of HIV-1 and the δ-retrovirus human T-cell lymphotropic virus, which harbor 20 and 19 residues, respectively[26], will reveal tetrameric, octameric, or perhaps even higher-order IN assemblies.

Motivated by the recent advances in the intasome field, new IN-DNA complexes are currently being investigated in various laboratories, including those derived from lentiviruses. The emergence of new three-dimensional intasome structures will help to model novel interactions between HIV-1 IN and DNA, and thus should reveal new insights into the mechanisms of emergence of drug resistance to clinical INSTIs.