Paul Lesbats (1), Alan N. Engelman (2), Peter Cherepanov (1,3). (1) Clare Hall Laboratories, The Francis Crick Institute, Blanche Lane, South Mimms, EN6 3LD, U.K. (2) Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute and Department of Medicine, Harvard Medical School, 450 Brookline Avenue, Boston, Massachusetts 02215, United States. (3) Imperial College London, St-Mary's Campus, Norfolk Place, London, W2 1PG, U.K.
Abstract
The integration of a DNA copy of the viral RNA genome into host chromatin is the defining step of retroviral replication. This enzymatic process is catalyzed by the virus-encoded integrase protein, which is conserved among retroviruses and LTR-retrotransposons. Retroviral integration proceeds via two integrase activities: 3'-processing of the viral DNA ends, followed by the strand transfer of the processed ends into host cell chromosomal DNA. Herein we review the molecular mechanism of retroviral DNA integration, with an emphasis on reaction chemistries and architectures of the nucleoprotein complexes involved. We additionally discuss the latest advances on anti-integrase drug development for the treatment of AIDS and the utility of integrating retroviral vectors in gene therapy applications.
More than 100 years ago, the Danes V. Ellerman and O. Bang and
the American P. Rous passaged oncogenic variants of what is known
today as the avian sarcoma-leukosis virus (ASLV).[1,2] The
importance of retroviruses in biology and medicine has greatly increased
over the past 50 years with two major milestones: first, in 1981,
the isolation of the human T-lymphotropic virus 1 (HTLV-1)[3] and, soon after, the discovery of human immunodeficiency
virus type 1 (HIV-1),[4,5] which is responsible for one of
the most dramatic pandemics in recent history. The flurry of high-octane
research, initially driven by the suspected role of retroviruses in
human cancer and later by the acquired immunodeficiency syndrome (AIDS)
pandemic, yielded a plethora of discoveries and tools to bolster all
disciplines of biology.[6] It would be hard
to imagine cancer biology without the concept of the oncogene or molecular
biology without reverse transcriptase (RT).
Retroviridae is a large viral family comprising
seven genera: α- through ε-retroviruses, lentivirus, and
spumavirus (Table 1). HTLV-1 and HIV-1 (along with their respective types) belong to
δ-retrovirus and lentivirus genera, respectively. Several other
retroviral species gained prominence as research models, for historical
reasons or as animal pathogens. These include ASLV (an α-retrovirus),
mouse mammary tumor virus (MMTV, a β-retrovirus), murine leukemia
virus (MLV, a γ-retrovirus), simian immunodeficiency viruses
(SIVs, lentiviruses highly related to HIV-1 and HIV-2), feline immunodeficiency
virus (FIV, a lentivirus), and the prototype foamy virus (PFV, a spumavirus).
Integration, which yields the establishment of the obligatory proviral
state,[7] is the one feature that distinguishes
retroviruses from all other viral families. Herein, we present state-of-the-art
interpretations of the structure of retroviral integrase (IN), the
essential enzyme responsible for this process, as well as the role
of IN in virus replication. Due to the conservation among IN proteins
from different retroviral species, we will refer to them collectively
as retroviral IN, except when discussing aspects that may be relevant
to a particular retroviral genus or species.
Table 1. Classification of Retroviruses
subfamily | genus | species examples | duplication size (bp) [a]
orthoretrovirinae | α-retrovirus | avian sarcoma-leukosis virus (ASLV) | 6
 | β-retrovirus | mouse mammary tumor virus (MMTV) | 6
 | | jaagsiekte sheep retrovirus (JSRV) |
 | | human endogenous retrovirus K (HERV-K) [b] |
 | γ-retrovirus | murine leukemia virus (MLV) | 4 or 5
 | | feline leukemia virus (FeLV) |
 | | reticuloendotheliosis virus strain A (RevA) |
 | | human endogenous retrovirus H (HERV-H) [b] |
 | δ-retrovirus | human T-lymphotropic viruses 1 and 2 (HTLV-1,2) | 6
 | | bovine leukemia virus (BLV) |
 | ε-retrovirus | walleye dermal sarcoma virus (WDSV) | ?
 | lentivirus | human immunodeficiency viruses 1 and 2 (HIV-1,2) | 5
 | | maedi-visna virus (MVV) |
 | | equine infectious anemia virus (EIAV) |
 | | feline immunodeficiency virus (FIV) |
spumaretrovirinae | spumavirus | prototype foamy virus (PFV) | 4
 | | bovine foamy virus (BFV) |
 | | feline foamy virus (FFV) |
[a] Size of target DNA duplications flanking integrated proviruses.
[b] Extinct as exogenous species.
IN and the Retroviral Life Cycle
Replication via formation
of a stable DNA form makes retroviruses
particularly amenable to reverse genetics. Accordingly, functions
of retroviral gene products have been extensively probed through mutagenesis.
In early studies, IN was identified as the protein product encoded
within the 3′ portion of the retroviral pol gene that was essential for efficient retroviral replication and
integration.[8−11] Reverse transcription of the diploid retroviral RNA genome results
in the formation of a linear double-stranded viral DNA (vDNA) molecule
carrying a copy of the long terminal repeat (LTR) sequence at either
end.[12−15] The vDNA molecule exists in the form of a preintegration complex
(PIC)[16,17] that remains poorly characterized biophysically
because it forms at very low abundance, ca. one copy per cell,
during acute virus infection. Nevertheless, PICs have been reported
to contain a number of cellular and viral proteins, most notably IN.[18−26] Once the PIC gains access to the nuclear compartment, the vDNA ends
are inserted into a cellular chromosome. This step, initiated by the
enzymatic action of IN and completed by the host cell DNA repair machinery,
is a point of no return: the cell becomes a permanent carrier of the
integrated viral genome, which is referred to as the provirus.
In addition to this well-established role, IN may perform a range
of less characterized functions in retroviral replication, as suggested
by its unusually complex genetics (reviewed in ref (27)). For instance, disruption
of the IN coding portion of the HIV-1 pol gene can
lead to production of viral particles with aberrant morphology and
severe defects in reverse transcription.[28−31] In fact, only a minority of HIV-1
IN mutants display defects solely at the integration step of the viral
life cycle. Such mutants, which include amino acid substitutions within
the IN active site, were collectively categorized as class I mutants.[32] The hallmark of the associated phenotype is
the predictable accumulation of nonintegrated forms of vDNA, including
a circular form that contains two abutted copies of the LTR (2-LTR
circles). Conversely, class II HIV-1 IN mutants disrupt viral replication
at multiple steps while usually retaining at least partial IN enzymatic
activity in vitro.[33−36] The pleiotropic effects observed with class II HIV-1 IN mutants
range from disrupted virion assembly to apparent nuclear import defects.[30,33,34,37−41] Most notably, class II IN mutants typically show reduced levels
of reverse transcription.[27] The abundance
of HIV-1 IN mutations with pleiotropic phenotypes is a strong indication
that the protein may play critical roles in the viral lifecycle outside
of the integration step. Accordingly, HIV-1 IN was shown to interact
with the viral RT and influence its activity in vitro.[42−44] More recent work with allosteric IN inhibitors (described at length
below) has highlighted a direct role for IN in HIV-1 particle maturation.[45−47] Among the esoteric functions of HIV-1 IN, its proposed involvement
in PIC nuclear import has been the subject of considerable and yet
to be resolved debate.[34,40,41,48−54]
Enzymatic Steps in Retroviral DNA Integration
Reactions Catalyzed by IN
Biochemical
studies of retroviral DNA integration began with partial purification
of enzymatically competent PICs from acutely infected cells. Such
preparations can catalyze vDNA integration into exogenous DNA in vitro,
and the reaction products can be detected and quantified by Southern
blotting or PCR.[16,55−58] During infection, a considerable
fraction of vDNA becomes circularized[59−64] and, perhaps owing to the circular nature of the bacteriophage λ
DNA substrate for integrative recombination,[65] initial studies proposed that retroviral integration proceeded through
the 2-LTR circular DNA form.[66] However,
subsequent experiments using native MLV PICs demonstrated that it
is the linear vDNA that serves as the immediate precursor for integration.[67,68] These landmark studies also established two activities associated
with the PICs—3′-processing and strand transfer[67−70]—and retroviral IN was shown to be sufficient to catalyze
each of these reactions in vitro.[71−77]
Retroviral IN bears no similarity to its namesake from λ
phage and is instead closely related to the DD(E/D) family of DNA
transposases.[78] Crucially, the DNA cutting
and strand transfer reactions catalyzed by IN and transposases proceed
through phosphodiester transesterification, without formation of covalent
protein–DNA intermediates.[79−81] However, unlike transposases,
retroviral IN requires a prelinearized DNA molecule—the product
of reverse transcription—and cannot act on an integrated molecule
that is flanked by continuous host DNA sequences. Therefore, while
the active site of a prokaryotic cut-and-paste transposase must carry
out four consecutive reactions,[82,83] IN needs to accomplish
only two, 3′-processing and strand transfer, equivalent to
the first and the final steps in DNA transposition, respectively.
These reactions are carried out by a multimer of IN assembled on vDNA
ends, referred to as the intasome (also known as the stable synaptic
complex), at the business end of the PIC (Figure 1a).[17,84,85] During 3′-processing, IN hydrolyzes a phosphodiester bond
at each vDNA end, removing a di- or trinucleotide and liberating 3′-hydroxyl
groups attached to invariant 5′-CA-3′ dinucleotides.
The intasome can then bind host chromosomal DNA, forming the target
capture complex (TCC). Within the TCC, the enzyme utilizes vDNA 3′-hydroxyls
as nucleophiles to cut host DNA in a staggered fashion, simultaneously joining
both 3′ vDNA ends to apposing strands of host DNA. The postcatalytic
complex, referred to as the strand transfer complex (STC), is subject
to disassembly, which is likely accomplished by host cell machinery.
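The 3′-processing step described above can be illustrated with a minimal Python sketch. It treats the transferred strand of one vDNA end as a plain string written 5′ to 3′, locates the last CA dinucleotide, and removes the two or three nucleotides that follow it. The function name and the example sequence are hypothetical and serve only as an illustration; the sketch ignores the complementary strand and all of the protein–DNA contacts that define the real reaction.

```python
def three_prime_process(strand: str) -> tuple[str, str]:
    """Trim a vDNA 3' end (written 5'->3') back to the invariant CA.

    Returns (processed_strand, released_nucleotides).
    Toy model only: a real intasome recognizes far more than the CA.
    """
    seq = strand.upper()
    idx = seq.rfind("CA")  # position of the last CA dinucleotide
    if idx == -1:
        raise ValueError("no CA dinucleotide found in the strand")
    released = seq[idx + 2:]  # nucleotides 3' of the CA
    if len(released) not in (2, 3):
        # The text describes removal of a di- or trinucleotide only.
        raise ValueError("expected 2 or 3 nucleotides after the last CA")
    return seq[:idx + 2], released


# Hypothetical end sequence, for illustration only.
processed, released = three_prime_process("ACGTTAGCAGT")
print(processed)  # ACGTTAGCA
print(released)   # GT
```

The processed strand ends in CA, whose 3′-hydroxyl is the nucleophile used in the subsequent strand transfer step.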
Figure 1. Retroviral DNA integration pathway: reactions catalyzed by IN (a,
blue arrows) and host cell enzymes (b, red arrows). The intasome contains
a multimer of IN (gray oval) assembled on vDNA ends. Following 3′-processing
and nuclear entry, the cleaved intasome complex engages cellular chromosomal
DNA, forming the TCC. Insertion of the 3′ vDNA ends into host
DNA results in formation of the STC with hemi-integrated vDNA. Formation
of the stable provirus further requires disassembly of the STC and
repair of the strand discontinuities flanking vDNA by the sequential
actions of a DNA polymerase, 5′-flap endonuclease, and DNA
ligase. Red dots represent magnesium ions in the intasome active sites.
The fact that 3′-processing
of the vDNA can occur in the
cell cytoplasm[21] presents a potential problem
for retroviruses. The high local concentration of vDNA in the confines
of the PIC makes it a potential target for strand transfer, and products
of viral self-integration—called autointegration—are
readily detectable in infected cells.[86−88] It is obviously in the
best interest of retroviruses to avoid autointegration, and various
mechanisms have been identified. The barrier-to-autointegration factor
(BAF), a small DNA-binding protein that has the unusual ability
to bridge and condense separate DNA molecules,[89,90] was identified via its ability to suppress the autointegration activity
of MLV PICs in vitro.[91] However, it has
yet to be determined whether BAF provides this function during MLV
infection, and the lentiviruses seem to utilize other protective measures.
The SET complex, which harbors a variety of DNA metabolizing enzymes,
can suppress autointegration during HIV-1 infection,[92] while the viral capsid (CA) protein has been shown to play
a role in regulating SIV autointegration.[88] Additional work should help to clarify whether there might be a
universal mechanism or if, indeed, different viral species have evolved
unique ways to protect themselves from suicidal integration as they
move through the cell toward their preferred chromosomal DNA targets.
Recombinant IN proteins are generally proficient in supporting
robust 3′-processing and strand transfer activities on short
double-stranded oligonucleotide substrates that represent the U5 or
U3 vDNA ends at the tips of the upstream and downstream LTRs, respectively.[71−74,76,77,93] However, depending on the viral source and
the method of enzyme preparation, unpaired insertions of vDNA ends
into target DNA often account for the bulk of the strand transfer
products formed. HIV-1 IN is particularly prone to such aberrant strand
transfer events (referred to as half-site integration). Carefully
optimized HIV-1 IN in vitro assays developed over the past 15 years
greatly improved the efficiency of paired integration of vDNA ends
(i.e., full-site or concerted integration).[94−101] Conversely, some recombinant IN proteins, in particular PFV IN,
are far more proficient at full-site strand transfer.[102−104] The underlying reasons for these differences between divergent IN
proteins are not fully understood. The propensity of HIV-1 IN to self-associate
into higher-order multimers in solution[105] could be a factor limiting its concerted integration activity in
vitro. A third type of reaction that IN can catalyze in vitro is disintegration,
representing a reversal of strand transfer.[106] Although this reaction is unlikely to be relevant in vivo, disintegration
assays were instrumental in mapping the different IN protein domains
and the enzyme active site.[107−115] Having evolved to catalyze a single reaction cycle, retroviral IN
and transposases do not effectively dissociate from their reaction
products, requiring active assistance from their host cells to disassemble
the STC.[116,117] Because they do not turn over,
these enzymes are used in stoichiometric and often superstoichiometric
ratios to their DNA substrates.
Postintegration DNA Repair
Having executed concerted strand transfer to join both 3′ vDNA ends
to host DNA, IN leaves a pair of single-stranded gaps and short 5′
overhangs flanking the vDNA, which are repaired by cellular enzymes
(Figure 1b; reviewed
in ref (118)). Notwithstanding
their importance, the final steps of integration are among the least
studied aspects of retroviral replication. Because disorderly repair
of these discontinuities may lead to disruption of proviral and chromosomal
DNA integrity, the handover of the retroviral hemi-integrant to the
host DNA repair machinery is likely an exquisitely choreographed process.
Disassembly of the thermodynamically stable STC must be the
first step toward repair of the vDNA–chromosome junctions.
Active degradation of IN/transposase, which is well-established in
the bacteriophage Mu transposition system,[116,117] may allow host cellular proteins to gain access to the sites of
repair. Consistent with this model, HIV-1 IN is subject to ubiquitination
and proteasome-dependent degradation when it is ectopically expressed
in human cells.[119−121] RAD18, an E3 ubiquitin ligase involved in
DNA gap repair, was shown to interact with HIV-1 IN.[122] However, a follow-up study observed that knockdown of RAD18
in target cells did not reduce HIV-1 infectivity.[123] Subsequently, von Hippel–Lindau binding protein
1, a cellular subunit of the prefoldin chaperone, was implicated in
proteasome-mediated HIV-1 IN degradation and, moreover, was shown
to be required for efficient viral replication.[124]
Following STC disassembly, three separate DNA repair
enzymatic activities are required to complete the integration process
by joining 5′ vDNA ends to chromosomal DNA: a DNA polymerase,
a 5′ flap endonuclease, and a ligase (Figure 1b). Early studies used biochemical approaches
to test candidate cellular DNA repair enzymes in the repair of retroviral integration
products in vitro.[125−128] Synthetic model DNA containing a gap and 5′ overhang served
as a template for testing candidate proteins. In these experiments
a cocktail of host DNA repair proteins was challenged to polymerize
across the gap, to cleave the two-nucleotide flap, and to ligate the
resulting product. Base excision repair (BER) pathway enzymes DNA
polymerase β, flap endonuclease 1 (FEN1), and ligase I were
shown sufficient to repair the substrate.[125] DNA polymerase δ also supported repair in the presence of
FEN1 and ligase I and was stimulated by its cofactor PCNA. Alternatively,
viral RT and IN have been proposed to mediate postintegration repair.[106] In this model, RT polymerizes across the gap
sequence, while 5′ flap removal and ligation of both strands
are catalyzed by IN disintegration activity. However, RT- and IN-dependent
gap repair would require a complicated juggling of the vDNA ends between
the two viral enzymes and, more problematically, would involve assembly
of an unusual complex with 5′ vDNA and freshly extended 3′
chromosomal DNA ends in the IN active site. Even though RT is capable
of polymerizing through the gap in vitro, the reaction displayed poor
fidelity.[125] Moreover, addition of IN to
the reaction did not afford completion of the repair process.[125,127] Although HIV-1 IN interacts with FEN1[129] and can stimulate its activity in vitro,[126] convincing evidence for a role of IN in repair of the hemi-integrant
during retroviral infection is lacking. Cell-based studies have by
contrast highlighted a role for the BER pathway of oxidative DNA damage
in lentiviral DNA integration.[130,131] Other studies have
reported that alteration of the nonhomologous end-joining (NHEJ)
pathway affects retroviral infection and survival of the infected
cells.[22,132−134] While some have shown
a role for active IN and, by extension, the vDNA hemi-integrant, in
NHEJ activation,[132,134] a separate study concluded that
the linear vDNA substrate was cytotoxic to cells.[22]
As a result of integration across the major groove in target DNA
and the subsequent gap repair, the proviral DNA is flanked by a short
duplication of the target DNA sequence. The duplication size is virus-specific
and ranges from 4 bp for MLV and PFV to 6 bp for HTLV-1, ASLV, and
MMTV, while HIV-1 and other characterized lentiviruses generate 5-bp
duplications (Table 1).[63,135−146]
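How a staggered cut followed by gap repair produces the flanking duplication can be shown with a short Python sketch. It models only the top strand of the target as a string: the bases lying between the two staggered cut positions remain single-stranded on either side of the vDNA after strand transfer, and filling both gaps leaves one copy of them on each flank. The function name, the target sequence, and the lowercase placeholder provirus are assumptions made for illustration; the duplication sizes are those quoted in the text.

```python
# Target site duplication sizes (bp) as quoted in the text above.
DUPLICATION_BP = {"PFV": 4, "MLV": 4, "HIV-1": 5,
                  "HTLV-1": 6, "ASLV": 6, "MMTV": 6}


def integrate(target: str, provirus: str, position: int, virus: str) -> tuple[str, str]:
    """Return (repaired top strand, duplicated target sequence).

    Toy model: 'position' is the first base of the staggered site on
    the top strand; gap repair is assumed to be complete and error-free.
    """
    stagger = DUPLICATION_BP[virus]
    if position < 0 or position + stagger > len(target):
        raise ValueError("integration site does not fit within the target")
    duplicated = target[position:position + stagger]
    # Upstream flank keeps the site; downstream flank starts with it again.
    product = target[:position + stagger] + provirus + target[position:]
    return product, duplicated


# Hypothetical target and placeholder provirus (lowercase), HIV-1 stagger.
product, dup = integrate("AAGGCTTACCGG", "tgnnnca", 3, "HIV-1")
print(product)  # AAGGCTTAtgnnncaGCTTACCGG
print(dup)      # GCTTA
```

Changing the `virus` argument changes only the length of the repeated flank, which mirrors the virus-specific spacing between the two strand transfer positions.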
Structure of Retroviral IN
Domain Organization
IN proteins contain
three conserved structural domains common to all retroviral genera:
the N-terminal domain (NTD), the catalytic core domain (CCD), and
the C-terminal domain (CTD) (Figure 2a). Initially identified by limited proteolysis of
HIV-1 IN,[108] the domains were characterized
in isolation using X-ray crystallography and/or nuclear magnetic resonance
(NMR) spectroscopy (Figure 2a).[147−153] The NTD folds into a compact three-helical bundle stabilized by
coordination of a Zn2+ ion by side chains of His and Cys
residues comprising the invariant HHCC motif.[147,150] Often misidentified as a “zinc finger domain”, the
NTD is most closely related to helix–turn–helix DNA
binding domains.[154] In spumaviral and ε-
and γ-retroviral INs, the NTD is expanded by a ∼40 amino
acid residue NTD extension domain (NED).[85,155] The CTD is the least conserved of the three canonical IN domains
and features a Src homology 3 (SH3)-like β-barrel fold.[149,151] Although most structurally related to the Tudor family of chromatin
binding domains, the IN CTD lacks the conserved hydrophobic cage used
by Tudor domains to bind methylated Lys and Arg residues. During integration,
the NTD and the CTD make important interactions with DNA substrates
and play critical structural roles within the intasome assemblies.[85,156,157] The flexible linkers connecting
these domains to the CCD lack sequence conservation and vary in length
between retroviral genera.[155,158]
Figure 2. Domain organization of HIV-1 IN. (a) Schematic of the protein sequence
with the NTD, CCD, and CTD shown as boxes (top). Structures of the
individual domains determined in isolation (from left to right PDB
IDs 1WJC, 1ITG, and 1IHV). Protein chains
are shown as cartoons, with zinc-coordinating residues His12, His16,
Cys40, and Cys43 and active site residues Asp64 and Asp116 shown as
sticks with carbon atoms in red. (b) Structure of a two-domain HIV-1
IN construct (PDB ID 1K6Y). Details of the key NTD–CCD interface are shown as a blown
up image to the right. Selected amino acid residues are indicated.
Dashes represent hydrogen bonds; gray spheres are Zn2+ ions.
All structural images in this review were prepared using PyMOL software
(http://www.pymol.org).
The CCD harbors the active site of the enzyme, composed of three
invariant carboxylates comprising the signature D,D-35-E motif.[159] In isolation, the CCD can catalyze disintegration,
but neither of the biologically relevant activities of full-length
IN.[112−114] Determined over 2 decades ago, the crystal
structure of the HIV-1 IN CCD revealed a protein fold shared by a
superfamily of nucleotidyltransferases, which notably includes bacterial
and viral RNase H enzymes, Holliday junction resolvase RuvC, prokaryotic
and eukaryotic DD(E/D) transposases, and RAG1 recombinase.[148,160] This initial study and the majority of subsequent IN crystal structures
captured the CCD in a recurring dimeric form (Figure 2a). Given the wide separation of IN active
sites in the CCD dimer, it became immediately clear that a higher-order
IN multimer must be responsible for concerted integration of vDNA
ends. Similarly, four copies of the mechanistically related Mu phage
transposase assemble into its core synaptic complex, with only two
of the four protomers contributing active sites for the catalysis
of DNA cleavage and strand transfer.[161−163] Of note, the HIV-1
virion packages as many as ∼250 copies of IN,[164,165] a large excess over the two active sites that are required to catalyze
integration.
Over the last 20 years, considerable effort has been expended to
establish the functional multimeric state of retroviral IN. The bulk
of these studies has focused on HIV-1 IN, which was shown to form
a variety of multimeric species.[166−172] Unfortunately, it is not obvious how the tendency of IN to aggregate
in solution relates to its active forms in complex with vDNA. Indeed,
well-studied transposases tend to form functional multimers only when
bound to their cognate DNA substrates,[173−175] which is explained
by the involvement of DNA in the synaptic interfaces.[175−177] Due to perplexing difficulties in obtaining soluble intasome preparations,
the functional state of HIV-1 IN has been particularly hard to address.
The tetrameric form of HIV-1 IN has so far received the strongest
experimental support,[169,170] although evidence for octameric
states was also reported.[178,179] The breakthrough came
from recent studies of nonlentiviral INs, which fortuitously allowed
assembly of monodisperse preparations of the intasome complexes in
vitro. Surprisingly, structural characterization of three such intasomes
established that the functional multimeric states of IN proteins are
not conserved among retroviral genera. Thus, whereas PFV IN is monomeric
in solution and forms a tetramer within the intasome,[85,103,180] MMTV and ASLV INs are dimeric
proteins that assemble into octamers on vDNA ends.[155,181]
Initial hints for the structural organization of the different
IN protomers within functional intasomes came from results of biochemical
complementation studies.[168,182−184] Mutant IN proteins that by themselves were defective for catalysis
in vitro could recover significant levels of activity when studied
as a mixture. In this way, IN NTD and CCD functions were shown to
originate from different protomers within the multimeric complex.
The first glimpse into the structural basis for IN multimerization
beyond the canonical CCD dimerization was provided by the crystal
structure of a two-domain construct spanning the NTD and the CCD of
HIV-1 IN.[185] The structure revealed how
pairs of IN dimers can interlock via formation of cross-dimer NTD–CCD
interfaces (Figure 2b). Subsequently validated by site-directed mutagenesis, the critical
NTD–CCD interface, predicted from prior biochemical studies,
became the second recurrent feature in structures of diverse INs and
later intasomes.[85,155,157,181,186] Further details on the partial structures of the preintasome era
in retroviral IN structural biology can be found in recent reviews.[158,187,188]
Architecture of the PFV Intasome
The first functional retroviral IN–DNA complex found to be
amenable to structural characterization was the intasome from the
spumavirus PFV.[85] Identified through comparative
analysis of diverse orthologs, PFV IN was shown to be highly soluble
and exceptionally active on short mimics of vDNA ends.[103,189] The relatively simple architecture of the PFV intasome will provide
a basis to describe the structures of the more complex α- and
β-retroviral intasomes, which were characterized very recently.[155,181] Although the U5 and U3 ends of retroviral LTRs are not identical,
functional intasomes can generally be assembled with pairs of U5 sequences.
Snapshots of such symmetrized PFV intasomes, visualized in the states
prior to and after 3′-processing, as well as in complex with
target host DNA prior to and after strand transfer, are now available.[156,157,190]
The PFV intasome (Figure 3a) contains a tetramer
of IN with a dimer-of-dimers architecture, composed of two structurally
and functionally distinct IN subunits. The inner subunits (colored
green and cyan in the figures) of each IN dimer are responsible for
all interactions with vDNA, including provision of the active sites
for catalysis. The outer IN subunits (shown in yellow) attach to the
inner chains via the canonical CCD dimerization interface. The inner
IN chains interact by forming intersubunit NTD–CCD contacts,
and the two halves of the intasome are held together by the insertion
of a pair of CTDs, which act as solid spacers between the CCD dimers
(Figure 3a,b). The
extended NTD–CCD and CCD–CTD linkers run parallel to
each other, trussing the nucleoprotein assembly. The PFV intasome
structures reported to date lack outer IN chain NEDs, NTDs, and CTDs,
which were disordered in crystals and under cryo-electron microscopy
(cryo-EM) conditions.[85,191] Some clues about the average
positions of the outer chain NTD and CTD could be gleaned from solution-state
X-ray and neutron scattering.[180] However,
these domains are dispensable for PFV intasome assembly and strand
transfer activity in vitro, and their functions await clarification.[192]
Figure 3. Structure of the PFV intasome. (a) Cleaved intasome complex (PDB ID 3L2Q) shown
in two orthogonal views. Inner IN chains are colored green and cyan,
and outer chains are yellow. Individual IN domains are indicated;
invariant IN active site carboxylates are shown as red sticks; vDNA
is shown as cartoons with transferred (i.e., the strand that IN acts
upon) and nontransferred strands at each end shown in magenta and
beige, respectively. (b) Schematic of the intasomal tetramer with
IN domains shown as ovals and colored by chain; curved lines represent
NTD–CCD (cyan and green) and CCD–CTD (black) linkers.
Red circles represent inner IN chain active sites. (c) View on the
IN active site prior to 3′-processing. Active site carboxylates
and catalytic metal ions are respectively shown as sticks and gray
spheres. The invariant vDNA CA nucleotides and their complements are
letter-coded; the scissile 3′ dinucleotide is indicated.
The vDNA ends enter the IN dimer–dimer
interface, providing
both 3′ termini to the active sites of the inner IN subunits.
The CCD, CTD, and NED of both inner IN chains make intimate interactions
with the vDNA backbone and bases, consistent with vDNA sequence specificity
of retroviral IN.[85] Three bases of each
5′ vDNA end are unpaired, tunneling through the CCD–CTD
interface to protrude from the sides of the intasome structure (Figure 3c). Conceivably,
such a protein–DNA complex can only form when the recognition
sites are available on free vDNA ends, which makes the integration
process fundamentally irreversible once postintegration repair is
completed.
Engagement of Target DNA by the PFV Intasome
The PFV intasome assembled with blunt-ended vDNA ends readily undergoes
3′-processing in the presence of Mg2+ or Mn2+ ions.[156] Following dissociation
of the cleaved 3′-dinucleotide, the saddle-shaped groove between
the two halves of the intasome becomes available for host cell (target)
DNA binding (Figure ). The interaction primarily involves the target DNA backbone, which
explains only weak preferences of IN for integration target site sequence.[156,157,193] Target DNA binds to the PFV
intasome in a sharply bent conformation, such that the base pair step
at the center of the integration site unstacks with a 60° roll
(Figure ).[157] The associated expansion of the major groove
to over 26 Å allows the widely spaced intasome active sites to
align with a pair of phosphodiesters separated by 4 bp on apposing
strands in target DNA. The ability of DNA to form a sharp kink contributes
to integration site selection at the level of nucleotide sequence,
leading to a bias against rigid purine–pyrimidine base pair
steps in the central positions of PFV integration sites.[157,194] The energy of DNA deformation is offset by the interactions with
the synaptic CTDs and the inner IN CCDs of the PFV intasome. Predictably,
mutations affecting these contacts can lead to increased bias for
a bendable target DNA sequence.[157] Conversely,
prebent and distorted DNA performs better as a target for retroviral
integration.[195,196]
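The sequence bias described above can be illustrated with a minimal sketch that classifies the central base pair step of a target site as purine–pyrimidine (RY, the rigid class the text reports as disfavored) or otherwise. The function names, the use of a 4-bp target site read 5′→3′ on one strand, and the binary "rigid" label are assumptions made for illustration; the actual bias is statistical rather than an absolute rule.

```python
# Illustrative sketch only: names and the binary "rigid" label are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def step_class(step: str) -> str:
    """Classify a dinucleotide step (read 5'->3') as RR, RY, YR or YY."""
    step = step.upper()
    if len(step) != 2 or not set(step) <= PURINES | PYRIMIDINES:
        raise ValueError(f"expected two bases from ACGT, got {step!r}")
    return "".join("R" if base in PURINES else "Y" for base in step)


def central_step(target_site: str) -> str:
    """Return the central dinucleotide of an even-length target site."""
    n = len(target_site)
    if n < 2 or n % 2:
        raise ValueError("target site must have an even length of at least 2")
    mid = n // 2
    return target_site[mid - 1:mid + 1]


def has_rigid_central_step(target_site: str) -> bool:
    """True if the central step is purine-pyrimidine (RY)."""
    return step_class(central_step(target_site)) == "RY"


if __name__ == "__main__":
    for site in ("GATC", "CTAG", "ACGT"):
        print(site, central_step(site), step_class(central_step(site)))
```

RY and YR steps keep their class when read on the complementary strand, so the classification of these two classes does not depend on which strand is supplied.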
Figure 4
Target DNA capture by the PFV intasome.
(a) Crystal structure of
the PFV TCC (PDB ID 3OS1). (b) Conformation of vDNA and target DNA in the TCC (left) and
STC (right, PDB ID 3OS0), with protein chains hidden for clarity.
The PFV intasome displays robust preference for chromatinized
targets
compared to naked DNA in vitro.[191,197] These observations
agree with early studies that, albeit using poorly defined IN–vDNA
complexes, reported preferences of HIV-1 and MLV for nucleosomes.[195,196,198−200] Moreover, nucleotide sequence periodicities observed in the immediate
vicinity of retroviral integration sites are consistent with nucleosomes
serving as targets for retroviral integration in vivo.[194,201,202] Given the nonuniform availability
of the major groove and the structure of the underlying histone octamer,[203] it is not surprising that integration sites
cluster in sharply defined hotspots along nucleosomal DNA. Recently,
the PFV TCC containing the intasome and a nucleosome core particle
was characterized by cryo-EM at 8 Å resolution (Figure a).[191] The structure revealed that nucleosomal DNA engaged in the target
DNA binding groove of the intasome is lifted from the surface of the
histone octamer to assume the distorted conformation compatible with
strand transfer. Outside of the target DNA binding groove, the intasome
makes supporting contacts with one H2A–H2B heterodimer and
the second gyre of the nucleosomal DNA (Figure b). These interactions presumably compensate
for the energy of stretching and deforming histone-wrapped DNA and
explain the strong preference of PFV to target superhelix ±3.5
locations on nucleosomal DNA.
Figure 5
PFV intasome engaged with a mononucleosome.
(a) Pseudoatomic model
assembled by rigid body docking of the PFV intasome (PDB ID 3L2Q) and nucleosome
(PDB ID 1KX5) structures into the cryo-EM map of the complex (EMDB ID 2992) and
shown in two orientations. Individual histones are color coded as
indicated. (b) Blow up of the regions boxed in the left side of panel
a showing IN contacts with H2A and H2B histones (left) and with the
second gyre of the nucleosomal DNA (right).
The strand transfer reactions occurring within the TCC that
result
in the joining of the 3′ vDNA ends to the host DNA mark the
end of IN catalytic function. Like 3′-processing, strand transfer
does not lead to global conformational rearrangements within the intasome
structure.[156] It remains a mystery if and
how IN signals completion of strand transfer for recruitment of STC
disassembly and DNA repair machineries. Given the apparent preference
of retroviruses to integrate into nucleosomes,[191,194,197] chromatin remodellers appear
to be good candidates for the job of STC disassembly.
Mechanics of the IN Active Site
Although
the first CCD structures were determined in the 1990s, the functional
organization of the IN active site was revealed only in the context
of the functional PFV IN–DNA complexes. Dubbed “the
active site loop” in the early literature,[204] the IN residues connecting β5 and α4 of the
CCD, often including the essential Glu of the D,D-35-E motif, are
disordered in most partial IN crystal structures. As expected, the
region folds into a defined structure through interactions with the
vDNA end (Figure c).
Given a high degree of amino acid conservation within the IN active
site, the lessons learned from the high-resolution PFV structures
should be generally applicable. Accordingly, the PFV intasome was
successfully used as a model to study active site inhibitors of HIV-1
IN.[85,205,206]

The active sites of retroviral IN and the related DD(E/D) transposases
contain three carboxylates, which serve to coordinate a pair of essential
divalent metal cofactors (Figure ). Although both Mg2+ and Mn2+ are capable of supporting IN activities, due to its greater abundance,
the former metal ion is believed to be primarily utilized in vivo.[207,208] The general mechanism of two-Mg2+-ion catalysis at a
phosphodiester bond, initially proposed in the early 1990s,[209,210] has been corroborated by a growing body of experimental and theoretical
work, including recent studies of Bacillus halodurans RNase H.[208,211−213] The primary coordination spheres of the Mg2+ ions include
essential active site carboxylates, the substrate phosphodiester,
and the attacking nucleophile (a water molecule in the case of RNase
H). The strong preference of Mg2+ for octahedral coordination
enforces precise relative positioning of the reactants and aids in
destabilizing the target phosphodiester group. During catalysis, the
ions act as Lewis acids, with metal A assisting in deprotonation of
the nucleophile and metal B neutralizing the negative charge developing
on the phosphorane intermediate. The ability of Mn2+ to
replace Mg2+ ion as a cofactor is explained by its similar
size, pKa, and coordination geometry.
Flavors of the two-Mg2+-ion mechanism are employed by a
wide range of structurally and functionally diverse enzymes, which
also includes protein kinases,[214−216] DNA and RNA polymerases,[214,217,218] and even ribozymes.[219]
Figure 6
Configurations of the IN active site leading to 3′-processing
(a, PDB ID 4E7I) and strand transfer (b, PDB ID 4E7K). Direction of nucleophilic attack by
a water molecule (W, panel a) or 3′-hydroxyl of vDNA (panel
b) is indicated with blue arrows marked SN2. Red spheres
represent water molecules; lower TCC X-ray data resolution did not
allow refinement of water molecules in panel b.
Due to the absolute dependence of the retroviral IN active
site
on its metal cofactors, it was possible to grow crystals of wild type
(WT) PFV IN engaged with its DNA substrates. When exposed to Mg2+ or Mn2+ salts, such apo forms of the intasome
and the TCC readily undergo 3′-processing and strand transfer
“in crystallo”. Soaking the crystals in the presence
of Mn2+ for very short periods of time (sufficient for
diffusion of the salt through the crystal lattice but not catalysis)
allowed freeze trapping the fully engaged configurations of the PFV
nucleoprotein complexes in their ground states prior to 3′-processing
and strand transfer.[156]

For consistency
with the RNase H literature, the ion cofactors
coordinated to PFV IN residues Asp185 and Glu221 are designated metal
A and metal B, respectively (Figure ). The third carboxylate, Asp128, and a nonbridging
oxygen atom from the scissile phosphodiester are shared between both
ions. In the pre-3′-processing state, the octahedral coordination
sphere of metal A is nearly perfect and includes a water molecule
positioned for an in-line nucleophilic attack at the phosphorus atom
of the scissile phosphodiester (Figure a). Due to simultaneous bidentate interactions with
Glu221 and the phosphodiester, the environment of metal B cannot assume
octahedral coordination. This departure from the preferred coordination
geometry of metal B is thought to provide a destabilizing potential
on the scissile phosphodiester.[208,212] Following
3′-processing and dissociation of the cleaved dinucleotide,
metal B remains coordinated to the 3′ oxygen atom of the processed
vDNA end. Consequently, it befalls metal B to activate the nucleophilic
3′ hydroxyl of vDNA during strand transfer (Figure b). Thus, retroviral IN uses
the inherent symmetry of the two-Mg2+-ion mechanism to
carry out the consecutive reactions of hydrolysis and transesterification.
Because the latter step does not change the number of high-energy
bonds, strand transfer should in principle be reversible, at least
for as long as the STC persists. However, such unproductive reversal
of strand transfer is prevented by relocation of the newly formed
phosphodiester out of the active site.[157] This reconfiguration is likely driven by conformational strain due
to target DNA deformation, which is a conserved feature of retroviral
intasomes and DD(E/D) transposases.[157,173,177,220−222]
ASLV and MMTV Intasome Structures
In the
PFV intasome, a pair of CTDs from the inner IN protomers are
inserted in the dimer–dimer interface. The synaptic CTDs provide
rigidity to the assembly and contribute to the host DNA binding platform.
Crucially, this architecture depends on an extended polypeptide linker
to track the linear distance of ∼50 Å that separates the
carboxyl terminal boundary of the inner chain CCD to the beginning
of the CTD (depicted as a black curve in Figure b). Intriguingly, the length of the CCD–CTD
linker is not conserved among retroviral IN proteins, ranging from
50 to 60 amino acid residues in γ- and ε-retroviruses
and spumaviruses, to less than 10 residues in α- and β-retroviruses,
whereas lentiviruses and δ-retroviruses possess intermediate-size
linkers.[155,158] Modeling suggested that the
lentiviral IN CCD–CTD linker may stretch sufficiently to allow
formation of the tetrameric intasomal architecture similar to that
in PFV.[205,223] However, this scenario is clearly not conceivable
in the cases of α- and β-retroviral IN proteins.

Recently, the STC from ASLV and the intasome from MMTV were characterized
by X-ray crystallography at 3.8 Å and cryo-EM at ∼5 Å
resolution, respectively.[155,181] Unexpectedly, the
structures revealed that the α- and β-retroviruses maintain
a PFV-like core intasomal structure by employing additional IN dimers
(referred to as flanking dimers) to source the pair of synaptic CTDs
(Figure a,b). Strikingly,
these intasomes contain homo-octamers of IN, each with four structurally
and functionally distinct types of subunits. The intasomal core structure
is further decorated by NTDs and CTDs belonging to the eight IN subunits.
Locations of six CTDs, including the four provided by the flanking
dimers, are conserved between the ASLV and MMTV intasome structures,
and these make direct contacts with vDNA. The CCDs of the flanking
IN dimers are considerably less defined in the cryo-EM structure,
and their positions differ between the ASLV and MMTV intasomes (Figure a,c). Intriguingly,
their locations are consistent with their potential roles in interactions
with host DNA. Indeed, contacts between the flanking CCDs and the
backbone of the target DNA are observed in the ASLV STC (Figure a, bottom panel).[181] Results of biochemical complementation experiments
moreover indicate that the flanking IN dimers are necessary for MMTV
IN strand transfer activity.[155]
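The linker-length argument made at the start of this section can be checked with simple arithmetic. The sketch below assumes an upper bound of about 3.8 Å of span per residue (the Cα–Cα distance across a trans peptide bond) for a fully extended chain; this value and the function names are assumptions introduced here, and real linkers are unlikely to be fully extended, so the result is an optimistic upper limit.

```python
import math

# Assumed upper bound on the span contributed by one residue of a fully
# extended polypeptide (C-alpha to C-alpha across a trans peptide bond).
MAX_RISE_PER_RESIDUE_A = 3.8

# Approximate CCD-to-CTD distance quoted in the text for the PFV intasome.
PFV_CCD_CTD_DISTANCE_A = 50.0


def max_linker_span(n_residues: int, rise: float = MAX_RISE_PER_RESIDUE_A) -> float:
    """Longest distance (in angstroms) a linker of n_residues could bridge."""
    return n_residues * rise


def can_span(n_residues: int, distance_a: float = PFV_CCD_CTD_DISTANCE_A) -> bool:
    """True if a fully extended linker could in principle cover distance_a."""
    return max_linker_span(n_residues) >= distance_a


def min_residues_to_span(distance_a: float, rise: float = MAX_RISE_PER_RESIDUE_A) -> int:
    """Smallest number of residues whose extended span reaches distance_a."""
    return math.ceil(distance_a / rise)


if __name__ == "__main__":
    for n in (10, 50):
        print(n, "residues:", round(max_linker_span(n), 1), "A, spans 50 A:", can_span(n))
    print("minimum residues for 50 A:", min_residues_to_span(PFV_CCD_CTD_DISTANCE_A))
```

Under these assumptions a linker of 10 residues reaches at most about 38 Å, short of the ∼50 Å required, whereas linkers of 50 to 60 residues have ample slack.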
Figure 7
Architecture of the α- and β-retroviral intasomes. (a)
Crystal structure of the ASLV STC (PDB ID 5EJK) in two orientations. Target DNA is hidden
in the top panel. The core IN tetramer containing inner and outer
IN subunits and a pair of synaptic CTDs from the flanking IN chains
is indicated. (b) Schematic of the intasomal IN octamer with IN domains
shown as ovals and colored by chain. For clarity, CTDs and NTDs belonging
to the outer IN subunits and NTDs from the flanking subunits present
in the structures are not indicated in panel a or shown in panel c.
(c) Pseudo-atomic model of MMTV intasome based on cryo-EM data (PDB
ID 3JCA and
EMDB ID 6441).
Integration Site Selection
Retroviral Integration Is Not Random with Respect to the Target Genome
Depending on the genomic location,
the local chromatin environment of the provirus may be conducive to
active viral expression or transcriptional silencing. Thus, the choice
of integration site influences the level of ongoing viral replication
and may contribute to the establishment of latent viral reservoirs.[224−228] In fact, the propensity of HIV-1 to establish chronic and hitherto
incurable infection is the direct consequence of its ability to establish
latent reservoirs.[225,229,230] While the interaction of IN with vDNA ends is nucleotide sequence-
and structure-specific, the enzyme displays very little selectivity
with regard to the host DNA. Alignments of retroviral integration
sites revealed weak, virus-specific palindromic sequence consensi
that do not extend farther than several bp from the integration site.[193,194,231−233] These weak nucleotide sequence preferences are in part explained
by the sparse interactions between IN and target DNA bases.[157,194] However, far more interesting patterns emerge when the distributions
of retroviral integration sites are scrutinized on the genomic scale.

Early research conducted in the 1970s and 1980s indicated that
MLV integration may be associated with sites of DNase I hypersensitivity
in the host cell genome.[234−236] The availability of the draft
human genome sequence[237,238] and the relative ease of recovering
vDNA–chromosomal junctions using PCR allowed Bushman and colleagues
to address the distribution of HIV-1 integration sites in their landmark
2002 study.[239] The field was subsequently
bolstered by the advent of deep sequencing, which now permits recovery
of hundreds of thousands of unique integration sites in a single infection
experiment.[201,240−242] It has emerged that retroviruses display distinct and contrasting
preferences for various host cell genomic features (reviewed in ref (243)). Thus, HIV-1 and other
lentiviruses display strong preferences for transcription units with
a sharp bias toward highly expressed and intron-rich genes.[201,239,240,244] MLV and other γ-retroviruses strongly favor promoter regions
and DNase I hypersensitive sites in general.[194,235,236,241,242,245,246] In sharp
contrast, the spumavirus PFV disfavors genes and loci of active transcription.[191,247,248] It appears that the least selective
retroviruses are from the α- and β-retrovirus genera,
which show nearly random integration site distributions with respect
to well-mapped genomic features.[244,249,250]

Strong evidence implicated IN as a major determinant
for integration
site selection. Thus, implanting MLV IN into HIV-1 results in a chimeric
virus with integration site preference which is closer to that of
MLV.[246] However, the same study found a
subtler role for viral gag gene products, and more recent work has shown that amino
acid substitutions in viral CA protein have considerable bearing on
HIV-1 integration site distributions.[246,251,252] A hallmark of lentiviruses is their ability to infect
nondividing cells, with their PICs capable of traversing the nuclear
envelope through the nuclear pore complex (NPC),[253,254] while many other retroviruses need the nuclear envelope to break
down during mitosis to access host chromatin.[255−259] It seems plausible that CA may modulate HIV-1 PIC nuclear entry
pathways, potentially via interactions with cleavage and polyadenylation
specificity factor 6 (CPSF6) and/or nucleoporins (NUPs), which are
the constitutive components of the NPC. HIV-1 CA has been shown to
interact directly with CPSF6,[260,261] NUP358,[262] and NUP153,[263] and
depletion of each of these factors significantly reduced the frequency
of HIV-1 integration into gene-rich regions.[251,262,264,265] The involvement of several NPC components in integration site selection
is consistent with the observation that HIV-1 PICs preferentially
target highly expressed genes in the nuclear periphery that are proximal
to the nuclear pore.[266,267] Conversely, HIV-1 integration
is excluded from internal nuclear regions as well as from lamina-associated
domains.[267]
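Genomic preferences of the kind discussed in this section are usually expressed as the fraction of mapped integration sites that fall within a given class of annotated feature. The sketch below shows one simple way to tally such fractions for transcription units and for windows around transcription start sites. The `Gene` record, the coordinate convention (0-based, end-exclusive), and the 1 kb window are hypothetical choices made for illustration, and no comparison with a matched random control is included.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene annotation record (0-based, end-exclusive)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Transcription start site: first base on "+", last base on "-".
        return self.start if self.strand == "+" else self.end - 1


def in_transcription_unit(chrom: str, pos: int, genes: list[Gene]) -> bool:
    """True if the site lies inside any annotated transcription unit."""
    return any(g.chrom == chrom and g.start <= pos < g.end for g in genes)


def near_tss(chrom: str, pos: int, genes: list[Gene], window: int = 1000) -> bool:
    """True if the site lies within `window` bp of any annotated TSS."""
    return any(g.chrom == chrom and abs(pos - g.tss) <= window for g in genes)


def summarize(sites: list[tuple[str, int]], genes: list[Gene],
              window: int = 1000) -> dict[str, float]:
    """Fractions of integration sites in transcription units and near TSSs."""
    if not sites:
        raise ValueError("no integration sites supplied")
    n = len(sites)
    in_tu = sum(in_transcription_unit(c, p, genes) for c, p in sites)
    at_tss = sum(near_tss(c, p, genes, window) for c, p in sites)
    return {"in_transcription_unit": in_tu / n, "near_tss": at_tss / n}


if __name__ == "__main__":
    toy_genes = [Gene("chr1", 10_000, 50_000, "+")]
    toy_sites = [("chr1", 10_200), ("chr1", 30_000), ("chr1", 90_000), ("chr2", 30_000)]
    print(summarize(toy_sites, toy_genes))
```

A linear scan over all genes is adequate for a toy data set; with hundreds of thousands of sites an interval index would be the more practical choice.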
The Nexus between Lentiviruses and LEDGF/p75
Whereas roles of retroviral
structural proteins in PIC trafficking
are only starting to transpire, the IN-dependent mechanisms of integration
site selection are well-established. Lentiviruses and γ-retroviruses
find their preferred genomic locations via recognition of specific
chromatin-associated cellular proteins, which act as receptors or
tethering factors for the PICs. Lens epithelium-derived growth factor
(LEDGF) is a ubiquitous cellular chromatin-associated protein,[268] initially described as transcriptional coactivator
p75.[269,270] Although a proposed extracellular function
of the protein or specific roles in lens epithelium development were
not corroborated, its misnomer has persisted in use. The protein was
identified as a dominant HIV-1 IN binding partner in affinity-capture
and yeast two-hybrid screening experiments[171,271] and was later shown to interact with and stimulate the enzymatic
activities of divergent lentiviral INs.[102,272,273] LEDGF/p75 is composed of 530
amino acid residues and contains two small structured domains: an
N-terminal PWWP domain and C-terminal IN binding domain (IBD) (Figure a).[274,275] An alternative splice form, LEDGF/p52,[269] lacks the IBD (Figure a) and consequently neither interacts with IN nor has an effect on
integration.[276] The extended flexible regions
of LEDGF/p75 harbor a classical importin α/β-dependent
nuclear localization signal (NLS)[277] and
a pair of AT-hook motifs implicated in DNA binding.[278,279] The PWWP domain belongs to the Tudor family and was shown to bind
nucleosomes trimethylated on Lys36 of the histone 3 tail (H3K36me3),
an epigenetic mark associated with transcription elongation and enriched
within transcription units.[280−282] Knockout of the gene encoding
LEDGF does not affect cell proliferation in tissue culture but results
in enhanced neonatal mortality and developmental abnormalities in
mice.[283,284] While its precise cellular functions remain
to be elucidated, LEDGF/p75 reportedly interacts with a range of functionally
diverse proteins, including the basal RNA polymerase cofactor PC2,
mRNA splicing factors, multiple endocrine neoplasia type 1 protein
product (menin), methyl CpG binding protein 2 (MeCP2), the end-resection
protein CtIP, transcription factor JPO2, and the activating subunit
of Cdc7 kinase.[240,269,285−291] LEDGF/p75 was shown to recruit some of the binding partners to chromatin,[276,285,290] suggesting that it may function
as an adaptor protein in various chromatin-bound transactions.
Figure 8
Interaction
of LEDGF/p75 and ALLINIs with HIV-1 IN. (a) Schematic
of LEDGF/p75 and p52 organization with NLS, AT-hooks, and structural
domains (PWWP and IBD) shown as boxes. (b) Crystal structure of the
HIV-1 IN CCD dimer in complex with LEDGF/p75 IBD (PDB ID 2B4J). (c) Details of
the protein–protein interactions (rotated ∼90°
counterclockwise from the boxed region in panel b). (d) Chemical structures
of selected ALLINIs; chemical groups mimicking LEDGF/p75 Asp366 and
Ile365 are shown in red and blue, respectively. (e) Crystal structure
of ALLINI BI-D in complex with HIV-1 IN CCD (PDB ID 4ID1), aligned with the
panel c projection.
Knockdown of LEDGF/p75
abolished the ability of ectopically expressed
HIV-1 IN to bind chromatin, which provided the first hint about its
role in lentiviral replication.[24,276] However, the functional
significance of the LEDGF/p75-IN interaction in vivo was initially
unclear because the first attempts at LEDGF/p75 knockdown ostensibly
failed to affect HIV-1 infectious titer and yielded only modest reductions
in integration.[24,292−294] Since infection of each new cell depends on a single-molecule integration
event, even very limited levels of chromatin-associated LEDGF/p75
can be sufficient to support efficient viral replication.[295] Clearer results came from infections conducted
under conditions of intensified LEDGF/p75 knockdown and genetic knockout,
which revealed considerable decreases of HIV-1 infectivity with the
specific defect at the integration step.[295−299]

The analysis of distribution of residual HIV-1 integration
sites
in LEDGF/p75-depleted cells demonstrated a dramatic loss of transcription
unit targeting concomitant with a significant increase of integration
near transcription start sites.[296,299,300] The WT phenotype can be rescued upon restoration
of LEDGF/p75 expression, confirming a role of the host factor in the
targeting mechanism. Furthermore, by swapping the PWWP domain of LEDGF/p75
for alternative chromatin binding domains, it was possible to redirect
HIV-1 integration toward chromatin regions bound by the heterologous
tether.[301−303] Collectively, the results support a model
whereby LEDGF/p75 acts as a chromatin-bound tether that anchors the
PIC by engaging its IN component in a direct protein–protein
interaction. The results of a recent study that analyzed integration
distribution patterns in cells knocked out for LEDGF/p75, CPSF6, or
both factors clarified that the CA binding protein CPSF6 predominantly
directs the HIV-1 PIC to actively transcribed euchromatin, where LEDGF/p75
determines positions of integration along gene bodies.[265] LEDGF/p75 is additionally involved in HIV-1
latency by recruiting, post-integration, host factors IWS1 and SPT6
to the LTR, leading to the silencing of the provirus.[226] Rapid silencing of the integrated viral genome
is frequently observed during HIV-1 infection in proliferating CD4+
T cells.[304−306] By interacting with LEDGF/p75, HIV-1 integration
and transcriptional silencing may thus be coordinated to the establishment
of long-lived viral reservoirs.[226]

The solution structure of the LEDGF/p75 IBD, determined using NMR,
revealed a compact domain comprising a pair of HEAT repeat-like α-helical
hairpins.[307] On the virus side, the IN
CCD is essential and minimally sufficient for the interaction with
LEDGF/p75, whereas the NTD is required for high-affinity binding.[276] X-ray crystallography was used to visualize
atomic details of the virus–host interaction (Figure b,c).[98,186,308] The hairpin loops at the tip
of the elongated IBD structure contact the IN CCD dimer, with side
chains of LEDGF/p75 residues Ile365 and Asp366 buried in a small pocket
at the CCD dimerization interface, making hydrophobic interactions
and a bifurcated hydrogen bond with the IN backbone, respectively.
The protein–protein interaction is enhanced by intermolecular
contacts involving Lys401, Lys402, and Arg405 on the basic side of
the IBD and a conserved cluster of carboxylates on lentiviral IN NTDs.[98] Crucially, LEDGF/p75 contacts both the CCD and
the NTD, which cooperate in higher-order IN multimerization and intasome
assembly. It is not surprising, therefore, that the host factor enhances
tetramerization and the strand transfer activity of HIV-1 IN in vitro.[98,186,309] The pocket on the surface of
the HIV-1 IN CCD involved in the interaction with the host factor
has been targeted by small-molecule inhibitors, which in their binding
mode to IN mimic the interactions made by LEDGF/p75 Ile365 and Asp366[46,310−315] (Figure d,e; see section ). Surprisingly,
these small molecules potently enhance IN multimerization,[310,311,315−317] presumably without contacting the NTD.[314,318]
γ-Retroviruses and BET Proteins
As
LEDGF/p75 was shown to target lentiviral integration, it seemed
possible that other retroviruses might similarly rely on genus-specific
cellular factors to direct integration toward preferred genomic loci.
Hence, it did not come as a surprise when it was discovered that γ-retroviruses
hijack cellular transcription factors as targeting factors.[129,319−321] Highly related transcription factors BRD2,
BRD3, and BRD4 belong to the bromodomain and extra-terminal (BET)
family. These proteins play major roles in transcription regulation[322,323] and had already been implicated in host–pathogen interactions.[324−326] The characteristic features of BET members are two tandem bromodomains
and a highly conserved extra-terminal (ET) domain within their N-
and C-terminal regions, respectively (Figure a).[243] Bromodomains
belong to the well-studied group of chromatin readers with specificity
for acetylated histone tails,[327] while
the ET domains were implicated in binding a range of cellular and
viral proteins.[323] In particular, BRD4
was shown to recruit P-TEFb to its target promoters to facilitate
transcription elongation of cellular genes.[328] Papilloma viruses tether their genomes to mitotic chromosomes via
a direct interaction between the viral E2 protein and the C-terminal
motif of BRD4, which allows stable segregation of vDNA copies between
daughter cells.[326,329] The latent nuclear antigen of
Kaposi’s sarcoma associated herpesvirus, essential for the
viral episome maintenance and transcription, interacts with the ET
domains of BET proteins.[330,331]
Figure 9
Interaction of BET proteins
with MLV IN. (a) Schematic of domain
compositions of BRD2, BRD3, and BRD4. (b) Solution structure of BRD4
ET domain in complex with the EBM of MLV IN (PDB ID 2N3K) shown in two orientations.
BRD4 and IN residues participating in the hydrophobic core of the
interface are shown as sticks and are indicated.
Pull down experiments revealed a direct high-affinity interaction
of γ-retroviral INs with the ET domains of BET proteins; moreover,
the latter potently stimulated IN strand transfer activity in vitro.[194,319−321] Finally, the function of BET proteins in
the context of MLV replication was demonstrated using small-molecule
inhibitors of the bromodomain–chromatin interaction as well
as siRNA-mediated knockdown. The small molecules inhibited MLV but
not HIV-1 integration in a dose-dependent manner.[319−321] Notably, MLV integration sites correlate with binding sites of BET
proteins as determined by chromatin immunoprecipitation studies.[319,321,332] Treatment of cells with BET
inhibitors or a siRNA cocktail specific to BRD2–4 mRNAs significantly
reduced the preference of MLV to integrate near transcription start
sites.[319,320] As a complementary approach, LEDGF/p75-BRD4
hybrid proteins, containing the LEDGF/p75 PWWP and BRD4 ET domain,
retargeted MLV integration toward active transcription units and away
from transcription start sites,[321] confirming
the major role of the BET proteins in γ-retroviral integration
site selection.

Despite highly similar functional consequences,
the structural
bases for binding of lentiviral and γ-retroviral IN proteins
to their cognate host factors are strikingly different. In contrast
to lentiviruses, which engage the LEDGF/p75 IBD using a quaternary
assembly of IN domains (minimally a CCD dimer), γ-retroviruses
bind BET proteins using extended C-termini characteristic of the INs
of this genus.[321,333,334] The solution structure of the complex between the BRD4 ET domain
and a conserved C-terminal ET binding motif (EBM) of MLV IN showed
that the interaction involves the formation of an intermolecular three-stranded
antiparallel β-sheet (Figure b).[335] The folding of the
interface only occurs upon binding of the two partners, as both the
C-terminal tail of MLV IN and the BRD4 ET domain loop are unstructured
on their own. The protein–protein interface contains a set
of hydrophobic interactions involving buried side chains from both
β6 and β7 of MLV IN and residues from helices α1
and α2 and the β1 strand of the ET domain (Figure b). The interaction further
depends on complementary electrostatics between the negatively charged
amino acids from ET domain strand β1 and the highly conserved
positively charged residues of MLV C-terminal β7. Mutation of
the critical conserved amino acids showed a strong reduction of binding
affinity as well as a shift of the integration pattern away from transcription
start sites.[320,334−336] Interestingly, BRD4 residues involved in the interaction with MLV
IN were shown also to be important for binding its cognate cellular
cofactors.[335] Therefore, it seems likely
that γ-retroviral IN evolved its C-terminal tail to mimic a
cellular BET binding protein in order to optimize integration into
transcriptionally active regions.
Integration Site Selection by Other Retroviruses and LTR Retrotransposons
The mechanisms of integration site
selection employed by lentiviruses and γ-retroviruses present
a remarkable case of convergent evolution, with both genera usurping
cellular readers of the histone code to locate optimal target sites.
Therefore, it is tempting to speculate that other retroviral genera
may use similar strategies. It was recently reported that HTLV-1 and
other δ-retroviral INs specifically interact with the B′
protein phosphatase 2A (PP2A) regulatory subunits; moreover, recombinant
B′ proteins stimulated concerted integration activity of the
δ-retroviral INs in vitro.[337] Although
not a classic chromatin binder, PP2A was implicated in dephosphorylating
chromatin-resident targets.[338,339] However, it remains
to be determined whether PP2A is involved in directing δ-retroviral
integration.

LTR retrotransposons, such as well-studied Ty and
Tf elements from budding and fission yeasts, share most features of
their replication cycle with retroviruses, although they complete
it within the cell where they reside.[340] The fundamental difference between a transposon and a virus is that
the fate of the former wholly depends on the fitness of the host organism.
Therefore, yeast LTR retrotransposons avoid instigating harmful insertional
mutagenesis on the cell by precisely targeting new integration events
to safe loci, and they achieve it by utilizing IN-binding host proteins.
Thus, Ty5 retrotransposition is directed into transcriptionally silent
regions of the Saccharomyces cerevisiae genome via the interaction between IN and the heterochromatin maintenance
protein Sir4p.[341] Of note, the C-terminal
peptide of Ty5 IN engages a patch on Sir4p, which is also recognized
by the cellular interacting partner Esc1.[342,343] Another S. cerevisiae retrotransposon,
Ty1, engages the AC40 subunit of RNA polymerase III in a direct protein–protein
interaction with IN, for specific integration upstream of RNA polymerase
III transcribed genes.[344]
HIV-1 IN as a Target for Drug Development
HIV-1 IN Strand Transfer Inhibitors (INSTIs)
Its essential role
in viral replication and the lack of functional
equivalents in human cells made IN an ideal target for anti-HIV/AIDS
drug development. Intense interest and early screening efforts notwithstanding,
the first class of small molecules capable of inhibiting HIV-1 replication
by blocking IN was reported only in 2000.[345] The key to identification of these molecules was the use of preassembled
HIV-1 IN–vDNA complexes in screening assays.[346] Empirical optimization of the original “diketo acid”
pharmacophore led to the discovery of MK0518, now widely known as
raltegravir (RAL),[347,348] the first integrase strand transfer
inhibitor (INSTI) to be approved for the treatment of AIDS in 2007.
Two more antiretroviral drugs with an identical mode of action, elvitegravir
(EVG) and dolutegravir (DTG), have entered clinical use since then
(Figure a).[349,350]
Figure 10. Inhibitors of HIV-1 IN strand transfer activity. (a) Chemical structures
of a diketo acid (L-731,988) and the clinical INSTIs RAL, EVG, and
DTG. Metal chelating atoms and halobenzyl moieties of the INSTIs are
shown in red and blue, respectively. (b) Active site of the PFV intasome
prior to (top) and after binding RAL (middle) or DTG (bottom). The
3₁₀ helix that participates in the interactions with the
INSTIs is indicated as η.
Consistent with the method of their initial identification, INSTIs
engage the active site of IN only when it is in complex with the vDNA
end, competing with target DNA for binding to the intasome.[85] INSTIs specifically inhibit the strand transfer
reaction, although they are capable of affecting 3′-processing
at greatly elevated concentrations.[345,351] These small
molecules possess unusually tight binding to the HIV-1 intasome, with
dissociative half-times measuring in hours (for EVG or RAL) or even
days (for DTG).[352,353] This property is likely very
important for an inhibitor that blocks function of a long-lived complex,
such as the PIC, which is geared for a one-off reaction event: to
be effective, an INSTI must remain associated with the intasome until
the cell destroys the PIC.

Despite their apparent chemical diversity, the INSTIs share two
common functionalities: a Mg²⁺ chelating core, usually
a triad of oxygen atoms attached to a rigid scaffold (colored red
in Figure a), and
a flexibly linked aromatic side chain, typically a halobenzyl group
(shown in blue). Due to the conservation of the retroviral IN active
site, these small molecules display broad-spectrum activity against
diverse retroviruses.[103,354,355] Accordingly, the structural basis for INSTI action could be characterized
in the context of the PFV intasome.[85,205,206,356] Soaking the PFV intasome
crystals with the different INSTIs invariably results in binding of
the drug at the active site. In each studied case, the small molecules
engaged the catalytic pair of Mg²⁺ ions in the IN active
site (Figure b).[85] Here, the triad of metal-chelating heteroatoms
of the small molecule closely imitates interactions made by the oxygen
atoms of the scissile phosphodiesters and the respective nucleophiles
during 3′-processing and strand transfer.[156] The aromatic side chain of the INSTI assumes the position
normally occupied by the base of the deoxyadenosine on the processed
3′ vDNA end, intercalating between the base of the penultimate
deoxycytidine and a short 3₁₀ helix (designated η
in Figure b) containing
conserved PFV IN residues Pro214 and Gln215, which are equivalent
to HIV-1 IN Pro145 and Gln146, respectively. INSTIs also make variable
contacts with PFV IN Tyr212 (corresponding to HIV-1 IN Tyr143). In
particular, the oxadiazole ring of RAL stacks with the phenolic side
chain (Figure b,
middle panel) and additionally makes a hydrogen bond to the backbone
amide of the Tyr residue. Other compounds, such as EVG and DTG, make
much less extensive van der Waals contacts with the Tyr side chain.

Crucially, by directly engaging the catalytic metals and displacing
the 3′ vDNA nucleotide, the INSTIs are incompatible with target
DNA binding and strand transfer. The requirement to displace the 3′
adenosine from its natural position accounts for the slow kinetics
of INSTI binding to the intasome.[357] A
further energetic penalty associated with disengagement of a phosphodiester
group from the Mg²⁺ ions in the intasomal active site explains
the relative ineffectiveness of INSTIs at inhibiting IN 3′-processing
activity.[156]

Adapted to operate on the bulky DNA substrate, the intasome harbors
a voluminous active site, which is not ideal for the development of
small-molecule inhibitors. X-ray structures of the PFV intasome prior
to 3′-processing and strand transfer elucidated additional
features of the DNA substrates that could potentially be mimicked
in future INSTI design.[156] Furthermore,
design of small molecules that more completely fill the substrate
envelope within the intasome active site could lead to improved inhibitory
properties.[358]
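The dissociative half-times quoted in this section (hours for EVG or RAL, days for DTG) can be related to first-order off-rates through k_off = ln 2 / t½. The following Python sketch assumes simple single-exponential dissociation without rebinding, which is a simplification, and uses round illustrative half-times (hypothetical values, not the measurements reported in refs 352 and 353) to show how much of an inhibitor would remain bound to the intasome after a given time.

```python
import math


def off_rate(half_time_h: float) -> float:
    """First-order dissociation rate constant (per hour) from a half-time in hours."""
    return math.log(2) / half_time_h


def fraction_bound(half_time_h: float, elapsed_h: float) -> float:
    """Fraction of complexes still inhibitor-bound after elapsed_h hours.

    Assumes single-exponential dissociation and no rebinding.
    """
    return math.exp(-off_rate(half_time_h) * elapsed_h)


# Illustrative half-times only (assumptions, not values from the cited studies).
examples = {"hours-scale inhibitor": 8.0, "days-scale inhibitor": 72.0}

for label, t_half in examples.items():
    print(
        f"{label}: k_off = {off_rate(t_half):.4f} per h, "
        f"fraction bound after 24 h = {fraction_bound(t_half, 24.0):.2f}"
    )
```

Under these assumptions, an inhibitor with a half-time much shorter than the lifetime of the PIC would largely dissociate before the complex is cleared, which is the qualitative point made in the text about long residence times.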
Viral Resistance to INSTIs
The clinical
use of INSTIs has seen the emergence of HIV-1 variants with high-level
resistance to RAL and EVG (for detailed reviews see refs 359 and 360). Although
not yet documented in INSTI-naïve cohorts, resistance to DTG
has been described in patients who initially failed RAL-based therapy.[361−363] Due to their identical modes of action, INSTIs show substantially
overlapping profiles of HIV-1 resistance mutations. The major genetic
pathways leading to RAL resistance and virologic failure in patients
are associated with substitutions of HIV-1 IN residues Tyr143 (typically
to Cys or Arg), Asn155 (to His), and Gln148 (to His or Arg).[364] Although, on their own, the primary substitutions
result in modest levels of drug resistance, their effects are greatly
amplified by secondary mutations. Most notably, combinations of Q148R/H
with G140S/A in HIV-1 IN result in a loss of viral susceptibility
to RAL and EVG, as well as substantial levels of resistance to DTG.[361,362]

In the absence of HIV-1 intasome crystals, the PFV model was
instrumental in shedding light on the mechanism of viral resistance
to INSTIs.[206] The observation that the
oxadiazole ring of RAL interacts extensively with the aromatic side
chain of Tyr212 readily explained why substitutions of HIV-1 IN residue
Tyr143 cause viral resistance to RAL.[85] Because EVG and DTG make only weak contacts with the Tyr residue,
their activities are largely unaffected by its substitutions.[206] In contrast to Tyr212, PFV IN residues corresponding
to HIV-1 Gln148 and Asn155 (Ser217 and Asn224, respectively) do not
make direct interactions with the INSTIs. Nevertheless, akin to the
effects of the analogous mutations in HIV-1 IN, S217H and N224H reduced
susceptibility of PFV IN to RAL in vitro.[206,356] Crystal structures revealed that the amino acid substitutions result
in subtle but significant deformations of the intasomal active site.
Expectedly, binding of the relatively rigid INSTIs requires the mutant
active site to adopt a WT-like conformation. The energetic cost associated
with the rehabilitation of the active site was proposed to be the
reason for the apparent reduction in drug binding affinity.[206] A Ser or an Ala residue at HIV-1 IN position
140 is predicted to make direct contacts with the side chain of His148,
helping to explain the coevolution of the G140S and Q148H mutations.[206] Ostensibly, a conformational adaptation to
a large substrate, such as target DNA or chromatin, which makes extensive
interactions outside of the active site, will be offset to a lesser
degree than small-molecule binding. The difference may provide the
mutant viruses a sufficient selective advantage in the presence of
the drug. This model accordingly predicts that INSTIs that make more
extensive interactions with immutable features of the IN active site
will be less affected by mutations. Indeed, INSTIs such as DTG and
MK2048, which display relatively long dissociative half-times from
the WT HIV-1 intasome, are considerably less affected by the “shape-shifting”
Q148H/R and N155H mutations.[356,361,362] These small molecules, commonly referred to as second-generation
INSTIs, tend to make van der Waals contacts to the main chain of the
β4-α2 loop of the IN active site.[206,356,358,365] Bulkier compounds, which occupy the substrate envelope of the intasomal
active site more completely, tend to be more active against the classic
RAL-resistant strains.[358] Additionally,
main chain IN amides involved in the interaction with the scissile
phosphodiesters of the viral and target DNA substrates may provide
useful immutable bonding points for the next generation of INSTIs.[156]
Emerging Allosteric Inhibitors of HIV-1 IN
Despite the great success of the combinatorial
therapeutic approach
against HIV/AIDS, new infections emerge from drug-resistant strains.
In addition to developing new derivative compounds against current
targets, new drugs that inhibit untapped steps of the HIV-1 replication
cycle are of great importance. In the case of IN, the design of drugs
that target positions different from the active site has the advantage
of remaining theoretically potent against INSTI-resistant strains.
Among the noncatalytic site inhibitors described so far, the most
promising molecules target the LEDGF/p75 binding pocket at the HIV-1
IN CCD dimerization interface; these new molecules, which go by a
variety of names (see refs 366 and 367 for detailed reviews), will be referred to here as allosteric IN
inhibitors (ALLINIs, Figure d).

Results of biochemical and cell-based infection
assays provided initial evidence that the CCD–CCD interface
might serve as a target for antiviral drug development. X-ray crystallography
was used to screen for small-molecule binders of the HIV-1 IN CCD,
and micromolar concentrations of one compound, 3,4-dihydroxyphenyltriphenylarsonium
bromide, which engaged the CCD dimer interface at a region that was
later confirmed as the LEDGF/p75 binding site, inhibited IN 3′-processing
and strand transfer activities in vitro.[368] Overexpression of LEDGF/p75 IBD-containing proteins that lacked
chromatin-binding activity moreover inhibited HIV-1 replication at
the integration step.[295,369] Further highlighting the host
factor-binding pocket for drug development, the combination of IBD
protein overexpression with LEDGF/p75 knockdown could effectively
cripple HIV-1 infection.[370]

The most
advanced ALLINIs, derived from quinoline-based acetic
acid, were independently discovered using two different approaches.
Debyser and colleagues[312] used the HIV-1
IN CCD-LEDGF/p75 IBD cocrystal structure to screen in silico for LEDGF/p75
binding site inhibitors, whereas another group discovered highly similar
small molecules in a high-throughput screen for antagonists of IN
3′-processing activity.[313,371] Optimized ALLINI compounds
are highly potent, inhibiting HIV-1 replication with half-maximal effective
concentration (EC₅₀) values in the low nanomolar (∼10–100 nM)
range.[310,313,315,317] This family of small molecules induces IN multimerization
and inhibits IN catalysis[310,311,316,317] and IN-LEDGF/p75 binding[310−312,316,317] in vitro. Cocrystal structures of the inhibitors with the IN CCD
dimer revealed that the molecules are bona fide allosteric inhibitors,
as they engage the LEDGF/p75 binding site, which is distal from the
enzyme active site.[46,310−314,317] A crucial component of the ALLINI
pharmacophore is the carboxylic moiety (shown red in Figure d), which hydrogen bonds with
the backbone amides of IN residues Glu170 and His171, mimicking the
LEDGF/p75 Asp366 bidentate interaction with IN (Figure e). Another important feature is an aromatic
side chain (blue in Figure d), which mimics hydrophobic interactions made by Ile365 of
LEDGF/p75.

As expected, initial experiments revealed inhibition
of integration
during HIV-1 infection, with drug resistance mapping to the IN coding
portion of the viral pol gene.[310,312] However, follow-up work revealed that the compounds are far more
potent when they are present during the late phase of the HIV-1 lifecycle
as compared to the acute phase of infection, when reverse transcription
and integration occur.[45,46,317,372] Consistent with the in vitro
data, ALLINIs induce HIV-1 IN multimerization in the context of virus
particles, which results in a catastrophic defect during virion maturation.[45,46,372] The viral ribonucleoprotein
(RNP) complex, composed mainly of viral RNA and nucleocapsid protein,
is normally housed within a conical core composed of the CA protein.
ALLINIs apparently uncouple the internal placement of the RNP within
the core, yielding so-called “eccentric” virions with
the RNP situated outside of the core, usually in association with
the viral membrane.[45−47,372] Virion protein, RNA,
and cellular tRNA content are unaffected by ALLINI treatment,[45,46,372,373] and the defective virions accordingly support normal levels of endogenous
RT activity in vitro.[45,373]

When present during the
acute phase of HIV-1 infection, ALLINIs
specifically inhibit integration without affecting the preceding reverse
transcription step.[45,46,310,312,315,317] Although drug-treated particles
enter target cells normally,[45,46] they are reportedly
defective for reverse transcription, integration,[45,46,315,372] and nuclear
import of the PIC.[372] Careful comparisons
of dose–response curves for inhibition of particle maturation,
reverse transcription, and HIV-1 infection, however, suggest that
the ability of the compounds to inhibit virion maturation accounts
for their full antiviral activity.[47] In
other words, drug-treated viruses are defective for reverse transcription
because the misplaced RNP is unable to support DNA synthesis in the
subsequently infected cell, as compared to a direct inhibition of
reverse transcription by ALLINIs. Intriguingly, the range of replication
defects ascribed to ALLINI-treated virions is reminiscent of the pleiotropic
nature of class II IN mutations on HIV-1 replication (section above). Moreover, class II
IN mutant virus particles harbor an eccentric RNP that is indistinguishable
from the one induced by ALLINI treatment.[30,45,46,223] By extension,
we suspect that the inability to encapsidate the RNP into the viral
core underlies the majority of replication defects ascribed to class
II HIV-1 IN mutant viruses. The ability to form eccentric RNPs has
been shown to depend on the presence of viral RNA, indicating that
IN may normally engage the viral genome, either directly or indirectly,
to orchestrate RNP encapsidation into the viral core.[47]

ALLINI potency is significantly increased when target cells are
depleted of LEDGF/p75, suggesting that the host factor can compete
with the compounds during the acute phase of infection.[46,298,315,374] By contrast, LEDGF/p75 depletion or overexpression does not influence
drug potency during particle assembly.[45,46,297,315] As LEDGF/p75 associates
constitutively with chromatin,[171,268,276] it seems that the inability of LEDGF/p75 to compete for compound
binding to IN during HIV-1 particle morphogenesis, which occurs at
the cell periphery or after the virus exits the cell, accounts for
the unique pharmacology of this anti-IN drug class.

As might
be expected from their known binding site at the IN CCD
dimer interface, ALLINIs retain potency against INSTI-resistant HIV-1
strains.[312,316,317,375] Unfortunately, ALLINIs seem
to possess a relatively low genetic barrier to resistance, as several
mutations mapping to the LEDGF/p75 binding cavity that greatly reduce
drug potency have been described following ex vivo virus passage.[310,312,315,316,375] Nevertheless, because these
molecules perform well in concert with INSTIs,[316,375] their clinical development is of great interest.
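The EC₅₀ values cited in this section for optimized ALLINIs can be read against a standard sigmoidal dose-response model. The Python sketch below assumes a Hill equation with a Hill coefficient of 1 and a hypothetical EC₅₀ of 50 nM, picked from within the ∼10–100 nM range mentioned in the text; neither parameter is taken from the cited studies, and the sketch is illustrative rather than a description of how those values were measured.

```python
def fraction_inhibited(conc_nM: float, ec50_nM: float, hill: float = 1.0) -> float:
    """Fractional inhibition of replication under a simple Hill model.

    conc_nM and ec50_nM are in nanomolar; hill is the (assumed) Hill coefficient.
    """
    if conc_nM <= 0:
        return 0.0
    scaled = (conc_nM / ec50_nM) ** hill
    return scaled / (1.0 + scaled)


# Hypothetical EC50 chosen for illustration only.
EC50_NM = 50.0

for conc in (5.0, 50.0, 500.0):
    print(f"{conc:7.1f} nM -> {fraction_inhibited(conc, EC50_NM):.2f} inhibited")
```

In such a model, a shift in apparent EC₅₀ (for example, through a resistance mutation or competition from a host factor) moves the whole curve along the concentration axis without changing its shape.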
Retroviral Integration as a Therapeutic Tool
Owing to their natural ability to stably integrate their genetic cargo
into a cell chromosome, retroviruses have long been studied as tools
for corrective gene therapy (see ref 376 for a current overview). Replication-defective
vectors derived from MLV were used in pioneering studies to correct
a variety of crippling diseases, including X-linked severe combined
immunodeficiency (SCID-X1),[377] Wiskott–Aldrich
syndrome,[378] and chronic granulomatous
disease.[379,380] However, a significant number
of patients from these trials developed severe adverse effects due
to clonal expansion of the treated cells, leading to leukemia.[378,381−383] Detailed characterization revealed that
many of these genotoxic events resulted from the integration of the
MLV vector in the vicinity of a growth-promoting proto-oncogene, such
as LMO2.[378,382−384] Deregulation of proto-oncogene expression has long been known as
a driver for retroviral genotoxicity,[385] and the safety of retroviral vectors for human gene therapy applications
has accordingly become a major priority in their development. An optimized
vector would in theory efficiently express its transgene cargo over
a long period of time without displaying adverse side effects on cellular
gene expression or physiology.

The propensity for MLV to target
integration to cellular promoters
and enhancers,[241,242,245] which was virtually unknown during the planning stages of the initial
SCID-X1 trials, in hindsight likely made MLV an unfortunate choice
for treatment. However, the field now has a much broader appreciation
of how different types of retroviruses target potentially unsafe genomic
features such as genes and enhancer regions.[243] In this vein, vectors based on α-retroviruses,[386] β-retroviruses,[387] or spumaviruses,[388] each of which targets
genes and enhancers to lesser extents than lentiviruses and γ-retroviruses,
respectively, might prove safer than MLV-based vectors. The identification
of the mechanisms of integration targeting for the lentiviruses and
γ-retroviruses has additionally opened up new approaches to
vector design. As one example, MLV-derived vectors show greatly reduced
propensity to integrate near transcriptional start sites in the
presence of BET protein inhibitors,[319] raising
the possibility of using such inhibitors with MLV vectors in the clinic.
Several drawbacks, however, seem likely to limit such approaches. In
addition to potential small-molecule toxicity, MLV retained a partial
tendency to target promoter-proximal regions in the presence of BET
inhibitors such as JQ-1.[319] Furthermore,
MLV vector titer was reduced significantly by JQ-1 treatment due to
inhibition of integration.[319−321] An alternative strategy is to
delete the IN C-terminal tail region that mediates BET protein binding.[321,334] Such vectors greatly reduce promoter-proximal integration with only
a modest effect on vector titer.[334,389] It will be
instructive to determine the carcinogenic potential of such IN deletion
constructs in animal models of MLV pathogenesis.

A separate
approach to MLV vector modification looks extremely
promising from initial clinical trials. The viral promoter and enhancer,
which are situated within the U3 region of the LTR, can be deleted,
resulting in so-called self-inactivating (SIN) vectors without severely
affecting reverse transcription or integration.[390] An SCID-X1 trial with an MLV SIN vector has failed to detect
evidence for leukemia during an initial 1–3 year observational
period, indicating that deletion of the viral enhancer might go a
long way to improve MLV-based vector safety.[391]

As fusion proteins between the LEDGF/p75 IBD[301−303] or BET protein ET domain[321] and chromatin
binding modules can effectively retarget integration, such protein
hybrids could in theory be used to steer lentivirus or γ-retrovirus
integration out of harm’s way. For example, a LEDGF/p75-based
fusion harboring the HP1α heterochromatin protein yielded an
overall integration pattern that was remarkably similar to random.[301] A key drawback of such approaches is the requirement
to express the retargeting factor in the cells that will receive the
retroviral vector. This approach accordingly seems to hold considerably
less promise than the attempts to reduce or eliminate genotoxicity
through direct vector modification.

Notwithstanding highly significant
genetropic integration targeting
by the lentiviruses, there is little evidence to suggest an operational
link between HIV-1 integration and carcinogenesis. A primary reason
may be the highly cytopathic nature of infection: infected CD4-positive
T cells, the primary targets of the virus, display an average half-life
of only ∼1.5 days.[392] However, cancer-related
genes are targeted about 5-fold more frequently by HIV-1 PICs than
expected.[240] Integrations in the vicinity
of growth-promoting genes moreover can help to drive the clonal expansion
of cells that constitute the latent viral reservoir.[227,228] Numerous HIV-1 gene products, including the surface envelope glycoprotein
Gp120[393] and viral protein R (Vpr),[394] are acutely cytopathic, and such genes are
deleted from HIV-based vectors. As is the case with MLV vectors, it
is critical to monitor HIV-based gene therapy trials for sites of
vector DNA integration by deep sequencing to catch potential longitudinal
emergence of dominant cell clones in patients.

As is the case
with MLV SIN vectors, results of preliminary clinical
trials with HIV-based vectors look promising, with little to no evidence
for the type of clonal dominance that was observed in initial MLV-based
trials.[395−398] The ability to safely integrate a corrective transgene in a long-lasting
target cell can in theory be expected to go a long way—perhaps
for a patient’s lifetime—to correct certain debilitating
diseases. The field accordingly cautiously awaits long-term
follow-up of ongoing retroviral-based gene therapy trials.
Perspectives
During the past 30 years the field of retroviral
integration progressed
from an epitome of experimental hardship and enigma to arguably the
best-understood DNA recombination system. Combined efforts of academic
groups and leading pharmaceutical companies resulted in the discovery
of potent HIV-1 IN inhibitors, characterization of the first cellular
cofactors of retroviral INs, and elucidation of the mechanistic and
structural details of retroviral integration.

One clear vector
for future research will be expansion of the intasome
structure repertoire. In particular, elucidation of HIV-1 or lentiviral
intasome structure will be of great importance to help the rational
design of next-generation INSTIs and improved ALLINIs. Furthermore,
characterization of the HIV-1 intasome may help to identify new pockets
that are potentially druggable by allosteric inhibitors. The development
of resistance to the antiretroviral drug arsenal is a major health
issue, making it important to find new therapeutic strategies. Many
steps modulating retroviral integration are still poorly described
and warrant further investigation. Among these is the mechanism
by which retroviruses protect themselves against autointegration.
One could imagine the development of small molecules that could be
used to trigger suicidal autointegration before the PIC encounters
host chromatin.

Another open question is the mechanism of retroviral
STC disassembly.
Since the available TCC and STC structures are very similar, how IN
signals the completion of strand transfer to the cell to allow DNA
repair remains unknown. Unravelling the mechanism and the cellular
proteins involved in this process may well lead to the development
of new drugs targeting these steps. A more detailed characterization
of the unexpected role of IN in HIV-1 particle morphogenesis could
also lead to new ways of inhibiting viral replication.

Studies
on retroviral IN host factors have clarified the molecular
mechanisms underlying integration site selection. The recent description
of MLV IN targeting factors BRD2–4 together with the lentiviral
IN cofactor LEDGF/p75 established the concept of bimodal tethering
as a major mechanism for integration site selection. These discoveries
opened new windows toward tuning the specificity of retroviral integration.
Future studies will elucidate if the other retroviral families rely
on similar pathways to select suitable chromatin environments and
will hopefully give important insights into improving the safety of
retroviral gene therapy vectors. Given the parallel development of
gene-editing technologies such as CRISPR (clustered regularly interspaced
short palindromic repeats)-Cas9 (reviewed in ref 399), there has been in recent
years a resurgence of interest in human gene therapy. Yet, notwithstanding
its lauded efficiency, gene editing using CRISPR-Cas9 involves generation
of a double-strand DNA break at the target locus, a highly genotoxic
chromosomal lesion in its own right. Therefore, it remains to be seen
which methodology might evolve into a safer therapy approach in the
future.