Valentina Poletti1, Fulvio Mavilio2. 1. Genethon, 1bis rue de l'Internationale, 91002 Evry, France. 2. Department of Life Sciences, University of Modena and Reggio Emilia, 41125 Modena, Italy.
Abstract
Replication-defective retroviral vectors have been used for more than 25 years as a tool for efficient and stable insertion of therapeutic transgenes in human cells. Patients suffering from severe genetic diseases have been successfully treated by transplantation of autologous hematopoietic stem-progenitor cells (HSPCs) transduced with retroviral vectors, and the first of this class of therapies, Strimvelis, has recently received market authorization in Europe. Some clinical trials, however, resulted in severe adverse events caused by vector-induced proto-oncogene activation, which showed that retroviral vectors may retain a genotoxic potential associated with proviral integration in the human genome. The adverse events sparked a renewed interest in the biology of retroviruses, which led within a few years to a remarkable understanding of the molecular mechanisms underlying retroviral integration site selection within mammalian genomes. This review summarizes the current knowledge on retrovirus-host interactions at the genomic level, and the peculiar mechanisms by which different retroviruses, and their related gene transfer vectors, integrate in, and interact with, the human genome. This knowledge provides the basis for the development of safer and more efficacious retroviral vectors for human gene therapy.
Integration in the genome of a host cell is a key step in the life cycle of a retrovirus, allowing stable transmission of the viral genome to the host cell progeny and persistent viral gene expression. Retroviral integration is a non-random process whereby the viral RNA genome, reverse transcribed into double-stranded DNA and assembled in a pre-integration complex (PIC), associates with the host cell chromatin and integrates in its proviral form in the genome through the activity of the viral integrase (IN), a specialized protein encoded by the viral pol gene. Early studies on viral integration were based on in vitro models, which identified several physical factors playing a role in the process, such as nucleosome-induced DNA bending or steric hindrance by DNA-binding proteins. In vivo, however, viral integration is an active process involving mainly cellular and viral players that have been, and still are, extensively investigated. When the complete sequence of the genomes of several mammals, including humans, became available, PCR-based methods were developed to amplify, clone, and sequence the junctions between proviral and host genome,2, 3 with the aim of mapping proviruses on genomic DNA and understanding the rules governing the integration process. Thanks to the spectacular development of DNA sequencing technology, ligation-mediated (LM) and linear amplification-mediated (LAM) PCR have been progressively refined, allowing extensive mapping of viral integration sites (ISs) in mammalian genomes in a quantitative fashion.
The first pioneering studies showed that retroviruses select their genomic target sites in a sequence-independent manner, with preferences for transcribed genes in the case of HIV or gene promoters in the case of the Moloney murine leukemia virus (MLV).
More recently, massive parallel sequencing has greatly increased the resolution of integration maps, while genome-wide association with genetic and epigenetic annotations of the human genome provided crucial clues to the understanding of the molecular determinants of target site selection. In summary, these studies revealed that each retrovirus has a unique and peculiar pattern of integration within mammalian genomes, which is faithfully reproduced by the gene transfer vectors derived thereof. The non-randomness of viral integration patterns has significant implications in terms of biosafety for gene therapy applications.
Retroviruses Have Different Integration Preferences
Retroviruses are divided into seven genera (alpha-, beta-, gamma-, delta-, and epsilon-Retroviridae, Spumaviridae, and Lentiviridae), and integration characteristics and preferences are known for at least one member of each genus, except for epsilon-retroviruses. Integration profiles have been defined for members of the lentivirus, spuma-retrovirus, alpha-retrovirus, and gamma-retrovirus genera. Not surprisingly, the extent of available scientific knowledge on target site selection and its determinants directly reflects the clinical relevance of each retrovirus type; thus, the HIV and MLV integration profiles are the most extensively characterized.
Based on IS preferences, retroviruses can be classified into roughly three groups: MLV, foamy virus (FV), and human T cell leukemia virus (HTLV), which integrate preferentially around transcription start sites (TSSs) and in transcriptional regulatory elements; HIV and simian immunodeficiency virus (SIV), which preferentially target transcribed sequences; and avian sarcoma-leucosis virus (ASLV), which shows little preference for any genomic feature. The different integration patterns and propensity for specific genomic characteristics indicate the involvement of specific viral and cellular factors in the integration process. While several host cell factors play important roles in target site selection, genetic and biochemical evidence indicates that the viral IN is the major viral determinant for most retroviruses. Unsupervised clustering of IS patterns and phylogenetic analysis of INs of six different retroviruses showed that global and local IS preferences correlate with the sequence and/or structure of viral INs, suggesting a strong correlation between target site selection and viral evolution.
The choice of a particular set of ISs is essential for viral fitness, because it influences not only the transcriptional regulation the provirus will be subjected to, but also any interference of the provirus with the host genome transcription.
Early studies demonstrated that DNA wrapped in nucleosomes is a favored target for retroviral integration compared to naked DNA,6, 7, 8 with the outward-facing DNA major groove being a preferred target region. Much is known about the sequence constraints for efficient integration of retroviruses. A CA dinucleotide is invariably positioned exactly 2 bp from each end of the viral DNA. The sequences extending up to 15 bp into the host DNA have significant influence on integration efficiency, although retroviral integration is not a sequence-specific process.4, 9, 10, 11, 12 The first analysis of hundreds of HIV-1, MLV, ASLV, and SIV ISs in mammalian cells showed only a statistically weak palindromic consensus centered on the virus-specific duplicated target site sequence at the insertion sites. Later analysis of larger datasets confirmed the occurrence of a weakly conserved consensus at the ISs, not recurrent among different retroviruses and enriched when integration is studied on naked DNA in vitro, suggesting that the consensus is mainly selected by the integration machinery itself, most likely reflecting the spatial or energy requirements of the integration complex.9, 14 These studies indicated that the linear DNA sequence plays a minor role in the integration preferences of retroviruses, which depend essentially on viral and cellular determinants and on their cell-specific availability and interaction.
Lentiviruses Target Transcribed Genes
HIV-1 is the best-characterized representative of the lentivirus genus, and the etiological agent of AIDS. As soon as LM-PCR mapping technology became available, the HIV integration pattern in the human genome was characterized, showing a strong preference for gene bodies, with up to 80% of the proviruses located inside an active transcriptional unit. As a consequence, HIV integration is influenced by the transcription pattern of the target cell.3, 15, 16 Higher resolution maps (up to 150,000 ISs) of the integration of HIV-1-derived lentiviral vectors (LVs) were determined on human primary T cells and CD34+ hematopoietic stem-progenitor cells (HSPCs). These studies confirmed the HIV-1 preference for transcribed gene bodies, and revealed a negative correlation between integration frequency and TSSs and other typical features of gene promoters, such as CpG islands, G/C-rich sequences, DNase I hypersensitive sites, and clusters of transcription factor binding sites (TFBSs).17, 18 The association of ISs with genome-wide maps of histone modifications revealed another interesting feature: HIV-1 integration is strongly associated with epigenetic marks of transcribed gene bodies such as H4K20me1, H3K36me3, H2BK5me1, and H3K27me1.14, 19, 20 Interestingly, HIV-1 has apparently evolved to avoid integrating into active transcriptional regulatory regions, such as promoters and enhancers, as defined by mono-, di-, and tri-methylation of H3K4 and histone hyperacetylation, or in genes controlling cell development and differentiation, including proto-oncogenes (Figure 1). Genes targeted by HIV-1 and LVs are mostly located in euchromatic regions in the outer, membrane-proximal portion of the cell nucleus in close correspondence with the nuclear pore, indicating a strong association between nuclear entry and integration mechanisms.
Conversely, HIV and LVs strongly disfavor genes located in heterochromatic regions marked by H3K27me3 and H3K9me3,14, 17, 20 including the nuclear lamin-associated heterochromatin. Transcriptionally active genes are also disfavored if located centrally within the nucleus.
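The gene-body preference described above is typically quantified by comparing the fraction of mapped ISs that fall inside annotated transcription units with the same fraction for a set of control sites. The following is a minimal sketch of that calculation, not the pipeline used in the cited studies: the function names, the `(chromosome, position)` site format, the half-open gene intervals, and the use of random control sites are all assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def fraction_in_genes(sites, genes_by_chrom):
    """Fraction of (chrom, pos) sites lying inside any gene interval.

    genes_by_chrom maps a chromosome name to a list of (start, end)
    intervals (hypothetical input format, half-open coordinates).
    """
    merged = {c: merge_intervals(iv) for c, iv in genes_by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}
    hits = 0
    for chrom, pos in sites:
        intervals = merged.get(chrom, [])
        # Index of the last interval starting at or before pos, if any.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(sites) if sites else 0.0


def fold_enrichment(observed_sites, control_sites, genes_by_chrom):
    """Ratio of the in-gene fraction for observed versus control sites."""
    control = fraction_in_genes(control_sites, genes_by_chrom)
    observed = fraction_in_genes(observed_sites, genes_by_chrom)
    return observed / control if control else float("inf")
```

In practice the control set would need to be matched to the experimental design (for example, for restriction-site or mappability biases); the sketch ignores this and only illustrates the overlap arithmetic.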
Figure 1
Differential Distribution of Gamma-Retroviral and Lentiviral Integration Sites in the Human Genome
MLV and HIV integration sites are differentially distributed within the genome of a target cell. MLV preferred sites are tightly clustered in and around active regulatory elements (red arrows), whereas HIV preferred sites are spread along transcription units, grouped in larger clusters (blue arrows).
Mechanisms of Lentiviral IS Selection: The Tethering Factors
HIV integration was the first comprehensively described model of target site selection based on protein-mediated tethering of the retroviral PIC to the host cell chromatin. Several cellular proteins have been isolated as physically bound to lentiviral PICs, and for some of them the association occurs via direct interaction with the HIV-1 IN. These include members of the DNA repair machinery such as hRad1, components of chromatin remodeling complexes such as INI1 and EED, the constitutive chromatin component HMGI(Y), and most importantly, the lens epithelium-derived growth factor (PSIP1/LEDGF/p75), a ubiquitously expressed nuclear protein tightly associated with chromatin throughout the cell cycle. The role of LEDGF/p75 in mediating HIV infectivity has been extensively investigated because of its tight interaction with the viral IN and its role in stimulating the IN catalytic activity in vitro.27, 28, 29 LEDGF/p75 knockdown significantly reduces HIV-1 infectivity, showing the functional role of this protein as the major cellular binding partner of HIV-1 IN.30, 31
LEDGF/p75 was initially discovered as a human transcriptional coactivator interacting with a number of cellular proteins, including JPO2,31, 33 Cdc7-activator of S-phase kinase (ASK), the transposase pogZ, and menin, which links LEDGF/p75 with mixed-lineage leukemia (MLL) histone methyltransferase, causing MLL-dependent transcription and leukemic transformation. LEDGF/p75 belongs to the hepatoma-derived growth factor-related protein (HRP) family, which comprises five additional members (HDGF, HRP1-3, and LEDGF/p52, an alternatively spliced, smaller isoform of LEDGF) that, except for HRP2, lack the domains necessary for HIV-1 IN interaction.
LEDGF/p75 is characterized by a conserved N-terminal PWWP domain, a basic-type nuclear localization signal, two AT-hook DNA binding motifs, and three highly charged regions (CR1-3) that allow it to tightly engage chromatin throughout the cell cycle.29, 38 The C-terminal region contains the IN-binding domain (IBD), which directly interacts with HIV-1 IN. The N-terminal PWWP domain is the key determinant for the site-selective association of LEDGF/p75 with chromatin: HIV-1 ISs generated in the presence of PWWP domain-defective LEDGF/p75 mutants differ substantially from those generated in the presence of wild-type LEDGF/p75.
NMR structures of the LEDGF/p75 PWWP domain revealed two distinct functional interfaces: a hydrophobic pocket that interacts with the H3K36me3 histone tail and an adjacent basic interface that non-specifically engages DNA.41, 42 Interestingly, the LEDGF/p75 PWWP domain exhibits low binding affinity for either an H3K36me3 peptide or naked DNA, whereas it interacts tightly with mononucleosomes containing an H3K36me3 analog, indicating that cooperative binding of LEDGF/p75 with both the H3K36me3 tail and nucleosomal DNA is essential for its tight and site-selective chromatin association. Indeed, mutations introduced in either the hydrophobic pocket or the basic surface significantly compromised the ability of LEDGF/p75 to both associate with chromatin and stimulate HIV-1 integration. These findings collectively indicate that LEDGF/p75-mediated tethering of lentiviral PICs to actively transcribed genes provides IN with increased access to nucleosomal DNA, which is the favored substrate for integration both in vitro and in infected cells6, 14, 44 (Figure 2A).
Figure 2
HIV and MLV Pre-integration Complexes Are Tethered to Chromatin by Different Mechanisms
(A) The HIV pre-integration complex (PIC) is tethered to transcribed gene regions (active gene body), marked by specific histone modifications (H4K20me1, H3K27me1, H3K36me3), by the LEDGF/p75 protein, which interacts with the HIV integrase (IN) through an integrase-binding domain (IBD) and with histones through its PWWP domain. An AT-hook domain (AT) mediates interaction with AT-rich DNA sequences on genomic DNA. (B) The MLV PIC is tethered to transcriptionally active promoters (left), enhancers, and super-enhancers (right) through interaction of IN with bromodomain/extraterminal domain proteins (BETs) bound to acetylated histones. A simplified transcription initiation complex is shown on the promoter, including the TATA-binding protein (TBP), the basal transcription factors TFIIB and TFIIB, the Mediator complex, and an elongating RNA polymerase II (Pol-II). A simplified histone acetylation complex is shown on the enhancer, including histone acetyl transferases (HATs) and p300. Promoters (left) are marked by H3K4me2/3 histone modifications, histone acetylation, and the H2AZ histone variant. Enhancers (right) are marked by H3K4me1/2 and H3K27ac histone modifications.
Interestingly, LEDGF/p75 fails to interact with INs from other retroviral genera.28, 45, 46 In vitro assays with purified INs have revealed that LEDGF/p75 significantly stimulates the strand transfer activity of lentiviral but not other retroviral INs.27, 29, 39, 47 Studies using LEDGF/p75 knockout cells revealed 5- to 80-fold defects in HIV-1 infectivity, associated with ∼2- to 12-fold reduction in integration.48, 49, 50, 51 Significant inhibitory effects on HIV-1 replication were also observed in cells engineered to express the LEDGF/p75 IBD only.52, 53, 54 LEDGF/p75 depletion and overexpression of dominant-interfering IBD constructs do not significantly affect HIV-1 reverse transcription while selectively impairing provirus integration.
Genome-wide IS mapping provided additional evidence for the role of LEDGF/p75 in the selectivity of HIV-1 integration. LEDGF/p75 knockdown significantly reduces integration into active genes,46, 55 while complete knockout severely impairs infectivity and dramatically changes integration preferences, with a significant percentage of proviruses aberrantly located near TSSs.48, 49 Interestingly, chimeric LEDGF/p75 proteins can retarget lentiviral integration: replacing the N-terminal PWWP domain and AT hooks with a plant homeodomain redirects HIV-1 integration to TSSs, while the use of the chromobox homolog 1 (CBX1) and heterochromatin protein 1 (HP1) alpha chromatin-binding modules randomizes the integration pattern.56, 57 Mapping of the LEDGF/p75 chromatin-binding profile has revealed a preference for binding active transcription units, which paralleled the enhanced HIV-1 integration frequencies at these locations.
Collectively, these findings provide strong evidence that LEDGF/p75 tethers PICs to active transcription units during HIV-1 integration.
Although LEDGF/p75 can potently stimulate HIV-1 IN catalytic function in vitro,27, 29, 39, 59, 60 it is unclear whether it provides this function during natural virus infection. Normal levels of HIV-1 PIC activity are maintained in LEDGF/p75 knockout cells, indicating that LEDGF/p75 may provide chromatin-tethering functions without contributing to the formation of a catalytically active PIC. HIV-1 integration in transcription units is over-represented even in LEDGF/p75 knockout cells,48, 49, 50 suggesting a potential role of other cellular proteins in IS selection. In particular, HRP2 was closely investigated because of its structural similarity with LEDGF/p75: in vitro assays with purified proteins demonstrated that HRP2 tightly binds HIV-1 IN and significantly stimulates its catalytic function, although, unlike LEDGF/p75, it does not bind chromatin throughout the cell cycle. HRP2 depletion in cells containing normal levels of LEDGF/p75 has no effect on HIV-1 infectivity or integration preferences,50, 52, 62, 63 while depletion of both HRP2 and LEDGF/p75 reduces integration into active genes.63, 64 The preference of HIV-1 for active genes remains greater than random even in LEDGF/HRP2 double-knockout cells, suggesting that additional host factors may play a role in HIV target site selection.
Lentiviral Integration and Nuclear Import
A striking feature of HIV-1 integration is the preference for large chromosomal domains, or hotspots, rich in active genes,3, 17, 65, 66 which is not explained by the chromatin-binding characteristics of LEDGF/p75 or other chromatin interactors. Recent research on the mechanisms of HIV active entry into the cell nucleus showed that the interaction between the HIV PIC and components of the nuclear import machinery plays a key role in tethering HIV integration to its target chromatin, and pointed to nuclear topology as a major determinant of target site selection. The nuclear pore complex (NPC) mediates docking and entry of HIV into the nucleus through direct interaction of the PIC with nucleoporins, thereby indirectly targeting HIV integration to the pore-proximal chromatin regions.21, 67 The nuclear pore is associated with 10- to 500-kb macrogenomic domains of open, actively transcribed chromatin. Knockdown of the Nup358/RanBP2 or Nup153 components of the NPC impairs HIV-1 nuclear entry and biases integration toward a non-canonical pattern, indicating that nuclear translocation and IS selection are coupled processes.68, 69, 70 Interestingly, HIV ISs in cells depleted of the chromatin-proximal Tpr nucleoporin, which is not required for HIV nuclear entry, are less associated with epigenetic marks of transcribed genes (H3K36me3), as revealed by super-resolution microscopy. Interaction with the NPC is mediated by the HIV-1 capsid protein (CA) encoded by the gag gene.
Accordingly, chimeric HIV-1 viruses containing the MLV Gag showed a reduced integration frequency into gene-rich regions.69, 70, 71
To account for the overall integration preferences of HIV-1, a two-step model has been proposed whereby NPC components direct HIV-1 PICs toward membrane-proximal euchromatic regions of high gene density, whereas LEDGF/p75 tethers the PIC to the transcribed gene bodies within these regions.69, 70 A recent study showed that the integration hotspots are in fact located in close proximity to the nuclear pore, and that the likelihood of integration into an active gene is a function of its radial distance from the nuclear membrane (Figure 3). Because the topological organization of chromatin is cell specific, integration hotspots vary from one cell type to another, reflecting their position with respect to the NPC. Hotspots are therefore essentially generated by the topological organization of chromatin in the cell nucleus, while the local preference for transcribed regions within hotspots is actively determined by the PIC through its tethering interactions primarily with LEDGF/p75 (Figure 3). Because all the integration preferences are mediated by the protein components of the PIC, LVs faithfully reproduce the integration characteristics of parental HIV.
Figure 3
Overview of Lentiviral Nuclear Entry and Integration
The lentiviral pre-integration complex (PIC) docks to and passes through the nuclear pore via the interaction with Nup153 and other nucleoporins, and integrates preferentially in the euchromatic regions (in green) near the nuclear pores. The lentiviral PIC is then targeted to regions marked by histone modifications specific for transcribed genes (H3K36me3) through tethering by LEDGF/p75 (see Figure 2). The hot-cold color scale indicates low (cold) and high (hot) probability of lentiviral integration. IN, HIV-1 integrase; LAD, lamina-associated domains; Nup153, nucleoporin 153. (Modified from Marini et al., 2015.)
Gamma-Retroviruses Target Transcriptionally Active Promoters and Enhancers
MLV belongs to the gamma-retrovirus genus, historically known as Oncoretroviridae since the discovery that viruses of this group can transmit leukemia to newborn mice. Because of their relatively simple structure and efficiency in infecting hematopoietic cells, replication-defective MLV-based vectors were the first gene transfer tools used in gene therapy for hematological disorders. The MLV integration profile has been extensively analyzed in HSPCs and other cell types for many years, particularly after the occurrence of insertional leukemia in patients treated for inherited immunodeficiencies.73, 74 High-throughput sequencing technology recently extended the number of analyzed MLV ISs into the millions, providing deep insight into the mechanism of MLV target site selection.
Initial, low-resolution integration studies showed that MLV and MLV-derived vectors had a modest bias for integration into active genes with a peculiar distribution around TSSs, with ∼20% of insertions landing within 2.5 kb upstream or downstream of the +1 position of target genes.4, 15, 16, 65 Thus, gene promoters were considered for some time as the major target of MLV integration. Later studies showed that MLV ISs are enriched in RNA Pol-II binding sites, CpG islands, DNase I hypersensitive sites, TFBSs, phylogenetically conserved non-coding sequences, and binding sites for the p300/CBP histone acetyl transferase, often predictive of cis-acting regulatory elements.20, 71, 75 High-definition integration maps showed that regions bound by the Pol-II basal transcriptional machinery, such as core promoters, are protected from MLV insertion,17, 20 confirming that integration is directed to TF-bound transcriptionally active elements and not simply to open chromatin regions.
Genome-wide integration and association studies showed that the MLV bias for TSSs is just a consequence of a more general preference of MLV PICs for active regulatory elements, particularly enhancers and super-enhancers.17, 18, 76, 77, 78 High-resolution studies carried out in human HSPCs and committed progenitors,17, 18, 78 T cells,20, 79 and keratinocytes indicated that MLV ISs are highly clustered and occur almost exclusively in regions carrying epigenetic signatures of transcriptionally active regulatory regions, such as acetylation of H3K27 and H3K9, and mono-, di- and tri-methylation of H3K4 (Figure 1). Symmetrically, typical heterochromatic marks, such as H3K27me3 and H3K9me3, are significantly under-represented at these genomic loci.14, 17, 20, 76 As a consequence, the MLV integration pattern is strongly cell specific and depends on the enhancer and promoter usage of any given cell type: all studies invariably showed a correlation between MLV integration and gene expression levels,3, 16, 17, 20, 65, 80 and a significant bias toward functional categories such as developmental regulation, cell growth, and cell differentiation.20, 65, 76, 80, 81 These integration characteristics suggest that oncoretroviruses may have developed a unique integration strategy that, by coupling target site selection to cell-specific gene regulation, maximizes the chances of maintaining proviral expression in the target cells. In addition, integration around promoters and regulatory elements of cell growth and differentiation may increase the chances of inducing clonal expansion or transformation, and ultimately favor viral propagation. This hypothesis is supported by recent chromatin conformation capture studies showing that MLV ISs co-localize in three-dimensional clusters enriched for known cancer genes in the nucleus of target cells, suggesting that MLV proviruses engage in long-range chromatin interactions that may favor oncogene deregulation.
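The TSS bias reported in the early MLV studies (roughly 20% of insertions within 2.5 kb of the +1 position) is a windowed-distance statistic. The sketch below shows one way such a fraction could be computed from mapped ISs and a TSS annotation. It is an illustration only: the function name, the `(chromosome, position)` input format, and the strand-agnostic distance are assumptions, and the default `window` of 2,500 bp simply mirrors the figure quoted in the text.

```python
from bisect import bisect_left


def fraction_near_tss(sites, tss_by_chrom, window=2500):
    """Fraction of (chrom, pos) sites within `window` bp of the nearest TSS.

    tss_by_chrom maps a chromosome name to a list of TSS coordinates
    (hypothetical input format). Distance is measured without regard to
    gene strand, so upstream and downstream hits are counted together.
    """
    sorted_tss = {c: sorted(p) for c, p in tss_by_chrom.items()}
    near = 0
    for chrom, pos in sites:
        tss = sorted_tss.get(chrom, [])
        i = bisect_left(tss, pos)
        # The nearest TSS is either the one just before or just after pos.
        candidates = tss[max(i - 1, 0):i + 1]
        if any(abs(pos - t) <= window for t in candidates):
            near += 1
    return near / len(sites) if sites else 0.0
```

Running the same function on a set of control sites gives the baseline against which the observed fraction can be compared, and varying `window` shows how tightly the sites cluster around the +1 position.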
Mechanism of Gamma-Retroviral Target Site Selection
As for HIV, at least two determinants are involved in MLV target site selection, the viral PIC and the host cell factors that tether it to chromatin. The viral determinants of MLV target site selection have been extensively investigated by genetic and biochemical analysis. Early studies indicated that an HIV-1 vector packaged with an MLV IN acquires the MLV-specific bias for TSSs, CpG islands and TFBS-rich regions,71, 75 identifying the IN protein as the main viral determinant of MLV target site selection. Given the preference for active transcriptional regulatory elements, components of the basal RNA Pol-II transcriptional machinery were obvious candidates as cell-derived tethering factors. Early data indicated that the protein BAF (barrier to autointegration factor) is physically associated with the MLV PICs. BAF was originally identified as an inhibitor of suicide integration of the MLV provirus, which promotes efficient intermolecular DNA recombination. Although essential for PIC integration activity, interaction with BAF did not explain the MLV integration preferences. A yeast two-hybrid screen of proteins potentially interacting with the MLV IN later provided a number of potential binding targets, many of which were indeed components of active chromatin and transcription complexes. More recent mass spectrometry-based proteomic analysis of human cellular proteins co-purifying with recombinant MLV IN identified the bromodomain-containing BET proteins BRD2, BRD3, and BRD4 as main binding partners of the viral IN protein (Figure 2B). BRD2 was in fact one of the interactors identified by the yeast two-hybrid screening. Enhancers are the major source of BRD4-dependent transcriptional activation, and genes under the control of super-enhancers are particularly sensitive to BET inhibition.
BRD2, BRD3, BRD4, and BRDT belong to the extended BET protein family that includes BRD1, BRD7, BRD8, and BRD9.
BRD2, BRD3, and BRD4 are ubiquitously expressed, whereas BRDT is only expressed in the testis. BET proteins have been implicated in transcription, DNA replication, and cell-cycle control. They exhibit two N-terminal bromodomains (BD-I and BD-II) that bind acetylated H3 and H4 tails, but not their unmodified counterparts, and two conserved motifs, A and B, which are positively charged and could contribute to DNA binding. The simultaneous binding to acetylated histone tails and nucleosomal DNA could be a generic mechanism used by chromatin tethers to achieve tighter and more specific interactions with chromatin. The conserved C-terminal extra-terminal (ET) and SEED (Ser/Glu/Asp-rich) domains directly bind several cellular proteins including transcription factors, chromatin-modifying proteins, and histone modification enzymes. BET proteins directly bind acetylated histone tails, so their association with strong enhancers is mediated by direct interactions with H3K27ac and H3K9ac and/or with other chromatin readers. Chromatin binding sites of BET proteins have been mapped by chromatin immunoprecipitation sequencing (ChIP-seq) experiments and show a positive correlation with MLV, but not with HIV-1 or ASLV ISs.
Binding of BRD2, BRD3, and BRD4 to the MLV IN is mediated by the ET domain.86, 94, 95 Solution NMR and protein interaction studies showed that the unstructured C-terminal polypeptide (TP) domain of the MLV IN directly binds the BET ET domain and acquires a secondary structure upon complex formation. Deletion of the TP domain does not disrupt the catalytic properties of IN but widely alters the MLV targeting profile in human 293T cells, with significantly reduced association with TSSs, CpG islands, and BRD2, BRD3, and BRD4 binding sites.
Downregulation of BET protein expression by small interfering RNAs (siRNAs), or inhibition of their binding to chromatin by the small-molecule inhibitor JQ-1, dramatically alters the canonical MLV integration profile in HEK293T cells, with a dose-dependent, 4-fold reduction of the integrations around TSSs.94, 95 Complementary genetic evidence for the role of BET association in directing MLV integration was provided by expressing an artificial LEDGF/p75-BET fusion protein in murine NIH 3T3 and human SupT1 cells, which retargeted MLV integration into actively transcribed gene bodies, mimicking the HIV-1 integration pattern.
Overall, these studies demonstrate that BET proteins are the MLV counterpart of LEDGF/p75 for HIV-1, i.e., specific chromatin tethers that interact with the PIC by binding the IN and possibly stimulating its enzymatic function (Figures 2 and 4). Although the interaction of MLV IN with BET proteins is probably the main determinant of gamma-retroviral integration target site selection, it is remarkable that even in the presence of potent inhibitors of BET activity such as JQ-1, integration at active regulatory elements is still substantially higher than in random controls or in HIV-1. This suggests that additional host and/or viral factors may contribute to the integration preferences of MLV. Early studies with chimeric HIV viruses packaged with MLV Gag proteins had suggested that other components of the PIC, in particular the p12 Gag protein, may have a role in MLV target site selection. The MLV p12 is a constituent of the PIC, but its function in the complex remains ill-defined. Imaging of MLV PIC traffic in live cells allowed the visualization of its docking to mitotic chromosomes and its release upon exit from mitosis. Docking occurs concomitantly with the breakdown of the nuclear envelope and is impaired in PICs carrying lethal p12 mutations. 
The insertion of a heterologous chromatin binding module into p12 restores PIC attachment to chromosomes, confirming the role of p12 as a tethering factor to mitotic chromosomes. Later studies, however, demonstrated that mutations altering the p12 interaction with chromatin have no detectable effects on the MLV integration pattern, indicating that p12 plays a role in MLV integration by allowing chromosome association but has no role in the target site selection process.
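The comparisons with "random controls" discussed above are usually expressed as a fold enrichment: the fraction of mapped ISs falling close to a genomic feature, divided by the same fraction for a set of control positions. The following Python sketch illustrates that calculation under simplifying assumptions and is not the pipeline used in the studies cited here. The function names, the single-chromosome coordinate system, the 2-kb window, and the use of uniformly distributed control sites are all hypothetical choices made for illustration; published analyses typically use control sets matched for mappability and restriction-site bias.

```python
import bisect
import random


def fraction_near_tss(sites, tss_positions, window=2000):
    """Fraction of sites lying within +/- `window` bp of any TSS.

    All coordinates are assumed to be on one chromosome (a simplification).
    """
    tss_sorted = sorted(tss_positions)
    if not sites:
        return 0.0
    hits = 0
    for pos in sites:
        i = bisect.bisect_left(tss_sorted, pos)
        # The nearest TSS is the one just left or just right of `pos`.
        neighbours = tss_sorted[max(i - 1, 0):i + 1]
        if neighbours and min(abs(pos - t) for t in neighbours) <= window:
            hits += 1
    return hits / len(sites)


def fold_enrichment(observed_sites, tss_positions, chrom_length,
                    n_random=10000, window=2000, seed=0):
    """Observed fraction near TSSs divided by the fraction for random sites.

    Control sites are drawn uniformly along the chromosome (an assumption;
    real analyses use matched controls).
    """
    rng = random.Random(seed)
    random_sites = [rng.randrange(chrom_length) for _ in range(n_random)]
    observed = fraction_near_tss(observed_sites, tss_positions, window)
    expected = fraction_near_tss(random_sites, tss_positions, window)
    return observed / expected if expected > 0 else float("inf")
```

In this framing, a value close to 1 corresponds to a random-like distribution, and the residual preference observed under BET inhibition corresponds to a value that drops but remains above 1.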
Figure 4
Overview of the Genomic Characteristics Favoring Gamma-Retroviral Integration
Retroviral integration occurs preferentially in active transcriptional regulatory elements (promoters, enhancers, and super-enhancers), associated to specific histone modifications (H3K4me1-3, H3K27ac) and bound by histone acetyltransferases and bromodomain/extraterminal domain-containing proteins (CBP/p300, BETs) (see Figure 2). The dotted line indicates the disassembled nuclear membrane, necessary for the MLV PIC to access chromatin. IN, MLV integrase; MLV, Moloney leukemia virus.
Alpha-Retroviruses Integrate Almost Randomly in Mammalian Cells
ASLV is a member of the alpha-retrovirus family whose natural host is chicken. The natural viral host tropism can be altered in ASLV-derived vectors by envelope pseudotyping, in order to efficiently infect mammalian cells. Genome-wide analysis of the integration pattern of alpha-retroviral vectors in human CD34+ HSPCs revealed a weak preference for open chromatin regions, as in the case of gamma-retroviral vectors and LVs, but no bias for transcribed genes or transcriptional regulatory elements, as defined by the epigenetic marks H3K4me1 and H3K4me3. The nearly random ASLV insertion profile in mammals has encouraged the development of optimized vectors as gene transfer tools for gene therapy applications. However, the biology of ASLV-host interaction is much less studied than that of HIV or MLV, and nothing is known about the viral and cellular determinants of its association with chromatin.
Integration Preferences of Other Retroviruses
The mouse mammary tumor virus (MMTV) is a representative of the beta-retrovirus family. The only MMTV integration profile so far described was obtained from a very limited (<500) set of ISs in the human and murine genomes, essentially from infected mammary (murine NmuMG and human Hs587T) and non-mammary (HeLa and HEK293) cell lines. In all cases, MMTV displayed the most random IS distribution among all known retroviruses, with no preference for active genes, TSSs, gene-dense regions, CpG islands, or DNase hypersensitive sites.102, 103
The human T cell leukemia virus type 1 (HTLV-1) is the only member of the delta-retroviral family for which an integration profile has been determined to date. Initially, 541 unique HTLV-1 ISs were cloned and mapped in HeLa cells, showing that the virus integrates into the human genome with a modest though significant preference for TSSs, transcription units, promoters, and gene-dense regions. More recent data, obtained from >2,100 ISs of HTLV-1 in the human Jurkat T cell line, revealed a strong bias for genes, promoters (identified by CpG islands), and epigenetic marks associated with gene expression control. The same pattern was observed in vivo in peripheral blood mononuclear cells of HTLV+ patients, but mitigated by a strong selection, exerted by the host immune response against the HTLV-1 infection, against clones harboring a highly expressed provirus.
The human FV is the prototype of the Spumaviridae genus of retroviruses. First isolated from a human cell line, it was later reported to have a chimpanzee origin. FV vectors have been designed and pseudotyped to have a broad host range, large packaging capacity, and high transduction efficiency in human hematopoietic cells, and developed as an alternative to MLV vectors for gene therapy of hematological disorders, particularly AIDS. 
Low-resolution profiling of FV ISs showed integration preferences similar to, though weaker than, those of MLV for CpG islands and TSSs,107, 108 confirmed by more recent analysis on human HSPCs after transplantation in immunodeficient mice. One study analyzed 139 FV-derived vector insertion sites recovered ex vivo from human HSPCs after transplantation in NOD/SCID mice and found a weak preference for integration close to oncogenes. Given the very small size of the analyzed samples and the confounding effect of the in vivo selection, it is difficult to conclude from these studies whether FV may have evolved a tethering mechanism similar to that of MLV. The FV Gag contains a C-terminal chromatin-binding site (CBS) that allows the interaction of PICs with the core histones H2A and H2B on host chromatin. FV PICs can be retargeted to satellite elements and H3K9me3-positive heterochromatin by modifying the Gag CBS and the IN proteins, indicating that a Gag/IN-based tethering mechanism is most likely at the basis of virus-host interaction also in the case of spumaviruses.
Conclusions
A large body of studies on the biochemistry and molecular genetics of retroviral integration has provided a wealth of information on the remarkably complex and evolved patterns of interactions between the retroviral integration machinery and the mammalian, and particularly human, genome. This knowledge has helped in understanding the basis of the “genotoxic” integration events that caused malignant and myelodysplastic proliferation in some gene therapy trials for inherited blood diseases, and provided clues for the design of safer gene transfer tools. The current clinical applications of retroviral gene transfer technology, either ex vivo or in vivo, involve the transduction of millions of cells, in most cases with stem-progenitor characteristics, and the generation of a very high number of potentially mutagenic insertion events. The integration preferences of each vector type make some of these events more or less likely to happen, but given the numbers involved, even a completely random integration machinery would have only a few-fold lower probability of inducing a potentially oncogenic or otherwise dangerous mutation than an MLV- or an HIV-based vector. This evidence points to the design of the vector and the nature of the sequences it carries into the genome as major factors determining its potential genotoxicity: replacement of potent, long-range-acting promoter/enhancer elements of viral origin with constitutive promoters of cellular origin dramatically reduces the potential mutagenic effect of vector integration.113, 114 The development of high-throughput technology for mapping viral ISs in small samples has made long-term monitoring of the dynamics of genetically modified cells in patients a reality, increasing the safety of using viral vectors in the clinic while providing new knowledge on stem cell dynamics. 
Many approaches have been proposed in the last few years to replace transgenesis based on viral INs with more “intelligent” machineries achieving site-directed rather than semi-random integration, and gene correction rather than gene addition. Although extremely promising, these techniques will probably take years before matching the unsurpassed transduction efficiency of retroviral vectors and becoming practically applicable in a clinical context. For the time being, improving the safety profile of retroviral vectors, and predicting and evaluating as accurately as possible the risks and benefits of each specific therapeutic application, remain the priorities in clinical gene therapy.
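The argument that the sheer number of insertions outweighs a few-fold difference in targeting preference can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative: the function names are hypothetical, the Poisson approximation is an assumption, and every numerical input used with it is a placeholder chosen by the reader, not a measured value from any trial or study.

```python
import math


def expected_risky_insertions(n_cells, copies_per_cell, p_risky):
    """Expected number of insertions landing in a 'risky' genomic location.

    n_cells         -- number of transduced cells (placeholder input)
    copies_per_cell -- average vector copies per cell (placeholder input)
    p_risky         -- per-insertion probability of a risky hit (placeholder)
    """
    return n_cells * copies_per_cell * p_risky


def prob_at_least_one(expected_events):
    """Probability of one or more events, assuming a Poisson distribution."""
    # -expm1(-x) computes 1 - exp(-x) accurately even for small x.
    return -math.expm1(-expected_events)
```

When the expected number of risky insertions is already large, dividing the per-insertion probability by a few-fold leaves the probability of at least one such event essentially unchanged, which is why the sequences carried by the vector, rather than the integration bias alone, weigh so heavily on genotoxic risk.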