A Tacheny1, S Michel, M Dieu, L Payen, T Arnould, P Renard. 1. Laboratory of Biochemistry and Cell Biology (URBC), NAmur Research Institute for LIfe Sciences, University of Namur, 61 rue de Bruxelles, 5000 Namur, Belgium.
Abstract
To draw the broadest possible picture of a core promoter interactome, we developed a one-step DNA-affinity capture method coupled with an improved mass spectrometry analysis process focused on the identification of low-abundance proteins. As a proof of concept, this method was developed through the analysis of 230 bp contained in the 5′ long terminal repeat (LTR) of the human immunodeficiency virus 1 (HIV-1). Besides many expected interactions, many new transcriptional regulators were identified, either transcription factors (TFs) or co-regulators, which interact directly or indirectly with the HIV-1 5′LTR. Among them, the homeodomain-containing TF myeloid ectopic viral integration site was confirmed to functionally interact with a specific binding site in the HIV-1 5′LTR and to act as a transcriptional repressor, probably through recruitment of the repressive Sin3A complex. This powerful and validated DNA-affinity approach could also be used as an efficient screening tool to identify a large set of proteins that physically interact, directly or indirectly, with a DNA sequence of interest. Combined with an in silico analysis of the DNA sequence of interest, it provides a powerful means of selecting the interacting candidates to validate functionally by classical approaches.
The real crossing points between the genome and the proteome of an organism are transcription factors (TFs). They are much more than simple sequence-specific proteins bound or not to conserved cis-regulatory sequences in gene promoters/enhancers (1). Indeed, these factors are responsible for the fine coordination of gene expression by modulating chromatin accessibility and the recruitment of the general transcriptional machinery, and also for coordinating the interplay between transcription and other nuclear processes such as DNA repair or RNA processing and stability (2). However, despite the importance of deciphering such complex regulatory mechanisms, systematic identification of DNA–protein interactions occurring on regulatory regions of interest is still challenging.
In spite of recent progress due to the development of chromatin immunoprecipitation (ChIP)-based methods such as ChIP-Seq, which allow genome-wide analysis of all the DNA sequences bound by a protein of interest (3,4), methods identifying the proteins that interact with a sequence of interest are still poorly developed. In silico analysis of the sequence of interest allows the prediction of putative binding sites for TFs based on comparison with the consensus binding sites contained in TF databases such as TRANSFAC (5) or JASPAR (6). However, the results are limited to the TFs contained in the database, depend heavily on the algorithm used and do not take into account the binding site context, such as flanking sequences and chromatin organization (7). Thus, they generate a very large number of candidates, many of which are false positives (8). 
The selection of relevant candidates to validate by classical and time-consuming approaches, including DNA-binding assays, ChIP or reporter-based assays, is therefore uncertain.
The past decades have witnessed the progressive development of DNA-affinity approaches that combine the capture of DNA-binding proteins on oligonucleotide probes fixed on a chromatographic support with the identification of the captured proteins by mass spectrometry (MS) (9–11). Such approaches, although quite simple in principle, are challenging for essentially two reasons. First, most transcriptional regulators are of low abundance compared with the bulk of other nuclear proteins. This dynamic range problem makes the efficiency of the capture and the sensitivity of the MS-based identification process critical. Second, a number of proteins, some of them highly abundant, are non-specifically captured by the negatively charged oligonucleotide probe and/or by the chromatographic support. Although different strategies have been proposed to improve the specificity of the DNA-affinity capture, such as prefractionation of nuclear extracts (NE) on successive columns prior to the DNA-affinity purification (12–14) or the use of DNA competitors or detergents added before and during the binding step (9,15), their major drawback is the non-negligible risk of losing weak specific interactions (16). Separating the DNA/protein complexes from the solid support before protein identification is another way to limit contamination of the results by proteins trapped by the solid support (17). Despite such improvements, relevant identified proteins are still embedded in large amounts of non-specifically bound proteins. Therefore, current DNA-affinity methods are mostly used to compare the proteins captured by a short wild-type DNA sequence with those captured by the same sequence in which the binding site of interest has been mutated, providing a list of proteins to subtract (11,18). 
This principle has been successfully implemented using quantitative proteomics based on isotope-coded affinity tag (ICAT) (19–22) or stable isotope labeling by amino acids in cell culture (SILAC) (23) methodologies, a powerful strategy to identify the TFs selectively captured by a specific binding site. Although efficient, this strategy is not fully compatible with an unbiased identification of the entire set of proteins interacting with a relatively large DNA sequence such as a core promoter.
In this article, we describe an improved DNA-affinity method allowing the one-step, unbiased identification of a large set of transcriptional regulators interacting with a relatively long capture probe (over 200 bp), which could correspond to a core promoter interactome. This was made possible by an efficient procedure for separating the protein/DNA complexes from the solid support, by an adapted chromatographic separation of the complex peptide mixture and by a specific MS analysis focused on the identification of low-abundance proteins. The proof of concept of this method was established by analysing a fragment of the human immunodeficiency virus (HIV)-1 5′ long terminal repeat (LTR), a DNA sequence that contains numerous TF-binding sites. Although this is probably one of the best-studied regulatory DNA sequences (24), this approach identified several TFs/regulators that were not known to interact with the HIV-1 5′LTR. Among them, we identified myeloid ectopic viral integration site (Meis), which functionally interacts with and down-regulates the transcription of an HIV-1 5′LTR-luciferase construct.
MATERIALS AND METHODS
Cell culture and nuclear extract preparation
HeLa cells (human epithelial cell line, ATCC CCL-2) were grown in Dulbecco's modified Eagle medium (DMEM) high glucose (Gibco) supplemented with 10% foetal bovine serum (FBS). The T-lymphoid Jurkat cell line (ATCC TIB-152) was cultured in Roswell Park Memorial Institute (RPMI) 1640 medium (Gibco) supplemented with 10% FBS. Nuclear protein extracts from unstimulated or interleukin (IL)-1β-stimulated (5 ng/ml for 45 min) HeLa cells were isolated as previously described (25). Protein concentrations were determined using the Pierce 660 nm protein assay (Thermo).
Plasmids
The pLTRwt plasmid containing a fragment of the HIV-1 5′LTR (corresponding to nt 1–789 where nt 1 is the start of the 5′LTR U3 region) upstream of the Firefly luciferase gene and the pLTR*κB mutated for two NFκB-binding sites were previously described (26) and generously given by Prof. C. Van Lint (IBMM, ULB, Belgium). The pLTRwt was used as a template to generate the pLTR*Meis mutant variant using the QuikChange site-directed mutagenesis kit (Stratagene) following the recommendations of the supplier. The point mutation was introduced in the Meis putative binding site with the following pair of primers: Fw: 5′-GTGTTAGAGTGGAGGTTTCAAGCCGCCTAGCATTTC-3′ and Rv: 5′-GAAATGCTAGGCGGCTTGAAACCTCCACTCTAACAC-3′ in which the Meis putative binding site is underlined and the introduced mutation is highlighted in boldface. The mutation was confirmed by sequencing.
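QuikChange-type mutagenesis relies on a pair of fully complementary primers, so a quick consistency check on the two sequences quoted above is to confirm that each is the reverse complement of the other. The following minimal sketch does this; the function and variable names are illustrative only and are not part of the original protocol.

```python
# Translation table mapping each unambiguous DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Mutagenesis primer sequences as quoted in the text (5'->3').
MEIS_FW = "GTGTTAGAGTGGAGGTTTCAAGCCGCCTAGCATTTC"
MEIS_RV = "GAAATGCTAGGCGGCTTGAAACCTCCACTCTAACAC"

# True when the two primers anneal to the same site on opposite strands.
primers_are_complementary = reverse_complement(MEIS_FW) == MEIS_RV
print(primers_are_complementary)
```

The same helper can be reused to check any primer pair that is expected to anneal to opposite strands of the same site.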
Capture probe production and DNA-affinity approach
A 226-bp-long desthiobiotinylated double-stranded oligonucleotide corresponding to a fragment of the HIV-1 5′LTR (nt 229–455, where nt 1 is the start of the 5′LTR U3 region) was produced by polymerase chain reaction (PCR) using the pLTR plasmid as template and the following pair of primers: Fw: 5′-TGGATGACCCTGAGAGAGAA-3′ and Rv: 5′-CCAGTACAGGCAAAAAGCAG-3′. These primers were modified to allow a reversible immobilization of the capture probe on streptavidin-coated magnetic beads (by adding a 5′-desthiobiotin moiety on the forward primer) and to estimate the probe binding efficiency on beads (Cy3-labelled reverse primer). To eliminate the excess of free desthiobiotinylated primers after oligonucleotide amplification, the PCR product was purified using the Wizard SV Gel and PCR clean-up system (Promega). Using the pLTR*κB as a template, a 226-bp-long capture probe mutated for two NFκB-binding sites was also produced. In order to study specific protein interactions at the level of the Meis-1 putative binding site, two 101-bp-long capture probes centred on this site (nt 229–330) were also produced using the following pair of primers: Fw: 5′-TGGATGACCCTGAGAGAGAA-3′ and Rv: 5′-GCAGTTCTTGAAGTACTCCG-3′, with pLTR or pLTR*Meis as template for the wild-type or the mutated version of the Meis-centred capture probe, respectively.
Twenty pmol of capture probe in a final volume of 100 µl of PBS50 (phosphate buffered saline: 10 mM PO4 pH 7.4, 50 mM NaCl) were incubated for 1 h at 21°C on a rotary wheel with 1 mg of streptavidin-coupled magnetic beads (Dynabeads M-280 Streptavidin, Dynal) that had previously been equilibrated with six successive washes (200 µl of PBS50). Three washes with PBS50 were then performed in order to eliminate unbound desthiobiotinylated oligonucleotides prior to the incubation with proteins. 
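For orientation, the molar amount of capture probe used per assay can be converted into a mass. The sketch below assumes an average molar mass of about 650 g/mol per base pair of double-stranded DNA, a commonly used approximation that is not stated in the original protocol, and ignores the small contribution of the desthiobiotin and Cy3 labels.

```python
# Assumption: average molar mass of one base pair of double-stranded DNA.
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_mass_ug(amount_pmol: float, length_bp: int) -> float:
    """Approximate mass (in micrograms) of a dsDNA fragment."""
    moles = amount_pmol * 1e-12
    grams = moles * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e6


# 20 pmol of the 226-bp probe and of the 101-bp Meis-centred probe.
mass_226 = dsdna_mass_ug(20, 226)
mass_101 = dsdna_mass_ug(20, 101)
print(f"226 bp: {mass_226:.2f} ug, 101 bp: {mass_101:.2f} ug")
```

Under this assumption, 20 pmol corresponds to roughly 3 µg of the 226-bp probe and roughly 1.3 µg of the 101-bp probe.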
An equivalent of 1 mg of NE was pre-incubated for 15 min on ice with 1.5× volume of binding buffer [4 mM Hepes, pH 7.5, 120 mM KCl, 8% glycerol, 2 µM dithiothreitol (DTT), salmon sperm DNA (0.166 µg/µl), PolydIdC (0.166 µg/µl)] and then incubated with beads for 1 h at 21°C on a rotary wheel. Magnetic beads were then extensively washed: once with 500 µl of binding buffer, three times with 1 ml of PBS50 + 0.1% Tween20 and twice with 50 mM NH4HCO3. In order to separate the DNA–protein complexes from the beads, beads were re-suspended in a biotin-containing solution (30 µl of 5 mM biotin in 50 mM NH4HCO3) and incubated for 2 h at 21°C on a rotary wheel. The supernatant containing the DNA–protein complexes was then collected. Finally, beads were washed with 10 µl of 5 mM biotin and both supernatants were combined.
Proteins bound to capture probes were next prepared for tryptic digestion. Proteins were boiled for 5 min with PPS (3-[3-(1,1-bisalkyloxyethyl)pyridin-1-yl]propane-1-sulfonate) Silent Surfactant (Protein Discovery; 0.8% final concentration), reduced for 30 min at 50°C with DTT (Sigma; 5 mM final concentration) and then alkylated for 30 min in the dark using iodoacetamide (15 mM final concentration). Samples were then digested overnight at 37°C with 1.8 µl of trypsin (1 µg/µl in 50 mM NH4HCO3, 1 mM CaCl2; Trypsin Gold, Mass Spectrometry Grade; Promega). After digestion, samples were incubated for 30 min at 21°C on a rotary wheel in the presence of fresh streptavidin-coupled magnetic beads (600 µg) that had previously been equilibrated with four successive washes with PBS50 followed by two washes with 50 mM NH4HCO3. Once the excess of free biotin had been captured and prior to the MS run, samples were acidified by adding 1 µl of 12 N HCl and the PPS detergent was hydrolysed by a 45-min incubation at 37°C followed by 10 min of centrifugation (13 000 RPM) at 4°C.
Liquid chromatography/tandem mass spectrometry analysis and protein identification
Peptide analyses were performed on a nano-liquid chromatography (LC) system Ultimate 3000 (Dionex) directly coupled to a maXis 4G electrospray Ultra-High Resolution Q-TOF mass spectrometer (Bruker). Peptides were separated at 60°C by reverse-phase LC using a 75 µm × 500 mm C18 Dionex column (Acclaim PepMap 100 C18) in the Ultimate 3000 LC system. Mobile phase A was 0.1% formic acid in water. Mobile phase B was 0.1% formic acid in 80% acetonitrile. For each run, 15 µl of the digest product was injected and the organic content of the mobile phase was linearly increased from 4% B to 40% B over 150 min and from 40% B to 90% B over the next 5 min, maintained at 90% B for 5 min and then decreased to 4% B for the last 25 min. The column effluent was connected to an electrospray ionization (ESI) nano Sprayer (Bruker). As the peptide mixture was complex and presented a high dynamic range, the complete analysis was composed of two successive runs. From the first one, a scheduled precursor list (SPL) was generated on the basis of the most abundant ions sequenced. During the second run, SPL ions were excluded to allow the mass spectrometer to focus on less abundant ions. Peak lists from both runs were generated using DataAnalysis 4.0 (Bruker) and saved for use with ProteinScape 2.1 (Bruker) with Mascot 2.2 as search engine (Matrix Science). The peak lists were searched against a decoy database obtained from the mammalian National Center of Biotechnology Information (NCBI) database. For each protein identification, a false-positive rate (FPR) was determined, corresponding to the number of decoy entries divided by the total number of protein entries in the list. Peptide sequences were accepted if the peptide Mascot score was >20 (15 for the differential DNA-affinity capture with the Meis-centred capture probe). Protein identification was accepted if the FPR was <1% or, otherwise, if the protein Mascot score was >60. 
Protein identification was further analysed and manually verified by using ProteinScape 2.1.
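The acceptance rule described above can be illustrated with a short sketch. It reflects one possible reading of the criteria, namely that the FPR is computed cumulatively down a score-ranked list of protein hits; the `ProteinHit` structure, the function name and the example data are hypothetical and do not reproduce the ProteinScape implementation.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    name: str
    mascot_score: float
    is_decoy: bool


def accepted_hits(hits, max_fpr=0.01, min_score=60.0):
    """Return names of target proteins passing the FPR or score criterion.

    Assumption: FPR at a given rank = decoy entries / total entries so far,
    with hits ranked by decreasing Mascot score.
    """
    ranked = sorted(hits, key=lambda h: h.mascot_score, reverse=True)
    accepted = []
    decoys = 0
    for rank, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
            continue  # decoy entries are counted but never reported
        fpr = decoys / rank
        if fpr < max_fpr or hit.mascot_score > min_score:
            accepted.append(hit.name)
    return accepted


# Hypothetical example list (not data from the study).
example = [
    ProteinHit("protein_A", 500.0, False),
    ProteinHit("protein_B", 80.0, False),
    ProteinHit("decoy_1", 70.0, True),
    ProteinHit("protein_C", 65.0, False),
    ProteinHit("protein_E", 40.0, False),
]
print(accepted_hits(example))
```

In this toy list, hits ranked above the first decoy pass on the FPR criterion alone, whereas hits ranked below it are retained only if their Mascot score exceeds 60.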
Cell transfection and luciferase reporter assays
HeLa cells were transiently co-transfected with 0.9 µg of Firefly luciferase reporter plasmid (pLTRwt or pLTR*Meis) and 0.1 µg pCMV-βgal (for transfection efficiency normalization) using Superfect (Qiagen) following the manufacturer’s instructions. Cells were seeded 1 day before transfection at 70 000 cells/well in 12-well plates. The DNA/Superfect ratio was 1/2 (µg/µl). Twenty-four hours post-transfection, cells were stimulated or not with IL-1β (5 ng/ml) for an extra 24 h. Luciferase activities were then measured with the luciferase reporter assay (Promega), and β-galactosidase activities were measured to normalize the luciferase activities.
For Meis-1 knockdown assays, small interfering RNA (siRNA) transfection of HeLa cells was performed with Oligofectamine (Invitrogen) following the manufacturer’s instructions 24 h before the DNA transfection. ON-TARGET plus SMARTpool siRNA against human Meis-1 (Thermo, L-011726-00) and ON-TARGET plus non-targeting pool (Thermo) were used at a final concentration of 50 nM. Efficient knockdown after siRNA transfection was confirmed after 24 h at the protein level by western blot using goat polyclonal antibodies against Meis-1/2 (Santacruz).
Jurkat cells were transiently co-transfected with 0.45 µg of the Firefly luciferase reporter plasmid (pLTRwt or pLTR*Meis) and 0.05 µg pRenillaLuc-TK (for transfection efficiency normalization) using Jurkat Trans-IT (Mirus) according to the manufacturer’s instructions. Cells were seeded on the day of transfection at 300 000 cells/well in 24-well plates. The DNA/Jurkat Trans-IT agent ratio was 1/2.5 (µg/µl). Twenty-four hours post-transfection, cells were stimulated or not with IL-1β (5 ng/ml), TNFα (10 ng/ml) or PMA (25 ng/ml) + ionomycin (500 nM). After 24 h of treatment, transfected cells were lysed and luciferase activities were measured using the dual luciferase reporter assay kit (Promega). Firefly luciferase activities were normalized to the Renilla luciferase activities. 
All assays were performed in three independent experiments containing triplicates and results were presented as means ± SD, and differences were statistically analysed by a Student’s t-test (P values are indicated in the figure legends).
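The normalization and comparison steps can be sketched as follows, using only the Python standard library. The readings are invented for illustration, and the pooled-variance formula is the textbook two-sample Student's t statistic; a P value would then be obtained from the t distribution with the returned degrees of freedom (for instance with `scipy.stats.ttest_ind`, which assumes equal variances by default).

```python
from math import sqrt
from statistics import mean, stdev


def normalize(reporter, control):
    """Divide each Firefly reading by its matched normalization reading."""
    return [r / c for r, c in zip(reporter, control)]


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical raw readings (arbitrary units), three replicates per construct.
wt = normalize([1000, 1100, 900], [100, 100, 100])
mutant = normalize([2000, 2200, 1800], [100, 100, 100])

t_stat, dof = student_t(wt, mutant)
print(f"wt: {mean(wt):.1f} +/- {stdev(wt):.1f}")
print(f"mutant: {mean(mutant):.1f} +/- {stdev(mutant):.1f}")
print(f"t = {t_stat:.3f}, df = {dof}")
```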
RESULTS
Unbiased proteomic analysis of proteins interacting with a DNA sequence: proof of concept
DNA-affinity capture
As a proof of concept for this method, a 226-bp-long sequence of the HIV-1 5′LTR was used as a model capture probe. This sequence covers the core promoter, the enhancer and a short fragment of the modulatory region of the HIV-1 5′LTR. This well-described promoter can be activated or repressed by the binding of several cellular TFs (27,24).
Streptavidin-coated magnetic beads were first chosen to immobilize the biotinylated DNA capture probe. Nevertheless, the identification of proteins captured by this complex revealed exclusively non-specifically bound proteins (mainly bovine serum albumin and actin; data not shown), demonstrating the necessity to dissociate the DNA–protein complex from the solid support, as suggested by Praseuth and co-workers (17). We therefore took advantage of the reversibility of the desthiobiotin/streptavidin interaction (28). The same DNA sequence, amplified by PCR using a desthiobiotinylated forward primer and a Cy3-labelled reverse primer, was immobilized on streptavidin-coated beads and then displaced from the beads by an excess of biotin. The yields of oligonucleotide capture by the magnetic beads and of recovery in the presence of biotin were quantified by measuring the Cy3-associated fluorescence. The binding efficiency in optimized conditions was 60%, and this yield could not be increased by higher amounts of oligonucleotide. However, recovery after displacement by an excess of free biotin was complete, allowing an efficient recovery of the DNA–protein complexes before MS analysis (data not shown). Recovery using restriction enzyme or photocleavable biotin-based methods was also tested but gave lower recovery yields (data not shown).
Protein separation and identification
Identification of proteins interacting with a relatively long DNA sequence is of great biological interest but represents a technical challenge, as it requires the analysis of a complex mixture of peptides generated by the digestion of many proteins whose abundance may be distributed over a high dynamic range. Current progress in nano-LC-tandem mass spectrometry (MS/MS) encouraged us to choose a gel-free approach rather than a one- or two-dimensional electrophoresis protein separation, which might suffer from poor detection of low-abundance proteins (29). Proteins purified by DNA affinity were digested with trypsin. Of note, in-solution tryptic digestion had to be improved, as regular digestion protocols generated incompletely digested proteins (data not shown). The complex peptide mixture obtained underwent a long reverse-phase chromatography separation before being sequenced by MS (Figure 1). To cope with the problematic large dynamic range between highly abundant non-specifically bound proteins and potentially low-abundance transcriptional regulators, MS/MS analysis was adapted to proceed in two successive runs for each sample. Raw data from the first run, largely composed of peptides from abundant bound proteins, were used to generate an SPL. This list contained the most abundant peptides sequenced during the first run and was then used as a peptide exclusion list during the second run to specifically focus on the sequencing of low-abundance peptides. Peptides sequenced in both runs were then merged to proceed to protein identification.
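The logic of the two-run acquisition can be sketched in a few lines. This is a deliberately simplified illustration: precursors are represented only by an m/z value and an intensity, matching uses a fixed m/z tolerance, and the function names, the tolerance and the example values are all hypothetical. An actual scheduled precursor list on the instrument would typically also take retention time into account.

```python
def build_spl(first_run, top_n):
    """Return the m/z values of the top_n most intense precursors of run 1.

    first_run: list of (mz, intensity) tuples for sequenced precursors.
    """
    ranked = sorted(first_run, key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def is_excluded(mz, spl, tol=0.01):
    """True if mz matches any entry of the exclusion list within tol."""
    return any(abs(mz - ref) <= tol for ref in spl)


def second_run_targets(candidates, spl, tol=0.01):
    """Keep only the precursors of run 2 that are not on the exclusion list."""
    return [(mz, inten) for mz, inten in candidates
            if not is_excluded(mz, spl, tol)]


# Hypothetical precursors (m/z, intensity).
run1 = [(500.25, 1e6), (622.80, 5e5), (433.10, 1e4)]
spl = build_spl(run1, top_n=2)

run2_candidates = [(500.252, 9e5), (433.10, 1.2e4), (710.40, 8e3)]
targets = second_run_targets(run2_candidates, spl)
print(spl)
print(targets)
```

Excluding the most intense ions of the first run leaves instrument time in the second run for precursors that would otherwise be outcompeted during precursor selection.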
Figure 1.
Schematic representation of the DNA-affinity capture method coupled with an improved mass spectrometry analysis process.
Identification of the TF NFκB captured by the HIV-1 5′LTR sequence
The process described above and illustrated in Figure 1 was first applied, as a proof of concept, to identify a TF previously reported to bind the HIV-1 5′LTR sequence. We chose NFκB because the HIV-1 5′LTR sequence contains two distinct κB-binding sites essential for efficient HIV-1 transcription (30). Although the NFκB family contains several members (p105/p50, p100/p52, c-Rel, RelB and RelA) (31), the p50/RelA heterodimer is the most common transcriptionally active form described to interact with the HIV-1 5′LTR (32). NFκB is a TF sequestered in the cytoplasm in most resting cell types, which can be activated and translocated to the nucleus after cell stimulation with IL-1β, among other stimuli (33). Thus, comparing NE prepared from unstimulated and IL-1β-stimulated cells provides conditions for differential binding of NFκB to the HIV-1 5′LTR.
The NFκB family members identified in these conditions are listed in Table 1. In NE from unstimulated cells, p50 is the only NFκB member captured by the probe and identified. This is consistent with the repressive role of the p50 homodimer, which lacks a transactivation domain and is known to contribute to viral latency (34). By contrast, after stimulation with IL-1β, various peptides corresponding to different proteins belonging to the NFκB family were identified. The large number of peptides sequenced for p50 and RelA suggests that these proteins are the most abundant NFκB-family members captured by the probe, but it is not surprising to find peptides of c-Rel and p52, which are also expressed in HeLa cells (35). The specificity of this interaction was assessed by using a capture probe with mutated NFκB-binding sites (26). Indeed, when the mutated capture probe was incubated with NE prepared from IL-1β-stimulated HeLa cells, no peptides of any NFκB-family members were identified among the captured proteins, confirming the specificity of the assay.
Table 1.
NFκB family proteins identified by LC-MS/MS after DNA-affinity capture
| Capture probe | HeLa nuclear extracts | Identified NFκB family proteins | Uniprot accession no. | Mascot score | Peptide number | SC (%) |
| WT | Unstimulated | Nuclear factor NFκB p105 subunit/p50 | P19838 | 62.67 | 2 | 4.3 |
| WT | IL-1β | Transcription factor RelA (p65) | Q04206 | 504.6 | 68 | 30.2 |
| | | Nuclear factor NFκB p105 subunit/p50 | P19838 | 660.55 | 52 | 16.6 |
| | | Nuclear factor NFκB p100 subunit/p52 | Q00653 | 533.79 | 49 | 34.2 |
| | | Proto-oncogene c-Rel | Q04864 | 263.85 | 22 | 13.7 |
| NFκB-mutated | IL-1β | None | | | | |
A fragment of the HIV-1 5′LTR (nt 229–455), wild-type or mutated for two NFκB-binding sites, was used as a capture probe and incubated in the presence of nuclear extracts from unstimulated or IL-1β-stimulated HeLa cells. LC-MS/MS data were compared by Mascot to a decoy database and only identified proteins with a false-positive rate <1%, otherwise a protein Mascot score >60, were accepted. SC represents the percentage of sequence coverage for each identified protein.
In-depth analysis of proteins that directly or indirectly interact with a fragment of the HIV-1 5′LTR
This analysis aims to depict the most general picture of the proteins bound to a DNA sequence of interest, in a completely unbiased approach, in order to identify putative new interacting partners. The analysis of relatively long DNA sequences implies dealing with a large number of proteins and requires rigorous step-by-step data processing, detailed in Supplementary Figure S1. This analysis process takes into account the FPR, the quality of the sequencing (indicated by a minimum Mascot score), the proteins redundantly identified under several names and the technical contaminants previously defined by a “blank” experiment consisting of an MS identification of the proteins captured during the complete procedure by oligonucleotide probes incubated without any NE (Supplementary Table S1). For instance, for one biological sample, 79 proteins were finally considered out of the 259 candidates initially listed in the raw data. Finally, to reinforce the protein identifications resulting from the purification process, sets of data were accumulated from a minimum of three independent biological replicates. In this study, all the identified proteins captured by the 226-bp-long HIV-1 5′LTR were pooled in a global list and proteins identified only once were not considered. This global data treatment yielded a total of 125 proteins (111 identified from unstimulated samples and 117 from IL-1β-stimulated samples). According to Uniprot general protein annotations and to information from the literature, proteins identified in both experimental conditions were further classified into four categories based on their functions: transcription-related proteins, nucleic acid-binding proteins without described implication in transcription, proteins without any reported nuclear function and proteins with unknown function. As shown in Figure 2, a large majority of the identified proteins might be implicated in the regulation of transcription. 
A detailed list of the other identified proteins is available in Supplementary Table S1.
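The pooling step described above (subtraction of the "blank" contaminants, then removal of proteins seen in only one analysis) can be expressed compactly with sets. The sketch below is only an illustration of that logic: the function name and the protein lists are hypothetical, and the merging of proteins redundantly identified under several names is assumed to have been done beforehand.

```python
from collections import Counter


def reproducible_proteins(analyses, blank, min_occurrences=2):
    """Pool validated identifications across analyses.

    analyses: iterable of sets of protein names, one set per MS analysis.
    blank: set of contaminants identified without any nuclear extract.
    Returns the proteins found in at least min_occurrences analyses.
    """
    counts = Counter()
    for proteins in analyses:
        counts.update(set(proteins) - set(blank))
    return {name for name, n in counts.items() if n >= min_occurrences}


# Hypothetical example with three analyses.
analyses = [
    {"RelA", "p50", "Actin"},
    {"RelA", "Keratin"},
    {"p50", "RelA", "Meis"},
]
blank = {"Keratin", "Actin"}
print(sorted(reproducible_proteins(analyses, blank)))
```

Here a protein seen in a single analysis is discarded even though it is not a contaminant, which mirrors the conservative choice made in the text at the cost of possibly losing weakly captured factors.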
Figure 2.
Functional classification of proteins identified by LC-MS/MS after DNA-affinity capture. A fragment of the HIV-1 5′LTR (nt 229–455) was used as a capture probe and incubated in the presence of nuclear extracts from unstimulated or IL-1β-stimulated HeLa cells. LC-MS/MS data were compared by Mascot to a decoy database and only identified proteins with a false-positive rate <1% otherwise a Mascot score >60, were accepted. Each experimental condition was reproduced independently three times. Identified proteins from negative control analysis (“blank”) or identified proteins present only in one of the six analyses were subtracted from the total number of identified proteins.
We focused our attention on the identified TFs trapped by this promoter sequence (Table 2). For half of them, an interaction with the HIV-1 sequence has already been described in the literature (Figure 3A). This is the case for the previously discussed NFκB family members (c-Rel, p105/p50, p100/p52 and RelA) and also for three members of the upstream stimulatory factor (USF) family (36), the activating enhancer-binding protein-4 (AP-4) (37), the constitutively activated specific protein-1 (Sp1) and -3 (Sp3) (38), the nuclease-sensitive element-binding protein 1 (YB-1) (39) and the chicken ovalbumin upstream promoter TF (COUP-TF), described to interact indirectly with this sequence through an interaction with Sp1 (40). Of note, the HIP116/helicase-like TF has also been reported to interact with the HIV-1 5′LTR sequence (41) and was identified by MS in three of the six samples analysed, although the identification did not meet the MS validation criteria adopted in this study. These results support the efficiency of this strategy.
Table 2.
List of transcription factors identified by LC-MS/MS after DNA-affinity capture
| Protein name | Uniprot accession no. | Previously described interaction with the sequence | Unstim. Exp 1 Mascot score | Unstim. Exp 1 peptide number | Unstim. Exp 2 Mascot score | Unstim. Exp 2 peptide number | Unstim. Exp 3 Mascot score | Unstim. Exp 3 peptide number | IL-1β Exp 1 Mascot score | IL-1β Exp 1 peptide number | IL-1β Exp 2 Mascot score | IL-1β Exp 2 peptide number | IL-1β Exp 3 Mascot score | IL-1β Exp 3 peptide number |
| Proto-oncogene c-Rel | Q04864 | Y (93) | 0 | 0 | 0 | 0 | 0 | 0 | 263.85 | 22 | 204.81 | 19 | 326.3 | 12 |
| Nuclear factor NFκB p105 subunit/p50 | P19838 | Y | 62.67 | 2 | 0 | 0 | 0 | 0 | 660.55 | 52 | 499.19 | 43 | 714.45 | 20 |
| Nuclear factor NFκB p100 subunit/p52 | Q00653 | Y | 25.6 | 1 | 25.6 | 1 | 135.53 | 6 | 533.79 | 49 | 414.28 | 41 | 700.39 | 20 |
| Transcription factor RelA (p65) | Q04206 | Y | 35.8 | 1 | 36.4 | 1 | 86.18 | 2 | 504.6 | 68 | 593.75 | 83 | 808.04 | 36 |
| Upstream stimulatory factor 1 | P22415 | Y (36,94) | 279.62 | 23 | 219.22 | 25 | 256.02 | 8 | 302.89 | 42 | 240.57 | 22 | 131.68 | 4 |
| Upstream stimulatory factor 2 | Q64705 | Y | 238 | 21 | 0 | 0 | 106.85 | 5 | 327.3 | 41 | 0 | 0 | 56.12 | 1 |
| Upstream stimulatory factor 2 c (USF2c) | Q6YI47 | Y | 265.45 | 19 | 117.88 | 12 | 0 | 0 | 347.74 | 30 | 0 | 0 | 0 | 0 |
| Transcription factor AP-4 | Q01664 | Y (37) | 110.72 | 4 | 27.7 | 1 | 272.1 | 10 | 101.14 | 8 | 45.7 | 2 | 325.74 | 10 |
| Transcription factor Sp1 | P08047 | Y (38) | 191.39 | 13 | 0 | 0 | 0 | 0 | 115.75 | 9 | 35.9 | 1 | 0 | 0 |
| Transcription factor Sp3 | Q02447 | Y (38) | 192.19 | 15 | 0 | 0 | 0 | 0 | 37.5 | 1 | 0 | 0 | 0 | 0 |
| Nuclease-sensitive element-binding protein 1 (YB-1) | P67809 | Y (39) | 99.48 | 2 | 161.55 | 14 | 114.4 | 3 | 28.8 | 1 | 176.54 | 13 | 96.09 | 2 |
| COUP transcription factor 1 | P10589 | Y (40) | 238.31 | 12 | 25.7 | 1 | 0 | 0 | 265.99 | 22 | 72.73 | 5 | 256.22 | 8 |
| Homeobox protein Meis | O00470 | N | 0 | 0 | 0 | 0 | 0 | 0 | 77.45 | 3 | 0 | 0 | 75.17 | 1 |
| Pre-B-cell leukemia transcription factor 1 (PBX1) | P40424 | N | 69.16 | 6 | 0 | 0 | 70.86 | 1 | 179.78 | 11 | 0 | 0 | 137.86 | 3 |
| Steroid hormone receptor ERR1 | P11474 | N | 346.44 | 19 | 81.45 | 10 | 0 | 0 | 342.18 | 35 | 79.16 | 8 | 166.95 | 6 |
| Circadian locomoter output cycles protein kaput (clock) | O15516 | N | 84.66 | 4 | 0 | 0 | 0 | 0 | 137.61 | 5 | 0 | 0 | 0 | 0 |
| Class E basic helix-loop-helix protein 40 (DEC1) | O14503 | N | 123.09 | 6 | 0 | 0 | 0 | 0 | 181.68 | 11 | 41.7 | 1 | 0 | 0 |
| Kruppel-like factor 16 (BTE-binding protein 4) | Q9BXK1 | N | 145.34 | 8 | 0 | 0 | 0 | 0 | 142.88 | 12 | 0 | 0 | 0 | 0 |
| Kruppel-like factor 5 (BTE-binding protein 2) | Q13887 | N | 60.45 | 2 | 0 | 0 | 0 | 0 | 44.8 | 1 | 0 | 0 | 0 | 0 |
| MYC associated factor X (MAX) | P61244 | N | 150.08 | 7 | 0 | 0 | 152.04 | 4 | 121.65 | 14 | 0 | 0 | 108.41 | 3 |
| MAX gene associated protein (MGA) | B9EGR5 | N | 175.3 | 10 | 0 | 0 | 0 | 0 | 309.47 | 18 | 0 | 0 | 0 | 0 |
| Max dimerization protein 3 (MAD3) | Q9BW11 | N | 168.54 | 9 | 0 | 0 | 0 | 0 | 309.89 | 17 | 0 | 0 | 0 | 0 |
| Nuclear factor 1 X-type (CCAAT-box-binding TF) | Q14938 | N | 0 | 0 | 0 | 0 | 51.7 | 1 | 0 | 0 | 0 | 0 | 111.27 | 5 |
| Endothelial differentiation-related factor 1 | O60869 | N | 29.9 | 1 | 0 | 0 | 0 | 0 | 63.2 | 4 | 0 | 0 | 0 | 0 |
A fragment of the HIV-1 5′LTR (nt 229–455) was used as a capture probe and incubated in the presence of nuclear extracts from unstimulated (Unstim.) or IL-1β-stimulated HeLa cells. Three independent DNA-affinity capture experiments were performed for each experimental condition (Exp 1–3). Identified proteins from negative control analysis or proteins identified only in one of the six analyses were subtracted from the list. Blank boxes indicate validated protein identification (false-positive rate <1% otherwise a Mascot score >60). Grey boxes indicate non-validated protein identification according to MS validation criteria (only shown if the protein identification was already validated in another replicate).
Figure 3.
Schematic representation of HIV-1 5′LTR sequences used as capture probes, TF-binding sites and corresponding interacting proteins. The 226-bp-long capture probe (A) and the 101-bp-long Meis-centred capture probe (B) are represented. Previously characterized DNA-binding sites for transcription factors present on these fragments of the HIV-1 5′LTR are indicated by grey boxes (modified from (24,95,37)). Black boxes indicate the PBX and the Meis/TGIF putative binding sites predicted by an in silico analysis of the sequence of interest (using the TRANSFAC database). For the longer capture probe, corresponding transcription factors identified in this study and in the two experimental conditions (nuclear extract from unstimulated or IL-1β-stimulated HeLa cells) are indicated in Table 2. HIP116: non-validated identification according to MS criteria. COUP-TF interacts indirectly with this DNA-binding site as a Sp1 transcriptional co-activator.
Moreover, an in silico analysis of the capture probe sequence, using bioinformatics parameters minimising false negatives, reveals the presence of putative binding sites (Supplementary Table S2) for several other identified TFs. 
That is the case for two homeodomain-containing proteins, Meis and the pre-B-cell leukaemia TF 1 (PBX1) (Figure 3A), the oestrogen-related receptor alpha (ERR1), the MYC-associated factor X (MAX) and two of its potential partners (the MAX gene-associated protein, MGA, and the MAX dimerization protein 3, MAD3), the class E basic helix-loop-helix protein 40 (DEC1) and the nuclear factor 1 X-type. Furthermore, Krüppel-like factors (KLF-16 and KLF-5) and the circadian locomoter output cycles protein kaput (Clock) could interact with the GC-boxes and E-boxes, respectively, present on the capture probe. Only the identification of the endothelial differentiation-related factor 1 (MBF1) could not be explained by the presence of a putative binding site in the sequence analysed. However, this TF is also known to act as a bridging factor between the TATA-binding protein (TBP) or TBP-associated proteins and several TFs belonging to the basic leucine zipper family or to the nuclear receptor family (42–44).
The identification of a relatively large number of proteins (38) known to participate in the regulation of gene expression by indirect interactions with DNA (Supplementary Table 3) suggests that this method of DNA-affinity purification of interacting proteins can be used not only to identify the TFs involved in the expression of a gene of interest but also to unravel their co-regulators. 
Indeed, several of the identified co-regulators listed in Supplementary Table 3 have previously been described to interact with NFκB family members, such as poly(ADP-ribose) polymerase-1 (45), CAPER (RNA binding motif protein 39) (46), DNA-PK (47), the high mobility group protein B1 (HMGB1) involved in the repression of HIV-1 transcription (48), the NFκB reducing enzyme APEX/Ref-1 (the reduction of NFκB increases its binding capacity, notably on the HIV-1 5′LTR) (49,50), or some members of the DEAD-box RNA helicases, enzymes involved in RNA metabolism but recently identified as RelA co-activators (51,52). However, while RelA was identified only in the IL-1β-stimulated condition, as expected, these potential co-activators for RelA were identified in both stimulated and unstimulated conditions. This might be explained by the fact that most co-regulators, if not all, are able to interact with multiple TFs, some of which are listed in Table 2. This is the case for Ref-1, which can also interact with YB-1 (53), and for DNA-PK, which is able to transactivate USF proteins. What is already known and reported about the role of these 38 proteins in HIV-1 transcriptional regulation and/or their interactions with other identified proteins (co-regulators or other proteins in regulatory complexes) has been summarized non-exhaustively (Supplementary Table S4). Interestingly, several members of the Sin3A co-repressor complex have been identified: the paired amphipathic helix protein Sin3A, the histone-binding protein RBBP7 and the histone deacetylase 1 (HDAC1) involved in the epigenetic control of HIV-1 latency (54). 
This suggests that the method used might allow the formation of protein complexes on the capture probe, as well as their identification.
As shown by the identification of a remarkable number of transcription-related proteins, this newly developed method can be used as a starting point in the study of an unknown promoter sequence in order to identify new putative transcriptional regulators. However, these candidates must be validated to confirm the direct or indirect DNA interaction and to address their biological effect on promoter activity.
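The selection rules stated in the legend of Table 2 (removal of proteins found in the negative control, and of proteins identified in only one of the six analyses) can be illustrated with a minimal sketch. The function name, the data layout (one set of accession numbers per analysis) and the two `HYPO_*` entries are hypothetical and serve only as an illustration; the sketch is not the software used in this study.

```python
from collections import Counter

def filter_identifications(analyses, negative_control, min_analyses=2):
    """Keep accessions seen in at least `min_analyses` analyses
    and never seen in the negative control (illustrative rule only)."""
    counts = Counter(acc for analysis in analyses for acc in set(analysis))
    return sorted(
        acc for acc, n in counts.items()
        if n >= min_analyses and acc not in negative_control
    )

# Six analyses: unstimulated Exp 1-3, then IL-1beta-stimulated Exp 1-3.
# P40424 (PBX1) and O00470 (Meis) follow the non-zero entries of Table 2;
# the HYPO_* accessions are invented placeholders.
analyses = [
    {"P40424", "HYPO_ONCE"},
    {"HYPO_NC"},
    {"P40424", "HYPO_NC"},
    {"P40424", "O00470"},
    set(),
    {"P40424", "O00470"},
]
negative_control = {"HYPO_NC"}

kept = filter_identifications(analyses, negative_control)
print(kept)
```

With these inputs, the placeholder seen in a single analysis and the placeholder present in the negative control are both discarded, whereas PBX1 and Meis are retained.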
Functional analysis of a Meis putative binding site in the HIV-1 5′LTR-dependent transcriptional activity
The identification of two members of the homeodomain-containing protein family, PBX1 and Meis, as HIV-1 5′LTR interacting proteins attracted our attention for several reasons. First, the two TFs can heterodimerize (55). Second, an in silico analysis of the capture probe sequence, performed with the TRANSFAC database (5), revealed the presence of two Meis putative binding sites (with a core match of 1.0). Both sites are located close to one of the several degenerate PBX putative binding sites identified on this sequence, but only one (at position nt 266–271 of the entire HIV-1 5′LTR sequence) displays the perfect consensus core 5′-TGACAG-3′ (56,57). Finally, as shown in Table 2, whereas PBX1 was identified in both stimulated and unstimulated conditions, the identification of Meis seemed to be limited to the IL-1β treatment. Although based on only a few sequenced peptides, this observation suggested a differential Meis recruitment after IL-1β stimulation.
To functionally characterize the effect of the TGACAG Meis putative binding site on the viral promoter activity, transactivation assays were performed in HeLa cells, stimulated or not with IL-1β, to compare the luciferase activities of a Luc-reporter gene driven by either the wild-type HIV-1 5′LTR sequence or a variant containing a critical point mutation in the Meis putative binding site (Figure 4A). This point mutation was previously described to significantly decrease the DNA binding of Meis on the Pax6 pancreatic enhancer sequence (58). As shown in Figure 4B, a significant up-regulation of the promoter activity was observed in the presence of the mutated Meis-binding site, in both unstimulated and IL-1β-stimulated HeLa cells, suggesting a repressive influence of this site even in basal conditions.
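As an illustration of the search for the perfect consensus core 5′-TGACAG-3′ mentioned above, the following minimal sketch scans both strands of a sequence for exact matches and reports 1-based positions. It is only an exact-match scan, not a reimplementation of the TRANSFAC matrix-based analysis, and the example sequence is a hypothetical toy sequence, not the HIV-1 5′LTR.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def find_core(seq, core="TGACAG"):
    """List exact matches of `core` on both strands as
    (start, end, strand) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    core = core.upper()
    core_rc = reverse_complement(core)
    hits = []
    for i in range(len(seq) - len(core) + 1):
        window = seq[i:i + len(core)]
        if window == core:
            hits.append((i + 1, i + len(core), "+"))
        elif window == core_rc:
            hits.append((i + 1, i + len(core), "-"))
    return hits

# Hypothetical toy sequence with one site on each strand.
toy_sequence = "AATGACAGCCCTGTCATT"
print(find_core(toy_sequence))
```

A position weight matrix search would additionally score degenerate sites, such as the PBX putative binding sites described in the text, which an exact-match scan cannot detect.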
Figure 4.
Effect of a mutation in the Meis putative binding site on the HIV-1 5′LTR-dependent transcriptional activity. (A) Schematic representation of the single-nucleotide mutation introduced in the Meis putative binding site 1 of the HIV-1 5′LTR containing reporter construct. The two TRANSFAC-proposed Meis putative binding sites are underlined in black and the closest PBX putative binding sites are underlined in grey. The mutation is highlighted in boldface. (B) HeLa cells were transiently co-transfected with wild-type or with a Meis mutant version of a luciferase-linked HIV-1 5′LTR promoter construct together with a pCMV-βgal plasmid. At 24 h post-transfection, cells were stimulated or not with IL-1β (5 ng/ml). Luciferase and β-galactosidase activities were measured at 48 h post-transfection. The fold change values represent the ratio Luc/β-gal compared with the ratio obtained for unstimulated cells transfected with the wild-type pLTR construct. This assay was performed in three biologically independent experiments and results are presented as means ± S.D. *, **: significantly different from corresponding unstimulated cells as determined by a t-test (unpaired test) with, respectively, P < 0.05 and P < 0.01. +, ++: significantly different from corresponding wild-type reporter construct transfected cells as determined by a t-test (unpaired test) with, respectively, P < 0.05 and P < 0.01. (C) Jurkat cells were transiently co-transfected with wild-type or with a Meis mutant version of a luciferase-linked HIV-1 5′LTR promoter construct together with a pRL-TK plasmid. At 24 h post-transfection, cells were stimulated or not with IL-1β (5 ng/ml), TNF-α (10 ng/ml) or PMA (25 ng/ml) + ionomycin (500 nM). Luciferase activities were measured at 48 h post-transfection. The fold change values represent the ratio LucFirefly/LucRenilla compared with the ratio obtained for unstimulated cells transfected with the wild-type pLTR construct. 
This assay was performed in three biologically independent experiments and results were presented as means ± S.D. *, **, ***: significantly different as determined by a t-test (unpaired test) with, respectively, P < 0.05, P < 0.01 and P < *, **: significantly different from corresponding unstimulated cells as determined by a t-test (unpaired test) with, respectively, P < 0.05 and P < 0.01. + , ++: significantly different from corresponding wild-type reporter construct transfected cells as determined by a t-test (unpaired test) with, respectively, P < 0.05 and P < 0.01.
To confirm the role of this Meis-binding site in a more physiologically relevant cell type, transcriptional activities of both constructs were measured in T-lymphoid Jurkat cells treated with IL-1β, TNF-α or PMA/ionomycin. As shown in Figure 4C, although IL-1β had no effect on the activation of the pLTRwt in Jurkat cells, TNF-α and PMA/ionomycin were potent HIV-1 transcription activators in this cell line. As observed in HeLa cells, the presence of the mutation in the Meis putative binding site on the LTR sequence systematically induced an approximately twofold increase in the promoter activity when compared with the wild-type sequence in the corresponding condition.
These results strongly suggest that this sequence, predicted as a putative Meis-binding site by TRANSFAC analysis, could interact with cellular factor(s) acting as transcriptional repressor(s) on the HIV-1 5′LTR, even in basal conditions. Although the results obtained by DNA-affinity capture suggest that these TFs could be members of the homeodomain-containing protein family, the exact nature of this trans-repressor complex needed to be confirmed.
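The fold change calculation described in the legend of Figure 4 (reporter signal normalized to the co-transfected control, then expressed relative to unstimulated cells transfected with the wild-type construct) can be sketched as follows. All readings below are invented for illustration and are not data from this study; the function names are likewise hypothetical.

```python
from statistics import mean, stdev

def normalized_ratios(reporter, control):
    """Divide each reporter reading by its matched control reading
    (e.g. Luc/beta-gal or LucFirefly/LucRenilla)."""
    return [r / c for r, c in zip(reporter, control)]

def fold_changes(sample_ratios, baseline_ratios):
    """Express each normalized ratio relative to the mean baseline ratio."""
    baseline = mean(baseline_ratios)
    return [ratio / baseline for ratio in sample_ratios]

# Invented readings for three independent experiments.
wt_ratios = normalized_ratios([100.0, 120.0, 110.0], [10.0, 10.0, 10.0])
mut_ratios = normalized_ratios([220.0, 230.0, 210.0], [10.0, 10.0, 10.0])

mut_fold = fold_changes(mut_ratios, wt_ratios)
print(f"mutant vs wild-type: {mean(mut_fold):.2f} ± {stdev(mut_fold):.2f}")
```

In this sketch the baseline is the mean of the three wild-type ratios; normalizing each experiment to its own wild-type value is an equally plausible design, and the text does not specify which was used.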
Identification of members of the trans-repressor complex that specifically interact with the Meis putative binding site
To further identify proteins involved in the trans-repressor complex interacting with the Meis putative binding site in basal conditions, our analysis was completed by a more classic comparative DNA-affinity approach carried out on shorter capture probes. In this case, a 101-bp-long capture probe centred on the Meis putative binding site and a mutated version were designed (Figure 3B) and incubated with NE prepared from unstimulated HeLa cells. Specific interactions with the TGACAG site could then be highlighted by a comparative analysis of results obtained with the wild-type versus the mutated capture probe. Table 3 contains the transcription-related proteins identified in both conditions for three independent biological replicates (in this analysis, proteins identified by only one peptide in only one replicate were not considered). Twenty-seven TFs and 35 potential co-regulators were identified, of which 17 and 32 candidates, respectively, were validated according to the adopted MS criteria. Confirming the expected involvement of homeodomain-containing proteins, PBX1 was exclusively captured by the probe containing the wild-type TGACAG site. This was also the case for Meis1, highlighting the capture of this protein even in basal conditions. This experiment was also performed in the presence of NE prepared from IL-1β-stimulated HeLa cells (data not shown) and, in this case, Meis1 was captured and detected to the same extent as in basal conditions (with an average of 7.6 and 6 sequenced peptides, respectively, for unstimulated and IL-1β-stimulated conditions), suggesting that the Meis differential recruitment observed during the DNA-affinity capture performed with the longer capture probe was probably due to limited sensitivity.
Table 3.
List of transcription-related proteins identified by LC-MS/MS after differential DNA-affinity capture on Meis-centred capture probe
Nuclear extracts from unstimulated HeLa cells were incubated in the presence of a capture probe corresponding to the 101 bp-long sequence around the Meis putative binding site contained in the HIV-1 5′LTR (nt 229-330). Two different versions of the capture probe (wild-type or Meis-mutated) were used. Three independent DNA-affinity captures were performed for each experimental condition (Exp 1 to 3). Identified proteins from negative control analysis were subtracted from the list. Blank boxes indicate validated protein identification (false positive rate <1% otherwise a Mascot score >60). Grey boxes indicate non-validated proteins identification according to MS validation criteria. Transcription factor identifications are represented in bold.
Interestingly, several proteins belonging to the Sin3A-HDAC co-repressor complex were also captured exclusively by the wild-type probe, specifically the scaffold protein Sin3A, the histone deacetylase HDAC1, the histone deacetylase complex subunit SAP130 and two histone-binding proteins, RBBP4 and RBBP7 (59,60). The histone deacetylase complex subunit SAP30 was also identified in two replicates, but this identification was not validated according to MS criteria. Moreover, the TF Rox, also known as the MAX-binding protein Mnt, was also specifically captured by the wild-type probe. This TF, known to act as a key transcriptional regulator of the Myc/Max network, represses transcription by competing with c-Myc for Max binding and through a well-described direct interaction with the Sin3A complex (61,62). 
These results are consistent with the repressive activity associated with this site.
Beyond these interactions specific to the Meis-binding site, several protein identifications (common to both capture probes) complete the results obtained with the long capture probe, confirming expected and also new putative protein interactions with this fragment of the HIV-1 5′LTR. For instance, this second DNA-affinity-based analysis confirms Clock and BMAL1 as TFs potentially interacting with the HIV-1 5′LTR. The BMAL1-Clock heterodimer, generally associated with histone acetyltransferase activity, is a transcriptional activator that can be repressed by DEC1 through competition for the E-box-binding site (63). DEC1 is also known to repress retinoid X receptor (RXR)-mediated transactivation through direct interaction (64). Although these TFs are mainly known to be involved in the circadian rhythm control, BMAL1-Clock heterodimers have recently been associated with the transcriptional regulation of the Herpes Simplex Virus 1 (65,66).
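The comparative reading of the two capture probes (proteins captured only by the wild-type probe versus proteins common to both probes) amounts to simple set operations, sketched below. The assignment of proteins to individual replicates is hypothetical and chosen only to mirror the qualitative outcome described in the text; it does not reproduce Table 3.

```python
def compare_probes(wild_type_replicates, mutant_replicates):
    """Pool the replicates of each probe and split the identified
    proteins into wild-type-only, common and mutant-only groups."""
    wild_type = set().union(*wild_type_replicates)
    mutant = set().union(*mutant_replicates)
    return {
        "wild_type_only": sorted(wild_type - mutant),
        "common": sorted(wild_type & mutant),
        "mutant_only": sorted(mutant - wild_type),
    }

# Hypothetical replicate contents (three captures per probe).
wild_type_replicates = [
    {"PBX1", "Meis1", "Sin3A", "Clock"},
    {"PBX1", "Meis1", "HDAC1", "BMAL1"},
    {"Meis1", "Sin3A", "HDAC1", "Clock"},
]
mutant_replicates = [
    {"Clock", "BMAL1"},
    {"Clock"},
    {"BMAL1"},
]

groups = compare_probes(wild_type_replicates, mutant_replicates)
print(groups["wild_type_only"])
print(groups["common"])
```

Pooling replicates before comparison is a deliberately permissive choice for this sketch; a stricter rule could require identification in several wild-type replicates before a protein is called site-specific.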
Meis contributes to repress HIV-1 5′LTR-dependent transcription
As our results suggest that Meis1 takes part in a trans-repressor complex interacting with a specific binding site located at nt 266–271 of the HIV-1 5′LTR sequence, the effect of a selective knockdown of Meis in HeLa cells on the HIV-1 5′LTR-driven reporter gene activity was evaluated. Cells were first transfected with a pool of siRNA targeting Meis1, leading to an efficient depletion of both Meis1 and Meis2 isoforms (Figure 5A). As shown in Figure 5B, in both basal and IL-1β-stimulated HeLa cells, the Meis knockdown induces a significant up-regulation of the promoter activity, to the same extent as the vector carrying a mutated Meis1-binding site. Even if these results cannot distinguish the specific impact of the different Meis1 or Meis2 isoforms, they confirm the central role of Meis in HIV-1 transcriptional regulation through direct binding on a specific binding site on the 5′LTR and recruitment of different members of the Sin3A repressor complex.
Figure 5.
Impact of Meis-1 knockdown on the HIV-1 5′LTR transcriptional activity. HeLa cells were transfected with 50 nM of Meis-1 targeting siRNA (siMeis) or with 50 nM of non-targeting siRNA (siNT). (A) Western blot analysis performed on HeLa proteins extracted 24 h after siRNA transfection. Lamin B signal was used as a loading control. (B) At 24 h post-siRNA transfection, cells were co-transfected with wild-type or a Meis mutant version of a luciferase-linked HIV-1 5′LTR promoter construct together with a pCMV-βgal plasmid. At 24 h post-DNA transfection, HeLa cells were stimulated or not with IL-1β (5 ng/ml). Luciferase and β-galactosidase activities were measured at 48 h post-DNA transfection. The fold change values represent the ratio Luc/β-gal compared with the ratio obtained for unstimulated cells transfected with the wild-type pLTR construct in the absence of siRNA. This assay was performed in three biologically independent experiments and results are presented as means ± S.D. **: significantly different from corresponding unstimulated cells as determined by a t-test (unpaired test) with P < 0.01. #, ##: significantly different from corresponding wild-type reporter construct transfected cells as determined by a t-test (unpaired test) with, respectively, P < 0.05 and P < 0.01. +: significantly different from corresponding siNT-transfected cells as determined by a t-test (unpaired test) with P < 0.05.
DISCUSSION
Deciphering the transcriptional regulation of a promoter of interest is a tough task considering the long lists of putative TF-binding sites generated by in silico analyses and the absolute necessity to validate each resulting candidate of interest. Although an interesting starting point, in silico analyses are nevertheless restrictive as they are limited to already characterized TF-binding sites and cannot specify the TF family members involved. In addition, they do not take into account the accessibility of the suggested TF and never give access to the layer of transcriptional regulators indirectly interacting with DNA through the first layer of TFs (67). These drawbacks of in silico analysis have highlighted the crucial need for developing methods to identify proteins that physically interact with DNA, whether this interaction is direct or not. Just as ChIP-Seq provides the identity of all the DNA sequences bound by an immunoprecipitated TF, methods combining DNA-affinity protein capture with MS-based protein identification should ideally provide the identity of all the proteins bound to a DNA sequence of interest. However, a major hindrance comes from the fact that DNA can interact with an intricate protein network presenting a high dynamic range between some very low-abundance transcriptional regulators and high-abundance unspecific DNA-binding proteins. Therefore, most current techniques are focused on a precise binding site present on a short oligonucleotide, for which unspecific background is subtracted thanks to a comparative analysis of the proteins captured by the mutated DNA sequence (20,21).
In contrast, the goal of the method described in this study is to rapidly provide the most complete picture of the endogenous proteins interacting with a relatively long (200–300 bp) DNA sequence, starting from reasonable amounts of material (<2 mg of nuclear proteins). 
These criteria make this procedure different from other recently published DNA-affinity methods. Indeed, most of them require large amounts of starting material (from 40 mg (13,20) to 700 mg of nuclear proteins (22)), narrowing the range of biological samples that can be studied by these techniques. In addition, we have preferred a one-step 1D-LC separation of the peptide mixture instead of “multidimensional protein identification technology”-based strategies (68), which require laborious and time-consuming analyses of all generated fractions (19) and increase the risk of losing low-abundance proteins. Finally, this method is completely unbiased as it does not require pre-existing knowledge of the sequence, such as the mapping of regulatory binding sites. Indeed, most DNA-affinity-based methods currently used are centred on one binding site, taking advantage of the comparative analysis of the proteins captured by the mutated counterpart of the bait, either in non-quantitative (18,69) or in quantitative analyses based on metabolic (SILAC) (23) or chemical labelling (ICAT) (70,22). Although quantitative data are impressive in their specificity and undoubtedly useful to depict the fine tuning of transcriptional complex regulation, such questions are centred on pre-defined regulatory binding sites and pursue a different purpose than a large-scale unbiased analysis to identify new putative interactants of a relatively long sequence.
To identify a maximum number of proteins physically interacting with a long DNA sequence, without having the benefit of a comparison with an inactive bait, the main difficulty was inherent to the abundant unspecific protein binding and to the high dynamic range of the analysed peptide mixture. 
These obstacles were circumvented using an efficient separation step of the DNA–protein complexes from the solid support, avoiding the identification of irrelevant support-interacting proteins, and an adapted gel-free LC-MS/MS analysis process that improves the sensitivity of the complex peptide mixture analysis using peptide exclusion lists.
Applying this DNA-affinity method to a well-described 226-bp-long sequence of the HIV-1 promoter region, we identified over a hundred different captured proteins, among which >50% have a demonstrated link with transcription. Among the other half of identified proteins, one can notice the presence of many nucleic-acid-binding proteins, including RNA-binding proteins or proteins involved in DNA repair, which might represent contaminant proteins or potential uncharacterized transcriptional regulators (71). Focusing on transcription-related identified proteins, the unbiased identification of a large number of TFs described to interact with the HIV-1 5′LTR sequence gives weight to this approach (Figure 3), although not all the expected TFs were identified, such as the lymphoid enhancer-binding factor 1 (LEF-1) (72) or the ubiquitous Yin Yang 1 factor and the late SV40 factor (32). The reasons for this might be either biological or technical. From a biological point of view, besides cell type specificities (for instance, LEF-1 has not been described to bind the HIV-1 5′LTR in HeLa cells), we have to point out that the capture probe used is a PCR-produced double-stranded oligonucleotide, different from an in vivo chromatin environment. An elegant PICh method (for proteomics of isolated chromatin segments), based on the isolation of genomic DNA with its associated proteins, has recently been developed to characterize the in vivo telomere interactome (73). 
However, the limited sensitivity of protein identification by MS represents the major restriction of this technique, which is currently limited to the study of repetitive DNA sequences and requires huge amounts of starting material. In addition, we cannot exclude some technical limitations of the method we developed: an inefficient protein capture possibly due to an incomplete nuclear protein extraction or to unsuitable binding or washing conditions (74), a non-specific binding of some peptides on beads during the biotin removal step, or a lack of MS sensitivity for detecting very low-abundance peptides in spite of the use of peptide exclusion lists. Of note, some proteins such as KLF-5 were identified on the basis of a relatively small number of sequenced peptides, and not in each replicate experiment. As this method involves sequential steps of cell culture, nuclear extract preparation, DNA-affinity purification, trypsin digestion and reverse-phase chromatography upstream of the MS sequencing, it is essential to repeat the analysis on several independent biological replicates. Moreover, the candidates selected by the researcher for further functional studies should have been identified on the basis of several sequenced peptides.
Beyond the identification of several TFs known to regulate HIV-1, the most interesting data are probably the identification of 12 TFs that are potential new candidates for the regulation of HIV-1 transcription. This was unexpected, considering the large number of studies devoted to this regulatory DNA sequence of the highest pathophysiological interest. An in silico analysis of the sequence indicates the presence of at least one putative binding site for 11 of them. 
Such results might constitute an interesting starting point for future studies in the field of HIV-1 regulation, as in the case of ERR1, for which this putative novel interaction might explain the reduced HIV-1 promoter activity due to estradiol treatment in glial cells (75).
We focused our attention on two other interesting candidates, the homeodomain-containing proteins PBX1 and Meis. Mainly described as homeobox protein (HOX) co-regulators, either co-activators or co-repressors (76), these TFs can also, in a HOX-independent manner, act as homodimers (77), as heterodimers with each other (78,79) or with other homeodomain- and non-homeodomain-containing TFs (80–83). Interestingly, some members of the homeodomain-containing family, among which PBX1, were already documented as transcriptional regulators of different viruses such as the herpes simplex virus (83), the human papillomavirus (84,85), the murine leukemia virus (86) and the human cytomegalovirus (87). After an in silico analysis of the sequence (supplementary data), we studied the functionality of the 5′-TGACAG-3′ sequence (nt 266–271), corresponding to a perfect consensus core for Meis, located near a degenerate PBX-binding site (Figure 4A), an arrangement that could meet the criteria for a PBX-Meis heterodimer-binding site (88). A point mutation in this binding site increases the HIV-1 5′LTR promoter activity, as shown by luciferase reporter gene assay in both HeLa and Jurkat cells, even in basal conditions (Figure 4B and C). 
The silencing of Meis by RNA interference also provokes an increase in the HIV-1 5′LTR promoter activity (Figure 5B), suggesting that this binding site recruits a Meis-containing transcriptional repressor complex, possibly involved in the preservation of the viral latency.In an attempt to define more precisely the composition of this transcriptional repressor complex, we applied the DNA-affinity technique to a shorter fragment of the HIV-1 5′LTR promoter centred on the newly validated Meis-binding site, either wild-type or mutated. The results we obtained confirm the specific capture of Meis-1 by the wild-type probe. In addition, seven proteins that could take place in the mSin3A repressive complex were also identified as specific interactants of this Meis putative-binding site. While, to our knowledge, it is the first time that a Meis-mSin3A interaction is documented, other members of homeodomain-containing TF family like the transcriptional repressor transforming growth-interacting factor (TGIF) can recruit mSin3 and HDAC (89). Collectively, these results led us to propose in Figure 6 a partial transcriptional interactome surrounding this precise region.
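The in silico search for the Meis consensus core discussed above can be illustrated with a minimal sketch. The Python snippet below scans both strands of a sequence for exact matches to 5′-TGACAG-3′. The function names, the toy sequences and the chosen point mutation are assumptions made for illustration only; they are not the HIV-1 5′LTR sequence or the mutant used in this study, and a real analysis would typically rely on position weight matrices rather than exact matching.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_core_sites(seq: str, core: str = "TGACAG") -> List[Tuple[int, str]]:
    """Return (1-based start, strand) for every exact match of `core`.

    "+" means the core is found on the given strand; "-" means its
    reverse complement is found, i.e. the site lies on the opposite strand.
    """
    seq = seq.upper()
    core = core.upper()
    rc_core = reverse_complement(core)
    hits = []
    for i in range(len(seq) - len(core) + 1):
        window = seq[i:i + len(core)]
        if window == core:
            hits.append((i + 1, "+"))
        elif window == rc_core:
            hits.append((i + 1, "-"))
    return hits


# Hypothetical toy sequences (NOT the HIV-1 5'LTR): the "mutant" carries
# a single A-to-C substitution inside the core, chosen arbitrarily.
wild_type = "GGCATTGACAGCCTTA"
mutant = "GGCATTGACCGCCTTA"

print(find_core_sites(wild_type))  # [(6, '+')]
print(find_core_sites(mutant))     # []
```

In this toy case, the single substitution abolishes the exact match, which mirrors the logic of comparing a wild-type and a mutated binding site.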
Figure 6.
Proposed model for the organization of the proteins identified in this DNA-affinity study interacting—directly or not—with a fragment of the HIV-1 5′LTR (from nt 229 to 330). The binding sites indicated in this Figure are either experimentally demonstrated in the literature (bold) or predicted by in silico analysis of this sequence. All the indicated proteins have been captured by this DNA sequence and identified in this study. Proteins previously described to interact with this sequence are underlined. The proteins represented by dashed ovals are indicative as the MS validation criteria for identification were not fulfilled. Proteins in light grey are specifically recruited by the Meis-binding site, as they were not captured by this DNA sequence containing the mutated Meis-binding site.
Although the method developed here was initially intended for a totally unbiased analysis of proteins interacting with a relatively long DNA sequence, it was also successfully applied to a shorter DNA sequence centred on a binding site of interest. This functional “zoom” on the promoter interactome, through the comparative analysis of proteins captured by the wild-type versus a mutant sequence, makes the investigation of transcriptional co-regulators more powerful, as illustrated by the Meis-dependent recruitment of the Sin3A repressive complex, identified through seven different components. This underlines that this DNA-affinity method can reveal not only proteins that interact directly with DNA, such as TFs, but also proteins that interact with DNA indirectly. In addition, we note that the analyses of a 226-bp sequence and of a 101-bp capture probe (centred on the Meis-binding site) generated lists of, respectively, 52/58 (CTL/IL-1β conditions) and 62 putative interacting transcriptional regulators, indicating that the number of proteins that can be identified by this method remains limited, the LC-MS/MS analysis probably being the limiting step of the whole process.
In summary, we have described an efficient DNA-affinity procedure followed by a gel-free proteomic analysis of the promoter interactome. As a proof of concept, this procedure was applied to a very well-described regulatory sequence, the HIV-1 5′LTR, providing a list of >50 recognized transcriptional regulators (24 TFs and 38 transcriptional regulators not reported to bind DNA), including expected and unexpected TFs which represent new candidates for the transcriptional regulation of HIV-1. This highlights the strength of a completely unbiased analysis of regulatory DNA sequences. When deciphering the mechanism underlying the transcriptional regulation of a gene of interest, we believe that having a list of putative candidates that physically interact with the DNA sequence constitutes a great advantage over in silico analyses alone, even though this list is probably not exhaustive, as for any proteomic analysis. Indeed, the candidates subsequently selected for functional validation are much more likely to be confirmed.
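The selection logic discussed above (retaining candidates identified in several independent replicates on the basis of several peptides, then comparing the wild-type and mutant probes) can be sketched as follows. This Python snippet is illustrative only: the thresholds of two replicates and two peptides, the function names and the placeholder protein names are assumptions, not values or data from this study.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# One MS identification: (replicate id, protein name, number of sequenced peptides)
Identification = Tuple[int, str, int]


def robust_candidates(
    identifications: Iterable[Identification],
    min_replicates: int = 2,  # assumed threshold
    min_peptides: int = 2,    # assumed threshold
) -> Set[str]:
    """Keep proteins seen with enough peptides in enough distinct replicates."""
    replicates_seen = defaultdict(set)
    for replicate, protein, n_peptides in identifications:
        if n_peptides >= min_peptides:
            replicates_seen[protein].add(replicate)
    return {
        protein
        for protein, replicates in replicates_seen.items()
        if len(replicates) >= min_replicates
    }


def site_specific(wild_type: Set[str], mutant: Set[str]) -> Set[str]:
    """Proteins captured by the wild-type probe but not by the mutated one."""
    return wild_type - mutant


# Placeholder data with hypothetical protein names and peptide counts.
wt_ids = [
    (1, "protein_A", 3), (2, "protein_A", 4),
    (1, "protein_B", 1), (2, "protein_B", 1),  # too few peptides
    (1, "protein_C", 2), (2, "protein_C", 5),
]
mut_ids = [(1, "protein_C", 3), (2, "protein_C", 2)]

wt_set = robust_candidates(wt_ids)
mut_set = robust_candidates(mut_ids)
print(sorted(wt_set))                          # ['protein_A', 'protein_C']
print(sorted(site_specific(wt_set, mut_set)))  # ['protein_A']
```

Here a protein supported by a single peptide per replicate is discarded, while a protein captured by both probes is treated as not specific to the binding site of interest.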
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online: Supplementary Table 1, Supplementary Figures 1–4 and Supplementary References [32,46-50,53,96-132].
FUNDING
Funding for open access charge: University of Namur (FUNDP), Belgium.
Conflict of interest statement. None declared.