Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single-stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both the transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to the binding of C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 interacts detectably only with DNA. The crystal structure of KH1 bound to the DNA sequence 5′-CCCTCCCT-3′ shows a 2:1 protein:DNA stoichiometry and reveals the molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments with a series of poly-C sequences reveal that cytosine is preferred at all four positions of the oligonucleotide-binding cleft and that a C-tetrad binds KH1 with 10-fold higher affinity than a C-triplet. The basis for this high-affinity interaction is then detailed by the structure of a KH1.W.C54S mutant bound to the DNA sequence 5′-ACCCCA-3′. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad.
Poly-C-binding proteins (PCBPs) are ubiquitous oligonucleotide-binding proteins in eukaryotic cells that play a fundamental role in the regulation of gene expression through interaction with C-rich oligonucleotides. The family consists of the archetypal hnRNP K (heterogeneous nuclear ribonucleoprotein K) and the isoforms of αCP (also known as PCBP and hnRNP E), including αCP1–4 and αCP-KL (1,2). The effects of αCP binding vary and are thought to depend on the ternary complex in which the αCPs participate. αCPs act at several levels of post-transcriptional gene regulation. Within the nucleus, αCP binding at the 3′-UTR or intron 1 of α-globin mRNA affects its splicing and enhances its cleavage and polyadenylation (3,4). Outside the nucleus, αCPs are implicated in the stabilization of specific mRNAs, leading to up-regulation of their gene products. They have been shown to be sufficient for formation of the ‘α-complex’ at a specific C-rich region of the 3′-UTR of α-globin mRNA, causing its accumulation during terminal erythroid differentiation (5,6). Binding of αCPs to the 3′-UTR has also been implicated in the stabilization of tyrosine hydroxylase (7), erythropoietin (8), β-globin (9) and collagen α1(I) (10) mRNAs. αCP proteins have also been shown to mediate translational control. Their binding, together with hnRNP K, to a CU-rich differentiation control element (DICE) in the 3′-UTR of 15-lipoxygenase mRNA suppresses translation by interfering with the joining of the 60S and 40S ribosomal subunits at the AUG initiation codon (11,12). Similarly, human papillomavirus type 16 L2 mRNA appears to be silenced via binding to αCPs (13). In contrast, translational enhancement has been reported following αCP binding to the 5′-UTR of the folate receptor mRNA (14), the 3′-UTR of phosphatase 2A mRNA (15) and the 5′-UTR of picornavirus mRNA (16,17). Thus, αCP binding to RNA can result in both silencing and enhancement of translation through a diverse set of mechanisms.

In addition to their better-recognized ability to bind RNA, αCPs have also been shown to bind single-stranded DNA (ssDNA). Such interactions play a role in transcriptional regulation: αCP has been identified as the ssDNA-binding protein underlying proximal promoter activity of the mouse µ-opioid receptor gene (18). The closely related hnRNP K is an established transcription factor, binding to the CT element in the promoter region of c-myc (19) and to specific ssDNA elements within the promoter region of a neuronal nicotinic acetylcholine receptor gene (20). The αCPs have also been found to recognize the C-rich strand of human telomeric DNA with high affinity (21). Of all the αCPs, αCP1 in particular showed remarkable specificity for the telomeric (CCCTAA)n repeat motif (22).

The structural basis for αCP interactions with oligonucleotides, including the basis for their affinity and specificity for RNA and ssDNA, is thus of interest. αCPs bind oligonucleotides via their triple K homology (KH) domain structure, first identified in hnRNP K (23). These type I KH domains are 68–72 amino acid structures comprising a three-stranded antiparallel β-sheet packed against three α-helices (βααββα) (24). The two N-terminal KH domains of αCPs are closely spaced, whereas the C-terminal KH domain is separated by a linker of variable length. Nuclear localization sequences in the linker region between KH2 and KH3, and within KH3, have been shown to dictate the differential subcellular localization of the αCPs (25). αCP1 and αCP2 are predominantly nuclear, whereas αCP3 and αCP4 are restricted to the cytoplasm, and the splice variant αCP2-KL is present in significant amounts in both the nucleus and the cytoplasm. This differential localization may dictate the involvement of the αCP family members in their various functions.

The three-dimensional structures of several αCP KH domains in apo and oligonucleotide-bound forms have been reported, including αCP1–KH3, KH domains 1, 2 and 3 of αCP2, and hnRNP K–KH3 (26–31), providing considerable insight into the basis of their binding and specificity. Oligonucleotide binding occurs at a cleft formed across α-helix 1 and bounded by two unstructured surface loops. The first loop, between α-helices 1 and 2, contains an invariant GXXG sequence that is crucial to oligonucleotide binding and contacts the sugar–phosphate backbone. The second loop, between β-strands 2 and 3, is of variable length and sequence and flanks the edge of the binding cleft toward which the bases are directed. The oligonucleotide-binding cleft is narrow, is reported to accommodate pyrimidine bases more readily than purine bases, and holds four bases (26,27,31). Across these structures, specific contacts to cytosines at the two central positions are consistently observed, but whether there is specificity at the first and fourth positions remains unclear.

Another feature of the αCP/oligonucleotide interaction that is not completely understood concerns the relative roles of the three KH domains and their overall topological arrangement when bound to target oligonucleotide (32). The optimal target sequence of the αCP2-KL isoform generated by in vitro SELEX contained three short C-rich patches, spaced 2–6 nt apart, within an exposed single-stranded conformation (33). This suggests a close spatial arrangement of all three αCP KH domains. Notably, disruption of any of the three C-rich patches within the αCP target site of α-globin mRNA interferes with α-complex formation and decreases α-globin mRNA stability in vivo (5,34).
Interestingly, X-ray crystallographic and nuclear magnetic resonance (NMR) spectroscopic analyses of αCP KH domains reveal both hetero- and homodimer formation between KH domains (29–31). These protein–protein interactions can occur simultaneously with oligonucleotide binding. Whether they occur within the context of the full-length protein bound to its target oligonucleotide is not yet known.

We have previously reported αCP binding to a specific UC-rich region of the 3′-UTR of androgen receptor (AR) mRNA (35). Through mutual binding with HuR, αCP1 and αCP2 are thought to be part of the post-transcriptional control mechanism for androgen receptor expression. They have been shown to bind to a specific 5′-CCCUCCC-3′ motif (AR mRNA nucleotides 3317–3323) immediately adjacent to a U-rich sequence (AR mRNA nucleotides 3275–3316) that is the target for HuR binding. It has been speculated that the combination of αCP and HuR binding plays a role in the regulation of AR gene expression. Here, we report a binding analysis of full-length αCP1, as well as of the three separate αCP1–KH domains, to their target sequence in the 3′-UTR of androgen receptor mRNA using electrophoretic mobility shift assays (EMSA) and surface plasmon resonance (SPR). A binding comparison is made with the analogous DNA sequence. We also report the crystallization and structure determination of αCP1–KH1 bound to a target DNA 11-mer analogous to the androgen receptor mRNA target site. This structure reveals the mode of interaction with the oligonucleotide, as well as the topological arrangement of two αCP1–KH1 domains bound at adjacent target sites. Further, SPR experiments are used to investigate the preferential binding of cytosine at each of the four nucleotide-binding positions. This is explored further using a structural approach in which we determine the structure of an αCP1–KH1 mutant bound to a cytosine tetrad. Together, our data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad.
MATERIALS AND METHODS
Protein expression and purification
Full-length αCP1 and the individual KH domains were expressed as fusion proteins with glutathione S-transferase (GST). The DNA coding sequences for amino acids 1–356 (αCP1), 14–86 (αCP1–KH1), 97–150 (αCP1–KH2) and 279–356 (αCP1–KH3) were cloned into pGEX-6P2 plasmids and expressed in Escherichia coli BL21 (Codon+) in Luria broth at 37°C. Protein expression was induced with 0.02 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) at an optical density of 0.8 at 595 nm. After a further 3 h of growth, the cells were harvested by centrifugation, resuspended in phosphate-buffered saline (PBS; 140 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4) containing 0.5% Triton X-100 and lysed by French press (SLM Instruments, Inc.). The lysate was supplemented with 0.5 mM phenylmethanesulphonyl fluoride (PMSF) and, for full-length αCP1, with a protease inhibitor cocktail of aprotinin (2 μg/ml), leupeptin (2 μg/ml) and pepstatin (1 μg/ml). The GST-fusion proteins were purified by affinity chromatography on glutathione agarose beads equilibrated with PBS at 4°C, and the GST tag was cleaved with 2 U PreScission protease (GE Healthcare) in 50 mM Tris–HCl pH 7.0, 150 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol (DTT) at 4°C. αCP1 was further purified by anion-exchange chromatography (Mono Q, Pharmacia), and αCP1–KH1 and αCP1–KH2 by size-exclusion chromatography on a Sephadex 75 column (Pharmacia). The purified proteins were dialysed into phosphate buffer pH 6.0 (25 mM NaH2PO4, 150 mM NaCl, 1 mM EDTA, 1 mM DTT), concentrated with 3 kDa cut-off centrifugal concentrators (Millipore) and quantified using a detergent-compatible protein assay (Bio-Rad). The cloning, overexpression and purification of the αCP1–KH1.W.C54S mutant protein were as described previously (36). In brief, this protein was prepared as a His-tagged protein, and the His-tag was not removed prior to crystallization trials.
Preparation of oligonucleotides for EMSA
Long RNA transcript
A 107-nt RNA containing vector sequences and nucleotides 3275–3325 of the androgen receptor mRNA 3′-UTR was transcribed in vitro from the corresponding DNA cloned into pBLUESCRIPT II KS+, using Ambion in vitro transcription kits according to the manufacturer's specifications. After purification of full-length transcripts on denaturing polyacrylamide gels, RNAs were 5′-end labelled with [γ-32P]ATP (Perkin Elmer). Radiolabelled RNAs were subjected to a second round of denaturing gel purification, then eluted, precipitated and resuspended in water. Prior to use, radiolabelled RNAs were heated to 70°C for 10 min, quenched on ice and diluted to 2× the final concentration.
Short oligonucleotides
The synthetic RNA 5′-UUCCCUCCCUA-3′, corresponding to nucleotides 3315–3325 of AR mRNA, was purchased from Dharmacon (Dallas, TX). The analogous synthetic DNA, 5′-TTCCCTCCCTA-3′, was purchased from Sigma Genosys (Castle Hill, NSW, Australia). Both were gel-purified and 5′-end labelled as described above.
RNA and DNA EMSA
EMSA was used to examine the binding of full-length αCP1 and αCP1–KH1 to target RNA and DNA. Purified full-length αCP1 and αCP1–KH1 were thawed and diluted to 2× the final concentration in binding buffer [10 mM HEPES pH 7.5, 3 mM MgCl2, 14 mM KCl, 5% (v/v) glycerol, 0.2% (v/v) Nonidet® P-40, 1 mM DTT]. Diluted protein (5 μl) was mixed with 1 μl of tRNA (1 μg/μl) and incubated at room temperature for 10 min to block non-specific binding, before addition of an equal volume of labelled probe, also diluted to 2× the final concentration in binding buffer. After 30 min incubation at room temperature, 4 μl of loading dye [binding buffer also containing 50% (v/v) glycerol, 0.1% (w/v) bromophenol blue and 0.1% (w/v) xylene cyanol] was added, and the mixture was loaded immediately onto a running non-denaturing polyacrylamide/0.5× Tris/borate/EDTA (TBE) gel and run at 10 V/cm. After electrophoresis, the gel was dried and exposed overnight to either a phosphorimager plate or X-ray film to detect protein–oligonucleotide complexes.
SPR studies
SPR (BIAcore 2000 or T100 instrument) was used to characterize the interactions of the αCP1–KH domains with target RNA and DNA. A research-grade CM5 chip coated with streptavidin was purchased from BIAcore. For comparison of KH1, KH2 and KH3 binding, 5′-biotinylated RNA (5′-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′) representing nucleotides 3296–3325 of androgen receptor mRNA was obtained from Dharmacon and immobilized on flow cell 2. 5′-Biotinylated DNA (5′-CTCTCCTTTCTTTTTCTTCTTCCCTCCCTA-3′) analogous to this RNA sequence was also obtained from Dharmacon and immobilized on flow cell 3. For comparison of αCP1–KH1 binding to a series of tetrad DNA target sites, the following 5′-biotinylated sequences were purchased from Geneworks Australia: 5′-AAAAAATCCA-3′, 5′-AAAAAACTCA-3′, 5′-AAAAAACCTA-3′, 5′-AAAAAACCCA-3′ and 5′-AAAAACCCCA-3′.

Flow cell 1, coated with streptavidin only, was used as the reference surface. Immobilization was carried out at a flow rate of 10 μl/min in immobilization buffer (10 mM Tris–HCl pH 7.4, 150 mM NaCl, 0.5% Triton X-100 or 0.025% P20, 1–2 mM DTT, 2 mM EDTA), and the surface was then blocked with 1 mM biotin in HEPES-buffered saline (HBS) at 10 μl/min for 3 min. On average, 30 RU of oligonucleotide was immobilized. αCP1–KH domains were injected over flow cells 1–3 at concentrations ranging from 0.3 to 50 μM, at a flow rate of 50 μl/min and at room temperature, in running buffer (10 mM Tris–HCl pH 7.4, 150 mM NaCl, 0.5% Triton X-100 or 0.025% P20, 1–2 mM DTT, 2 mM EDTA, 66–125 μg/ml tRNA and 62.5 mg/ml bovine serum albumin). All experiments were replicated (in duplicate or triplicate) to confirm the reproducibility of the signal. Bound protein was removed from the streptavidin chip between cycles by a 1–2 min wash with 2 M NaCl at 20–50 μl/min. Data were analysed with the BIAevaluation software to obtain binding constants using a steady-state model.
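A steady-state analysis of this kind usually assumes a 1:1 (Langmuir) interaction, in which the reference-subtracted plateau response Req at analyte concentration C follows Req = Rmax·C/(KD + C). The sketch below is a minimal pure-Python illustration of such a fit, not the BIAevaluation implementation. The function names, the golden-section search over log KD and the synthetic example data are assumptions made for illustration only.

```python
import math

def langmuir(conc, kd, rmax):
    """1:1 steady-state response: Req = Rmax * C / (KD + C)."""
    return rmax * conc / (kd + conc)

def best_rmax(concs, responses, kd):
    """Least-squares Rmax for a fixed KD (the model is linear in Rmax)."""
    x = [c / (kd + c) for c in concs]
    return sum(xi * r for xi, r in zip(x, responses)) / sum(xi * xi for xi in x)

def sse(concs, responses, kd):
    """Sum of squared residuals with Rmax profiled out."""
    rmax = best_rmax(concs, responses, kd)
    return sum((r - langmuir(c, kd, rmax)) ** 2 for c, r in zip(concs, responses))

def fit_steady_state(concs, responses, kd_lo=1e-3, kd_hi=1e3, tol=1e-6):
    """Golden-section search for KD on a log10 scale; returns (KD, Rmax).

    KD is returned in the same units as concs. Assumes the profiled SSE has a
    single minimum between kd_lo and kd_hi.
    """
    lo, hi = math.log10(kd_lo), math.log10(kd_hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, ~0.618
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = sse(concs, responses, 10 ** a), sse(concs, responses, 10 ** b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = sse(concs, responses, 10 ** a)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = sse(concs, responses, 10 ** b)
    kd = 10 ** ((lo + hi) / 2.0)
    return kd, best_rmax(concs, responses, kd)

# Synthetic, noise-free example (hypothetical values, not data from this study):
# a two-fold series from 10 uM down to 0.3125 uM with KD = 0.33 uM, Rmax = 30 RU.
concs = [10.0 / 2 ** i for i in range(6)]
responses = [langmuir(c, 0.33, 30.0) for c in concs]
kd, rmax = fit_steady_state(concs, responses)
print(f"KD = {kd:.3f} uM, Rmax = {rmax:.1f} RU")
```

Because the model is linear in Rmax for any fixed KD, profiling Rmax out analytically reduces the fit to a one-dimensional search over KD. With real, noisy sensorgram plateaus a standard non-linear least-squares routine would normally be used instead.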
Preparation of DNA targets for co-crystallization
The DNA ‘11-mer’ 5′-TTCCCTCCCTA-3′, analogous to nucleotides 3315–3325 of AR mRNA, was purchased from Dharmacon in crude form and further purified by denaturing PAGE. After separation on a 20% polyacrylamide gel, the band was visualized by UV shadowing once the DNA had run approximately halfway down the gel. The band was excised, crushed and eluted overnight in sterile 0.3 M sodium acetate at 37°C. The eluate was filtered and desalted using a reverse-phase solid-phase extraction cartridge (Sep-Pak C18, Waters). The eluted fractions were lyophilized and dissolved in distilled water for quantification by UV spectroscopy. The DNA ‘6-mer’ 5′-ACCCCA-3′ was purchased from Geneworks Australia in purified form and redissolved in water without further purification. DNA concentrations were determined from the absorbance at 260 nm, assuming that one absorbance unit corresponds to 34 μg/ml.
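Converting the A260-based mass concentration into the molar concentration needed for a 1:1 complex requires the oligonucleotide's molecular weight, which the Methods do not state. The sketch below assumes the common vendor approximation for an unmodified DNA oligonucleotide (per-residue anhydrous masses minus an end-group correction); the constants and function names are illustrative assumptions, and only the 34 μg/ml conversion comes from the text.

```python
# Approximate anhydrous residue masses (g/mol) for an unmodified DNA oligonucleotide
# with 5'- and 3'-OH ends; the -61.96 term is the usual end-group correction.
# These constants are a common vendor approximation, assumed here (not from the Methods).
RESIDUE_MW = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.2}
END_CORRECTION = 61.96

def oligo_mw(seq):
    """Approximate molecular weight (g/mol) of an unmodified DNA oligonucleotide."""
    return sum(RESIDUE_MW[base] for base in seq.upper()) - END_CORRECTION

def conc_from_a260(a260, seq, ug_per_ml_per_unit=34.0, path_cm=1.0):
    """Molar concentration (uM) from A260, using 1 A260 unit = 34 ug/ml as in the Methods."""
    mass_ug_per_ml = a260 / path_cm * ug_per_ml_per_unit
    # (ug/ml) / (g/mol) = umol/ml = mM, so multiply by 1000 to report uM.
    return mass_ug_per_ml / oligo_mw(seq) * 1000.0

for name, seq in [("11-mer", "TTCCCTCCCTA"), ("6-mer", "ACCCCA")]:
    print(name, round(oligo_mw(seq), 2), "g/mol;",
          round(conc_from_a260(1.0, seq), 2), "uM per A260 unit")
```

Under these assumptions, one A260 unit of the 11-mer corresponds to roughly 10.6 μM, which is the kind of figure needed to reach the 309 μM protein and DNA concentrations used for co-crystallization.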
Preparation of αCP1–KH1/DNA complexes for co-crystallization
Protein–DNA complexes were prepared by dissolving the lyophilized DNA in the protein solution at a final molar ratio of 1:1. The mixture was left on ice for 30 min before crystallization drops were set up. Crystals were grown by vapour diffusion in 1 μl hanging drops containing 1:1 mixtures of complex and reservoir solutions. The αCP1–KH1/11-mer complex solution contained 309 μM each of protein and DNA in 50 mM Tris–HCl pH 8.0, 1 mM DTT, 1 mM EDTA, 150 mM NaCl, and the reservoir solution was 0.1 M sodium cacodylate pH 6.5, 0.2 M magnesium acetate, 30% MPD (2-methyl-2,4-pentanediol) (Sigma Crystal Screen reagent formulation number 21). Crystals typically grew in 2 months to dimensions of ∼0.2 × 0.2 × 0.04 mm, with a diamond-shaped outline. Crystals of the αCP1–KH1.W.C54S/6-mer complex were produced using 0.1 M phosphate–citrate pH 4.2, 40% (v/v) PEG 300 as the precipitant, as reported previously (36).
X-Ray data collection, structure determination and refinement of αCP1–KH1/DNA complexes
Data for the αCP1–KH1/11-mer complex were recorded in-house from a single crystal using a Rigaku RUH2R rotating copper-anode X-ray source equipped with Osmic confocal optics, a MAR Research MAR345 detector and an Oxford Cryosystems 600 series Cryostream. The diffraction data indicated space group P2₁ with unit cell constants a = 45.6 Å, b = 76.8 Å, c = 61.4 Å and β = 111.7°, and diffraction was recorded to 3.0 Å resolution. Data were integrated and scaled with DENZO and SCALEPACK (37), and structure factor amplitudes were calculated using TRUNCATE (38). The data collection statistics are given in Table 1.
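The reported cell constants fix the unit-cell volume, which is the starting point for checking how many complexes plausibly fit in the asymmetric unit (Matthews coefficient). The sketch below computes the monoclinic cell volume from the values in the text; the Matthews and solvent-content helpers use the standard protein-crystal approximations and take the mass per asymmetric unit as an input, since that mass is not given here. Function names are illustrative.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(volume, mass_per_asu_da, n_asu):
    """V_M (A^3/Da) = cell volume / (asymmetric units per cell * mass per asymmetric unit)."""
    return volume / (n_asu * mass_per_asu_da)

def solvent_fraction(vm):
    """Approximate solvent content from V_M (Matthews approximation for protein): 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm

# Cell constants reported for the alphaCP1-KH1/11-mer crystal; space group P2_1
# has two asymmetric units per unit cell.
v = monoclinic_cell_volume(45.6, 76.8, 61.4, 111.7)
print(f"cell volume = {v:.0f} A^3")
```

The cell volume comes to roughly 2.0 × 10⁵ Å³. Supplying an estimated mass for the four KH1 chains and two 11-mers to `matthews_coefficient(v, mass, 2)` would give V_M. That mass would have to be computed from the construct and DNA sequences, and the 1.23 constant is calibrated for protein rather than protein–DNA crystals.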
Table 1.
Statistics for X-ray crystallographic data collection and refinement

| | αCP1–KH1/11-mer | αCP1–KH1.W.C54S/6-mer |
|---|---|---|
| Data collection (a) | | |
| Space group | P2₁ | P2₁ |
| Cell dimensions a, b, c (Å) | 45.6, 76.8, 61.4 | 38.6, 114.9, 43.4 |
| Cell dimensions α, β, γ (°) | 90.0, 111.7, 90.0 | 90.0, 93.4, 90.0 |
| Resolution (Å) | 30.0–3.0 (3.11–3.0) | 30.0–1.77 (1.87–1.77) |
| Rmerge (b) (%) | 5.7 (65.8) | 9.0 (60.2) |
| I/σ(I) | 22.7 (2.1) | 7.9 (2.2) |
| Reflections measured (total) | 35 603 (394) | 113 451 (16 460) |
| Completeness (%) | 99.8 (99.9) | 99.7 (100) |
| Redundancy | 4.4 (4.0) | 3.2 (3.1) |
| Refinement | | |
| Resolution (Å) | 3.0 | 1.77 |
| No. of reflections used (unique) | 7994 | 34 271 |
| Rwork/Rfree (%) | 23.0/27.9 | 16.4/21.3 |
| No. of atoms: protein | 2193 | 2281 |
| No. of atoms: oligonucleotide | 308 | 392 |
| No. of atoms: water | – | 71 |
| B-factors (Å²): protein | 79.2 | 30.4 |
| B-factors (Å²): oligonucleotide | 91.5 | 34.3 |
| B-factors (Å²): water | – | 43.1 |
| RMS deviations: bond lengths (Å) | 0.007 | 0.027 |
| RMS deviations: bond angles (°) | 1.164 | 2.243 |
| Ramachandran plot (c): favoured regions (%) | 95.3 | 98.9 |

The statistics for both the αCP1–KH1/11-mer and αCP1–KH1.W.C54S/6-mer structures are shown.
(a) Values in parentheses refer to the highest-resolution shell.
(b) Rmerge = Σ|I − ⟨I⟩| / ΣI, where I is the intensity of an individual reflection and ⟨I⟩ is the mean intensity of symmetry-related measurements of that reflection.
(c) Determined using the program MOLPROBITY (45).
Initial phases for the αCP1–KH1/11-mer complex were derived by molecular replacement using the protein coordinates from the Nova2-KH3/RNA structure (PDB accession code: 1EC6) as the search model. The search model included residues 4–76, with the oligonucleotide and solvent molecules removed. The program PHASER (39) was used to locate four molecules in the asymmetric unit. Electron density calculated from the initial set of phases revealed the positions of two DNA molecules in addition to the four KH domains in the asymmetric unit. The protein and oligonucleotide were built using COOT (40). To aid interpretation of the initial electron density maps, 4-fold non-crystallographic symmetry (NCS) averaging was applied using the program DM (38). Tight NCS constraints were maintained throughout the refinement, and translation–libration–screw (TLS) refinement was also used. The refinement statistics are reported in Table 1, and the atomic coordinates are available from the Protein Data Bank (PDB code: 1ZTG).

The data collection for the αCP1–KH1.W.C54S/6-mer complex has been described previously (36). The structure was solved by molecular replacement using PHASER (39) from the CCP4 suite (38). A single copy of the polypeptide chain was extracted from the coordinates of the αCP1–KH1/11-mer structure and used as the search model for both the rotation and translation functions. Four copies of the protein were identified in the asymmetric unit, and the oligonucleotide was built manually in COOT (40). Several cycles of refinement were carried out with NCS constraints in REFMAC5 (41) and PHENIX (42). Water molecules were added with ARP/wARP (43), and amino acid side chains were checked manually and rebuilt in COOT. TLS refinement was also used. Data collection statistics were reported previously (36), and the refinement statistics are given in Table 1. The structure factors and the coordinates of the final model are available from the Protein Data Bank (PDB code: 3VKE).
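For reference, the working and free R-factors quoted in Table 1 follow the conventional crystallographic definitions. In these, W is the working set of reflections used in refinement and T is the test set excluded from it. The fraction of reflections assigned to T is not stated here, so none is assumed:

$$R_{\mathrm{work}} = \frac{\sum_{hkl \in W} \big|\,|F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)|\,\big|}{\sum_{hkl \in W} |F_{\mathrm{obs}}(hkl)|}, \qquad R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \big|\,|F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)|\,\big|}{\sum_{hkl \in T} |F_{\mathrm{obs}}(hkl)|}$$

Because the test set is never used to fit the model, a gap between R_free and R_work, such as 27.9% versus 23.0% for the 11-mer complex, gives an indication of how far the model may be overfitted.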
RESULTS
Multimeric RNA complexes are formed by full-length αCP1, but not αCP1–KH1 alone
αCP1 has been shown to bind in a sequence-specific manner to a UC-rich region in the 3′-UTR of androgen receptor mRNA (nucleotides 3275–3325) that is implicated in the regulation of AR mRNA stability and translational efficiency (35). We have previously shown by EMSA that αCP1–KH3 binds this sequence independently, whereas αCP1–KH2 does not (44). In the current study, we examine the binding of this AR mRNA sequence by full-length αCP1 and αCP1–KH1.

The binding of full-length αCP1 to the 51-nt AR mRNA target sequence in an EMSA experiment is shown in Figure 1A. The probe (10 nM RNA) is shifted upon addition of increasing concentrations of αCP1 (10 nM–1 μM), with half of the probe shifted at 20–50 nM αCP1. This is consistent with a previous report of a KD in the nanomolar range (35). Notably, full-length αCP1 shifts the probe to progressively slower-migrating positions as its concentration increases, revealing the formation of higher-molecular-weight complexes. Binding of αCP1–KH1 to the target RNA is also shown in Figure 1A. The single KH domain shifts the target RNA (likely with a 2:1 protein:RNA stoichiometry) to a single discrete position, showing that full-length αCP1, but not αCP1–KH1 alone, forms large multimeric complexes with RNA.
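Because the probe concentration (10 nM) is comparable to the protein concentration at which half of the probe shifts, the usual assumption that free protein equals total protein does not hold. The minimal sketch below assumes a single-site 1:1 equilibrium and solves the exact quadratic for the bound fraction. This simplification ignores the higher-order complexes seen with full-length αCP1, and the function names are illustrative.

```python
import math

def fraction_probe_bound(protein_total, probe_total, kd):
    """Fraction of probe bound in a single-site 1:1 equilibrium (all in the same units).

    Solves the exact quadratic for [complex] without assuming protein >> probe.
    """
    s = protein_total + probe_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * probe_total)) / 2.0
    return complex_conc / probe_total

def half_shift_concentration(probe_total, kd):
    """Total protein giving 50% probe bound in the 1:1 model: KD + probe / 2."""
    return kd + probe_total / 2.0

def apparent_kd_from_half_shift(protein_half, probe_total):
    """Invert half_shift_concentration: KD = protein at half-shift - probe / 2."""
    return protein_half - probe_total / 2.0

# 10 nM probe, half-shifted between 20 and 50 nM alphaCP1 (values from the text).
print([apparent_kd_from_half_shift(p, 10.0) for p in (20.0, 50.0)])
```

Under this simplified model, a half-shift at 20–50 nM αCP1 with 10 nM probe corresponds to an apparent KD of roughly 15–45 nM. Given the multimeric complexes, this is best read as an order-of-magnitude estimate consistent with the nanomolar KD cited above.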
Figure 1.
Binding of αCP1 and αCP1–KH1 to oligonucleotides representing the UC-rich region of the androgen receptor 3′-UTR. (A) A mobility shift assay, run on a 5% PAA/0.5 × TBE gel, is shown. 5′-end labelled RNA was incubated either alone (−) or in the presence of purified αCP1 or αCP1–KH1, as indicated above the gel. The probe is a 107-nt RNA containing vector sequences and nucleotides 3275–3325 of the androgen receptor mRNA. Increasing protein concentrations are indicated by a wedge and are, from the left: 1 × 10⁻⁸ M, 2 × 10⁻⁸ M, 5 × 10⁻⁸ M, 1 × 10⁻⁷ M, 2 × 10⁻⁷ M, 5 × 10⁻⁷ M and 1 × 10⁻⁶ M for both proteins. In all cases, the RNA concentration is 1 × 10⁻⁸ M. All lanes also contained 1 μg yeast tRNA as a non-specific competitor. (B) A mobility shift assay, run on a 10% PAA/0.5 × TBE gel, is shown. 5′-end labelled 11-mer RNA or DNA was incubated either alone (−) or in the presence of purified αCP1 or αCP1–KH1, as indicated above the gel. Probe sequences are RNA: 5′-UUCCCUCCCUA-3′; DNA: 5′-TTCCCTCCCTA-3′. Increasing protein concentrations are indicated by a wedge and are, from the left: for αCP1, 1 × 10⁻⁷ M, 3 × 10⁻⁷ M and 1 × 10⁻⁶ M; for αCP1–KH1, 1 × 10⁻⁶ M, 3 × 10⁻⁶ M and 1 × 10⁻⁵ M. In all cases, the RNA or DNA concentration is 1 × 10⁻⁷ M. All lanes also contained 1 μg yeast tRNA as a non-specific competitor.
Full-length αCP1 and αCP1–KH1 bind to target RNA, as well as DNA
The ‘CCCUCCC’ motif at the 3′-end of the 51-nt AR mRNA target has been identified as the αCP binding site through mutational analysis of its two poly(C) triads (35). To verify that full-length αCP1 and αCP1–KH1 bind this motif in vitro, we conducted gel shift assays using an 11-nt probe corresponding to nucleotides 3315–3325 of AR mRNA (5′-UUCCCUCCCUA-3′). Figure 1B shows that full-length protein shifts the probe to a single, constant position, indicating robust binding via a single binding interaction. αCP1–KH1 also binds, but only marginally shifts the probe under the conditions of this experiment, indicating a weaker interaction. Interestingly, the binding profiles of αCP1 and αCP1–KH1 to an 11-nt DNA probe analogous to the AR target sequence (5′-TTCCCTCCCTA-3′) are very similar to those observed with RNA (Figure 1B): full-length αCP1 binds the DNA strongly and αCP1–KH1 weakly. Notably, full-length αCP1 appears to shift more of the DNA probe than of the RNA probe, suggesting a marginally higher affinity for DNA than for RNA.
αCP1–KH1 forms the most stable interactions to a C-rich RNA and DNA
Having shown by EMSA that αCP1–KH1 binds both DNA and RNA, we used SPR for a more sensitive comparison of the binding affinities and kinetics of the KH domains for target RNA and the analogous DNA sequence. SPR binding curves for a series of αCP1–KH domain interactions (at concentrations between 0.312 and 10 μM) with a 30-nt oligonucleotide representing AR mRNA nucleotides 3296–3325 are shown in Figure 2. This RNA sequence includes 19 nt of U-rich sequence preceding the 11-nt C-rich target sequence used in the EMSA above, which acts as a spacer between the BIAcore chip and the binding site. Alongside these curves are the sensorgrams obtained simultaneously with the analogous DNA sequence. Note that the binding curves exhibited complex kinetics, most likely because of the long oligonucleotides immobilized on the chip and the presence of two cytosine triplets at the 3′-end that constitute two αCP1–KH domain target sites. Where binding was observed, an approximate steady-state binding analysis is presented.
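The two cytosine triplets in the immobilized 30-mer can be located directly from the sequences given in the Methods. The minimal sketch below uses a regular expression to find runs of three or more cytosines and reports them in AR mRNA coordinates, assuming that the 5′-most nucleotide is 3296. It also checks that the DNA ligand is the RNA sequence with U replaced by T. The names are illustrative.

```python
import re

# 30-nt biotinylated SPR ligands (AR mRNA nucleotides 3296-3325), copied from the Methods.
SPR_RNA = "CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA"
SPR_DNA = "CTCTCCTTTCTTTTTCTTCTTCCCTCCCTA"
FIRST_NT = 3296  # AR mRNA coordinate of the 5'-most nucleotide

def c_runs(seq, min_len=3, start=FIRST_NT):
    """Return (first, last) AR mRNA coordinates of every run of >= min_len cytosines."""
    pattern = re.compile("C{%d,}" % min_len)
    return [(start + m.start(), start + m.end() - 1) for m in pattern.finditer(seq)]

# The DNA ligand should be the RNA sequence with U replaced by T.
assert SPR_RNA.replace("U", "T") == SPR_DNA
print(c_runs(SPR_RNA))  # prints [(3317, 3319), (3321, 3323)]
```

The two runs fall at nucleotides 3317–3319 and 3321–3323, together spanning the 5′-CCCUCCC-3′ motif (nucleotides 3317–3323) identified as the αCP binding site. The rest of the 30-mer contains no further C-triplets.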
Figure 2.
Binding analysis of separate KH domains of αCP1 to target RNA and DNA using SPR. Sensorgrams are shown for αCP1 KH1, KH2 and KH3 binding, over a range of protein concentrations, to biotinylated RNA (5′-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′) representing nucleotides 3296–3325 of androgen receptor mRNA (flow cell 2) and to biotinylated DNA (5′-CTCTCCTTTCTTTTTCTTCTTCCCTCCCTA-3′) analogous to this RNA sequence (flow cell 3), both captured on streptavidin-coated sensor chips. Binding curves derived from the approximate steady-state responses were used to determine equilibrium dissociation constants (KDs). Errors are standard errors from the fits.
αCP1–KH1 was found to associate readily with the RNA, and a slow dissociation phase was observed, indicating a relatively stable interaction. The equilibrium dissociation constant (KD) estimated from a steady-state analysis was 3.6 ± 0.6 μM. αCP1–KH1 binding to DNA also showed relatively slow dissociation, with a KD of 0.33 ± 0.04 μM, representing an ∼10-fold higher affinity of αCP1–KH1 for DNA than for RNA.

The interactions of αCP1–KH2 with target RNA and DNA were also examined for comparison. The αCP1–KH2 domain has not previously been shown to bind RNA independently (32,44). Consistent with these reports, no binding was detectable between αCP1–KH2 and RNA, even at the highest protein concentrations. However, we detected binding to DNA, albeit with a low response. A steady-state analysis of the αCP1–KH2 interaction with DNA yielded a KD of 13 ± 4 μM. This indicates that the αCP1–KH2 preparation was functional, but that interactions with RNA could not be detected.

In contrast, αCP1–KH3 bound both RNA and DNA with affinities similar to those of αCP1–KH1, with KD values from steady-state analysis of 3.5 ± 0.8 μM and 1.5 ± 0.2 μM, respectively. However, the complex half-lives of αCP1–KH3 with DNA and RNA were shorter than those of αCP1–KH1, with response curves returning promptly to baseline at the end of the protein injection phase.

Thus, oligonucleotide binding could be measured for all three individual αCP1–KH domains, with the exception of KH2 binding to RNA. Of the three KH domains, KH1 forms the most stable interactions with both RNA and DNA. KH2 and KH3 are also both capable of binding DNA.
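The ∼10-fold preference of αCP1–KH1 for DNA over RNA follows from the ratio of the two steady-state KDs. The sketch below computes that ratio together with a first-order propagated standard error. This assumes the two fit errors are independent, which is an assumption not stated in the text, and the helper name is illustrative.

```python
import math

def ratio_with_error(num, num_se, den, den_se):
    """Ratio num/den with first-order propagation of independent standard errors."""
    ratio = num / den
    rel = math.sqrt((num_se / num) ** 2 + (den_se / den) ** 2)
    return ratio, ratio * rel

# KH1 steady-state KDs (uM) from Figure 2: RNA 3.6 +/- 0.6, DNA 0.33 +/- 0.04.
fold, fold_se = ratio_with_error(3.6, 0.6, 0.33, 0.04)
print(f"KD(RNA)/KD(DNA) = {fold:.1f} +/- {fold_se:.1f}")  # prints KD(RNA)/KD(DNA) = 10.9 +/- 2.2
```

The result, about 11 ± 2, is consistent with the ∼10-fold difference stated above. The same helper applied to the KH3 values (3.5 ± 0.8 and 1.5 ± 0.2 μM) gives a much smaller DNA preference of roughly 2-fold.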
Structural overview of αCP1–KH1 bound to adjacent CCCT binding sites
In order to examine the structural basis for the differences in KH domain interactions with target oligonucleotides, crystallization trials were undertaken with each of the three αCP1 KH domains in complex with the DNA sequence 5′-TTCCCTCCCTA-3′ (analogous to nucleotides 3315–3325 of AR mRNA). Only αCP1–KH1 yielded crystals, which was not unexpected given that KH1 forms the most stable complexes with DNA. αCP1–KH1 (residues 14–86, numbered as in Swiss-Prot entry Q15365, preceded by the sequence GPLGSPGI introduced by the cloning procedure) crystallized with two crystallographically independent copies of a 2:1 protein–DNA complex in the asymmetric unit. Equivalent crystallization experiments with RNA did not produce crystals suitable for structure determination. Phases were obtained by molecular replacement using the coordinates of the Nova2-KH3 structure (PDB ID: 1EC6) with the oligonucleotide removed. The current refined model has a working R-factor of 23.0% and a free R-factor of 27.9% at 3.0 Å resolution (Table 1), with good stereochemistry: 95% of residues lie in the favoured regions of the Ramachandran plot (100% in the allowed regions). The coordinates have been deposited in the Protein Data Bank (PDB ID: 1ZTG).

The αCP1–KH1/11-mer structure reveals two αCP1–KH1 domains bound at adjacent CCCT sequences (Figure 3A). The final model defines the positions of residues 13–83 of αCP1–KH1 and of the four bound bases in each binding cleft. Although the two KH domains bound to the same oligonucleotide are held very close together, they do not contact one another (Figure 3A). Interestingly, in this crystal form, each KH1 domain is covalently linked to an adjacent KH1 domain by a disulphide bond formed between their C54 residues. Furthermore, each KH domain forms a dimer with a KH domain from an adjacent unit, as previously observed for other KH domain structures (26,27,30,31).
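The working and free R-factors quoted above follow the standard crystallographic definition R = Σ||F_obs| − k|F_calc|| / Σ|F_obs|, with R_free computed over a test set of reflections excluded from refinement. The Python sketch below illustrates the calculation on a toy set of amplitudes. It assumes a single overall least-squares scale factor k fitted on the working set; refinement programs apply more elaborate scaling (for example, bulk-solvent and anisotropic corrections), so this is not how the Table 1 values were produced.

```python
from typing import Sequence, Tuple


def ls_scale(fobs: Sequence[float], fcalc: Sequence[float]) -> float:
    """Overall scale k minimising sum((Fo - k*Fc)^2)."""
    return sum(o * c for o, c in zip(fobs, fcalc)) / sum(c * c for c in fcalc)


def r_factor(fobs: Sequence[float], fcalc: Sequence[float], k: float) -> float:
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the given reflections."""
    num = sum(abs(abs(o) - k * abs(c)) for o, c in zip(fobs, fcalc))
    return num / sum(abs(o) for o in fobs)


def r_work_r_free(fobs, fcalc, is_free) -> Tuple[float, float]:
    """Split reflections by free-set flag; both subsets must be non-empty."""
    work = [(o, c) for o, c, free in zip(fobs, fcalc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(fobs, fcalc, is_free) if free]
    w_obs, w_calc = zip(*work)
    t_obs, t_calc = zip(*test)
    k = ls_scale(w_obs, w_calc)  # scale determined from the working set only
    return r_factor(w_obs, w_calc, k), r_factor(t_obs, t_calc, k)


# Toy amplitudes: the model matches the working set exactly but not the free reflection.
fobs = [10.0, 20.0, 10.0]
fcalc = [10.0, 20.0, 12.0]
flags = [False, False, True]
r_work, r_free = r_work_r_free(fobs, fcalc, flags)
print(f"R_work = {100 * r_work:.1f}%, R_free = {100 * r_free:.1f}%")
```

The toy case shows why R_free is informative: a model can agree perfectly with the reflections it was refined against while still disagreeing with held-out data.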
Figure 3.
Schematic representations of the αCP1–KH1/11-mer DNA complex. (A) αCP1–KH1 forms a dimer, with two protein molecules bound to a single 11-nt strand of DNA, resulting in a continuous network throughout the crystal. (B) A cartoon representation of the KH domain bound to the target DNA. The 5′-tetrad of the target DNA that contacts the first KH domain is shown, illustrating the positioning of the critical bases about α-helix 1 and between the GXXG and variable loops. (C) The electrostatic potential of the αCP1–KH1 structure (in the same orientation as the cartoon alongside), calculated using the APBS software package (46). Potential contours are shown at +1 kT/e (blue) and −1 kT/e (red) and were obtained by solving the linearized Poisson–Boltzmann equation at 150 mM ionic strength with a solute dielectric of 2 and a solvent dielectric of 78.5. The blue contour represents a positive potential directing oligonucleotides to the binding cleft. (D) Summary of the contacts between αCP1–KH1 and the bound DNA tetrad of sequence 5′-CCCT-3′. Van der Waals contacts are coloured orange, and hydrogen bond interactions are coloured blue. Residues making important contacts with the oligonucleotide sugar–phosphate backbone are listed on the left, and residues contacting the pyrimidine rings, and thus underlying base specificity, are listed on the right.
The protein conforms to the classical type I KH domain structure, with a three-stranded anti-parallel β-sheet packed against three α-helices in a βααββα topological arrangement (Figure 3B). The structure is consistent with that reported for the αCP2-KH1 homologue bound to a telomeric DNA sequence solved to 1.7 Å resolution [pdb id: 2AXY; identity 97% and Cα pair-wise root mean square deviation (RMSD) 1.1 Å] (31).
A hydrophobic core provides the structure's stability, with side chains contributing to it from the inner face of the β-sheet (including Leu14, Ile16, Leu18, Met20, Ala45, Ile47, Ile49, Ile59 and Leu61) and from all three helices (including Glu24, Val25, Ile28, Val36, Ile39, Arg40, Ala67, Ile68, Ala71, Ile75, Lys78 and Leu79). This core is partly exposed to create the base of the hydrophobic oligonucleotide-binding cleft.

The dimer interface is formed by an anti-parallel association of the third α-helices of paired αCP1–KH1 domains through hydrophobic interactions, including stacking of the Phe72 side-chain aromatic rings. This interaction also brings together the first β-strands of the two αCP1–KH1 domains to form, in effect, a continuous six-stranded anti-parallel β-sheet, although the two strands are connected only by backbone hydrogen bonds between Arg17 NH and Leu19 CO of the respective KH domains. Additional inter-molecular hydrogen bonds are made from the Arg17 side-chain guanidinium group to both the Leu19 backbone carbonyl and the Glu24 side-chain carboxylate of the opposing αCP1–KH1 domain, and from the Glu80 side chain to the Thr65 side-chain hydroxyl of the opposing domain.

The oligonucleotide is accommodated in a hydrophobic cleft formed across the top of α-helix 1 and bounded by the ‘GXXG’ motif and the variable loop (Figure 3B). The DNA-binding cleft is on the opposite side of the molecule from the dimerization interface, so both interactions can occur simultaneously. Basic residues surrounding the binding site, including Lys23, Lys31 and Lys32 (at the ‘XX’ positions), Lys37, Arg40, Arg46 and Arg57, create a positive potential along the length of the cleft (Figure 3C). Such a potential, similarly observed for αCP1–KH3 (28), likely provides a driving force for docking of the oligonucleotide at the site and also provides specific electrostatic contacts to the bound oligonucleotide.
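The Figure 3C calculation used a linearized Poisson–Boltzmann model at 150 mM ionic strength with a solvent dielectric of 78.5. Two standard quantities help in reading such maps: the Debye screening length, κ⁻¹ = √(ε_r ε_0 k_B T / (2 N_A e² I)), which sets how quickly the potential decays in solution, and the value of the 1 kT/e contour in millivolts. The sketch below computes both; the temperature of 298.15 K is an assumption, as the caption does not state it.

```python
import math

# CODATA 2018 values in SI units.
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
E = 1.602176634e-19      # elementary charge, C


def debye_length_nm(ionic_strength_m: float, eps_r: float = 78.5,
                    temp_k: float = 298.15) -> float:
    """Debye screening length (nm) for an ionic strength given in mol/L."""
    i_si = ionic_strength_m * 1000.0  # mol/L -> mol/m^3
    kappa_sq = 2 * NA * E ** 2 * i_si / (eps_r * EPS0 * KB * temp_k)
    return 1e9 / math.sqrt(kappa_sq)


def kt_over_e_mv(temp_k: float = 298.15) -> float:
    """Thermal potential kT/e in millivolts."""
    return 1e3 * KB * temp_k / E


print(f"Debye length at 150 mM: {debye_length_nm(0.150):.3f} nm")
print(f"1 kT/e at 298.15 K: {kt_over_e_mv():.2f} mV")
```

Under these assumptions the screening length at 150 mM is about 0.79 nm, and the ±1 kT/e contours correspond to roughly ±25.7 mV.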
The two αCP1–KH1 monomers bound to the 11-base oligonucleotide make equivalent contacts to sequential ‘CCCT’ tetrads, so that the two monomers are arranged head-to-tail on the DNA. The oligonucleotide is arranged linearly, with its bases lying flat against the protein surface, except for Thy-4 of each tetrad, which partially stacks over Cyt-3. Only a single intra-molecular hydrogen bond is formed, between Cyt-2 N4 of the first (5′) tetrad and the phosphate preceding Cyt-1. The oligonucleotide twists about the phosphate linkage of the fourth base, allowing the second KH1 monomer to bind downstream of the first site, rotated by ∼180° about the oligonucleotide axis (Figure 3A). As reported for other KH domains bound to DNA with multiple binding sites (26,30), this shows how two KH domains may be closely juxtaposed when bound at adjacent C-rich sites.
Binding specificity of αCP1–KH1 for cytosines in the CCCT tetrad
Of particular interest in this study was the determination of the inter-molecular contacts underlying cytosine specificity. The αCP1–KH1/11-mer interactions are summarized in Figure 3D. As seen for αCP2-KH1 (30), the central two cytosines of the bound tetrad (Cyt-2 and Cyt-3) form hydrogen bonds that are the basis for cytosine specificity at these positions. The cytosine in the second position forms van der Waals contacts with Val25, Gly26 and Ile29. Cytosine specificity at this position is dominated by interactions with Arg57, which projects from the variable loop region and forms bipartite hydrogen bonds to the Cyt-2 O2 and N3 atoms. In addition, the backbone carbonyl of Gly22 can form a hydrogen bond with Cyt-2 N4. The cytosine in the third position (Cyt-3) lies over the hydrophobic residues Ile29, Ile49 and Val36. Binding specificity for Cyt-3 is conferred by Ile49 and Arg40, which are, respectively, conservatively substituted and conserved among poly(C)-binding proteins. The backbone carbonyl of Ile49 is positioned to form a hydrogen bond with Cyt-3 N4, and Arg40, extending from the C-terminal end of α-helix 2, can hydrogen bond to the Cyt-3 O2 atom.

Unlike previous studies, this structure also reveals possible interactions underlying base specificity at positions 1 and 4. The N4 group of Cyt-1 forms a hydrogen bond with the side chain of Asp82. In addition, non-specific interactions are observed between Cyt-1 and the backbone atoms of Gly26, Ser27, Gly30 and Lys31, which form part of the oligonucleotide-binding cleft. Thy-4 is not extensively contacted by the KH domain; rather than forming many protein contacts, it stacks on Cyt-3. However, the guanidinium nitrogens of Arg40 are positioned to form hydrogen bonds with both the Thy-4 and Cyt-3 O2 groups. Such an interaction could favour either thymine or cytosine at position 4.
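Figure 3D distinguishes hydrogen bonds from van der Waals contacts, but the criteria used are not stated here. A common, purely geometric approach is to apply distance cutoffs between protein and base atoms. The sketch below assumes cutoffs of 3.5 Å for hydrogen bonds between N/O atoms and 4.0 Å for van der Waals contacts, and uses hypothetical coordinates; it ignores hydrogen-bond angles and donor/acceptor roles, which dedicated contact-analysis tools also take into account.

```python
import math
from typing import NamedTuple, Optional, Tuple

HBOND_MAX = 3.5  # Å, assumed N/O-to-N/O distance cutoff for a hydrogen bond
VDW_MAX = 4.0    # Å, assumed cutoff for a van der Waals contact
POLAR = {"N", "O"}


class Atom(NamedTuple):
    label: str                       # e.g. "Arg57 NH2" (hypothetical labelling)
    element: str                     # chemical element symbol
    xyz: Tuple[float, float, float]  # coordinates in Å


def classify(protein: Atom, base: Atom) -> Tuple[Optional[str], float]:
    """Return (contact type or None, distance) using distance-only criteria."""
    d = math.dist(protein.xyz, base.xyz)
    if d <= HBOND_MAX and protein.element in POLAR and base.element in POLAR:
        return "hydrogen bond", d
    if d <= VDW_MAX:
        return "van der Waals", d
    return None, d


# Hypothetical coordinates, not taken from PDB 1ZTG or 3VKE.
cyt2_o2 = Atom("Cyt-2 O2", "O", (0.0, 0.0, 0.0))
arg57_nh = Atom("Arg57 NH2", "N", (2.9, 0.0, 0.0))
ile29_cd1 = Atom("Ile29 CD1", "C", (0.0, 3.8, 0.0))
for atom in (arg57_nh, ile29_cd1):
    kind, dist = classify(atom, cyt2_o2)
    print(f"{atom.label} -> {cyt2_o2.label}: {kind} at {dist:.2f} Å")
```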
Cytosine is preferentially bound by αCP1–KH1 at all four base positions
In previous studies of αCP KH1 domains, the first position in the oligonucleotide-binding cleft was shown to accommodate A, T or C, with little to indicate a preferred base at this position (30,31). The fourth position has been shown to accommodate T or C, but not a purine. Yet systematic evolution of ligands by exponential enrichment (SELEX) studies originally performed to characterize the αCP target sequence clearly identified sequences containing C-triplets and C-tetrads (33).

To investigate the position-specific binding preferences of αCP1–KH1, we used SPR to monitor binding to a series of oligonucleotides in which cytosines were systematically replaced by adenine or thymine. Figure 4 shows sensorgrams of αCP1–KH1 binding to 10-mer DNAs containing different target tetrads, together with steady-state binding analyses where binding was quantifiable. A 5-nt A-rich spacer was included at the 5′-end of each oligonucleotide to distance the binding site from the matrix surface, and an additional adenine was added at the 3′-end to restrict binding to the target tetrad, since purines have not been observed to bind at the fourth position. By this reasoning, we assumed that the target tetrad (underlined in each of the sequences) would be bound in the same register at the αCP1–KH1 binding site.
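The register assumption can be made explicit by listing every four-nucleotide window of a probe that satisfies the positional rules stated above: A, T or C at position 1, cytosine at positions 2 and 3, and a pyrimidine (T or C) at position 4. The sketch below encodes these rules, taken from the prior structural observations summarized in this section rather than from any published algorithm, and scans each 10-mer probe.

```python
from typing import List, Tuple

# Bases observed or required at each of the four cleft positions (see text).
ALLOWED = ({"A", "T", "C"}, {"C"}, {"C"}, {"T", "C"})


def candidate_registers(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, tetrad) for every window matching ALLOWED."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3):
        tetrad = seq[i:i + 4]
        if all(base in allowed for base, allowed in zip(tetrad, ALLOWED)):
            hits.append((i + 1, tetrad))
    return hits


probes = ["AAAAAACCCA", "AAAAAATCCA", "AAAAAACTCA", "AAAAAACCTA", "AAAAACCCCA"]
for probe in probes:
    print(probe, candidate_registers(probe))
```

Under these rules the ATCC and ACTC probes admit no register, and the ACCC and ACCT probes admit exactly one, consistent with the assumption in the text. For the C-tetrad probe the rules admit two windows, ACCC and CCCC; the register occupied by a C-tetrad is addressed by the 5′-ACCCCA-3′ crystal structure reported in this study.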
Figure 4.
Binding analysis of αCP1–KH1 to a series of target DNA sequences using SPR. Sensorgrams of αCP1–KH1 binding to a series of biotinylated DNA sequences, designed to test the preferential binding of cytosine at each of the four nucleotide-binding positions. The sequences are shown above each set of sensorgrams with the binding tetrad underlined. Five adenines were used as a spacer between the biotin and the oligonucleotide binding site. Oligonucleotides were captured on SA-coated sensor chips, and sensorgrams were recorded at a range of protein concentrations. Binding curves derived from the approximated steady-state binding of the proteins were used to determine equilibrium dissociation constants (KDs). Errors are standard errors arising from the fits.
The first experiment was conducted with the sequence 5′-AAAAAACCCA-3′. Binding to this short, single-site DNA sequence showed a fast off-rate, compared with the slow off-rate observed in the previous experiments, and readily reached equilibrium. αCP1–KH1 bound this oligonucleotide with micromolar affinity (KD = 33 μM). The second experiment, in comparison with the first, shows that cytosine at position 2 is important for binding: the sequence 5′-AAAAAATCCA-3′ (in which Cyt-2 is replaced by thymine) showed no binding by αCP1–KH1. This experiment also confirmed the assumption that adenine cannot be accommodated at position 4 (if it could, nothing would prevent TCCA from binding with T, C and C at positions 1, 2 and 3). The absence of binding to the ATCC sequence therefore demonstrates the critical role of cytosine at position 2. Likewise, in the third experiment, the sequence 5′-AAAAAACTCA-3′ (in which Cyt-3 is replaced) showed no binding by αCP1–KH1; by the same logic, this confirms the critical role of cytosine at position 3. In contrast, the fourth experiment shows that cytosine is preferred, but not essential, at the fourth position.
The sequence 5′-AAAAAACCTA-3′ (in which Cyt-4 is replaced) showed some binding, though not enough to quantify. Binding was reduced relative to the oligonucleotide containing a C-triplet, showing that thymine is tolerated at position 4, although cytosine is preferred. Finally, the sequence 5′-AAAAACCCCA-3′, containing a C-tetrad, was bound by αCP1–KH1 with ∼10-fold higher affinity (KD = 3.5 μM) than the ACCC sequence. This enhanced binding suggests that cytosine is preferred over adenine at the first position. This series of experiments thus established that αCP1–KH1 has a binding preference for cytosine at positions 1 and 4, as well as at positions 2 and 3.
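The ∼10-fold difference between the C-tetrad (KD = 3.5 μM) and C-triplet (KD = 33 μM) probes can also be expressed as a difference in binding free energy, ΔΔG = RT ln(K_D,weak / K_D,strong). The sketch below performs this standard conversion; the temperature of 298.15 K is an assumption, since the SPR temperature is not given in this section, and absolute ΔG values use a 1 M standard state.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # gas constant in kcal/(mol K)


def dg_bind(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from K_D, 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fold_and_ddg(kd_weak: float, kd_strong: float, temp_k: float = 298.15):
    """Fold change in affinity and the corresponding ddG (kcal/mol)."""
    fold = kd_weak / kd_strong
    return fold, R_KCAL * temp_k * math.log(fold)


fold, ddg = fold_and_ddg(33e-6, 3.5e-6)  # ACCC probe vs CCCC probe
print(f"{fold:.1f}-fold, ddG = {ddg:.2f} kcal/mol")
print(f"dG(CCCC) = {dg_bind(3.5e-6):.2f} kcal/mol, dG(ACCC) = {dg_bind(33e-6):.2f} kcal/mol")
```

Under these assumptions the C-tetrad advantage corresponds to about 1.3 kcal/mol; the same conversion applied to the KH1 RNA and DNA constants (3.6 and 0.33 μM) gives about 1.4 kcal/mol.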
High-resolution structure of αCP1–KH1.W.C54S bound to a CCCC tetrad
In order to obtain higher-resolution structural data for the αCP1–KH1/DNA complex, we designed protein and DNA constructs specifically for co-crystallization. Because C54 tends to form inter-molecular disulphide bonds upon crystallization, we prepared a C54S αCP1–KH1 mutant, and we also added a tryptophan to the C-terminus to facilitate detection at 280 nm during purification. This construct, referred to as αCP1–KH1.W.C54S, formed high-quality crystals in complex with the C-tetrad DNA sequence 5′-ACCCCA-3′, which our binding studies had established as a high-affinity ligand. The crystals were grown as reported (36) and diffracted to 1.77 Å resolution. The structure was solved by molecular replacement using the coordinates of a monomer from the αCP1–KH1/11-mer DNA complex (PDB ID: 1ZTG) as the starting model, with the oligonucleotide removed from the coordinates prior to calculation. The refined model has a working R-factor of 16.35% and a free R-factor of 21.34% at 1.77 Å resolution, with excellent stereochemistry, including 99% of all amino acids in the favoured regions of the Ramachandran plot (100% in the allowed regions) (Table 1). As indicated in the coordinate file, the final model includes residues 14–86 of αCP1 together with additional residues at the N- and C-termini introduced by the cloning procedure. The coordinates have been deposited in the Protein Data Bank (PDB ID: 3VKE).

The αCP1–KH1.W.C54S/6-mer structure reveals the same organization as described above for the αCP1–KH1/11-mer structure (Cα pair-wise RMSD 0.395 Å), indicating that the W and C54S mutations do not affect the structure or oligonucleotide binding. The asymmetric unit contains four monomers organized as a pair of dimers, and each monomer binds a single copy of the 6-mer DNA. All six nucleotides were clearly resolved in the electron density map for two chains (Figure 5A); in the two remaining chains, only the central four nucleotides are present.
The absence of the terminal nucleotides can be attributed to a crystallization artefact, since those positions coincide with the crystallographic 2-fold screw axis. The oligonucleotide is arranged analogously to that in the αCP1–KH1/11-mer complex. In this case, though, in addition to partial stacking of Cyt-4 across Cyt-3, the terminal adenine bases are also stacked. Interestingly, an intra-molecular hydrogen bond is formed between Cyt-2 N4 and the phosphate preceding Cyt-1, as also seen for the αCP1–KH1/11-mer DNA. In addition, an equivalent hydrogen bond is formed between Cyt-4 N4 and the phosphate preceding Cyt-3. Such hydrogen bonds, which are unique to cytosine, may help to stabilize the bound oligonucleotide conformation.
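The pair-wise Cα RMSD values quoted here (0.395 Å, and 1.1 Å for the αCP2-KH1 comparison) are root mean square distances between matched Cα atoms after optimal superposition. The sketch below shows only the final RMSD step: it assumes the two coordinate sets are already superposed (for example by a Kabsch fit, which is not implemented here) and paired one-to-one, and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired, already-superposed atoms (same units as input)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


# Hypothetical Cα positions (Å) for three matched residues.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.3, 0.0), (3.8, -0.3, 0.0), (7.6, 0.0, 0.3)]
print(f"Cα RMSD = {rmsd(model_a, model_b):.3f} Å")
```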
Figure 5.
Schematic representations of the αCP1–KH1/6-mer DNA complex. (A) αCP1–KH1.W.C54S is shown in cartoon representation with the electron density (blue mesh) surrounding the bound oligonucleotide (stick representation), showing the well-resolved positioning of the oligonucleotide. (B) The surface of the αCP1–KH1.W.C54S structure is shown in grey, illustrating the binding cleft that accommodates the oligonucleotide (stick representation). (C) Summary of the contacts between αCP1–KH1.W.C54S and bound DNA of sequence 5′-ACCCCA-3′. Van der Waals contacts are coloured orange, and hydrogen bond interactions are coloured blue. Residues making important contacts with the oligonucleotide sugar–phosphate backbone are listed on the left, and residues contacting the pyrimidine rings, and thus underlying base specificity, are listed on the right. (D) Cartoon view of each of the four cytosines of the C-tetrad (stick representations) within the binding cleft of αCP1–KH1.W.C54S (cartoon representation and transparent surface). Key interacting residues are coloured orange and hydrogen bonds underlying specificity are shown in black.
Cytosine recognition at all four nucleotide binding positions
The four cytosines occupy the hydrophobic binding cleft in positions similar to those seen for the CCCT sequence in the αCP1–KH1/11-mer structure (Figure 5B). The αCP1–KH1.W.C54S/6-mer interactions are summarized in Figure 5C and D. The central two cytosines of the tetrad (Cyt-2 and Cyt-3) are positioned as previously described. Specific contacts to Cyt-2 are made through hydrogen bonds with the Arg57 guanidinium side chain and the Gly22 backbone oxygen. Specific contacts to Cyt-3 are made through hydrogen bonds with the Arg40 guanidinium side chain and the backbone carbonyl of Ile49.

In addition to these contacts underlying specificity at positions 2 and 3, the αCP1–KH1.W.C54S/6-mer structure revealed further contacts that underlie a preference for cytosine at position 1: as well as the hydrogen bond between the side chain of Asp82 and the Cyt-1 N4 group, a hydrogen bond is formed between the Lys31 backbone NH and the Cyt-1 O2 group. At position 4, a clear difference is seen from the αCP1–KH1/11-mer structure, in which thymine occupies this position. The side chain of Glu51, which was not oriented towards the oligonucleotide in the αCP1–KH1/11-mer structure, is clearly defined in the density of the αCP1–KH1.W.C54S/6-mer structure and forms a hydrogen bond to the Cyt-4 N4 group. Together, these favourable and specific interactions likely contribute to the higher affinity of a C-tetrad for the αCP1–KH1 binding site.

Interestingly, the 5′-Ade also contacts the αCP1–KH1 binding cleft in the αCP1–KH1.W.C54S/6-mer structure. It stacks on Cyt-1 and can also form hydrogen bonds with the Ser85 side-chain hydroxyl. Ser85 was not positioned to interact with the oligonucleotide in the αCP1–KH1/11-mer structure, suggesting that this additional interaction with adenine, which to our knowledge has not previously been detected for KH domains, may play a role in the binding of αCP1–KH1 to target oligonucleotides. In contrast, the 3′-Ade stacks on Cyt-4 and is held away from the protein surface, making no direct protein interactions.
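The base-specific hydrogen bonds described for the C-tetrad can be collected into a small lookup table, which makes one pattern easy to see: at each of the four positions, a protein hydrogen bond involves the cytosine exocyclic N4 amino group, which thymine lacks (thymine carries an O4 carbonyl at the equivalent position). The Python structure below only transcribes the contacts stated in the text; the dictionary layout and function name are illustrative choices.

```python
from typing import Dict, List, Tuple

# (residue, protein group, cytosine atom) for each cleft position, as described
# for the αCP1–KH1.W.C54S/6-mer complex.
BASE_HBONDS: Dict[int, List[Tuple[str, str, str]]] = {
    1: [("Asp82", "side chain", "N4"), ("Lys31", "backbone NH", "O2")],
    2: [("Arg57", "guanidinium", "O2"), ("Arg57", "guanidinium", "N3"),
        ("Gly22", "backbone carbonyl", "N4")],
    3: [("Arg40", "guanidinium", "O2"), ("Ile49", "backbone carbonyl", "N4")],
    4: [("Glu51", "side chain", "N4")],
}


def positions_contacting(atom: str) -> List[int]:
    """Cleft positions at which the given cytosine atom receives a protein H-bond."""
    return sorted(pos for pos, contacts in BASE_HBONDS.items()
                  if any(base_atom == atom for _, _, base_atom in contacts))


print("N4 contacted at positions:", positions_contacting("N4"))
print("O2 contacted at positions:", positions_contacting("O2"))
```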
DISCUSSION
PCBPs are triple KH domain proteins that exert an extraordinarily diverse array of functions through their ability to interact with C-rich single-stranded oligonucleotides. These include mRNA stabilization, translational repression, translational enhancement, transcriptional enhancement and possible involvement in telomere function. This functional diversity suggests that PCBPs do not confer a specific functional outcome, but may play a general scaffolding role, helping to form a variety of functional oligonucleotide structures depending upon the target C-rich sequence. Such a role could be achieved through the interactions of the individual PCBP KH domains with adjacent C-rich regions of the target oligonucleotide, and also through intra- and/or inter-molecular interactions between PCBPs and other RNA-binding proteins.

In the current study, we focus on the PCBP αCP1 (also known as PCBP1 and hnRNP E1). In previous work, we identified αCP1 as one of the proteins that target a specific 3′-UTR site in androgen receptor (AR) mRNA and play a role in regulating its translation (35). To better understand how αCP1 interacts with this mRNA target, we first undertook EMSA experiments with the 51-nt target (representing AR mRNA nucleotides 3275–3325). These showed not only an interaction, but also the formation of increasingly large multimeric structures as higher proportions of αCP1 were added. This suggests either that αCP1 can bind the target RNA at multiple sites at high concentrations (i.e. not only at the C-rich site) or that αCP1 can initiate the formation of multi-protein/RNA complexes. Because αCP1–KH1 alone was also able to shift the probe but did not cause increasingly large structures to form, the triple KH motif structure of full-length αCP1 appears to facilitate the formation of multi-protein/RNA complexes.
This most likely reflects the dimerizing ability of KH domains 1 and 2, which have been identified as forming homodimers but could also play a role in inter-molecular heterodimerization (30).

Through further EMSA studies focusing on the specific 11-nt C-rich site of AR mRNA and its equivalent DNA sequence, we confirmed that αCP1 can interact with both oligonucleotides. αCP1–KH1 alone could also shift these probes, though it did not bind with as high an affinity as the full-length protein. Thus, the interaction of full-length αCP1 with RNA and DNA is supported by multiple KH domain interactions. Furthermore, DNA appeared to be shifted more readily than RNA in these experiments, implying a higher-affinity interaction with DNA. This contrasts with a previous report of higher-affinity binding of αCP proteins to RNA than to DNA (32). In that report, however, binding to poly(C) RNA was compared with binding to a ssDNA corresponding to the V8 domain of human ribosomal RNA, a long GC-rich sequence that may not have offered an equivalent density of C-rich target sites. The present study, in contrast, used directly equivalent RNA and DNA sequences. The moderately higher binding affinity for DNA over RNA in vitro likely reflects the greater structural accessibility of the DNA for forming productive binding interactions, owing to the absence of the sugar 2′-OH that assists RNA secondary structure formation. Within the cell, however, the actual target of αCPs will be dictated by the localization of the protein and the accessibility of single-stranded C-rich regions of the oligonucleotides, rather than by the difference in affinity.

In order to define more quantitatively the roles of the three individual αCP1 KH domains in interacting with RNA and DNA, SPR studies were carried out with each of the individual KH domains.
The binding affinities of the individual KH domains were in the micromolar range, in contrast to the nanomolar affinity determined for full-length αCP1 binding to target AR mRNA (35). The KH domains thus appear to act synergistically, with the full-length protein binding orders of magnitude more tightly than the individual domains. A comparison of the KH domain binding interactions revealed that αCP1–KH1 and KH3 contribute most to the binding affinity of αCP1 for the C-rich oligonucleotide, with the slow dissociation phase observed for KH1 indicating the most stable interaction. Interestingly, whereas interactions between KH2 and RNA could not be detected, KH2 binding to DNA was observed. Furthermore, all three αCP1 KH domains again showed higher affinities for DNA than for RNA. This was not a reflection of different protein preparations, as the experiments were conducted simultaneously (with RNA and DNA on separate flow cells of the same chip). Overall, KH1 and KH3 are likely to make the primary interactions with target oligonucleotide, with KH1 conferring the greatest stability. Although RNA binding by KH2 was not observed, in the context of full-length αCP1, where KH2 is covalently tethered close to the RNA through KH1 and KH3, it would have a greatly enhanced chance of interaction.

The structure of αCP1–KH1 bound to an 11-mer with two immediately adjacent C-rich sites provides further insight into the topological arrangement between poly(C)-binding proteins and their target RNA or DNA. The αCP1–KH1 domains bound at adjacent DNA sites made no contacts with one another. That αCP2 KH domains can bind in close proximity without forming contacts has previously been shown crystallographically at C-rich sites separated by 2 nt (30). In contrast, in the structure of hnRNP K KH3 bound at immediately adjacent sites, the DNA is tightly sandwiched between the two KH domains (26).
Their close proximity potentially allows interactions between the GXXG region of one KH domain and the variable loop region of the next. The current structure shows, however, that KH domains can bind head-to-tail at immediately adjacent oligonucleotide sites without contacting one another. This may reflect the way in which the covalently linked KH domains within full-length αCP1 interact with target oligonucleotide.

Furthermore, the current structure demonstrates that αCP1–KH1 can bind two immediately adjacent C-rich sites while simultaneously forming dimers. The crystal lattice of the αCP1–KH1/11-mer DNA complex contains a network of protein/DNA complexes interconnected through homodimeric KH domain interactions. Both homo- and heterodimer formation have previously been observed for KH domains (26,27,30,31) and, especially in the context of the higher-affinity, multivalent full-length αCP1, this type of network could explain the increasingly high molecular weight species observed as more αCP1 was added to the target oligonucleotide.

In the current study, we were particularly interested in understanding the basis for αCP1–KH1 specificity for C-rich oligonucleotides. Previous studies of related KH domains have established that the oligonucleotide-binding cleft accommodates four nucleotides and that the central two positions confer cytosine specificity (26–31). Those structural studies, however, showed A, T or C bound at the first position, and C or T, but not a purine, bound at the fourth position. In our own αCP1–KH1/11-mer structure, CCCT was bound in the oligonucleotide-binding site in the same way as reported for αCP2–KH1 bound to a telomeric sequence containing CCCT (31).
A subsequent study by the same group, however, showed that this telomeric sequence could also be bound by αCP2–KH1 in a different register, with ACCC placed in the oligonucleotide-binding site (30). This raises the question of whether αCP KH1 domains in fact prefer specific bases at the first and fourth positions.

We therefore used SPR to measure the affinity of αCP1–KH1 for a series of oligonucleotides in which cytosines within a target tetrad were systematically replaced by thymine or adenine. These experiments confirmed that cytosine at the central positions (positions 2 and 3) is essential for binding by αCP1–KH1. They also showed that cytosine is preferred at the fourth position, though thymine is tolerated, and, interestingly, that a cytosine tetrad binds with the highest affinity of all. Thus, we demonstrate for the first time that the αCP1–KH1 domain preferentially binds cytosine at all four positions of its oligonucleotide-binding cleft.

The structural basis for this preference is revealed by the high-resolution structure of an αCP1–KH1 mutant (αCP1–KH1.W.C54S) in complex with a cytosine tetrad DNA (5′-ACCCCA-3′). At position 1, we observed the same protein–oligonucleotide interactions as reported for αCP2-KH1 bound to a pair of tandem ‘CCCT’ motifs (30). In the current structure, however, we also observed an interaction between Cyt-1 and the Asp82 side chain, which could not be seen for αCP2–KH1 because that construct stopped short of residue 82. We also observed an interaction between Cyt-1 O2 and the backbone NH of Lys31 that was present, but not reported, for αCP2-KH1. Both of these interactions would favour the binding of cytosine at position 1. A further feature of this structure is the positioning of the nucleotide that precedes the bound tetrad.
A nucleotide at this position has not previously been reported to form significant interactions with a KH domain, but owing to the few extra residues at the C-terminal end of our αCP1–KH1.W.C54S construct, we were able to observe hydrogen bonds to the adenine base at this position. Together, these interactions may contribute to the higher affinity observed for the C-tetrad and reveal the molecular basis for the preferential binding of cytosine at the first nucleotide-binding position.

With respect to the fourth position, we did not observe any base-specific interactions with Thy-4 in the αCP1–KH1/11-mer structure. A hydrogen bond from the Arg40 guanidinium group to Thy-4 O2 was observed, but this could equally well support binding of cytosine at this position. The only base-specific interaction previously reported at the fourth position is a hydrogen bond between the Glu51 side chain and the cytosine N4 group (31). This interaction was also observed in the current αCP1–KH1.W.C54S/6-mer structure and clearly supports specific binding of cytosine. A further feature of cytosine at this position, however, is the intra-molecular hydrogen bond formed between the Cyt-4 N4 group and the phosphate linking Cyt-3 and Cyt-4. This interaction, which thymine cannot form, may contribute to the overall preference for cytosine through stabilization of the bound oligonucleotide conformation.

Thus, together these data demonstrate that αCP1–KH1 preferentially binds a C-rich tetrad and reveal the molecular basis for this specificity. In the context of the full-length protein binding to target oligonucleotide, KH1, by virtue of having the highest binding affinity, may form the initial interaction. We speculate that this would enable binding by KH2, which would then be held in close proximity to an adjacent C-rich site on the RNA. KH3, which can also bind RNA independently, may bind at a nearby C-rich site.
Given the capacity of αCP1–KH1 to form homodimers, or heterodimers with KH2, interactions with other RNA-bound αCPs may underlie the formation of higher-order multi-protein/RNA complexes that direct the outcome for the RNA.
ACCESSION NUMBERS
1ZTG, 3VKE.
FUNDING
Australian Research Council Project Grant funding (awarded to M.C.J.W. and J.A.W.); National Health and Medical Research Council Senior Research Fellowship (awarded to M.C.J.W.). Funding for the open access charge: Australian Research Council.

Conflict of interest statement. None declared.