Ronnie O Frederick, Lai Bergeman, Paul G Blommel, Lucas J Bailey, Jason G McCoy, Jikui Song, Louise Meske, Craig A Bingman, Megan Riters, Nicholas A Dillon, John Kunert, Jung Whan Yoon, Ahyoung Lim, Michael Cassidy, Jason Bunge, David J Aceti, John G Primm, John L Markley, George N Phillips, Brian G Fox.
Abstract
A simple approach that allows cost-effective automated purification of recombinant proteins in levels sufficient for functional characterization or structural studies is described. Studies with four human stem cell proteins, an engineered version of green fluorescent protein, and other proteins are included. The method combines an expression vector (pVP62K) that provides in vivo cleavage of an initial fusion protein, a factorial designed auto-induction medium that improves the performance of small-scale production, and rapid, automated metal affinity purification of His8-tagged proteins. For initial small-scale production screening, single colony transformants were grown overnight in 0.4 ml of auto-induction medium, produced proteins were purified using the Promega Maxwell 16, and purification results were analyzed by Caliper LC90 capillary electrophoresis. The yield of purified [U-¹⁵N]-His8-Tcl-1 was 7.5 μg/ml of culture medium, of purified [U-¹⁵N]-His8-GFP was 68 μg/ml, and of purified selenomethionine-labeled AIA-GFP (His8 removed by treatment with TEV protease) was 172 μg/ml. The yield information obtained from a successful automated purification from 0.4 ml was used to inform the decision to scale up for a second meso-scale (10-50 ml) cell growth and automated purification. ¹H-¹⁵N NMR HSQC spectra of His8-Tcl-1 and of His8-GFP prepared from 50 ml cultures showed excellent chemical shift dispersion, consistent with well folded states in solution suitable for structure determination. Moreover, AIA-GFP obtained by proteolytic removal of the His8 tag was subjected to crystallization screening, and yielded crystals under several conditions. Single crystals were subsequently produced and optimized by the hanging drop method. The structure was solved by molecular replacement at a resolution of 1.7 Å.
This approach provides an efficient way to carry out several key target screening steps that are essential for successful operation of proteomics pipelines with eukaryotic proteins: examination of total expression, determination of proteolysis of fusion tags, quantification of the yield of purified protein, and suitability for structure determination.
Year: 2007 PMID: 17985212 PMCID: PMC2668602 DOI: 10.1007/s10969-007-9032-5
Source DB: PubMed Journal: J Struct Funct Genomics ISSN: 1345-711X
Fig. 1 Expression vector pVP62K. (a) Linear map showing key features of the vector and location of the Bar-CAT toxic cassette and 3′ homology region (3′-hmr) for Flexi Vector cloning. (b) Nucleotide and encoded protein sequence in the linker region near the SgfI cloning site. The TVMV protease site is ETVRFQS, where proteolysis occurs between the Q and S residues. The fusion protein may be cleaved in the expression host due to the presence of a low level of TVMV protease produced by constitutive expression from pVP62K. The TEV protease site is ENLYFQA, where proteolysis occurs between the Q and A residues. After purification of the His8-tagged protein, the His8 tag can be removed by treatment with TEV protease to release an N-terminal AIA-target
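The two-step proteolysis described in this caption can be illustrated with a minimal Python sketch. The recognition sites and cut positions come from the caption. The `FUSION` sequence is a hypothetical placeholder, not the actual pVP62K linker, and the helper name `cleave` is introduced here for illustration only.

```python
# Recognition sites from the Fig. 1 caption. The offset is the number of
# site residues that remain on the N-terminal fragment after cleavage.
PROTEASE_SITES = {
    "TVMV": ("ETVRFQS", 6),  # cleaves between Q and S
    "TEV": ("ENLYFQA", 6),   # cleaves between Q and A
}


def cleave(sequence: str, protease: str) -> list[str]:
    """Split a one-letter protein sequence at every site of the given protease."""
    site, offset = PROTEASE_SITES[protease]
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, cut)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical fusion (assumption): placeholder N-terminal partner, TVMV site,
# His8 tag, TEV site, then "IA" plus a placeholder target sequence.
FUSION = "MKIEEG" + "ETVRFQS" + "HHHHHHHH" + "ENLYFQA" + "IA" + "MSKGEEL"

in_vivo = cleave(FUSION, "TVMV")      # cleavage in the expression host
tagged = in_vivo[-1]                  # His8-tagged protein that is purified
released = cleave(tagged, "TEV")[-1]  # target after His8 removal

print(in_vivo)
print(released)
```

In this toy sequence the fragment left after TEV treatment begins with AIA, matching the N-terminal AIA-target described in the caption.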
Alternating coding and non-coding strand PCR primers used for construction of an optimized GFP gene
| Primer | Length | Nucleotide sequenceᵃ |
|---|---|---|
| 1 | 90 | CCCCCGGGGGCC |
| 2 | 95 | |
| 3 | 94 | |
| 4 | 95 | |
| 5 | 94 | |
| 6 | 94 | |
| 7 | 92 | |
| 8 | 94 | |
| 9 | 94 | |
| 10 | 90 | |
ᵃ The underlined DNA sequences are the overlapping complementary annealing segments used to assemble the synthetic codon-optimized GFP gene. Primers numbered 1 (containing an NdeI site in bold) and 10 (with a BamHI site in bold) are the flanking oligonucleotides at the beginning and end of the gene, respectively
Structural genomics target proteins investigated
| Laneᵃ | Proteinᵇ | Mol. wt. (Da) | pI | Original pipelineᶜ score | Original pipelineᶜ decision | Purification screenᵈ (μg/ml) | Purification screenᵈ decision | Better predictionᵉ |
|---|---|---|---|---|---|---|---|---|
| A1 | | 51,892.0 | 6.9 | MMH | Purify | 883 | Purify | |
| A2 | | 54,803.5 | 9.5 | HHM | Purify | 0 | Work stopped | Yes |
| A3 | | 56,755.3 | 8.6 | MMM | Purify | 273 | Purify | |
| A4 | | 57,037.5 | 10.1 | HMW | Work stopped | 93 | Work stopped | |
| A5 | | 69,328.8 | 4.1 | MMM | Purify | 286 | Purify | |
| A6 | | 70,033.3 | 9.4 | MMM | Purify | 36 | Work stopped | Yes |
| A7 | | 74,013.6 | 6.6 | WWW | Work stopped | 82 | Work stopped | |
| A8 | | 78,593.1 | 7.7 | WWW | Work stopped | 370 | Purify | Noᶠ |
| A9 | | 79,989.0 | 4.5 | MMW | Work stopped | 491 | Purify | Noᶠ |
| A10 | | 79,337.4 | 9.7 | HHM | Purify | 45 | Work stopped | Yes |
| A11 | | 79,916.3 | 6.0 | HMM | Purify | 283 | Purify | |
| A12 | | 84,575.6 | 4.8 | HMM | Purify | 394 | Purify | |
| B1 | | 84,058.9 | 9.1 | MMM | Purify | 10 | Work stopped | Yes |
| B2 | | 84,420.4 | 9.4 | MMW | Work stopped | 11 | Work stopped | |
| B3 | His8-MBP | 44,000.0 | 5.5 | nd | Purify | 361 | Purify | |
| B4 | AIA–GFP | 29,225.0 | 5.59 | HHH | Purify | 859 | Purify | |
ᵃ Lane corresponding to the electropherogram image of Fig. 2. Lanes A1–B2 are His8-MBP-target fusion proteins expressed from pVP56 that cannot undergo in vivo cleavage. Lane B4 is His8-GFP expressed from pVP62 that can undergo in vivo cleavage
ᵇ Organism and mammalian gene collection identification number or control protein described in the text
ᶜ The three-letter score represents an assessment of protein production (first letter), solubility (second letter), and cleavage by TEV protease (third letter) with H indicating high expression, high solubility or high efficiency of proteolysis, M indicating intermediate behavior for these properties, and W indicating weak, unsuitable behavior. In our protocol “W” for any category is cause for “work stopped” on the target, while “H” or “M” scores advance a target to large-scale purification. Use of this quantification system has been published elsewhere [29]. In the original pipeline scoring, 65% of the targets gave the exact same result, and the remaining 35% had variations, primarily in the scores for solubility and proteolysis. The use of the in vivo cleavage vector pVP62 and purification screening (Figs. 3 and 4) addresses the issue of proteolysis as a part of the screening process by emphasizing the recovery of a purified protein rather than evaluation of intermediate steps
ᵈ The μg/ml reported for fusion proteins after automated purification using the Maxwell 16 was from the Caliper LC90 analysis. A yield of purified fusion protein greater than 100 μg/ml was used as the comparator for efficacy of the purification screening of His8-MBP fusion proteins
ᵉ Assessment of whether the Maxwell 16 purification would lead to better predictive behavior in small-scale screening. Entries marked “Yes” correspond to targets that would not advance to purification scale-up because they failed the automated Maxwell 16 purification
ᶠ Targets marked “No” would have been advanced to purification scale-up, but without an assessment of TEV proteolysis. Treatment of the Maxwell-purified fusion protein with TEV protease gave weak cleavage, as observed in the original pipeline screen. This weak cleavage behavior was reproduced in three additional samples produced in 2 l growths. The use of the in vivo cleavage vector pVP62 (Figs. 3 and 4) addresses the issue of protease cleavage as part of the screening process
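The two decision rules stated in these footnotes (a “W” in any scoring category stops work; a purified yield greater than 100 μg/ml passes the purification screen) can be written as a short Python sketch. The function names are introduced here for illustration, and the sample rows are copied from the table above. This is one reading of the footnotes, not the authors' software.

```python
PURIFICATION_CUTOFF_UG_PER_ML = 100.0  # comparator stated in the yield footnote


def pipeline_decision(score: str) -> str:
    """Original pipeline rule: a 'W' in any of the three categories stops work."""
    score = score.upper()
    if len(score) != 3 or any(letter not in "HMW" for letter in score):
        raise ValueError(f"expected a three-letter H/M/W score, got {score!r}")
    return "Work stopped" if "W" in score else "Purify"


def screen_decision(yield_ug_per_ml: float) -> str:
    """Purification screen rule: advance only if the yield exceeds the cutoff."""
    if yield_ug_per_ml > PURIFICATION_CUTOFF_UG_PER_ML:
        return "Purify"
    return "Work stopped"


# (lane, three-letter score, yield in ug/ml) for a few rows of the table.
ROWS = [("A1", "MMH", 883), ("A2", "HHM", 0), ("A4", "HMW", 93), ("A8", "WWW", 370)]

for lane, score, yield_ug in ROWS:
    original = pipeline_decision(score)
    screened = screen_decision(yield_ug)
    note = "" if original == screened else "  <- decisions differ"
    print(f"{lane}: pipeline={original}, screen={screened}{note}")
```

For these four lanes the sketch reproduces the decisions listed in the table, including the two lanes (A2 and A8) where the original pipeline and the purification screen disagree.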
Fig. 2 Conditional methionine auxotrophy in E. coli B834. (a) Genome organization near the metE gene in E. coli K12 [51]. (b) Genome organization near the metE gene in E. coli B834. In this organism, DNA sequencing revealed a large insert in the metE gene, which caused the protein to be truncated to a non-functional peptide of 56 amino acids (aa)
Fig. 3 Caliper LC90 analysis of His8-tagged proteins purified by Maxwell 16. Lanes LA and LB are molecular weight markers. Lanes A1–B2 are structural genomics target proteins (protein bands marked with ovals) with molecular weight ∼50–75 kDa. They were expressed in factorial evolved auto-induction medium containing selenomethionine [33] as an N-terminal fusion with MBP from pVP56K, a vector that does not give in vivo proteolysis of the fusion protein. Lane B3 contains His8-MBP (protein band marked with oval), while lane B4 (1.1 mg/ml) contains His8-GFP expressed from pVP62 after in vivo cleavage from MBP. Lanes with a purified expressed fusion protein with yield greater than 100 μg/ml are marked with a star (also see Table 2)
Fig. 4 Small-scale purification screening of human embryonic stem cell proteins. Human stem cell proteins were expressed in E. coli B834 by auto-induction, liberated by in vivo proteolysis, and purified by the Maxwell 16 purification system. Table 2 provides further information on these proteins. Lane 1, molecular weight markers. Lanes 2 and 3, total cell lysate and eluted sample from purification of CCNF. No purified protein was detected. Lanes 4 and 5, C10orf96 was obtained in detectable amounts, but not sufficient for scale-up, along with two higher molecular weight contaminants. Lanes 6 and 7, His8-Tcl-1 was expressed, proteolyzed, and successfully purified. Lanes 8 and 9, NPM2 was expressed and proteolyzed, but only a small amount of protein was purified. In addition, the purified protein appeared to be partially degraded. Lanes 10 and 11, His8-GFP
Human embryonic stem cell proteins and others characterized by in vivo cleavage and purification screening
| Proteinᵃ | Annotation | Database ID | Mol. wt. (Da) | Yield (μg/ml)ᵇ |
|---|---|---|---|---|
| [U-¹⁵N]-His8-Tcl-1 | T-cell leukemia/lymphoma | MGC:20335, 2260, 2170 | 13,459.6 | 7.5 |
| CCNF | Cyclin F | MGC:20163 | 87,639.8 | <1 |
| C10orf96 | Chromosome 10 open reading frame 96 | MGC:35062 | 31,035.3 | Not detected |
| NPM2 | Nucleophosmin/nucleoplasmin 2 | MGC:78655 | 24,152.0 | <1 |
| [U-¹⁵N]-His8-GFP | Control protein, synthetic gene | | 26,842.4 | 68 |
| SeMet-AIA–GFP | Control protein, synthetic gene | | 29,226.0 | 172 |
ᵃ cDNA for the human proteins provided by Prof. James Thomson
ᵇ From Caliper LC90 analysis of the protein obtained from Maxwell 16 purification
Fig. 5 Replicate Maxwell 16 purification of human embryonic stem T-cell lymphoma-1 protein. Lane 1, molecular weight markers. Lanes 2–12, replicate purifications of His8-Tcl-1. Lane 13, His8-MBP-At2g34690.1, an Arabidopsis thaliana protein expression control
Fig. 6 ¹H–¹⁵N HSQC NMR spectra of Maxwell-purified proteins Tcl-1 and GFP. (a) 750 MHz spectrum of His8-Tcl-1 obtained at 35°C (1.75 mg in 250 μl of 10 mM KHPO4, pH 7, containing 50 mM KCl). The total NMR time required to obtain this spectrum was 9.5 h. (b) 600 MHz spectrum of His8-GFP obtained at 35°C (5.6 mg in 250 μl of 10 mM KHPO4, pH 7, containing 50 mM KCl). The total NMR time required to obtain this spectrum was 1 h
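The protein amounts in this caption can be converted into approximate molar concentrations of the NMR samples. The sketch below takes 13,459.6 Da and 26,842.4 Da from the table of stem cell proteins above. That these are the right molecular weights for the samples in the NMR tube is an assumption, as is the helper name `molar_concentration_mM`.

```python
def molar_concentration_mM(mass_mg: float, volume_ul: float, mol_wt_da: float) -> float:
    """Convert a protein mass dissolved in a given volume to millimolar."""
    grams_per_litre = mass_mg / (volume_ul / 1000.0)  # mg/ml is numerically g/l
    return grams_per_litre / mol_wt_da * 1000.0       # mol/l -> mmol/l


# Masses and volumes from the caption; molecular weights assumed from the table.
tcl1_mM = molar_concentration_mM(mass_mg=1.75, volume_ul=250.0, mol_wt_da=13459.6)
gfp_mM = molar_concentration_mM(mass_mg=5.6, volume_ul=250.0, mol_wt_da=26842.4)

print(f"His8-Tcl-1: {tcl1_mM:.2f} mM")
print(f"His8-GFP:   {gfp_mM:.2f} mM")
```

Under these assumptions the Tcl-1 sample is roughly 0.5 mM and the GFP sample roughly 0.8 mM.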
Summary of crystallization conditions observed for AIA–GFP
| Conditionᵃ | Precipitant (w/v) | Buffer | Salt |
|---|---|---|---|
| 1 | 24% MEPEG 5K | 0.1 M MES, pH 6.0 | 160 mM CaCl2 |
| 2 | 16% MEPEG 5K | 0.1 M BTP, pH 7.0 | 200 mM glycine |
| 3 | 28% MEPEG 2K | 0.1 M HEPES, pH 7.5 | 100 mM CaCl2 |
| 4 | 20% PEG 4K | 0.1 M HEPPS, pH 8.5 | 80 mM CaCl2 |
| 5 | 60% MPD | 0.1 M MES, pH 6.0 | None |
ᵃ Conditions present in the UW192 crystallization screen used at UW Center for Eukaryotic Structural Genomics
Summary of data collection, crystal structure, and refinement statistics for AIA–GFP
| Data collectionᵃ | |
|---|---|
| Space group | P2₁2₁2₁ |
| Cell dimensions | |
| a, b, c (Å) | 51.46, 61.99, 70.02 |
| α, β, γ (°) | 90, 90, 90 |
| Resolution (Å)ᵇ | 46.42–1.70 (1.74–1.70) |
| No. reflections | 23752 |
| Rmerge (%)ᶜ | 15.7 (36.6) |
| (I/σI)ᵈ | 28.27 (2.93) |
| Completeness (%) | 98.98 (97.32) |
| Redundancy | 29.45 (5.42) |
| Refinementᵉ,ᶠ | |
| Resolution (Å) | 46.42–1.70 (1.74–1.70) |
| No. reflections | 23752 (1276) |
| Rwork/Rfree | 0.168/0.220 (0.246/0.323) |
| No. atoms | |
| Protein | 1915ᵍ |
| Water | 377 |
| Mean B-value (overall) | 12.34 |
| Ramachandran analysis | |
| Most favored regions | 92.4 |
| Additional allowed regions | 7.6 |
| RMS deviations | |
| Bond lengths (Å) | 0.012 |
| Bond angles (°) | 1.548 |
ᵃ Data collected at University of Wisconsin Center for Eukaryotic Structural Genomics. Coordinates and structure factors were deposited in the Protein Data Bank with accession number 2qu1
ᵇ Numbers in parentheses indicate the highest resolution shell of 20
ᶜ Rmerge = Σ|I − ⟨I⟩| / ΣI, where I is the observed intensity and ⟨I⟩ is the average intensity obtained from multiple measurements
ᵈ The root-mean-squared value of the intensity measurements divided by their estimated standard deviation
ᵉ Rwork = Σ||F0| − |Fc|| / Σ|F0|, where |F0| is the observed structure factor amplitude and |Fc| is the calculated structure factor amplitude
ᶠ Rfree = R-factor based on 5.1% of the data excluded from refinement
ᵍ Number of non-hydrogen protein atoms included in refinement
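The Rmerge and Rwork definitions in the footnotes translate directly into code. The following Python sketch applies them to small made-up numbers. The function names and the toy data are illustrative assumptions and have no connection to the deposited 2qu1 data. Rfree uses the same formula as Rwork, evaluated only on the reflections excluded from refinement.

```python
def r_merge(observations: dict[tuple[int, int, int], list[float]]) -> float:
    """Sum |I - <I>| over all measurements divided by the sum of all I.

    Keys are reflection indices (h, k, l); values are the repeated intensity
    measurements of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """Sum ||F_obs| - |F_calc|| divided by the sum of |F_obs|."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


# Toy data for illustration only.
toy_intensities = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}
toy_r_merge = r_merge(toy_intensities)
toy_r_work = r_factor([10.0, 20.0, 30.0], [9.0, 22.0, 30.0])

print(f"Rmerge = {toy_r_merge:.3f}, Rwork = {toy_r_work:.3f}")
```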
Fig. 7 X-ray structure of AIA–GFP. The chromophore is shown as green cylinders representing bonded atoms
Fig. 8 Schematic of a purification screening protocol. Steps from obtaining a sequence-verified target in auto-cleavage vector pVP62K to identification of purified proteins. The transformed expression host is grown in auto-induction medium. Cells from production trials are loaded into the Maxwell 16 instrument for automated purification, and purified proteins are detected by Caliper LC90 capillary electrophoresis. Successful purification of a protein from auto-cleavage expression with yield exceeding 50 μg/ml of culture medium indicates feasibility of scale-up efforts
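The final step of this protocol, the scale-up feasibility check, can be sketched in a few lines of Python. The 50 μg/ml cutoff is the value given in the caption. The linear projection of recoverable protein from culture volume is a simplifying assumption made here, and actual yields at larger scale may differ. Both function names are hypothetical.

```python
SCALE_UP_CUTOFF_UG_PER_ML = 50.0  # feasibility threshold from the caption


def feasible_for_scale_up(yield_ug_per_ml: float) -> bool:
    """True when the small-scale yield exceeds the feasibility threshold."""
    return yield_ug_per_ml > SCALE_UP_CUTOFF_UG_PER_ML


def projected_mass_mg(yield_ug_per_ml: float, culture_ml: float) -> float:
    """Linear estimate (assumption) of purified protein from a larger culture."""
    return yield_ug_per_ml * culture_ml / 1000.0


# Illustrative inputs: the SeMet-AIA-GFP yield from the abstract and a
# hypothetical low-yield target.
for name, yield_ug in [("SeMet-AIA-GFP", 172.0), ("hypothetical target", 30.0)]:
    if feasible_for_scale_up(yield_ug):
        mass = projected_mass_mg(yield_ug, culture_ml=50.0)
        print(f"{name}: scale up, about {mass:.1f} mg projected from 50 ml")
    else:
        print(f"{name}: below the cutoff, scale-up not indicated")
```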