Abstract
With continuous technical improvements at synchrotron facilities, data-collection rates have increased dramatically. This makes it possible to collect diffraction data for hundreds of protein-ligand complexes within a day, provided that a suitable crystal system is at hand. However, developing a suitable crystal system can prove challenging, exceeding the timescale of data collection by several orders of magnitude. Firstly, a useful crystallization construct of the protein of interest needs to be chosen and its expression and purification optimized, before screening for suitable crystallization and soaking conditions can start. This article reviews recent publications analysing large data sets of crystallization trials, with the aim of identifying factors that do or do not make a good crystallization construct, and gives guidance in the design of an expression construct. It provides an overview of common protein-expression systems, addresses how ligand binding can be both help and hindrance for protein purification, and describes ligand co-crystallization and soaking, with an emphasis on troubleshooting.
Keywords: construct design; protein crystallization; protein–ligand complexes; soaking
Year: 2017 PMID: 28177304 PMCID: PMC5297911 DOI: 10.1107/S2059798316020271
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
Figure 1. Experimental cycle for protein–ligand complex crystal structure generation. For targets with limited structural information, the cycle starts with selecting suitable start and end points for the protein (subdomain) of interest. The resulting protein fragments can be combined with different expression vectors, adding affinity tags for purification. Not all of these constructs will express equally well, and usually only the subset with sufficient expression levels will be taken forward into purification. Extensive optimization of expression and purification conditions should be weighed against the design of more constructs and the use of different solubilizing and affinity tags. If previous structures of the protein (fragment) are available, the cycle is typically entered at the crystallization or soaking stage. The success of co-crystallization and soaking is ligand-dependent, even for ligands of similar binding affinity. If available, testing batches of 3–5 similar compounds in parallel is recommended. After a co-crystal structure has been determined, the ligand complex should be carefully checked and, if necessary, the cycle re-entered.
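The first step of the cycle, combining candidate start and end points with different tags, can be sketched as a simple enumeration. This is a minimal illustration only: the sequence, the boundary positions and the tag names (`His6`, `GST`) are hypothetical placeholders and are not taken from the article.

```python
from itertools import product


def enumerate_constructs(sequence, starts, ends, tags):
    """Return one record per (start, end, tag) combination.

    Residue numbers are 1-based and inclusive.
    """
    constructs = []
    for start, end, tag in product(starts, ends, tags):
        if not (1 <= start < end <= len(sequence)):
            continue  # skip boundaries that do not fit the sequence
        constructs.append({
            "name": f"{tag}_{start}-{end}",
            "tag": tag,
            "start": start,
            "end": end,
            "fragment": sequence[start - 1:end],
        })
    return constructs


# Hypothetical 60-residue sequence and boundary choices, for illustration only.
seq = "ACDEFGHIKLMNPQRSTVWY" * 3
panel = enumerate_constructs(
    seq, starts=[1, 5, 10], ends=[50, 55, 60], tags=["His6", "GST"]
)
print(len(panel))        # 3 starts x 3 ends x 2 tags = 18 constructs
print(panel[0]["name"])  # His6_1-50
```

Even a small panel grows multiplicatively, which is one reason to carry only the well-expressing subset forward into purification.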
Experimental considerations for different entry points to the protein–ligand complex structure-determination cycle illustrated in Fig. 1
| Starting point/assessment result | Checks | Experimental considerations |
|---|---|---|
| | | |
| Good-quality data, structure explains SAR | Do new compounds differ from previous (affinity, MW, solubility)? | Use same construct. If soaking fails, try optimizing soaking time and ligand concentration. If not successful, try co-crystallization. |
| Poor occupancy of ligands | Check affinity/solubility of ligands | Increase ligand concentration and/or soaking time |
| Binding mode cannot explain SAR | Check structure: packing issues around ligand-binding site? | Try generating a new crystal form: (i) change crystallization conditions, (ii) co-crystallize ligand, (iii) change construct |
| | Check construct used: mutations/modifications that might impair ligand binding? | Test alternative constructs |
| | | |
| Good-quality data, structure explains SAR | Do new compounds differ from previous (affinity, MW, solubility)? | Use same construct. Co-crystallization might be ligand-dependent and a wider crystallization screen might be necessary. |
| Poor occupancy of ligands | Check affinity/solubility of previous ligands | Pre-incubate protein at higher compound excess. Reduce protein concentration during incubation for compounds with low solubility. Co-crystallize (with lower affinity compound) and back-soak. |
| Binding mode cannot explain SAR | Close crystal-packing contacts near ligand-binding site? | Try generating a new crystal form by changing crystallization conditions |
| | Mutations/modifications that might impair ligand binding? | Test alternative constructs |
| | | |
| | Full-length structure or isolated domain of target protein needed? | Consider preparing full-length protein alongside domain fragments as it can be used as a reference in biochemical assays, and its limited proteolytic digest can assist in the choice of suitable boundaries for the protein fragments |
| | Homologue structure available? Do proteins align well at termini of homologue construct? Secondary-structure prediction: low complexity/secondary-structure elements at domain boundary? | Choose boundaries similar to homologue in case of good alignment, but avoid cutting into predicted α-helices and β-strands or including long stretches of low structural complexity. Follow guidelines given in §3.1 |
| | PTM to be considered? | Include mutations that mimic/prevent PTM |
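The advice on poor ligand occupancy (raise the ligand concentration, or lower the protein concentration for poorly soluble compounds) can be made concrete with a rough calculation. The sketch below assumes a simple 1:1 binding model in which free ligand is approximately equal to total ligand. The function names and the numbers are illustrative assumptions, and a solution-phase Kd may not carry over to the crystalline state.

```python
def expected_occupancy(free_ligand_uM, kd_uM):
    """Fractional occupancy of a single binding site: [L] / ([L] + Kd)."""
    if free_ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return free_ligand_uM / (free_ligand_uM + kd_uM)


def max_protein_conc_uM(ligand_solubility_uM, molar_excess):
    """Highest protein concentration that still allows the requested molar
    excess of ligand without exceeding the ligand's solubility limit."""
    if ligand_solubility_uM <= 0 or molar_excess <= 0:
        raise ValueError("solubility and molar excess must be positive")
    return ligand_solubility_uM / molar_excess


# Illustrative numbers only (all concentrations in micromolar).
print(expected_occupancy(100.0, 100.0))  # 0.5: ligand at Kd gives half occupancy
print(expected_occupancy(900.0, 100.0))  # 0.9: ninefold over Kd
print(max_protein_conc_uM(500.0, 5.0))   # 100.0: 5x excess of a 500 uM-soluble ligand
```

Under these assumptions, a compound that dissolves to only 500 µM cannot be supplied at fivefold excess unless the protein is diluted to 100 µM or below during pre-incubation.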
Figure 2. Cartoon representation of the construct-design process. (a) For multi-domain proteins, the domain of interest is identified using experimental and/or alignment data. (b) The domain architecture of the isolated domain is then inspected for suitable start and end points (bright green), avoiding cutting through secondary-structure elements (magenta) and with the aim of including all residues required for function. Limited proteolysis data as well as secondary-structure prediction tools can be used as a guide, as well as structural data of homologue proteins. Sample homogeneity can be achieved at the sequence level through the mutation of residues targeted by post-translational modifications (PTMs) such as phosphorylation (orange) that either prevent or mimic the PTM. Further construct optimization can involve the mutation of surface residues with flexible or charged side chains (cyan), either as single mutants or in clusters, to alanine or residue types that reverse or remove the charge. Regions of low structural complexity (grey, dark green) can be replaced by short linker residues or equivalent residues in homologue proteins to reduce conformational variability in the construct. (c) Events such as cofactor binding or PTMs can affect the conformational state and ligand-binding ability of the protein and should be considered in the design of the experiment.
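The rule of not cutting through secondary-structure elements can be illustrated with a small filter over a per-residue prediction string. This is a sketch under stated assumptions: the three-state alphabet (`H` helix, `E` strand, `C` coil), the `flank` margin and the example string are hypothetical and do not come from any specific prediction tool.

```python
def safe_cut_positions(ss, flank=2):
    """Return 1-based residue positions at which a construct boundary would
    lie in predicted coil, at least `flank` residues from any helix or strand.
    """
    allowed = []
    n = len(ss)
    for i in range(n):
        lo, hi = max(0, i - flank), min(n, i + flank + 1)
        # Every residue in the window around position i must be coil.
        if all(state == "C" for state in ss[lo:hi]):
            allowed.append(i + 1)
    return allowed


# Hypothetical prediction: coil, helix, coil, strand, coil (24 residues).
ss = "CCCC" + "HHHHHH" + "CCCCC" + "EEEEE" + "CCCC"
print(safe_cut_positions(ss))  # [1, 2, 13, 23, 24]
```

Positions that pass such a filter are only candidates. Limited proteolysis data and homologue structures, as described in the caption, would still be needed to choose among them.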
Figure 3. Construct design of the PDE5/PDE4 chimera protein. (a) Domain boundaries of the original PDE5A1 catalytic domain construct with disordered residues highlighted in blue and the proteolytic sites marked. (b) Sequence alignment between the loop region of PDE5A1 (top row) with equivalent residues in PDE4B2B (bottom row) and the sequence of the PDE5A1/PDE4B2B chimera protein (middle row). (c) Cartoon representation of the crystal structure of PDE4B2B (PDB entry 4nw7) with the loop insertion region highlighted in orange, created using CCP4mg molecular-graphics software (McNicholas et al., 2011).
Figure 4. Rounds of construct optimization for the HDAC4 histone deacetylase domain. (a) The construct boundaries of the original HDAC4 catalytic domain structure (PDB entry 2vqj, top) and the HDAC7 catalytic domain in PDB entry 3c0y (bottom) are shown. The HDAC7 numbering is aligned with the HDAC4 numbering. The first residue modelled in the original HDAC4 structure corresponds to the first residue modelled in the HDAC7 structure and was used as the starting point for the optimized HDAC4 construct (middle). The C-terminus of the optimized HDAC4 model was chosen based on the last residue visible in the HDAC7 structure. This shortened the new HDAC4 construct by about 20 residues compared with the original HDAC4 structure, and the new C-terminal boundary is highlighted in the cartoon representation of HDAC4 (PDB entry 2vqj). It was speculated that this truncation would help to generate an alternative crystal form with more favourable packing contacts. (b) Crystal structure of the new HDAC4 construct with bound ligand (PDB entry 4cbt). The new HDAC4 construct did result in an alternative crystal form. Here, close crystal contacts are observed between ligand (yellow) and Leu728 (magenta) at the terminus of a disordered loop in a neighbouring chain (blue). These contacts were thought to hinder co-crystallization with larger ligands. (c) In a second round of construct optimization, Leu728 was mutated to alanine (green) and the mutant readily crystallized in the presence of ligand in yet another crystal form devoid of packing interactions at the ligand-binding site (PDB entry 4cby). In addition, a loop that was disordered in the corresponding wild-type structure could be modelled into the electron density (orange). Figures were created with the CCP4mg molecular-graphics software (McNicholas et al., 2011).
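When a point mutation such as Leu728 to alanine is introduced into a truncated construct, the residue number must be mapped onto the shortened sequence. The helper below is a minimal sketch of that bookkeeping. The eight-residue fragment and its starting residue number are hypothetical placeholders, not the HDAC4 sequence, and `apply_mutation` is an illustrative name.

```python
import re


def apply_mutation(fragment, first_residue_number, mutation):
    """Apply a point mutation written as e.g. 'L728A' to a construct whose
    first residue carries the full-length number `first_residue_number`."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, number, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = number - first_residue_number  # 0-based position in the fragment
    if not 0 <= index < len(fragment):
        raise ValueError(f"residue {number} lies outside the construct")
    if fragment[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {number}, found {fragment[index]}"
        )
    return fragment[:index] + mutant + fragment[index + 1:]


# Hypothetical fragment assumed to start at residue 725, so L is residue 728.
fragment = "GSALTPEK"
print(apply_mutation(fragment, 725, "L728A"))  # GSAATPEK
```

Checking the wild-type residue before substitution catches numbering offsets between the full-length protein and the construct, a common source of error once boundaries have been changed.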
Comparison of expression systems
| Expression system | Advantages | Disadvantages |
|---|---|---|
| E. coli | Simple, cheap, easily available to most laboratories, many specialized strains and plasmid expression vectors available, many with strong inducible promoters. Quick and easy to scale-up; good for multi-construct screening. Potential for inclusion-body/refolding route to protein generation. | Lack of post-translational modification (glycosylation, phosphorylation |
| Yeast | Media cheap, scale-up easy and expression levels often very high. Good for some post-translational modifications. Can be good for secreted and membrane proteins | |
| Insect cells/baculovirus | Generally good expression levels. Relatively easy to propagate cells. Good for most post-translational modifications. Good for secreted and membrane proteins or where there is a requirement for co-expression of multiple subunits (can use multiple viruses). | Media expensive; virus generation is quite time-consuming. Lytic system reduces opportunities for stable expression: inducible systems limited. Requires sterile environment for propagation. |
| Mammalian | Good for screening expression (as transients). Can obtain stable expression (integrated). Best for authentic post-translational modification. Inducible systems available. Good for secreted and membrane proteins. | Media often very expensive; transient scale-up requires expensive transfection reagents and large quantities of plasmid DNA. Complex glycosylation can interfere with crystallization. Requires sterile environment for propagation and CO2 atmosphere in many cases. |
Figure 5. Diffraction image for a crystal apparently damaged during soaking. For the example shown, a data set could be collected and the structure solved without significant loss in resolution or map quality compared with nonsoaked apo crystals.