Literature DB >> 31793898

Non-merohedral twinning: from minerals to proteins.

Madhumati Sevvana¹, Michael Ruf², Isabel Usón³, George M Sheldrick⁴, Regine Herbst-Irmer⁴.

Abstract

In contrast to twinning by merohedry, the reciprocal lattices of the different domains of non-merohedral twins do not overlap exactly. This leads to three kinds of reflections: reflections with no overlap, reflections with an exact overlap and reflections with a partial overlap of a reflection from a second domain. This complicates the unit-cell determination, indexing, data integration and scaling of X-ray diffraction data. However, with hindsight it is possible to detwin the data because there are reflections that are not affected by the twinning. In this article, the successful solution and refinement of one mineral, one organometallic and two protein non-merohedral twins using a common strategy are described. The unit-cell constants and the orientation matrices were determined by the program CELL_NOW. The data were then integrated with SAINT. TWINABS was used for scaling, empirical absorption corrections and the generation of two different data files, one with detwinned data for structure solution and refinement and a second one for (usually more accurate) structure refinement against total integrated intensities. The structures were solved by experimental phasing using SHELXT for the first two structures and SHELXC/D/E for the two protein structures; all models were refined with SHELXL. open access.

Entities: Chemical Disease Gene Species

Keywords: non-merohedral twinning; twinned structure refinement; twinned structure solution

Mesh：

Substances：

Year: 2019 PMID： 31793898 PMCID： PMC6889912 DOI： 10.1107/S2059798319010179

Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN： 2059-7983 Impact factor: 7.652

Introduction

Twins are defined as regular aggregates consisting of individual crystals of the same species joined together in some definite mutual orientation (Giacovazzo, 2002 ▸). Therefore, twins may be defined by a symmetry operator that transforms one orientation into another, the so-called twin law, and by the fractional contribution k of each component. In reciprocal space, the twin law describes the symmetry operator that transforms the h 1 k 1 l 1 indices of one domain into the indices h 2 k 2 l 2 of a second domain. Twins can be classified depending on the twin law (Herbst-Irmer, 2016 ▸; Herbst-Irmer & Sheldrick, 1998 ▸; Parsons, 2003 ▸; Yeates, 1997 ▸; Dauter, 2003 ▸; Banumathi et al., 2004 ▸; Luo & Dauter, 2016 ▸). For merohedral and pseudo-merohedral twins the reciprocal lattices of the different domains overlap (nearly) exactly. Therefore, the intensities of reflection h 1 k 1 l 1 of domain 1 and the twin-related reflection h 2 k 2 l 2 of domain 2 sum up to a single observed intensity. This complicates the space-group determination and structure solution. However, after having solved the structure, refinement can be performed against these summed intensities and the fractional contribution k of each component can be refined. For pure merohedral twins of macromolecules, this has been automated in the programs REFMAC (Murshudov et al., 2011 ▸) and phenix.refine (Adams et al., 2010 ▸) and is widely used with good results, although sometimes just to lower the R factor of crystals that are not actually twinned. However, twinning should only be invoked when there is independent evidence apart from a lower R factor. The twin law for non-merohedral twins does not belong to the crystal class or to the metric symmetry of the lattice. The different reciprocal lattices may not overlap exactly and not every reflection has contributions from all twin domains. Therefore, under normal circumstances this kind of twinning can be spotted during data collection. Quite often, autoindexing (automatic cell-determination) programs that were designed for single crystals fail or do not routinely handle multiple lattices to obtain the unit-cell parameters. Split reflection profiles can be observed and it may not be possible to index all reflections (see Fig. 1 ▸).

Figure 1

Diffraction patterns in APEX (Bruker’s crystallography software suite; Bruker, 2018 ▸) (a) indicating unindexed reflections (black arrow) and split reflections (white arrow) and (b) showing a split reflection profile

To index reflections for a non-merohedral twin, more than one orientation matrix is required. Therefore, the autoindexing program must take into account that only a certain fraction of the reflections can be indexed as a single domain. After indexing the reflections from all of the domains, the data-integration program must be able to use all of the orientation matrices to obtain the intensities of reflections from the individual components (a simple strategy would be to integrate each component separately with its respective orientation matrix). This leads to three kinds of reflections: reflections with no overlap, reflections with an exact overlap and reflections with a partial overlap from further domains [see Fig. 2 ▸(b)]. The non-overlapped reflections are not affected by twinning. Both the non-overlapped and the exactly overlapped reflections can be used in model refinement. They determine the fractional contributions of the twin domains. For the partially overlapped reflections, the degree of overlap is unknown and therefore only a fraction of the reflections from the second domain can be integrated, so one option might be to omit reflections involving other domains. A second and much better strategy is to simultaneously integrate the reflections using orientation matrices from all of the components. Here, the overall intensity of every reflection is integrated, giving rise to two kinds of reflections: non-overlapped and overlapped reflections, which are also called single and composite reflections, respectively.

Figure 2

Reciprocal-space plot of the k = 2 layer of a monoclinic structure (a) and the overlay of this plot with a rotated plot simulating non-merohedral twinning (b).

Standard scaling and absorption-correction programs cannot be used under these circumstances because special treatment is needed for composite reflections. Most of these challenges have been solved for twinned data from small-molecule crystals. Programs such as DIRAX (Duisenberg, 1992 ▸), GEMINI (Sparks, 2000 ▸), CELL_NOW (Sheldrick, 2008 ▸), CrysAlis (Rigaku, 2015 ▸) and MOSFLM (Battye et al., 2011 ▸) can index a diffraction pattern with more than one orientation matrix. The programs SAINT (Bruker, 2017 ▸), EVAL15 (Schreurs et al., 2010 ▸), X-Area (Stoe & Cie, 2017 ▸) and CrysAlis (Rigaku, 2015 ▸) can integrate with more than one orientation matrix simultaneously. Here, we describe the successful treatment of non-merohedral twins using the programs CELL_NOW, SAINT and TWINABS (Sheldrick, 2012 ▸), where TWINABS is a special version of SADABS (Krause et al., 2015 ▸) that is used for scaling and absorption correction of data from non-merohedrally twinned crystals. Example structures of a mineral and an organometallic small molecule as well as two test protein structures will be discussed.

General strategy

Cell determination

The program CELL_NOW tries to find sets of equally spaced parallel reciprocal-lattice planes that pass close to as many reflections as possible. Each set of planes corresponds to a potential unit-cell vector perpendicular to the planes with a length given by the reciprocal of the inter-planar separation. Combinations of three such vectors form potential unit cells that are ranked by a figure of merit that favours the smallest possible unit-cell volume, the highest possible metric symmetry and the largest number of indexed reflections, i.e. reflections that lie within 0.2 times the interplanar separation from all three sets of planes. CELL_NOW rotates each potential cell in turn to locate further twin domains by iteratively checking only those reflections that were not indexed by the cell in question. The rotation matrix from the first orientation to the second corresponds to the twin law. Therefore, the orientation matrices and the twin law are determined in one step. An additional advantage is that even weaker domains can be indexed. The alternative procedure of separately indexing the unindexed reflections from scratch might fail if there are too few reflections from the weaker domain in the list of harvested reflections for indexing.

Integration

In SAINT, a refineable integration box size is used. The intensity of non-overlapped reflections can be accurately determined when a single orientation matrix from one domain is used during data integration. However, the intensities determined for exactly overlapped reflections should be the sum of the intensities from all of the domains that contribute (see Fig. 3 ▸). The treatment of partially overlapped reflections is nontrivial, because the degree of overlap is unknown and differs from one reflection to the next. When using a single orientation matrix only, the measured intensity may be contaminated by contributions from other domains. However, in a simultaneous integration procedure with all of the orientation matrices from different domains it is possible to determine the overlap between the integration boxes of the reflections from different domains. The combined box size can then be used for integration, leading to the sum of all intensities from all of the domains. Using this procedure only two kinds of reflections remain: overlapped and non-overlapped reflections, which are also called single and composite reflections, respectively. For composite reflections, an additional column in the output raw data file specifies the domain numbers. Additionally, SAINT derives rough estimates of the individual intensities of the involved reflections by using the learnt reflection profile.

Figure 3

Schematic picture of reflections from two domains (blue and red) with different degrees of overlap. The rectangles represent the integration boxes. (a) Only one orientation matrix is used; (b) both orientation matrices are used.

Absorption correction, scaling, merging and generation of datafiles

The new raw datafile needs a special version of the scaling and absorption correction program SADABS (Krause et al., 2015 ▸) called TWINABS (Sheldrick, 2012 ▸). The modelling of systematic errors such as absorption by the multi-scan method can be performed either for each domain separately by only using the non-overlapped reflections, or for reflections of several domains also considering overlapped reflections. TWINABS can detwin the data by using the rough overlap estimates from SAINT and refining these estimates using symmetry-related reflections. Symmetry-related non-overlapped reflections can only be merged if they belong to the same domain. For overlapped reflections, the ratios of the contributions from different domains need to be constant (for details, see the supporting information). In order to increase the number of unique data, reflections of all domains are used by default. Only if one or more domains are much weaker than the others does the program suggest using only single and composite reflections involving at least one of the stronger domains. This HKLF 4-format file with detwinned and merged data can be used in the same way as a standard HKLF 4 datafile from an untwinned single crystal for structure solution and refinement. Additionally, TWINABS produces a datafile containing summed intensities and the information about overlap in HKLF 5 format (further details of this format are given in the supporting information). The default option here is to use only reflections that contribute to the first domain. All possible refinement programs can be used for the refinement against the HKLF 4 detwinned data. However, for small molecules the HKLF 5-format file containing the summed intensities with information about overlap and twin domains is often superior, e.g. example structure 4 in Herbst-Irmer (2016 ▸). In such cases refinement programs that are capable of handling this file can be used, for example SHELXL (Sheldrick, 2015a ▸), OLEX2 (Bourhis et al., 2015 ▸) and CRYSTALS (Betteridge et al., 2003 ▸).

Examples

The mineral chromite

The mineral chromite, an iron chromium oxide FeCr2O4, crystallizes in the cubic space group Fd m (see Fig. 4 ▸). Iron can be substituted by magnesium in variable amounts (Lenaz et al., 2004 ▸). A data set from a twinned crystal was collected using a Bruker D8 Quest at Mo Kα wavelength at 292 K. Two domains could easily be identified using the graphical viewer RLATT (Bruker, 2016 ▸; see Fig. 5 ▸). CELL_NOW found a hexagonal cell with a = b = 5.88, c = 14.41 Å when the default settings were used (for details, see the supporting information). The systematic absences for the obverse setting could not be identified by the program because 11.5% of the indexed reflections are outliers. This obverse cell can be transformed to the true cubic F-centred cell. On restricting the vector search for cell edges between 8 and 9 Å, the correct F-centred cubic cell was identified, indexing 58.6% of the harvested reflections. 0.5% of the reflections violate the systematic absences for the F-centring. Rotating this cell by 180° around −2 −1 1 led to a second orientation matrix that indexed 94.6% of the hitherto unindexed reflections (for details, see the supporting information).

Figure 4

Structure of chromite with the Fe2+ tetrahedron in orange and the Cr3+ octahedron in blue, produced with VESTA v.3.4.6 (Momma & Izumi, 2011 ▸).

Figure 5

RLATT plot showing both orientations for chromite.

Both orientation matrices were used in SAINT for integration, which produced the raw datafile with information about the domain overlap for individual reflections. TWINABS distinguished three types of reflections: singles from domain 1, singles from domain 2 and composite reflections (see Table 1 ▸ and the supporting information). Parameter refinement was applied separately for both domains using only single reflections. The detwinning procedure estimated a twin fraction of 0.574 for the major domain, with R int values of 0.0489 for both domains and 0.0439 using only data from the major domain. The merged data files consisted of only 69 reflections. SHELXT (Sheldrick, 2015b ▸) could solve the structure immediately using the detwinned data. The refinement can be performed against either the detwinned HKLF 4 data set or the HKLF 5 data set consisting of reflections from domain 1, domain 2 or both domains. The results from refinements using different HKLF 5 files are comparable. Domain 2 was weaker than the other domain, with slightly worse figures of merit.

Table 1

Data and refinement statistics for the mineral example

Domain	1	2	Both	Detwinned
TWINABS
No. of data	675	659	45	—
No. of unique data	69	69	19	—
I/σ(I)	60.4	51.9	82.4	—
R _int	0.0439	—	0.0489	—
k _i	0.574	0.426	—	—
SHELXL
Data used	60	60	138	60
Unique data used	60	60	60	60
Completeness (%)	97.4	97.4	97.4	97.4
No. of parameters	10	10	10	9
R1 [I > 2σ(I)]	0.0189	0.0271	0.0264	0.0161
wR2 (all data)	0.0521	0.0697	0.0680	0.0441
Bond precision (Cr—O) (Å)	0.0017	0.0030	0.0017	0.0015
R1 (after dispersion correction and merging)	0.0185	0.0269	0.0174	0.0166
k ₂	0.463 (10)	0.417 (11)	0.423 (5)	—

Organometallic example

The compound Cp*2MeZrOTiMe2Cp* (where Cp* is pentamethylcyclopentyl) crystallizes as a non-merohedral twin (Gurubasavaraj et al., 2007 ▸). A data set was collected at 100 (2) K using a Bruker SMART APEX II diffractometer with a D8 goniometer (graphite-monochromated Mo Kα radiation). Indexing with automatic single-crystal cell-determination programs failed. Two domains could easily be identified using RLATT (see Fig. 6 ▸). CELL_NOW produced an extensive list of 172 possible cells with different cell volumes but with very similar percentages of indexed reflections (for details, see the supporting information and Table 2 ▸). The first cell indexed 54.6% of the reflections with an I-centred monoclinic cell. After a rotation of 180° about the 0 1 1 reciprocal axis, 69.2% of the as-yet unindexed reflections could be indexed with a second orientation matrix. No further meaningful orientation matrices were found. Therefore, an initial cell with a slightly higher percentage of indexed reflections (Cell 4 in Table 2 ▸) and a doubled cell volume for a primitive monoclinic cell was chosen. After rotating by 180° about the 0 −1 2 reciprocal axis, all of the remaining reflections could be indexed.

Figure 6

RLATT plot showing both orientations for Cp*2MeZrOTiMe2Cp*.

Table 2

Excerpt of CELL_NOW output: list of possible cells for Cp*2MeZrOTiMe2Cp*

	FOM	Indexed (%)	a (Å)	b (Å)	c (Å)	α (°)	β (°)	γ (°)	V (Å³)	Lattice type
1	1.000	54.6	8.676	15.514	11.578	89.93	94.47	89.87	1553.7	I
2	0.846	54.2	13.923	15.514	8.676	89.87	123.97	90.18	1554.2	C
3	0.723	60.0	23.245	30.889	8.676	90.06	94.46	90.05	6210.8	C?
4	0.720	60.0	8.676	15.514	23.168	90.02	94.49	90.13	3108.8	P
5	0.678	59.6	23.168	31.009	8.676	90.11	94.49	90.05	6213.8	C?
6	0.583	58.4	8.676	15.452	23.245	90.03	94.46	90.07	3106.8	P
7	0.540	55.6	8.676	10.399	10.434	96.13	111.98	111.85	776.5	P
8	0.535	56.0	8.676	10.399	10.782	66.27	63.70	68.15	775.8	P

These two orientations were used in SAINT for integration. TWINABS indicated that the two domains are rather similar in size (see Table 3 ▸). The systematic absences are consistent with space group P21/c (see the supporting information), but SHELXT correctly identified Pc as the true space group. By default, TWINABS merges Friedel opposites, but this option can be changed for non-centrosymmetric space groups. In principle, for non-centrosymmetric structures additional twinning by inversion is possible. There is an additional option in TWINABS to generate an HKLF 5 file using four domains: the major domain 1, the minor domain 2, the inverse of domain 1 and the inverse of domain 2. For this data set, the fractional contributions k refined to k 2 = 0.45 (3), k 3 = 0.50 (3) and k 4 = 0.02 (3), where k 1 = 1 − (k 2 + k 3 + k 4). These values indicated that the absolute structure is wrong for domain 1 but correct for domain 2. Therefore, the atomic coordinates had to be inverted in SHELXL and additionally the indices of the reflections of the second domain had to be inverted in TWINABS. The final results with this option are listed in Table 3 ▸. The HKLF 4 and HKLF 5 files gave similar results. However, judging from the R value after dispersion correction and merging, which has the same number of reflections for all refinements, the data set using complete data from both domains produces better results. This can be explained by the fact that both domains are similar in size and both are well centred in the beam (see the normalized scale-factor plot in the supporting information).

Table 3

Data and refinement statistics for Cp*2MeZrOTiMe2Cp*

Domain	1	2	Both	Detwinned
TWINABS
No. of data	30843	30852	4354	—
No. of unique data	5738	5739	1814	—
I/σ(I)	3.1	2.9	4.8	—
R _int	0.0951	0.0992	0.0976	—
Fractional contribution	0.532	0.468
SHELXL
R1 [I > 2σ(I)]	0.0581	0.0588	0.0624	0.0481
Data used	11343	11325	24945	11172
Unique data used (Friedel pairs merged)	5596	5595	5596	5596
Completeness (%)	100	100	100	100
No. of parameters	686	686	686	685
wR2 (all data)	0.1260	0.1246	0.1314	0.1019
Bond precision C—C (Å)	0.0131	0.0136	0.0136	0.099
R1 (after dispersion correction and merging)	0.0527	0.0548	0.0456	0.0450
k ₂	0.4752 (19)	0.4640 (17)	0.4691 (10)	—

There are two molecules in the asymmetric unit in space group Pc (see Fig. 7 ▸). There is no inversion centre or 21 axis between the two molecules. However, there is a pseudo-21 axis relating the Zr atom of molecule 2 to the Ti atom of molecule 1 and vice versa (for details, see the supporting information). Additionally, there is a pseudo-translation between the two Zr atoms and the two Ti atoms. They are related by x + 0.5, y + 0.5, z + 0.25, which would lead to I-centring if the c cell axis were to be halved. This corresponds to the smaller cell proposed by CELL_NOW (see Table 2 ▸). It also explains why the true cell indexes only 60% of the reflections compared with 54% for this smaller cell, which has a four times smaller primitive volume. Owing to the pseudo-translation, there are many weak reflections that will not be found in the list of reflections from the peak search.

Figure 7

Structure of one of the two molecules of Cp*2MeZrOTiMe2Cp*.

Twinned protein crystals

The methods described above have successfully been used for twinned small molecules for many years, and SHELXD and SHELXE have also been used to assist in the SAD phasing of merohedrally twinned macromolecules (Dauter, 2003 ▸; Rudolph et al., 2003 ▸). To show that the procedures described in this paper are also valid for macromolecular structures, we grew non-merohedrally twinned crystals of two benchmark protein structures: cubic insulin and glucose isomerase (Sevvana, 2006 ▸; Fig. 11, right). Both data sets were collected at 100 K with ω scans using a Bruker rotating-anode generator at Cu Kα wavelength equipped with Osmic focusing mirrors and a Bruker SMART6000 4K detector. The data were collected in low-, medium- and high-resolution passes at detector distances of 10 or 18 cm in thin-slice mode to minimize artificial overlap of the spots because of detector geometry. A minimum of three runs for each of the low-, medium- and high-resolution passes were collected at different φ angles to obtain complete and multiple observations of data in order to maximize the weak anomalous signal from sulfur (in the case of cubic insulin) and manganese (in the case of glucose isomerase) at the Cu Kα wavelength. It was important to collect data as precisely as possible, avoiding ice rings etc., so that the only problems that were encountered during data processing were caused by twinning. The complications of data collection using these twinned protein crystals were similar to the small-molecule examples. Automatic cell determination failed, but both RLATT (see Fig. 8 ▸) and CELL_NOW (see the supporting information) clearly identified two domains in the case of insulin and three domains for glucose isomerase (see Fig. 10), and their orientation matrices were used in SAINT in the same way as for the small molecules. TWINABS produced detwinned HKLF 4 data and several HKLF 5 data sets. Both substructures were solved using dual-space recycling methods in SHELXD (Schneider & Sheldrick, 2002 ▸). The normalized difference structure factors were calculated using XPREP from the HKLF 4 file prepared by TWINABS. Density modification and autotracing were carried out using SHELXE (Usón et al., 2007 ▸; Usón & Sheldrick, 2018 ▸). PDB2INS (Lübben & Sheldrick, 2019 ▸) was used to convert the .pdb file to a SHELX.ins file. Both the insulin and glucose isomerase models were refined using SHELXL by alternating with model building in real space using Coot (Emsley et al., 2010 ▸).

Figure 8

RLATT plot showing the two orientations of insulin.

Although refinement of the models against the HKLF 5 files could produce better results, one of the challenges is to annotate the R free reflections in this file format. It should be ensured that twin-related reflections are either both in the work set or both in the free set. For (pseudo)-merohedral twins this can be achieved by assigning them in thin shells instead of randomly in XPREP (Sheldrick, 2015c ▸). Because of the exact overlap of the different reciprocal lattices, twin-related reflections have the same θ value. In the case of non-merohedral twins the θ values could differ slightly. Therefore, one of the twin-related reflections could be in the θ shell for the free reflections, while the other is in the shell of the work reflections. Depending on the degree of overlap, it might be possible to derive a R free set by successively adding these work reflections into the free set. In our example structures we ended with ∼90% of the reflections in the R free set, even when we started with just one reflection in the first R free set (for details, see the supporting information). The residual 10% could also not be used as an R free set because they do not represent the whole data set. If one takes only single reflections as R free reflections, it is questionable whether these reflections are a random representative of the whole data set. For our insulin data set, only 10% of the data were single. Additionally, the standard R free procedure in SHELXL is not possible for the HKLF 5 format, because the information about overlap and twin domains is given in the same column as the identification of R free reflections. This could be solved by either using the detwinned data or separating the work data and the free data into two separate files. However, if we assume that the detwinning works perfectly, it is no longer necessary to take care of twin-related reflections and the usual procedures for selecting the R free reflections can be used for the detwinned (HKLF 4) data. It is also known that all R values of structures from twinned crystals are artificially too low (Murshudov, 2011 ▸). This is also observed here for the refinements against the different HKLF 5 data sets, which show lower R values than refinements against the detwinned data. The latter values seem to be more realistic. In order to judge whether a model derived by refinement against the HKLF 5 data is superior to the model derived from the detwinned data, the R values of these models against the detwinned data were calculated by refining just the scale factor. This was inspired by the procedure of paired refinement developed by Diederichs and Karplus (Diederichs & Karplus, 2013 ▸; Karplus & Diederichs, 2012 ▸).

Cubic insulin

Bovine insulin (Sigma; catalogue No. I5500) was dissolved in 0.02 M Na2HPO4 and 0.01 M Na3EDTA to a final concentration of 30 mg ml−1. Crystals were grown by the hanging-drop vapour-diffusion method at 20°C by equilibration against a reservoir consisting of 0.2 M Na2HPO4/Na3PO4 pH 10.0, 0.01 M Na EDTA. Cubic crystals grew in about 1 h, and most of the crystals were interpenetrant owing to the high concentration of protein (which was deliberate in order to encourage the growth of twinned crystals) [Fig. 11(a), right]. Cubic insulin crystallizes in space group I213, which belongs to the lower symmetry cubic Laue group. Therefore, there are two independent possibilities for indexing the reflections related by the matrix (0 1 0, 1 0 0, 0 0 −1). The integration of two sets of reflections with different indexing leads to artificial merohedral twinning. Therefore, one has to be careful when indexing two different domains. In our case, CELL_NOW indexed the two domains using alternative settings. However, this was easily identified in TWINABS. The program detwins the data using an iterative process minimizing the R int value for symmetry-equivalent reflections. Here, TWINABS advised converting the indices of component 2 by applying the matrix (0 1 0, 1 0 0, 0 0 −1), decreasing R int to 0.0347. The resulting detwinned data set extends to a maximum resolution of 1.55 Å. The structure contains 51 amino acids in two chains connected by three disulfide bonds. Both the higher symmetry space group and the three disulfide bonds make cubic insulin an ideal crystal for structure solution using in-house sulfur-SAD. To locate the anomalous scatterers the data were truncated to 1.9 Å resolution and E-values (normalized difference structure factors) were calculated in XPREP. Using these data, SHELXD (Sheldrick et al., 2012 ▸) found the positions of three disulfide bridges (see the supporting information). Density modification and autotracing modules in a beta version of SHELXE could trace two chains. Sequence information was read from a file in FASTA format, and probing γ positions and side-chain shape along with the sulfur sites in the substructure was used to dock the polyalanine trace into the sequence after the last main-chain tracing cycle. Side chains were then built and refined. The total overhead for side-chain tracing was 0.2 s and the CC (Fujinaga & Read, 1987 ▸) from the trace against the normalized observed amplitudes increased from 40.0% (for a polyalanine trace as in previous versions of SHELXE) to 58.2% (for almost complete side chains). 86% of the side chains were traced with a largest side-chain difference within 1.5 Å and 8% with a greater difference, while 6% were missing or wrong (see Fig. 9 ▸). The missing side chains and the water structure were built in Coot (Emsley et al., 2010 ▸) and the model was refined using SHELXL. The results of all refinements of this final model against the different datafiles are summarized in Table 4 ▸.

Figure 9

Part of the SHELXE map (a) and the final refined map (b) for cubic insulin contoured at 1σ.

Table 4

Data and refinement statistics for cubic insulin

Raw data have been deposited in the Integrated Resource for Reproducibility in Macromolecular Crystallography (Grabowski et al., 2016 ▸; https://proteindiffraction.org) at https://doi.org/10.18430/m3.irrmc.5325.

Domain	1	2	Both	Detwinned
PDB code	6or0	6or0	6or0	6or0
Space group	I2₁3	I2₁3	I2₁3	I2₁3
a = b = c (Å)	78.03 (8)	78.03 (8)	78.03 (8)	78.03 (8)
Mosaicity (°)	0.33	0.33	0.33	0.33
Resolution (Å)	1.55	1.55	1.55	1.55
TWINABS data statistics
No. of data	202583	202218	29318	—
No. of unique data	11532	11502	19152	—
I/σ(I)	9.4	7.5	10.2	—
R _int	0.0336	0.0382	0.0347	—
R _r.i.m †	0.0345	0.0392	0.0352	—
Fractional contribution	0.581	0.419	—	—
Overall B factor from Wilson plot (Å²)	12.70	12.51	12.55	12.35
SHELXL refinement statistics
R1 [I > 4σ(I)]	0.112	0.109	0.128	0.158
Data used	21453	21417	42151	11668
Unique data used	11560	11531	11663	11668
Completeness (%)	97.4	97.2	98.3	98.4
No. of parameters	3855	3855	3855	3854
wR2 (all data)	0.305	0.301	0.350	0.386
R1 (after dispersion correction and merging)	0.140	0.131	0.163	0.168
k ₂	0.428 (3)	0.414 (4)	0.420 (3)	—
R1(free) (all 588 data)	—	—	—	0.215
R1 (after dispersion correction and merging) against the detwinned data	0.173	0.174	0.164	—
Solvent content (%)	65	65	65	65
No. of non-H atoms
Protein	395	395	395	395
Water	34	34	34	34
R.m.s.d., bonds (Å)	0.0137	0.0136	0.0212	0.0102
R.m.s.d., angles (°)	2.40	2.52	3.11	1.95
Average B factors (Å²)
Main chain	16.67	16.76	16.67	16.35
Side chain and water	26.29	26.44	27.42	24.89
Ramachandran plot
Most favoured (%)	97.83	95.65	97.83	97.83
Allowed (%)	2.17	4.35	2.17	2.17
Outliers (%)	0.0	0.0	0.0	0.0

Calculated as [N/(N − 1)]1/2 × R int, where N is the data multiplicity.

Both domains were well centred in the beam (for details see Fig. 11) and the quality of the data from the different domains is very similar. Both the HKLF 4 and HKLF 5 data yield very similar models, but the R values for the HKLF 5 refinement in this and other examples appear to be artificially low. Since it also can be problematic to obtain a suitable set of reflections for the free-R test in the HKLF 5 case, it is better to use the HKLF 4 data for R free.

Glucose isomerase

The active form of glucose isomerase consists of 385 amino acids with eight methionines, a magnesium ion and a manganese ion at the active site. Glucose isomerase (Hampton Research; catalogue No. HR7-102) was dialysed against 5 mM Tris–HCl buffer pH 7.5, 10 mM MnCl2, 5 mM MgCl2 and then concentrated to a final concentration of 20 mg ml−1 and crystallized by the hanging-drop vapour-diffusion method by equilibration against a reservoir consisting of 0.05 mM Tris–HCl buffer pH 7.5, 0.1 M MnCl2, 14% MPD. The crystals grew in about two days. In contrast to the interpenetrant twinned crystals of bovine insulin, here it appears that three separate crystals grew in contact with each other [Fig. 11(b), right]. 25% MPD was used as a cryoprotectant and data were collected at 100 K with a detector distance of 18 cm because of the long cell axis. For glucose isomerase, CELL_NOW found three different orientation matrices (for details, see Fig. 10 ▸ and the supporting information) for data integration in SAINT. The scaling procedure in TWINABS indicated that the larger domains 1 and 2 with fractional contributions of 0.44 and 0.41 (see Table 5 ▸) were much better centred in the beam [Fig. 11 ▸(b)]. The detwinned data extended to a resolution of 1.6 Å and were truncated to 2.0 Å resolution for substructure solution of the two Mn sites in SHELXD. One site has a much lower peak height (see the supporting information), which was interpreted as a mixture of Mn and Mg. Density modification and autotracing of the inverted substructure in SHELXE identified 352 residues in ten chains. The total overhead for side-chain tracing was 2.3 s and the CC of the trace against the normalized observed amplitudes increased from 38.5% (for a polyalanine trace as in previous versions of SHELXE) to 50.0% (for almost complete side chains). 71.7% of the side chains were traced with a largest side-chain difference within 1.5 Å and 10.3% with a greater difference, while 17.8% were missing or wrong (see Fig. 12 ▸). The model was further improved by alternating refinement in SHELXL and model building in Coot. The second Mn site was partly occupied by Mg. The Mn and Mg atoms were constrained to have the same isotropic displacement parameter and x, y, z coordinates. The occupancy of the Mg atom refined to 0.64 (6). This is in accordance with the peak heights in the anomalous map. All atoms were refined isotropically with appropriate restraints. The addition of H atoms as well as anisotropic refinement increased the R free value.

Figure 10

An example image of a glucose isomerase triplet. (a) An image taken at 2θ = 40° and a detector distance of 18 cm. (b) The indexed image using CELL_NOW. The first domain is coloured blue, the second domain is in green and the third domain is in red.

Table 5

Data and refinement statistics for glucose isomerase

Domain	1	2	3	1 + 2	1 + 2 + 3	Detwinned
PDB code	6oqz	6oqz	6oqz	6oqz	6oqz	6oqz
Space group	I222	I222	I222	I222	I222	I222
a (Å)	92.93 (9)	92.93 (9)	92.93 (9)	92.93 (9)	92.93 (9)	92.93 (9)
b (Å)	97.94 (10)	97.94 (10)	97.94 (10)	97.94 (10)	97.94 (10)	97.94 (10)
c (Å)	102.71 (10)	102.71 (10)	102.71 (10)	102.71 (10)	102.71 (10)	102.71 (10)
Mosaicity (°)	0.46	0.46	0.46	0.46	0.46	0.46
Resolution (Å)	1.6	1.6	1.6	1.6	1.6	1.6
TWINABS data statistics
No. of data	237474	237699	237893	126763	10261	—
No. of unique data	47082	51266	48298	75981	7435	—
I/σ(I)	9.0	8.2	5.7	10.4	9.9	—
R _int	0.0546	0.0572	0.0804	—	0.0592	—
R _r.i.m †	0.0594	0.0631	0.0885	—	0.0615	—
Fractional contribution	0.436	0.406	0.158	—	—	—
Overall B factor from Wilson plot (Å²)	8.62	8.6	7.0	9.38	9.52	9.03
SHELXL
R1 [I > 4σ(I)]	0.133	0.131	0.129	0.146	0.149	0.177
Data used	73778	75962	73480	181519	229809	61943
Unique data used	51919	57462	52891	61334	61848	61943
Completeness (%)	83.7	92.7	85.3	98.9	99.8	99.9
No. of parameters	13377	13377	13377	13377	13377	13375
wR2 (all data)	0.360	0.358	0.357	0.402	0.411	0.452
R1 (after dispersion correction and merging)	0.155	0.158	0.157	0.177	0.181	0.191
k ₂	0.4259 (16)	0.393 (3)	0.416 (3)	0.4146 (13)	0.4183 (13)	—
k ₃	0.1622 (13)	0.167 (2)	0.1650 (15)	0.1696 (15)	0.1636 (8)	—
R1(free) (all 3101 data)						0.221
R1 (after dispersion correction and merging) against the detwinned data	0.199	0.198	0.202	0.189	0.188	—
Solvent content (%)	55	55	55	55	55	55
No. of non-H atoms
Protein	3050	3050	3050	3050	3050	3050
Ion	2	2	2	2	2	2
MPD	8	8	8	8	8	8
Water	284	284	284	284	284	284
R.m.s.d., bonds (Å)	0.0102	0.0100	0.0097	0.0184	0.0213	0.0090
R.m.s.d., angles (°)	2.00	2.00	2.02	2.81	3.13	1.82
Average B factors (Å²)
Main chain	13.56	13.38	13.34	13.73	13.54	13.53
Side chain and water	19.59	19.49	19.51	20.09	20.12	19.17
Ions	10.53	10.30	10.70	10.13	10.17	10.52
MPD	29.06	28.18	29.17	29.08	29.67	25.72
Ramachandran plot
Most favoured (%)	96.85	96.59	97.11	97.11	96.85	97.11
Allowed (%)	2.62	2.89	2.36	2.36	2.62	2.36
Outliers (%)	0.52	0.52	0.52	0.52	0.52	0.52

Calculated as [N/(N − 1)]1/2 × R int, where N is the data multiplicity.

Figure 11

Normalized scale factor against run/frame number from TWINABS for (a) cubic insulin and (b) glucose isomerase; domain 1 is coloured blue, domain 2 is in red and domain 3 (only for the triple twin of glucose isomerase) is in green. The corresponding crystal pictures demonstrate the correlation between crystal growth and different centring in the beam.

Figure 12

Part of the SHELXE map (a) and the final refined map (b) for glucose isomerase contoured at 1σ.

As in the case of insulin, all refinements against the different data sets are of similar quality. Again, the higher the multiplicity the better the models are. The difference between the HKLF 5 models and the model of the detwinned data is negligible, so there is no requirement for refinement against the HKLF 5 data. The scale-factor plots in Fig. 11 ▸ show little variation with rotation angle for cubic insulin [Fig. 11 ▸(a)] because the two interpenetrating crystals have virtually the same centres, but for the cluster of three glucose isomerase crystals [Fig. 11 ▸(b)] there are substantial variations, especially for the smallest crystal 3 (green) that is furthest from the beam centre.

Conclusions

The same procedures may be used for the treatment of non-merohedral twins in minerals, organometallic structures and proteins when the data are processed using the programs CELL_NOW, SAINT and TWINABS. CELL_NOW and SAINT are also incorporated into the Bruker APEX3 system. The resulting HKLF 4- and HKLF 5-format files can be used for structure solution and refinement with the SHELX and several other program systems. The detwinned HKLF 4 data are more widely applicable, but refinement against the composite reflections without detwinning using the HKLF 5 format may be slightly more accurate. If all domains are of similar quality and all of them are well centred in the beam, refinement against the HKLF 5 data should lead to the best results because the multiplicity is the highest. Quite often data from one domain might be of superior quality to those from other domains. In this case, only reflections with a contribution from that domain should be used for model refinement. However, in order to use R free the HKLF 4 format may be required. PDB reference: insulin, 6or0 PDB reference: glucose isomerase, 6oqz Crystal structure: contains datablock(s) chromite_4, chromite_5_1, chromite_5_2, chromite_5_12, zrti_4, zrti_5_1, zrti_5_2, zrti_5_12. DOI: 10.1107/S2059798319010179/rr5182sup1.cif Supporting information including Supplementary Figures and Tables. DOI: 10.1107/S2059798319010179/rr5182sup2.pdf Structure factors: contains datablock(s) chromite_4. DOI: 10.1107/S2059798319010179/rr5182chromite_4sup3.hkl Structure factors: contains datablock(s) chromite_5_1. DOI: 10.1107/S2059798319010179/rr5182chromite_5_1sup4.hkl Structure factors: contains datablock(s) chromite_5_2. DOI: 10.1107/S2059798319010179/rr5182chromite_5_2sup5.hkl Structure factors: contains datablock(s) chromite_5_12. DOI: 10.1107/S2059798319010179/rr5182chromite_5_12sup6.hkl Structure factors: contains datablock(s) zrti_4. DOI: 10.1107/S2059798319010179/rr5182zrti_4sup7.hkl Structure factors: contains datablock(s) zrti_5_1. DOI: 10.1107/S2059798319010179/rr5182zrti_5_1sup8.hkl Structure factors: contains datablock(s) zrti_5_2. DOI: 10.1107/S2059798319010179/rr5182zrti_5_2sup9.hkl Structure factors: contains datablock(s) zrti_5_12. DOI: 10.1107/S2059798319010179/rr5182zrti_5_12sup10.hkl Non-merohedral twinning: from minerals to proteins (insulin data set).: https://doi.org/10.18430/m3.irrmc.5325 Non-merohedral twinning: from minerals to proteins (isomerase data set).: https://doi.org/10.18430/m3.irrmc.5324 CCDC references: 1940918, 1940919, 1940920, 1940921, 1940922, 1940923, 1940924, 1940925

20 in total

1. Substructure solution with SHELXD.

Authors: Thomas R Schneider; George M Sheldrick
Journal: Acta Crystallogr D Biol Crystallogr Date: 2002-09-28

2. Use of multiple anomalous dispersion to phase highly merohedrally twinned crystals of interleukin-1beta.

Authors: Markus G Rudolph; Matthew S Kelker; Thomas R Schneider; Todd O Yeates; Vanessa Oseroff; David K Heidary; Patricia A Jennings; Ian A Wilson
Journal: Acta Crystallogr D Biol Crystallogr Date: 2003-01-23

3. Structural effects of radiation damage and its potential for phasing.

Authors: Sankaran Banumathi; Petrus H Zwart; Udupi A Ramagopal; Miroslawa Dauter; Zbigniew Dauter
Journal: Acta Crystallogr D Biol Crystallogr Date: 2004-05-21

4. Structure determination of the O-methyltransferase NovP using the 'free lunch algorithm' as implemented in SHELXE.

Authors: Isabel Usón; Clare E M Stevenson; David M Lawson; George M Sheldrick
Journal: Acta Crystallogr D Biol Crystallogr Date: 2007-09-19

5. Detecting and overcoming crystal twinning.

Authors: T O Yeates
Journal: Methods Enzymol Date: 1997 Impact factor: 1.600

6. A public database of macromolecular diffraction experiments.

Authors: Marek Grabowski; Karol M Langner; Marcin Cymborowski; Przemyslaw J Porebski; Piotr Sroka; Heping Zheng; David R Cooper; Matthew D Zimmerman; Marc André Elsliger; Stephen K Burley; Wladek Minor
Journal: Acta Crystallogr D Struct Biol Date: 2016-10-28 Impact factor: 7.652