Nicolas Foos, Mahmoud Rizk, Max H Nanao.
Abstract
The use of single isomorphous replacement (SIR) has become less widespread due to difficulties in sample preparation and in the identification of isomorphous native and derivative data sets. Non-isomorphism becomes even more problematic in serial experiments, because it adds natural inter-crystal non-isomorphism to heavy-atom-soaking-induced non-isomorphism. Here, a method is presented that can successfully address these issues (and indeed can benefit from differences in heavy-atom occupancy) and that additionally significantly simplifies the SIR experiment. A single heavy-atom soak into a microcrystalline slurry is performed, followed by automated serial data collection of partial data sets. This produces a set of data collections with a gradient of heavy-atom occupancies, which are reflected in differential merging statistics. These differences can be exploited by an optimized genetic algorithm to segregate the pool of data sets into ‘native’ and ‘derivative’ groups, which can then be used to successfully determine phases experimentally by SIR.
Keywords: genetic algorithms; machine learning; microcrystallography; serial crystallography; single isomorphous replacement
Year: 2022 PMID: 35647919 PMCID: PMC9159287 DOI: 10.1107/S2059798322003977
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 5.699
Data-collection parameters
| | Lysozyme Gd | Insulin I | Thermolysin Sm | Proteinase K Hg |
|---|---|---|---|---|
| Beam size (horizontal × vertical FWHM) (µm) | 7 × 5 | 7 × 5 | 7 × 5 | 7 × 5 |
| Photon flux (photons s⁻¹) | 8.3 × 10¹⁰ | 5 × 10¹¹ | 5 × 10¹¹ | 7 × 10¹⁰ |
| Exposure per image (s) | 0.1 | 0.03 | 0.01 | 0.03 |
| No. of images per data set | 100 | 100 | 100 | 100 |
| Oscillation range per image (°) | 0.1 | 0.1 | 0.1 | 0.1 |
| No. of partial data sets | 67 | 149 | 53 | 91 |
| Ring mode, current | 16 bunch, 74 mA | 4 bunch, 35 mA | 16 bunch, 84 mA | 7/8 multibunch, 195 mA |
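The per-crystal quantities implied by the table above can be derived with a few lines of arithmetic. The following is a minimal sketch: the numbers are transcribed from the table, while the dictionary layout, the function name `derived` and the derived-quantity names are illustrative assumptions rather than anything defined in the paper.

```python
# Values transcribed from the data-collection table. The structure and the
# names used here are hypothetical and chosen only for illustration.
PARAMS = {
    "Lysozyme Gd":     {"flux": 8.3e10, "exposure_s": 0.10, "images": 100, "osc_deg": 0.1},
    "Insulin I":       {"flux": 5.0e11, "exposure_s": 0.03, "images": 100, "osc_deg": 0.1},
    "Thermolysin Sm":  {"flux": 5.0e11, "exposure_s": 0.01, "images": 100, "osc_deg": 0.1},
    "Proteinase K Hg": {"flux": 7.0e10, "exposure_s": 0.03, "images": 100, "osc_deg": 0.1},
}


def derived(p):
    """Quantities that follow directly from one column of the table."""
    return {
        # Total rotation covered by one partial data set.
        "wedge_deg": p["images"] * p["osc_deg"],
        # Total beam-on time for one partial data set.
        "exposure_per_set_s": p["images"] * p["exposure_s"],
        # Photons delivered during a single image (flux x exposure time).
        "photons_per_image": p["flux"] * p["exposure_s"],
    }


if __name__ == "__main__":
    for name, p in PARAMS.items():
        d = derived(p)
        print(
            f"{name}: {d['wedge_deg']:.0f} deg wedge, "
            f"{d['exposure_per_set_s']:.0f} s per data set, "
            f"{d['photons_per_image']:.1e} photons per image"
        )
```

With 100 images of 0.1° each, every partial data set covers a 10° wedge, so a complete data set can only be obtained by merging many partial data sets.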
Figure 1. Program workflow for phasing. Data sets are collected from multiple crystals on a single support and indexed and integrated in XDS. These partial data sets are then submitted to CODGAS for grouping, and each group is submitted pairwise in both ‘directions’ to SHELXC/D/E for phasing.
Figure 2. Segregation of native and isomorphous data sets can be used for SIR phasing in lysozyme Gd (upper panel), proteinase K Hg (middle panel) and thermolysin Sm (lower panel). Algorithm progress is shown on the x axis and the partial CC is shown on the y axis. Representative electron density from SHELXE is shown on the right at 1.5σ. The figure was produced using ggplot2 (https://ggplot2.tidyverse.org/), R (https://www.r-project.org/) and PyMOL (Schrödinger).
Figure 3. (a) Improvement of phasing success with the introduction of an isomorphous term in the genetic algorithm fitness function for insulin I. The frequency of the CC of the partial model is shown for w_iso of 0, 1, 10, 100, 1000 and 10 000. Average chain lengths of <11 residues per chain are shown in red and those of ≥11 are shown in cyan. (b) Experimental electron density from SHELXE contoured at 1.5σ.
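The idea of a genetic algorithm that sorts partial data sets into two groups, with an isomorphous term weighted by w_iso in its fitness function, can be illustrated with a toy model. The sketch below is not CODGAS and does not reproduce its fitness function: the simulated intensities, the `between` and `within` terms, the minimum group size and all function names are assumptions made purely for illustration.

```python
import random

LABELS = (0, 1, 2)  # 0 = 'native', 1 = 'derivative', 2 = unassigned


def simulate(n_sets=20, n_refl=50, seed=0):
    """Toy data: each data set is native intensities plus occupancy-scaled differences."""
    rng = random.Random(seed)
    native = [rng.uniform(50.0, 150.0) for _ in range(n_refl)]
    delta = [rng.gauss(0.0, 15.0) for _ in range(n_refl)]   # heavy-atom contribution
    occupancies = [rng.random() for _ in range(n_sets)]     # hidden from the algorithm
    data = [[i + occ * d + rng.gauss(0.0, 2.0) for i, d in zip(native, delta)]
            for occ in occupancies]
    return data, occupancies


def fitness(labels, data, w_iso=1.0, min_size=3):
    """Hypothetical fitness: w_iso * between-group difference - within-group scatter."""
    n_refl = len(data[0])
    groups = [[k for k, g in enumerate(labels) if g == side] for side in (0, 1)]
    if min(len(g) for g in groups) < min_size:
        return float("-inf")  # reject partitions with a nearly empty group
    means = [[sum(data[k][j] for k in g) / len(g) for j in range(n_refl)] for g in groups]
    between = sum(abs(a - b) for a, b in zip(means[0], means[1])) / n_refl
    within = sum(
        sum(abs(data[k][j] - m[j]) for k in g for j in range(n_refl)) / (len(g) * n_refl)
        for g, m in zip(groups, means)
    )
    return w_iso * between - within


def evolve(data, w_iso=1.0, pop_size=30, generations=40, mutation_rate=0.05, seed=1):
    """Elitist genetic algorithm over label assignments; returns best labels and history."""
    rng = random.Random(seed)
    n = len(data)

    def score(chromosome):
        return fitness(chromosome, data, w_iso)

    pop = [[rng.choice(LABELS) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        elite = pop[: pop_size // 2]            # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)           # single-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.choice(LABELS) if rng.random() < mutation_rate else g
                     for g in child]            # random relabelling as mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    history.append(score(best))
    return best, history


if __name__ == "__main__":
    data, occupancies = simulate()
    best, history = evolve(data, w_iso=10.0)
    print("best fitness:", round(history[0], 2), "->", round(history[-1], 2))
    for side, name in ((0, "group 0"), (1, "group 1")):
        occ = [occupancies[k] for k, g in enumerate(best) if g == side]
        print(name, len(occ), "sets, mean simulated occupancy", round(sum(occ) / len(occ), 2))
```

In this toy model the two group labels are interchangeable, since nothing in the fitness distinguishes the lower-occupancy group from the higher-occupancy one. This is consistent with the workflow in Figure 1, where each pair of groups is submitted to SHELXC/D/E in both ‘directions’.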
Values in parentheses are for the outer shell. Note that some partial data sets were not assigned to either native or derivative groups.
| | Lysozyme Gd | | | Insulin I | | |
|---|---|---|---|---|---|---|
| | All | Native | Derivative | All | Native | Derivative |
| Wavelength (Å) | 0.873 | 0.873 | 0.873 | 0.873 | 0.873 | 0.873 |
| Resolution range (Å) | 39.02–1.50 (1.55–1.50) | 39.02–1.50 (1.55–1.50) | 39.02–1.50 (1.55–1.50) | 55.47–1.60 (1.66–1.60) | 55.47–1.60 (1.66–1.60) | 55.47–1.60 (1.66–1.60) |
| Space group | | | | | | |
| Unit-cell average and [range] | ||||||
| a (Å) | 77.97 [76.95–78.50] | 78.04 [77.53–78.23] | 78.04 [77.63–78.33] | 78.53 [78.06–78.74] | 78.46 [78.06–78.61] | 78.56 [78.36–78.74] |
| b (Å) | 77.97 [76.95–78.50] | 78.04 [77.53–78.23] | 78.04 [77.63–78.33] | 78.53 [78.06–78.74] | 78.46 [78.06–78.61] | 78.56 [78.36–78.74] |
| c (Å) | 38.46 [37.80–38.95] | 38.57 [38.02–38.95] | 38.57 [37.80–38.76] | 78.53 [78.06–78.74] | 78.46 [78.06–78.61] | 78.56 [78.36–78.74] |
| α (°) | 90 | 90 | 90 | 90 | 90 | 90 |
| β (°) | 90 | 90 | 90 | 90 | 90 | 90 |
| γ (°) | 90 | 90 | 90 | 90 | 90 | 90 |
| Total no. of reflections | 912946 (86831) | 328389 (31081) | 467234 (44672) | 1690273 (169635) | 328587 (33071) | 476936 (47848) |
| No. of unique reflections | 19625 (1917) | 19623 (1917) | 19625 (1917) | 10773 (1077) | 10773 (1077) | 10773 (1077) |
| Multiplicity | 46.52 (45.30) | 16.73 (16.21) | 23.81 (23.30) | 156.90 (157.51) | 30.50 (30.71) | 44.27 (44.43) |
| Completeness (%) | 100.00 (100.00) | 99.99 (100.00) | 100.00 (100.00) | 100.00 (100.00) | 100.00 (100.00) | 100.00 (100.00) |
| ⟨I/σ(I)⟩ | 10.9 (1.3) | 6.7 (0.6) | 10.1 (1.1) | 54.8 (12.0) | 24.1 (4.5) | 29.1 (6.4) |
| Wilson B factor (Å²) | 12.45 | 12.45 | 12.45 | 13.17 | 13.17 | 13.17 |
| R_merge | 0.370 (7.501) | 0.344 (7.020) | 0.261 (4.021) | 0.110 (0.956) | 0.094 (0.912) | 0.119 (0.989) |
| R_meas | 0.374 (7.585) | 0.355 (7.248) | 0.267 (4.111) | 0.110 (0.959) | 0.096 (0.928) | 0.120 (1.000) |
| R_p.i.m. | 0.054 (1.114) | 0.086 (1.778) | 0.054 (0.841) | 0.009 (0.076) | 0.018 (0.166) | 0.018 (0.149) |
| CC1/2 | 0.998 (0.520) | 0.996 (0.227) | 0.999 (0.459) | 1.000 (0.991) | 0.995 (0.944) | 0.999 (0.970) |
| Partial data-set statistics | ||||||
| No. of partial data sets | 67 | 24 | 34 | 149 | 29 | 42 |
| Average completeness (%) | 44.21 (42.96) | 44.3 (42.53) | 44.38 (43.62) | 67.07 (66.93) | 67.86 (67.90) | 67.20 (66.27) |
| Average ⟨I/σ(I)⟩ | 2.72 (0.19) | 2.68 (0.15) | 3.19 (0.28) | 7.05 (1.12) | 7.19 (1.00) | 6.70 (1.10) |
| Average | 0.52 (1.75) | 0.33 (3.91) | 0.35 (7.88) | 0.09 (0.80) | 0.08 (0.96) | 0.11 (0.49) |
| Average CC1/2 | 0.92 (0.05) | 0.97 (0.05) | 0.95 (0.09) | 0.99 (0.50) | 1.00 (0.47) | 0.98 (0.49) |
| | Thermolysin Sm | | | Proteinase K Hg | | |
|---|---|---|---|---|---|---|
| | All | Native | Derivative | All | Native | Derivative |
| Wavelength (Å) | 0.873 | 0.873 | 0.873 | 0.873 | 0.873 | 0.873 |
| Resolution range (Å) | 80.74–1.60 (1.66–1.60) | 80.74–1.60 (1.66–1.60) | 80.74–1.60 (1.66–1.60) | 57.43–1.40 (1.45–1.40) | 57.43–1.40 (1.45–1.40) | 57.43–1.40 (1.45–1.40) |
| Space group | | | | | | |
| Unit-cell average and [range] | ||||||
| a (Å) | 93.10 [92.24–93.48] | 92.96 [92.94–93.02] | 93.10 [92.93–93.20] | 67.95 [67.58–68.22] | 67.93 [67.76–68.03] | 67.93 [67.79–68.06] |
| b (Å) | 93.10 [92.24–93.48] | 92.96 [92.94–93.02] | 93.10 [92.93–93.20] | 67.95 [67.58–68.22] | 67.93 [67.76–68.03] | 67.93 [67.79–68.06] |
| c (Å) | 129.33 [127.60–130.88] | 129.04 [129.03–129.06] | 129.05 [128.84–129.29] | 107.60 [106.17–108.51] | 107.57 [106.89–108.12] | 107.66 [107.09–107.97] |
| α (°) | 90 | 90 | 90 | 90 | 90 | 90 |
| β (°) | 90 | 90 | 90 | 90 | 90 | 90 |
| γ (°) | 120 | 120 | 120 | 90 | 90 | 90 |
| Total no. of reflections | 2337994 (232519) | 134562 (13566) | 223412 (22425) | 3190131 (301988) | 772620 (72946) | 561441 (53061) |
| No. of unique reflections | 44378 (4362) | 31848 (3133) | 43712 (4315) | 50269 (4940) | 50238 (4939) | 50243 (4938) |
| Multiplicity | 52.68 (53.31) | 4.23 (4.33) | 5.11 (5.20) | 63.46 (61.13) | 15.38 (14.77) | 11.17 (10.75) |
| Completeness (%) | 100.00 (100.00) | 71.76 (71.82) | 98.49 (98.92) | 100.00 (100.00) | 99.94 (99.98) | 99.95 (99.96) |
| ⟨I/σ(I)⟩ | 10.6 (2.4) | 7.2 (1.5) | 6.3 (1.5) | 9.9 (2.3) | 6.4 (1.3) | 5.0 (1.0) |
| Wilson B factor (Å²) | 13.87 | 13.87 | 13.87 | 13.20 | 13.20 | 13.20 |
| R_merge | 0.436 (3.548) | 0.106 (0.775) | 0.141 (0.910) | 0.553 (3.049) | 0.362 (2.055) | 0.393 (2.305) |
| R_meas | 0.441 (3.582) | 0.121 (0.883) | 0.157 (1.009) | 0.558 (3.074) | 0.375 (2.128) | 0.412 (2.422) |
| R_p.i.m. | 0.061 (0.486) | 0.056 (0.411) | 0.066 (0.423) | 0.070 (0.391) | 0.094 (0.544) | 0.123 (0.732) |
| CC1/2 | 0.997 (0.790) | 0.996 (0.583) | 0.993 (0.618) | 0.995 (0.806) | 0.993 (0.570) | 0.987 (0.402) |
| Partial data-set statistics | ||||||
| No. of partial data sets | 53 | 3 | 5 | 91 | 22 | 16 |
| Average completeness (%) | 57.82 (58.78) | 68.82 (69.10) | 56.52 (57.76) | 48.05 (47.03) | 47.49 (47.05) | 50.84 (48.80) |
| Average ⟨I/σ(I)⟩ | 2.58 (0.38) | 4.53 (0.93) | 4.34 (0.96) | 1.86 (0.31) | 2.30 (0.40) | 2.16 (0.36) |
| Average | 0.33 (4.22) | 0.13 (0.84) | 0.15 (1.01) | 0.51 (4.68) | 0.42 (1.03) | 0.45 (0.33) |
| Average CC1/2 | 0.96 (0.16) | 0.99 (0.51) | 0.99 (0.46) | 0.90 (0.14) | 0.94 (0.18) | 0.93 (0.16) |
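Two quantities in the statistics tables can be cross-checked from the other rows: the overall multiplicity is the total number of reflections divided by the number of unique reflections, and the number of partial data sets left unassigned is the ‘All’ count minus the ‘Native’ and ‘Derivative’ counts. The sketch below transcribes the overall (not outer-shell) values from the tables; the data layout and function names are illustrative assumptions.

```python
# (total reflections, unique reflections, reported multiplicity, no. of partial data sets)
# Overall values transcribed from the statistics tables; the layout is hypothetical.
STATS = {
    "Lysozyme Gd": {
        "All":        (912946, 19625, 46.52, 67),
        "Native":     (328389, 19623, 16.73, 24),
        "Derivative": (467234, 19625, 23.81, 34),
    },
    "Insulin I": {
        "All":        (1690273, 10773, 156.90, 149),
        "Native":     (328587, 10773, 30.50, 29),
        "Derivative": (476936, 10773, 44.27, 42),
    },
    "Thermolysin Sm": {
        "All":        (2337994, 44378, 52.68, 53),
        "Native":     (134562, 31848, 4.23, 3),
        "Derivative": (223412, 43712, 5.11, 5),
    },
    "Proteinase K Hg": {
        "All":        (3190131, 50269, 63.46, 91),
        "Native":     (772620, 50238, 15.38, 22),
        "Derivative": (561441, 50243, 11.17, 16),
    },
}


def multiplicity(total, unique):
    """Average number of observations per unique reflection."""
    return total / unique


def unassigned(groups):
    """Partial data sets that ended up in neither the native nor the derivative group."""
    return groups["All"][3] - groups["Native"][3] - groups["Derivative"][3]


if __name__ == "__main__":
    for sample, groups in STATS.items():
        for group, (total, unique, reported, _) in groups.items():
            print(f"{sample} {group}: computed {multiplicity(total, unique):.2f}, "
                  f"reported {reported:.2f}")
        print(f"{sample}: {unassigned(groups)} partial data sets unassigned")
```

The unassigned counts make the note above the tables concrete: a substantial share of the partial data sets in each experiment contributes to neither group.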