Airlie J. McCoy, Massimo D. Sammito, Randy J. Read.
Abstract
The AlphaFold2 results in the 14th edition of the Critical Assessment of Structure Prediction (CASP14) showed that accurate (low root-mean-square deviation) in silico models of protein structure domains are on the horizon, whether or not the protein is related to known structures through high-coverage sequence similarity. As highly accurate models, generated by harnessing correlated mutations and deep learning, become available, one aspect of structural biology that will be affected is crystallographic phasing. Here, the data from CASP14 are used to explore the prospects for changes in phasing methods, in particular for molecular-replacement phasing using in silico models.
Keywords: AlphaFold2; crystallographic phase problem; molecular replacement; protein structure prediction
Year: 2022 PMID: 34981757 PMCID: PMC8725160 DOI: 10.1107/S2059798321012122
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
Figure 1. Histogram of the distribution of structures in the PDB by resolution. The relationship between the resolution of the data and the size of the models that are appropriate for molecular replacement is indicated.
Figure 2. Illustration of the relationship between r.m.s.d., f and LLG per reflection (equation 2). (a) LLG for f = 0.8, linear scale. (b) LLG for f = 0.8, logarithmic scale. (c) Contour plot showing LLG for r.m.s.d. versus f, linear scale. (d) Contour plot showing LLG for r.m.s.d. versus f, logarithmic scale. The value f = 0.8 is shown with an orange line on the contour plots.
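The dependence shown in Figure 2 can be explored numerically. The sketch below assumes the approximation often quoted in likelihood-based molecular replacement, in which an acentric reflection contributes an expected LLG of about σ_A⁴/2, with σ_A = √f · exp(−2π² rmsd²/(3d²)) for a model that accounts for a fraction f of the scattering, has coordinate error rmsd (Å), and is evaluated at resolution d (Å). This form is an assumption: equation 2 of the paper may include further terms (for example for solvent or measurement error), and the function names are hypothetical.

```python
import math


def sigma_a(rmsd: float, f: float, d: float) -> float:
    """Approximate sigma_A for a model (hypothetical helper).

    rmsd: coordinate r.m.s.d. of the model from the true structure (Å)
    f:    assumed fraction of the total scattering accounted for by the model
    d:    resolution of the reflection (Å)
    Assumes isotropic Gaussian coordinate errors, which give the damping
    factor exp(-2*pi^2*rmsd^2 / (3*d^2)); solvent and B-factor terms are ignored.
    """
    if rmsd < 0 or not 0 < f <= 1 or d <= 0:
        raise ValueError("need rmsd >= 0, 0 < f <= 1 and d > 0")
    return math.sqrt(f) * math.exp(-2.0 * math.pi ** 2 * rmsd ** 2 / (3.0 * d ** 2))


def ellg_per_reflection(rmsd: float, f: float, d: float) -> float:
    """Expected LLG for one acentric reflection, assuming eLLG ~ sigma_A**4 / 2."""
    return sigma_a(rmsd, f, d) ** 4 / 2.0
```

Because σ_A⁴ contains exp(−8π² rmsd²/(3d²)), the expected gain depends on rmsd only through rmsd/d and falls off steeply as the model error approaches the resolution of the data; for example, `ellg_per_reflection(1.0, 0.8, 3.0)` is far larger than `ellg_per_reflection(2.0, 0.8, 3.0)`.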
The 34 crystal structures included in CASP14 and the targets associated with each crystal structure
| Crystal No. | Target No. | CASP target | Residues | CASP domain | Residues | ‘Multidom’ domain(s) | Residues |
|---|---|---|---|---|---|---|---|
| 1 | 1 | T1024 | 408 | T1024-D1 | 193 | | |
| | | | | T1024-D2 | 204 | | |
| 2 | 2 | T1030 | 273 | T1030-D0 | (273) | T1030-D1 | 154 |
| | | | | | | T1030-D2 | 119 |
| 3 | 3 | T1031 | 95 | T1031-D1 | (95) | | |
| | 4 | T1033 | 100 | T1033-D1 | (100) | | |
| | 5 | T1035 | 102 | T1035-D1 | (102) | | |
| | 6 | T1037 | 404 | T1037-D1 | (404) | | |
| | 7 | T1039 | 161 | T1039-D1 | (161) | | |
| | 8 | T1040 | 130 | T1040-D1 | (130) | | |
| | 9 | T1041 | 242 | T1041-D1 | (242) | | |
| | 10 | T1042 | 289 | T1042-D1 | 276 | | |
| | 11 | T1043 | 148 | T1043-D1 | (148) | | |
| 4 | 12 | T1032 | 284 | T1032-D1 | 169 | | |
| 5 | 13 | T1034 | 156 | T1034-D1 | (156) | | |
| 6 | 14 | T1038 | 199 | T1038-D0 | 190 | T1038-D1 | 114 |
| | | | | | | T1038-D2 | 76 |
| 7 | 15 | T1046s1 | 216 | T1046s1-D1 | 72 | | |
| | 16 | T1046s2 | 216 | T1046s2-D1 | 141 | | |
| 8 | 17 | T1048† | | | | | |
| 9 | 18 | T1049 | 141 | T1049-D1 | 134 | | |
| 10 | 19 | T1050 | 779 | T1050-D1 | 321 | | |
| | | | | T1050-D2 | 316 | | |
| | | | | T1050-D3 | 128 | | |
| 11 | 20 | T1052 | 832 | T1052-D0 | (832) | | |
| 12 | 21 | T1053 | 580 | T1053-D0 | 576 | T1053-D1 | 405 |
| | | | | | | T1053-D2 | 171 |
| 13 | 22 | T1054 | 190 | T1054-D1 | 143 | | |
| 14 | 23 | T1056 | 186 | T1056-D1 | 169 | | |
| 15 | 24 | T1058 | 382 | T1058-D0 | (382) | T1058-D1 | 221 |
| | | | | | | T1058-D2 | 161 |
| 16 | 25 | T1064 | 106 | T1064-D1 | 92 | | |
| 17 | 26 | T1065s1 | 225 | T1065s1-D1 | 11 | | |
| | 27 | T1065s2 | 225 | T1065s2-D1 | 98 | | |
| 18 | 28 | T1067 | 220 | T1067-D1 | (221) | | |
| 19 | 29 | T1070 | 335 | T1070-D1 | 76 | | |
| | | | | T1070-D2 | 101 | | |
| | | | | T1070-D3 | 76 | | |
| | | | | T1070-D4 | 68 | | |
| 20 | 30 | T1073 | 58 | T1073-D1 | (59) | | |
| 21 | 31 | T1074 | 131 | T1074-D1 | (132) | | |
| 22 | 32 | T1079 | 483 | T1079-D1 | 451 | | |
| 23 | 33 | T1080 | 137 | T1080-D1 | 133 | | |
| 24 | 34 | T1082 | 97 | T1082-D1 | 75 | | |
| 25 | 35 | T1083 | 196 | T1083-D1 | 92 | | |
| 26 | 36 | T1084 | 146 | T1084-D1 | 71 | | |
| 27 | 37 | T1085 | 588 | T1085-D0 | 406 | T1085-D1 | 167 |
| | | | | | | T1085-D2 | 182 |
| | | | | | | T1085-D3 | 57 |
| 28 | 38 | T1086 | 408 | T1086-D0 | 381 | T1086-D1 | 193 |
| | | | | | | T1086-D2 | 188 |
| 29 | 39 | T1087 | 186 | T1087-D1 | 93 | | |
| 30 | 40 | T1089 | 404 | T1089-D1 | 377 | | |
| 31 | 41 | T1090 | 193 | T1090-D1 | 191 | | |
| 32 | 42 | T1091 | 863 | T1091-D1 | 139 | | |
| | | | | T1091-D2 | 107 | | |
| | | | | T1091-D3 | 106 | | |
| | | | | T1091-D4 | 112 | | |
| 33 | 43 | T1100 | 338 | T1100-D1 | 171 | | |
| | | | | T1100-D2 | 166 | | |
| 34 | 44 | T1101 | 318 | T1101-D0 | 307 | T1101-D1 | 83 |
| | | | | | | T1101-D2 | 224 |

† Cancelled.
Figure 3. Classifications and accuracy for the 34 crystal structures in CASP14. (a) Proportions of the modelling categories FM (free modelling) and TBM (template-based modelling); PDB entry 6vr4 was counted as a single FM target. (b, c) Histograms of the distributions of (b) LGA_S and (c) RMSD for all five submitted AlphaFold2 models of the 44 crystallographic targets of interest.
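The RMSD values summarized in Figure 3(c) and in the table below measure coordinate error after a model is superposed on the crystal structure. As a rough illustration only, the sketch below computes an RMSD over already-matched coordinates (for example Cα atoms) after optimal rigid-body superposition with the Kabsch algorithm. It assumes the residue correspondence is known; it does not reproduce the LGA superposition or the LGA_S score used by CASP, so its values need not match the tabulated ones, and the function name is hypothetical.

```python
import numpy as np


def superposed_rmsd(model: np.ndarray, target: np.ndarray) -> float:
    """RMSD between matched coordinates after optimal superposition (Kabsch).

    model, target: (N, 3) arrays of corresponding atoms, in Å.
    """
    model = np.asarray(model, dtype=float)
    target = np.asarray(target, dtype=float)
    if model.shape != target.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched coordinates")
    # Remove the translational component by centring both sets on their centroids.
    p = model - model.mean(axis=0)
    q = target - target.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so that the result is a proper rotation, not a reflection.
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Restricting the fit to a proper rotation matters: a mirror image of a model cannot be superposed onto the original, so it correctly yields a large RMSD.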
Best models for the targets for the 34 crystal structures included in CASP14
The model is given as (CASP group No.)_(ranked model No.). Values in square brackets are the LGA_S and RMSD (Å). Group Nos.: 427, AlphaFold2; 013, FEIG-S; 029, Venclovas; 071, Kiharalab; 080, FOLDYNE; 081, MUFOLD; 129, Zhang; 132, PBuild; 140, Yang-Server; 217, CAO-QA1; 259, AWSEM-CHEN; 288, DATE; 334, FEIG-R3; 337, CATHER; 392, trfold; 403, BAKER-experimental; 473, BAKER; 480, FEIG-R2.
| Crystal No. | Target(s) in ASU | Best AlphaFold2 model by LGA_S | Best model overall by LGA_S | Best AlphaFold2 model by RMSD | Best model overall by RMSD |
|---|---|---|---|---|---|
| 1 | T1024 | 427_3 [87.5, 1.83] | 427_3 | 427_1 [58.8, 1.60] | 427_1 |
| 2 | T1030 | 427_2 [62.0, 1.82] | 427_2 | 427_2 | 013_2 [39.2, 1.27] |
| 3 | T1031 | 427_2 [94.0, 1.12] | 427_2 | 427_4 [93.7, 0.98] | 427_4 |
| | T1033 | 427_1 [93.3, 1.39] | 427_1 | 427_3 [92.5, 1.36] | 259_4 [39.3, 1.29] |
| | T1035 | 427_5 [99.0, 0.81] | 427_5 | 427_5 | 427_5 |
| | T1037 | 427_4 [95.4, 1.12] | 427_4 | 427_5 [93.7, 1.11] | 427_5 |
| | T1039 | 427_1 [86.3, 1.61] | 427_1 | 427_1 | 071_1 [33.5, 1.17] |
| | T1040 | 427_1 [77.5, 1.95] | 427_1 | 427_2 [76.3, 1.90] | 140_1 [16.4, 1.31] |
| | T1041 | 427_1 [94.7, 1.21] | 427_1 | 427_1 | 427_1 |
| | T1042 | 427_3 [93.8, 1.22] | 427_3 | 427_5 [93.4, 1.21] | 427_5 |
| | T1043 | 427_3 [90.2, 1.42] | 427_3 | 427_1 [90.0, 1.41] | 427_1 |
| 4 | T1032 | 427_3 [71.1, 1.67] | 427_3 | 427_1 [70.1, 1.65] | 427_1 |
| 5 | T1034 | 427_1 [96.9, 1.00] | 427_1 | 427_2 [95.7, 0.87] | 427_2 |
| 6 | T1038 | 427_2 [91.9, 1.17] | 427_2 | 427_2 | 427_2 |
| 7 | T1046s1 | 427_4 [98.1, 0.68] | 427_4 | 427_1 [98.1, 0.64] | 427_1 |
| | T1046s2 | 427_1 [98.9, 0.69] | 427_1 | 427_1 | 427_1 |
| 8 | T1048† | | | | |
| 9 | T1049 | 427_1 [95.3, 0.82] | 427_1 | 427_1 | 427_1 |
| 10 | T1050 | 427_1 [93.3, 1.26] | 427_1 | 427_1 | 427_1 |
| 11 | T1052 | 427_4 [63.4, 1.17] | 427_4 | 427_5 [62.9, 1.14] | 337_5 [45.5, 1.13] |
| 12 | T1053 | 427_3 [96.9, 0.98] | 427_3 | 427_3 | 427_3 |
| 13 | T1054 | 427_3 [93.7, 0.84] | 427_3 | 427_2 [93.0, 0.81] | 029_1 [49.7, 0.76] |
| 14 | T1056 | 427_2 [99.3, 0.66] | 427_2 | 427_2 | 427_2 |
| 15 | T1058 | 427_3 [93.7, 1.25] | 427_3 | 427_3 | 427_3 |
| 16 | T1064 | 427_1 [92.6, 1.34] | 427_1 | 427_2 [91.0, 1.31] | 427_2 |
| 17 | T1065s1 | 427_2 [98.4, 0.91] | 427_2 | 427_4 [97.8, 0.85] | 427_4 |
| | T1065s2 | 427_1 [99.5, 0.60] | 427_1 | 427_1 | 427_1 |
| 18 | T1067 | 427_3 [92.9, 0.86] | 427_3 | 427_3 | 427_3 |
| 19 | T1070 | 427_5 [45.0, 1.69] | 427_3 | 427_3 [41.2, 1.52] | 334_1 [30.4, 1.22] |
| 20 | T1073 | 427_3 [86.7, 1.76] | 288_4 [95.2, 1.47] | 427_5 [85.5, 1.41] | 217_3 [83.8, 0.97] |
| 21 | T1074 | 427_2 [93.7, 1.15] | 427_2 | 427_4 [92.6, 1.06] | 427_4 |
| 22 | T1079 | 427_4 [96.7, 1.05] | 427_4 | 427_2 [96.7, 1.03] | 427_2 |
| 23 | T1080 | 427_4 [91.6, 1.37] | 427_4 | 427_4 | 427_4 |
| 24 | T1082 | 427_1 [97.9, 0.88] | 427_1 | 427_2 [97.3, 0.88] | 427_2 |
| 25 | T1083 | 427_4 [91.9, 1.09] | 427_4 | 427_4 | 392_2 [88.7, 0.96] |
| 26 | T1084 | 427_5 [94.6, 0.85] | 129_3 [95.1, 1.39] | 427_4 [94.0, 0.77] | 480_4 [92.6, 0.60] |
| 27 | T1085 | 427_1 [88.2, 1.86] | 427_1 | 427_1 | 473_4 [32.5, 1.45] |
| 28 | T1086 | 427_1 [89.6, 1.78] | 427_1 | 427_4 [88.7, 1.64] | 080_1 [53.0, 1.60] |
| 29 | T1087 | 427_3 [97.0, 0.63] | 427_3 | 427_2 [96.8, 0.57] | 081_3 [40.3, 0.38] |
| 30 | T1089 | 427_2 [99.0, 0.71] | 427_2 | 427_2 | 427_2 |
| 31 | T1090 | 427_3 [95.4, 1.16] | 427_3 | 427_1 [92.1, 1.04] | 427_1 |
| 32 | T1091 | 427_2 [79.8, 2.00] | 427_2 | 427_5 [77.0, 1.96] | 403_2 [27.3, 1.47] |
| 33 | T1100 | 427_2 [90.3, 1.68] | 427_2 | 427_4 [55.9, 1.06] | 132_2 [18.2, 0.81] |
| 34 | T1101 | 427_4 [92.0, 1.17] | 427_4 | 427_3 [91.9, 1.15] | 427_3 |

† Cancelled.
Summary of phasing of the 34 crystal structures of interest in CASP14 with AlphaFold2 models
Crystals and targets listed in bold are discussed in the text.
| Crystal | Target | PDB code | Resolution (Å) | No. of reflections | Copies | Domains | Filter | TFZ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | T1024 | | 2.9 | 12686 | 1 | 1 | No | 24.7 | 0.52 | 0.42 |
| 2 | T1030 | | | | | | | | | |
| 3 | T1031 | 6vr4 | 3.5 | 92907 | 2 | | | | | |
| | T1033 | | | | | | | | | |
| | T1035 | | | | | | | | | |
| | T1037 | | | | | | | | | |
| | T1039 | | | | | | | | | |
| | T1040 | | | | | | | | | |
| | T1041 | | | | | | | | | |
| | T1042 | | | | | | | | | |
| | T1043 | | | | | | | | | |
| 4 | T1032 | | | | | | | | | |
| 5 | T1034 | | 2.1 | 47702 | 4 | 1 | No | 31.9 | 0.45 | |
| 6 | T1038 | | 2.5 | 20426 | 3 | 1 | No | 24.3 | 0.35 | |
| 7 | T1046s1 | | 1.7 | 69112 | 2 | 2 | No | 31.8 | 0.35 | |
| | T1046s2 | | | | | | | | | |
| 8 | T1048 | | | | | | | | | |
| 9 | T1049 | | 1.8 | 12228 | 1 | 1 | No | 23.7 | 0.34 | |
| 10 | T1050 | | 2.7 | 97731 | 3 | 1 | No | 39.0 | 0.30 | |
| 11 | T1052 | | 2.0 | 88914 | 2 | 1 | No | 48.6 | 0.43 | |
| 12 | T1053 | | 3.2 | 49627 | 4 | 1 | No | 51.0 | 0.35 | |
| 13 | T1054 | | 1.7 | 25547 | 1 | 1 | No | 33.2 | 0.34 | |
| 14 | T1056 | | 2.3 | 17863 | 2 | 1 | No | 19.7 | 0.37 | |
| 15 | T1058 | | 3.1 | 20228 | 2 | 1 | No | 25.5 | 0.44 | |
| 16 | T1064 | | 2.0 | 16787 | 2 | 1 | No | 19.1 | 0.41 | |
| 17 | T1065s1 | | 1.6 | 35695 | 1 | 2 | No | 48.7 | 0.22 | |
| | T1065s2 | | | | | | | | | |
| 18 | T1067 | | 1.4 | 51025 | 1 | 1 | No | 57.2 | 0.26 | |
| 19 | T1070 | | 2.5 | 25412 | 1 | 1 | No | 6.8 | 0.49 | 0.42 |
| 20 | T1073 | | | | | | | | | |
| 21 | T1074 | | 1.5 | 25800 | 1 | 1 | No | 21.1 | 0.29 | |
| 22 | T1079 | | 3.2 | 47985 | 4 | 1 | No | 33.6 | 0.38 | |
| 23 | T1080 | | | | | | | | | |
| 24 | T1082 | | 1.1 | 97672 | 2 | 1 | No | 33.6 | 0.44 | 0.31 |
| 25 | T1083 | | 1.3 | 81236 | 4 | 1 | 0.6 | 30.0 | 0.48 | 0.35 |
| 26 | T1084 | | 1.9 | 23901 | 3 | 1 | No | 32.1 | 0.38 | |
| 27 | T1085 | | 2.5 | 10758 | 1 | 3 | No | 22.0 | 0.40 | |
| 28 | T1086 | | 2.3 | 21887 | 1 | 1 | No | 18.7 | 0.41 | |
| 29 | T1087 | | 1.4 | 69617 | 4 | 1 | No | 43.2 | 0.25 | |
| 30 | T1089 | | 2.2 | 55192 | 2 | 1 | No | 63.5 | 0.29 | |
| 31 | T1090 | | 1.8 | 22947 | 1 | 1 | No | 27.2 | 0.29 | |
| 32 | T1091 | | 2.2 | 62789 | 1 | 4 | No | 22.5 | 0.44 | |
| 33 | T1100 | | | | | | | | | |
| 34 | T1101 | | 1.4 | 58030 | 1 | 1 | No | 16.2 | 0.34 | |
Number of copies in the asymmetric unit.
Threshold for predicted RMSD below which residues were pruned from the model.
Translation-function Z-score, a measure of significance of the MR solution.
Partial solution.
No solution.
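The translation-function Z-score (TFZ) expresses how far the top peak of a translation search stands above the background of the search. A conceptual sketch, assuming the Z-score is taken relative to the mean and standard deviation of all sampled translation-function values, is given below. Phaser's own statistics (how the background is sampled, and how refined solutions are rescored) differ in detail, and the function name is hypothetical.

```python
import numpy as np


def translation_z_score(scores) -> float:
    """Z-score of the highest translation-function value (hypothetical helper).

    scores: translation-function values over the sampled positions (any shape).
    Returns (max - mean) / standard deviation, using the population standard
    deviation of all values, including the peak itself.
    """
    values = np.asarray(scores, dtype=float).ravel()
    if values.size < 2:
        raise ValueError("need at least two translation-function values")
    sd = values.std()
    if sd == 0.0:
        raise ValueError("translation function is flat; Z-score undefined")
    return float((values.max() - values.mean()) / sd)
```

For instance, a search whose values are mostly noise but contain one strong peak gives a large Z-score, while a search with no distinct peak gives a small one; this is why a low TFZ, such as the 6.8 reported for T1070, flags a solution as less certain.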
Figure 4. Crystal 3, PDB entry 6vr4, targets T1031 (1, red), T1033 (2, orange), T1035 (3, yellow), T1037 (4, khaki), T1039 (5, green), T1040 (6, blue), T1041 (7, purple), T1042 (8, magenta) and T1043 (9, violet). Two targets are discontinuous in the primary sequence. (a) Structure with targets highlighted; regions not corresponding to a target are shown in grey. The figure was created with Mol* (Sehnal et al., 2021). (b) Sequence with targets highlighted; regions that are not highlighted were not included in targets.
Phasing of crystal 3 (PDB entry 6vr4) with AlphaFold2 models
For CASP14, the single polypeptide chain of the target sequence was divided into nine assessment domains. There were two copies of the target sequence in the crystallographic asymmetric unit. Two copies each of six targets were found by molecular replacement and given chain identifiers A–L in order of placement. Targets T1031, T1040 and T1043 could not be placed.
| No. | Model | Copies | Type | Residues | R.m.s.d. (Å) | Chain |
|---|---|---|---|---|---|---|
| 1 | T1031TS427_1-D1 | 2 | FM | 95 | 2.91 | |
| 2 | T1033TS427_1-D1 | 2 | FM | 100 | 1.58 | |
| 3 | T1035TS427_1-D1 | 2 | FM/TBM | 102 | 0.81 | |
| 4 | T1037TS427_1-D1 | 2 | FM | 404 | 1.25 | |
| 5 | T1039TS427_1-D1 | 2 | FM | 161 | 2.50 | |
| 6 | T1040TS427_1-D1 | 2 | FM | 130 | 2.76 | |
| 7 | T1041TS427_1-D1 | 2 | FM | 242 | 1.70 | |
| 8 | T1042TS427_1-D1 | 2 | FM | 276 | 1.79 | |
| 9 | T1043TS427_1-D1 | 2 | FM | 148 | 2.46 | |
Figure 5. Phasing methods since 2000, as recorded in the ‘structure determination method’ field in the PDB: SAD (single-wavelength anomalous dispersion), MAD (multi-wavelength anomalous dispersion), IR (isomorphous replacement), FS (Fourier synthesis) and MR (molecular replacement). For each submission counted, both the deposition and release dates fell within the five-year period shown. (a) All methods as a percentage of PDB submissions per period. (b) Experimental phasing methods as the number of PDB submissions per period.