Xavier Robin, Juergen Haas, Rafal Gumienny, Anna Smolinski, Gerardo Tauriello, Torsten Schwede.
Abstract
The Continuous Automated Model EvaluatiOn (CAMEO) platform complements the biennial CASP experiment by conducting fully automated blind evaluations of three-dimensional protein structure prediction servers, based on the weekly prerelease of sequences of structures that will be published in the upcoming release of the Protein Data Bank. While CASP14 showed significant success in predicting the structures of individual protein chains with high accuracy, significant challenges remain in correctly predicting the structures of complexes. By implementing fully automated evaluation of predictions for protein-protein complexes, as well as for proteins in complex with ligands, peptides, nucleic acids, or proteins containing noncanonical amino acid residues, CAMEO will assist new developments in those challenging areas of active research.
Keywords: benchmarking; blind assessment; continuous evaluation; ligands; macromolecular complexes; molecular structure prediction; non-canonical residues
Year: 2021 PMID: 34387007 PMCID: PMC8673552 DOI: 10.1002/prot.26213
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Number of targets of each experimental type released by the PDB in 2020, remaining after clustering, and selected for submission
| Target type | Subset | Released by the PDB (total) | Released (X‐ray) | Released (EM) | Released (solution NMR) | Released (other) | Clustering | Selection |
|---|---|---|---|---|---|---|---|---|
| Current CAMEO | | 15 028 | 12 551 | 2182 | 247 | 48 | 7466 | 1038 |
| | of which homo‐oligomeric | 4494 | 3823 | 631 | 20 | 20 | 2341 | 405 |
| Protein complexes | all | 12 901 | 10 570 | 2050 | 235 | 46 | 7511 | 4141 |
| | only proteins | 11 566 | 9705 | 1604 | 212 | 45 | 6465 | 3158 |
| | of which hetero‐oligomers | 2304 | 1361 | 930 | 11 | 2 | 1496 | 1130 |
| | … homo‐oligomers | 4032 | 3383 | 608 | 20 | 21 | 2284 | 1011 |
| | … monomers | 5230 | 4961 | 66 | 181 | 22 | 2685 | 1017 |
| Protein–ligand complexes | all | 9929 | 8577 | 1298 | 31 | 23 | 8889 | 3567 |
| | only protein‐small molecule | 9040 | 8007 | 979 | 31 | 23 | 8094 | 3491 |
| | of which hetero‐oligomers | 1543 | 939 | 598 | 6 | 0 | 1235 | 296 |
| | … homo‐oligomers | 3218 | 2873 | 335 | 1 | 9 | 2904 | 1291 |
| | … monomers | 4278 | 4195 | 45 | 24 | 14 | 3954 | 1904 |
| Peptide complexes | all | 749 | 614 | 56 | 68 | 11 | 605 | 536 |
| | only peptides | 107 | 40 | 5 | 51 | 11 | 90 | 83 |
| | of which hetero‐oligomers | 6 | 6 | 0 | 0 | 0 | 6 | 5 |
| | … homo‐oligomers | 23 | 16 | 5 | 1 | 1 | 23 | 22 |
| | … monomers | 78 | 18 | 0 | 50 | 10 | 61 | 56 |
| DNA complexes | all | 513 | 280 | 208 | 25 | 0 | 391 | 390 |
| | only DNA | 61 | 33 | 4 | 24 | 0 | 58 | 57 |
| | of which hetero‐oligomers | 13 | 6 | 4 | 3 | 0 | 12 | 12 |
| | … homo‐oligomers | 28 | 24 | 0 | 4 | 0 | 26 | 25 |
| | … monomers | 20 | 3 | 0 | 17 | 0 | 20 | 20 |
| RNA complexes | all | 422 | 123 | 275 | 21 | 3 | 327 | 323 |
| | only RNA | 78 | 48 | 12 | 16 | 2 | 45 | 42 |
| | of which hetero‐oligomers | 14 | 10 | 0 | 4 | 0 | 6 | 6 |
| | … homo‐oligomers | 8 | 8 | 0 | 0 | 0 | 6 | 4 |
| | … monomers | 56 | 30 | 12 | 12 | 2 | 33 | 32 |
| Mixed complexes | | 1335 | 865 | 446 | 23 | 1 | 1046 | 983 |
| | protein‐peptide | 608 | 563 | 28 | 17 | 0 | 483 | 421 |
| | protein‐RNA | 243 | 46 | 191 | 5 | 1 | 200 | 199 |
| | protein‐DNA | 381 | 225 | 155 | 1 | 0 | 279 | 279 |
| | protein‐RNA–DNA | 69 | 20 | 49 | 0 | 0 | 52 | 52 |
| | protein‐RNA‐peptide | 32 | 9 | 23 | 0 | 0 | 30 | 30 |
| | protein‐RNA‐peptide | 2 | 2 | 0 | 0 | 0 | 2 | 2 |
| Complexes with noncanonical residues | | 1075 | 940 | 113 | 20 | 2 | 666 | 444 |
| | proteins | 824 | 717 | 103 | 3 | 1 | 496 | 286 |
| | peptides | 198 | 180 | 0 | 17 | 1 | 124 | 112 |
| | RNA | 34 | 22 | 12 | 0 | 0 | 28 | 27 |
| | DNA | 52 | 52 | 0 | 0 | 0 | 35 | 35 |
For protein–ligand complexes, the selection criterion includes both the existence of closely related homolog complexes in the PDB and the presence of the ligands in DrugBank.
FIGURE 1. Target 2020‐12‐19_00000231 (PDB ID 7K93) is a hetero‐2‐2‐mer protein complex of a Dengue virus nonstructural protein (NS1) (green) in complex with a mouse neutralizing single‐chain Fab variable region (orange). While templates can be easily identified with HHblits for both entities, there is no overlap between the template lists, meaning the two proteins have never been observed in a homologous complex. Specifically, no homologs of this Dengue virus protein have been observed in complex with an antibody. Hence, this constitutes an interesting target for modeling heteromeric protein complexes.
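The "no overlap between the template lists" check described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each template hit is given as a string of the form `entry_chain`; that format, the function name `shared_templates`, and the hit lists are hypothetical placeholders, not real HHblits output and not CAMEO's implementation.

```python
def shared_templates(hits_a, hits_b):
    """Return the PDB entry IDs that occur in both template lists.

    Each hit is assumed to look like "entry_chain" (hypothetical format);
    only the entry part is compared, case-insensitively.
    """
    entries_a = {hit.split("_")[0].lower() for hit in hits_a}
    entries_b = {hit.split("_")[0].lower() for hit in hits_b}
    return entries_a & entries_b


# Placeholder hit lists for the two entities of a heteromeric target.
ns1_hits = ["aaaa_A", "bbbb_B"]
fab_hits = ["cccc_H", "dddd_L"]

# An empty set means no entry contains homologs of both entities,
# i.e. no template exists for the complex as a whole.
print(shared_templates(ns1_hits, fab_hits))
```

An empty intersection indicates that the interface has to be modeled without a homologous complex as a template, which is what makes such a target interesting.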
FIGURE 2. Hypothetical hetero‐2‐2‐mer target (AABB, left) with a ligand, and a hypothetical model of the target (right). (1) The lDDT score assesses the accuracy of each individual chain and measures local and global differences between model and reference structure. When more than one chain is predicted for an entity (B1, B2), only the best‐scoring one (B2) is kept. (2) The oligo‐lDDT score assesses the accuracy of all chains simultaneously while penalizing for missing (A1) or extra chains. (3) The QS‐score assesses the accuracy of the interface(s) between chains. It identifies correct (green dashed line) and inaccurate (orange dashed line) interfaces, and penalizes missing (red dashed line) interfaces. (4) The lDDT‐BS score assesses the accuracy of the binding site of biologically relevant ligands (gray circle, center). (5) Ligand scores assess the accuracy of the ligand (yellow) pose.
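Step (1) of the Figure 2 caption can be sketched in a few lines. The example below is a simplified illustration, not the CAMEO or OpenStructure implementation: it uses one point per residue (for example the C-alpha atom) instead of all atoms, omits stereochemistry checks, and assumes the commonly used lDDT defaults of a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2, and 4 Å. The function names `ca_lddt` and `best_chain_per_entity` and the score values are hypothetical.

```python
import math
from itertools import combinations

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance tolerances in angstroms (assumed defaults)
INCLUSION_RADIUS = 15.0            # only reference distances below this are scored


def ca_lddt(ref, model):
    """Simplified, superposition-free lDDT over one point per residue.

    ref and model are equal-length lists of (x, y, z) tuples in the same
    residue order. Returns the fraction of reference distances preserved
    in the model, averaged over the four thresholds.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same number of residues")
    preserved = 0
    total = 0
    for i, j in combinations(range(len(ref)), 2):
        d_ref = math.dist(ref[i], ref[j])
        if d_ref >= INCLUSION_RADIUS:
            continue  # pair is outside the local neighborhood
        diff = abs(d_ref - math.dist(model[i], model[j]))
        total += len(THRESHOLDS)
        preserved += sum(diff < t for t in THRESHOLDS)
    return preserved / total if total else 0.0


def best_chain_per_entity(scores):
    """Keep only the best-scoring model chain for each entity.

    scores maps entity name -> {chain name: score}.
    """
    return {
        entity: max(chains.items(), key=lambda kv: kv[1])
        for entity, chains in scores.items()
    }


# Hypothetical per-chain scores mirroring the caption: B2 beats B1.
scores = {"A": {"A2": 0.80}, "B": {"B1": 0.60, "B2": 0.90}}
print(best_chain_per_entity(scores))
```

Because the score compares internal distances only, a rigid translation or rotation of the model leaves it unchanged, so no superposition onto the reference is needed.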
FIGURE 3. Target 2020‐05‐09_00000305 (PDB ID 7BRP) is a structure of the SARS‐CoV‐2 main protease in complex with Boceprevir. At the time of prerelease, the structure of the protease had already been solved, and was therefore a trivial modeling target on its own. However, it had not been observed in complex with Boceprevir, and therefore, this complex represents a challenging ligand modeling target.
FIGURE 4. Target 2020‐05‐30_00000276 (PDB ID 6LQF) is an ARID‐PHD protein cassette in complex with a peptide, DNA, and zinc ions. The protein only has remote similarity (<30% sequence identity) to known structures, and none of them are in complex with DNA or the H3K4me3 peptide, making it an extremely challenging target. We are not aware of any methods that would currently be able to model this type of complex with acceptable accuracy. It should be noted that the peptide contains a noncanonical residue (N‐Trimethyllysine, derived from lysine).