Minkyung Baek, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, Qian Cong, Lisa N Kinch, R Dustin Schaeffer, Claudia Millán, Hahnbeom Park, Carson Adams, Caleb R Glassman, Andy DeGiovanni, Jose H Pereira, Andria V Rodrigues, Alberdina A van Dijk, Ana C Ebrecht, Diederik J Opperman, Theo Sagmeister, Christoph Buhlheller, Tea Pavkov-Keller, Manoj K Rathinaswamy, Udit Dalwadi, Calvin K Yip, John E Burke, K Christopher Garcia, Nick V Grishin, Paul D Adams, Randy J Read, David Baker.
Abstract
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
Year: 2021 PMID: 34282049 PMCID: PMC7612213 DOI: 10.1126/science.abj8754
Source DB: PubMed Journal: Science ISSN: 0036-8075 Impact factor: 47.728
Fig. 1. Network architecture and performance.
(A) RoseTTAFold architecture with 1D, 2D, and 3D attention tracks. Multiple connections between tracks allow the network to simultaneously learn relationships within and between sequences, distances, and coordinates (see Methods and fig. S1 for details). (B) Average TM-score of prediction methods on the CASP14 targets. Zhang-server and BAKER-ROSETTASERVER were the top two server groups, while AlphaFold2 and BAKER were the top two human groups in CASP14; BAKER-ROSETTASERVER and BAKER predictions were based on trRosetta. Predictions with the 2-track model and RoseTTAFold (both the end-to-end and PyRosetta versions) were completely automated. (C) Blind benchmark results on CAMEO medium and hard targets; model accuracies are TM-score values from the CAMEO website (https://cameo3d.org/).
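Panels (B) and (C) report accuracy as TM-score. As a reading aid, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distance between each aligned Cα pair. It is not the code used in the paper: the function names are hypothetical, the 0.5 Å floor on the distance scale for very short chains is an assumption, and a full TM-score calculation additionally searches for the superposition that maximizes the score.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score formula."""
    # Assumption: use a 0.5 A floor for very short chains, where the
    # cube-root expression is undefined or too small to be meaningful.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance in angstroms for each aligned C-alpha pair.
    target_length: number of residues in the reference (target) structure;
        unaligned residues contribute zero, so partial alignments score lower.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue target, every pair 2 A apart.
    print(round(tm_score([2.0] * 100, 100), 3))
```

Because each term is normalized by the target length rather than by the number of aligned residues, a model that covers only half of the target perfectly cannot score above 0.5.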
Fig. 2. Enabling experimental structure determination with RoseTTAFold.
(A-B) Successful molecular replacement with RoseTTAFold models. (A) SLP. (top) C-terminal domain: comparison of final refined structure (gray) to RoseTTAFold model (blue); there are no homologs with known structure. (bottom) N-terminal domain: refined structure is in gray, and RoseTTAFold model is colored by the estimated RMS error (ranging from blue for 0.67 Å to red for 2 Å or greater). 95 Cα atoms of the RoseTTAFold model can be superimposed within 3 Å of Cα atoms in the final structure, yielding a Cα-RMSD of 0.98 Å. In contrast, only 54 Cα atoms of the closest template (4l3a, brown) can be superimposed (with a Cα-RMSD of 1.69 Å). (B) Refined structure of Lrbp (gray) with the closest RoseTTAFold model (blue) superimposed; residues having estimated RMS error greater than 1.3 Å are omitted (full model is in fig. S5C). (C) Cryo-EM structure determination of p101 Gβγ binding domain (GBD) in a heterodimeric PI3Kγ complex using RoseTTAFold. (top) RoseTTAFold models colored in a rainbow from the N-terminus (blue) to the C-terminus (red) have a consistent all-beta topology with a clear correspondence to the density map. (bottom) Comparison of the final refined structure to the RoseTTAFold model colored by predicted RMS error, ranging from blue for 1.5 Å or less to red for 3 Å or greater. The actual Cα-RMSD between the predicted structure and final refined structure is 3.0 Å over the beta-sheets. Figure prepared with ChimeraX (35).
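The "Cα atoms within 3 Å" counts and Cα-RMSD values quoted in this caption can be illustrated with a small sketch. It assumes the two structures are already superimposed and matched residue by residue, and it assumes the RMSD is taken over only those pairs that fall within the cutoff; the caption does not spell out the exact protocol, and the function name is hypothetical.

```python
import math


def ca_superposition_stats(model_ca, reference_ca, cutoff=3.0):
    """Count matched C-alpha pairs within `cutoff` angstroms and return
    (count, RMSD over those pairs).

    model_ca, reference_ca: equal-length sequences of (x, y, z) tuples,
        already superimposed and in the same residue order (assumption).
    """
    if len(model_ca) != len(reference_ca):
        raise ValueError("coordinate lists must have the same length")

    close_sq = []  # squared distances of pairs inside the cutoff
    for m, r in zip(model_ca, reference_ca):
        d2 = sum((a - b) ** 2 for a, b in zip(m, r))
        if d2 <= cutoff ** 2:
            close_sq.append(d2)

    if not close_sq:
        return 0, float("nan")
    return len(close_sq), math.sqrt(sum(close_sq) / len(close_sq))


if __name__ == "__main__":
    # Toy coordinates, not taken from the paper.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
    print(ca_superposition_stats(model, reference))
```

Reporting the count alongside the RMSD matters: a low RMSD over few atoms, as for the template in panel (A), indicates a poorer model than a similar RMSD over many atoms.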
Fig. 3. RoseTTAFold models provide insights into function.
(A) TANGO2 model, colored in a rainbow from the N-terminus (blue) to the C-terminus (red), adopts an Ntn hydrolase fold. Pathogenic mutation sites are shown as magenta spheres. (B) Predicted TANGO2 active site colored by ortholog conservation in rainbow scale from variable (blue) to conserved (red), with conserved residues shown as sticks and labeled. Pathogenic mutations (spheres, with wild-type side chains shown as sticks) are labeled in magenta; select neighboring residues are depicted as sticks. (C) ADAM33 prodomain adopts a lipocalin-like barrel, shown in a rainbow from N-terminus (blue) to C-terminus (red). (D) ADAM33 model surface rendering colored by ortholog conservation from blue (variable) to red (conserved), highlighting a conserved surface patch. (E) CERS1 transmembrane structure prediction is colored from N-terminus (blue) to C-terminus (red), with a pathogenic mutation in TMH2 near a central cavity in magenta. (F) Zoom of CERS1 active site with residues colored by ortholog conservation from variable (blue) to conserved (red). Residues that contribute to catalysis (H182 and D213) or are conserved (W298 and D213) line the cavity. The conserved pathogenic mutation is adjacent to the active site.
Fig. 4. Complex structure prediction using RoseTTAFold.
(A, B) Prediction of structures of E. coli protein complexes from sequence information. Experimentally determined structures are on the left and RoseTTAFold models are on the right; the TM-scores below indicate the extent of structural similarity. (A) Two-chain complexes. The first subunit is colored in gray, and the second subunit is colored in a rainbow from blue (N-terminal) to red (C-terminal). (B) Three-chain complexes. Subunits are colored in gray, cyan, and magenta. (C) IL-12R/IL-12 complex structure generated by RoseTTAFold fits the previously published cryo-EM density (EMD-21645).