Andrew W Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W R Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis.
Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
Keywords: CASP; deep learning; machine learning; protein structure prediction
Year: 2019 PMID: 31602685 PMCID: PMC7079254 DOI: 10.1002/prot.25834
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
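The abstract reports accuracy as GDT_TS. As a rough illustration of what that number measures, the sketch below averages the fraction of residues whose C-alpha atoms lie within 1, 2, 4, and 8 Å of the reference. It assumes the two structures are already superimposed; the full GDT_TS procedure searches over superpositions to maximize each fraction, which is not done here. The function name `gdt_ts` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS for two already-superimposed C-alpha traces.

    Simplification (assumption): no search over superpositions is performed.
    """
    if len(model) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue deviation in angstroms.
    deviations = [math.dist(m, r) for m, r in zip(model, reference)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= c for d in deviations) / len(deviations) for c in cutoffs
    ]
    # GDT_TS is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy four-residue example (made-up coordinates, deviations 0.5, 1.5, 3.0, 9.0 Å).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts(model, reference)
print(score)  # 56.25
```

On this scale, the "high-accuracy" threshold quoted in the abstract corresponds to a score of 70 or higher.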
Figure 1. A schematic of the GDT‐net system (A). Feature extraction stages are shown in yellow, the structure‐prediction neural network in green, and structure realization in blue
Figure 2. An overview of the simulated annealing framework. A pool of workers runs simulated annealing to optimize the backbone structure. Another pool refines these structures to add side‐chain atoms. Fragments from these full‐atom structures are reused in simulated annealing in a continuous fashion
Figure 3. A schematic of the fragment network 3. The blue parts of the network describe the conditioning network, the purple parts are the encoding network used to approximate the posterior, and the orange parts are the generative decoder
Figure 4. Number of FM + FM/TBM domains (out of 43) solved to a GDT_TS threshold for all groups in CASP13
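The quantity plotted in Figure 4 can be computed as a simple count: for each GDT_TS threshold, how many domains a group modeled at or above that score. The sketch below is a minimal illustration; the function name `domains_solved` and the scores are hypothetical and are not CASP13 data.

```python
def domains_solved(gdt_scores, thresholds):
    """Count, for each threshold, the domains with GDT_TS at or above it."""
    return {t: sum(s >= t for s in gdt_scores) for t in thresholds}

# Made-up per-domain GDT_TS values for one group (not real results).
scores = [72.5, 45.0, 61.4, 88.1, 30.2, 70.0]
counts = domains_solved(scores, [40, 50, 60, 70, 80])
print(counts)  # {40: 5, 50: 4, 60: 4, 70: 3, 80: 1}
```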
Figure 5. A7D CASP13 submission accuracies by domain. The GDT_TS for each of the five A7D CASP13 submissions is shown. Submissions are colored by method, with fragment assembly submissions (B) colored red, GDT‐net submissions (A) colored green, and gradient descent submissions (C) colored blue. T0999 (1589 residues) was manually segmented based on HHpred (ref. 28) homology matching
A7D CASP13 accuracies by method. Average GDT_TS scores of the A7D CASP13 submissions broken down by method. Since the methods used changed after T0975, we show the means for these two sets separately. Domains in which only one method was used have been excluded to make the numbers comparable
| Method | Mean GDT_TS, targets before T0975 | Mean GDT_TS, targets from T0975 onwards |
|---|---|---|
| Fragment assembly with GDT‐net | 63.8 | N/A |
| Fragment assembly with distance potential | 62.4 | 63.4 |
| Gradient descent on distance potential | N/A | 64.4 |
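To give a feel for what "gradient descent on a distance potential" means, here is a toy sketch under strong assumptions. It moves Cartesian coordinates downhill on a harmonic potential, the sum of squared differences between current and target pairwise distances. This is not the paper's potential, which is constructed from the network's distance predictions and is not reproduced here. The names `potential`, `gradient`, and `minimize`, the step size, and the coordinates are all hypothetical.

```python
import math
import random

def potential(coords, target):
    """Harmonic toy potential: squared error over all pairwise distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def gradient(coords, target):
    """Analytic gradient of the toy potential with respect to each coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = max(math.dist(coords[i], coords[j]), 1e-9)  # avoid divide-by-zero
            scale = 2.0 * (r - target[i][j]) / r
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad

def minimize(coords, target, step=0.02, iterations=2000):
    """Plain fixed-step gradient descent (step size is an assumption)."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        for p, g in zip(coords, gradient(coords, target)):
            for k in range(3):
                p[k] -= step * g[k]
    return coords

# Made-up five-point "structure" that defines the target distances.
target_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0),
                 (3.5, 5.5, 2.5), (0.5, 4.0, 5.0)]
target = [[math.dist(a, b) for b in target_coords] for a in target_coords]

# Start from a perturbed copy and relax it against the target distances.
rng = random.Random(0)
start = [[c + rng.uniform(-0.5, 0.5) for c in p] for p in target_coords]
initial = potential(start, target)
relaxed = minimize(start, target)
final = potential(relaxed, target)
print(f"potential before: {initial:.4f}, after: {final:.6f}")
```

Because the potential depends only on pairwise distances, the relaxed coordinates match the target only up to rotation, translation, and reflection.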
Figure 6. The TM‐score of the A7D submissions plotted against the length‐normalized number of effective sequence alignments found (N). Each domain decoy is colored by difficulty category, with a shape indicating the method by which it was generated
Figure 7. Accuracy curves for three domains of T0990. Curves show the fraction of residues that are correct within a given alignment threshold. All groups' submissions are shown with one curve per submission (396, 396, and 397 models, respectively), highlighting the five A7D submissions in magenta. Graphs from http://predictioncenter.org