Carlos Outeiral, Daniel A Nissley, Charlotte M Deane.
Abstract
Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modelling driven by deep learning have achieved unprecedented success at predicting a protein's crystal structure, but it is not clear if these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state. In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods to experimental data about protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than a trivial classifier using sequence-agnostic features like chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2022 PMID: 35099504 PMCID: PMC8963306 DOI: 10.1093/bioinformatics/btab881
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Protocol for the analysis of simulated folding pathways. (a) Trajectory generation process. Protein sequences are used to generate the necessary input features for a modified protein structure predictor using default processing scripts. The structure prediction software outputs detailed search trajectories, which are then summarized as the fraction of native contacts between pairs of secondary structure elements. (b) The trajectories are smoothed, and the positions of maximum change are identified via numerical differentiation. These peaks are then clustered using kernel density estimation (KDE) with a Gaussian kernel, which identifies the main phases of folding, establishes whether the trajectory proceeds in one or more steps, and yields the structural intermediates, which can be compared with HDX experiments.
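The steps in panel (b) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the moving-average smoother, the window size, the derivative threshold, the KDE bandwidth and the toy sigmoid trajectories are all assumptions chosen only to make the example run.

```python
import math

def moving_average(values, window=5):
    """Smooth a series with a centered moving average (window shrinks at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def central_difference(values):
    """Numerical derivative with unit step: central inside, one-sided at the ends."""
    n = len(values)
    deriv = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        deriv.append((values[hi] - values[lo]) / (hi - lo))
    return deriv

def peak_positions(deriv, threshold=0.05):
    """Indices where the derivative is a local maximum above an (assumed) threshold."""
    return [i for i in range(1, len(deriv) - 1)
            if deriv[i] > threshold
            and deriv[i] >= deriv[i - 1] and deriv[i] > deriv[i + 1]]

def gaussian_kde(points, grid, bandwidth=3.0):
    """Gaussian kernel density estimate of `points`, evaluated on `grid`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points) / norm
            for g in grid]

def count_modes(density):
    """Number of local maxima of the density, read here as candidate folding phases."""
    return sum(1 for i in range(1, len(density) - 1)
               if density[i] > density[i - 1] and density[i] >= density[i + 1])

# Toy input (hypothetical): fraction of native contacts for two pairs of
# secondary structure elements that form around steps 15 and 45.
steps = range(60)
trajectories = [[1 / (1 + math.exp(-(t - mid))) for t in steps] for mid in (15, 45)]

peaks = []
for fractions in trajectories:
    peaks += peak_positions(central_difference(moving_average(fractions)))

density = gaussian_kde(peaks, list(steps))
n_phases = count_modes(density)
print(peaks, n_phases)
```

In this toy case the two contact pairs form at well-separated times, so the density has two modes and the trajectory would be read as multistate; a single mode would correspond to two-state folding.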
Performance of the different protein structure prediction methods at determining folding kinetics
| | RoseTTAFold | trRosetta | RaptorX | DMPfold | EVfold | SAINT2 | Rosetta | Length |
|---|---|---|---|---|---|---|---|---|
| Unsupervised accuracy | 0.614 | 0.614 | 0.560 | 0.565 | 0.552 | 0.554 | 0.552 | — |
| Unsupervised F1-score | 0.637 | 0.588 | 0.472 | 0.679 | 0.525 | 0.586 | 0.513 | — |
| Supervised accuracy | 0.607 | 0.576 | 0.551 | 0.588 | 0.568 | 0.538 | 0.527 | |
| Supervised F1-score | 0.637 | 0.620 | 0.558 | 0.667 | 0.643 | 0.620 | 0.655 | |
| AUROC | 0.675 | 0.654 | 0.626 | 0.594 | 0.605 | 0.608 | 0.560 | |
| | | | | | | | | |
| Unsupervised accuracy | 0.623 | 0.546 | 0.576 | 0.556 | 0.608 | — | — | — |
| Unsupervised F1-score | 0.663 | 0.638 | 0.610 | 0.687 | 0.616 | — | — | — |
| Supervised accuracy | 0.612 | 0.573 | 0.563 | 0.581 | 0.610 | — | — | |
| Supervised F1-score | 0.649 | 0.640 | 0.565 | 0.667 | 0.645 | — | — | |
| AUROC | 0.669 | 0.631 | 0.602 | 0.622 | 0.658 | — | — | |
Note: Unsupervised metrics use a simple rule that assigns a protein the most frequent kinetics, i.e. if 50% or more of the decoys display multistate kinetics, the protein is taken to fold in multiple steps; otherwise it is considered two-state. Supervised metrics fit a logistic regression on and report the average of 1000 fivefold cross-validation experiments; note that the supervised score may sometimes be worse than the unsupervised one if the model does not generalize well. The baseline is a logistic regression that uses only the length of the protein. Accuracy reports the average recall per class, to account for the slight imbalance of the dataset (90 two-state folders and 80 multistate folders). The F1-score is the harmonic mean of recall and precision. The area under the receiver-operating curve (AUROC) for length is computed by projecting the values to the interval. Bold indicates the top metric. We observe that chain length outperforms any of the protein structure prediction methods at predicting folding kinetics.
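The unsupervised rule and two of the metrics described in the note can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the decoy labels and ground truth are made up, multistate is treated as the positive class for the F1-score (the note does not say which class is positive), and the supervised logistic regression is not reproduced here.

```python
def majority_kinetics(decoy_is_multistate):
    """Rule from the note: multistate (True) if 50% or more of the decoys are multistate."""
    return sum(decoy_is_multistate) / len(decoy_is_multistate) >= 0.5

def balanced_accuracy(y_true, y_pred):
    """Average recall per class; assumes both classes occur in y_true."""
    recalls = []
    for cls in (True, False):
        members = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in members) / len(members))
    return sum(recalls) / len(recalls)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (assumed: multistate) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: per-protein decoy labels (True = multistate kinetics)
# and a made-up experimental ground truth.
decoys = {
    "A": [True, True, False, True],
    "B": [False, False, True, False],
    "C": [True, False],
    "D": [False, False, False],
}
truth = {"A": True, "B": False, "C": False, "D": True}

names = sorted(decoys)
y_pred = [majority_kinetics(decoys[n]) for n in names]
y_true = [truth[n] for n in names]
print(y_pred, balanced_accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Protein "C" shows the tie case: exactly half of its decoys are multistate, so the rule assigns it multistate kinetics.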
Fig. 2. Correlation between the folding rate constant and folding events in simulated trajectories of the seven structure prediction methods considered, the length of the protein chain and the average contact order of the native structure. Every point represents the average over the maximum number of decoys possible (200 decoys for RoseTTAFold, trRosetta, RaptorX, DMPfold and EVfold; and 10 decoys for SAINT2 and Rosetta).
Performance of the structure predictors at identifying the secondary structure interactions present in an intermediate
| | RoseTTAFold | trRosetta | RaptorX | DMPfold | EVfold | SAINT2 | Rosetta | Random |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.453 | 0.534 | 0.495 | 0.489 | | — | — | 0.502 |
| F1-score | 0.222 | 0.169 | 0.110 | 0.026 | | — | — | 0.252 |
| Jaccard | 0.052 | 0.052 | 0.052 | 0.052 | 0.052 | — | — | |
| AUROC | 0.441 | 0.503 | 0.502 | 0.492 | | — | — | 0.498 |
Note: The ground truth corresponds to a dataset of 11 proteins whose intermediates have been characterized with HDX experiments. Accuracy reports the average recall per class, to account for the slight imbalance of the dataset. The Jaccard score reflects the average Jaccard similarity of the predictions, expressed as a binary string (where 1 means that the native contacts between secondary structure elements are formed in the intermediate, while 0 means they are not), with the true answer. The random baseline corresponds to an unbiased coin predicting whether two secondary structure elements are in contact.
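As a small illustration of the Jaccard score and the coin baseline described in the note, the sketch below compares binary strings position by position. The example strings are made up, and returning 1.0 when neither string contains a 1 is an assumed convention, not something the note specifies.

```python
import random

def jaccard(pred, truth):
    """Jaccard similarity of two equal-length binary strings:
    positions that are 1 in both, divided by positions that are 1 in either."""
    both = sum(p == "1" and t == "1" for p, t in zip(pred, truth))
    either = sum(p == "1" or t == "1" for p, t in zip(pred, truth))
    return both / either if either else 1.0  # assumed convention for all-zero strings

def coin_baseline(n_pairs, rng):
    """Unbiased coin: each pair of secondary structure elements is in contact with p = 0.5."""
    return "".join(rng.choice("01") for _ in range(n_pairs))

# Hypothetical intermediate: 1 = native contacts between a pair of secondary
# structure elements are formed in the intermediate, 0 = they are not.
truth = "110100"
prediction = "100110"
score = jaccard(prediction, truth)

rng = random.Random(0)
random_prediction = coin_baseline(len(truth), rng)
print(score, random_prediction, jaccard(random_prediction, truth))
```

Here the prediction and the truth share two formed contacts out of four that appear in either string, so the score is 0.5.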
Fig. 3. Average pairwise Jaccard similarity between multistate folding trajectories across all proteins in the dataset, for the seven structure prediction programs. Most methods exhibit significant variability between independent trajectories.
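The averaging behind such a plot could look like the sketch below. It assumes each trajectory is reduced to the set of secondary-structure contacts formed in its intermediate; that representation and the contact labels are hypothetical, since the caption does not say how trajectories are encoded.

```python
from itertools import combinations

def jaccard_sets(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical (assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(trajectories):
    """Average Jaccard similarity over all unordered pairs of trajectories."""
    pairs = list(combinations(trajectories, 2))
    return sum(jaccard_sets(a, b) for a, b in pairs) / len(pairs)

# Hypothetical independent trajectories for one protein, each summarized as
# the contacts between secondary structure elements formed in the intermediate.
runs = [
    {"H1-H2", "H2-S1"},
    {"H1-H2"},
    {"H2-S1", "S1-S2"},
]
similarity = mean_pairwise_jaccard(runs)
print(round(similarity, 3))
```

A value near 1 would mean independent runs agree on the intermediate, while a value near 0 would reflect the variability the caption reports.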