Complementing machine learning-based structure predictions with native mass spectrometry.

Timothy M Allison¹, Matteo T Degiacomi², Erik G Marklund³, Luca Jovine⁴, Arne Elofsson⁵, Justin L P Benesch⁶, Michael Landreh⁷.

Abstract

The advent of machine learning-based structure prediction algorithms such as AlphaFold2 (AF2) and RoseTTAFold has moved the generation of accurate structural models for the entire cellular protein machinery into the reach of the scientific community. However, structure predictions of protein complexes are based on user-provided input and may require experimental validation. Mass spectrometry (MS) is a versatile, time-effective tool that provides information on post-translational modifications, ligand interactions, conformational changes, and higher-order oligomerization. Using three protein systems, we show that native MS experiments can uncover structural features of ligand interactions, homology models, and point mutations that are undetectable by AF2 alone. We conclude that machine learning can be complemented with MS to yield more accurate structural models on a small and large scale.

Keywords: integrative modeling; machine learning; protein structure prediction; structural proteomics

Year: 2022 PMID: 35634779 PMCID: PMC9123603 DOI: 10.1002/pro.4333

Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.993

INTRODUCTION

Machine learning (ML)‐based algorithms have been hailed as the solution to the protein structure prediction problem and are already being used to predict structures across entire proteomes. For example, using protein sequence data as the only user input, AF2 can generate models of ordered, monomeric proteins that rival experimentally derived structures in quality, and these models can be assembled into complexes using AF2 Multimer. However, it is important to remember that the models are generated according to user‐provided input. For example, AF2 Multimer does not suggest an oligomeric state; instead, the stoichiometry for the model must be specified along with the sequences of the components. Moreover, AF2 may propose seemingly plausible models for a protein interaction even if this is not biologically relevant, for example, because the proteins are in different cellular compartments. Furthermore, using AF2 to predict interactions involving dynamic regions, ligand binding sites, or point mutations, all of which are major focal points of structural biology, remains challenging. In these cases, additional structural data may be required to assess the validity of the computed structures, for example, from X‐ray crystallography and cryo‐EM. However, obtaining such data is challenging, resulting in a need for alternative strategies. Mass spectrometry (MS), with its rapidly expanding structural biology toolbox, can provide structural data that are directly complementary to ML (Figure 1a). Despite not being a stand‐alone structure determination technique, MS offers a wealth of information for hybrid structural biology approaches. It has a well‐developed capacity to provide proteoform primary structure information, such as post‐translational modifications, via MS‐sequencing. In combination with in‐solution labeling methods such as hydrogen‐deuterium exchange (HDX), MS can inform about local structural dynamics.
Native MS, where the non‐covalent interfaces in macromolecules are preserved in the experiment, is widely used to determine oligomeric states, which is of particular importance when building models of protein complexes. Crosslinking and ion mobility (IM) measurements reveal the spatial arrangements of components in a protein complex. Unlike other biophysical methods, MS offers the crucial advantage of being able to provide structural data on the proteome scale. For example, proteome‐wide crosslinking studies can help to filter biologically irrelevant interactions. Collision‐cross sections (CCSs, effectively 2D‐projections of the structures) can be calculated for entire model proteomes and used to filter complex architectures by IM‐MS. Last, hybrid MS methods, such as NativeOmics, can reveal direct connections between primary and quaternary structure variations, as well as help to identify ligands or cofactors that may be structurally and functionally important.
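As an illustration of how a native mass spectrum translates into an oligomeric state, the following minimal Python sketch applies the standard electrospray relation m/z = (M + z·m_p)/z to two adjacent peaks of a charge-state series and then divides the resulting mass by a monomer mass. The function names, the 20 kDa monomer, and the peak positions are hypothetical and are not taken from the data discussed here; real spectra additionally require peak picking, adduct handling, and error estimates.

```python
PROTON_MASS = 1.007276  # Da, mass of one charge-carrying proton


def mz(mass, charge):
    """m/z of an ion with neutral mass `mass` (Da) carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def deconvolute(mz_low_charge, mz_high_charge):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_low_charge  : peak at higher m/z (charge z)
    mz_high_charge : neighbouring peak at lower m/z (charge z + 1)
    """
    # From m/z - m_p = M/z it follows that z = (m2 - m_p) / (m1 - m2).
    z = round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge - PROTON_MASS)
    return z, mass


def oligomeric_state(measured_mass, monomer_mass):
    """Number of monomer copies that best explains the measured mass."""
    return round(measured_mass / monomer_mass)


# Hypothetical example: a dimer of a 20 kDa monomer observed at 12+ and 13+.
monomer = 20000.0
peak_12 = mz(2 * monomer, 12)
peak_13 = mz(2 * monomer, 13)
charge, mass = deconvolute(peak_12, peak_13)
copies = oligomeric_state(mass, monomer)
print(charge, round(mass, 1), copies)
```

The same arithmetic underlies the assignment of charge states such as the 13+ ions referred to in Figure 1b, although the sketch ignores bound ligands, salt adducts, and peak overlap.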

FIGURE 1

(a) The structural mass spectrometry (MS) toolbox offers information that is directly complementary to machine learning‐based structure prediction. MS can inform about proteoforms (MS sequencing), structural dynamics (HDX‐MS), the spatial arrangements of proteins in a complex (ion mobility and crosslinking MS), and oligomeric states (native MS). (b) Left: Experimental and predicted structures for holo‐ (left) and apo‐DHODH show near‐identical three‐dimensional folds. Middle: Native MS reveals the presence of a small population of apo protein. Right: IM‐MS of the 13+ charge states of apo‐ and holo‐DHODH shows that the protein with co‐factor has a native‐like CCS, whereas the protein without co‐factor is unfolded. (c) Left: Crystal structures for the HSP 17.7 and 18.1 homodimers are virtually indistinguishable from the AF2‐predicted heterodimer. Native MS of a mixture of HSP 17.7 and 18.1 under denaturing conditions (middle) and after refolding (right) reveals that no heterodimer formation takes place. (d) Left: AF2 predicts that the D40N mutant of MaSp1 NT forms a homodimer that closely resembles the dimeric structure of wt MaSp1 NT, despite showing partial loss of the D39/D40/K65 salt bridge. Middle: pLDDT plots indicate that the D40N mutation does not affect the prediction confidence for the subunits in the NT dimer. Right: Native MS analysis of both NT variants at pH 6.0 shows that the D40N mutation abolishes NT dimerization. All AF predictions were carried out using ColabFold V1.5, using AF2 Multimer 2.2. Predictions were run with the AMBER refinement step but without templates. The MS data for all three proteins were taken from the respective reference publications.

RESULTS AND DISCUSSION

As a first example, we tested the ability of AF2 to predict the structure of dihydroorotate dehydrogenase (DHODH), a mitochondrial enzyme involved in uracil synthesis. Inhibition of DHODH selectively kills cancer cells, making it a prime target for the development of novel therapeutics. When using AF2 to predict the structure of the soluble domain of DHODH, the result is nearly indistinguishable from the available X‐ray structures, with a Cα root‐mean‐square deviation (RMSD) of 0.5 Å (Figure 1b), with the exception that the predicted structure contains a central cavity which in the experimental structures is occupied by the cofactor flavin mononucleotide (FMN). In fact, overlaying the ligand‐binding sites of the AF2 prediction and the X‐ray structure reveals a nearly identical arrangement of the residues that coordinate FMN (Figure S1a). We have previously used native MS to assess the relationship between ligand binding and folding of DHODH and found that the protein exists mostly in the holo‐form. We also detected a small apo population with higher charge states, indicating unfolding in solution. Indeed, IM‐MS revealed that FMN‐bound protein adopts a compact conformation, whereas the FMN‐free protein is largely unfolded, as evident from the CCS distributions of the 13+ charge state of both populations (Figure 1b). When we computed the CCSs of the experimental and the predicted structures, we found them to be virtually identical (Figure 1b). Taken together, we find that AF2 predicts the fold of the holo‐form of DHODH even without the co‐factor. The recently solved crystal structure of the FMN‐free form of the homologous DHODH from Trypanosoma brucei reveals backbone re‐arrangements in the FMN pocket which result in increased local flexibility. Native MS shows that the human protein cannot maintain the correct conformation in the absence of FMN, which strongly supports the conclusion that FMN is required for the protein to adopt a stable conformation.
This discrepancy could arise from co‐factor‐bound proteins being part of the AF2 training set, yet the co‐factors themselves are not considered in the prediction. Although alternative computational tools may be used to incorporate ligands in AF2 models, the connection between binding and folding is not considered in the predictions. As shown for DHODH, native MS can inform about the role of the co‐factor in promoting the correct fold of DHODH, a role that is not evident from the ML‐based prediction alone. Next, we asked whether native MS and AF2 could capture the effect of a flexible segment on the formation of a protein complex. For this purpose, we turned to the paralogous small heat shock proteins 17.7 and 18.1 from Pisum sativum. Both form highly similar homodimeric protomers via a conserved dimerization interface and swapping of a flexible loop, which then assemble into tetrahedral dodecamers. Using AF2, we could correctly predict both homodimers (Figure 1c), and also the hypothetical HSP 17.7–18.1 heterodimer with a per‐residue confidence score (pLDDT, which corresponds to the model's predicted score on the Local Distance Difference Test and measures distances between atom pairs) equal to those of the homodimers, and Cα RMSDs of 0.73 and 0.66 Å for the 17.7 and 18.1 heterodimer, respectively. Similarly, the predicted alignment error plots show no discernible difference (Figure S1). We also used the pTM score in AlphaFold Multimer 2.2 to assess the quality of the interface predictions and found that the heterodimer scored essentially the same (0.891 ± 0.005) as the homodimers (0.879 ± 0.007 and 0.882 ± 0.007). However, the proteins do not coassemble in vivo, despite being colocalized and coexpressed to high concentrations during heat stress.
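To make "essentially the same" concrete, the pTM values quoted above can be compared against their reported spreads. The short Python sketch below is only an illustration under stated assumptions: it treats each ± value as an independent standard deviation (the text does not specify this), and the `separation` helper is a name introduced here, not part of any AF2 or ColabFold tooling.

```python
import math


def separation(mean_a, sd_a, mean_b, sd_b):
    """Difference between two means in units of their combined spread.

    Assumes sd_a and sd_b are independent standard deviations.
    """
    return abs(mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)


# pTM scores as quoted in the text: (mean, spread).
heterodimer = (0.891, 0.005)
homodimers = [(0.879, 0.007), (0.882, 0.007)]

separations = [separation(*heterodimer, *homo) for homo in homodimers]
for value in separations:
    print(f"heterodimer vs homodimer: {value:.2f} combined spreads apart")
```

Under these assumptions, both comparisons stay below two combined spreads, which is consistent with the statement that the confidence metric does not separate the heterodimer from the homodimers.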
Upon refolding a mixture of denatured HSP 17.7 and 18.1, native MS revealed homodimer formation and assembly into dodecamers, while at the same time suggesting that, despite no direct steric hindrance and seemingly compatible dimer interfaces, heterodimerization is practically impossible (Figure 1c). This preference arises from an inability of the different monomers to bind each other's flexible loops due to differences in non‐interfacial residues, which provides a penalty for hetero‐oligomerization. Such a preference of homo‐ over hetero‐oligomerization is likely a widespread phenomenon. However, as it is mediated by a flexible region outside of the well‐defined dimerization surface, it has no significant impact on the confidence of the AF2 model, but can be readily detected by MS. Last, we investigated the ability of MS and AF2 to capture the impact of point mutations on protein complex formation. Mutations that do not introduce significant steric hindrance yield near‐identical AF2 structures that nonetheless show measurable differences in stability. However, it is unclear to what extent AF2 can inform about the effect of mutations on protein–protein interactions. We chose the N‐terminal domain (NT) of the spider silk protein Major ampullate Spidroin 1 (MaSp1) from Euprosthenops australis, which is monomeric above, and dimeric below, pH 6.5. This pH sensitivity is in part due to a conserved salt bridge between D39/D40 and K65 on the opposing subunit. We used AF2 to predict the structure of the dimeric wild‐type protein, as well as a point mutant with a weakened salt bridge, D40N (Figure 1d). Importantly, AF2 does not explicitly address the protonation state of ionizable residues, but may indirectly reflect the interactions observed under the solution conditions used to solve the structures included in the training set.
Comparison of the pLDDT scores of the top five models for each variant showed no discernible differences (Figure 1d) with a Cα RMSD of 0.2 Å, indicating highly similar structures. Native MS analysis of both proteins at pH 6.0, on the other hand, showed that the D40N mutation abolished dimerization nearly completely (Figure 1d). In summary, mutating aspartate 40 to asparagine does not introduce structural changes or steric clashes and does not appear to have notable consequences for the AF2 model of the dimer. The impact of losing this salt bridge on dimer formation, therefore, requires experimental validation, such as through native MS analysis.
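For reference, the Cα RMSD values quoted in this section can be read against the usual definition, assuming the two structures have been optimally superimposed and that their Cα atoms are matched one to one. The notation below is introduced here for illustration and is not taken from the original text.

```latex
% x_i and y_i are the positions of the i-th matched C-alpha atom in the
% predicted and experimental structures after optimal superposition;
% N is the number of matched atoms.
\[
  \mathrm{RMSD} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
\]
```

Because the square root is taken over a mean of squared distances, the result is a length and is reported in Å.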

CONCLUSIONS

Here, we examined the ability of MS to provide complementary information to ML‐based structure predictions of protein complexes. While AF2 predictions are generally highly accurate, they do not specifically address the influence of bound ligands, flexible regions, and point mutations on protein interactions. Native MS, on the other hand, does not provide structural details but can capture a wide range of protein interactions with a single measurement. Of particular importance for structure prediction is the ability of MS to provide accurate information on protein oligomeric states. While MS is unrivaled in the detail of the mass measurements, reliable mass measurement of multimeric stoichiometries can be obtained from various alternative techniques, opening even more ways to complement ML predictions. Going forward, MS should be combined with ML either by defining the modeling question a priori using MS data (MS/AI) or by using MS data to identify a likely model a posteriori (AI/MS). We anticipate that whole‐proteome structural MS data, and even mass measurements in physiological solutions, such as analytical ultracentrifugation and small‐angle X‐ray scattering, but also new methods like mass photometry, could be incorporated into large‐scale ML predictions, for example in the form of constraints, to generate accurate structural maps of the entire cellular environment.
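The MS/AI route described above, in which a mass measurement defines the modeling question before prediction, can be sketched as a simple stoichiometry filter. The Python example below is a hypothetical illustration: the subunit names, masses, tolerance, and the `consistent_stoichiometries` function are assumptions made for this sketch and do not correspond to any published pipeline or to the proteins studied here.

```python
from itertools import product


def consistent_stoichiometries(measured_mass, subunit_masses,
                               max_copies=2, rel_tol=1e-3):
    """Return subunit copy-number combinations that match a measured mass.

    measured_mass  : deconvoluted mass from native MS (Da)
    subunit_masses : mapping of subunit name to sequence mass (Da)
    max_copies     : largest copy number tried per subunit
    rel_tol        : accepted relative mass deviation
    """
    names = sorted(subunit_masses)
    matches = []
    for counts in product(range(max_copies + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # skip the empty assembly
        mass = sum(n * subunit_masses[name] for n, name in zip(counts, names))
        if abs(mass - measured_mass) <= rel_tol * measured_mass:
            matches.append(dict(zip(names, counts)))
    return matches


# Hypothetical subunit masses (Da) and a hypothetical measured mass.
subunits = {"A": 17000.0, "B": 18500.0}
hits = consistent_stoichiometries(34010.0, subunits)
print(hits)
```

Only the stoichiometries returned by such a filter would then be submitted to a multimer prediction. The AI/MS route would apply the same comparison after prediction, to rank models that have already been generated.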

AUTHOR CONTRIBUTIONS

Timothy M. Allison: Conceptualization (equal); writing – review and editing (equal). Matteo T. Degiacomi: Conceptualization (equal); writing – review and editing (equal). Erik G. Marklund: Conceptualization (equal); writing – review and editing (equal). Luca Jovine: Conceptualization (supporting); writing – review and editing (equal). Arne Elofsson: Conceptualization (supporting); writing – review and editing (equal). Justin L. P. Benesch: Conceptualization (supporting); writing – review and editing (equal). Michael Landreh: Conceptualization (equal); project administration (lead); writing – original draft (lead); writing – review and editing (equal).

Figure S1 (A) Overlay of the residues that line the FMN binding pocket in human DHODH. The side‐chain orientations in the AF2 prediction of apo‐DHODH (gold) agree closely with the X‐ray structure for holo‐DHODH (blue), except for a 30° rotation of the imidazole moiety of H56 and the orientation of the terminal amine group of K100. (B) The PAE plots for the top‐scoring predictions of homodimeric sHSP18.1 and sHSP17.7 as well as the 17.7–18.1‐heterodimer show no significant differences.
