Surface Accessibility and Dynamics of Macromolecular Assemblies Probed by Covalent Labeling Mass Spectrometry and Integrative Modeling.

Carla Schmidt¹, Jamie A Macpherson², Andy M Lau³, Ken Wei Tan³, Franca Fraternali², Argyris Politis³.

Abstract

Mass spectrometry (MS) has become an indispensable tool for investigating the architectures and dynamics of macromolecular assemblies. Here we show that covalent labeling of solvent accessible residues followed by their MS-based identification yields modeling restraints that allow mapping the location and orientation of subunits within protein assemblies. Together with complementary restraints derived from cross-linking and native MS, we built native-like models of four heterocomplexes with known subunit structures and compared them with available X-ray crystal structures. The results demonstrated that covalent labeling followed by MS markedly increased the predictive power of the integrative modeling strategy enabling more accurate protein assembly models. We applied this strategy to the F-type ATP synthase from spinach chloroplasts (cATPase) providing a structural basis for its function as a nanomotor. By subjecting the models generated by our restraint-based strategy to molecular dynamics (MD) simulations, we revealed the conformational states of the peripheral stalk and assigned flexible regions in the enzyme. Our strategy can readily incorporate complementary chemical labeling strategies and we anticipate that it will be applicable to many other systems providing new insights into the structure and function of protein complexes.

Year: 2017 PMID: 28208298 PMCID: PMC5299547 DOI: 10.1021/acs.analchem.6b02875

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Mass spectrometry (MS) is an emerging technique in biophysics, and in the last two decades, it has gained in importance when studying the structure and dynamics of macromolecular protein assemblies.[1] Particularly those assemblies which exhibit a certain flexibility and heterogeneity or undergo dynamic interactions with their ligands are the primary targets of structural MS.[2] Various MS techniques each addressing a different question have evolved and are now commonly employed to gain information on composition, stoichiometry, topology, conformation and dynamics. Most commonly applied is chemical cross-linking,[3−5] a technique which involves covalent linkage of two amino acid side chains in close proximity thus allowing the identification of protein interactions by sequencing the cross-linked dipeptides after enzymatic digestion. MS of intact protein complexes, also called native MS, delivers protein stoichiometries and stable interaction modules enabling the generation of protein interaction networks.[6,7] Together with ion mobility (IM), native MS yields conformation and topology of proteins and their complexes.[8−10] Combining complementary information from chemical cross-linking and native MS delivers valuable insights into the structural arrangements of protein complexes.[11−13] While cross-linking and native MS identify protein interactions, labeling strategies such as covalent labeling[14] or hydrogen–deuterium exchange (HDX)[15,16] explore solvent accessible surfaces of protein–ligand assemblies. This is of particular interest when studying the dynamics of proteins and their conformational changes,[17,18] for instance upon ligand binding.[19] HDX utilizes the ability of protons to be exchanged with deuterium in solution. The slow exchange rate of protein backbone amide protons causes a mass shift of the protein/peptide, which can be probed by MS. 
Likewise, chemical labeling approaches introduce modifications to amino acid side chains which can be identified by standard proteomics. Very prominent is hydroxyl radical footprinting involving oxidation of various amino acid side chains.[20] Other labeling strategies employ chemical reagents which are reactive toward specific amino acid side chains.[14] Diethylpyrocarbonate (DEPC), employed in this study, was initially used to modify histidine residues. However, DEPC also modifies, with different reactivity, lysine, arginine, tyrosine, threonine and cysteine residues.[21,22] It is an efficient labeling reagent and can probe up to 30% of the protein amino acid sequence. Under acidic and basic conditions or in the presence of nucleophiles, however, DEPC labeling is reversible[23] and experimental conditions have to be carefully optimized.[24]

Structural modeling of proteins and their assemblies includes various computational techniques such as homology modeling, coarse-grained modeling, docking studies or structure prediction.[25−28] In addition, computational simulations can improve our understanding of the dynamic behavior of proteins and their ligands in solution[29] or in the gas phase.[30] The combination of MS approaches and computational methods is increasingly used to study protein complex structures and dynamics. Recent success of hybrid approaches is demonstrated by novel structures of the proteasome,[31,32] the ribosome,[33,34] eukaryotic initiation factors,[35,36] amyloid oligomers,[37] and ATP synthases.[38] A milestone in integrative analysis was the merging of complementary methods[39] and their integration with molecular electron microscopy (EM) maps[35] enabling atomic-level characterization of protein complexes. We introduce a strategy to study protein complex dynamics by extending the structural toolbox and integrating covalent labeling, cross-linking and native MS with computational modeling. 
For this, we convert the respective MS data into modeling restraints, which in turn are used to inform a scoring function for generating candidate model structures, and we analyze the prospective models using molecular dynamics simulations (Figure 1). We exemplify this strategy on four well-characterized protein complexes, tryptophan synthase, carbamoyl phosphate synthetase (CPS), the RvB1/RvB2 complex and the catalytic core of cATPase, for which crystal structures are available (Figure S1). We then utilize available information from previous studies together with novel findings on surface accessibility obtained here from covalent labeling and generate a model of the intact F-type ATP synthase purified from spinach chloroplasts. We also subject the top-scoring model to molecular dynamics simulations and identify dynamic and flexible regions within the macromolecular assembly, delivering insights into its function as a nanomotor. The strategy described here is applicable to any protein assembly and provides new opportunities in structural biology linking macromolecular models and their structural dynamics.

Figure 1

Strategy for protein assembly modeling. (A) Solvent accessibility, inter-residue proximities and disassembly pathways are encoded into modeling restraints. (B) A Bayesian scoring function is employed to build an ensemble of models that match the input data. (C) A representative structure from the top-scoring models is subjected to MD simulations to probe the conformational dynamics of the assembly.

Experimental Section

Protein Purification

Purified tryptophan synthase was a gift of I. Schlichting, Max Planck Institute for Medical Research, Heidelberg, Germany. The RvB1/RvB2 complex was a gift of Karl-Peter Hopfner, Ludwig Maximilian University, Munich, Germany. CPS was provided by F. Raushel, Texas A&M University, College Station. cATPase was purified from spinach leaves and reconstituted in DDM detergent micelles as described previously.[12,40]

DEPC Labeling

Approximately 10 μM of the purified protein complexes were incubated with 8.75, 17.5, 35, or 70 μM DEPC for 1 min at 37 °C. The reaction was quenched by addition of 10 mM imidazole. After quenching, the reaction mixture was kept on ice. The proteins were then precipitated with ethanol for 2 h and subsequently digested with trypsin in the presence of RapiGest (Waters) according to the manufacturer’s protocols.

LC-MS/MS

Dried peptides of cATPase and tryptophan synthase were dissolved in 1% (v/v) formic acid and separated by nanoflow-liquid chromatography on a Dionex UltiMate 3000 RSLC nano System (Thermo Scientific); mobile phase A, 0.1% (v/v) formic acid (FA); mobile phase B, 80% (v/v) acetonitrile/0.1% (v/v) FA. The peptides were loaded onto a precolumn (HPLC column Acclaim PepMap 100, C18, 100 μm I.D. particle size 5 μm; Thermo Scientific) and separated on an analytical column (50 cm, HPLC column Acclaim PepMap 100, C18, 75 μm I.D. particle size 3 μm; Thermo Scientific) at a flow rate of 300 nL/min with a gradient of 5–80% solvent B over 80 min. Peptides were directly eluted into an LTQ-Orbitrap XL hybrid mass spectrometer (Thermo Scientific). MS conditions were: spray voltage of 1.6 kV; capillary temperature of 180 °C; normalized collision energy 35% (q = 0.25, activation time 30 ms). The LTQ-Orbitrap XL was operated in data-dependent mode. MS spectra were acquired in the orbitrap (m/z 300–2000) with a resolution of 30 000 at m/z 400 and an automatic gain control target of 10⁶. The five most intense ions were selected for CID fragmentation in the linear ion trap at an automatic gain control target of 30 000. Previously selected ions were dynamically excluded for 30 s. Singly charged ions as well as ions with unrecognized charge state were also excluded. Internal calibration of the orbitrap was performed using the lock mass option.[41] Peptides and labeled sites were identified using MassMatrix Database Search Engine.[70] Search parameters were as follows: tryptic peptides with a maximum of two missed cleavage sites; carbamidomethylation of cysteine, oxidation of methionine and DEPC-labeled serine, threonine, tyrosine and histidine as variable modifications; mass accuracy filter, 10 ppm for precursor ions, 0.8 Da for fragment ions; minimum pp and pp2 values 5.0, minimum pptag 1.3. 
Dried peptides of RvB1/2 and CPS complexes were dissolved in 2% (v/v) ACN, 0.1% FA and separated by nanoflow-liquid chromatography on a Dionex UltiMate 3000 RSLC nano System (Thermo Scientific); mobile phase A, 0.1% (v/v) formic acid (FA); mobile phase B, 80% (v/v) acetonitrile/0.1% (v/v) FA. The peptides were loaded onto a precolumn (HPLC column Acclaim PepMap 100, C18, 100 μm I.D. particle size 5 μm; Thermo Scientific) and separated on an analytical column (50 cm, HPLC column Acclaim PepMap 100, C18, 75 μm I.D. particle size 3 μm; Thermo Scientific) at a flow rate of 300 nL/min with a gradient of 8–90% solvent B over 62 min. Peptides were directly eluted into a Q Exactive Plus Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Scientific). MS conditions were as follows: spray voltage of 1.6 kV; capillary temperature of 250 °C; normalized collision energy 30. The Q Exactive Plus mass spectrometer was operated in data-dependent mode. MS spectra were acquired in the orbitrap (m/z 350–1600) with a resolution of 70 000 and an automatic gain control target of 3 × 10⁶. The 20 most intense ions were selected for HCD fragmentation at an automatic gain control target of 1 × 10⁵. Previously selected ions were dynamically excluded for 30 s. Singly charged ions, as well as ions with unrecognized charge state, were also excluded. Internal calibration of the orbitrap was performed using the lock mass option.[41] Peptides and labeled sites were identified using Mascot Search Engine v2.3.02. Search parameters were as follows: tryptic peptides with a maximum of two missed cleavage sites; carbamidomethylation of cysteine, oxidation of methionine and DEPC-labeled serine, threonine, tyrosine, and histidine as variable modifications; mass accuracy filter, 10 ppm for precursor ions, 0.02 Da for fragment ions.

Chemical Cross-Linking of Tryptophan Synthase

Twenty microliters of 20 μM tryptophan synthase were incubated with 20 μL of 2.5 mM bis(sulfosuccinimidyl)suberate (BS3) cross-linker for 1 h at 25 °C at 350 rpm in a thermomixer. After cross-linking, proteins were precipitated with ethanol and digested with trypsin in the presence of RapiGest (Waters) according to manufacturer’s protocols. Cross-linked peptides were further separated using SCX Stage Tips (Thermo Scientific) according to the manufacturer’s protocol. Peptides were then analyzed by MS and identified as described previously.[12]

Chemical Cross-Linking of CPS and RvB1/B2 Complexes

Ten microliters of 10 μM CPS and 5 μL of 25 μM RvB1/2 were incubated with various concentrations of BS3 cross-linker (final concentrations = 0.5, 0.83, and 1.25 mM) for 1 h at 25 °C at 350 rpm in a thermomixer. Cross-linked proteins were separated by gel electrophoresis (NuPAGE, Invitrogen) and digested in gel as described.[42] Peptides were then analyzed by MS and identified as described previously.[12]

Native Mass Spectrometry

Native MS experiments on tryptophan synthase, CPS, and RvB1/2 were performed on a quadrupole time-of-flight mass spectrometer (Synapt G2Si HDMS, Waters Corp., Manchester, UK). Ten micromolar purified sample was buffer-exchanged in 200 mM ammonium acetate and electrosprayed using gold coated glass capillaries prepared in-house.[43] Typical MS parameters were capillary voltage 1.5–1.7 kV, sampling cone voltage 25–40 V, collision voltage 20 V, bias voltage 20 V, trap collision energy 5 V. MS spectra were processed and analyzed using Masslynx 4.1 (Waters). The spectra were calibrated externally using CsI. Backing pressure: 3.84 mbar. Trap: 0.04 mbar. Helium cell: 3.5 mbar. IMS: 2.6 mbar. In solution disruption was performed by addition of an organic solvent to the protein complex in ammonium acetate (AA) buffer as described elsewhere.[44] Subcomplexes were generated using 10–40% methanol, dimethyl sulfoxide (DMSO), and acetonitrile (ACN).

Modeling Restraints from Covalent-Labeling MS

Solvent accessibility information from covalent labeling followed by MS was converted into modeling restraints using in-house developed code (https://github.com/apolitis/covalent_labelling_MS). This code iteratively estimates the solvent accessible surface area (SASA) for each residue within all models generated using our sampling algorithm. To calculate the SASA on the surface of each residue, we simulated the rolling motion of a sphere using a solvent accessible surface function (see Figure S-3). In this function, the probe radius of the sphere was 1.8 Å and the sampling density used for area estimation was 5.0 per Å². The function uses a set of nodal points, each described by xyz coordinates and a radius, to compute the SASA values. Overall, we report a dimensionless SASA ratio defined as

[equation lost in extraction]

The returned SASA value per residue is implemented as a structural restraint using a threshold value of 0.25: if SASA > 0.25, residue x is considered exposed, and if SASA < 0.25, residue x is considered buried, where x denotes the amino-acid residue. We iteratively applied this algorithm to all structural models of cATPase, tryptophan synthase, CPS, and RvB1/B2 generated using our Monte Carlo-based strategy. Briefly, we used the list of labeled residues from our covalent labeling mass spectrometry experiments (Tables S4–S7) to interrogate the structural models by satisfaction of modeling restraints. A model was considered to satisfy the restraint for a specific labeled residue x (histidine, threonine, tyrosine or serine) when the SASA for this residue is greater than 0.25, and to violate it when the SASA is less than 0.25. For each model structure generated, we examined all restraints corresponding to labeled residues, and the total score was calculated as

$$S_{\mathrm{SASA},i} = \frac{R_S}{R_T}$$

where $S_{\mathrm{SASA},i}$ is the score for each model structure i (i = 1, 2, 3, ...), which takes values between 0 and 1, $R_S$ is the number of covalent labeling restraints satisfied in the structure and $R_T$ is the number of all restraints used, which correspond to the labeled residues from covalent labeling experiments. The SASA scoring algorithm was implemented within the Integrative Modelling Platform (IMP).[25]
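
The restraint described above can be illustrated with a minimal Python sketch. It is not the code from the authors' repository: the function names, residue identifiers, and SASA values are hypothetical, and the score is assumed to be the fraction of labeled residues that exceed the 0.25 cutoff in a given model.

```python
from typing import Iterable, Mapping

SASA_THRESHOLD = 0.25  # relative SASA cutoff quoted in the text


def is_exposed(relative_sasa: float, threshold: float = SASA_THRESHOLD) -> bool:
    """A residue counts as exposed when its relative SASA exceeds the cutoff."""
    return relative_sasa > threshold


def sasa_score(model_sasa: Mapping[str, float], labeled_residues: Iterable[str]) -> float:
    """Fraction of labeled residues that are exposed in the model (satisfied / total)."""
    labeled = list(labeled_residues)
    if not labeled:
        raise ValueError("at least one labeled residue is required")
    # Residues missing from the model are treated as buried (an assumption).
    satisfied = sum(1 for res in labeled if is_exposed(model_sasa.get(res, 0.0)))
    return satisfied / len(labeled)


# Hypothetical per-residue relative SASA values for one candidate model.
example_model = {"A:HIS45": 0.62, "A:TYR102": 0.10, "B:SER12": 0.31, "B:THR88": 0.27}
example_labels = ["A:HIS45", "A:TYR102", "B:SER12", "B:THR88"]
example_score = sasa_score(example_model, example_labels)  # 3 of 4 restraints satisfied
```

In this toy case one labeled tyrosine is buried in the model, so the model is penalized relative to one that exposes all four labeled residues.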

Integrative Modeling

We used an integrative modeling strategy for MS data.[36,39] Structural models of the assemblies were generated using a Monte Carlo search algorithm developed in-house and implemented into IMP.[25] The model building was guided by a scoring function, which estimates the probability of a structural model given existing knowledge of the investigated system and the MS data acquired. The posterior probability P(M|DMS, PI) for MS data (DMS) and prior information (PI) is

$$P(M \mid D_{MS}, PI) \propto P(D_{MS} \mid M, PI)\, P(M \mid PI)$$

where P(M|PI) is the prior, the probability of a model given only existing information on the system, and P(DMS|M, PI) is the likelihood function, expressed as the probability of observing MS data given a structural model and knowledge of the system in question. The score is calculated as the negative logarithm of the likelihood and the existing information (called prior):

$$S(M) = -\log\left[ P(D_{MS} \mid M, PI)\, P(M \mid PI) \right]$$

The most likely structural model scores higher according to the posterior distribution. The prior P(M|PI) is the prior probability P(M) accounting for intersubunit connectivities, solvent accessibility, distance restraints and an additional parameter composed of uncertainties; these are the false positives for native MS, cross-linking and covalent labeling MS. The likelihood function P(DMS|M, PI) for a data point of a data set D of experimentally measured connectivities (native MS, cross-linking MS), distance restraints (cross-linking MS) and solvent accessibilities (CL-MS) is given as

[equation lost in extraction]

where Y is the structure coordinates, σ the uncertainty, α denotes other parameters, such as ambiguities due to flexibilities, and ω is the weight. The forward function (f) predicts the data points, that is, it randomly picks a residue that is solvent exposed at a given time point in the experimental measurement (CL-MS) and adopts a conformation consistent with the given connectivities and distance restraints. The uncertainty corresponds to the data points from both measurements that are inconsistent with the structure Y. 
We judged the uniqueness of the ensemble of generated models by performing ensemble analysis (e.g., clustering of best-ranking solutions), and the final solution was selected from the major cluster.[44] The Visual Molecular Dynamics (VMD) and the UCSF Chimera packages were used for visualization of the model structures.[45]
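
As a small illustration of the scoring idea in this section, the sketch below ranks candidate models by the negative logarithm of likelihood times prior. The function name, the model names, and the probability values are hypothetical; the actual likelihood used in the study is not reproduced here. Under this convention, a lower negative-log score corresponds to a higher posterior probability.

```python
import math


def bayesian_score(likelihood: float, prior: float) -> float:
    """Negative logarithm of likelihood times prior (lower means more probable)."""
    if likelihood <= 0.0 or prior <= 0.0:
        return math.inf  # a model with zero probability can never be selected
    return -(math.log(likelihood) + math.log(prior))


# Hypothetical (likelihood, prior) pairs for three candidate models.
candidates = {
    "model_a": (0.20, 0.50),
    "model_b": (0.60, 0.50),
    "model_c": (0.05, 0.90),
}

# Sort from most to least probable under the posterior.
ranked = sorted(candidates, key=lambda name: bayesian_score(*candidates[name]))
```

Working in log space avoids numerical underflow when many independent restraints each contribute a small probability.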

Distance Restraints from Cross-Linking MS

Upper bound distance restraints (35 Å) were specified from the cross-links identified by applying a cross-linking strategy followed by MS.[36,42] The individual links were implemented into our modeling approach, enabling us to guide the search for candidate model structures that fit the input MS data.
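
A minimal sketch of how such an upper-bound restraint might be checked against a candidate model is shown below. It assumes the distance is measured between one representative coordinate per cross-linked residue (for example the Cα atom, which the text does not specify); the function names and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]
MAX_CROSSLINK_DISTANCE = 35.0  # upper bound in Å, as quoted in the text


def crosslink_satisfied(a: Coordinate, b: Coordinate,
                        upper_bound: float = MAX_CROSSLINK_DISTANCE) -> bool:
    """True when the two cross-linked residues lie within the upper bound."""
    return math.dist(a, b) <= upper_bound


def count_violations(pairs: Sequence[Tuple[Coordinate, Coordinate]]) -> int:
    """Number of cross-links whose residue pair is farther apart than allowed."""
    return sum(1 for a, b in pairs if not crosslink_satisfied(a, b))


# Hypothetical coordinates (Å) for two cross-linked residue pairs in one model.
example_pairs = [
    ((0.0, 0.0, 0.0), (30.0, 0.0, 0.0)),  # 30 Å apart: satisfied
    ((0.0, 0.0, 0.0), (0.0, 40.0, 0.0)),  # 40 Å apart: violated
]
example_violations = count_violations(example_pairs)
```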

Simulations in Explicit Solvent

Explicit solvent MD simulations of the ATPase protein complex were performed and analyzed using the GROMACS 4.6 program[46] with the Amber99sb*-ildn force field parameters.[47] The input structure of the F1 cATPase was assembled from its individual components (crystal structure and homology models) using an MS-restrained strategy as described elsewhere.[39] The initial complex structure, consisting of 56 826 protein heavy atoms, was solvated and minimized in a dodecahedral periodic box of 952 838 TIP3P water molecules[48] with a minimum distance of 1.0 nm between any protein atom and the periodic box. The system charge was neutralized by adding 75 sodium counterions to the solvent. The equations of motion were integrated using the leapfrog method[49] with a 2 fs time step. The following equilibration protocol was used. An initial 500 steps of steepest descent energy minimization in solution were followed by an equilibration of the system in the canonical ensemble with harmonic positional restraints on the protein heavy atoms, using a force constant of 10 000 kJ/mol/nm² that was gradually reduced to 1000 kJ/mol/nm², while increasing the temperature from 50 to 300 K at constant volume. During this NVT ensemble equilibration, the Berendsen algorithm[49] was employed to regulate the temperature and pressure of the system with coupling constants of 0.2 and 0.5 ps, respectively. A 5 ns NVT equilibration run at 300 K and 1 bar was then performed, followed by 2 ns of equilibration in NPT conditions. After successful equilibration of the system, the cATPase complex was simulated for 40 ns under constant pressure and temperature conditions. Temperature was regulated using the velocity-rescaling algorithm,[50] with a coupling constant (τ) of 0.1. All protein covalent bonds were frozen with the LINCS method,[51] while SETTLE[52] was used for water molecules. 
Electrostatic interactions were calculated with the particle mesh Ewald method,[53] with a 1.4 nm cutoff for direct space sums, a 0.12 nm FFT grid spacing and a fourth-order interpolation polynomial for the reciprocal space sums. Van der Waals interactions were calculated using a 1.4 nm cutoff. The neighbor list for noncovalent interactions was updated every five integration steps.
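
For readers who want to map the production-run settings above onto a GROMACS parameter file, the following Python sketch collects them under standard .mdp option names and renders them as text. This is an illustrative fragment, not the authors' input file: the mapping of each quoted value to an option is our assumption, the thermostat coupling constant is given in the text without a unit (GROMACS reads `tau_t` in ps), and a usable file would need further options (coupling groups, reference temperature, pressure coupling, output control).

```python
TIME_STEP_PS = 0.002   # 2 fs time step
PRODUCTION_NS = 40     # length of the production run

# Values quoted in the text, keyed by GROMACS .mdp option names (assumed mapping).
mdp_settings = {
    "integrator": "md",                # leap-frog integrator
    "dt": TIME_STEP_PS,
    "nsteps": round(PRODUCTION_NS * 1000 / TIME_STEP_PS),
    "tcoupl": "v-rescale",             # velocity-rescaling thermostat
    "tau_t": 0.1,                      # unit not stated in the text
    "constraints": "all-bonds",
    "constraint_algorithm": "lincs",
    "coulombtype": "PME",
    "rcoulomb": 1.4,                   # nm, direct-space cutoff
    "fourierspacing": 0.12,            # nm, FFT grid spacing
    "pme_order": 4,                    # interpolation order
    "rvdw": 1.4,                       # nm, van der Waals cutoff
    "nstlist": 5,                      # neighbor list update interval in steps
}


def render_mdp(settings: dict) -> str:
    """Format the settings as 'key = value' lines in .mdp style."""
    return "\n".join(f"{key:<22} = {value}" for key, value in settings.items())


mdp_text = render_mdp(mdp_settings)
```

The step count follows directly from the run length: 40 ns at 2 fs per step is 20 million integration steps.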

Modeling of the Peripheral Stalk

We performed homology modeling of the peripheral stalks using the MODELLER package.[54] We obtained a reliable homology model (sequence identity >25%) using as templates the Thermus thermophilus H-type (PDB ID 3V6I) and bovine mitochondrial (PDB ID 2CLY) ATPases. To compensate for the lack of the lower part of the stalks linking the core with the transmembrane ring, we modeled in the helices, using as a guide the distance estimated from the missing residues. Homology models of the ε, δ, and γ subunits were also utilized as previously described.[12]

Modeling Scripts, Data, and Results

Our integrative method was implemented in the open source IMP software package (http://integrativemodeling.org). The input data files, modeling scripts, and output models for the tryptophan synthase and cATPase complex are available at https://github.com/apolitis/covalent_labelling_MS. This will allow interested scientists to use our data and/or integrate them with their own results for protein assembly modeling.

Results and Discussion

Integrating Covalent Labeling into Computational Modeling

We assessed the predictive power of our integrative method for three-dimensional protein modeling based on structural MS restraints on four protein complexes previously characterized by X-ray crystallography: the 143 kDa tryptophan synthase from Salmonella typhimurium (PDB ID 1WBJ),[55] the α4β4 CPS (PDB ID 1BXR, ∼640 kDa), the double-heterohexameric ring RvB1/2 (PDB ID 4WVY; 621 kDa) (Figure S1) and the hexameric α3β3-head of cATPase from Spinacia oleracea (PDB ID 1FX0; ∼328 kDa).[56] Covalent labeling using DEPC, cross-linking with BS3 (Figure S-2) and native MS (Figure S-3) allowed us to label serine, threonine, tyrosine, and histidine residues on the surface of the complex, map cross-linked lysines and define stable subcomplexes, respectively. Overall, we identified inter- and intrasubunit cross-links (Tables S1–S3), up to 151 labeled residues (Tables S4, S5, and S7) and several (sub)complexes for tryptophan synthase, CPS and RvB1/2, respectively (Figures 2A, 3A and B, and S4). For the cATPase hexameric head we used previously published cross-linking and native mass spectrometry results[12] and in this study identified 58 solvent-exposed residues (Table S7). With the complementary MS-based data in hand, we applied a computational workflow by first encoding our data into modeling restraints (Figure 1A) and then using a scoring function to guide generation of structural models (Figures 1B and S5 and Experimental Section).

Figure 2

Benchmark analysis on tryptophan synthase. (A) Native MS of the intact complex yielded disassembly pathway. (B) Cross-linking circular plot. (C) The precision of the methodology was estimated by calculating positive predictive values (PPVs) for different amounts of theoretical covalent labeling restraints while using the experimentally available restraints from native and cross-linking MS. (D) ROC curves, plotting the true positive rate (sensitivity) versus false positive rate (1-specificity), to evaluate the confidence level of the restraints. (E) Peptide level analysis plotting the frequency of the DEPC total labeled residues and the number of spectra per concentration shows an increase in the labeled residues/spectra with increased concentrations. (F) Representative model of the tetrameric tryptophan synthase and its corresponding crystal structure. Inter-residue proximities (XL-MS) and residue solvent accessibilities (CL-MS) are highlighted. The structural similarity of the model to the X-ray structure was assessed using their pairwise r.m.s.d.

Figure 3

Benchmark analysis on carbamoyl phosphate synthetase (CPS) and the RvB1/B2 heterododecamer. (A, B) Native and cross-linking MS reveal distinct (sub)complexes and intra- and intersubunit amino-acid level proximities. Identified oligomeric cross-links are shown in the small circular inset. (C, D) Integrative modeling results in models in good agreement with the reference crystal structures. (E, F) Peptide level analysis plotting the frequency of the DEPC total labeled residues and the number of spectra per concentration shows an increase in the labeled residues/spectra with increased concentrations.

The covalent labeling experiments enabled solvent accessible surface area (SASA) restraints. A SASA restraint is considered to be satisfied if, for each experimentally labeled residue, the theoretically predicted SASA is greater than 25% (Figure S6 and Experimental Section). 
We plotted the fraction of satisfied residues on the corresponding crystal structure as a function of the percentage of SASA, providing justification for its use as a lower bound restraint for modeling (Figure S6). The cutoff is defined as the highest SASA score that gives <10% false positives while the true positives remain over 80% of the total models. The cross-linking experiments allowed upper bound distance restraints (<35 Å).[39] This distance breaks down into 11.4 Å for the linker (BS3), approximately 13 Å for the two lysine side chains and an additional tolerance of 10 Å accounting for flexibility due to protein motion. The resulting models from application of these restraints were considered to match the data and added to the ensemble that is passed on to the next stage for additional analysis. Clustering analysis[36,44] revealed an ensemble of models with close similarity to the reference crystal structure (r.m.s.d. ranging from 9 to 15 Å) (Figures 2 and 3C and D). Finally, a representative structure from the ensemble was used as a starting model for explicit solvent MD simulations (Figure 1C).

Evaluation of the modeling approach

Having established the validity of using SASA restraints for modeling, we examined the ability of our approach to predict high-resolution models using different levels of theoretically labeled residues ranging from 25% to 100%. Complete theoretical labeling information was extracted from the corresponding crystal structure by assuming as labeled those serines, threonines, tyrosines and histidines with theoretical SASA larger than 25%. The residues with SASA less than 25% were considered buried and therefore were not processed further. A model is defined as “good” when it exhibits high structural similarity to the reference crystal structure as calculated over Cα atoms (r.m.s.d. < 12 Å).[39] We estimated ∼90% positive predictive value (PPV) within the top-scoring models when all theoretical information was used and a difference of less than 10% PPV when the experimentally available data were used (Figure 2C). To investigate the merit of modeling restraints in predicting the correct structure of the three training complexes (tryptophan synthase, CPS and RvB1/2), we determined receiver-operating characteristic (ROC) plots for the MS techniques employed (Figures 2D and S7). This enabled us to test the ability of our method to generate correct model structures on systems with diverse topological features that include symmetry, ring-like geometries and heteromeric subunits. The area under each curve was determined as a measure of the information content of each restraint, where 0.5 indicates that correct and incorrect structures cannot be discriminated.[44,57] The ROC plots of all three complexes studied here show that inclusion of solvent accessibility restraints from covalent labeling markedly increased (∼8–11%) the accuracy of structural prediction (Figures 2D and S7). Increasing the accuracy of the predictions by approximately 10% is an important improvement of the method, particularly when building models of multiprotein systems requires a large number of models. 
For instance, if 10 000 models are generated, a 10% higher accuracy means that the structural prediction leads to 1000 fewer false positive and false negative models and therefore allows an increased number of “good models” within the top-scoring model structures. This is particularly important for assembling multicomponent systems in a stepwise manner, where degeneracy can significantly hinder the accuracy of the resulting predictions.
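
The area under a ROC curve used in this evaluation can be computed directly from model scores and their good/bad classification. The sketch below uses the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve; it is a generic illustration rather than the authors' analysis code. The r.m.s.d. and score values are hypothetical, higher scores are assumed to mean better agreement with the restraints, and "good" follows the r.m.s.d. < 12 Å definition given above.

```python
from typing import Sequence

GOOD_RMSD_CUTOFF = 12.0  # Å, definition of a "good" model in the text


def roc_auc(scores: Sequence[float], is_good: Sequence[bool]) -> float:
    """Probability that a random good model outscores a random bad one (ties count half)."""
    good = [s for s, g in zip(scores, is_good) if g]
    bad = [s for s, g in zip(scores, is_good) if not g]
    if not good or not bad:
        raise ValueError("need at least one good and one bad model")
    wins = sum((g > b) + 0.5 * (g == b) for g in good for b in bad)
    return wins / (len(good) * len(bad))


# Hypothetical models: r.m.s.d. to the crystal structure and restraint-based score.
example_rmsd = [8.0, 10.0, 14.0, 20.0, 11.0, 16.0]
example_scores = [0.90, 0.80, 0.85, 0.30, 0.60, 0.40]
example_good = [r < GOOD_RMSD_CUTOFF for r in example_rmsd]
example_auc = roc_auc(example_scores, example_good)  # 7 of 9 good/bad pairs ranked correctly
```

An area of 0.5 corresponds to scores that cannot separate good from bad models, and 1.0 to perfect separation.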

Concentration Dependence of DEPC Labeling

To assess the effect of concentration on the labeling efficiency, we covalently labeled solvent accessible residues on the three training complexes using a range of labeling concentrations (8.75–70 μM) (Figures 2E, 3E and F, and S8). Using DEPC we targeted histidine, threonine, tyrosine and serine residues, covering ∼15–20% of the complex sequence. We plotted the number of experimentally labeled residues over the range of experimental concentrations, revealing a significant increase in labeled residues (10–30%) at higher concentrations (Figures 2E, 3E and F, and S8). A similar trend was found by counting the total number of spectra measured at each concentration used for the experiments (Figures 2E and 3E and F). Overall, we estimated that 5–10% of the total residues were uniquely identified at the two lower concentrations. For modeling purposes we accounted for all labeled residues appearing at one or more concentrations. To study the accuracy and precision of the SASA restraint from covalent labeling followed by MS, we projected the labeled serines, threonines, tyrosines and histidines on the crystal structures of tryptophan synthase and the cATPase head and examined their SASA (Figure 4). We found high accuracy (>85%) and precision (>80%), confirming the lower bound SASA as a confident restraint for modeling in all benchmark complexes examined in the study.

Figure 4

Benchmark analysis of the SASA restraint derived from covalent labeling MS experiments. We assessed (A) the sensitivity, specificity, and accuracy and (B) the negative predictive value (NPV), positive predictive value (PPV), and false discovery rate (FDR) using SASA as a restraint, with the existing crystal structures of tryptophan synthase, CPS, RvB1/B2 and cATPase (F1) as reference models. SASA values for all residues in the above structures were calculated and compared to the identified labeled sites from covalent labeling MS. A positive or correctly labeled residue is defined as a labeled residue with SASA more than 0.25. False positives or incorrectly labeled residues are labeled residues with calculated SASA below 0.25. Nonexperimentally labeled residues with calculated SASA below 0.25 in the corresponding structure represent true negatives, while false negatives have SASA above 0.25. Sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/(TP + FP + TN + FN), FDR = FP/(TP + FP), NPV = TN/(TN + FN), and PPV = TP/(TP + FP). TP: True positive. FP: False positive. FN: False negative. TN: True negative.
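
The definitions in this caption translate into a short calculation. The sketch below tallies the four outcome classes from per-residue SASA values and labeling status, then evaluates the six statistics with the formulas quoted above. The names and input values are hypothetical, and a residue with SASA exactly at 0.25 is treated as buried, a boundary case the text leaves open.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self) -> Dict[str, float]:
        """Statistics as defined in the caption (raises ZeroDivisionError on empty classes)."""
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "accuracy": (self.tp + self.tn) / total,
            "fdr": self.fp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "ppv": self.tp / (self.tp + self.fp),
        }


def tally(relative_sasa: Sequence[float], labeled: Sequence[bool],
          threshold: float = 0.25) -> Confusion:
    """Classify each residue by labeling status and calculated exposure."""
    tp = fp = tn = fn = 0
    for sasa, is_labeled in zip(relative_sasa, labeled):
        exposed = sasa > threshold
        if is_labeled and exposed:
            tp += 1   # labeled and exposed in the structure
        elif is_labeled:
            fp += 1   # labeled but buried in the structure
        elif exposed:
            fn += 1   # exposed but not labeled
        else:
            tn += 1   # buried and not labeled
    return Confusion(tp, fp, tn, fn)


# Hypothetical residues: calculated relative SASA and experimental labeling status.
example_sasa = [0.60, 0.10, 0.05, 0.40, 0.30, 0.20]
example_labeled = [True, True, False, False, True, False]
example_counts = tally(example_sasa, example_labeled)
example_metrics = example_counts.metrics()
```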

Solvent Accessibility and Modeling of cATPase

Next, we assembled a model of the intact cATPase from Spinacia oleracea. The cATPase generates ATP from ADP and inorganic phosphate using an electrochemical proton gradient across the thylakoid membrane.[56] Its stoichiometry is α3 β3 γδε–I–II–III14–IV;[12] however, structural information is limited to crystal structures of the soluble catalytic head (α3 β3)[56] and the III14 transmembrane ring.[58] Little is known about the structural dynamics of the individual subunits within the assembly. From studies on other ATP synthases, we expect enhanced dynamics for the peripheral stalk, a stator that links the soluble and membrane domains and counteracts the torque from “wobbling” of the soluble head during motor rotation[59] of the γ-subunit.[60] We covalently labeled solvent accessible residues on the surface of cATPase (Figure A and B). Labeling at different concentrations of DEPC (8.75–70 μM) yielded 75 labeled residues (Table S7) in all protein subunits except ring subunit III. We attribute the lack of labeled residues in the membrane ring subunit to the protective layer of the detergent micelle. However, we identified one labeling site on membrane subunit IV (Tyr 160).

Figure 5

Covalent labeling and cross-linking of cATPase. (A) Example spectrum of a labeled cATPase peptide. B- and y-ions are assigned. Fragment ions containing the DEPC-modification are shown in red. (B) Covalent labeling analysis reveals solvent accessible residues (gray space fillings) on the surface of the intact enzyme (left and middle panel). Complementary structural information was obtained from chemical cross-linking (right panel).

We used cross-links and dissociation pathways from native MS reported previously,[12] which provided 11 subcomplexes and a connectivity map (Figures S9 and S10).[36] Covalent labeling data were encoded into modeling restraints and, together with distance restraints from cross-linking, enabled us to map the inter-residue proximities and SASA of the cATPase (Figure B). Our restraint-based modeling approach brought together these complementary restraints (Experimental Section), allowing us to assemble a structural model of the cATPase (Figure B). As input we used the crystal structures of the α3β3 and ring III14 subcomplexes and homology models of subunits I, II, δ, γ, and ε (Experimental Section). We were unable to position subunit IV because only one residue was labeled and no cross-links or subcomplexes were observed. However, we unambiguously defined the orientation and proximities of the other subunits, showing a slight tilting (∼4°) of the central axis of the catalytic head with respect to the axis of the membrane ring,[59] consistent with crystal structures of mitochondrial ATPase[61−64] and a model of the V-type ATPase (Figure S11).[59,38,65]

MD Simulations Reveal Flexibility of cATPase

We used the assembled model of cATPase as a starting structure for explicit solvent MD simulations, allowing us to examine the architecture and dynamics of the enzyme. Similar to other ATPases, the cATPase γ stalk subunit consists of a globular domain interacting with the α/β-head and δ subunits. Their extended α helices link the F1 (head) and FO (transmembrane) domains. To allow for movements of the rotor during the catalytic cycle, the peripheral stalk must exhibit conformational flexibility. We therefore performed simulations for the F1 domain (α3β3γδε-I II) (Figure S12 and Experimental Section). Consistent with other ATPases,[59,65] we found significant flexibility of subunits I, II, γ and δ as calculated by the r.m.s.d. and r.m.s.f. (Figures A and B and S13). We projected the r.m.s.f. profiles onto the surface of the cATPase to visualize dynamic regions in the assembly. Particularly flexible regions were found within the peripheral stalk and γ subunits (Figure B). In line with a recent study,[66] these regions are connected through a rigid section, which may allow the stalk and the γ subunit to retain their rigidity while accommodating the wobbling motion from rotary catalysis. The γ subunit contains an additional loop compared with other ATPases, which is responsible for its deactivation in the absence of light.[60] Interestingly, the fluctuation of the γ subunit predicted by our method includes the ∼40 amino acid long loop segment (residues 197–240). It is tempting to speculate that the flexibility of this loop is related to its role in activation/deactivation of the enzyme, suggesting conformational changes during the transition from one state to the other.
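For readers unfamiliar with the two flexibility measures used above, the sketch below shows how r.m.s.f. (per atom, over time) and r.m.s.d. (per frame, relative to a reference) are typically computed from trajectory coordinates. It is an illustration with NumPy, not the analysis pipeline of the study; the function names and the assumption that frames have already been superimposed on a common reference are ours.

```python
import numpy as np


def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom root-mean-square fluctuation.

    coords has shape (n_frames, n_atoms, 3) and is assumed to be
    pre-aligned, so that overall rotation/translation is removed.
    """
    mean_positions = coords.mean(axis=0)                        # (n_atoms, 3)
    squared_dev = ((coords - mean_positions) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    return np.sqrt(squared_dev.mean(axis=0))                    # (n_atoms,)


def rmsd_to_first(coords: np.ndarray) -> np.ndarray:
    """Per-frame root-mean-square deviation from the first frame."""
    diff = coords - coords[0]                                   # broadcast over frames
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))        # (n_frames,)


# Tiny synthetic trajectory: atom 0 is static, atom 1 oscillates along x.
toy_traj = np.array(
    [[[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
     [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
)
toy_rmsf = rmsf(toy_traj)
toy_rmsd = rmsd_to_first(toy_traj)
```

The r.m.s.f. averages over frames and therefore yields one value per atom, which is what allows it to be mapped onto the surface of a structure; the r.m.s.d. averages over atoms and yields one value per frame.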

Figure 6

Explicit solvent MD simulations of cATPase. (A) Plot of the residue r.m.s.f. within each subunit over time reveals regions of enhanced flexibility. (B) Mapping the predicted conformational fluctuation onto the surface of cATPase. (C, D) Principal component analysis revealed the local dynamics and directionality of motion within the peripheral stalk subunits. The scatterplot is colored with a gradient from green to red, indicating the “start” and “end” of the simulations, respectively.

To reduce the high dimensionality of the MD trajectories and to identify the dominant molecular motions of the peripheral stalks, we performed principal component analysis (PCA). We showed that both peripheral stalks undergo a “bending” motion, with particularly flexible regions located at the initial and terminal ends of the stalks (Figure C). The flexibility of the stalks is likely to be an intrinsic property enabling them to adjust during the catalytic rotation of the molecular motor. This is consistent with the twisting motion of the catalytic head of an A-ATPase proposed previously[59] and may be related to the intermediate states of rotary ATPases during ATP synthesis.[67,68]
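The PCA step described above can be illustrated with a generic implementation on Cartesian coordinates. This is a minimal sketch assuming pre-aligned frames and using a singular value decomposition in NumPy; the function name `trajectory_pca` and the choice of SVD are ours and do not reflect the specific software used in the study.

```python
import numpy as np


def trajectory_pca(coords: np.ndarray, n_components: int = 2):
    """PCA of an aligned trajectory with shape (n_frames, n_atoms, 3).

    Returns the per-frame projections onto the leading components,
    the component axes, and the fraction of variance each explains.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)      # (n_frames, 3 * n_atoms)
    centered = flat - flat.mean(axis=0)      # remove the average structure
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2 / (n_frames - 1)
    axes = vt[:n_components]
    projections = centered @ axes.T          # (n_frames, n_components)
    explained = variance[:n_components] / variance.sum()
    return projections, axes, explained


# Synthetic example: atom 0 is static, atom 1 drifts steadily along x.
toy_traj = np.zeros((5, 2, 3))
toy_traj[:, 1, 0] = np.arange(5.0)
toy_proj, toy_axes, toy_explained = trajectory_pca(toy_traj)
```

Plotting the first two columns of the projections against each other, colored by frame index, gives a scatterplot of the kind described in the figure caption, in which the color gradient tracks progress from the start to the end of the simulation.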

Conclusions

We presented here a strategy for interrogating the structure and dynamics of multiprotein assemblies. These assemblies are difficult to study by traditional tools, which limits our knowledge of their function. In our strategy, we incorporated modeling restraints derived from covalent labeling MS in the form of SASA. Using a scoring function, we integrated the SASA restraint with the connectivity and distance restraints from native and chemical cross-linking MS, respectively. We assessed the predictive power of the method by reconstructing the 3D assembly structures of tryptophan synthase, CPS and RvB1/2 with high accuracy and precision. The integration of a novel combination of MS-based methods markedly increased the predictability of the method, as shown by ROC plots, and enabled us to suggest a confident model of cATPase, a particularly challenging target in structural biology. We observed a ∼10% increase in the overall predictability of the integrative methodology when covalent labeling was added to native and cross-linking MS. Such an increase may have a significant effect in differentiating between closely related states and in cases where data sets are ambiguous or incomplete. In principle, our workflow allows the incorporation of any labeling strategy, and we expect that the application of complementary techniques targeting different amino acid side chains will improve the predictability even further. Such an increase in predictability can lead to high-confidence models of multiprotein complexes and is particularly important for systems for which other biophysical methods provide only limited information, such as the cATPase. We provided an additional dimension by subjecting the cATPase model structure to solution phase simulations, allowing us to assign flexible regions within the complex. Performing such simulations was made possible by the assembly of a confident model of cATPase from our restraint-based strategy, demonstrating how static structural predictions and dynamic simulations can be integrated to understand complex biological systems. Although the main strength of our strategy is its ability to incorporate various labeling methods simultaneously, it becomes more powerful when combined with high-resolution information on the individual assembly subunits.[69] We envision that the combination of labeling MS with accurate modeling and simulations may be used in the future to study many other multiprotein complexes that currently elude structure determination.

63 in total

Review 1. Probing protein structure by amino acid-specific covalent labeling and mass spectrometry.

Authors: Vanessa Leah Mendoza; Richard W Vachet
Journal: Mass Spectrom Rev Date: 2009 Sep-Oct Impact factor: 10.946

2. Conformational changes in proteins probed by hydrogen-exchange electrospray-ionization mass spectrometry.

Authors: V Katta; B T Chait
Journal: Rapid Commun Mass Spectrom Date: 1991-04 Impact factor: 2.419

3. Ethoxyformylation of proteins. Reaction of ethoxyformic anhydride with alpha-chymotrypsin, pepsin, and pancreatic ribonuclease at pH 4.

Authors: W B Melchior; D Fahrney
Journal: Biochemistry Date: 1970-01-20 Impact factor: 3.162

Review 4. Proteins, lipids, and water in the gas phase.

Authors: David van der Spoel; Erik G Marklund; Daniel S D Larsson; Carl Caleman
Journal: Macromol Biosci Date: 2011-01-10 Impact factor: 4.979

5. On the structural basis of the catalytic mechanism and the regulation of the alpha subunit of tryptophan synthase from Salmonella typhimurium and BX1 from maize, two evolutionarily related enzymes.

Authors: Victor Kulik; Elisabeth Hartmann; Michael Weyand; Monika Frey; Alfons Gierl; Dimitri Niks; Michael F Dunn; Ilme Schlichting
Journal: J Mol Biol Date: 2005-09-23 Impact factor: 5.469

6. Analysis of protein complexes with hydrogen exchange and mass spectrometry.

Authors: John R Engen
Journal: Analyst Date: 2003-06 Impact factor: 4.616

Review 7. Probing native protein structures by chemical cross-linking, mass spectrometry, and bioinformatics.

Authors: Alexander Leitner; Thomas Walzthoeni; Abdullah Kahraman; Franz Herzog; Oliver Rinner; Martin Beck; Ruedi Aebersold
Journal: Mol Cell Proteomics Date: 2010-03-31 Impact factor: 5.911

8. Improved side-chain torsion potentials for the Amber ff99SB protein force field.

Authors: Kresten Lindorff-Larsen; Stefano Piana; Kim Palmo; Paul Maragakis; John L Klepeis; Ron O Dror; David E Shaw
Journal: Proteins Date: 2010-06

9. Characterization of the flexibility of the peripheral stalk of prokaryotic rotary A-ATPases by atomistic simulations.

Authors: Kostas Papachristos; Stephen P Muench; Emanuele Paci
Journal: Proteins Date: 2016-06-01

Review 10. Molecular dynamics simulations: advances and applications.

Authors: Adam Hospital; Josep Ramon Goñi; Modesto Orozco; Josep L Gelpí
Journal: Adv Appl Bioinform Chem Date: 2015-11-19
