
Cheetah-MS: a web server to model protein complexes using tandem cross-linking mass spectrometry data.

Hamed Khakzad, Lotta Happonen, Johan Malmström, Lars Malmström.

Abstract

SUMMARY: Protein–protein interactions (PPIs) are central to many biological processes but are difficult to characterize, especially in complex, unfractionated samples. Chemical cross-linking combined with mass spectrometry (MS) and computational modeling is gaining recognition as a viable tool for studying protein interactions. Here, we introduce Cheetah-MS, a web server for predicting PPIs in complex sample mixtures. It combines the sensitivity of MS for analyzing complex samples with the resolution of protein–protein docking, and it produces the quaternary structure of the PPI of interest by analyzing tandem MS/MS data (also called MS2). Combining MS analysis with modeling increases sensitivity and, importantly, facilitates interpretation of the results.

AVAILABILITY: Cheetah-MS is freely available as a web server at https://www.txms.org.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2021 PMID: 34128979 PMCID: PMC8665757 DOI: 10.1093/bioinformatics/btab449

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Cross-linking mass spectrometry (XL-MS) is a powerful technique for measuring protein–protein interactions (PPIs) directly in complex samples (O’Reilly and Rappsilber, 2018). Bifunctional reagents covalently link pairs of specific residues while the proteins are in their native state. The proteins are then enzymatically digested, yielding many peptides, some of which are linked by the reagent. The spacer-arm length of the cross-linker sets an upper bound on the distance between the two cross-linked amino acids, and this distance information is used to identify and characterize the PPI. If enough cross-linked peptides are identified, macromolecular modeling tools such as Rosetta (Koehler Leman et al., 2020) can use them to build a structural model. Here, we present Cheetah-MS, a web server based on our previously published method, targeted chemical cross-linking MS (TX-MS), which deeply integrates protein structure modeling with chemical XL-MS (Hauri et al., 2019). Cheetah-MS converges quickly to a solution because it iteratively samples models and filters them with the identified XL peptides, which reduces the number of decoys that must be sampled by an order of magnitude. Cheetah-MS supports tandem MS/MS data acquired with non-cleavable cross-linkers (DSS/BS3, DSG and EGS) and can detect up to 12 post-translational modifications (PTMs).
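The distance bound described above can be made concrete with a short Python sketch. It assumes that each docking model is available as a mapping from hypothetical (chain, residue number) keys to Cα coordinates in Å, and it counts how many identified XLs a model satisfies under a Cα–Cα cutoff, the same quantity used to pick the final model. The values in `ASSUMED_CUTOFFS` are illustrative assumptions that allow for side chains and flexibility beyond the spacer arms themselves (about 11.4 Å for DSS/BS3, 7.7 Å for DSG and 16.1 Å for EGS); they are not Cheetah-MS defaults, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int]            # hypothetical (chain ID, residue number)
CrossLink = Tuple[ResidueKey, ResidueKey]
Model = Dict[ResidueKey, Coord]         # Cα coordinates in Å

# Illustrative Cα–Cα upper bounds (Å) per reagent: assumptions, not Cheetah-MS defaults.
ASSUMED_CUTOFFS: Dict[str, float] = {"DSS": 30.0, "BS3": 30.0, "DSG": 25.0, "EGS": 35.0}


def ca_distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two Cα atoms."""
    return math.dist(a, b)


def count_satisfied(model: Model, xls: Sequence[CrossLink], cutoff: float) -> int:
    """Number of cross-links whose Cα–Cα distance in `model` is within `cutoff`."""
    return sum(1 for r1, r2 in xls if ca_distance(model[r1], model[r2]) <= cutoff)


def best_model(models: Dict[str, Model], xls: Sequence[CrossLink],
               cutoff: float) -> Tuple[str, int]:
    """Return (model name, satisfied-XL count) for the model supporting the most XLs.

    Ties keep the first model in iteration order.
    """
    scores = {name: count_satisfied(m, xls, cutoff) for name, m in models.items()}
    name = max(scores, key=scores.__getitem__)
    return name, scores[name]


if __name__ == "__main__":
    xls = [(("A", 10), ("B", 5)), (("A", 20), ("B", 8))]
    models = {
        "model_1": {("A", 10): (0.0, 0.0, 0.0), ("B", 5): (10.0, 0.0, 0.0),
                    ("A", 20): (0.0, 0.0, 0.0), ("B", 8): (40.0, 0.0, 0.0)},
        "model_2": {("A", 10): (0.0, 0.0, 0.0), ("B", 5): (35.0, 0.0, 0.0),
                    ("A", 20): (0.0, 0.0, 0.0), ("B", 8): (50.0, 0.0, 0.0)},
    }
    print(best_model(models, xls, ASSUMED_CUTOFFS["DSS"]))  # ('model_1', 1)
```

Counting satisfied restraints is the simplest possible score; it ignores how far violated XLs exceed the cutoff and says nothing about the docking energy, which a fuller scheme might combine with it.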

2 Implementation

Cheetah-MS is implemented with applicake, a Python package, which makes the workflow easy to connect and flexible for further development. It consists of four main applicake nodes: PDB-tools, XL-generator, modeling-core and Taxlink (Fig. 1). The first node uses PDB-tools (Rodrigues et al., 2018) to clean up the input PDB files, recognize the chains, retrieve the sequences and combine the two structures into a starting conformational model. XL-generator then produces a complete list of all theoretically possible XLs, without applying any distance cutoff. This list is passed to Taxlink for MS/MS data analysis. If the input file is not already in Mascot Generic Format (MGF), msconvert from ProteoWizard (Kessner et al., 2008) converts the input mzML file to MGF. The MGF file is then filtered and cleaned according to the XL list from the previous step, so that only spectra containing the monoisotopic m/z of a candidate XL are kept in the filtered file. For each XL, a set of theoretical fragment ions is then generated, and the filtered MGF file is searched for this fragment pattern to find matching spectra. In modeling-core, the XLs selected by Taxlink are used to score a set of docking models generated by MEGADOCK 4.0 (Ohue et al., 2014; 2000 models for every run), and the top-scoring models are selected. Finally, the model that supports the largest number of XLs is chosen for visualization in the output.
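The spectrum-filtering step can be illustrated with a minimal Python sketch; it is not Taxlink's implementation. Assuming the match is made on the precursor, it computes the monoisotopic m/z of a cross-linked peptide pair as the sum of both peptide masses plus the mass the reagent adds when it bridges two residues (about 138.068 Da for DSS/BS3, 96.021 Da for DSG and 226.048 Da for EGS), then keeps only spectra whose precursor lies within a ppm tolerance. The (title, m/z, charge) tuple format, the 10 ppm default and the function names are hypothetical; PTMs, mono-links and mis-assigned monoisotopic peaks are ignored.

```python
from typing import Dict, List, Sequence, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once per linear peptide (termini)
PROTON = 1.007276   # mass of a proton
# Mass added by a non-cleavable reagent bridging two residues (Da).
LINKER_MASS: Dict[str, float] = {"DSS": 138.06808, "BS3": 138.06808,
                                 "DSG": 96.02113, "EGS": 226.04774}


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def xl_precursor_mz(pep_a: str, pep_b: str, reagent: str, charge: int) -> float:
    """Theoretical m/z of the [M + zH]z+ ion of a cross-linked peptide pair."""
    neutral = peptide_mass(pep_a) + peptide_mass(pep_b) + LINKER_MASS[reagent]
    return (neutral + charge * PROTON) / charge


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the relative mass error is at most `tol_ppm` parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def filter_spectra(precursors: Sequence[Tuple[str, float, int]], pep_a: str,
                   pep_b: str, reagent: str, tol_ppm: float = 10.0) -> List[str]:
    """Titles of spectra whose precursor m/z matches the candidate XL at its charge."""
    return [title for title, mz, z in precursors
            if within_ppm(mz, xl_precursor_mz(pep_a, pep_b, reagent, z), tol_ppm)]
```

The theoretical m/z is recomputed at each spectrum's own charge state, so a single candidate XL can match precursors observed at different charges.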

Fig. 1.

The computational workflow of Cheetah-MS. A Singularity container manages the workflow and the reporting system.

To run Cheetah-MS, users provide two PDB files and one MS/MS file in mzML format (or already converted to MGF) containing the XL-MS data. Advanced options include the cross-linking reagent, the PTM(s) of interest, the number of final models, the cutoff threshold for modeling, the delta window for precursor and product-ion detection and, finally, the intensity threshold used to remove background noise during MS/MS data analysis. After submission, a status page shows the job identifier at the top and the processing time of each submodule below it. Once the workflow has finished, the best-scoring model is visualized with the NGL viewer (Rose et al., 2018), together with a data analysis report in a Jupyter Notebook. The report is designed both to let users assess the results quickly and to let them download and extend the results for deeper, often project-specific, insights.
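Two of the MS/MS options named above, the intensity threshold and the delta window for product ions, can be sketched together in Python. This is a hypothetical illustration, not the server's code: the theoretical fragment m/z values are assumed to be precomputed for one candidate XL, peaks are (m/z, intensity) pairs, the threshold is an absolute intensity, and the window is a fixed tolerance in Da.

```python
import bisect
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def denoise(peaks: Sequence[Peak], min_intensity: float) -> List[Peak]:
    """Drop peaks below an absolute intensity threshold (simple noise model)."""
    return [p for p in peaks if p[1] >= min_intensity]


def matched_fragments(theoretical_mz: Sequence[float], peaks: Sequence[Peak],
                      delta: float, min_intensity: float) -> List[float]:
    """Theoretical fragment m/z values that have an observed peak within ±delta Da."""
    observed = sorted(mz for mz, _ in denoise(peaks, min_intensity))
    matches: List[float] = []
    for t in theoretical_mz:
        # First observed m/z >= t - delta; it is a match if it is also <= t + delta.
        i = bisect.bisect_left(observed, t - delta)
        if i < len(observed) and observed[i] <= t + delta:
            matches.append(t)
    return matches
```

Sorting the observed peaks once and using binary search keeps each lookup logarithmic in the number of peaks. The fraction of matched fragments could serve as a simple spectrum score, although the text does not specify how Taxlink scores a match.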

3 Results and applicability

Cheetah-MS has served as the core MS/MS analysis component of the TX-MS approach in several case studies; Table 1 lists the published studies in which it was used for MS/MS data analysis. In addition, to test the workflow in the web-server setting, we reconstructed the interactions of the Streptococcus pyogenes M1 protein with two human plasma proteins, fibrinogen and albumin, from MS/MS data acquired on samples of recombinant M1 protein and purified human plasma fibrinogen and albumin. This analysis identified 27 M1–fibrinogen XLs and 10 M1–albumin XLs. The detected XLs and the resulting models reproduce the binding interfaces obtained in the initial study (details are given on the web server manual page).

Table 1.

The applicability of Cheetah-MS as the core MS/MS analysis of the TX-MS approach in several case studies

Study	Partner proteins	No. of XLs
GAS M1 protein’s interactome (Hauri et al., 2019)	M1, fibrinogen, albumin, haptoglobin, SerpinA1, coagulation factor XIII A, C4BPa and IgG1	204
Membrane attack complex (Khakzad et al., 2020)	Complement proteins: C5b, C6, C7, C8 and C9	126
GAS M1 interaction with human IgGs (Khakzad et al., 2021)	M1, IgG1, IgG2, IgG3 and IgG4	21
Structure determination of dermatan sulfate epimerase 1 (Hasan et al., 2021)	DS-epi1	24
GAS M28 interaction with human IgAs (Chowdhury et al., 2021)	M28, IgA1, IgA2 and C4BP	14


Funding

This work was supported by the Knut and Alice Wallenberg Foundation [2016.0023 and 2019.0353 to J.M. and L.M.], the Swedish Research Council (Vetenskapsrådet) [2020-02419 to L.M.] and the Swiss National Science Foundation [P2ZHP3_191289 to H.K.].

Conflict of Interest: none declared.

References

1. NGL viewer: web-based molecular graphics for large complexes.

Authors: Alexander S Rose; Anthony R Bradley; Yana Valasatava; Jose M Duarte; Andreas Prlic; Peter W Rose
Journal: Bioinformatics Date: 2018-11-01 Impact factor: 6.937

2. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology.

Authors: Francis J O'Reilly; Juri Rappsilber
Journal: Nat Struct Mol Biol Date: 2018-10-29 Impact factor: 15.369

3. ProteoWizard: open source software for rapid proteomics tools development.

Authors: Darren Kessner; Matt Chambers; Robert Burke; David Agus; Parag Mallick
Journal: Bioinformatics Date: 2008-07-07 Impact factor: 6.937

4. MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers.

Authors: Masahito Ohue; Takehiro Shimoda; Shuji Suzuki; Yuri Matsuzaki; Takashi Ishida; Yutaka Akiyama
Journal: Bioinformatics Date: 2014-08-06 Impact factor: 6.937

5. pdb-tools: a swiss army knife for molecular structures.

Authors: João P G L M Rodrigues; João M C Teixeira; Mikaël Trellet; Alexandre M J J Bonvin
Journal: F1000Res Date: 2018-12-20

6. Structural determination of Streptococcus pyogenes M1 protein interactions with human immunoglobulin G using integrative structural biology.

Authors: Hamed Khakzad; Lotta Happonen; Yasaman Karami; Sounak Chowdhury; Gizem Ertürk Bergdahl; Michael Nilges; Guy Tran Van Nhieu; Johan Malmström; Lars Malmström
Journal: PLoS Comput Biol Date: 2021-01-07 Impact factor: 4.475

7. The structure of human dermatan sulfate epimerase 1 emphasizes the importance of C5-epimerization of glucuronic acid in higher organisms.

Authors: Mahmudul Hasan; Hamed Khakzad; Lotta Happonen; Anders Sundin; Johan Unge; Uwe Mueller; Johan Malmström; Gunilla Westergren-Thorsson; Lars Malmström; Ulf Ellervik; Anders Malmström; Emil Tykesson
Journal: Chem Sci Date: 2020-12-08 Impact factor: 9.825

8. In vivo Cross-Linking MS of the Complement System MAC Assembled on Live Gram-Positive Bacteria.

Authors: Hamed Khakzad; Lotta Happonen; Guy Tran Van Nhieu; Johan Malmström; Lars Malmström
Journal: Front Genet Date: 2021-01-08 Impact factor: 4.599

9. Macromolecular modeling and design in Rosetta: recent methods and frameworks.

Authors: Julia Koehler Leman; Brian D Weitzner; Steven M Lewis; Jared Adolf-Bryfogle; Nawsad Alam; Rebecca F Alford; Melanie Aprahamian; David Baker; Kyle A Barlow; Patrick Barth; Benjamin Basanta; Brian J Bender; Kristin Blacklock; Jaume Bonet; Scott E Boyken; Phil Bradley; Chris Bystroff; Patrick Conway; Seth Cooper; Bruno E Correia; Brian Coventry; Rhiju Das; René M De Jong; Frank DiMaio; Lorna Dsilva; Roland Dunbrack; Alexander S Ford; Brandon Frenz; Darwin Y Fu; Caleb Geniesse; Lukasz Goldschmidt; Ragul Gowthaman; Jeffrey J Gray; Dominik Gront; Sharon Guffy; Scott Horowitz; Po-Ssu Huang; Thomas Huber; Tim M Jacobs; Jeliazko R Jeliazkov; David K Johnson; Kalli Kappel; John Karanicolas; Hamed Khakzad; Karen R Khar; Sagar D Khare; Firas Khatib; Alisa Khramushin; Indigo C King; Robert Kleffner; Brian Koepnick; Tanja Kortemme; Georg Kuenze; Brian Kuhlman; Daisuke Kuroda; Jason W Labonte; Jason K Lai; Gideon Lapidoth; Andrew Leaver-Fay; Steffen Lindert; Thomas Linsky; Nir London; Joseph H Lubin; Sergey Lyskov; Jack Maguire; Lars Malmström; Enrique Marcos; Orly Marcu; Nicholas A Marze; Jens Meiler; Rocco Moretti; Vikram Khipple Mulligan; Santrupti Nerli; Christoffer Norn; Shane Ó'Conchúir; Noah Ollikainen; Sergey Ovchinnikov; Michael S Pacella; Xingjie Pan; Hahnbeom Park; Ryan E Pavlovicz; Manasi Pethe; Brian G Pierce; Kala Bharath Pilla; Barak Raveh; P Douglas Renfrew; Shourya S Roy Burman; Aliza Rubenstein; Marion F Sauer; Andreas Scheck; William Schief; Ora Schueler-Furman; Yuval Sedan; Alexander M Sevy; Nikolaos G Sgourakis; Lei Shi; Justin B Siegel; Daniel-Adriano Silva; Shannon Smith; Yifan Song; Amelie Stein; Maria Szegedy; Frank D Teets; Summer B Thyme; Ray Yu-Ruei Wang; Andrew Watkins; Lior Zimmerman; Richard Bonneau
Journal: Nat Methods Date: 2020-06-01 Impact factor: 28.547