
Proteomics Using Protease Alternatives to Trypsin Benefits from Sequential Digestion with Trypsin.

Therese Dau^1,2, Giulia Bartolomucci^2, Juri Rappsilber^1,2.

Abstract

Trypsin is the most widely used enzyme in proteomics. Nevertheless, proteases with complementary cleavage specificity have been applied in special circumstances. In this work, we analyzed the characteristics of five protease alternatives to trypsin for protein identification and sequence coverage when applied to S. pombe whole cell lysates. The specificity of the protease heavily impacted the number of proteins identified. Proteases with higher specificity led to the identification of more proteins than proteases with lower specificity. However, AspN, GluC, chymotrypsin, and proteinase K largely benefited from being paired with trypsin in sequential digestion, as we had shown before for elastase. In the most extreme case, predigesting with trypsin improved the number of identified proteins for proteinase K by 731%. Trypsin predigestion also improved the protein identifications of the other proteases: AspN (+62%), GluC (+80%), and chymotrypsin (+21%). Interestingly, the sequential digest with trypsin and AspN yielded an even higher number of protein identifications than digesting with trypsin alone.

Substances:
Proteins
Trypsin

Year: 2020 PMID: 32628831 PMCID: PMC7377536 DOI: 10.1021/acs.analchem.0c00478

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Trypsin is the protease of choice for mass spectrometry (MS)-based proteomics. It cleaves C-terminal to Arg and Lys residues, resulting in a positive charge at the peptide C-terminus, which is advantageous for MS analysis.[1,2] Nevertheless, other proteases are frequently used to obtain complementary data.[3,4] Among these, AspN and GluC target acidic amino acid residues (Figure 1a). Both enzymes generate peptide mixtures of comparable complexity to that of trypsin and have been successfully used in many studies.[4−7] Chymotrypsin, which primarily targets aromatic residues, has also been used.[7−9] In contrast, broad-specificity proteases are much less widely used in proteomics. This is likely due to the high complexity of the peptide mixtures that they generate. To our knowledge, their application has been limited to prefractionated samples. Proteinase K, for example, was used to “shave” surface-exposed loops from proteins in membrane vesicles.[10,11]
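The cleavage preferences described above can be illustrated with a minimal in silico digest. This is a sketch and not part of the study: the rule table, the function name `digest`, and the example sequence are assumptions made for illustration, and real enzymes deviate from these idealized rules.

```python
# Idealized cleavage rules (assumption for illustration only).
# "c": cut on the C-terminal side of the residue, "n": cut on the N-terminal side.
RULES = {
    "trypsin": ("KR", "c"),
    "AspN": ("D", "n"),
    "GluC": ("E", "c"),
    "chymotrypsin": ("FWY", "c"),
}


def digest(sequence, protease):
    """Return the peptides of a complete digest under the idealized rule."""
    residues, side = RULES[protease]
    cuts = [0]
    for i, aa in enumerate(sequence):
        if aa in residues:
            pos = i + 1 if side == "c" else i
            # Ignore cuts at the very ends of the sequence and duplicates.
            if 0 < pos < len(sequence) and pos != cuts[-1]:
                cuts.append(pos)
    cuts.append(len(sequence))
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


example = "MKTDAEFRLDK"  # hypothetical sequence
for name in RULES:
    print(name, digest(example, name))
```

In this toy example, only the tryptic peptides end in a basic residue, which is the property the text identifies as advantageous for MS analysis.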

Figure 1

Impact of different proteases and protease combinations on the identification of proteins and peptides. (a) Frequency of the amino acids targeted by AspN, GluC, chymotrypsin, elastase, and proteinase K according to the UniProtKB/TrEMBL release report. Number of (b) proteins and (c) peptides identified with different protease combinations. Trypsin–other protease, light blue; other protease–trypsin, dark blue; single protease, green. Error bars are standard deviation (SD) of at least five independent digestion experiments.

Our group has previously shown that the number of identified peptides, when using alternatives to trypsin, could largely be improved by a sequential combination with trypsin. This includes AspN, GluC, chymotrypsin, and elastase for the detection of cross-link sites[12−15] and elastase applied to S. pombe whole cell lysates.[14] The sequential digestion increased the number of identified cross-links up to 19-fold for the Taf4–12 complex compared to digesting with elastase alone.[14] Introducing positively charged C-termini through trypsin improves the detection of previously nontryptic peptides. Importantly, smaller peptides are protected from the second protease.[12,14] Thus, use of two proteases does not lead to the very small peptides that in silico digestion would predict. As a consequence, using elastase after trypsin does not lead to the same peptide complexity as using elastase alone. In this study, we analyzed whether the introduction of trypsin in a sequential digest might improve the application of AspN, GluC, chymotrypsin, and proteinase K on unfractionated S. pombe lysate.

Methods

Public Data Sets

The data on trypsin, elastase, trypsin–elastase, and elastase–trypsin were taken from our previous work[14] and retrieved from PRIDE with the data set identifier PXD011459.

Sample Preparation

One gram of frozen and ground S. pombe cells was resuspended in 2 mL of RIPA (Sigma-Aldrich, St. Louis, MO) supplemented with the protease inhibitor cocktail cOmplete according to the manufacturer’s instructions (Roche, Basel). To remove the cell debris, the samples were centrifuged at 1200g for 15 min. The lysates were subjected to gel electrophoresis on a 4%–12% Bis-Tris gel (Life Technologies, Carlsbad, CA) for 5 min and stained using Imperial Protein Stain (Thermo Fisher Scientific, Rockford, IL). After excising the stained gel area as a single fraction, the proteins were first reduced with dithiothreitol and then alkylated with iodoacetamide. The first protease (trypsin (1:100), elastase (1:100), AspN (1:100), GluC (1:50), chymotrypsin (1:50), or proteinase K (1:50)) was incubated for 16 h at 37 °C (except chymotrypsin, which was incubated at room temperature). The second protease was added for 4 h at 37 °C (except elastase, which was added for 30 min). We used a standardized protocol to desalt and concentrate the peptides on C18 StageTips for subsequent analysis.[16,17] For each condition, the equivalent of 1 μg protein starting material was used.

LC-MS/MS

All samples were analyzed on a linear ion trap–orbitrap mass spectrometer (Orbitrap Elite, Thermo Fisher Scientific, Rockford, IL) coupled online to a liquid chromatograph (Ultimate 3000 RSLCnano Systems, Dionex, Thermo Fisher Scientific, UK) with a C18 column (EASY-Spray LC Column, Thermo Fisher Scientific, Rockford, IL). The flow rate was 0.2 μL/min using 98% mobile phase A (0.1% formic acid) and 2% mobile phase B (80% acetonitrile in 0.1% formic acid). To elute the peptides, the percentage of mobile phase B was first increased to 40% over a time course of 110 min, followed by a linear increase to 95% in 11 min. Full MS scans were recorded in the orbitrap at a resolution of 120,000 for MS1 with a scan range of 300–1700 m/z. The 20 most intense ions (precursor charge ≥2) were selected for fragmentation by collision-induced dissociation, and MS2 spectra were recorded in the ion trap (20,000 ions as a minimal required signal, 35 normalized collision energy, dynamic exclusion for 40 s).
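The elution gradient can be written as a piecewise function of time. The sketch below assumes that the gradient starts at 2% B at time zero and that the first ramp is linear; the text only states linearity explicitly for the second ramp. The function name `percent_b` is hypothetical.

```python
def percent_b(t_min):
    """Mobile phase B (%) at t_min minutes after the gradient starts.

    Assumptions: start at 2% B at t = 0, linear ramp to 40% B at 110 min,
    then linear ramp to 95% B over the following 11 min.
    """
    start_b, mid_b, end_b = 2.0, 40.0, 95.0
    t1, t2 = 110.0, 121.0
    if t_min <= 0:
        return start_b
    if t_min <= t1:
        return start_b + (mid_b - start_b) * t_min / t1
    if t_min <= t2:
        return mid_b + (end_b - mid_b) * (t_min - t1) / (t2 - t1)
    return end_b


for t in (0, 55, 110, 121):
    print(t, percent_b(t))
```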

Data Analysis

MaxQuant software[18] (version 1.5.2.8) employing the Andromeda search engine[19] in combination with the PomBase database[20] was used to analyze the samples. The following parameters were used for the search: carbamidomethylation of cysteine as a fixed modification, oxidation of methionine as a variable modification, MS accuracy of 4.5 ppm, and MS/MS tolerance of 0.5 Da. Up to six miscleavages were allowed for digests involving trypsin, AspN, GluC, or chymotrypsin and up to 10 miscleavages for digests containing elastase or proteinase K. Frequencies of amino acids were taken from the statistics of the UniProtKB/TrEMBL protein database release 2019_11 (https://www.ebi.ac.uk/uniprot/TrEMBLstats).
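To make the miscleavage setting concrete, the following sketch enumerates the candidate peptides that arise when up to a given number of cleavage sites are missed. It illustrates the general idea only and is not the search engine's implementation; the function name and the example fragments are hypothetical.

```python
def with_missed_cleavages(fragments, max_missed):
    """Join consecutive fragments of a complete digest.

    A peptide with n missed cleavages spans n + 1 consecutive fragments.
    """
    peptides = []
    for start in range(len(fragments)):
        for n_missed in range(max_missed + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


# Hypothetical fragments of a complete digest, in sequence order.
fragments = ["MK", "TDAEFR", "LDK"]
for allowed in (0, 1, 2):
    print(allowed, with_missed_cleavages(fragments, allowed))
```

The candidate list grows with both the number of cleavage sites and the number of allowed miscleavages, which is one reason why searches for less specific proteases have to cover a larger peptide space.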

Results and Discussion

Lysate from S. pombe was digested with trypsin, AspN, GluC, chymotrypsin, elastase, or proteinase K. We also combined each of the proteases other than trypsin in a sequential digest with trypsin as either the first or second protease. Adding trypsin to the digest with AspN and GluC improved protein identification (AspN = 899 ± 69, trypsin–AspN = 1455 ± 85, AspN–trypsin = 1331 ± 50, GluC = 719 ± 28, trypsin–GluC = 1294 ± 37, GluC–trypsin = 1319 ± 25) (Figure 1b). Peptide identifications also improved (AspN = 6828 ± 514, trypsin–AspN = 16087 ± 327, AspN–trypsin = 17968 ± 470, GluC = 4467 ± 182, trypsin–GluC = 13461 ± 260, GluC–trypsin = 15713 ± 600) (Figure 1c). The order of proteases had only a minor influence on the identifications. Using trypsin prior to chymotrypsin or elastase also improved the identification of proteins (chymotrypsin = 938 ± 27, trypsin–chymotrypsin = 1200 ± 25, elastase = 593 ± 7, trypsin–elastase = 874 ± 40) and peptides (chymotrypsin = 8818 ± 232, trypsin–chymotrypsin = 13611 ± 346, elastase = 6821 ± 84, trypsin–elastase = 9039 ± 374). Using trypsin as the second protease had only a minimal effect on protein (chymotrypsin–trypsin = 1056 ± 91, elastase–trypsin = 492 ± 115) and peptide identification (chymotrypsin–trypsin = 6869 ± 744, elastase–trypsin = 6280 ± 1680). Interestingly, digesting with trypsin alone did not give the highest number of protein (1403 ± 65) and peptide (14410 ± 571) identifications. We identified more proteins (+4%) and peptides (+12%) when trypsin was followed by AspN.

The biggest impact of sequential digestion with trypsin was seen on the performance of proteinase K. Using proteinase K alone led to very few identifications of proteins (proteinase K = 78 ± 33) and peptides (proteinase K = 527 ± 179). This might be due to very short peptides being generated by proteinase K, which cleaves C-terminal to half of all amino acids. Alternatively, or in addition, the high complexity of the peptide mixture generated by proteinase K might reduce identification rates. Surprisingly, adding trypsin to the proteinase K digest increased the number of identifications for proteins (proteinase K–trypsin = 461 ± 17) and peptides (proteinase K–trypsin = 3169 ± 194). Using trypsin prior to proteinase K further improved on these results, as this led to the identification of 8 times more proteins (646 ± 36) and 8 times more peptides (4279 ± 530) compared to proteinase K alone.

In summary, of the five tested proteases, AspN, GluC, and proteinase K profited most from the addition of trypsin. The underlying reasons for the observed gains are likely different. AspN and GluC have few available cleavage sites and therefore generate relatively long peptides. Many of these will be unfavorably long for mass spectrometric detection. In addition, they are missing a terminal positive charge. Adding trypsin introduces such a C-terminal charge and shortens very long peptides, both of which enhance peptide detection in MS analysis. AspN and GluC are highly efficient (Figure 2a), while for chymotrypsin and especially elastase and proteinase K many miscleavages were detected. Although we cannot exclude that undigested protein from the first digest may be the source of the additional peptide and protein identifications, the high efficiency of GluC and AspN makes this unlikely for these enzymes. Also, the LC-MS data did not indicate the presence of a large quantity of semidigested proteins, as judged from the absence of a late-eluting and highly charged cluster of ions (data not shown).
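The relative gains quoted above follow directly from the reported mean identifications. The short sketch below recomputes a few of them; the helper names are hypothetical, and only mean values stated in the text are used (standard deviations are ignored).

```python
def percent_gain(reference, value):
    """Relative change of value over reference, in percent."""
    return 100.0 * (value - reference) / reference


def fold_change(reference, value):
    """Ratio of value to reference."""
    return value / reference


# Mean protein identifications reported in the text.
print(round(percent_gain(899, 1455)))    # AspN -> trypsin-AspN: 62
print(round(percent_gain(719, 1294)))    # GluC -> trypsin-GluC: 80
print(round(percent_gain(1403, 1455)))   # trypsin -> trypsin-AspN: 4

# Mean peptide identifications reported in the text.
print(round(percent_gain(14410, 16087)))  # trypsin -> trypsin-AspN: 12

# Proteinase K alone versus trypsin followed by proteinase K.
print(round(fold_change(78, 646), 1))     # proteins: 8.3
print(round(fold_change(527, 4279), 1))   # peptides: 8.1
```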

Figure 2

(a) Numbers of miscleavages for each protease: AspN, dark violet; GluC, light violet; chymotrypsin, very light violet; elastase, light green; proteinase K, dark green. (b) Of the promiscuous proteases, only proteinase K showed a reduced number of submitted MS/MS compared to trypsin, chymotrypsin, and elastase. (c) Identification rate of searched MS/MS decreased with decreasing specificity of the protease.

To analyze possible reasons for the low identification rates of more promiscuous cutters, we looked at the submitted and identified MS/MS spectra (Figure 2b, c). Only proteinase K had a reduced number of submitted MS/MS spectra. This might be due to the complexity of the peptide mixture resulting from proteinase K. However, the main problem was the low identification success of these MS/MS spectra. The same applied to the spectra from other less specific proteases. One of the reasons might be cofragmentation of several peptides, as the mixture is more complex than for specific proteases. This is supported by the fact that AspN and GluC showed similar identification rates to trypsin. Another reason might be the increase in the database size and the problems associated with it for identification.

While AspN and GluC are very specific proteases, over 50% of the residues are potential cleavage sites for proteinase K. The problem for proteinase K is therefore not a lack of cleavage sites. Adding trypsin to proteinase K increased identifications and thus ruled out the possibility that peptides generated by proteinase K alone, at least under standard conditions, are generally too short for proteomics. If the complexity of a proteinase K digest is therefore the reason for the low identification yields of proteinase K, then the addition of trypsin must reduce this complexity. Adding trypsin might unify “ragged” proteinase K peptides that share either the N- or C-terminus but have different lengths (Figure 3a). In this way, trypsin leads to a concentration increase of peptides by reducing sample complexity. At least when trypsin is used first, an additional mechanism must be considered that was previously described for sequential digestion.[12,14] The second enzyme does not cleave shorter peptides with high efficiency, effectively leading to short tryptic peptides being protected from proteinase K. In either case, the complexity that is normally introduced through proteinase K is reduced by the tryptic treatment.
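The proposed unification of ragged peptides can be illustrated with a toy example. The sketch below uses hypothetical peptide sequences and an idealized trypsin rule (cleavage after every Lys or Arg); it is meant to show the principle only and does not model the experimental data.

```python
from collections import Counter


def trypsin_cut(peptide):
    """Split a peptide after every K or R (idealized, complete cleavage)."""
    pieces, start = [], 0
    for i, aa in enumerate(peptide):
        if aa in "KR" and i + 1 < len(peptide):
            pieces.append(peptide[start:i + 1])
            start = i + 1
    pieces.append(peptide[start:])
    return pieces


# Hypothetical "ragged" peptides: same N-terminus, different C-termini.
ragged = ["AGDLKSTV", "AGDLKST", "AGDLKS"]

products = Counter(piece for pep in ragged for piece in trypsin_cut(pep))
print(products)
```

In this toy case, three distinct species collapse into three copies of the same peptide with a basic C-terminus, plus short remnants. This mirrors the idea that trypsin concentrates signal into fewer species.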

Figure 3

(a) Comparison of semitryptic peptides with a tryptic N- or C-terminus after digesting whole S. pombe with proteinase K followed by trypsin. (b) Peptides that have been identified in the N-terminal region of 60S acidic ribosomal protein P1-alpha 1 (26–94) with either trypsin, proteinase K, or proteinase K followed by trypsin. Trypsin, red; proteinase K, green; proteinase K–trypsin, dark blue.

End trimming and short peptide protection are likely not the sole explanations. We observed previously that among all observed mixed-protease action peptides, i.e., those peptides that were generated by trypsin action on one end and another protease at the other end, there is an imbalance: tryptic C-termini are more prevalent than N-termini generated by trypsin (semitryptic peptides with tryptic N-terminus = 652 ± 42, semitryptic peptides with tryptic C-terminus = 763 ± 15) (Figure 3a). This means that the improved observability of peptides with a basic C-terminal residue also contributes to the observed effect of sequential digestion on identification rates. As an example, we analyzed the 60S acidic ribosomal protein P1-alpha 1 (Figure 3b). There are no trypsin cleavage sites between residues 56 and 90, so this region is not covered when trypsin is used alone. Digesting with proteinase K alone did not improve the coverage for this region, although or possibly because every other residue is a potential cleavage site for proteinase K. Peptides from this region could only be identified when proteinase K and trypsin were used in a sequential digest.

We then wondered to what extent the proteins and peptides that were observed with the proteases alone or in combination with trypsin covered different sequence space. We measured this as the number of unique residues. As one would expect, this followed the same trends seen for protein and peptide identifications. For AspN and GluC, the largest number of residues was covered when trypsin was used following the other protease (Figure S-1a, b). For chymotrypsin, elastase, and proteinase K, the inverse order, i.e., trypsin first, yielded the larger coverage (Figure S-1c–e). Nonetheless, the different conditions yielded substantial nonoverlap. When combining the results of two digestion conditions, one would combine the data obtained by the protease alone with that of a trypsin-first sequential digest. Their overlap is substantially smaller (4 ± 2% to 38 ± 1%) than what we observed here for trypsin replicas (83 ± 2%). Next, we compared the gain of residues on top of the trypsin digest that was observed for each digestion protocol (Figure 4a). For AspN, GluC, and proteinase K, there was a significant increase of additional identified residues if trypsin was added prior to the digest. Curiously, the highest gain in residues for AspN was achieved with a sequential AspN–trypsin digest. For elastase and chymotrypsin, adding trypsin prior to their usage did not increase the number of identified residues significantly. Reversing the order in the sequential digest even decreased the gain in residues.
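Counting unique residues and the gain on top of a reference digest amounts to set operations on covered sequence positions. The sketch below shows one way to compute this for a single protein; the function name, the sequence, and the peptide lists are hypothetical, and the exact procedure used in the study is not described in the text.

```python
def covered_positions(protein, peptides):
    """Return the set of 0-based positions of protein covered by peptides."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


protein = "MKTDAEFRLDK"                 # hypothetical sequence
reference_peptides = ["TDAEFR"]         # e.g. identified in a trypsin digest
sequential_peptides = ["DAEFR", "LDK"]  # e.g. identified in a sequential digest

reference = covered_positions(protein, reference_peptides)
sequential = covered_positions(protein, sequential_peptides)

gained = sequential - reference   # residues seen only in the second condition
shared = sequential & reference   # residues seen in both conditions
print(len(reference), len(sequential), len(gained), len(shared))
```

For a whole proteome, the same logic applies if each position is keyed by protein identifier and index, for example as `(protein_id, position)` tuples.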

Figure 4

Comparison of residues gained with the three protease combinations for (a) AspN, GluC, chymotrypsin, elastase, and proteinase K on top of trypsin digest. Trypsin–other protease, light blue; other protease–trypsin, dark blue; single protease, green. Comparison of (b) proteins and (c) residues gained on top of a tryptic digest through sequential digestion variants, parallel digestion, and replica of tryptic digestion. AspN, dark violet; GluC, light violet; chymotrypsin, white; elastase, light green; proteinase K, dark green.

Finally, we analyzed the gain in identified proteins and residues when using different combinations of digestion conditions (Figure 4b, c). We combined the results of either five replicas of trypsin, trypsin followed by each of the five other proteases, or each of the five other proteases followed by trypsin. An initial trypsin digest served as the reference, in which 1484 proteins and 202,556 residues were identified. This followed the rationale that one would always use trypsin for an initial analysis, although trypsin followed by AspN in a sequential digest consistently gave higher protein and peptide identifications here. The highest numbers of complementary proteins (344) and residues (119,763) were identified when trypsin was used first in a sequential digest in combination with each of the five other proteases, followed by the inverted setup in which trypsin was used last (proteins = 315, residues = 111,126). Digestions with nontryptic proteases alone were outperformed by trypsin replicas in terms of protein identification (230 versus 296) but not in terms of residue coverage (111,069 versus 76,645).

Conclusion

In this study, we investigated the impact of adding trypsin to other proteases in proteomics. Sequential digestion has been used before,[5,6,21] and here we add a systematic evaluation of different protease combinations. Protein and peptide identifications improved when combining any of the tested proteases with trypsin. This is in line with previous studies on cross-linking identification, which benefited from the sequential digest with trypsin.[12,14] In the most extreme case, the sequential digest with trypsin and AspN outperformed results obtained by trypsin alone. This effect is relatively small, and due to cost considerations, trypsin will remain the protease of first choice in proteomics even after our study. However, situations where alternative proteases are currently used could in the future benefit from adding a sequential digestion step with trypsin. As trypsin is compatible with the buffer conditions of the tested proteases, this requires no additional step other than adding trypsin.

21 in total

1. Andromeda: a peptide search engine integrated into the MaxQuant environment.

Authors: Jürgen Cox; Nadin Neuhauser; Annette Michalski; Richard A Scheltema; Jesper V Olsen; Matthias Mann
Journal: J Proteome Res Date: 2011-02-22 Impact factor: 4.466

2. Confetti: a multiprotease map of the HeLa proteome for comprehensive proteomics.

Authors: Xiaofeng Guo; David C Trudgian; Andrew Lemoff; Sivaramakrishna Yadavalli; Hamid Mirzaei
Journal: Mol Cell Proteomics Date: 2014-04-02 Impact factor: 5.911

3. Six alternative proteases for mass spectrometry-based proteomics beyond trypsin.

Authors: Piero Giansanti; Liana Tsiatsiani; Teck Yew Low; Albert J R Heck
Journal: Nat Protoc Date: 2016-04-28 Impact factor: 13.491

4. Value of using multiple proteases for large-scale mass spectrometry-based proteomics.

Authors: Danielle L Swaney; Craig D Wenger; Joshua J Coon
Journal: J Proteome Res Date: 2010-03-05 Impact factor: 4.466

5. Consecutive proteolytic digestion in an enzyme reactor increases depth of proteomic and phosphoproteomic analysis.

Authors: Jacek R Wiśniewski; Matthias Mann
Journal: Anal Chem Date: 2012-03-01 Impact factor: 6.986

6. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips.

Authors: Juri Rappsilber; Matthias Mann; Yasushi Ishihama
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

7. Expanding the chemical cross-linking toolbox by the use of multiple proteases and enrichment by size exclusion chromatography.

Authors: Alexander Leitner; Roland Reischl; Thomas Walzthoeni; Franz Herzog; Stefan Bohn; Friedrich Förster; Ruedi Aebersold
Journal: Mol Cell Proteomics Date: 2012-01-27 Impact factor: 5.911

8. PomBase 2015: updates to the fission yeast database.

Authors: Mark D McDowall; Midori A Harris; Antonia Lock; Kim Rutherford; Daniel M Staines; Jürg Bähler; Paul J Kersey; Stephen G Oliver; Valerie Wood
Journal: Nucleic Acids Res Date: 2014-10-31 Impact factor: 16.971

9. Detection of Proteome Diversity Resulted from Alternative Splicing is Limited by Trypsin Cleavage Specificity.

Authors: Xiaojing Wang; Simona G Codreanu; Bo Wen; Kai Li; Matthew C Chambers; Daniel C Liebler; Bing Zhang
Journal: Mol Cell Proteomics Date: 2017-12-08 Impact factor: 5.911

10. ProteomeXchange provides globally coordinated proteomics data submission and dissemination.

Authors: Juan A Vizcaíno; Eric W Deutsch; Rui Wang; Attila Csordas; Florian Reisinger; Daniel Ríos; José A Dianes; Zhi Sun; Terry Farrah; Nuno Bandeira; Pierre-Alain Binz; Ioannis Xenarios; Martin Eisenacher; Gerhard Mayer; Laurent Gatto; Alex Campos; Robert J Chalkley; Hans-Joachim Kraus; Juan Pablo Albar; Salvador Martinez-Bartolomé; Rolf Apweiler; Gilbert S Omenn; Lennart Martens; Andrew R Jones; Henning Hermjakob
Journal: Nat Biotechnol Date: 2014-03 Impact factor: 54.908
