
Expanding Proteome Coverage with CHarge Ordered Parallel Ion aNalysis (CHOPIN) Combined with Broad Specificity Proteolysis.

Simon Davis¹, Philip D Charles¹, Lin He², Peter Mowlds³, Benedikt M Kessler¹, Roman Fischer¹.

Abstract

The "deep" proteome has been accessible by mass spectrometry for some time. However, the number of proteins identified in cells of the same type has plateaued at ∼8000–10 000 without ID transfer from reference proteomes/data. Moreover, limited sequence coverage hampers the discrimination of protein isoforms when trypsin is used as the standard protease. Multienzyme approaches appear to improve sequence coverage and subsequent isoform discrimination. Here we expanded proteome and protein sequence coverage in MCF-7 breast cancer cells to an as yet unmatched depth by employing a workflow that addresses current limitations in deep proteome analysis in multiple stages: we used (i) gel-aided sample preparation (GASP) and combined trypsin/elastase digests to increase peptide orthogonality, (ii) concatenated high-pH prefractionation, and (iii) CHarge Ordered Parallel Ion aNalysis (CHOPIN), available on an Orbitrap Fusion (Lumos) mass spectrometer, to achieve 57% median protein sequence coverage in 13 728 protein groups (8949 Unigene IDs) in a single cell line. CHOPIN uses both detectors of the Orbitrap Fusion (Lumos), each on predefined precursor types, which optimizes parallel ion processing and led to the identification of a total of 179 549 unique peptides covering the deep proteome in unprecedented detail.


Keywords: LC−MS/MS; deep proteome; isoform profiling; protein sequence coverage; sequence coverage


Substances:
Peptides
Proteome

Year: 2017; PMID: 28164708; PMCID: PMC5363888; DOI: 10.1021/acs.jproteome.6b00915

Source DB: PubMed; Journal: J Proteome Res; ISSN: 1535-3893; Impact factor: 4.466

Introduction

Human primary cells and cell lines are believed to express between 8000 and ∼11 000 gene products, depending on their differentiation state.[1−3] Modern proteomic workflows are now able to cover deep cellular proteomes through prefractionation and multienzyme digestion strategies.[4,5] The identification of over 8000 cellular proteins is now readily achievable. However, most proteins are detected with only partial sequence coverage, and their level of completeness is biased toward the most abundant (“high content”) proteins.[6] Improvements in protein sequence coverage of deep proteomes allow increasingly comprehensive interrogation of protein isoforms, post-translational modifications, amino acid substitutions, deletions, and insertions, all of which represent prime objectives in the future development of proteome research. Despite the advent of high-speed mass spectrometers,[7,8] prefractionation of biological samples is still necessary to overcome the dynamic range of protein abundance and to give the mass spectrometer enough time for comprehensive sampling. For instance, ion exchange chromatography (strong cation exchange (SCX)[9−11] and strong anion exchange (SAX)[12,13]), isoelectric focusing of peptides,[14−16] and high-pH reversed-phase chromatography[17−19] have been used with great success to identify an increasing number of proteins in tissues,[20,21] cells,[22] and other biological samples.[23,24] In addition, complementary digestion using proteases with alternative cleavage specificities can increase protein sequence coverage in deep proteome analyses.[5,25−27] Interestingly, different fragmentation/detection modes also deliver complementary data that increase peptide identification rates.[28,29] However, with each additional variant of sample preparation and data acquisition, the analytical burden is multiplied. 
In addition to limitations in analyte resolving power and dynamic range, the observed ultradeep/high-sequence-coverage proteome appears to stagnate at a depth of ∼9000 protein groups in a single cell type when no peptide identifications are transferred from reference proteomes[1,30] or super conditions (i.e., “Super-SILAC”[31−34]), even when current state-of-the-art instrumentation is employed. The Orbitrap Fusion and its successor, the Orbitrap Fusion Lumos, update the proven LTQ-Orbitrap dual-detector family of instruments[35,36] with a view to closing this gap. This combination of a linear ion trap with an Orbitrap mass detector has been iteratively improved through previous generations (Orbitrap Classic/XL, Orbitrap Velos/Elite) to tailor the specific capabilities of each detector to the different requirements in speed, sensitivity, and resolution for precursor (MS1) and fragment ion (MS2) scans, and it offers different fragmentation types (CID, HCD, and ETD) to generate complementary fragment information,[37,38] particularly for modified peptides.[39,40] Changes in instrument design, in particular the addition of a quadrupole element, allowed parallelization of ion isolation/accumulation and detection during the instrument duty cycle in Q-Exactive models,[41] thereby increasing speed and shortening the duty cycle, but at the cost of losing the secondary detector (linear ion trap). 
In the Orbitrap Fusion/Lumos, the two strategies of using a quadrupole for ion isolation and a linear ion trap for fragment spectra acquisition have been combined, which further enhances parallel data acquisition.[7] The parallelization capabilities of the Orbitrap Fusion/Lumos are highlighted in the “Universal Method”, which was developed by Thermo Fisher to maximize peptide detection irrespective of sample abundance and complexity.[42] Essentially, the instrument is programmed to use longer MS2 acquisition times on low-abundance peptides if (i) insufficient novel precursors have been detected and (ii) the duty cycle has not reached a set length. Additionally, the instrument uses the quadrupole, C-trap, Orbitrap, and linear ion trap elements in parallel to maximize usage of each module and minimize idle time (Figure 1A). This universal approach may not be as effective as methods specifically optimized for particular samples. However, it has been shown to perform well for the analysis of various sample types and is accessible to all users as a predefined method in the vendor software.[36]

Figure 1

Comprehensive cell proteome coverage by prefractionation and CHOPIN MS analysis workflow. (A) Mass spectrometry acquisition methods demonstrating the dynamic segmentation of analytical channels for MS1 FT (Orbitrap), Q (Quadrupole), and MS2 LTQ (Linear Ion Trap) that were designed for the Universal (upper panel) and CHarge Ordered Parallel Ion aNalysis (CHOPIN) method (lower panel). The Universal Method makes use of the parallel acquisition of the MS1 scan in the Orbitrap while peptide fragments are scanned in the LTQ, ordered by decreasing precursor intensity. Additional parallelization is achieved by concurrent MS2 scans and isolation of the following precursor. Precursor ion accumulation is allowed to proceed for up to 250 ms if no previously unselected precursor is found. CHOPIN adds another level of parallelization by triaging intense and highly charged ions to be analyzed by an Orbitrap MS2 scan, while low-abundance precursor ions are prioritized for the more sensitive MS2 scan in the linear ion trap. CHOPIN and the data analysis are further described in the Supporting Information. (B) Methodological workflow for the analysis of the MCF-7 breast cancer cell line deep proteome. MCF-7 cell extracts were digested with either trypsin or elastase, and peptide mixtures were separated by high-pH reversed-phase (RP) HPLC to collect 30 fractions that were pooled in a concatenated fashion into 15 fractions. In addition, tryptic and elastase digests were mixed and prefractionated as above (“Post Digest Mix”, PDM), followed by analysis of concatenated or individual fractions. Each fraction was subsequently analyzed by LC–MS/MS using both the Universal and CHOPIN acquisition methods. Detailed results for each individual experiment are shown in the Supporting Information. (Orbitrap Fusion Lumos photo by RF).

Methods

Tissue Culture and Cell Lysis

The MCF-7 breast cancer cell line was cultured in DMEM medium (Sigma, no. D6546) supplemented with 10% FCS, 1% penicillin, 1% streptomycin, and 1% glutamine at 37 °C (5% CO2). Five T175 tissue culture flasks of confluent MCF-7 cells were harvested using a trypsin solution (Sigma, no. T3924), washed twice in PBS, and stored at −80 °C until use. The frozen cells were lysed on ice for 30 min in 5 mL of RIPA lysis buffer (Thermo Pierce, no. 89901) supplemented with 4% SDS, 6 M urea, 2 M thiourea, 100 mM DTT, and protease and phosphatase inhibitors (Roche nos. 11836170001 and 04906837001). The lysate was sonicated twice for 1 min (5 s on, 10 s off, repeated four times). After the addition of 1250 units of benzonase (Sigma, no. E1014), the lysate was incubated on ice for 20 min and centrifuged at 21 000 × g for 20 min at 4 °C, and the pellet was discarded. Because of the presence of SDS and DTT in the sample, protein content was estimated by SDS-PAGE and Coomassie staining.

Sample Preparation and Fractionation

Approximately 5 mg of protein was digested using the GASP method.[43] In brief, the lysate was mixed with 30% acrylamide, polymerized, and shredded. The gel slurry was fixed in methanol/acetic acid/water (50/40/10%) and washed twice with alternating 6 M urea and 100% acetonitrile to remove SDS. Ammonium bicarbonate (50 mM) was then added to the gel, and the gel slurry was split by volume into two equal parts for digestion with separate enzymes. The gel was dehydrated with 100% acetonitrile, which was removed prior to the addition of 50 μg of trypsin (Promega, no. V5111) or 50 μg of elastase (Worthington Biochemical, no. LS006365). The samples were incubated at 37 °C overnight and further processed according to the original GASP method to extract peptides from the shredded gel pieces. The samples were desalted on C18 solid-phase extraction cartridges (Sep-Pak plus, Waters) and resuspended in 2% acetonitrile/0.1% formic acid, and peptide concentration was determined using a peptide quantitation kit (Thermo Pierce, no. 23275). Off-line high-pH reversed-phase prefractionation was performed on 800 μg of digested material using the loading pump of a Dionex Ultimate 3000 HPLC with an automated fraction collector and an XBridge BEH C18 XP column (3 × 150 mm, 2.5 μm particle size, Waters no. 186006710) over a 100 min gradient using high-pH reversed-phase buffers (A: water, pH 10 with ammonium hydroxide; B: 90% acetonitrile, pH 10 with ammonium hydroxide). The gradient consisted of a 12 min wash with 1% B, then an increase to 35% B over 60 min and a further increase to 95% B in 8 min, followed by a 10 min wash at 95% B and a 10 min re-equilibration at 1% B, all at a flow rate of 200 μL/min, with fractions collected every 2 min throughout the run. From each fraction, 100 μL was dried and resuspended in 20 μL of 2% acetonitrile/0.1% formic acid for analysis by LC–MS/MS. 
Fractions were loaded onto the LC–MS/MS system following the concatenation scheme shown in Figure 1B, with sample volumes adjusted to analyze ∼1 μg on column.
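The gradient timetable and the pooling of 30 fractions into 15 can be checked with a short script. This is an illustrative sketch only: `GRADIENT` restates the segments given above, whereas the pairing rule in the hypothetical `concatenate()` helper (fraction i pooled with fraction i + 15) is an assumed, commonly used concatenation scheme rather than a transcription of the published one.

```python
# Illustrative check of the high-pH prefractionation setup.
# GRADIENT restates the timetable in the text; the pairing rule in
# concatenate() is an assumed, commonly used scheme, not the published one.

GRADIENT = [
    # (segment, duration in min, %B at start, %B at end)
    ("wash", 12, 1, 1),
    ("linear gradient", 60, 1, 35),
    ("ramp", 8, 35, 95),
    ("wash at high organic", 10, 95, 95),
    ("re-equilibration", 10, 1, 1),
]


def total_runtime(gradient):
    """Sum of segment durations in minutes."""
    return sum(duration for _, duration, _, _ in gradient)


def concatenate(n_fractions=30, n_pools=15):
    """Pool fractions that lie n_pools apart: (1, 16), (2, 17), ..., (15, 30)."""
    if n_fractions % n_pools:
        raise ValueError("n_fractions must be a multiple of n_pools")
    return [
        tuple(range(first, n_fractions + 1, n_pools))
        for first in range(1, n_pools + 1)
    ]


pools = concatenate()
print(total_runtime(GRADIENT), "min gradient;", len(pools), "pools, e.g.", pools[0], pools[-1])
```

Pooling early- with late-eluting fractions is the usual motivation for concatenation: each pool still spans a broad range of high-pH retention, which the subsequent low-pH nano-LC separation can resolve.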

Mass Spectrometry Analysis Methods

Peptide fractions were analyzed by nano-UPLC–MS/MS using a Dionex Ultimate 3000 nano-UPLC with an EASY-Spray column (75 μm × 500 mm, 2 μm particle size, Thermo Scientific) and a 60 min gradient from 0.1% formic acid in 5% DMSO to 35% acetonitrile, 0.1% formic acid in 5% DMSO. MS data were acquired with an Orbitrap Fusion[7] Lumos instrument using the methods described below. A comprehensive description of the methods, together with method transcripts and Xcalibur (Tune v. 2.0.1258.14) method files, can be found in the Supporting Information.

Universal Method

The Universal Method was developed by Eliuk et al.[42] to maximize peptide identification without method optimization for different sample complexities and abundances. In principle, it allows a long ion accumulation time for low-abundance precursors, with parallel usage of the quadrupole, collision cell, and both the Orbitrap (FT) and ion trap (IT) detectors (summarized in Figure 1A). MS scans were acquired at a resolution of 120 000 between 400 and 1500 m/z with an AGC target of 4.0E5. MS/MS spectra were acquired in the linear ion trap (rapid scan mode) after collision-induced dissociation (CID) at a collision energy of 35%, with an AGC target of 4.0E3 and a maximum injection time of 250 ms, employing a maximal duty cycle of 3 s, prioritizing the most intense ions, and injecting ions for all available parallelizable time. Selected precursor masses were excluded for 30 s.

CHOPIN

CHarge Ordered Parallel Ion aNalysis (CHOPIN) employs selection criteria to channel ions to the best-suited detector based on precursor ion properties (Figure 1A). The hallmark of CHOPIN is the simultaneous use of both mass detectors for peptide fragment spectra acquisition, which allows the generation of additional MS/MS scans in the Orbitrap at no cost in duty cycle time. Because only high-abundance precursors with higher charge states are analyzed in the Orbitrap after higher-energy collisional dissociation (HCD), the success rate of these scans is very high. At the same time, the higher sensitivity of the ion trap is used to analyze low-abundance precursor ions. Details and further description of the method used here have been exported into text format and are available in the Supporting Information. In brief, MS scans were acquired as above. For precursor selection, we prioritized the least abundant signals. Doubly charged ions were scheduled for CID/IT analysis with the same parameters as above. Charge states 3–7 with precursor intensity >500 000, however, were scheduled for analysis by a fast HCD/FT scan (maximum injection time 40 ms, resolution 15 000). The remaining charge state 3–7 ions with intensity <500 000 were scheduled for analysis by CID/IT, as described above. Selected precursor masses were excluded for 12 s, as the gain in MS/MS scan events allows repeated scans of the same precursor across the chromatographic peak without risking undersampling.
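The precursor triage can be condensed into a small decision function. The sketch below is a hypothetical illustration and not the instrument's method logic: the `Precursor` type, the decision not to select charge states outside 2–7, and the 10 ppm tolerance used to match excluded m/z values are assumptions, while the 500 000 intensity cut-off, the least-abundant-first ordering, and the 12 s exclusion window follow the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

HCD_FT_MIN_INTENSITY = 500_000  # intensity cut-off given in the text
EXCLUSION_WINDOW_S = 12.0       # CHOPIN dynamic exclusion (Universal Method: 30 s)
EXCLUSION_TOL_PPM = 10.0        # assumption: m/z tolerance for exclusion matching


@dataclass
class Precursor:
    mz: float
    charge: int
    intensity: float


def route(p: Precursor) -> Optional[str]:
    """Assign an MS2 scan type following the tryptic CHOPIN rules."""
    if p.charge == 2:
        return "CID/IT"
    if 3 <= p.charge <= 7:
        return "HCD/FT" if p.intensity > HCD_FT_MIN_INTENSITY else "CID/IT"
    return None  # other charge states are not selected in this sketch


def is_excluded(mz: float, now_s: float, history: Iterable[Tuple[float, float]]) -> bool:
    """True if a precursor within the m/z tolerance was selected in the last 12 s."""
    return any(
        abs(mz - prev_mz) / prev_mz * 1e6 <= EXCLUSION_TOL_PPM
        and now_s - t_s <= EXCLUSION_WINDOW_S
        for prev_mz, t_s in history
    )


def schedule(precursors: Iterable[Precursor], now_s: float = 0.0,
             history: Iterable[Tuple[float, float]] = ()) -> List[Tuple[Precursor, str]]:
    """Order candidates least abundant first, then route each non-excluded one."""
    history = list(history)  # (m/z, selection time in s) of earlier selections
    plan = []
    for p in sorted(precursors, key=lambda c: c.intensity):
        scan = route(p)
        if scan is not None and not is_excluded(p.mz, now_s, history):
            plan.append((p, scan))
    return plan


candidates = [
    Precursor(650.32, 2, 2.0e6),
    Precursor(712.40, 3, 8.0e5),
    Precursor(540.77, 4, 1.2e5),
    Precursor(401.20, 1, 3.0e6),
]
for p, scan in schedule(candidates):
    print(p.charge, p.intensity, scan)
```

On the instrument, the HCD/FT scans are acquired in the Orbitrap while the linear ion trap processes the CID/IT scans, so the two sublists are worked through in parallel; the sketch captures only the assignment, not the timing.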

Elastase Digests

The elastase-digested samples were analyzed with divergent parameters to address the occurrence of singly charged peptide ions. In the CHOPIN method, we added a fourth scan event in which singly charged precursor ions are analyzed by an HCD/FT scan with increased collision energy (32% instead of 25%) and a longer injection time (100 ms instead of 40 ms). Because “no enzyme” database searches benefit from high-mass-accuracy MS/MS spectra,[44] we also modified the Universal Method, replacing its low-mass-accuracy CID/IT MS/MS scans with two HCD/FT scan types, one for singly charged and one for multiply charged ions. Because the resulting method no longer conforms exactly to the parameters of the Universal Method, we refer to results obtained with this method as “Universal/FT” and highlight the difference where appropriate. Full details about the method have been exported into text format and are available in the Supporting Information.

Data Analysis

The general workflow of sample processing and identification of MS/MS spectra is shown in Figure S1. CHOPIN produces raw files containing both HCD/FT and CID/IT spectra. To allow searching the data in PEAKS,[45] we split the two spectrum types into separate MGF files using Proteome Discoverer (v. 2.0), retaining the top 10 (HCD/FT) and top 15 (CID/IT) peaks in every 100 m/z window. CID/IT spectra derived from CHOPIN or the Universal Method were then searched in PEAKS 7.5 using the default target–decoy approach[46] with a 20 ppm mass error tolerance for the precursor and 0.5 Da for fragment masses, while HCD/FT spectra were searched with a 0.05 Da fragment mass tolerance. The 20 ppm precursor tolerance allowed the inclusion of correctly identified peptides for which the 13C isotope peak was wrongly assigned as the monoisotopic precursor mass. These identifications appear as deamidated peptides with a larger mass error. The mass error distribution of deamidated peptides is visualized in Figure S5, showing the population of truly deamidated peptides and that of wrongly assigned precursor masses. For tryptic samples, we allowed up to four missed cleavages and no nonspecific cleavage. Propionamide was set as a fixed modification on cysteine and as a variable modification on lysine and peptide N-termini; deamidation (N,Q) and oxidation (M) were also set as variable modifications, with a maximum of one variable modification per peptide in the de novo and database searches (three variable PTMs for the PTM search nodes[47]). In all cases, the database used was the UniProt[48] Reference (UPR) Homo sapiens database (retrieved 15.10.2014). The elastase digest and Post Digest Mix data were searched with no enzyme specificity. The peptide false discovery rate (FDR) was adjusted to 1%, and proteins were grouped according to the parsimony principle described by Nesvizhskii and Aebersold.[49] Subsequently, the protein identification score threshold was adjusted to achieve a protein FDR of ∼1%. 
The score thresholds for peptide and protein FDRs, as well as identification metrics, are shown in Table 1. Because HCD/FT and CID/IT spectra had to be searched individually to account for the different fragmentation types and mass accuracies, the results were combined post-search. The combination involves the following major steps: (i) Read all PSMs identified in the two sample files, including those from both the target and the decoy databases. (ii) Adjust the PSM scores so that scores from the different samples are normalized identically. More specifically, the PSM score thresholds at 1% FDR were calculated for both samples; then, using one of the thresholds as the base score, the PSM scores in the other sample were shifted by the difference between the two score thresholds. (iii) Pool all PSMs and run the protein inference algorithm for protein grouping. (iv) Recalculate protein scores and coverage rates. The same procedure was applied to generate single or accumulated results from the prefractionated sample sets.
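Two points in this section can be made concrete in a few lines. First, assuming monoisotopic differences of 1.003355 Da (13C versus 12C) and 0.984016 Da (deamidation), a 13C peak picked as monoisotopic and matched as a deamidated peptide leaves a residual of about 0.019 Da, which falls inside a 20 ppm window only for peptides heavier than roughly 966 Da. Second, steps (i) and (ii) of the result combination can be sketched by estimating each score threshold from decoy/target counts and shifting one score list onto the other. The `(score, is_decoy)` PSM representation and the helper names are assumptions; this is not PEAKS' internal implementation, and protein inference (steps iii and iv) is omitted.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Part 1: residual error of a 13C peak matched as a deamidated peptide.
C13_MINUS_C12 = 1.003355  # Da, spacing of the 12C and 13C isotope peaks
DEAMIDATION = 0.984016    # Da, N->D or Q->E
RESIDUAL = C13_MINUS_C12 - DEAMIDATION  # ~0.0193 Da


def residual_ppm(unmodified_mass: float) -> float:
    """Precursor error (ppm) against the theoretical deamidated mass."""
    return RESIDUAL / (unmodified_mass + DEAMIDATION) * 1e6


def smallest_mass_within(tol_ppm: float) -> float:
    """Unmodified peptide mass above which the residual stays within tol_ppm."""
    return RESIDUAL * 1e6 / tol_ppm - DEAMIDATION


# Part 2: aligning two PSM lists at their 1% FDR score thresholds.
PSM = Tuple[float, bool]  # (score, is_decoy)


def threshold_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score whose decoy/target ratio among PSMs scoring >= it is <= max_fdr."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, tied in groupby(ranked, key=lambda psm: psm[0]):
        for _, is_decoy in tied:  # count all PSMs sharing this score together
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


def combine(base: List[PSM], other: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Shift `other` so its 1% FDR threshold coincides with that of `base`, then pool."""
    t_base = threshold_at_fdr(base, max_fdr)
    t_other = threshold_at_fdr(other, max_fdr)
    if t_base is None or t_other is None:
        raise ValueError("no score reaches the requested FDR")
    offset = t_base - t_other
    return base + [(score + offset, is_decoy) for score, is_decoy in other]


print(f"residual at 1000 Da: {residual_ppm(1000.0):.1f} ppm")  # inside 20 ppm
print(f"residual at 800 Da: {residual_ppm(800.0):.1f} ppm")    # outside 20 ppm
```

After the shift, a single score cut-off can be applied to the pooled PSMs before protein grouping.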

Table 1

Summary of Identification Metrics Using CHarge Ordered Parallel Ion aNalysis (CHOPIN) and Universal Method on Tryptic (T), Elastase (E), and Post Digest Mix (PDM) Samples^a

sample	peptide score threshold @1% FDR	PSMs	MS/MS scans	effective peptide FDR (%) @score threshold	protein score threshold @1% FDR	effective protein FDR (%) @score threshold	protein groups @1% FDR	proteins (unique and razor)
CHOPIN (T)	21.1	307318	582030	0.924	57	1.052	8745	13019
Universal (T)	20.2	226291	539916	0.943	57	1.001	8692	12770
CHOPIN (E)	19.8	170960	660060	0.701	85	1.069	4951	5521
Universal/FT (E)	16.4	171529	349164	0.617	60	0.99	5143	6866
CHOPIN (PDM)	14	284347	714354	0.891	31	0.977	7958	11974
Universal (PDM)	14.9	192500	671699	0.84	32	1.003	7517	11371
CHOPIN (PDM), unlinked fractions	14	433723	1160032	0.999	37	0.977	9824	13000
All data	10.8	2010579	4677255	0.996	64	0.987	13728	14890
All data, trypsin							10052	13320
All data, elastase							7038	8257
All data, PDM							9834	12452

^a Fractions have been combined and data searched in PEAKS.

Data density was visualized by using the Perseus software (v. 1.5.3.0) platform.[50]

Results and Discussion

Because the Orbitrap Fusion/Lumos instrument is capable of using a complex data-dependent decision tree, we decided to make additional use of the parallelization capabilities of an Orbitrap Fusion Lumos and developed a data-dependent acquisition method that uses elements of the Universal Method and adds MS2 scans for the otherwise idle Orbitrap detector. To maximize spectral quality/success rate and detector usage efficiency, we routed the ions to the detector best suited to their specific properties. Low-abundance precursors with a charge state of 2 were fragmented with CID, and their fragment spectra were acquired in the more sensitive linear ion trap (CID/IT), while highly abundant precursors with a charge state of >2 were fragmented using HCD, and their fragment spectra were acquired in the Orbitrap (HCD/FT). In addition, higher charged precursors with an abundance below the HCD/FT selection threshold were acquired with the same detection parameters as doubly charged ions (CID/IT). Consequently, CHOPIN results in hybrid data, containing both spectrum types in a single raw file. The duty cycle of CHOPIN is depicted in Figure 1A. To evaluate whether CHOPIN would allow the acquisition of more high-quality MS2 spectra in complex samples, we prepared a total cell lysate of MCF-7 cells in the presence of 4% SDS, 6 M urea, 2 M thiourea, and 100 mM DTT and sonicated the lysate to maximize lysis and protein solubilization. We used Gel-Aided Sample Preparation (GASP)[43] for three reasons: it tolerates SDS and urea/thiourea, allowing maximal solubilization of the sample; it introduces missed cleavage sites, because some lysine residues react with acrylamide, creating overlapping peptides and thus increasing sequence coverage; and it is easy to use. Samples were then digested with either trypsin or elastase. 
The individual digests were then prefractionated via high-pH reversed-phase chromatography (C18, 30 fractions) and concatenated (15 fraction pools) as described in Figure 1B. In addition, we mixed the elastase and tryptic digests and analyzed both concatenated and individual fractions. Each fraction was analyzed with CHOPIN and the Universal Method on a 1 h gradient, resulting in six data sets of 15 × 1 h LC–MS/MS analyses (trypsin, elastase, and Post Digest Mix, each acquired with CHOPIN and the Universal Method) and one data set of 30 × 1 h LC–MS/MS analyses (Post Digest Mix, individual fractions, CHOPIN method). To evaluate how different search algorithms handle data acquired with CHOPIN and the Universal Method, the whole tryptic data set was reprocessed with PEAKS, Mascot,[51] Andromeda/MaxQuant,[52,53] and SEQUEST[54] (Table S6). Additionally, we addressed robustness and reproducibility by analyzing one tryptic fraction in technical triplicate with CHOPIN and the Universal Method (Figure S7). In summary, we obtained comparable results with all search engines used, with PEAKS benefiting slightly from its ability to detect post-translational modifications in an unbiased fashion. Overall, we achieved significantly better identification rates and more peptide spectrum matches employing CHOPIN. The results are summarized and discussed in greater detail in the Supporting Information.

CHOPIN Improves Duty Cycle Usage and Success Rate of MS/MS Identification

One duty cycle of the Universal and CHOPIN methods in the tryptic experiment was extracted (Table S1) to exemplify the working principle of the two data acquisition methods under comparable conditions (similar RT, base peak, and base peak intensity). Here the Universal Method results in a Top35 scan event (1 precursor scan followed by 35 MS2 scans) in a 3 s duty cycle. The accumulated injection time for the 35 precursors is 1.8 s, and the total MS2 scan time is 2.14 s. Given a 3 s duty cycle, the Universal Method gains 0.94 s through parallel handling of MS2 injection and scan. Employing CHOPIN resulted in a Top42 scan event, of which 29 precursors were scanned with CID/IT and 13 with HCD/FT. Here the accumulated injection time is similar to that of the Universal Method at 1.79 s; however, because of the parallel acquisition of MS2 scans in the Orbitrap and the linear ion trap, the instrument spends 2.75 s on MS2 scans, adding up to a total of 4.54 s in a duty cycle of 3 s. The additional level of parallelization from using both detectors for MS2 scans in the same duty cycle thus gained 1.54 s through parallel handling. In summary, using CHOPIN we gained seven MS2 scans and 0.6 s of MS2 scan time over the Universal Method in the exemplified duty cycle. Because we use HCD/FT for abundant precursors in CHOPIN, the resulting MS2 scans can be expected to have a high success rate. Also, previously scanned intense precursors are moved to the autoexclusion list, effectively precluding them from being selected for a CID/IT scan and therefore improving detector usage efficiency. Consequently, the more sensitive linear ion trap can spend time on less abundant precursors. We plotted the peptide score distribution of the accumulated results of the trypsin digest (Figure 2A; for the other digests, see Figure S2) as a function of peptide mass and identification numbers (density gradient) for each scan type in CHOPIN (HCD/FT and CID/IT) and for the CID/IT scans using the Universal Method. 
We observed overall higher scores for the HCD/FT scan mode across the mass range with 32% of all identified spectra (31 066/97 731) yielding a score of 80 or higher. In contrast, only 86 out of 188 037 (0.05%) CID/IT identifications scored in the same range. Using the CID/IT-based Universal Method, only 899 identifications achieved a score of >80, clearly indicating a significantly lower spectrum quality in addition to overall lower identification numbers.
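The duty-cycle bookkeeping in this example can be reproduced with simple arithmetic. The sketch assumes, as in the Universal Method example, that the time recovered by parallel handling equals the summed MS2 injection and scan time minus the 3 s cycle; the dictionaries simply hold the values quoted for the two extracted cycles, and `parallel_gain()` is a hypothetical helper name.

```python
DUTY_CYCLE_S = 3.0  # maximal duty cycle used by both methods


def parallel_gain(injection_s: float, ms2_scan_s: float, cycle_s: float = DUTY_CYCLE_S) -> float:
    """Seconds of MS2 work that fit into the cycle only because tasks overlap."""
    return injection_s + ms2_scan_s - cycle_s


universal = {"ms2_scans": 35, "injection_s": 1.80, "scan_s": 2.14}
chopin = {"ms2_scans": 29 + 13, "injection_s": 1.79, "scan_s": 2.75}  # CID/IT + HCD/FT

for name, cycle in (("Universal", universal), ("CHOPIN", chopin)):
    gain = parallel_gain(cycle["injection_s"], cycle["scan_s"])
    print(f"{name}: {cycle['ms2_scans']} MS2 scans, {gain:.2f} s recovered by parallelization")

print("extra MS2 scans:", chopin["ms2_scans"] - universal["ms2_scans"])
print(f"extra MS2 scan time: {chopin['scan_s'] - universal['scan_s']:.2f} s")
```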

Figure 2

CHOPIN enhances MS/MS interpretation rates. (A) The density plot shows the number of identifications over precursor mass and peptide score (−10lgP) to demonstrate the gain in spectral quality for peptides by HCD/FT detection (CHOPIN HCD/FT) in a tryptic digest. The CHOPIN CID/IT spectra show a score distribution similar to that of peptides identified with the Universal Method. However, the combined data of the CHOPIN result show a clear improvement in the number of identified peptides and confidence. Density plots for the elastase digest and the Post Digest Mix are shown in Figure S2. (B) Improvements on the peptide level carry through to ID confidence on the protein group level, especially in the trypsin and Post Digest Mix samples. Because of the inclusion of singly charged precursors, the benefit in the elastase-digested samples is limited to high-confidence identifications. We observed similar frequencies for low-scoring proteins in the tryptic fractions after Universal and CHOPIN data acquisition, with some benefit for the Universal Method for low-to-medium protein scores (100–200). Interestingly, CHOPIN resulted in considerably more high-scoring proteins. For the elastase digest, we observed a different score distribution, especially when viewed in the context of overall identification numbers (compare Figure 2B and Table S3). While we identified more peptides in the elastase digest with the modified Universal Method (higher success rate of high-mass-accuracy HCD/FT MS/MS spectra; see the Methods section), for the CHOPIN data we needed a higher protein score threshold to achieve 1% protein FDR (85 vs 60; see Table 1). This can be explained by the inclusion of short peptides, frequently generated with a single charge, in the precursor selection algorithm, driving protein FDR. For future use of CHOPIN in elastase digests, we would recommend adding a precursor mass threshold to exclude singly charged, short peptides. 
The benefit of CHOPIN is seen most clearly in the Post Digest Mix, where CHOPIN’s improved duty cycle handles the increased sample complexity and mixed enzyme precursor profile more efficiently (Table 1). We also compared the proteins and peptides identified with the different acquisition methods by scan type (CID/IT, HCD/FT) for the three experiments. As expected, we observe a very high success rate for the HCD scans using CHOPIN data acquisition. Interestingly, the success rate for CID/IT using CHOPIN is also higher than that for CID/IT in the Universal Method, demonstrating that the CID/IT scan mode is better suited to doubly charged ions than to the unrestricted use in the Universal Method. In addition to acquiring more spectra due to improved parallelization, CHOPIN increases spectral quality, yielding a better success rate (Figure S3 and Table S3).
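A crude, spectrum-level view of these success rates can be read directly from Table 1 as PSMs per acquired MS/MS scan. This ratio is an assumed proxy, not necessarily the success-rate definition used in Figure S3 and Table S3; the counts are copied from Table 1, and `identification_ratio()` is a hypothetical helper.

```python
# PSMs and MS/MS scans per experiment, copied from Table 1.
TABLE1 = {
    "CHOPIN (T)": (307_318, 582_030),
    "Universal (T)": (226_291, 539_916),
    "CHOPIN (E)": (170_960, 660_060),
    "Universal/FT (E)": (171_529, 349_164),
    "CHOPIN (PDM)": (284_347, 714_354),
    "Universal (PDM)": (192_500, 671_699),
}


def identification_ratio(psms: int, scans: int) -> float:
    """PSMs per acquired MS/MS scan; a rough proxy for the success rate."""
    return psms / scans


for name, (psms, scans) in TABLE1.items():
    print(f"{name:<18}{identification_ratio(psms, scans):6.1%}")
```

By this proxy, CHOPIN exceeds the Universal Method for the tryptic (about 53% vs 42%) and Post Digest Mix (about 40% vs 29%) samples, whereas the Universal/FT elastase runs, which used only HCD/FT scans, show a higher ratio than CHOPIN on the elastase digest.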

CHOPIN Improves Protein Sequence Coverage

High protein sequence coverage of the deep proteome is key to detecting post-translational modifications in an unbiased way and to discriminating protein isoforms. Multiple studies have shown that proteome sequence coverage can be increased by different approaches, such as multienzyme proteolysis, extensive prefractionation, or combinations thereof. Figure 3A shows the protein sequence coverage detected with the different analysis strategies employed here (trypsin, elastase, and Post Digest Mix, after high-pH fractionation, using CHOPIN and the Universal Method) and a combined result on the protein level. Data acquisition with CHOPIN consistently resulted in higher sequence coverage than the Universal Method, although the number of detected protein groups does not necessarily increase when a single protease is used (Figure S8). The limitations of tryptic digestion become obvious when the number of proteins with very high sequence coverage is compared with that of the elastase or combined digests: only 123 protein groups are detected with more than 90% sequence coverage in the tryptic digest, compared with 327 in the elastase digest, a far greater 771 in the Post Digest Mix of trypsin and elastase proteolyzates, and 1462 in the complete data set.
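Protein sequence coverage, as used throughout this section, is the fraction of residues covered by at least one identified peptide, so overlapping peptides from a second protease add coverage only where they reach residues not already covered. A minimal sketch, assuming exact string matching of peptides to a single protein sequence (no I/L ambiguity handling) and hypothetical helper names:

```python
from statistics import median
from typing import Dict, Iterable, List


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Fraction of residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


def summarize(coverages: List[float]) -> Dict[str, float]:
    """Median coverage plus counts above the 50% and 90% marks used in the text."""
    return {
        "median": median(coverages),
        "over_50": sum(c > 0.5 for c in coverages),
        "over_90": sum(c > 0.9 for c in coverages),
    }


protein = "MAGKLVSTER"  # toy sequence
print(sequence_coverage(protein, ["MAG", "GKL"]))         # 0.5: overlapping peptides
print(sequence_coverage(protein, ["MAG", "GKL", "TER"]))  # 0.8: plus a complementary one
```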

Figure 3

Improved global protein sequence coverage using the CHOPIN workflow. (A) Protein sequence coverages observed with different analytical strategies illustrate the benefit of the methods used to improve protein sequence coverage and protein grouping, as the number of identified protein groups could be increased significantly. The median protein sequence coverage of 13 728 protein groups (leading protein) was 57%, with 7935 protein groups identified with more than 50% coverage. (B) Plotting sequence coverage of the combined data (leading protein per group) over molecular protein mass shows a distribution plume similar to a tornado (“Tornado plot”). Interestingly, the density of data points is relatively uniform across protein mass while showing the highest density at 70–80% coverage, indicating a similar abundance for the majority of the proteome, independent of molecular weight. The right panel shows the achieved protein sequence coverage in the different digests. Trypsin digests alone cannot generate comprehensive sequence coverage, while elastase digests cover proteins better. However, the mixture of tryptic and elastase digests (“PDM”) appears to retain the benefits of both proteases and, owing to its extreme complexity, benefits specifically from the improved duty cycle in CHOPIN (compare Table 1). (C) 6323 proteins and corresponding iBAQ values[65] could be matched to previously published deep proteome data in MCF-7 cells by Geiger et al.[1] The median sequence coverage for the same set of proteins could be improved from 43 to 61%. With the increased sequence coverage generated by CHOPIN and orthogonal digests with trypsin and elastase, more protein isoforms can be distinguished from their canonical variants. This leads to the identification of 13 728 protein groups representing 8949 genes in the combined data. 
In our database searches (UniProt Reference Homo sapiens database[48] containing a total of 85 889 human proteins and isoforms, retrieved 15/10/2014), we used parsimony-based protein inference, as described by Nesvizhskii and Aebersold,[49] to report the minimal number of proteins that can account for the observed peptides, supported by unique and razor peptides. We plotted the sequence coverage of the leading protein of all detected protein groups over their molecular weight (Figure 3B) to illustrate whether there is any bias in coverage with regard to protein size. The tornado-shaped plume shows a higher density of data points in the low-coverage (0–20%) part of the graph, but we observe a more even distribution across the plume up to 100%. In the merged data, 7935 protein groups were observed with a sequence coverage of >50% for the leading protein (median coverage = 57%). Used instead of median sequence coverage, this metric better reflects not only the depth at which a proteome is reported but also its comprehensiveness, as it takes “one-hit wonders” out of the equation. The unbiased search for peptide modifications by the PEAKS PTM search engine[47] allowed the detection of up to 485 different modifications, owing to de novo sequence tag mapping before the database search. In the combined data, we discovered a total of 206 different peptide modifications on a total of 193 548 sites (Figure S6). About half of the modifications can be explained by sample processing and plausible artifacts, leaving a total of 91 modification types on 81 905 sites that can be classified as biological post-translational modifications (Table S3). Because the broad cleavage specificity does not allow us to estimate relative protein abundances within a sample, we retrieved iBAQ values for the MCF-7 proteome from Geiger et al.[1] to see whether we can cover even low-abundance proteins more comprehensively than before (Figure 3C). 
Here we plotted the protein sequence coverage of proteins common to both data sets over the corresponding iBAQ values retrieved from Geiger et al.[1] As expected, highly abundant proteins are observed with higher protein sequence coverage. However, in our data set the median sequence coverage of the same set of proteins increased from 42.9% (left panel) to 61% (right panel), and far more protein groups were covered at >90% (53 vs 1461 protein groups). This result indicates a step toward complete sequence coverage, independent of protein abundance.
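The reason abundance had to be borrowed from Geiger et al. becomes clearer from how iBAQ is computed: the summed peptide intensities of a protein are divided by the number of theoretically observable peptides, usually taken from an in silico tryptic digest. That denominator is well defined for trypsin's rule but not for a broad-specificity protease such as elastase. The sketch below assumes fully cleaved tryptic peptides (cleave after K/R, not before P) and a 6–30 residue observability window; the window and the function names are illustrative assumptions.

```python
import re


def tryptic_peptides(protein: str) -> list[str]:
    """Fully cleaved tryptic peptides: cut after K or R unless followed by P."""
    # Zero-width split (lookbehind/lookahead) requires Python 3.7+
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def observable_count(protein: str, min_len: int = 6, max_len: int = 30) -> int:
    """Number of tryptic peptides within the assumed observable length window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(protein))


def ibaq(peptide_intensities: list[float], protein: str,
         min_len: int = 6, max_len: int = 30) -> float:
    """Summed peptide intensity divided by the count of observable peptides."""
    n = observable_count(protein, min_len, max_len)
    return sum(peptide_intensities) / n if n else float("nan")
```

For instance, `tryptic_peptides("MKTAYIAKQR")` yields `["MK", "TAYIAK", "QR"]`, of which only `TAYIAK` falls in the 6–30 window.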

Application of Elastase in Total Proteome Digests

Elastase is often used to increase protein sequence coverage in noncomplex protein samples because of its broad cleavage specificity.[55] While unspecific proteases such as Proteinase K have been used in the past on membrane proteins[56] and to analyze interpeptide cross-links,[57] data analysis still represents a major challenge, because without cleavage specificity to constrain the search, the computational effort for peptide identification increases substantially. Recent sequence-tag-based[58] and de-novo-based[46,59,60] methods for peptide identification can use sequence information detected before precursor mass and cleavage specificity are applied, reducing the search space and achieving result characteristics similar to those of standard search algorithms. In this study, we used elastase for the first time on total cell extracts to supplement classical multienzyme approaches[5,25,61,62] and increase the depth and sequence coverage of the MCF-7 proteome. Interestingly, by examining such a complex data set, we refined the distinct cleavage pattern of elastase,[55] as shown in Figure . We noted that the vast majority of cleavages (86.77%) occur specifically after A, V, I, T, L, and S in P1. An additional 10.3% of cleavages were observed after R, G, M, and K in P1. The identity of P1′ was less relevant, except that proline and tryptophan effectively inhibit cleavage. Taken together, we conclude that elastase has a high but broad specificity toward the amino acids A, V, I, T, L, S, R, G, M, and K in the P1 position, which together account for 97.7% of cleavages. Clearly, the ability of elastase to skip multiple cleavage sites generates a peptide population that is highly orthogonal to trypsin-generated peptides and therefore complements a tryptic digest.
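Cleavage percentages of this kind come from tallying the residues on either side of each observed cleavage. In Schechter–Berger notation, P1 is the residue immediately before the scissile bond and P1′ the residue after it, so each identified peptide contributes up to two sites (its N- and C-terminal cleavages). The sketch below is an illustrative assumption, not the authors' procedure: it uses exact substring matching, skips protein termini, and counts each distinct site once.

```python
from collections import Counter


def cleavage_sites(protein: str, peptides: list[str]) -> list[tuple[str, str]]:
    """(P1, P1') residue pairs at both termini of each peptide occurrence.

    Site index i denotes the bond between protein[i - 1] (P1) and protein[i] (P1').
    Protein termini are skipped because they are not proteolytic cleavages.
    """
    sites: set[int] = set()
    for pep in {p for p in peptides if p}:
        start = protein.find(pep)
        while start != -1:
            end = start + len(pep)  # index of P1' for the C-terminal cleavage
            if start > 0:
                sites.add(start)    # N-terminal cleavage of this peptide
            if end < len(protein):
                sites.add(end)      # C-terminal cleavage of this peptide
            start = protein.find(pep, start + 1)
    return [(protein[i - 1], protein[i]) for i in sorted(sites)]


def p1_fraction(sites: list[tuple[str, str]], residues: str = "AVITLS") -> float:
    """Fraction of cleavage sites whose P1 residue is in `residues`."""
    counts = Counter(p1 for p1, _ in sites)
    total = sum(counts.values())
    return sum(counts[r] for r in residues) / total if total else 0.0
```

Tallying `Counter(p1_prime for _, p1_prime in sites)` in the same way would show the depletion of proline and tryptophan at P1′ described above.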

Figure 4

Comprehensive elastase cleavage profile analysis reveals a preference toward small aliphatic amino acids. This study demonstrates the feasibility of using elastase as an orthogonal protease to trypsin, with the potential to replace the classical, narrow-specificity multienzyme approach. We detected a specificity similar to that reported by Rietschel et al.,[55] now based on 129 677 observed cleavages. 86.77% of cleavages were specific to A, V, I, T, L, and S as P1. However, an additional 10.3% of cleavages were detected on R, G, M, and K as P1, indicating a broad but high cleavage specificity of elastase. On the basis of 129 677 identified peptides in the elastase data, we were able to add peptide IDs orthogonal to the trypsin-derived identifications. In combination, these data allowed the differentiation of protein isoforms that are often inseparable using standard digestion methods. This is further improved by the missed cleavage sites randomly introduced into tryptic digests by lysine alkylation during GASP sample preparation. As a result, we created peptide populations that are able to distinguish subtle sequence differences between protein isoforms. This can be demonstrated by comparing the number of identified proteins with the number of identified protein groups (Table ) in the same workflow. The difference between identified proteins (13 019 for CHOPIN, trypsin) and protein groups (8745 for CHOPIN, trypsin) indicates a high number of protein groups with multiple protein entries. In the combined data, both numbers are relatively similar (14 890 proteins vs 13 728 protein groups), indicating that most protein groups contain one protein instead of multiple products of the same gene. Even though peptide identification is significantly improved using the de-novo-based search algorithm in PEAKS, an elastase-digested cell extract poses a challenge for false-positive estimation because of the presence of short, ambiguous peptide sequences.
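Whether two isoforms can be told apart depends on observing at least one peptide that occurs in one isoform sequence but not in the other. The sketch below illustrates that criterion under simplifying assumptions: exact substring matching against a hypothetical dictionary of isoform sequences, with leucine/isoleucine isobaric ambiguity ignored (a real search must handle it). It is not the grouping logic used in PEAKS.

```python
def isoform_unique_peptides(peptides: list[str],
                            isoforms: dict[str, str]) -> dict[str, set[str]]:
    """Map each isoform to the observed peptides found in that isoform only."""
    unique: dict[str, set[str]] = {name: set() for name in isoforms}
    for pep in set(peptides):
        hits = [name for name, seq in isoforms.items() if pep in seq]
        if len(hits) == 1:  # peptide discriminates exactly one isoform
            unique[hits[0]].add(pep)
    return unique


def distinguishable(peptides: list[str], isoforms: dict[str, str]) -> set[str]:
    """Isoforms supported by at least one discriminating peptide."""
    return {name for name, peps in isoform_unique_peptides(peptides, isoforms).items()
            if peps}
```

For two isoforms differing by a single I/V substitution (`MKTAYIAKQR` vs `MKTAYVAKQR`), only peptides spanning the variant position (e.g., `TAYIAK` or `AYV`) discriminate, while a shared peptide such as `QR` cannot. Overlapping peptides from a second protease raise the chance that some peptide spans each such position.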
Instead of defining a minimal peptide length, we chose the more conservative option of raising the protein score threshold to achieve 1% protein FDR (compare Table ). When all peptide scores for a protein hit are low, this effectively requires up to five peptides (unique or razor), each identified at 1% peptide FDR, for a protein hit in the CHOPIN elastase data. Consequently, only 16 out of 13 728 protein groups in the complete data set are identified with only a single (high-scoring) peptide. The percentage of isoforms in the protein groups identified here (27.5%) is very similar to the percentage of isoforms in the database used (25.09%), indicating that protein parsimony is not overly optimistic when isoforms can be distinguished into separate protein groups. Moreover, as shown for the trypsin digestion data sets, the limit of detection of proteins in a whole-cell lysate is determined by the absolute sensitivity of the workflow and, to a lesser extent, by the data acquisition method if undersampling is avoided (Figure S8). However, CHOPIN could be used to significantly increase the sequence coverage of the proteins detected, which is very beneficial for protein metrics, especially when combined with broad-specificity digestion protocols. Our data also raise questions with regard to protein isoform identification. The unified modeling of both FDR and protein grouping in large data sets is an ongoing debate.[63,64] Existing models may well lead to inflated protein group counts from high-coverage data sets, particularly with the advent of de-novo-based search tools and of broad-specificity proteolysis allowing differentiation between isoforms with almost identical sequences.
While standard protein parsimony can be applied for protein grouping and single peptide hits can be virtually excluded as demonstrated here, further advances in the detection of protein isoforms (and PTMs) will likely require new FDR models to minimize false-positives. In the data reported here, the number of protein groups identified when all data are combined with a unified FDR model is considerably higher than achieved by any of the method/digest mix combinations separately (Figure A). While the combined data arguably justify these numbers in terms of greater sequence coverage, we report the “All data” total with the above considerations in mind (see also the Supporting Information).

Conclusions

We have developed CHarge Ordered Parallel Ion aNalysis to improve the duty cycle of an Orbitrap Fusion (Lumos) by using both detectors in parallel for MS/MS spectra acquisition in a way that favors spectral quality according to the properties of the peptide precursor. Our results show that this leads to expanded proteome coverage when combined with a broad-specificity digestion approach. Our study also highlights challenges that lie ahead for proteome research in the coming years. The analysis of data using different mass detectors with distinct mass errors and fragmentation modes has proved beneficial for the identification of the deep, high-coverage proteome, but the quantity and variety of data generated by modern hybrid instruments also present a major obstacle. Available search tools need to adapt to this type of complex MS data to allow combined analysis and more sophisticated statistical evaluation. Second-generation search tools incorporating de novo algorithms allow the unbiased detection of hundreds of different modifications on tens of thousands of sites, even in existing data. As the deep proteome becomes more readily accessible, the focus must move to achieving high protein sequence coverage. Detection of proteins, their isoforms, and PTMs in a comprehensive and unbiased way is crucial to an expanded understanding of the proteome.
