Matthias Fahrner, Lucas Kook, Klemens Fröhlich, Martin L Biniossek, Oliver Schilling.
Abstract
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research, and a variety of open-source tools for peptide-spectrum matching are now available. Most analyses of explorative MS data use conventional settings, such as fully specific enzymatic constraints. Here we evaluated how the fragment mass tolerance, in combination with the enzymatic constraint, affects the performance of three open-source search engines (Myrimatch, X! Tandem, and MSGF+), assessing their suitability for semi- and unspecific searches and the importance of accurate fragment mass spectra in unspecific peptide searches. We then performed a semispecific reanalysis of the published NCI-60 deep proteome data using the best-suited parameters. Semi- and unspecific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra, whereas this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably between the three search engines for semi- and unspecific peptide-spectrum matching. The semispecific reanalysis of the NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing and likely alternative translation start sites, some of which were present in all cell lines of the reanalyzed panel. Highly accurate MS2 fragment data, in combination with modern open-source search algorithms, enable the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining of proteomic datasets.
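To make the distinction between fully specific, semispecific, and unspecific searches concrete, the following Python sketch classifies a peptide by how many of its termini are consistent with the protease. It assumes a simplified trypsin rule (cleavage C-terminal to K or R, ignoring the proline rule) and treats protein termini as specific; the function names and the "-" terminus placeholder are hypothetical and do not reflect how Myrimatch, X! Tandem, or MSGF+ implement this internally.

```python
# Sketch: classify enzymatic specificity under a SIMPLIFIED trypsin rule
# (cleaves after K or R; the "not before P" rule is ignored on purpose).
# All names here are hypothetical, for illustration only.

CLEAVAGE_RESIDUES = {"K", "R"}  # assumed trypsin specificity
PROTEIN_TERMINUS = "-"          # hypothetical marker for "no residue" at a protein end


def n_term_is_specific(residue_before: str) -> bool:
    """Specific if the peptide follows K/R or starts at the protein N-terminus."""
    return residue_before == PROTEIN_TERMINUS or residue_before in CLEAVAGE_RESIDUES


def c_term_is_specific(peptide: str, residue_after: str) -> bool:
    """Specific if the peptide ends in K/R or ends at the protein C-terminus."""
    return residue_after == PROTEIN_TERMINUS or peptide[-1] in CLEAVAGE_RESIDUES


def classify(peptide: str, residue_before: str, residue_after: str) -> str:
    """Return 'fully specific', 'semispecific', or 'unspecific'."""
    n_ok = n_term_is_specific(residue_before)
    c_ok = c_term_is_specific(peptide, residue_after)
    if n_ok and c_ok:
        return "fully specific"
    if n_ok or c_ok:
        return "semispecific"
    return "unspecific"


if __name__ == "__main__":
    # EMC1 peptide from the table: preceded by A (not K/R), ends in K.
    # The following residue "X" is a placeholder; it only matters at a protein end.
    print(classify("VYEDQVGK", "A", "X"))  # semispecific
```

A semispecific search has to consider every peptide with at least one specific terminus, which enlarges the candidate space relative to a fully specific search; an unspecific search drops the constraint entirely.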
Keywords: NCI-60 reanalysis; endogenous proteolysis; fragment mass tolerance; mass spectrometry; semispecific peptide search
Year: 2021 PMID: 34070654 PMCID: PMC8162549 DOI: 10.3390/proteomes9020026
Source DB: PubMed Journal: Proteomes ISSN: 2227-7382
Figure 1. Effect of less stringent enzymatic constraints and fragment mass tolerances on peptide identification results. Human formalin-fixed, paraffin-embedded (FFPE) samples were digested with LysC and trypsin and analyzed using Myrimatch (upper panel) and X! Tandem (lower panel). Violin plots show the number of unique, non-redundant peptide identifications across the three replicates, grouped by the enzymatic constraint and the fragment mass tolerance (10, 100, or 1000 ppm) used in the search engine settings.
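Fragment mass tolerances in ppm are relative, so the absolute matching window grows with fragment m/z. A minimal Python sketch, assuming a symmetric window around the theoretical fragment m/z; the function names and the m/z 1000 example are illustrative, not taken from the search engines' code:

```python
# Sketch: relative (ppm) fragment tolerance -> absolute m/z window.
# Function names and the example m/z are illustrative assumptions.


def ppm_to_abs(mz: float, tol_ppm: float) -> float:
    """Absolute half-width of the window: m/z * tol_ppm * 1e-6."""
    return mz * tol_ppm * 1e-6


def fragment_matches(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed peak lies within +/- tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_abs(theoretical_mz, tol_ppm)


if __name__ == "__main__":
    # The three tolerances compared in Figure 1, evaluated at m/z 1000.
    for tol in (10, 100, 1000):
        print(f"{tol:>4} ppm -> +/- {ppm_to_abs(1000.0, tol):.3f} at m/z 1000")
```

At m/z 1000, the three settings correspond to windows of roughly ±0.01, ±0.1, and ±1 m/z units. Wider windows allow more chance matches per candidate peptide, which is one way to rationalize why the larger candidate pools of semi- and unspecific searches are more sensitive to fragment mass accuracy.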
Figure 2. Peptide search results from three different open-source search engines. Four biological replicates of the human embryonic kidney (HEK293T) cell proteome (A) and three adjacent formalin-fixed, paraffin-embedded (FFPE) murine kidney tissue slides (B) were digested with either LysC (A) or chymotrypsin (B) and analyzed using MSGF+ (left), Myrimatch (middle), and X! Tandem (right). The number of unique, non-redundant peptide identifications (upper panel), the elapsed analysis time in minutes (middle panel), and the number of unique peptides identified per unit of analysis time (lower panel) are shown for each enzymatic constraint setting.
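The lower-panel metric can be expressed as distinct peptides divided by elapsed search time. A minimal Python sketch, assuming that "unique non-redundant" means distinct peptide sequences collapsed across PSMs and that time is measured in minutes; the function names and toy input are hypothetical:

```python
# Sketch of a "unique peptides per unit time" metric.
# Assumption: "unique non-redundant" = distinct peptide sequences across PSMs.
# Function names are hypothetical.


def unique_peptides(psm_sequences: list[str]) -> set[str]:
    """Collapse peptide-spectrum matches to distinct peptide sequences."""
    return set(psm_sequences)


def peptides_per_minute(psm_sequences: list[str], elapsed_min: float) -> float:
    """Distinct peptides identified per minute of search time."""
    if elapsed_min <= 0:
        raise ValueError("elapsed_min must be positive")
    return len(unique_peptides(psm_sequences)) / elapsed_min


if __name__ == "__main__":
    # Toy input: three PSMs, two distinct sequences, 4 minutes of search time.
    print(peptides_per_minute(["PEPTIDEK", "PEPTIDEK", "SAMPLER"], 4.0))  # 0.5
```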
Figure 3. Large-scale semispecific reanalysis of the published NCI-60 deep proteome dataset. Workflow for analyzing the published NCI-60 deep proteome data with OpenMS tools within the Galaxy framework. Whole-proteome samples of nine representative cancer cell lines were each separated into 24 fractions by gel-based molecular weight separation. Peptides were identified using MSGF+ with a semitryptic enzymatic constraint, followed by false discovery rate (FDR) computation and filtering at 1% FDR on the peptide-spectrum match (PSM) level.
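FDR filtering on the PSM level is commonly done with a target-decoy strategy. The Python sketch below shows that generic approach under stated assumptions: a score where higher is better, FDR estimated as decoys divided by targets above a threshold, and no special tie handling. The class and function names are hypothetical, and this is not necessarily the exact computation performed by the OpenMS tool in the workflow.

```python
# Sketch: target-decoy FDR estimation and filtering at 1% on the PSM level.
# Assumptions: higher score = better (an E-value such as MSGF+'s would first be
# transformed, e.g. -log10); FDR at a threshold = #decoys / #targets at or above
# it; ties are not treated specially. Generic approach, not the OpenMS code.
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float
    is_decoy: bool


def q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Return (PSM, q-value) pairs sorted from best to worst score."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR from the worst score upwards.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return list(zip(ranked, qs))


def filter_psms(psms: list[PSM], max_q: float = 0.01) -> list[PSM]:
    """Keep target PSMs with q-value <= max_q (1% FDR by default)."""
    return [psm for psm, q in q_values(psms) if not psm.is_decoy and q <= max_q]
```

Taking the running minimum makes q-values monotonic in score, so the accepted set is always a contiguous block of top-scoring PSMs.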
Figure 4. Identification of semispecific N-terminal peptides and proteins with prominent endogenous proteolytic processing. (A) Bar chart showing the number of confidently identified semispecific N-terminal peptides. Primary identification results from semispecific peptide searches were filtered for unique peptides identified in at least two of the nine cell lines. Peptides resulting from protein N-terminal methionine clipping or representing the native protein C-terminus were excluded. Only semispecific N-terminal peptides identified in gel slices close to the slice expected from the molecular weight of their parent protein were considered (Supplementary Figure S2). (B) Heatmap of proteins for which at least 10 peptides were identified in at least one of the nine cell lines. The color indicates the number of semispecific peptides identified per protein in each cell line.
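The filtering steps for panel A can be sketched in Python as follows. Everything about the data layout is hypothetical: the `Hit` record, the Met-clipping rule (preceding residue M at start position 2), and the ±1 gel-slice tolerance used as a stand-in for "close to the expected slice" are assumptions, since the exact criteria are described in Supplementary Figure S2 rather than here.

```python
# Sketch of the Figure 4A filters on hypothetical per-cell-line hit records.
# Assumptions (not from the source): the record layout, the Met-clipping rule
# (preceding residue M at start position 2), and the +/-1 gel-slice tolerance
# used to stand in for "close to the expected molecular weight".
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    sequence: str
    cell_line: str
    aa_before: str           # residue preceding the peptide in the protein
    start: int               # 1-based position of the peptide's first residue
    at_protein_c_term: bool  # peptide ends at the native protein C-terminus
    expected_slice: int
    observed_slice: int


def passes_exclusions(hit: Hit, max_slice_offset: int = 1) -> bool:
    if hit.aa_before == "M" and hit.start == 2:  # initiator methionine clipping
        return False
    if hit.at_protein_c_term:  # semispecific only because of the native C-terminus
        return False
    return abs(hit.observed_slice - hit.expected_slice) <= max_slice_offset


def recurrent_peptides(hits: list[Hit], min_cell_lines: int = 2,
                       max_slice_offset: int = 1) -> set[str]:
    """Sequences passing the exclusions in at least min_cell_lines cell lines."""
    lines_per_seq: dict[str, set[str]] = defaultdict(set)
    for hit in hits:
        if passes_exclusions(hit, max_slice_offset):
            lines_per_seq[hit.sequence].add(hit.cell_line)
    return {seq for seq, lines in lines_per_seq.items() if len(lines) >= min_cell_lines}
```

Setting `min_cell_lines=9` gives the stricter criterion used for the list of conserved N-terminal peptides (identified in all nine cell lines).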
List of conserved N-terminal peptide sequences identified in nine representative cancer cell lines. The list of confidently identified semitryptic peptides was filtered for peptides that were identified in all nine cell lines (see Figure 4A).
| Accession | Sequence | Function | Preceding AA | Position | Expected Gel Slice | Observed Gel Slice |
|---|---|---|---|---|---|---|
| sp\|Q9C0E2\|XPO4_HUMAN | .(Acetyl)AAALGPPEVIAQLENAAK | Double clipping ** | M | 3 | 5 | 6 |
| sp\|Q9BWU0\|NADAP_HUMAN | .(Acetyl)ADILSQSETLASQDLSGDFKKPALPVSPAAR | Potential ATIS * | M | 56 | 7 | 7 |
| sp\|O14980\|XPO1_HUMAN | .(Acetyl)TM(Oxidation)LADHAAR | ATIS * | M | 6 | 6 | 7 |
| sp\|Q9Y4L1\|HYOU1_HUMAN | LAVM(Oxidation)SVDLGSESM(Oxidation)K | Removal of signal peptide | T | 33 | 6 | 6 |
| sp\|P51610\|HCFC1_HUMAN | THETGTTNTATTSNAGSAQR | Cleavage by autolysis/HCF C-terminal chain 4 | E | 1296 | 7 | 8 |
| sp\|Q8N766\|EMC1_HUMAN | VYEDQVGK | Removal of signal peptide | A | 22 | 6 | 7 |
* ATIS = alternative translation initiation site. ** Double clipping by methionyl aminopeptidase.
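The Sequence column uses OpenMS-style notation, in which a leading "." followed by a parenthesized name marks an N-terminal modification and a parenthesized name after a residue marks a modification of that residue. A small Python sketch, assuming only parenthesized modification names as in this table (not bracketed mass-delta notation), recovers plain sequences and modification names; the helper names are hypothetical:

```python
# Sketch: parse OpenMS-style modified sequences such as
# ".(Acetyl)TM(Oxidation)LADHAAR". Only "(Name)" annotations are handled.
import re

_MOD = re.compile(r"\(([^()]*)\)")


def strip_mods(modified: str) -> str:
    """Remove "(Name)" annotations and the terminal "." markers."""
    return _MOD.sub("", modified).strip(".")


def mod_names(modified: str) -> list[str]:
    """Modification names in order of appearance."""
    return _MOD.findall(modified)


def has_n_term_acetyl(modified: str) -> bool:
    """True if the peptide carries an N-terminal acetylation."""
    return modified.startswith(".(Acetyl)")


if __name__ == "__main__":
    for seq in (".(Acetyl)TM(Oxidation)LADHAAR", "LAVM(Oxidation)SVDLGSESM(Oxidation)K"):
        print(strip_mods(seq), mod_names(seq), has_n_term_acetyl(seq))
```

Stripping annotations in this way is a prerequisite for mapping a peptide back onto its protein sequence, for example to look up the preceding residue and start position.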