Le Zhang, Geng Liu, Guixue Hou, Haitao Xiang, Xi Zhang, Ying Huang, Xiuqing Zhang, Bo Li, Leo J Lee.
Abstract
Although database search tools originally developed for shotgun proteomics have been widely used to identify peptides from immunopeptidomic mass spectrometry data, they have been reported to achieve undesirably low sensitivity or high false positive rates, because the lack of specific enzymatic digestion in immunopeptidomics hugely inflates the search space. To overcome this problem, we developed a motif-guided immunopeptidome database building tool named IntroSpect, which first learns peptide motifs from high-confidence hits in an initial search and then builds a targeted database for a refined search. Evaluated on 18 representative HLA class I datasets, IntroSpect improved sensitivity by an average of 76% compared to conventional searches with unspecific digestion, while maintaining a very high level of accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so it performs equally well on both well-studied and poorly studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, similar to a conventional database search. Finally, we demonstrate the practical value of IntroSpect by discovering neoepitopes directly from MS data, an important application in cancer immunotherapy. IntroSpect is freely available to download and use.
Keywords: database search; immunopeptidome; mass spectrometry; motif; search space
Year: 2022 PMID: 35454168 PMCID: PMC9025654 DOI: 10.3390/biom12040579
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. IntroSpect improves peptide identification sensitivity by reducing the search space. (a) Flowchart of the conventional database search and the IntroSpect database search. (b) IntroSpect and SpectMHC decreased the database size and increased the proportion of MS/MS spectra identified with MS-GF+. The database size is calculated as the number of 9–11 mer peptides in the database. (c) IntroSpect and SpectMHC significantly increased the number of peptides identified with the MS-GF+ and Comet search engines, while IntroSpect consistently outperformed SpectMHC in terms of sensitivity.
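The two-pass idea (learn a motif from high-confidence hits, then keep only database peptides that match it) can be illustrated with a minimal sketch. This is not IntroSpect's implementation: the function names, the log-odds scoring against a uniform background, the pseudocount, and the threshold of 0.0 are all assumptions made for illustration, and the score here is not on the same scale as IntroSpect's own motif score.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_kmers(protein, lengths=(9, 10, 11)):
    """Yield every substring of the given lengths (unspecific digestion)."""
    for k in lengths:
        for start in range(len(protein) - k + 1):
            yield protein[start:start + k]


def learn_motif(seed_peptides, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length seed peptides.

    Hypothetical scoring: log2(observed frequency / uniform background).
    """
    length = len(seed_peptides[0])
    if any(len(p) != length for p in seed_peptides):
        raise ValueError("seed peptides must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seed_peptides) + pseudocount * len(AMINO_ACIDS)
    motif = []
    for pos in range(length):
        counts = Counter(p[pos] for p in seed_peptides)
        motif.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return motif


def motif_score(peptide, motif):
    """Mean per-position log-odds of a peptide under the motif."""
    return sum(col[aa] for col, aa in zip(motif, peptide)) / len(motif)


def build_targeted_db(proteins, motif, threshold=0.0):
    """Keep k-mers (k = motif length) that score at or above the threshold."""
    length = len(motif)
    allowed = set(AMINO_ACIDS)
    kept = set()
    for protein in proteins:
        for pep in enumerate_kmers(protein, lengths=(length,)):
            # Skip peptides containing non-standard residues such as X or U.
            if set(pep) <= allowed and motif_score(pep, motif) >= threshold:
                kept.add(pep)
    return kept
```

With seeds such as `["ALAAAAAAV", "GLGGGGGGV", "SLSSSSSSV"]`, peptides carrying L at position 2 and V at position 9 score higher than those that do not, so the filtered database is smaller than the full 9-mer enumeration. Counting the output of `enumerate_kmers` over all proteins gives a database size in the sense used in panel (b), before any deduplication.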
Summary of immunopeptidome data sets.
| Dataset | Source | Spectra | HLA Alleles |
|---|---|---|---|
| K562 | In-house | 64,572 | transduced to express only A*11:01 |
| B721.221 | Public 39 | 111,662 | transduced to express only A*02:07 |
| Jurkat | Public 21 | 670,119 | A*03:01, B*07:02, C*04:01, A*03:01, B*35:03, C*07:02 |
| Train1 | Public 40 | 84,453 | A*11:01, B*27:02, C*03:03, A*11:01, B*55:01, C*05:01 |
| Train9 | Public 40 | 88,437 | A*11:01, B*51:01, C*01:02, A*68:01, B*56:01, C*07:02 |
| Train10 | Public 40 | 170,101 | A*29:02, B*44:03, C*04:01, A*29:02, B*35:01, C*16:01 |
| Train13 | Public 40 | 128,712 | A*01:01, B*08:01, C*07:01, A*03:01, B*35:01, C*04:01 |
| Train22 | Public 40 | 273,039 | A*31:01, B*08:01, C*12:03, A*03:01, B*38:01, C*07:01 |
| Train28 | Public 40 | 192,712 | A*03:01, B*35:03, C*03:03, A*03:01, B*51:01, C*04:01 |
| Train29 | Public 40 | 175,619 | A*03:02, B*44:03, C*03:03, A*26:01, B*35:02, C*16:01 |
| Train32 | Public 40 | 123,863 | A*29:02, B*44:03, C*07:02, A*03:01, B*07:02, C*16:01 |
| Train33 | Public 40 | 463,383 | A*02:03, B*15:02, C*08:01, A*68:01, B*15:13, C*08:01 |
| Train45 | Public 40 | 178,449 | A*31:01, B*44:02, C*05:01, A*01:01, B*67:01, C*12:03 |
| Train48 | Public 40 | 468,069 | A*24:02, B*18:01, C*07:02, A*25:01, B*07:02, C*12:03 |
| Train50 | Public 40 | 142,681 | A*33:03, B*44:03, C*07:06, A*68:01, B*35:01, C*04:01 |
| Train55 | Public 40 | 281,891 | A*01:01, B*08:01, C*07:01, A*24:02, B*08:01, C*07:01 |
| Train62 | Public 40 | 168,243 | A*02:01, B*44:02, C*05:01, A*68:01, B*44:02, C*07:04 |
| Train63 | Public 68 | 329,221 | A*31:01, B*44:02, C*05:01, A*02:01, B*27:05, C*02:02 |
Figure 2. Immunopeptides from IntroSpect and conventional database search are very similar. (a) Histogram of predicted BA rank values of peptides identified by conventional and IntroSpect search: the peptides in separate panels are predicted to be strong or weak binders (BA rank < 2%), with their percentages marked on the panel. (b) Sequence logos of immunopeptides in three datasets (B721.221-A*02:07, K562-A*11:01 and Jurkat-B*07:02) identified by the conventional and IntroSpect search. (c) Amino acid frequencies at each position for peptides identified by the conventional and IntroSpect search.
Randomly selected peptides identified by IntroSpect and conventional database search were confirmed by spectral validation.
| Software | Source | Identified | Selected for validation | Confirmed | Precision (%) |
|---|---|---|---|---|---|
| MS-GF+ | Both conventional and IntroSpect | 2385 | 91 | 89 | 97.80 |
| MS-GF+ | IntroSpect only | 993 | 27 | 26 | 96.30 |
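As a quick arithmetic check, the precision column follows from the confirmed and selected counts in the table. The sketch below assumes precision is simply confirmed divided by selected; the function name is illustrative.

```python
def precision_percent(confirmed, selected):
    """Precision as a percentage: confirmed peptides over peptides selected."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    return 100.0 * confirmed / selected


# (source, selected, confirmed) taken from the table above.
rows = [
    ("Both conventional and IntroSpect", 91, 89),
    ("IntroSpect only", 27, 26),
]

for source, selected, confirmed in rows:
    print(f"{source}: {precision_percent(confirmed, selected):.2f}%")
# Both conventional and IntroSpect: 97.80%
# IntroSpect only: 96.30%
```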
Figure 3. The high consistency of identified spectra and peptides between conventional and IntroSpect search. (a) Most of the spectra (top panel) or immunopeptides (bottom panel) detected by the conventional method can also be identified through IntroSpect. Regions 1, 2 and 3 denote spectra (top panel) or immunopeptides (bottom panel) detected by conventional search only, by both, or by IntroSpect only. The percentages are calculated based on the total number of peptides or spectra identified by both methods. The gray boxes on the right panel denote cell lines. (b) A fraction of the spectra newly identified by IntroSpect were matched to peptides previously identified by conventional search (refined peptides). These refined peptides are indicated in dark shades. (c) The number of assigned spectra for refined peptides increased substantially.
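The three regions in panel (a) amount to a set comparison. The sketch below assumes that "the total number identified by both methods" means the union of the two result sets, which the caption does not state explicitly; the function and key names are illustrative.

```python
def overlap_regions(conventional, introspect):
    """Percentage of identifications in each region, relative to the union."""
    conventional, introspect = set(conventional), set(introspect)
    union = conventional | introspect
    if not union:
        raise ValueError("both result sets are empty")
    regions = {
        "conventional_only": conventional - introspect,  # region 1
        "both": conventional & introspect,               # region 2
        "introspect_only": introspect - conventional,    # region 3
    }
    return {name: 100.0 * len(members) / len(union)
            for name, members in regions.items()}
```

For example, `overlap_regions({"A", "B", "C"}, {"B", "C", "D", "E"})` yields 20% conventional only, 40% shared, and 40% IntroSpect only. The same function applies to spectrum identifiers or peptide sequences.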
Figure 4. IntroSpect generates a smaller and more targeted database than that of SpectMHC. (a) The line plot compares the numbers of peptides identified by SpectMHC and IntroSpect on databases of various matching sizes. The data point with an asterisk corresponds to the motif score of 0.3, the empirically chosen optimal threshold for IntroSpect. (b) The bar plot shows the relationship between the databases and the identified peptides of IntroSpect and SpectMHC on databases of various sizes. (c) The line plot compares the PCCaaf of SpectMHC and IntroSpect searches on databases of various sizes. (d) PCCaaf at P2, P3, P9 and all positions between the databases and the peptides identified by SpectMHC, IntroSpect and conventional search on the three datasets (B721.221, K562 and Jurkat).
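The caption does not define PCCaaf. Assuming it denotes a Pearson correlation coefficient between the amino acid frequency vectors of two peptide sets at a given position, a minimal sketch looks like this; the function names are illustrative and positions are 0-based, so P2 is index 1.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(peptides, position):
    """Frequency of each standard amino acid at a 0-based position."""
    counts = Counter(p[position] for p in peptides if len(p) > position)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard residues observed at this position")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def pcc_aaf(peptides_a, peptides_b, position):
    """Correlate amino acid frequencies of two peptide sets at one position."""
    return pearson(aa_frequencies(peptides_a, position),
                   aa_frequencies(peptides_b, position))
```

A value near 1 at an anchor position would indicate that the database and the identified peptides share the same residue preferences there, which is the sense in which a database is "more targeted".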
Figure 5. IntroSpect identified more neoepitopes than conventional search. (a) Flowcharts indicating key steps involved in neoepitope discovery. (b) Percolator q-values of neoepitopes identified by both methods are plotted. Underlined peptides have support in other studies. (c) Spectra of neoepitope candidates assigned by IntroSpect with assay support. Peaks represent b ions in green, y ions in orange and precursor ions in dark grey. (d) The numbers of neoepitopes identified by the two methods under different FDRs.