Vladimir Gorshkov, Stéphanie Yuki Kolbeck Hotta, Thiago Verano-Braga, Frank Kjeldsen.
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance of database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co-isolation and thus prone to false identifications. The deconvolution approach matched complementary b- and y-ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence-specific fragment ions of each co-isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20-35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than the number for which improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
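The complementary-ion matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes singly charged fragment peaks, neutral monoisotopic precursor masses, and a fixed absolute mass tolerance; the function name `assign_complementary_pairs` and the parameter `tol` are hypothetical. The sketch relies on the relation that the m/z values of a singly charged b-ion and its complementary y-ion sum to the neutral peptide mass plus two proton masses.

```python
PROTON = 1.007276  # proton mass in Da


def assign_complementary_pairs(peaks, precursor_masses, tol=0.02):
    """Group fragment peaks into one "virtual spectrum" per precursor.

    peaks            -- m/z values of fragments, assumed singly charged
    precursor_masses -- neutral monoisotopic masses of co-isolated peptides
    tol              -- absolute tolerance in Da (an assumed value)

    Returns a dict mapping precursor index to the sorted list of peaks
    that take part in at least one complementary pair for that precursor.
    """
    virtual = {i: set() for i in range(len(precursor_masses))}
    for i, mass in enumerate(precursor_masses):
        # For singly charged ions: m/z(b) + m/z(y) = M + 2 * proton mass
        target = mass + 2 * PROTON
        for a in range(len(peaks)):
            for b in range(a + 1, len(peaks)):
                if abs(peaks[a] + peaks[b] - target) <= tol:
                    virtual[i].update((peaks[a], peaks[b]))
    return {i: sorted(v) for i, v in virtual.items()}


if __name__ == "__main__":
    # Toy mixture spectrum: two complementary pairs plus one unmatched peak.
    mixture = [123.4, 200.0, 250.5, 302.014552, 351.514552]
    print(assign_complementary_pairs(mixture, [500.0, 600.0]))
```

Peaks that do not pair up for any precursor, such as 123.4 in the toy input, are left out of every virtual spectrum. The brute-force pair search keeps the sketch short; a sorted two-pointer scan would be the usual choice for real spectra.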
Keywords: Bioinformatics; Complementary ions; De novo sequencing; Mass spectral interference; Mixture fragmentation spectra
Year: 2016 PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. True identification rate and the number of true identifications at different mixture ratios for de novo (PEAKS) and database searching (MASCOT and SequestHT). Cutoff: IonsScore > 13 (MASCOT), XCorr > 1.0 (SequestHT), ALC > 66 (de novo); LD ≤ 2.
Figure 2. (A) True identification rate and (B) number of correct identifications for different processing of artificial mixture spectra. PEAKS ALC > 66; LD ≤ 2.
Figure 3. Number of correct sequences identified in artificial mixture spectra before and after mixture spectra deconvolution (PEAKS; ALC > 66). Red: sequences identified only in the deconvoluted dataset, blue: sequences identified only in the mixed dataset, purple: sequences identified in both datasets.
Valid identification rate by different processing of HeLa lysate sample
| Program | Unprocessed (%) | Purified (%) | Deconvoluted (%) |
|---|---|---|---|
| PEAKS | 46.18 | 49.80 | 47.58 |
| pNovo+ | 45.63 | 52.59 | 51.74 |
| pepNovo+ | 56.01 | 58.60 | 44.70 |
| Novor | 45.66 | 47.94 | 45.42 |
Novor aaScore > 36; pNovo+ Score > 30; PEAKS ALC > 66; pepNovo+ no restriction
Figure 4. Number of valid sequences identified by different processing of HeLa lysate sample. Novor aaScore > 36; pNovo+ Score > 30; PEAKS ALC > 66; pepNovo+ no restriction. Red: sequences identified only in the deconvoluted dataset, blue: sequences identified only in the mixed dataset, purple: sequences identified in both datasets.
Figure 5. Number of sequences identified in scorpion samples. Novor aaScore > 36; pNovo+ Score > 30; PEAKS ALC > 66; pepNovo+ no restriction. Red: sequences identified only in the deconvoluted dataset, blue: sequences identified only in the mixed dataset, purple: sequences identified in both datasets. All identified sequences reported without validation.