Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.

Pavel Sulimov, Attila Kertész-Farkas.

Abstract

Peptide-spectrum-match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods utilize exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally exhaustive. Here, we introduce a novel, nonparametric, heuristic PSM score calibration method, called Tailor, which calibrates PSM scores by dividing them by the top 100-quantile of the empirical, spectrum-specific null distributions (i.e., the score with an associated p-value of 0.01 at the tail, hence the name) observed during database searching. Tailor does not require any optimization steps or long calculations, and it does not rely on any assumptions about the form of the score distribution (e.g., whether it is binomial); however, it does rely on our empirical observation that the mean and the variance of the null distributions are correlated. In our benchmark, we re-calibrated the XCorr match scores from Crux, the HyperScore scores from X!Tandem, and the p-values from OMSSA with the Tailor method and obtained more spectrum annotations than with raw scores at any false discovery rate level. Moreover, on spectrum data sets containing low-resolution fragmentation information (MS2), Tailor provided slightly more annotations than the E-values of X!Tandem and OMSSA and approached the performance of the computationally exhaustive exact p-value method for XCorr while running around 20-150 times faster. On high-resolution MS2 data sets, the Tailor method with XCorr achieved state-of-the-art performance and produced more annotations than the well-calibrated residue-evidence (Res-ev) score while running around 50-80 times faster.
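
The calibration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tailor_calibrate`, the `quantile` parameter, and the rank-selection rule (taking the score at rank n // 100 among the n candidate scores of one spectrum, with a fallback to the top score when fewer than 100 candidates exist) are assumptions made for illustration only.

```python
from typing import List, Sequence


def tailor_calibrate(scores: Sequence[float], quantile: int = 100) -> List[float]:
    """Rescale the candidate scores of ONE spectrum, Tailor-style.

    Assumption (illustrative): the reference score is the score at
    rank len(scores) // quantile in descending order, i.e. roughly the
    score with an empirical p-value of 1/quantile at the tail.
    """
    if not scores:
        raise ValueError("need at least one candidate score")

    # Empirical, spectrum-specific null distribution: all candidate scores,
    # sorted from best to worst.
    ranked = sorted(scores, reverse=True)

    # Zero-based index of the tail score; fall back to the top score
    # when the spectrum has fewer than `quantile` candidates.
    k = max(len(ranked) // quantile, 1) - 1
    reference = ranked[k]
    if reference <= 0:
        raise ValueError("reference tail score must be positive")

    # Calibration is a single division per score; no fitting is involved.
    return [s / reference for s in scores]


if __name__ == "__main__":
    # Two hypothetical spectra whose raw scores live on different scales.
    spectrum_a = [float(i) for i in range(1, 1001)]
    spectrum_b = [0.01 * i for i in range(1, 1001)]
    print(max(tailor_calibrate(spectrum_a)))
    print(max(tailor_calibrate(spectrum_b)))
```

In this toy example the two spectra differ only by a constant scale factor in their raw scores, so their calibrated top scores come out essentially equal, which is the kind of cross-spectrum comparability that calibration aims for.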

Keywords:  PSM scores; database search; fast; heuristic; peptide assignment; score calibration; spectrum identification

Year:  2020        PMID: 32175744     DOI: 10.1021/acs.jproteome.9b00736

Source DB:  PubMed          Journal:  J Proteome Res        ISSN: 1535-3893            Impact factor:   4.466


  4 in total

1.  Does Data-Independent Acquisition Data Contain Hidden Gems? A Case Study Related to Alzheimer's Disease.

Authors:  Evan E Hubbard; Lilian R Heil; Gennifer E Merrihew; Jasmeer P Chhatwal; Martin R Farlow; Catriona A McLean; Bernardino Ghetti; Kathy L Newell; Matthew P Frosch; Randall J Bateman; Eric B Larson; C Dirk Keene; Richard J Perrin; Thomas J Montine; Michael J MacCoss; Ryan R Julian
Journal:  J Proteome Res       Date:  2021-11-24       Impact factor: 4.466

2.  Building Spectral Libraries from Narrow-Window Data-Independent Acquisition Mass Spectrometry Data.

Authors:  Lilian R Heil; William E Fondrie; Christopher D McGann; Alexander J Federation; William S Noble; Michael J MacCoss; Uri Keich
Journal:  J Proteome Res       Date:  2022-05-12       Impact factor: 5.370

3.  TIDD: tool-independent and data-dependent machine learning for peptide identification.

Authors:  Honglan Li; Seungjin Na; Kyu-Baek Hwang; Eunok Paek
Journal:  BMC Bioinformatics       Date:  2022-03-30       Impact factor: 3.169

4.  DIAmeter: matching peptides to data-independent acquisition mass spectrometry data.

Authors:  Yang Young Lu; Jeff Bilmes; Ricard A Rodriguez-Mias; Judit Villén; William Stafford Noble
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937

