Richard D LeDuc, Ryan T Fellers, Bryan P Early, Joseph B Greer, Daniel P Shams, Paul M Thomas, Neil L Kelleher.
Abstract
Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
Keywords: Algorithms; Automation; False Discovery Rate; Mathematical Modeling; Multiple Hypothesis Testing; Post-translational modifications; Proteoform; Statistics; TDCD_FDR_CALCULATOR; Top-Down Proteomics
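The gap between PrSM-level and protein-level FDR reported in the abstract can be illustrated with a toy target-decoy calculation. The sketch below is not the TDCD_FDR_Calculator algorithm; it assumes a plain decoys-over-targets estimate, and the `PrSM` record, the function names, and the data are all hypothetical. It only shows the general effect: many target PrSMs collapse onto a few proteins, while false hits tend to land on distinct entries, so the same decoys weigh more heavily once results are aggregated.

```python
from collections import namedtuple

# Hypothetical record for one proteoform spectral match (PrSM).
PrSM = namedtuple("PrSM", ["protein", "score", "is_decoy"])

def decoy_fdr(items):
    """Simple target-decoy estimate (an assumption): decoys / targets, capped at 1.0."""
    decoys = sum(1 for x in items if x.is_decoy)
    targets = sum(1 for x in items if not x.is_decoy)
    return min(1.0, decoys / targets) if targets else 0.0

def best_per_protein(prsms):
    """Collapse PrSMs to one entry per protein, keeping the top-scoring PrSM."""
    best = {}
    for p in prsms:
        if p.protein not in best or p.score > best[p.protein].score:
            best[p.protein] = p
    return list(best.values())

# Toy data: 100 target PrSMs pile onto 5 proteins; 2 decoy PrSMs hit 2 distinct decoys.
prsms = [PrSM(f"P{i % 5}", 50.0 + i, False) for i in range(100)]
prsms += [PrSM("DECOY_A", 40.0, True), PrSM("DECOY_B", 41.0, True)]

prsm_fdr = decoy_fdr(prsms)                       # 2 decoys / 100 targets
protein_fdr = decoy_fdr(best_per_protein(prsms))  # 2 decoys / 5 targets

print(f"PrSM-level FDR:    {prsm_fdr:.2f}")
print(f"Protein-level FDR: {protein_fdr:.2f}")
```

In this contrived data set the protein-level estimate is 20 times the PrSM-level estimate. The numbers are chosen only to make the aggregation effect visible and do not reproduce the 24-fold average reported in the study.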
Year: 2019 PMID: 30647073 PMCID: PMC6442365 DOI: 10.1074/mcp.RA118.000993
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911