
DecoyPyrat: Fast Non-redundant Hybrid Decoy Sequence Generation for Large Scale Proteomics.

James C Wright, Jyoti S Choudhary.

Abstract

Accurate statistical evaluation of sequence database peptide identifications from tandem mass spectra is essential in mass spectrometry-based proteomics experiments. These statistics depend on accurately modelling random identifications. The target-decoy approach has become the de facto method for calculating the false discovery rate (FDR) in proteomic datasets. The main principle of this approach is to search a set of decoy protein sequences that emulate the size and composition of the target protein sequences searched whilst not matching real proteins in the sample. To do this, it is commonplace to reverse or shuffle the proteins and peptides in the target database. However, these approaches have drawbacks and limitations. A key confounding issue is peptide redundancy between the target and decoy databases, which leads to inaccurate FDR estimation. This inaccuracy is further amplified at the protein level and when searching large sequence databases such as those used for proteogenomics. Here, we present a unifying hybrid method to quickly and efficiently generate decoy sequences with minimal overlap between target and decoy peptides. We show that applying a reversed decoy approach can produce up to 5% peptide redundancy, and that many more peptides will have exactly the same precursor mass as a target peptide. Our hybrid method addresses both of these issues by first switching proteolytic cleavage sites with the preceding amino acid, then reversing the database, and finally shuffling any redundant sequences. This flexible hybrid method reduces the overlap between target and decoy peptides to about 1% of peptides, making a more robust decoy model suitable for large search spaces. We also demonstrate the anti-conservative effect of redundant peptides on the calculation of q-values in mouse brain tissue data.
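The three steps named in the abstract (switch cleavage sites with the preceding residue, reverse, shuffle redundant peptides) can be illustrated with a minimal Python sketch. This is not the DecoyPyrat source code: the function names, the simplified trypsin rule (cut after K or R, no proline exception, no missed cleavages), the retry limit, and the choice to keep the C-terminal residue fixed while shuffling are all assumptions made for illustration.

```python
import random


def digest(protein, cleave_after="KR"):
    """Split a protein into fully cleaved peptides (assumed rule: cut after K/R)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def switch_and_reverse(protein, cleave_after="KR"):
    """Swap each cleavage residue with the residue before it, then reverse."""
    seq = list(protein)
    for i in range(1, len(seq)):
        if seq[i] in cleave_after and seq[i - 1] not in cleave_after:
            seq[i - 1], seq[i] = seq[i], seq[i - 1]
    return "".join(reversed(seq))


def shuffle_peptide(peptide, targets, rng, max_tries=100):
    """Shuffle all but the last residue until the result is not a target peptide."""
    body, last = list(peptide[:-1]), peptide[-1]
    for _ in range(max_tries):
        rng.shuffle(body)
        candidate = "".join(body) + last
        if candidate not in targets:
            return candidate
    return peptide  # give up, e.g. very short or single-residue-type peptides


def make_decoys(targets, cleave_after="KR", seed=0):
    """Build one decoy per target protein using the three hybrid steps."""
    rng = random.Random(seed)
    target_peps = {p for prot in targets for p in digest(prot, cleave_after)}
    decoys = []
    for prot in targets:
        peps = digest(switch_and_reverse(prot, cleave_after), cleave_after)
        # Only peptides that collide with a target peptide are shuffled.
        peps = [shuffle_peptide(p, target_peps, rng) if p in target_peps else p
                for p in peps]
        decoys.append("".join(peps))
    return decoys


def overlap_fraction(targets, decoys, cleave_after="KR"):
    """Fraction of distinct decoy peptides that also occur as target peptides."""
    t = {p for prot in targets for p in digest(prot, cleave_after)}
    d = {p for prot in decoys for p in digest(prot, cleave_after)}
    return len(t & d) / len(d) if d else 0.0
```

For example, `switch_and_reverse("ABKCD")` returns `"DCBKA"`, and `overlap_fraction(targets, make_decoys(targets))` gives a rough measure of remaining redundancy for a list of target sequences. A practical tool would also apply a minimum peptide length, since one- or two-residue peptides cannot be made unique by shuffling.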


Keywords:  Database searching; FDR; Python; Sequence database; Shotgun proteomics; Target-decoy

Year:  2016        PMID: 27418748      PMCID: PMC4941923          DOI: 10.4172/jpb.1000404

Source DB:  PubMed          Journal:  J Proteomics Bioinform        ISSN: 0974-276X


  7 in total

1.  Aortic disease in Marfan syndrome is caused by overactivation of sGC-PRKG signaling by NO.

Authors:  Andrea de la Fuente-Alonso; Marta Toral; Alvaro Alfayate; María Jesús Ruiz-Rodríguez; Elena Bonzón-Kulichenko; Gisela Teixido-Tura; Sara Martínez-Martínez; María José Méndez-Olivares; Dolores López-Maderuelo; Ileana González-Valdés; Eusebio Garcia-Izquierdo; Susana Mingo; Carlos E Martín; Laura Muiño-Mosquera; Julie De Backer; J Francisco Nistal; Alberto Forteza; Arturo Evangelista; Jesús Vázquez; Miguel R Campanero; Juan Miguel Redondo
Journal:  Nat Commun       Date:  2021-05-11       Impact factor: 14.919

2.  Shotgun proteomics datasets acquired on Gammarus pulex animals sampled from the wild.

Authors:  Duarte Gouveia; Yannick Cogne; Jean-Charles Gaillard; Christine Almunia; Olivier Pible; Adeline François; Davide Degli-Esposti; Olivier Geffard; Arnaud Chaumot; Jean Armengaud
Journal:  Data Brief       Date:  2019-10-12

3.  TASL is the SLC15A4-associated adaptor for IRF5 activation by TLR7-9.

Authors:  Leonhard X Heinz; JangEun Lee; Utkarsh Kapoor; Felix Kartnig; Vitaly Sedlyarov; Konstantinos Papakostas; Adrian César-Razquin; Patrick Essletzbichler; Ulrich Goldmann; Adrijana Stefanovic; Johannes W Bigenzahn; Stefania Scorzoni; Mattia D Pizzagalli; Ariel Bensimon; André C Müller; F James King; Jun Li; Enrico Girardi; M Lamine Mbow; Charles E Whitehurst; Manuele Rebsamen; Giulio Superti-Furga
Journal:  Nature       Date:  2020-05-13       Impact factor: 49.962

4.  The secreted inhibitor of invasive cell growth CREG1 is negatively regulated by cathepsin proteases.

Authors:  Alejandro Gomez-Auli; Larissa Elisabeth Hillebrand; Daniel Christen; Sira Carolin Günther; Martin Lothar Biniossek; Christoph Peters; Oliver Schilling; Thomas Reinheckel
Journal:  Cell Mol Life Sci       Date:  2020-05-08       Impact factor: 9.261

5.  GENCODE reference annotation for the human and mouse genomes.

Authors:  Adam Frankish; Mark Diekhans; Anne-Maud Ferreira; Rory Johnson; Irwin Jungreis; Jane Loveland; Jonathan M Mudge; Cristina Sisu; James Wright; Joel Armstrong; If Barnes; Andrew Berry; Alexandra Bignell; Silvia Carbonell Sala; Jacqueline Chrast; Fiona Cunningham; Tomás Di Domenico; Sarah Donaldson; Ian T Fiddes; Carlos García Girón; Jose Manuel Gonzalez; Tiago Grego; Matthew Hardy; Thibaut Hourlier; Toby Hunt; Osagie G Izuogu; Julien Lagarde; Fergal J Martin; Laura Martínez; Shamika Mohanan; Paul Muir; Fabio C P Navarro; Anne Parker; Baikang Pei; Fernando Pozo; Magali Ruffier; Bianca M Schmitt; Eloise Stapleton; Marie-Marthe Suner; Irina Sycheva; Barbara Uszczynska-Ratajczak; Jinuri Xu; Andrew Yates; Daniel Zerbino; Yan Zhang; Bronwen Aken; Jyoti S Choudhary; Mark Gerstein; Roderic Guigó; Tim J P Hubbard; Manolis Kellis; Benedict Paten; Alexandre Reymond; Michael L Tress; Paul Flicek
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

6.  Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon.

Authors:  Yousuf A Khan; Irwin Jungreis; James C Wright; Jonathan M Mudge; Jyoti S Choudhary; Andrew E Firth; Manolis Kellis
Journal:  BMC Genet       Date:  2020-03-06       Impact factor: 2.797

7.  An analysis of tissue-specific alternative splicing at the protein level.

Authors:  Jose Manuel Rodriguez; Fernando Pozo; Tomas di Domenico; Jesus Vazquez; Michael L Tress
Journal:  PLoS Comput Biol       Date:  2020-10-05       Impact factor: 4.475

