In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

Enrique Audain¹, Julian Uszkoreit², Timo Sachsenberg³, Julianus Pfeuffer⁴, Xiao Liang⁴, Henning Hermjakob⁵, Aniel Sanchez⁶, Martin Eisenacher², Knut Reinert⁴, David L Tabb⁷, Oliver Kohlbacher⁸, Yasset Perez-Riverol⁹.

Abstract

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most analytical methods are based on the identification of reliable peptides rather than the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Several protein inference algorithms and tools are currently available to the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All algorithms were evaluated with a highly customizable KNIME workflow on four public datasets of varying complexity (different sample preparation, species, and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only in the number of reported protein groups but also in the composition of those groups. Furthermore, the robustness of reported proteins when using databases of differing complexities depends strongly on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but it does increase the number of peptides per protein and can therefore generally be recommended.

SIGNIFICANCE: Protein inference is one of the major challenges in MS-based proteomics today. A vast number of protein inference algorithms and implementations are currently available to the proteomics community. Protein assembly affects the final results of the research, the quantitation values, and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published several benchmarks of other bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), which makes clear the importance of such studies for the proteomics community and the journal's audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims to provide a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido, and PIA) were evaluated using the highly customizable workflow on four public datasets of varying complexity. Three popular database search engines (Mascot, X!Tandem, and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, more than 186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported for each combination of search engines, protein inference algorithms, and parameters on each dataset. The results show that 1) PIA or Fido seems to be a good choice in the analyzed workflow, regarding not only the reported proteins and the high-quality identifications but also the required runtime; 2) merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group; 3) using databases that contain known isoforms in addition to the canonical protein sequences has only a small impact on the number of reported proteins, and, depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter lists obtained with parsimonious reports; and 4) the current workflow can easily be extended to support new algorithms and search engine combinations.
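
The three quality metrics named in the abstract (number of reported proteins, peptides per protein, and proteins reported uniquely by one inference method) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the KNIME workflow used in the study: the function name `inference_metrics`, the input layout, and the toy accessions and peptide strings are all hypothetical, and each reported entry is treated as a single accession rather than a protein group.

```python
from collections import Counter


def inference_metrics(reports):
    """Compute three simple comparison metrics per inference method.

    `reports` (hypothetical layout) maps a method name to a dict that maps
    each reported protein accession to the set of peptides supporting it.
    """
    # Count in how many methods' reports each accession appears.
    seen_in = Counter(acc for proteins in reports.values() for acc in proteins)

    metrics = {}
    for method, proteins in reports.items():
        n_proteins = len(proteins)
        total_peptides = sum(len(peps) for peps in proteins.values())
        metrics[method] = {
            # 1) number of reported proteins
            "n_proteins": n_proteins,
            # 2) mean number of peptides per reported protein
            "peptides_per_protein": total_peptides / n_proteins if n_proteins else 0.0,
            # 3) proteins reported by this method only
            "unique_proteins": sorted(a for a in proteins if seen_in[a] == 1),
        }
    return metrics


# Toy input with made-up accessions and peptides, for illustration only.
EXAMPLE_REPORTS = {
    "method_a": {"P1": {"AAK", "CCR"}, "P2": {"DDK"}},
    "method_b": {"P1": {"AAK", "CCR", "EEK"}, "P3": {"FFR"}},
}

if __name__ == "__main__":
    for method, values in inference_metrics(EXAMPLE_REPORTS).items():
        print(method, values)
```

In a comparison like the one described, one such report would exist per combination of search engine, inference algorithm, and parameter set; extending the sketch to protein groups would mean keying each report by a frozen set of accessions instead of a single accession.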

Entities: Gene

Keywords: Algorithms; Bioinformatics; Mass spectrometry; Protein inference

Substances:
Peptides
Protein Isoforms

Year: 2016 PMID: 27498275 DOI: 10.1016/j.jprot.2016.08.002

Source DB: PubMed Journal: J Proteomics ISSN: 1874-3919 Impact factor: 4.044

Cited

12 in total

1. ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data.

Authors: Joon-Yong Lee; Hyungwon Choi; Christopher M Colangelo; Darryl Davis; Michael R Hoopmann; Lukas Käll; Henry Lam; Samuel H Payne; Yasset Perez-Riverol; Matthew The; Ryan Wilson; Susan T Weintraub; Magnus Palmblad
Journal: J Biomol Tech Date: 2018-06-21

2. Protocol for Increasing the Sensitivity of MS-Based Protein Detection in Human Chorionic Villi.

Authors: Timur Shkrigunov; Pavel Pogodin; Victor Zgoda; Olesya Larina; Yulia Kisrieva; Maria Klimenko; Oleg Latyshkevich; Peter Klimenko; Andrey Lisitsa; Natalia Petushkova
Journal: Curr Issues Mol Biol Date: 2022-05-09 Impact factor: 2.976

3. Characterization of peptide-protein relationships in protein ambiguity groups via bipartite graphs.

Authors: Karin Schork; Michael Turewicz; Julian Uszkoreit; Jörg Rahnenführer; Martin Eisenacher
Journal: PLoS One Date: 2022-10-21 Impact factor: 3.752

4. EPIFANY: A Method for Efficient High-Confidence Protein Inference.

Authors: Julianus Pfeuffer; Timo Sachsenberg; Tjeerd M H Dijkstra; Oliver Serang; Knut Reinert; Oliver Kohlbacher
Journal: J Proteome Res Date: 2020-02-13 Impact factor: 4.466

5. A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.

Authors: Matthew The; Fredrik Edfors; Yasset Perez-Riverol; Samuel H Payne; Michael R Hoopmann; Magnus Palmblad; Björn Forsström; Lukas Käll
Journal: J Proteome Res Date: 2018-04-16 Impact factor: 4.466

6. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. (Review)

Authors: Yasset Perez-Riverol; Pablo Moreno
Journal: Proteomics Date: 2019-12-18 Impact factor: 5.393

7. MetaGOmics: A Web-Based Tool for Peptide-Centric Functional and Taxonomic Analysis of Metaproteomics Data.

Authors: Michael Riffle; Damon H May; Emma Timmins-Schiffman; Molly P Mikan; Daniel Jaschob; William Stafford Noble; Brook L Nunn
Journal: Proteomes Date: 2017-12-27

8. Future Prospects of Spectral Clustering Approaches in Proteomics.

Authors: Yasset Perez-Riverol; Juan Antonio Vizcaíno; Johannes Griss
Journal: Proteomics Date: 2018-07 Impact factor: 3.984

9. The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Authors: Yasset Perez-Riverol; Attila Csordas; Jingwen Bai; Manuel Bernal-Llinares; Suresh Hewapathirana; Deepti J Kundu; Avinash Inuganti; Johannes Griss; Gerhard Mayer; Martin Eisenacher; Enrique Pérez; Julian Uszkoreit; Julianus Pfeuffer; Timo Sachsenberg; Sule Yilmaz; Shivani Tiwary; Jürgen Cox; Enrique Audain; Mathias Walzer; Andrew F Jarnuczak; Tobias Ternent; Alvis Brazma; Juan Antonio Vizcaíno
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

10. Enhanced Missing Proteins Detection in NCI60 Cell Lines Using an Integrative Search Engine Approach.

Authors: Elizabeth Guruceaga; Alba Garin-Muga; Gorka Prieto; Bartolomé Bejarano; Miguel Marcilla; Consuelo Marín-Vicente; Yasset Perez-Riverol; J Ignacio Casal; Juan Antonio Vizcaíno; Fernando J Corrales; Victor Segura
Journal: J Proteome Res Date: 2017-10-11 Impact factor: 4.466