Joao A Paulo. Department of Medicine, Harvard Medical School, Boston, MA.
Abstract
BACKGROUND: Analysis of the large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such algorithms are available, each built on different statistical foundations, and they therefore do not produce identical results. Here, the aim is to compare peptide and protein identifications across multiple search engines and to examine the additional proteins gained by increasing the number of technical replicate analyses.
METHODS: A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer in 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Peptide and protein identifications were compared among the search engines. In addition, searches using each engine were performed with an increasing number of technical replicates.
RESULTS: The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate yields an additional 4-5%.
CONCLUSIONS: The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, more information can be extracted from a dataset by searching with different engines and performing technical repeats, which requires no additional sample preparation and makes effective use of research time and effort.
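The union/intersection comparison and the per-replicate gain described in the abstract can be illustrated with a minimal Python sketch. This is not the study's actual pipeline: the function names `compare_engines` and `cumulative_gain` are hypothetical, the protein identifiers are made-up placeholders, and each engine's output is assumed to have already been reduced to a set of protein identifiers.

```python
def compare_engines(ids_by_engine):
    """Return (union, intersection) of protein IDs across search engines.

    ids_by_engine maps an engine name to an iterable of protein IDs
    (assumed to be already filtered and grouped by that engine).
    """
    sets = [set(ids) for ids in ids_by_engine.values()]
    if not sets:
        return set(), set()
    union = set().union(*sets)               # every protein any engine found
    intersection = set.intersection(*sets)   # proteins all engines agree on
    return union, intersection


def cumulative_gain(replicates):
    """Track how the protein count grows as replicates are added in order.

    Returns a list of (replicate_number, cumulative_count, percent_gain),
    where percent_gain is relative to the count before this replicate
    (None for the first replicate).
    """
    seen = set()
    rows = []
    for number, replicate in enumerate(replicates, start=1):
        previous = len(seen)
        new = set(replicate) - seen
        seen |= new
        gain = 100.0 * len(new) / previous if previous else None
        rows.append((number, len(seen), gain))
    return rows


if __name__ == "__main__":
    # Placeholder identifiers, not real results.
    engines = {
        "Mascot": {"A", "B", "C"},
        "SEQUEST": {"B", "C", "D"},
        "Andromeda": {"C", "D", "E"},
    }
    union, intersection = compare_engines(engines)
    print(sorted(union), sorted(intersection))

    replicates = [{"A", "B", "C", "D"}, {"A", "B", "E"}, {"A", "F"}]
    for row in cumulative_gain(replicates):
        print(row)
```

In this framing, the union corresponds to the expanded list of identified proteins and the intersection to the cross-validated identifications; the percent gain mirrors the kind of per-replicate increase the abstract reports.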
Keywords:
Mass spectrometry; proteomics; search engine