So Young Ryu, Wei-Jun Qian, David G Camp, Richard D Smith, Ronald G Tompkins, Ronald W Davis, Wenzhong Xiao. Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA; Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA; and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
Abstract
MOTIVATION: Mass spectrometry (MS)-based high-throughput quantitative proteomics shows great potential in large-scale clinical biomarker studies because it can identify and quantify thousands of proteins in biological samples. However, analyzing quantitative proteomics data poses unique challenges. One is that the quantification of a given peptide is often missing in a subset of the experiments, especially for less abundant peptides. Another is that the number of peptides quantified varies substantially across MS experiments of the same study, so an experiment with a smaller total number of quantified peptides tends to have more missing peptide abundances. Detecting as many biomarker proteins as possible requires bioinformatics methods that handle both challenges appropriately.

RESULTS: We propose Significance Analysis for Large-scale Proteomics Studies (SALPS), which handles missing peptide intensity values caused by the two mechanisms described above. Our model performs robustly on both simulated data and proteomics data from a large clinical study. Because variation in patient sample quality and in instrument performance is unavoidable in clinical studies conducted over several years, we believe that our approach will be useful for analyzing large-scale clinical proteomics data.

AVAILABILITY AND IMPLEMENTATION: R code for SALPS is available at http://www.stanford.edu/%7eclairesr/software.html.
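
The two missingness mechanisms named in the abstract can be illustrated with a toy simulation. The sketch below is not the SALPS model and is not taken from the authors' R code. The logistic detection curve, the per-experiment `depths` offsets, the abundance distribution, and all function names are assumptions introduced purely for illustration.

```python
import math
import random


def simulate_missingness(n_peptides=2000, depths=(0.0, -1.0, -2.0), seed=0):
    """Simulate which peptides are quantified in each experiment.

    depths: hypothetical per-experiment offsets on the logit scale; a lower
    value mimics an experiment that quantified fewer peptides overall.
    Returns (abundances, observed), where observed[i][j] is True if
    peptide i was quantified in experiment j.
    """
    rng = random.Random(seed)
    # Assumed log2-scale abundances; the mean and spread are arbitrary.
    abundances = [rng.gauss(20.0, 2.0) for _ in range(n_peptides)]
    observed = []
    for a in abundances:
        row = []
        for d in depths:
            # Mechanism 1: detection probability rises with abundance.
            # Mechanism 2: the experiment-level offset d shifts the
            # detection probability for every peptide in that experiment.
            logit = (a - 20.0) + 2.0 + d
            p_detect = 1.0 / (1.0 + math.exp(-logit))
            row.append(rng.random() < p_detect)
        observed.append(row)
    return abundances, observed


def missing_rate(flags):
    """Fraction of False entries in an iterable of detection flags."""
    flags = list(flags)
    return 1.0 - sum(flags) / len(flags)


abundances, observed = simulate_missingness()
median = sorted(abundances)[len(abundances) // 2]

# Pool detection flags for peptides below and above the median abundance.
low = [o for a, row in zip(abundances, observed) if a < median for o in row]
high = [o for a, row in zip(abundances, observed) if a >= median for o in row]

# Missing rate within each simulated experiment.
per_experiment = [
    missing_rate(row[j] for row in observed) for j in range(len(observed[0]))
]

print("missing rate, low-abundance peptides: ", round(missing_rate(low), 3))
print("missing rate, high-abundance peptides:", round(missing_rate(high), 3))
print("missing rate per experiment:", [round(r, 3) for r in per_experiment])
```

In this toy setting, less abundant peptides are missing more often, and the experiment with the lowest offset has the most missing values, which mirrors the two patterns the abstract describes. A method that ignores either pattern would treat such missing values as if they occurred at random.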