
The necessity of adjusting tests of protein category enrichment in discovery proteomics.

Brenton Louie, Roger Higdon, Eugene Kolker.

Abstract

MOTIVATION: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. Fisher's exact test is commonly used for this purpose. We examined the associations that the Fisher test produces between protein identification in mass spectrometry-based discovery proteomics and the proteins' Gene Ontology (GO) term assignments in a large yeast dataset. We found that applying the Fisher test directly is misleading in proteomics, because mass spectrometry preferentially identifies proteins with certain biochemical properties. If this bias is not corrected, false inferences about associations can be made. Our method adjusts Fisher tests for these biases and produces associations that are more directly attributable to protein expression than to experimental bias.
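The following is a minimal sketch of the unadjusted test. It computes a one-sided Fisher's exact test for over-representation of a GO term among identified proteins, using a 2×2 table of identified vs. not identified and annotated vs. not annotated. The function name and argument layout are hypothetical. The one-sided (upper-tail) alternative is also an assumption, because the abstract does not say which alternative was used. Note that nothing in this calculation accounts for how readily each protein is detected by mass spectrometry, which is the bias described above.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation of a GO term.

    Hypothetical helper for illustration only.
    a: identified proteins annotated with the term
    b: unidentified proteins annotated with the term
    c: identified proteins not annotated with the term
    d: unidentified proteins not annotated with the term
    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    term_size = a + b        # proteins carrying the GO term
    n_identified = a + c     # proteins identified by mass spectrometry
    total = a + b + c + d    # all proteins in the background set
    # Sum the integer hypergeometric counts for tables at least as extreme as observed.
    tail = sum(
        comb(term_size, k) * comb(total - term_size, n_identified - k)
        for k in range(a, min(term_size, n_identified) + 1)
    )
    # Exact integer ratio converted to float at the end.
    return tail / comb(total, n_identified)


if __name__ == "__main__":
    # Toy table: 3 of 4 annotated proteins identified, 1 of 4 unannotated identified.
    print(fisher_enrichment_p(3, 1, 1, 3))  # 17/70, about 0.2429
```

Exact integer arithmetic via `math.comb` avoids underflow in the hypergeometric terms, even for backgrounds of several thousand proteins.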
RESULTS: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta-turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets at a 1% false discovery rate. Of the 544 Fisher tests that showed a significant association (q-value ≤ 0.05) between protein identification and GO term assignment, 25% (134) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity showed that unadjusted enrichment tests were biased, whereas our approach performed well.
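The abstract does not spell out how the peptide-level model is used to adjust the Fisher test. The sketch below is not the authors' procedure. It shows one standard way to adjust an enrichment test for a nuisance variable. First, fit a protein-level logistic regression of identification on GO-term membership plus the logit of each protein's predicted identification propensity. Then compare that model with one that omits the GO term, using a likelihood-ratio test. The `propensity` input and all function names are hypothetical. `propensity` is a per-protein detection probability strictly between 0 and 1, which in practice might come from a peptide-property model like the one described above. The fit is a plain Newton-Raphson implementation in standard-library Python.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: covariate-adjusted enrichment test, not the authors' exact method.


def _sigmoid(e: float) -> float:
    # Numerically stable logistic function.
    if e >= 0:
        return 1.0 / (1.0 + math.exp(-e))
    z = math.exp(e)
    return z / (1.0 + z)


def _softplus(e: float) -> float:
    # log(1 + exp(e)) computed without overflow.
    return max(e, 0.0) + math.log1p(math.exp(-abs(e)))


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for the small Newton system.
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 100, tol: float = 1e-10) -> Tuple[List[float], float]:
    """Newton-Raphson maximum likelihood; returns (coefficients, log-likelihood).
    Assumes no perfect separation, otherwise the MLE does not exist."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [_sigmoid(sum(bj * xj for bj, xj in zip(beta, row))) for row in X]
        # Score X^T (y - mu) and information X^T W X with W = mu (1 - mu).
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(mi * (1 - mi) * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = _solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    eta = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    loglik = sum(yi * e - _softplus(e) for yi, e in zip(y, eta))
    return beta, loglik


def adjusted_go_test(identified: Sequence[int], in_term: Sequence[int],
                     propensity: Sequence[float]) -> Tuple[float, float]:
    """Likelihood-ratio test of GO-term membership, adjusted for identification
    propensity. Returns (log-odds coefficient of the term, two-sided p-value)."""
    logit_q = [math.log(q / (1.0 - q)) for q in propensity]
    reduced = [[1.0, lq] for lq in logit_q]
    full = [[1.0, lq, float(t)] for lq, t in zip(logit_q, in_term)]
    _, ll_reduced = fit_logistic(reduced, identified)
    beta_full, ll_full = fit_logistic(full, identified)
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    # Chi-square(1) upper tail: P(X > stat) = erfc(sqrt(stat / 2)).
    return beta_full[2], math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Toy data: two propensity levels, term concentrated among high-propensity proteins.
    propensity = [0.2] * 12 + [0.8] * 12
    in_term = [1] * 4 + [0] * 8 + [1] * 8 + [0] * 4
    identified = [1, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0] + [1, 1, 1, 0]
    print(adjusted_go_test(identified, in_term, propensity))
```

In the toy data, the GO term is concentrated among high-propensity proteins, so its raw identification rate is higher (7 of 12 versus 5 of 12). Within each propensity level, however, annotated and unannotated proteins are identified at the same rate. The adjusted log-odds coefficient is therefore essentially zero and the p-value essentially 1. An unadjusted test sees only the raw rates.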

Year: 2010  PMID: 21068002  PMCID: PMC2995116  DOI: 10.1093/bioinformatics/btq541

Source DB: PubMed  Journal: Bioinformatics  ISSN: 1367-4803  Impact factor: 6.937


Cited by (2 in total)

1.  MOPED enables discoveries through consistently processed proteomics data.

Authors:  Roger Higdon; Elizabeth Stewart; Larissa Stanberry; Winston Haynes; John Choiniere; Elizabeth Montague; Nathaniel Anderson; Gregory Yandl; Imre Janko; William Broomall; Simon Fishilevich; Doron Lancet; Natali Kolker; Eugene Kolker
Journal:  J Proteome Res       Date:  2013-12-18       Impact factor: 4.466

2.  1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data.

Authors:  Juergen Cox; Matthias Mann
Journal:  BMC Bioinformatics       Date:  2012-11-05       Impact factor: 3.169
