Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins.

Wandaliz Torres-García, Weiwen Zhang, George C Runger, Roger H Johnson, Deirdre R Meldrum.

Abstract

MOTIVATION: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because the identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Integrative transcriptomic and proteomic analysis that relies on partial proteomic data is therefore likely to introduce significant bias. Developing methodologies to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into the metabolic mechanisms underlying complex biological systems.
RESULTS: In this study, we present a non-linear data-driven model to predict abundance for undetected proteins using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict protein abundance for the proteins not experimentally detected based on relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all possible variables to assess their relative importance and characterize the behavior of the predictive model. This model showed a strong plateau effect in regions of high mRNA values and sparse data. Hence, we removed genes in those regions based on thresholds estimated from the partial dependency plots where this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance. We then created a new tuned GBT model using the five most significant predictors. Our non-linear model is constructed as a set of serial regression tree models with implicit strength in variable selection. The model provides relative variable importance measures using mean square error as the criterion. The results showed that coefficients of determination for our non-linear models ranged from 0.393 to 0.582 across both datasets, providing better results than the linear regression used in the past.
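
The boosting procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses depth-1 regression trees (stumps) rather than deeper trees, synthetic data with an artificial plateau at high "mRNA" values, and hypothetical names and settings (`fit_stump`, `fit_gbt`, `predict`, `r_squared`, `n_trees`, `learning_rate`, `subsample`). It only shows the general idea of fitting serial trees to residuals on random subsamples and ranking predictors by their squared-error reduction.

```python
import random

def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    n, best = len(X), None
    total = sum(residuals)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximizing this score is equivalent to minimizing squared error.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best  # (score, feature, threshold, left_value, right_value)

def fit_gbt(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss and stumps."""
    rng = random.Random(seed)
    n = len(X)
    base = sum(y) / n
    pred = [base] * n
    stumps, importance = [], [0.0] * len(X[0])
    m = max(2, int(subsample * n))
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)          # random subsample per tree
        Xs = [X[i] for i in idx]
        rs = [y[i] - pred[i] for i in idx]     # residuals = negative gradient
        best = fit_stump(Xs, rs)
        if best is None:
            continue
        score, j, thr, left, right = best
        # Squared-error reduction of this split, credited to feature j.
        importance[j] += score - sum(rs) ** 2 / m
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        for i in range(n):
            pred[i] += learning_rate * (left if X[i][j] <= thr else right)
    return base, stumps, importance

def predict(model, row):
    base, stumps, _ = model
    return base + sum(l if row[j] <= thr else r for j, thr, l, r in stumps)

def r_squared(y, yhat):
    """Coefficient of determination."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (an assumption, not the D. vulgaris datasets):
# feature 0 mimics mRNA abundance, feature 1 is an uninformative predictor.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 1)] for _ in range(200)]
y = [min(x[0], 6.0) + data_rng.gauss(0, 0.5) for x in X]  # plateau above 6

model = fit_gbt(X, y)
yhat = [predict(model, row) for row in X]
print("in-sample R^2:", round(r_squared(y, yhat), 3))
print("importance per feature:", [round(v, 1) for v in model[2]])
```

The reported R^2 here is in-sample and therefore optimistic; a held-out split or cross-validation would be needed for a figure comparable to the coefficients of determination quoted above.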
We evaluated the validity of this non-linear model using biological information of operons, regulons and pathways, and the results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
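
The validation against operons, regulons and pathways compares the coefficient of variation (standard deviation divided by mean) of predicted abundances inside biological groups with that of random groups of the same sizes. The sketch below shows one way such a comparison could be set up. The gene identifiers, group memberships, abundance values and function names (`coefficient_of_variation`, `mean_group_cv`, `random_groups_like`) are all hypothetical, and the abstract does not specify how the random groups were drawn.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def mean_group_cv(abundance, groups):
    """Average the within-group CV over all groups."""
    return statistics.mean(
        [coefficient_of_variation([abundance[g] for g in members])
         for members in groups]
    )

def random_groups_like(groups, gene_ids, rng):
    """Draw random gene sets with the same sizes as the real groups."""
    return [rng.sample(gene_ids, len(members)) for members in groups]

# Synthetic stand-in data (an assumption): 20 "operons" of 4 genes each,
# whose members share a common abundance level with about 10% scatter.
rng = random.Random(42)
abundance, operons = {}, []
for op in range(20):
    level = rng.uniform(5, 50)
    members = [f"gene_{op}_{k}" for k in range(4)]
    for g in members:
        abundance[g] = level * (1 + rng.gauss(0, 0.1))
    operons.append(members)

observed = mean_group_cv(abundance, operons)
gene_ids = list(abundance)
null_cvs = [
    mean_group_cv(abundance, random_groups_like(operons, gene_ids, rng))
    for _ in range(200)
]
null_mean = statistics.mean(null_cvs)

print("mean CV within operons:", round(observed, 3))
print("mean CV in random groups:", round(null_mean, 3))
```

A smaller within-group CV than the random-group average is the pattern the abstract reports; on real predictions, the fraction of random draws falling below the observed value would give an empirical significance estimate.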

Year:  2009        PMID: 19447782      PMCID: PMC2712339          DOI: 10.1093/bioinformatics/btp325

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

