Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins.

Wandaliz Torres-García, Weiwen Zhang, George C Runger, Roger H Johnson, Deirdre R Meldrum.

Abstract

MOTIVATION: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. Proteomic data remain scarce because the identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Integrative transcriptomic and proteomic analysis that relies on partial proteomic data is therefore likely to introduce significant bias. Methodologies that accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into the metabolic mechanisms underlying complex biological systems.
RESULTS: In this study, we present a non-linear, data-driven model that predicts the abundance of undetected proteins, using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict the abundance of proteins that were not experimentally detected from relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all available variables to assess their relative importance and characterize the behavior of the predictive model. This model showed a strong plateau effect in regions of high mRNA values and sparse data. We therefore removed the genes in those regions, using thresholds estimated from the partial dependency plots in which this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, the main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance. We then created a new, tuned GBT model using the five most significant predictors. Our non-linear model is built as a series of regression tree models with implicit strength in variable selection, and it provides relative variable importance measures using mean square error as the criterion. The coefficients of determination for our non-linear models ranged from 0.393 to 0.582 across the two datasets, better than the linear regression used in the past.
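
As a rough illustration of the modelling idea described above, the following minimal sketch fits stochastic gradient boosted regression stumps in plain Python and reports squared-error-based relative importance. It is not the authors' implementation: the data are synthetic, the plateau at high mRNA values is simulated, the second predictor is a hypothetical noise covariate, and all function names and parameter values (`fit_gbt`, `n_trees`, `learning_rate`, `subsample`) are assumptions chosen for the example. In practice a library implementation of boosted trees would normally be used.

```python
import random


def fit_stump(rows, residuals):
    """Best single split under squared error.

    Returns (gain, feature, threshold, left_mean, right_mean) or None.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    total_sse = sum((r - mean) ** 2 for r in residuals)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or total_sse - sse > best[0]:
                best = (total_sse - sse, j, thr, lm, rm)
    return best


def fit_gbt(X, y, n_trees=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting: each stump is fitted to the residuals
    of a random subsample, then added to the ensemble with shrinkage."""
    rng = random.Random(seed)
    n = len(y)
    base = sum(y) / n
    pred = [base] * n
    stumps = []
    importance = [0.0] * len(X[0])  # summed squared-error reduction per feature
    for _ in range(n_trees):
        idx = rng.sample(range(n), max(2, int(subsample * n)))
        stump = fit_stump([X[i] for i in idx], [y[i] - pred[i] for i in idx])
        if stump is None:
            continue
        gain, j, thr, lm, rm = stump
        importance[j] += gain
        stumps.append((j, thr, lm, rm))
        for i in range(n):
            pred[i] += learning_rate * (lm if X[i][j] <= thr else rm)
    return (base, learning_rate, stumps), importance


def predict(model, row):
    base, rate, stumps = model
    return base + rate * sum(lm if row[j] <= thr else rm
                             for j, thr, lm, rm in stumps)


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: column 0 mimics mRNA abundance, column 1 is pure noise.
# The simulated "protein" level rises with mRNA and then plateaus.
rng = random.Random(1)
X = [[rng.uniform(0, 10), rng.uniform(0, 1)] for _ in range(120)]
y = [min(row[0], 6.0) + rng.gauss(0, 0.3) for row in X]

model, importance = fit_gbt(X, y)
r2 = r_squared(y, [predict(model, row) for row in X])
share = [g / sum(importance) for g in importance]
print("in-sample R^2:", round(r2, 3))
print("relative importance (mRNA, noise):", [round(s, 3) for s in share])
```

The R² printed here is an in-sample value on simulated data and is not comparable to the coefficients of determination reported in the abstract.
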
We evaluated the validity of this non-linear model using biological information on operons, regulons and pathways. The results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
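
The validation described in the abstract, comparing coefficients of variation within biological groups against random groups, can be sketched as follows. This is an illustrative assumption about how such a check might be coded, not the authors' procedure: the gene identifiers, operon sizes and abundance values are simulated, and the helper names (`coefficient_of_variation`, `mean_group_cv`) are hypothetical.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def mean_group_cv(abundance, groups):
    """Average CV over all groups that contain at least two genes."""
    return statistics.mean(
        coefficient_of_variation([abundance[g] for g in members])
        for members in groups
        if len(members) >= 2
    )


# Simulated predictions: genes in the same (hypothetical) operon share a
# base level, so their predicted abundances are similar to each other.
rng = random.Random(0)
abundance = {}
operons = []
for o in range(20):
    base = rng.uniform(5.0, 50.0)
    members = [f"gene_{o}_{k}" for k in range(4)]
    for g in members:
        abundance[g] = base * rng.uniform(0.9, 1.1)
    operons.append(members)

observed_cv = mean_group_cv(abundance, operons)

# Null comparison: random groups with the same sizes as the operons.
all_genes = list(abundance)
random_cv = statistics.mean(
    mean_group_cv(abundance, [rng.sample(all_genes, len(m)) for m in operons])
    for _ in range(200)
)

print("mean CV within operons:", round(observed_cv, 3))
print("mean CV within random groups:", round(random_cv, 3))
```

A within-group CV that is clearly below the random-group CV is the pattern the abstract reports for operons, regulons and pathways.
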

Entities: Chemical Species

Substances: Bacterial Proteins

Year: 2009 PMID: 19447782 PMCID: PMC2712339 DOI: 10.1093/bioinformatics/btp325

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
