Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals.

David P Enot, Manfred Beckmann, David Overy, John Draper.

Abstract

Powerful algorithms are required to deal with the dimensionality of metabolomics data. Although many achieve high classification accuracy, the models they generate have limited value unless it can be demonstrated that they are reproducible and statistically relevant to the biological problem under investigation. Random forest (RF) generates models, without any requirement for dimensionality reduction or feature selection, in which individual variables are ranked for significance and displayed in an explicit manner. In metabolome fingerprinting by mass spectrometry, each metabolite can be represented by signals at several m/z. Exploiting a prior understanding of expected biochemical differences between sample classes, we aimed to develop meaningful metrics relevant to the significance both of the overall RF model and individual, potentially explanatory, signals. Pair-wise comparison of related plant genotypes with strong phenotypic differences demonstrated that robust models are not only reproducible but also logically structured, highlighting correlated m/z derived from just a small number of explanatory metabolites reflecting the biological differences between sample classes. RF models were also generated by using groupings of samples known to be increasingly phenotypically similar. Although classification accuracy was often reasonable, we demonstrated reproducibly in both Arabidopsis and potato a performance threshold based on margin statistics beyond which such models showed little structure indicative of either generalizability or further biological interpretability. In a multiclass problem using 25 Arabidopsis genotypes, despite the complicating effects of ecotype background and secondary metabolome perturbations common to several mutations, the ranking of metabolome signals by RF provided scope for deeper interpretability.
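The abstract refers to a performance threshold based on margin statistics but does not define them. As a minimal sketch, assuming the usual random forest definition of a sample's margin (the fraction of tree votes for the true class minus the largest fraction of votes for any other class), the calculation might look like the following. The function name `rf_margins`, the class labels, and the toy votes are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from statistics import mean


def rf_margins(tree_votes, true_labels):
    """Return one margin per sample.

    tree_votes[i] is the list of class labels that the individual trees
    voted for sample i; true_labels[i] is that sample's known class.
    Margin = (votes for true class - most votes for any other class) / n_trees.
    """
    margins = []
    for votes, truth in zip(tree_votes, true_labels):
        counts = Counter(votes)
        n_trees = len(votes)
        true_count = counts.get(truth, 0)
        # Largest vote count among the wrong classes (0 if none were voted).
        other_count = max(
            (c for label, c in counts.items() if label != truth), default=0
        )
        margins.append((true_count - other_count) / n_trees)
    return margins


# Hypothetical toy example: 3 samples, 10 trees each, two classes.
votes = [
    ["wt"] * 9 + ["mut"] * 1,   # confidently correct
    ["wt"] * 6 + ["mut"] * 4,   # correct, but only just
    ["wt"] * 7 + ["mut"] * 3,   # true class is "mut": misclassified
]
labels = ["wt", "wt", "mut"]

margins = rf_margins(votes, labels)
average_margin = mean(margins)
```

A positive margin means the sample is classified correctly, and a value near 1 means the trees agree strongly. Under this assumed definition, a model can reach reasonable accuracy while its average margin stays close to zero, which is the kind of weakly structured model the abstract warns about.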

Year:  2006        PMID: 16990432      PMCID: PMC1595442          DOI: 10.1073/pnas.0605152103

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


