
Metabomxtr: an R package for mixture-model analysis of non-targeted metabolomics data.

Michael Nodzenski, Michael J Muehlbauer, James R Bain, Anna C Reisetter, William L Lowe, Denise M Scholtens.

Abstract

SUMMARY: Non-targeted metabolomics technologies often yield data in which abundance for any given metabolite is observed and quantified for some samples and reported as missing for other samples. Apparent missingness can be due to true absence of the metabolite in the sample or presence at a level below detectability. Mixture-model analysis can formally account for metabolite 'missingness' due to absence or undetectability, but software for this type of analysis in the high-throughput setting is limited. The R package metabomxtr has been developed to facilitate mixture-model analysis of non-targeted metabolomics data in which only a portion of samples have quantifiable abundance for certain metabolites.
AVAILABILITY AND IMPLEMENTATION: metabomxtr is available through Bioconductor. It is released under the GPL-2 license.
CONTACT: dscholtens@northwestern.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25075114      PMCID: PMC4221120          DOI: 10.1093/bioinformatics/btu509

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

High-throughput metabolomics profiling has surged in popularity, with non-targeted technologies in particular offering opportunities to discover new metabolite associations with phenotypes or outcomes. A challenge in analyzing non-targeted output is the frequent occurrence of missing data (Hrydziuszko and Viant, 2012). These data are not ‘missing’ in the sense that they were not collected; rather, metabolites may be detected and their abundance quantified in some samples and not others. Non-targeted assays, usually conducted using nuclear magnetic resonance, liquid chromatography-mass spectrometry or gas chromatography-mass spectrometry (GC-MS) (Issaq et al., 2009; Moco and Vervoort, 2007), typically have unknown lower detection thresholds. Thus, when a given metabolite is not detected, it is unknown whether that metabolite was indeed absent or merely undetectable. Several approaches for handling missingness have been described in the metabolomics literature, including complete case analysis, imputation and adaptations of classic dimension reduction tools to allow for missing data. For metabolite-by-metabolite analyses, imputation is common, with methods including minimum, median and nearest neighbor imputation (Hrydziuszko and Viant, 2012). Partial least squares discriminant analysis and principal components analysis with missing data adaptations have also been used. However, these methods identify regression-based linear combinations of multiple correlated metabolites associated with a phenotype or outcome, so their results are generally harder to translate into an understanding of individual metabolite contributions (Andersson and Bro, 1998; Walczak and Massart, 2001). An underused approach for metabolite-by-metabolite analysis is the Bernoulli/lognormal mixture model proposed by Moulton and Halsey (1995). This method simultaneously estimates parameters modeling the probability of non-missing response and the mean of observed values.
Imputation is not required; instead, ‘missingness’ is explicitly modeled as either true absence or presence below detectability, consistent with non-targeted metabolomics technology. We used mixture models to analyze GC-MS metabolomics data (Scholtens et al., 2013), but, to our knowledge, no available software performs these analyses easily and folds into existing high-throughput data analysis pipelines. Noting the elegance of the mixture-model approach and the continued issue of missing data in metabolomics research, we present metabomxtr, an R package that automates mixture-model analysis. The core functions accept R objects typically handled in Bioconductor-type analyses or basic data frames, thus providing a flexible tool to complement existing user pipelines and preferences for data preprocessing.

2 MAIN FEATURES

2.1 Model specification

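Models in metabomxtr are specified as follows. For a single metabolite, y, with normally distributed values when present (generally following log transformation), the contribution of the ith observation to the likelihood, following the mixture model of Moulton and Halsey (1995), is:

$$L_i = \left[ p_i \, f(y_i; \mu_i, \sigma^2) \right]^{\delta_i} \left[ (1 - p_i) + p_i \, \Phi\!\left( \frac{T - \mu_i}{\sigma} \right) \right]^{1 - \delta_i}$$

where p_i represents the probability of metabolite detection in the ith sample, f is the normal density with mean µ_i and variance σ², Φ is the standard normal cumulative distribution function, T is the threshold of detectability and δ_i is an indicator equal to 1 if the metabolite is detected and 0 otherwise. A logistic model is specified for p_i, log(p_i/(1 − p_i)) = x_i’β, where x_i and β are the covariate and parameter vectors, respectively. A linear model is specified for the mean of the observed response, µ_i, with µ_i = z_i’α, where z_i and α are the covariate and parameter vectors, respectively.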
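The model specification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the metabomxtr implementation: it assumes the Bernoulli/lognormal form of Moulton and Halsey (1995), in which a detected value contributes p times a normal density and a non-detected value contributes (1 − p) plus p times the normal probability of falling below T. The function names and the use of None to mark a non-detected value are hypothetical choices made for this sketch.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    """Normal cumulative distribution function evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def inv_logit(eta):
    """Inverse of the logistic link log(p / (1 - p)) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_lik_contribution(y, x, z, beta, alpha, sigma, threshold):
    """Log-likelihood contribution of one sample (hypothetical helper).

    y         : log-transformed abundance, or None if not detected
    x, beta   : covariates and parameters of the logistic model for p
    z, alpha  : covariates and parameters of the linear model for mu
    sigma     : standard deviation of the normal component
    threshold : detection threshold T
    """
    p = inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))
    mu = sum(zj * aj for zj, aj in zip(z, alpha))
    if y is not None:
        # delta_i = 1: metabolite detected, continuous component applies
        return math.log(p * normal_pdf(y, mu, sigma))
    # delta_i = 0: truly absent, or present but below the threshold
    return math.log((1.0 - p) + p * normal_cdf(threshold, mu, sigma))
```

Summing these contributions over samples and negating the total gives a negative log likelihood that a general-purpose optimizer could minimize over α, β and σ.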

2.2 Function descriptions

metabomxtr has two main functions: mxtrmod and mxtrmodLRT. mxtrmod fits mixture models, taking as inputs response variable names, a model formula and a data object (a matrix of values with NA to indicate missingness or an ExpressionSet R object). It returns optimized parameter estimates and the corresponding negative log likelihood value. The parameter vectors α and β are estimated by maximum likelihood using the optimx package. By default, T is set to the minimum observed metabolite abundance. The package includes an example dataset, metabdata, on which mxtrmod can be run.

To evaluate the significance of specific covariates, mxtrmodLRT implements nested model likelihood ratio χ2 tests. Required arguments include mxtrmod output for full and reduced models and, if desired, the method of multiple comparisons adjustment. mxtrmodLRT outputs a data frame of negative log likelihoods, χ2 statistics, degrees of freedom and P-values for each metabolite.
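The nested-model likelihood ratio test can be illustrated with a short Python sketch. It is not the mxtrmodLRT code; it only shows the standard calculation, in which the χ2 statistic is twice the difference between the reduced-model and full-model negative log likelihoods, with degrees of freedom equal to the number of parameters dropped. The function names are hypothetical, and the χ2 tail probability uses the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2.0 * r)
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2.0 * r + 1.0)
    return tail + total


def likelihood_ratio_test(neg_ll_full, neg_ll_reduced, df):
    """Return (chi-square statistic, P-value) for nested models.

    neg_ll_full / neg_ll_reduced are negative log likelihoods; df is the
    number of parameters removed to obtain the reduced model.
    """
    # Clamp at zero to guard against tiny negative values from optimizer noise
    stat = max(2.0 * (neg_ll_reduced - neg_ll_full), 0.0)
    return stat, chi2_sf(stat, df)
```

For example, `likelihood_ratio_test(100.0, 102.0, 1)` compares a reduced model whose negative log likelihood is 2 units worse than the full model on one degree of freedom. Any multiple comparisons adjustment would be applied afterwards to the per-metabolite P-values.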

2.3 Comparison with imputation

To illustrate mixture models, we re-analyzed a subset of GC-MS data on 115 fasting serum samples from pregnant women involved in the population-based Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study, contained in the example data (Scholtens et al., 2013). A total of 49 non-targeted metabolites with at least five missing values were analyzed using mixture modeling as well as minimum imputation and five-nearest-neighbor imputation. The predictor of interest was high (>90th percentile) versus low (<10th percentile) fasting plasma glucose (FPG). Samples for this pilot study were selected such that 67 had high FPG and 48 had low FPG. For minimum and nearest neighbor imputation, FPG groups were compared after imputation using linear models adjusted for study field center, parity, maternal and gestational age and sample storage time. The continuous portion of the mixture model also included these covariates, whereas the discrete portion included only FPG. FPG was removed for reduced models in mixture-model analysis. Nominal P-values < 0.01 were considered statistically significant.

Of 49 metabolites analyzed, there was complete agreement (all significant or all non-significant) among methods on 39 of them. Of the remaining 10 (Supplementary Fig. and Supplementary Table), mixture models detected significant effects for 7, nearest neighbor imputation for 4 and minimum imputation for 4. Of the seven mixture-model identifications, three were also detected by nearest neighbor imputation, two also by minimum imputation and two were unique identifications. The mixture-model results were discussed from a biological perspective by Scholtens et al. (2013) and include leucine and pyruvic acid. One significant metabolite finding was unique to nearest neighbor imputation, but the result is questionable because the median of the imputed values exceeded the observed median, inconsistent with the notion of low abundance. For the two significant effects unique to minimum imputation, mixture-model P-values approached significance (0.018, 0.011), suggesting approximate agreement between the two methods.
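For readers unfamiliar with the simpler of the two comparison methods, minimum imputation can be sketched as follows. This is a hypothetical illustration, not the code used in the re-analysis: it assumes that each missing value for a metabolite is replaced by the smallest observed value for that same metabolite, and it uses None to mark missing values. The exact variant applied in the study is not specified here.

```python
def minimum_impute(values):
    """Replace missing entries (None) with the minimum observed value.

    `values` holds one metabolite's abundances across samples.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values for this metabolite")
    floor = min(observed)
    return [floor if v is None else v for v in values]
```

Calling `minimum_impute([2.0, None, 5.0, None])` fills both gaps with 2.0. Unlike the mixture model, this treats every missing value as present at the lowest observed level and does not allow for true absence.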

3 DISCUSSION

The R package metabomxtr facilitates mixture-model analysis of non-targeted metabolomics data. Re-analysis of the HAPO pilot metabolomics data indicates that mixture-model analysis detects metabolites identified by common imputation approaches and additionally identifies associations that would otherwise be missed. Rigorous testing of mixture models on a wider scale is warranted. In summary, metabomxtr provides metabolomics researchers with a previously unavailable tool for handling missingness in non-targeted metabolomics data.

Funding: Grants R01-HD34242 and R01-HD34243 from the National Institute of Child Health and Human Development and the National Institute of Diabetes, Digestive and Kidney Diseases, by the National Center for Research Resources (M01-RR00048, M01-RR00080) and by the American Diabetes Association and Friends of Prentice.

Conflict of interest: none declared.

REFERENCES

Haleem J Issaq, Que N Van, Timothy J Waybright, Gary M Muschik, Timothy D Veenstra. (2009) Analytical and statistical approaches to metabolomics research. J Sep Sci.

L H Moulton, N A Halsey. (1995) A mixture model with detection limits for regression analyses of antibody response to vaccine. Biometrics.

Denise M Scholtens, Michael J Muehlbauer, Natalie R Daya, Robert D Stevens, Alan R Dyer, Lynn P Lowe, Boyd E Metzger, Christopher B Newgard, James R Bain, William L Lowe. (2013) Metabolomics reveals broad-scale metabolic perturbations in hyperglycemic mothers during pregnancy. Diabetes Care.