Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication.

Trenton J Davis, Tarek R Firzli, Emily A Higgins Keppler, Matthew Richardson, Heather D Bean.

Abstract

Missing data is a significant issue in metabolomics that is often neglected during data preprocessing, particularly when it comes to imputation. This neglect can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and to compare imputation strategies using two real-world comprehensive two-dimensional gas chromatography (GC × GC) data sets. We also address these goals in the context of experimental replication, whereby imputation is conducted within each set of replicates (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two GC × GC data sets, missingness was most likely of the missing at random (MAR) and missing not at random (MNAR) types rather than missing completely at random (MCAR). Gibbs sampler imputation and Random Forest imputation gave the best results when imputing MAR and MNAR data, compared with single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal component analysis and quantile regression imputation for left-censored data). When samples were replicated, within-replicate imputation increased the reproducibility of peak quantification compared with imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
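The difference between imputing across all samples and imputing within replicates can be illustrated with a minimal Python sketch. It is not the MetabImpute API, and it does not reproduce the study's procedure: the function names, the use of `None` for a missing peak, and the toy peak areas are all assumptions made for illustration. It covers only the single-value rules named in the abstract (zero, minimum, half-minimum, mean, median), which the study found to perform worse than Gibbs sampler and Random Forest imputation.

```python
from statistics import mean, median

# Single-value fill rules named in the abstract. Each maps the observed
# (non-missing) values of one feature to a single replacement value.
FILL_RULES = {
    "zero": lambda observed: 0.0,
    "minimum": min,
    "half_minimum": lambda observed: min(observed) / 2.0,
    "mean": mean,
    "median": median,
}


def impute_single_value(values, method="half_minimum"):
    """Replace None entries of one feature with a single summary value."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from, so the values stay missing.
        return list(values)
    fill = float(FILL_RULES[method](observed))
    return [fill if v is None else v for v in values]


def impute_within_replicates(values, groups, method="half_minimum"):
    """Apply the fill rule separately inside each replicate group.

    Assumption: `groups[i]` labels the replicate set that sample i
    belongs to. This grouping scheme is hypothetical.
    """
    result = list(values)
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        filled = impute_single_value([values[i] for i in idx], method)
        for i, v in zip(idx, filled):
            result[i] = v
    return result


# Toy peak areas for one feature: three replicates each of samples A and B.
peak_areas = [10.0, None, 14.0, 100.0, 120.0, None]
replicate_of = ["A", "A", "A", "B", "B", "B"]

# Ignoring replication: every gap receives the overall mean.
global_fill = impute_single_value(peak_areas, "mean")

# Respecting replication: each gap receives its own group's mean.
within_fill = impute_within_replicates(peak_areas, replicate_of, "mean")

print(global_fill)
print(within_fill)
```

In this toy case, the global mean places the same value in both gaps even though samples A and B differ by an order of magnitude, whereas the within-replicate fill keeps each imputed value close to its own replicates. That is the intuition behind the reported gain in reproducibility of peak quantification.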

Year: 2022; PMID: 35881554; PMCID: PMC9369014; DOI: 10.1021/acs.analchem.1c04093

Source DB: PubMed; Journal: Anal Chem; ISSN: 0003-2700; Impact factor: 8.008


