NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.

Justin Y Lee, Mark P Styczynski.

Abstract

INTRODUCTION: A common problem in metabolomics data analysis is the existence of a substantial number of missing values, which can complicate, bias, or even prevent certain downstream analyses. One of the most widely used solutions to this problem is imputation with a k-nearest neighbors (kNN) algorithm, which estimates missing metabolite abundances from similar entries in the dataset. kNN implicitly assumes that missing values are distributed uniformly at random in the dataset, but this is typically not true in metabolomics, where many values are missing because they are below the limit of detection of the analytical instrumentation.
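To make the baseline concrete, here is a minimal sketch of conventional kNN imputation in plain Python. It is an illustration under assumptions, not the implementation evaluated in the paper: the function names (`distance`, `knn_impute`), the use of `None` for missing values, the root-mean-square distance over jointly measured features, and the unweighted mean of neighbor values are all choices made for this example. Rows stand for whichever unit the neighbor search runs over (samples or metabolites, depending on convention).

```python
import math


def distance(a, b):
    """Root-mean-square difference over features measured in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as maximally distant
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(data, k=2):
    """Fill each None with the mean of that feature over the k nearest rows.

    Rows that are themselves missing the feature are skipped, so the
    estimate is built only from measured values.
    """
    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


example = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],   # value to impute
    [10.0, 20.0, 30.0],
    [1.1, 2.1, 3.2],
]
filled = knn_impute(example, k=2)
```

Because neighbors lacking the feature are skipped, the estimate can only draw on values that were high enough to be detected, which is where the uniform-missingness assumption becomes a problem for values below the limit of detection.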
OBJECTIVES: Here, we explore the impact of nonuniformly distributed missing values (missing not at random, or MNAR) on imputation performance. We present a new model for generating synthetic missing data and a new algorithm, No-Skip kNN (NS-kNN), that accounts for MNAR values to provide more accurate imputations.
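The abstract does not spell out how NS-kNN works, so the following is only a sketch of one plausible reading of "no-skip": neighbors that are missing the feature are kept rather than skipped, and their missing value is replaced by the lowest measured value of that feature, as a stand-in for an abundance presumed to be below the limit of detection. That substitution rule, the function names, the `None` encoding, and the distance function are assumptions for illustration and should be checked against the paper before use.

```python
import math


def distance(a, b):
    """Root-mean-square difference over features measured in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def ns_knn_impute(data, k=2):
    """No-skip variant (assumed behavior): use the k nearest rows even if
    they are missing the feature, substituting the feature's minimum
    measured value for any neighbor that lacks it."""
    n_features = len(data[0])
    col_min = []
    for j in range(n_features):
        observed = [row[j] for row in data if row[j] is not None]
        col_min.append(min(observed) if observed else None)

    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None or col_min[j] is None:
                continue
            nearest = sorted(
                (distance(row, other), m)
                for m, other in enumerate(data)
                if m != i
            )[:k]
            if not nearest:
                continue
            values = [
                data[m][j] if data[m][j] is not None else col_min[j]
                for _, m in nearest
            ]
            imputed[i][j] = sum(values) / len(values)
    return imputed


example = [
    [1.0, 2.0, None],    # likely below detection in feature 2
    [1.0, 2.0, None],    # closest neighbor is also missing feature 2
    [1.1, 2.1, 0.5],
    [10.0, 20.0, 30.0],  # dissimilar row with a high value
]
filled = ns_knn_impute(example, k=2)
```

In this toy input, a skip-based kNN with k = 2 would have to reach for the dissimilar fourth row and average 0.5 with 30.0, whereas the no-skip variant stays with the two closest rows and returns a low estimate.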
METHODS: We compare the imputation errors of the original kNN algorithm (using two distance metrics), NS-kNN, and a recently developed algorithm, KNN-TN, when applied to multiple experimental datasets with different types and levels of missing data.
RESULTS: Our results show that NS-kNN typically outperforms kNN when at least 20-30% of missing values in a dataset are MNAR. NS-kNN also has lower imputation errors than KNN-TN on realistic datasets when at least 50% of missing values are MNAR.
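The percentages above refer to the share of missing values that are MNAR. As a rough illustration of what such a mixture looks like, the sketch below removes values from a complete matrix so that a chosen fraction of the removals are the lowest abundances (a crude MNAR proxy for a detection limit) and the rest are drawn uniformly at random. This is not the synthetic missing-data model proposed in the paper, which the abstract does not describe; the function `add_missing`, its parameters, and the single global abundance cutoff are assumptions for this example.

```python
import random


def add_missing(data, missing_frac, mnar_frac, seed=0):
    """Return a copy of `data` with values replaced by None.

    missing_frac: fraction of all cells to remove.
    mnar_frac:    fraction of the removed cells that are MNAR, taken here
                  as the lowest values overall (assumed global cutoff).
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    n_missing = round(missing_frac * len(cells))
    n_mnar = round(mnar_frac * n_missing)

    # MNAR part: the lowest abundances disappear, mimicking a detection limit.
    by_value = sorted(cells, key=lambda c: data[c[0]][c[1]])
    mnar = set(by_value[:n_mnar])

    # Remaining removals are uniform at random over the other cells.
    rest = [c for c in cells if c not in mnar]
    mcar = set(rng.sample(rest, n_missing - n_mnar))

    masked = [
        [None if (i, j) in mnar or (i, j) in mcar else v
         for j, v in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return masked, mnar, mcar


complete = [[float(5 * i + j + 1) for j in range(5)] for i in range(4)]
masked, mnar, mcar = add_missing(complete, missing_frac=0.5, mnar_frac=0.3)
```

Imputing `masked` and comparing the result with `complete` at the removed cells gives an imputation error that can be tracked as `mnar_frac` is varied.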
CONCLUSION: Accounting for the nonuniform distribution of missing values in metabolomics data can significantly improve the results of imputation algorithms. The NS-kNN method imputes missing metabolomics data more accurately than existing kNN-based approaches when used on realistic datasets.

Keywords:  GC–MS; Imputation; Metabolomics; Missing data; kNN

Year:  2018        PMID: 30830437      PMCID: PMC6532628          DOI: 10.1007/s11306-018-1451-8

Source DB:  PubMed          Journal:  Metabolomics        ISSN: 1573-3882            Impact factor:   4.290


