
Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns.

Linda Malan, Cornelius M Smuts, Jeannine Baumgartner, Cristian Ricci.

Abstract

Principal component analysis (PCA) is a popular statistical tool. However, despite numerous advantages, the good practice of imputing missing data before PCA is not common. In the present work, we evaluated the hypothesis that the expectation-maximization (EM) algorithm for missing data imputation is a reliable and advantageous procedure when using PCA to derive biomarker profiles and dietary patterns. To this aim, we used numerical simulations designed to mimic real data commonly observed in nutritional research. Finally, we showed the advantages and pitfalls of the EM algorithm for missing data imputation applied to plasma fatty acid concentrations and nutrient intakes from real data sets derived from the US National Health and Nutrition Examination Survey. PCA applied to simulated data with missing values resulted in biased eigenvalues relative to the original data set without missing values. The bias between the eigenvalues from the original data set and those from the data set with missing values increased with the number of missing values and appeared to be independent of the correlation structure among variables. On the other hand, when data were imputed, the mean of the eigenvalues over the 10 imputation runs overlapped with those derived from the PCA applied to the original data set. These results were confirmed when real data sets from the National Health and Nutrition Examination Survey were analyzed. We accept the hypothesis that the EM algorithm for missing data imputation applied before PCA aimed at deriving biochemical profiles and dietary patterns is an effective technique, especially for relatively small sample sizes.
Copyright © 2020 Elsevier Inc. All rights reserved.
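
Illustrative sketch (not taken from the paper): the comparison described in the abstract can be approximated in a few lines of Python with NumPy. Everything here is an assumption made for illustration: the sample size, number of variables, equicorrelation of 0.6, 15% missing-completely-at-random rate, and the helper names `em_impute` and `pca_eigenvalues`. The sketch also uses a single deterministic conditional-mean fill under a multivariate normal model, whereas the abstract reports means over 10 imputation runs, so it should be read as a rough analogue rather than the authors' procedure.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """Fill NaNs with conditional means under a multivariate normal fitted by EM."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)               # start from column means
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True)
    for _ in range(n_iter):
        C = np.zeros((p, p))                 # conditional covariance of the missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                  # row with nothing observed
                Xf[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            B = np.linalg.solve(S_oo, S_mo.T).T          # S_mo @ inv(S_oo)
            # E-step: conditional mean of missing entries given observed ones
            Xf[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T
        # M-step: update mean and covariance from the completed data
        new_mu = Xf.mean(axis=0)
        d = Xf - new_mu
        new_sigma = (d.T @ d + C) / n
        delta = np.max(np.abs(new_sigma - sigma))
        mu, sigma = new_mu, new_sigma
        if delta < tol:
            break
    return Xf

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix, largest first."""
    R = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

# Simulated data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, rho = 200, 6, 0.6
cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X_full = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Delete roughly 15% of the entries completely at random.
X_miss = X_full.copy()
X_miss[rng.random((n, p)) < 0.15] = np.nan

X_imp = em_impute(X_miss)
eig_full = pca_eigenvalues(X_full)
eig_em = pca_eigenvalues(X_imp)

print("complete data:", np.round(eig_full, 3))
print("EM-imputed   :", np.round(eig_em, 3))
```

Comparing the two printed eigenvalue vectors gives a rough sense of how closely PCA on EM-imputed data tracks PCA on the complete data for one simulated data set; repeating the run over several seeds and missing-data rates would be closer in spirit to the simulations the abstract describes.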

Keywords:  Biochemical profiles; Dietary patterns; EM algorithm; Missing data imputation; Principal component analysis

Year: 2020; PMID: 32035304; DOI: 10.1016/j.nutres.2020.01.001

Source DB: PubMed; Journal: Nutr Res; ISSN: 0271-5317; Impact factor: 3.315


  4 in total

1.  An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data.

Authors:  Junsheng Huang; Baohua Mao; Yun Bai; Tong Zhang; Changjun Miao
Journal:  Sensors (Basel)       Date:  2020-04-02       Impact factor: 3.576

2.  Additive integer-valued data envelopment analysis with missing data: A multi-criteria evaluation approach.

Authors:  Chunhua Chen; Jianwei Ren; Lijun Tang; Haohua Liu
Journal:  PLoS One       Date:  2020-06-11       Impact factor: 3.240

3.  Artificial Intelligence Algorithm-Based Computed Tomography Image of Both Kidneys in Diagnosis of Renal Dysplasia.

Authors:  Yonghui Liu; Siai Tang
Journal:  Comput Math Methods Med       Date:  2022-01-27       Impact factor: 2.238

4.  Higher CSF sTNFR1-related proteins associate with better prognosis in very early Alzheimer's disease.

Authors:  William T Hu; Tugba Ozturk; Alexander Kollhoff; Whitney Wharton; J Christina Howell
Journal:  Nat Commun       Date:  2021-06-28       Impact factor: 14.919

