Missing data imputation via the expectation-maximization algorithm can improve principal component analysis aimed at deriving biomarker profiles and dietary patterns.

Linda Malan¹, Cornelius M Smuts¹, Jeannine Baumgartner², Cristian Ricci³.

Abstract

Principal component analysis (PCA) is a popular statistical tool. However, despite its numerous advantages, imputing missing data before PCA is not common practice. In the present work, we evaluated the hypothesis that the expectation-maximization (EM) algorithm for missing data imputation is a reliable and advantageous procedure when using PCA to derive biomarker profiles and dietary patterns. To this end, we used numerical simulations designed to mimic real data commonly observed in nutritional research. We then showed the advantages and pitfalls of the EM algorithm for missing data imputation applied to plasma fatty acid concentrations and nutrient intakes from real data sets derived from the US National Health and Nutrition Examination Survey. PCA applied to simulated data with missing values resulted in biased eigenvalues compared with the original data set without missing values. The bias between the eigenvalues from the original data set and those from the data set with missing values increased with the number of missing values and appeared to be independent of the correlation structure among variables. In contrast, when data were imputed, the mean of the eigenvalues over the 10 imputation runs overlapped with the eigenvalues derived from PCA applied to the original data set. These results were confirmed when real data sets from the National Health and Nutrition Examination Survey were analyzed. We accept the hypothesis that EM-based imputation of missing data, applied before PCA aimed at deriving biochemical profiles and dietary patterns, is an effective technique, especially for relatively small sample sizes.
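The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes multivariate normal data, values missing completely at random, and a single deterministic conditional-mean fill rather than the 10 imputation runs reported above. The function names (`em_impute`, `correlation_eigenvalues`) and the simulation settings (5 variables, 500 rows, pairwise correlation 0.6, 20% missing) are hypothetical choices made for illustration only.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """EM for a multivariate normal model with missing values (NaN).

    Returns the conditional-mean filled data plus the EM estimates of the
    mean vector and covariance matrix. Illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)                  # start from column means
    X_hat = np.where(missing, mu, X)
    sigma = np.cov(X_hat, rowvar=False, bias=True)
    for _ in range(n_iter):
        C = np.zeros((p, p))                    # summed conditional covariances
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                     # row with nothing observed
                X_hat[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            B = np.linalg.solve(S_oo, S_mo.T).T  # regression of missing on observed
            # E-step: conditional mean of the missing entries
            X_hat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T
        # M-step: update mean and covariance from the expected statistics
        mu_new = X_hat.mean(axis=0)
        d = X_hat - mu_new
        sigma_new = (d.T @ d + C) / n
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return X_hat, mu, sigma

def correlation_eigenvalues(cov):
    """Eigenvalues (descending) of the correlation matrix implied by cov."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

# Simulated example: equicorrelated normal data with 20% of values removed.
rng = np.random.default_rng(0)
p, n, rho = 5, 500, 0.6
true_cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X_full = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
X_miss = X_full.copy()
X_miss[rng.random((n, p)) < 0.2] = np.nan

X_imp, mu_em, cov_em = em_impute(X_miss)
eig_full = correlation_eigenvalues(np.cov(X_full, rowvar=False))
eig_em = correlation_eigenvalues(cov_em)
print("complete data:", np.round(eig_full, 3))
print("EM estimate:  ", np.round(eig_em, 3))
```

The eigenvalues here are taken from the EM covariance estimate rather than from the filled data matrix, because a conditional-mean fill on its own understates the variance of the imputed variables. A stochastic imputation repeated over several runs, as the abstract describes, would instead average the eigenvalues across runs.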

Entities: Chemical

Keywords: Biochemical profiles; Dietary patterns; EM algorithm; Missing data imputation; Principal component analysis

Substances: Biomarkers; Fatty Acids

Year: 2020 PMID: 32035304 DOI: 10.1016/j.nutres.2020.01.001

Source DB: PubMed Journal: Nutr Res ISSN: 0271-5317 Impact factor: 3.315
