Anvar Suyundikov, John R Stevens, Christopher Corcoran, Jennifer Herrick, Roger K Wolff, Martha L Slattery.
Abstract
Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control.
Year: 2015 PMID: 25849489 PMCID: PMC4388652 DOI: 10.1371/journal.pone.0119876
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The RMSE values for different numbers of neighbor subjects (k).
Fig 2. The RMSE values for different imputation techniques.
Fig 3. TPR and FDR for sample sizes of 50, 100, 200, and 400 with missingness of 10%–50%.
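The kind of comparison summarized in Fig 1 (RMSE as a function of the number of neighbor subjects k) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the simulated data, the function names `knn_impute`, `rmse_on_masked`, and `simulate`, the distance measure, and the column-mean fallback. Note that each imputed value is an average of other subjects' observed values, so imputed rows are no longer independent of their donor rows, which is the kind of imputation-induced dependence the abstract draws attention to.

```python
import math
import random


def knn_impute(data, k):
    """Fill None entries with the mean of the k nearest rows (subjects).

    Distance is the root mean squared difference over features that both
    rows have observed (an illustrative choice, not the paper's).
    """
    n_cols = len(data[0])
    # Column means of observed values, used only as a fallback.
    col_means = []
    for j in range(n_cols):
        seen = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(seen) / len(seen) if seen else 0.0)

    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue  # a donor must have feature j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            donors = [v for _, v in candidates[:k]]
            imputed[i][j] = (sum(donors) / len(donors)
                             if donors else col_means[j])
    return imputed


def rmse_on_masked(truth, imputed, mask):
    """RMSE between true and imputed values at the masked positions."""
    errs = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(errs) / len(errs))


def simulate(n_subjects=60, n_features=8, missing_rate=0.2, seed=0):
    """Simulated expression-like matrix with entries removed at random."""
    rng = random.Random(seed)
    truth = []
    for _ in range(n_subjects):
        subject_effect = rng.gauss(0, 1)  # shared shift per subject
        truth.append([subject_effect + rng.gauss(0, 0.3)
                      for _ in range(n_features)])
    observed = [row[:] for row in truth]
    mask = []
    for i in range(n_subjects):
        for j in range(n_features):
            if rng.random() < missing_rate:
                observed[i][j] = None
                mask.append((i, j))
    return truth, observed, mask


truth, observed, mask = simulate()
rmse_by_k = {k: rmse_on_masked(truth, knn_impute(observed, k), mask)
             for k in (1, 5, 10, 20)}
for k, err in rmse_by_k.items():
    print(f"k={k:2d}  RMSE={err:.3f}")
```

The sketch removes entries completely at random, whereas the motivating study lost entire normal samples; it is meant only to show the mechanics of masking known values, imputing them, and scoring the result.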