Peter Johansson, Jari Häkkinen.
Abstract
BACKGROUND: Microarray technology has become popular for gene expression profiling, and many analysis tools have been developed for data interpretation. Most of these tools require complete data, but measurement values are often missing. A way to overcome the problem of incomplete data is to impute the missing data before analysis. Many imputation methods have been suggested, some naïve and others more sophisticated, taking into account correlation in the data. However, these methods are binary in the sense that each spot is considered either missing or present. Hence, they depend on a cutoff separating poor spots from good spots. We suggest a different approach in which a continuous spot quality weight is built into the imputation methods, allowing for smooth imputation of all spots to a larger or lesser degree.
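The contrast between a binary cutoff and a continuous quality weight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight formula `snr**2 / (snr**2 + beta**2)`, the blending rule, and all function names are assumptions, chosen only so that the weight rises with SNR and falls as β grows.

```python
import numpy as np

def binary_weight(snr, beta):
    """Cutoff scheme: a spot is either present (1) or missing (0)."""
    return (np.asarray(snr, dtype=float) >= beta).astype(float)

def continuous_weight(snr, beta):
    """Hypothetical smooth weight between 0 and 1 (assumes beta > 0).

    Not the formula from the paper; it only mimics the qualitative
    behaviour: higher SNR gives a larger weight, larger beta a smaller one.
    """
    snr = np.asarray(snr, dtype=float)
    return snr**2 / (snr**2 + beta**2)

def blend(measured, estimate, weight):
    """Trust the measurement in proportion to its weight and fill the
    remainder with an estimate derived from other spots."""
    return weight * measured + (1.0 - weight) * estimate

# Three spots with the same measured value but different quality.
snr = np.array([0.1, 0.3, 3.0])
measured = np.array([2.0, 2.0, 2.0])
estimate = np.array([4.0, 4.0, 4.0])  # e.g. an average over similar reporters

hard = blend(measured, estimate, binary_weight(snr, 0.3))
soft = blend(measured, estimate, continuous_weight(snr, 0.3))
print(hard)
print(soft)
```

With the cutoff, the low-quality spot is replaced outright and the others are left untouched; with the continuous weight, every spot moves toward the estimate by an amount that shrinks as its SNR increases.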
Year: 2006 PMID: 16780582 PMCID: PMC1533869 DOI: 10.1186/1471-2105-7-306
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. WeNNI is the most accurate imputation method. Performance of the five imputation methods with varying β. As explained in the Methods section, a larger β shifts the weights toward smaller values. In non-weighted methods β is the SNR cutoff. The increase in MSD for large β is caused by too many missing values, at which point imputation breaks down. The standard errors of the mean are within the line thickness. (A) Breast cancer data. WeNNI (black line) has the lowest MSD and the weighted methods perform better than the non-weighted methods. All methods have a minimum MSD around β = 0.2. (B) Melanoma data. WeNNI (black line) has the lowest MSD and the weighted methods perform better than the non-weighted methods. All methods have a minimum MSD around β = 0.6. (C) Mycorrhiza data. WeNNI (black line) retains the lowest MSD, whereas KNNimpute (red line) performs better than the weighted reporter average method. This may be explained as an effect of a different experimental design, as discussed in the text. The minimum MSD is found in the β range 0.3–1 for the different methods.
Figure 2. WeNNI is most accurate over all ranges of spot quality. The contribution to MSD for specific SNR for the different imputation methods applied to the three different data sets using β = 0.3. These plots were created using a sliding window containing 1% of all spots. Spots with small SNR (low quality) have the largest impact on MSD. (A) In the breast cancer data a weighted scheme is clearly essential and WeNNI is most accurate over all ranges of SNR. (B) In the melanoma data a weighted scheme is clearly essential and the weighted reporter average shows the best performance for the SNR range 0.2–1. (C) In the mycorrhiza data the breakdown of the reporter average methods is very prominent. For the SNR range 0.07–0.4 it is even better to use no imputation (green line) than the average methods. The breakdown of the reporter average methods is discussed in the text.
Figure 3. Comparison of WeNNI and KNNimpute. MSD contributions from specific SNR and different β for the breast cancer data set. This plot was created using a sliding window containing 1% of all spots.
Figure 4. WeNNI and KNNimpute are insensitive to the number of neighbours used. Performance of WeNNI and KNNimpute is plotted against the number of nearest neighbours for all three data sets using β = 1.
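The sliding-window curves described for Figures 2 and 3 could be produced along the following lines. This is a rough sketch under stated assumptions: spots are sorted by SNR and the squared deviation is averaged over a window holding 1% of the spots. Whether the original plots show the window mean or a differently normalised contribution is not stated here, and the function name is hypothetical.

```python
import numpy as np

def sliding_window_msd(snr, sq_dev, fraction=0.01):
    """Mean squared deviation in a window sliding over SNR-sorted spots.

    snr      : signal-to-noise ratio per spot
    sq_dev   : squared deviation per spot (imputed value vs. reference)
    fraction : share of all spots in each window (0.01 = 1%)
    Returns the mean SNR and the mean squared deviation of each window.
    """
    snr = np.asarray(snr, dtype=float)
    sq_dev = np.asarray(sq_dev, dtype=float)
    order = np.argsort(snr)                        # sort spots by quality
    window = max(1, int(round(fraction * snr.size)))
    kernel = np.ones(window) / window              # moving-average kernel
    window_snr = np.convolve(snr[order], kernel, mode="valid")
    window_msd = np.convolve(sq_dev[order], kernel, mode="valid")
    return window_snr, window_msd

# Tiny example with a 50% window so the result is easy to follow.
x, y = sliding_window_msd([3, 1, 2, 4], [9, 1, 4, 16], fraction=0.5)
print(x, y)
```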
Comparisons of WeNNI, KNNimpute, and LSimpute adaptive using two different measures. MSD is the mean squared deviation calculated over all spots, whereas MSD_imputed is calculated over spots with SNR smaller than β, i.e., the spots imputed in non-weighted methods. β was chosen to yield the lowest MSD for LSimpute adaptive. WeNNI is more accurate than LSimpute and KNNimpute, even though β was tuned to optimise the performance of LSimpute.
| Data set | β | Measure | WeNNI | KNNimpute | LSimpute adaptive |
|---|---|---|---|---|---|
| Breast cancer | 0.2 | MSD | 0.345 | 0.369 | 0.368 |
| | | MSD_imputed | 1.59 | 1.81 | 1.75 |
| Melanoma | 0.6 | MSD | 0.995 | 1.08 | 1.05 |
| | | MSD_imputed | 3.41 | 3.77 | 3.64 |
| Mycorrhiza | 0.2 | MSD | 0.216 | 0.241 | 0.244 |
| | | MSD_imputed | 0.840 | 0.902 | 0.954 |
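The two measures in the table can be illustrated with a short sketch. It assumes that the deviation is taken between each imputed value and some reference value for the same spot; the reference, the toy numbers, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def msd(imputed, reference):
    """Mean squared deviation over all spots."""
    imputed = np.asarray(imputed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean((imputed - reference) ** 2))

def msd_imputed(imputed, reference, snr, beta):
    """Mean squared deviation restricted to spots with SNR below beta,
    i.e. the spots a non-weighted method would treat as missing."""
    mask = np.asarray(snr, dtype=float) < beta
    if not mask.any():
        return float("nan")  # no spot falls below the cutoff
    imputed = np.asarray(imputed, dtype=float)[mask]
    reference = np.asarray(reference, dtype=float)[mask]
    return msd(imputed, reference)

# Toy data: four spots, two of them below the cutoff beta = 0.2.
imputed = [1.0, 2.0, 3.0, 5.0]
reference = [1.0, 2.0, 2.0, 3.0]
snr = [2.0, 1.0, 0.1, 0.05]

print(msd(imputed, reference))                    # 1.25
print(msd_imputed(imputed, reference, snr, 0.2))  # 2.5
```

Because MSD_imputed averages only over the low-quality spots, it is larger than MSD in this toy example, the same pattern that every row of the table shows.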