Brian D M Tom, Walter R Gilks, Elizabeth T Brooke-Powell, James W Ajioka.
Abstract
BACKGROUND: A common feature of microarray experiments is the occurrence of missing gene expression data. These missing values occur for a variety of reasons, in particular, because of the filtering of poor quality spots and the removal of undefined values when a logarithmic transformation is applied to negative background-corrected intensities. The efficiency and power of an analysis performed can be substantially reduced by having an incomplete matrix of gene intensities. Additionally, most statistical methods require a complete intensity matrix. Furthermore, biases may be introduced into analyses through missing information on some genes. Thus methods for appropriately replacing (imputing) missing data and/or weighting poor quality spots are required.
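To make the second source of missing values concrete, the following is a minimal sketch of how a logarithmic transformation turns non-positive background-corrected intensities into missing entries. The function name, the choice of base 2, and the intensity values are illustrative assumptions, not taken from the paper.

```python
import math

def log2_background_corrected(foreground, background):
    """Return log2(foreground - background), or None when the difference is not positive."""
    corrected = foreground - background
    if corrected <= 0:
        return None  # logarithm undefined: the spot becomes a missing value
    return math.log2(corrected)

# Hypothetical (foreground, background) intensity pairs for four spots.
spots = [(1200.0, 200.0), (310.0, 350.0), (95.0, 95.0), (4196.0, 100.0)]

log_values = [log2_background_corrected(f, b) for f, b in spots]

# Count the spots that drop out of the intensity matrix.
n_missing = sum(v is None for v in log_values)
```

In this toy input, the second and third spots have a background at least as large as the foreground, so they would appear as gaps in the intensity matrix.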
Year: 2005 PMID: 16185360 PMCID: PMC1262693 DOI: 10.1186/1471-2105-6-234
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Parameter estimates from the model. Estimated parameters from the mixture model used to assess quality. The μ1 estimates are not shown, since there are a large number of genes.
| | F635 Channel (c = 1) | F532 Channel (c = 2) |
| parameter | estimate | estimate |
| | -0.37 | -0.72 |
| | 0.99 | 1.45 |
| | 0.11 | 0.23 |
| | 0.11 | 0.23 |
| | 0.22 | 0.47 |
| | 0.013 | 0.010 |
| | 0.808 | 0.976 |
| | 0.178 | 0.013 |
Failures in the six arrays. Numbers and proportions predicted as failing in each channel for the six arrays.
| | F635 Channel Failures | | F532 Channel Failures | | Double Channel Failures | |
| Array | Failures | % Total | Failures | % Total | Failures | % Total |
| 1 | 30 | 0.4 | 8 | 0.1 | 0 | 0 |
| 2 | 4539 | 54 | 225 | 3 | 204 | 2 |
| 3 | 1345 | 16 | 43 | 0.5 | 29 | 0.3 |
| 4 | 1180 | 14 | 13 | 0.2 | 5 | 0.1 |
| 5 | 191 | 2 | 13 | 0.2 | 9 | 0.1 |
| 6 | 444 | 5 | 58 | 0.7 | 23 | 0.3 |
| Total | 7729 | | 360 | | 270 | |
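The column totals in the table above can be checked with a few lines of arithmetic. The sketch below transcribes the counts from the table. The "either channel" figure is an added illustration that assumes each double channel failure is also counted in both single-channel columns, which the table suggests but does not state.

```python
# Failure counts per array, transcribed from the table above.
failures = {
    1: {"F635": 30, "F532": 8, "both": 0},
    2: {"F635": 4539, "F532": 225, "both": 204},
    3: {"F635": 1345, "F532": 43, "both": 29},
    4: {"F635": 1180, "F532": 13, "both": 5},
    5: {"F635": 191, "F532": 13, "both": 9},
    6: {"F635": 444, "F532": 58, "both": 23},
}

# Column totals, to compare with the "Total" row.
totals = {
    channel: sum(row[channel] for row in failures.values())
    for channel in ("F635", "F532", "both")
}

# Spots failing in at least one channel, by inclusion-exclusion.
# Assumption: "both" spots are included in the F635 and F532 counts.
either = {
    array: row["F635"] + row["F532"] - row["both"]
    for array, row in failures.items()
}
```

The computed totals match the reported row (7729, 360 and 270), and nearly all predicted failures fall in the F635 channel.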
Figure 1. Spot quality identification. The spot on Array 2 has been identified as being of poor quality in both channels due to dust on the slide at that position.
Figure 2. Bivariate scatter plot of transformed data. The bivariate scatter distribution of the transformed intensity data, y. z = 0, 1, 2 correspond to the poor component with unreliably low intensities, the good component and the poor component with unreliably high intensities.
Figure 3. Bivariate scatter plot of residuals. The bivariate scatter distribution of the residuals, r. z = 0, 1, 2 correspond to the poor component with unreliably low intensities, the good component and the poor component with unreliably high intensities.
Figure 4. Quantile-Quantile plots. Quantile-Quantile plot for spots predicted as good quality in each channel.