Brian E Howard, Beate Sick, Steffen Heber.
Abstract
BACKGROUND: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny.
Year: 2009 PMID: 19545436 PMCID: PMC2717951 DOI: 10.1186/1471-2105-10-191
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
BioConductor Quality Control Statistics
| mean, standard deviation, median and inter-quartile range of raw log intensity distribution. | |
| 5th and 95th percentile of raw log intensity distribution. | |
| slope parameter and associated p-value of linear regression of log expression level versus probe number, as computed by R affy library function AffyRNAdeg(). | |
| mean, standard deviation, median, inter-quartile range, and 5th and 95th percentiles of normalized log intensity distribution. | |
| 0.1th, 1st, 10th and 20th percentile of the probe-level model weights, computed using affyPLM library functionality. | |
| 1st, 10th, 25th, 75th, 90th, and 99th percentile of probe-level model residuals, computed using affyPLM library functionality. | |
| median, inter-quartile range, lower tail and upper tail of "relative log intensity", computed using affyPLM library functionality. | |
1. The "SCORE" function was used to normalize values for each statistic, t, for each chip, i, relative to the values observed in other chips from the same experiment: $\mathrm{SCORE}(t_i) = (t_i - \mathrm{median}(t)) / \mathrm{mad}(t)$, with median() and mad() computed across all chips in the experiment.
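The normalization described in this footnote is a robust z-score, and a minimal Python sketch may make it concrete. The function name `score` and the `scale` parameter are assumptions made for illustration. R's `mad()` multiplies the raw median absolute deviation by 1.4826 by default, and the source does not say which scaling was used, so the constant is exposed as an argument here.

```python
import statistics


def score(values, scale=1.4826):
    """Robust z-score of one QC statistic across the chips of one experiment.

    `values` holds the statistic t for every chip i in the experiment.
    `scale` mimics the default constant of R's mad(); this is an assumption,
    pass scale=1.0 for the unscaled median absolute deviation.
    """
    values = list(values)
    med = statistics.median(values)
    # Median absolute deviation around the median, computed across all chips.
    mad = scale * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("mad is zero; the statistic does not vary across chips")
    return [(v - med) / mad for v in values]
```

Because both the centre and the spread are medians, a single outlying chip barely shifts the reference values, so that chip still receives a large score.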
Affymetrix Expression Console Quality Control Statistics (Exon Arrays)
| mean of the raw intensity for all PM probes, prior to any normalizations. | |
| mean of the raw intensity for all probes used to compute background intensity. (Note: may be higher than pm.mean because GC compositions of probes used to compute background and PM probes can be quite different.) | |
| area under ROC curve discriminating between positive control probesets and negative control probesets. | |
| mean and standard deviation of probeset signals after normalization (see note 2). | |
| mean and standard deviation of the absolute deviations of the RMA probe level model residuals from the median across chips (see note 2). | |
| mean and standard deviation of the absolute values of the relative log expression (RLE) for all probesets (see note 2). | |
1. The "SCORE" function was used to normalize values for each statistic, t, for each chip, i, relative to the values observed in other chips from the same experiment: $\mathrm{SCORE}(t_i) = (t_i - \mathrm{median}(t)) / \mathrm{mad}(t)$, with median() and mad() computed across all chips in the experiment.
2. Separate statistics are computed for a) all probesets, b) negative control probesets, and c) positive control probesets.
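One of the statistics in the table above is the area under the ROC curve separating positive control probesets from negative control probesets. The sketch below shows one generic way to compute such an area, using the equivalence between AUC and the fraction of positive/negative pairs that are ranked correctly. It is not taken from the Expression Console software, whose exact computation is not described here; the function name `auc` and its inputs are hypothetical.

```python
def auc(positive, negative):
    """Area under the ROC curve from two lists of probeset signals.

    Computed as the probability that a randomly chosen positive control
    has a higher signal than a randomly chosen negative control, with
    ties counted as one half. This is a generic definition, assumed here
    for illustration only.
    """
    if not positive or not negative:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))
```

A value near 1 means the positive controls are almost always brighter than the negative controls, while a value near 0.5 means the two groups are indistinguishable.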
Figure 1. Mixture Model Parameter Estimates. Supervised (MLE) and Unsupervised (EM) estimates shown are for the following features from the 3' expression arrays: (A) 5th percentile of raw intensities, (B) inter-quartile range of the Relative Log Intensity (RLE), (C) 25th percentile of the probe-level model residuals, and (D) the 20th percentile of the probe-level model weights. All features were normalized relative to other chips in the same experiment, using the SCORE function (see Table 1).
Figure 2. Comparison of Parameter Estimates for 3' Expression Arrays and Exon Arrays. Each diagram illustrates the unsupervised Gaussian parameter estimates for one of the quality control features, for each of the two chip types. Estimates shown are for the following features: (A) Upper tail of the Relative Log Intensity (RLE), computed using the affyPLM functionality, (B) median of the raw intensity distribution, (C) 10th percentile of the probe-level model residuals, and (D) inter-quartile range of the RLE.
Figure 3. Parameter Estimates for Exon Array Expression Console QC Features. Shown are the parameter estimates obtained using the EM algorithm for various exon array quality control features available in the Affymetrix Expression Console software. Estimates shown are for the following features: (A) mean of the absolute deviation of the RMA probe level model residuals from the median across chips, (B) standard deviation of signal from positive control probesets after normalization, (C) standard deviation of signal from all probesets after normalization, and (D) area under ROC curve discriminating between positive control probesets and negative controls.
Figure 4. Classifier Performance. Unsupervised versus supervised classifier using labeled data sets of various sizes. When the full labeled training dataset (~540 labeled instances per fold) is available, the unsupervised classification method (EM+Naïve Bayes) and the supervised classification method (MLE+Naïve Bayes) perform equivalently on the test dataset. When the amount of labeled data is limited, but unlabeled data is abundant, the unsupervised method outperforms the supervised method.
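The captions above refer to Gaussian parameter estimates obtained with the EM algorithm from unlabeled chips. As a rough illustration of that idea, the sketch below fits a two-component Gaussian mixture to a single normalized feature. It is a simplified, one-feature sketch and not the authors' implementation: the function names, the initialization by splitting the sorted values in half, the fixed iteration count, and the variance floor `min_sd` are all assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=200, min_sd=1e-3):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, sds). Needs at least two observations.
    Initialization (an assumption): split the sorted data into a lower
    and an upper half and use the moments of each half.
    """
    xs = sorted(xs)
    half = len(xs) // 2
    groups = [xs[:half], xs[half:]]
    weights = [len(g) / len(xs) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    sds = [
        max(min_sd, math.sqrt(sum((x - m) ** 2 for x in g) / len(g)))
        for g, m in zip(groups, means)
    ]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: responsibility-weighted updates of weight, mean, and sd.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its old parameters
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(min_sd, math.sqrt(var))
    return weights, means, sds
```

In a Naïve Bayes setting, one such pair of Gaussians per feature would be combined by multiplying the per-feature likelihoods for each class. Deciding which fitted component corresponds to low-quality chips is a separate step that this sketch does not address.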