Yan Yang, Phillip Stafford, YoonJoo Kim.
Abstract
BACKGROUND: Microarray image analysis processes scanned digital images of hybridized arrays to produce the spot-level input data for downstream analysis, so it can have a potentially large impact on all subsequent analyses. Signal saturation is an optical effect that occurs when some pixel values for highly expressed genes or peptides exceed the upper detection threshold of the scanner software (2^16 - 1 = 65,535 for 16-bit images). In practice, spots with a sizable number of saturated pixels are often flagged and discarded. Alternatively, the saturated values are used without adjustment to estimate spot intensities. The resulting expression data tend to be biased downwards and can distort high-level analyses that rely on them. Hence, it is crucial to correct for signal saturation effectively.
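The downward bias described above can be illustrated with a minimal sketch. The spot below is hypothetical: the pixel count, mean and standard deviation are assumptions chosen only so that part of the distribution lies above the 16-bit ceiling, and `clip_to_scanner_range` is an illustrative helper, not part of any scanner software.

```python
import random
import statistics

SATURATION = 2**16 - 1  # 65,535, the 16-bit ceiling cited in the abstract


def clip_to_scanner_range(pixels, ceiling=SATURATION):
    """Replace every value above the ceiling with the ceiling itself."""
    return [min(p, ceiling) for p in pixels]


# Hypothetical spot (assumed values): 200 foreground pixels,
# true mean 60,000 and standard deviation 15,000.
rng = random.Random(0)
true_pixels = [rng.gauss(60_000, 15_000) for _ in range(200)]
observed = clip_to_scanner_range(true_pixels)

true_mean = statistics.fmean(true_pixels)
observed_mean = statistics.fmean(observed)
n_saturated = sum(p >= SATURATION for p in observed)

print(f"saturated pixels: {n_saturated} of {len(observed)}")
print(f"true mean: {true_mean:.0f}, observed mean: {observed_mean:.0f}")
```

Because clipping can only lower a pixel value, the observed mean can never exceed the true mean, which is the bias the abstract refers to.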
Year: 2011 PMID: 22129216 PMCID: PMC3269438 DOI: 10.1186/1471-2105-12-462
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Array image with a hexagonal grid superimposed (Valley Fever diagnosis study). The red +'s are spot centers.
Figure 2. Boxplots of foreground median intensities. Valley Fever diagnosis study: Boxplots of foreground median intensities for blank spots, two-cluster spots and three-cluster spots on a random block with 484 spots. The number of clusters was selected by BIC.
Figure 3. Three saturated spots segmented by censored GMM and regular GMM. Valley Fever diagnosis study: Three saturated spots segmented by the censored Gaussian mixture model (top panel) or the regular Gaussian mixture model (bottom panel). Foreground pixels are bounded by black line segments. Intermediate pixels that are neither foreground nor background are bounded between black and white line segments.
Foreground median and mean intensities for three saturated spots
| Spot | Saturated pixels | FG pixels: GenePix | FG pixels: CGMM | FG pixels: GMM | FG median (mean): GenePix | FG median (mean): CGMM | FG median (mean): GMM |
|---|---|---|---|---|---|---|---|
| 1 | 34 | 120 | 116 | 114 | 52548 (48565) | 57174 | 55538 |
| 2 | 60 | 120 | 78 | 60 | 59077 (42119) | 70460 | 65535 |
| 3 | 18 | 120 | 26 | 140 | 20607 (26909) | 74128 | 24738 |
Valley Fever diagnosis study: Number of saturated pixels, number of foreground pixels, and foreground median and mean intensities from GenePix, censored Gaussian mixture model, and regular Gaussian mixture model for the three saturated spots displayed in Figure 3
Background median and mean intensities for three saturated spots
| Spot | BG pixels: GenePix | BG pixels: CGMM | BG pixels: GMM | BG median (mean): GenePix | BG median (mean): CGMM | BG median (mean): GMM |
|---|---|---|---|---|---|---|
| 1 | 556 | 249 | 250 | 5784 (5825) | 5845 | 5857 |
| 2 | 596 | 284 | 288 | 1619 (1992) | 2458 | 2546 |
| 3 | 671 | 348 | 349 | 1038 (1107) | 1243 | 1248 |
Valley Fever diagnosis study: Number of background pixels and background median and mean intensities from GenePix, censored Gaussian mixture model, and regular Gaussian mixture model for the three saturated spots displayed in Figure 3
Median percentage of saturated foreground pixels
| Spot | Median % of saturated FG pixels (S1 = 1000) | Median % of saturated FG pixels (S2 = 800) |
|---|---|---|
| 1 | 3.3 | 27.9 |
| 2 | 4.9 | 50.0 |
| 3 | 10.9 | 69.4 |
| 4 | 26.5 | 72.2 |
Lymphoma diagnosis study: Median percentage of saturated foreground pixels, calculated as # of saturated foreground pixels divided by # of foreground pixels, across 21 arrays for four spots. The artificial saturation threshold was taken at S1 = 1000 or S2 = 800.
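The statistic in this table can be sketched as follows. The pixel values are toy numbers invented for illustration (they are not data from the study), the function names are hypothetical, and treating a pixel as saturated when it is at or above the threshold is an assumption about how the artificial censoring was counted.

```python
import statistics


def percent_saturated(fg_pixels, threshold):
    """Percentage of foreground pixels at or above the saturation threshold."""
    if not fg_pixels:
        raise ValueError("spot has no foreground pixels")
    n_saturated = sum(p >= threshold for p in fg_pixels)
    return 100.0 * n_saturated / len(fg_pixels)


def median_percent_saturated(spot_on_each_array, threshold):
    """Median, across arrays, of the per-array saturated percentage for one spot."""
    return statistics.median(
        percent_saturated(pixels, threshold) for pixels in spot_on_each_array
    )


# Toy foreground pixels for one spot on three arrays (invented values).
toy_spot = [
    [500, 900, 1200, 700],
    [850, 950, 1100, 1300],
    [300, 400, 820, 990],
]

print(median_percent_saturated(toy_spot, threshold=1000))  # threshold S1
print(median_percent_saturated(toy_spot, threshold=800))   # threshold S2
```

Lowering the threshold can only add saturated pixels, which is why the second column of the table is never smaller than the first.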
Figure 4. Comparison of background-subtracted median intensities for four selected spots. Lymphoma diagnosis study: Comparison of background-subtracted median intensity estimates for four spots on 21 arrays, based on the regular Gaussian mixture model and GenePix each with the original, uncensored data (S = 65535) as well as the censored Gaussian mixture model and the regular Gaussian mixture model each with the artificially saturated data (S = 1000 or 800).
Misclassification rate based on leave-one-out cross validation
| Method | Data | TP | TN | FP | FN | Error rate |
|---|---|---|---|---|---|---|
| GenePix | Original | 12 | 4 | 3 | 2 | 0.24 |
| GMM | Original | 13 | 3 | 4 | 1 | 0.24 |
| CGMM | Censored at 1000 | 13 | 3 | 4 | 1 | 0.24 |
| GMM | Censored at 1000 | 10 | 2 | 5 | 4 | 0.43 |
| CGMM | Censored at 800 | 11 | 3 | 4 | 3 | 0.33 |
| GMM | Censored at 800 | 9 | 0 | 7 | 5 | 0.57 |
Lymphoma diagnosis study: Numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) out of 21 samples, and misclassification rate based on leave-one-out cross validation
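The error rates in the table follow from the confusion counts as (FP + FN) divided by the 21 samples. A short sketch that recomputes them from the tabulated counts (the function and dictionary names are illustrative only):

```python
def misclassification_rate(tp, tn, fp, fn):
    """Share of samples that were classified incorrectly."""
    total = tp + tn + fp + fn
    return (fp + fn) / total


# (TP, TN, FP, FN) copied from the table above.
confusion_counts = {
    ("GenePix", "Original"): (12, 4, 3, 2),
    ("GMM", "Original"): (13, 3, 4, 1),
    ("CGMM", "Censored at 1000"): (13, 3, 4, 1),
    ("GMM", "Censored at 1000"): (10, 2, 5, 4),
    ("CGMM", "Censored at 800"): (11, 3, 4, 3),
    ("GMM", "Censored at 800"): (9, 0, 7, 5),
}

error_rates = {
    key: round(misclassification_rate(*counts), 2)
    for key, counts in confusion_counts.items()
}

for (method, data), rate in error_rates.items():
    print(f"{method:8s} {data:18s} {rate:.2f}")
```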
Percent of times the mixture component was correctly selected
| K | % of saturated FG pixels | GMM0 | CGMM | GMM1 |
|---|---|---|---|---|
| 2 | 10 | 100.0 | 100.0 | 99.0 |
| 2 | 40 | 100.0 | 100.0 | 76.3 |
| 2 | 70 | 100.0 | 100.0 | 82.7 |
| 3 | 10 | 100.0 | 100.0 | 100.0 |
| 3 | 40 | 100.0 | 100.0 | 95.0 |
| 3 | 70 | 100.0 | 99.5 | 68.9 |
Percent of times the mixture component K was correctly selected by BIC in the regular Gaussian mixture model for complete, uncensored data (GMM0), the censored Gaussian mixture model for censored data, and the regular Gaussian mixture model for censored data (GMM1)
Relative bias in the two-component mixture model
| Parameter | GMM0, 10% | GMM0, 40% | GMM0, 70% | CGMM, 10% | CGMM, 40% | CGMM, 70% | GMM1, 10% | GMM1, 40% | GMM1, 70% |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.0006 | 0.0007 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0026 | 0.0075 | 0.0035 |
|  | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0007 | 0.0006 | 0.0014 |
|  | -0.0043 | -0.0043 | -0.0044 | -0.0044 | -0.0044 | -0.0045 | -0.0010 | 0.0032 | 0.0073 |
|  | -0.0002 | 0.0000 | -0.0001 | -0.0001 | 0.0004 | -0.0019 | -0.0104 | -0.0867 | -0.2220 |
|  | -0.0079 | -0.0080 | -0.0075 | -0.0070 | -0.0068 | -0.0162 | -0.1121 | -0.3784 | -0.6426 |
Simulation with true K = 2: Relative bias based on runs with K correctly selected by BIC. The models considered were regular Gaussian mixture for complete, uncensored data (GMM0), censored Gaussian mixture for censored data (CGMM), and regular Gaussian mixture for censored data (GMM1). Percents of saturated foreground pixels were set at 10% (μ2 = 46,300, σ2 = 15,000), 40% (μ2 = 60,450, σ2 = 20,000) and 70% (μ2 = 78,650, σ2 = 25,000).
Relative bias in the three-component mixture model
| Parameter | GMM0, 10% | GMM0, 40% | GMM0, 70% | CGMM, 10% | CGMM, 40% | CGMM, 70% | GMM1, 10% | GMM1, 40% | GMM1, 70% |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.0011 | 0.0010 | 0.0011 | 0.0011 | 0.0011 | 0.0012 | 0.0006 | -0.0002 | -0.0088 |
|  | 0.0015 | 0.0015 | 0.0015 | 0.0015 | 0.0015 | 0.0015 | 0.0015 | 0.0014 | 0.0033 |
|  | -0.0044 | -0.0045 | -0.0045 | -0.0044 | -0.0045 | -0.0044 | -0.0051 | -0.0077 | -0.0117 |
|  | -0.0022 | -0.0017 | -0.0023 | -0.0032 | -0.0032 | -0.0046 | 0.0176 | 0.1637 | 0.5294 |
|  | 0.0008 | 0.0005 | 0.0004 | 0.0003 | -0.0001 | 0.0003 | 0.0117 | 0.1679 | 0.7195 |
|  | -0.0262 | -0.0249 | -0.0255 | -0.0276 | -0.0271 | -0.0285 | 0.0100 | 0.3923 | 1.8881 |
|  | -0.0001 | 0.0000 | -0.0002 | -0.0002 | 0.0002 | -0.0010 | -0.0053 | -0.0384 | -0.1352 |
|  | -0.0071 | -0.0076 | -0.0071 | -0.0049 | -0.0028 | -0.0033 | -0.1161 | -0.4601 | -0.9294 |
Simulation with true K = 3: Relative bias based on runs with K correctly selected by BIC. The models considered were regular Gaussian mixture for complete, uncensored data (GMM0), censored Gaussian mixture for censored data (CGMM), and regular Gaussian mixture for censored data (GMM1). Percents of saturated foreground pixels were set at 10% (μ3 = 52,700, σ3 = 10,000), 40% (μ3 = 62,500, σ3 = 12,000) and 70% (μ3 = 75,000, σ3 = 18,000).
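The saturation percentages quoted in the two simulation captions can be checked against the stated means and standard deviations. The sketch below assumes the percentage is the share of the brightest Gaussian component that lies above the 16-bit ceiling of 65,535; that reading is an interpretation of the captions, and `fraction_above` is an illustrative helper.

```python
import math

SATURATION = 65_535  # 2^16 - 1


def fraction_above(mu, sigma, threshold=SATURATION):
    """Upper-tail probability P(X > threshold) for X ~ Normal(mu, sigma^2)."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


# (nominal %, mu, sigma) of the brightest component, copied from the captions.
settings = {
    "K = 2": [(10, 46_300, 15_000), (40, 60_450, 20_000), (70, 78_650, 25_000)],
    "K = 3": [(10, 52_700, 10_000), (40, 62_500, 12_000), (70, 75_000, 18_000)],
}

for label, rows in settings.items():
    for nominal, mu, sigma in rows:
        actual = 100 * fraction_above(mu, sigma)
        print(f"{label}: nominal {nominal}%, tail above ceiling {actual:.1f}%")
```

Under this assumption each parameter pair reproduces its nominal percentage to within about half a percentage point.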