Song Yang, Xiang Guo, Yaw-Ching Yang, Denise Papcunik, Caroline Heckman, Jeffrey Hooke, Craig D Shriver, Michael N Liebman, Hai Hu.
Abstract
We developed a quality assurance (QA) tool, the microarray outlier filter (MOF), and applied it to our microarray datasets to identify problematic arrays. Our approach compares arrays using the correlation coefficient and the number of outlier spots generated on each array to reveal outlier arrays. For a human universal reference (HUR) dataset, which is used as a technical control in our standard hybridization procedure, 3 outlier arrays were identified out of 35 experiments. For a human blood dataset, 12 outlier arrays were identified from 185 experiments. In general, arrays from human blood samples displayed greater variation in their gene expression profiles than arrays from HUR samples. As a result, MOF identified two distinct patterns in the occurrence of outlier arrays. These results demonstrate that this methodology is a valuable QA practice for identifying questionable microarray data prior to downstream analysis.
Year: 2007 PMID: 19458777 PMCID: PMC2675485
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Correlation coefficient table for selected HUR arrays.
| Array | B_0.00223_ | B_0.00553_ | B_0.00215_ | B_1.00083_ | B_1.00110_ | B_0.00170_ |
| --- | --- | --- | --- | --- | --- | --- |
| B_0.00223_ | 1.00 | 0.85 | 0.87 | 0.78 | 0.90 | 0.87 |
| B_0.00553_ | 0.85 | 1.00 | 0.86 | 0.93 | 0.83 | 0.83 |
| B_0.00215_ | 0.87 | 0.86 | 1.00 | 0.87 | 0.92 | 0.90 |
| B_1.00083_ | 0.78 | 0.93 | 0.87 | 1.00 | 0.81 | 0.83 |
| B_1.00110_ | 0.90 | 0.83 | 0.92 | 0.81 | 1.00 | 0.88 |
| B_0.00170_ | 0.87 | 0.83 | 0.90 | 0.83 | 0.88 | 1.00 |
Average correlation coefficient and percentage of outlier points for the 35 HUR arrays.
| Array | Avg. correlation coefficient | Outlier spots (%) | Array | Avg. correlation coefficient | Outlier spots (%) | Array | Avg. correlation coefficient | Outlier spots (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T00245878 | 0.3 | 42.23 | T00209832 | 0.8 | 1.85 | T00211006 | 0.84 | 0.37 |
| T00237520 | 0.22 | 40.56 | T00205609 | 0.78 | 1.71 | T00216482 | 0.82 | 0.37 |
| T00225133 | 0.49 | 28.21 | T00211750 | 0.81 | 1.29 | T00208091 | 0.84 | 0.27 |
| T00208035 | 0.76 | 11.11 | T00211760 | 0.81 | 1.16 | T00208342 | 0.83 | 0.26 |
| T00237506 | 0.78 | 5.32 | T00210855 | 0.83 | 1.14 | T00210907 | 0.84 | 0.23 |
| T00208021 | 0.76 | 4.27 | T00208020 | 0.8 | 1.03 | T00208057 | 0.82 | 0.22 |
| T00237505 | 0.78 | 4.16 | T00216483 | 0.83 | 0.95 | T00210996 | 0.83 | 0.19 |
| T00237508 | 0.78 | 4.07 | T00207911 | 0.83 | 0.91 | T00210869 | 0.84 | 0.18 |
| T00236213 | 0.79 | 3.7 | T00209843 | 0.82 | 0.84 | T00210856 | 0.83 | 0.16 |
| T00245873 | 0.82 | 2.75 | T00210891 | 0.83 | 0.72 | T00207898 | 0.82 | 0.11 |
| T00225122 | 0.79 | 2.63 | T00208076 | 0.81 | 0.69 | T00210880 | 0.83 | 0 |
| T00209817 | 0.8 | 2.34 | T00216489 | 0.83 | 0.68 | | | |
The average correlation coefficient for each array was computed by averaging the correlation coefficients of that array with every other array. The percentage of outlier spots on an array was computed by dividing the number of outlier spots by the total number of probes involved in the analysis. Data points with a resistant z-score below −3 or above 3 were counted as outlier spots.
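The two per-array metrics described above can be sketched in a few lines of Python. This is an illustration only, not the authors' MOF implementation. The function names are hypothetical. The resistant z-score is assumed here to be the common median/MAD form, (x − median) / (1.4826 × MAD), which this section does not spell out, and the input to `percent_outlier_spots` is assumed to be one numeric value per probe. The correlation values are the six selected HUR arrays from the table above, so the averages differ from those reported for the full set of 35 arrays.

```python
from statistics import median


def average_correlations(corr):
    """Average each array's correlation with every other array.

    `corr` is a square correlation matrix (list of rows); the diagonal
    (an array's correlation with itself) is excluded from the average.
    """
    n = len(corr)
    return [
        sum(corr[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def resistant_z_scores(values):
    """ASSUMPTION: resistant z-score = (x - median) / (1.4826 * MAD).

    The source only names the statistic; this median/MAD form is a
    common definition and may differ from the one the authors used.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; resistant z-score is undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def percent_outlier_spots(values, cutoff=3.0):
    """Percentage of probes whose resistant z-score is below -cutoff or above cutoff."""
    z = resistant_z_scores(values)
    n_outliers = sum(1 for s in z if s < -cutoff or s > cutoff)
    return 100.0 * n_outliers / len(values)


# Correlation coefficients for the six selected HUR arrays (from the table above).
HUR_SUBSET = [
    [1.00, 0.85, 0.87, 0.78, 0.90, 0.87],
    [0.85, 1.00, 0.86, 0.93, 0.83, 0.83],
    [0.87, 0.86, 1.00, 0.87, 0.92, 0.90],
    [0.78, 0.93, 0.87, 1.00, 0.81, 0.83],
    [0.90, 0.83, 0.92, 0.81, 1.00, 0.88],
    [0.87, 0.83, 0.90, 0.83, 0.88, 1.00],
]

# Hypothetical per-probe values for one array (made up for illustration).
EXAMPLE_PROBE_VALUES = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, 5.0]

if __name__ == "__main__":
    print(average_correlations(HUR_SUBSET))
    print(percent_outlier_spots(EXAMPLE_PROBE_VALUES))
```

An array would then be flagged when its average correlation is low and its outlier percentage is high relative to the rest of the dataset; the thresholds for that decision are not given in this section and are left out of the sketch.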
Figure 1. Clustering of the HUR arrays. The correlation coefficient table for the 35 HUR arrays was clustered by hierarchical clustering and displayed as a heat map using Spotfire. The color scale runs from red (low correlation coefficient) to green (high correlation coefficient). The 3 outlier arrays were clustered in red.
Figure 2. Biased spatial distribution of spots on array T00237520. (A) Scatter plot of log-transformed intensity values, with the x-axis showing the values from this array and the y-axis those from the model array. The highlighted spots (in black) are probes showing consistent and strong signals (intensity > 2000) in both arrays. (B) The physical locations of the probes on the array. Probes highlighted in (A) were concentrated in 4 blocks (black dots). (C) Scatter plot similar to (A), but the highlighted probes are those whose signal is strong on one array but weaker on the other. (D) The probes with inconsistent performance were distributed mostly over the other 12 blocks.
Figure 3. Visualization of the correlation between the 185 human blood samples (left panel) and the percentage of outlier spots on each of these arrays (right panel). The correlation table of the human blood samples is displayed as a heat map, with red representing low correlation and green representing high correlation between a pair of arrays. The percentage of outlier spots for each array is also shown as a heat map on the right, with red and green standing for high and low percentages of outlier spots, respectively. The arrays are in the same order from top to bottom in both heat maps, and the subject categories are shown on the left side of the figure. Marked in dark green are the 12 arrays that were flagged as failed, as described in the text.