Jaroslav P Novak, Seon-Young Kim, Jun Xu, Olga Modlich, David J Volsky, David Honys, Joan L Slonczewski, Douglas A Bell, Fred R Blattner, Eduardo Blumwald, Marjan Boerma, Manuel Cosio, Zoran Gatalica, Marian Hajduch, Juan Hidalgo, Roderick R McInnes, Merrill C Miller, Milena Penkowa, Michael S Rolph, Jordan Sottosanto, Rene St-Arnaud, Michael J Szego, David Twell, Charles Wang.
Abstract
BACKGROUND: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of dispersion of DNA microarray data.
Year: 2006 PMID: 16959036 PMCID: PMC1586001 DOI: 10.1186/1745-6150-1-27
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Illustration of the consecutive sampling procedure
| Rank | Probe set | Sample Y1 | Sample Y2 | Y2-Y1 | (Y2+Y1)/2 | Sample Mean | SD(Y2-Y1) | SD(Y1)+SD(Y2) |
| ... | ... | ... | ... | ... | ||||
| 251 | J03040_at | 628 | 614 | -14 | 621 | 614.4 | 71.1 | 71.8 |
| 252 | M26880_at | 657 | 583 | -74 | 620 | |||
| 253 | HG384-HT384_at | 577 | 662 | 86 | 619 | |||
| 254 | X04654_s_at | 633 | 604 | -29 | 619 | |||
| 255 | J04046_s_at | 554 | 680 | 126 | 617 | |||
| 256 | X69908_rna1_at | 593 | 640 | 47 | 617 | |||
| 257 | D85758_at | 672 | 555 | -117 | 614 | |||
| 258 | L12168_at | 633 | 592 | -41 | 612 | |||
| 259 | HG1614-HT1614_at | 590 | 633 | 43 | 611 | |||
| 260 | X71428_at | 571 | 649 | 77 | 610 | |||
| 261 | S75463_at | 602 | 615 | 13 | 608 | |||
| 262 | X69910_at | 579 | 630 | 50 | 604 | |||
| 263 | X57346_at | 597 | 610 | 13 | 603 | 590.1 | 136.2 | 137.0 |
| 264 | U01691_s_at | 576 | 630 | 54 | 603 | |||
| 265 | X17620_at | 605 | 594 | -11 | 600 | |||
| 266 | U10323_at | 562 | 617 | 56 | 590 | |||
| 267 | AJ001421_at | 413 | 766 | 354 | 589 | |||
| 268 | X62654_rna1_at | 576 | 602 | 26 | 589 | |||
| 269 | D64142_at | 666 | 510 | -156 | 588 | |||
| 270 | D21063_at | 562 | 613 | 51 | 588 | |||
| 271 | X16560_at | 588 | 580 | -8 | 584 | |||
| 272 | D26600_at | 580 | 586 | 6 | 583 | |||
| 273 | M19267_s_at | 599 | 566 | -33 | 583 | |||
| 274 | J02621_s_at | 688 | 475 | -213 | 582 | |||
| ... | ... | ... | ... | ... | ... | |||
Rank is the rank of the probe set, counted from the highest mean expression. The columns "Sample Y1" and "Sample Y2" give the expression values of the probe set on the two arrays, Y2-Y1 is the difference of the expressions and (Y2+Y1)/2 is the mean expression of the probe set over the two arrays. Each consecutive sample consists of 12 probe sets (ranks 251 to 262 and 263 to 274 in the excerpt shown), and its statistics are listed in the first row of the sample. Sample Mean is the mean expression of the sample, SD(Y2-Y1) is the standard deviation of the expression differences and SD(Y1)+SD(Y2) is the sum of the standard deviations calculated from the Y1 and Y2 values, respectively. The first 250 probe sets (ranks 1 to 250) are excluded to keep the variation of the mean expression within each sample small.
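The consecutive sampling procedure illustrated in the table can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `consecutive_samples` and its `skip` and `size` parameters are hypothetical, and the standard deviations use the n - 1 denominator, an assumption that reproduces the tabulated values to within rounding.

```python
import statistics


def consecutive_samples(y1, y2, skip=250, size=12):
    """Group probe sets into consecutive samples ordered by mean expression.

    y1, y2 -- expression values of the same probe sets on two arrays.
    skip   -- number of highest-expressed probe sets to exclude (hypothetical name).
    size   -- number of probe sets per consecutive sample.
    Returns one dict of summary statistics per complete sample.
    """
    # Rank probe sets from the highest to the lowest mean expression (Y1 + Y2) / 2.
    order = sorted(range(len(y1)), key=lambda i: (y1[i] + y2[i]) / 2, reverse=True)
    order = order[skip:]
    results = []
    # Walk down the ranking in non-overlapping blocks; an incomplete last block is dropped.
    for start in range(0, len(order) - size + 1, size):
        a = [y1[i] for i in order[start:start + size]]
        b = [y2[i] for i in order[start:start + size]]
        diffs = [yb - ya for ya, yb in zip(a, b)]
        results.append({
            "sample_mean": (statistics.mean(a) + statistics.mean(b)) / 2,
            "sd_diff": statistics.stdev(diffs),                    # SD(Y2 - Y1)
            "sd_sum": statistics.stdev(a) + statistics.stdev(b),  # SD(Y1) + SD(Y2)
        })
    return results


# Ranks 251-274 from the table; skip=0 because the excerpt already starts below rank 250.
Y1 = [628, 657, 577, 633, 554, 593, 672, 633, 590, 571, 602, 579,
      597, 576, 605, 562, 413, 576, 666, 562, 588, 580, 599, 688]
Y2 = [614, 583, 662, 604, 680, 640, 555, 592, 633, 649, 615, 630,
      610, 630, 594, 617, 766, 602, 510, 613, 580, 586, 566, 475]
res = consecutive_samples(Y1, Y2, skip=0)
```

With the rounded values printed in the table, the first sample gives a sample mean of about 614.4, SD(Y2 - Y1) of about 71.1 and SD(Y1) + SD(Y2) of about 71.8, matching the row for rank 251.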
Percentage of samples failing the Kolmogorov-Smirnov normality test
| Array | Materials | No. of arrays | No. of probe sets | Threshold | % failure (total) | % failure (above) | % failure (below) |
| Affy. HuGeneFL | human cell line SKBR [a] | 5 | 7070 | 2.7 | 7.2 | 6.9 | 7.8 |
| Affy. HuGeneFL | human cell line IMR90 [a] | 11 | 7070 | 4.1 | 6.3 | 6.5 | 5.9 |
| Affy. U74Av2 | murine lung tissue [b] | 5 | 12422 | 10.0 | 6.1 | 6.6 | 5.5 |
| Affy. U74Av2 | murine lung tissue [c] | 5 | 12422 | 13.6 | 7.6 | 8.3 | 6.4 |
| Affy. U74Av2 | murine lung tissue [d] | 11 | 12422 | 14.2 | 10.6 | 10.6 | 10.7 |
| Affy. Focus | human blood cell line [e] | 9 | 8746 | 5.0 | 6.14 | 6.14 | 6.14 |
| Illumina 1 | human cell line GM10469 [f] | 4 | 633 | 2.1 | 4.6 | 3.9 | 6.2 |
| Illumina 2 | human cell line GM10469 [f] | 4 | 633 | 3.6 | 6.5 | 6.6 | 6.2 |
| Average | --- | --- | --- | --- | 6.9 | 6.9 | 6.9 |
Percentage of samples failing the Kolmogorov-Smirnov normality test at the level P = 0.05. All arrays are normalized to 100% of the mean value. The columns % failure (above) and % failure (below) give the percentage of failures among the probe sets above and below the specified threshold, respectively.
[a] data Ref. [10].
[b] C57BL/6 (B6) WT mice, data Ref. [15].
[c] C57BL/6-Cftr-/- KO inbred mice, data Ref. [15].
[d] data M. Cosio.
[e] data O. Modlich and S. Raschke.
[f] lymphoblast cell line GM10469 [8].
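The normality check summarized in the table can be sketched as below, assuming that the Kolmogorov-Smirnov statistic D is computed against a normal distribution whose mean and standard deviation are estimated from the replicate values of each probe set. The critical value `d_crit` is left to the caller because the excerpt does not state which table was used; when the parameters are estimated from the same data, the classical Kolmogorov-Smirnov critical values are conservative (the Lilliefors variant corrects for this). The function names and the grouping of probe sets by their mean expression relative to `threshold` are illustrative assumptions.

```python
import math
import statistics


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma**2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance(values):
    """Maximum distance D between the empirical CDF of `values` and a normal CDF
    with mean and SD estimated from the same values (requires a non-zero SD)."""
    x = sorted(values)
    n = len(x)
    mu, sigma = statistics.mean(x), statistics.stdev(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = normal_cdf(xi, mu, sigma)
        # The empirical CDF jumps from (i - 1)/n to i/n at xi; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def failure_percentages(probe_sets, threshold, d_crit):
    """Percentage of probe sets with D > d_crit, overall and split by whether
    the mean of the replicate values lies above or below `threshold`."""
    counts = {"total": [0, 0], "above": [0, 0], "below": [0, 0]}  # [failed, tested]
    for values in probe_sets:
        failed = ks_distance(values) > d_crit
        group = "above" if statistics.mean(values) > threshold else "below"
        for key in ("total", group):
            counts[key][0] += int(failed)
            counts[key][1] += 1
    return {key: 100.0 * f / n if n else float("nan") for key, (f, n) in counts.items()}
```

Under the null hypothesis of normality and a correctly calibrated `d_crit`, about 5% of probe sets would be expected to fail at P = 0.05.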
Figure 1. Comparison of the observed frequency distribution with the inverse normal cumulative distribution. The quantile-quantile plots show the observed expression on the y-axis and the value of the corresponding inverse normal cumulative distribution on the x-axis. Microarray data are derived from HuGeneFL arrays, using the IMR90 cell line with 11 samples. Panels A to C show probe sets whose Kolmogorov-Smirnov maximum distance D is equal or close to the mean value in the specified average expression range. Insets give the Affymetrix probe set identifier, the average expression of the given gene and the standard deviation. A: probe set HG2279-HT2375_at, rank 43, expression range from 1000 to 6681 (high range, maximum), average D in the range 0.176, sample D 0.176; B: probe set Z23091_rna1_at, rank 5484, expression range from -0.4 to 0.4 (near-zero range), average D in the range 0.181, sample D 0.182; C: probe set X95876_at, rank 7003, expression range from -923 to -20 (negative range, minimum), average D in the range 0.183, sample D 0.182; D: example of a probe set that failed the test, probe set M14199_s_at, rank 25, sample D 0.204 (data Novak et al., IMR90 [10]).
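The coordinates of a normal quantile-quantile plot like those in Figure 1 can be computed as in the sketch below. The plotting positions (i - 0.5)/n are an assumption, since the caption does not state which were used; `statistics.NormalDist.inv_cdf` (Python 3.8+) supplies the inverse normal cumulative distribution.

```python
from statistics import NormalDist


def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot.

    The i-th smallest observation (1-based) is paired with the standard normal
    quantile at the plotting position (i - 0.5) / n (an assumed convention).
    """
    x = sorted(values)
    n = len(x)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [(std_normal.inv_cdf((i - 0.5) / n), xi) for i, xi in enumerate(x, start=1)]


points = qq_points([3.0, 1.0, 2.0])
```

For normally distributed data the points lie close to a straight line whose intercept and slope estimate the mean and the standard deviation.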
Figure 2. Comparison of the observed frequency distribution with the inverse normal cumulative distribution, pooled data. The quantile-quantile plots show the observed expression on the y-axis and the value of the corresponding inverse normal cumulative distribution on the x-axis. Microarray data are derived from HuGeneFL arrays, using the IMR90 cell line with 11 samples, pooled data. A: expression range from -0.1 to 0.1, 62 probe sets; B: expression range from 500 to 1000, 185 probe sets; C: expression range from 500 to 1000, 185 probe sets, relative expression values (sample expression divided by the mean of the 11 samples; data Novak et al. [10]).
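Panel C of Figure 2 pools relative expression values (each sample expression divided by the mean of the 11 samples of its probe set) across all probe sets in an expression range. A minimal sketch of that pooling step follows; the dictionary layout, the function name and the half-open range [low, high) are assumptions, and the division is only meaningful for ranges well away from zero expression.

```python
import statistics


def pooled_relative_values(expression, low, high):
    """Pool relative expression values over probe sets whose mean lies in [low, high).

    expression -- dict mapping a probe set identifier to its replicate values.
    Each value is divided by the mean of its own probe set before pooling.
    """
    pooled = []
    for values in expression.values():
        m = statistics.mean(values)
        if low <= m < high:
            pooled.extend(v / m for v in values)
    return pooled


demo = {"probe_a": [400.0, 600.0], "probe_b": [900.0, 1100.0], "probe_c": [5000.0, 5000.0]}
pooled = pooled_relative_values(demo, low=500, high=1000)
```

The pooled list can then be plotted against normal quantiles in the same way as the values of a single probe set.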
Figure 3. Comparison of the observed frequency distribution of consecutive samples with the inverse normal cumulative distribution. The quantile-quantile plots show the difference of expression between two microarrays on the y-axis and the value of the corresponding inverse normal cumulative distribution on the x-axis. Microarray data are derived from HuGeneFL arrays, using the IMR90 cell line [10]. Probe sets of microarrays 1 and 3 are ordered according to the mean expression, and statistical samples of 12 probe sets are taken in the range of ranks from 250 to 4800. Panels show the samples whose Kolmogorov-Smirnov maximum distance D is equal or close to the mean value in the specified average expression range. Insets give the average mean expression (range avg.), the mean of the differences (s. avg.) and the standard deviation (s. SD). A: expression range from 400 to 620, average D in the range 0.142, sample D 0.142; B: expression range from 10 to 20, average D in the range 0.204, sample D 0.204.
Figure 4. Dispersion of the murine tissue data, array MG-U74Av2, samples MT4-07 and MT4-08. A: Dispersion plot and boundaries of the 0.8 and 0.95 probability intervals. B: Standard deviations calculated using the expression difference in consecutive samples and the regression curve (solid line), representing the standard deviation function (data Cosio).
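Figure 4B fits a standard deviation function to the standard deviations of consecutive samples, and Figure 4A draws probability-interval boundaries around the expression differences. The sketch below assumes the linear form sigma(x) = a1 + a2·x, suggested by the coefficients a1 and a2 tabulated below and by the "linear standard deviation function" of Figure 5, and boundaries at ±Kα·sigma(x). It uses an ordinary least-squares fit for simplicity; the Figure 5 caption indicates that the regression was carried out on logarithms, so coefficients from this sketch may differ. The function names are hypothetical.

```python
def fit_sd_function(means, sds):
    """Ordinary least-squares fit of sd = a1 + a2 * mean; returns (a1, a2).

    means -- mean expression of each consecutive sample
    sds   -- standard deviation of the expression differences in that sample
    """
    n = len(means)
    mx = sum(means) / n
    my = sum(sds) / n
    sxx = sum((x - mx) ** 2 for x in means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, sds))
    a2 = sxy / sxx
    a1 = my - a2 * mx
    return a1, a2


def interval_bounds(mean_expression, a1, a2, k_alpha):
    """Assumed lower and upper boundaries, -K*sigma(x) and +K*sigma(x), for Y2 - Y1."""
    half_width = k_alpha * (a1 + a2 * mean_expression)
    return -half_width, half_width


a1, a2 = fit_sd_function([10.0, 100.0, 1000.0], [6.0, 24.0, 204.0])
```

Points lying exactly on sd = 4 + 0.2·mean recover a1 = 4 and a2 = 0.2; with real consecutive-sample data the fit only approximates the scatter in Figure 4B.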
Summary of values of the coefficients of the standard deviation function and Kα coefficients
| GeneChip/average/SD/CV | No. of probe sets | Cons. samp. size | No. of arrays | No. of comp. | a1 (SD function) | a2 (SD function) | Kα (0.500) | Kα (0.600) | Kα (0.700) | Kα (0.800) | Kα (0.900) | Kα (0.950) | Kα (0.990) | Kα (0.995) |
| HuGeneFL [a1] | 7070 | 12 | 11 | 54 | 8.61 | 0.115 | 0.64 | 0.81 | 1.01 | 1.27 | 1.71 | 2.16 | 3.41 | 4.09 |
| HuGeneFL [b] | 7070 | 12 | 13 | 78 | 8.09 | 0.418 | 0.61 | 0.77 | 0.97 | 1.24 | 1.73 | 2.24 | 3.58 | 4.18 |
| HuGeneFL [c] | 7070 | 25 | 5 | 10 | 34.58 | 0.435 | 0.61 | 0.76 | 0.94 | 1.20 | 1.60 | 2.03 | 3.39 | 4.10 |
| HuGeneFL [c] | 7070 | 25 | 5 | 10 | 28.93 | 0.393 | 0.59 | 0.73 | 0.92 | 1.18 | 1.60 | 2.07 | 3.42 | 4.07 |
| HG-U95Av2 [d] | 12559 | 12 | 5 | 10 | 6.30 | 0.688 | 0.61 | 0.78 | 1.00 | 1.30 | 1.83 | 2.32 | 2.98 | 3.26 |
| HG-U95Av2 [e1] | 12559 | 25 | 15 | 15 | 10.49 | 0.199 | 0.65 | 0.82 | 1.01 | 1.27 | 1.68 | 2.07 | 3.05 | 3.55 |
| HG-U95Av2 [e2] | 12559 | 25 | 15 | 15 | 10.59 | 0.200 | 0.64 | 0.81 | 1.01 | 1.26 | 1.67 | 2.07 | 3.07 | 3.56 |
| HG-U95Av2 [e3] | 12559 | 25 | 15 | 15 | 10.41 | 0.189 | 0.63 | 0.80 | 1.00 | 1.25 | 1.66 | 2.07 | 3.06 | 3.55 |
| HG-U95Av2 [e4] | 12559 | 25 | 12 | 12 | 10.91 | 0.185 | 0.63 | 0.79 | 0.99 | 1.24 | 1.66 | 2.04 | 3.02 | 3.50 |
| HG-U95Av2 [e5] | 12559 | 25 | 4 | 2 | 6.50 | 0.394 | 0.65 | 0.82 | 1.01 | 1.26 | 1.65 | 2.05 | 3.05 | 3.62 |
| HG-U95Av2 [f] | 12559 | 25 | 4 | 2 | 5.64 | 0.155 | 0.64 | 0.81 | 1.00 | 1.25 | 1.66 | 2.04 | 2.92 | 3.31 |
| HG-U95Av2 [g1] | 12559 | 25 | 2 | 1 | 4.79 | 0.479 | 0.61 | 0.76 | 0.95 | 1.20 | 1.64 | 2.10 | 3.22 | 3.71 |
| HG-U95Av2 [g1] | 12559 | 25 | 5 | 10 | 3.31 | 0.500 | 0.64 | 0.80 | 1.00 | 1.26 | 1.69 | 2.09 | 2.98 | 3.29 |
| HG-U95B [d] | 12563 | 12 | 5 | 10 | 16.26 | 0.636 | 0.63 | 0.80 | 1.01 | 1.30 | 1.80 | 2.28 | 3.18 | 3.68 |
| HG-U95B [e5] | 12563 | 25 | 2 | 1 | 17.95 | 0.167 | 0.64 | 0.80 | 0.99 | 1.24 | 1.65 | 2.02 | 3.08 | 3.54 |
| HG-U95C [d] | 12587 | 12 | 5 | 10 | 20.57 | 0.603 | 0.64 | 0.81 | 1.02 | 1.30 | 1.77 | 2.25 | 3.39 | 3.98 |
| HG-U95C [e5] | 12587 | 25 | 2 | 1 | 16.42 | 0.178 | 0.63 | 0.79 | 0.99 | 1.23 | 1.62 | 2.00 | 3.06 | 3.61 |
| HG-U95D [d] | 12587 | 12 | 5 | 10 | 39.79 | 0.501 | 0.62 | 0.78 | 0.99 | 1.28 | 1.77 | 2.33 | 3.80 | 4.42 |
| HG-U95D [e5] | 12587 | 25 | 2 | 1 | 54.45 | 0.240 | 0.63 | 0.79 | 0.99 | 1.23 | 1.62 | 2.05 | 2.98 | 3.48 |
| HG-U95E [d] | 12582 | 12 | 5 | 10 | 31.84 | 0.534 | 0.60 | 0.76 | 0.97 | 1.27 | 1.78 | 2.29 | 3.53 | 4.08 |
| HG-U95E [e5] | 12582 | 25 | 2 | 1 | 45.64 | 0.215 | 0.63 | 0.79 | 1.00 | 1.25 | 1.64 | 2.02 | 2.88 | 3.31 |
| HG-U133A 2.0 [e6] | 22225 | 25 | 15 | 15 | 4.55 | 0.091 | 0.65 | 0.82 | 1.02 | 1.28 | 1.72 | 2.14 | 3.14 | 3.62 |
| HG-U133A 2.0 [e7] | 22225 | 25 | 15 | 15 | 4.50 | 0.106 | 0.66 | 0.84 | 1.04 | 1.31 | 1.75 | 2.17 | 3.18 | 3.63 |
| HG-U133A 2.0 [e8] | 22225 | 25 | 12 | 12 | 4.10 | 0.108 | 0.67 | 0.84 | 1.05 | 1.32 | 1.77 | 2.21 | 3.26 | 3.75 |
| HG-U133A 2.0 [h1] | 22225 | 25 | 8 | 4 | 3.91 | 0.288 | 0.64 | 0.80 | 1.00 | 1.26 | 1.67 | 2.08 | 3.05 | 3.54 |
| HG-U133A 2.0 [i1] | 22225 | 12 | 4 | 6 | 6.25 | 0.210 | 0.65 | 0.82 | 1.01 | 1.27 | 1.68 | 2.08 | 3.04 | 3.48 |
| HG-U133A 2.0 [j] | 22225 | 25 | 5 | 10 | 6.54 | 0.390 | 0.62 | 0.79 | 0.98 | 1.25 | 1.68 | 2.10 | 3.13 | 3.62 |
| HG-U133A 2.0 [j] | 22225 | 25 | 5 | 10 | 5.54 | 0.393 | 0.62 | 0.79 | 0.98 | 1.25 | 1.66 | 2.08 | 3.10 | 3.64 |
| HG-U133A 2.0 [k1] | 22225 | 25 | 6 | 15 | 3.68 | 0.672 | 0.63 | 0.80 | 1.00 | 1.27 | 1.71 | 2.11 | 2.84 | 3.16 |
| HG-U133A 2.0 [k1] | 22225 | 25 | 3 | 3 | 4.95 | 0.435 | 0.59 | 0.74 | 0.93 | 1.19 | 1.65 | 2.15 | 3.37 | 3.79 |
| HG-U133A 2.0 [l1] | 22225 | 25 | 6 | 3 | 6.45 | 0.151 | 0.64 | 0.81 | 1.01 | 1.27 | 1.68 | 2.08 | 3.06 | 3.55 |
| HG-U133A 2.0 [l2] | 22225 | 25 | 12 | 6 | 6.78 | 0.148 | 0.65 | 0.82 | 1.02 | 1.27 | 1.67 | 2.06 | 2.93 | 3.33 |
| HG-U133 Plus 2 [i2] | 54000 | 50 | 8 | 10 | 3.94 | 0.188 | 0.68 | 0.85 | 1.06 | 1.33 | 1.76 | 2.17 | 3.11 | 3.55 |
| HG-U133 Plus 2 [l3] | 54000 | 50 | 20 | 27 | 4.03 | 0.086 | 0.65 | 0.82 | 1.02 | 1.28 | 1.69 | 2.07 | 2.95 | 3.34 |
| HG-Focus [k2] | 8756 | 12 | 9 | 36 | 4.13 | 0.216 | 0.70 | 0.87 | 1.08 | 1.35 | 1.76 | 2.15 | 3.06 | 3.47 |
| HG-Focus [k2] | 8756 | 12 | 4 | 6 | 4.13 | 0.181 | 0.68 | 0.86 | 1.07 | 1.33 | 1.76 | 2.18 | 3.08 | 3.56 |
| HG-Focus [k2] | 8756 | 12 | 4 | 6 | 3.90 | 0.198 | 0.70 | 0.87 | 1.08 | 1.36 | 1.81 | 2.22 | 3.15 | 3.55 |
| HG-Focus [k2] | 8756 | 12 | 4 | 6 | 3.69 | 0.176 | 0.68 | 0.86 | 1.07 | 1.35 | 1.79 | 2.19 | 3.17 | 3.64 |
| HG-Focus [k2] | 8756 | 12 | 5 | 10 | 3.46 | 0.183 | 0.67 | 0.85 | 1.05 | 1.33 | 1.77 | 2.19 | 3.12 | 3.49 |
| HG-Focus [k2] | 8756 | 12 | 4 | 6 | 4.01 | 0.205 | 0.71 | 0.89 | 1.10 | 1.38 | 1.81 | 2.19 | 3.11 | 3.53 |
| HG-Focus [k2] | 8756 | 12 | 4 | 6 | 3.98 | 0.174 | 0.68 | 0.85 | 1.06 | 1.32 | 1.74 | 2.14 | 3.06 | 3.43 |
| MG-Mu11kSubA, SubB [a] | 13069 | 12 | 10 | 20 | 9.98 | 0.121 | 0.59 | 0.75 | 0.95 | 1.22 | 1.70 | 2.20 | 3.65 | 4.48 |
| MG-Mu11kSubA, SubB [a] | 13069 | 12 | 10 | 20 | 8.03 | 0.170 | 0.59 | 0.74 | 0.94 | 1.20 | 1.68 | 2.19 | 3.78 | 4.66 |
| MG-Mu11kSubA, SubB [a] | 13069 | 12 | 10 | 20 | 8.21 | 0.145 | 0.60 | 0.76 | 0.96 | 1.23 | 1.70 | 2.21 | 3.70 | 4.52 |
| MG-Mu11kSubA, SubB [a] | 13069 | 12 | 10 | 20 | 5.32 | 0.139 | 0.56 | 0.71 | 0.91 | 1.19 | 1.71 | 2.30 | 4.03 | 4.84 |
| MG-Mu11kSubA, SubB [m1] | 13069 | 12 | 8 | 4 | 13.86 | 0.321 | 0.64 | 0.81 | 1.01 | 1.28 | 1.73 | 2.22 | 3.62 | 4.24 |
| MG-Mu11kSubA, SubB [n1] | 13069 | 12 | 20 | 10 | 11.84 | 0.420 | 0.66 | 0.82 | 1.01 | 1.26 | 1.71 | 2.21 | 3.61 | 4.32 |
| Mu19kSubA, B, C [m2] | 12420 | 12 | 12 | 6 | 15.41 | 0.314 | 0.63 | 0.79 | 0.99 | 1.25 | 1.73 | 2.26 | 3.95 | 4.56 |
| MG-U74Av2 [o] | 12588 | 12 | 6 | 6 | 8.72 | 0.180 | 0.59 | 0.75 | 0.95 | 1.23 | 1.70 | 2.20 | 3.56 | 4.30 |
| MG-U74Av2 [o] | 12588 | 12 | 5 | 4 | 6.97 | 0.230 | 0.58 | 0.74 | 0.94 | 1.23 | 1.71 | 2.24 | 3.68 | 4.48 |
| MG-U74Av2 [p] | 12588 | 12 | 7 | 21 | 9.50 | 0.125 | 0.59 | 0.75 | 0.95 | 1.24 | 1.72 | 2.21 | 3.54 | 4.30 |
| MG-U74Av2 [q] | 12588 | 12 | 5 | 4 | 4.97 | 0.229 | 0.68 | 0.85 | 1.06 | 1.33 | 1.76 | 2.19 | 3.23 | 3.76 |
| MG-U74Av2 [l4] | 12588 | 12 | 9 | 9 | 7.50 | 0.111 | 0.67 | 0.83 | 1.03 | 1.30 | 1.73 | 2.13 | 3.09 | 3.53 |
| MG-U74Av2 [r] | 12588 | 12 | 10 | 20 | 4.69 | 0.269 | 0.65 | 0.82 | 1.02 | 1.29 | 1.73 | 2.17 | 3.31 | 3.89 |
| MG-U74Av2 [r] | 12588 | 12 | 3 | 3 | 3.30 | 0.238 | 0.64 | 0.80 | 1.00 | 1.27 | 1.71 | 2.17 | 3.48 | 4.04 |
| MG-U74Av2 [e5] | 12588 | 12 | 2 | 1 | 3.83 | 0.184 | 0.64 | 0.81 | 1.00 | 1.27 | 1.69 | 2.10 | 3.15 | 3.80 |
| MG-U74Av2 [g2] | 12588 | 12 | 2 | 1 | 13.50 | 0.451 | 0.64 | 0.81 | 1.01 | 1.27 | 1.72 | 2.16 | 3.36 | 3.95 |
| MG-U74Av2 [n2] | 12400 | 12 | 26 | 13 | 6.56 | 0.113 | 0.64 | 0.80 | 1.00 | 1.27 | 1.70 | 2.12 | 3.15 | 3.67 |
| MG-U430A [l5] | 22636 | 25 | 10 | 5 | 7.68 | 0.132 | 0.65 | 0.81 | 1.01 | 1.27 | 1.68 | 2.06 | 2.94 | 3.36 |
| MG-U430A [l6] | 22636 | 25 | 5 | 10 | 10.08 | 0.265 | 0.67 | 0.84 | 1.04 | 1.30 | 1.71 | 2.10 | 2.98 | 3.35 |
| MG-U430A [l6] | 22636 | 25 | 5 | 10 | 9.44 | 0.160 | 0.65 | 0.81 | 1.01 | 1.27 | 1.67 | 2.05 | 2.93 | 3.30 |
| RG-U34A [h2] | 8740 | 12 | 35 | 34 | 1.82 | 0.316 | 0.64 | 0.81 | 1.01 | 1.28 | 1.73 | 2.21 | 3.40 | 3.95 |
| RG-U34A [l7] | 8740 | 12 | 6 | 3 | 3.25 | 0.226 | 0.68 | 0.85 | 1.06 | 1.33 | 1.76 | 2.17 | 3.20 | 3.77 |
| RG-U34A [l8] | 8740 | 12 | 4 | 2 | 6.01 | 0.146 | 0.65 | 0.83 | 1.03 | 1.29 | 1.72 | 2.11 | 3.10 | 3.60 |
| E. coli [t] | 7290 | 12 | 38 | 39 | 3.09 | 0.337 | 0.65 | 0.83 | 1.04 | 1.32 | 1.78 | 2.24 | 3.45 | 4.03 |
| E. coli [u] | 7290 | 12 | 15 | 30 | 1.88 | 0.302 | 0.65 | 0.83 | 1.04 | 1.34 | 1.81 | 2.29 | 3.47 | 4.04 |
| ATH1 [v1] | 22700 | 25 | 14 | 17 | 9.22 | 0.307 | 0.68 | 0.85 | 1.05 | 1.31 | 1.71 | 2.09 | 2.95 | 3.31 |
| ATH1 [v1] | 22700 | 25 | 34 | 36 | 11.18 | 0.269 | 0.68 | 0.85 | 1.05 | 1.31 | 1.71 | 2.08 | 2.91 | 3.26 |
| ATH1 [w] | 22700 | 25 | 8 | 4 | 6.26 | 0.279 | 0.65 | 0.81 | 1.01 | 1.27 | 1.69 | 2.10 | 3.06 | 3.45 |
| ATH1 [x] | 22700 | 25 | 4 | 2 | 3.09 | 0.232 | 0.68 | 0.85 | 1.05 | 1.31 | 1.72 | 2.12 | 3.17 | 3.70 |
| ATH1 [y] | 22700 | 25 | 4 | 2 | 3.07 | 0.247 | 0.66 | 0.82 | 1.02 | 1.28 | 1.68 | 2.08 | 3.09 | 3.60 |
| Arabidopsis [v2] | 8200 | 12 | 7 | 8 | 7.69 | 0.403 | 0.75 | 0.94 | 1.14 | 1.40 | 1.79 | 2.15 | 2.89 | 3.18 |
No. of probe sets is the approximate number of probe sets on the array, Cons. samp. size is the number of probe sets in a consecutive sample, No. of arrays is the number of arrays tested, No. of comp. is the number of pair-wise comparisons among the replicates, a1 and a2 are the coefficients of the standard deviation function and Kα is the coefficient determining the probability interval; "h." stands for "human," "m." for "murine." Average values, sums of arrays and comparisons (in square brackets), standard deviations (SD) and coefficients of variation (CV) of each GeneChip type are printed in bold italics.
Data sources:
a1 – J. P. Novak et al., HuGeneFL, IMR90 human cell line [10].
a2 – J. P. Novak et al., HuGeneFL, mouse tissues, adult male C57BL/6 [10].
b – A.-M. Mes-Masson, P. Tonin and coworkers, HuGeneFL, normal ovarian surface epithelial (NOSE) primary cell cultures, private communication.
c – P. Permana, HuGeneFL, human skeletal muscle tissue [35].
d – P. Tonin and A.-M. Mes-Masson and coworkers, HG-U95A to E, epithelial ovarian cancer (EOC) cell line [36].
e1 – Affymetrix, HG-U95A, Latin square, experiments 1 to 5.
e2 – Affymetrix, HG-U95A, Latin square, experiments 6 to 10.
e3 – Affymetrix, HG-U95A, Latin square, experiments 11 to 15.
e4 – Affymetrix, HG-U95A, Latin square, experiments 16 to 19.
e5 – Affymetrix, HG-U95A to E, Demo Data.
e6 – Affymetrix, HG-U133A, Latin square, experiments 1 to 5.
e7 – Affymetrix, HG-U133A, Latin square, experiments 6 to 10.
e8 – Affymetrix, HG-U133A, Latin square, experiments 11 to 14.
f – M. S. Rolph, HG-U95Av2, primary human bronchial epithelial cells.
g1 – Z. Gatalica, HG-U95Av2, breast tumor tissues and normal breast tissue samples [37].
g2 – Z. Gatalica, MG-U74Av2, mouse kidney tissue.
h1 – M. Boerma, HG-U133A 2.0, primary human umbilical vein endothelial cells (HUVECs) and the immortalized HUVEC cell line EA.hy926 [38].
h2 – M. Boerma, RG-U34A, cultures enriched for neonatal rat cardiac myocytes or fibroblasts [39].
i1 – M. Hajduch, HG-U133A 2.0.
i2 – M. Hajduch, HG-U133 plus 2.
j – S. Y. Kim and D. J. Volsky, HG-U133A 2.0, human fetal astrocytes, normal and pseudotyped HIV-1 infected [40].
k1 – O. Modlich, HG-U133A 2.0, human superficial and invasive bladder tumors [41].
k2 – O. Modlich and S. Raschke, Focus arrays, human lymphoma cell line Karpas-422, DSMZ no.: ACC 32 (follicular B cell).
l1 – C. Wang and J. Xu, HG-U133A 2.0, human lymphoblast cell line.
l2 – C. Wang and J. Xu, HG-U133A 2.0, human pancreatic islet.
l3 – C. Wang and J. Xu, HG-U133 Plus 2, Stratagene Universal Human Reference RNA, Ambion Human Brain Reference RNA and mixtures of both in different concentrations.
l4 – C. Wang and J. Xu, MG-U74Av2, mouse biliary epithelial cells.
l5 – C. Wang and J. Xu, MG-U430A, mouse spleen.
l6 – C. Wang and J. Xu, MG-U430A, myofibroblast cell line.
l7 – C. Wang and J. Xu, RG-U34A, rat livers.
l8 – C. Wang and J. Xu, RG-U34A, rat bone marrow stem cells.
m1 – McInnes and coworkers, MG-Mu11kSubA, SubB, retinal RNA samples from WT and Rom1 knock-out mice [36].
m2 – McInnes, Szego and coworkers, MG-Mu19kSubA, SubB, SubC, retinal RNA samples from WT and Rom1 knock-out mice [42].
n1 – Burton, McGehee and coworkers, MG-Mu11kSubA, SubB, 3T3-L1 adipocytes [43].
n2 – Burton, McGehee and coworkers, MG-U74Av2, 3T3-L1 adipocyte cultures [44].
o – M. Cosio, MG-U74Av2, lung tissues, murine strains NZW and AKR.
p – R. St-Arnaud, MG-U74Av2, C2C12 cells [45].
q – J. Hidalgo, MG-U74Av2, cortex samples, C57B6 normal and IL6 KO mice [46].
r – D. Radzioch, C. Guilbault and coworkers, MG-U74Av2, C57BL/6 (B6) WT and C57BL/6-Cftr-/- (KO) inbred mice [15].
s – S. E. Choe, Drosophila, Drosophila Gene Collection release 1.0 cDNA clones [19].
t – F. Blattner, E. coli antisense genome, E. coli K-12 strain MG1655 and an isogenic fnr::Spr Smr strain [47].
u – J. Slonczewski and S. BonDurant, E. coli antisense genome, E. coli K-12 strain W3110 [48].
v1 – E. Blumwald and J. Sottosanto, ATH1, A. thaliana ecotype Wassilewskija, wild-type line (WS), nhx1 'knockout' line, and a knockout restoration line; [49] and unpublished data.
v2 – E. Blumwald and J. Sottosanto, Arabidopsis, A. thaliana ecotype Wassilewskija, wild-type line (WS), and a nhx1 'knockout' line (unpublished data).
w – D. Honys and D. Twell, ATH1, Arabidopsis thaliana ecotype Landsberg erecta plants [50].
x – E. Nambara and K. Nakabayashi, ATH1, Arabidopsis thaliana (L.) Heynh of ecotype Columbia [51].
y – E. Nambara and K. Tatematsu, ATH1, Arabidopsis thaliana (L.) Heynh of ecotype Columbia [52].
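The excerpt does not spell out how Kα is estimated. One reading consistent with the table caption, in which Kα is the coefficient that determines the probability interval, is that Kα for probability p is the p-quantile of |Y2 - Y1| / sigma(x), with sigma the fitted standard deviation function. The sketch below implements that assumed definition (the function names are hypothetical) and, for comparison, the corresponding value for a normal distribution.

```python
import math
from statistics import NormalDist


def empirical_k(diffs, means, a1, a2, p):
    """Assumed estimator: smallest K such that at least a fraction p of the
    standardized differences |d| / (a1 + a2 * mean) do not exceed K."""
    z = sorted(abs(d) / (a1 + a2 * m) for d, m in zip(diffs, means))
    index = max(0, math.ceil(p * len(z)) - 1)
    return z[index]


def normal_k(p):
    """Half-width, in SD units, of the central interval holding probability p
    under a normal distribution."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0)


for p in (0.5, 0.95, 0.99):
    print(p, round(normal_k(p), 3))
```

The normal values (about 0.674, 1.960 and 2.576) can be set against the tabulated averages, for example 0.64, 2.16 and 3.41 in the HuGeneFL [a1] row; the larger tabulated values at 0.95 and 0.99 suggest heavier tails than a normal distribution.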
Comparison of the coefficients of standard deviation function derived from the consecutive sampling and individual probe sets
| Array | No. of samples | Pair-wise a1 | Individual genes a1 | Difference % | Pair-wise a2 | Individual genes a2 | Difference % |
| HuGene FL (IMR90) | 11 | 6.0 | 5.9 | 1.8 | 0.082 | 0.076 | 7.3 |
| Focus | 9 | 2.9 | 2.9 | 1.7 | 0.153 | 0.154 | -0.6 |
| MG-U74Av2 | 11 | 5.1 | 4.4 | 12.8 | 0.161 | 0.136 | 15.6 |
| Illumina 1 | 4 | 2.7 | 2.4 | 12.2 | 0.092 | 0.085 | 7.7 |
| Illumina 2 | 4 | 2.2 | 2.1 | 2.6 | 0.096 | 0.082 | 14.7 |
| mean difference % | --- | --- | --- | 6.2 | --- | --- | 9.0 |
Columns pair-wise a1 and pair-wise a2 are the coefficients of the standard deviation characteristic function derived from the consecutive sampling. Columns individual genes a1 and a2 show the values derived from the individual probe sets, and Difference % is the difference between the two methods in percent of the pair-wise value.
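The Difference % columns appear to be computed relative to the consecutive-sampling (pair-wise) coefficient, 100 × (pair-wise − individual) / pair-wise; with that assumed formula the a2 values reproduce the tabulated percentages to within the rounding of the printed coefficients. A short check in Python, where the function name is hypothetical:

```python
def percent_difference(pair_wise, individual):
    """Difference between the two estimates, in % of the pair-wise (consecutive-sampling) value."""
    return 100.0 * (pair_wise - individual) / pair_wise


# Pair-wise and individual-gene a2 values from the table
# (HuGene FL, Focus, MG-U74Av2, Illumina 1, Illumina 2).
a2_pairs = [(0.082, 0.076), (0.153, 0.154), (0.161, 0.136), (0.092, 0.085), (0.096, 0.082)]
a2_differences = [percent_difference(p, i) for p, i in a2_pairs]
mean_a2_difference = sum(a2_differences) / len(a2_differences)
```

The rounded inputs give a mean of about 8.9 rather than the printed 9.0, which was presumably computed from unrounded coefficients.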
Figure 5. Standard deviation of the Focus arrays, arrays 01 to 09. Standard deviations are calculated from the individual probe sets of nine samples. The solid curve represents the standard deviation function derived from the consecutive sampling. The regression curve corresponding to logarithm of the linear standard deviation function fitted to logarithm of the experimental standard deviation (not shown) overlaps the consecutive sampling approximation; the coefficients obtained from consecutive sampling are a1 = 2.92√2 and a2 = 0.153√2 and the regression coefficients obtained from individual probe sets are a1 = 2.87 and a2 = 0.154 (data Modlich, Focus 1).
Figure 6. Correlation of the Kα coefficients with the t-distribution. The figure shows the values of the Kα coefficient correlated with the corresponding values of the t-distribution in the range of probabilities from 0.5 to 0.995. The adjusted R² coefficient is 0.99993, the intercept is 0.039 and the coefficient of proportionality is 0.855. The t-distribution has 6 degrees of freedom.
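Read as a linear relation, the Figure 6 fit gives Kα ≈ 0.039 + 0.855·t, where t is the two-sided quantile of the t-distribution with 6 degrees of freedom for the same probability. The caption does not state which quantity is on which axis; this direction is an assumption, chosen because it lands close to the tabulated averages (for example 2.13 for HuGeneFL at 0.95). The sketch computes t quantiles with Simpson's rule and bisection to stay dependency-free; `scipy.stats.t.ppf` would be the usual library choice. All function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus the integral of the density over [0, x] (Simpson's rule, even steps)."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3.0


def t_quantile(q, df):
    """Inverse CDF for 0.5 <= q < 1, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def k_from_t(p, df=6, intercept=0.039, slope=0.855):
    """Approximate K for a central interval of probability p from the Figure 6 regression (assumed direction)."""
    return intercept + slope * t_quantile((1.0 + p) / 2.0, df)
```

Under this reading, `k_from_t(0.5)` gives about 0.65 and `k_from_t(0.99)` about 3.21, within the range of the tabulated Kα values at those probabilities.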
Figure 7. Comparison of the Kα values with the inverse t-distribution. The Kα values correspond to probabilities from 0.5 to 0.995. The inverse t-distributions (solid lines) have 6 and 12 degrees of freedom.
Overview of the GeneChip types
| GeneChip | Feature size (µm) | Probe pairs | TF | No. of labs | No. of arrays | Kα 0.95 avg | Kα 0.95 SD | Kα 0.99 avg | Kα 0.99 SD | Kα 0.995 avg | Kα 0.995 SD |
| HuGeneFL | 24 | 20 | 44 | 2 | 34 | 2.13 | 0.10 | 3.45 | 0.09 | 4.11 | 0.05 |
| HG-U95Av2 | 20 | 16 | 36 | 4 | 77 | 2.09 | 0.09 | 3.04 | 0.08 | 3.48 | 0.16 |
| HG-U95B to E | 20 | 16 | 36 | 2 | 28 | 2.16 | 0.14 | 3.24 | 0.31 | 3.76 | 0.37 |
| HG-U133A 2.0 | 11 | 11 | 22 | 6 | 91 | 2.11 | 0.05 | 3.10 | 0.15 | 3.56 | 0.18 |
| HG-U133 Plus 2 | 11 | 11 | 22 | 2 | 28 | 2.12 | --- | 3.03 | --- | 3.44 | --- |
| HG-Focus | 18 | 11 | 29 | 1 | 34 | 2.18 | 0.03 | 3.11 | 0.04 | 3.52 | 0.07 |
| MG-Mu11kSubA, SubB | 24 | 20 | 44 | 2 | 80 | 2.22 | 0.04 | 3.73 | 0.16 | 4.52 | 0.20 |
| Mu19kSubA, B, C | 24 | 20 | 44 | 1 | 12 | 2.26 | --- | 3.95 | --- | 4.56 | --- |
| MG-U74Av2 | 20 | 16 | 36 | 6 | 75 | 2.17 | 0.04 | 3.36 | 0.20 | 3.97 | 0.31 |
| MG-U430A | 11 | 11 | 22 | 1 | 20 | 2.07 | 0.03 | 2.95 | 0.03 | 3.34 | 0.03 |
| RG-U34A | 24 | 16 | 40 | 2 | 45 | 2.16 | 0.05 | 3.24 | 0.15 | 3.78 | 0.17 |
| RT-U34 Neurobiology | 24 | 16 | 40 | 1 | 40 | 2.08 | --- | 3.12 | --- | 3.38 | --- |
| Drosophila | 20 | 14 | 34 | 1 | 6 | 2.17 | --- | 3.21 | --- | 3.67 | --- |
| E. coli | 24 | 15 | 39 | 2 | 53 | 2.19 | --- | 3.31 | --- | 3.78 | --- |
| ATH1 | 18 | 11 | 29 | 4 | 64 | 2.09 | 0.02 | 3.03 | 0.11 | 3.46 | 0.19 |
| Arabidopsis [v2] | 24 | 16 | 40 | 1 | 7 | 2.15 | --- | 2.89 | --- | 3.18 | --- |
The first two columns of data give the feature size and the number of probe pairs per probe set. TF is the technical factor, defined as the sum of the feature size and the number of probe pairs. No. of labs gives the number of different laboratories where the data were generated. No. of arrays gives the number of arrays of the GeneChip type. The last six columns give the mean (avg) and standard deviation (SD) of the Kα values at the probabilities 0.95, 0.99 and 0.995; "---" marks cases where no SD is given.
Figure 8. Average Kα values. Correlation of the Kα coefficients with the sum of the feature size and the number of probe pairs (TF); bars show the standard deviation for the interval 0.995.
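Figure 8 relates Kα to the technical factor TF. As an illustration only, the sketch below recomputes TF from the overview table and the Pearson correlation coefficient between TF and the mean Kα at 0.995; the figure itself may rest on a different fit or subset, and the variable names are hypothetical.

```python
import math

# (GeneChip, feature size, probe pairs, mean K at 0.995) from the overview table.
GENECHIPS = [
    ("HuGeneFL", 24, 20, 4.11), ("HG-U95Av2", 20, 16, 3.48), ("HG-U95B to E", 20, 16, 3.76),
    ("HG-U133A 2.0", 11, 11, 3.56), ("HG-U133 Plus 2", 11, 11, 3.44), ("HG-Focus", 18, 11, 3.52),
    ("MG-Mu11kSubA, SubB", 24, 20, 4.52), ("Mu19kSubA, B, C", 24, 20, 4.56), ("MG-U74Av2", 20, 16, 3.97),
    ("MG-U430A", 11, 11, 3.34), ("RG-U34A", 24, 16, 3.78), ("RT-U34 Neurobiology", 24, 16, 3.38),
    ("Drosophila", 20, 14, 3.67), ("E. coli", 24, 15, 3.78), ("ATH1", 18, 11, 3.46),
    ("Arabidopsis", 24, 16, 3.18),
]


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


tf = [size + pairs for _, size, pairs, _ in GENECHIPS]  # technical factor
k995 = [k for _, _, _, k in GENECHIPS]
r = pearson(tf, k995)
```

With these rounded table values r comes out at about 0.6.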
Summary of the results of consistency tests
a)
| Test | t-test | t-test | Coincidence | RMA |
| Above or below | above | below | above | above |
| Mean of 4-sample test | 58.2 | 72.0 | 29.4 | 40.4 |
| Common to 2 sets (mean) | 13.5 | 20.5 | 22.1 | 32.9 |
| SD | 2.3 | 2.8 | 3.4 | 6.0 |
| Ratio % | 23.2 | 28.5 | 75.2 | 81.4 |
b)
| Test | Coincidence, interval 0.9 | Coincidence, interval 0.8 | t-test |
| Mean of 3-samples test (7 of 9) | 12.3 | 17.5 | 11.0 |
| Common to 2 sets (average) | 10.2 | 16.7 | 5.3 |
| Ratio (%) | 83.0 | 95.2 | 48.2 |
a) The t-test, coincidence test and RMA on the MG-U74Av2 array (five samples; data Ref. [15]). The data were subjected to a one-tailed t-test at the level 0.01, the coincidence test and RMA. The coincidence and RMA tests were not carried out for the cases below the interval, since the numbers of occurrences were too small. The means of positive cases in the five four-sample tests are given, together with the mean number of genes common to any two trials; the ratio of the two means is given in percent. b) The t-test and coincidence test, Illumina (four samples; data Ref. [8]). The second and third columns list the number of genes identified by the coincidence method for the intervals 0.9 and 0.8, respectively. The last column shows the numbers of genes that satisfied the t-test. The first and second rows of data give the mean number of genes that passed the three-sample sets and the mean number of genes passing concurrently in two particular tests, respectively; the third row gives their ratio in percent.
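The "Ratio" rows equal the mean number of genes common to two trials divided by the mean number of genes found per trial, in percent (for example 13.5 / 58.2 gives 23.2%). A minimal sketch of this consistency measure, assuming each trial yields a set of gene identifiers and that trials are formed by leaving out one sample at a time (five four-sample sets from five samples); the function names are hypothetical:

```python
from itertools import combinations


def leave_one_out(samples):
    """All subsets that omit exactly one sample, e.g. five 4-sample sets from 5 samples."""
    return [list(subset) for subset in combinations(samples, len(samples) - 1)]


def consistency(gene_sets):
    """Return (mean set size, mean size of pairwise intersections, ratio in %)
    for a list of sets of genes flagged in repeated trials."""
    mean_size = sum(len(s) for s in gene_sets) / len(gene_sets)
    common = [len(a & b) for a, b in combinations(gene_sets, 2)]
    mean_common = sum(common) / len(common)
    return mean_size, mean_common, 100.0 * mean_common / mean_size


subsets = leave_one_out(["S1", "S2", "S3", "S4", "S5"])
mean_size, mean_common, ratio = consistency(
    [{"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"}, {"g2", "g3", "g5"}]
)
```

A ratio close to 100% means that the genes flagged in one trial are largely flagged again in the others.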