Huiling Xiong, Dapeng Zhang, Christopher J Martyniuk, Vance L Trudeau, Xuhua Xia.
Abstract
BACKGROUND: Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches make critical assumptions about the data distribution, which are often not valid in practice.
Year: 2008 PMID: 18199333 PMCID: PMC2275243 DOI: 10.1186/1471-2105-9-25
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Geometric transformation of microarray M-A plots in GPA normalization. (a) How the M-A plot for one slide is transformed at each step of GPA. Blue points represent raw data; pink points represent the reference slide; red, green and purple points represent the data after translation, rotation and scaling, respectively. (b) How the M-A plots for four slides, shown in four colours (blue, red, pink, green), are transformed at each step of GPA. The data set was simulated with the SIMAGE method and includes 50 slides with 10% differentially expressed genes and a 1:1 ratio of up-regulated to down-regulated genes.
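The three steps named in the Figure 1 caption (translation, rotation, scaling) can be sketched as an ordinary Procrustes alignment of each slide's points in the M-A plane. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `align_to_reference` and `gpa`, the use of the first slide as the initial reference, the least-squares scale factor, and the size-fixed mean reference used between iterations are all choices made here for the example.

```python
import numpy as np

def align_to_reference(X, ref):
    """Align one slide's (A, M) points X (n x 2) to a reference (n x 2).

    Hypothetical helper: translation, then rotation, then scaling.
    """
    X = np.asarray(X, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Translation: centre both point clouds on their centroids.
    ref_centroid = ref.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Rc = ref - ref_centroid
    # Rotation: orthogonal Procrustes solution from the SVD of Xc^T Rc.
    U, _, Vt = np.linalg.svd(Xc.T @ Rc)
    Q = U @ Vt
    if np.linalg.det(Q) < 0:  # disallow reflections, keep a pure rotation
        U[:, -1] *= -1
        Q = U @ Vt
    Xr = Xc @ Q
    # Scaling: least-squares scale factor that best matches the reference.
    s = np.sum(Xr * Rc) / np.sum(Xr * Xr)
    # Move the aligned points onto the reference centroid.
    return s * Xr + ref_centroid

def gpa(slides, n_iter=10, tol=1e-10):
    """Iteratively align all slides to a mean reference (assumed scheme)."""
    ref = np.asarray(slides[0], dtype=float)
    # Keep the reference size fixed so repeated scaling cannot shrink it.
    size = np.linalg.norm(ref - ref.mean(axis=0))
    for _ in range(n_iter):
        aligned = [align_to_reference(X, ref) for X in slides]
        mean = np.mean(aligned, axis=0)
        centroid = mean.mean(axis=0)
        centered = mean - centroid
        new_ref = centroid + centered * size / np.linalg.norm(centered)
        converged = np.linalg.norm(new_ref - ref) < tol
        ref = new_ref
        if converged:
            break
    return aligned, ref
```

Each slide is passed as an array with one row per spot and two columns (A, M). After alignment, the second column of each aligned slide plays the role of the normalized log-ratio in this sketch.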
Figure 2. Mean replicate variability for the (a) swirl zebrafish data set and (b) HCT116 data set. A larger value indicates higher variability across slides. The reference line indicates the variability value for the GPA method.
Figure 3. Mean K-S statistic between pairs of slides for the (a) swirl zebrafish data set and (b) HCT116 data set. The reference line indicates the K-S value for the GPA method.
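A mean pairwise K-S statistic of the kind summarized in Figure 3 can be computed as sketched below. The sketch assumes the two-sample Kolmogorov-Smirnov statistic is applied to the log-ratio (M) values of each pair of slides; the function names `ks_statistic` and `mean_pairwise_ks` and the genes-by-slides array layout are assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations

import numpy as np

def ks_statistic(x, y):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.concatenate([x, y])
    # Empirical CDF of each sample evaluated at every pooled observation.
    cdf_x = np.searchsorted(x, pooled, side="right") / x.size
    cdf_y = np.searchsorted(y, pooled, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))

def mean_pairwise_ks(M):
    """M: (n_genes, n_slides) array of log-ratios; averages over slide pairs."""
    M = np.asarray(M, dtype=float)
    pairs = combinations(range(M.shape[1]), 2)
    return float(np.mean([ks_statistic(M[:, i], M[:, j]) for i, j in pairs]))
```

A value near 0 means the slides have nearly identical distributions of M values; a value near 1 means the distributions barely overlap.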
Comparison among different normalization methods based on simulated normal microarray data with Balagurunathan's method. The data sets are simulated without or with dye bias and include 1000 genes in 10 slides with 5% differentially expressed genes. The ratio of up-regulated to down-regulated genes is 1:1. Additional file 3 contains the full data sets.
| Method | Without dye bias: ν (1) | Without dye bias: β (1) | With dye bias: ν (1) | With dye bias: β (1) |
|---|---|---|---|---|
| Raw | 0.1182 | 0.004627 | 0.1231 | 0.004586 |
| Global | 0.1179 | 0.004732 | 0.1223 | 0.00473 |
| Lowess | 0.1143 | 0.005368 | 0.1182 | 0.004976 |
| Scale | 0.1113 | 0.004666 | 0.1201 | 0.004515 |
| Quantile | 0.1149 | 0.005226 | 0.1212 | 0.004716 |
| VSN | 0.06294 | 0.002581 | 0.06314 | 0.002541 |
| GPA | 0.02122 | 0.00117 | 0.02069 | 0.00119 |
| Global+Scale | 0.1117 | 0.004579 | 0.1191 | 0.004595 |
| Global+Quantile | 0.1149 | 0.005226 | 0.1212 | 0.004716 |
| Global+GPA | 0.02122 | 0.001174 | 0.02068 | 0.001156 |
| Lowess+Scale | 0.1088 | 0.005161 | 0.1153 | 0.005067 |
| Lowess+Quantile | 0.1133 | 0.005085 | 0.1198 | 0.004812 |
| Lowess+GPA | 0.02215 | 0.001279 | 0.02205 | 0.00134 |
(1) ν and β: the median of variance and bias, respectively, of MSE.
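For reference, the footnote's "variance and bias of MSE" appears to rely on the standard decomposition of mean squared error. The symbols $\hat{\theta}_g$ (the normalized log-ratio estimate for gene $g$) and $\theta_g$ (its simulated true value) are notation introduced here, not the paper's, and this excerpt does not state whether β reports the bias itself or its square.

$$
\mathrm{MSE}(\hat{\theta}_g) = \mathrm{E}\big[(\hat{\theta}_g - \theta_g)^2\big] = \mathrm{Var}(\hat{\theta}_g) + \big(\mathrm{E}[\hat{\theta}_g] - \theta_g\big)^2
$$

Under this reading, ν would be the median over genes of the variance term and β the median over genes of the bias term, so smaller values of both indicate a better normalization.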
Comparison among different normalization methods based on simulated normal microarray data with the SIMAGE method. The data sets include 1000 genes in 50 slides with 5%, 10%, or 30% differentially expressed genes, and the ratio of up-regulated to down-regulated genes is 1:1.
| Method | 5% (1): ν (2) | 5% (1): β (2) | 10% (1): ν (2) | 10% (1): β (2) | 30% (1): ν (2) | 30% (1): β (2) |
|---|---|---|---|---|---|---|
| Raw | 1.677 | 0.06525 | 1.678 | 0.08937 | 1.773 | 0.1615 |
| Global | 0.6751 | 0.06319 | 0.4611 | 0.08158 | 0.5158 | 0.1697 |
| Loess | 0.2089 | 0.05962 | 0.1863 | 0.07056 | 0.1891 | 0.1642 |
| Scale | 1.273 | 0.06859 | 1.145 | 0.09515 | 1.368 | 0.1921 |
| Quantile | 0.2546 | 0.06579 | 0.2085 | 0.07702 | 0.2167 | 0.1825 |
| VSN | 0.2331 | 0.05441 | 0.177 | 0.06695 | 0.2014 | 0.1573 |
| GPA | 0.08605 | 0.04569 | 0.09839 | 0.06639 | 0.1063 | 0.1328 |
| Global+Scale | 0.5449 | 0.06481 | 0.3928 | 0.08096 | 0.4448 | 0.1851 |
| Global+Quantile | 0.08674 | 0.04529 | 0.2085 | 0.07702 | 0.2167 | 0.1825 |
| Global+GPA | 0.2546 | 0.06579 | 0.09967 | 0.06784 | 0.107 | 0.13 |
| Loess+Scale | 0.1846 | 0.06 | 0.171 | 0.06968 | 0.1788 | 0.161 |
| Loess+Quantile | 0.2055 | 0.06124 | 0.1852 | 0.07132 | 0.1895 | 0.1638 |
| Loess+GPA | 0.1324 | 0.0536 | 0.1221 | 0.06078 | 0.1291 | 0.1468 |
(1) Percentage of differentially expressed genes
(2) ν and β: the median of variance and bias, respectively, of MSE
Variance and K-S values for the mouse apoptosis boutique array after GPA and housekeeping gene normalizations.
| Method | Variance | K-S |
|---|---|---|
| Raw | 0.8760382 | 0.700391 |
| Housekeeping gene | 0.3053415 | 0.620606 |
| GPA | 0.175378 | 0.268652 |
Comparison between the GPA and housekeeping gene normalizations based on simulated boutique array data with Balagurunathan's method. The data sets are simulated without or with dye bias and include 1000 genes in 10 slides with 60% differentially expressed genes. The ratios of up-regulated to down-regulated genes are 5:5, 7:3, and 9:1, respectively.
| Ratio (1) | Method | Without dye bias: ν (2) | Without dye bias: β (2) | With dye bias: ν (2) | With dye bias: β (2) |
|---|---|---|---|---|---|
| 5:5 | Raw | 0.1147 | 0.008632 | 0.1144 | 0.007297 |
| 5:5 | Housekeeping gene | 0.1133 | 0.008569 | 0.1164 | 0.007326 |
| 5:5 | GPA | 0.03285 | 0.002833 | 0.03349 | 0.002584 |
| 7:3 | Raw | 0.1139 | 0.007859 | 0.1186 | 0.007902 |
| 7:3 | Housekeeping gene | 0.1157 | 0.007937 | 0.1195 | 0.007427 |
| 7:3 | GPA | 0.0288 | 0.00315 | 0.03133 | 0.003334 |
| 9:1 | Raw | 0.1172 | 0.007659 | 0.1178 | 0.007916 |
| 9:1 | Housekeeping gene | 0.1187 | 0.006898 | 0.1152 | 0.009463 |
| 9:1 | GPA | 0.02785 | 0.004814 | 0.02802 | 0.005119 |
(1) Ratio of up- to down-regulated genes
(2) ν and β: the median of variance and bias, respectively, of MSE
Comparison between the GPA and housekeeping gene normalizations based on simulated boutique array data with the SIMAGE method. The data sets include 1000 genes in 50 slides with 60% differentially expressed genes and the ratios of up-regulated to down-regulated genes are 5:5, 7:3, and 9:1, respectively.
| Method | 5:5 (1): ν (2) | 5:5 (1): β (2) | 7:3 (1): ν (2) | 7:3 (1): β (2) | 9:1 (1): ν (2) | 9:1 (1): β (2) |
|---|---|---|---|---|---|---|
| Raw | 1.467 | 0.4234 | 1.178 | 0.4687 | 1.532 | 0.8552 |
| Housekeeping gene | 0.2055 | 0.4642 | 0.1842 | 0.478 | 0.176 | 0.9346 |
| GPA | 0.111 | 0.3402 | 0.1059 | 0.3296 | 0.1194 | 0.5331 |
(1) Ratio of up-regulated to down-regulated genes
(2) ν and β: the median of variance and bias, respectively, of MSE
Figure 4. Geometric transformation of microarray M-A plots in GPA normalization on extreme boutique arrays. The boutique array data set was simulated with the SIMAGE method and includes 50 slides with 90% of genes up-regulated 10-fold and 10% down-regulated 2-fold. Four randomly selected slides, shown in four colours (blue, red, pink, green), illustrate the M-A plots after each GPA transformation step.