Rudolph S Parrish1, Horace J Spencer. 1. Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 42092, USA. rudy.parrish@louisville.edu
Abstract
MOTIVATION: Normalization techniques are used to reduce variation among gene expression measurements in oligonucleotide microarrays in an effort to improve the quality of the data and the power of significance tests for detecting differential expression. Of several such proposed methods, two that have commonly been employed include median-interquartile range normalization and quantile normalization. The median-IQR method applied directly to fold-changes for paired data also was considered. Two methods for calculating gene expression values include the MAS 5.0 algorithm [Affymetrix. (2002). Statistical Algorithms Description Document. Santa Clara, CA: Affymetrix, Inc. http://www.affymetrix.com/support/technical/whitepapers/sadd-whitepaper.pdf] and the RMA method [Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., Speed, T. P. (2003a). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31(4,e15); Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., Speed, T. P. (2003b). Exploration, normalization, and summaries of high density oligonucleotide array probe-level data. Biostatistics 4(2):249-264; Irizarry, R. A., Gautier, L., Cope, L. (2003c). An R package for analysis of Affymetrix oligonucleotide arrays. In: Parmigiani, R. I. G., Garrett, E. S., Ziegler, S., eds. The Analysis of Gene Expression Data: Methods and Software. Berlin: Springer, pp. 102-119]. RESULTS: In considering these methods applied to a prostate cancer data set derived from paired samples on normal and tumor tissue, it is shown that normalization methods may lead to substantial inflation of the number of genes identified by paired-t significance tests even after adjustment for multiple testing. This is shown to be due primarily to an unintended effect that normalization has on the experimental error variance. The impact appears to be greater in the RMA method compared to the MAS 5.0 algorithm and for quantile normalization compared to median-IQR normalization.
MOTIVATION: Normalization techniques are used to reduce variation among gene expression measurements in oligonucleotide microarrays in an effort to improve the quality of the data and the power of significance tests for detecting differential expression. Of several such proposed methods, two that have commonly been employed include median-interquartile range normalization and quantile normalization. The median-IQR method applied directly to fold-changes for paired data also was considered. Two methods for calculating gene expression values include the MAS 5.0 algorithm [Affymetrix. (2002). Statistical Algorithms Description Document. Santa Clara, CA: Affymetrix, Inc. http://www.affymetrix.com/support/technical/whitepapers/sadd-whitepaper.pdf] and the RMA method [Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., Speed, T. P. (2003a). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31(4,e15); Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., Speed, T. P. (2003b). Exploration, normalization, and summaries of high density oligonucleotide array probe-level data. Biostatistics 4(2):249-264; Irizarry, R. A., Gautier, L., Cope, L. (2003c). An R package for analysis of Affymetrix oligonucleotide arrays. In: Parmigiani, R. I. G., Garrett, E. S., Ziegler, S., eds. The Analysis of Gene Expression Data: Methods and Software. Berlin: Springer, pp. 102-119]. RESULTS: In considering these methods applied to a prostate cancer data set derived from paired samples on normal and tumor tissue, it is shown that normalization methods may lead to substantial inflation of the number of genes identified by paired-t significance tests even after adjustment for multiple testing. This is shown to be due primarily to an unintended effect that normalization has on the experimental error variance. The impact appears to be greater in the RMA method compared to the MAS 5.0 algorithm and for quantile normalization compared to median-IQR normalization.
Authors: Richard Shippy; Stephanie Fulmer-Smentek; Roderick V Jensen; Wendell D Jones; Paul K Wolber; Charles D Johnson; P Scott Pine; Cecilie Boysen; Xu Guo; Eugene Chudin; Yongming Andrew Sun; James C Willey; Jean Thierry-Mieg; Danielle Thierry-Mieg; Robert A Setterquist; Mike Wilson; Anne Bergstrom Lucas; Natalia Novoradovskaya; Adam Papallo; Yaron Turpaz; Shawn C Baker; Janet A Warrington; Leming Shi; Damir Herman Journal: Nat Biotechnol Date: 2006-09 Impact factor: 54.908