Yudi Pawitan¹, Stefano Calza, Alexander Ploner
¹ Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. yudi.pawitan@ki.se
Abstract
MOTIVATION: Wide-scale correlations between genes are commonly observed in gene expression data, for both biological and technical reasons. These correlations increase the variability of the standard estimate of the false discovery rate (FDR). We highlight the false discovery proportion (FDP), rather than the FDR, as the suitable quantity for assessing differential expression in microarray data, demonstrate the deleterious effects of correlation on FDP estimation, and propose an improved estimation method that accounts for the correlations.
METHODS: We analyse the variation pattern of the distribution of test statistics under permutation using the singular value decomposition. The results suggest a latent FDR model that accounts for the effects of correlation and is statistically closer to the FDP. We develop a procedure for estimating the latent FDR (ELF) based on a Poisson regression model.
RESULTS: For simulated data based on the correlation structure of real datasets, we find that ELF performs substantially better than the standard FDR approach in estimating the FDP. We illustrate the use of ELF in the analysis of breast cancer and lymphoma data.
AVAILABILITY: R code to perform ELF is available at http://www.meb.ki.se/~yudpaw.
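To make the distinction between the FDP and the FDR concrete, the following is a minimal Python sketch, not the authors' ELF procedure or their R code. It assumes a toy one-factor model in which every gene's noise shares a single latent component with correlation `rho`; the function names, the cutoff, and all simulation settings are hypothetical choices made for illustration only. The FDP is the realized proportion of false discoveries in one dataset, and the sketch compares how much it varies across simulated datasets with and without correlation.

```python
import random
import statistics


def false_discovery_proportion(rejected, is_null):
    """FDP = (# rejected true nulls) / (# rejected); taken as 0 when nothing is rejected."""
    n_rejected = sum(rejected)
    if n_rejected == 0:
        return 0.0
    n_false = sum(1 for r, n in zip(rejected, is_null) if r and n)
    return n_false / n_rejected


def simulate_fdp(rho, n_genes=200, n_signal=20, shift=3.0,
                 cutoff=2.0, n_reps=300, seed=1):
    """Simulate the realized FDP over repeated datasets (all settings are assumptions).

    Each gene's noise is sqrt(rho) * shared + sqrt(1 - rho) * own, so any two
    genes have correlation rho. The first n_signal genes are truly
    differentially expressed (mean shift); the rest are true nulls.
    """
    rng = random.Random(seed)
    is_null = [i >= n_signal for i in range(n_genes)]
    fdps = []
    for _ in range(n_reps):
        shared = rng.gauss(0.0, 1.0)  # latent factor common to all genes
        z = []
        for i in range(n_genes):
            noise = rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            z.append(noise + (0.0 if is_null[i] else shift))
        rejected = [abs(v) > cutoff for v in z]
        fdps.append(false_discovery_proportion(rejected, is_null))
    return fdps


independent = simulate_fdp(rho=0.0)
correlated = simulate_fdp(rho=0.5)
print("mean FDP:", statistics.mean(independent), statistics.mean(correlated))
print("SD of FDP:", statistics.stdev(independent), statistics.stdev(correlated))
```

Under these assumptions the spread of the realized FDP is noticeably wider in the correlated case, which is the kind of variability the abstract attributes to wide-scale correlation. A single averaged FDR figure does not convey that spread for the dataset at hand.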