MOTIVATION: A common difficulty in large-scale microarray studies is the presence of confounding factors, which may significantly skew estimates of statistical significance and cause unreliable feature selection and high false negative rates. To deal with these difficulties, an algorithmic framework known as Surrogate Variable Analysis (SVA) was recently proposed. RESULTS: Based on the notion that data can be viewed as an interference pattern, reflecting the superposition of independent effects and random noise, we present a modified SVA, called Independent Surrogate Variable Analysis (ISVA), to identify features that correlate with a phenotype of interest in the presence of potential confounding factors. Using simulated data, we show that ISVA performs well in identifying confounders and outperforms methods that do not adjust for confounding. Using four large-scale Illumina Infinium DNA methylation datasets subject to low signal-to-noise ratios and substantial confounding by beadchip effects and variable bisulfite conversion efficiency, we show that ISVA improves the identifiability of confounders, and that this enables a framework for feature selection that is more robust to model misspecification and heterogeneous phenotypes. Finally, we demonstrate similar improvements for ISVA across four mRNA expression datasets. Thus, ISVA should be useful as a feature selection tool in studies that are subject to confounding. AVAILABILITY: An R package, isva, is available from www.cran.r-project.org.
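The abstract describes ISVA only at a high level. The Python sketch below illustrates the general idea under stated assumptions: it regresses the phenotype out of every feature, whitens the top-k residual components, applies a minimal FastICA to obtain candidate independent surrogate variables, and then uses those as covariates in a per-feature linear model. It is not the isva package's implementation. The function names (`residualize`, `fastica_sym`, `isv_sketch`, `phenotype_tstats`) are hypothetical, the number of components `k` is supplied by hand rather than estimated from the data, and NumPy is assumed to be installed.

```python
import numpy as np


def residualize(X, y):
    """Remove the least-squares fit on phenotype y from every feature.

    X: features x samples matrix; y: length-n numeric phenotype.
    """
    D = np.column_stack([np.ones(len(y)), y])        # n x 2 design matrix
    beta, *_ = np.linalg.lstsq(D, X.T, rcond=None)    # 2 x p coefficients
    return X - (D @ beta).T                           # p x n residuals


def fastica_sym(Z, n_iter=500, tol=1e-8, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity (illustrative only).

    Z: whitened data, observations (features) x k variables.
    Returns an orthogonal k x k unmixing matrix W; Z @ W.T are the sources.
    """
    rng = np.random.default_rng(seed)
    k = Z.shape[1]
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]  # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                          # g(s) for current sources
        W_new = (G.T @ Z) / Z.shape[0] - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        u, _, vt = np.linalg.svd(W_new)               # symmetric decorrelation:
        W_new = u @ vt                                # (W W^T)^(-1/2) W = U V^T
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W


def isv_sketch(X, y, k, seed=0):
    """Hypothetical ISVA-style surrogate variables (not the isva package API).

    Returns a k x n matrix whose rows are sample-level candidate confounders.
    """
    R = residualize(X, y)
    R = R - R.mean(axis=0, keepdims=True)             # center each sample over features
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    p = R.shape[0]
    Z = U[:, :k] * np.sqrt(p)                         # whitened: Z.T @ Z / p = I
    W = fastica_sym(Z, seed=seed)
    # R ~ (Z @ W.T) @ A, so the rows of A are the mixing (sample-level) patterns
    return W @ np.diag(s[:k]) @ Vt[:k] / np.sqrt(p)


def phenotype_tstats(X, y, isvs):
    """Per-feature OLS t-statistic for y, with the surrogate variables as covariates."""
    n = len(y)
    D = np.column_stack([np.ones(n), y, isvs.T])      # n x (2 + k) design
    beta, *_ = np.linalg.lstsq(D, X.T, rcond=None)    # (2 + k) x p
    resid = X.T - D @ beta                            # n x p residuals
    sigma2 = (resid ** 2).sum(axis=0) / (n - D.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
    return beta[1] / se
```

For a features-by-samples matrix `X` and a numeric phenotype `y`, calling `isvs = isv_sketch(X, y, k=2)` and then `phenotype_tstats(X, y, isvs)` yields one adjusted t-statistic per feature. Because this sketch keeps all k components, the ICA rotation changes which sample-level patterns are reported individually but not the t-statistics; a procedure that keeps only a subset of the components would depend on it.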