Sheila M Gaynor (1,2), Ryan Sun (1), Xihong Lin (1), John Quackenbush (1,2)
1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
2. Department of Biostatistics and Computational Biology and Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
Abstract
MOTIVATION: Cancer genomics studies frequently aim to identify genes that are differentially expressed between clinically distinct patient subgroups, generally by testing single genes one at a time. However, the results of any individual transcriptomic study are often not fully reproducible. A particular challenge for statistical analysis is distinguishing differential expression that forms part of the genomic disease etiology from differential expression induced by downstream effects. There is a need for more robust analytical approaches that are well-powered to detect potentially causative genes, are less prone to discovering spurious associations, and can deliver reproducible findings across different studies.
RESULTS: We propose a set-based procedure for testing differential expression and show that this set-based approach can produce more robust results by aggregating information across multiple correlated genomic markers. Specifically, we adapt the Generalized Berk-Jones (GBJ) statistic to test for transcription factors that may contribute to the progression of estrogen receptor-positive breast cancer. We demonstrate that our method produces reproducible findings by applying the same analysis to 21 publicly available datasets, which yields a similar list of significant transcription factors across most studies. Our GBJ approach produces results with improved consistency compared with three other set-based testing methods: Generalized Higher Criticism, Gene Set Analysis, and Gene Set Enrichment Analysis.
AVAILABILITY AND IMPLEMENTATION: Data are in the MetaGxBreast R package. Code is available at github.com/ryanrsun/gaynor_sun_GBJ_breast_cancer.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
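The abstract does not spell out the statistic itself. As a rough illustration of the set-based idea, the following sketch computes the classical Berk-Jones statistic from the d marker-level z statistics of one set (for instance, one statistic per gene in the set): BJ = max over 1 ≤ k ≤ d/2 of d·K(k/d, p_(k)), counting only terms with p_(k) < k/d, where p_(k) is the k-th smallest two-sided p-value and K(x, y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)). This is a sketch under stated assumptions, not the authors' GBJ implementation: it treats markers as independent, scans only the k ≤ d/2 smallest p-values (an assumed restriction), and does not compute a p-value. Accounting for correlation among markers is exactly what the generalized version adds, and this sketch omits it. The function names are hypothetical.

```python
import math
from typing import Sequence


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bernoulli_kl(x: float, y: float) -> float:
    """KL divergence K(x, y) between Bernoulli(x) and Bernoulli(y), for 0 < y < 1."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def berk_jones(z_stats: Sequence[float]) -> float:
    """Classical Berk-Jones statistic for one set of d marker-level z statistics.

    Assumption: markers are independent (no GBJ correlation adjustment).
    Assumption: only the k <= d/2 smallest p-values are scanned.
    """
    d = len(z_stats)
    if d == 0:
        raise ValueError("need at least one test statistic")
    p_sorted = sorted(two_sided_p(z) for z in z_stats)
    best = 0.0
    for k in range(1, d // 2 + 1):
        # Clamp so log() stays finite if erfc underflows to 0 for huge |z|.
        p_k = max(p_sorted[k - 1], 1e-300)
        frac = k / d
        # Only order statistics smaller than their null expectation contribute.
        if p_k < frac:
            best = max(best, d * bernoulli_kl(frac, p_k))
    return best


null_like = berk_jones([0.1, -0.3, 0.5, 0.2])
with_signal = berk_jones([4.0, 3.5, 0.2, -0.1])
print(null_like, with_signal)
```

For example, a set of four null-like markers gives a statistic of 0, while the same-sized set with two strong signals gives a much larger value. Turning the statistic into a set-level p-value requires its null distribution, which under correlated markers is what the GBJ method supplies; see the code repository linked in the abstract for the authors' analysis.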