PURPOSE: Institutional efforts toward the democratization of cloud-scale data and analysis methods for cancer genomics are proceeding rapidly. As part of this effort, we bridge two major bioinformatic initiatives: the Global Alliance for Genomics and Health (GA4GH) and Bioconductor.

METHODS: We describe in detail a use case in pancancer transcriptomics conducted by blending implementations of the GA4GH Workflow Execution Services and Tool Registry Service concepts with the Bioconductor curatedTCGAData and BiocOncoTK packages.

RESULTS: We carried out the analysis with a formally archived workflow and container at dockstore.org and a workspace and notebook at app.terra.bio. The analysis identified relationships between microsatellite instability and biomarkers of immune dysregulation at a finer level of granularity than previously reported. Our use of standard approaches to containerization and workflow programming allows this analysis to be replicated and extended.

CONCLUSION: Experimental use of dockstore.org and app.terra.bio in concert with Bioconductor enabled novel statistical analysis of large genomic projects without the need for local supercomputing resources but involved challenges related to container design, script archiving, and unit testing. Best practices and cost/benefit metrics for the management and analysis of globally federated genomic data and annotation are evolving. The creation and execution of use cases like the one reported here will be helpful in the development and comparison of approaches to federated data/analysis systems in cancer genomics.