Prabhakar Chalise (1), Rama Raghavan (2), Brooke L Fridley (3). (1) Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, United States. Electronic address: pchalise@kumc.edu. (2) Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, United States. Electronic address: rraghavan@kumc.edu. (3) Department of Biostatistics, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, United States. Electronic address: bfridley@kumc.edu.
Abstract
BACKGROUND AND OBJECTIVE: Integrative approaches to the study of biological systems have gained popularity in statistical genomics. For example, The Cancer Genome Atlas (TCGA) has applied integrative clustering methodologies to various cancer types to determine molecular subtypes within a given cancer histology. To adequately assess and compare integrative or "systems-biology"-type methods, realistic, interrelated datasets are needed. This involves simulating multiple types of 'omic data with realistic correlation both between features of the same type (e.g., gene expression for genes in a pathway) and across data types (e.g., "gene silencing" involving DNA methylation and gene expression). METHODS: We present the software tool InterSIM for simulating multiple interrelated data types with realistic intra- and inter-relationships based on the DNA methylation, mRNA gene expression, and protein expression data from the TCGA ovarian cancer study. RESULTS: The resulting simulated datasets can be used to assess the operating characteristics of newly developed integrative bioinformatics methods and to compare them with those of existing methods. Application of InterSIM is illustrated with heatmaps of the simulated datasets. CONCLUSIONS: InterSIM allows researchers to evaluate and test new integrative methods with realistically simulated interrelated genomic datasets. InterSIM is implemented in R and is freely available from CRAN.
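The idea of simulating interrelated data types can be illustrated with a minimal toy sketch. It is not InterSIM and does not use its R interface or the TCGA-derived parameters; the function name `simulate_two_omics`, its parameters (`delta`, `rho`, `cluster_props`), and the Gaussian model are all assumptions chosen for illustration. It draws a methylation-like matrix with a cluster-specific mean shift in half of the features, then generates an expression-like matrix that is negatively correlated with it, loosely mimicking "gene silencing".

```python
import math
import random


def simulate_two_omics(n_samples=200, n_features=10, cluster_props=(0.5, 0.5),
                       delta=2.0, rho=-0.6, seed=0):
    """Toy simulation of two correlated data types (hypothetical, not InterSIM).

    Returns (labels, methyl, expr), where methyl and expr are lists of rows
    (one row per sample, one column per feature).
    """
    rng = random.Random(seed)

    # Assign samples to clusters according to the requested proportions.
    labels = []
    for k, prop in enumerate(cluster_props):
        labels += [k] * round(prop * n_samples)
    labels = labels[:n_samples]
    labels += [len(cluster_props) - 1] * (n_samples - len(labels))

    methyl, expr = [], []
    for k in labels:
        m_row, e_row = [], []
        for j in range(n_features):
            # Only the first half of the features differ between clusters.
            shift = delta * k if j < n_features // 2 else 0.0
            m = rng.gauss(0.0, 1.0) + shift
            # Expression depends on methylation through rho, plus independent
            # noise, so the cluster effect carries over across data types.
            e = rho * m + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            m_row.append(m)
            e_row.append(e)
        methyl.append(m_row)
        expr.append(e_row)
    return labels, methyl, expr


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


labels, methyl, expr = simulate_two_omics()
last = len(methyl[0]) - 1
# Correlation across samples for one feature without a cluster shift.
r_last = pearson([row[last] for row in methyl], [row[last] for row in expr])
```

Under these assumptions, `r_last` should be negative and roughly near `rho`. A realistic simulator would additionally need correlation among features of the same type and parameters estimated from real data, which this sketch omits.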