Hao Ding (1), Michael Sharpnack (2), Chao Wang (1,2), Kun Huang (1,2), Raghu Machiraju (1,2)
(1) Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.
(2) Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
Abstract
MOTIVATION: Technologies that generate high-throughput omics data are flourishing, creating enormous, publicly available repositories of multi-omics data. As many data repositories continue to grow, there is an urgent need for computational methods that can leverage these data to create comprehensive clusters of patients with a given disease.

RESULTS: Our proposed approach creates a patient-to-patient similarity graph for each data type as an intermediate representation of each omics data type and merges the graphs through subspace analysis on a Grassmann manifold. We hypothesize that this approach generates more informative clusters by preserving the complementary information from each level of omics data. We applied our approach to The Cancer Genome Atlas (TCGA) breast cancer dataset and show that by integrating gene expression, microRNA and DNA methylation data, our proposed method can produce clinically useful subtypes of breast cancer. We then investigate the molecular characteristics underlying these subtypes. We discover a highly expressed cluster of genes on chromosome 19p13 that strongly correlates with survival in TCGA breast cancer patients and validate these results in three additional breast cancer datasets. We also compare our approach with previous integrative clustering approaches and obtain comparable or superior results.

AVAILABILITY AND IMPLEMENTATION: https://github.com/michaelsharpnack/GrassmannCluster.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
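To make the pipeline described under RESULTS concrete, the following is a minimal sketch of one common way to merge per-layer graphs through subspace analysis on a Grassmann manifold. It is not the authors' implementation (see the repository linked above for that). Everything specific here is an assumption made for illustration: the Gaussian-kernel similarity with a median bandwidth, the weight `alpha`, the simple k-means step, and all function names (`similarity_graph`, `grassmann_cluster`, and so on) are hypothetical.

```python
import numpy as np


def similarity_graph(X):
    """Patient-to-patient Gaussian similarity (assumed kernel, median bandwidth)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)
    sigma = np.median(dist[np.triu_indices_from(dist, k=1)])
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def bottom_eigvecs(L, k):
    """Eigenvectors of the k smallest eigenvalues (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]


def kmeans(X, k, n_iter=50):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def grassmann_cluster(layers, k, alpha=0.5):
    """Merge per-layer spectral subspaces and cluster patients.

    layers: list of (patients x features) arrays, one per omics type,
            with rows in the same patient order.
    alpha:  assumed trade-off weight for the subspace-agreement term.
    """
    laps = [normalized_laplacian(similarity_graph(X)) for X in layers]
    subspaces = [bottom_eigvecs(L, k) for L in laps]
    # Each U spans a k-dim subspace, i.e. a point on the Grassmann manifold.
    # Subtracting U U^T rewards a merged subspace close to every layer's.
    L_mod = sum(laps) - alpha * sum(U @ U.T for U in subspaces)
    emb = bottom_eigvecs(L_mod, k)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return kmeans(emb, k)


if __name__ == "__main__":
    # Synthetic stand-in for three omics layers; not real TCGA data.
    rng = np.random.default_rng(0)
    groups = np.repeat([0, 1], 10)
    toy = [rng.normal(0, 0.5, (20, 5)) + 4.0 * groups[:, None] for _ in range(3)]
    print(grassmann_cluster(toy, k=2))
```

The design choice worth noting is that the layers are never concatenated: each omics type contributes its own graph and spectral subspace, and only those subspaces are merged, which is how complementary information from each level can be preserved.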