Eric F Lock (1), David B Dunson. (1) Department of Statistical Science, Duke University, Durham, NC 27708, USA and Center for Human Genetics, Duke University Medical Center, Durham, NC 27710, USA.
Abstract
MOTIVATION: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources.
RESULTS: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas.
AVAILABILITY: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.
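To make the idea of source-specific clusterings that "adhere loosely" to a consensus clustering more concrete, the following is a minimal simulation sketch, not the authors' R implementation. It assumes one simple form of adherence that the abstract does not spell out: each object's label in a given source equals its consensus label with probability `alpha`, and is otherwise drawn uniformly from the remaining clusters. The function names, the parameter `alpha`, and the source names are hypothetical.

```python
import random


def simulate_source_labels(consensus, n_clusters, alpha, rng):
    """Draw one source-specific label per object.

    Assumption (illustrative only): a label matches the consensus label
    with probability `alpha`; otherwise it is drawn uniformly from the
    other clusters.
    """
    labels = []
    for c in consensus:
        if n_clusters == 1 or rng.random() < alpha:
            labels.append(c)
        else:
            others = [k for k in range(n_clusters) if k != c]
            labels.append(rng.choice(others))
    return labels


def agreement(a, b):
    """Fraction of objects given the same label in both clusterings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
n_objects, n_clusters = 2000, 3

# Overall consensus clustering of the objects.
consensus = [rng.randrange(n_clusters) for _ in range(n_objects)]

# Hypothetical per-source adherence: source_A follows the consensus
# closely, source_B only loosely.
alphas = {"source_A": 0.9, "source_B": 0.6}
source_labels = {
    name: simulate_source_labels(consensus, n_clusters, a, rng)
    for name, a in alphas.items()
}

for name, labels in source_labels.items():
    print(name, "agreement with consensus:", round(agreement(labels, consensus), 2))
```

Under this assumption, setting `alpha` to 1 for every source reduces to a single joint clustering, while setting it to `1 / n_clusters` makes a source's labels independent of the consensus, which are the two extremes the abstract contrasts. Estimating the consensus and the adherence from data, as the proposed Bayesian framework does, is outside the scope of this sketch.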