Are clusterings of multiple data views independent?

Lucy L Gao, Jacob Bien, Daniela Witten.

Abstract

In the Pioneer 100 (P100) Wellness Project, multiple types of data are collected on a single set of healthy participants at multiple timepoints in order to characterize and optimize wellness. One way to do this is to identify clusters, or subgroups, among the participants, and then to tailor personalized health recommendations to each subgroup. It is tempting to cluster the participants using all of the data types and timepoints, in order to fully exploit the available information. However, clustering the participants based on multiple data views implicitly assumes that a single underlying clustering of the participants is shared across all data views. If this assumption does not hold, then clustering the participants using multiple data views may lead to spurious results. In this article, we seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop a new test for answering this question, which we then apply to clinical, proteomic, and metabolomic data, across two distinct timepoints, from the P100 study. We find that while the subgroups of the participants defined with respect to any single data type seem to be dependent across time, the clustering among the participants based on one data type (e.g. proteomic data) appears not to be associated with the clustering based on another data type (e.g. clinical data).
© The Author 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  Data integration; Hypothesis testing; Model-based clustering; Multiple-view data

Year:  2020        PMID: 30753304     DOI: 10.1093/biostatistics/kxz001

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


  4 in total

1.  Two-stage linked component analysis for joint decomposition of multiple biologically related data sets.

Authors:  Huan Chen; Brian Caffo; Genevieve Stein-O'Brien; Jinrui Liu; Ben Langmead; Carlo Colantuoni; Luo Xiao
Journal:  Biostatistics       Date:  2022-10-14       Impact factor: 5.279

2.  Multi-Omic Biological Age Estimation and Its Correlation With Wellness and Disease Phenotypes: A Longitudinal Study of 3,558 Individuals.

Authors:  John C Earls; Noa Rappaport; Laura Heath; Tomasz Wilmanski; Andrew T Magis; Nicholas J Schork; Gilbert S Omenn; Jennifer Lovejoy; Leroy Hood; Nathan D Price
Journal:  J Gerontol A Biol Sci Med Sci       Date:  2019-11-13       Impact factor: 6.053

3.  Testing for association in multiview network data.

Authors:  Lucy L Gao; Daniela Witten; Jacob Bien
Journal:  Biometrics       Date:  2021-04-12       Impact factor: 1.701

4.  Deep multiview learning to identify imaging-driven subtypes in mild cognitive impairment.

Authors:  Yixue Feng; Mansu Kim; Xiaohui Yao; Kefei Liu; Qi Long; Li Shen
Journal:  BMC Bioinformatics       Date:  2022-09-29       Impact factor: 3.307
