A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data.

Audrey Hulot, Denis Laloë, Florence Jaffrézic.

Abstract

BACKGROUND: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is how to integrate such representations.
RESULTS: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies on two steps: the first step projects the representations into a common coordinate system; the second step uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step. The projection step retrieves a distance matrix for each representation and then applies multidimensional scaling (MDS) to obtain a new set of coordinates from all the pairwise distances. The integration step then applies a multiple factor analysis (MFA) to the resulting tables of coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, which are traditionally used to answer the same question.
CONCLUSION: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a single-cell transcriptomics data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
© 2021. The Author(s).
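
The two-step procedure described in the abstract (distance matrix, then MDS, then MFA) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that trees are compared through cophenetic distances (the abstract does not say which tree distance is used), that MDS is the classical (Torgerson) variant, and that MFA is computed as a PCA of the concatenated tables after each table is divided by its first singular value. The function names `classical_mds`, `mfa` and `tree_distances`, as well as the simulated toy data, are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates from a square distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering the squared distances gives a Gram (inner product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which can appear for non-Euclidean distances.
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales


def mfa(tables, n_components=2):
    """MFA sketch: PCA of tables, each scaled by 1 / its first singular value."""
    weighted = []
    for table in tables:
        centered = table - table.mean(axis=0)
        first_sv = np.linalg.svd(centered, compute_uv=False)[0]
        weighted.append(centered / first_sv)
    stacked = np.hstack(weighted)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    # Row scores of the individuals on the leading common axes.
    return u[:, :n_components] * s[:n_components]


def tree_distances(data, method):
    """Cophenetic distance matrix of a hierarchical clustering of the rows."""
    return squareform(cophenet(linkage(data, method=method)))


# Toy data (assumption): 12 individuals in two groups, seen in two sources.
rng = np.random.default_rng(0)
groups = np.repeat([0.0, 5.0], 6)[:, None]
source_a = groups + rng.normal(size=(12, 4))
source_b = groups + rng.normal(size=(12, 3))

# Step 1: one distance matrix per tree, projected to coordinates by MDS.
trees = [tree_distances(source_a, "average"), tree_distances(source_b, "ward")]
coords = [classical_mds(d, n_components=3) for d in trees]

# Step 2: integrate the coordinate tables with MFA.
consensus = mfa(coords, n_components=2)
print(consensus.shape)
```

For networks, the same sketch would apply once a distance matrix between nodes is available, for example shortest-path lengths; that choice is again an assumption rather than something stated in the abstract.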

Keywords:  Clustering; Data integration; MDS; MFA; Network

Year:  2021        PMID: 34348641     DOI: 10.1186/s12859-021-04303-4

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169
