A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery.

Teemu J Rintala, Antonio Federico, Leena Latonen, Dario Greco, Vittorio Fortino.

Abstract

Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for developing metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. Moreover, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in co-expression networks of genes. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
© The Author(s) 2021. Published by Oxford University Press.
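
The knowledge-driven idea in the abstract (cluster on pathway-level scores, then judge the grouping by more than one metric) can be illustrated with a small, hedged sketch. Everything below is an assumption made for illustration and is not the authors' protocol: the toy expression matrix, the gene sets `pathway_A` and `pathway_B`, the plain averaging of per-gene z-scores (a simple stand-in for the network diffusion and gene set enrichment steps named in the abstract), the farthest-first k-means initialisation, and the choice of the silhouette and Rand index as example metrics for separability and concordance with known subtypes.

```python
from itertools import combinations
from statistics import mean, pstdev

def zscore_genes(expr):
    """Standardise each gene (column) across samples (rows)."""
    cols = list(zip(*expr))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant genes
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in expr]

def pathway_scores(expr, gene_sets):
    """Hypothetical pathway transform: mean z-score over each set's genes."""
    z = zscore_genes(expr)
    return [[mean(row[g] for g in genes) for genes in gene_sets.values()]
            for row in z]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
               for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette width (a group-separability metric)."""
    widths = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(sq_dist(p, q) ** 0.5)
        own = dists.pop(labels[i], None)
        if not own or not dists:
            widths.append(0.0)
            continue
        a, b = mean(own), min(mean(d) for d in dists.values())
        widths.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return mean(widths)

def rand_index(a, b):
    """Fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

# Toy data (assumed): 12 samples x 8 genes, two known subtypes.
known = [0] * 6 + [1] * 6
gene_sets = {"pathway_A": [0, 1, 2, 3], "pathway_B": [4, 5, 6, 7]}
expr = [[(5.0 if (known[i] == 1) == (g < 4) else 1.0)
         + ((i * 7 + g * 3) % 5 - 2) * 0.05   # small deterministic jitter
         for g in range(8)] for i in range(12)]

scores = pathway_scores(expr, gene_sets)   # samples x pathways
labels = kmeans(scores, k=2)
print("silhouette:", round(silhouette(scores, labels), 3))
print("Rand index vs known subtypes:", rand_index(labels, known))
```

Reporting the two numbers side by side, rather than ranking methods on separability alone, mirrors the multi-objective evaluation the abstract argues for; clinical-outcome association and co-expression module metrics are outside the scope of this sketch.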

Keywords:  cancer; clustering; multi-objective; network analysis; pathway enrichment analysis; transcriptomics

Year:  2021        PMID: 34396389      PMCID: PMC8575038          DOI: 10.1093/bib/bbab314

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622

