Supervised clustering of high-dimensional data using regularized mixture modeling.

Wennan Chang¹, Changlin Wan¹, Yong Zang², Chi Zhang³, Sha Cao².

Abstract

Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between high-dimensional genetic manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We propose a novel supervised clustering algorithm based on a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to address the challenges of studying heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm is adapted from the classification expectation-maximization (CEM) algorithm and offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features are explanatory of the response variable, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over competing methods: CSMR recapitulated the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms for different drugs. CSMR is a big data analysis tool with the potential to resolve the complexity of tracing the clinical presentations of a disease back to the real causes underpinning it. We believe that it will bring new understanding of the molecular basis of disease and could be of special relevance to the growing field of personalized medicine.
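The abstract does not spell out the CSMR objective or its update rules. As a rough illustration only, the sketch below shows what a classification EM (CEM) loop for a mixture of sparse linear regressions can look like, assuming Gaussian errors, no intercept, and a plain per-component lasso penalty fitted by coordinate descent. The function names `lasso_cd` and `cem_sparse_mixreg` and all of their parameters are hypothetical; this is not the authors' CSMR implementation, whose exact penalty and update steps may differ.

```python
import numpy as np

# Hypothetical sketch: CEM for a mixture of lasso regressions, NOT the authors' CSMR code.


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-6):
    """Lasso by cyclic coordinate descent (no intercept), minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep the residual in sync with b
            max_step = max(max_step, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_step < tol:
            break
    return b


def cem_sparse_mixreg(X, y, K, lam, n_iter=50, n_starts=20, seed=0):
    """Classification EM for a K-component mixture of lasso-penalized linear
    regressions with Gaussian errors. Returns (labels, betas, sigma2, pis)
    from the restart with the highest classification log-likelihood."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = None
    for _ in range(n_starts):
        labels = rng.integers(K, size=n)  # random hard initial partition
        for _ in range(n_iter):
            # M-step: sparse fit, noise variance and weight for each component
            betas = np.zeros((K, p))
            sig2 = np.ones(K)
            pis = np.bincount(labels, minlength=K) / n
            for k in range(K):
                idx = labels == k
                if idx.sum() < 2:
                    continue  # (near-)empty component: keep default parameters
                betas[k] = lasso_cd(X[idx], y[idx], lam)
                sig2[k] = max(np.mean((y[idx] - X[idx] @ betas[k]) ** 2), 1e-6)
            # C-step: assign each sample to its most likely component
            loglik = (np.log(np.maximum(pis, 1e-12))
                      - 0.5 * np.log(2 * np.pi * sig2)
                      - (y[:, None] - X @ betas.T) ** 2 / (2 * sig2))
            new_labels = loglik.argmax(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        score = loglik.max(axis=1).sum()  # classification log-likelihood
        if best is None or score > best[0]:
            best = (score, labels.copy(), betas, sig2, pis)
    return best[1:]
```

CEM replaces the soft E-step of standard EM with hard assignments (the C-step), so each M-step reduces to an ordinary lasso fit on the samples currently assigned to a component. Because hard assignments make the result depend on the initial partition, the sketch runs several random restarts and keeps the one with the highest classification log-likelihood.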
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  disease heterogeneity; mixture modeling; supervised learning

Year:  2021        PMID: 34293851      PMCID: PMC8294591          DOI: 10.1093/bib/bbaa291

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  2 in total

1.  Response to 'Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression based clustering', Zhang et al.

Authors:  Wennan Chang; Chi Zhang; Sha Cao
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

2.  Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression-based clustering.

Authors:  Bo Zhang; Jianghua He; Jinxiang Hu; Devin C Koestler; Prabhakar Chalise
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 13.994
