
Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.

Y-H Taguchi, Turki Turki.

Abstract

The large p small n problem remains a challenge for which no de facto standard method is available. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism and apply it to multiomics datasets in which the number of features exceeds 100,000 whereas the number of samples is only about 100, a typical large p small n problem. On synthetic datasets, the proposed TD-based unsupervised FE outperformed conventional supervised feature selection methods (random forest; categorical regression, also known as analysis of variance, or ANOVA; and penalized linear discriminant analysis) as well as two unsupervised methods (multiple non-negative matrix factorization and principal component analysis (PCA)-based unsupervised FE). On the multiomics datasets, it outperformed the four methods other than PCA-based unsupervised FE. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to the tissues and transcription factors measured. TD-based unsupervised FE was thus demonstrated to be not only the superior feature selection method but also one that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements.
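The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a TD-based unsupervised FE step might look, not the authors' implementation. It assumes a features x samples x omics tensor, obtains feature-mode singular vectors from the mode-0 unfolding (as in a higher-order SVD), treats the components of one singular vector as Gaussian under the null, and keeps features whose Benjamini-Hochberg-adjusted P-values fall below a threshold. The tensor sizes, the signal strength, the use of the leading component, the 0.01 threshold, and all function names are assumptions made for illustration.

```python
import math
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor along one mode (mode-n unfolding)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def feature_singular_vectors(tensor, n_components=5):
    """Left singular vectors of the feature-mode unfolding, as in HOSVD."""
    u, s, _ = np.linalg.svd(unfold(tensor, 0), full_matrices=False)
    return u[:, :n_components], s[:n_components]


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_features(u_col, alpha=0.01):
    """Flag features whose singular-vector component is an outlier.

    Assumption: components follow a Gaussian under the null, so the squared
    standardized component is chi-squared with one degree of freedom, whose
    upper tail is erfc(sqrt(x / 2)).
    """
    z2 = (u_col / u_col.std()) ** 2
    p = np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])
    return np.flatnonzero(bh_adjust(p) < alpha)


def demo(seed=0):
    """Synthetic large p small n example (all sizes are hypothetical)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples, n_omics = 2000, 20, 3
    tensor = rng.normal(size=(n_features, n_samples, n_omics))
    # Two sample classes; the first 50 features differ between them
    # consistently across all omics layers.
    labels = np.repeat([-1.0, 1.0], n_samples // 2)
    tensor[:50] += 3.0 * labels[None, :, None]
    u, _ = feature_singular_vectors(tensor)
    # Assumption: the leading component carries the class signal here.
    return select_features(u[:, 0])


selected = demo()
print(len(selected), "features selected")
```

On real data the leading component need not be the relevant one; a practical workflow would first inspect the sample-mode singular vectors and pick the component associated with the property of interest before selecting features.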


Keywords:  gene expression; genomic regions; prostate cancer; protein-coding genes; tensor decomposition; unsupervised learning

Year:  2020        PMID: 33322492      PMCID: PMC7763286          DOI: 10.3390/genes11121493

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


