Comparison of Methods for Feature Selection in Clustering of High-Dimensional RNA-Sequencing Data to Identify Cancer Subtypes.

David Källberg, Linda Vidman, Patrik Rydén.

Abstract

Cancer subtype identification is important for facilitating cancer diagnosis and selecting effective treatments. Clustering of cancer patients based on high-dimensional RNA-sequencing data can be used to detect novel subtypes, but only a subset of the features (e.g., genes) contains information related to the cancer subtype. It is therefore reasonable to assume that the clustering should be based on a set of carefully selected features rather than on all features. Several feature selection methods have been proposed, but how and when to use them is still poorly understood. Thirteen feature selection methods were evaluated on four human cancer data sets, all with known subtypes (gold standards), which were used only for evaluation. The methods were characterized by the mean expression and standard deviation (SD) of the selected genes, their overlap with other methods, and their clustering performance, obtained by comparing the clustering result with the gold standard using the adjusted Rand index (ARI). The results were compared to a supervised approach as a positive control and to two negative controls in which either a random selection of genes or all genes were included. For all data sets, the best feature selection approach outperformed the negative control, and for two data sets the gain was substantial, with ARI increasing from -0.01 to 0.66 and from 0.39 to 0.72, respectively. No feature selection method completely outperformed the others, but using the dip-test statistic to select 1000 genes was overall a good choice. The commonly used approach of selecting the genes with the highest SDs did not perform well in our study.
Copyright © 2021 Källberg, Vidman and Rydén.
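
The abstract names two building blocks that are easy to make concrete: selecting the genes with the highest SDs, and scoring a clustering against a gold standard with the ARI. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `top_k_by_sd` and `adjusted_rand_index`, the dictionary layout of the expression data, and the use of the population SD are assumptions made for illustration; the abstract does not specify the clustering algorithm or the details of the thirteen selection methods, so none of that is reproduced here.

```python
from collections import Counter
from math import comb
from statistics import pstdev


def top_k_by_sd(expression, k):
    """Return the names of the k genes with the largest SD across samples.

    `expression` is assumed to map gene name -> list of expression values,
    one value per sample (a hypothetical layout chosen for this sketch).
    """
    ranked = sorted(expression, key=lambda g: pstdev(expression[g]), reverse=True)
    return ranked[:k]


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:  # fewer than two samples: partitions agree trivially
        return 1.0
    # Count sample pairs that share a contingency cell, a true class,
    # or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy data: three genes measured on four samples.
    toy = {"g1": [1, 1, 1, 1], "g2": [0, 10, 0, 10], "g3": [4, 5, 4, 5]}
    print(top_k_by_sd(toy, 2))
    # Cluster labels are arbitrary, so a relabelled but identical partition
    # still scores as perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

An ARI of 1 indicates that the clustering reproduces the gold standard exactly, while values near 0 indicate agreement no better than chance, which is why the negative-control ARI of -0.01 reported above corresponds to an essentially uninformative clustering.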

Keywords:  RNA-seq; cancer subtypes; feature selection; gene selection; high-dimensional

Year:  2021        PMID: 33719342      PMCID: PMC7943624          DOI: 10.3389/fgene.2021.632620

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599

