Accurate cancer classification using expressions of very few genes.

Lipo Wang, Feng Chu, Wei Xie.

Abstract

We aim to find the smallest set of genes that can ensure highly accurate classification of cancers from microarray data using supervised machine learning algorithms. The significance of finding minimum gene subsets is three-fold: 1) It greatly reduces the computational burden and the "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows simple diagnostic rules to be extracted, which lead to accurate diagnosis without the need for any classifier. 2) It simplifies gene expression tests so that they include only a very small number of genes rather than thousands, which can significantly reduce the cost of cancer testing. 3) It calls for further investigation into the possible biological relationship between these small sets of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the two-step approach to each of them. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results, but with only 28 genes rather than 16,063. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
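The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the ranking scheme or the classifier, so everything concrete here is an assumption made for illustration: a Welch t-score stands in for the "feature importance ranking scheme", a nearest-centroid classifier scored by leave-one-out accuracy stands in for the "good classifier", and the data are synthetic. The function names (`rank_genes`, `loo_accuracy`, `best_subset`, `make_toy_data`) are hypothetical and do not come from the paper.

```python
import itertools
import math
import random
import statistics


def t_score(values_a, values_b):
    """Absolute Welch t-statistic between two groups of expression values."""
    mean_a, mean_b = statistics.mean(values_a), statistics.mean(values_b)
    var_a, var_b = statistics.variance(values_a), statistics.variance(values_b)
    denom = math.sqrt(var_a / len(values_a) + var_b / len(values_b))
    return abs(mean_a - mean_b) / denom if denom > 0 else 0.0


def rank_genes(samples, labels, top_k):
    """Step 1 (assumed scheme): indices of the top_k genes by t-score."""
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles binary problems only"
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == classes[0]]
        b = [s[g] for s, y in zip(samples, labels) if y == classes[1]]
        scores.append((t_score(a, b), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Assumes every class keeps at least one sample after one is held out.
    """
    correct = 0
    for i in range(len(samples)):
        centroids = {}
        for c in set(labels):
            rows = [
                [s[g] for g in genes]
                for j, (s, y) in enumerate(zip(samples, labels))
                if y == c and j != i
            ]
            centroids[c] = [statistics.mean(col) for col in zip(*rows)]
        point = [samples[i][g] for g in genes]
        predicted = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += predicted == labels[i]
    return correct / len(samples)


def best_subset(samples, labels, top_k=5, max_size=2):
    """Step 2: test every combination of the top-ranked genes up to max_size."""
    candidates = rank_genes(samples, labels, top_k)
    best = (-1.0, ())
    for size in range(1, max_size + 1):  # smaller subsets are tried first
        for combo in itertools.combinations(candidates, size):
            acc = loo_accuracy(samples, labels, combo)
            if acc > best[0]:  # strict '>' keeps the smaller subset on ties
                best = (acc, combo)
    return best


def make_toy_data(n_per_class=10, n_genes=50, seed=0):
    """Synthetic data: Gaussian noise, with gene 3 shifted for class 1."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[3] += 8.0 * label  # gene 3 is made informative on purpose
            samples.append(row)
            labels.append(label)
    return samples, labels


if __name__ == "__main__":
    samples, labels = make_toy_data()
    accuracy, genes = best_subset(samples, labels)
    print(f"best genes: {genes}, leave-one-out accuracy: {accuracy:.2f}")
```

Because the search visits subset sizes in increasing order and only replaces the current best on a strict improvement, it returns the smallest subset that reaches the top accuracy, which mirrors the stated goal of finding minimum gene subsets. For a multi-class problem such as the 14-type data set, the same routine could be applied to each binary sub-problem in turn. Note that this sketch ranks genes on all samples before cross-validating, which can inflate the accuracy estimate; a careful evaluation would repeat the ranking inside each held-out fold.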

Year: 2007; PMID: 17277412; DOI: 10.1109/TCBB.2007.1006

Source DB: PubMed; Journal: IEEE/ACM Trans Comput Biol Bioinform; ISSN: 1545-5963; Impact factor: 3.710


  17 in total

1.  Association rule based similarity measures for the clustering of gene expression data.

Authors:  Prerna Sethi; Sathya Alagiriswamy
Journal:  Open Med Inform J       Date:  2010-05-28

2.  Effective selection of informative SNPs and classification on the HapMap genotype data.

Authors:  Nina Zhou; Lipo Wang
Journal:  BMC Bioinformatics       Date:  2007-12-20       Impact factor: 3.169

3.  Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer.

Authors:  Vasily Sachnev; Saras Saraswathi; Rashid Niaz; Andrzej Kloczkowski; Sundaram Suresh
Journal:  BMC Bioinformatics       Date:  2015-05-20       Impact factor: 3.169

4.  Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA.

Authors:  Ming-Gang Du; Shan-Wen Zhang; Hong Wang
Journal:  Adv Bioinformatics       Date:  2009-07-20

5.  Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification.

Authors:  Shu-Lin Wang; Xue-Ling Li; Jianwen Fang
Journal:  BMC Bioinformatics       Date:  2012-07-25       Impact factor: 3.169

6.  SPICE: discovery of phenotype-determining component interplays.

Authors:  Zhengzhang Chen; Kanchana Padmanabhan; Andrea M Rocha; Yekaterina Shpanskaya; James R Mihelcic; Kathleen Scott; Nagiza F Samatova
Journal:  BMC Syst Biol       Date:  2012-05-14

7.  Genomic and functional analysis of the toxic effect of tachyplesin I on the embryonic development of zebrafish.

Authors:  Hongya Zhao; Jianguo Dai; Gang Jin
Journal:  Comput Math Methods Med       Date:  2014-04-29       Impact factor: 2.238

8.  A Simple Algorithm for Population Classification.

Authors:  Peng Hu; Ming-Hua Hsieh; Ming-Jie Lei; Bin Cui; Sung-Kay Chiu; Chi-Meng Tzeng
Journal:  Sci Rep       Date:  2016-03-31       Impact factor: 4.379

9.  Gene selection for cancer classification with the help of bees.

Authors:  Johra Muhammad Moosa; Rameen Shakur; Mohammad Kaykobad; Mohammad Sohel Rahman
Journal:  BMC Med Genomics       Date:  2016-08-10       Impact factor: 3.063

10.  A modified T-test feature selection method and its application on the HapMap genotype data.

Authors:  Nina Zhou; Lipo Wang
Journal:  Genomics Proteomics Bioinformatics       Date:  2007-12       Impact factor: 7.691
