Reliable classification of two-class cancer data using evolutionary algorithms.

Kalyanmoy Deb, A Raji Reddy.

Abstract

In bioinformatics, identifying the gene subsets responsible for classifying available disease samples into two or more variants of the disease is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although a few applications of evolutionary algorithms (EAs) to this task exist, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. In contrast to past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover smaller gene subsets (of size four or five, for example) that correctly classify 100% or nearly 100% of samples for three cancer data sets (Leukemia, Lymphoma, and Colon). We have also extended NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations that provide 100% correct classification on the Leukemia data. To gain further confidence in the identification task, we have also introduced a prediction strength threshold for deciding whether a sample belongs to one class or the other. All simulation results show consistent gene subset identifications on the three disease data sets and demonstrate the flexibility and efficacy of a multiobjective EA for the gene subset identification task.
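The two-objective formulation in the abstract (minimize gene subset size, minimize misclassified samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier, the toy data, and every function name below are assumptions made for illustration, and the sketch shows only the Pareto-dominance test that NSGA-II builds on, not NSGA-II itself.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[int, int]  # (gene subset size, misclassified samples), both minimized


def centroid(rows: Sequence[Sequence[float]], genes: Sequence[int]) -> List[float]:
    """Mean expression of each selected gene over the given samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def count_errors(train_X, train_y, test_X, test_y, genes) -> int:
    """Misclassifications of a nearest-centroid rule (an assumed stand-in classifier)."""
    c0 = centroid([x for x, y in zip(train_X, train_y) if y == 0], genes)
    c1 = centroid([x for x, y in zip(train_X, train_y) if y == 1], genes)
    errors = 0
    for x, y in zip(test_X, test_y):
        d0 = sum((x[g] - m) ** 2 for g, m in zip(genes, c0))
        d1 = sum((x[g] - m) ** 2 for g, m in zip(genes, c1))
        pred = 0 if d0 <= d1 else 1
        errors += int(pred != y)
    return errors


def objectives(X, y, genes) -> Objectives:
    """Both objectives for one candidate subset (evaluated on the same samples for brevity)."""
    return (len(genes), count_errors(X, y, X, y, genes))


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical toy data: 4 samples x 3 genes; only gene 0 separates the two classes.
X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 5.1, 0.1], [3.2, 4.9, 0.8]]
y = [0, 0, 1, 1]

candidates = [[0], [1], [0, 1]]
scores = [objectives(X, y, g) for g in candidates]
front = non_dominated(scores)
```

On this toy data the single-gene subset `[0]` classifies every sample correctly, so it dominates both the uninformative subset `[1]` and the larger subset `[0, 1]`. A more faithful evaluation would average the error count over several train/test splits, in the spirit of the multiple training sets the abstract mentions.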

Year:  2003        PMID: 14642662     DOI: 10.1016/s0303-2647(03)00138-2

Source DB:  PubMed          Journal:  Biosystems        ISSN: 0303-2647            Impact factor:   1.973


  13 in total

1.  Deconvolution of heterogeneous wound tissue samples into relative macrophage phenotype composition via models based on gene expression.

Authors:  Nicole M Ferraro; Will Dampier; Michael S Weingarten; Kara L Spiller
Journal:  Integr Biol (Camb)       Date:  2017-04-18       Impact factor: 2.192

2.  Conditional screening for ultra-high dimensional covariates with survival outcomes.

Authors:  Hyokyoung G Hong; Jian Kang; Yi Li
Journal:  Lifetime Data Anal       Date:  2016-12-08       Impact factor: 1.588

3.  Accurate molecular classification of cancer using simple rules.

Authors:  Xiaosheng Wang; Osamu Gotoh
Journal:  BMC Med Genomics       Date:  2009-10-30       Impact factor: 3.063

4.  Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

Authors:  Anirban Mukhopadhyay; Sanghamitra Bandyopadhyay; Ujjwal Maulik
Journal:  PLoS One       Date:  2010-11-12       Impact factor: 3.240

5.  A comparison of machine learning techniques for survival prediction in breast cancer.

Authors:  Leonardo Vanneschi; Antonella Farinaccio; Giancarlo Mauri; Mauro Antoniotti; Paolo Provero; Mario Giacobini
Journal:  BioData Min       Date:  2011-05-11       Impact factor: 2.522

6.  Classification of dendritic cell phenotypes from gene expression data.

Authors:  Giacomo Tuana; Viola Volpato; Paola Ricciardi-Castagnoli; Francesca Zolezzi; Fabio Stella; Maria Foti
Journal:  BMC Immunol       Date:  2011-08-29       Impact factor: 3.615

7.  Classification of Microarray Data Using Kernel Fuzzy Inference System.

Authors:  Mukesh Kumar; Santanu Kumar Rath
Journal:  Int Sch Res Notices       Date:  2014-08-21

8.  Signature Evaluation Tool (SET): a Java-based tool to evaluate and visualize the sample discrimination abilities of gene expression signatures.

Authors:  Chih-Hung Jen; Tsun-Po Yang; Chien-Yi Tung; Shu-Han Su; Chi-Hung Lin; Ming-Ta Hsu; Hsei-Wei Wang
Journal:  BMC Bioinformatics       Date:  2008-01-28       Impact factor: 3.169

9.  Characterization of digital medical images utilizing support vector machines.

Authors:  Ilias G Maglogiannis; Elias P Zafiropoulos
Journal:  BMC Med Inform Decis Mak       Date:  2004-03-10       Impact factor: 2.796

10.  Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

Authors:  A Gaspar-Cunha; G Recio; L Costa; C Estébanez
Journal:  ScientificWorldJournal       Date:  2014-02-23