Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data.

Lokeswari Venkataramana, Shomona Gracia Jacob, Rajavel Ramadoss, Dodda Saisuma, Dommaraju Haritha, Kunthipuram Manoja.

Abstract

BACKGROUND: Data mining techniques are used to extract previously unknown knowledge from large data sets. Microarray gene expression (MGE) data plays a major role in predicting the type of cancer. However, because MGE data is large in volume, applying traditional data mining approaches is time-consuming. Parallel programming frameworks such as Hadoop, Spark and Mahout are therefore needed to ease the computation.
OBJECTIVE: Not all gene expressions are necessary for prediction, so it is essential to select the important genes in order to improve classification accuracy. Feature selection algorithms are therefore parallelized and executed on the Spark framework to eliminate unnecessary genes and identify only the predictive genes in much less time, without affecting prediction accuracy.
METHODS: A parallelized hybrid feature selection (HFS) method is proposed for this purpose. The method applies parallelized correlation feature subset selection followed by rank-based feature selection methods. The selected subset of genes is evaluated using parallel classification algorithms. The resulting accuracy values are compared with those of the existing rank-weight feature selection and parallelized recursive feature selection methods, and with the values obtained by executing parallelized HFS on DistributedWekaSpark.
RESULTS: The classification accuracy obtained with the proposed parallelized HFS method is 97% for gastric cancer and 79% for childhood leukemia. The proposed parallelized HFS method improved classification accuracy by approximately 4% to 15% compared with previous methods.
CONCLUSION: The results show that the proposed parallelized feature selection algorithm scales to growing medical data and predicts cancer sub-types in less time and with higher accuracy.
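
The two-stage idea described under METHODS (correlation feature subset selection followed by rank-based selection) can be illustrated with a small serial sketch. This is not the authors' Spark implementation: it runs on one machine in plain Python, it assumes Pearson correlation both for the subset merit and for the ranking score (the abstract does not name the measures used), and every function name and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def cfs_merit(subset, columns, labels):
    """Correlation-based merit: high gene-class correlation, low gene-gene correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[g], labels)) for g in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(columns, labels):
    """Stage 1 (assumed greedy forward search): add genes while the merit improves."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        merit, gene = max((cfs_merit(selected + [g], columns, labels), g) for g in remaining)
        if merit <= best:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = merit
    return selected

def rank_top_k(genes, columns, labels, k):
    """Stage 2 (assumed score): rank surviving genes by |correlation with class|, keep top k."""
    ranked = sorted(genes, key=lambda g: abs(pearson(columns[g], labels)), reverse=True)
    return ranked[:k]

def hybrid_select(columns, labels, k):
    """Hybrid selection: subset filter first, then rank-based cut."""
    return rank_top_k(cfs_forward(columns, labels), columns, labels, k)

# Toy data: six samples, two classes, three hypothetical genes.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "gene_a": [1, 1, 1, 5, 5, 5],     # tracks the class exactly
    "gene_b": [2, 2, 3, 10, 10, 9],   # nearly redundant with gene_a
    "gene_c": [3, 1, 2, 2, 3, 1],     # uncorrelated with the class
}
selected = hybrid_select(columns, labels, k=2)
print(selected)
```

In a Spark setting the per-gene correlation computations are the natural unit to distribute, since each is independent of the others; the sketch keeps them serial only for readability.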

Keywords:  Correlation feature subset selection; DistributedWekaSpark; Parallel classification; Parallelized hybrid feature selection; Rank-based methods; Spark

Year:  2019        PMID: 31429008     DOI: 10.1007/s13258-019-00859-x

Source DB:  PubMed          Journal:  Genes Genomics        ISSN: 1976-9571            Impact factor:   1.839


  4 in total

1.  A hybrid feature selection method for DNA microarray data.

Authors:  Li-Yeh Chuang; Cheng-Huei Yang; Kuo-Chuan Wu; Cheng-Hong Yang
Journal:  Comput Biol Med       Date:  2011-03-03       Impact factor: 4.589

2.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Authors:  T R Golub; D K Slonim; P Tamayo; C Huard; M Gaasenbeek; J P Mesirov; H Coller; M L Loh; J R Downing; M A Caligiuri; C D Bloomfield; E S Lander
Journal:  Science       Date:  1999-10-15       Impact factor: 47.728

3.  A robust gene selection method for microarray-based cancer classification.

Authors:  Xiaosheng Wang; Osamu Gotoh
Journal:  Cancer Inform       Date:  2010-02-04

4.  Informative gene selection and direct classification of tumor based on Chi-square test of pairwise gene interactions.

Authors:  Hongyan Zhang; Lanzhi Li; Chao Luo; Congwei Sun; Yuan Chen; Zhijun Dai; Zheming Yuan
Journal:  Biomed Res Int       Date:  2014-07-23       Impact factor: 3.411

  1 in total

1.  Detecting biomarkers from microarray data using distributed correlation based gene selection.

Authors:  Alok Kumar Shukla; Diwakar Tripathi
Journal:  Genes Genomics       Date:  2020-02-10       Impact factor: 1.839

