A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data.

Reinel Tabares-Soto, Simon Orozco-Arias, Victor Romero-Cano, Vanesa Segovia Bucheli, José Luis Rodríguez-Sotelo, Cristian Felipe Jiménez-Varón.

Abstract

Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastroesophagus, kidney, liver, lung, ovarian, pancreas, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative studies of different algorithms can be found in the literature. On the other hand, advances in both hardware and software technologies have fostered considerable improvements in the precision of solutions that use ML, such as Deep Learning (DL). In this study, we compare the most widely used algorithms in classical ML and DL to classify the tumors described in the 11_tumor database. We obtained tumor identification accuracies between 90.6% (Logistic Regression) and 94.43% (Convolutional Neural Networks) using k-fold cross-validation. We also show that a tuning process may or may not significantly improve an algorithm's accuracy. Our results demonstrate an efficient and accurate classification method based on gene expression (microarray data) and ML/DL algorithms, which facilitates tumor type prediction in a multi-cancer-type scenario.
© 2020 Tabares-Soto et al.
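
The k-fold cross-validation comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code or pipeline: the data are synthetic Gaussian clusters standing in for expression profiles, the two classifiers (nearest centroid and 1-nearest-neighbor) are simple stand-ins rather than the algorithms benchmarked in the paper, and every function name below is hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose mean profile is closest to x (stand-in model)."""
    sums, counts = {}, {}
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def sq_dist(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=sq_dist)


def one_nn_predict(train_X, train_y, x):
    """Predict the label of the single closest training sample (stand-in model)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def cross_val_accuracy(predict, X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    fold_scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in fold)
        fold_scores.append(correct / len(fold))
    return sum(fold_scores) / k


def make_synthetic_expression(n_classes=3, per_class=20, n_genes=10, seed=1):
    """Toy data (an assumption, not the 11_tumor database): one cluster per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        center = [rng.uniform(-5, 5) for _ in range(n_genes)]
        for _ in range(per_class):
            X.append([c + rng.gauss(0, 1) for c in center])
            y.append(label)
    return X, y


X, y = make_synthetic_expression()
results = {
    "nearest_centroid": cross_val_accuracy(nearest_centroid_predict, X, y),
    "one_nn": cross_val_accuracy(one_nn_predict, X, y),
}
for name, acc in results.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```

Using the same seed for both calls means both classifiers are scored on identical folds, which keeps the comparison fair. A real study would more likely rely on an established library and stratified folds so that every tumor class appears in each training split.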

Keywords:  11_tumor database; Bioinformatics; Cancer classification; Deep Learning; Machine Learning; Microarray gene expression

Year:  2020        PMID: 33816921      PMCID: PMC7924492          DOI: 10.7717/peerj-cs.270

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992


