Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification.

Kwang Ho Park, Erdenebileg Batbaatar, Yongjun Piao, Nipon Theera-Umpon, Keun Ho Ryu

Abstract

Hematopoietic cancer is a malignant transformation in immune system cells. Because it is characterized by the cells that are expressed, its heterogeneities are usually difficult to distinguish during hematopoiesis. Traditional approaches to cancer subtyping rely on statistical techniques. Moreover, minor cancer subtypes provide too few samples to build a classification model without overfitting. We therefore propose to build a classification model for five major subtypes using two kinds of losses, a reconstruction loss and a classification loss, and to extract suitable features using a deep autoencoder. To address the data imbalance problem, we apply an oversampling algorithm, the synthetic minority oversampling technique (SMOTE). To validate the proposed autoencoder-based feature extraction approach for hematopoietic cancer subtype classification, we compared it with traditional feature selection algorithms (principal component analysis and non-negative matrix factorization) and with classification algorithms combined with SMOTE oversampling. We also applied the Shapley Additive exPlanations (SHAP) interpretation technique to our model to explain which genes/proteins are important for hematopoietic cancer subtype classification. In addition, we compared five widely used classification algorithms: logistic regression, random forest, k-nearest neighbor, artificial neural network and support vector machine.
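As a rough illustration of training on two kinds of losses, the sketch below adds a reconstruction term to a classification term for a single sample. It is not the authors' implementation: the mean squared error as reconstruction loss, the focal form of the classification loss, the `weight` and `gamma` parameters, and all function names are assumptions made for this example. With `gamma=0` the classification term reduces to ordinary cross-entropy.

```python
import math

def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, label, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. gamma is an
    assumed focusing parameter; gamma=0 gives plain cross-entropy.
    """
    p_t = max(softmax(logits)[label], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def combined_loss(x, x_hat, logits, label, weight=1.0, gamma=2.0):
    """Reconstruction loss plus a weighted classification loss.

    `weight` is a hypothetical trade-off factor, not a value from the paper.
    """
    return mse(x, x_hat) + weight * focal_loss(logits, label, gamma)
```

In a real autoencoder, `x_hat` would come from the decoder and `logits` from a classifier head attached to the bottleneck, so that both terms shape the extracted features.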
The autoencoder-based feature extraction approaches performed well. The best result came from the support vector machine with SMOTE oversampling, using autoencoder (AE) features learned with both focal loss and reconstruction loss as the loss function. As subtype classification performance measures, it achieved 97.01% accuracy, 92.60% recall, 99.52% specificity, 93.54% F1-measure, 97.87% G-mean and 95.46% index of balanced accuracy.
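The imbalance-aware measures named above can be computed from a multiclass confusion matrix by treating each class one-vs-rest. The sketch below is illustrative only: it assumes the common definitions G-mean = sqrt(recall × specificity) and index of balanced accuracy = (1 + α(recall − specificity)) × recall × specificity with α = 0.1, and macro-averaging across classes. The abstract does not state which exact variants or averaging the authors used, and the function names are hypothetical.

```python
import math

def per_class_metrics(cm, alpha=0.1):
    """One-vs-rest recall, specificity, G-mean and IBA for each class.

    cm[i][j] counts samples of true class i predicted as class j.
    alpha is the assumed IBA weighting factor.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k samples predicted as other classes
        fp = sum(cm[r][k] for r in range(n)) - tp   # other classes predicted as class k
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        g_mean = math.sqrt(recall * specificity)
        iba = (1 + alpha * (recall - specificity)) * recall * specificity
        results.append({"recall": recall, "specificity": specificity,
                        "g_mean": g_mean, "iba": iba})
    return results

def macro_average(results, key):
    """Unweighted mean of one metric across classes."""
    return sum(r[key] for r in results) / len(results)

# Toy two-class confusion matrix, invented for illustration
toy_cm = [[8, 2],
          [1, 9]]
toy_metrics = per_class_metrics(toy_cm)
```

Because every class contributes equally to a macro-average, a poorly recognized minority subtype lowers these scores even when overall accuracy stays high.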

Keywords:  autoencoder; bioinformatics; cancer classification; data mining; hematopoietic cancer; machine learning; subtype classification

Year: 2021; PMID: 33672300; PMCID: PMC7926954; DOI: 10.3390/ijerph18042197

Source DB: PubMed; Journal: Int J Environ Res Public Health; ISSN: 1660-4601; Impact factor: 3.390

