| Literature DB >> 36246115 |
Surbhi Gupta1,2, Manoj K Gupta1, Mohammad Shabaz2, Ashutosh Sharma3.
Abstract
Cancer is one of the leading causes of death globally. Recently, microarray gene expression data have been used to aid the effective and early detection of cancer. DNA microarray technology, which can measure the expression levels of thousands of genes simultaneously in a single experiment, holds enormous promise for uncovering such information, and gene expression analysis is critical in many areas of biological research. This study reviews the research focused on optimizing gene selection for cancer detection using artificial intelligence. One of the most challenging issues is extracting meaningful information from massive databases. Deep learning architectures have performed efficiently in numerous sectors; they are used to diagnose many chronic diseases and to assist physicians in making medical decisions. In this study, we evaluate the results of different optimizers on an RNA-sequence dataset. The deep learning algorithm proposed in the study classifies five forms of cancer: kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and colon adenocarcinoma (COAD). We compare the performance of different optimizers: stochastic gradient descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient (AdaGrad), and Adaptive Moment Estimation (Adam). The experimental results gathered on the dataset affirm the strong performance of AdaGrad and Adam. A performance analysis has also been carried out using different learning rates and decay rates. Finally, this study discusses current advancements in deep learning-based gene expression data analysis using optimized feature selection methods.
Keywords: Rna-sequences; artificial intelligence; cancer; deep learning; gene expression
Year: 2022 PMID: 36246115 PMCID: PMC9563992 DOI: 10.3389/fphys.2022.952709
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.755
FIGURE 1 Cancer cases and deaths in 2020.
FIGURE 2 Year-wise distribution of articles.
FIGURE 3 PRISMA search strategy.
FIGURE 4 Artificial intelligence and sub-parts.
FIGURE 7 Long short-term memory.
• The input gate controls the flow of new data into the memory.
• The forget gate discards irrelevant or unnecessary information.
• The output gate regulates the information the cell emits.
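The three gates listed above can be sketched as a single LSTM cell step in NumPy. This is an illustrative implementation, not the authors' code; the parameter layout, hidden size, and random inputs are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold stacked parameters for the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    z = W @ x + U @ h_prev + b          # pre-activations, shape (4*hidden,)
    H = h_prev.size
    i = sigmoid(z[0*H:1*H])             # input gate: admits new information
    f = sigmoid(z[1*H:2*H])             # forget gate: drops irrelevant memory
    o = sigmoid(z[2*H:3*H])             # output gate: regulates what is emitted
    g = np.tanh(z[3*H:4*H])             # candidate cell content
    c = f * c_prev + i * g              # updated cell state (the "memory")
    h = o * np.tanh(c)                  # hidden state exposed as output
    return h, c

# Toy usage: hidden size 4, input size 3, random parameters.
rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):       # five time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)                 # (4,) (4,)
```

Because the cell state `c` is updated additively (gated by `f` and `i`) rather than overwritten, the cell can retain a value for many time steps, which is the property the figure's gates exist to control.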
Distinction between deep and traditional learning.
| Feature | Traditional learning | Deep learning |
|---|---|---|
| Extraction and representation of features | Traditional learning relies on manually crafted, application-specific feature vectors. Complex characteristics are difficult to model this way | Deep learning approaches can learn characteristics from raw sensor data and determine the best pattern for enhancing recognition accuracy |
| Diversity and generalization | Traditional learning relies on labeled sensor data and uses dimensionality reduction strategies to focus on feature selection | Deep learning allows the extraction of intricate properties from complex data |
| Data preparation | Traditional learning derives features from sensor data based on appearance and active windows | Extensive pre-processing and standardization of data are not required in deep learning |
| Changes in activities' temporal and spatial dimensions | In traditional learning, handcrafted features are ineffective and unsuitable for resolving intra-class variability and inter-class relationships | Intra-class variability can be addressed by using hierarchical and translation-invariant features |
| Model training and execution time | In traditional learning, even small datasets can train the model, with low computation time and space usage | Deep learning requires vast sensor datasets to avoid overfitting; training is accelerated using a graphics processing unit (GPU) |
FIGURE 5 Artificial intelligence.
FIGURE 6 Convolutional neural network. • Long Short-Term Memory (LSTM) Network: Hochreiter and Schmidhuber collaborated to create the LSTM (Lecun et al., 2015), which is utilized in various applications; IBM chose LSTMs primarily for voice recognition. The LSTM employs a memory unit known as a cell that can retain its value for an extended period, helping the network remember the most recently computed value. The cell comprises three gates that regulate the movement of data inside the unit. Figure 7 shows the logical structure of an LSTM model.
Research analysis.
| Study | Cancer dataset | Objective | Technique | Accuracy |
|---|---|---|---|---|
| | Leukemia | Cancer classification employing microarray gene-expression data using deep learning | ANN | 98% |
| | 12 selected types of cancer | Cancer type classification using deep learning and somatic point mutations | DeepGene | 94% |
| | 6 different cancers | Cancer classification using microarray data and a genetic algorithm | Genetic Algorithm | 94% |
| | 5 gene microarray datasets | Microarray data classification using a novel hybrid method | Artificial Bee Colony (ABC) | 95% |
| | Colon and leukemia data | Implementing deep neural networks for cancer classification | GOA-based DBN | 95% |
| | Breast cancer | Integrated deep neural networks to predict breast cancer | Deep-SVM | 70% |
| | Breast cancer | Relevant gene identification for better cancer classification | Stacked Denoising Autoencoder (SDAE) | 98% |
| | 3 cancer databases | Investigating RNA-sequence gene expression data utilizing deep learning | Regularized linear model (standard LASSO) and two deep learning models | 75% |
| | TCGA | Analyzing the effect of metaheuristic iteration on neural networks in cancer data | GA and FWA | 98% |
| | TCGA RNA-sequence data | Evaluating a deep learning technique for tumor detection | Cox-nnet | -- |
| | TCGA LUAD | Examining the relationship between specific gene mutations and lung cancer survival | Information gain, chi-squared test | -- |
| | RNA-sequence datasets of three cancers | Analyzing a deep learning technique to predict cancer employing RNA-sequence data | Sparse Auto-Encoder (SSAE) | 98% |
| | TCGA leukemia | Introduced deep learning to predict leukemia prognosis | Stacked Autoencoders | 83% |
| | 10 microarray datasets | Implementing a novel strategy for gene selection based on a hybrid technique | Hybrid bat-inspired algorithm | 100% |
| | Gene expression data of liver cancer | Cancer gene recognition using a neuro-fuzzy approach | Neuro-fuzzy method | 96% |
| | TCGA | Recognition of cancer tissues using RNA-sequence data | Deep neural network (DNN) | 99.7% |
| | Two RNA-seq expression datasets | Extracting features for RNA-sequence data classification | Forest Deep Neural Network (fDNN) | 90.4% |
| | Multiple cancer datasets | Cancer subtype classification using RNA-sequence gene expression data | BCDForest | 92.8% |
| | mRNA datasets from the GDC repository | Cancer type recognition using neural networks | Deep learning models | 98% |
| | 36 datasets from the GEMLeR repository | Implemented transfer learning for molecular cancer classification | Sparse autoencoders on gene expression data | 98% |
| | LUAD | Lung cancer subtype classification using a deep learning model | Sparse Cross-modal Superlayered Neural Network | 99% |
| | Gene expression data | Cancer subtype prediction using gene expression data | Deep cancer subtype classification (DeepCC) | 90% |
| | 8 microarray cancer datasets | Deep neural networks for classifying microarray cancer data | 7-layer deep neural network architecture | 90% |
| | TCGA | Developed a hybrid approach for classifying RNA-sequence data | Deep convolutional neural network (DCNN) | 95% |
| | RNA-seq gene expression data | Cancer subtype classification | Deep flexible neural forest (DFNForest) | 76% |
| | LUAD, BRCA, and STAD | Cancer type prediction using a deep learning model | Ensemble-based approach | 97% |
| | RNA-sequence data from the Pan-Cancer Atlas | Cancer type classification using RNA-sequence data | DeepGx convolutional neural network (CNN) | 95.65% |
| | TCGA cancers | Cancer survival prediction from RNA-sequence data | AECOX (autoencoder with Cox regression network) | -- |
| | TCGA stomach cancer dataset | Stomach cancer prediction using gene expression data | CNN | 96% |
| | 5 types of cancer | Multiclass cancer classification of gene expression RNA-sequence data | Extreme Learning Machine algorithm | 98.81% |
| | TCGA | Cancer prediction using gene expression data | NN, SVM, KNN, RF | 94% |
| | 31 tumor types | Prediction of cancer survival using gene-expression data | Transfer learning with CNN | 73% |
| | 10 most common UCI cancer datasets | Analyzed microarray cancer data using deep neural networks | Elephant search optimization-based deep learning approach | 92% |
| | 15 different cancer types | Prediction of the tissue of origin of cancer types on the basis of RNA-sequence data | Novel NN model | 80% |
| | Diabetes, heart, and cancer datasets | Disease prediction model for the healthcare system | Neural network-based ensemble learning | 100% |
| | RNA-seq data of three datasets | Cancer survival analysis for microarray datasets | AutoCox and AutoRandom | 98% |
| | Prostate cancer patients | Prediction of lymph node metastasis directly from tumor histology in prostate malignancy | Convolutional neural network | 62% |
| | 311 NSCLC patients at Massachusetts General Hospital | Tumor detection using CT images | Convolutional neural network | 71% |
| | Cervical cancer dataset | Prediction of cervical cancer risk factors | Ensemble model | 99.7% |
| | Five benchmark datasets | Cancer diagnosis analysis with imbalanced classes | Stacked Ensemble Model | 98% |
FIGURE 8 Accuracy of multiple optimizers.
FIGURE 9 Performance of multiple learning rates.
FIGURE 10 Performance of multiple decay rates.
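The optimizer comparison behind the figures above can be illustrated with the standard update rules of SGD, AdaGrad, RMSProp, and Adam on a toy one-dimensional problem. This is a minimal NumPy sketch, not the study's code; the learning rate, decay rate, step count, and starting point are illustrative assumptions.

```python
import numpy as np

def run_optimizer(update, steps=200, lr=0.1):
    # Minimize f(w) = w^2 from w = 5 to contrast the update rules.
    w, state = 5.0, {}
    for _ in range(steps):
        grad = 2.0 * w
        w = update(w, grad, state, lr)
    return w

def sgd(w, g, s, lr):
    return w - lr * g                               # plain gradient step

def adagrad(w, g, s, lr, eps=1e-8):
    s["G"] = s.get("G", 0.0) + g * g                # accumulate squared grads
    return w - lr * g / (np.sqrt(s["G"]) + eps)     # per-step shrinking rate

def rmsprop(w, g, s, lr, decay=0.9, eps=1e-8):
    # "decay" here is the moving-average decay rate varied in Figure 10.
    s["G"] = decay * s.get("G", 0.0) + (1 - decay) * g * g
    return w - lr * g / (np.sqrt(s["G"]) + eps)

def adam(w, g, s, lr, b1=0.9, b2=0.999, eps=1e-8):
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g        # first moment
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g * g    # second moment
    m_hat = s["m"] / (1 - b1 ** s["t"])                 # bias correction
    v_hat = s["v"] / (1 - b2 ** s["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, fn in [("SGD", sgd), ("AdaGrad", adagrad),
                 ("RMSProp", rmsprop), ("Adam", adam)]:
    print(f"{name:8s} final w = {run_optimizer(fn):+.6f}")
```

The sketch makes the tuning sensitivity visible: AdaGrad's accumulated denominator shrinks its effective learning rate over time, while RMSProp and Adam keep a decayed moving average, so the chosen learning rate and decay rate strongly affect how close each run gets to the minimum.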
FIGURE 11 Deep learning for cancer classification.