Tanima Thakur, Isha Batra, Monica Luthra, Shanmuganathan Vimal, Gaurav Dhiman, Arun Malik, Mohammad Shabaz.
Abstract
Cancer is one of the deadliest diseases, and as the number of cases grows, its detection and treatment become essential. Researchers have developed various methods based on gene expression, the process by which the information in deoxyribonucleic acid (DNA) is transcribed into ribonucleic acid (RNA) and RNA is then translated into protein. These proteins serve many purposes, for example in creating cells, developing cancer drugs, and even producing hybrid species. Because genes carry genetic information from one generation to the next, some genetic defects are also passed on, so these defects need to be detected. Many techniques in the literature predict cancerous and noncancerous genes from gene expression data, which is an important development for both diagnosis and prognosis. This paper reviews some of these techniques, the datasets on which they have been implemented, and their advantages and disadvantages.
Year: 2021 PMID: 34545300 PMCID: PMC8449724 DOI: 10.1155/2021/4242646
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
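The central task described in the abstract, labeling an expression profile as cancerous or noncancerous, can be illustrated with a k-nearest-neighbour (KNN) classifier, one of the methods listed in the table below (e.g., rows 5 and 9), scored with leave-one-out cross-validation (LOO-CV) as in row 18. This is a minimal sketch under stated assumptions, not the procedure of any reviewed paper: the expression values, the three-gene profiles, the labels, Euclidean distance, and k = 3 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training profiles."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], sample))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k=3):
    """LOO-CV: hold out each sample once, train on the rest, count hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: each row is one sample, each column one gene's expression level.
X = [
    [8.1, 2.0, 5.5], [7.9, 2.3, 5.1], [8.4, 1.8, 5.9],  # labeled "cancer"
    [3.2, 6.7, 5.2], [2.9, 7.1, 4.8], [3.5, 6.4, 5.4],  # labeled "normal"
]
y = ["cancer"] * 3 + ["normal"] * 3

print(knn_predict(X, y, [7.5, 2.5, 5.0]))  # cancer
print(loocv_accuracy(X, y))                # 1.0
```

LOO-CV is used here because microarray studies such as those in rows 1 and 2 often have only a few dozen samples, so holding out one sample at a time wastes as little training data as possible. Real expression data has thousands of genes per sample, which is why many of the reviewed methods pair a classifier with a feature selection step.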
Review of various cancer prediction techniques.
| Sr. no. | Paper name | Objective | Technique/tool | Dataset | Findings |
|---|---|---|---|---|---|
| 1 | [ | To design a method to classify and predict classes of cancer | Neighborhood analysis, DNA microarrays, self-organizing maps | 27 ALL samples from the Dana-Farber Cancer Institute, 11 adult AML samples from the Cancer and Leukemia Group B (CALGB) leukemia cell bank | Feasible method, but proper experimental care is required |
| 2 | [ | To classify cancer samples from gene expression data | Computational analysis, Affymetrix oligonucleotide arrays, neighborhood analysis, GeneCluster software | 38 leukemia samples for training (11 AML, 27 ALL); 34 samples for testing (14 AML, 20 ALL) | Genes with no correlation provide a better result; the median prediction strength is 0.86 |
| 3 | [ | To identify specific categories of cancer from their gene expression | ANNs, cDNA microarrays, DeArray software | NCI, ATCC, MSKCC, CHTN, DZNSG, National Institutes of Health | Works with nonlinear features; robust; achieves high sensitivity and specificity |
| 4 | [ | To create a framework for predicting predefined tumor classes | Compound covariate prediction, BRB ArrayTools | Hereditary breast cancer dataset of 22 patients [ | Provides a good setting for comparing prediction methods; requires some improvements |
| 5 | [ | To develop a classification system for DNA microarray gene expression data | SOMs, Cluster and TreeView software, PCA, KNN | Multiple datasets, e.g., one with 99 samples and another with 42 samples | Gene expression provides an excellent way of diagnosing patients with medulloblastoma |
| 6 | [ | To propose a method that performs classification on the basis of interval-scaled attributes | PCA, FA, fuzzy FA | 203 samples (a subset of the actual dataset used in [ | Used successfully in supervised learning; FA provides more information than surgical-pathological staging |
| 7 | [ | To propose a gene feature selection method | Multiple SVM-RFE (MSVM-RFE) | Four gene expression datasets from the Kent Ridge Bio-Medical Data Set Repository | MSVM-RFE achieves better classification accuracy than SVM-RFE, improving SVM performance |
| 8 | [ | To propose a framework for integrating different data types | Generalized singular value decomposition | Fourteen breast cancer cell lines from the American Type Culture Collection | Analyzes gene expression and copy number data; could be extended to other data types |
| 9 | [ | To propose a method for identifying tumor tissues from gene expression data | ssEAM, PSO | NCI60, acute leukemia, and ALL datasets | ssEAM performs better than PNN, ANN, LVQ1, and KNN at a 0.05 significance level |
| 10 | [ | To present a feature selection method for analyzing gene expression data | RBF neural network, rough-based feature selection method, naïve Bayes, linear SVM | ALL, AML, lung cancer, and prostate cancer datasets ( | Best classification accuracy of 99.8% |
| 11 | [ | To present a framework for discovering cancer classes | Permutation technique, cluster ensemble, cluster validity index (DAI) | 3 synthetic and 4 real datasets (leukemia [ | DAI correctly finds the number of classes and outperforms other existing methods |
| 12 | [ | To present a gene expression-based method for classifying NSCLC | Hierarchical clustering, Spotfire DecisionSite, proportional hazards model | 91 NSCLC and six normal lung tissue samples from GSE3526 (Duke University) | Gene signatures provide the best way for histopathological classification |
| 13 | [ | To propose a classifier predicting disease in CRC patients | Agilent 44K oligonucleotide arrays, Kaplan–Meier method, unsupervised hierarchical clustering | 188 training samples (NCI, LUMC, SGH) and 206 testing samples (Institut Català d'Oncologia, Spain) | 86% of patients in the validation dataset are identified as low-risk; first prognostic technique for CRC |
| 14 | [ | To propose a framework that combines genome-wide copy number and expression data | L1-L2 constrained regression, local and global search strategies | 89 breast cancer samples (UC San Francisco and California Pacific Medical Center [ | Outperforms other existing methods in accuracy |
| 15 | [ | To propose a framework that combines other models describing gene interactions | Bayesian model, Gibbs distribution, ANOVA test, parallel programming with GPU/CPU | GSE4290, DREAM dataset | Achieves a specificity of 0.99; performs better than Enet and VAR |
| 16 | [ | To propose an extended framework for breast tumor segmentation | Multichannel MRFs, kinetic observation model, Gaussian mixture model | DCE-MRI images of breast cancer | AOC of 0.9 with multichannel MRF, compared with 0.89 with single-channel MRF; better segmentation results when applied to SVM |
| 17 | [ | To propose a gene selection method | LSLS, wrapper method, SVM | Six datasets from the Kent Ridge Biomedical Data Repository | LSLS performs better than KW and SPFS |
| 18 | [ | To present a novel method for classifying tumor samples | RPCA, LDA, SVM | Nine publicly available datasets (acute leukemia data [ | Performance is measured using LOO-CV, accuracy, and AUC; feasible and effective |
| 19 | [ | To propose a deep learning method for inferring target gene expression | D-GEX | Microarray GEO dataset, RNA-Seq-based GTEx dataset | Outperforms linear regression (15.33% relative improvement) and KNN; lower error for most genes (81.31%) |
| 20 | [ | To develop a fused network for identifying KIRC stages | Gene expression and DNA methylation data, SNF, SNFTool, sparse partial least squares regression, LASSO label prediction method | The Cancer Genome Atlas KIRC data (TCGA data portal) | Higher prediction accuracy than KNN, MLW, and WDC; robust |
| 21 | [ | To classify widely and rarely expressed genes | Incremental feature selection method, mRMR, RNN | Gene expression dataset from the Human Protein Atlas [ | GO terms and KEGG are used for functional-level analysis; Youden's indices are 0.739 and 0.639 for normal and cancer tissues, respectively |
| 22 | [ | To develop a lightweight CNN for classifying breast cancer | CNN, array-array intensity correlation, RStudio, batch normalization | Breast cancer dataset from the Pan-Cancer Atlas | Achieves 98.76% accuracy |
| 23 | [ | To propose a method for classifying different types of cancer | BPSO-DT, CNN, deep learning | RNA sequencing values from tumor samples/tissues of different cancer types, available in Mendeley datasets | Achieves 96.90% accuracy; also evaluated with recall, precision, and F1 score |
| 24 | [ | To propose an NMF-based method for tumor classification | NMF, SNMF, SVM | Colon cancer dataset [ | Effective and efficient; the effect of sparseness is low |
| 25 | [ | To propose a model for biclustering gene expression data | PCA, GLPCA, DHPCA | SRBCT, medulloblastoma, colon cancer, 11_Tumors | Provides better accuracy than PCA, GLPCA, GNMF, ONMTF, and NMTFCoS |
| 26 | [ | To present a framework for predicting gene expression using nonlinear features | Unsupervised clustering algorithm, L-GEPM, LSTM neural network | GEO data from the LINCS cloud, GTEx, and 1000G RNA-Seq data | Performs better than D-GM, LR-L1, and KNN-R; predicted target gene expression is much closer to the actual values; flexible and superior for nonlinear features |
| 27 | [ | To propose a multilayer framework for classifying multiple cancer tissues | CNN, RNA sequencing, supervised learning, stochastic gradient descent optimization, back-propagation | 11,093 samples from The Cancer Genome Atlas | 98.93% overall accuracy and 0.99 AUC |
| 28 | [ | To propose a gene selection method for classifying tissues in multicategory datasets | PLS, linear support vector classifier, MATLAB, OSU_SVM3.00 toolbox (linear SVC), SVM | MIT AML/ALL dataset, SRBCT dataset | Efficient and robust; works well for both two-category and multicategory datasets |
| 29 | [ | To propose an ST model for finding the effects of CNAs | LST and NA, dynamic modeling, transcriptional bursting, transcriptional oscillation, circular binary segmentation | NCBI/GEO database | Shows how mathematical theory can be used to investigate the findings and to better understand cancer biology |
| 30 | [ | To propose a multi-fusion-based method for profiling gene expression under nonthermal plasma treatment | Dempster–Shafer method, fuzzy C-means clustering method, MATLAB R2016b | NCBI Gene Expression Omnibus (GSE59997) | Reduces uncertainty and increases reliability; fuzzy C-means identifies gene changes under various nonthermal plasma treatments |
| 31 | [ | To present a survey of 1D CNNs and their applications | NA | NA | 1D CNNs work well with small datasets, when fewer computations are required, and when a low-cost implementation is needed |
| 32 | [ | To propose a classification method for ECG signal images based on 2D CNN | CNN, Intel i7-5930K CPU, NVIDIA GTX 1080 GPU | MIT-BIH Arrhythmia database | 2D CNN outperforms 1D CNN and is more accurate and robust; 1D CNN works well with limited data |
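Several rows report their findings as sensitivity and specificity (rows 3 and 15), Youden's index (row 21), accuracy, or AUC (rows 18 and 27). The sketch below shows how these quantities relate under their standard definitions: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), Youden's J = sensitivity + specificity − 1, and AUC computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The labels, classifier scores, and the 0.5 decision threshold are hypothetical; they are not taken from any reviewed study.

```python
def confusion_counts(y_true, y_pred, positive="cancer"):
    """Return (TP, FN, TN, FP) for a binary labelling."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def summary(y_true, y_pred, positive="cancer"):
    """Accuracy, sensitivity, specificity, and Youden's J at a fixed threshold."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
    }


def auc_rank(y_true, scores, positive="cancer"):
    """Threshold-free AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win; this equals the Mann-Whitney U statistic
    divided by (number of positives * number of negatives).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (higher score = more likely cancer).
labels = ["cancer"] * 4 + ["normal"] * 4
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.6]
preds = ["cancer" if s > 0.5 else "normal" for s in scores]

print(summary(labels, preds))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'youden_j': 0.5}
print(auc_rank(labels, scores))
# 0.875
```

The split between the two functions reflects a real difference in the metrics: accuracy, sensitivity, specificity, and Youden's J depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is one reason accuracy and AUC figures from different rows of the table are not directly comparable.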
Figure 1: Comparison of various prediction techniques.