M Sathya, M Jeyaselvi, Shubham Joshi, Ekta Pandey, Piyush Kumar Pareek, Sajjad Shaukat Jamal, Vinay Kumar, Henry Kwame Atiglah.
Abstract
Microarray gene expression data contain a large number of genes expressed at varying levels. Because only a few of these genes are critically significant, analyzing and classifying datasets that span the whole gene space is challenging. Discovering biomarker genes is essential for diagnosing cancer and, in turn, for suggesting individualized treatment. The parallelized minimum redundancy maximum relevance ensemble (mRMRe) is first used to select the top m informative genes from a large pool of candidates. The selected genes are then passed to a Genetic Algorithm (GA), which heuristically searches for the optimal gene subset using the Mahalanobis Distance (MD) as the distance metric. The proposed approach (mRMRe-GA) is applied to four microarray datasets, with the Support Vector Machine (SVM) as the classifier. Leave-One-Out Cross-Validation (LOOCV) is used to assess classifier performance. The proposed mRMRe-GA method is compared with other approaches, and the results show that it improves classification accuracy while using fewer genes than previous methods.
Keywords: Microarray, Gene Expression Data, GA, Feature Selection, SVM, Cancer Classification.
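The leave-one-out evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier is a hypothetical stand-in for the SVM used in the paper, and the function names `nearest_centroid_predict` and `loocv_accuracy` are assumptions introduced here, not names from the source.

```python
from typing import Callable, Dict, List


def nearest_centroid_predict(train_x: List[List[float]],
                             train_y: List[int],
                             x: List[float]) -> int:
    """Hypothetical stand-in classifier (NOT the paper's SVM):
    assign x to the class whose mean vector is closest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = -1, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs: List[List[float]],
                   ys: List[int],
                   predict: Callable = nearest_centroid_predict) -> float:
    """Hold out each sample once, train on the rest, and return
    the percentage of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return 100.0 * correct / len(xs)
```

With n samples, LOOCV trains the classifier n times, which is affordable for datasets of the size considered here (tens to about a hundred samples).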
Year: 2022 PMID: 35242297 PMCID: PMC8888099 DOI: 10.1155/2022/5821938
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. The schematic of the mRMRe-GA method.
Figure 2. Chromosome representation.
Figure 3. Crossover representation.
Figure 4. Mutation representation. (a) Before mutation. (b) After mutation.
Algorithm 1. The pseudo-code for the algorithm.
Figure 5. Flowchart of the proposed mRMRe-GA method.
Genetic Algorithm parameters.
| Parameter | Value |
|---|---|
| Maximum no. of generations | 1–100 |
| Population per generation | 20 |
| Probability of crossover | 0.8 |
| Probability of mutation | 0.1 |
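A minimal sketch of a GA loop using the parameter values in the table above (population of 20, crossover probability 0.8, mutation probability 0.1, up to 100 generations). The binary-mask encoding, tournament selection, single-point crossover, per-chromosome bit-flip mutation, and the `toy_fitness` function are all assumptions made for illustration; the source does not specify these details, and the paper's fitness is based on the Mahalanobis Distance rather than the toy score used here.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # assumed encoding: 1 = gene selected, 0 = not selected


def run_ga(n_genes: int,
           fitness: Callable[[Chromosome], float],
           generations: int = 100,
           pop_size: int = 20,
           p_crossover: float = 0.8,
           p_mutation: float = 0.1,
           seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Chromosome:
        # Assumed selection scheme: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring: List[Chromosome] = []
        while len(offspring) < pop_size:
            child1, child2 = tournament()[:], tournament()[:]
            # Single-point crossover with probability p_crossover.
            if n_genes > 1 and rng.random() < p_crossover:
                cut = rng.randint(1, n_genes - 1)
                child1, child2 = (child1[:cut] + child2[cut:],
                                  child2[:cut] + child1[cut:])
            # Flip one random bit per child with probability p_mutation.
            for child in (child1, child2):
                if rng.random() < p_mutation:
                    k = rng.randrange(n_genes)
                    child[k] = 1 - child[k]
                offspring.append(child)
        population = offspring[:pop_size]
        # Remember the best chromosome seen so far.
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(chromosome: Chromosome) -> float:
    """Hypothetical fitness: reward three 'informative' positions and
    penalize subset size. Not the paper's fitness function."""
    informative = {0, 3, 7}
    hits = sum(chromosome[i] for i in informative)
    return hits - 0.1 * sum(chromosome)
```

Calling `run_ga(10, toy_fitness)` returns a 10-bit mask; the size penalty in the toy fitness mirrors the goal of reaching high accuracy with few genes.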
Details of performance parameters.
| Name of the parameter | Condition (positive) | Condition (negative) | Definition | Explanation |
|---|---|---|---|---|
| TPR-true positive rate (sensitivity) | TP-true positive | FN-false negative | TP/(TP + FN) | The closer to 1, the better. TPR = 1 when FN = 0. |
| TNR-true negative rate (specificity) | TN-true negative | FP-false positive | TN/(TN + FP) | The closer to 1, the better. TNR = 1 when FP = 0. |
| FPR-false positive rate | FP-false positive | TN-true negative | FP/(FP + TN) | The closer to 0, the better. FPR = 0 when FP = 0. |
| FNR-false negative rate | FN-false negative | TP-true positive | FN/(FN + TP) | The closer to 0, the better. FNR = 0 when FN = 0. |
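The four rates can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP); the function name `rates` is an assumption introduced here, and the code assumes both classes are present so that no denominator is zero.

```python
from typing import Dict


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return TPR, TNR, FPR and FNR from confusion-matrix counts.

    Assumes tp + fn > 0 and tn + fp > 0 (both classes present).
    """
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),  # equals 1 - TNR
        "FNR": fn / (fn + tp),  # equals 1 - TPR
    }
```

For example, `rates(tp=40, fp=5, tn=45, fn=10)` gives a sensitivity of 0.8 and a specificity of 0.9; TPR and FNR always sum to 1, as do TNR and FPR.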
Figure 6. The performance of the SVM classifier with genes selected from mRMRe.
The performance comparison of SVM kernel functions.
| Dataset | No. of genes | SVM accuracy (%): linear | SVM accuracy (%): radial basis | SVM accuracy (%): polynomial | SVM accuracy (%): sigmoid |
|---|---|---|---|---|---|
| Colon | 5 | 87.10 | 87.10 | 75.81 | 88.71 |
| | 10 | 85.48 | 90.32 | 64.52 | 88.71 |
| | 20 | 88.71 | 90.32 | 64.52 | 88.71 |
| | 30 | 83.87 | 90.32 | 64.52 | 90.32 |
| | 40 | 87.10 | 90.32 | 64.52 | 90.32 |
| | 50 | 83.87 | 90.32 | 64.52 | 90.32 |
| | 60 | 83.87 | 88.71 | 64.52 | 90.32 |
| | 70 | 80.65 | 88.71 | 64.52 | 88.71 |
| | 80 | 82.26 | 88.71 | 64.52 | 87.10 |
| | 90 | 82.26 | 88.71 | 64.52 | 88.71 |
| | 100 | 82.26 | 88.71 | 64.52 | 88.71 |
| DLBCL outcome | 5 | 82.76 | 81.03 | 55.17 | 68.97 |
| | 10 | 86.21 | 87.93 | 55.17 | 82.76 |
| | 15 | 91.38 | 98.28 | 55.17 | 89.66 |
| | 20 | 91.38 | 91.38 | 55.17 | 87.93 |
| | 30 | 84.48 | 89.66 | 58.62 | 87.93 |
| | 40 | 84.48 | 89.66 | 55.17 | 89.66 |
| | 50 | 87.93 | 94.83 | 62.07 | 86.21 |
| | 60 | 87.93 | 93.10 | 55.17 | 89.66 |
| | 70 | 87.93 | 91.38 | 55.17 | 93.10 |
| | 80 | 86.21 | 91.38 | 55.17 | 87.93 |
| | 90 | 89.66 | 91.38 | 55.17 | 87.93 |
| | 100 | 87.93 | 93.10 | 55.17 | 89.66 |
| Leukemia | 5 | 94.44 | 100 | 83.33 | 98.61 |
| | 10 | 94.44 | 97.22 | 93.06 | 97.22 |
| | 20 | 97.22 | 97.22 | 91.67 | 97.22 |
| | 30 | 95.83 | 94.44 | 94.44 | 97.22 |
| | 40 | 98.61 | 98.61 | 94.44 | 95.83 |
| | 50 | 98.61 | 95.83 | 94.44 | 93.06 |
| | 60 | 98.61 | 98.61 | 94.44 | 95.83 |
| | 70 | 98.61 | 97.22 | 91.67 | 98.61 |
| | 80 | 98.61 | 98.61 | 90.28 | 95.83 |
| | 90 | 98.61 | 97.22 | 91.67 | 95.83 |
| | 100 | 98.61 | 98.61 | 90.28 | 97.22 |
| Prostate | 5 | 87.25 | 87.25 | 50.98 | 86.27 |
| | 10 | 83.33 | 92.16 | 50.98 | 91.18 |
| | 20 | 90.20 | 97.06 | 50.98 | 98.04 |
| | 30 | 95.10 | 97.06 | 50.98 | 91.10 |
| | 40 | 93.14 | 98.04 | 50.98 | 98.04 |
| | 50 | 97.06 | 98.04 | 50.98 | 98.04 |
| | 60 | 97.06 | 97.06 | 50.98 | 99.02 |
| | 70 | 99.02 | 100 | 88.24 | 99.02 |
| | 80 | 100 | 100 | 94.11 | 100 |
| | 90 | 98.04 | 99.02 | 97.06 | 99.02 |
| | 100 | 98.04 | 100 | 95.10 | 99.02 |
Description of Microarray datasets.
| Name of the dataset | No. of samples | No. of genes | No. of classes |
|---|---|---|---|
| Colon | 62 | 2000 | 2 |
| DLBCL outcome | 58 | 7129 | 2 |
| Leukemia | 72 | 7129 | 2 |
| Prostate | 102 | 12600 | 2 |
Figure 7. Comparison of the mRMRe-GA method with other gene selection methods for four microarray datasets. (a) Colon. (b) DLBCL Outcome. (c) Leukemia. (d) Prostate.
Performance measures of the proposed mRMRe-GA method for four microarray datasets.
| Dataset | # Genes | Accuracy (%) | Sensitivity (%) | Specificity (%) | | Kappa value |
|---|---|---|---|---|---|---|
| Colon | 4 | 100 | 100 | 100 | 2.542 | 1 |
| DLBCL outcome | 6 | 100 | 100 | 100 | 1.209 | 1 |
| Leukemia | 3 | 100 | 100 | 100 | 7.874 | 1 |
| Prostate | 5 | 100 | 100 | 100 | 2.887 | 1 |
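The kappa value reported in the table can be related to the confusion matrix through Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below is a minimal two-class version; the function name `cohen_kappa` and the counts in the usage note are hypothetical, and the code assumes the chance agreement is below 1 (that is, more than one class occurs).

```python
def cohen_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    """Cohen's kappa for a binary confusion matrix.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the row and column
    totals. Assumes p_e < 1.
    """
    n = tp + fp + tn + fn
    p_o = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals per class.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)
```

When every sample is classified correctly (FP = FN = 0), the observed agreement is 1 and kappa equals 1, which is consistent with the 100% accuracy rows above. For instance, the hypothetical counts `cohen_kappa(tp=22, fp=0, tn=40, fn=0)` give 1.0.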
Comparison of the mRMRe-GA with other methods.
| Algorithms | Colon | | DLBCL outcome | | Leukemia | | Prostate | |
|---|---|---|---|---|---|---|---|---|
| | #Genes | Accuracy | #Genes | Accuracy | #Genes | Accuracy | #Genes | Accuracy |
| mRMRe-GA | 4 | 100 | 6 | 100 | 3 | 100 | 5 | 100 |
| GBC (Alshamlan et al. [ | 10 | 98.38 | 4 | 100 | ||||
| mRMR-ABC (Alshamlan et al. [ | 15 | 96.77 | 14 | 100 | ||||
| Co-ABC (Alshamlan [ | 9 | 96.77 | 3 | 100 | ||||
| COA-HS (Elyasigomari et al. [ | 5 | 100 | 6 | 100 | 5 | 100 | ||
| GA (Peng et al. [ | 12 | 93.55 | 6 | 100 | ||||
| mRMR-GA (Akadi et al. [ | 5 | 95.61 | 45 | 87.93 | 15 | 100 | 50 | 96.08 |
| PSO (Shen et al. [ | 20 | 85.48 | 23 | 94.44 | ||||
| mRMR-PSO (Abdi et al. [ | 10 | 90.32 | 18 | 100 | ||||
| GA-SVM (Gunavathi and Hemalatha [ | 10 | 95 | 10 | 77.27 | 10 | 95.45 | 10 | 92.68 |
| AACO (Xiong and Wang [ | 4 | 96.77 | 3 | 100 | ||||
| GADP (Lee and Leu [ | 8 | 100 | 5 | 100 | ||||
| CS (Gunavathi and Premalatha [ | 10 | 95 | 10 | 72.72 | 10 | 95.45 | 10 | 92.68 |