Sunil Kumar Prabhakar, Harikumar Rajaguru, Sun-Hee Kim.
Abstract
Colon cancer is one of the deadliest diseases affecting the large intestine. It typically affects older adults, though it can occur at any age. It generally starts as a small benign growth of cells on the inside of the colon that later develops into cancer. Colon cancer is caused by the propagation of somatic alterations that affect gene expression. DNA microarray technology provides a standardized format for assessing the expression levels of thousands of genes, and the gene expression patterns it captures can distinguish tumors of various anatomical regions. Because microarray data are too high-dimensional to process directly (the curse of dimensionality), this paper proposes a combined approach based on bilevel feature selection. In the first level, the genes (features) are dimensionally reduced with the Multivariate Minimum Redundancy-Maximum Relevance (MRMR) technique. In the second level, six optimization techniques are used to select the best genes before classification: Invasive Weed Optimization (IWO), Teaching Learning-Based Optimization (TLBO), League Championship Optimization (LCO), Beetle Antennae Search Optimization (BASO), Crow Search Optimization (CSO), and Fruit Fly Optimization (FFO). Finally, the selected genes are classified with five classifiers. The best result, a classification accuracy of 99.16%, is obtained when IWO is used with MRMR and the output is classified with Quadratic Discriminant Analysis (QDA).
Year: 2020 PMID: 33102596 PMCID: PMC7578727 DOI: 10.1155/2020/8427574
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
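The two-level gene selection described in the abstract can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the first level uses a simplified correlation-based greedy MRMR (relevance minus mean redundancy) rather than the multivariate MRMR of the paper, and the second level uses a plain random subset search as a stand-in for metaheuristics such as IWO or TLBO. The function names, the nearest-centroid fitness, and the toy expression values are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_select(genes, labels, k):
    """Level 1 (simplified MRMR): greedily pick k genes by relevance minus mean redundancy."""
    relevance = [abs(pearson(g, labels)) for g in genes]
    selected = [max(range(len(genes)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = sum(abs(pearson(genes[j], genes[s])) for s in selected)
            return relevance[j] - redundancy / len(selected)
        candidates = [j for j in range(len(genes)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


def loo_accuracy(genes, labels, subset):
    """Fitness: leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    n = len(labels)
    correct = 0
    for i in range(n):
        centroids = {}
        for c in set(labels):
            rows = [r for r in range(n) if r != i and labels[r] == c]
            centroids[c] = [sum(genes[j][r] for r in rows) / len(rows) for j in subset]
        point = [genes[j][i] for j in subset]
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == labels[i]
    return correct / n


def random_subset_search(genes, labels, shortlist, iterations=50, seed=0):
    """Level 2 stand-in: random search over subsets of the level-1 shortlist."""
    rng = random.Random(seed)
    best_subset = list(shortlist)
    best_fit = loo_accuracy(genes, labels, best_subset)
    for _ in range(iterations):
        subset = [j for j in shortlist if rng.random() < 0.5]
        if not subset:
            continue  # skip empty subsets
        fit = loo_accuracy(genes, labels, subset)
        if fit > best_fit:
            best_subset, best_fit = subset, fit
    return best_subset, best_fit


# Toy data (hypothetical): 4 genes x 8 samples, labels 0 = healthy, 1 = tumor.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 1, 1, 1, 1, 1],    # informative
    [0, 0, 0, 1, 1, 1, 1, 0.9],  # near-duplicate of gene 0 (redundant)
    [0, 0, 1, 0, 0, 1, 1, 1],    # informative, weakly correlated with gene 0
    [1, 0, 1, 0, 1, 0, 1, 0],    # noise
]
shortlist = mrmr_select(genes, labels, 2)
subset, fitness = random_subset_search(genes, labels, shortlist)
print(shortlist, subset, fitness)
```

On this toy data the first level keeps genes 0 and 2 and skips the near-duplicate gene 1, which is the behavior the redundancy penalty is meant to produce. In the paper's setting the second-level search would be replaced by one of the six optimizers and the fitness by the chosen classifier.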
Dataset details.
| Dataset | Number of genes | Class 1 (tumor) | Class 2 (healthy) | Total samples |
|---|---|---|---|---|
| Colon cancer | 2000 | 40 | 22 | 62 |
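As context for the accuracies reported later, the class counts in the dataset table imply a simple majority-class baseline and a very low sample-to-gene ratio. The short calculation below uses only the numbers from the table; the variable names are illustrative.

```python
# Counts taken from the dataset table.
n_genes = 2000
n_tumor, n_healthy = 40, 22
n_samples = n_tumor + n_healthy

# Accuracy (%) of a trivial classifier that always predicts the majority class.
majority_baseline = max(n_tumor, n_healthy) / n_samples * 100

# Samples available per gene, which motivates reducing 2000 genes to 30, 60, or 90.
samples_per_gene = n_samples / n_genes

print(f"majority baseline: {majority_baseline:.2f}%")
print(f"samples per gene: {samples_per_gene:.3f}")
```

An accuracy near 64.5% therefore corresponds to always predicting the tumor class.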
Figure 1: Illustration of the work.
Performance analysis of classifiers in terms of classification accuracies with six optimization techniques for different gene selection methods using 30-60-90 selected genes.
| Classifier | Selected genes | Invasive Weed Optimization | Teaching Learning-Based Optimization | League Championship Optimization | Beetle Antennae Search Optimization | Crow Search Optimization | Fruit Fly Optimization |
|---|---|---|---|---|---|---|---|
| RF | 30 | 85.74344 | 93.75 | 89.85625 | 76.135 | 76.135 | 79.94629 |
| RF | 60 | 85.22406 | 94.01125 | 93.36 | 76.3375 | 89.20625 | 87.48398 |
| RF | 90 | 75.84375 | 81.9 | 89.6 | 76 | 78.71281 | 92.51797 |
| Adaboost | 30 | 84.375 | 85.67875 | 91.1475 | 75.75 | 76.3375 | 89.96621 |
| Adaboost | 60 | 94.01125 | 93.555 | 95.575 | 76.3375 | 76.75938 | 85.9375 |
| Adaboost | 90 | 92.19 | 82.942 | 85.74344 | 76.23625 | 75.625 | 88.62695 |
| LR | 30 | 80.86 | 90.49688 | 94.53375 | 77.27594 | 77.83109 | 93.49766 |
| LR | 60 | 76.675 | 86.76016 | 90.1125 | 76.25313 | 75.60938 | 98.69688 |
| LR | 90 | 88.8125 | 89.6 | 76.86906 | 93.23 | 85.54938 | 94.03444 |
| DT | 30 | 97.395 | 77.05469 | 77.53719 | 95.705 | 92.19 | 94.23867 |
| DT | 60 | 95.575 | 89.20625 | 78.38625 | 76.3375 | 95.575 | 94.23867 |
| DT | 90 | 79.69 | 78.45156 | 94.795 | 93.75 | 97.655 | 82.86824 |
| QDA | 30 | 76.37125 | 77.08 | 93.555 | 86.32813 | 93.36 | 97.915 |
| QDA | 60 | 88.55 | 91.93 | 94.2725 | 76.3375 | 86.66 | 95.835 |
| QDA | 90 | 99.16 | 92.19 | 92.19 | 76.3375 | 91.47912 | 95.70313 |
| Average | | 86.71975 | 86.97377 | 89.1689 | 80.55673 | 84.57899 | 91.43377 |
Performance analysis of classifiers in terms of Perfect Classification (PC) with six optimization techniques for different gene selection methods using 30-60-90 selected genes.
| Classifier | Selected genes | Invasive Weed Optimization | Teaching Learning-Based Optimization | League Championship Optimization | Beetle Antennae Search Optimization | Crow Search Optimization | Fruit Fly Optimization |
|---|---|---|---|---|---|---|---|
| RF | 30 | 71.48688 | 87.5 | 79.69 | 52.27 | 52.27 | 59.89258 |
| RF | 60 | 70.44813 | 88.0225 | 86.72 | 51.625 | 78.38813 | 74.96797 |
| RF | 90 | 51.6875 | 63.8 | 79.17 | 52 | 57.42563 | 85.02938 |
| Adaboost | 30 | 68.75 | 71.3575 | 82.295 | 51.5 | 51.125 | 79.93242 |
| Adaboost | 60 | 88.0225 | 87.11 | 91.15 | 55.00906 | 53.51875 | 71.875 |
| Adaboost | 90 | 84.38 | 65.884 | 71.48688 | 52.4725 | 51.25 | 77.24688 |
| LR | 30 | 61.72 | 80.99 | 89.0675 | 54.55188 | 55.66219 | 86.97938 |
| LR | 60 | 53.35 | 73.50344 | 80.21 | 52.50625 | 51.21875 | 97.39375 |
| LR | 90 | 77.60625 | 79.17 | 53.73813 | 86.46 | 71.09875 | 88.06888 |
| DT | 30 | 94.79 | 54.10938 | 55.07438 | 91.41 | 84.38 | 88.47734 |
| DT | 60 | 91.15 | 78.38813 | 56.7725 | 55.12336 | 91.15 | 88.47734 |
| DT | 90 | 59.38 | 56.90313 | 89.59 | 87.5 | 95.31 | 65.73648 |
| QDA | 30 | 52.7425 | 54.16 | 87.11 | 72.65625 | 86.72 | 95.83 |
| QDA | 60 | 77.085 | 83.86 | 88.545 | 54.10938 | 73.31953 | 91.67 |
| QDA | 90 | 98.96 | 84.38 | 84.38 | 52.135 | 82.95589 | 91.40625 |
| Average | | 73.43725 | 73.94254 | 78.33329 | 61.42191 | 69.05284 | 82.86558 |
Performance analysis of classifiers in terms of Performance Index (PI) with six optimization techniques for different gene selection methods using 30-60-90 selected genes.
| Classifier | Selected genes | Invasive Weed Optimization | Teaching Learning-Based Optimization | League Championship Optimization | Beetle Antennae Search Optimization | Crow Search Optimization | Fruit Fly Optimization |
|---|---|---|---|---|---|---|---|
| RF | 30 | 60.105 | 85.7 | 78.4275 | 8.64875 | 8.64875 | 30.28875 |
| RF | 60 | 58.02563 | 86.37 | 84.66125 | 6.27625 | 76.62938 | 65.66844 |
| RF | 90 | 6.511875 | 43.225 | 78.93 | 7.69 | 25.80313 | 83.35969 |
| Adaboost | 30 | 54.54 | 59.85 | 78.465 | 5.805 | 4.39125 | 73.49922 |
| Adaboost | 60 | 86.37 | 85.18063 | 90.78 | 18.14688 | 13.08297 | 60.87 |
| Adaboost | 90 | 81.4325 | 48.1615 | 60.105 | 9.367813 | 4.8625 | 70.92234 |
| LR | 30 | 37.86063 | 77.17125 | 87.71 | 16.64625 | 20.29063 | 86.7675 |
| LR | 60 | 12.48375 | 63.30375 | 77.925 | 9.487656 | 4.744688 | 97.28125 |
| LR | 90 | 74.32875 | 78.93 | 13.86195 | 84.315 | 59.34 | 83.59028 |
| DT | 30 | 94.49 | 15.18023 | 18.36125 | 91.18 | 81.4325 | 87.07812 |
| DT | 60 | 90.78 | 76.62938 | 23.8125 | 18.52203 | 90.78 | 87.07812 |
| DT | 90 | 31.4425 | 24.21063 | 88.38 | 85.7 | 95.07 | 45.81047 |
| QDA | 30 | 10.32656 | 15.36 | 85.18063 | 62.3175 | 84.66125 | 95.65 |
| QDA | 60 | 72.795 | 80.72125 | 87.04 | 15.18023 | 59.64609 | 91.58 |
| QDA | 90 | 98.935 | 81.4325 | 81.4325 | 8.169375 | 74.04878 | 90.64688 |
| Average | | 58.02848 | 61.42841 | 69.00484 | 29.83018 | 46.89546 | 76.67274 |
Performance analysis of classifiers in terms of GDR with six optimization techniques for different gene selection methods using 30-60-90 selected genes.
| Classifier | Selected genes | Invasive Weed Optimization | Teaching Learning-Based Optimization | League Championship Optimization | Beetle Antennae Search Optimization | Crow Search Optimization | Fruit Fly Optimization |
|---|---|---|---|---|---|---|---|
| RF | 30 | 71.48527 | 85.71429 | 79.69 | 52.27 | 52.27 | 59.89258 |
| RF | 60 | 70.44469 | 86.39268 | 86.72 | 51.625 | 78.38813 | 74.96797 |
| RF | 90 | 51.6875 | 63.80399 | 79.17 | 52 | 57.42563 | 85.02938 |
| Adaboost | 30 | 54.54545 | 71.35536 | 82.295 | 51.5 | 51.125 | 79.93242 |
| Adaboost | 60 | 88.0225 | 87.11 | 91.15 | 55.00906 | 13.1496 | 71.875 |
| Adaboost | 90 | 84.38 | 65.88466 | 71.48527 | 9.423984 | 51.25 | 77.24688 |
| LR | 30 | 61.72 | 80.99 | 89.0675 | 16.68824 | 20.34483 | 86.98345 |
| LR | 60 | 12.55858 | 73.50344 | 80.21 | 9.546483 | 4.758999 | 97.32657 |
| LR | 90 | 71.1444 | 73.68953 | 13.91238 | 86.46 | 71.09555 | 88.07165 |
| DT | 30 | 94.79474 | 54.10938 | 18.42734 | 91.41 | 84.38 | 88.47734 |
| DT | 60 | 90.29073 | 78.38813 | 23.85838 | 55.12336 | 91.15 | 88.47734 |
| DT | 90 | 59.38 | 56.90313 | 89.59 | 87.5 | 95.31715 | 65.73803 |
| QDA | 30 | 52.7425 | 54.16 | 87.11 | 72.65625 | 84.68635 | 95.83958 |
| QDA | 60 | 77.085 | 83.86 | 87.06308 | 54.10938 | 73.3201 | 91.67 |
| QDA | 90 | 98.96 | 84.38 | 84.38 | 52.135 | 82.9582 | 91.40911 |
| Average | | 69.28276 | 73.34964 | 70.94193 | 53.16378 | 60.77463 | 82.86249 |
Average performance of the classifiers for each parameter, averaged over the six optimization techniques, for different gene selection methods using 30-60-90 selected genes.
| Classifier | Selected genes | Accuracy (%) | Perfect Classification (%) | Performance Index (%) | GDR (%) |
|---|---|---|---|---|---|
| RF | 30 | 83.59433 | 67.18491 | 45.30313 | 66.88702 |
| RF | 60 | 87.60384 | 75.02862 | 62.93849 | 74.75641 |
| RF | 90 | 82.42909 | 64.85208 | 40.91995 | 64.85275 |
| Adaboost | 30 | 83.87583 | 67.49332 | 46.09174 | 65.12554 |
| Adaboost | 60 | 87.02927 | 74.44755 | 59.07174 | 67.71936 |
| Adaboost | 90 | 83.56061 | 67.12004 | 45.80861 | 59.94513 |
| LR | 30 | 85.74922 | 71.49516 | 54.40771 | 59.299 |
| LR | 60 | 84.01784 | 68.03036 | 44.20435 | 46.31734 |
| LR | 90 | 88.0159 | 76.02367 | 65.72766 | 67.39558 |
| DT | 30 | 89.02009 | 78.04018 | 64.62035 | 71.93313 |
| DT | 60 | 88.21978 | 76.84355 | 64.60034 | 71.21466 |
| DT | 90 | 87.8683 | 75.7366 | 61.76893 | 75.73805 |
| QDA | 30 | 87.4349 | 74.86979 | 58.91599 | 74.53245 |
| QDA | 60 | 88.93083 | 78.09815 | 67.8271 | 77.85126 |
| QDA | 90 | 91.22996 | 82.36952 | 72.44417 | 82.37038 |
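The averages in this table appear to be row-wise means over the six optimization techniques. The sketch below recomputes two of them from the accuracy table as a spot check; it covers only the RF rows for 30 and 60 selected genes, and the other rows are not verified here. The dictionary names are illustrative.

```python
from statistics import mean

# Accuracy values copied from the accuracy table (RF rows, six optimizers each).
rf_accuracy = {
    30: [85.74344, 93.75, 89.85625, 76.135, 76.135, 79.94629],
    60: [85.22406, 94.01125, 93.36, 76.3375, 89.20625, 87.48398],
}

# Averages reported for the same rows in the averaged table.
reported = {30: 83.59433, 60: 87.60384}

# Recompute the mean over the six optimizers, rounded to the table's precision.
recomputed = {k: round(mean(v), 5) for k, v in rf_accuracy.items()}

for k in reported:
    print(k, recomputed[k], reported[k])
```

The same pattern can be applied to the PC, PI, and GDR tables to cross-check the remaining entries.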
Figure 2: Performance analysis of accuracy in various classifiers under six different optimization methods for 30-60-90 genes selected in colon cancer.
Figure 3: Average performance analysis of classifier parameters.