M Kalaiyarasi, Harikumar Rajaguru.
Abstract
Ovarian cancer is the third most common gynecologic cancer, behind cervical and uterine cancer, and is a severe concern for women. Abnormal cells form and can spread throughout the body. Ovarian cancer microarray data can be used for diagnosis and prognosis. Typically, such data contain tens of thousands of genes, so selecting the most critical genes (attributes) from the entire dataset is necessary to reduce computational complexity. Because microarray datasets have few samples and many features, classifier performance degrades. Dimensionality reduction is therefore essential to retain the genes that matter for disease classification. In this research, the ANOVA method is first used for gene selection. Then two clustering-based feature extraction methods, Fuzzy C Means (FCM) and the Softmax Discriminant Algorithm (SDA), and three transform-based methods, the Hilbert Transform, the Fast Fourier Transform (FFT), and the Discrete Cosine Transform (DCT), are used to further select relevant genes. Six classifiers then classify the features as normal or abnormal. Without feature selection, the NLR classifier gives the highest accuracy, 92%, for SDA features, and KNN gives the lowest accuracy, 55%, for SDA, Hilbert, and DCT features. With correlation distance feature selection, the NLR classifier attains the lowest accuracy, 53%, and the GMM classifier attains the highest, 88%.
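The ANOVA gene-selection step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `anova_f` and `select_genes`, the toy expression values, and the choice to keep the top `k` genes by F statistic are all assumptions made for illustration.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one gene, given one list of values per class."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

def select_genes(expr, labels, k):
    """Rank genes by F statistic and keep the k highest (hypothetical helper).

    expr maps a gene name to its expression values, one per sample;
    labels is the parallel list of class labels.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (invented): three ovarian and three normal samples, three genes.
labels = ["ovarian"] * 3 + ["normal"] * 3
expr = {
    "g1": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],  # strongly separates the classes
    "g2": [2.0, 3.1, 2.5, 2.4, 2.9, 2.2],  # heavily overlapping
    "g3": [0.5, 0.7, 0.6, 0.4, 0.8, 0.6],  # nearly identical class means
}
top = select_genes(expr, labels, k=1)
print(top)
```

A gene whose class means differ by much more than the spread within each class gets a large F statistic, so it survives the selection; genes with overlapping distributions are dropped before feature extraction.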
Year: 2022 PMID: 35872866 PMCID: PMC9307352 DOI: 10.1155/2022/6750457
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.246
Figure 1. Schematic representation of the methodology for classifying ovarian cancer and normal tissues from microarray gene data.
Figure 2. Volcano plot for ovarian cancer and normal microarray gene data.
Figure 3. FCM feature histogram for normal and ovarian cancer microarray gene data.
Figure 4. Scatter plot of FCM feature output for ovarian cancer and normal data.
Figure 5. Normal probability plot of SDA feature output for ovarian cancer data.
Statistical parameters of clustering- and transform-based feature extraction for ovarian and normal data.
| Statistical parameters | FCM (Ovarian) | FCM (Normal) | SDA (Ovarian) | SDA (Normal) | Hilbert (Ovarian) | Hilbert (Normal) | FFT (Ovarian) | FFT (Normal) | DCT (Ovarian) | DCT (Normal) |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.60958 | 0.61855 | 12.09748 | 0.21740 | 0.49855 | 0.49357 | 0.00810 | 0.00734 | 0.00731 | 0.00663 |
| Std dev | 0.0722 | 0.06839 | 7.56472 | 0.15279 | 0.09348 | 0.08900 | 0.01736 | 0.01736 | 0.01741 | 0.01740 |
| Variance | 0.0052 | 0.00470 | 1.14519 | 0.02336 | 0.00875 | 0.00794 | 0.00030 | 0.00030 | 0.00030 | 0.00030 |
| Skewness | 1.8415 | 1.98612 | 75.87783 | 1.69340 | 1.39834 | 1.62165 | 56.62020 | 56.73028 | 56.23927 | 56.42410 |
| Kurtosis | 4.4688 | 5.03687 | 122.1244 | 2.92235 | 2.89985 | 3.65340 | 3236.49200 | 3244.93585 | 3207.17014 | 3221.32183 |
| Pearson | 0.9653 | 0.94569 | 46.05714 | 0.91892 | 0.94541 | 0.91843 | 0.99891 | 0.99861 | 0.99807 | 0.99757 |
| | 0.0214 | 0.01497 | 1.0306 | 0.00714 | 0.00389 | 0.00184 | 0.00341 | 0.00005 | 0.01034 | 0.00226 |
| Sample entropy | 9.3663 | 9.36632 | 582.756 | 11.638 | 11.688 | 11.68825 | 10.68857 | 10.68849 | 11.68772 | 11.68772 |
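As a rough illustration of how the moment-based statistics in this table could be computed for one feature vector, the sketch below uses population (divide-by-n) moments and non-excess kurtosis. Both conventions are assumptions, since the source does not state which definitions were used, and the helper names `describe` and `pearson` are hypothetical.

```python
from math import sqrt

def describe(values):
    """Mean, standard deviation, variance, skewness and kurtosis of a sequence.

    Assumes population moments (divide by n) and non-excess kurtosis,
    so a normal distribution would score about 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return {
        "mean": mean,
        "std": sqrt(m2),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Toy feature vector (invented values, not taken from the dataset).
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(stats, r)
```

Large skewness and kurtosis values, such as those reported for the FFT and DCT features, indicate a heavy-tailed, strongly asymmetric distribution, which is one reason nonlinear classifiers may be preferable for these features.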
Performance metrics of the classifiers.
| Performance metrics | Description of metrics | Derived from confusion matrix |
|---|---|---|
| Accuracy | Proportion of all samples that are classified correctly | $\frac{TP+TN}{TP+TN+FP+FN} \times 100$ |
| Precision | Proportion of samples predicted positive that are actually positive | $\frac{TP}{TP+FP} \times 100$ |
| MSE | Average of the squared error | |
| F1 score | Harmonic mean of precision and recall, summarizing classification accuracy for a specific class | $\frac{2TP}{2TP+FP+FN} \times 100$ |
| Matthews correlation coefficient | Pearson correlation between the true and the attained output | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ |
| Fowlkes-Mallows index | Measure of similarity between two clusterings | $\sqrt{\frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}$ |
| Error rate | Number of incorrect predictions divided by the total number of observations | $\frac{FP+FN}{TP+TN+FP+FN} \times 100$ |
| Jaccard metric | Ratio of true positives to all samples that are either actually positive or predicted positive | $\frac{TP}{TP+FP+FN} \times 100$ |
| Classification success index | Average of the class-specific symmetric measure over all classes | $\left(\frac{TP}{TP+FP} + \frac{TP}{TP+FN} - 1\right) \times 100$ |
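The confusion-matrix metrics listed above can be sketched in a few lines. The formulas below are the standard textbook definitions, not formulas quoted from the source. They were chosen because they reproduce the values reported in this document for the FCM/GMM case without feature selection (TP = 44, TN = 26, FP = 24, FN = 6). The function name `confusion_metrics` is hypothetical, and the sketch assumes no denominator is zero.

```python
from math import sqrt

def confusion_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics using standard definitions (assumed, not quoted).

    Percent-valued metrics are scaled by 100; MCC and FM are left as fractions,
    matching how the values are reported in the tables.
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": 100 * (tp + tn) / total,
        "precision": 100 * precision,
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "fm": sqrt(precision * recall),
        "error_rate": 100 * (fp + fn) / total,
        "jaccard": 100 * tp / (tp + fp + fn),
        "csi": 100 * (precision + recall - 1),
    }

# FCM features with the GMM classifier, without feature selection.
row = confusion_metrics(tp=44, tn=26, fp=24, fn=6)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

Because every metric except MSE depends only on the four counts, any row of the confusion-matrix tables can be checked against the corresponding row of the performance tables in this way.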
Average MSE and confusion matrix for normal and ovarian data without feature selection.
| Feature extraction | Classifiers | TP | TN | FP | FN | MSE |
|---|---|---|---|---|---|---|
| FCM | GMM | 44 | 26 | 24 | 6 | 1.82E-04 |
| | Detrend FA | 48 | 26 | 24 | 2 | 2.00E-04 |
| | NLR | 47 | 26 | 24 | 3 | 2.10E-04 |
| | BDLC | 39 | 27 | 23 | 11 | 1.06E-04 |
| | LR | 40 | 27 | 23 | 10 | 3.03E-05 |
| | KNN | 34 | 26 | 24 | 16 | 1.82E-04 |
| SDA | GMM | 42 | 43 | 7 | 8 | 4.62E-06 |
| | Detrend FA | 46 | 40 | 10 | 4 | 6.98E-06 |
| | NLR | 48 | 44 | 6 | 2 | 2.38E-06 |
| | BDLC | 35 | 26 | 24 | 15 | 2.06E-04 |
| | LR | 36 | 26 | 24 | 14 | 2.34E-04 |
| | KNN | 28 | 27 | 23 | 22 | 1.75E-04 |
| Hilbert Transform | GMM | 30 | 27 | 23 | 20 | 1.85E-04 |
| | Detrend FA | 47 | 26 | 24 | 3 | 2.01E-04 |
| | NLR | 48 | 26 | 24 | 2 | 2.10E-04 |
| | BDLC | 37 | 40 | 10 | 13 | 1.80E-05 |
| | LR | 33 | 42 | 8 | 17 | 2.67E-05 |
| | KNN | 28 | 27 | 23 | 22 | 1.68E-04 |
| FFT | GMM | 33 | 42 | 8 | 17 | 2.77E-05 |
| | Detrend FA | 45 | 28 | 22 | 5 | 7.31E-05 |
| | NLR | 45 | 29 | 21 | 5 | 4.33E-05 |
| | BDLC | 34 | 34 | 16 | 16 | 3.97E-05 |
| | LR | 29 | 29 | 21 | 21 | 7.87E-05 |
| | KNN | 26 | 41 | 9 | 24 | 1.34E-04 |
| DCT | GMM | 36 | 41 | 9 | 14 | 1.73E-05 |
| | Detrend FA | 45 | 37 | 13 | 5 | 1.19E-05 |
| | NLR | 46 | 29 | 21 | 4 | 3.59E-05 |
| | BDLC | 37 | 36 | 14 | 13 | 2.68E-05 |
| | LR | 31 | 29 | 21 | 19 | 6.92E-05 |
| | KNN | 29 | 26 | 24 | 21 | 2.21E-04 |
Performance measures of classifiers for normal and ovarian data without feature selection.
| Feature extraction | Classifiers | Accuracy (%) | Precision (%) | F1 score (%) | MCC | FM | Error rate (%) | Jaccard metric (%) | CSI (%) |
|---|---|---|---|---|---|---|---|---|---|
| FCM | GMM | 70 | 64.706 | 74.576 | 0.429 | 0.755 | 30 | 59.459 | 52.706 |
| | Detrend FA | 74 | 66.667 | 78.689 | 0.535 | 0.800 | 26 | 64.865 | 62.667 |
| | NLR | 73 | 66.197 | 77.686 | 0.507 | 0.789 | 27 | 63.514 | 60.197 |
| | BDLC | 66 | 62.903 | 69.643 | 0.330 | 0.700 | 34 | 53.425 | 40.903 |
| | LR | 67 | 63.492 | 70.796 | 0.352 | 0.713 | 33 | 54.795 | 43.492 |
| | KNN | 60 | 58.621 | 62.963 | 0.203 | 0.631 | 40 | 45.946 | 26.621 |
| SDA | GMM | 85 | 85.714 | 84.848 | 0.700 | 0.849 | 15 | 73.684 | 69.714 |
| | Detrend FA | 86 | 82.143 | 86.792 | 0.725 | 0.869 | 14 | 76.667 | 74.143 |
| | NLR | 92 | 88.889 | 92.308 | 0.843 | 0.924 | 8 | 85.714 | 84.889 |
| | BDLC | 61 | 59.322 | 64.220 | 0.224 | 0.644 | 39 | 47.297 | 29.322 |
| | LR | 62 | 60.000 | 65.455 | 0.245 | 0.657 | 38 | 48.649 | 32.000 |
| | KNN | 55 | 54.902 | 55.446 | 0.100 | 0.554 | 45 | 38.356 | 10.902 |
| Hilbert Transform | GMM | 57 | 56.604 | 58.252 | 0.140 | 0.583 | 43 | 41.096 | 16.604 |
| | Detrend FA | 73 | 66.197 | 77.686 | 0.507 | 0.789 | 27 | 63.514 | 60.197 |
| | NLR | 74 | 66.667 | 78.689 | 0.535 | 0.800 | 26 | 64.865 | 62.667 |
| | BDLC | 77 | 78.723 | 76.289 | 0.541 | 0.763 | 23 | 61.667 | 52.723 |
| | LR | 75 | 80.488 | 72.527 | 0.508 | 0.729 | 25 | 56.897 | 46.488 |
| | KNN | 55 | 54.902 | 55.446 | 0.100 | 0.554 | 45 | 38.356 | 10.902 |
| FFT | GMM | 75 | 80.488 | 72.527 | 0.508 | 0.729 | 25 | 56.897 | 46.488 |
| | Detrend FA | 73 | 67.164 | 76.923 | 0.489 | 0.777 | 27 | 62.500 | 57.164 |
| | NLR | 74 | 68.182 | 77.586 | 0.507 | 0.783 | 26 | 63.380 | 58.182 |
| | BDLC | 68 | 68.000 | 68.000 | 0.360 | 0.680 | 32 | 51.515 | 36.000 |
| | LR | 58 | 58.000 | 58.000 | 0.160 | 0.580 | 42 | 40.845 | 16.000 |
| | KNN | 67 | 74.286 | 61.176 | 0.356 | 0.622 | 33 | 44.068 | 26.286 |
| DCT | GMM | 77 | 80.000 | 75.789 | 0.543 | 0.759 | 23 | 61.017 | 52.000 |
| | Detrend FA | 82 | 77.586 | 83.333 | 0.648 | 0.836 | 18 | 71.429 | 67.586 |
| | NLR | 75 | 68.657 | 78.632 | 0.532 | 0.795 | 25 | 64.789 | 60.657 |
| | BDLC | 73 | 72.549 | 73.267 | 0.460 | 0.733 | 27 | 57.813 | 46.549 |
| | LR | 60 | 59.615 | 60.784 | 0.200 | 0.608 | 40 | 43.662 | 21.615 |
| | KNN | 55 | 54.717 | 56.311 | 0.100 | 0.563 | 45 | 39.189 | 12.717 |
Figure 6. Accuracy analysis of the classifiers without correlation distance feature selection.
Average MSE and confusion matrix for normal and ovarian data with correlation distance feature selection.
| Feature extraction | Classifiers | TP | TN | FP | FN | MSE |
|---|---|---|---|---|---|---|
| FCM | GMM | 30 | 36 | 14 | 20 | 4.58E-05 |
| | Detrend FA | 28 | 41 | 9 | 22 | 5.41E-05 |
| | NLR | 30 | 33 | 17 | 20 | 5.54E-05 |
| | BDLC | 36 | 29 | 21 | 14 | 4.82E-05 |
| | LR | 44 | 33 | 17 | 6 | 2.36E-05 |
| | KNN | 29 | 27 | 23 | 21 | 1.25E-04 |
| SDA | GMM | 42 | 34 | 16 | 8 | 2.45E-05 |
| | Detrend FA | 41 | 27 | 23 | 9 | 9.06E-05 |
| | NLR | 26 | 27 | 23 | 24 | 2.62E-04 |
| | BDLC | 43 | 29 | 21 | 7 | 4.18E-05 |
| | LR | 34 | 29 | 21 | 16 | 5.55E-05 |
| | KNN | 28 | 27 | 23 | 22 | 1.42E-04 |
| Hilbert Transform | GMM | 29 | 34 | 16 | 21 | 5.53E-05 |
| | Detrend FA | 49 | 27 | 23 | 1 | 1.05E-04 |
| | NLR | 44 | 35 | 15 | 6 | 1.71E-05 |
| | BDLC | 45 | 37 | 13 | 5 | 1.30E-05 |
| | LR | 44 | 26 | 24 | 6 | 1.92E-04 |
| | KNN | 33 | 37 | 13 | 17 | 3.43E-05 |
| FFT | GMM | 29 | 42 | 8 | 21 | 4.53E-05 |
| | Detrend FA | 45 | 41 | 9 | 5 | 6.42E-06 |
| | NLR | 35 | 27 | 23 | 15 | 9.02E-05 |
| | BDLC | 29 | 27 | 23 | 21 | 1.63E-04 |
| | LR | 35 | 26 | 24 | 15 | 1.87E-04 |
| | KNN | 27 | 31 | 19 | 23 | 1.15E-04 |
| DCT | GMM | 46 | 42 | 8 | 4 | 4.64E-06 |
| | Detrend FA | 46 | 27 | 23 | 4 | 8.89E-05 |
| | NLR | 45 | 28 | 22 | 5 | 7.92E-05 |
| | BDLC | 37 | 36 | 14 | 13 | 2.28E-05 |
| | LR | 34 | 42 | 8 | 16 | 2.42E-05 |
| | KNN | 28 | 33 | 17 | 22 | 6.80E-05 |
Performance measures of classifiers for ovarian data with feature selection.
| Feature extraction | Classifiers | Accuracy (%) | Precision (%) | F1 score (%) | MCC | FM | Error rate (%) | Jaccard metric (%) | CSI (%) |
|---|---|---|---|---|---|---|---|---|---|
| FCM | GMM | 66 | 68.182 | 63.830 | 0.322 | 0.640 | 34 | 46.875 | 28.182 |
| | Detrend FA | 69 | 75.676 | 64.368 | 0.394 | 0.651 | 31 | 47.458 | 31.676 |
| | NLR | 63 | 63.830 | 61.856 | 0.260 | 0.619 | 37 | 44.776 | 23.830 |
| | BDLC | 65 | 63.158 | 67.290 | 0.303 | 0.674 | 35 | 50.704 | 35.158 |
| | LR | 77 | 72.131 | 79.279 | 0.554 | 0.797 | 23 | 65.672 | 60.131 |
| | KNN | 56 | 55.769 | 56.863 | 0.120 | 0.569 | 44 | 39.726 | 13.769 |
| SDA | GMM | 76 | 72.414 | 77.778 | 0.527 | 0.780 | 24 | 63.636 | 56.414 |
| | Detrend FA | 68 | 64.063 | 71.930 | 0.375 | 0.725 | 32 | 56.164 | 46.063 |
| | NLR | 53 | 53.061 | 52.525 | 0.060 | 0.525 | 47 | 35.616 | 5.061 |
| | BDLC | 72 | 67.188 | 75.439 | 0.458 | 0.760 | 28 | 60.563 | 53.188 |
| | LR | 63 | 61.818 | 64.762 | 0.261 | 0.648 | 37 | 47.887 | 29.818 |
| | KNN | 55 | 54.902 | 55.446 | 0.100 | 0.554 | 45 | 38.356 | 10.902 |
| Hilbert Transform | GMM | 63 | 64.444 | 61.053 | 0.261 | 0.611 | 37 | 43.939 | 22.444 |
| | Detrend FA | 76 | 68.056 | 80.328 | 0.579 | 0.817 | 24 | 67.123 | 66.056 |
| | NLR | 79 | 74.576 | 80.734 | 0.590 | 0.810 | 21 | 67.692 | 62.576 |
| | BDLC | 82 | 77.586 | 83.333 | 0.648 | 0.836 | 18 | 71.429 | 67.586 |
| | LR | 70 | 64.706 | 74.576 | 0.429 | 0.755 | 30 | 59.459 | 52.706 |
| | KNN | 70 | 71.739 | 68.750 | 0.401 | 0.688 | 30 | 52.381 | 37.739 |
| FFT | GMM | 71 | 78.378 | 66.667 | 0.435 | 0.674 | 29 | 50.000 | 36.378 |
| | Detrend FA | 86 | 83.333 | 86.538 | 0.722 | 0.866 | 14 | 76.271 | 73.333 |
| | NLR | 62 | 60.345 | 64.815 | 0.243 | 0.650 | 38 | 47.945 | 30.345 |
| | BDLC | 56 | 55.769 | 56.863 | 0.120 | 0.569 | 44 | 39.726 | 13.769 |
| | LR | 61 | 59.322 | 64.220 | 0.224 | 0.644 | 39 | 47.297 | 29.322 |
| | KNN | 58 | 58.696 | 56.250 | 0.161 | 0.563 | 42 | 39.130 | 12.696 |
| DCT | GMM | 88 | 85.185 | 88.462 | 0.762 | 0.885 | 12 | 79.310 | 77.185 |
| | Detrend FA | 73 | 66.667 | 77.311 | 0.497 | 0.783 | 27 | 63.014 | 58.667 |
| | NLR | 73 | 67.164 | 76.923 | 0.489 | 0.777 | 27 | 62.500 | 57.164 |
| | BDLC | 73 | 72.549 | 73.267 | 0.460 | 0.733 | 27 | 57.813 | 46.549 |
| | LR | 76 | 80.952 | 73.913 | 0.527 | 0.742 | 24 | 58.621 | 48.952 |
| | KNN | 61 | 62.222 | 58.947 | 0.221 | 0.590 | 39 | 41.791 | 18.222 |
Figure 7. Accuracy analysis of the classifiers with correlation distance feature selection.