Jaison Bennet, Chilambuchelvan Arul Ganaprakasam, Kannan Arputharaj.
Abstract
Cancer classification by doctors and radiologists was traditionally based on morphological and clinical features and had limited diagnostic ability. The arrival of DNA microarray technology has made it possible to monitor thousands of gene expressions concurrently on a single chip, which has stimulated progress in cancer classification. In this paper, we propose a hybrid approach for microarray data classification based on k-nearest neighbor (KNN), naive Bayes, and support vector machine (SVM) classifiers. Feature selection prior to classification plays a vital role, and we use a feature selection technique that combines the discrete wavelet transform (DWT) with a moving window technique (MWT). The performance of the proposed method is compared with that of the conventional classifiers: support vector machine, nearest neighbor, and naive Bayes. Experiments were conducted on both real and benchmark datasets, and the results indicate that the ensemble approach produces higher classification accuracy than the conventional classifiers. The proposed method can serve as an automated system for cancer classification that doctors can apply in real cases. It also reduces the misclassification of cancers, which is highly undesirable in cancer detection.
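The abstract names the three base classifiers but does not state how their outputs are combined. As a minimal sketch, assuming simple majority voting (an assumption, not a rule taken from the paper), the combination step could look like the following. The function names and the sample labels are hypothetical.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label among the base classifiers' votes."""
    # With three voters and two classes, a tie cannot occur.
    return Counter(votes).most_common(1)[0][0]

def hybrid_predict(knn_labels, bayes_labels, svm_labels):
    """Combine per-sample labels from three classifiers by majority vote (assumed rule)."""
    return [majority_vote(v) for v in zip(knn_labels, bayes_labels, svm_labels)]

# Hypothetical base-classifier outputs for four samples.
knn = ["cancer", "normal", "cancer", "normal"]
bayes = ["cancer", "cancer", "normal", "normal"]
svm = ["normal", "cancer", "cancer", "normal"]
combined = hybrid_predict(knn, bayes, svm)
print(combined)
```

Each sample receives the label that at least two of the three classifiers agree on, so a single classifier's error is outvoted when the other two are correct.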
Year: 2014 PMID: 25162043 PMCID: PMC4138760 DOI: 10.1155/2014/195470
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
m samples.
| Gene ID | Sample 1 | Sample 2 | … | Sample m |
|---|---|---|---|---|
| Gene 1 | −1.2 | −2.1 | 3.0 | 2.9 |
| Gene 2 | 2.7 | 0.2 | −1.1 | 1.6 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Gene n | −2.9 | −1.9 | 2.6 | −2.1 |
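The matrix above stores genes as rows and samples as columns, whereas a classifier typically consumes one feature vector per sample. A small sketch of that transposition, using the first two rows of the table, might look like this. The variable and function names are illustrative only.

```python
# Rows are genes, columns are samples (values from the first two table rows).
expression = {
    "Gene 1": [-1.2, -2.1, 3.0, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6],
}

def sample_vectors(matrix):
    """Transpose a gene-by-sample mapping into one feature vector per sample."""
    return [list(column) for column in zip(*matrix.values())]

vectors = sample_vectors(expression)
print(len(vectors))  # number of samples
print(vectors[0])    # expression profile of the first sample
```

With all genes included, each sample vector would have one entry per gene, which is why feature selection is needed before classification.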
Figure 1. Block diagram of the proposed system.
Algorithm 1. Feature selection method by DWT using scaling coefficients and detail coefficients.
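The body of Algorithm 1 is not reproduced here, so the following is only a rough sketch of a windowed DWT feature reduction. It makes several assumptions that are not taken from the paper: the Haar wavelet stands in for db7, sym2, and rbio2.2 to keep the code dependency-free; windows do not overlap; the last short window is zero-padded; and one scaling coefficient (the largest in magnitude) is kept per window. All function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One Haar DWT level: return (scaling, detail) coefficient lists."""
    s = 1 / math.sqrt(2)
    scaling = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return scaling, detail

def window_features(values, window, level):
    """Return one feature per non-overlapping window (assumed selection rule)."""
    features = []
    for start in range(0, len(values), window):
        chunk = list(values[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # zero-pad the last short window
        for _ in range(level):
            chunk, _detail = haar_dwt(chunk)  # keep decomposing the scaling part
        features.append(max(chunk, key=abs))  # assumption: largest-magnitude coefficient
    return features

genes = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0, 6.0]
features = window_features(genes, window=4, level=2)
print(len(features), [round(f, 3) for f in features])
```

Nine expression values with a window of 4 yield three features, one per window. The window size must be divisible by 2 raised to the chosen level for the Haar step to pair values cleanly.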
Summary of datasets used for our experimental study.
| Dataset | Number of features | Number of samples | Number of normal cases | Number of cancer cases |
|---|---|---|---|---|
| Breast | 24481 | 97 | 51 | 46 |
| Colon | 1909 | 62 | 22 | 40 |
| Ovarian | 15154 | 253 | 91 | 162 |
| CNS | 7129 | 60 | 39 | 21 |
| Leukemia | 7129 | 72 | 25 | 47 |
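As a reading aid for the accuracy tables, it can help to know what a classifier that always predicts the larger class would score on each dataset. The sketch below computes that majority-class baseline from the class counts in the summary table. The baseline is an interpretive aid added here, not a quantity reported in the paper.

```python
# (normal cases, cancer cases) per dataset, copied from the summary table.
datasets = {
    "Breast": (51, 46),
    "Colon": (22, 40),
    "Ovarian": (91, 162),
    "CNS": (39, 21),
    "Leukemia": (25, 47),
}

def majority_baseline(normal, cancer):
    """Accuracy (%) of always predicting the more frequent class."""
    return 100.0 * max(normal, cancer) / (normal + cancer)

baselines = {name: round(majority_baseline(*counts), 2)
             for name, counts in datasets.items()}
for name, value in baselines.items():
    print(f"{name}: {value}%")
```

Accuracies close to these values indicate that a classifier is doing little better than guessing the dominant class.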
Performance analysis of colon dataset using db7 with variations in window sizes and levels. Entries are classification accuracy (%).
| Classifier | Window size | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 |
|---|---|---|---|---|---|---|---|
| KNN | 64 | | 75.42 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 71.19 | 60.06 | 72.75 | 0.00 | 0.00 | 0.00 |
| | 256 | 65.74 | 53.51 | 62.72 | 53.62 | 0.00 | 0.00 |
| | 512 | 75.07 | 41.28 | 70.96 | 45.62 | 68.41 | 0.00 |
| | 1024 | 64.41 | 54.84 | 58.84 | 64.52 | 50.72 | 47.94 |
| Bayes | 64 | 68.52 | 59.07 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 52.06 | 49.39 | 60.29 | 0.00 | 0.00 | 0.00 |
| | 256 | 54.96 | 49.16 | 56.17 | 50.72 | 0.00 | 0.00 |
| | 512 | 60.41 | 61.51 | 42.61 | 46.49 | | 0.00 |
| | 1024 | 67.07 | 50.96 | 64.17 | 43.71 | 54.72 | 59.19 |
| SVM | 64 | | 69.97 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 60.52 | 55.07 | 73.97 | 0.00 | 0.00 | 0.00 |
| | 256 | 65.97 | 52.06 | 67.30 | 63.30 | 0.00 | 0.00 |
| | 512 | 64.41 | 64.17 | 58.61 | 38.26 | 56.29 | 0.00 |
| | 1024 | 71.30 | 41.04 | 71.07 | 35.48 | 53.51 | 56.64 |
| Hybrid | 64 | 96.77 | 95.16 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 98.39 | 96.77 | 96.77 | 0.00 | 0.00 | 0.00 |
| | 256 | 85.48 | 85.48 | 83.87 | 82.26 | 0.00 | 0.00 |
| | 512 | 100.00 | 69.35 | 98.39 | 59.68 | 80.65 | 0.00 |
| | 1024 | 75.81 | 75.81 | 70.97 | 95.16 | 54.84 | 69.35 |
Performance analysis of breast dataset using db7 with variations in window sizes and levels. Entries are classification accuracy (%).
| Classifier | Window size | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 |
|---|---|---|---|---|---|---|---|
| KNN | 64 | 52.22 | 52.22 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | | 55.56 | 61.11 | 0.00 | 0.00 | 0.00 |
| | 256 | 55.56 | 61.11 | 47.78 | 57.78 | 0.00 | 0.00 |
| | 512 | 50.00 | 52.22 | 47.78 | 48.89 | 57.78 | 0.00 |
| | 1024 | 50.00 | 58.89 | 41.11 | 54.44 | 52.22 | 58.89 |
| Bayes | 64 | 46.67 | 52.22 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 47.78 | 47.78 | 50.00 | 0.00 | 0.00 | 0.00 |
| | 256 | 54.44 | | 53.33 | 48.89 | 0.00 | 0.00 |
| | 512 | 51.11 | 53.33 | 53.33 | 51.11 | 54.44 | 0.00 |
| | 1024 | 50.00 | 54.44 | 51.11 | 51.11 | 54.44 | 53.33 |
| SVM | 64 | 56.67 | 52.22 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 60.00 | 50.00 | | 0.00 | 0.00 | 0.00 |
| | 256 | 51.11 | 58.89 | 61.11 | 57.78 | 0.00 | 0.00 |
| | 512 | 66.67 | 58.89 | 60.00 | 55.56 | 51.11 | 0.00 |
| | 1024 | 54.44 | 60.00 | 48.89 | 42.22 | 53.33 | 52.22 |
| Hybrid | 64 | 88.66 | 89.69 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 100.00 | 90.72 | 100.00 | 0.00 | 0.00 | 0.00 |
| | 256 | 82.47 | 93.81 | 78.35 | 93.81 | 0.00 | 0.00 |
| | 512 | 76.29 | 87.63 | 83.51 | 85.57 | 96.91 | 0.00 |
| | 1024 | 92.78 | 82.47 | 56.70 | 100.00 | 79.38 | 82.47 |
Performance analysis of CNS dataset using rbio2.2 with variations in window sizes and levels. Entries are classification accuracy (%).
| Classifier | Window size | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 | Level 7 |
|---|---|---|---|---|---|---|---|---|
| KNN | 64 | 57.46 | 55.41 | 60.67 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 58.11 | 54.68 | 53.29 | 58.55 | 0.00 | 0.00 | 0.00 |
| | 256 | 55.41 | 63.74 | 47.73 | 59.94 | 58.92 | 0.00 | 0.00 |
| | 512 | 64.11 | 54.39 | 48.76 | 60.96 | | 62.72 | 0.00 |
| | 1024 | 59.28 | 55.77 | 59.58 | 48.76 | 62.72 | 54.68 | 50.88 |
| Bayes | 64 | 50.15 | 63.01 | 67.91 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 60.23 | 57.82 | 69.30 | 67.62 | 0.00 | 0.00 | 0.00 |
| | 256 | 54.31 | 51.54 | 49.12 | | 56.43 | 0.00 | 0.00 |
| | 512 | 59.21 | 59.94 | 53.29 | 53.29 | 59.21 | 59.94 | 0.00 |
| | 1024 | 54.02 | 48.46 | 62.72 | 50.88 | 58.55 | 54.68 | 55.77 |
| SVM | 64 | 62.72 | 63.74 | 65.20 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | | 64.11 | 57.82 | 59.94 | 0.00 | 0.00 | 0.00 |
| | 256 | 59.58 | 52.63 | 47.00 | 64.47 | 51.61 | 0.00 | 0.00 |
| | 512 | 51.24 | 48.76 | 57.82 | 45.98 | 60.60 | 49.49 | 0.00 |
| | 1024 | 49.49 | 57.09 | 67.25 | 48.39 | 69.37 | 47.73 | 39.69 |
| Hybrid | 64 | 100.00 | 90.00 | 95.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 100.00 | 96.67 | 86.67 | 75.00 | 0.00 | 0.00 | 0.00 |
| | 256 | 96.67 | 96.67 | 95.00 | 90.00 | 90.00 | 0.00 | 0.00 |
| | 512 | 93.33 | 78.33 | 81.67 | 98.33 | 100.00 | 98.33 | 0.00 |
| | 1024 | 90.00 | 70.00 | 86.67 | 73.33 | 73.33 | 86.67 | 70.00 |
Performance analysis of leukemia dataset using rbio2.2 with variations in window sizes and levels. Entries are classification accuracy (%).
| Classifier | Window size | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 | Level 7 |
|---|---|---|---|---|---|---|---|---|
| KNN | 64 | 81.56 | 83.86 | 88.21 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 81.26 | 82.41 | 87.61 | 83.56 | 0.00 | 0.00 | 0.00 |
| | 256 | 68.57 | 62.52 | 74.06 | 81.56 | | 0.00 | 0.00 |
| | 512 | 63.97 | 74.91 | 62.22 | 61.97 | 88.21 | 89.36 | 0.00 |
| | 1024 | 60.22 | 75.51 | 60.17 | 83.01 | 59.37 | 85.31 | 56.47 |
| Bayes | 64 | | 91.35 | 86.46 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 81.01 | 85.31 | 90.75 | 86.46 | 0.00 | 0.00 | 0.00 |
| | 256 | 70.91 | 92.50 | 77.81 | 80.11 | 87.61 | 0.00 | 0.00 |
| | 512 | 83.01 | 72.86 | 69.17 | 68.62 | 83.86 | 91.35 | 0.00 |
| | 1024 | 64.27 | 81.86 | 65.97 | 87.91 | 61.12 | 85.61 | 61.37 |
| SVM | 64 | 91.35 | 87.31 | | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 87.06 | 78.41 | 92.50 | 85.31 | 0.00 | 0.00 | 0.00 |
| | 256 | 72.91 | 85.86 | 73.46 | 77.51 | 68.62 | 0.00 | 0.00 |
| | 512 | 74.91 | 75.21 | 63.72 | 63.72 | 83.61 | 84.16 | 0.00 |
| | 1024 | 67.72 | 80.71 | 61.92 | 82.71 | 64.57 | 85.31 | 59.37 |
| Hybrid | 64 | 98.61 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 100.00 | 90.28 | 98.61 | 97.22 | 0.00 | 0.00 | 0.00 |
| | 256 | 95.83 | 95.83 | 90.28 | 100.00 | 100.00 | 0.00 | 0.00 |
| | 512 | 93.06 | 95.83 | 93.06 | 79.17 | 100.00 | 97.22 | 0.00 |
| | 1024 | 76.39 | 88.89 | 79.17 | 93.06 | 83.33 | 94.44 | 76.39 |
Performance analysis of ovarian dataset using sym2 with variations in window sizes and levels. Entries are classification accuracy (%).
| Classifier | Window size | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 | Level 7 | Level 8 |
|---|---|---|---|---|---|---|---|---|---|
| KNN | 64 | 91.31 | | 91.36 | 91.04 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 94.08 | 87.99 | 87.66 | 70.92 | 85.49 | 0.00 | 0.00 | 0.00 |
| | 256 | 88.64 | 73.05 | 82.50 | 71.84 | 67.43 | 80.61 | 0.00 | 0.00 |
| | 512 | 84.62 | 53.85 | 74.50 | 64.07 | 65.98 | 62.61 | 72.88 | 0.00 |
| | 1024 | 68.31 | 69.62 | 65.81 | 60.75 | 78.27 | 65.21 | 57.23 | 61.96 |
| Bayes | 64 | 89.35 | | 90.98 | 91.69 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 80.07 | 82.78 | 88.32 | 93.10 | 89.95 | 0.00 | 0.00 | 0.00 |
| | 256 | 82.45 | 83.91 | 83.53 | 79.08 | 87.54 | 84.79 | 0.00 | 0.00 |
| | 512 | 83.76 | 80.47 | 82.83 | 74.67 | 78.96 | 73.74 | 75.00 | 0.00 |
| | 1024 | 71.73 | 69.36 | 79.95 | 63.85 | 84.41 | 70.82 | 69.03 | 67.77 |
| SVM | 64 | 90.00 | 93.75 | 92.72 | 83.65 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 85.01 | 90.60 | 86.03 | | 90.33 | 0.00 | 0.00 | 0.00 |
| | 256 | 80.82 | 83.33 | 88.97 | 81.74 | 86.68 | 78.31 | 0.00 | 0.00 |
| | 512 | 72.52 | 82.98 | 76.63 | 70.82 | 76.36 | 77.98 | 73.04 | 0.00 |
| | 1024 | 61.20 | 60.31 | 80.35 | 69.12 | 81.86 | 67.22 | 60.49 | 62.17 |
| Hybrid | 64 | 98.81 | 100.00 | 95.26 | 98.81 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 128 | 99.21 | 95.65 | 98.81 | 96.44 | 94.47 | 0.00 | 0.00 | 0.00 |
| | 256 | 99.21 | 94.47 | 89.33 | 87.35 | 91.30 | 93.68 | 0.00 | 0.00 |
| | 512 | 96.84 | 86.96 | 92.09 | 87.75 | 88.14 | 81.82 | 85.77 | 0.00 |
| | 1024 | 84.98 | 84.98 | 85.38 | 72.73 | 89.72 | 79.45 | 81.03 | 84.98 |
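In the performance tables above, the 0.00 entries form a staircase: each doubling of the window size unlocks one more decomposition level, and shorter wavelets reach deeper levels. This pattern is consistent with the commonly used bound on the DWT depth, floor(log2(N / (L − 1))), where N is the window size and L the wavelet's filter length. The sketch below checks that reading. The bound and the filter lengths (14 for db7, 6 for rbio2.2, 4 for sym2) are supplied here as assumptions, since the source does not say how the level limit was set.

```python
import math

# Decomposition filter lengths for the wavelets named in the table captions
# (standard values for these wavelets; not stated in the source).
FILTER_LENGTH = {"db7": 14, "rbio2.2": 6, "sym2": 4}

def max_level(window, wavelet):
    """Largest useful DWT level: floor(log2(window / (filter_length - 1)))."""
    return math.floor(math.log2(window / (FILTER_LENGTH[wavelet] - 1)))

windows = (64, 128, 256, 512, 1024)
levels = {w: [max_level(n, w) for n in windows] for w in FILTER_LENGTH}
for wavelet, depth in levels.items():
    print(wavelet, depth)
```

Under this reading, a 0.00 entry means the level was not available for that window size rather than that every test sample was misclassified.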
Parameter settings to achieve 100% classification accuracy.
| Dataset | Wavelet | Level | Best window size | Number of features used |
|---|---|---|---|---|
| Breast | db7 | 4 | 1024 | 24 |
| Colon | db7 | 1 | 512 | 4 |
| Ovarian | sym2, bior2.2, rbio2.2 | 2 | 64 | 237 |
| CNS | rbio2.2, sym2 | 5 | 512 | 14 |
| Leukemia | rbio2.2 | 5 | 512 | 14 |
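The "Number of features used" column appears to equal the number of windows needed to cover all genes, that is, the feature count from the dataset summary divided by the best window size and rounded up. The check below is a reading of the two tables, not a formula stated in the paper.

```python
import math

# dataset: (total features, best window size, reported features used)
settings = {
    "Breast": (24481, 1024, 24),
    "Colon": (1909, 512, 4),
    "Ovarian": (15154, 64, 237),
    "CNS": (7129, 512, 14),
    "Leukemia": (7129, 512, 14),
}

def windows_needed(n_features, window):
    """Number of non-overlapping windows covering n_features values."""
    return math.ceil(n_features / window)

computed = {name: windows_needed(total, window)
            for name, (total, window, _reported) in settings.items()}
for name, (_total, _window, reported) in settings.items():
    print(name, computed[name], reported)
```

For all five datasets the computed window count matches the reported number of features, which suggests that one feature is retained per window.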
Figure 2. Classification accuracy of the proposed system for breast microarray dataset using window size of 1024.
Figure 3. Classification accuracy of the proposed system for colon microarray dataset using window size of 512.
Figure 4. Classification accuracy of the proposed system for ovarian microarray dataset using window size of 64.
Figure 5. Classification accuracy of the proposed system for CNS microarray dataset using window size of 512.
Figure 6. Classification accuracy of the proposed system for leukemia microarray dataset using window size of 512.