Neha Sharma, Hari Om.
Abstract
In India, oral cancers usually present at an advanced stage of malignancy. Establishing the diagnosis is critical to initiating the most advantageous treatment of suspicious lesions. The main hurdle to appropriate treatment and control of oral cancer is the cost-effective identification and risk assessment of early disease in the community. The objective of this research is to design a data mining model using a probabilistic neural network and general regression neural network (PNN/GRNN) for early detection and prevention of oral malignancy. The model is built on an oral cancer database with 35 attributes and 1025 records. All attributes pertaining to clinical symptoms and history are used to classify cases as malignant or non-malignant. The model then attempts to predict the particular type of cancer, its stage, and its extent from attributes pertaining to symptoms, gross examination, and investigations. The model also aims to predict the survivability of a patient on the basis of treatment and follow-up details. Finally, the performance of the PNN/GRNN model is compared with that of other classification models. The classification accuracy of the PNN/GRNN model is 80%, higher than that of the other models compared, which makes it better suited for early detection and prevention of oral cancer.
Year: 2015 PMID: 26171415 PMCID: PMC4485993 DOI: 10.1155/2015/234191
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: Use of PNN/GRNN classification for early detection and prevention of oral cancer.
Figure 2: Architecture of probabilistic neural network and general regression neural network.
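The architecture named in Figure 2 can be illustrated with a minimal sketch of the textbook PNN and GRNN formulations: a pattern layer with one Gaussian kernel per training record, a summation layer, and a decision (PNN) or division (GRNN) step. This is not the authors' implementation. The function names, the smoothing parameter `sigma`, and the toy records are assumptions made for illustration only.

```python
import math


def gaussian_kernel(x, pattern, sigma):
    """Pattern-layer activation: Gaussian of the squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pnn_predict(x, patterns, labels, sigma=0.5):
    """Textbook PNN: average the kernel activations per class, pick the largest."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        sums[label] = sums.get(label, 0.0) + gaussian_kernel(x, pattern, sigma)
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: mean activation per class (a Parzen density estimate).
    scores = {label: sums[label] / counts[label] for label in sums}
    # Decision layer: class with the highest estimated density.
    return max(scores, key=scores.get)


def grnn_predict(x, patterns, targets, sigma=0.5):
    """Textbook GRNN: kernel-weighted average of the training targets."""
    weights = [gaussian_kernel(x, pattern, sigma) for pattern in patterns]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every training pattern")
    return sum(w * t for w, t in zip(weights, targets)) / total


# Toy records (hypothetical, not taken from the oral cancer database).
toy_patterns = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
toy_labels = ["benign", "benign", "malignant", "malignant"]
toy_targets = [0.0, 0.0, 1.0, 1.0]

print(pnn_predict([0.2, 0.4], toy_patterns, toy_labels))
print(grnn_predict([5.1, 5.4], toy_patterns, toy_targets))
```

In this formulation the only tunable quantity is `sigma`: small values make the model behave like a nearest-neighbour rule, while large values smooth the class densities.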
Performance of PNN/GRNN model for classification of malignant and benign cases of oral cancer.
| Performance estimation parameter | Performance |
|---|---|
| Positive/negative ratio | 3.0837 |
| Accuracy | 99.02% |
| True positive (TP) | 75.02% |
| True negative (TN) | 24.00% |
| False positive (FP) | 0.49% |
| False negative (FN) | 0.49% |
| Sensitivity | 99.35% |
| Specificity | 98.01% |
| Geometric mean of sensitivity-specificity | 98.68% |
| Positive predictive value (PPV) | 99.35% |
| Negative predictive value (NPV) | 98.01% |
| Geometric mean of PPV and NPV | 98.68% |
| Precision | 99.35% |
| Recall | 99.35% |
| F-measure | 0.9935 |
| Area under ROC curve | 0.9974 |
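The derived rows of this table follow from the four confusion-matrix cells by the standard definitions. The sketch below recomputes them from the TP, TN, FP, and FN percentages reported above. The function name `confusion_metrics` and the convention of returning 0.0 for an empty denominator are assumptions for illustration, not something the source specifies.

```python
import math


def _safe_div(numerator, denominator):
    """Return 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Derive the standard binary-classification metrics from the four cells.

    The cells may be counts or percentages of all cases; every metric
    below is a ratio, so the unit cancels out.
    """
    sensitivity = _safe_div(tp, tp + fn)   # identical to recall
    specificity = _safe_div(tn, tn + fp)
    ppv = _safe_div(tp, tp + fp)           # identical to precision
    npv = _safe_div(tn, tn + fn)
    return {
        "positive_negative_ratio": _safe_div(tp + fn, tn + fp),
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "gmean_sens_spec": math.sqrt(sensitivity * specificity),
        "ppv": ppv,
        "npv": npv,
        "gmean_ppv_npv": math.sqrt(ppv * npv),
        "f_measure": _safe_div(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Cells as reported for the malignant/benign classification (in %).
metrics = confusion_metrics(tp=75.02, tn=24.00, fp=0.49, fn=0.49)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the published cells are rounded to two decimals, the recomputed values agree with the table only to within that rounding.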
Performance of PNN/GRNN model for classification of various types of oral cancer.
| Estimation parameter (performance in %) | Acantholytic | Adenocarcinoma | Basaloid | Lymphoepithelioma-like | Plaque-like | Sarcomatoid | SCC | Verrucous | Benign |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 99.61 | 99.41 | 99.71 | 99.90 | 99.90 | 99.61 | 99.95 | 95.61 | 100 |
| True positive (TP) | 0.10 | 0.10 | 0.10 | 0.00 | 0.00 | 0.10 | 56.20 | 1.37 | 35.90 |
| True negative (TN) | 99.51 | 99.32 | 99.61 | 99.90 | 99.90 | 99.51 | 37.76 | 94.24 | 64.10 |
| False positive (FP) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.05 | 0.10 | 0.00 |
| False negative (FN) | 0.39 | 0.59 | 0.29 | 0.10 | 0.10 | 0.39 | 0.00 | 4.29 | 0.00 |
| Sensitivity | 20.00 | 14.29 | 25 | 0.00 | 0.00 | 20.00 | 100 | 24.14 | 100 |
| Specificity | 100 | 100 | 100 | 100 | 100 | 100 | 86.19 | 99.90 | 100 |
| Geometric mean of sensitivity and specificity | 44.72 | 37.80 | 50 | 0.0 | 0.00 | 44.72 | 92.84 | 49 | 100 |
| Positive predictive value (PPV) | 100 | 100 | 100 | 0.00 | 99.90 | 100 | 90.28 | 93.33 | 100 |
| Negative predictive value (NPV) | 99.61 | 99.41 | 99.71 | 99.90 | 0.00 | 99.61 | 100 | 95.64 | 100 |
| Geometric mean of PPV and NPV | 99.80 | 99.71 | 99.85 | 0.00 | 0.00 | 99.80 | 95.02 | 94.48 | 100 |
| Precision | 100 | 100 | 100 | 0.00 | 0.00 | 100 | 90.28 | 93.33 | 100 |
| Recall | 20.00 | 14.29 | 25 | 0.00 | 0.00 | 20.00 | 100 | 24.14 | 100 |
| F-measure | 0.33 | 0.25 | 0.40 | 0.00 | 0.00 | 0.33 | 0.94 | 0.38 | 1.00 |
Probability of occurrence of type of oral cancer using PNN/GRNN model.
| Diagnosis (type of tumour) | Probability |
|---|---|
| Squamous cell carcinoma (SCC) | 56.19% |
| Verrucous | 5.6% |
| Acantholytic | 0.48% |
| Basaloid | 0.39% |
| Adenocarcinoma | 0.68% |
| Sarcomatoid | 0.48% |
| Lymphoepithelioma-like | 0.09% |
| Plaque-like | 0.09% |
| Benign | 35.9% |
Performance of PNN/GRNN model for prediction of oral cancer stage.
| Estimation parameter (performance in %) | Stage I | Stage II | Stage IV | Stage N0 |
|---|---|---|---|---|
| Accuracy | 99.90 | 86.93 | 86.83 | 100 |
| True positive (TP) | 0.00 | 7.90 | 43.02 | 35.90 |
| True negative (TN) | 99.90 | 79.02 | 43.80 | 64.10 |
| False positive (FP) | 0.00 | 0.00 | 13.17 | 0.00 |
| False negative (FN) | 0.10 | 13.07 | 0.00 | 0.00 |
| Sensitivity | 0.00 | 37.67 | 100.00 | 100.00 |
| Specificity | 100.00 | 100.00 | 76.88 | 100.00 |
| Geometric mean of sensitivity-specificity | 0.00 | 61.38 | 87.68 | 100.00 |
| Positive predictive value | 0.00 | 100.00 | 76.56 | 100.00 |
| Negative predictive value | 98.01 | 85.81 | 100.00 | 100.00 |
| Geometric mean of PPV and NPV | 0.00 | 92.62 | 87.50 | 100.00 |
| Precision | 0.00 | 100.00 | 76.56 | 100.00 |
| Recall | 0.00 | 37.67 | 100.00 | 100.00 |
| F-measure | 0.00 | 0.547 | 0.867 | 1.00 |
Performance of PNN/GRNN model for predicting survivability of oral cancer patients.
| Performance estimation parameter | Performance |
|---|---|
| Positive/negative ratio | 0.674 |
| Accuracy | 69.95% |
| True positive (TP) | 36.68% |
| True negative (TN) | 33.27% |
| False positive (FP) | 26.44% |
| False negative (FN) | 3.61% |
| Sensitivity | 91.04% |
| Specificity | 55.72% |
| Geometric mean of sensitivity-specificity | 71.22% |
| Positive predictive value (PPV) | 58.11% |
| Negative predictive value (NPV) | 90.21% |
| Geometric mean of PPV and NPV | 72.41% |
| Precision | 58.11% |
| Recall | 91.04% |
| F-measure | 0.709 |
| Area under ROC curve | 0.7491 |
Comparison of performance of classification models for training data.
| Estimation parameters | Linear | Decision | TreeBoost | MLP | CCNN | PNN/GRNN |
|---|---|---|---|---|---|---|
| Accuracy | 60.10% | 76.68% | 74.76% | 70.05% | 72.10% | 80.00% |
| True positive (TP) | 1.37% | 30.44% | 32.80% | 31.02% | 33.46% | 39.76% |
| True negative (TN) | 68.73% | 46.24% | 41.95% | 39.02% | 38.63% | 46.34% |
| False positive (FP) | 0.98% | 13.46% | 17.80% | 20.68% | 21.07% | 6.46% |
| False negative (FN) | 38.93% | 9.85% | 7.44% | 9.27% | 6.83% | 3.58% |
| Sensitivity | 3.39% | 75.54% | 81.52% | 77.00% | 83.05% | 92.78% |
| Specificity | 98.37% | 77.45% | 70.20% | 65.36% | 64.71% | 79.85% |
| Geometric mean of sensitivity and specificity | 18.26% | 76.49% | 75.65% | 70.94% | 73.31% | 80.55% |
| Positive predictive value (PPV) | 58.33% | 69.33% | 64.82% | 60.00% | 61.36% | 71.49% |
| Negative predictive value (NPV) | 60.14% | 82.43% | 84.94% | 80.81% | 84.98% | 90.79% |
| Geometric mean of PPV and NPV | 59.23% | 75.60% | 74.20% | 69.63% | 72.21% | 79.14% |
| Average gain for survival = | 1.25% | 1.26% | 1.369% | 1.28% | 1.31% | 1.40% |
| Average gain for survival = | 1.34% | 1.35% | 1.57% | 1.36% | 1.43% | 1.65% |
| Precision | 58.33% | 69.33% | 64.82% | 60.00% | 61.36% | 71.49% |
| Recall | 3.39% | 75.54% | 81.52% | 77.00% | 83.05% | 91.8% |
| F-measure | 0.0641 | 0.7231 | 0.7221 | 0.6744 | 0.7058 | 0.7715 |
| Area under ROC curve | 0.722 | 0.835 | 0.8476 | 0.769 | 0.779 | 0.892 |
Comparison of performance of classification models for validation data.
| Estimation parameters | Linear | Decision tree | Decision | TreeBoost | MLP | CCNN | PNN/GRNN |
|---|---|---|---|---|---|---|---|
| Accuracy | 61.27% | 68.88% | 67.41% | 72.68% | 69.76% | 68.29% | 73.76% |
| True positive (TP) | 20.68% | 25.85% | 30.93% | 32.30% | 33.07% | 30.34% | 35.31% |
| True negative (TN) | 40.59% | 43.02% | 36.49% | 40.49% | 36.68% | 37.95% | 41.88% |
| False positive (FP) | 19.12% | 16.68% | 23.22% | 19.02% | 23.02% | 21.79% | 12.83% |
| False negative (FN) | 19.61% | 14.44% | 9.37% | 8.29% | 7.22% | 9.95% | 4.41% |
| Sensitivity | 51.33% | 64.16% | 76.76% | 79.52% | 82.08% | 75.30% | 87.67% |
| Specificity | 67.97% | 72.06% | 61.11% | 68.03% | 61.44% | 63.56% | 69.46% |
| Geometric mean of sensitivity and specificity | 59.07% | 68.00% | 68.49% | 73.55% | 71.01% | 69.18% | 74.05% |
| Positive predictive value (PPV) | 51.96% | 60.78% | 57.12% | 62.86% | 58.96% | 58.24% | 62.86% |
| Negative predictive value (NPV) | 67.42% | 74.87% | 79.57% | 83.00% | 83.56% | 79.23% | 88.17% |
| Geometric mean of PPV and NPV | 59.19% | 67.46% | 67.42% | 72.23% | 70.19% | 67.93% | 72.23% |
| Average gain for survival = | 1.149 | 1.15 | 1.273 | 1.274 | 1.28 | 1.26% | 1.32% |
| Average gain for survival = | 1.17 | 1.17 | 1.324 | 1.413 | 1.31 | 1.32% | 1.48% |
| Precision | 51.96% | 60.78% | 57.12% | 62.86% | 58.96% | 58.24% | 63.53% |
| Recall | 51.33% | 64.16% | 76.76% | 79.52% | 82.08% | 75.30% | 86.67% |
| F-measure | 0.5164 | 0.6243 | 0.655 | 0.7021 | 0.6862 | 0.6568 | 0.6593 |
| Area under ROC curve | 0.631 | 0.835 | 0.765 | 0.7705 | 0.739 | 0.731 | 0.821 |
Performance parameter-wise best model for training and validation data.
| Estimation parameter | Description | Best model (training) | Value (training) | Best model (validation) | Value (validation) |
|---|---|---|---|---|---|
| Accuracy | Accuracy of classification | PNN/GRNN | 80.00% | PNN/GRNN | 73.76% |
| True positive (TP) | Malignant patients who are predicted as malignant | PNN/GRNN | 39.76% | PNN/GRNN | 35.51% |
| True negative (TN) | Nonmalignant patients who are predicted as nonmalignant | PNN/GRNN | 46.34% | Decision tree | 43.02% |
| False positive (FP) | Nonmalignant patients who are predicted as malignant | PNN/GRNN | 3.58% | PNN/GRNN | 12.83% |
| False negative (FN) | Malignant patients who are predicted as nonmalignant | PNN/GRNN | 3.58% | PNN/GRNN | 4.41% |
| Sensitivity | Probability of correctly predicting malignancy | PNN/GRNN | 92.78% | PNN/GRNN | 87.67% |
| Specificity | Probability of correctly predicting nonmalignant cases | PNN/GRNN | 79.85% | Decision tree | 72.06% |
| Geometric mean of sensitivity and specificity | Geometric mean of sensitivity and specificity | PNN/GRNN | 80.55% | PNN/GRNN | 74.05% |
| Positive predictive value (PPV) | Proportion of patients predicted to have the disease who actually have the disease | PNN/GRNN | 71.49% | PNN/GRNN | 62.86% |
| Negative predictive value (NPV) | Proportion of patients predicted as not having the disease who actually do not have the disease | PNN/GRNN | 90.79% | PNN/GRNN | 88.17% |
| Geometric mean of PPV and NPV | Geometric mean of PPV and NPV | PNN/GRNN | 79.14% | PNN/GRNN | 72.23% |
| Average gain for survival = | The gain shows how much of an improvement is provided by the model | PNN/GRNN | 1.40% | PNN/GRNN | 1.32% |
| Average gain for survival = | The gain shows how much of an improvement is provided by the model | PNN/GRNN | 1.65% | PNN/GRNN | 1.48% |
| Precision | Proportion of cases selected by the model that have the true value; precision is equal to PPV | PNN/GRNN | 71.49% | TreeBoost | 62.86% |
| Recall | Proportion of the true cases that are identified by the model; recall is equal to sensitivity | PNN/GRNN | 91.8% | PNN/GRNN | 86.67% |
| F-measure | It combines precision and recall to give an overall measure of the quality of the prediction | PNN/GRNN | 0.7715 | TreeBoost | 0.7021 |
| Area under ROC curve | Area under the Receiver Operating Characteristic (ROC) curve for the model | PNN/GRNN | 0.892 | Decision tree | 0.835 |
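A parameter-wise best model of this kind can be derived mechanically from the comparison table: take the highest value for most metrics and the lowest for error rates such as FP and FN. The sketch below does this for four validation-data rows copied from the comparison table. The helper name `best_model`, the set `LOWER_IS_BETTER`, and the label `"Decision (2nd column)"` are assumptions for illustration; the last one stands in for a model name that is truncated in the source.

```python
# Model columns in the order they appear in the validation comparison table.
MODELS = [
    "Linear",
    "Decision tree",
    "Decision (2nd column)",  # name truncated in the source; label is a placeholder
    "TreeBoost",
    "MLP",
    "CCNN",
    "PNN/GRNN",
]

# Selected validation-data rows, values as reported in the table.
VALIDATION = {
    "Accuracy": [61.27, 68.88, 67.41, 72.68, 69.76, 68.29, 73.76],
    "False positive (FP)": [19.12, 16.68, 23.22, 19.02, 23.02, 21.79, 12.83],
    "Specificity": [67.97, 72.06, 61.11, 68.03, 61.44, 63.56, 69.46],
    "Area under ROC curve": [0.631, 0.835, 0.765, 0.7705, 0.739, 0.731, 0.821],
}

# Error rates are better when smaller; everything else is better when larger.
LOWER_IS_BETTER = {"False positive (FP)", "False negative (FN)"}


def best_model(metric, values, models=MODELS):
    """Return (model, value) with the best score for the given metric."""
    pick = min if metric in LOWER_IS_BETTER else max
    index = pick(range(len(values)), key=lambda i: values[i])
    return models[index], values[index]


for metric, values in VALIDATION.items():
    model, value = best_model(metric, values)
    print(f"{metric}: {model} ({value})")
```

Ties are resolved in favour of the leftmost column, which is a property of Python's `min` and `max` rather than a rule stated in the source.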