P Malin Bruntha, S Immanuel Alex Pandian, J Anitha, Siril Sam Abraham, S Niranjan Kumar.
Abstract
Purpose: Deep learning-based computer-aided detection can reduce the burden on physicians during diagnosis, particularly in lung cancer nodule classification.
Keywords: Convolutional neural network; hybridized features; radial basis function support vector machine; residual neural network; transfer learning
Year: 2022 PMID: 35548037 PMCID: PMC9084582 DOI: 10.4103/jmp.jmp_61_21
Source DB: PubMed Journal: J Med Phys ISSN: 0971-6203
Figure 1: Proposed methodology for lung nodule classification
Malignancy rate for lung nodules in the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI)
| Malignancy rate | Number of nodules | Nature of the nodule |
|---|---|---|
| 1, 2 | 1136 | Highly unlikely for cancer |
| 3 | 980 | Indeterminate |
| 4, 5 | 509 | Highly likely for cancer |
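The rating bands in the table translate directly into class labels. The following is a minimal sketch, assuming that ratings 1 and 2 form the benign class, ratings 4 and 5 form the malignant class, and rating 3 is left out of binary classification (the benign and malignant counts in the dataset table match the first and last bands). The function name `label_from_malignancy` is hypothetical and does not come from the paper.

```python
def label_from_malignancy(rating):
    """Return 'benign', 'malignant', or None (indeterminate) for a 1-5 rating."""
    if rating in (1, 2):
        return "benign"      # highly unlikely for cancer
    if rating == 3:
        return None          # indeterminate, assumed to be excluded
    if rating in (4, 5):
        return "malignant"   # highly likely for cancer
    raise ValueError(f"malignancy rating must be 1-5, got {rating!r}")


# Nodule counts per rating band, copied from the table above.
NODULES_PER_BAND = {"benign": 1136, "indeterminate": 980, "malignant": 509}
labelled = NODULES_PER_BAND["benign"] + NODULES_PER_BAND["malignant"]
```

Under this assumption, `labelled` is the number of nodules that carry a binary label.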
Dataset used for this work
| Type of nodule | Nodules extracted from database | Augmented nodules |
|---|---|---|
| Benign | 1136 | 4544 |
| Malignant | 509 | 2036 |
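Both augmented counts are exactly four times the extracted counts (1136 × 4 = 4544 and 509 × 4 = 2036). This record does not say which transforms were used, so the sketch below simply assumes that each nodule slice is kept alongside its 90°, 180°, and 270° rotations. `rotate90` and `augment` are hypothetical helper names, and the 2 × 2 patch is a stand-in for a real slice.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus its three right-angle rotations."""
    views = [image]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    return views


patch = [[1, 2],
         [3, 4]]
views = augment(patch)  # four views per nodule under the stated assumption
```

Any other scheme that yields four images per nodule would reproduce the same counts.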
Figure 2: Sample benign and malignant nodule slices from the LIDC-IDRI dataset
Figure 3: Modified ResNet50 model to extract deep features
Figure 4: Skip connection in ResNet50
Structure of the confusion matrix for lung nodule classification
| Actual class | Predicted benign | Predicted malignant |
|---|---|---|
| Benign | TN | FP |
| Malignant | FN | TP |
TN: True negative, TP: True positive, FP: False positive, FN: False negative
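The performance measures reported in the tables that follow can all be derived from these four counts, with malignant as the positive class. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are illustrative and are not taken from the paper.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive the usual binary-classification measures from confusion counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero for degenerate classifiers.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "fpr": ratio(fp, fp + tn),           # 1 - specificity
        "fnr": ratio(fn, fn + tp),           # 1 - sensitivity
    }


# Made-up counts, for illustration only.
example = classification_metrics(tn=90, fp=10, fn=5, tp=95)
```

A classifier that labels every nodule benign has TP = FP = 0, so its sensitivity is 0, its specificity is 1, and its accuracy equals the benign share of the evaluated set.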
Performance measures of handcrafted features
| Model | Explanation | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1 score | FPR (%) | FNR (%) |
|---|---|---|---|---|---|---|---|---|
| Model 1 | GLCM + logistic regression | 57.31 | 85.18 | 40 | 46.44 | 0.6 | 60 | 14.8 |
| Model 2 | GLCM + linear SVM | 56.86 | 83.79 | 40.53 | 46 | 0.59 | 59 | 16 |
| Model 3 | GLCM + RBF SVM | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 4 | GLCM + random forest | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 5 | LBP + logistic regression | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 6 | LBP + linear SVM | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 7 | LBP + RBF SVM | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 8 | LBP + random forest | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 9 | HOG + logistic regression | 74 | 93.47 | 62 | 60 | 0.73 | 38 | 6.52 |
| Model 10 | HOG + linear SVM | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 11 | HOG + RBF SVM | 78 | 95.45 | 67 | 64 | 0.77 | 32.6 | 4.5 |
| Model 12 | HOG + random forest | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
FPR: False-positive rate, FNR: False-negative rate, GLCM: Gray level co-occurrence matrix, RBF: Radial basis function, SVM: Support vector machine, LBP: Local binary pattern, HOG: Histogram of oriented gradients
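The 0 to 1 value that sits between the precision and FPR values in each row (0.6 for Model 1, 0.77 for Model 11) is consistent with an F1 score computed from the precision and sensitivity columns. The check below is a small sketch of that arithmetic; `f1_score` is a hypothetical helper, and its inputs are the percentages printed in the table.

```python
def f1_score(precision_pct, sensitivity_pct):
    """Harmonic mean of precision and sensitivity, returned on a 0-1 scale."""
    total = precision_pct + sensitivity_pct
    if total == 0:
        return 0.0  # both measures are zero, as in the all-benign rows
    return 2 * precision_pct * sensitivity_pct / total / 100


# Precision and sensitivity for Model 1 and Model 11, copied from the table.
model_1 = round(f1_score(46.44, 85.18), 2)
model_11 = round(f1_score(64, 95.45), 2)
```

The same relationship holds for the rows with 0% sensitivity and 0% precision, where the value is 0.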
Performance measures of deep features
| Model | Explanation | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1 score | FPR (%) | FNR (%) |
|---|---|---|---|---|---|---|---|---|
| Model 13 | VGG16 + logistic regression | 58.36 | 63.24 | 55.4 | 46.24 | 0.534 | 44.6 | 36.76 |
| Model 14 | VGG16 + linear SVM | 62.24 | 0 | 100 | 0 | 0 | 0 | 100 |
| Model 15 | VGG16 + RBF-SVM | 82.985 | 83.79 | 82.49 | 74.39 | 0.788 | 17.5 | 16.2 |
| Model 16 | VGG16 + random forest | 77.84 | 82.21 | 75.18 | 66.77 | 0.737 | 24.8 | 17.78 |
| Model 17 | VGG19 + logistic regression | 74.22 | 80.04 | 70.69 | 62.31 | 0.701 | 29.3 | 19.96 |
| Model 18 | VGG19 + linear SVM | 76.42 | 76.68 | 76.26 | 66.21 | 0.711 | 23.74 | 23.32 |
| Model 19 | VGG19 + RBF-SVM | 83.06 | 90.12 | 78.78 | 72.04 | 0.8 | 21.22 | 9.88 |
| Model 20 | VGG19 + random forest | 80.15 | 84.58 | 77.46 | 69.48 | 0.763 | 22.54 | 15.4 |
| Model 21 | ResNet50 + logistic regression | 78.36 | 83 | 75.54 | 67.31 | 0.74 | 24.46 | 17 |
| Model 22 | ResNet50 + linear SVM | 79.03 | 87.15 | 74 | 67.12 | 0.758 | 25.89 | 12.85 |
| Model 23 | ResNet50 + RBF-SVM | 83.06 | 95.06 | 75.78 | 70.42 | 0.81 | 24.2 | 4.94 |
| Model 24 | ResNet50 + random forest | 80.3 | 87.15 | 79.14 | 68.9 | 0.77 | 23.86 | 12.85 |
FPR: False-positive rate, FNR: False-negative rate, RBF: Radial basis function, SVM: Support vector machine
Performance analysis of hybridized features in lung nodule classification
| Model | Explanation | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1 score | FPR (%) | FNR (%) |
|---|---|---|---|---|---|---|---|---|
| Model 25 | VGG16 + HOG + logistic regression | 78.66 | 80.83 | 77.34 | 68.39 | 0.74 | 22.66 | 19.16 |
| Model 26 | VGG16 + HOG + linear SVM | 78.43 | 82 | 76.25 | 67.69 | 0.74 | 23.74 | 17.98 |
| Model 27 | VGG16 + HOG + RBF-SVM | 82.46 | 54.35 | 99.5 | 98.56 | 0.7 | 0.005 | 45.6 |
| Model 28 | VGG16 + HOG + random forest | 73.88 | 31.22 | 99.76 | 98.75 | 0.47 | 0.002 | 68.77 |
| Model 29 | VGG19 + HOG + logistic regression | 76.56 | 87.15 | 70.14 | 63.9 | 0.74 | 29.85 | 12.84 |
| Model 30 | VGG19 + HOG + linear SVM | 76.49 | 85.38 | 71 | 79.37 | 0.82 | 28.89 | 14.89 |
| Model 31 | VGG19 + HOG + RBF-SVM | 93.28 | 89.5 | 95.6 | 92.43 | 0.91 | 4.44 | 10.5 |
| Model 32 | VGG19 + HOG + random forest | 73.65 | 30.63 | 99.76 | 98.72 | 0.47 | 0.002 | 69.37 |
| Model 33 | ResNet50 + HOG + logistic regression | 82.9 | 89.72 | 78.77 | 71.94 | 0.79 | 21.2 | 10.27 |
| Model 34 | ResNet50 + HOG + linear SVM | 79.6 | 95.8 | 69.78 | 65.8 | 0.78 | 30.2 | 4 |
| Model 35 | ResNet50 + HOG + random forest | 88.13 | 84.78 | 90.16 | 83.95 | 0.84 | 9.8 | 15.2 |
| Model 36 | ResNet50 + HOG + RBF-SVM | 97.53 | 98.62 | 96.88 | 95.04 | 0.97 | 3.12 | 1.38 |
FPR: False-positive rate, FNR: False-negative rate, RBF: Radial basis function, SVM: Support vector machine, HOG: Histogram of oriented gradients
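A hybridized feature vector can be pictured as a deep feature vector joined to a HOG descriptor before it reaches the classifier. The sketch below assumes plain concatenation, which this record does not confirm, and shows the radial basis function kernel that an RBF-SVM evaluates on pairs of such vectors. The names `hybridize` and `rbf_kernel`, the tiny vectors, and `gamma=0.5` are illustrative assumptions.

```python
import math


def hybridize(deep_features, hog_features):
    """Join two per-nodule feature vectors into one (assumed: concatenation)."""
    return list(deep_features) + list(hog_features)


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy vectors: two "deep" values followed by three "HOG" values per nodule.
nodule_a = hybridize([0.2, 0.9], [0.1, 0.4, 0.3])
nodule_b = hybridize([0.2, 0.7], [0.1, 0.4, 0.5])
similarity = rbf_kernel(nodule_a, nodule_b, gamma=0.5)
```

The kernel returns 1 for identical vectors and decays toward 0 as they move apart, so `gamma` controls how quickly two nodules stop counting as similar.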
Figure 5: Feature reduction using principal component analysis
Figure 6: Computation time for hybrid models
Figure 7: Receiver operating characteristic of the proposed hybrid model
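The area under a receiver operating characteristic curve equals the probability that a randomly chosen positive (malignant) case receives a higher score than a randomly chosen negative (benign) case. The sketch below computes the area that way on made-up scores; `roc_auc` is a hypothetical helper and is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = malignant) and classifier scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation gives the same value faster on large sets.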
Comparison of performance metrics with state-of-the-art methods
| Related works | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1 score | FPR (%) | FNR (%) | AUC |
|---|---|---|---|---|---|---|---|---|
| Proposed approach | 97.53 | 98.62 | 96.88 | 95.04 | 0.97 | 3.12 | 1.38 | 0.996 |
| Li | 88.58 | 82.60 | 91.82 | - | - | 8.28 | 17.4 | - |
| Wang | 91.75 | - | - | - | - | - | - | 0.970 |
| Nibali | 89.9 | 91.07 | 88.64 | 89.35 | - | - | - | 0.946 |
| da Nóbrega | 88.41 | 85.38 | - | 73.48 | 0.79 | - | - | 0.932 |
| Xie | 87.74 | 81.11 | 89.67 | - | - | - | - | 0.945 |
| Shen | 87.14 | 77 | 93 | - | - | - | - | 0.93 |
| de Carvalho | 92.63 | 90.7 | 93.47 | - | - | - | - | 0.934 |
| Kumar | 75.01 | 83.35 | - | - | - | - | - | - |
| Han | - | - | - | - | - | - | - | 0.927 |
| Dhara | - | 82.89 | 80.73 | - | - | - | - | 0.882 |
| Hussein | 91.26 | - | - | - | - | - | - | - |
FPR: False-positive rate, FNR: False-negative rate, AUC: Area under the curve