Vitória de Carvalho Brito, Patrick Ryan Sales Dos Santos, Nonato Rodrigues de Sales Carvalho, Antonio Oseas de Carvalho Filho.
Abstract
COVID-19 is an infectious disease caused by a newly discovered type of coronavirus called SARS-CoV-2. Since the discovery of this disease in late 2019, COVID-19 has become a worldwide concern, mainly due to its high degree of contagion. As of April 2021, the number of confirmed cases of COVID-19 reported to the World Health Organization had exceeded 135 million worldwide, and the number of deaths had exceeded 2.9 million. Given the impact of the disease, research on approaches to detect COVID-19 has intensified, with a focus on supporting and facilitating diagnosis. This work proposes the application of texture descriptors based on phylogenetic relationships between species to characterize segmented CT volumes, followed by the classification of regions as COVID-19, solid lesion, or healthy tissue. To evaluate our method, we use images from three different datasets. The results are promising, with an accuracy of 99.93%, a recall of 99.93%, a precision of 99.93%, an F1-score of 99.93%, and an AUC of 0.997. We present a robust, simple, and efficient method that can be applied to 2D and/or 3D images without limitations on their dimensionality.
Keywords: 3D Texture Analysis; COVID-19; Computed Tomography; Phylogenetic Diversity
Year: 2021 PMID: 34121775 PMCID: PMC8180348 DOI: 10.1016/j.patcog.2021.108083
Source DB: PubMed Journal: Pattern Recognit ISSN: 0031-3203 Impact factor: 7.740
Fig. 1 Stages in the proposed methodology, consisting of image acquisition, extraction of characteristics through phylogenetic diversity indexes, data classification for the target classes, and validation of the results.
Distribution of images between datasets.
| Dataset | Diagnosis | Number of images |
|---|---|---|
| LIDC | Solid lesions | 1679 |
| LIDC | Healthy tissue | 17742 |
| COVID-19 (1) | GGO, consolidation and pleural effusion | 215 |
| COVID-19 (2) | GGO, consolidation and pleural effusion | 274 |
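The counts in this table imply a strongly imbalanced problem, which matters when reading the accuracy and AUC figures reported further down. The following is a minimal sketch that only tallies the numbers listed above; it assumes the healthy-tissue row belongs to the LIDC dataset, and the names `counts` and `class_share` are illustrative, not taken from the paper.

```python
# Image counts copied from the dataset table above.
# Assumption: the "Healthy tissue" row belongs to the LIDC dataset.
counts = {
    ("LIDC", "Solid lesions"): 1679,
    ("LIDC", "Healthy tissue"): 17742,
    ("COVID-19 (1)", "GGO, consolidation and pleural effusion"): 215,
    ("COVID-19 (2)", "GGO, consolidation and pleural effusion"): 274,
}

total = sum(counts.values())


def class_share(diagnosis: str) -> float:
    """Fraction of all listed images that carry the given diagnosis."""
    matching = sum(n for (_, d), n in counts.items() if d == diagnosis)
    return matching / total


# Print the share of each diagnosis among all listed images.
for diagnosis in sorted({d for _, d in counts}):
    print(f"{diagnosis}: {class_share(diagnosis):.2%}")
```

Under these counts, healthy tissue accounts for roughly 89% of the listed images, so a high accuracy alone says little about the minority classes.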
Fig. 2 Example of a cladogram for a set of primates.
Fig. 3 Correspondence between biological concepts and the elements in the proposed method.
Fig. 4 (a) Example of an analyzed volume; (b) representation of the cladogram extracted from the example image.
Distances calculated for the cladogram shown in Fig. 4.
| 0 | 1 | |
| 0 | 2 | |
| 0 | 3 | |
| 0 | 4 | |
| 1 | 2 | |
| 1 | 3 | |
| 1 | 4 | |
| 2 | 3 | |
| 2 | 4 | |
| 3 | 4 |
Calculation of the summation terms of the PD index.
| 0 | 1 | 2 | 1.5 | 3 |
| 0 | 2 | 3 | 1 | 3 |
| 0 | 3 | 4 | 1.25 | 5 |
| 0 | 4 | 5 | 1.6 | 8 |
| 1 | 2 | 3 | 1.5 | 4.5 |
| 1 | 3 | 4 | 1.66 | 6.64 |
| 1 | 4 | 5 | 2 | 10 |
| 2 | 3 | 3 | 1 | 3 |
| 2 | 4 | 4 | 1.66 | 6.64 |
| 3 | 4 | 3 | 2.5 | 7.5 |
| Total | | | 15.67 | 57.28 |
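The arithmetic in this table can be checked directly: in each row the fifth value is the product of the third and fourth, and the last row holds the column sums. The sketch below only reproduces that arithmetic. The column symbols did not survive in this record, so the names `third`, `fourth`, and `fifth` are placeholders rather than the paper's notation.

```python
import math

# Rows copied from the table above: (i, j, third, fourth, fifth).
# The column meanings are not stated in this record; names are placeholders.
rows = [
    (0, 1, 2, 1.5, 3),
    (0, 2, 3, 1, 3),
    (0, 3, 4, 1.25, 5),
    (0, 4, 5, 1.6, 8),
    (1, 2, 3, 1.5, 4.5),
    (1, 3, 4, 1.66, 6.64),
    (1, 4, 5, 2, 10),
    (2, 3, 3, 1, 3),
    (2, 4, 4, 1.66, 6.64),
    (3, 4, 3, 2.5, 7.5),
]

# Each fifth-column value equals third * fourth (up to float rounding).
products_ok = all(
    math.isclose(third * fourth, fifth, abs_tol=1e-9)
    for _, _, third, fourth, fifth in rows
)

# The final table row holds the sums of the fourth and fifth columns.
sum_fourth = sum(r[3] for r in rows)
sum_fifth = sum(r[4] for r in rows)

print(products_ok, round(sum_fourth, 2), round(sum_fifth, 2))
```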
Classifiers used in the proposed method.
| Classifier | Description | Parameters |
|---|---|---|
| RF | RF is a regression and classification algorithm | number of estimators (number of trees) = 100; min samples split = 2; min samples leaf = 1; max number of features = "auto" (square root of the number of features); bootstrap = True; max depth = None (unlimited) |
| XGBoost | XGBoost is an optimized machine learning technique | max depth = 6; learning rate = 0.1; number of estimators (number of trees) = 100; booster = "gbtree"; objective = "binary:logistic"; gamma = 0; max delta step = 0 |
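For readers who want to reproduce this configuration, the parameters above map naturally onto keyword arguments of `RandomForestClassifier` (scikit-learn) and `XGBClassifier` (the `xgboost` Python package). The record does not name the libraries used, so that mapping is an assumption, as are the names `RF_PARAMS`, `XGB_PARAMS`, and `build_classifiers`. Recent scikit-learn releases no longer accept `"auto"` for `max_features`, so the sketch uses `"sqrt"`, which the table describes as the intended behavior.

```python
# Random forest settings from the table, as scikit-learn keyword arguments.
RF_PARAMS = {
    "n_estimators": 100,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",  # table lists "auto", described as sqrt(n_features)
    "bootstrap": True,
    "max_depth": None,       # unlimited depth
}

# XGBoost settings from the table, as XGBClassifier keyword arguments.
XGB_PARAMS = {
    "max_depth": 6,
    "learning_rate": 0.1,
    "n_estimators": 100,
    "booster": "gbtree",
    "objective": "binary:logistic",
    "gamma": 0,
    "max_delta_step": 0,
}


def build_classifiers():
    """Instantiate both models; requires scikit-learn and xgboost (assumed)."""
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    return {
        "RF": RandomForestClassifier(**RF_PARAMS),
        "XGBoost": XGBClassifier(**XGB_PARAMS),
    }
```

The imports sit inside `build_classifiers` so that the parameter dictionaries can be inspected even when the two libraries are not installed.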
Fig. 5 Flow of the proposed method for a sample image from the dataset.
Experiments performed in this work.
| Experiment | Description/classes | Number of images |
|---|---|---|
| 1 | Healthy tissue | 19868 |
| 2 | Healthy tissue | 19624 |
| 3 | Healthy tissue | 19653 |
Results of the proposed method.
| Experiment | Classifier | Acc (%) | Rec (%) | Prec (%) | F1 (%) | AUC | Classification runtime (s) |
|---|---|---|---|---|---|---|---|
| 1 | RF | 99.90 | 99.90 | 99.90 | 99.90 | 0.995 | 3.1 |
| 2 | XGBoost | 99.92 | 99.92 | 99.92 | 99.92 | 0.994 | 2.5 |
| 3 | RF | 99.86 | 99.86 | 99.86 | 99.86 | 0.988 | 2.9 |
Note: The best result for each experiment is shown in bold.
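In every row of the results table, accuracy and recall coincide. That is what support-weighted averaging of per-class recall produces, since the weighted recall reduces algebraically to the fraction of correctly classified samples. Whether the authors used weighted averaging is an assumption here; the sketch below illustrates the identity on a hypothetical confusion matrix, and the function name `weighted_metrics` is illustrative.

```python
def weighted_metrics(cm):
    """Return (accuracy, recall, precision, f1) with support-weighted averaging.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total

    recall = precision = f1 = 0.0
    for k in range(n_classes):
        support = sum(cm[k])                              # true samples of class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # predicted as class k
        r = cm[k][k] / support if support else 0.0
        p = cm[k][k] / predicted if predicted else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        recall += weight * r
        precision += weight * p
        f1 += weight * f
    return accuracy, recall, precision, f1


# Hypothetical 3-class confusion matrix, not taken from the paper.
cm = [
    [950, 30, 20],
    [10, 80, 10],
    [5, 5, 40],
]
acc, rec, prec, f1 = weighted_metrics(cm)
print(f"acc={acc:.4f} rec={rec:.4f} prec={prec:.4f} f1={f1:.4f}")
```

With weighted averaging, accuracy and recall are always equal, while precision and F1 can differ from them, which matches the pattern seen in the comparison with other descriptors.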
Comparison with other descriptors.
| Experiment | Method | Classifier | Acc (%) | Rec (%) | Prec (%) | F1 (%) | AUC | Classification runtime (s) |
|---|---|---|---|---|---|---|---|---|
| 1 | Histogram | XGBoost | 96.52 | 96.52 | 96.38 | 96.28 | 0.840 | 7.5 |
| 1 | GLCM (angle 45) | XGBoost | 98.17 | 98.17 | 98.09 | 98.11 | 0.904 | 117.1 |
| 1 | DenseNet-121 | XGBoost | 98.81 | 98.81 | 98.73 | 98.74 | 0.863 | 4018.4 |
| 1 | DenseNet-169 | XGBoost | 98.80 | 98.80 | 98.72 | 98.72 | 0.858 | 5924.6 |
| 1 | DenseNet-201 | XGBoost | 98.83 | 98.83 | 98.76 | 98.76 | 0.866 | 7420.2 |
| 1 | Inception-V3 | XGBoost | 98.71 | 98.71 | 98.61 | 98.61 | 0.844 | 5228.4 |
| 1 | VGG16 | XGBoost | 98.95 | 98.95 | 98.90 | 98.91 | 0.888 | 1249.3 |
| 1 | EfficientNet-B0 | XGBoost | 99.07 | 99.07 | 99.03 | 99.04 | 0.902 | 6738.7 |
| 1 | EfficientNet-B1 | XGBoost | 99.02 | 99.02 | 98.98 | 98.98 | 0.897 | 6589.5 |
| 2 | Histogram | XGBoost | 97.01 | 97.01 | 96.86 | 96.70 | 0.799 | 7.3 |
| 2 | GLCM (angle 45) | XGBoost | 98.28 | 98.28 | 98.21 | 98.23 | 0.907 | 95.2 |
| 2 | DenseNet-121 | XGBoost | 98.96 | 98.96 | 98.89 | 98.90 | 0.869 | 3833.0 |
| 2 | DenseNet-169 | XGBoost | 98.93 | 98.93 | 98.86 | 98.87 | 0.862 | 7355.8 |
| 2 | DenseNet-201 | XGBoost | 98.96 | 98.96 | 98.90 | 98.91 | 0.869 | 8617.4 |
| 2 | Inception-V3 | XGBoost | 98.86 | 98.86 | 98.79 | 98.79 | 0.852 | 5056.9 |
| 2 | VGG16 | XGBoost | 99.02 | 99.02 | 98.97 | 98.98 | 0.881 | 1165.5 |
| 2 | EfficientNet-B0 | XGBoost | 99.09 | 99.09 | 99.05 | 99.06 | 0.892 | 6780.7 |
| 2 | EfficientNet-B1 | XGBoost | 99.06 | 99.06 | 99.02 | 99.03 | 0.887 | 6601.9 |
| 3 | Histogram | XGBoost | 97.15 | 97.15 | 97.09 | 97.01 | 0.863 | 7.2 |
| 3 | GLCM (angle 0) | XGBoost | 98.40 | 98.40 | 98.33 | 98.34 | 0.866 | 109.3 |
| 3 | DenseNet-121 | XGBoost | 99.27 | 99.27 | 99.24 | 99.22 | 0.828 | 3246.4 |
| 3 | DenseNet-169 | XGBoost | 99.25 | 99.25 | 99.22 | 99.19 | 0.822 | 6607.8 |
| 3 | DenseNet-201 | XGBoost | 99.28 | 99.28 | 99.24 | 99.23 | 0.830 | 8448.9 |
| 3 | Inception-V3 | XGBoost | 99.19 | 99.19 | 99.16 | 99.13 | 0.815 | 4150.1 |
| 3 | VGG16 | XGBoost | 99.40 | 99.40 | 99.39 | 99.38 | 0.893 | 996.6 |
| 3 | EfficientNet-B0 | XGBoost | 99.47 | 99.47 | 99.46 | 99.46 | 0.917 | 5563.4 |
| 3 | EfficientNet-B1 | XGBoost | 99.45 | 99.45 | 99.45 | 99.45 | 0.920 | 5519.4 |
Comparison with other approaches using diversity indexes.
| Work | Goal | Image type | Methodology | Descriptors | Number of indexes in common with the proposed work | Classification | Results |
|---|---|---|---|---|---|---|---|
| | Classification of lung nodules into benign and malignant | CT (3D) | Diversity indexes are adapted to generate a standardized entry for CNN | Topology-based phylogenetic diversity indexes: sum of basic taxic weights and sum of standardized taxic weights | 0 | k-fold cross-validation, with k = 10 | Accuracy: 92.63% Sensitivity: 90.7% Specificity: 93.47% ROC: 0.934 |
| | Glaucoma classification | Retinal images (2D) | Generative Adversarial Network used in conjunction with taxonomic indexes | Taxonomic diversity indexes: | 2 | k-fold cross-validation, with k = 10 | Accuracy: 100% Sensitivity: 100% Specificity: 100% ROC: 1 |
| | Breast cancer diagnosis | Histological images (2D) | Phylogenetic diversity indexes used for classification and content-based image retrieval | Phylogenetic diversity indexes: PD, SPD, MNND, PSV and PSR | 5 | k-fold cross-validation, with k = 10 | Accuracy: 95.0% Precision: 96.0% AUC: 0.98 |
| Proposed Method | Lung lesions classification for COVID-19 detection | CT - COVID-19 (3D) | Phylogenetic diversity indexes and cladogram optimization for classification of multiple lung lesions | Phylogenetic diversity indexes + taxonomic diversity indexes: PD, SPD, MNND, PSV, PSR, MPD, | - | k-fold cross-validation, with k = 5 | Accuracy: 99.93% Recall: 99.93% Precision: 99.93% F1-score: 99.93% AUC: 0.997 |
Comparison of our method with related works.
| Work | Exam type | Number of images | Goal/Classes | Acc (%) | Recall (%) | Precision (%) | F1-score (%) | AUC |
|---|---|---|---|---|---|---|---|---|
| | X-ray | 206 | COVID-19 and non-COVID-19 | 95.12 | 97.91 | - | - | - |
| | X-ray | 558 | COVID-19, bacterial pneumonia, non-COVID-19 viral pneumonia and normal | 79.52 | - | - | - | 0.87 |
| | CT | 118 | Non-severe and severe COVID-19 | 89.00 | - | - | - | 0.98 |
| | CT | 746 | COVID-19 and non-COVID-19 | 86.00 | - | - | 85.00 | 0.94 |
| | CT | 150 | COVID-19 and non-COVID-19 | 98.27 | 98.93 | 97.63 | 98.28 | - |
| | CT | 453 | COVID-19 and non-COVID-19 | 82.90 | 84.00 | - | - | - |
| | CT | 1396 | COVID-19 and non-COVID-19 | 95.99 | 94.04 | - | 92.84 | 0.99 |
| | CT | 2492 | COVID-19 and non-COVID-19 | 96.25 | 96.29 | 96.29 | 96.29 | - |
| | CT | 4356 | COVID-19 and non-COVID-19 | - | 90.00 | - | - | 0.96 |
| | CT | 540 | COVID-19 and non-COVID-19 | - | 90.70 | - | - | 0.95 |
| | CT | 81 | Common and severe COVID-19 | - | - | - | - | 0.93 |
| | CT | 103 | Healthy, IPF and COVID-19 cases | 89.60 | 96.10 | - | - | - |
Fig. 6 (a) PCA with three components representing the extracted characteristics; (b) confusion matrix of results.
Fig. 7 (1) Confusion matrix with correctly and incorrectly classified regions; (2) graphs of the indexes extracted from the images in the confusion matrix.