Rodolfo M Pereira1, Diego Bertolini2, Lucas O Teixeira3, Carlos N Silla4, Yandre M G Costa3.
Abstract
BACKGROUND AND
Keywords: COVID-19; Chest X-ray; Medical image analysis; Pneumonia; Texture
Year: 2020 PMID: 32446037 PMCID: PMC7207172 DOI: 10.1016/j.cmpb.2020.105532
Source DB: PubMed Journal: Comput Methods Programs Biomed ISSN: 0169-2607 Impact factor: 5.428
Fig. 1 The hierarchical class structure of pneumonia caused by micro-organisms.
Fig. 2 Distribution of the classes in a binary-labeled dataset.
Summary of classic binary resampling algorithms.
| Algorithm | Main Idea | Strategy | Reference |
|---|---|---|---|
| ADASYN | Creates synthetic samples for the minority class adaptively. | Oversampling | [ |
| SMOTE | Creates synthetic samples by combining existing ones. | Oversampling | [ |
| SMOTE-B1/B2 | Creates synthetic samples near the borderline between the classes. | Oversampling | [ |
| AllKNN | Removes samples that a kNN algorithm misclassifies. | Undersampling | [ |
| ENN/RENN | Removes samples whose label differs from that of most of their nearest neighbors. | Undersampling | [ |
| TomekLinks | Removes samples that are nearest neighbors of each other but have different labels. | Undersampling | [ |
| SMOTE+TL | Applies the SMOTE and TomekLinks algorithms. | Hybrid | |
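The "combining the existing ones" idea behind SMOTE can be illustrated with a minimal sketch. It is a simplification, not the reference algorithm: it interpolates toward the single nearest minority neighbor (the usual formulation picks one of k neighbors at random), and the function names `nearest_neighbor`, `interpolate` and `oversample` are hypothetical.

```python
import random

def nearest_neighbor(x, candidates):
    """Return the candidate closest to x (squared Euclidean distance)."""
    return min(candidates, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))

def interpolate(x, neighbor, rng):
    """Return a synthetic point on the segment between x and its neighbor."""
    u = rng.random()  # interpolation factor in [0, 1)
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]

def oversample(minority, n_new, seed=0):
    """Create n_new synthetic minority samples (simplified, k = 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        synthetic.append(interpolate(x, nearest_neighbor(x, others), rng))
    return synthetic

# Toy minority class with three 2-D samples (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = oversample(minority, n_new=4)
```

Every synthetic point lies between two existing minority samples, so the minority class grows without duplicating any sample exactly.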
Fig. 3 Example of datasets before and after applying the resampling techniques.
Summary of the works described in this section.
| Reference | Image Type | Database/applications | Computational/ML* techniques |
|---|---|---|---|
| Nanni et al. | Neonatal facial, fluorescence microscope and smear cell images | Three databases: neonatal facial images, 2D-HeLa dataset and Pap smear dataset | LBP, LPQ, EQP, LTP, EBP, ILBP, CSLBP and SVM |
| Parveen and Sathik | CXR | Pneumonia detection | DWT, WFT, WPT and fuzzy C-means clustering |
| Scalco and Rizzi | CT, PET and MR | Tumour heterogeneity characterization | Grey-level histogram, GLCM, NGTDM, GLRLM and GLSZM |
| Zhou et al. | CT | NCP/influenza differentiation using images from 1138 suspected patients, including 361 with viral pneumonia, 35 with confirmed NCP and 156 with confirmed influenza | YOLOv3, VGGNet and AlexNet |
| Li et al. | CT | 2969 images obtained in Chinese hospitals: 400 NCP, 1396 other viral pneumonia and 1173 non-pneumonia | COVNet, a deep learning model based on ResNet-50 |
| Narin et al. | CXR | NCP identification on a dataset composed of X-ray images from 50 healthy patients and 50 COVID-19 patients | ResNet50, InceptionV3 and Inception-ResNetV2 |
| Gozes et al. | CT | NCP detection and analysis using images taken from 157 patients | 2D and 3D deep learning models, and other AI models |
| Wang and Wong | CXR | NCP detection using 16,756 images taken from 13,645 patients | COVID-Net, a deep neural network created to detect NCP |
| Khan et al. | CXR | NCP detection using 1251 images from four classes | CoroNet, a CNN created to detect NCP |
| Ozturk et al. | CXR | NCP detection using 500 pneumonia images and 500 non-pneumonia images | DarkNet and YOLO |
* Machine Learning.
Fig. 4 The proposed classification schema for the COVID-19 identification in CXR images.
Feature dimensions and main parameters.
| Feature | Dimension | Main parameters |
|---|---|---|
| LBP | 59 | |
| EQP | 256 | |
| LDN | 56 | |
| LETRIST | 413 | |
| BSIF | 256 | |
| LPQ | 256 | |
| oBIFs | 484 | |
| Inception-V3 | 2048 | |
Fig. 5 Example of combinations with late fusion using the sum, product and voting strategies. The example dataset has M samples and L labels.
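The three late fusion rules named in the caption can be sketched for a single sample as follows. This is an illustrative sketch, not the authors' implementation: the function `fuse` and the score values are hypothetical, and each inner list holds one classifier's scores over L = 3 labels.

```python
from collections import Counter
from math import prod

def fuse(score_lists, rule):
    """Combine per-classifier score vectors for one sample.

    Returns the index of the winning label. Ties go to the lowest index.
    """
    n_labels = len(score_lists[0])
    if rule == "sum":
        combined = [sum(s[j] for s in score_lists) for j in range(n_labels)]
    elif rule == "product":
        combined = [prod(s[j] for s in score_lists) for j in range(n_labels)]
    elif rule == "vote":
        # Each classifier votes for its own top-scoring label.
        votes = Counter(max(range(n_labels), key=s.__getitem__) for s in score_lists)
        combined = [votes.get(j, 0) for j in range(n_labels)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(n_labels), key=combined.__getitem__)

# Scores from three hypothetical classifiers for one sample.
scores = [
    [0.60, 0.30, 0.10],
    [0.10, 0.50, 0.40],
    [0.40, 0.35, 0.25],
]
winners = {rule: fuse(scores, rule) for rule in ("sum", "product", "vote")}
```

In this toy case the sum and product rules pick label 1, while voting picks label 0, which shows that the rules can disagree on the same set of classifier outputs.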
RYDLS-20 main characteristics.
| Characteristic | Value |
|---|---|
| Samples | 1144 |
| Train | 802 |
| Test | 342 |
| Labels (Multi-Class Scenario) | 7 |
| Label Paths (Hierarchical Scenario) | 14 |
Fig. 6 RYDLS-20 image samples.
RYDLS-20 samples distribution for the multi-class scenario.
| Label | Samples | Train | Test |
|---|---|---|---|
| Normal | 1000 | 700 | 300 |
| COVID-19 | 90 | 63 | 27 |
| MERS | 10 | 7 | 3 |
| SARS | 11 | 8 | 3 |
| Varicella | 10 | 7 | 3 |
| Streptococcus | 12 | 9 | 3 |
| Pneumocystis | 11 | 8 | 3 |
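The per-label counts above are consistent with a 70/30 split applied to each label separately, with the test share rounded down. The sketch below reproduces the table under that assumption. The rounding rule is inferred from the numbers and is not stated in the source, and `split_counts` is a hypothetical helper.

```python
def split_counts(n_samples, test_tenths=3):
    """Return (train, test) sizes, rounding the test share down.

    Integer arithmetic avoids floating-point surprises such as 0.3 * 90.
    """
    test = n_samples * test_tenths // 10
    return n_samples - test, test

# Samples per label, copied from the multi-class distribution table.
label_counts = {
    "Normal": 1000,
    "COVID-19": 90,
    "MERS": 10,
    "SARS": 11,
    "Varicella": 10,
    "Streptococcus": 12,
    "Pneumocystis": 11,
}

splits = {label: split_counts(n) for label, n in label_counts.items()}
total_train = sum(train for train, _ in splits.values())
total_test = sum(test for _, test in splits.values())
```

The totals come out at 802 training and 342 test samples, matching the RYDLS-20 main characteristics.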
RYDLS-20 samples distribution for the hierarchical scenario.
| Label Path | Samples | Train | Test |
|---|---|---|---|
| Normal | 1000 | 700 | 300 |
| Pneumonia | 144 | 102 | 42 |
| Pneumonia/Acellular | 121 | 85 | 36 |
| Pneumonia/Acellular/Viral | 121 | 85 | 36 |
| Pneumonia/Acellular/Viral/Coronavirus | 111 | 78 | 33 |
| Pneumonia/Acellular/Viral/Coronavirus/COVID-19 | 90 | 63 | 27 |
| Pneumonia/Acellular/Viral/Coronavirus/MERS | 10 | 7 | 3 |
| Pneumonia/Acellular/Viral/Coronavirus/SARS | 11 | 8 | 3 |
| Pneumonia/Acellular/Viral/Varicella | 10 | 7 | 3 |
| Pneumonia/Cellular | 23 | 17 | 6 |
| Pneumonia/Cellular/Bacterial | 12 | 9 | 3 |
| Pneumonia/Cellular/Bacterial/Streptococcus | 12 | 9 | 3 |
| Pneumonia/Cellular/Fungus | 11 | 8 | 3 |
| Pneumonia/Cellular/Fungus/Pneumocystis | 11 | 8 | 3 |
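The hierarchical counts follow from the seven leaf labels: every sample also belongs to each ancestor on its label path. The sketch below is a minimal illustration of that expansion, using "/" as the path separator. The helper `ancestors` is hypothetical, and the leaf counts are taken from the multi-class distribution.

```python
from collections import Counter

# Samples per leaf label path (counts from the multi-class scenario).
leaf_counts = {
    "Normal": 1000,
    "Pneumonia/Acellular/Viral/Coronavirus/COVID-19": 90,
    "Pneumonia/Acellular/Viral/Coronavirus/MERS": 10,
    "Pneumonia/Acellular/Viral/Coronavirus/SARS": 11,
    "Pneumonia/Acellular/Viral/Varicella": 10,
    "Pneumonia/Cellular/Bacterial/Streptococcus": 12,
    "Pneumonia/Cellular/Fungus/Pneumocystis": 11,
}

def ancestors(path, sep="/"):
    """Return the path itself plus every prefix above it, root first."""
    parts = path.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]

# Propagate each leaf count to all of its ancestor label paths.
path_counts = Counter()
for leaf, n in leaf_counts.items():
    for path in ancestors(leaf):
        path_counts[path] += n
```

The expansion yields 14 distinct label paths, with 144 samples under `Pneumonia` and 111 under `Pneumonia/Acellular/Viral/Coronavirus`, in agreement with the table.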
Parameter settings of the classic algorithms.
| Algorithm | Parameter | Value |
|---|---|---|
| KNN | Number of Neighbors | 3 and 5 |
| | Distance | Euclidean |
| SVM | Kernel | RBF |
| | Penalty Parameter (C) | 1 |
| | Degree | 3 |
| | Gamma | Scale |
| | Cache size | 200 |
| | Decision Function Shape | Ovr |
| | Tolerance | 0.001 |
| MLP | Solver | LBFGS |
| | Alpha | 1e-5 |
| | Shuffle | True |
| | Max Iterations | 500 |
| | Learning Rate Init | 0.3 |
| | Momentum | 0.2 |
| | Hidden Layer Sizes | 13 |
| DT | Criterion | Gini |
| | Splitter | Best |
| | Min Samples Leaf | 10 |
| | Min Samples Split | 20 |
| | Max Leaf Nodes | None |
| | Max Depth | 10 |
| RF | Number of Trees | 10 |
| | Class Weight | Balanced |
| | Type of Trees | Same as DT |
Clus-HMC execution parameters.
| Parameter | Value |
|---|---|
| Type | Tree |
| ConvertToRules | No |
| HSeparator | “/” |
| FTest | [0.001, 0.005, 0.01, 0.05, 0.1, 0.125] |
| EnsembleMethod | RForest |
| Iterations | 10 |
| VotingType | Majority |
| EnsembleRandomDepth | No |
| SplitSampling | None |
| Heuristic | Default |
| PruningMethod | Default |
| CoveringMethod | Standard |
Best results for COVID-19 label for each prediction schema in the Multi-Class scenario.
| Prediction schema | Features | Classifiers | Resampling | F1-Score |
|---|---|---|---|---|
| Individual | LPQ | MLP | ENN or None | 0.8333 |
| Early Fusion | LBP & LPQ | MLP | AllKNN or RENN | 0.8000 |
| Late Fusion (Top-5) | LPQ | MLP | ENN | 0.8333 |
| Late Fusion (Top-Features) | BSIF, EQP & LPQ | MLP | ENN & RENN | 0.8333 |
| Late Fusion (Top-Classifiers) | LDN & LPQ | MLP & DT | SMOTE+TL & ENN | 0.8333 |
Best macro-avg results for each prediction schema in the Multi-Class scenario.
| Prediction schema | Features | Classifiers | Resampling | F1-Score |
|---|---|---|---|---|
| Individual | LBP | MLP | RENN or AllKNN | 0.6491 |
| Early Fusion | BSIF & EQP & LPQ | MLP | TomekLink | 0.5563 |
| Late Fusion (Top-5) | LBP | MLP | AllKNN & RENN | 0.6491 |
| Late Fusion (Top-Features) | BSIF & LBP | MLP | RENN & SMOTE-B2 | 0.6491 |
| Late Fusion (Top-Classifiers) | LDN & LETRIST | DT & KNN-3 | RENN or None | 0.4500 |
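The gap between the per-label COVID-19 scores and the macro-avg scores can be illustrated with a small F1 computation. This is a generic sketch of per-label and macro-averaged F1 on made-up predictions, not a reproduction of the reported experiments. The function names and the toy label lists are hypothetical.

```python
def f1_per_label(y_true, y_pred, labels):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(scores):
    """Unweighted mean of the per-label F1 scores."""
    return sum(scores.values()) / len(scores)

# Toy predictions: the single MERS sample is missed entirely.
y_true = ["Normal", "Normal", "Normal", "Normal", "COVID-19", "COVID-19", "MERS"]
y_pred = ["Normal", "Normal", "Normal", "COVID-19", "COVID-19", "COVID-19", "Normal"]

scores = f1_per_label(y_true, y_pred, ["Normal", "COVID-19", "MERS"])
macro = macro_f1(scores)
```

Because the macro average weights every label equally, one missed rare label (F1 of 0 for MERS here) pulls the average well below the COVID-19 score, even though most samples are classified correctly.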
Fig. 7 F1-Score results per label in the best case for the multi-class scenario.
Fig. 8 Confusion matrix in the best case for the multi-class experiments.
Best results for COVID-19 label for each prediction schema in the Hierarchical scenario.
| Prediction schema | Features | Resampling | F1-Score |
|---|---|---|---|
| Individual | BSIF | SMOTE-B1 | 0.8387 |
| Early Fusion | BSIF & EQP & LPQ | SMOTE or TL | 0.8889 |
| Late Fusion (Top-5) | BSIF & OBIF | SMOTE-B1 & None | 0.8276 |
| Late Fusion (Top-Features) | BSIF & EQP | SMOTE-B2 & None | 0.8276 |
Best macro-avg results for each prediction schema in the Hierarchical scenario.
| Prediction schema | Features | Resampling | F1-Score |
|---|---|---|---|
| Individual | LETRIST | SMOTE-B2 | 0.4615 |
| Early Fusion | LBP & INCEPTION-V3 & LETRIST | SMOTE-B1 | 0.5669 |
| Late Fusion (Top-5) | LDN & LETRIST | SMOTE & SMOTE-B2 | 0.4751 |
| Late Fusion (Top-Features) | BSIF & LETRIST | SMOTE-B1 & SMOTE | 0.4751 |
Fig. 9 F1-Score results per label in the best case for the hierarchical scenario.
Fig. 10 Confusion matrix in the best case for the hierarchical experiments.
Fig. 11 Best F1-Score results on multi-class and hierarchical scenarios for COVID-19 identification.
Fig. 12 Best macro-avg F1-Score results on multi-class and hierarchical scenarios.
Ranking of the results per feature set in all classification scenarios.
| Feature set | Multi-class (COVID-19) | Multi-class (macro-avg) | Hierarchical (COVID-19) | Hierarchical (macro-avg) | Average |
|---|---|---|---|---|---|
| BSIF | 4.00 | 3.67 | 1.00 | 3.00 | 2.92 |
| INCEPTION-V3 | 6.33 | 5.00 | 4.67 | 3.33 | 4.83 |
| LBP | 4.67 | 1.33 | 4.00 | 3.00 | 3.25 |
| LDN | 3.00 | 4.33 | 4.00 | 3.00 | 3.58 |
| LETRIST | 5.33 | 2.33 | 2.67 | 1.00 | 2.83 |
| EQP | 4.00 | 5.00 | 3.33 | 4.67 | 4.25 |
| LPQ | 1.00 | 3.33 | 3.00 | 3.00 | 2.58 |
| OBIF | 3.33 | 4.33 | 2.33 | 3.00 | 3.25 |
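The last column of the ranking table is the arithmetic mean of the four preceding rank columns, which can be checked with a few lines. The sketch below copies the four rank values per feature set from the table. The variable names are hypothetical.

```python
# Four per-scenario ranks for each feature set, copied from the table.
ranks = {
    "BSIF": [4.00, 3.67, 1.00, 3.00],
    "INCEPTION-V3": [6.33, 5.00, 4.67, 3.33],
    "LBP": [4.67, 1.33, 4.00, 3.00],
    "LDN": [3.00, 4.33, 4.00, 3.00],
    "LETRIST": [5.33, 2.33, 2.67, 1.00],
    "EQP": [4.00, 5.00, 3.33, 4.67],
    "LPQ": [1.00, 3.33, 3.00, 3.00],
    "OBIF": [3.33, 4.33, 2.33, 3.00],
}

# Mean rank per feature set, rounded to two decimals as in the table.
average = {name: round(sum(r) / len(r), 2) for name, r in ranks.items()}

# A lower average rank means a better overall position.
best = min(average, key=average.get)
```

Under this reading, LPQ has the lowest average rank (2.58) and Inception-V3 the highest (4.83).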
Ranking of results per classifier in the multi-class classification scenario.
| Classifier | COVID-19 | Macro-avg | Average |
|---|---|---|---|
| KNN | 4.67 | 1.67 | 3.17 |
| RF | 3.00 | 4.67 | 3.83 |
| SVM | 3.67 | 4.33 | 4.00 |
| DT | 2.67 | 3.00 | 2.83 |
| MLP | 1.00 | 1.33 | 1.17 |
Ranking of results per resampling method in the classification scenarios.
| Resampling method | Multi-class (COVID-19) | Multi-class (macro-avg) | Hierarchical (COVID-19) | Hierarchical (macro-avg) | Average |
|---|---|---|---|---|---|
| ADASYN | 7.67 | 7.67 | 4.67 | 4.00 | 6.00 |
| AllKNN | 4.33 | 2.33 | 5.33 | 7.33 | 4.83 |
| ENN | 1.67 | 4.00 | 5.33 | 6.33 | 4.33 |
| RENN | 3.00 | 2.33 | 6.33 | 6.67 | 4.58 |
| SMOTE | 6.33 | 5.00 | 2.67 | 1.67 | 3.92 |
| SMOTE-B1 | 5.67 | 5.00 | 3.67 | 1.67 | 4.00 |
| SMOTE-B2 | 7.00 | 4.67 | 4.33 | 2.00 | 4.50 |
| SMOTE+TL | 6.33 | 7.33 | 4.00 | 3.00 | 5.17 |
| TL | 3.00 | 3.33 | 3.67 | 5.33 | 3.83 |
Fig. 13 Examples of samples with “COVID-19” label that were predicted as “Normal”.
Fig. 14 Different examples of CXR with “normal” lungs.