Luís Vinícius de Moura, Christian Mattjie, Caroline Machado Dartora, Rodrigo C Barros, Ana Maria Marques da Silva.
Abstract
Both reverse transcription-PCR (RT-PCR) and chest X-rays are used to diagnose coronavirus disease 2019 (COVID-19). However, COVID-19 pneumonia does not have a defined set of radiological findings. Our work investigates radiomic features and classification models to differentiate chest X-ray images of COVID-19 pneumonia from other lung patterns. The goal is to provide grounds for understanding the distinctive COVID-19 radiographic texture features using supervised tree-based ensemble machine learning methods, interpreted through the Shapley Additive Explanations (SHAP) approach. We use 2,611 COVID-19 chest X-ray images and 2,611 non-COVID-19 chest X-rays. After the lungs are segmented into three zones on each side (left and right), a histogram normalization is applied and radiomic features are extracted. SHAP recursive feature elimination with cross-validation is used to select features. Hyperparameters of the XGBoost and Random Forest ensemble tree models are optimized using random search. The best classification model was XGBoost, with an accuracy of 0.82 and a sensitivity of 0.82. The explainable model showed the importance of the middle left and superior right lung zones in classifying COVID-19 pneumonia from other lung patterns.
Keywords: SHAP; X-rays; coronavirus; explainable models; machine learning; radiological findings; radiomics
Year: 2022 PMID: 35112097 PMCID: PMC8801500 DOI: 10.3389/fdgth.2021.662343
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
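The feature-selection step named in the abstract (recursive feature elimination, ranked by importance and scored by cross-validation) can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: `importance_fn` and `score_fn` are hypothetical callbacks standing in for mean absolute SHAP values and a cross-validated metric, and the toy numbers are invented for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def recursive_feature_elimination(
    features: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Drop the least important feature each round; keep the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)  # e.g. mean |SHAP| per feature (assumed)
        weakest = min(current, key=lambda name: importances[name])
        current.remove(weakest)
        score = score_fn(current)  # e.g. mean cross-validation accuracy (assumed)
        if score >= best_score:  # ties favour the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (hypothetical values, not taken from the study).
TOY_IMPORTANCE = {"a": 0.40, "b": 0.05, "c": 0.30, "d": 0.01}


def toy_importance(names: List[str]) -> Dict[str, float]:
    return {n: TOY_IMPORTANCE[n] for n in names}


def toy_score(names: List[str]) -> float:
    # Rewards informative features, with a small penalty per feature kept.
    return sum(TOY_IMPORTANCE[n] for n in names) - 0.03 * len(names)


selected, score = recursive_feature_elimination(
    list(TOY_IMPORTANCE), toy_importance, toy_score
)
```

In a real pipeline, each round would refit the model on the remaining features before recomputing importances, which is what makes the elimination recursive rather than a single ranking.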
Figure 1. Examples of CXR images from COVID-19 and non-COVID-19 datasets.
Demographic data of our study.
| Characteristic | | |
|---|---|---|
| Age (years) | 62 ± 16 | 64 ± 19 |
| Sex (male) | 1,358 | 1,268 |
| Sex (female) | 1,253 | 1,343 |
Figure 2. Workflow of the segmentation process.
Parameter values explored by random search in XGBoost and Random Forest Classifier models.
| XGBoost parameter | Values | Random Forest parameter | Values |
|---|---|---|---|
| min_child_weight | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | n_estimators | 10, 50, 100, 200 |
| max_depth | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | max_depth | None, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| lambda | 0–1 | criterion | gini, entropy |
| gamma | 0–1 | min_sample_split | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| eta | 0–1 | min_sample_leaf | 1, 2, 3, 4, 5 |
| objective | binary:logistic | | |
| tree_method | gpu_hist | | |

For parameters given as a range (0–1): 10 random values in the range.
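The XGBoost search space in the table above can be written down and sampled with the standard library alone. This is a sketch under stated assumptions: the "10 random values in the range" are taken to be uniform draws, the number of sampled configurations (`n_iter=5`) is arbitrary because the source does not state it, and the helper names `discretize` and `sample_configs` are hypothetical.

```python
import random

# Search space transcribed from the table above; ranges are given as (low, high).
XGB_SPACE = {
    "min_child_weight": list(range(1, 11)),
    "max_depth": list(range(1, 11)),
    "lambda": (0.0, 1.0),
    "gamma": (0.0, 1.0),
    "eta": (0.0, 1.0),
    "objective": ["binary:logistic"],
    "tree_method": ["gpu_hist"],
}


def discretize(space, n_values=10, rng=None):
    """Replace each (low, high) range with n_values uniform random draws (assumed)."""
    rng = rng or random.Random()
    grid = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            grid[name] = [rng.uniform(low, high) for _ in range(n_values)]
        else:
            grid[name] = list(spec)
    return grid


def sample_configs(grid, n_iter, rng=None):
    """Draw n_iter configurations, picking one candidate value per parameter."""
    rng = rng or random.Random()
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
grid = discretize(XGB_SPACE, n_values=10, rng=rng)
configs = sample_configs(grid, n_iter=5, rng=rng)
```

Each sampled dictionary would then be used to train and cross-validate one candidate model; scikit-learn's `RandomizedSearchCV` packages the same idea behind a single estimator interface.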
Mean cross-validation metrics of XGBoost and Random Forest models.
| Model | | | | |
|---|---|---|---|---|
| XGB | 0.82 | 0.82 | 0.82 | 0.82 |
| RF | 0.77 | 0.78 | 0.81 | 0.75 |
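For reference, the two metrics named in the abstract (accuracy and sensitivity) follow directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen only to reproduce a value of 0.82; specificity and precision are included as common companions, not as a claim about which metrics the table reports.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all images classified correctly
        "sensitivity": tp / (tp + fn),   # fraction of positives that were detected
        "specificity": tn / (tn + fp),   # fraction of negatives that were rejected
        "precision": tp / (tp + fp),     # fraction of positive calls that were right
    }


# Hypothetical counts for a balanced test fold (not taken from the study).
metrics = binary_metrics(tp=82, fp=18, tn=82, fn=18)
```

On a balanced dataset such as the one described here (2,611 images per class), accuracy equals the mean of sensitivity and specificity.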
Hyperparameters for XGBoost.
| Hyperparameter | Value |
|---|---|
| min_child_weight | 6 |
| max_depth | 3 |
| lambda | 1 |
| gamma | 0.50230907476997 |
| eta | 0.50515969241175 |
| objective | binary:logistic |
| tree_method | hist |
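The values in the table above use XGBoost's native parameter names. A minimal sketch of how they might be passed to the scikit-learn style wrapper follows; it assumes the wrapper spellings `reg_lambda` and `learning_rate` for `lambda` and `eta`, and the helper names `to_sklearn_kwargs` and `build_classifier` are hypothetical. The import is guarded so the block still runs where `xgboost` is not installed.

```python
# Native XGBoost parameter names and values, transcribed from the table above.
BEST_PARAMS = {
    "min_child_weight": 6,
    "max_depth": 3,
    "lambda": 1,
    "gamma": 0.50230907476997,
    "eta": 0.50515969241175,
    "objective": "binary:logistic",
    "tree_method": "hist",
}

# The scikit-learn wrapper spells two of these parameters differently.
SKLEARN_ALIASES = {"lambda": "reg_lambda", "eta": "learning_rate"}


def to_sklearn_kwargs(params: dict) -> dict:
    """Rename native parameter names to their scikit-learn wrapper equivalents."""
    return {SKLEARN_ALIASES.get(name, name): value for name, value in params.items()}


def build_classifier(params: dict = BEST_PARAMS):
    """Return an unfitted XGBClassifier, or None if xgboost is not installed."""
    try:
        from xgboost import XGBClassifier
    except ImportError:
        return None
    return XGBClassifier(**to_sklearn_kwargs(params))


kwargs = to_sklearn_kwargs(BEST_PARAMS)
```

With a shallow `max_depth` of 3 and a fairly large `eta`, each tree is small and contributes strongly, so the number of boosting rounds (not listed in the table) would matter in practice.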
Chosen features by XGBoost model.
| Feature | Abbreviation |
|---|---|
| Bottom Left - First Order - Maximum | BL-1st-M |
| Bottom Right - First Order - Energy | BR-1st-E |
| Bottom Right - First Order - Kurtosis | BR-1st-K |
| Bottom Right - GLCM - Cluster Prominence | BR-GLCM-CP |
| Bottom Right - GLCM - Difference Variance | BR-GLCM-DV |
| Middle Left - First Order - Kurtosis | ML-1st-K |
| Upper Left - First Order - Range | UL-1st-R |
| Upper Left - GLCM - Idmn | UL-GLCM-LDMN |
| Upper Left - GLRLM - Run Entropy | UL-GLRLM-RE |
| Upper Right - First Order - Robust Mean Absolute Deviation | UR-1st-RMAD |
| Upper Right - GLCM - Cluster Prominence | UR-GLCM-CP |
| Upper Right - GLCM - Cluster Shade | UR-GLCM-CS |
| Upper Right - GLCM - MCC | UR-GLCM-MCC |
| Upper Right - GLRLM - Gray Level Non-Uniformity | UR-GLRLM-GLNU |
| Upper Right - GLRLM - High Gray Level Run Emphasis | UR-GLRLM-HGLRE |
| Upper Right - GLSZM - Gray Level Non-Uniformity Normalized | UR-GLSZM-GLNUN |
| Upper Right - GLSZM - Gray Level Variance | UR-GLSZM-GLV |
| Upper Right - GLSZM - Large Area High Gray Level Emphasis | UR-GLSZM-LAHGLE |
| Upper Right - GLSZM - Size Zone Non-Uniformity | UR-GLSZM-SZNU |
| Upper Right - GLSZM - Small Area High Gray Level Emphasis | UR-GLSZM-SAHGLE |
Figure 3. Feature importance for each radiomic feature with the XGBoost model.
Figure 4. Impact of important features in the XGBoost model output.
Figure 5. Example of SHAP values affecting XGBoost model output for a single COVID-19 CXR image.
Figure 6. Example of SHAP values affecting XGBoost model output for a single non-COVID-19 CXR image.
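Single-image explanations like those in Figures 5 and 6 rest on the additive property of SHAP: the model output for one image equals a base value plus the sum of the per-feature SHAP values. The sketch below illustrates that property, assuming the values are expressed in log-odds units (the usual case for a tree model with a `binary:logistic` objective). The contributions are hypothetical and are not read from the figures; only the feature abbreviations come from the table of chosen features.

```python
import math
from typing import Dict, Tuple


def explain_prediction(base_value: float, shap_values: Dict[str, float]) -> Tuple[float, float]:
    """Combine a base value and per-feature SHAP values into a probability."""
    log_odds = base_value + sum(shap_values.values())  # additive explanation
    probability = 1.0 / (1.0 + math.exp(-log_odds))    # logistic link
    return log_odds, probability


# Hypothetical contributions in log-odds units (not taken from Figures 5 or 6).
example_shap = {"ML-1st-K": 0.9, "UR-GLCM-CS": 0.4, "BL-1st-M": -0.3}
log_odds, probability = explain_prediction(base_value=0.0, shap_values=example_shap)
```

Positive contributions push the prediction toward the positive class and negative ones away from it, which is how the per-image plots should be read.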