Tao Tan¹, Bipul Das², Ravi Soni³, Mate Fejes⁴, Hongxu Yang¹, Sohan Ranjan², Daniel Attila Szabo⁴, Vikram Melapudi², K S Shriram², Utkarsh Agrawal², Laszlo Rusko⁴, Zita Herczeg⁴, Barbara Darazs⁴, Pal Tegzes⁴, Lehel Ferenczi⁴, Rakesh Mullick², Gopal Avinash³.
Abstract
The front-line imaging modalities computed tomography (CT) and X-ray play important roles in triaging COVID patients. Thoracic CT is accepted to have higher sensitivity than chest X-ray for COVID diagnosis, but considering the limited access to resources (both hardware and trained personnel) and issues related to decontamination, CT may not be ideal for triaging suspected subjects. An artificial intelligence (AI)-assisted, X-ray-based application for triaging and monitoring, which helps experienced radiologists identify COVID patients in a timely manner and additionally delineates and quantifies the disease region, is seen as a promising solution for widespread clinical use. Our proposed solution differs from existing solutions presented by industry and academic communities. We demonstrate a functional AI model that triages by classifying and segmenting a single chest X-ray image, while the AI model is trained using both X-ray and CT data. We report on how such a multi-modal training process improves the solution compared to single-modality (X-ray only) training. The multi-modal solution increases the AUC (area under the receiver operating characteristic curve) from 0.89 to 0.93 for binary classification between COVID-19 and non-COVID-19 cases, and also improves the Dice coefficient (0.59 to 0.62) for localizing the COVID-19 pathology. To compare the performance of experienced readers to the AI model, a reader study was also conducted. The AI model showed good consistency with the radiologists: the Dice score between the two radiologists on the COVID group was 0.53, while the AI achieved Dice values of 0.52 and 0.55 against the segmentations of the two radiologists, respectively. From a classification perspective, the AUCs of the two readers were 0.87 and 0.81, while the AUC of the AI was 0.93 on the reader-study dataset. We also conducted a generalization study by comparing our method to state-of-the-art methods on independent datasets.
The results show better performance from the proposed method. Leveraging multi-modal information during development benefits single-modality inference.
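The abstract reports results in terms of the Dice coefficient (segmentation overlap) and the AUC (binary-classification ranking quality). As a reference for how these two metrics are computed, here is a minimal pure-Python sketch (not the authors' evaluation code) using the standard set-overlap definition of Dice and the rank-sum (Mann-Whitney U) formulation of AUC:

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as flat 0/1 lists:
    2 * |A ∩ B| / (|A| + |B|)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2.0 * inter / total

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the probability
    that a random positive case is scored above a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))          # 0.5
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))   # 0.75
```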
Keywords: Artificial intelligence; COVID-19; Multi-modal; Reader study
Year: 2022 PMID: 35185296 PMCID: PMC8847079 DOI: 10.1016/j.neucom.2022.02.040
Source DB: PubMed Journal: Neurocomputing ISSN: 0925-2312 Impact factor: 5.719
Fig. 1The training scheme and inferencing design.
Fig. 2The illustration of synthetic X-ray and its mask generation.
Fig. 3A synthetic X-ray with its corresponding projected disease mask as an overlay.
Fig. 4An example of transferring synthetic X-ray mask to X-ray mask. Top: a representative synthetic X-ray generated from CT, the corresponding lung image and disease mask; Middle: paired X-ray, the corresponding lung image and direct disease annotation from X-ray; bottom: X-ray with transferred annotations from CT shown as red contour; registered lung image from synthetic X-ray and transferred disease annotations from synthetic X-ray.
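Figs. 2–4 describe generating synthetic X-rays and projected disease masks from CT. The paper's exact projection pipeline is not given here; as an illustrative assumption, a parallel projection along the anterior-posterior axis can be sketched with NumPy, using a mean-intensity projection for the image and an any-voxel-along-the-ray rule for the mask:

```python
import numpy as np

def project_ct(ct_volume, axis=1):
    """Mean-intensity parallel projection of a CT volume to a 2D
    synthetic X-ray, normalized to [0, 1]. (Illustrative assumption,
    not the authors' exact projection model.)"""
    proj = ct_volume.mean(axis=axis)
    lo, hi = proj.min(), proj.max()
    return (proj - lo) / (hi - lo + 1e-8)

def project_mask(ct_mask, axis=1):
    """Project a 3D disease mask: a 2D pixel is positive if any voxel
    along the projection ray is positive."""
    return ct_mask.any(axis=axis).astype(np.uint8)

ct = np.random.rand(64, 64, 64)       # toy CT volume
mask = np.zeros((64, 64, 64), bool)
mask[20:30, 10:40, 15:25] = True      # toy lesion
xray = project_ct(ct)
mask2d = project_mask(mask)
print(xray.shape, mask2d.shape)       # (64, 64) (64, 64)
```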
Fig. 5The schematic overview of our proposed classification and segmentation deep-learning model.
Dataset breakdown for our experiments.
| Dataset | X-rays (# XMA, # PMA) | Synthetic X-rays (# SMA) |
|---|---|---|
| train COVID-19 | 974 (247, 77) | 21487 (8322) |
| train pneumonia | 10175 (6108, 17) | 11312 (5380) |
| train negative | 14859 (NA,NA) | 12542 (NA) |
| val COVID-19 | 113 (37,2) | NA |
| val pneumonia | 531 (473,8) | NA |
| val negative | 3301 (NA, NA) | NA |
| test COVID-19 | 307 (68, 52) | NA |
| test pneumonia | 1006 (345,33) | NA |
| test negative | 2271 (NA,NA) | NA |
| in-house test COVID-19 | 266 (68, 52) | NA |
| in-house test pneumonia | 116 (45, 33) | NA |
| in-house test negative | 37 (NA, NA) | NA |
Data source details.
| Data source | COVID-19 (train/val/test) | Pneumonia (train/val/test) | Negative (train/val/test) |
|---|---|---|---|
| Kaggle Pneumonia RSNA | NA | 5412/300/300 | NA |
| Kaggle Pneumonia Chest | NA | 3875/8/390 | 1341/8/234 |
| PadChest Dataset | NA | 694/200/200 | 4925/2000/2000 |
| IEEE github dataset RSNA | 122/29/41 | NA | NA |
| NIH dataset | NA | NA | 6018/757/0 |
| In-house negative data source | NA | NA | 2379/497/0 |
| In-house three class source | 852/84/266 | 194/23/116 | 196/39/37 |
Different training sets.
| training dataset setting | Description |
|---|---|
| S1 | X-ray images with XMA |
| S2 | S1 + synthetic X-ray images with SMA |
| S3 | S1 + X-ray images with PMA |
| S4 | S1 + synthetic X-ray images with SMA + X-ray images with PMA |
Area overlapping between different annotations.
| Comparisons | Overlap |
|---|---|
| XMA vs TMA | 0.28 |
| PMA vs TMA | 0.47 |
| XMA vs PMA | 0.50 |
Fig. 6Examples with large annotation inconsistencies, with TMA shown as a red contour, XMA as blue regions, and PMA as green regions.
Classification results: AUC measures of different training schemes on different datasets with different positive and negative compositions.
| Training dataset setting | Test: AUC C vs P + N | Test: AUC C vs P | In-house test: AUC C vs P + N | In-house test: AUC C vs P |
|---|---|---|---|---|
| S1 | 0.98 | 0.98 | 0.89 | 0.87 |
| S2 | 0.98 | | | |
| S3 | 0.98 | | 0.91 | 0.90 |
| S4 | | | | |
| S4 ensemble | 0.99 | 0.99 | 0.93 | 0.93 |
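The "S4 ensemble" rows report the best AUC and Dice values, suggesting score-level ensembling over multiple trained models. The paper's exact scheme (number of members, weighting) is not stated here; a minimal, hypothetical sketch of unweighted probability averaging looks like this:

```python
def ensemble_scores(model_outputs):
    """Average per-case COVID-19 probabilities across ensemble members.

    model_outputs: list of lists, one inner list of probabilities per
    model, aligned by test case. (Unweighted averaging is an assumption;
    the ensembling details are not specified in this record.)"""
    n_models = len(model_outputs)
    return [sum(scores) / n_models for scores in zip(*model_outputs)]

# Three hypothetical ensemble members scoring the same four cases:
members = [
    [0.90, 0.20, 0.70, 0.10],
    [0.80, 0.30, 0.60, 0.20],
    [0.70, 0.10, 0.80, 0.30],
]
print([round(s, 2) for s in ensemble_scores(members)])  # [0.8, 0.2, 0.7, 0.2]
```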
Segmentation results: Dice measures of different training schemes on different datasets.
| Training dataset setting vs test Dice | | | |
|---|---|---|---|
| S1 | 0.58 | 0.59 | 0.59 |
| S2 | 0.57 | 0.56 | 0.58 |
| S3 | |||
| S4 | 0.57 | 0.58 | 0.62 |
| S4 ensemble | 0.58 | 0.59 | 0.64 |
Fig. 7Segmentation examples: original X-ray images (left), PMA (middle), and AI segmentations (right).
Area overlapping between readers and AI.
| Comparisons | Overlap |
|---|---|
| Radiologist 1 vs Radiologist 2 | 0.53 |
| Radiologist 1 vs AI | 0.52 |
| Radiologist 2 vs AI | 0.55 |
Fig. 8ROC curves of radiologists and the AI system in the use case of COVID-19 versus non-COVID-19 classification.
Classification performance on independent dataset.
| Comparisons | AUC | Specificity | Sensitivity |
|---|---|---|---|
| COVID-Net | 0.55 | 0.55 | 0.55 |
| Proposed | 0.81 | 0.73 | 0.72 |