Guangyu Wang1, Xiaohong Liu2, Jun Shen3, Chengdi Wang4, Zhihuan Li5, Linsen Ye6, Xingwang Wu7, Ting Chen8, Kai Wang2, Xuan Zhang2, Zhongguo Zhou9, Jian Yang10, Ye Sang10, Ruiyun Deng11, Wenhua Liang12, Tao Yu3, Ming Gao3, Jin Wang6, Zehong Yang3, Huimin Cai11, Guangming Lu13, Lingyan Zhang14, Lei Yang15, Wenqin Xu5, Winston Wang5, Andrea Olevera5, Ian Ziyar5, Charlotte Zhang11, Oulan Li11, Weihua Liao16, Jun Liu17, Wen Chen18, Wei Chen19, Jichan Shi20, Lianghong Zheng5, Longjiang Zhang13, Zhihan Yan19, Xiaoguang Zou21, Guiping Lin3, Guiqun Cao4, Laurance L Lau5, Long Mo16, Yong Liang5, Michael Roberts22,23, Evis Sala24, Carola-Bibiane Schönlieb23, Manson Fok5, Johnson Yiu-Nam Lau25, Tao Xu11, Jianxing He12, Kang Zhang26,27, Weimin Li28, Tianxin Lin29.
Abstract
Common lung diseases are first diagnosed using chest X-rays. Here, we show that a fully automated deep-learning pipeline for the standardization of chest X-ray images, for the visualization of lesions and for disease diagnosis can identify viral pneumonia caused by coronavirus disease 2019 (COVID-19) and assess its severity, and can also discriminate between viral pneumonia caused by COVID-19 and other types of pneumonia. The deep-learning system was developed using a heterogeneous multicentre dataset of 145,202 images, and tested retrospectively and prospectively with thousands of additional images across four patient cohorts and multiple countries. The system generalized across settings, discriminating between viral pneumonia, other types of pneumonia and the absence of disease with areas under the receiver operating characteristic curve (AUCs) of 0.94-0.98; between severe and non-severe COVID-19 with an AUC of 0.87; and between COVID-19 pneumonia and other viral or non-viral pneumonia with AUCs of 0.87-0.97. In an independent set of 440 chest X-rays, the system performed comparably to senior radiologists and improved the performance of junior radiologists. Automated deep-learning systems for the assessment of pneumonia could facilitate early intervention and provide support for clinical decision-making.
Year: 2021 PMID: 33859385 PMCID: PMC7611049 DOI: 10.1038/s41551-021-00704-1
Source DB: PubMed Journal: Nat Biomed Eng ISSN: 2157-846X Impact factor: 25.671
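The abstract reports performance as areas under the ROC curve (AUCs). As a point of reference, an AUC can be computed directly from raw classifier scores via the rank-based (Mann-Whitney U) formulation; the sketch below is a minimal pure-Python illustration with hypothetical labels and scores, not the paper's implementation.

```python
# Minimal sketch: ROC AUC from classifier scores via the
# Mann-Whitney U formulation (hypothetical data, stdlib only).
def roc_auc(labels, scores):
    """labels: 1 = positive class, 0 = negative; scores: model outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive-negative pairs where the positive is ranked
    # higher; ties contribute half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]          # hypothetical ground truth
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]  # hypothetical model scores
auc = roc_auc(labels, scores)        # 8 of 9 pairs correctly ranked
```

The same quantity is what `sklearn.metrics.roc_auc_score` returns; the rank formulation makes explicit that the AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative one.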
Fig. 1The AI system for the detection of viral pneumonia.
a, Model development of the AI system. The system comprised a pipeline of a CXR standardization module, a common thoracic disease detection module, and a pneumonia analysis module. The pneumonia analysis module consisted of viral pneumonia classification, COVID-19 detection, and COVID-19 severity assessment. b, Application and evaluation of the AI system. Left panel: the AI system was trained to identify the presence or absence of 14 common thoracic pathologies, and its performance was evaluated in external validation cohorts. Middle panel: trained on the Chinese cohort (CC-CXRI-P) and the re-annotated public dataset (CheXpert-P), the AI system made a diagnosis of viral pneumonia (including COVID-19 pneumonia). The model was then tested on external cohorts to assess its generalizability. Right panel: the performance of the AI system was compared with that of radiologists and with combined human-machine performance.
The CXR datasets for the training, validation and testing of the deep-learning system.
| Cohorts | Training dataset (developmental) | Tuning dataset (developmental) | Testing dataset (developmental) | External (SYSU-PE) |
|---|---|---|---|---|
| Number of images | 96,543 | 12,035 | 12,124 | 24,500 |
| Number of subjects | 73,917 | 9,160 | 9,250 | 23,585 |
| Inpatient | 38,438 (52.0%) | 4,761 (52.0%) | 4,871 (52.7%) | -- |
| Outpatient | 35,479 (48.0%) | 4,377 (47.8%) | 4,354 (47.1%) | -- |
| Physical Examination | -- | 22 (0.2%) | 25 (0.2%) | 23,585 (100.0%) |
| Male (%) | 31,019 (42.0%) | 3,840 (41.9%) | 3,850 (41.6%) | 11,868 (50.3%) |
| Age (years), mean (IQR) | 44.9 (32-59) | 45.1 (32-60) | 44.9 (32-59) | 37.8 (28-46) |
| Atelectasis | 167 (0.23%) | 26 (0.28%) | 22 (0.24%) | 4 (0.02%) |
| Cardiomegaly | 1,828 (2.47%) | 242 (2.64%) | 239 (2.58%) | 46 (0.20%) |
| Fibrosis | 4,405 (5.96%) | 523 (5.71%) | 560 (6.05%) | 431 (1.83%) |
| Infiltration | 7,085 (9.59%) | 914 (9.98%) | 886 (9.58%) | 88 (0.37%) |
| Mass | 708 (0.96%) | 86 (0.94%) | 82 (0.89%) | 17 (0.07%) |
| Nodule | 4,187 (5.66%) | 550 (6.00%) | 554 (5.99%) | 463 (1.96%) |
| Pleural thickening | 4,192 (5.67%) | 545 (5.95%) | 544 (5.88%) | 412 (1.75%) |
| Pneumonia | 8,099 (10.96%) | 1,015 (11.08%) | 1,042 (11.26%) | 164 (0.70%) |
| Pneumothorax | 552 (0.75%) | 67 (0.73%) | 61 (0.66%) | 0 (0.00%) |
| Consolidation | 118 (0.16%) | 12 (0.13%) | 12 (0.13%) | 0 (0.00%) |
| Edema | 133 (0.18%) | 12 (0.13%) | 21 (0.23%) | 0 (0.00%) |
| Effusion | 3,903 (5.28%) | 485 (5.29%) | 462 (4.99%) | 43 (0.18%) |
| Hernia | 23 (0.03%) | 3 (0.03%) | 1 (0.01%) | 1 (0.01%) |
| Emphysema | 715 (0.97%) | 84 (0.92%) | 84 (0.91%) | 29 (0.12%) |
| No finding | 55,320 (74.84%) | 6,823 (74.49%) | 6,882 (74.40%) | 22,319 (94.63%) |
Fig. 2Performance of the AI system in the multi-label classification of common chest diseases encompassing opacity.
Receiver operating characteristic (ROC) curves and normalized confusion matrices of the classification model. Opacity included atelectasis, mass, edema, pneumonia, and consolidation. a, The AI system’s performance on the hold-out test dataset. b, The AI system’s performance on the external validation cohort representing a physical-examination population. Compared with the patient distribution in a, edema and consolidation were absent in this cohort.
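The confusion matrices in Figs. 2-4 are row-normalized, so each row shows how cases of one true class distribute over the predicted classes. A minimal sketch of that normalization, with a hypothetical three-class labeling (0 = viral pneumonia, 1 = other pneumonia, 2 = normal) and toy predictions:

```python
# Minimal sketch: row-normalized confusion matrix (each row sums
# to 1) from true vs. predicted class indices. Hypothetical data.
def normalized_confusion(y_true, y_pred, n_classes):
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1          # rows: true class; columns: predicted
    norm = []
    for row in counts:
        total = sum(row)
        norm.append([c / total if total else 0.0 for c in row])
    return norm

# Hypothetical class indices: 0 = viral pneumonia, 1 = other
# pneumonia, 2 = normal.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
m = normalized_confusion(y_true, y_pred, 3)
```

Row normalization makes the diagonal directly interpretable as per-class recall, which is why it is preferred for class-imbalanced cohorts such as these.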
Fig. 3Performance of the AI system in the discrimination of viral pneumonia, other types of pneumonia, and absence of pneumonia, from CXR images.
Receiver operating characteristic (ROC) curves and normalized confusion matrices of the classification model. a and b, The AI system’s performance on the hold-out test dataset. c and d, The AI system’s performance on the independent external validation data in the China cohort for the three-way classification. e and f, The AI system’s performance on the external validation set of subjects screened for suspected pneumonia. CI, confidence interval.
Number of CXR images for training, validation and testing in differentiating among viral pneumonia, other types of pneumonia, and absence of pneumonia (normal).
| Cohorts | | Viral pneumonia (other types) | Viral pneumonia (COVID-19) | Other types of pneumonia | Normal | Total |
|---|---|---|---|---|---|---|
| Training | “Gold-standard labels”, China (CC-CXRI) | 2,506 | 1,248 | 5,015 | 4,389 | 13,158 |
| | “Silver-standard labels”, US (CheXpert-P) | 2,840 | -- | 5,309 | 4,999 | 13,148 |
| Validation | | 169 | 159 | 637 | 554 | 1,519 |
| Testing | | 190 | 164 | 630 | 535 | 1,519 |
| External validation | | 142 | 98 | 610 | 1,049 | 1,899 |
| Population study | | 46 | 0 | 220 | 768 | 1,034 |
| International cohort | | 63 | 132 | 226 | 229 | 650 |
Fig. 4Performance of the AI system in the identification of COVID-19 pneumonia from CXR images.
ROC curves and normalized confusion matrices for binary classification. a and b, The AI system’s performance in differentiating COVID-19 pneumonia from other types of pneumonia (e.g., bacterial pneumonia) on the test dataset: AUC = 0.966 (95% CI: 0.955-0.975), sensitivity = 92.07%, specificity = 90.12%. d and e, The AI system’s performance in differentiating COVID-19 pneumonia from other viral pneumonia (OVP) on the test dataset: AUC = 0.867 (95% CI: 0.828-0.902), sensitivity = 82.32%, specificity = 72.63%. c and f, ROC curves showing the AI system’s performance in identifying severe or non-severe COVID-19 against other types of pneumonia (c) (e.g., bacterial pneumonia) and other types of viral pneumonia (f).
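The sensitivity and specificity values quoted above correspond to a single operating point on the ROC curve, i.e., a fixed decision threshold on the model score. A minimal sketch of that computation with hypothetical labels, scores, and threshold:

```python
# Minimal sketch: sensitivity and specificity at a fixed decision
# threshold (hypothetical labels, scores, and threshold).
def sens_spec(labels, scores, threshold):
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    # Sensitivity = recall on positives; specificity = recall on negatives
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
sens, spec = sens_spec(labels, scores, 0.5)
```

Raising the threshold trades sensitivity for specificity; the AUC summarizes performance over all such thresholds, which is why both are reported.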
Fig. 5Severity analysis of COVID-19 pneumonia patients from CXR images.
a, Scatter plot showing the correlation of the CXR severity index predicted by the AI model versus the CXR severity index from the radiologist’s assessment. b, Bland-Altman plot showing the agreement between the AI-predicted severity index and the radiologist-assessed severity index. The x axis represents the mean of the two measurements, and the y axis represents the difference between them. c, ROC curves for the binary classification of clinical severity. The blue curve represents the severity prediction using the AI-predicted severity index as input: AUC = 0.868 (95% CI: 0.816-0.915). The orange curve represents the severity prediction using the radiologist-assessed severity index as input: AUC = 0.832 (95% CI: 0.782-0.885). d, Confusion matrix for the binary classification of clinical severity. The performance of the AI reviewer: accuracy = 81.12%, sensitivity = 82.05%, specificity = 80.65%. e, An example of lung-lesion segmentation of viral pneumonia on a CXR image. PCC, Pearson correlation coefficient; MAE, mean absolute error; ICC, intraclass correlation coefficient.
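The agreement statistics in Fig. 5 combine a Bland-Altman analysis (bias and 95% limits of agreement, mean difference ± 1.96 SD) with a Pearson correlation between the two severity indices. A minimal pure-Python sketch with hypothetical paired severity indices (not the study's data):

```python
# Minimal sketch: Bland-Altman limits of agreement and Pearson
# correlation between two paired measurements. Hypothetical data.
from math import sqrt

def bland_altman(a, b):
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    # Bias and 95% limits of agreement: mean difference +/- 1.96 SD
    return mean_d, mean_d - 1.96 * sd, mean_d + 1.96 * sd

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

ai = [0.2, 0.4, 0.5, 0.7, 0.9]        # hypothetical AI severity indices
rad = [0.25, 0.35, 0.55, 0.65, 0.85]  # hypothetical radiologist indices
bias, lo, hi = bland_altman(ai, rad)
r = pearson(ai, rad)
```

Correlation and agreement answer different questions: two readers can correlate strongly yet disagree systematically, which is why the Bland-Altman bias is reported alongside the PCC.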
Fig. 6Performance of the AI system and of radiologists in identifying pneumonia conditions from CXR images.
Performance comparison of four groups: the AI system, the average of a group of four junior radiologists, the average of a group of four senior radiologists, and the average of the junior group with AI assistance. a, ROC curves for diagnosing viral pneumonia versus the rest (other types of pneumonia and normal). The star denotes the operating point of the AI system. Filled dots denote the junior and senior radiologists’ performance, while hollow dots denote the performance of the junior group with the AI’s assistance. Dashed lines link the paired performance values of the junior group. b, Weighted errors of the four groups based on a penalty metric. P < 0.001, computed using a two-sided permutation test with 10,000 random re-samplings. c, An evaluation experiment on diagnostic performance when the AI system acted as a “second reader” or an “arbitrator”.
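The P value in Fig. 6b comes from a two-sided permutation test over paired error scores: under the null hypothesis the two readers are exchangeable, so each case's pair of errors can be swapped at random, and the observed mean difference is compared against the permuted distribution. A minimal sketch with 10,000 re-samplings and hypothetical toy penalty scores:

```python
# Minimal sketch: two-sided paired permutation test (10,000
# re-samplings). Toy penalty scores, not the study's data.
import random

def permutation_test(errors_a, errors_b, n_perm=10_000, seed=0):
    rng = random.Random(seed)
    n = len(errors_a)
    observed = sum(a - b for a, b in zip(errors_a, errors_b)) / n
    extreme = 0
    for _ in range(n_perm):
        diff = 0.0
        for a, b in zip(errors_a, errors_b):
            # Under the null the pairing is exchangeable:
            # swap each pair with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff / n) >= abs(observed):
            extreme += 1
    return extreme / n_perm  # two-sided p-value

# Hypothetical per-case penalty scores: group A consistently worse
errors_a = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3]
errors_b = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1]
p = permutation_test(errors_a, errors_b)
```

Because the permutation distribution is built from the data themselves, the test makes no normality assumption about the penalty metric, which suits ordinal error weights.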