Dana Li, Bolette Mikela Vilmun, Jonathan Frederik Carlsen, Elisabeth Albrecht-Beste, Carsten Ammitzbøl Lauridsen, Michael Bachmann Nielsen, Kristoffer Lindskov Hansen.
Abstract
The aim of this study was to systematically review the performance of deep learning technology in detecting and classifying pulmonary nodules on computed tomography (CT) scans that were not from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database. Furthermore, we explored the difference in performance when the deep learning technology was applied to test datasets different from the training datasets. Only peer-reviewed, original research articles utilizing deep learning technology were included in this study, and only results from testing on datasets other than the LIDC-IDRI were included. We searched a total of six databases: EMBASE, PubMed, Cochrane Library, the Institute of Electrical and Electronics Engineers, Inc. (IEEE), Scopus, and Web of Science. This resulted in 1782 studies after duplicates were removed, and a total of 26 studies were included in this systematic review. Three studies explored the performance of pulmonary nodule detection only, 16 studies explored the performance of pulmonary nodule classification only, and 7 studies reported both pulmonary nodule detection and classification. Three different deep learning architectures were mentioned amongst the included studies: convolutional neural network (CNN), massive training artificial neural network (MTANN), and deep stacked denoising autoencoder extreme learning machine (SDAE-ELM). The studies reached a classification accuracy between 68% and 99.6% and a detection accuracy between 80.6% and 94%. Performance of deep learning technology in studies using different test and training datasets was comparable to that in studies using the same type of test and training datasets. In conclusion, deep learning was able to achieve high levels of accuracy, sensitivity, and/or specificity in detecting and/or classifying nodules when applied to pulmonary CT scans not from the LIDC-IDRI database.
Keywords: artificial intelligence; deep learning; nodule classification; nodule detection
Year: 2019 PMID: 31795409 PMCID: PMC6963966 DOI: 10.3390/diagnostics9040207
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flowchart of the literature search and study selection.
Performance of the studies exploring detection of pulmonary nodules.
| Author | Year | Deep Learning Architecture | Dataset for Training | Dataset for Testing | Sensitivity (%) | Specificity (%) | AUC (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| Suzuki, Kenji * | 2009 | MTANN | Independent dataset A | Independent dataset B | 97 | N/A | N/A | N/A |
| Tajbakhsh, Nima et al. | 2017 | CNN | Independent dataset | Independent dataset | 100 | N/A | N/A | N/A |
| Tajbakhsh, Nima et al. | 2017 | MTANN | Independent dataset | Independent dataset | 100 | N/A | N/A | N/A |
| Masood, Anum et al. | 2018 | FCNN | LIDC-IDRI, RIDER, LungCT-diagnosis, LUNA16, LISS, SPIE challenge dataset and independent dataset | RIDER | 74.6 | 86.5 | N/A | 80.6 |
| Masood, Anum et al. | 2018 | FCNN | LIDC-IDRI, RIDER, LungCT-diagnosis, LUNA16, LISS, SPIE challenge dataset and independent dataset | SPIE challenge dataset | 81.2 | 83 | N/A | 84.9 |
| Masood, Anum et al. | 2018 | FCNN | LIDC-IDRI, RIDER, LungCT-diagnosis, LUNA16, LISS, SPIE challenge dataset and independent dataset | LungCT-diagnosis | 82.5 | 93.6 | N/A | 89.5 |
| Masood, Anum et al. | 2018 | FCNN | LIDC-IDRI, RIDER, LungCT-diagnosis, LUNA16, LISS, SPIE challenge dataset and independent dataset | Independent dataset | 83.7 | 96.2 | N/A | 86.3 |
| Chen, Sihang et al. | 2019 | CNN | Independent dataset | Independent dataset | 97 | N/A | N/A | N/A |
| Liao, Fangzhou et al. | 2019 | CNN | LUNA16 and DSB17 | DSB17 | 85.6 | N/A | N/A | N/A |
| Liu, Mingzhe et al. | 2018 | CNN | LUNA16 and DSB17 | DSB17 | 85.6 | N/A | N/A | N/A |
| Li, Li et al. * | 2018 | CNN | LIDC-IDRI and NLST | Independent dataset | 86.2 | N/A | N/A | N/A |
| Wang, Yang et al. | 2019 | RCNN | Independent dataset | Independent dataset | N/A | N/A | N/A | N/A |
| Setio, A.A.A et al. * | 2016 | CNN | LIDC-IDRI and ANODE09 | DLCST | 76.5 | N/A | N/A | 94 |
| Setio, A.A.A et al. * | 2016 | CNN | LIDC-IDRI and ANODE09 | ANODE09 | N/A | N/A | N/A | N/A |
| Wang, Jun et al. | 2019 | CNN | Tianchi AI challenge dataset and independent dataset | Independent dataset | 75.6 | N/A | N/A | N/A |
Studies marked with * are those in which the test dataset differed from the training dataset. N/A: not reported. Abbreviations: area under the curve (AUC), massive training artificial neural network (MTANN), convolutional neural network (CNN), lung image database consortium and image database resource initiative (LIDC-IDRI), reference image database to evaluate therapy response (RIDER), Society of Photo-Optical Instrumentation Engineers (SPIE), lung nodule analysis 2016 (LUNA16), lung CT imaging signs (LISS), Kaggle data science bowl 2017 (DSB17), Danish lung cancer screening trial (DLCST), automatic nodule detection 2009 (ANODE09).
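For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and accuracy are conventionally derived from confusion-matrix counts. The function name `diagnostic_metrics` and the counts in the example are hypothetical and are not taken from any included study; individual studies may define or report these metrics under additional conditions (for example, at a fixed false-positive rate per scan).

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy as percentages.

    tp/fp/tn/fn are true-positive, false-positive, true-negative and
    false-negative counts (standard textbook definitions).
    """
    sensitivity = 100.0 * tp / (tp + fn)   # share of true nodules that were found
    specificity = 100.0 * tn / (tn + fp)   # share of non-nodules correctly rejected
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Because accuracy pools positives and negatives, it depends on class balance in the test set, which is one reason the tables report sensitivity and specificity separately where available.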
Performance of studies exploring classification of pulmonary nodules.
| Author | Year | Deep Learning Architecture | Dataset for Training | Dataset for Testing | Categories for Testing | Sensitivity (%) | Specificity (%) | AUC (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| Alakwaa, Wafaa et al. | 2017 | CNN | LUNA16 and DSB17 | DSB17 | Cancer vs. no cancer | N/A | N/A | N/A | 86.6 |
| Chen, Sihang et al. | 2019 | CNN | Independent dataset | Independent dataset | Adenocarcinoma vs. benign | N/A | N/A | N/A | 87.5 |
| Ciompi, Francesco et al. | 2015 | CNN | ImageNet and NELSON | NELSON | Peri-fissural nodules (PFN) vs. non-PFN | N/A | N/A | 84.7 | N/A |
| Ciompi, Francesco et al. * | 2017 | CNN | MILD | DLCST | Multiple categories (overall) | N/A | N/A | N/A | 79.5 |
| Jakimovski, Goran et al. | 2019 | CDNN | LONI database | LONI database | Cancer vs. no cancer | 99.9 | 98.7 | N/A | 99.6 |
| Lakshmanaprabu, S.K. et al. | 2018 | ODNN | ELCAP | ELCAP | Abnormal vs. normal | 96.2 | 94.2 | N/A | 94.5 |
| Li, Li et al. * | 2018 | CNN | LIDC-IDRI and NLST | Independent dataset | Multiple categories (overall) | N/A | N/A | N/A | N/A |
| Liao, Fangzhou et al. | 2019 | CNN | LUNA16 and DSB17 | DSB17 | Cancer vs. no cancer (scale) | N/A | N/A | 87 | 81.4 |
| Liu, Shuang et al. | 2017 | CNN | NLST and ELCAP | NLST and ELCAP | Malignant vs. benign | N/A | N/A | 78 | N/A |
| Liu, Xinglong et al. * | 2017 | CNN | LIDC-IDRI | ELCAP | Multiple categories (overall) | N/A | N/A | N/A | 90.3 |
| Masood, Anum et al. | 2018 | FCNN | LIDC-IDRI, RIDER, LungCT-diagnosis, LUNA16, LISS, SPIE challenge dataset and independent dataset | Independent dataset | Four stage categories (overall) | 83.7 | 96.2 | N/A | 96.3 |
| Nishio, Mizuho et al. | 2018 | CNN | Independent dataset | Independent dataset | Benign, primary and metastatic cancer (overall) | N/A | N/A | N/A | 68 |
| Onishi, Yuya et al. | 2018 | DCNN | Independent dataset | Independent dataset | Malignant vs. benign | N/A | N/A | 84.1 | 81.7 |
| Polat, Huseyin et al. | 2019 | CNN | DSB17 | DSB17 | Cancer vs. no cancer | 88.5 | 94.2 | N/A | 91.8 |
| Qiang, Yan et al. | 2017 | Deep SDAE-ELM | Independent dataset | Independent dataset | Malignant vs. benign | 84.4 | 81.3 | N/A | 82.8 |
| Rangaswamy et al. | 2019 | CNN | ILD | ILD | Malignant vs. benign | 98 | 94 | N/A | 96 |
| Sori, Worku Jifara et al. | 2018 | CNN | LUNA16 and DSB17 | DSB17 | Cancer vs. no cancer | 87.4 | 89.1 | N/A | 87.8 |
| Suzuki, Kenji * | 2009 | MTANN | Independent dataset A | Independent dataset B | Malignant vs. benign | 96 | N/A | N/A | N/A |
| Tajbakhsh, Nima et al. | 2017 | CNN | Independent dataset | Independent dataset | Malignant vs. benign | N/A | N/A | 77.6 | N/A |
| Tajbakhsh, Nima et al. | 2017 | MTANN | Independent dataset | Independent dataset | Malignant vs. benign | N/A | N/A | 88.1 | N/A |
| Wang, Shengping et al. | 2018 | CNN | Independent dataset | Independent dataset | PIL vs. IAC | 88.5 | 80.1 | 89.2 | 84 |
| Wang, Yang et al. | 2019 | RCNN | Independent dataset | Independent dataset | Malignant vs. benign | 76.5 | 89.1 | 90.6 | 87.3 |
| Yuan, Jingjing et al. * | 2017 | CNN | LIDC-IDRI | ELCAP | Multiple categories (overall) | N/A | N/A | N/A | 93.9 |
| Zhang, Chao et al. * | 2019 | CNN | LUNA16, DSB17 and independent dataset (A) | Independent dataset (B) | Malignant vs. benign | 96 | 88 | N/A | 92 |
Studies marked with * are those in which the test dataset differed from the training dataset. N/A: not reported. Abbreviations: area under the curve (AUC), massive training artificial neural network (MTANN), convolutional neural network (CNN), deep neural network (DNN), lung image database consortium and image database resource initiative (LIDC-IDRI), the Dutch–Belgian randomized lung cancer screening trial (Dutch acronym; NELSON), multicentric Italian lung detection (MILD), laboratory of neuro imaging (LONI), early lung cancer action program (ELCAP), reference image database to evaluate therapy response (RIDER), Society of Photo-Optical Instrumentation Engineers (SPIE), lung nodule analysis 2016 (LUNA16), lung CT imaging signs (LISS), Kaggle data science bowl 2017 (DSB17), interstitial lung disease (ILD), Danish lung cancer screening trial (DLCST), automatic nodule detection 2009 (ANODE09), pre-invasive lesions (PIL), invasive adenocarcinomas (IAC).
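The accuracy ranges quoted in the abstract can be checked against the Accuracy columns of the detection and classification tables. The sketch below simply transcribes the reported accuracy values (rows with N/A are omitted) and takes their minimum and maximum; the variable and function names are illustrative and not part of the original review.

```python
# Accuracy values (%) transcribed from the detection table; N/A rows omitted.
detection_accuracy = [80.6, 84.9, 89.5, 86.3, 94.0]

# Accuracy values (%) transcribed from the classification table; N/A rows omitted.
classification_accuracy = [
    86.6, 87.5, 79.5, 99.6, 94.5, 81.4, 90.3, 96.3, 68.0,
    81.7, 91.8, 82.8, 96.0, 87.8, 84.0, 87.3, 93.9, 92.0,
]


def value_range(values):
    """Return (lowest, highest) of a list of reported values."""
    return min(values), max(values)


detection_range = value_range(detection_accuracy)
classification_range = value_range(classification_accuracy)

print("Detection accuracy range (%):", detection_range)
print("Classification accuracy range (%):", classification_range)
```

The resulting ranges agree with those stated in the abstract (80.6 to 94 for detection, 68 to 99.6 for classification).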
(a)
| Author | Year | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Jakimovski, Goran et al. | 2019 | 99.9 | 98.7 |
| Lakshmanaprabu, S.K. et al. | 2018 | 96.2 | 94.2 |
| Masood, Anum et al. | 2018 | 83.7 | 96.2 |
| Polat, Huseyin et al. | 2019 | 88.5 | 94.2 |
| Qiang, Yan et al. | 2017 | 84.4 | 81.3 |
| Rangaswamy et al. | 2019 | 98 | 94 |
| Sori, Worku Jifara et al. | 2018 | 87.4 | 89.1 |
| Suzuki, Kenji et al. | 2009 | 96 * | N/A |
| Wang, Shengping et al. | 2018 | 88.5 | 80.1 |
| Wang, Yang et al. | 2019 | 76.5 | 89.1 |
| Zhang, Chao et al. | 2019 | 96 * | 88 * |
(b)
| Author | Year | AUC (%) |
|---|---|---|
| Ciompi, Francesco et al. | 2015 | 84.7 |
| Liao, Fangzhou et al. | 2019 | 87 |
| Liu, Shuang et al. | 2017 | 78 |
| Onishi, Yuya et al. | 2018 | 84.1 |
| Tajbakhsh, Nima et al. (CNN) | 2017 | 77.6 |
| Tajbakhsh, Nima et al. (MTANN) | 2017 | 88.1 |
| Wang, Shengping et al. | 2018 | 89.2 |
| Wang, Yang et al. | 2019 | 90.6 |
(c)
| Author | Year | Accuracy (%) |
|---|---|---|
| Alakwaa, Wafaa et al. | 2017 | 86.6 |
| Chen, Sihang et al. | 2019 | 87.5 |
| Ciompi, Francesco et al. | 2017 | 79.5 * |
| Jakimovski, Goran et al. | 2019 | 99.6 |
| Lakshmanaprabu, S.K. et al. | 2018 | 94.5 |
| Liao, Fangzhou et al. | 2019 | 81.4 |
| Liu, Xinglong et al. | 2017 | 90.3 * |
| Masood, Anum et al. | 2018 | 96.3 |
| Nishio, Mizuho et al. | 2018 | 68 |
| Onishi, Yuya et al. | 2018 | 81.7 |
| Polat, Huseyin et al. | 2019 | 91.8 |
| Qiang, Yan et al. | 2017 | 82.8 |
| Rangaswamy et al. | 2019 | 96 |
| Sori, Worku Jifara et al. | 2018 | 87.8 |
| Wang, Shengping et al. | 2018 | 84 |
| Wang, Yang et al. | 2019 | 87.3 |
| Yuan, Jingjing et al. | 2017 | 93.9 * |
| Zhang, Chao et al. | 2019 | 92 * |
Results marked with * are from studies in which the test dataset differed from the training dataset.
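One simple way to look at the comparison between starred and unstarred results is to group the accuracy values in panel (c) by whether the test dataset differed from the training dataset and summarize each group. The sketch below does this with an unweighted mean; it is a descriptive illustration only, not the analysis performed in the review, and it ignores differences in dataset size, task, and class definitions between studies.

```python
from statistics import mean

# (author, accuracy in %, True if the test dataset differed from the training dataset)
# Values transcribed from panel (c); the boolean mirrors the * marker.
rows = [
    ("Alakwaa", 86.6, False), ("Chen", 87.5, False), ("Ciompi", 79.5, True),
    ("Jakimovski", 99.6, False), ("Lakshmanaprabu", 94.5, False),
    ("Liao", 81.4, False), ("Liu, Xinglong", 90.3, True), ("Masood", 96.3, False),
    ("Nishio", 68.0, False), ("Onishi", 81.7, False), ("Polat", 91.8, False),
    ("Qiang", 82.8, False), ("Rangaswamy", 96.0, False), ("Sori", 87.8, False),
    ("Wang, Shengping", 84.0, False), ("Wang, Yang", 87.3, False),
    ("Yuan", 93.9, True), ("Zhang", 92.0, True),
]

# Split accuracies by whether the study tested on a different dataset.
different = [acc for _, acc, starred in rows if starred]
same = [acc for _, acc, starred in rows if not starred]

for label, group in (("different test set", different), ("same type of test set", same)):
    print(f"{label}: n={len(group)}, mean={mean(group):.1f}, "
          f"range={min(group)}-{max(group)}")
```

With only four starred results, any difference between the two groups should be read cautiously.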