
Clinical Implementation of Deep Learning in Thoracic Radiology: Potential Applications and Challenges.

Abstract

Chest X-ray radiography and computed tomography, the two mainstay modalities in thoracic radiology, are under active investigation with deep learning technology, which has shown promising performance in various tasks, including detection, classification, segmentation, and image synthesis, outperforming conventional methods and suggesting its potential for clinical implementation. However, the implementation of deep learning in daily clinical practice is in its infancy and facing several challenges, such as its limited ability to explain the output results, uncertain benefits regarding patient outcomes, and incomplete integration in daily workflow. In this review article, we will introduce the potential clinical applications of deep learning technology in thoracic radiology and discuss several challenges for its implementation in daily clinical practice.


Keywords: Artificial intelligence; Chest X-ray; Chest radiograph; Computed tomography; Deep learning


Year: 2020 PMID： 32323497 PMCID： PMC7183830 DOI： 10.3348/kjr.2019.0821

Source DB: PubMed Journal: Korean J Radiol ISSN： 1229-6929 Impact factor: 3.500

INTRODUCTION

In recent years, artificial intelligence (AI) and deep learning (DL) have become prominent technologies across society, including in the field of medicine. The concept of DL is not new (1, 2), but the recent rapid growth of computing power and digital data has enabled its success in various fields of application, such as speech recognition (3), natural language processing (4), self-driving vehicles (5), and medicine (6, 7). One of the most successful areas of DL is computer vision. A specific type of DL algorithm called the convolutional neural network (CNN) has played a central role in this success. In 2012, a DL algorithm called “AlexNet,” using a CNN architecture, won the annual ImageNet Large Scale Visual Recognition Challenge, which is the biggest competition in the image recognition field (8), exhibiting a much lower error rate than the winning algorithm from the previous year (16% vs. 26%) (9). In 2015, the winning algorithm of the competition, called “ResNet” and based on a CNN, exhibited an error rate of 3.6%, surpassing human-level performance (10). For medical image analyses, CNN-based DL models showed expert or beyond-expert level performances in various tasks, including the diagnosis of skin cancer from skin photographs (11), diagnosis of diabetic retinopathy from fundus photographs (12, 13), and detection of breast cancer metastasis from pathologic slides (14). These initial successes raised expectations that DL-based medical image analysis tools would soon be implemented in daily practice. Recently, it has been asserted that radiologists could and should be educated consumers by understanding the value of AI tools in clinical practices and evaluating their performance before their clinical implementation (15). Chest X-ray (CXR) radiography and computed tomography (CT), which are the two pillars of thoracic radiology, have been the most actively investigated imaging modalities for various computer-aided image analyses. Some investigations have shown promising results using conventional computer-aided image analyses (16, 17, 18, 19, 20, 21, 22, 23), but few of them have been implemented in actual clinical practice because of their suboptimal performances (24, 25). It is now anticipated that DL technology will overcome the limitations in performance shown by conventional computer-aided image analyses and be implemented in the daily practice of thoracic radiology (17, 26). Indeed, there have been several early investigations reporting the surprisingly high performance of DL technologies in thoracic radiology, particularly CXRs (27, 28, 29, 30, 31, 32). The aim of this review article is to introduce potential applications of DL technology in the field of thoracic radiology (Table 1) and possible scenarios of implementation in the clinical workflow. In addition, we aim to discuss challenges in the application of DL in routine clinical practice.

Table 1

Task-Based Classification of Potential Applications of Deep Learning Technology in Field of Thoracic Radiology

Task	Potential Application
Detection of abnormalities	Detection of lung nodule on CXR (30, 43) or chest CT (69)
Image classification	Classification of lung nodules according to morphology (71)
Image classification	Classification of lung nodules according to likelihood of malignancy (72, 73, 74)
Image classification	Diagnosis of specific diseases (active tuberculosis (28, 29, 44), lung cancer (75, 77), COPD (85), pulmonary fibrosis (81, 84))
Image classification	Prediction of patient prognosis or treatment response (76, 85, 86)
Image segmentation	Organ segmentation (lung (95, 96), pulmonary lobes (97), airway (98))
Image segmentation	Lung nodule segmentation (99, 100)
Image generation	Image neutralization (108, 109, 110)
Image generation	Image quality improvement (image noise reduction) (114, 115, 116)

COPD = chronic obstructive pulmonary disease, CT = computed tomography, CXR = chest X-ray

Lung Nodule Detection on CXR

Lung nodule detection on CXR is important because lung nodules may represent lung cancer. However, this can often be challenging and lung nodules are not uncommonly missed by radiologists (33, 34). Therefore, a computer-aided detection (CAD) system for lung nodules is by far the most investigated task of CAD with respect to CXRs (25, 35, 36, 37, 38). Investigations in this field began in the 1960s (39). Regarding conventional CAD systems, multiple studies reported a potential to enhance radiologists' performance (38, 40, 41, 42); however, their standalone performance was suboptimal for clinical implementation, resulting in many false-positive nodules (1.7–3.3 false-positive results per image) (25). Recent investigations based on DL suggest a potential to overcome the limitations of conventional CAD systems (30, 43). In a study by Nam et al. (30), a DL algorithm exhibited a per-nodule sensitivity of 70–82% with 0.02–0.34 false positives per image for the detection of malignant lung nodules on CXRs. In another study by Cha et al. (43), a DL algorithm showed a 76.8% per-nodule sensitivity at 0.3 false positives per image. In both studies, the performance of the DL algorithms was better than that of the radiologists. Although it is difficult to directly compare the performances in the two studies because of the differences in the test datasets, the above-radiologist performances of DL suggest the potential of its implementation in daily practice (Fig. 1).
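The two figures quoted for these CAD systems, per-nodule sensitivity and false positives per image, describe a single operating point of a detector at a given score threshold. The following is a minimal sketch of how such an operating point could be computed. The data layout, the distance-based hit criterion, and the names `froc_point` and `hit_radius` are illustrative assumptions and are not taken from the cited studies, which define a hit in their own ways.

```python
from math import hypot


def froc_point(cases, threshold, hit_radius):
    """Return (per-nodule sensitivity, false positives per image).

    cases: list of dicts with
      "nodules": reference nodule centers [(x, y), ...]
      "marks":   CAD marks [(x, y, score), ...]
    A nodule counts as detected if a retained mark lies within hit_radius
    of its center (a hypothetical hit criterion chosen for illustration).
    """
    detected = total_nodules = false_positives = 0
    for case in cases:
        # Keep only marks whose score reaches the operating threshold.
        marks = [(x, y) for x, y, score in case["marks"] if score >= threshold]
        nodules = case["nodules"]
        total_nodules += len(nodules)
        for nx, ny in nodules:
            if any(hypot(mx - nx, my - ny) <= hit_radius for mx, my in marks):
                detected += 1
        for mx, my in marks:
            # A retained mark far from every reference nodule is a false positive.
            if not any(hypot(mx - nx, my - ny) <= hit_radius for nx, ny in nodules):
                false_positives += 1
    sensitivity = detected / total_nodules if total_nodules else float("nan")
    return sensitivity, false_positives / len(cases)


# Made-up toy data: four images, three reference nodules in total.
toy_cases = [
    {"nodules": [(100, 120)], "marks": [(102, 118, 0.9), (300, 300, 0.4)]},
    {"nodules": [(50, 60), (200, 210)], "marks": [(51, 61, 0.7)]},
    {"nodules": [], "marks": [(10, 10, 0.2)]},
    {"nodules": [], "marks": []},
]
print(froc_point(toy_cases, threshold=0.3, hit_radius=10))
```

Raising the threshold removes low-score marks, which lowers the false-positive rate and can only keep or lower sensitivity. This is why results are reported as ranges such as 70–82% sensitivity at 0.02–0.34 false positives per image.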

Fig. 1

Detection of lung nodules on chest X-ray.

A. Chest X-ray image shows nodular opacity at juxta-diaphragmatic right basal lung (arrow). B. Corresponding CT image shows 1.5-cm solid nodule at right lower lobe of lung (arrow). C. DL algorithm successfully detected nodule with output probability score of 25% (Courtesy of authors, DL algorithm is same as that in study by Nam et al. (30)). CT = computed tomography, DL = deep learning

Detection of Multiple Abnormalities on CXR

In addition to lung nodules, DL-based algorithms have shown good performance for various thoracic diseases, such as pulmonary tuberculosis (area under receiver operating characteristic curve [AUC], 0.83–0.99) (28, 29, 44, 45), pneumonia (maximum AUC in internal validation, 0.93) (46), and pneumothorax (AUC, 0.82–0.91) (32, 47), and in the evaluation of medical devices on CXRs (48, 49, 50). However, algorithms specific to a single disease or abnormality may have limited value in real clinical practice, as the interpretation of CXR requires the assessment of various diseases and abnormalities in the thorax. In 2017, Wang et al. (51) reported a large open-source dataset including 112120 CXRs from 30805 patients, which were labeled for 14 thoracic abnormalities, provided by the U.S. National Institutes of Health (50). The authors reported benchmark performances of a DL algorithm for the identification of various abnormalities; they showed AUCs ranging from 0.66 (for identification of pneumonia) to 0.87 (for identification of hernia) with an average AUC value of 0.75 (51). Several subsequent studies reported better performances in detecting specific abnormalities using the same dataset (average AUC values, 0.76–0.81) (31, 52, 53). More recently, additional large-scale open-source CXR datasets labeled for various abnormalities have been released (e.g., CheXpert dataset (54): 224316 CXRs labeled for 14 findings from Stanford Hospital, US; MIMIC-CXR dataset (55): 371920 CXRs labeled for 14 findings from Beth Israel Deaconess Medical Center, US; PADChest dataset (56): 160868 CXRs labeled for 174 findings from Hospital San Juan, Spain) (Table 2). Investigations of DL-based CAD in the detection of multiple abnormalities on CXRs must continue for the time being.
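As a reminder of what the per-finding and average AUC values summarize, the sketch below computes the AUC as the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal image (ties counted as one half), and then averages it over findings. The function names and the toy scores are assumptions made for illustration and do not reproduce any cited benchmark.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_auc(per_finding):
    """Average the AUC over findings.

    per_finding: dict mapping a finding name to (scores_pos, scores_neg).
    """
    values = [auc(pos, neg) for pos, neg in per_finding.values()]
    return sum(values) / len(values)


# Made-up scores for two hypothetical findings.
toy = {
    "finding_a": ([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]),
    "finding_b": ([0.7, 0.6], [0.6, 0.2]),
}
for name, (pos, neg) in toy.items():
    print(name, round(auc(pos, neg), 3))
print("average", round(mean_auc(toy), 3))
```

Because an average hides the spread between findings, an average AUC of 0.75 can coexist with a per-finding AUC as low as 0.66, as in the benchmark described above.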

Table 2

Major Large-Scale Open-Source Datasets of CXR

Name of Dataset	Distributor	Data Source	No. of Data	Labels	Location
ChestX-ray14 (51)	US National Institutes of Health	National Institutes of Health Clinical Center (US)	112120 CXRs from 30805 patients	14 radiological findings	https://nihcc.app.box.com/v/ChestXray-NIHCC
CheXpert (54)	Stanford University	Stanford Hospital (US)	224316 CXRs from 65240 patients	14 radiological findings*	https://stanfordmlgroup.github.io/competitions/chexpert/chexpert/
MIMIC-CXR (55)	Massachusetts Institute of Technology	Beth Israel Deaconess Medical Center (US)	371920 CXRs from 65383 patients	14 radiological findings*	https://physionet.org/content/mimic-cxr/2.0.0/
PADChest (56)	Medical Imaging Databank of Valencia Region	Hospital San Juan (Spain)	160868 CXRs from 67625 patients	174 radiologic findings; 19 differential diagnoses; 104 anatomic locations	http://bimcv.cipf.es/bimcv-projects/padchest/

*Identical labels.

Differentiation of various abnormalities on CXR using DL algorithms can be a challenging task, as various thoracic diseases have overlapping radiologic findings (Fig. 2). In a study by Hwang et al. (27), a DL algorithm could accurately differentiate pneumothorax (accuracy: 95%) from parenchymal diseases (lung cancer, tuberculosis, pneumonia) while exhibiting much lower performance for the differentiation of three parenchymal diseases (accuracy: 21–84%). Despite the limited performance in differential diagnosis, overlapping radiologic findings of various diseases may help the detection of rare, non-targeted diseases using the DL algorithm, considering that training a DL algorithm to cover all diseases that can be found on CXRs is virtually impossible. In our recent study, a DL-based algorithm that had been trained for four diseases (lung cancer, tuberculosis, pneumonia, and pneumothorax) could identify clinically referable CXRs, including those with not only target diseases of the algorithm (sensitivity: 87.9–93.6%), but also non-target diseases of the algorithm (sensitivity: 73.9–82.6%) with an AUC of 0.95, in the emergency department (57). The algorithm also exhibited higher sensitivities compared to those of on-call radiology residents (sensitivity, 65.6%).

Fig. 2

Detection and differentiation of different abnormalities on chest X-ray.

Chest X-ray (A) and CT (B) obtained on same day from patient with pulmonary edema show consolidation in both lung fields, bilateral pleural effusion, and mild cardiomegaly. DL algorithm classified chest X-ray image as abnormal, with 82% probability score. C. Algorithm identified Csns, PEf, and Cm on chest X-ray and localized each abnormality separately. Notably, algorithm recognized focal area of dense consolidation in right lower lung field as nodule (Courtesy of authors, DL algorithm is same as that in study by Kim et al. (132)). Cm = cardiomegaly, Csn = consolidation, Ndl = nodule, PEf = pleural effusion

Screening for Tuberculosis on CXR

Detection of pulmonary tuberculosis is another important task of CAD with a high potential for clinical application. The World Health Organization (WHO) recommends systematic screening for active tuberculosis in high-risk populations to reduce the global burden of tuberculosis (58). Although the choice of screening algorithm depends on the population and availability of diagnostic modalities, CXR plays a key role in the suggested screening algorithms (58). However, although CXR has good diagnostic ability for tuberculosis (sensitivity of 87% and specificity of 89% for tuberculosis-related abnormalities) (58, 59, 60), the number of expert radiologists able to interpret CXRs is limited, especially in high-prevalence countries. In this regard, a commercialized CAD system for tuberculosis (CAD4TB, Delft imaging systems, 's-Hertogenbosch, The Netherlands) has been tested in various screening settings, exhibiting AUCs ranging from 0.71 to 0.84 (58). However, as of 2016, the WHO recommendation is that CAD should be used only in research because of the limited evidence regarding its diagnostic accuracy (61). DL may boost the performance of CAD systems for tuberculosis. Recent studies using DL algorithms have shown promising performances (AUC, 0.83–0.99) (Fig. 3) (28, 29, 44, 45, 62), suggesting a potential for implementation. Furthermore, in a study by Hwang et al. (28), the DL algorithm (AUC, 0.99) outperformed human readers (AUC, 0.75–0.97), including thoracic radiologists, and the performance of human readers improved after reviewing the algorithm results (AUC, 0.75–0.97 to 0.85–0.98). Thus far, however, DL algorithms have been tested on datasets collected for case-control studies (45). Further investigations on the diagnostic performance of DL-based algorithms are required in actual screening or triage situations to prove their applicability in real-world practice.

Fig. 3

Identification of chest X-ray with active pulmonary tuberculosis.

A. Chest X-ray of patient with active pulmonary tuberculosis shows subtle nodular infiltration in right upper lung field (arrow). B. Corresponding CT image shows clustered centrilobular nodules and mild bronchiectasis in right upper lobe of lung (arrow). C. DL algorithm successfully detected lesion, with heat map overlaid on chest X-ray (Courtesy of authors, DL algorithm is same as that in study by Hwang et al. (28)).

Lung Cancer Screening with Low-Dose CT

With cumulative evidence of reduced lung cancer mortality following screening using low-dose chest CTs (63, 64, 65), nationwide systemic lung cancer screening programs have been implemented or are expected to be implemented in the near future (66, 67). The workload of radiologists is expected to increase with the implementation of lung cancer screening (68). DL may help radiologists in various aspects of the interpretation of low-dose CTs for lung cancer screening. Lung nodule detection is a classic task for CAD in the field of chest CTs (19). In 2016, a challenge for lung nodule detection called LUng Nodule Analysis (LUNA) 16 was held (69), and the best-performing DL algorithms exhibited over 90% per-nodule sensitivity at operating points ranging from 0.125 to 8 false positives per examination (70). Classification or categorization of lung nodules is another key task in lung cancer screening. A DL algorithm may categorize lung nodules based on their size, location, number, calcification, internal consistency, or existing criteria, such as the Lung CT Screening Reporting and Data System (Lung-RADS), to reduce inter-reader variability among radiologists (71), or directly classify each lung nodule based on its likelihood of malignancy (72, 73, 74). Per-examination level classification, that is, classifying CT examinations directly into those with and without lung cancer, is another potential strategy. In 2017, Kaggle, a representative online data science competition community, held a competition called “Data Science Bowl 2017” with a task of predicting lung cancer diagnosis within one year of a single CT examination (75). With a similar strategy, the Google AI team published a remarkable study (77). In the study, a DL algorithm evaluated a full set of low-dose chest CT images, with or without a prior CT examination for comparison. The algorithm output the likelihood that the subject would be diagnosed with lung cancer. Compared to the Lung-RADS categorization by thoracic radiologists, the algorithm exhibited better performance using a single CT volume (sensitivity, 79.5–95.2% vs. 62.5–90.0%; specificity, 81.3–96.5% vs. 69.7–95.3%; varied by threshold for positive results) and on-par performance with radiologists using two CT volumes (previous and current CTs; sensitivity, 72.5–87.5% vs. 70.0–86.7%; specificity, 84.2–96.5% vs. 83.7–96.3%; varied by threshold for positive results). Although the ability to explain the output and the possibility of integration with the current workflow are questionable, the high performance of DL algorithms in the diagnosis of lung cancer, without any intervention by radiologists, is impressive.
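The sensitivity and specificity ranges above are reported as varying with the threshold for a positive result. A minimal sketch of that dependence follows, using made-up per-examination likelihood scores. The function name, the scores, and the labels are illustrative assumptions and are unrelated to the cited study's data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    labels: 1 for examinations with confirmed cancer, 0 otherwise.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up likelihood scores for eight examinations (three with cancer).
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.35, 0.50, 0.70):
    sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Lowering the threshold trades specificity for sensitivity, so a fair comparison between an algorithm and a categorical reading such as Lung-RADS has to be made at matched operating points.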

Classification of Diffuse Lung Diseases on CT

DL can also be utilized in the interpretation of CTs of patients with diffuse lung diseases. The classification of radiologic findings of interstitial lung disease (ILD) is prone to high intra- and inter-reader variabilities, and DL technologies may help reduce this variability. Several studies reported that DL algorithms can classify CT findings of ILD (e.g., honeycombing, reticulation, ground-glass opacity, and consolidation) (78, 79, 80). Furthermore, in a recent study by Walsh et al. (81), DL could classify CTs with fibrotic lung disease according to existing guidelines (82, 83), exhibiting overall accuracies of 73.3% and 76.4% for different test datasets. The algorithm exhibited better accuracy than 66% of radiologists. More recently, Christe et al. (84) reported an end-to-end DL algorithm that could segment the lung and airway, classify and segment different findings of lung parenchyma, and finally, classify the examination-level diagnosis based on the current criteria for idiopathic pulmonary fibrosis (83). The algorithm exhibited similar performance to two radiologists (overall accuracy, 56%). DL can also be utilized to evaluate chronic obstructive pulmonary disease (COPD). In a study by González et al. (85), a DL algorithm could identify CTs with COPD with a C-statistic of 0.856 in a cohort from the COPDGene study. The DL algorithm classified CTs of patients with different stages of COPD, exhibiting accuracies of 51.1% and 29.4% in different cohorts.

Beyond Detection and Diagnosis: DL for Novel Imaging Biomarkers

Most investigations of DL in the field of radiology to date have focused on detection of radiologic abnormalities or identifying diseases. However, the prediction of patient prognosis or therapeutic response may be another potential application of DL. Recently, Lu et al. (86) reported that a DL-based risk score obtained from CXR images could predict long-term all-cause mortality. The authors validated the DL-based risk score in cohorts from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening trial and National Lung Screening Trial and found a graded association of the mortality rate and risk score, independent of age, sex, and the radiologists' interpretation (86). In a study by González et al. (85), the DL algorithm could predict the occurrence of acute respiratory diseases (C-statistic, 0.55–0.64) and death (C-statistic, 0.60–0.72) from chest CT images. In a study by Hosny et al. (76), a DL algorithm could predict 2-year overall survival after radiotherapy (AUC, 0.70) or surgery (AUC, 0.71) for non-small cell lung cancer, outperforming conventional machine learning techniques. DL can also be utilized in radiogenomics research. In a study by Wang et al. (87), a DL algorithm could predict the mutation of epidermal growth factor receptor from CT images, with an AUC of 0.81 in an independent cohort, outperforming the conventional method of using hand-crafted CT features. Although early investigations have shown the promising performance of novel DL-based imaging biomarkers, outperforming conventional techniques, thorough validation might be warranted for those novel DL-based imaging biomarkers, as the prediction of patient outcomes is a less-intuitive task than lesion detection or image classification.

Applications of AI in Quantitative Imaging Analyses

Segmentation

There have been continuous efforts to extract quantitative biomarkers from chest CT images to evaluate various diseases (88, 89, 90, 91). More recently, radiomics, extracting high-throughput quantitative features from images to predict diagnosis or prognosis, has emerged as an important field of radiologic research (92). The accurate segmentation of specific anatomic structures or pathologic findings of interest might be a gateway for those quantitative image analyses. However, manual segmentation by radiologists is extremely time-consuming and practically impossible in daily practice. DL has shown excellent performance in image segmentation and image classification (93, 94). The excellent performance of DL algorithms has been reported in the segmentation of various anatomic structures in chest CTs, including the lung parenchyma (95, 96), pulmonary lobes (97), airways (98), and lung nodules (99, 100).

Image Neutralization

Another barrier in quantitative image analyses is the variation in reconstructed images caused by differences in scanners and in scanning and reconstruction protocols (101, 102, 103). For radiomics, those variations have been indicated as the major source of variability in radiomic features, limiting the reproducibility of results and generalizability of radiomics (101, 104, 105, 106). DL can help overcome this barrier by neutralizing images with various image styles. Using a specific type of DL algorithm called a generative adversarial network, DL can generate a new image with a different image texture while maintaining the image content (107). Recently, Lee et al. (108) reported that a DL algorithm could convert CT images into those of different reconstruction kernels and reduce the variability in the quantification of emphysema using converted CT images. In subsequent studies, the group reported reduced variability in radiomic features by utilizing DL-based CT image reconstruction kernel conversion (Fig. 4) (109) and slice thickness reduction techniques (110).

Fig. 4

Conversion of reconstruction kernel on chest CT.

CT images reconstructed with soft kernel (A) and sharp kernel (B) from single scan of patient with lung nodule showing different image textures, which may cause variability in radiomic features of lung nodule. DL algorithm could generate CT image with similar texture to that of soft kernel image from sharp kernel image (C), and vice versa (D). Utilizing generated images with similar textures, variability in radiomic features can be reduced compared to that when using images with different textures (Courtesy of Sang Min Lee, University of Ulsan College of Medicine, Asan Medical Center, DL algorithm is same as that in study by Choe et al. (109)).

Image Quality Improvement

Optimizing image quality while minimizing the radiation dose is a major issue in clinical practice. In the previous decade, the iterative reconstruction (IR) technique achieved remarkable advancements and contributed to image noise reduction on CT images (111, 112, 113). However, there are several limitations to IR: 1) vendor-specific technologies requiring sinogram data from the CT scanner; 2) over-smoothing of images resulting in the loss of anatomic structures, such as the interlobar fissures; and 3) production of unfamiliar image textures (112). These limitations can potentially be overcome by DL-based image generation, which provides images with less noise but an image style more familiar to radiologists, similar to images from filtered back projection. Some vendors have already demonstrated DL-based noise reduction algorithms (114, 115). Furthermore, DL-based noise reduction can be applied independent of vendors or scanners, as the DL algorithm can generate new images from reconstructed CT images, without sinogram data. DL may also contribute to the improvement of CXR image quality. In a study by Ahn et al. (116), DL-based software could generate CXR images simulating those obtained with grids from those obtained without the utilization of grids. The subjective image quality and radiologists' preference improved in those generated grid-like images, without the additional radiation dose that the utilization of grids would require.

Service Delivery Scenarios of DL Systems in the Clinical Workflow

The two classic scenarios of integrating CAD into the clinical workflow are add-on and stand-alone scenarios (Fig. 5) (117, 118). In the add-on scenario, radiologists check the results from CAD during (concurrent reader) or after (second reader) image interpretation. In previous studies, the performance of radiologists improved after reviewing the output of DL algorithms when identifying CXRs with malignant nodules (30), active tuberculosis (28), or major thoracic disease (27). In the stand-alone scenario, CAD may automatically classify CXRs without intervention from radiologists. In this scenario, the CAD system may require a more thorough validation of its diagnostic performance and reliability, and should be utilized only in selected clinical situations with narrow tasks (e.g., screening for specific diseases in the healthy population), particularly when the availability of radiologists is limited.

Fig. 5

Delivery scenarios for DL-based CAD systems.

DL-based CAD system can be utilized as assistance tool to enhance diagnostic accuracy of radiologists as concurrent (A) or second reader (B). C. In select situations in which radiologists' interpretations are unavailable, DL-based CAD system may interpret images alone to identify patients requiring referral. D. In triage scenario, CAD system may analyze images before radiologists' interpretations to triage examinations based on presence of findings requiring immediate diagnosis and management and prioritize radiologists' worklists to improve turnaround time for examinations with critical findings. E. Finally, in prescreening scenario, CAD system may analyze large volumes of examinations before radiologists' interpretation to identify clearly negative cases, and radiologists may then only interpret remaining uncertain examinations. AI = artificial intelligence, CAD = computer-aided detection

The third scenario of CAD integration into the clinical workflow is the triage of examinations. In this scenario, CAD makes a provisional analysis of each image before radiologists' interpretation and can prioritize the work list in terms of the criticality of the disease or abnormalities. Consequently, when there is a large volume of examinations with limited radiologist availability, such prioritization may help reduce the turnaround time for examinations with critical findings and prevent a delay in treatment. This concept of prioritization has been mainly investigated in the field of neuroradiology, in which the timely diagnosis and management of acute neurologic diseases are critical (119). DL algorithms can automatically identify critical brain CT findings and perform prioritization to minimize delayed diagnosis (120, 121). For CXRs, Annarumma et al. (122) utilized a DL algorithm to identify CXRs with critical or urgent findings to prioritize examinations. In a simulation study, the median delay for CXRs with critical findings was reduced from 7.2 hours to 43 minutes with the application of DL-based automated prioritization (122). The last scenario for implementing CAD is a prescreening of negative examinations (123). This scenario could be utilized in selected clinical situations. A typical indication is a setting with very low disease prevalence, such as screening of an asymptomatic population (e.g., screening tuberculosis with CXR or screening lung cancer with low-dose CT), in which the availability of experts to interpret images is limited. In this scenario, CAD may analyze a large volume of examinations before the interpretation of radiologists to identify clearly negative examinations, and radiologists would interpret the remaining examinations that were positive or inconclusive in the CAD analysis. Such prescreening schemes may help radiologists to reduce the time burden of interpreting large volumes of negative examinations and to focus on more clinically relevant cases. To be utilized as a prescreening tool, CAD must have a reliably high sensitivity.
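The triage and prescreening scenarios can be pictured as two simple operations on a worklist: reordering it by the CAD output, and setting aside examinations that fall below a rule-out threshold. The sketch below is a toy illustration under stated assumptions. The tuple layout, the function names, and the threshold value are hypothetical, and a real deployment would sit inside PACS or a worklist manager rather than a Python list.

```python
def prioritize(worklist):
    """Order examinations by descending CAD score; ties keep arrival order.

    worklist: list of (exam_id, cad_score, arrival_index) tuples.
    """
    ordered = sorted(worklist, key=lambda exam: (-exam[1], exam[2]))
    return [exam_id for exam_id, _, _ in ordered]


def prescreen(worklist, rule_out_threshold):
    """Split examinations into CAD-negative ones and those left for radiologists.

    Examinations scoring below rule_out_threshold are treated as clearly
    negative; everything else (positive or inconclusive) goes to a reader.
    """
    clearly_negative = [e for e, score, _ in worklist if score < rule_out_threshold]
    to_read = [e for e, score, _ in worklist if score >= rule_out_threshold]
    return clearly_negative, to_read


# Made-up worklist: (examination id, CAD score, arrival order).
toy_worklist = [("A", 0.10, 0), ("B", 0.92, 1), ("C", 0.55, 2), ("D", 0.92, 3)]
print(prioritize(toy_worklist))
print(prescreen(toy_worklist, rule_out_threshold=0.2))
```

In the prescreening case, the rule-out threshold would need to be chosen on validation data so that sensitivity stays high, since any diseased examination scoring below it is never seen by a radiologist.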

Challenges in the Clinical Application of DL

Ability to Explain the DL Algorithm

For a DL algorithm to gain the trust and acceptance of radiologists, it should appropriately explain the logical background for its output (124, 125, 126). For example, let us consider a DL algorithm that can predict lung cancer from screening low-dose CT. In addition to the final output of the algorithm (i.e., the likelihood of lung cancer), radiologists or clinicians may want to know, in terms of their existing knowledge, why the algorithm produced that output. Was there any pulmonary nodule? What was the size, internal consistency, and location of the nodule? Were there specific features of the nodule or background lung that raised the suspicion of lung cancer? To solve this explainability issue (or “black-box” issue), the most common method is to utilize a saliency map (126, 127). By overlaying a saliency map on the input image (Fig. 3), one can visualize the specific areas of the image that contributed to the final output of the DL algorithm. Saliency maps would be good solutions in detection tasks (e.g., detection of pulmonary nodules) or classification algorithms with intuitive tasks (e.g., identification of cardiomegaly). However, they may be insufficient for non-intuitive tasks, such as the diagnosis of specific diseases or prognostication. Radiologic AI should provide interpretability, transparency, reproducibility, and high performance to receive credibility from radiologists and be implemented in clinical practice.
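One simple way to build intuition for saliency maps is occlusion sensitivity: mask one patch of the image at a time and record how much the model output drops. The sketch below uses this technique purely for illustration. It is not necessarily the method used in the cited studies, and the function names, the patch size, and the stand-in scoring function are assumptions.

```python
def occlusion_saliency(image, score_fn, patch=2, fill=0.0):
    """Occlusion-based saliency for a 2D image given as a list of rows.

    score_fn maps an image to a scalar output (here a stand-in for a model).
    Each pixel receives the drop in output observed when its patch is masked.
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, then mask one patch
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def toy_score(image):
    """Hypothetical model: responds only to the lower-right 2x2 region."""
    return sum(image[r][c] for r in (2, 3) for c in (2, 3)) / 4.0


toy_image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_saliency(toy_image, toy_score):
    print(row)
```

In this toy case only the lower-right patch receives a nonzero value, which matches the region the stand-in model depends on. For a real network the map shows where the output is sensitive, not why, which is one reason such maps may fall short for non-intuitive tasks.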

Validation in Actual Clinical Practice

Although previous studies have reported the excellent performance of various DL algorithms in various tasks, most were validated in the algorithms' development setting with retrospectively and conveniently collected datasets (128). Such conveniently collected datasets may have enriched disease prevalence and a narrow disease spectrum. In contrast, the population in the real-world situation may have a much lower disease prevalence and a much broader spectrum of diseases, some of which may not be covered during the development of the DL algorithm (129). Therefore, the excellent performance of the DL algorithm in the algorithms' development setting may not guarantee performance in real-world settings; thus, DL algorithms should be further validated in actual clinical situations before their clinical application. Indeed, in our recent study (57), a DL algorithm to identify CXRs with major thoracic disease showed a decreased performance in a diagnostic cohort comprising patients from the emergency department (AUC, 0.95), compared to that for the test datasets collected for the case-control study (AUC, 0.97–1.00). The ultimate goal of applying the DL algorithm is improving patient outcomes in clinical practice. However, validation of the DL algorithm with respect to patient outcomes might be a much more challenging task than the validation of diagnostic performance. Improved diagnostic performance does not necessarily mean improved patient outcomes, as there are multiple stages between diagnosis and patient outcomes, including patient referral, therapeutic decision making, and patient management (129). To date, little evidence exists regarding the influence of DL algorithms on patient outcomes or on their cost-effectiveness.
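One concrete reason that prevalence matters is that, even with sensitivity and specificity held fixed, the positive predictive value falls sharply as disease becomes rarer. The sketch below applies Bayes' rule to illustrative numbers. The 90% sensitivity and specificity and the prevalence values are assumptions chosen for the example, not results from any cited study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical algorithm, evaluated at an enriched case-control
# prevalence and at two lower prevalences closer to screening settings.
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} PPV={ppv:.3f} NPV={npv:.3f}")
```

With these assumed figures, nine of ten positive calls are correct at 50% prevalence, but only about one in twelve at 1% prevalence. A shift in disease spectrum, which this calculation does not model, can additionally change sensitivity and specificity themselves.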

Integration into the Daily Clinical Workflow

Even before the rise of DL, a number of software programs for computer-assisted image analysis had demonstrated acceptable performance and passed regulatory approval (17, 18, 19, 21, 25). However, most failed to survive in the daily clinical workflow (130, 131). Seamless integration with the existing daily clinical workflow is a prerequisite for the utilization of DL in daily clinical practice. Currently, most radiologists perform their daily practice using the picture archiving and communication system (PACS). Therefore, PACS should serve as a platform for the clinical implementation of various DL-based software programs in order for radiologists to utilize these software programs in their routine practice without limitation. In addition, integration with existing PACS or electronic health record systems is essential to enhance the clinical value of DL in improving workflow efficiency in the prioritization of examinations or critical value reporting.

CONCLUSION

In the history of radiology, the introduction of new technologies, including CT, magnetic resonance imaging, and PACS, has dramatically changed the clinical practice of clinicians and radiologists. AI and DL seem to be the “next big thing” in the field of radiology. Although DL is still in its infancy, accumulating evidence suggests that it has the potential to survive and change clinical practice. CXRs and chest CTs, the two main pillars of thoracic radiology, are now the first target modalities for the clinical implementation of DL beyond research. In this paper, we reviewed the applications of DL in thoracic radiology and discussed different scenarios regarding their implementation in clinical practice. We believe that finding the clinical scenarios in which DL works best is an important task for radiologists before its clinical implementation, and one that gives radiologists the initiative in shaping the evolution of radiology. Therefore, understanding the potential clinical applications and the remaining challenges will be essential for radiologists in the era of DL.
