
Applications of artificial intelligence in the thorax: a narrative review focusing on thoracic radiology.

Yisak Kim^1,2, Ji Yoon Park^3, Eui Jin Hwang^3, Sang Min Lee^4, Chang Min Park^1,2,3,5.

Abstract

OBJECTIVE: This review will focus on how AI (specifically, deep learning) can be applied to complement aspects of the current healthcare system. We describe how AI-based tools can augment existing clinical workflows by discussing the applications of AI to worklist prioritization and patient triage, the performance-boosting effects of AI as a second reader, and the use of AI to facilitate complex quantifications. We also introduce prominent examples of recent AI applications, such as tuberculosis screening in resource-constrained environments, the detection of lung cancer with screening CT, and the diagnosis of COVID-19. Finally, we provide examples of prognostic predictions and new discoveries beyond existing clinical practices.
BACKGROUND: Artificial intelligence (AI) has shown promising performance for thoracic diseases, particularly in the field of thoracic radiology. However, it has not yet been established how AI-based image analysis systems can help physicians in clinical practice.
METHODS: This review included peer-reviewed research articles on AI in the thorax published in English between 2015 and 2021.
CONCLUSIONS: With advances in technology and appropriate preparation of physicians, AI could address various clinical problems that have not been solved due to a lack of clinical resources or technological limitations.

KEYWORDS: Artificial intelligence (AI); deep learning (DL); computer aided diagnosis (CAD); thoracic radiology; pulmonary medicine.

2021 Journal of Thoracic Disease. All rights reserved.


Year: 2021 PMID: 35070379 PMCID: PMC8743417 DOI: 10.21037/jtd-21-1342

Source DB: PubMed Journal: J Thorac Dis ISSN: 2072-1439 Impact factor: 2.895

Introduction

Artificial intelligence (AI) techniques have shown promising performance in medicine, particularly in the field of medical image analysis. Convolutional neural network (CNN)-based deep learning (DL) models have shown performance equal to or even surpassing that of experts in various tasks, including the detection of retinal pathologies in fundus photographs (1-4), the interpretation of echocardiography (5-7) and screening mammography (8), and the diagnosis of major thoracic diseases on chest radiographs (CXR) and computed tomography (CT) (9-27). Radiology in respiratory medicine is a particularly important field in which a variety of AI applications are actively being developed. Major thoracic diseases such as lung cancer and tuberculosis are among the leading causes of death worldwide (28,29), and numerous radiologic studies have been performed to diagnose them. For example, CXR, which is often the first imaging study acquired to diagnose pathologies in the thorax, remains the most commonly performed radiologic exam worldwide, with an average of 238 CXRs acquired per 1,000 population annually (30). Because DL is notoriously data-hungry, thoracic radiology, with its relatively large imaging databases, has become an active area of DL research. Recently, Nam et al. showed that a DL algorithm detecting 10 common abnormalities on CXR could improve radiologists' performance and shorten the reporting time for critical and urgent cases (18). Similarly, Seah et al. showed that a DL algorithm significantly improved the classification accuracy of radiologists for 102 clinical findings (31). As of 2021, more than 13 US Food and Drug Administration–cleared AI algorithms for pulmonary diseases were available (32), and they are expected to be implemented in daily clinical practice sooner or later. This review will focus on how AI (specifically, DL) can be applied to complement aspects of the current healthcare system.
We included peer-reviewed research articles on AI in the thorax published in English between January 2015 and July 2021. A PubMed literature search was performed on July 16, 2021, using the search terms “(artificial intelligence OR machine learning OR deep learning) AND (thorax OR pulmonary OR respiratory OR chest OR lung) AND medicine”. Because this search returned more than 3,600 papers and this is a narrative review, articles were selected by reviewing their titles and abstracts to provide a general understanding of the topic (Table 1).
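The search above can be reproduced programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and makes no network call; the helper name `build_esearch_url`, the exact date window, and the `english[la]` language filter are assumptions for illustration (the review does not state how the language criterion was applied), and hit counts obtained today will likely differ from the 3,600 reported in July 2021.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Boolean query quoted in the text above.
QUERY = (
    "(artificial intelligence OR machine learning OR deep learning) "
    "AND (thorax OR pulmonary OR respiratory OR chest OR lung) "
    "AND medicine"
)


def build_esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 0) -> str:
    """Build an ESearch URL limited to a publication-date window.

    retmax=0 requests only the hit count, not the list of PMIDs.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,  # format YYYY/MM/DD
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumption: "english[la]" approximates the English-language criterion.
url = build_esearch_url(QUERY + " AND english[la]", "2015/01/01", "2021/07/31")
print(url)
```

Fetching this URL returns a JSON document whose `esearchresult` object contains the hit count; the title-and-abstract screening described above would still have to be done by hand.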

Table 1

Cohort size, validation, algorithm type and reported performance for selected studies in thoracic radiology

First author (ref.)	Journal (year)	Cohort sizes	Source of data	Algorithm type	Task	Performance
Nam (18)	Eur Respir J (2021)	146,717 CXRs from 108,053 patients	Data collected from Seoul National University Hospital	ResNet34-based	Detect 10 common abnormalities on CXR	AUROC 0.895–1.00 in the CT-confirmed external dataset and 0.913–0.997 in the PadChest dataset
Seah (31)	Lancet Digit Health (2021)	821,681 CXRs from 284,649 patients	MIMIC, I-MED, ChestX-ray14, CheXpert, and PadChest	EfficientNet-based for classification and U-Net-based for segmentation	CXR interpretation across 127 clinical findings	AUROC 0.954–0.959 in the MIMIC and I-MED
Huang (33)	NPJ Digit Med (2020)	1,797 CTPA studies from 1,773 patients	CTPA dataset collected from a single institution	3D CNN PENet	Detect PE on volumetric CTPA scans	AUROC of 0.82–0.87 on the hold-out internal test set and 0.81–0.88 on the external dataset
Hata (34)	Eur Radiol (2021)	170 non-contrast-enhanced CT from 170 patients	Data collected from a single institution	Xception-based	Detect AD on non-contrast-enhanced CT	Accuracy, sensitivity, and specificity of 90.0%, 91.8%, and 88.2%
Hwang (16)	Radiology (2019)	89,834 CXRs for training and CXRs from 1,135 patients for validation	Data collected from Seoul National University Hospital	DenseNet-based	Detect four major thoracic diseases on CXRs	AUROC of 0.93–0.96 for the validation dataset
Hasenstab (35)	Radiol Cardiothorac Imaging (2021)	CT from 8,951 patients	Data collected from the COPD Genetic Epidemiology study	Deep CNN	Stage the severity of COPD through quantification of CT	Stages correlated with the GOLD criteria, with AUROC of 0.86–0.96
Chassagnon (36)	Radiol Artif Intell (2020)	CT from 208 patients	Data collected from a single institution	SegNet autoencoder-based AtlasNet	Assessment of the extent of systemic sclerosis–related ILD	Dice similarity coefficients of 0.74–0.75 for ILD contours
Hwang (20)	Clin Infect Dis (2019)	60,989 CXRs from 50,593 patients	Data collected from Seoul National University Hospital and 6 external multicenter validation datasets	27-layer deep CNN	Detect active pulmonary tuberculosis on CXRs	AUROC of 0.977–1.000 for classification and AUAFROC of 0.973–1.000 for localization in external datasets
Ciompi (37)	Sci Rep (2017)	1,805 nodules from 943 patients	Data from the Multicentric Italian Lung Detection trial	Deep CNN	Classifying lung nodules into 6 classes	Average accuracy of 72.9%
Ardila (38)	Nat Med (2019)	42,290 CT scans from 14,851 patients	Data from the National Lung Screening Trial	Mask-RCNN, RetinaNet, Inception V1 and 3D Inception	Predict the risk of lung cancer based on CT	AUROC of 0.94
Harmon (39)	Nat Commun (2020)	CT from 1,280 patients for training and 1,337 patients for validation	Data from four international centers	AH-Net and DenseNet 121-based	Detect COVID-19 pneumonia on CT	Accuracy, sensitivity, and specificity of 90.8%, 84%, and 93%
González (40)	Am J Respir Crit Care Med (2018)	CT from 7,983 COPDGene participants and 1,672 ECLIPSE participants	Data from COPDGene and ECLIPSE	Deep CNN	Acute respiratory disease event and mortality prediction on CT	Acute respiratory disease event prediction (C-index, 0.64 and 0.55 for internal and external validation) and mortality prediction (C-index, 0.72 and 0.60)
Hosny (41)	PLoS Med (2018)	CT from 1,194 patients	Data from 7 independent datasets across 5 institutions	3D CNN	2-year mortality prediction of NSCLC patients	AUROC of 0.70 and 0.71 for 2-year mortality after the start of radiotherapy and after surgery, respectively
Lu (42)	JAMA Netw Open (2019)	CXRs from 57,813 patients	Data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and the National Lung Screening Trial	CXR-risk CNN	12-year mortality prediction from a CXR	The very high CXR-risk group had mortality of 53.0% (PLCO) and 33.9% (NLST)
Chao (43)	Nat Commun (2021)	CT from 10,730 patients	Data from the National Lung Screening Trial and Massachusetts General Hospital	Tri2D-Net-based	Predict cardiovascular mortality with low-dose-CT	AUROC of 0.734–0.801
Raghu (44)	JACC Cardiovasc Imaging (2021)	CXRs from 116,035 individuals	Data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and the National Lung Screening Trial	Deep CNN	Estimate biological age from a CXR to predict longevity	A 5-year increase in CXR-Age carried a higher risk of all-cause mortality than a 5-year increase in chronological age

CXR, chest radiograph; CNN, convolutional neural network; COPD, chronic obstructive pulmonary disease; CTPA, computed tomography pulmonary angiography; NSCLC, non-small-cell lung cancer; ILD, interstitial lung disease; PE, pulmonary embolism; AD, aortic dissection; AUROC, area under the receiver operating characteristic curve; AUAFROC, area under the alternative free-response receiver operating characteristic curve; GOLD, Global Initiative for Chronic Obstructive Lung Disease; PLCO, Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; NLST, National Lung Screening Trial.

The rest of the paper is organized as follows. In the “Application schemes of AI tools in clinical practices” section, we describe how AI-based tools can augment existing clinical workflows by discussing the applications of AI to worklist prioritization and patient triage, the performance-boosting effects of AI as a second reader, and the use of AI to facilitate complex quantifications. In the “Potential examples of AI-assisted clinical practice for thoracic diseases” section, we introduce prominent examples of recent AI applications, such as tuberculosis screening in resource-constrained environments, the detection of lung cancer with screening CT, and the diagnosis of coronavirus disease 2019 (COVID-19). In the “Prognostic prediction and new discoveries” section, we provide examples of prognostic predictions and new discoveries beyond existing clinical practices. Lastly, we close our review with a discussion of challenges and future directions of AI applications in thoracic radiology. We present the following article in accordance with the Narrative Review reporting checklist (available at https://dx.doi.org/10.21037/jtd-21-1342).

Application schemes of AI tools in clinical practices

It is now well known that AI can show expert-level performance in interpreting medical images, but how AI-based tools can help physicians in clinical practice has not yet been established. One of the classic ways of integrating AI-based tools into existing clinical workflows is the triage scenario, in which an AI system makes a provisional analysis and prioritizes the worklist according to the urgency of detected findings (11). This scenario has mainly been investigated in emergency medicine, in which the timely diagnosis and management of acute diseases are critical. Another major use of AI in clinical workflows is the add-on scenario (11), in which physicians check the results of the AI system during or after their interpretation to improve their diagnostic performance. These results could include the probability that a radiologic study is abnormal and the localization of a specific disease, or quantitative values for further analysis of pathologies in imaging studies.

AI-based triage and worklist prioritization

Worklist prioritization is an important application of AI in thoracic radiology. It is clinically relevant especially in the emergency department (ED), where the timely diagnosis and management of acute diseases can be critical. In the United States, there were over 130 million visits to EDs in 2011, accounting for approximately 15% of all hospital visits (45-47). As pneumonia and respiratory symptoms have become among the most common reasons for ED visits, the use of radiologic studies has increased significantly over the past two decades, particularly the use of CXR, a primary examination tool for evaluations in the ED (48). Because the number of physicians, including radiologists, has not kept pace with the growing number of imaging tests, provisional analysis and prioritization by AI systems could directly improve the clinical outcomes of patients for whom timely diagnosis is critical. Nam et al. (18) performed simulated reading tests of CXRs from ED patients with and without AI assistance and found that radiologists detected significantly more critical (29.2% to 70.8%) and urgent (78.2% to 82.7%) abnormalities when aided by the AI system; furthermore, AI assistance shortened the time-to-report for CXRs of critical and urgent cases (from 3,371.0 to 640.5 s and from 2,127.1 to 1,840.3 s, respectively). Figure 1 shows an example of using AI tools for worklist prioritization.
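The sorting described for an AI-integrated worklist can be expressed as a simple ordering rule. The following is a minimal sketch under stated assumptions: the `Exam` fields, the 0.5 urgency threshold, and the tie-breaking order are hypothetical choices for illustration, not the logic of any particular PACS or CAD product.

```python
from dataclasses import dataclass, field


@dataclass
class Exam:
    """One chest radiograph on the reading worklist (hypothetical fields)."""
    accession: str
    abnormal_prob: float  # AI probability that any abnormality is present
    urgent_findings: dict = field(default_factory=dict)  # finding -> probability

    @property
    def max_urgent_prob(self) -> float:
        return max(self.urgent_findings.values(), default=0.0)


def prioritize(worklist, urgent_threshold=0.5):
    """Return exams ordered so that AI-flagged urgent cases are read first.

    Sort key (an assumed policy, not a vendor rule):
      1. exams with an urgent finding at or above the threshold come first,
         ordered by that finding's probability;
      2. all other exams follow, ordered by overall abnormality probability.
    """
    def key(exam):
        flagged = exam.max_urgent_prob >= urgent_threshold
        urgent_rank = -exam.max_urgent_prob if flagged else 0.0
        return (not flagged, urgent_rank, -exam.abnormal_prob)

    return sorted(worklist, key=key)


worklist = [
    Exam("A1", abnormal_prob=0.10),
    Exam("A2", abnormal_prob=0.85, urgent_findings={"pneumothorax": 0.96}),
    Exam("A3", abnormal_prob=0.70, urgent_findings={"pneumothorax": 0.20}),
    Exam("A4", abnormal_prob=0.90),
]
print([exam.accession for exam in prioritize(worklist)])  # ['A2', 'A4', 'A3', 'A1']
```

Exam A3 carries a low pneumothorax score, so under this rule it is ranked only by its overall abnormality probability; a real deployment would tune the threshold per finding.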

Figure 1

Implementation of AI CAD into a PACS system for prioritization of chest radiographs. An AI-integrated PACS system can display the results of analysis by AI CAD on the exam list (A). It can show not only the presence of any abnormal finding but also the presence of urgent findings requiring timely interpretation (e.g., pneumothorax), along with the corresponding probability scores. An interpreting radiologist can sort chest radiographs by the presence and type of urgent findings or by the corresponding probability scores, so that chest radiographs with urgent findings are interpreted first. A chest radiograph of a 73-year-old female patient shows left hydropneumothorax (B). The AI CAD system identified the pneumothorax with a probability score of 96% (C). CAD, computer aided diagnosis; PACS, picture archiving and communication system.

Prioritization of CT images is another important topic in the ED. For example, the use of computed tomography pulmonary angiography (CTPA) in the ED to diagnose pulmonary embolism (PE) has increased 27-fold over the past two decades (49,50). A triage tool that automatically identifies clinically important PEs and prioritizes the CT images of patients with PE could improve care pathways through more efficient diagnosis. Huang et al. (33) developed a DL algorithm to automatically detect PE on volumetric CTPA scans as an end-to-end solution. Without requiring computationally intensive and time-consuming preprocessing, it achieved areas under the receiver operating characteristic curve (AUCs) of 0.84 and 0.85 for detecting PE in the internal test set and external dataset, respectively. Aortic dissection (AD) is another common emergency that is often fatal. Contrast-enhanced CT is the most commonly used diagnostic modality for AD (51), but detecting and prioritizing acute AD on non-contrast-enhanced CT is also useful in the ED. Hata et al. (34) developed a DL algorithm to detect AD on non-contrast-enhanced CT.
With a cutoff value of 0.400, the DL algorithm achieved accuracy, sensitivity, and specificity of 90.0%, 91.8%, and 88.2%, respectively, whereas the radiologists' median accuracy, sensitivity, and specificity were 88.8%, 90.6%, and 94.1%, respectively. The performance of the DL algorithm did not differ significantly from the average performance of the radiologists, suggesting its potential for provisional analysis and prioritization of CT images in the ED.
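Accuracy, sensitivity, and specificity all depend on the probability cutoff chosen for the algorithm's output (0.400 in Hata et al.). A minimal sketch of how the three numbers follow from a cutoff, assuming that a score at or above the cutoff counts as positive; the function name and the eight scores are hypothetical.

```python
def metrics_at_cutoff(probs, labels, cutoff):
    """Accuracy, sensitivity and specificity when prob >= cutoff counts as positive."""
    tp = fp = tn = fn = 0
    for prob, label in zip(probs, labels):
        predicted = prob >= cutoff
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # assumes at least one positive case
    specificity = tn / (tn + fp)  # assumes at least one negative case
    return accuracy, sensitivity, specificity


# Hypothetical outputs for 4 dissection cases (label 1) and 4 controls (label 0).
probs = [0.95, 0.62, 0.41, 0.30, 0.55, 0.12, 0.05, 0.38]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(metrics_at_cutoff(probs, labels, cutoff=0.400))  # (0.75, 0.75, 0.75)
```

Raising the cutoff to 0.9 on the same data gives (0.625, 0.25, 1.0): specificity rises while sensitivity collapses, which is why a triage tool's cutoff has to be chosen with the clinical cost of missed cases in mind.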

AI as a second reader

Many recent studies have compared radiologists' performance in image interpretation with and without AI assistance. For identifying abnormalities on CXRs, such as active tuberculosis (20), malignant nodules (21), or major thoracic diseases (19), AI assistance led to meaningful improvements in physician readers' performance. In this scheme, the AI system provides probability values for specific diseases with or without localization information, and physicians check these results during or after their image interpretation. An example of this type of clinical workflow is presented in Figure 2.

Figure 2

Identification of a lung nodule on a chest radiograph using an AI CAD system. A chest radiograph of a 71-year-old male patient shows a nodular opacity in the right upper lung field (A, arrow). The AI CAD system identified the nodule with a probability score of 90% (B). Chest CT of the patient shows an irregular nodule with air-bronchogram and a ground-glass opacity component in the right upper lobe of the lung (C,D). The nodule was proven to be lung cancer after surgery. CAD, computer aided diagnosis.

Nam et al. (18) developed a DL algorithm that detects 10 common abnormalities (pneumothorax, mediastinal widening, pneumoperitoneum, nodule/mass, consolidation, pleural effusion, linear atelectasis, fibrosis, calcification, and cardiomegaly) on CXRs and provides visual localization of the abnormality. Similarly, Seah et al. (31) conducted a study in which 20 radiologists reviewed CXRs across 127 clinical findings with and without the assistance of a DL algorithm; the radiologists achieved a higher area under the curve (AUC) when assisted by the DL algorithm (AUC, 0.808; 95% CI, 0.763–0.839) than when not assisted (AUC, 0.713; 95% CI, 0.645–0.785). The DL algorithm significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings and was statistically non-inferior for 19 (15%) findings; no finding showed a decrease in accuracy when radiologists used the DL algorithm. The added value of AI assistance is particularly prominent in specific situations such as emergencies. CXR is a simple and widely accessible imaging modality; however, its interpretation is not easy and often requires a high level of expertise and experience. Many studies have found substantial discordance in CXR interpretation in the emergency department (ED), with rates ranging from 0.3% to 17% (52-54).
Such misinterpretation and discordant interpretation of critical cases could directly influence patients' clinical courses and outcomes. Furthermore, physicians in the ED often have limited time or opportunity to consult an on-call radiologist (55). Hwang et al. (16) investigated whether a commercially available DL algorithm could enhance clinicians' reading performance for clinically relevant abnormalities on CXRs in the ED setting. Assistance from the DL algorithm improved the sensitivity of radiology residents' interpretations from 65.6% to 73.4%. Later, in 2020, Kim et al. (12) reported that with the support of a DL algorithm, physicians' diagnostic performance for pneumonia improved (sensitivity, 53.2% to 82.2%; specificity, 88.7% to 98.1%). AI assistance is commonly thought to have a synergistic effect because it can catch missed findings or possible mistakes, much like the second reader in double reading. Double reading is generally considered to add value in diagnostic radiology (56), and a highly accurate AI system can act as a competent second reader to reduce perceptual errors (57). However, the interaction between AI systems and physicians is still poorly understood (58), and further research is needed to maximize the synergistic effects of AI.
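The reader-study AUCs quoted above summarize how well confidence ratings separate diseased from non-diseased cases. One standard way to compute an empirical AUC is the Mann-Whitney pairwise form sketched below; the ratings are invented, the function name is hypothetical, and a real reader study such as Seah et al.'s would use a multi-reader multi-case analysis that also models reader and case variability, which this sketch omits.

```python
from itertools import product


def empirical_auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): the share of (diseased, non-diseased)
    pairs in which the diseased case gets the higher score; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    credit = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            credit += 1.0
        elif pos == neg:
            credit += 0.5
    return credit / (len(positives) * len(negatives))


# Hypothetical 5-point confidence ratings from one reader on six CXRs.
labels = [1, 1, 1, 0, 0, 0]
unaided = [3, 2, 4, 2, 3, 1]
aided = [4, 3, 5, 3, 2, 1]
print(round(empirical_auc(unaided, labels), 3))  # 0.778
print(round(empirical_auc(aided, labels), 3))    # 0.944
```

The pairwise loop is quadratic in the number of cases, which is acceptable for reader-study sizes; larger datasets would use the equivalent rank-sum formula.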

Automatic quantification for complex quantitative analysis

AI applications in the add-on scenario can provide quantification results from medical images. Many studies have sought quantitative biomarkers on chest CT for lung cancer, chronic obstructive pulmonary disease (COPD), and interstitial lung disease (ILD) (59-62). However, manual quantification is extremely time-consuming and practically impossible in routine clinical practice. AI-based quantification has therefore been actively researched, and several recent studies have shown that AI can make quantitative analysis highly accurate and time-efficient through automatic segmentation of the lung parenchyma (63,64), pulmonary lobes (65), airways (66), and pulmonary pathologies (67,68). Hasenstab et al. (35) developed a DL algorithm to stage the severity of COPD by quantifying emphysema and air trapping on chest CT images. They proposed five CT-based COPD stages based on the percentage of emphysema and total lung involvement. The proposed stages correlated with the spirometry-based Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria, with AUCs of 0.86–0.96, and predicted disease progression (odds ratio, 1.50–2.67) and mortality (hazard ratio, 2.23; P<0.001) both with and without the GOLD criteria. Similarly, Chassagnon et al. (36) developed a multicomponent deep neural network (AtlasNet) for the automatic assessment of the extent of systemic sclerosis–related ILD on chest CT images. On CT images from patients with systemic sclerosis–related ILD, AtlasNet performed similarly to radiologists in contouring disease extent, which correlated with pulmonary function. The median Dice similarity coefficients (DSCs) between the readers' contours and the DL ILD contours ranged from 0.74 to 0.75, whereas the median DSCs between the contours of different radiologists ranged from 0.68 to 0.71. Figures 3 and 4 present examples of automatic quantification of COPD and ILD on CT, respectively.
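The DSC used to compare AtlasNet's contours with the radiologists' contours measures voxel overlap: twice the shared area divided by the sum of the two areas, ranging from 0 (no overlap) to 1 (identical). A pure-Python sketch on toy masks; the function name is hypothetical, and treating two empty masks as perfect agreement is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are 2-D lists of 0/1 with the same number of voxels (one CT slice
    here for brevity; a 3-D volume works the same after flattening).
    """
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Convention (an assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy contours: a reader's ILD mask and a model's mask shifted one column right.
reader = [[0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
model = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
print(round(dice_coefficient(reader, model), 3))  # 0.667
```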

Figure 3

Fully automated quantification of non-enhanced chest CT using AI software in a 67-year-old man with chronic obstructive pulmonary disease (FEV1/FVC =33%, FEV1 =37%). (A) Axial, sagittal, coronal, and volume rendering images of fully automated lung and lobe segmentation results using an AI engine (Aview, version 1.1.39.6; Coreline Soft, Seoul, South Korea). (B) Coronal image with an LAA (under −950 HU) mask (red box) and a results table (red box) of the quantification analysis, with results such as volume, the LAA under −950 HU, mean lung density, and percentile index based on the lung and lobe segmentation. (C) Volume-rendering image (red box) and results table (yellow box) of a segmented airway with quantification results, including bronchus level, wall thickness, wall area, wall area percent, lumen diameter, lumen area, and tapering ratio. FEV1/FVC, forced expiratory volume in one second/forced vital capacity; LAA, low attenuation area.

Figure 4

Fully automated quantification of non-enhanced chest CT using AI software in a 66-year-old man with usual interstitial pneumonia (FEV1/FVC =74%, FVC =53%, FEV1 =56%). (A) Axial, sagittal, coronal, and volume-rendering images of fully automated lung and lobe segmentation results using an AI engine (Aview, version 1.1.39.6; Coreline Soft, Seoul, South Korea). (B) Axial images with/without a lung texture segmentation mask [red = honeycombing (H), orange = reticular opacity (R), cyan = ground glass opacity (G), blue = consolidation (C), yellow = emphysema (E)]. (C) Pie chart (red box) and results table (yellow box) with quantification of texture analysis, based on the lung and lobe segmentation. FEV1/FVC, forced expiratory volume in one second/forced vital capacity.

Radiomics features are another important quantitative biomarker that has recently emerged as a new field of radiologic research (69). Studies have suggested that these measures can be strong indicators of lung cancer prognosis and phenotype (70,71), but radiomics also has challenging components, such as accurate segmentation and reproducibility across devices and institutions (69,71,72). DL is considered a promising way to solve these problems: it has shown excellent segmentation performance on chest CT, which can automate time-consuming manual segmentation, and it can generate various image styles while preserving image content, which may improve radiomics reproducibility (73).
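The COPD quantification figure reports the low attenuation area below −950 HU (LAA-950), mean lung density, and a percentile index. A minimal sketch of these density indices, computed from voxels already segmented as lung, follows; reading the "percentile index" as the 15th percentile (Perc15), counting voxels strictly below −950 HU, and the helper name `density_indices` are assumptions, not details of the Aview software.

```python
def density_indices(lung_hu, laa_threshold=-950.0, percentile=15.0):
    """Return (%LAA, mean lung density, percentile density) for lung voxels.

    lung_hu: Hounsfield-unit values of voxels already inside a lung mask.
    %LAA counts voxels strictly below laa_threshold; the percentile uses
    linear interpolation between the two closest ranks.
    """
    values = sorted(float(v) for v in lung_hu)
    if not values:
        raise ValueError("lung mask is empty")
    n = len(values)
    laa_percent = 100.0 * sum(v < laa_threshold for v in values) / n
    mean_density = sum(values) / n
    rank = percentile / 100.0 * (n - 1)
    lower = int(rank)
    upper = min(lower + 1, n - 1)
    perc_density = values[lower] + (rank - lower) * (values[upper] - values[lower])
    return laa_percent, mean_density, perc_density


# Ten hypothetical lung voxels (a real scan has millions).
voxels = [-980, -960, -940, -900, -870, -850, -820, -800, -780, -700]
laa, mld, perc15 = density_indices(voxels)
print(f"LAA-950 {laa:.1f}%, MLD {mld:.1f} HU, Perc15 {perc15:.1f} HU")
# LAA-950 20.0%, MLD -860.0 HU, Perc15 -953.0 HU
```

All three indices presuppose an accurate lung mask, which is why the automated lung and lobe segmentation shown in the figures comes first.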

Potential examples of AI-assisted clinical practice for thoracic diseases

This section describes some prominent applications of AI-assisted systems for major thoracic diseases. We cover tuberculosis screening in resource-constrained environments, lung cancer detection on chest CT, and the diagnosis of COVID-19 through imaging studies.

Tuberculosis screening in resource-constrained environments

Although it was displaced by COVID-19 in 2020, tuberculosis was the leading cause of death among infectious diseases until 2019 (29). Because of the high proportion of undetected patients and the potential to reduce mortality through early detection and treatment, the World Health Organization (WHO) has recommended systematic screening for tuberculosis in people at risk since 2013 (74). CXR is an effective screening tool for both children and adults because of its reasonably high sensitivity and specificity for tuberculosis detection (75). However, in low- and middle-income countries (LMICs), CXR often shows lower sensitivity and specificity for tuberculosis detection than expected, which is related to the lack of well-trained radiologists. For example, the sensitivity and specificity of tuberculosis diagnosis by CXR were 78% and 51%, respectively, at a center in Nepal, and 88.6% and 82.9% in Yogyakarta, Indonesia (76,77). Furthermore, van't Hoog et al. compared the sensitivity and specificity of tuberculosis screening methods in seven countries, including Kenya, Cambodia, and Vietnam; although CXR outperformed symptom-based screening, there was substantial variation across countries (78). Various attempts have been made to overcome this limitation, and computer-aided diagnosis (CAD) has emerged as a potential solution for tuberculosis screening. Because tuberculosis presents with heterogeneous radiologic findings, early AI models based on hand-crafted features did not show satisfactory performance and were therefore not applied in screening practice. In recent years, however, the application of DL has led to remarkable improvements in tuberculosis screening models. In 2016, Hwang et al. (79) developed a tuberculosis screening model using a deep CNN, an approach that was then in the spotlight in the image processing field.
They trained the model with 10,848 CXR images and tested it on datasets from Korea, the United States, and China, obtaining AUCs of 0.964, 0.88, and 0.93, respectively. In 2019, Hwang et al. (20) developed a 27-layer deep CNN model and validated it on six external multicenter, multinational datasets. The model showed a sensitivity of 94.3–100% and a specificity of 91.1–100%, with significantly higher performance in both classification and localization than a group of physicians consisting of non-radiology physicians, board-certified radiologists, and thoracic radiologists. Figure 5 shows an example of identifying tuberculosis using a DL-based AI solution in clinical practice.

Figure 5

Identification of a chest radiograph from a patient with active pulmonary tuberculosis using an AI CAD system. A chest radiograph of a 52-year-old male patient with a cough shows clustered consolidation and nodules at the left lung apex (A, arrows). The AI CAD system identified the lesion with a probability score of 86% (B). Chest CT of the patient shows irregular consolidation and micronodules with bronchiectasis in the left upper lobe of the lung (C,D). The patient was diagnosed with active pulmonary tuberculosis by sputum acid-fast bacilli culture. CAD, computer aided diagnosis.

Driven by these advances in DL technology, the updated WHO guideline in 2020 recommended the use of CAD for tuberculosis screening in individuals aged 15 years and older in populations for which tuberculosis screening is recommended (75). The performance of commercial CAD systems has been found to be non-inferior to that of physicians in various regions, including LMICs. Qin et al. (80) tested three DL systems on an Xpert MTB/RIF assay-confirmed dataset from Nepal and Cameroon and obtained AUCs of 0.92, 0.94, and 0.94, respectively. When sensitivity was matched to that of the radiologists, two of the three systems had significantly higher specificity than the radiologists. Khan et al. (81) applied commercial DL algorithms in Pakistan and showed that their sensitivity and specificity met the WHO targets of 90% and 70%, respectively. CAD could therefore enable tuberculosis screening in LMICs, improving health equity and accessibility and reducing mortality due to tuberculosis.
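Both the matched-sensitivity comparison of Qin et al. and the WHO targets depend on choosing an operating threshold for the CAD abnormality score. A minimal sketch of one way to do this, assuming that scores at or above the cutoff are called positive: take the highest cutoff that reaches the target sensitivity, then read off the specificity and check it against the 90%/70% targets quoted above. The scores and helper names are hypothetical, and in practice the cutoff would be chosen on one dataset and checked on an independent one.

```python
import math

# Minimum sensitivity and specificity cited in the text (WHO targets).
TARGET_SENS = 0.90
TARGET_SPEC = 0.70


def cutoff_for_sensitivity(scores, labels, target_sens):
    """Highest score cutoff (score >= cutoff is positive) reaching target_sens."""
    if not 0 < target_sens <= 1:
        raise ValueError("target_sens must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive cases")
    needed = math.ceil(target_sens * len(positives) - 1e-9)  # guard float error
    return positives[needed - 1]


def sensitivity_specificity(scores, labels, cutoff):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CAD scores for 10 culture-confirmed TB cases and 10 controls.
tb = [0.97, 0.93, 0.91, 0.88, 0.85, 0.80, 0.76, 0.72, 0.64, 0.35]
controls = [0.81, 0.66, 0.52, 0.40, 0.33, 0.28, 0.20, 0.15, 0.10, 0.05]
scores = tb + controls
labels = [1] * len(tb) + [0] * len(controls)

cutoff = cutoff_for_sensitivity(scores, labels, TARGET_SENS)
sens, spec = sensitivity_specificity(scores, labels, cutoff)
meets = sens >= TARGET_SENS and spec >= TARGET_SPEC
print(cutoff, sens, spec, meets)  # 0.64 0.9 0.8 True
```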

Lung cancer screening programs

In 2020, 2.21 million people were newly diagnosed with lung cancer, making it the second most commonly diagnosed cancer after breast cancer. Lung cancer caused 1.79 million deaths, the most of any cancer (28). The US National Lung Screening Trial (NLST) research team found that screening for lung cancer with low-dose CT (LDCT) in high-risk populations reduced lung cancer mortality by 20% (82). The US Preventive Services Task Force currently recommends lung cancer screening with LDCT for adults aged 50 to 80 years who have a smoking history of at least 20 pack-years and who currently smoke or have quit within the past 15 years (83). The European Society of Radiology and the European Respiratory Society also recommend lung cancer screening in routine clinical practice at certified multidisciplinary medical centers (84). However, the growing number of CT scans exceeds what radiologists can read, and the high false-positive rate puts a strain on the lung cancer management system. In this context, CAD could help address personnel shortages and false positives in lung cancer screening programs. CAD can be used in two ways in these programs: to mark the location of pulmonary nodules and to determine whether detected nodules are malignant. For nodule detection, CAD models developed before DL techniques did not perform well enough to be implemented in clinical practice (85-87). Performance gradually improved with CNN-based DL models, and in 2016, the LUng Nodule Analysis (LUNA) challenge was held for nodule detection and false-positive reduction based on 888 annotated CT scans. The best individual model in the challenge achieved a sensitivity of 93%, and combined models reached 95% or more (88).
Beyond simple detection of pulmonary nodules, recent studies have focused on determining whether a detected nodule is malignant. Ciompi et al. (37) developed a DL model that classified lung nodules as solid, non-solid, part-solid, calcified, perifissural, or spiculated, and it showed improved accuracy compared with conventional machine learning (79.5% vs. 39.9%). Its accuracy was not inferior to that of six physician observers (69.6% vs. 72.9%). Ardila et al. (38) developed a model to predict the risk of lung cancer based on CT scans from NLST cases. The model had an AUC of 0.94 and, when no prior CT images were provided, showed an 11% reduction in false-positive rates and a 5% reduction in false-negative rates compared with radiologists. An example of a lung nodule identified on screening low-dose chest CT using an AI system is presented in Figure 6.
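Detection challenges such as LUNA16 evaluate nodule detectors with free-response ROC (FROC) analysis, summarized as a competition performance metric (CPM): the mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sketch below computes such a score from a few FROC operating points; linear interpolation with clamping at the ends is a simplifying assumption rather than the official evaluation script, and the operating points are invented.

```python
# False-positive rates per scan at which the LUNA16 CPM samples the FROC curve.
FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def interpolate(x, xs, ys):
    """Linear interpolation on strictly increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("xs must be strictly increasing")


def competition_performance_metric(fps_per_scan, sensitivities):
    """Average FROC sensitivity over the seven predefined false-positive rates."""
    points = [interpolate(r, fps_per_scan, sensitivities) for r in FP_RATES]
    return sum(points) / len(points)


# Hypothetical FROC operating points of a nodule detector.
fps = [0.1, 1.0, 10.0]
sens = [0.60, 0.80, 0.95]
print(round(competition_performance_metric(fps, sens), 3))  # 0.759
```

Averaging over several false-positive rates rewards detectors that stay sensitive at the low false-positive rates a screening radiologist can tolerate, not only at a single generous operating point.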

Figure 6

Identification and classification of a lung nodule on screening low-dose chest CT using an AI system. A screening low-dose chest CT scan of a 57-year-old ex-smoker (45 pack-years, quit smoking 7 years before) shows a small nodule with a cystic appearance at the left lower lobe of the lung (A). An AI system automatically identified the nodule. The average diameter of the nodule measured by the AI system was 10.2 mm, corresponding to Lung-RADS category 4A (B). A chest CT obtained 2 years later shows growth of the nodule, which was proven to be lung cancer (C).
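The caption maps the AI-measured average diameter (10.2 mm) to Lung-RADS category 4A. A minimal sketch of that size lookup, assuming the Lung-RADS version 1.1 thresholds for a solid nodule on a baseline screen (2: below 6 mm; 3: 6 to below 8 mm; 4A: 8 to below 15 mm; 4B: 15 mm or more) and a mean diameter rounded to one decimal place. It deliberately omits the part-solid, ground-glass, cystic, growth-based, and 4X rules, so it is an illustration rather than a substitute for the full ACR criteria; the function names are hypothetical.

```python
def mean_diameter_mm(long_axis_mm, short_axis_mm):
    """Mean of long- and short-axis diameters, rounded to one decimal place."""
    return round((long_axis_mm + short_axis_mm) / 2, 1)


def lung_rads_baseline_solid(mean_diam_mm):
    """Simplified Lung-RADS v1.1 size category for a solid nodule at baseline.

    Ignores part-solid/ground-glass/cystic rules, new or growing nodules,
    and 4X upgrades for suspicious features.
    """
    if mean_diam_mm < 6:
        return "2"
    if mean_diam_mm < 8:
        return "3"
    if mean_diam_mm < 15:
        return "4A"
    return "4B"


print(lung_rads_baseline_solid(10.2))                        # 4A (size in the figure)
print(lung_rads_baseline_solid(mean_diameter_mm(7.5, 6.5)))  # 3 (hypothetical axes)
```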

Diagnosis of COVID-19 in pandemic areas

Since its emergence in late 2019, COVID-19 has spread worldwide, with more than 202 million confirmed cases and more than 4.2 million deaths as of August 8, 2021 (89). Healthcare workers have been devoting more time and energy to COVID-19-related duties while facing shortages of equipment, supplies, and staff (90). To address these problems, researchers have rushed to develop AI models to support clinicians. The primary diagnostic method for COVID-19 is the detection of SARS-CoV-2 in respiratory specimens by real-time reverse transcriptase-polymerase chain reaction (RT-PCR). Although RT-PCR is the gold standard for diagnosing COVID-19, imaging can complement it to achieve greater diagnostic certainty and can even serve as an alternative in regions where RT-PCR is not readily available. In some cases, CXR may show abnormalities in patients whose initial RT-PCR test was negative (91), and several recent studies have shown that chest CT has a higher sensitivity for COVID-19 than RT-PCR and can be considered a screening tool in pandemic areas (92-94). Applying AI methods to radiologic imaging of COVID-19 might improve diagnostic accuracy relative to RT-PCR while also easing the shortage of healthcare workers in pandemic areas. For example, AI can assist in the automated diagnosis and screening of COVID-19 through analysis of CXR (95), CT (96-98), and lung ultrasonography (99) images. Harmon et al. (39) showed that a DL algorithm achieved up to 90.8% accuracy, with 84% sensitivity and 93% specificity, in detecting COVID-19 pneumonia on chest CT using multinational datasets. In addition, some studies have suggested that AI can help radiologists distinguish COVID-19 from other pulmonary infections on CXR (22) and chest CT (100,101).
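Accuracy, sensitivity, and specificity, as reported by Harmon et al., all come from the same confusion matrix, but accuracy also depends on the proportion of positive cases in the test set. The sketch below uses hypothetical counts chosen so that sensitivity is 84% and specificity 93%, matching the figures quoted above, and shows that accuracy shifts when only the class balance changes. The class name and the counts are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # diseased cases flagged as positive
    fn: int  # diseased cases missed
    tn: int  # non-diseased cases correctly called negative
    fp: int  # non-diseased cases flagged as positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical test sets: 100 COVID-19 cases each, with 100 or 900 non-COVID-19 cases.
balanced = ConfusionMatrix(tp=84, fn=16, tn=93, fp=7)
skewed = ConfusionMatrix(tp=84, fn=16, tn=837, fp=63)
for name, cm in [("balanced", balanced), ("skewed", skewed)]:
    print(name, f"sens={cm.sensitivity:.2f}", f"spec={cm.specificity:.2f}", f"acc={cm.accuracy:.3f}")
```

Both sets have the same sensitivity (84%) and specificity (93%), yet accuracy is (84 + 93)/200 = 88.5% in the balanced set and (84 + 837)/1,000 = 92.1% in the skewed one, which is why accuracy is best read alongside sensitivity and specificity.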
AI models have the potential to exploit the vast amount of multimodal data collected from patients and, if successful, could transform the detection, diagnosis, and triage of patients with suspected COVID-19. Figure 7 shows an example of AI-assisted interpretation of a CXR with COVID-19-associated pneumonia.

Figure 7

Identification of pneumonia associated with COVID-19 on a chest radiograph using an AI CAD system. A chest radiograph of a 54-year-old man with COVID-19 shows diffusely increased opacities in both lung fields (A). The AI CAD system identified the opacities with a probability score of 99% (B). Chest CT of the patient shows ground-glass opacities in the periphery of both lungs, suggesting COVID-19-associated pneumonia (C,D). COVID-19, coronavirus disease 2019; CAD, computer aided diagnosis.

Despite the potential of AI-assisted practice, many limitations remain in deploying AI tools in the clinical workflow. Roberts et al. (102) found that none of the 415 papers selected for their study had a sufficiently documented manuscript describing a reproducible method, a methodology that followed best practices for developing AI models, and sufficient external validation. Further studies are required to address these issues before AI can take its place in the clinic.

Prognostic prediction and new discoveries

Most AI systems for thoracic diseases have focused on assisting the detection and diagnosis of radiologic abnormalities or diseases on imaging studies. Because the radiologic findings associated with specific thoracic diseases are well understood, AI systems are trained and evaluated to mimic the clinical practice of detecting disease-related findings. However, radiologists rarely know the outcomes of patients a decade after a radiologic examination, so it is difficult to identify imaging features with long-term prognostic value. DL algorithms can extract features independently from large amounts of data and therefore have the potential to find novel imaging biomarkers. Prognostication and prediction of therapeutic response may thus be another important application of AI for thoracic diseases. Early attempts to use AI for prognostic prediction were based on patients' clinical information (such as demographics, laboratory test results, treatment types, or gene expression data), and machine learning techniques have already outperformed existing survival prediction models (103,104). However, unlike mortality prediction from structured clinical information, prognostic prediction from image data is challenging. Hand-crafted feature extraction from images can help in this setting, but some important information is inevitably lost in the process. To address this, in 2017, González et al. (40) introduced a CNN-based DL model that relied only on CT image data to predict acute respiratory disease events (C-index, 0.64 and 0.55 for internal and external validation, respectively) and mortality (C-index, 0.72 and 0.60, respectively) in smokers. To compare conventional machine learning techniques with a DL-based prognostication model that uses imaging data alone, Hosny et al. (41) presented a DL model for predicting 2-year mortality in patients with non-small-cell lung cancer (NSCLC). The model, trained and evaluated on seven independent NSCLC datasets from five institutions, outperformed existing structured data-based techniques, with AUCs of 0.70 and 0.71 for 2-year mortality after the start of radiotherapy and after surgery, respectively. Lu et al. (42) presented CXR-risk, a CNN-based DL model that predicts 12-year mortality from a single CXR. The high performance of CXR-risk demonstrated the ability of CNN models to extract hidden prognostic information from medical images. Additionally, in a recent study by Chao et al. (43), a DL model based on a CNN and Tri2D-Net successfully predicted cardiovascular mortality from LDCT images (AUC, 0.768), outperforming existing DL models. DL may also lead to discoveries beyond current clinical practice. Raghu et al. (44) developed a DL algorithm that estimates biological age (CXR-Age) from a chest radiograph to predict longevity beyond chronological age. Interestingly, their external validation on the PLCO and NLST populations showed significantly improved prediction of both long-term all-cause and cardiovascular mortality when CXR-Age was used instead of chronological age (44).
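The C-index values reported by González et al. measure how well predicted risk orders patients by observed survival time while accounting for censored follow-up; 0.5 corresponds to random ordering and 1.0 to perfect ordering. The sketch below implements Harrell's concordance index from its definition, assuming that a higher score means higher risk; the function name and data are hypothetical, and skipping pairs with tied times is a simplification (published implementations treat ties in different ways).

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and was observed
    for a strictly shorter time than subject j. The pair is concordant when i
    also has the higher predicted risk; tied risk scores count as one half.
    Pairs with tied times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up follow-up times (years), event indicators (True = died), and model risk scores.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risks = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(times, events, risks))  # 4 of 5 comparable pairs concordant -> 0.8
```

The censored subject (follow-up ended at 6 years without an event) still contributes as the longer-surviving member of pairs, which is how the index uses incomplete follow-up without treating censoring as survival to the end of the study.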

Challenges and future directions of AI in pulmonary medicine

Applications of AI for thoracic diseases have shown promising results in augmenting existing clinical systems and in prognostic prediction. However, substantial limitations remain. For augmenting existing systems, as in the add-on scenario, it is not clear how to integrate AI tools effectively with physician decision-makers. Indeed, some randomized controlled trials showed no improvement in clinical outcomes with AI assistance (105,106). The interaction between AI models and human users is poorly understood, and little work has evaluated the potential impact of such systems. Interestingly, Gaube et al. (58) reported that radiologists rated advice as lower in quality when it appeared to come from an AI system, whereas non-radiology physicians with less task expertise did not. Diagnostic accuracy was significantly worse when participants received inaccurate advice, regardless of its purported source. Their work highlighted the importance of both the quality of advice and how advice (from AI and non-AI sources alike) is delivered in clinical environments. Studies of DL models that can ingest multimodal data, including imaging data, clinical information, and ideally genetic information, are also needed. Currently, even the most advanced DL models for radiology applications consider only pixel values, without information on the clinical background (107). In practice, however, relevant clinical information allows clinicians to interpret imaging results in the appropriate clinical context, which supports clinical decision-making and improves diagnosis and prognosis. Beyond thoracic diseases, several DL models have shown improved prognostic performance from imaging data when complemented by other multimodal data. For example, in a study by Cheerla et al. (108), a pancancer survival prediction model integrating clinical, mRNA, miRNA, and whole-slide imaging data achieved a C-index of 0.78. As reported by Warth et al. (109), there is a clear correlation between the morphological features of pathological images and genetic data in adenocarcinomas. This suggests that integrating clinical and genetic data into image-based DL models may improve their performance. Reproducibility and generalizability are also important issues. In many studies reporting excellent results, AI systems were tested only on internal validation data that were collected retrospectively and without rigorous design (110). Such validation data may have an enriched disease prevalence and a narrow disease spectrum, whereas real-world populations may have a much lower disease prevalence and a much broader spectrum of diseases, which can degrade the performance of DL algorithms. Thus, AI systems need further validation in real-world settings before they are implemented in clinical practice. AI systems also need to be explainable: to gain acceptance from physicians, a DL algorithm should appropriately explain the logical basis of its output (111). In particular, if a DL algorithm is used for prognostic prediction beyond existing clinical systems, clinicians may want to understand, in terms of their existing knowledge, why the algorithm provides certain outcomes. AI systems should therefore be interpretable if they are to gain credibility with physicians and be implemented in clinical practice.
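The effect of prevalence described above can be made concrete with Bayes' rule. The sketch below assumes a hypothetical model whose sensitivity and specificity both stay fixed at 90% and computes the positive predictive value (PPV), the probability that a positive output is a true positive, at several disease prevalences. The function name is illustrative, and the calculation captures only the prevalence effect; a broader disease spectrum in real-world data can also change sensitivity and specificity themselves, which this sketch does not model.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV from Bayes' rule: P(disease | positive result)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_pos = sensitivity * prevalence                  # P(positive and diseased)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(positive and not diseased)
    if true_pos + false_pos == 0.0:
        raise ValueError("the test never returns a positive result")
    return true_pos / (true_pos + false_pos)


# Hypothetical model with 90% sensitivity and 90% specificity.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

With identical sensitivity and specificity, the PPV falls from 90% in an enriched validation set with 50% prevalence to 50% at 10% prevalence and about 8.3% at 1% prevalence, so a model that looks excellent on enriched data may produce mostly false alarms in a screening population.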

Conclusions

AI has shown considerable potential for many thoracic diseases, particularly in thoracic radiology. This promising technique is expected to address various clinical problems that have remained unsolved owing to a lack of clinical resources or technological limitations. AI could be a cost-effective and excellent second reader, providing automated quantification and prioritization. In addition, AI models could be used to improve screening for tuberculosis and lung cancer, as well as for prognostic prediction. It is necessary not only to advance the performance of AI systems but also to understand and discuss how AI should be used in clinical practice. With advances in technology and appropriate preparation of physicians, AI could transform current medical practice and contribute to improving human health.

References

1. Increasing Utilization of Chest Imaging in US Emergency Departments From 1994 to 2015.

Authors: Jonathan H Chung; Richard Duszak; Jennifer Hemingway; Danny R Hughes; Andrew B Rosenkrantz
Journal: J Am Coll Radiol Date: 2019-01-02

2. Fully Automated Lung Lobe Segmentation in Volumetric Chest CT with 3D U-Net: Validation with Intra- and Extra-Datasets.

Authors: Jongha Park; Jihye Yun; Namkug Kim; Beomhee Park; Yongwon Cho; Hee Jun Park; Mijeong Song; Minho Lee; Joon Beom Seo
Journal: J Digit Imaging Date: 2020-02

3. Quantitative computed tomography of chronic obstructive pulmonary disease.

Authors: Harvey O Coxson; Robert M Rogers
Journal: Acad Radiol Date: 2005-11

4. Reduced lung-cancer mortality with low-dose computed tomographic screening.

Authors: Denise R Aberle; Amanda M Adams; Christine D Berg; William C Black; Jonathan D Clapp; Richard M Fagerstrom; Ilana F Gareen; Constantine Gatsonis; Pamela M Marcus; JoRean D Sicks
Journal: N Engl J Med Date: 2011-06-29

5. Quantitative computed tomography measurements to evaluate airway disease in chronic obstructive pulmonary disease: Relationship to physiological measurements, clinical index and visual assessment of airway disease.

Authors: Atsushi Nambu; Jordan Zach; Joyce Schroeder; Gongyoung Jin; Song Soo Kim; Yu-Il Kim; Christina Schnell; Russell Bowler; David A Lynch
Journal: Eur J Radiol Date: 2016-09-13

6. Reduction of "callbacks" to the ED due to discrepancies in plain radiograph interpretation.

Authors: C A Preston; J J Marr; K K Amaraneni; B S Suthar
Journal: Am J Emerg Med Date: 1998-03

7. Disease Staging and Prognosis in Smokers Using Deep Learning in Chest Computed Tomography.

Authors: Germán González; Samuel Y Ash; Gonzalo Vegas-Sánchez-Ferrero; Jorge Onieva Onieva; Farbod N Rahaghi; James C Ross; Alejandro Díaz; Raúl San José Estépar; George R Washko
Journal: Am J Respir Crit Care Med Date: 2018-01-15

8. Deep learning with multimodal representation for pancancer prognosis prediction.

Authors: Anika Cheerla; Olivier Gevaert
Journal: Bioinformatics Date: 2019-07-15

9. Deep Learning to Assess Long-term Mortality From Chest Radiographs.

Authors: Michael T Lu; Alexander Ivanov; Thomas Mayrhofer; Ahmed Hosny; Hugo J W L Aerts; Udo Hoffmann
Journal: JAMA Netw Open Date: 2019-07-03

10. Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems.

Authors: Zhi Zhen Qin; Melissa S Sander; Bishwa Rai; Collins N Titahong; Santat Sudrungrot; Sylvain N Laah; Lal Mani Adhikari; E Jane Carter; Lekha Puri; Andrew J Codlin; Jacob Creswell
Journal: Sci Rep Date: 2019-10-18
