
Deep learning in structural and functional lung image analysis.

Joshua R Astley1,2, Jim M Wild2, Bilal A Tahir1,2.   

Abstract

The recent resurgence of deep learning (DL) has dramatically influenced the medical imaging field. Medical image analysis applications have been at the forefront of DL research efforts applied to multiple diseases and organs, including those of the lungs. The aims of this review are twofold: (i) to briefly overview DL theory as it relates to lung image analysis; (ii) to systematically review the DL research literature relating to the lung image analysis applications of segmentation, reconstruction, registration and synthesis. The review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. 479 studies were initially identified from the literature search with 82 studies meeting the eligibility criteria. Segmentation was the most common lung image analysis DL application (65.9% of papers reviewed). DL has shown impressive results when applied to segmentation of the whole lung and other pulmonary structures. DL has also shown great potential for applications in image registration, reconstruction and synthesis. However, the majority of published studies have been limited to structural lung imaging with only 12.9% of reviewed studies employing functional lung imaging modalities, thus highlighting significant opportunities for further research in this field. Although the field of DL in lung image analysis is rapidly expanding, concerns over inconsistent validation and evaluation strategies, intersite generalisability, transparency of methodological detail and interpretability need to be addressed before widespread adoption in clinical lung imaging workflow.

Year:  2021        PMID: 33877878      PMCID: PMC9153705          DOI: 10.1259/bjr.20201107

Source DB:  PubMed          Journal:  Br J Radiol        ISSN: 0007-1285            Impact factor:   3.629


Introduction

Respiratory diseases constitute significant global health challenges; five respiratory diseases are among the most common causes of death. 65 million people suffer from chronic obstructive pulmonary disease (COPD) and 339 million from asthma.[1,2] There are 1.8 million new lung cancer cases diagnosed annually and 1.6 million deaths worldwide, making it the most common and deadliest cancer on the planet.[3] Lung imaging is a critical component of respiratory disease diagnosis, treatment planning, monitoring and treatment assessment. Acquiring lung images, processing them and interpreting them clinically are crucial to achieving global reductions in lung-related deaths. Traditionally, the techniques employed to quantitatively analyse these images evolved from the disciplines of computational modelling and image processing; however, in recent years, deep learning (DL) has received significant attention from the lung imaging community. DL is a subfield of machine learning that employs artificial neural networks with multiple deep or hidden layers. Whilst the fundamental theory was posited several decades ago,[4] DL gained international interest in 2012 when AlexNet, a type of neural network referred to as a convolutional neural network (CNN), won the ImageNet Large Scale Visual Recognition Challenge. That paper has been cited over 47,000 times and triggered a renaissance in DL research.[5] Subsequently, CNNs, and DL more generally, began to impact the medical imaging field profoundly. 
Development of fully convolutional networks such as V-Net and ConvNet demonstrated how deep-layered architectures could provide valuable functions in solving some of the field’s most critical applications, including common image analysis tasks.[6,7] Increased computational power due to the reduced cost of graphical processing units (GPUs) and publicly available annotated imaging data sets have since led to rapid developments and applications.[8] This review assesses the current literature on DL’s role in lung image analysis applications, discusses critical limitations for clinical adoption, and sets out a roadmap for future research.

Theory

Artificial neural networks

An artificial neural network (ANN), inspired by biological neurons, can be thought of as a series of connected nodes containing weights and biases which are combined using an activation function to produce an activation; the activation determines the strength of connections within the network. At the heart of DL is optimisation; an ANN learns by optimising weights and biases for a generalisable solution. This optimisation occurs in a two-step process of forward propagation and backpropagation. A basic diagram of an ANN with two hidden layers and generalised examples of forward propagation and backpropagation are shown in Figure 1. The use of hidden layers in the network allows more freedom for the weights and biases to be optimised. Forward propagation refers to the process of feeding an example to the network during training where the output of the neural network is compared to a desired output and a loss is calculated using a loss function. Backpropagation uses this loss to propagate changes in weights and biases throughout the network; thus by continually providing new examples, known as iterations, the model is optimised to approximate the function between the input and output domains. Figure 2 provides a glossary of the key technical terms used in this review.
Figure 1.

Simplified diagrams of the processes of forward propagation (left) and backpropagation (right) for a neural network with two hidden layers. The neural network is represented as a series of nodes, each of which contains a weight and bias. The weight and bias are combined using the activation function to produce an activation that impacts the strength of connections within the network. Once an input has been passed through the network, it is compared to a desired output, such as an expert segmentation of an anatomical region of interest, to produce a loss. This loss is used to propagate changes to weights and biases, hence, changing the strength of connections for the subsequent example. The continued repetition of this two-step process is known as network training.

Figure 2.

Glossary of key technical terms related to deep learning and image analysis. ANN, artificial neural network.
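The two-step training process described above can be illustrated with a minimal NumPy sketch of a network with two hidden layers. The toy data, layer sizes, sigmoid activation, mean squared error loss and learning rate are all assumptions chosen for illustration; they are not taken from any of the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (hypothetical): 64 examples, 4 input features, one binary target.
x = rng.normal(size=(64, 4))
y = (x[:, :1] * x[:, 1:2] > 0).astype(float)

# Weights and biases for two hidden layers (8 nodes each) and one output node.
sizes = [4, 8, 8, 1]
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((1, n)) for n in sizes[1:]]

def forward(inputs):
    """Forward propagation: return the activations of every layer."""
    activations = [inputs]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ w + b))
    return activations

def loss_fn(pred, target):
    """Mean squared error between the network output and the desired output."""
    return float(np.mean((pred - target) ** 2))

learning_rate = 0.5  # assumed value for this toy problem
losses = []
for iteration in range(2000):
    acts = forward(x)
    losses.append(loss_fn(acts[-1], y))
    # Backpropagation: gradient of the loss w.r.t. the output pre-activation.
    delta = 2.0 * (acts[-1] - y) / len(x) * acts[-1] * (1.0 - acts[-1])
    for layer in reversed(range(len(weights))):
        grad_w = acts[layer].T @ delta
        grad_b = delta.sum(axis=0, keepdims=True)
        if layer > 0:
            # Propagate the gradient to the previous layer before updating weights.
            delta = (delta @ weights[layer].T) * acts[layer] * (1.0 - acts[layer])
        weights[layer] -= learning_rate * grad_w
        biases[layer] -= learning_rate * grad_b

print(f"loss before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

In practice, DL frameworks compute these gradients automatically; the explicit loop is shown only to make the forward and backward passes visible.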

The structure of a DL network is known as an architecture. In the medical imaging field, three key architectures, namely, CNNs, recurrent neural networks (RNNs) and generative adversarial networks (GANs) are particularly prevalent. These structures are outlined in Figure 3. Understanding specific architectures such as V-Nets and GANs requires an in-depth understanding of complex linear algebra and matrix manipulation and is beyond this review’s scope; the interested reader is directed to several excellent papers on the subject.[6,9,10]
Figure 3.

Illustration of three common types of deep learning architectures used in medical imaging: (a) CNN, (b) RNN and (c) GAN. In the lung image analysis examples given, the CNN and RNN are used for image segmentation while the GAN is used for image synthesis. CNN, convolutional neural network; GAN, generative adversarial network; RNN, recurrent neural network.


Preprocessing

Before images are fed into a neural network, they are frequently preprocessed, often by accentuating differences between foreground and background voxels, to enhance performance and/or reduce training time. DL theory suggests that in high-dimensional parameter spaces, local minima are very unlikely; instead, saddle points are more common because it is improbable that every dimension produces a minimum at the same location. Preprocessing can decrease the likelihood that the algorithm reaches a shallow saddle point, which would otherwise slow optimisation. This is achieved through regularisation techniques and by limiting outlier intensities. Cropping is regularly used to restrict the processing to voxels within the patient[11] or within coarse, manually drawn bounding boxes.[12] Table 1 summarises commonly used preprocessing techniques in the DL lung image analysis literature. In CNNs, other techniques, such as batch normalisation, have been shown to reduce training time, acting as secondary regularisation techniques to minimise outliers and improve performance.[62,63]
Table 1.

Summary of common pre-processing techniques used for lung image analysis tasks, including values prevalent in the literature

Preprocessing technique | Description | Modality | Literature values | References
Thresholding | The process of constraining the pixel values of an image to be between predefined values. | CT, MRI | CT intensity: [−1000, 700 HU]; MRI intensity: [0, 667] | Wang et al. (2018),[13] Sousa et al. (2019),[14] Javaid et al. (2018),[15] Hofmanninger et al. (2020),[16] Jiang et al. (2019),[17] Tahmasebi et al. (2018),[18] Z. Zhong et al. (2019),[19] Zhou et al. (2019),[20] Park et al. (2019),[21] Gerard et al. (2019),[22] Yun et al. (2019),[23] Eppenhof & Pluim (2019),[24] Fu et al. (2020),[25] Jiang et al. (2020),[26] De Vos et al. (2019),[27] Stergios et al. (2018),[28] Ren et al. (2019)[29]
Normalisation and whitening | The process of transforming the distribution of image pixels to some distribution which is standardised across images. | CT, MRI, X-ray | Normalisation: [0,1]; Mean/variance ≈ 0 | Wang et al. (2018),[13] Liu et al. (2019),[30] Javaid et al. (2018),[15] Hofmanninger et al. (2020),[16] Akila Agnes et al. (2018),[31] Novikov et al. (2018),[32] Gaal et al. (2020),[33] Jiang et al. (2019),[17] Tahmasebi et al. (2018),[18] Zhou et al. (2019),[20] Hatamizadeh et al. (2019),[34] Sandkühler et al. (2019),[35] Rajchl et al. (2017),[36] Sentker et al. (2018),[37] Fletcher and Baltas (2020),[38] Jiang et al. (2020),[26] De Vos et al. (2019),[27] Galib et al. (2019),[39] Ferrante et al. (2018),[40] Stergios et al. (2018),[28] Beaudry et al. (2019),[41] Duan et al. (2019),[42] Liu et al. (2020),[43] Ren et al. (2019),[29] Olberg et al. (2018)[44]
Denoising | The process of removing noise from images in order to improve their quality. | CT, MRI | Gaussian, adaptive patch-based | J. Xu & Liu (2017),[45] Zha et al. (2019),[46] Tustison et al. (2019)[47]
Bias correction | A technique to correct for the low-frequency bias field that corrupts MR images. | HP gas MRI, MRI | N3/N4 bias correction | Tustison et al. (2019),[47] Zha et al. (2019),[46] Rajchl et al. (2017)[36]
Cropping | Cropping refers to the process of removing unwanted outer pixels or voxels of an image prior to being inputted to the network. This includes cropping by manually-defined regions of interest or external body masks. Cropping is commonly used to reduce computational cost and/or eliminate the influence of background voxels. | CT, MRI, X-ray, PET | Cropping to body mask, specific organ or manually-defined region. | Negahdar et al. (2018),[12] Soans & Shackleford (2018),[48] Zhu et al. (2019),[49] Hofmanninger et al. (2020),[16] Zha et al. (2019),[46] Hooda et al. (2018),[50] Mittal et al. (2018),[51] Jiang et al. (2018),[11] Zhao et al. (2019),[52] Zhou et al. (2019),[20] Moriya et al. (2018),[53] Kalinovsky et al. (2017),[54] Sandkühler et al. (2019),[35] Anthimopoulos et al. (2019),[55] Gao et al. (2016),[56] Rajchl et al. (2017),[36] C. Wang et al. (2019),[57] Juarez et al. (2019),[58] Juarez et al. (2018),[59] Eppenhof & Pluim (2019),[24] Sentker et al. (2018),[37] Fletcher and Baltas (2020),[38] Blendowski & Heinrich (2019),[60] Zhong et al. (2019),[61] Liu et al. (2020),[43] Olberg et al. (2018)[44]

HU, Hounsfield unit; PET, Positron emission tomography.

Modalities included are those for which the pre-processing techniques have been used in the reviewed studies. This is not an exhaustive list of pre-processing techniques used.
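As a rough illustration of three of the techniques in Table 1 (thresholding, normalisation/whitening and cropping), the following NumPy sketch applies them to a synthetic volume. The function names, the synthetic data and the −500 HU body-mask threshold are assumptions for illustration only; the default clipping window of [−1000, 700] HU is the CT literature value listed in Table 1.

```python
import numpy as np

def clip_and_normalise(ct_hu, lower=-1000.0, upper=700.0):
    """Clip CT intensities to [lower, upper] HU and rescale linearly to [0, 1]."""
    clipped = np.clip(ct_hu.astype(np.float32), lower, upper)
    return (clipped - lower) / (upper - lower)

def whiten(image, eps=1e-8):
    """Shift an image to zero mean and scale it to (approximately) unit variance."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

def crop_to_mask(image, mask):
    """Crop an image to the tightest bounding box around a non-empty binary mask."""
    coords = np.argwhere(mask)
    start = coords.min(axis=0)
    stop = coords.max(axis=0) + 1
    return image[tuple(slice(a, b) for a, b in zip(start, stop))]

# Synthetic 3D "CT" volume (hypothetical): air background with a soft-tissue block.
volume = np.full((16, 32, 32), -1024.0)
volume[4:12, 8:24, 6:26] = 40.0
volume[8, 16, 16] = 3000.0  # a single outlier intensity, e.g. metal

body_mask = volume > -500.0                 # assumed, very crude body mask
cropped = crop_to_mask(volume, body_mask)   # removes background voxels
normalised = clip_and_normalise(cropped)    # outlier is limited to the upper bound

print(cropped.shape, float(normalised.min()), float(normalised.max()))
```

Clipping before normalisation prevents a single extreme voxel from compressing the remaining intensities into a narrow range.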


Validation

Validation is used to evaluate the performance of trained DL networks and assess their generalisability to non-experimental settings. The goal is to develop a validation strategy that best represents the situation in which the algorithm is to be deployed.

Evaluation metrics

It is imperative to evaluate the performance of DL algorithms accurately. Evaluation metrics can be categorised into overlap, distance, error and similarity metrics and are summarised in Figure 4.
Figure 4.

Overview of four key categories of evaluation metrics (overlap, distance, error and similarity) used to evaluate the performance of deep learning methods in medical image analysis. Each category contains brief descriptions and mathematical formulations for some common metrics. In these equations, ‘x’ and ‘y’ denote the prediction and target of any deep learning task, respectively.
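To make the metric categories more concrete, the sketch below computes one overlap metric pair (DSC and Jaccard), one distance metric (Hausdorff distance) and one error metric (RMSE) for a prediction x and target y. This is a simplified illustration under stated assumptions: the Hausdorff distance is computed by brute force over all foreground voxel coordinates in voxel units rather than over extracted surfaces in millimetres, and the two square masks are synthetic.

```python
import numpy as np

def dice(x, y):
    """Dice similarity coefficient: 2|x ∩ y| / (|x| + |y|)."""
    x, y = x.astype(bool), y.astype(bool)
    denom = x.sum() + y.sum()
    return 2.0 * np.logical_and(x, y).sum() / denom if denom else 1.0

def jaccard(x, y):
    """Jaccard similarity coefficient (intersection over union)."""
    x, y = x.astype(bool), y.astype(bool)
    union = np.logical_or(x, y).sum()
    return np.logical_and(x, y).sum() / union if union else 1.0

def hausdorff(x, y):
    """Symmetric Hausdorff distance between foreground voxels (voxel units)."""
    px = np.argwhere(x).astype(float)
    py = np.argwhere(y).astype(float)
    d = np.linalg.norm(px[:, None, :] - py[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def rmse(x, y):
    """Root mean square error between two intensity images."""
    return float(np.sqrt(np.mean((x.astype(float) - y.astype(float)) ** 2)))

# Synthetic example: a 4 x 4 prediction shifted by one row relative to the target.
x = np.zeros((8, 8), dtype=bool)
y = np.zeros((8, 8), dtype=bool)
x[2:6, 2:6] = True
y[3:7, 2:6] = True

print(dice(x, y), jaccard(x, y), hausdorff(x, y), rmse(x, y))
```

The two overlap metrics are monotonically related (JSC = DSC / (2 − DSC)), so reporting both adds little independent information, whereas distance metrics capture boundary errors that overlap metrics can miss.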


Validation techniques

Aside from the training set, an internal validation set is commonly used for tuning DL parameters to improve performance. A testing set is then used to provide an unbiased evaluation of performance on unseen data. In this review, validation sets used throughout the training phase are counted as training sets as the network has already seen these images before testing. Therefore, the data split is the percentage of the total data used for training and internal validation vs that used for testing. Maintaining completely separate testing sets is somewhat uncommon in the literature and represents the ideal form of validation.[22,23,64] Validating on external multicentre data sets that have not been used for training should be the gold-standard in ensuring comparison between methods and generalisability.[65] However, this is uncommon as single-centre data sets, split into training and testing sets, are frequently used. To make the validation process more robust and generalisable, specific techniques are applied, such as k-fold cross-validation. In fourfold cross-validation, the data set is randomly partitioned into four blocks, each containing 25% of the data; each block is used once for testing while the remaining 75% is used for training, so the process is repeated four times. Another approach is leave-one-out cross-validation, which uses all of the data for training except one case for testing and repeats until all cases have been evaluated.
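A minimal sketch of fourfold cross-validation is given below. It partitions the data by subject rather than by image, which is an assumption added here so that slices from one subject never appear in both the training and testing sets; the function name `subject_k_fold` and the toy cohort of 8 subjects with 5 slices each are hypothetical.

```python
import numpy as np

def subject_k_fold(subject_ids, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs; each subject is tested in exactly one fold."""
    subject_ids = np.asarray(subject_ids)
    unique_subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique_subjects)  # random partition of subjects, not of slices
    for fold_subjects in np.array_split(unique_subjects, k):
        test_mask = np.isin(subject_ids, fold_subjects)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical cohort: 8 subjects, each contributing 5 two-dimensional slices.
slice_subjects = np.repeat(np.arange(8), 5)
folds = list(subject_k_fold(slice_subjects, k=4))

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} training slices, {len(test_idx)} testing slices")
```

Setting `k` equal to the number of subjects turns the same routine into leave-one-out cross-validation at the subject level.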

Methods

This literature review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.[66] The literature search was conducted on 1 April 2020 using multiple databases (Web of Science, Scopus, PubMed) and aimed to identify studies written in English and published between 1 January 2012, the year in which the seminal AlexNet paper was published,[5] and the date of the search. The search strategy is defined in Figure 5. Further studies that met the selection criteria were identified by handsearching references and through the authors’ input.
Figure 5.

The search strategy used on Scopus, Web of Science and PubMed to identify relevant studies for inclusion in the review. Further studies that met the selection criteria were identified by handsearching references and through the authors’ input.

Several recent reviews have focussed primarily on DL-based lung classification and detection[67-69]; accordingly, this review was limited in scope to the lung image analysis applications of segmentation, registration, reconstruction and synthesis. Both published peer-reviewed scientific papers and conference proceedings were included due to recent developments in the field.

Results and discussion

Study selection

479 non-overlapping papers were retrieved. 355 papers were excluded due to not meeting the eligibility criteria. In particular, many papers focused on classification or used traditional machine learning techniques beyond this review’s scope. Upon reviewing the remaining papers, 82 studies were included for analysis. The PRISMA flowchart is shown in Figure 6.
Figure 6.

PRISMA flowchart of studies identified, screened, assessed for eligibility and included in the literature review analysis. PRISMA, preferred reporting items for systematic reviews and meta-analyses.

No studies that met the inclusion criteria were published before 2016, with the majority appearing since 2018. Image segmentation applications accounted for 65.9% of the studies reviewed. The remaining 34.1% are divided between synthesis, reconstruction and registration applications. Full details are shown in Figure 7.
Figure 7.

Graphical overview of the number of studies per year for the four image analysis applications considered in this review. 2020 values calculated up to 1 April 2020.

The majority of studies reviewed used structural imaging modalities (87.8%), with most using CT (63.5%). Functional lung imaging studies only constitute 12.1% of the reviewed studies and are spread across PET, SPECT and hyperpolarised gas MRI. Graphical summaries of the studies reviewed with respect to disease present in patient cohorts, imaging modality and architecture are shown in Figure 8.
Figure 8.

Graphical overview of breakdown of deep learning lung image analysis studies reviewed by (a) disease present in patient cohorts, (b) imaging modality and (c) architecture. Absolute numbers of papers are provided in (a, b).


Segmentation

Image segmentation is the process of partitioning an image into one or more segments that encompass anatomical or pathological specific regions of interest (ROIs), such as the lungs, lobes, or a tumour. Studies describing DL-based segmentation applications of pulmonary ROIs are summarised in Table 2.
Table 2.

Summary of reviewed studies on deep learning for lung image segmentation. The entries are arranged alphabetically by pulmonary region of interest (ROI), followed by modality

Study | Modality | ROI | Disease | Number of subjects | Dimensionality | Architecture | Pre-processing | Percentage data split (training*/testing) | Performance
Wang et al. (2018)[13] | CT | Whole lung | COPD, IPF | 575 | 2D | ResNet-101 | Clipped −1000 to +1000 HU, Normalisation [0,1] | 5-fold CV | DSC = 0.988 ± 0.012; ASD = 0.562 ± 0.52 mm
Dong et al. (2019)[70] | CT | Whole lung | Lung cancer | 35 | 3D | U-Net-GAN | | LOOCV | DSC = 0.97 ± 0.01; HD95 = 2.29 ± 2.64 mm; MSD = 0.63 ± 0.63 mm
Liu et al. (2019)[30] | CT | Whole lung | NR | 100 | 2D | SegNet | Class grouping, Normalisation [−1000,800] | 40/60 | DSC = 0.98
Lustberg et al. (2018)[71] | CT | Whole lung | Lung cancer | 470 | NR | CNN | | 95/5 | DSC = 0.99 ± 0.01; Median HD = 0.4 ± 0.2 cm
Negahdar et al. (2018)[12] | CT | Whole lung | Multiple | 83 | 3D | V-Net | Bounding box for lung, cropped to bounding box | 58/42 | DSC (n = 12) = 0.983 ± 0.002; DSC (n = 23) = 0.990 ± 0.002
Soans & Shackleford (2018)[48] | CT | Whole lung | Lung cancer | 422 | 3D | CNN with spatial constraints | ROI extraction for organ localisation | 71/29 | ROC (left) = 0.954; ROC (right) = 0.949
Soliman et al. (2018)[72] | CT | Whole lung | NR | 95 | 3D | Deep-CNN | Post-processed hole filling | LOOCV | DSC = 0.984 ± 0.068; HD95 = 2.79 ± 1.32 mm; PVD = 3.94 ± 2.11%
Sousa et al. (2019)[14] | CT | Whole lung | Lung lesion | 908 | 3D | Modified V-Net | Clipped [−1000, 400 HU] | 98/2 | ASD = 0.576 mm; DSC = 0.987
X. Zhou et al. (2017)[73] | CT | Whole lung | NR | 106 | 2D/3D | FCN VGG16 | Transfer learning from ImageNet ILSVRC‐2014 | 95/5 | JSC = 0.903 ± 0.037
Zhu et al. (2019)[49] | CT | Whole lung | Lung cancer | 66 | 3D | U-Net | Cropping to ROI | 55/45 | DSC = 0.95 ± 0.01; MSD = 1.93 ± 0.51 mm; HD95 = 7.96 ± 2.57 mm
Gerard et al. (2018)[74] | CT | Whole lung | COPD, IPF | 1749 | 3D | Course-Fine ConvNet | Transfer learning from COPDGene and SPIROMICS, fine-tuned on animal model | 92/8 | JSC = 0.99; ASD = 0.29 mm
Javaid et al. (2018)[15] | CT | Whole lung | Lung cancer | 13 | 2D | Dilated U-Net | Only axial slices selected, clipped −1000 to 3000 HU, Normalisation [0,1] | 94/6 | DSC = 0.99 ± 0.01; HD ≈ 4.5 mm
J. Xu & Liu (2017)[45] | CT | Whole lung | NR | 20 | 2D | MFCNN | Gaussian denoising | 50/50 | DSC = 0.754
Hu et al. (2020)[75] | CT | Whole lung | NR | 75 | 2D | Mask R-CNN + k-means | | NR | DSC = 0.973 ± 0.032
Hofmanninger et al. (2020)[16] | CT | Whole lung | Multiple | 266 | 2D | U-Net | Body mask, Clipped [−1024, 600 HU], Normalisation [0,1] | 87/13 | DSC = 0.98 ± 0.03; HD95 = 3.14 ± 7.4 mm; MSD = 0.62 ± 0.93
Xu et al. (2019)[76] | CT | Whole lung | Lung cancer, COPD | 224 | 2D | one layer CNN | Post-processed hole filling | 8-fold CV | DSC = 0.967 ± 0.001; HD = 1.44 ± 0.04 mm
Tustison et al. (2019)[47] | HP gas MRI; Proton MRI | Functional lung; Whole lung | NR; NR | 113; 268 | 2D; 3D | U-Net; U-Net | Template-based data augmentation, N4 bias correction, denoising | 65/35; 77/23 | DSC (HP gas) = 0.92; DSC (Proton) = 0.94
Akila Agnes et al. (2018)[31] | LDCT | Whole lung | NR | 220 | 2D | CDWN | Normalised [mean = 0] | 91/9 | DSC = 0.95 ± 0.03; JSC = 0.91 ± 0.04
Zha et al. (2019)[46] | UTE proton MRI | Whole lung | Healthy, CF, asthma | 45 | 2D | CED (U-Net and autoencoder) | Denoising, bias field correction, body mask | 5-fold CV | DSC (right) = 0.97 ± 0.015; DSC (left) = 0.96 ± 0.012
Hwang & Park (2017)[77] | X-ray | Whole lung | Healthy, lung nodules | 247 | 2D | U-Net | | 2-fold CV | DSC = 0.980 ± 0.008; JSC = 0.961 ± 0.015; ASD (mm) = 0.675 ± 0.122; ACD (mm) = 1.237 ± 0.702
Souza et al. (2019)[78] | X-ray | Whole lung | Healthy, Tuberculosis | 138 | 2D | ResNet-18 with FC layer | Scaled to same input size, post processing erosion, dilation, filtering | 73/27 | DSC = 0.936; JSC = 0.881
Dai et al. (2018)[64] | X-ray | Whole lung | Healthy, Tuberculosis, lung nodules | 385 | 2D | SCAN (structure correcting adversarial network) | Scaled to same input size | 85/15 | IoU = 94.7 ± 0.4%; DSC = 0.973 ± 0.02
C. Wang (2017)[79] | X-ray | Whole lung | Healthy, lung nodules | 247 | 2D | Multi task U-Net | Scaled to same input size, post processing hole filling | NR | JSC = 0.959 ± 0.017; AD = 1.29 ± 0.80 mm
Novikov et al. (2018)[32] | X-ray | Whole lung | Healthy, lung nodules | 247 | 2D | InvertedNet + All-dropout | Normalised [mean = 0, SD = 0] | 3-fold CV | DSC = 0.974; JSC = 0.949
Hooda et al. (2018)[50] | X-ray | Whole lung | Healthy, Tuberculosis, lung nodules | 385 | 2D | FCN-8 + dropout | Scaled to same input size, random cropping | 75/25 | DSC = 0.959
Mittal et al. (2018)[51] | X-ray | Whole lung | Healthy, Tuberculosis, lung nodules | 385 | 2D | LF-SegNet | Scaled to same input size, random cropping | 48/52 | DSC = 0.951
Gaal et al. (2020)[33] | X-ray | Whole lung | Healthy, Tuberculosis, lung nodules | 1047 | 2D | Adversarial attention U-Net | Scaled to same input size, CLAHE, Normalisation [−1,1] | 24/76 | DSC = 0.962 ± 0.04
Chen et al. (2019)[80] | CT | Lung tumour | Lung cancer | 134 | 3D | HSN (2D + 3D CNN) | | 78/22 | DSC = 0.888 ± 0.033
Jiang et al. (2018)[11] | CT, MRI | Lung tumour | Lung cancer | 400: CT (377), MRI (23) | 2D | Tumour aware semi-supervised Cycle-GAN | Scaled to same input size, Image synthesis from CT to MRI, body mask | 98/2 | DSC = 0.63 ± 0.24; HD95 = 11.65 ± 6.53
Jiang et al. (2019)[17] | CT, MRI | Lung tumour | Lung cancer | 405: CT (377), MRI (28) | 2D | Tumour aware pseudo MR and T2w MR U-Net | Scaled to same input size, Image synthesis from CT to MR, Clipped [−1000,500 HU] and [0,667], Normalised [−1, 1] | 95/5 | DSC = 0.75 ± 0.12; HD95 = 9.36 ± 6.00 mm; VR = 0.19 ± 0.15
Tahmasebi et al. (2018)[18] | MRI | Lung tumour | Lung cancer | 6 | 2D | Adapted FCN | Rescaled 10–95% of intensities, Normalisation [0,1] | 5-fold CV | DSC = 0.91 ± 0.03; HD = 2.88 ± 0.86 mm; RMSE = 1.20 ± 0.34
Z. Zhong et al. (2019)[19] | FDG PET, CT | Lung tumour | Lung cancer | 60: PET (60), CT (60) | 3D | DFCN Co-Seg U-Net | Scaled to same input size, Clipped [−500,200 HU] and [0.01,20] | 80/20 | DSC (CT) = 0.861 ± 0.037; DSC (PET) = 0.828 ± 0.087
Zhao et al. (2019)[52] | PET, CT | Lung tumour | Lung cancer | 84: PET (84), CT (84) | 3D | V-Net + feature fusion | Cropped to ROI | 57/43 | DSC = 0.85 ± 0.08; VE = 0.15 ± 0.14
Zhou et al. (2019)[20] | CT | Lung tumour | NR | 1350 | 3D | P-SiBA | Transfer learning from ImageNet ILSVRC‐2014, Cropped to ROI, Rescaled by +1000 HU and dividing by 3000 and Normalisation [0,1] | NR | DSC = 0.809 ± 0.12; HD = 7.612 ± 5.03 mm; VS = 0.883 ± 0.13
Moriya et al. (2018)[53] | Micro CT | Lung tumour | Lung cancer | 3 | 3D | JULE CNN + k-means | Body mask, patch extraction | | NMI = 0.390
Imran et al. (2019)[81] | CT | Lobes | COPD, ILD | 563 | 3D | Progressive dense V-Net | | 48/52 | DSC (n = 84) = 0.939 ± 0.02; DSC (n = 154) = 0.950 ± 0.007; DSC (n = 55) = 0.934
Park et al. (2019)[21] | CT | Lobes | COPD | 196 | 3D | U-Net | Clipped [−1024, −400 HU] | 80/20 | DSC = 0.956 ± 0.022; JSC = 0.917 ± 0.031; MSD = 1.315 ± 0.563; HSD = 27.89 ± 7.50
Wang et al. (2018)[13] | CT | Lobes | COPD, IPF | 1280 | 3D | DenseNet | Clipped −1000 to +1000 HU, Normalisation [0,1] | 5-fold CV | DSC = 0.959 ± 0.087; ASD = 0.873 ± 0.61 mm
Hatamizadeh et al. (2019)[34] | CT | Lung lesion | NR | 87 | 3D | DALS CNN | Scaled to same input size, Normalisation [NR] | 90/10 | DSC = 0.869 ± 0.113; HD = 2.095 ± 0.623 mm
Kalinovsky et al. (2017)[54] | CT | Lung lesion | Tuberculosis | 338 | 2D | GoogLeNet CNN | Images cropped into four quadrants | 80/20 | IoU = 0.95; ROC = 0.775
Gerard et al. (2019)[22] | CT | Lung fissure | COPD, Lung cancer | 5327 | 3D | Two Seg3DNets | Clipped [−1024, −200 HU], Linear rescaling | 30/70 | ASD = 1.25; SDSD = 2.87
Sandkühler et al. (2019)[35] | MRI | Lung defect region | NR | 35 | 2D | GAE-LAE RNN with LCI Loss | Z-normalisation [−4,4], Lung mask, Normalisation [0,1], Histogram stretching | 80/20 | Qualitative evaluation: 42% images rated ‘very good’, 19% rated ‘perfect’
Vakalopoulou et al. (2018)[82] | CT | ILD pattern | ILD | 46 | 2D | AtlasNet | | 37/63 | DSC = 0.677; HD = 3.981 mm; ASD = 1.274 mm
Anthimopoulos et al. (2019)[55] | CT | ILD pattern | ILD | 172 | 2D | FCN-CNN | Pre-computed lung mask | 5-fold CV | Accuracy = 81.8%
B. Park et al. (2019)[83] | CT | ILD pattern | COP, UIP, NSIP | 647 | 2D | U-Net | | 88/12 | DSC = 0.988 ± 0.006; JSC = 0.978 ± 0.011; MSD = 0.27 ± 0.18 mm; HSD = 25.47 ± 13.63 mm
Gao et al. (2016)[56] | CT | ILD pattern | ILD | 17 | 2D | CNN based CRF unary classifier | Transfer learning from ImageNet, Pre-computed lung mask | | Accuracy = 92.8%
Suzuki et al. (2020)[84] | CT | Diffuse lung disease | NR | 372 | 3D | U-Net | | 5-fold CV | DSC = 0.780 ± 0.169
Wang et al. (2018)[85] | MRI | Foetal lung | NR | 18 | 2D | BIFSeg P-Net | Trained on different organs, Image specific fine-tuning | 66/33 | DSC = 0.854 ± 0.059
Rajchl et al. (2017)[36] | MRI | Foetal lung | Healthy, IUGR | 55 | 3D | DeepCut CNN + CRF | Bounding box for ROI, Bias correction, Normalisation [mean = 0], Transfer learning from LeNet | 5-fold CV | DSC = 0.749 ± 0.067
Edmunds et al. (2019)[86] | Cone-beam CT | Diaphragm | Lung cancer | 10 | 2D | Mask R-CNN | Scaled to same input size | 9-fold CV | Mean error = 4.4 mm
C. Wang et al. (2019)[57] | CT | Airways | NR | 38 | 3D | Spatial-CNN (U-Net) | Random cropping | 92/8, 3-fold MCCV | DSC = 0.887 ± 0.012; CO = 0.766 ± 0.06
Juarez et al. (2019)[58] | CT | Airways | Lung cancer | 32 | 3D | U-Net GNN | Bounding box for ROI | 63/37 | DSC = 0.885; Airway completeness = 74%
Yun et al. (2019)[23] | CT | Airways | COPD | 89 | 2D | 2.5D CNN | Clipped [−700,700 HU] | 78/22 | Mean Branch detected = 65.7%
Juarez et al. (2018)[59] | CT | Airways | Healthy, CF, CVID | 24 | 3D | U-Net | Bounding box for ROI | 75/25 | DSC = 0.8

ACD, Average contour distance; AD, Average distance; ASD, Average surface distance; CDWN, Convolutional deep wide network; CE, Classification error; CF, Cystic fibrosis; CLAHE, Contrast limited adaptive histogram equalisation; CNN, Convolutional neural network; CO, Centreline overlap; COPD, Chronic obstructive pulmonary disorder; CV, Cross-validation; CVID, Common variable immunodeficiency disorders; DSC, Dice similarity coefficient; FDG, Fluorine-18‐fluorodeoxyglucose; GAN, Generative adversarial network; HD95, Hausdorff distance 95%; HD, Hausdorff distance; HSD, Hausdorff surface distance; HU, Hounsfield unit; ILD, Interstitial lung disease; IPF, Idiopathic pulmonary fibrosis; IUGR, Intrauterine growth restriction; IoU, Intersection over union; JSC, Jaccard similarity coefficient; LOOCV, Leave-one-out cross-validation; MAP, Mean average precision; MCCV, Monte carlo cross-validation; MSD, Mean surface distance; NMI, Normalised mutual information; NR, Not reported; NSIP, Nonspecific interstitial pneumonia; PVD, Percent ventilated defect; RMSE, Root mean square error; ROC, Receiver operating characteristic; ROI, Region of interest; SD, Standard deviation; SDSD, Standard deviation of surface distances; UIP, Usual interstitial pneumonia; VE, Volume error; VR, Relative volume ratio; VS, Volumetric similarity.

The entries are arranged alphabetically by pulmonary ROI, followed by modality.

*The training data set includes internal validation data.


CT segmentation

CT is the most common modality for clinical lung imaging due to superior spatial resolution, rapid scan times and widespread availability. This is reflected in the DL lung segmentation literature, with the majority of studies to date focusing on CT. For whole-lung segmentation, 3D networks are often used, whereas in interstitial lung disease (ILD) pattern segmentation, only 2D networks have been applied to date. The application often dictates the choice between 2D and 3D networks; segmentation of the whole lung leads to a volumetric 3D region in which features such as overall lung shape or the position of the trachea can be encoded. In contrast, segmenting ILD patterns is often conducted on central 2D slices; hence, a 2D network may be more appropriate as, in this approach, no features are conserved between slices.[55,83] Across the CT papers reviewed, both the median and mode training/testing data splits were 80/20%, with many using k-fold cross-validation with fewer than 50 patients. Even with an independent testing set, using only 5–10 patients for testing limits generalisability. Moreover, some studies cite the number of images or 2D slices rather than the number of subjects. If data from the same subject are included in both the training and testing phases, it is likely that the algorithm has already seen a similar slice from the same patient, as the individual slices are spatially correlated and do not strictly represent independent data points. The Dice similarity coefficient (DSC) overlap metric is the most common evaluation metric used. Most studies tackling whole-lung segmentation report DSC values above 0.90, with some achieving values above 0.98. For other pulmonary ROIs, the highest DSC values reported are often lower (e.g. DSC (airways) ≈ 0.85).
However, overlap metrics such as the DSC can be insensitive to errors in large volumes as the percent error is low compared to the overall pixel count.[87] Frequently, high DSC values are reported despite errors that require significant manual intervention before a segmentation is clinically useful. As the airways occupy smaller volumes, the DSC metric is more sensitive. In terms of Hausdorff-based distance metrics, whole-lung segmentation studies report HD95 values ≈10 mm; however, Dong et al[70] report a HD95 as low as 2.249 ± 1.082 mm averaged across both lungs. The lack of a standardised evaluation metric can make direct comparisons between different methods challenging. Image segmentation is challenging to evaluate. Currently, manual segmentations by expert observers are used as the gold-standard; however, it is well-known that expert segmentations are susceptible to interobserver variability.[88] Often, only one observer segments all the images in a training data set; hence, if a different observer segments the testing images, the algorithm may not perform as expected. This poses problems for widespread generalisation if certain biases in segmentation are preserved as there is no clear ‘true’ expert segmentation; therefore, differences in DL segmentations and expert segmentations may not be solely the result of DL errors. Most expert segmentations are conducted using semi-automatic software and image editing tools; the tools given to the user can convey a propensity for features, such as smooth lung borders, which may, in fact, be inaccurate. In other anatomical sites such as the liver, a DSC of 0.95 was obtained by DL; the interobserver variability for the DL approach was 0.69% compared to 2.75% for manual expert observers.[89] The low degree of interobserver variability in DL segmentations may be a positive step towards consistent segmentations between institutions. 
Using multiple expert segmentations and averaging the error may reduce interobserver variability effects; however, this is unlikely to be widely adopted due to the time required. In addition, medical imaging grand challenges can provide diverse data from multiple institutions with corresponding expert segmentations, limiting the extent of individual researcher bias.
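The point made above about the DSC being less sensitive to errors in large volumes can be illustrated with a minimal sketch. The structures and voxel counts below are hypothetical and chosen only for illustration; segmentations are represented as sets of voxel indices, which is an assumption of this example rather than a method used by any reviewed study.

```python
def dice(truth, prediction):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not truth and not prediction:
        return 1.0  # two empty segmentations agree perfectly
    return 2.0 * len(truth & prediction) / (len(truth) + len(prediction))


# Hypothetical ground-truth structures (voxel counts are illustrative only).
lung_truth = set(range(100_000))   # large structure, e.g. a whole lung
airway_truth = set(range(1_000))   # small structure, e.g. an airway tree

# The same absolute error in both cases: 500 voxels are missed.
missed = 500
lung_pred = set(range(missed, 100_000))
airway_pred = set(range(missed, 1_000))

lung_dsc = dice(lung_truth, lung_pred)
airway_dsc = dice(airway_truth, airway_pred)
print(f"whole lung DSC = {lung_dsc:.4f}, airway DSC = {airway_dsc:.4f}")
```

Under these assumptions, an identical 500-voxel error leaves the whole-lung DSC above 0.99 but reduces the airway DSC to roughly 0.67, which is why DSC values for different ROIs are not directly comparable.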

MRI segmentation

There are limited studies to date regarding pulmonary MRI segmentation, attributable perhaps to less widespread clinical use of the modality and lack of large-scale annotated pulmonary MRI data sets. However, pulmonary MRI techniques, such as contrast-enhanced lung perfusion MRI and hyperpolarised gas ventilation MRI, can provide further insights into pulmonary pathologies currently not possible with alternative techniques.[90] Quantitative biomarkers derived from hyperpolarised gas MRI, including the ventilation defect percentage (VDP), require accurate segmentation of ventilated and whole-lung volumes, which can be very time consuming when performed manually. Example images of DL-based hyperpolarised gas MRI segmentations are provided in Figure 9.
Figure 9.

Example images from the authors’ own work using deep learning for hyperpolarised gas MRI segmentation. The 129Xe MR ventilation images are taken from three subjects in a testing set, a healthy volunteer, asthma patient and cystic fibrosis patient. The patient images selected are characterised by significant ventilation defects. These are compared to expert segmentations of the same image. DSC values are displayed for all images. DSC, Dice similarity coefficient.

Tustison et al[47] used CNNs to provide fast, accurate segmentations for hyperpolarised gas and proton MRI. A 2D U-Net was used for hyperpolarised gas MRI segmentation whilst a 3D U-Net was used for proton MRI segmentation. They introduced a novel template-based data augmentation method to expand the limited lung imaging data. Hyperpolarised gas and proton MR images were segmented with DSC values of 0.94 ± 0.03 and 0.94 ± 0.02, respectively. Zha et al evaluated DL-based proton MRI segmentation, which yielded an average DSC of 0.965 across both lungs, outperforming conventional region growing and k-means techniques.[46]
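To make concrete how a biomarker such as the ventilation defect percentage depends on the two segmentations discussed in this section, the sketch below computes it from a whole-lung mask and a ventilated-volume mask. It assumes the commonly used definition (non-ventilated lung volume as a percentage of whole-lung volume) and that both masks are already co-registered on the same voxel grid; the function name and the toy masks are hypothetical.

```python
def ventilation_defect_percentage(lung_mask, ventilated_mask):
    """Percentage of whole-lung voxels that are not ventilated.

    Both arguments are flat sequences of 0/1 values on the same voxel grid
    (an assumption of this sketch; real masks would be 3D arrays).
    """
    if len(lung_mask) != len(ventilated_mask):
        raise ValueError("masks must cover the same voxel grid")
    lung_voxels = sum(1 for inside in lung_mask if inside)
    if lung_voxels == 0:
        raise ValueError("whole-lung mask is empty")
    # Only count ventilated voxels that lie inside the whole-lung mask.
    ventilated_voxels = sum(
        1 for inside, vent in zip(lung_mask, ventilated_mask) if inside and vent
    )
    return 100.0 * (lung_voxels - ventilated_voxels) / lung_voxels


# Toy example: 8 lung voxels, 6 of which are ventilated.
lung = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
ventilated = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
vdp = ventilation_defect_percentage(lung, ventilated)
print(f"VDP = {vdp:.1f}%")
```

Because the metric is a ratio of two segmented volumes, an error in either mask propagates directly into the biomarker, which is one reason accurate automatic segmentation matters for this application.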

X-ray segmentation

Although the majority of segmentation studies reviewed used CT and MRI, early studies focused on X-ray segmentation.[77,79] This was due to the public availability of large-scale, annotated X-ray data sets, such as the Japanese Society of Radiological Technology (JSRT)[91] and Montgomery[92] data sets, enabling researchers to experiment with large numbers of images not previously accessible. The majority of X-ray studies reviewed used these data sets, making comparisons between methods more direct.[32,50,51,64,78,79]

Registration

Image registration is the process of transforming a moving image onto the spatial domain of a fixed image. Registration is used in numerous applications within the lung imaging field, including adaptive radiotherapy,[93] computation of functional lung metrics such as the ventilation defect percentage (VDP)[94] and generation of surrogates of regional lung function from multi-inflation CT[95] or 1H MRI.[96] However, most image registration algorithms assume that the moving and fixed images share the same topology. This is not always the case in lung imaging as functional images often do not follow the same topology as structural images, especially in individuals with severe pathologies where functional lung images may show substantial heterogeneity.[97] Studies describing DL-based pulmonary registration applications are summarised in Table 3.
Table 3.

Summary of reviewed studies using deep learning for lung image registration

Study | Modality | Disease | Public data set | Number of subjects | Dimensionality | Architecture | Preprocessing | Percentage data split (training*/testing) | Performance
Eppenhof et al. (2018)[98] | 4DCT | Lung cancer | DIR-LAB, CREATIS | 17 | 3D | Modified VGG | Synthetic DVFs for data augmentation | 42 (CREATIS) / 58 (DIR-LAB) | TRE = 4.02±3.08
Eppenhof & Pluim (2019)[24] | 4DCT | Lung cancer | DIR-LAB, CREATIS | 17 | 3D | Modified U-Net | Synthetic DVFs for data augmentation, Resized, Pre-computed body mask, intensity-based lung mask < −250 HU | 42 (CREATIS) / 58 (DIR-LAB) | TRE = 2.17±1.89 mm
Ali & Rittscher (2019)[99] | 4DCT | Lung cancer | DIR-LAB, CREATIS | 17 | 2D | Conv2Warp (Linear and Deformable ConvNet) |  | 58 (DIR-LAB) / 42 (CREATIS) | DSC = 0.90; JSC = 0.84
Sentker et al. (2018)[37] | 4DCT | Lung cancer | DIR-LAB, CREATIS | 86 | 3D | GDL-FIRE4D U-Net with VarReg | Normalisation [0,1], Cropped to same input size, Pre-computed body mask | 69/31 (DIR-LAB, CREATIS, In house) | TRE (DIR-LAB) = 2.50±1.16 mm; TRE (CREATIS) = 1.74±0.57 mm
Fletcher and Baltas (2020)[38] | 4DCT | Lung cancer | DIR-LAB, CREATIS, Sunnybrook | 31 | 3D | U-Net one-shot learning | Pre-computed body mask, Normalisation [mean = 0, SD = 1] | LOOCV (DIR-LAB); 0/100 (CREATIS) | TRE (DIR-LAB) = 1.83±2.35 mm; TRE (CREATIS) = 1.49±1.59 mm
Fu et al. (2020)[25] | 4DCT | Lung cancer | DIR-LAB | 20 | 3D | LungRegNet (CoarseNet, FineNet) | Vessel enhancement, Clipped at −700 HU | 5-fold CV, DIR-LAB testing | MAE (in house) = 52.1±18.4; TRE (in house) = 1.00±0.53; TRE (DIR-LAB) = 1.59±1.58 mm
Jiang et al. (2020)[26] | 4DCT | Lung cancer | DIR-LAB, SPARE | 32 | 3D | MJ-CNN | Clipped [−1000, −200 HU], Normalisation [0,0.2] | 75 (SPARE, DIR-LAB) / 25 (DIR-LAB) | TRE = 1.58±1.19 mm
De Vos et al. (2019)[27] | 4DCT, CT | Lung cancer | DIR-LAB, NLST | 2070 | 3D | DLIR framework ConvNet | Clipped [−1000, −200 HU], Normalisation [0,1] | 99 (NLST) / 1 (NLST, DIR-LAB) | DSC (NLST) = 0.75±0.08; HD (NLST) = 19.34±13.41; TRE (DIR-LAB) = 5.12±4.64 mm
Sokooti et al. (2017)[100] | CT | COPD |  | 19 | 3D | RegNet CNN | Synthetic DVFs for data augmentation, Initial affine registration | 63/37 (SPREAD) | TRE = 4.39±7.54 mm
Sokooti et al. (2019)[101] | CT, 4DCT | Lung cancer, COPD | SPREAD, DIR-LAB | 39 | 3D | RegNet CNN (U-Net) | Synthetic DVFs for data augmentation, Initial affine registration | 54 (SPREAD, DIR-LAB COPD) / 46 (SPREAD, DIR-LAB) | TRE (DIR-LAB) = 1.86±2.12 mm
Blendowski & Heinrich (2019)[60] | CT | COPD | DIR-LAB | 10 | 3D | CNN | Cropped to lung region | LOOCV (DIR-LAB) | TRE = 3.00±0.48 mm
Qin et al. (2019)[102] | CT, MRI | COPD | COPDGene | 1000 | 2D | UMDIR-LaGAN | Cross-modality registration, transformation into domain invariant latent space | 90/10 (COPDGene) | DSC = 0.967±0.03; HD = 8.257±4.43 mm; MCD = 0.71±0.44 mm
Galib et al. (2019)[39] | CT, CBCT | Healthy, COPD, Lung cancer | DIR-LAB, VCU | 27 | 3D | CNN | Normalisation [0,1] | 37 (DIR-LAB) / 63 (VCU) | AUC-ROC = 0.882±0.11, CI = 68%
Ferrante et al. (2018)[40] | X-ray | Healthy, Lung nodule | JSRT | 247 | 2D | U-Net | Normalisation [0–1], Domain adaption Cardiac MR | 81/19 (JSRT) | MAD ≈ 6.3; CMD ≈ 5 mm; DSC ≈ 0.9
Mahapatra et al. (2018)[103] | X-ray | Multiple | NIH-ChestXray14 | 420 | 2D | JRSNet (cycleGAN with U-Net) | Joint segmentation and registration | NR (SCR, NIH-ChestXray14) | TRE = 7.75 mm
Stergios et al. (2018)[28] | MRI | Systemic sclerosis, healthy |  | 41 | 3D | CNN with transformation layer | Clipped [0, 1300], Normalisation [0,1] | 68/32 | DSC = 0.915 ± 2.33; Euclidean error = 4.358 mm

AUC-ROC, Area under the receiver operating characteristic curve; CMD, Contour mean distance; CNN, Convolutional neural network; COPD, Chronic obstructive pulmonary disease; CV, Cross-validation; DLIR, Deep learning image registration; DSC, Dice similarity coefficient; HD, Hausdorff distance; HU, Hounsfield unit; JSC, Jaccard similarity coefficient; LOOCV, Leave-one-out cross-validation; MAD, Mean absolute differences; MAE, Mean absolute error; MCD, Mean contour distance; MRF, Markov random field; TRE, Target registration error; VGG, Visual geometry group.

Eppenhof and Pluim[24] built upon previous work by Lafarge et al[98] using publicly available data sets to directly map displacement vector fields from inspiratory and expiratory CT pairs using a 3D U-Net with extensive data augmentation. Synthetic transforms were used to directly train the network as the deformation fields are known. The approach achieved fast, accurate registrations, reducing mean TRE from 8.46 to 2.17 mm. The results were further validated using landmarks from multiple observers, indicating the level of interobserver variability. Nevertheless, only 24 images for testing and training were used, limiting the study’s generalisability. In addition, synthetic transforms do not directly represent real transforms likely found in patients. Other approaches use a CNN to learn expressive local binary descriptors from landmarks before applying Markov random field registration.[60] This is compared to a method using handcrafted local descriptors with high self-similarity, facilitating faster computation. The results suggest that a combination of both CNN-learned descriptors and handcrafted features produces the best registration results. 
In a generic registration approach, a U-Net-like architecture with a differentiable spatial transformer that can register both X-ray and MR images was used.[40] The algorithm was evaluated using the contour mean distance (CMD). CMD was approximately 5 mm on average across the testing data. Whilst this is a less accurate registration than other methods reviewed, it is more broadly applicable; the generic algorithm (in this case trained on X-ray and MR images) can learn features that are independent of modality. By fixing these weights and adding additional layers, transfer learning can then be applied to a specific modality; the additional data across modalities may lead to improved results.[104]
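Most of the registration studies above report the target registration error (TRE). As a minimal sketch, TRE can be computed as the Euclidean distance, in millimetres, between corresponding landmarks in the fixed image and the warped moving image. The function name, landmark coordinates and voxel spacing below are hypothetical values chosen for illustration and are not taken from any reviewed study.

```python
import math


def target_registration_error(fixed_pts, warped_pts, spacing=(1.0, 1.0, 1.0)):
    """Return (mean, SD) of landmark distances in mm.

    fixed_pts / warped_pts: corresponding landmark coordinates in voxels.
    spacing: voxel size in mm along each axis (assumed known from the image header).
    """
    if len(fixed_pts) != len(warped_pts) or not fixed_pts:
        raise ValueError("need the same non-zero number of landmarks in both sets")
    distances = [
        math.sqrt(sum(((f - w) * s) ** 2 for f, w, s in zip(p, q, spacing)))
        for p, q in zip(fixed_pts, warped_pts)
    ]
    mean = sum(distances) / len(distances)
    # Population standard deviation of the per-landmark errors.
    sd = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    return mean, sd


# Hypothetical landmarks (voxel coordinates) and anisotropic voxel spacing.
fixed = [(10, 10, 10), (20, 20, 20)]
warped = [(13, 14, 10), (20, 20, 22)]
tre_mean, tre_sd = target_registration_error(fixed, warped, spacing=(1.0, 1.0, 2.5))
print(f"TRE = {tre_mean:.2f} ± {tre_sd:.2f} mm")
```

Scaling by voxel spacing before taking the distance matters for CT, where slice thickness often differs from in-plane resolution; a TRE quoted in voxels is not comparable with one quoted in millimetres.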

Reconstruction

Image reconstruction is the process of generating a usable image from the raw data acquired by a scanner. CT and SPECT reconstruction fundamentally differ from MRI reconstruction and, as such, the role of DL in these applications is also different. CT and SPECT reconstruction use analytic (e.g. filtered backprojection) or iterative algorithms to produce 3D images from projections taken at multiple angles around a subject. MRI reconstruction, in contrast, produces images by transforming raw k-space data via Fourier transforms. Full details of image reconstruction methods have been described elsewhere.[105,106] Studies describing DL-based lung image reconstruction applications are summarised in Table 4.
Table 4.

Summary of reviewed studies using deep learning for lung image reconstruction

Study | Modality | Disease | Number of patients | Dimensionality | Architecture | Preprocessing | Percentage data split (training*/testing) | Performance
Beaudry et al. (2019)[41] | 4D cone beam CT | Lung cancer | 16 | 2D | Sino-Net (Modified U-Net) | Cropped to same input size, Sinogram Normalisation [0,1] | 88/12 | RMSE Translational = 1.67 mm (other metrics given)
Lee et al. (2019)[107] | CT | COPD | 60 | 2D | FCN | No sinogram used | Dataset 1: 80/20; Dataset 2: 40/60 | Mean reduction RMSE (Dataset 1) = 65.7±15.8%; Mean reduction RMSE (Dataset 2) = 59.6±5.5%
Ge et al. (2020)[108] | CT | Liver lesion | 5413 | 2D | ADAPTIVE-NET CNN | Convert from HU to linear attenuation coefficient | 90/10 | PSNR = 43.15±1.9; SSIM = 0.968±0.013; Normalized RMSE = 0.0071±0.002
Duan et al. (2019)[42] | HP Gas MRI | COPD, nodule, PTB, healthy, asthma | 72 | 2D | C-Net and F-Net (U-Net based) | Undersampled k-space (AF = 4), Removed SNR below 6.6, Normalisation [0,1] |  | NRMAE = 4.35%; SSIM = 0.7558; VDP bias = 0.01±0.91%
Dietze et al. (2019)[109] | 99mTc-MAA SPECT | Liver cancer | 128 | 2D | CNN | Initial filtered back projection | 94/6 | LSF = 5.1%; CNR = 12.5

AF, Acceleration factor; CNN, Convolutional neural network; CNR, Contrast to noise ratio; COPD, Chronic obstructive pulmonary disease; EIT, Electrical impedance tomography; HU, Hounsfield unit; LSF, Lung shunting fraction; MAE, Mean absolute error; PSNR, Peak signal to noise ratio; PTB, Pulmonary tuberculosis; RMSE, Root mean square error; SSIM, Structural similarity index metric; VDP, Ventilation defect percentage; 99mTc-MAA, Technetium-99m macroaggregated albumin.

The training data set includes internal validation data

CT/SPECT images can be reconstructed accurately using Monte-Carlo-based iterative reconstruction[110]; however, this process is computationally expensive and time-consuming.[111] In addition, multiple studies have demonstrated the success of analytical methods such as filtered backprojection.[105] Building upon this, CNNs have been used to speed up the process of filtered backprojection to shorten reconstruction times.[109] The results suggest DL can accurately reconstruct SPECT images in under 10 sec. Furthermore, the authors compare clinical metrics, such as the lung shunting fraction (LSF), between methods in a specific time frame. DL produced an LSF of 4.7%, comparable to the 5.8% obtained with Monte-Carlo methods, indicating the potential for use in clinical applications.[109]

Multiple studies have employed DL for MRI reconstruction[112] but only one published study has applied it to pulmonary MRI.[42] MRI of the lungs can take upwards of 10 sec to acquire, often requiring that patients maintain inflation levels for a significant period; this can be particularly challenging for patients with severe lung pathologies. Compressed sensing can be used to reconstruct randomly undersampled k-space in conjunction with regularisation methods to produce accurate reconstructions in hyperpolarised gas MRI[113,114] and enables reduced acquisition time without significantly reducing image quality. A coarse-to-fine neural network has been proposed to yield an accurate hyperpolarised gas MRI scan with an acceleration factor of 4 (only 1/4 of k-space sampled), as listed in Table 4.[42] The method can also improve inherent spatial coregistration accuracy when acquiring proton and hyperpolarised gas MRI in the same breath,[115] possibly alleviating the need for substantial post-acquisition image registration.

Tangentially related to the goal of image reconstruction, images can also be improved further using image enhancement at the post-acquisition stage. Multiple studies have shown the effectiveness of using CNNs combined with gradient regularisation and superresolution modules to enhance low-dose CT images with noise and artefacts, potentially limiting radiation exposure without degrading image quality.[116,117]
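The relationship between k-space sampling and image quality described in this section can be sketched with a toy one-dimensional example. It uses a naive discrete Fourier transform from the Python standard library, regular (not random) undersampling and simple zero-filling, so it is only an illustration of why undersampled data need a more sophisticated reconstruction; it is not compressed sensing and not the method of any reviewed study. The signal, the function names and the acceleration factor of 4 are assumptions made for this example.

```python
import cmath
import math


def dft(samples, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for m in range(n):
        total = sum(
            samples[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
            for k in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def rmse(reference, estimate):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum(abs(r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    )


# Toy 1D "image": a block of signal surrounded by background.
signal = [0.0] * 4 + [1.0] * 6 + [0.0] * 6
kspace = dft(signal)  # simulated fully sampled raw data

# Fully sampled reconstruction: the inverse transform recovers the image.
full_recon = dft(kspace, inverse=True)

# Keep every 4th k-space sample (acceleration factor 4) and zero-fill the rest.
acceleration = 4
mask = [i % acceleration == 0 for i in range(len(kspace))]
sparse_kspace = [v if keep else 0j for v, keep in zip(kspace, mask)]
zero_filled = dft(sparse_kspace, inverse=True)

sampled_fraction = sum(mask) / len(mask)
full_error = rmse(signal, full_recon)
zero_filled_error = rmse(signal, zero_filled)
print(f"sampled {sampled_fraction:.2f} of k-space")
print(f"RMSE full = {full_error:.2e}, RMSE zero-filled = {zero_filled_error:.3f}")
```

In this sketch the fully sampled reconstruction is exact up to floating-point error, whereas the zero-filled reconstruction is visibly corrupted by aliasing. Regularised compressed sensing and learned reconstructions aim to recover the image from such incomplete data.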

Synthesis

Image synthesis, also referred to as regression, is the process of generating artificial images of unknown target images from given source images. Synthesis has been applied to a range of applications, such as generating functional or metabolic images from structural images. For example, estimating contrast-based functional images from routinely acquired non-contrast structural modalities reduces the need for additional scans, specialised equipment and administration of contrast agents. Even within traditional model-based techniques, accurate synthesis has proved challenging due to the complex mathematical functions mapping input to output images. The development of DL architectures such as GANs enables a more unsupervised approach, which lends itself to the complex problem of synthesis.[9] Studies describing DL-based lung image synthesis applications are summarised in Table 5.
Table 5.

Summary of reviewed studies using deep learning for lung image synthesis

Study | Modality (original ⇒ target) | Disease | Number of subjects | Dimensionality | Model | Preprocessing | Percentage data split (training*/testing) | Performance
Bi et al. (2017)[118] | CT ⇒ FDG PET | Lung cancer | 50 | 2D | Multichannel-GAN (U-Net) | Manual segmentation of tumour/lymph nodes, axial slices containing tumours only | 50/50 | MAE = 4.6; PSNR = 28.06
Jang et al. (2019)[119] | CT ⇒ 99mTc-MAA SPECT perfusion | Lung cancer | 54 | 2D | Conditional GAN | Resized images, segmentation and removal of bone, soft tissue and heart | 91/9 | MS-SSIM = 0.87; γ index 2%/2 mm = 97.7±1.2%
Zhong et al. (2019)[61] | 4DCT ⇒ CT ventilation | Lung cancer, COPD | 82 | 2D | Deep CNN | Images cropped to ROI | 10-fold CV | MSE = 7.6%; γ index 5%/5 mm = 80.6±1.4%; SSIM = 0.880±0.035
Liu et al. (2020)[43] | 4DCT ⇒ 99mTc-Technegas SPECT ventilation | Lung cancer, oesophageal cancer | 50 | 2D | U-Net | Pre-computed lung mask, normalisation [0,1], post-processing normalisation [90th percentile] | 10-fold CV | Spearman’s ρ = 0.73±0.17; DSC = 0.73±0.09
Ren et al. (2019)[29] | CT ⇒ 99mTc-MAA SPECT perfusion | Lung cancer | 30 | 3D | U-Net | Clipped [−1000, −300 HU] for segmentation, normalisation [0,1] | 83/17 | Correlation coefficient = 0.53 ± 0.14
Preiswerk et al. (2018)[120] | Ultrasound ⇒ MRI | NR | 7 | 3D | LRCN | PCA = 10 components | 66/33 (conducted in time segments) | SSE = 39.0 ± 12
Olberg et al. (2018)[44] | MRI ⇒ CT | NR | 41 | NR | GAN (U-Net) | Normalisation [NR], pre-computed body mask | 90/10 | 3D γ index passing rate 99.2%; Lung V20% difference = 0.11%

CNN, Convolutional neural network; COPD, Chronic obstructive pulmonary disease; FDG, Fluorine-18‐fluorodeoxyglucose; GAN, Generative adversarial network; HU, Hounsfield unit; LRCN, Long-term recurrent convolutional network; MAE, Mean absolute error; MSE, Mean square error; MS-SSIM, Multi-scale structural similarity index metric; NR, Not reported; PCA, Principal component analysis; PSNR, Peak signal to noise ratio; ROI, Region of interest; SSE, Sum of squared error; 99mTc-MAA, Technetium-99m macroaggregated albumin.

DL has been used to generate synthetic fluorine-18‐fluorodeoxyglucose (FDG) PET images from CT images via a GAN.[118] The GAN’s inputs were varied to include either a CT image, label, or both CT and corresponding label; the multichannelled GANs (M-GAN) provided the most accurate synthetic PET images, demonstrating that multiple inputs increase synthesis accuracy. To explore this further, the authors also evaluate the synthetic PET images by feeding them into a network as training data. The network aims to delineate tumours by learning relationships from the training data; the data were then divided into real PET images and synthetic PET images. The trained model was then evaluated on unseen tumour detection problems. The synthetic PET-trained network produced 2.79% lower recall. This indicates that, as a whole, the synthetic PET images are closely related to the real images in terms of tumour identification. The paper posits that synthetic PET images can be used as additional training data in other DL tasks. However, it is unclear if synthetic PET images can be used in treatment planning and other clinical tasks with this level of accuracy.[118]

GANs have continued to show promise in synthesis problems.[119] CT images have been used to generate SPECT images via a conditional GAN (cGAN) instead of a CNN.[29] The method used a 2D GAN with 49 patients consisting of 3054 2D images as training data; the testing data contains 5 patients. cGANs differ from the regular GAN architecture by using both the observed image and a random noise vector, mapping these to the output image instead of only the noise vector. The generator used is based on the U-Net architecture with multiple inputs. Synthetic and real SPECT images were compared using the multiscale structural similarity index measure (MS-SSIM), yielding MS-SSIM = 0.87. Further analysis used a γ index with a passing rate of 97.7±1.2% with 2%/2 mm. The authors note qualitatively that errors occur more frequently at the base of the lungs, possibly caused by the increased deformation in this region. A key limitation for synthesis methods is the errors introduced by the registration of source and target images. Consequently, it has been suggested that images that are not matched anatomically due to breathing discrepancies are excluded,[119] complicating validation for clinical adoption.[29,119]

A major application of DL image synthesis is for MR-guided radiotherapy. The current paradigm in radiotherapy is to derive electron density information required for dose calculations directly from CT scans; MRI does not directly provide this information. DL has been invoked to generate pseudo-CT images for use in MR-guided stereotactic body radiotherapy using GANs, obviating the need for CT.[44]

Zhong et al used a CNN to synthesise ventilation images from 4DCT scans.[61] Whilst good performance was observed, the major limitation of this study is that the target images in the training phase were CT-based surrogates of ventilation generated from aligned inspiratory and expiratory CT scans via deformable registration and computational modelling. These images are still the subject of intense validation efforts.[121] Using more direct measures of regional lung function, such as hyperpolarised gas MRI, and larger data sets are critical to the success of future work in structure-to-function DL synthesis applications.
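Several of the synthesis studies above report voxel-wise agreement between real and synthetic images using the mean absolute error (MAE) and the peak signal to noise ratio (PSNR). The sketch below shows one common way of computing both, assuming intensities already normalised to [0, 1] so that the data range is 1; the function names and the toy intensity values are hypothetical and do not come from any reviewed study.

```python
import math


def mean_absolute_error(real, synthetic):
    """Mean absolute intensity difference between two equally sized images."""
    return sum(abs(r - s) for r, s in zip(real, synthetic)) / len(real)


def peak_signal_to_noise_ratio(real, synthetic, data_range=1.0):
    """PSNR in dB; data_range is the maximum possible intensity span (assumed 1.0)."""
    mse = sum((r - s) ** 2 for r, s in zip(real, synthetic)) / len(real)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy flattened images with intensities normalised to [0, 1].
real_image = [0.0, 0.5, 1.0, 0.5]
synthetic_image = [0.1, 0.5, 0.9, 0.5]

mae_value = mean_absolute_error(real_image, synthetic_image)
psnr_value = peak_signal_to_noise_ratio(real_image, synthetic_image)
print(f"MAE = {mae_value:.3f}, PSNR = {psnr_value:.2f} dB")
```

Both values depend on how intensities are normalised, so MAE and PSNR figures from studies using different normalisation schemes are not directly comparable.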

Future research directions

The studies reviewed show that DL has significant potential to outperform more traditional methods in a wide range of lung image analysis applications. Novel ways of using DL to synthesise more training examples[122] or combine segmentation and registration in one process[103] have been shown to enhance performance. Such innovation is still in its infancy, providing an opportunity for novel technical developments. As shown through the improved performance observed by combining traditional approaches with machine learning and DL for registration, great synergy can be achieved by combining DL and conventional image processing approaches.[60] In image synthesis, researchers have developed techniques to synthesise CT images from MRI scans of the brain[123]; similar advancements in lung imaging would allow patients to receive less radiation exposure as well as reduce the cost and time for additional scans. Using synthesis to generate functional lung images from routinely acquired structural images would allow clinicians to understand which areas of the lungs are ventilated or perfused without the need to acquire dedicated functional scans, which often require contrast agents and specialised equipment, reducing costs and acquisition times. Such applications require further DL research in architectural development and the input of lung imaging experts. Using DL for CT enhancement to reduce radiation dose or improve compressed sensing methods in MRI has the potential to reduce scan times, improving image quality and patient compliance. Promising results have been shown for both proton MRI and hyperpolarised gas MRI segmentation[47]; however, further work is required to demonstrate accurate MRI segmentation in an independent multicentre validation. Collaborative research that boosts training data and injects heterogeneity of centre and scanner will lead to more robust and generalisable models. 
The paucity of published DL studies in functional lung imaging (only 12.9% of reviewed studies here) provides significant opportunities for innovations and further research in this field. The literature on CT segmentation provides a positive picture of the success of DL methods in providing fast, accurate automatic segmentations. However, producing impressive results in a research setting is no substitute for clinical validation. Long-term clinical case studies are required with large numbers of patients before these novel developments have a real impact. The ‘black box’ nature of DL methods and the lack of explainability of generated outputs can undermine clinicians’ and patients’ trust, despite, or even because of, an unprecedented level of hype. Another challenge is transparency; although most software used for DL is well documented and open source, a requirement for continued use, the open-source nature also generates safety concerns relating to software edits and bugs. Developing a standardised literature consensus on validation and evaluation procedures is key to ensuring transparency. All of these challenges need to be overcome before DL can live up to its full potential.

Conclusions

We have reviewed the role of DL for several lung image analysis tasks, including segmentation, registration, reconstruction and synthesis. CT-based lung segmentation was the most prevalent application where exceptional performance has been demonstrated. However, research in other applications and modalities, including functional lung imaging, is still in its infancy. A concerted effort from the research community is required to develop the field further. Before widespread clinical adoption is achievable, challenges remain concerning validation strategies, transparency and trust.