
An Effective Approach for Automated Lung Node Detection using CT Scans.

Mohammad Amin Moragheb1, Ali Badie2, Ali Noshad3.   

Abstract

Background: Benign pulmonary nodules are defined as non-cancerous nodules with a diameter of 3 cm or less. The early diagnosis of malignant lung nodules is important for a more reliable prognosis of lung cancer and for less invasive chemotherapy and radiotherapy procedures. Objective: This study aimed to introduce an improved hybrid approach for efficient nodule mask generation and false-positive reduction. Material and
Methods: In this experimental study, nodule segmentation preprocessing was conducted to prepare the input computed tomography (CT) scans for the U-Net convolutional neural network (CNN) model; this included the normalization of the CT scans and the conversion of pixel values to the corresponding radiodensity in Hounsfield Units (HU). A U-Net CNN was then developed on lung CT scans for nodule identification.
Results: The U-Net model converged to a Dice coefficient of 0.678 with a sensitivity of 75%. The segmentation stage produced 11.1 false positives (FPs) per true positive (TP), which the proposed classification CNN reduced to 2.32 FPs per TP.
Conclusion: Because features were extracted only from the largest nodule of each patient, it was essential to verify that the extracted features agreed with those reported by other studies. The improved hybrid approach introduced here is expected to be useful for other image classification tasks as well. Copyright: © Journal of Biomedical Physics and Engineering.


Keywords:  Deep Convolutional Neural Networks; Deep Learning; Diagnostic Imaging; Early Diagnosis; Lung Neoplasms; Lung Nodule Detection

Year:  2022        PMID: 36059280      PMCID: PMC9395629          DOI: 10.31661/jbpe.v0i0.2110-1412

Source DB:  PubMed          Journal:  J Biomed Phys Eng        ISSN: 2251-7200


Introduction

Lung cancer is among the cancers with the highest fatality rates [ 1 , 2 ]; it involves the uncontrolled growth of abnormal cells and the development of tumors [ 3 ]. Lung cancer is confirmed by lung nodules, which indicate the stage of the disease [ 4 , 5 ]. Previous studies showed that early detection of lung cancer is the only way to cure it and decrease the fatality rate [ 6 - 8 ]. Lung nodules, also known as lung tumors, are irregular masses with a diameter of 3 mm to 3 cm [ 9 ]. The nodules form as cells grow uncontrollably in the lung [ 10 ]. A micro-nodule is smaller than 3 mm, and a mass is larger than 30 mm [ 11 ]. In most cases, lung nodules are classified as either benign or malignant. Pulmonary or benign nodules are defined as non-cancerous nodules with a diameter of 3 cm or less [ 12 , 13 ] and are also associated with the early stages of lung cancer. Although lung nodules are usually spherical, they are often surrounded by anatomical structures such as vessels or pulmonary walls [ 14 ]. Early diagnosis of malignant pulmonary nodules is vital for cancer analysis and for less invasive chemotherapy and radiotherapy procedures [ 3 ]. Detecting lung nodules, and discriminating between different types of nodules, can be extremely difficult [ 15 ]. Lung cancers are recognized by several imaging methods, including Magnetic Resonance Imaging (MRI), isotope imaging, X-ray, and CT. X-ray chest radiography and Computed Tomography (CT) are well-known anatomic imaging modalities used routinely to diagnose numerous lung illnesses and to detect lung nodules [ 16 - 20 ]. A more reliable and standard approach for the diagnosis of lung cancer is a CT scan identifying lung nodules (roughly small spherical masses) [ 3 ].
The size of lung nodules can indicate the level of malignancy in lung cancer according to protocols such as Lung-RADS and Fleischner [ 3 , 21 , 22 ], which consider a minimum set of parameters such as nodule scale, morphology, texture, and location [ 3 ]. Although low-dose CT is an effective approach for more accurate identification of smaller nodules and early recognition of lung cancer [ 23 , 24 ], the low signal-to-noise ratio in CT can cause misclassification of areas with weak or unusual contours. Furthermore, lung cancer diagnoses made from CT typically depend on the observer's perspective, fatigue, and mood, leading to varying results [ 3 , 25 ]. Therefore, a decision-making mechanism for identifying nodule malignancy is a very useful tool to help physicians plan potential treatments [ 3 , 26 ]. Preprocessing, lung parenchyma segmentation, nodule identification, and false-positive (FP) reduction are the four stages of most nodule detection approaches [ 19 , 27 ]. This study aimed to present an appropriate approach to assist radiologists in their examinations. The remainder of this section discusses work related to the proposed approach [ 28 ]. Manickavasagam and Selvan [ 29 ] proposed a method for the diagnosis of nodules and the classification of the level of lung cancer. In their research, a Naïve Bayes classifier was used to classify input images as abnormal or normal, and a Neuro-Fuzzy classifier combined with the Cuckoo Search algorithm was provided to detect the four levels of lung cancer [ 29 ]. Convolutional neural networks have achieved accurate results in pulmonary nodule detection. In the research conducted by Ali et al. [ 30 ], a transferable texture convolutional neural network (CNN) for classifying pulmonary nodules in CT scans was proposed; in their architecture, an Energy Layer (EL) was developed to extract texture attributes from the convolutional layer. Additionally, Tajbakhsh et al.
used an artificial neural network and a CNN to detect pulmonary nodules in CT images, and experimental results confirmed that the performance of the CNN was higher than that of the artificial neural network [ 31 ]. Li et al. [ 32 ] introduced a strategy for the detection of pulmonary nodules based on CT images and presented an approach based on wavelet dynamic analysis for extracting and repairing the lung parenchyma, excluding noise interference outside the lung parenchyma. After locating the lung nodules, a CNN model based on a genetic optimization algorithm was introduced to extract features from the CT images of pulmonary nodules. Techniques such as histograms of oriented gradients, wavelet-transform-based features, and local binary patterns were used to extract the best features for locating lung cancer nodules [ 33 ], and a state-of-the-art fuzzy particle swarm optimization CNN was applied to classify the selected features. In the study by Li et al. [ 19 ], a faster region-based CNN was developed based on parameter optimization, spatial three-channel input construction, and transfer learning for locating the regions of lung nodules; they also introduced an FP reduction approach based on anatomical characteristics to reduce FPs while preserving the true nodules. Zhai et al. [ 34 ] used adaptive border marching and developed rules to segment the lung parenchyma and candidate nodules; after classifying eleven types of gray-level and geometric features of candidate nodules with a fuzzy min-max neural network, they achieved a diagnostic sensitivity of 84%. Yan et al. [ 35 ] examined three CNNs with different inputs: a 2D slice-level CNN, and 2D and 3D nodule-level CNNs, with accuracies of 86.7%, 87.3%, and 87.4%, respectively; the results showed that the 3D CNN performed best. Wang et al.
[ 36 ] proposed a central-focused CNN to segment lung nodules from heterogeneous CT images and to extract 3D and 2D features of lung nodules. For the classification of CT voxels, Wang et al. also introduced a novel pooling layer that preserves more information around the voxel patch center. Harsono et al. [ 9 ] presented a novel lung nodule detection and classification model based on the I3DR-Net one-stage detector. In the research conducted by Veronica et al. [ 5 ], the ELCAP (Early Lung Cancer Action Program) lung image database was analyzed for lung nodule detection, and Fuzzy C-Means (FCM) was used to segment the potential nodules; an artificial neural network (ANN) based on weight optimization was then developed to categorize the images as detected nodules or normal lung. El-Bana et al. [ 37 ] introduced a two-stage framework, comprising a semantic segmentation stage followed by localization and classification, for the automated identification of malignant pulmonary nodules in low-dose CT scans. The DeepLab model was used for semantic segmentation and was evaluated with two network backbones, MobileNet-V2 and Xception; the outputs were then classified using a Region-Based CNN (RCNN) and a Single Shot Detector (SSD) with an Inception-V2 backbone. Bonavita et al. [ 3 ] evaluated and integrated a 3D CNN for pulmonary nodule malignancy classification into an existing automated lung cancer identification pipeline. In the present study, an improved hybrid approach is proposed for the efficient detection of pulmonary lung nodules using CT scans. Nodule segmentation preprocessing, including the normalization of CT scans and the conversion of pixel values to the corresponding radiodensity in HU, was performed to prepare the input CT scans for the U-Net CNN model. Finally, a U-Net CNN trained on lung CT scans was used for nodule identification.

Material and Methods

In this experimental study, the U-Net CNN architecture was applied to the input images to locate the nodules. In the model training process, the extracted data were labeled and divided into two categories, a training set and a validation set, and a second model was trained to reduce the false positives among the identified nodules.

Data Description

A dataset including labeled nodule locations for image segmentation was used. The lung imaging database [ 38 ] includes digital images obtained from 1018 patients in lung CT scan format. Four experienced lung radiologists analyzed the scans to annotate the nodules in the data set. For each patient sample, these annotations consisted of the nodule’s area, diameter, and X, Y, and Z coordinates. Each patient had 100-200 CT scan sections of 512×512 pixels, forming a three-dimensional composition of the lung. The pixel values correspond to the radiodensity in HU, ranging from -1000 HU for air to 700-3000 HU for bone. The major lung tissues range from -500 to 0 HU. Nodules can range from -500 HU to more than 200 HU, which is a vital diagnostic criterion for radiologists. Figure 1 illustrates a lung CT scan and the radiodensity in HU for lung tissue and a lung nodule. Additionally, Figure 2 shows a scheme of the proposed method.
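As context for the HU ranges above: raw CT pixel values are conventionally converted to Hounsfield Units with the per-slice DICOM rescale parameters. A minimal sketch (the slope and intercept defaults shown are common illustrative values, not taken from this paper):

```python
import numpy as np

def to_hounsfield(raw, rescale_slope=1.0, rescale_intercept=-1024.0):
    """Convert raw CT pixel values to Hounsfield Units (HU) using the
    rescale slope/intercept stored with each DICOM slice."""
    return raw.astype(np.float32) * rescale_slope + rescale_intercept

# Rough HU landmarks mentioned in the text:
#   air ~ -1000 HU, lung tissue ~ -500..0 HU, nodules ~ -500..200+ HU, bone ~ 700..3000 HU
raw = np.array([[0, 524], [1024, 2024]], dtype=np.int16)  # toy raw values
hu = to_hounsfield(raw)
print(hu)  # [[-1024.  -500.] [    0.  1000.]]
```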
Figure 1

a) A lung computed tomography (CT) scan with an arrow pointing to a labeled nodule coordinate, b) radiodensity of the lung tissue and lung nodule (cut off below -500 HU)

Figure 2

Proposed method model


Data Preprocessing

Nodule segmentation preprocessing, including the normalization of CT scans, was performed to prepare the input CT scans for the U-Net CNN model. The CT scans were normalized using mean normalization to fix the issue of large differences in feature magnitude, which lead to slow training and convergence. Normalization was achieved by subtracting the mean from the images and dividing by the standard deviation, applied column-wise according to Equation (1):

(1) X_norm = (X - Mean(X)) / Stdev(X)

where X is the image, Mean(X) is the simple average of the pixel values, and Stdev(X) is the standard deviation, a measure of how spread out the values are. Lung segmentation was then performed by limiting the radiodensity values, and the number of pixels on the lung walls was increased to retain nodules attached to the walls. The nodule coordinates created by the radiologists for the LIDC-IDRI dataset (the Lung Image Database Consortium and Image Database Resource Initiative) were used to generate the nodule mask and a nodule region-of-interest mask as the segmentation label. The radiodensity threshold was set to -500 HU to dispose of any selected non-lung tissue. All nodules under 25 mm² were removed from the input data, since these were unlikely to be malignant; this also reduced training time and produced a more robust model. In addition, all slices without any nodules were eliminated from the input data. Figure 3 shows the processed lung image after normalizing and masking off areas that are not lung tissue, together with the corresponding nodule mask generated as a label for nodule segmentation in the U-Net CNN.
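The normalization and thresholding steps above can be sketched as follows; this is a minimal illustration of Equation (1) and the -500 HU cutoff, not the authors' code:

```python
import numpy as np

def mean_normalize(img):
    # Equation (1): subtract the mean, divide by the standard deviation
    return (img - img.mean()) / img.std()

def apply_lung_threshold(hu_slice, threshold=-500.0):
    # Clamp values below the -500 HU threshold used in the paper,
    # removing air and other non-lung content before mask generation
    out = hu_slice.copy()
    out[out < threshold] = threshold
    return out

slice_hu = np.array([[-1000.0, -400.0], [0.0, 200.0]])
masked = apply_lung_threshold(slice_hu)   # air pixel clamped to -500 HU
norm = mean_normalize(masked)             # zero mean, unit variance
```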
Figure 3

a) Lung image processed after normalization and masking of areas that are not lung tissue. b) The nodule mask designed as a label for nodule segmentation in the U-Net convolutional neural network.

After processing the input images for nodule segmentation with the U-Net CNN, nodule malignancy was classified. For this stage, nodules were prepared as 64×64 crops for training a convolutional neural network. For each patient, the largest of the extracted nodules was selected, and a 64×64 crop of it was created using the centroid coordinates. Mean normalization was then applied to the crops.
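The 64×64 centroid crop can be sketched as below; the clamping of the window at the image borders is an assumption for illustration, not a detail described in the paper:

```python
import numpy as np

def crop_nodule(slice_2d, cx, cy, size=64):
    """Crop a size×size patch centered on the nodule centroid (cx, cy),
    clamping the window so it stays inside the image."""
    half = size // 2
    x0 = int(min(max(cx - half, 0), slice_2d.shape[1] - size))
    y0 = int(min(max(cy - half, 0), slice_2d.shape[0] - size))
    return slice_2d[y0:y0 + size, x0:x0 + size]

# A centroid near the image corner still yields a full 64×64 patch
patch = crop_nodule(np.zeros((512, 512)), cx=500, cy=10)
print(patch.shape)  # (64, 64)
```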

The Proposed Deep Learning Approach

In the current research, the U-Net CNN architecture was employed, taking an image as input and producing a mask for the output area. The network operates on a feature vector similar to that of a CNN and predicts the mask from the given feature vector. The architecture is divided into two main parts: 1) the left part, referred to as the contracting path, built from convolutional operations, and 2) the right part, the expansive path, built from transposed two-dimensional convolutional layers. Each step consists of two convolutional layers, and the number of channels changes as the depth of the image is increased by the convolutions. A max-pooling operation was applied to downsize the image; in this study, this operation was repeated 3 times. In the expansive path (right part), transposed convolution was used as an upsampling technique that expands the dimensions of the images. After each transposed convolution, the enlarged image was concatenated with the corresponding image from the contracting path. As in the contracting path, this procedure was repeated 3 times.
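The shape arithmetic of the two paths can be traced without any deep-learning framework. This sketch assumes 'same'-padded convolutions (an assumption; the paper does not state the padding), so only the 2×2 max pooling and stride-2 transposed convolutions change the spatial size:

```python
def unet_shape_trace(size=512, depth=3):
    """Trace the spatial size of an input through `depth` max-pool steps
    of the contracting path and `depth` transposed-convolution steps of
    the expansive path."""
    contracting = [size]
    for _ in range(depth):       # 2x2 max pooling halves each dimension
        size //= 2
        contracting.append(size)
    expansive = []
    for _ in range(depth):       # stride-2 transposed conv doubles it back
        size *= 2
        expansive.append(size)
    return contracting, expansive

down, up = unet_shape_trace()
print(down)  # [512, 256, 128, 64]
print(up)    # [128, 256, 512]
```

At each expansive step, the doubled feature map has the same spatial size as its contracting-path counterpart, which is what makes the concatenation described above possible.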

Convolutional Network for nodule classification

Artificial intelligence has been revolutionized by the latest advances in deep-learning methods [ 39 , 40 ]. The basic structure of a CNN consists of three main parts: 1) a convolutional layer that extracts features by applying filters to the inputs, 2) a pooling layer that reduces the size for computational efficiency, and 3) a fully connected layer. The internal parameters are fine-tuned for a particular task, such as class or object recognition [ 41 ]. In this implementation, two-dimensional convolutional layers were used because of their efficiency with visual data. The parameter complexity and computational cost of the standard convolutional algorithm are defined as follows:

(2) Parameter complexity: O(K²)

(3) Computational cost: O(K² × f²)

where K represents the dimension of the kernel, considered 3×3 in the proposed model, and f indicates the dimension of the input to the convolutional layer; in the first layer, images of resolution 512×512 are the input. The model used 2 convolutional layers with 8 and 16 filters, respectively. After the 2 convolutional layers, a pooling layer with a 2×2 filter and stride 2 was used; this max-pooling layer reduces the size of the network's feature maps and the number of parameters. A CNN may overfit during training; accordingly, two Dropout layers were used to prevent this problem. The Dropout layer randomly resets some columns of a weight matrix in the network. The ReLU activation function was used to scale the outputs in the network, and the Adam optimizer was selected for the proposed CNN. Binary labels (cancer and non-cancer) were assigned to the nodules, and the CNN extracts the image features associated with each label class.
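Assuming a single-channel CT input and bias terms (both assumptions; the paper states only the kernel size and filter counts), the parameter counts of the two described convolutional layers work out as:

```python
def conv2d_params(k, c_in, c_out, bias=True):
    # K*K weights per (input channel, output channel) pair,
    # plus one bias per output filter
    return k * k * c_in * c_out + (c_out if bias else 0)

# The two convolutional layers described above: 3x3 kernels, 8 then 16 filters
p1 = conv2d_params(3, 1, 8)    # 3*3*1*8 + 8  = 80
p2 = conv2d_params(3, 8, 16)   # 3*3*8*16 + 16 = 1168
print(p1, p2)  # 80 1168
```

This illustrates the O(K²) dependence of Equation (2): the parameter count is independent of the 512×512 input resolution, which enters only the computational cost of Equation (3).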

Results

Several suitable criteria were used to assess the outcomes and demonstrate the accuracy of the proposed model in this research.

Image segmentation

The Dice coefficient was used as the loss function for image segmentation with the U-Net CNN. By minimizing the negative Dice coefficient, the model attains maximal overlap between the predicted mask and the ground truth mask:

(4) Dice = 2TP / (2TP + FP + FN)

where TP is defined as true positive, FN as false negative, and FP as false positive.
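Equation (4) can be computed directly on binary masks; a minimal NumPy sketch (the small epsilon guarding against empty masks is an implementation convenience, not from the paper):

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    # Equation (4): Dice = 2*TP / (2*TP + FP + FN) on binary masks
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return 2 * tp / (2 * tp + fp + fn + eps)

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
d = dice_coefficient(pred, truth)  # TP=1, FP=1, FN=1 -> Dice = 0.5
```

Training would minimize the negative of this quantity, so that the reported 0.678 corresponds to a 67.8% overlap between predicted and ground truth masks.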

Nodule extraction

Sensitivity is useful for calculating the nodule extraction hit rate, i.e., the percentage of true nodules found. A TP here refers to a predicted mask whose intersection with the mask generated from the nodule coordinates sums to more than 1 pixel. A second useful statistic to minimize is the average number of false positives per scan.

(5) Sensitivity = TP / (TP + FN)

Classification Accuracy

Classification accuracy is the proportion of correct predictions and is the most common evaluation criterion for classification problems:

(6) Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP is true positive, TN is true negative, FN is false negative, and FP is false positive.
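Equations (5) and (6) as plain functions. The confusion-matrix counts in the usage lines are illustrative only, except the 75/25 split, which reproduces the 75% sensitivity reported for the U-Net:

```python
def sensitivity(tp, fn):
    # Equation (5): fraction of true nodules that were detected
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    # Equation (6): fraction of all predictions that were correct
    return (tp + tn) / (tp + tn + fp + fn)

sens = sensitivity(75, 25)        # 0.75, matching the reported hit rate
acc = accuracy(75, 845, 55, 25)   # 0.92 with illustrative counts
print(sens, acc)
```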

Discussion

Validation and evaluation of the model

Model 1: U-Net CNN for nodule segmentation

Figure 4 provides a comparison between the predictions of the proposed U-Net and the ground truth labels annotated by the radiologists.
Figure 4

a) Processed computed tomography image, b) ground truth label, c) predicted label

The data set was divided into two categories, training and validation, with an 80:20 ratio. The U-Net model converged to a Dice coefficient of 0.678 in 20 epochs, indicating a 67.8% overlap between the predicted and ground truth nodule masks (Figure 5); 75% of the predicted masks overlapped the ground truth masks by at least one pixel. Both sensitivity and the number of false positives per scanned sample are required to identify the location of the nodules reliably. There were 11.1 FPs per TP, which was further reduced by the second model.
Figure 5

Convergence of the Dice coefficient to 0.678, indicating a 67.8% overlap between the predicted nodule masks and the ground truth nodule masks.


Model 2: CNN for reducing the FPs of detected nodules

To train the proposed CNN to reduce the false positives among detected nodules, the extracted data were labeled and randomly divided into a training set and a validation set with a 70:30 ratio. The classification accuracy on each data set was used to evaluate the predictions and describe the results. The proposed CNN was trained for a total of 20 epochs on the training set and attained 94.58% accuracy and 0.1335 loss during training. On the validation set, the proposed model reached 84.37% accuracy and 0.4118 loss. After nodule classification, the false-positive rate decreased from 11.1 to 2.32 per TP. Table 1 presents the performance of the U-Net model before and after classification.
Table 1

The performance of the U-Net before and after the nodule classification

                                  Sensitivity   Avg. # of FPs per scan   # of FPs per TP
Before classifying nodules        0.75          0.060                    11.1
After classifying nodules         0.65          0.011                    2.32

FP: False Positive, TP: True Positive

In the study by Zhang et al., a two-step U-Net CNN segmentation method was proposed for different types of lung nodules in computed tomography images, providing a comparable Dice coefficient (0.8623) for their segmentation algorithm [ 42 ]. Banu et al. presented a fully automated deep-learning framework, including lung nodule detection and segmentation models; their proposed model achieved Dice scores of 89.79% and 90.35%, and intersection-over-union values of 82.34% and 83.21% [ 43 ]. On the other hand, Monkam et al. showed that the overall performance of CNN models depends significantly on the number of convolutional layers and the size of the patches, and revealed that a CNN model with two convolutional layers achieved the best performance, with 88.28% accuracy, 0.87 AUC, 83.45% F-score, and 83.82% sensitivity [ 44 ]. Considering the accuracy obtained in this research, the increase in accuracy can be attributed to the use of a combined model.

Conclusion

Overall, the end-to-end solution for estimating a cancer diagnosis within a year of a CT scan required multiple steps with multiple models, including image processing, nodule mask development, nodule identification, nodule false-positive reduction, feature extraction, and malignancy classification. Several steps limited the accuracy of the prediction: 1) the generation of nodule masks was complicated by the fact that many nodules lack well-defined edges, and the decision to set the nodule threshold at -500 HU may have eliminated certain features from nodules; 2) after the false-positive reduction procedure, the U-Net could locate only 65 percent of the nodules; 3) features were extracted only from the largest nodule in each patient; and 4) the predicted nodule masks from the U-Net contained errors that may prevent the generation of accurate features. Because features were extracted from the largest nodule, it was necessary to verify that the extracted features matched those in the literature; the probability of cancer as a function of diameter was very close to the predictions in the literature. In further research, we intend to use a dataset with nodules categorized by radiologists as malignant or non-malignant to train a model that better classifies malignancy. Moreover, adjusting the size of the filters and modifying the shape of the model, including adding layers and residual blocks and strengthening the data augmentation step, will be investigated to raise the accuracy of the proposed model above 95%. Further, we plan to use the latest advances in convolutional neural network research to refine the proposed architecture for handling deformed and irregular shapes and patterns. Notably, the largest nodule is not necessarily the malignant one; instead of isolating the largest nodule of each patient, it might be more informative to average over the largest nodules.

Acknowledgement

The authors would like to thank all experts who freely participated in this study.

Authors’ Contribution

A. Badie and A. Noshad conceived the idea. The introduction of the paper was written by A. Noshad and M.A. Moragheb. A. Badie and M.A. Moragheb gathered the images and the related literature and also helped with the writing of the related works. The comparison with the thematic literature was performed by A. Badie. The method implementation was carried out by A. Noshad. All analyses were carried out by A. Noshad and A. Badie. All authors read, modified, and approved the final version of the manuscript.

Ethical Approval

In this study, a set of images of the Lung Image Database Consortium (LIDC-IDRI) was used, which includes diagnostic computed tomography scans of the chest and lung cancer screening.

Conflict of Interest

None
