
A Novel Deep Learning Model to Distinguish Malignant Versus Benign Solid Lung Nodules.

Shuwen Wang¹, Leilei Zhou¹, Xiaoran Li², Jie Tang³, Jing Wu¹, Xindao Yin¹, Yu-Chen Chen¹, Lingquan Lu¹.

Abstract

BACKGROUND In this study we aimed to establish a new transfer learning model based on noncontrast and thin-layer computed tomography (CT) scans to distinguish between malignant and benign solid lung nodules. MATERIAL AND METHODS CT images from 202 patients with 210 lesions (malignant: 127, benign: 83) manifesting as solid lung nodules from January 2016 to December 2020 from 3 institutions were retrospectively collected, and each nodule was histopathologically confirmed. Two experienced thoracic radiologists reviewed all images and determined the regions of interest (ROIs) in the three-dimensional (3D) images layer-by-layer. We divided the lesions and images into training and testing sets at a ratio of 7:3. A pretrained Inception V3 model was trained on the training dataset. Five-fold cross-validation was used to choose the optimal model. Receiver operating characteristic (ROC) curves were plotted to evaluate the performance of the models. RESULTS In the validation set, the AUC, accuracy, sensitivity, and specificity of the Inception V3 model (lesion-level) were 0.999, 0.989, 0.983, and 1.0, respectively, which were higher than the image-level values (0.977, 0.933, 0.922, and 0.948, respectively). The Inception V3 model (lesion-level) performed better than the image-level model, but there was no significant difference between the models (P>0.05). The ResNet50 model based on image level achieved an AUC, accuracy, sensitivity, and specificity of 0.963, 0.926, 0.916, and 0.944, respectively, which were lower than those of Inception V3. CONCLUSIONS Our study developed a novel deep learning model based on noncontrast and thin-layer CT scans to classify benign vs malignant lung nodules, and the Inception V3 model greatly improved the differentiation accuracy and specificity.

Year: 2022 PMID: 35903037 PMCID: PMC9344882 DOI: 10.12659/MSM.936830

Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010

Background

Lung cancer has long been a leading cause of death. Early and accurate diagnosis plays an important role in survival since only 15% of lung cancers are diagnosed at an early stage [1,2]. CT is widely used as a tool for physical examinations due to its high performance in lung nodule screening. According to the Lung CT Screening Reporting and Data System (Lung-RADS version 1.1) [3], solid nodules scoring more than 4 points, which means the size is more than 8 mm, are suspicious for malignancy, and additional diagnostic testing is recommended. Nodule management guidelines [4] also recommend that the follow-up interval is a range rather than a precise time point. Apart from size, radiologists also base their presumptive diagnosis on radiographic features such as spiculation, lobulation, and pleural reaction, among others. However, overdiagnosis and a high false-positive rate (FPR) remain [5,6] due to the misclassification of these lesions, which is related to their similar appearance on CT images and to scan parameters in terms of slice thickness and contrast enhancement of various structures within the image [7,8]. Therefore, clinicians still rely on histological testing, which is an invasive procedure, as the criterion standard, and this sometimes results in technical challenges or complications. Radiomics and deep learning (DL) models have been widely proposed and developed for application in medical fields. The former consists of a series of processes, including lesion segmentation, radiomics feature extraction and selection, and model establishment and evaluation. Hawkins et al [9] extracted 23 features for predicting malignant nodules (AUC=0.83). Chen et al [10] obtained an accuracy of 0.84 in nodule classification by developing a support vector machine (SVM) algorithm using 72 cases.
Unlike radiomics, deep learning classifiers involve training, testing, and performance evaluation [11], and many techniques are used, such as convolutional neural networks (CNNs) [12], unsupervised learning [13], and transfer learning [14]. Transfer learning allows for pretraining on a very large dataset of images and then tuning the resulting model using specific samples, which is useful for classification to train a stable, unbiased, and non-overfitting deep learning architecture from the very beginning [15]. Inception and ResNet models are the most frequently used in medical studies. The Inception V3 model performed well in pathological classification of NSCLC [16], breast cancer [17], and knee injury on MRI images [18], and the ResNet50 model is widely applied in diagnosis of brain diseases [19], classification of skin lesions [20], and coronary artery calcium detection [21]. However, deep learning algorithms require intensive computational training and more expertise for tuning. To achieve better performance, many investigators have attempted to optimize the models by adjusting the input images. For instance, a multi-CNN model [22] reached an accuracy of 0.87 for binary nodule classification based on multiple down-sampling. Juan et al [23] proposed use of ML-xResNet to classify different types of lung nodule malignancies and achieved 92.19% accuracy. Nasrullah et al [24] obtained a sensitivity of 94% when combining ML-xResNet with clinical factors. A 3D fully-convolutional neural network for reduction of the false-positive rate in lung nodule classification was created as well [25]. Additionally, there is still room for improvement with respect to image acquisition parameters. Some researchers have shown that radiomic features are strongly affected by varying acquisition parameters [26-28]. Dilger et al [29] improved pulmonary nodule classification by combining the radiomics of intra- and perinodular regions (AUC=0.80).
He et al [28] showed that thin-slice and noncontrast CT can provide more accurate information on radiomics features. Dou et al [30] showed that combining the radiomics of perinodular regions can be used to predict lymph node metastasis. Many investigators [31-33] have attempted to distinguish granulomas from malignancies using quantitative radiomics or computerized feature-based analysis as the main concept in DL approaches, which is to expose the network to every possible combination of imaging acquisition parameters. It is imperative to look further into computer-aided diagnosis to develop novel, accurate approaches to aid in monitoring individuals with pulmonary nodules and to allow for safe and cost-effective diagnosis while preventing unnecessary procedures in cases of benign growths. In this study, 2 modern deep learning models that take advantage of transfer learning and use thin slices and noncontrast CT images were designed for nodule classification. Thus, the model could potentially provide a better prediction of the clinical outcomes of lung nodules, even for very small nodules, intending to considerably improve patient prognosis.

Material and Methods

Patients

First, we retrospectively reviewed the electronic pathological records of patients with solid nodules in the lung between January 2016 and December 2020. The inclusion criteria were: 1) histopathology-confirmed, 2) solid nodule in the lung, 3) lesion size ≥6 mm in axial CT images, and 4) without treatment. Then, 359 patients with 367 nodules were included. The exclusion criteria were: 1) CT images with a slice thickness of more than 3 mm and 2) CT images taken with contrast medium. Finally, a total of 202 patients (125 males and 77 females, age range: 24–83 years) were enrolled, including 127 malignant (104 adenocarcinoma and 23 squamous cell carcinoma) and 83 benign (3 organizing pneumonia, 41 tuberculosis, 13 pulmonary cryptococcosis, 6 inflammatory pseudotumors, and 20 granulomas) nodules. As the sources of nodule datasets, we collected data on 73 patients (49 males, 24 females) with 78 nodules from Nanjing First Hospital. In this cohort, 55 malignant and 23 benign nodules were included. Then, we collected data on 98 patients (69 males, 29 females) with 98 nodules from Gaochun People’s Hospital, which contained 72 malignant and 26 benign nodules. Finally, we included 31 patients (3 males and 28 females) with 34 benign nodules from Nanjing Second Hospital. Clinical characteristics included sex (male or female), age (mean ± SD), nodule size (0.6–1.0 cm, 1.0–3.0 cm, and >3.0 cm), location, lobulation (absent/present), and spiculation (absent/present). The workflow is shown in Figure 1.

Figure 1

The workflow of the collection of patients. Created using Microsoft Office Visio 2016, China.

CT Image Acquisition and Image Annotation

CT Image Acquisition Parameters

This was a multi-center study. CT examinations without contrast medium were acquired from Nanjing First Hospital and Nanjing Second Hospital using Philips Ingenuity CT Core 128 scanners (Philips Medical Systems, Haifa, Matam) and Philips Brilliance 64 CT scanners (Philips Medical Systems, Jiangsu, China), while images taken with GE LightSpeed RT16 CT scanners (GE Medical Systems, Milwaukee, WI, USA) were acquired from Gaochun People’s Hospital. All CT scans were performed with a fixed-tube voltage of 120 kVp and X-ray tube current exposure time of 30–300 mAs. The pixel spacing of the CT image ranged from 0.625 to 0.843 mm depending on patient size, and the reconstruction slice thickness was 1 mm or 3 mm. The slice images were reconstructed with a matrix of 512×512 pixels. Two thoracic radiologists (Li and Wu, with 5 and 10 years of experience in chest image interpretation, respectively) who were blinded to the results independently reviewed all CT images on the Picture Archiving and Communication Systems (PACS) and reached a consensus by discussion in case of disagreement. Lesion size, spiculation, and lobulation were evaluated as CT morphological features.

CNN Input Image Annotation

The thin-slice, noncontrast images were downloaded from PACS and stored in DICOM format, and the window width and level were set at 750 HU and −500 HU, respectively. The images were subsequently imported into ITK-SNAP software (version 3.8.0 http://fsf.org/), and ROIs for the 3D images were delineated layer-by-layer by 2 thoracic radiologists. The numbers of lesions and manually annotated CT images are shown in Table 1.
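The window setting described above can be illustrated with a short sketch. It assumes the common convention that a window of width W and level L maps the interval [L − W/2, L + W/2] onto the displayable range; the function name `apply_window` and the rescaling to [0, 1] are illustrative assumptions, not details reported by the authors.

```python
def apply_window(hu_values, width=750.0, level=-500.0):
    """Clip HU values to the display window and rescale them to [0, 1].

    Hypothetical helper: the defaults match the width/level quoted in the
    text; the [0, 1] output range is an assumption made for illustration.
    """
    lower = level - width / 2.0  # -875 HU with the default settings
    upper = level + width / 2.0  # -125 HU with the default settings
    scaled = []
    for hu in hu_values:
        clipped = min(max(hu, lower), upper)  # saturate outside the window
        scaled.append((clipped - lower) / (upper - lower))
    return scaled


# Air, the window edges, the window centre, and soft tissue.
example = apply_window([-1000, -875, -500, -125, 40])
```

With these settings, everything denser than −125 HU saturates to the brightest value, so the window mainly preserves contrast within lung parenchyma and at nodule margins.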

Table 1

Image data of patients.

	Patients	Lesions	Images
Benign	76	83	2120
Malignant	126	127	2888
Total	202	210	5008

Lesions – pulmonary nodules; Images – manually annotated nodule CT images.

Establishment of a Model

Two deep learning models (Inception V3 and ResNet50) were trained and optimized to distinguish between benign and malignant lung nodules. We set 2 nodes (benign/malignant) in the SoftMax layer, and the other structures of the models remained the same. All models were pretrained on ImageNet [34], which includes 300 000 000 natural images, and were then trained with the combined set of training images from both datasets. Five-fold cross-validation was used to choose the optimal model. A total of 147 lesions were randomized into 5 approximately equal subgroups, and 63 lesions were included in the test group. Five different combinations of subsets were analyzed. Finally, the result of each subset and the mean result of all subsets were reported. CT images with discordant interpretations made by the deep learning models and radiologists are shown in Figure 2.
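The split described above (63 test lesions, 147 lesions in 5 cross-validation subgroups) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed seed, and the use of plain random shuffling without class stratification are assumptions. Splitting by lesion rather than by image keeps all slices of one nodule in the same subset.

```python
import random


def split_lesions(lesion_ids, n_test=63, n_folds=5, seed=0):
    """Shuffle lesion IDs, hold out a test set, and split the rest into folds."""
    ids = list(lesion_ids)
    random.Random(seed).shuffle(ids)
    test_ids = ids[:n_test]
    remaining = ids[n_test:]
    # Striding deals the remaining lesions into n_folds near-equal folds.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return test_ids, folds


def cross_validation_rounds(folds):
    """Yield (train_ids, val_ids); each fold is the validation set once."""
    for k, val_ids in enumerate(folds):
        train_ids = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train_ids, val_ids


lesion_ids = list(range(210))  # placeholder IDs for the 210 lesions
test_ids, folds = split_lesions(lesion_ids)
rounds = list(cross_validation_rounds(folds))
```

Because 147 is not divisible by 5, the folds in this sketch hold 29 or 30 lesions each.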

Figure 2

CT images with discordant interpretations between deep learning models and radiologists. (A) Model false-positive: a 63-year-old male patient with a histologically confirmed benign nodule that the model predicted to be malignant. (B) Model false-negative: a 47-year-old male patient with a histologically confirmed malignant nodule that the model predicted to be benign.

Statistical Analysis

SPSS software (v.25.0; IBM) was used to analyze the clinical characteristic data. The χ2 test was used to assess the statistical significance of qualitative data, including sex, lesion location, lobulation, and spiculation. An independent t test was used to compare the quantitative data, including age and size. The performance of the proposed models for nodule classification was evaluated using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). AUC values range from 0.5 to 1. The DeLong test was used to compare the AUCs of the models. A P value of less than 0.05 was considered statistically significant. A receiver operating characteristic (ROC) curve, a graphical technique for describing and comparing the accuracy of diagnostic tests, was used to evaluate the sensitivity and specificity of the models. Matplotlib was used to generate the ROC curves.
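The performance measures named above follow from the confusion matrix and from the ranking of predicted scores. The sketch below is illustrative only: the function names and the toy labels and scores are assumptions, malignant is coded as 1, and both classes are assumed to be present. The AUC is computed as the fraction of malignant/benign pairs in which the malignant case receives the higher score, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, PPV, and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 3 malignant (1) and 2 benign (0) cases with model scores.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sweeping the 0.5 threshold over all score values and plotting sensitivity against 1 − specificity would trace the ROC curve whose area `roc_auc` returns.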

Results

Clinical Characteristics and Subjective CT Features of Nodules

In this study, 28.3% of the malignant nodules and 31.3% of the benign nodules were located in the right upper lobe. A high proportion of nodules (59.1% of malignant and 62.6% of benign) measured between 11 mm and 30 mm. Spiculation was found in 48.2% of benign nodules and lobulation was found in 43.4%. There was a significant difference between the malignant and benign nodules in terms of subjective CT features, including spiculation and lobulation (P<0.001), whereas age and sex showed no difference (P>0.05). The patient characteristics are presented in Table 2.

Table 2

Characteristics of solid nodules in the lung.

	Malignant nodules (n=127)	Benign nodules (n=83)	P value
Gender n (%)			0.078
Male	67 (52.8)	54 (65.1)
Female	60 (47.2)	29 (34.9)
Age (mean±SD)	65.17±8.09	62.31±11.6	0.053
Location n (%)			0.172
Left upper lobe	30 (23.6)	25 (30.1)
Left lower lobe	19 (15.0)	11 (13.3)
Left mixed	2 (1.6)	0
Right upper lobe	36 (28.3)	26 (31.3)
Right middle lobe	9 (7.1)	0
Right lower lobe	27 (21.3)	21 (25.3)
Right mixed	4 (3.1)	0
Lobulation n (%)			<0.001*
Absent	29 (22.8)	47 (56.6)
Present	98 (77.2)	36 (43.4)
Spiculation n (%)			<0.001*
Absent	15 (11.8)	43 (51.8)
Present	112 (88.2)	40 (48.2)
Size (mm)			0.904
6–10	11 (8.7)	8 (9.6)
11–30	75 (59.1)	52 (62.6)
>30	41 (32.3)	23 (27.7)
Median size (range)	26.65 (7–100)	24.36 (6–67)
Pathology n (%)			/
ADC	104 (81.9)
SCC	23 (18.1)
OP		3 (3.6)
TB		41 (49.4)
PCP		13 (15.6)
IPT		6 (7.2)
Granuloma		20 (24.1)

ADC – adenocarcinoma; SCC – squamous cell carcinoma; OP – organizing pneumonia; TB – tuberculosis; PCP – pulmonary cryptococcosis; IPT – inflammatory pseudotumor; Granuloma – histopathologically confirmed granuloma of unspecified etiology.

* P<0.05.
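As a worked check of the χ2 comparison for lobulation, the Pearson statistic can be computed directly from the counts in Table 2. This is a sketch under stated assumptions: the function name is illustrative, no continuity correction is applied (statistical packages may report a corrected value as well), and 10.828 is the standard critical value for 1 degree of freedom at the 0.001 level.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Lobulation counts from Table 2: rows = absent/present,
# columns = malignant/benign.
lobulation = [[29, 47],
              [98, 36]]
stat = chi_square_statistic(lobulation)

CRITICAL_VALUE_P001_DF1 = 10.828  # chi-square critical value, df = 1
is_significant = stat > CRITICAL_VALUE_P001_DF1
```

The statistic exceeds the critical value, which is consistent with the P<0.001 reported for lobulation.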

Evaluating Performance of the Models

We developed Inception V3 models separately at the image level and at the lesion level. Their performance is shown in Table 3. There was no significant difference between the 2 models in the training group or the test group (P>0.05). The ResNet50 model was built at the image level, and we compared the results of the Inception V3 and ResNet50 models (Table 4). The ROC curves of the 2 models exhibited high predictive ability, with the true-positive rate close to 1 (Figure 3). There was likewise no significant difference between these 2 models in the training group or the test group (P>0.05). In addition, the diagnostic accuracy of both the Inception V3 and ResNet50 models was very high, and Fisher’s exact test showed no significant difference (P>0.05).
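The text does not state how lesion-level predictions were derived. One common approach, shown here purely as an assumption and not as the authors' method, is to average the image-level malignancy probabilities over all annotated slices of a lesion and threshold the mean. The function name, the 0.5 threshold, and the toy records are hypothetical.

```python
from collections import defaultdict


def aggregate_to_lesions(image_records, threshold=0.5):
    """Average per-image probabilities within each lesion and threshold them.

    image_records: iterable of (lesion_id, malignancy_probability) pairs.
    Returns {lesion_id: (mean_probability, predicted_label)}.
    """
    grouped = defaultdict(list)
    for lesion_id, probability in image_records:
        grouped[lesion_id].append(probability)
    result = {}
    for lesion_id, probabilities in grouped.items():
        mean_probability = sum(probabilities) / len(probabilities)
        result[lesion_id] = (mean_probability,
                             int(mean_probability >= threshold))
    return result


# Hypothetical slice-level outputs for two lesions.
records = [("A", 0.9), ("A", 0.7), ("A", 0.4), ("B", 0.2), ("B", 0.6)]
lesion_predictions = aggregate_to_lesions(records)
```

In this toy case lesion A is called malignant even though one of its slices scores below the threshold, which shows how pooling slices can smooth out isolated image-level errors.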

Table 3

Results of Inception V3 model.

Models	Training set						Validation set
Models	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV
Lesion-level	0.998	0.984	0.974	1.0	1.0	0.962	0.999	0.989	0.983	1.0	1.0	0.969
Image-level	0.978	0.936	0.913	0.964	0.969	0.899	0.977	0.933	0.922	0.948	0.960	0.901

PPV – positive predictive value; NPV – negative predictive value; AUC – area under the curve.

Table 4

Five-fold cross-validation results of InceptionV3 and ResNet50 models (image-level).

	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV
Inception V3	0.977	0.933	0.922	0.948	0.960	0.901
ResNet50	0.963	0.926	0.916	0.944	0.957	0.892

PPV – positive predictive value; NPV – negative predictive value; AUC – area under the curve.

Figure 3

(A, B) Receiver operating characteristic (ROC) curves for methods to predict pulmonary nodules of the 2 transfer learning models (ResNet50 and Inception V3) (image-level). The x-axis represents the false-positive rate (FPR) and the y-axis represents the true-positive rate (TPR). Created using matplotlib version 2.2.2 (Python 3.6).

Discussion

Herein, we developed and validated a new transfer learning model based on thin-slice, noncontrast CT images as a noninvasive diagnostic tool for differentiating between malignant and benign lung nodules. Detection and characterization of pulmonary nodules is an important issue since it is the first step in early lung cancer diagnosis. In our study, the radiographic features of lobulation and spiculation were more common in the lung cancer group. Our results agree with several previous studies reporting that lung cancer more commonly affects the upper lobes, especially the right upper lobe [35]. Nodule size is also a primary determinant of malignancy [36], and the guidelines specify that a suspicious nodule with a diameter larger than 6 mm has a risk of malignancy [4]. Therefore, the minimum size of nodules in our study was 7 mm, which was smaller than in previous studies. Additionally, our dataset included many subtypes of benign nodules, such as organizing pneumonia, pulmonary cryptococcosis, and inflammatory pseudotumors, which are hard to distinguish from lung cancer in terms of morphology. DL has shown strong performance in the medical field since it can be trained end-to-end in a supervised manner while learning highly discriminative image features [37,38]. There is also a growing body of research on DL-based prediction for pulmonary nodules [39-42], and even for lymph node metastasis, and DL has shown great potential in evaluating the survival rate [30,43]. One of our classifiers was Inception [15], which is being studied more often because it extracts features of the input at multiple scales. The other model was ResNet [44], which can effectively help to compensate for the loss of small-nodule details. We built the models at the image level (5008 images) rather than at the lesion level only (210 lesions), which avoided having too few training examples to learn a full deep representation.
Meanwhile, unlike the radiomics classifier [45], the CNN model can automatically extract high-throughput features instead of taking features calculated from segmented objects as input information, thus avoiding the complicated process of manual feature extraction [46]. In our study, the InceptionV3 model based on lesion level exhibited good performance, with AUCs of 0.998 in the training set and 0.999 in the validation set and accuracies of 0.984 and 0.989, which are higher than in previous studies [10,29,47], whose AUCs were 0.86–0.94 and diagnostic accuracies were 0.84–0.96. The AUC and diagnostic accuracy of the InceptionV3 model (image-level) were 0.977 and 0.933, which are higher than those of the ResNet50 model (image-level). PET/CT is also good at discriminating nodules. Lai [48] built a model based on PET/CT and achieved a lower accuracy of 0.79. In addition, PET/CT is expensive and time consuming. Thus, we prefer to build the models based on CT, which has better performance and is more beneficial than conventional CT diagnostic methods for patients who cannot tolerate contrast enhancement agents due to allergies or renal failure. Lesion segmentation is the most challenging aspect of model evaluation [49,50], and we used expert image annotation, which can be an effective way to optimize the model. In contrast to earlier lung nodule classification studies, our model takes the effect of CT scanning parameters, including image thickness and contrast medium injection, into consideration because biological heterogeneity within the tumor can be detected and described with plain-phase CT images. However, this heterogeneity may be obscured by intravenous contrast agent present within the tumor. Therefore, the present study both simplifies the process and greatly improves the accuracy and specificity. Our study has some limitations. First, this was a retrospective study and thus had inherent selection bias.
Second, the sample we selected was small because of strict inclusion criteria, which could easily have caused a data imbalance. To reduce the risk of bias caused by having more patients with malignant nodules than with benign nodules, the benign dataset was dynamically oversampled at the lesion level and image level during the training process [51]. Third, the study was also limited to discrimination between benign and malignant nodules and did not further differentiate benign nodules such as OP and TB. Fourth, use of different CT scanners may have affected the evaluation of some CT results because of partial-volume effects.
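The oversampling of the benign class mentioned among the limitations can be illustrated with a minimal random-oversampling sketch. The text does not give the exact procedure, so the following is an assumption: minority-class samples are redrawn with replacement until both classes are equally represented, and the function is called with a new seed each epoch to make the resampling "dynamic". Function and variable names are hypothetical; the counts are the image totals from Table 1.

```python
import random
from collections import Counter


def oversample_minority(samples, seed=0):
    """Duplicate minority-class samples at random until classes are balanced.

    samples: list of (sample_id, label) pairs.
    """
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    target = max(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Image counts from Table 1: 2120 benign and 2888 malignant images.
images = ([(f"benign_{i}", "benign") for i in range(2120)]
          + [(f"malignant_{i}", "malignant") for i in range(2888)])
epoch_samples = oversample_minority(images, seed=1)
class_counts = Counter(label for _, label in epoch_samples)
```

Oversampling should be applied only to the training folds; duplicating samples before the train/validation split would leak copies of the same image across subsets.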

Conclusions

In conclusion, we built a novel transfer learning model with high accuracy in distinguishing malignant vs benign lung nodules. It can provide added diagnostic value to differentiate lung nodules and reduce the need for invasive diagnostic procedures, and it may assist clinicians in creating personalized treatment strategies and choosing the optimal intervention.
