Predicting aggressive histopathological features in esophageal cancer with positron emission tomography using a deep convolutional neural network.

Joe Chao-Yuan Yeh¹, Wei-Hsiang Yu¹, Cheng-Kun Yang¹, Ling-I Chien², Ko-Han Lin³, Wen-Sheng Huang³, Po-Kuei Hsu⁴.

Abstract

BACKGROUND: The presence of lymphovascular invasion (LVI) and perineural invasion (PNI) is of great prognostic importance in esophageal squamous cell carcinoma. Currently, positron emission tomography (PET) scans are the only means of functional assessment prior to treatment. We aimed to predict the presence of LVI and PNI in esophageal squamous cell carcinoma from PET imaging data by training a three-dimensional convolutional neural network (3D-CNN).
METHODS: Seven hundred and ninety-eight PET scans of patients with esophageal squamous cell carcinoma and 309 PET scans of patients with stage I lung cancer were collected. In the first part of this study, we built a 3D-CNN based on a residual network, ResNet, to classify the scans as esophageal cancer or lung cancer. In the second part, we collected the PET scans of 278 patients undergoing esophagectomy to classify the scans by the presence or absence of LVI/PNI.
RESULTS: In the first part, the model attained an area under the receiver operating characteristic curve (AUC) of 0.860. In the second part, we randomly split our dataset into a training set (80%), a validation set (10%), and a testing set (10%), trained the model to classify the scans by the presence or absence of LVI/PNI, and evaluated its performance on the testing set. Our 3D-CNN model attained an AUC of 0.668 on the testing set, which shows better discriminative ability than random guessing.
CONCLUSIONS: A 3D-CNN can be trained, using PET imaging datasets, to predict LVI/PNI in esophageal cancer with acceptable accuracy. 2021 Annals of Translational Medicine. All rights reserved.

Keywords: Convolutional neural networks (CNNs); esophageal cancer; positron emission tomography (PET)

Year: 2021 PMID: 33553330 PMCID: PMC7859760 DOI: 10.21037/atm-20-1419

Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839

Introduction

Esophageal cancer is one of the most common causes of cancer-related death worldwide. In 2012, there were an estimated 455,800 new esophageal cancer cases and 400,200 deaths due to esophageal cancer worldwide (1). Even in the early stages, treatment failure is common after radical surgery such as transthoracic esophagectomy with extended lymph node dissection, and the prognosis remains poor (2). Multidisciplinary treatment, including combinations of chemotherapy, radiotherapy, and surgical resection, has been introduced to reduce systemic micrometastasis and increase the complete resection rate in locally advanced esophageal cancer. However, a significant proportion of patients still experience disease recurrence after trimodal treatment (3). Measurements of esophageal cancer aggressiveness are primarily based on the anatomical extent of the disease, including tumor length, depth, and involvement of lymph node or distant organs, which are obtained from clinical examinations, such as computed tomography, esophagogastroscopy, and esophageal endoscopic ultrasound (4). Non-anatomic factors, such as the presence of lymphovascular invasion (LVI) and perineural invasion (PNI), have also been shown to have a large impact on patient survival; however, such information is not available before tumor specimens are collected by surgical resection (5,6). The only non-anatomical assessment of esophageal cancer that can be obtained prior to surgical resection is positron emission tomography (PET), a nuclear medicine imaging technique based on the measurement of gamma rays emitted by a positron-emitting radiotracer, such as 18F-fludeoxyglucose (FDG) (7). Generally, FDG uptake is represented as a standardized uptake value, which measures the highest image pixel in each tumor region. FDG uptake is able to assess metabolic activity and can localize the primary tumor as well as any metastases that may be present. 
Although PET is currently the only means of functional assessment prior to treatment, whether PET results correlate with prognostic histopathological features remains to be elucidated. Deep neural networks, in particular convolutional neural networks (CNNs), have been increasingly applied to medical image analysis for image classification, image regression, object detection, and image segmentation. Because a CNN learns directly from images, the errors caused by inaccurate feature calculation and segmentation can be avoided, and performance can be higher than that of ordinary feature-based classifiers (8). In landmark studies, Gulshan et al. (9), Esteva et al. (10), and Ehteshami Bejnordi et al. (11) demonstrated the potential of deep learning CNNs to detect diabetic retinopathy, classify skin lesions, and diagnose lymph node metastasis in breast cancer, respectively. Furthermore, CNNs can be trained to recognize biological features that are overlooked by human experts (12,13). In our previous study, we showed that a CNN can be trained with PET image datasets to predict esophageal cancer outcome with acceptable accuracy (14). In the present study, we trained a CNN using PET images to predict the presence of LVI/PNI. We aimed to evaluate whether a deep learning CNN can unlock hidden information in PET scans and connect functional images with histopathological features. We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/atm-20-1419).

Methods

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This study was approved by the Institutional Review Board of Taipei-Veterans General Hospital and granted a waiver of the informed consent process (IRB 2018-01-019-AC).

Data source and preprocessing

The PET scans of patients diagnosed with esophageal squamous cell carcinoma between September 2009 and August 2017 at Taipei Veterans General Hospital were collected. The PET scans of patients with stage I lung cancer diagnosed between January 2012 and November 2017 at Taipei Veterans General Hospital were also collected for use in the first part of this study to pretrain the neural network. Clinicopathological information was gathered from a prospectively established database. Each PET scan had an in-plane (XY) resolution of 128×128 pixels, with a pixel size of 5.47 mm × 5.47 mm and a slice thickness of 3.27 mm. Raw PET images in DICOM format were first converted to standardized uptake value images. To focus on critical information, we cropped the images to remove irrelevant areas. Each cropped PET scan covered the body area from the hypopharynx to the stomach and included all of the esophagus and the peri-esophageal regions. The field of view of each cropped scan was 32×32×128 pixels. Additionally, we increased the effective size of the training set using on-the-fly data augmentation. Six augmentation methods were employed in this study: random image translation, random scaling, random rotation, random left/right flipping, random swapping of anterior/posterior view, and random Gaussian blurring. Finally, each input image was zero-centered by subtracting its mean and then min/max normalized by dividing by its intensity range.
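The final normalization step and one of the six augmentations can be illustrated with a minimal sketch. This is not the authors' code: the function names, the flat-list representation of voxel values, the `volume[z][y][x]` indexing, and the flip probability of 0.5 are all assumptions made for illustration.

```python
import random

def normalize(voxels):
    """Zero-center a flat list of SUV values, then divide by the intensity range."""
    mean = sum(voxels) / len(voxels)
    value_range = max(voxels) - min(voxels)
    if value_range == 0:
        # Constant image: avoid division by zero (handling assumed, not from the paper).
        return [0.0 for _ in voxels]
    return [(v - mean) / value_range for v in voxels]

def random_lr_flip(volume, rng, p=0.5):
    """Mirror a volume indexed as volume[z][y][x] along x with probability p."""
    if rng.random() < p:
        return [[row[::-1] for row in plane] for plane in volume]
    return volume

rng = random.Random(0)                      # fixed seed only for reproducibility
print(normalize([0.0, 2.0, 4.0]))           # values now span a range of exactly 1
print(random_lr_flip([[[1, 2, 3]]], rng, p=1.0))
```

After this normalization every image has zero mean and a total intensity range of 1, which keeps inputs on a comparable scale regardless of the absolute uptake values.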

Model setup

Clinical factors were classified as binary values. Input images with tumors were labeled as positive if they showed either LVI or PNI by post-surgical histopathological examination, while those without LVI or PNI were labeled as negative. We built a three-dimensional (3D)-CNN based on a residual network, ResNet (15). For our neural network, we applied full pre-activation, which reorganizes the order of convolutions and activation functions so that batch normalization and a rectified linear unit (ReLU) precede each convolution layer. This results in better performance compared to the original residual block or other configurations, such as ReLU before addition or ReLU-only pre-activation (16). After several convolutions, we used a global averaging method to flatten the extracted features into a vector. This vector was then connected to a dense layer with a softmax function, with the output being the probability of a given image being classified as positive for either LVI or PNI or negative for both LVI and PNI.
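The classification head described above (global averaging, a dense layer, then softmax) can be sketched in plain Python. This is an illustrative toy, not the authors' network: the function names, the representation of each channel as a flat list of activations, and the example weights are assumptions.

```python
import math

def global_average_pool(feature_maps):
    """Reduce each channel (a flat list of activations) to its mean."""
    return [sum(channel) / len(channel) for channel in feature_maps]

def dense(vector, weights, biases):
    """Fully connected layer; weights[k] is the weight row for class k."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical feature channels and a two-class head (negative, positive).
pooled = global_average_pool([[1.0, 3.0], [2.0, 2.0]])
probs = softmax(dense(pooled, [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]))
print(probs)  # two probabilities that sum to 1
```

Because the pooled vector has one entry per channel, the head does not depend on the spatial size of the feature maps.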

Hyper-parameters setting

The kernel weights of the network were initialized using a recipe published by He et al. (17). To train the model, we used Stochastic Gradient Descent (SGD) (18-20) with Nesterov momentum (21,22) (with an initial learning rate of 0.0001 and momentum of 0.9). We applied a Bayesian optimization method to search combinations of hyper-parameters, including batch size and learning rate, using the training and validation sets. According to the results of Bayesian optimization, the batch size was set to 12 and the learning rate was set to 9e-4. After hyper-parameter optimization, we trained our model with the same hyper-parameter combination and evaluated it on the testing set using a single NVIDIA Tesla P40 GPU. Since the model would over-fit easily if it were trained on an unbalanced dataset, we used a batch balancing method, which mixes over-sampling and under-sampling to circumvent class imbalance in the dataset (23). This method allowed us to train the model with balanced samples (6 positive and 6 negative samples per batch). To further prevent model over-fitting, reduce-learning-rate-on-plateau and early-stopping were included during the training process. The reduce-learning-rate-on-plateau scheme generally halved the learning rate when validation performance did not improve for a few epochs, and early-stopping caused the model to stop training when validation performance did not improve after 25 epochs.
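The batch balancing and the two over-fitting safeguards can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: sampling with replacement, monitoring validation loss as the "validation performance", and a learning-rate patience of 5 epochs (the text says only "a few epochs") are all assumed, as are the function names.

```python
import random

def balanced_batch(pos_ids, neg_ids, rng, per_class=6):
    """Draw per_class positives and per_class negatives (with replacement)."""
    batch = [rng.choice(pos_ids) for _ in range(per_class)]
    batch += [rng.choice(neg_ids) for _ in range(per_class)]
    rng.shuffle(batch)
    return batch

def run_schedule(val_losses, lr=9e-4, lr_patience=5, stop_patience=25, factor=0.5):
    """Replay validation losses; return (final_lr, epoch_stopped or None)."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best % lr_patience == 0:
                lr *= factor              # halve the rate on a plateau
            if since_best >= stop_patience:
                return lr, epoch          # early stopping
    return lr, None

rng = random.Random(0)
pos = ["p%d" % i for i in range(10)]      # hypothetical positive scan ids
neg = ["n%d" % i for i in range(30)]      # hypothetical negative scan ids
print(balanced_batch(pos, neg, rng))      # 12 ids, 6 from each class
print(run_schedule([1.0] * 26))           # plateau from the first epoch onward
```

Drawing each class separately is what keeps every batch at 6 positive and 6 negative samples even though negatives outnumber positives in the dataset.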

Statistical analysis

To evaluate the performance in classifying esophageal cancer with specific aggressive histopathological features, we randomly split our dataset into a training set (80%), a validation set (10%), and a testing set (10%). We tuned the hyper-parameters with the training and validation sets via the Bayesian optimization process. The model performance was finally evaluated on the testing set. Samples were classified as positive or negative by applying thresholds ranging from 0 to 1 to their predicted probabilities, and the model performance was mainly evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). To assess statistical metrics, such as sensitivity, specificity, precision, recall, F1-score, and accuracy, we trained the model with the same parameters 30 times after the parameter search. Hence, each sample in the testing set was evaluated multiple times, and the model's statistical metrics were derived from these repeated evaluations.
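The random split and the AUC can be sketched in a few lines. This is a hedged illustration rather than the authors' procedure: the function names and the truncation used to size each subset are assumptions, and the AUC is computed with the rank-based (Mann-Whitney) form, which equals the area obtained by sweeping the threshold from 0 to 1.

```python
import random

def split_indices(n, rng, fractions=(0.8, 0.1, 0.1)):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def auc(labels, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

train, val, test = split_indices(278, random.Random(0))
print(len(train), len(val), len(test))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # toy labels and scores
```

An AUC of 0.5 corresponds to random ordering of positives and negatives, which is the baseline against which the reported values are compared.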

Results

Patients

A total of 798 PET scans from 548 patients with a diagnosis of esophageal squamous cell carcinoma and another 309 PET scans from patients with stage I lung cancer were included in the first part of this study. The demographic characteristics of all enrolled patients with esophageal squamous cell carcinoma were shown in our previous report (14). Among the esophageal cancer patients, 278 patients who underwent esophagectomy were included in the second part of this study for training a deep learning classifier to predict the presence or absence of LVI/PNI from their PET scans (Figure 1).

Figure 1

Patient enrollment and study design.

Performance of classifying lung and esophageal cancer

We built a 3D-CNN as illustrated in Figure 2. In the first part of this study, which aimed to classify PET scans as esophageal cancer or lung cancer, the model showed an AUC of 0.5 and an accuracy of 0.717 under random guessing, in which the parameters of the neural network were set to random values. Training in the first part converged after about 2,700 iterations and attained an AUC of over 0.860, an F1-score of 0.867, and an accuracy of 0.811. The sensitivity was 0.850, and the specificity was 0.710. This result shows that our model can extract important features that differentiate the two cancers.

Figure 2

3D residual network overview and pre-activation design of residual blocks.

Performance of classifying esophageal cancer with specific histopathological features

In the second part, the demographic characteristics of the 278 patients with esophageal squamous cell carcinoma are listed in Table 1. The median interval between PET scan and esophagectomy was 10 days (Q1: 5; Q3: 22 days). Our 3D-CNN model, more specifically an 18-layer ResNet with the SGD optimizer, was trained to classify PET scans by the presence or absence of LVI/PNI. We randomly split the dataset into a training set, validation set, and testing set containing 80%, 10%, and 10% of the data, respectively. Models generally converged after about 1,600 iterations and attained an AUC of 0.668 (without pretraining) and 0.660 (with pretraining) on average. The learning curves of the networks showed that the training and validation loss generally decreased and converged with training time (Figure 3). The other statistical results on the testing set for our 3D-CNN's ability to classify patients by the presence or absence of LVI/PNI, based on a threshold of 0.5, are listed in Table 2. In the ROC curve analysis, the highest combination of sensitivity and specificity occurred at a sensitivity of 0.574 and a specificity of 0.663, based on a predicted probability threshold of 0.45 (Figure 4).
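Selecting the operating point with the "highest combination of sensitivity and specificity" can be sketched as below. The paper does not state the exact criterion; this sketch assumes it means maximizing the sum of sensitivity and specificity (equivalent to Youden's J = sensitivity + specificity − 1). Function names, labels, scores, and candidate thresholds are hypothetical.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold predicts positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(labels, scores, candidates):
    """Candidate threshold with the largest sensitivity + specificity."""
    return max(candidates, key=lambda t: sum(sens_spec(labels, scores, t)))

labels = [0, 0, 0, 1, 1]                 # toy ground truth
scores = [0.1, 0.2, 0.6, 0.5, 0.9]       # toy predicted probabilities
t = best_threshold(labels, scores, [0.3, 0.55, 0.7])
print(t, sens_spec(labels, scores, t))
```

Under this reading, the reported operating point (sensitivity 0.574, specificity 0.663) corresponds to J = 0.574 + 0.663 − 1 = 0.237.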

Table 1

Patient characteristics

Characteristics	Number	%
Age (mean ± SD)	63.3±10.0
Gender
Male	251	90.3
Female	27	9.7
Histology
Squamous cell carcinoma	278	100
Depth of tumor invasion
T0	55	19.8
T1	48	17.3
T2	47	16.9
T3	117	42.1
T4	11	4.0
Lymph node metastasis
N0	180	64.7
N1	66	23.7
N2	26	9.4
N3	6	2.2
Neoadjuvant treatment
No	141	50.7
Yes	137	49.3
LVI/PNI
−/−	176	63.3
−/+	22	7.9
+/−	39	14.0
+/+	41	14.7

SD, standard deviation; LVI/PNI, lymphovascular invasion/perineural invasion.

Figure 3

Learning curves of the networks. The training and validation loss generally decrease and converge along with training time.

Table 2

The results (mean and ranges) of different models to classify LVI/PNI based on a threshold of 0.5

Variables	Pretrain (+)	Pretrain (−)
AUC	0.6598 (0.5316–0.7881)	0.6683 (0.5523–0.7843)
Sensitivity	0.5167 (0.1673–0.8661)	0.5214 (0.1474–0.8954)
Specificity	0.7222 (0.4264–1.000)	0.7193 (0.3683–1.000)
PPV	0.5382 (0.2466–0.8298)	0.5605 (0.2675–0.8535)
NPV	0.7345 (0.6177–0.8514)	0.7211 (0.5131–0.9291)
Precision	0.5382 (0.2466–0.8298)	0.5605 (0.2675–0.8535)
F1 score	0.5015 (0.2798–0.7232)	0.5010 (0.2690–0.733)
Accuracy	0.6488 (0.5017–0.7960)	0.6548 (0.5168–0.7928)

LVI, lymphovascular invasion; PNI, perineural invasion; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value.

Figure 4

Receiver operating characteristic curves of different models of LVI/PNI classification. LVI/PNI, lymphovascular invasion/perineural invasion. Our model performance and random guess performance are represented in the red line and the blue dotted line, respectively.

Discussion

CNNs are a class of deep learning methods that perform especially well on image recognition tasks. A CNN is composed of multiple network layers, such as convolution layers, pooling layers, and fully connected layers; automatically extracts key features from a training data set; and adaptively learns spatial hierarchies of features through a backpropagation algorithm as well as by fine-tuning the hyperparameters of each neural network layer (24). The application of CNNs in clinical medical imaging is rapidly increasing. For a topic with an abundance of well-annotated data, the performance of a CNN can be outstanding. For example, classification of skin cancer (10), dermoscopic melanoma recognition (25), detection of diabetic retinopathy with retinal fundus photographs (9), detection of lymph node metastases with whole slide images in breast cancer (11), and anatomical classification of esophagogastroduodenoscopy images (26) have been performed using CNNs, with resulting AUCs ranging from 0.79 to 0.99. Intriguingly, it has been proposed that CNNs may be able to reveal subtle biological characteristics that are not visible to physicians. As examples, researchers have used CNNs to predict survival in colorectal cancer with haematoxylin-eosin-stained tumor tissue (27) and to predict cardiovascular risk factors with retinal fundus photographs (13). The resulting discrimination abilities have been acceptable or excellent; for example, an AUC of 0.7 for predicting major cardiac events. Few studies have applied CNNs to PET imaging in thoracic oncology. In one study, a CNN was trained to classify mediastinal lymph nodes (positive or negative) of non-small cell lung cancers from FDG-PET images (28). In another study, Ypsilantis et al. (29) used data from 107 patients with esophageal cancer to demonstrate that a CNN has the potential to extract PET imaging representations that are highly predictive of response to chemotherapy.
To our knowledge, this is the first report to apply a CNN to predict specific histopathological features in esophageal cancer. The significance of LVI and PNI has been well-established in the literature (5,6), and patients with these features have been shown to have worse outcomes and a higher risk of recurrence after treatment. However, information about these two histological features is currently impossible to gather prior to major surgical resection. Whereas most clinical examinations for esophageal cancer evaluate tumor behavior based on the anatomical extent of disease, PET scans measure the metabolic activity of the tumor. Although relationships between functional imaging and immunohistochemical biomarkers have been reported (30), no study has investigated the association between PET scans and specific histological characteristics in esophageal cancer. In this study, deep learning was applied in an attempt to unlock “hidden” information in PET scans and connect functional images with histological features. One caveat of this deep learning approach is the necessity of a large dataset to train the vast number of parameters contained in a CNN. For target datasets that are considered too small to successfully train a CNN, it is common practice to pre-train the network on a large image dataset of similar physical characteristics (imaging modality) and image content (for example, natural objects) but of a different object category before finally training the network with the target dataset. Previous studies showed that fine-tuning networks from weights pretrained on datasets such as ImageNet generally yields equal or better results (31).
Because there is no large 3D dataset of medical images available for pre-training 3D CNNs, we conducted a two-stage workflow in which the weights of the network were pretrained on a task of differentiating normal and abnormal esophagus before the network was tasked with classifying image data based on whether LVI/PNI were present in the abnormal esophagus. Our results showed that pre-training generally made no significant difference compared to training without pre-trained weights. This may indicate either that the learned features do not transfer well to our main task or that the pre-trained weights were still limited by a relatively small dataset compared to the data size of ImageNet. This study is novel because it explored the possible links between PET scan data and histological examination results. Our 3D-CNN used PET images of the entire esophagus as input, which eliminated the need for PET slice selection, tumor segmentation, and feature selection. To overcome the limitations of a small dataset and the lack of a 3D image dataset for transfer learning, we adopted a stepwise workflow. This study is also limited by the fact that the histological examinations were based on hematoxylin and eosin staining without special markers, e.g., CD34 and podoplanin for LVI or S100 to detect nerve fibers (32,33). The actual LVI/PNI percentage may have thus differed if more specific immunohistochemistry had been performed. To conclude, our 3D-CNN can be trained with PET imaging datasets to predict LVI/PNI in esophageal cancer with acceptable accuracy. Although our current results cannot be readily applied to clinical decision making, we demonstrated the potential of deep learning to uncover hidden information in PET scans and connect functional imaging with histopathological findings. With a larger dataset, the CNN can be trained to achieve a better prediction performance.

23 in total

Review 1. Overview of deep learning in medical imaging.

Authors: Kenji Suzuki
Journal: Radiol Phys Technol Date: 2017-07-08

2. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors: Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal: JAMA Date: 2016-12-13 Impact factor: 56.272

3. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer.

Authors: Babak Ehteshami Bejnordi; Mitko Veta; Paul Johannes van Diest; Bram van Ginneken; Nico Karssemeijer; Geert Litjens; Jeroen A W M van der Laak; Meyke Hermsen; Quirine F Manson; Maschenka Balkenhol; Oscar Geessink; Nikolaos Stathonikos; Marcory Crf van Dijk; Peter Bult; Francisco Beca; Andrew H Beck; Dayong Wang; Aditya Khosla; Rishab Gargeya; Humayun Irshad; Aoxiao Zhong; Qi Dou; Quanzheng Li; Hao Chen; Huang-Jing Lin; Pheng-Ann Heng; Christian Haß; Elia Bruni; Quincy Wong; Ugur Halici; Mustafa Ümit Öner; Rengul Cetin-Atalay; Matt Berseth; Vitali Khvatkov; Alexei Vylegzhanin; Oren Kraus; Muhammad Shaban; Nasir Rajpoot; Ruqayya Awan; Korsuk Sirinukunwattana; Talha Qaiser; Yee-Wah Tsang; David Tellez; Jonas Annuscheit; Peter Hufnagl; Mira Valkonen; Kimmo Kartasalo; Leena Latonen; Pekka Ruusuvuori; Kaisa Liimatainen; Shadi Albarqouni; Bharti Mungal; Ami George; Stefanie Demirci; Nassir Navab; Seiryo Watanabe; Shigeto Seno; Yoichi Takenaka; Hideo Matsuda; Hady Ahmady Phoulady; Vassili Kovalev; Alexander Kalinovsky; Vitali Liauchuk; Gloria Bueno; M Milagro Fernandez-Carrobles; Ismael Serrano; Oscar Deniz; Daniel Racoceanu; Rui Venâncio
Journal: JAMA Date: 2017-12-12 Impact factor: 56.272

4. Global cancer statistics, 2012.

Authors: Lindsey A Torre; Freddie Bray; Rebecca L Siegel; Jacques Ferlay; Joannie Lortet-Tieulent; Ahmedin Jemal
Journal: CA Cancer J Clin Date: 2015-02-04 Impact factor: 508.702

5. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning.

Authors: Ryan Poplin; Avinash V Varadarajan; Katy Blumer; Yun Liu; Michael V McConnell; Greg S Corrado; Lily Peng; Dale R Webster
Journal: Nat Biomed Eng Date: 2018-02-19 Impact factor: 25.671

6. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from ¹⁸F-FDG PET/CT images.

Authors: Hongkai Wang; Zongwei Zhou; Yingci Li; Zhonghua Chen; Peiou Lu; Wenzhi Wang; Wanyu Liu; Lijuan Yu
Journal: EJNMMI Res Date: 2017-01-28 Impact factor: 3.138

7. Deep learning based tissue analysis predicts outcome in colorectal cancer.

Authors: Dmitrii Bychkov; Nina Linder; Riku Turkki; Stig Nordling; Panu E Kovanen; Clare Verrill; Margarita Walliander; Mikael Lundin; Caj Haglund; Johan Lundin
Journal: Sci Rep Date: 2018-02-21 Impact factor: 4.379

Review 8. Convolutional neural networks: an overview and application in radiology.

Authors: Rikiya Yamashita; Mizuho Nishio; Richard Kinh Gian Do; Kaori Togashi
Journal: Insights Imaging Date: 2018-06-22

9. Deep Convolutional Neural Network-Based Positron Emission Tomography Analysis Predicts Esophageal Cancer Outcome.

Authors: Cheng-Kun Yang; Joe Chao-Yuan Yeh; Wei-Hsiang Yu; Ling-I Chien; Ko-Han Lin; Wen-Sheng Huang; Po-Kuei Hsu
Journal: J Clin Med Date: 2019-06-13 Impact factor: 4.241

10. The presence of lymphovascular and perineural infiltration after neoadjuvant therapy and oesophagectomy identifies patients at high risk for recurrence.

Authors: S M Lagarde; A W Phillips; M Navidi; B Disep; A Immanuel; S M Griffin
Journal: Br J Cancer Date: 2015-11-10 Impact factor: 7.640

2 in total

1. Perineural Invasion Is a Significant Indicator of High Malignant Degree and Poor Prognosis in Esophageal Cancer: A Systematic Review and Meta-Analysis.

Authors: Liuyang Bai; Liangying Yan; Yaping Guo; Luyun He; Zhiyan Sun; Wenbo Cao; Jing Lu; Saijun Mo
Journal: Front Oncol Date: 2022-06-08 Impact factor: 5.738

2. Atom Search Optimization with the Deep Transfer Learning-Driven Esophageal Cancer Classification Model.

Authors: Nawaf R Alharbe; Raafat M Munshi; Manal M Khayyat; Mashael M Khayyat; Saadia Hassan Abdalaha Hamza; Abeer A Aljohani
Journal: Comput Intell Neurosci Date: 2022-09-16