
Predicting the recurrence risk of pancreatic neuroendocrine neoplasms after radical resection using deep learning radiomics with preoperative computed tomography images.

Chenyu Song¹, Mingyu Wang², Yanji Luo¹, Jie Chen³, Zhenpeng Peng¹, Yangdi Wang¹, Hongyuan Zhang², Zi-Ping Li¹, Jingxian Shen⁴, Bingsheng Huang², Shi-Ting Feng¹.

Abstract

BACKGROUND: To establish and validate a prediction model for pancreatic neuroendocrine neoplasms (pNENs) recurrence after radical surgery with preoperative computed tomography (CT) images.
METHODS: We retrospectively collected data from 74 patients with pathologically confirmed pNENs (internal group: 56 patients, Hospital I; external validation group: 18 patients, Hospital II). Using the internal group, models were trained with CT findings evaluated by radiologists, radiomics, and deep learning radiomics (DLR) to predict 5-year pNEN recurrence. Radiomics and DLR models were established for arterial (A), venous (V), and arterial and venous (A&V) contrast phases. The model with the optimal performance was further combined with clinical information, and all patients were divided into high- and low-risk groups to analyze survival with the Kaplan-Meier method.
RESULTS: In the internal group, the areas under the curves (AUCs) of the DLR-A, DLR-V, and DLR-A&V models were 0.80, 0.58, and 0.72, respectively. The corresponding radiomics AUCs were 0.74, 0.68, and 0.70. The AUC of the CT findings model was 0.53. The DLR-A model performed best; adding clinical information improved its AUC from 0.80 to 0.83. In the validation group, the AUCs of the DLR-A, DLR-V, and DLR-A&V models were 0.77, 0.48, and 0.64, respectively, and those of the radiomics-A, radiomics-V, and radiomics-A&V models were 0.56, 0.52, and 0.56, respectively. The AUC of the CT findings model was 0.52. In the validation group, the difference between the DLR-A model and a random model approached but did not reach statistical significance (P=0.058). Recurrence-free survival differed significantly between the high- and low-risk groups (P=0.003).
CONCLUSIONS: Using DLR, we successfully established a preoperative recurrence prediction model for pNEN patients after radical surgery. This allows a risk evaluation of pNEN recurrence, optimizing clinical decision-making. 2021 Annals of Translational Medicine. All rights reserved.


Keywords: Pancreatic neuroendocrine neoplasms (pNENs); deep learning radiomics (DLR); survival analysis

Year: 2021 PMID: 34164467 PMCID: PMC8184461 DOI: 10.21037/atm-21-25

Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839

Introduction

Pancreatic neuroendocrine neoplasms (pNENs) are tumors with complex biological behaviors (1,2). R0 surgical resection is the first-line therapy for non-metastatic neuroendocrine neoplasms, but its postoperative recurrence is variable and difficult to predict, with 5-year recurrence rates ranging from 5% to 80% (3-6). If the probability of a pNEN recurrence (including local recurrence and distant metastasis) could be accurately predicted before surgery, the preoperative surgical plan could be optimized, and the management of the postoperative follow-up and intervention could be arranged in advance. This strategy can minimize the recurrence probability and reduce the adverse impact of postoperative tumor recurrence, thus improving the prognosis of patients (7). Specifically, for patients with low recurrence risk, the frequency of surveillance can be reduced and a relatively longer monitoring interval can be set (8). For patients at high risk of recurrence, surgical margins should be expanded, and lymph node dissection should be more thorough in their preoperative surgical plan. Likewise, emphasis should be placed on postoperative follow-up and combined treatment in these patients. Some studies reported methods for recurrence prediction in pNENs (1,9). Pathological parameters including the Ki-67 index of postoperative specimens or preoperative biopsy were used to predict the prognosis of pNEN patients (10,11). However, these studies were either based on indicators obtained after surgery or from fractional tissue, and the predictive performance was unsatisfactory with sensitivity (SEN) values of less than 40% including the Ki-67 index. Thus, these approaches cannot effectively guide preoperative management. Computed tomography (CT) is commonly used for the diagnosis of pancreatic diseases with high diagnostic accuracy (ACC) in pNENs. 
Several studies (12-14) have shown that CT findings such as tumor size, tumor vascularity, and CT value can be used to predict the prognosis preoperatively. A CT ratio (the CT value of the tumor divided by that of non-tumorous pancreatic parenchyma) <0.85 and tumor size ≥3.0 cm were shown to be independent prognostic factors associated with the disease-free survival of patients with pNEN (14). However, these studies are all based on indicators evaluated by radiologists, with inevitable subjectivity and measurement errors. In addition, such studies are generally limited because the relevant indicators were used only for factor analysis, not for the establishment and validation of more practical prediction models. Radiomics has achieved great success in medical image analysis. Image features with strong discriminative power can be extracted and analyzed automatically and with high throughput by computers for auxiliary diagnosis or therapy response prediction. In research, radiomics has been a commonly used method to predict the prognosis of patients, and medical image analysis technology based on deep learning brings more opportunities and challenges for prognosis prediction. Wang et al. (15) used deep learning to predict lymph node metastasis in non-small cell lung cancer from 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT images. Using deep learning techniques, another group (16) established a 3-year recurrence prediction model for patients with ovarian cancer based on CT images. Chen et al. (17) used enhanced CT images and other predictors (tumor location, size, and other information) to establish a ResNet model that predicts the 3- and 5-year recurrence-free survival (RFS) rates for gastrointestinal stromal tumor patients with area under the curve (AUC) values of more than 0.90.
A newly developed method called deep learning radiomics (DLR) (18,19) can extract quantitative, high-throughput features from medical images using pretrained artificial neural networks. This approach differs from the radiomics method, which extracts explicitly designed features, and DLR has proved to be a promising tool for computer-aided tumor prognosis prediction. It has been successfully applied to many clinical problems, such as predicting the stages of liver fibrosis (18) and predicting axillary lymph node metastasis in early-stage breast cancer (20). To our knowledge, no study has yet assessed recurrence prediction in pNEN patients based on radiomics or DLR techniques. This study aimed to establish and validate a recurrence prediction model for pNEN patients after radical surgery based on their preoperative CT images using CT findings evaluated by radiologists, radiomics, and DLR. We present the following article in accordance with the STROBE reporting checklist (available at http://dx.doi.org/10.21037/atm-21-25).

Methods

Study design

The study design is shown in Figure 1. We extracted features for training models from the data of Hospital I (the First Affiliated Hospital of Sun Yat-sen University, internal group) in three separate ways: CT findings evaluated by radiologists, radiomics, and DLR. Radiomics and DLR were both used to extract features from images in the arterial, venous, and arterial & venous phases. In the second step, the models were validated. After cross-validation was completed using the internal group, an external validation was performed with CT images from Hospital II (Sun Yat-sen University Cancer Center, independent external group). Afterward, clinical indicators were added to the optimal model, and cross-validation was again performed on the data from Hospital I to observe the impact of clinical indicators on the predictive performance of the optimal model. In the last step, we constructed a risk stratification model based on the optimal model to explore its potential for predicting survival.

Figure 1

Flow chart of the study design. Computed tomography (CT) images were obtained in the unenhanced, arterial, and venous phases. Data from Hospital I were used to establish the prediction models (radiologist assessment, radiomics, and deep learning radiomics). Then, the external group from Hospital II was used to validate the prediction models. After the optimal prediction model had been selected, clinical indicators were added to observe changes in the predictive performance of this optimal model. In addition, an optimum model-based risk stratification model was established to explore its survival predictive potential.

Acquisition of patient data

Patient selection and clinical data

This study was conducted in strict accordance with the principles of the Declaration of Helsinki (as revised in 2013). This retrospective study was approved by the Institutional Review Board of the First Affiliated Hospital of Sun Yat-sen University (No.: 2018-181), and written informed consent was waived by the Institutional Review Board. All patients had pNENs that were pathologically confirmed after radical surgery at Hospital I or Hospital II between 2010 and 2018, and none had received any drug or surgical treatment at the time of (or before) CT imaging. Patients meeting any of the following four conditions were excluded from the study population: (I) distant metastases had been detected at their first examination; (II) another concomitant malignancy was diagnosed; (III) a multiple endocrine neoplasia syndrome was confirmed; and (IV) not all CT images were available. The data filtering process is shown in Figure 2, and the inclusion and exclusion criteria of the two medical centers were consistent.

Figure 2

Data filtering procedure. pNEN, pancreatic neuroendocrine neoplasm; MEN, multiple endocrine neoplasia; CT, computed tomography.

This study included three clinical parameters: age, sex, and neuroendocrine symptoms. Neuroendocrine symptoms were defined as relevant symptoms typically caused by excessive secretion of hormones in patients with corresponding elevated hormone levels detected in blood samples. Patients were followed up from the date of surgery to May 24, 2019. A medical imaging examination [ultrasound/CT/magnetic resonance imaging (MRI)] was performed at least once every 6 months in the first year and, after the first year, once every 6 months or once every year according to the pathological tumor grade (G1: once every year; G2/3 or neuroendocrine carcinoma: once every 6 months). PET-CTs with 68Ga-labeled somatostatin analogues and 18F-labeled FDG were used to examine suspected cases of recurrence. The date of recurrence (including local recurrence and distant metastasis) was defined as the date on which recurrence was detected by cross-sectional imaging (CT/MRI) during the follow-up. A neoplasm growing at the primary site was defined as local recurrence, and a neoplasm growing in another organ was defined as distant metastasis; both were confirmed by PET-CT or biopsy. For pNEN patients with postoperative recurrence, the 5-year RFS was defined as the time from the date of surgery to the date of the first detection of a postoperative recurrence. For patients without postoperative recurrence, the RFS was defined as the time from the date of surgery to the date of the latest follow-up.
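The RFS definition above can be written as a small function. This is a minimal sketch, not the authors' code: the function name `rfs_months`, the month length of 365.25/12 days, and the default censoring date (the end of follow-up stated above) are illustrative assumptions.

```python
from datetime import date

# Assumption: one month is approximated as 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12
# End of follow-up as stated in the text.
STUDY_END = date(2019, 5, 24)


def rfs_months(surgery, recurrence=None, last_follow_up=STUDY_END):
    """Return (RFS in months, event flag) for one patient.

    The event flag is True when a recurrence was detected and False when
    the patient is censored at the latest follow-up.
    """
    end = recurrence if recurrence is not None else last_follow_up
    if end < surgery:
        raise ValueError("end date precedes surgery date")
    months = (end - surgery).days / DAYS_PER_MONTH
    return months, recurrence is not None
```

The event flag is what a later Kaplan-Meier analysis needs in order to tell recurrences apart from censored observations.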

CT image acquisition

CT scans were performed in Hospital I using a 64-slice spiral CT scanner (Aquilion 64; Canon Medical Systems). The scanning parameters were as follows: 0.5-mm slice thickness, 0.5-mm slice interval, 200-mAs tube current, and 120-kVp tube voltage. An iodinated contrast agent (Ultravist 300; Bayer Schering, Berlin) was administered intravenously at a rate of 3 mL/s via a high-pressure syringe after pre-contrast imaging followed by a saline chaser bolus (40 mL) at the same rate. The arterial and venous phases were obtained at 35 and 65 s after contrast injection, respectively. In Hospital II, the CT images were captured using a 128-slice spiral CT system (Discovery CT750 HD; GE System, Milwaukee, WI, USA). The scanning parameters were as follows: 2-mm slice thickness, 1-mm slice interval, automatic tube current modulation (maximum 450 mAs), and 100–140-kVp tube voltage. The contrast agent was administered as described for Hospital I. The arterial and venous phases were obtained after the aortic opacification reached 100 Hounsfield units (HU). The average scan started after contrast injection at 36 s (range, 30–42 s) for the arterial phase and 66 s (range, 58–70 s) for the venous phase.

CT image analysis

CT findings

Regarding the CT findings assessed by radiologists, the conditions of the primary lesion, pancreas, lymph nodes, hepatobiliary system, and portal system were all evaluated independently by two radiologists who had more than 10 years of experience in the imaging diagnosis of abdominal diseases and were blinded to the patients’ pathological results. A detailed description of the evaluated CT findings is shown in Table S1.

Radiomics

For radiomics, regions of interest (ROIs) were delineated by the two radiologists who were responsible for the CT image evaluation, and they were also blinded to the patients’ pathological results during the whole process. The ground truth (GT) of all patients was labeled on CT images in the arterial and venous phases using the ITK-SNAP software, as shown in Figure S1. First, we converted the raw data from the DICOM to the NIFTI format. Second, according to the experience of the radiologists, the window level and window width were set to 130 and 310 HU, respectively, for the arterial phase and to 120 and 320 HU, respectively, for the venous phase. Finally, the voxel size of all images was resampled to 1 mm × 1 mm × 1 mm using the 3D cubic interpolation algorithm. Using a toolkit developed in-house, we extracted 143 features describing intensity [37] and texture [106], such as gray-level co-occurrence matrix, spatial gray-level dependence matrix, neighborhood gray-tone difference matrix, and neighborhood gray-level difference statistics.
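The windowing step can be illustrated with the level and width values given above. This is a minimal sketch on a flat list of HU values; the function name `apply_window` and the rescaling of the clipped values to the range 0 to 1 are assumptions, because the text states only the window settings and not how the windowed intensities were stored.

```python
def apply_window(hu_values, level, width):
    """Clip HU values to [level - width/2, level + width/2].

    Assumption: the clipped values are rescaled linearly to [0, 1].
    """
    low = level - width / 2
    high = level + width / 2
    return [(min(max(v, low), high) - low) / (high - low) for v in hu_values]


# Window settings reported in the text (in HU).
ARTERIAL = {"level": 130, "width": 310}  # visible range: -25 to 285 HU
VENOUS = {"level": 120, "width": 320}    # visible range: -40 to 280 HU

arterial_example = apply_window([-100, -25, 130, 285, 400], **ARTERIAL)
```

Values below the window map to 0, values above it map to 1, and the window level itself maps to 0.5.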

DLR

The prediction process based on DLR required only rough annotations by the radiologists. The ROI annotation was also completed by the two radiologists who were responsible for the CT image evaluation, and the pathological results of the patients were not disclosed to them. Each radiologist delineated the top layer, the largest layer, and the bottom layer of each tumor in the cross section. No strict criteria were used for delineation. The radiologists only had to draw a quadrilateral area containing the tumor, as shown in Figure S1. We additionally collected 58 CT images without recurrence labels but with the GT of the segmentation in the arterial phase, and the scanning parameters of these data were consistent with those of the Hospital I data. These data were used specifically for the training and validation of the segmentation network. To ensure that the network learned features with higher discriminative power, we randomly sampled 22 cases from the Hospital I data set, mixed them with the additionally collected 58 cases, and randomly divided the 80 cases at a 1:9 ratio into a validation set and a training set. Then, we trained a two-dimensional U-net to extract DLR features. Further details are presented in Figure S2 and the Supplementary Materials and Methods (“Training U-net for DLR” section). Data preprocessing was performed as described in the “Radiomics” section. The data for training the network comprised only the arterial phase GT, and the pretrained arterial model was used when extracting features from both arterial and venous phase images. We fed all slices into the pretrained U-net to retrieve slice-wise features and then used a clustering-based method to aggregate the slice-wise features into patient-wise features. Details of the feature extraction procedure are shown in the Supplementary Materials and Methods (“DLR features extraction” section).

Training and validation of the prediction models

Training and cross-validation of the models

We built the recurrence prediction models using a support vector machine algorithm (based on the Scikit-learn machine learning library). We used 10-fold cross-validation on the internal group to evaluate the performance of the prediction models. In each fold, the two-sample t-test or Mann-Whitney U test was performed on the training set, the features that differed significantly between the recurrence and recurrence-free groups were selected, and the same selection was then applied to the test set. The model parameters for each fold were determined using grid search on the training set. The main evaluation indicators of the final model were ACC, SEN, specificity (SPC), and AUC. The receiver operating characteristic (ROC) curves of all models were compared with the random case (AUC =0.5), and the AUC values of the models were also compared with each other using the DeLong test performed with MedCalc software (version 12.5.0.0; MedCalc Software bvba).
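The key point of the procedure above is that feature selection is repeated inside each fold, using training cases only, so the held-out cases never influence which features are kept. The sketch below illustrates only that selection step in plain Python. It is not the authors' pipeline: the helper names are hypothetical, the Mann-Whitney P value uses a normal approximation without tie correction, the folds are simple interleaved splits rather than whatever splitter was actually used, and the SVM fitting and grid search are omitted. The cutoff of 0.05 is taken from the significance level stated in the Statistical analysis section.

```python
from statistics import NormalDist


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U P value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_features(X, y, alpha=0.05):
    """Indices of features that differ between the two classes in the given rows."""
    keep = []
    for f in range(len(X[0])):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        if mann_whitney_p(pos, neg) < alpha:
            keep.append(f)
    return keep


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds out every k-th case from i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test


def selected_per_fold(X, y, k=10, alpha=0.05):
    """Run the selection on the training part of every fold and collect the results."""
    selections = []
    for train, _test in k_fold_indices(len(X), k):
        X_train = [X[j] for j in train]
        y_train = [y[j] for j in train]
        selections.append(select_features(X_train, y_train, alpha))
    return selections
```

In a full pipeline, the classifier for each fold would be fitted on the selected training columns and evaluated on the same columns of the held-out cases.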

External independent validation

Using the ROC curves of the external independent dataset, we evaluated the robustness of the method that provided the optimal performance on the internal dataset. We used a model integration approach to predict the recurrence risk in the external independent validation. The details of the model integration are presented in the Supplementary Materials and Methods (“Model integration” section).

Statistical analysis

Clinical information and CT findings were analyzed using univariate analysis. Continuous variables conforming to a normal distribution were described by the mean and standard deviation and compared using the independent two-sample t-test. If a normal distribution was not confirmed, the median and interquartile range were used, and the Mann-Whitney U test was performed as a non-parametric test. For categorical variables, the χ2 test or exact probability method was used. P<0.05 was defined as statistically significant. Based on the optimal prediction model, patients from the two hospitals were divided into high- and low-risk groups for Kaplan-Meier analyses. The stratification threshold was the predicted recurrence probability at which the Youden index (21) of the cross-validation ROC curves was highest. All statistical analyses were performed with SPSS software (version 25.0 for Macintosh, IBM, Chicago, IL, USA).
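The Youden index is J = sensitivity + specificity − 1, and the stratification threshold is the predicted probability that maximizes J. The following is a minimal sketch of that search; the function name `youden_threshold`, the convention that a case is positive when its probability is at or above the threshold, and the toy data are illustrative assumptions.

```python
def youden_threshold(probs, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its predicted probability is >= threshold.
    Candidate thresholds are the observed probabilities; on ties in J the
    lowest threshold is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, l in zip(probs, labels) if p >= t and l == 1)
        tn = sum(1 for p, l in zip(probs, labels) if p < t and l == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (hypothetical): 1 = recurrence, 0 = no recurrence.
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 0, 1, 1, 0, 1]
toy_threshold, toy_j = youden_threshold(toy_probs, toy_labels)
```

On the toy data the best cutoff is 0.3, where all three recurrences are caught and two of the three recurrence-free cases are correctly classified.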

Results

A total of 74 pNEN patients were included in this study. Fifty-six patients from Hospital I (10 of whom had a recurrence within 5 years) were used for training and internal validation, and 18 patients from Hospital II (9 of whom had a recurrence within 5 years) were used for external independent validation. The clinical information of the Hospital I patients is shown together with the CT findings in Table S2. For the Hospital II patients, neither sex (6 females, 3 with recurrence; 12 males, 6 with recurrence) nor mean age (53.56±10.36 years in the recurrence group, 48.78±15.67 years in the recurrence-free group) differed significantly between the recurrence and recurrence-free groups. Two radiologists (C Song and Y Luo, with 4 and 8 years of working experience, respectively) annotated 5 random cases from the data. The mean times the two radiologists needed to locate the tumor were 11.30 and 9.98 s, and the medians were 11.04 and 9.79 s. For the fine-delineation process, the two radiologists needed a mean of 647.19 and 796.01 s and a median of 305.51 and 382.59 s, respectively.

Clinical information and CT findings

The results of the univariate analysis for Hospital I are shown in Table S2. Among the examined factors, the CT ratios of the primary lesion in the unenhanced phase and the venous phase were significantly different between the recurrence and recurrence-free groups. Neuroendocrine symptoms, the shape and size of the primary lesion, the shape of the pancreatic duct, lymph node morphology, and lymph node enhancement pattern were all significantly associated with recurrence. There were more patients with tumor recurrence in the groups with asymptomatic tumors, cystic-solid tumors, tumors with a maximum diameter greater than 20 mm, dilation or cutoff of the pancreatic duct, normal lymph node size, and homogeneous lymph node enhancement. In the univariate analysis for Hospital II, among the examined CT findings, only the CT ratio and the relatively enhanced rate of the primary lesion in the arterial phase and the venous phase were significantly different between the recurrence and recurrence-free groups. The AUCs of the CT findings model were 0.53 and 0.52 in the internal and validation groups, respectively.

Radiomics

Table 1 shows the results of the 10-fold cross-validation of the radiomics models based on features in different phases extracted from the data of Hospital I. The established prediction models were then validated independently using the data of Hospital II. According to the cross-validation and independent validation results, the model performed best in the arterial phase, with an AUC of 0.74 for cross-validation and 0.56 for independent validation. The results of the DeLong test comparing the different phases are shown in Table S3. There were no significant differences in AUCs between the phases. The ROC curves of the radiomics models in the different contrast phases are shown in Figure S4.

Table 1

Accuracy, sensitivity, specificity, and AUC values of the radiomics models for recurrence prediction (56 patients from Hospital I and 18 patients from Hospital II)

Models	Hospital I (internal data set)					Hospital II (independent data set)
Models	ACC	SEN	SPC	AUC	P	ACC	SEN	SPC	AUC	P
Radiomics-A	0.75	0.70	0.76	0.74	0.020	0.44	0.11	0.78	0.56	0.691
Radiomics-V	0.71	0.80	0.70	0.68	0.083	0.44	0.33	0.56	0.52	0.965
Radiomics-A&V	0.71	0.80	0.70	0.70	0.044	0.56	0.22	0.89	0.56	0.691

DLR

The 10-fold cross-validation results based on DLR features are shown in Table 2. The Hospital II data were used to validate the models independently. The model reached the highest AUCs in the arterial phase for both cross-validation (0.80) and independent validation (0.77). The ROC curves are compared in Table S3. No significant differences between the phases were detected in the cross-validation results. Figure S4 shows the ROC curves of all models trained with DLR.

Table 2

Accuracy, sensitivity, specificity, and AUC values of the DLR models for recurrence prediction (56 patients from Hospital I and 18 patients from Hospital II)

Models	Hospital I (internal data set)					Hospital II (independent data set)
Models	ACC	SEN	SPC	AUC	P	ACC	SEN	SPC	AUC	P
DLR-A	0.71	0.90	0.67	0.80	0.003	0.61	0.55	0.66	0.77	0.058
DLR-V	0.73	0.60	0.76	0.58	0.429	0.44	0.22	0.67	0.48	0.895
DLR-A&V	0.71	0.80	0.70	0.72	0.034	0.61	0.44	0.78	0.64	0.310

The threshold of the predictive probability used to calculate ACC, SEN, and SPC was the highest Youden index of the cross-validation ROC curves for the internal data set. A P value indicates the significance level of the comparison between an AUC with that of a random case (AUC =0.5). AUC, area under the curve; DLR, deep learning radiomics; ACC, accuracy; SEN, sensitivity; SPC, specificity; A, arterial; V, venous; A&V, arterial & venous.

Optimal prediction model with and without added clinical features

The comparison of the optimal radiomics model, the optimal DLR model, and the model based on CT findings regarding the prediction of postoperative tumor recurrence is shown in Table 3. The highest cross-validated AUC value was observed in the DLR model of the arterial phase (DLR-A; AUC =0.80). The cross-validation results with added clinical information (not included in the feature extraction) are shown in Table 4. After the three clinical parameters were included, all model indicators except SEN, which decreased by 0.10, improved to some extent, with ACC, SPC, and AUC reaching 0.80, 0.80, and 0.83, respectively. However, the ROC results of the models before and after the addition of the clinical information were not significantly different. As shown in Table S3, the image-based models did not differ significantly from each other. Figure 3A displays the ROC curves of the optimal radiomics model (radiomics-A), the optimal DLR model (DLR-A), and the model based on CT findings. The ROC curves of the DLR-A model with added clinical information are presented in Figure 3B.

Table 3

Performance comparison between the optimal radiomics model (radiomics-A), the optimal DLR model (DLR-A), and the model based on CT findings (56 patients from Hospital I)

Model	ACC	SEN	SPC	AUC	P
Radiomics-A	0.75	0.70	0.76	0.74	0.020
DLR-A	0.71	0.90	0.67	0.80	0.003
CT findings	0.63	0.50	0.65	0.53	0.748
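As a worked check of how ACC, SEN, and SPC relate to one another, the sketch below recomputes the Radiomics-A and DLR-A rows from confusion-matrix counts. The counts are not reported in the article; they are assumptions inferred from the 56 internal patients (10 with recurrence, 46 without) and the rounded rates in the table.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "ACC": (tp + tn) / total,
        "SEN": tp / (tp + fn),
        "SPC": tn / (tn + fp),
    }


# Hypothetical counts, inferred from the reported rates (not given in the text).
# DLR-A: 9 of 10 recurrences detected, 31 of 46 recurrence-free cases correct.
dlr_a = metrics(tp=9, fn=1, tn=31, fp=15)
# Radiomics-A: 7 of 10 recurrences detected, 35 of 46 recurrence-free cases correct.
radiomics_a = metrics(tp=7, fn=3, tn=35, fp=11)

for name, m in (("DLR-A", dlr_a), ("Radiomics-A", radiomics_a)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Because only 10 of the 56 patients had a recurrence, accuracy is dominated by specificity, which is why DLR-A can have the higher sensitivity and AUC but a slightly lower accuracy than Radiomics-A.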

Table 4

Accuracy, sensitivity, specificity, and AUC values of the DLR-A recurrence prediction model with added clinical information (56 patients from Hospital I)

Model	ACC	SEN	SPC	AUC	P^a	P^b vs. DLR-A + s	P^b vs. DLR-A + sa	P^b vs. DLR-A + sag
DLR-A	0.71	0.90	0.67	0.80	0.003	0.413	0.822	0.680
DLR-A + s	0.71	0.80	0.70	0.75	0.015	–	0.459	0.108
DLR-A + sa	0.76	0.90	0.73	0.79	0.004	–	–	0.483
DLR-A + sag	0.80	0.80	0.80	0.83	0.001	–	–	–

a, a P value indicates the significance level of the comparison between an AUC with that of a random case (AUC =0.5). b, a P value indicates the significance level of comparison between every two AUCs. AUC, area under the curve; DLR, deep learning radiomics; A, arterial; ACC, accuracy; SEN, sensitivity; SPC, specificity; + s, symptom added; + sa, symptom and age added; + sag, symptom, age, and gender added.

Figure 3

The receiver operating characteristic (ROC) of deep learning radiomics (DLR), radiomics and CT findings models in Hospital I. (A) The receiver operating characteristic (ROC) curves of the optimal radiomics (R) model (R-A), the optimal deep learning radiomics model (DLR-A), and the model based on CT findings. (B) The ROC curves of the DLR-A model with added clinical information. AUC, area under the curve.

A P value indicates the significance level of the comparison between an AUC with that of a random case (AUC =0.5). DLR, deep learning radiomics; A, arterial; CT, computed tomography; ACC, accuracy; SEN, sensitivity; SPC, specificity; AUC, area under the curve.

Survival analysis

Using the predicted value of the DLR-A model as the risk factor and the threshold with the highest Youden index in the internal group (0.165499) as the stratification threshold, the combined patients from both hospitals were divided into a high-risk and a low-risk group. In the high-risk group, the mean and median survival times were 36.28 months [95% confidence interval (CI), 26.37 to 46.20 months] and 38.53 months (95% CI, 10.63 to 66.44 months), respectively. The mean survival time in the low-risk group was 53.11 months (95% CI, 46.91 to 59.32 months). The survival analysis using the Kaplan-Meier method is shown in Figure 4; the P value of the log-rank test was 0.003.
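For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is a generic illustration, not the SPSS procedure used in the study: the function name `kaplan_meier` and the toy follow-up times are assumptions, and neither the confidence intervals nor the log-rank test is implemented.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times  : follow-up time per patient (for example, RFS in months)
    events : True if a recurrence occurred at that time, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Toy group (hypothetical): three recurrences and two censored patients.
toy_curve = kaplan_meier([6, 12, 12, 24, 36], [True, True, False, True, False])
```

In the study, one such curve would be estimated for the high-risk group and one for the low-risk group, with the groups defined by the DLR-A threshold given above.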

Figure 4

Survival analysis using the high- and low-risk groups according to the DLR-A model. The Kaplan-Meier analysis shows a statistically significant difference (P=0.003; log-rank test) between these groups regarding recurrence-free survival. DLR, deep learning radiomics; A, arterial.

Discussion

In this study, we successfully established recurrence prediction models for pNEN patients based on three methods: radiologist assessment, radiomics, and DLR. We also analyzed the influence of CT imaging phase and clinical information on the performance of the prediction models. Reviewing previous studies (1,9,12,22,23) on postoperative recurrence in pNEN patients, we found that most performed univariate analyses based on biochemical indicators or CT findings without establishing and validating a prediction model. Some indicators, such as the Ki-67 index or the pathological grade, can only be obtained after surgery, which limits their practical application. Our study applied radiomics successfully to postoperative recurrence prediction in patients with pNEN based on preoperative parameters. The DLR-A model performed best on both the internal and external data sets, although the differences between the models were not statistically significant. The results of the CT findings in our study were consistent with those in previous publications (13,14,24-26). Smaller and round lesions often indicated less aggressive behavior or early discovery, which are both associated with a better prognosis. The CT ratio represents the difference in CT values between the primary lesion and the pancreas parenchyma. In the unenhanced phase, more patients in the recurrence-free group showed lower attenuation of the primary lesion relative to the pancreas parenchyma. An explanation might be that these lesions contained fewer solid components or that tumor cells proliferated more slowly. pNENs are highly vascularized tumors. Thus, they were significantly enhanced in the arterial phase in both the recurrence and the recurrence-free groups. However, in the venous phase, more patients in the recurrence group presented a lower attenuation of the primary lesion relative to the pancreas parenchyma.
Possibly, the lesions of the recurrence group contained more blood vessel connections leading to faster blood flow, and consequently, a more obvious CT value decrease in the venous phase would be observed in the recurrence group. Regarding the lymph node morphology, the confluent multinodular lymph node group with its 100% recurrence rate comprised only one patient. In the enlarged lymph node group and the normal lymph node group, 60% [3/5] and 15% [6/41] of the patients presented with postoperative recurrence, respectively. Considering the aspect of lymph node enhancement patterns, the two patients of the heterogeneous enhancement group both presented with postoperative recurrence. Lymph node enlargement, fusion, and heterogeneous enhancement are important indicators of lymph node metastasis in various tumor types, and the same is true for pNENs (27). Given that the liver is the organ most susceptible to metastasis and that the venous reflux from the pancreas is drained through the portal system of the liver, the CT findings of the liver and its portal system were also included in our study. However, these group differences were not statistically significant. A larger pNEN sample size may be needed for further explorations. The model based on CT findings required the manual image evaluation by the radiologists, whereas the radiomics model involved the radiologists precisely delineating the ROIs. Although in this study as in most previous studies experienced radiologists were employed to avoid the variability and subjectivity of manually delineated ROIs, the level of experience to evaluate CT images differs in practice. Our simple semi-automatic method used in the DLR prediction model greatly reduced subjectivity and task complexity, while achieving high SEN by roughly locating the tumor. 
Although the ACC and SPC values in the external independent validation were low because of some deviation in the distribution of features, the AUC for these data reached 0.77, indicating that the model retained a robust ability for risk stratification. We used a segmentation network-based DLR method in which image properties (the mask) supervised the network training, so that more expressive features could be obtained automatically from a smaller volume of data. This avoided the over- or underfitting problems that can arise when imbalanced recurrence labels are used to supervise the training. Another study (23) conducted by our research team used only grading labels for supervision, and the results demonstrated that the deep learning method was not superior to the radiomics approach. Moreover, the findings suggested that using semantic labels such as grading or recurrence labels to supervise networks might limit performance with small training sets.

This study also compared the performance of the models based on different contrast enhancement phases. For both radiomics and DLR, the arterial phase models outperformed the venous phase and the arterial & venous phase models. This result is consistent with the previous findings of our team (23). The reasons for this are as follows: (1) Most pNEN lesions are highly vascularized. Thus, the difference between the primary tumor and the surrounding normal pancreas parenchyma is more obvious in the arterial phase, and the tumor outline is more clearly displayed. By contrast, the demarcation between the tumor and the surrounding parenchyma is relatively poor in the venous phase. The segmentation network would, therefore, better acquire the ability to distinguish the tumor from the surrounding tissue in the arterial phase. In other words, the segmentation network excluded interference from the surrounding tissue in the arterial phase and could pay more attention to the characteristics of the tumor itself. (2)
Compared to DLR, feature extraction in the radiomics model was limited by the radiologist-defined tumor boundaries. Because pNENs are more easily distinguished in the arterial phase, characteristics such as texture were easier to observe in the arterial phase than in the venous phase. (3) In the DLR models, the poor performance in the venous phase may be due to the obscured tumor contour and the inability of the network to identify the tumor area effectively. Another reason may be that a network trained on arterial data was unsuitable for venous data, with the inherent phase differences leading to transfer failure. (4) Finally, for both the DLR and radiomics methods, the combined arterial & venous phase models did not perform as well as the arterial phase models, which may have been caused by feature redundancy. Feature redundancy means that features have a high degree of collinearity. The same situation occurred in our previous study (23). Theoretically, high collinearity can lead to poor model prediction performance (28). We performed a collinearity analysis on the DLR features of the arterial and venous phases, and the results showed that most features of the two phases were highly collinear (Figure S5). Therefore, the redundant information introduced by the highly collinear features is the likely reason why the DLR-arterial & venous (DLR-A&V) model was inferior to the DLR-A model.

In the current study, we added clinical information to the optimal DLR-A model and found that the performance improved, although the improvement did not reach statistical significance. This suggests that clinical information is important and has a positive effect on the modeling process. In this study, the optimal model (DLR-A) was selected to stratify the risk of postoperative recurrence in pNEN patients from two hospitals.
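The collinearity check between arterial and venous DLR features mentioned above can be illustrated with a minimal sketch. The source does not state which collinearity measure or cutoff was used, so the pairwise Pearson correlation, the 0.9 threshold, the function names, and the toy feature values below are all assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(arterial, venous, threshold=0.9):
    """List (arterial feature, venous feature, r) with |r| >= threshold.

    arterial, venous: dict mapping a feature name to per-patient values.
    The threshold is an assumed cutoff, not one taken from the study.
    """
    pairs = []
    for a_name, a_vals in arterial.items():
        for v_name, v_vals in venous.items():
            r = pearson(a_vals, v_vals)
            if abs(r) >= threshold:
                pairs.append((a_name, v_name, round(r, 3)))
    return pairs

# Hypothetical feature values for five patients (not data from the study).
arterial = {"A_f1": [0.1, 0.4, 0.5, 0.9, 1.2], "A_f2": [3.0, 1.0, 2.5, 0.5, 2.0]}
venous = {"V_f1": [0.2, 0.8, 1.0, 1.8, 2.4], "V_f2": [1.0, 2.0, 1.5, 3.0, 0.5]}

redundant = collinear_pairs(arterial, venous)
print(redundant)
```

In a sketch like this, a venous feature that is almost a rescaled copy of an arterial feature adds little new information to a combined A&V model, which is the redundancy argument made in the text.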
According to the results of the Kaplan-Meier analysis, when the DLR-A model was used to determine the recurrence probability with a 5-year RFS cutoff, survival differed significantly between the high- and low-risk groups. Moreover, the survival analysis included not only the final status of a patient but also the time taken to reach this status. Compared with other model evaluation indicators such as AUC and ACC, the results of the survival analysis (such as survival time and mean survival time) reflect, with more practical significance, the ability of the prediction model to stratify patients by survival status.

In our study, none of the models performed significantly better than random chance in the external data set. Differences in scanning parameters between centers may introduce heterogeneity in the imaging data, which can reduce the generalization ability of the prediction models. As indicated in some recent studies (29-31), domain adaptation techniques based on deep learning may be applied in further studies to reduce the difference in data distribution and thus improve the generalization ability of the method. Other factors, such as differences in surgeon experience between hospitals and in postoperative monitoring frequency, may also be influential and require prospective studies for verification.

There are some limitations to our study. First, although the DLR-A model was the optimal model in our study, it was still only a semi-automatic method that requires a radiologist to provide information regarding the tumor location. Fully automatic localization or segmentation for feature extraction is warranted, not only to avoid the subjectivity of a radiologist but also to improve the prediction performance. Second, similar to our study, published studies in patients with pNENs are mostly limited by small data sets (23,32,33).
This might be the reason that no statistically significant differences among the models were detected. However, the difference between the DLR-A and the random model was statistically significant in the internal group and nearly significant in the external group. The survival analysis also demonstrated the potential of the optimal model for prognosis prediction. The independent dataset in this study was small, and a larger external dataset is needed to further confirm the robustness of the model. Third, in four patients of the external group, the records regarding their neuroendocrine symptoms were not available. Therefore, we could not validate the model with added clinical features using the data from Hospital II. Fourth, we did not predict the two outcomes (local recurrence and distant metastasis) separately because of the limited sample size. Finally, the performance of our optimal model remains to be improved with emerging artificial intelligence technologies. We believe that these technologies can overcome the problems of sample size and annotation to further improve the ACC of the prediction model.
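The Kaplan-Meier risk stratification discussed in this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the product-limit estimator is standard, but the log-rank test is an assumed choice for comparing the two groups (the text names only the Kaplan-Meier method), and the follow-up times, function names, and variables are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate: [(t, S(t))] at each time with an observed event.

    times: follow-up in months; events: 1 = recurrence, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0        # recurrences observed at time t
        leaving = 0  # patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    all_pairs = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in all_pairs if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical follow-up data in months (not data from the study).
high_t, high_e = [6, 12, 18, 30, 60], [1, 1, 1, 0, 0]
low_t, low_e = [24, 60, 60, 60, 60], [1, 0, 0, 0, 0]

km_high = kaplan_meier(high_t, high_e)
km_low = kaplan_meier(low_t, low_e)
chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
print(km_high, km_low, chi2, p_value)
```

In practice the high- and low-risk groups would be defined by thresholding the model's predicted recurrence probability, and a dedicated survival analysis package would normally be used instead of hand-written estimators.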

Conclusions

In summary, this study successfully established a preoperative prediction model of pNEN recurrence with good generalization in an external data set. It provides a basis for evaluating the risk of postoperative recurrence in pNEN patients with high SEN, thus aiding decision-making in clinical practice. However, how individualized follow-up surveillance and treatment plans should be designed for patients with different postoperative risks needs further exploration based on the results of the current study.

31 in total

1. Recent progress in the understanding, diagnosis, and treatment of gastroenteropancreatic neuroendocrine tumors.

Authors: Kiran K Turaga; Larry K Kvols
Journal: CA Cancer J Clin Date: 2011 Mar-Apr Impact factor: 508.702

2. Maximum Mean Discrepancy Based Multiple Kernel Learning for Incomplete Multimodality Neuroimaging Data.

Authors: Xiaofeng Zhu; Kim-Han Thung; Ehsan Adeli; Yu Zhang; Dinggang Shen
Journal: Med Image Comput Comput Assist Interv Date: 2017-09-04

3. Clinical presentation, recurrence, and survival in patients with neuroendocrine tumors: results from a prospective institutional database.

Authors: Monica Ter-Minassian; Jennifer A Chan; Susanne M Hooshmand; Lauren K Brais; Anastassia Daskalova; Rachel Heafield; Laurie Buchanan; Zhi Rong Qian; Charles S Fuchs; Xihong Lin; David C Christiani; Matthew H Kulke
Journal: Endocr Relat Cancer Date: 2013-03-22 Impact factor: 5.678

4. Surgical Treatment as a Principle for Patients with High-Grade Pancreatic Neuroendocrine Carcinoma: A Nordic Multicenter Comparative Study.

Authors: Sven-Petter Haugvik; Eva Tiensuu Janson; Pia Österlund; Seppo W Langer; Ragnhild Sørum Falk; Knut Jørgen Labori; Lene Weber Vestermark; Henning Grønbæk; Ivar Prydz Gladhaug; Halfdan Sorbye
Journal: Ann Surg Oncol Date: 2015-12-17 Impact factor: 5.344

5. Pancreatic neuroendocrine tumour: Correlation of apparent diffusion coefficient or WHO classification with recurrence-free survival.

Authors: Mimi Kim; Tae Wook Kang; Young Kon Kim; Seong Hyun Kim; Wooil Kwon; Sang Yun Ha; Sang A Ji
Journal: Eur J Radiol Date: 2016-01-04 Impact factor: 3.528

6. Clinical, pathological, and demographic factors associated with development of recurrences after surgical resection in elderly patients with neuroendocrine tumors.

Authors: C Shen; A Dasari; Y Chu; D M Halperin; S Zhou; Y Xu; Y T Shih; J C Yao
Journal: Ann Oncol Date: 2019-11-01 Impact factor: 32.976

7. Preoperative and postoperative prediction of long-term meningioma outcomes.

Authors: Efstathios D Gennatas; Ashley Wu; Steve E Braunstein; Olivier Morin; William C Chen; Stephen T Magill; Chetna Gopinath; Javier E Villaneueva-Meyer; Arie Perry; Michael W McDermott; Timothy D Solberg; Gilmer Valdes; David R Raleigh
Journal: PLoS One Date: 2018-09-20 Impact factor: 3.240

8. Developed and validated a prognostic nomogram for recurrence-free survival after complete surgical resection of local primary gastrointestinal stromal tumors based on deep learning.

Authors: Tao Chen; Shangqing Liu; Yong Li; Xingyu Feng; Wei Xiong; Xixi Zhao; Yali Yang; Cangui Zhang; Yanfeng Hu; Hao Chen; Tian Lin; Mingli Zhao; Hao Liu; Jiang Yu; Yikai Xu; Yu Zhang; Guoxin Li
Journal: EBioMedicine Date: 2018-12-23 Impact factor: 8.143

9. Surgical resection of the primary tumor leads to prolonged survival in metastatic pancreatic neuroendocrine carcinoma.

Authors: Tingting Feng; Wangxia Lv; Meiqin Yuan; Zhong Shi; Haijun Zhong; Sunbin Ling
Journal: World J Surg Oncol Date: 2019-03-21 Impact factor: 2.754

10. Multi-institutional Development and External Validation of a Nomogram to Predict Recurrence After Curative Resection of Pancreatic Neuroendocrine Tumors.

Authors: Alessandra Pulvirenti; Ammar A Javed; Luca Landoni; Nigel B Jamieson; Joanne F Chou; Marco Miotto; Jin He; Mithat Gonen; Antonio Pea; Laura H Tang; Chiara Nessi; Sara Cingarlini; Michael I D'Angelica; Anthony J Gill; T Peter Kingham; Aldo Scarpa; Matthew J Weiss; Vinod P Balachandran; Jaswinder S Samra; John L Cameron; William R Jarnagin; Roberto Salvia; Christopher L Wolfgang; Peter J Allen; Claudio Bassi
Journal: Ann Surg Date: 2021-12-01 Impact factor: 13.787
