
Survival Prediction in Gallbladder Cancer Using CT Based Machine Learning.

Zefan Liu1, Guannan Zhu1, Xian Jiang1, Yunuo Zhao1, Hao Zeng1, Jing Jing1, Xuelei Ma1,2.   

Abstract

OBJECTIVE: To establish a classifier for accurately predicting the overall survival of gallbladder cancer (GBC) patients by analyzing pre-treatment CT images using machine learning technology.
METHODS: This retrospective study included 141 patients with pathologically confirmed GBC. After the pre-treatment CT images were obtained, the tumor lesion was segmented manually and the LIFEx package was used to extract the tumor signature. Next, LASSO and random forest methods were used for feature selection and modeling. Finally, clinical information was combined with the radiomics signature to predict the survival outcomes of GBC patients.
RESULTS: Fifteen CT features were selected through LASSO and random forest. On the basis of relative importance, GLZLM-HGZE, GLCM-homogeneity, and NGLDM-coarseness were included in the final model. The hazard ratio of the CT-based model was 1.462 (95% CI: 1.014-2.107). According to the median risk score, all patients were divided into high- and low-risk groups, and survival analysis showed that the high-risk group had poorer survival (P = 0.012). After inclusion of clinical factors, we used multivariate Cox regression to classify patients with GBC. The 3-year AUC values in the training set and validation set reached 0.79 and 0.73, respectively.
CONCLUSION: GBC survival outcomes could be predicted by radiomics based on LASSO and Random Forest.
Copyright © 2020 Liu, Zhu, Jiang, Zhao, Zeng, Jing and Ma.


Keywords:  gallbladder cancer; machine learning; prognosis; radiomics; random forest

Year:  2020        PMID: 33330105      PMCID: PMC7729190          DOI: 10.3389/fonc.2020.604288

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   6.244


Introduction

Gallbladder cancer (GBC) is the fifth most common tumor of the digestive system and accounts for 95% of malignant tumors in the biliary system (1). The lack of specific clinical manifestations in the early stage, coupled with highly invasive biological features and the abnormal anatomic location of the gallbladder, results in poor survival outcomes (2, 3). In addition, owing to the low sensitivity to chemotherapy and radiotherapy and the lack of effective therapeutic targets, surgical resection is still the main treatment option (4). The survival outcomes of GBC have therefore attracted widespread concern. Researchers hope to identify patients with higher prognostic risk so as to implement personalized treatment. So far, prognostic analyses of GBC have depended on laboratory indicators such as tumor markers, nutritional indicators, and gene expression signatures, but these indicators lack an intuitive analysis of the whole tumor lesion (5–7). Lesions can be observed directly on radiological images in the clinic, and radiomics can provide lesion information that is not visible to the naked eye. In recent years, radiomics has been developed to focus on the extraction and mining of massive medical imaging data. It is hypothesized that the selected imaging features reflect specific tumor phenotypes (8, 9). Because these image signatures provide a comprehensive picture of the entire tumor, their heterogeneity may have implications for clinical events such as treatment response, survival outcomes, and disease progression. Some studies have focused on the appearance of imaging features at different cancer stages (10, 11). In addition, many other studies have reported the effect of imaging features on survival outcomes, but none has been reported for GBC.
A GBC survival prediction model is of great significance for prognosis assessment, selection of treatment mode, selection of surgical candidates, planning of postoperative adjuvant treatment, identification of patients at high risk of recurrence, scheduling of follow-up, and rational use of medical resources. In this study, we assessed a number of CT-based radiomics parameters for predicting patients' overall survival (OS). A cohort of 141 patients was used to analyze image data, extract features, and test the model. All the selected parameters were evaluated for their predictive power and stability. Finally, we combined clinical information for a cost-effective prediction.

Methods

Patient Selection

The flow of data analysis and processing is shown in Figure 1. The records of patients from 2010 to 2017 were selected from the Department of Hepatobiliary Surgery through an electronic medical record review. Inclusion criteria were: 1) GBC confirmed by pathological examination; 2) CT scan performed before tumor biopsy or surgery. Some patients were excluded because a history of liver surgery or other liver lesions made the gallbladder lesions impossible to identify (). Because conventional CT, including CT and contrast-enhanced CT, is a commonly used and cost-effective test in clinical practice, it was selected as the imaging modality for this study. Finally, a total of 141 patients were included, and their CT images were collected from the Radiology Department.
Figure 1

Workflow for image processing and machine learning.

Table 1

The general condition of the patients in this study.

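| Patient characteristics | Category | Patients (Total = 141) |
| --- | --- | --- |
| Gender | Male | 56 (39.7%) |
| | Female | 85 (60.3%) |
| Age | <30 | 23 (16.3%) |
| | 30-50 | 84 (59.5%) |
| | >50 | 34 (24.1%) |
| T stage | I-II | 49 (32.8%) |
| | III-IV | 92 (65.2%) |
| N stage | N0 | 67 (47.5%) |
| | N1 | 56 (39.7%) |
| | N2 | 18 (12.7%) |
| M stage | M0 | 82 (58.1%) |
| | M1 | 56 (39.7%) |
| Liver metastasis | Yes | 78 (55.3%) |
| | No | 63 (44.6%) |
| Jaundice | Yes | 33 (23.4%) |
| | No | 108 (76.5%) |
| CA199 | <40 u/ml | 59 (41.8%) |
| | >=40 u/ml | 79 (56.0%) |
| | NA | 3 (2.12%) |
| CA125 | <35 u/ml | 85 (60.2%) |
| | >=35 u/ml | 53 (37.5%) |
| | NA | 3 (2.12%) |
| CEA | <5 μg/L | 90 (63.8%) |
| | >=5 μg/L | 48 (34.0%) |
| | NA | 3 (2.12%) |
| AFP | <20 μg/L | 130 (92.1%) |
| | >=20 μg/L | 8 (5.67%) |
| | NA | 3 (2.12%) |
| Surgical treatment | Yes | 114 (80.8%) |
| | No | 27 (19.1%) |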
All procedures involving human participants complied with the ethical standards of the relevant bodies and/or national research councils. The Ethics Committee of Sichuan University approved this retrospective study. Written informed consent was required from all patients before radiological examination (for patients under 16 years of age, it had to be signed by a parent or guardian).

Image Recognition and Feature Extraction

CT scanning was performed before any treatment using a 64-MDCT scanner (Brilliance64, Philips Medical Systems, Eindhoven, The Netherlands) or a 128-MDCT scanner (Somatom Definition AS+, Siemens Healthcare Sector, Forchheim, Germany). All CT examinations were performed under the following conditions: 120 kVp; 199 mAs; CTDIvol, 12.9 mGy; DLP, 460.7 mGy*cm; pitch, 0.75–1.0; rotation time, 0.5–0.75 s; collimation, 0.625 mm; section thickness, 2.0 or 5.0 mm. The region of interest (ROI) was delineated by two experienced radiologists. Because ordinary CT has limited ability to recognize GBC and the tumor boundary is usually fuzzy, we followed these principles during segmentation: 1) delineate the solid, high-density lesions of GBC and avoid low-density areas such as bile; 2) delineate only the definite tumor part when the blurred margin around the lesion is difficult to recognize; and 3) exclude samples on which the radiologists disagreed. We used the image feature extraction software LIFEx to obtain the texture signatures of the CT images (12). On each CT slice, we drew the boundary of the lesion as a two-dimensional ROI, and the slices together gave a three-dimensional ROI. ROIs were delineated by independent radiologists who were blinded to the patient’s diagnosis (). The maximum, minimum, mean, and standard deviation of the density values in the ROI were calculated. From the obtained data, the gray-level co-occurrence matrix (GLCM), neighborhood gray-level difference matrix (NGLDM), gray-level run length matrix (GLRLM), and gray-level zone length matrix (GLZLM) were calculated. We obtained a total of 54 radiomics parameters ().
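To make the texture features more concrete, the following is a minimal sketch of how a GLCM homogeneity value could be computed from a small gray-level image. It is not the LIFEx implementation: the function name `glcm_homogeneity`, the single horizontal offset, the symmetric counting, and the inverse-difference definition of homogeneity are assumptions made for illustration, and the two toy images are invented.

```python
from collections import Counter

def glcm_homogeneity(image, offset=(0, 1)):
    """Inverse-difference homogeneity of a 2D gray-level image for one pixel offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                # Count each neighbouring pair in both orders (symmetric GLCM).
                counts[(i, j)] += 1
                counts[(j, i)] += 1
    total = sum(counts.values())
    # Pairs with similar gray levels contribute more than pairs that differ.
    return sum(n / total / (1 + abs(i - j)) for (i, j), n in counts.items())

uniform = [[2, 2, 2], [2, 2, 2]]   # perfectly homogeneous toy "lesion"
checker = [[0, 3, 0], [3, 0, 3]]   # strongly alternating toy "lesion"
print(glcm_homogeneity(uniform))   # 1.0
print(glcm_homogeneity(checker))   # 0.25
```

In this sketch a uniform region scores 1 and a high-contrast region scores lower, which is the sense in which homogeneity summarizes local texture.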

Statistical Analysis Workflow

First, all the collected samples were randomly divided into a training set and a validation set at a ratio of 7:3. We used the sample function of R to perform the randomization and conducted a hypothesis test on the age of the patients in the two groups (). The results showed that the difference in average age between the two groups was not statistically significant (P > 0.05). Therefore, we used the group accounting for 70% of the samples as the training set for the subsequent model analysis. Then, the image texture signatures were filtered by the least absolute shrinkage and selection operator (LASSO) (13). After 100 repeated simulations, the most robust signatures were selected. To optimize the model, we used the random forest to further screen the selected signatures and obtain the final machine learning model. We performed a multivariate Cox regression analysis of radiological parameters and clinical characteristics and drew a nomogram. Survival curves were plotted by Kaplan-Meier analysis and compared with the log-rank test.
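The 7:3 randomization described above was done with R's sample function; an equivalent step can be sketched in Python as follows. The function name `split_patients`, the seed, and the use of consecutive integers as patient identifiers are hypothetical choices for illustration, not details from the study.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=2020):
    """Shuffle patient identifiers reproducibly and cut them at the given fraction."""
    rng = random.Random(seed)          # fixed seed so the split can be reproduced
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 141 patients, as in the study cohort; the IDs 1..141 are placeholders.
train_ids, validation_ids = split_patients(range(1, 142))
print(len(train_ids), len(validation_ids))  # 99 42
```

Fixing the seed matters here because every later step (feature selection, model fitting, validation) depends on which patients fall in which set.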

Results

Establishment of a Model Using Radiomics Signatures

First, we randomly divided the patients into a training group and a test group with a split ratio of 7:3. Then, LASSO was run in the training group up to 100 times, and 15 signatures were selected. The results are shown in Figure 2A. Next, the random forest was used for further optimization. According to relative importance, the three most important parameters were screened out (Figure 2B). Then, we built a model based on the random forest algorithm. The risk score was calculated for each patient, and the risk score distribution is shown in Figure 3A. By comparing the high- and low-risk groups, we found that the high-risk group had worse overall survival. Increases in GLZLM-HGZE and GLCM-homogeneity increased risk, whereas an increase in NGLDM-coarseness reduced risk; we therefore consider that GLZLM-HGZE and GLCM-homogeneity may be risk factors for GBC and that NGLDM-coarseness is more likely to be a protective factor. Correlation analysis showed that the correlation among these three factors was low (). Univariate Cox analysis showed that the risk score had a hazard ratio of 1.534 (95% CI: 1.078–2.183). Finally, we validated this model in the verification group (). The model performed well: high-risk individuals had poorer survival outcomes than low-risk individuals, and the survival rate of high-risk patients was significantly lower than that of low-risk patients (P = 0.012).
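The idea of repeating LASSO many times and keeping the features that are selected consistently can be sketched as below. This is an illustration under stated assumptions rather than the study's procedure: it uses a squared-error (linear) LASSO fitted by coordinate descent as a stand-in, since the exact model family, penalty choice, and resampling scheme are not given in the text, and the data are synthetic. The names `lasso_fit` and `selection_frequency` are hypothetical.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_fit(X, y, penalty, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + penalty*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]            # all coefficients start at zero
    columns, scales = [], []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        col = [v - mean for v in col]             # centre each feature
        columns.append(col)
        scales.append(sum(c * c for c in col) / n)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if scales[j] == 0.0:
                continue                          # constant feature, nothing to fit
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, penalty) / scales[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta

def selection_frequency(X, y, penalty, n_repeats=100, fraction=0.8, seed=0):
    """Share of random subsamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_repeats):
        idx = rng.sample(range(n), int(fraction * n))
        beta = lasso_fit([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_repeats for c in counts]

# Synthetic data: only feature 0 carries signal; features 1-4 are pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
freq = selection_frequency(X, y, penalty=0.5)
print(freq)  # feature 0 should be selected far more often than the noise features
```

Features with a high selection frequency across repeats are the "robust" ones in this sense; a survival analysis would normally swap the squared-error loss for a Cox partial likelihood.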
Figure 2

Panel (A) shows the LASSO result. Panel (B) shows the random forest result: the left part shows the out-of-bag importance ranking of the selected parameters, and the right part shows the relationship between the error rate and the number of classification trees.

Figure 3

Panel (A) shows the distribution of risk scores and the values of the three CT parameters in the training and test groups. Panel (B) shows the survival of patients at high or low risk after being grouped by median.
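The median grouping and survival comparison summarized in Figure 3 can be sketched with a small Kaplan-Meier estimator and a two-group log-rank test. The risk scores, follow-up times, and event indicators below are invented toy values, and the function names are hypothetical; the study itself does not state which software routines were used for these steps.

```python
import math
import statistics

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group A
        d = sum(1 for tt, e, _ in data if tt == t and e)         # events at t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

# Toy cohort: risk score, follow-up time (months), event indicator (1 = death).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
times = [2, 3, 4, 5, 6, 8, 10, 12]
events = [1, 1, 1, 1, 1, 1, 1, 1]

cutoff = statistics.median(scores)
high = [s > cutoff for s in scores]              # above-median scores = high risk
t_high = [t for t, h in zip(times, high) if h]
e_high = [e for e, h in zip(events, high) if h]
t_low = [t for t, h in zip(times, high) if not h]
e_low = [e for e, h in zip(events, high) if not h]

curve = kaplan_meier(t_high, e_high)
chi2, p = log_rank_test(t_high, e_high, t_low, e_low)
print(curve)
print(chi2, p)
```

A small p-value from the log-rank test indicates that the two survival curves differ more than would be expected by chance, which is the comparison reported for the high- and low-risk groups.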


Prognostic Model Based on Clinical Data and Radiomics

To achieve better performance, we analyzed a variety of clinical indicators, including age, gender, and tumor stage. Through multivariate Cox analysis, we screened out the prognostic indicators affecting the survival of patients with GBC: surgery, liver metastasis, and lymph node metastasis grade (Table 2). Next, we randomly divided the patients into two groups for model training and testing. We used multivariate Cox regression to predict the overall survival of the patients by combining the three selected clinical indicators (liver metastasis, surgery, and lymph node metastasis grade) with the radiomics risk score (Table 3). Finally, we used a nomogram to visualize the model (Figure 4) and evaluated the prediction accuracy with ROC curves. The 1- and 3-year AUC values of the nomogram reached 0.7465 and 0.7974 in the training group and 0.7271 and 0.7314 in the validation group, respectively. Figure 5B also shows the comparison between the ideal model and the actual nomogram prediction. The calibration curve shows that the actual model is basically consistent with the ideal model, indicating that our model has high accuracy.
Table 2

The results of a multivariate COX analysis.

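| Variable | P value | HR | Low 95% CI | High 95% CI |
| --- | --- | --- | --- | --- |
| Radiomics Risk Score | 0.040 | 1.495 | 1.019 | 2.194 |
| Surgery | 0.087 | 0.672 | 0.426 | 1.059 |
| Liver metastasis | 0.026 | 1.615 | 1.060 | 2.459 |
| N Stage | 0.037 | 1.797 | 1.035 | 3.122 |
| Jaundice | 0.834 | 0.953 | 0.606 | 1.498 |
| T stage | 0.696 | 1.223 | 0.447 | 3.343 |
| Sex | 0.456 | 0.862 | 0.582 | 1.275 |
| Age | 0.942 | 1.02 | 0.602 | 1.727 |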
Table 3

The results of a multivariate analysis combined with clinical examination and radiologic parameters.

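| Variable | P value | HR | Low 95.0% CI | High 95.0% CI |
| --- | --- | --- | --- | --- |
| Liver metastasis | 0.009 | 1.620 | 1.126 | 2.322 |
| Surgery | 0.077 | 0.668 | 0.427 | 1.044 |
| Radiomics Risk Score | 0.042 | 1.462 | 1.014 | 2.107 |
| N Stage | 0.042 | 1.730 | 1.020 | 2.935 |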
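As a reading aid for Table 3, the relationship between a hazard ratio, its 95% confidence interval, and its P value can be sketched as follows. The sketch assumes the intervals are Wald intervals on the log hazard-ratio scale, which is the usual output of Cox regression software but is not stated in the text; the function name `wald_p_from_ci` is hypothetical.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided p-value from an HR and its 95% CI."""
    # On the log scale the CI is log(HR) +/- z_crit * SE, so its width gives SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p

# Values taken from Table 3.
z_radiomics, p_radiomics = wald_p_from_ci(1.462, 1.014, 2.107)
z_liver, p_liver = wald_p_from_ci(1.620, 1.126, 2.322)
print(round(p_radiomics, 3))  # 0.042
print(round(p_liver, 3))      # 0.009
```

Under that assumption, the recomputed values agree with the P values reported in the table for the radiomics risk score and liver metastasis.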
Figure 4

Nomogram for predicting the overall survival of gallbladder cancer patients after multiple factors are included.

Figure 5

Panel (A) shows the ROC curve of the prognostic survival model incorporating clinical parameters. Panel (B) shows the calibration curve of the model.


Discussion

In this study, CT scan data combined with machine learning methods were used to predict overall survival outcomes in GBC patients. First, we used LASSO to filter the raw data and acquire a robust parameter set. Then we used the random forest for further optimization. Finally, three parameters were obtained: GLZLM-HGZE, GLCM-homogeneity, and NGLDM-coarseness. The three parameters were correlated with the prognostic risk of the patients, and the prognostic model had a hazard ratio of 1.549. Analysis of the clinical parameters showed that lymph node metastasis grade, surgery, and liver metastasis are prognostic factors. By combining these indicators with the CT risk score, we obtained a model for the prognosis of GBC. Moreover, the 3-year AUC value of the predictive model in the random validation group reached 0.797. Finding the key prognostic factors related to survival time and establishing an individualized, accurate survival prediction model is of great significance for guiding the selection of treatment options and supporting clinical decisions for patients with GBC. At present, prognosis studies of GBC mainly focus on clinical and pathological examination (6, 14, 15). Many factors affect the postoperative survival of GBC patients, including TNM stage, degree of tumor differentiation, liver infiltration, jaundice, and lymph node dissection. TNM staging is widely used in clinical practice, but it includes only tumor infiltration depth (T stage), lymph node metastasis (N stage), and distant metastasis (M stage), and its value for survival prediction is limited. It applies only to a class of patients and cannot provide the individualized, accurate prediction needed to adjust subsequent treatment. Many studies have focused on the effect of clinical examination on the prognosis of GBC, including inflammatory factors and nutritional indicators (16, 17).
However, the original data in those studies almost always come from a single laboratory examination, and values may fluctuate within the same individual over time. Meanwhile, these markers lack specificity, and the mechanism linking these indicators to tumor outcome remains to be studied. Moreover, clinical data reflect only part of the biological manifestations of a tumor and lack a comprehensive description of the entire tumor lesion. Radiomics is a promising approach for acquiring a large amount of intuitive data through analysis of the entire tumor lesion and metastasis. Compared with the molecular features detected by popular omics techniques such as genomics and proteomics, radiomics can better overcome the temporal and spatial specificity of the whole course of cancer. Meanwhile, texture analysis can provide quantitative and semi-quantitative parameters that reflect tumor heterogeneity, which is of great significance in tumor research (18, 19). However, few studies of this type have been conducted in GBC, which may be ascribed to the difficulty of acquiring uniform data in clinical practice. A variety of medical images are used in clinical decision-making, mainly CT, contrast-enhanced CT, multi-parameter MRI, and PET-CT. Among them, CT is the most common and cost-effective way to acquire data on the original lesion before treatment. Although reconstructed MRI sequences and contrast-enhanced CT have advantages in the identification and differentiation of tumors, these methods are still not widely used in most areas of China. In terms of applying machine learning, the heterogeneity of tumor tissue is spread across multiple texture parameters; thus, analyzing a single texture parameter cannot fully reflect the overall characteristics of the tumor. Considering this problem, we believe that a complex model integrating different texture signatures is needed to fully characterize the total tumor lesion.
Also, the random forest is a powerful machine learning method that has been shown to perform classification successfully (20, 21). Adopting the concept of ensemble learning, the random forest achieves good accuracy among current machine learning algorithms by combining multiple decision tree models. Moreover, it can process high-dimensional data and be applied effectively to big data. In particular, it can evaluate the importance of each feature in classification. Therefore, in this study, the random forest was well suited to identifying parameters and establishing classifiers. However, our study still has four main limitations. First, analyzing CT scan data alone cannot replace other image acquisition methods (such as enhanced CT, mpMRI, reconstructed multiple-sequence ADC, and DTI) in real clinical work. Second, limited by the sample size, there were not enough enhanced CT and MRI data; thus, this study could not be compared with a variety of advanced scanning techniques. Third, our research was limited by its retrospective data. These findings might have better clinical implications if confirmed in prospective studies. Fourth, the patients included in the study were all from a single center, which may limit the generalizability of the classifier. Considering the differences among medical institutions in obtaining original images and the differences in manual segmentation of lesions, we cannot guarantee that this machine learning classifier performs well on external data sources. However, all of the research methods and analyses used in the study come from open-source packages, which means that the analysis process can be repeated on other data.

Conclusion

We found associations between established CT imaging parameters and overall survival. Radiomics-based non-invasive technology showed promising ability to predict the overall survival of patients with gallbladder carcinoma, although more extensive testing is necessary to refine this technology for real clinical use.

Data Availability Statement

The original contributions presented in the study are included in the article/. Further inquiries can be directed to the corresponding authors.

Ethics Statement

The studies involving human participants were reviewed and approved by Ethics Committee of Sichuan University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author Contributions

ZL, XM, and JJ conceived and designed the study. ZL, GZ, XJ, and HZ conducted the data analysis and wrote the article. YZ and HZ were responsible for data processing. ZL and GZ contributed equally to this paper. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (10 of 21 shown)

1.  Immunohistochemically demonstrated lymph node micrometastasis and prognosis in patients with gallbladder carcinoma.

Authors:  Eiji Sasaki; Masato Nagino; Tomoki Ebata; Koji Oda; Toshiyuki Arai; Hideki Nishio; Yuji Nimura
Journal:  Ann Surg       Date:  2006-07       Impact factor: 12.969

2.  The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute.

Authors:  Kathleen A Cronin; Lynn A G Ries; Brenda K Edwards
Journal:  Cancer       Date:  2014-12-01       Impact factor: 6.860

3.  Endometrial Carcinoma: MR Imaging-based Texture Model for Preoperative Risk Stratification-A Preliminary Analysis.

Authors:  Yoshiko Ueno; Behzad Forghani; Reza Forghani; Anthony Dohan; Xing Ziggy Zeng; Foucauld Chamming's; Jocelyne Arseneau; Lili Fu; Lucy Gilbert; Benoit Gallix; Caroline Reinhold
Journal:  Radiology       Date:  2017-05-10       Impact factor: 11.105

4.  Gallbladder Cancer Incidence and Mortality, United States 1999-2011.

Authors:  S Jane Henley; Hannah K Weir; Melissa A Jim; Meg Watson; Lisa C Richardson
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2015-06-12       Impact factor: 4.254

Review 5.  Genomics of gallbladder cancer: the case for biomarker-driven clinical trial design.

Authors:  Jason K Sicklick; Paul T Fanta; Kelly Shimabukuro; Razelle Kurzrock
Journal:  Cancer Metastasis Rev       Date:  2016-06       Impact factor: 9.264

6.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

Review 7.  Radiomics: extracting more information from medical images using advanced feature analysis.

Authors:  Philippe Lambin; Emmanuel Rios-Velazquez; Ralph Leijenaar; Sara Carvalho; Ruud G P M van Stiphout; Patrick Granton; Catharina M L Zegers; Robert Gillies; Ronald Boellard; André Dekker; Hugo J W L Aerts
Journal:  Eur J Cancer       Date:  2012-01-16       Impact factor: 9.162

Review 8.  Combined detection tumor markers for diagnosis and prognosis of gallbladder cancer.

Authors:  Yun-Feng Wang; Fei-Ling Feng; Xu-Hong Zhao; Zhen-Xiong Ye; He-Ping Zeng; Zhen Li; Xiao-Qing Jiang; Zhi-Hai Peng
Journal:  World J Gastroenterol       Date:  2014-04-14       Impact factor: 5.742

9.  Prognostic Value Of Preoperative Systemic Inflammatory Biomarkers In Patients With Gallbladder Cancer And The Establishment Of A Nomogram.

Authors:  Yan Deng; Feng Zhang; Xiao Yu; Cheng-Long Huo; Zhen-Gang Sun; Shuai Wang
Journal:  Cancer Manag Res       Date:  2019-10-21       Impact factor: 3.989

10.  Machine Learning and Feature Selection Methods for Disease Classification With Application to Lung Cancer Screening Image Data.

Authors:  Darcie A P Delzell; Sara Magnuson; Tabitha Peter; Michelle Smith; Brian J Smith
Journal:  Front Oncol       Date:  2019-12-11       Impact factor: 6.244

