Prognostic Factors for Survival in Patients with Gastric Cancer using a Random Survival Forest

Davoud Adham, Nategh Abbasgholizadeh, Malek Abazari.

Abstract

Background: Gastric cancer is the fifth most common cancer and the third leading cause of cancer-related death, with about 1 million new cases and 700,000 deaths in 2012. The aim of this investigation was to identify important prognostic factors for survival using a random survival forest (RSF) approach.
Materials and Methods: Data were collected from 182 gastric cancer patients through a historical cohort study in Hamadan, Iran, from 2007 to 2013. The event under consideration was death due to gastric cancer. The random survival forest model in R software was applied to determine the key factors affecting survival. Four split criteria were used to determine the importance of the variables in the model: log-rank, conservation of events, log-rank score, and random splitting. The performance of the model was assessed with Harrell’s concordance index.
Results: The mean age at diagnosis was 63 ± 12.57 years, and the mean and median survival times were 15.2 (95% CI: 13.3, 17.0) and 12.3 (95% CI: 11.0, 13.4) months, respectively. The one-year, two-year, and three-year survival rates were 51%, 13%, and 5%, respectively. Each RSF approach showed a slightly different ranking order. The most important covariates in nearly all four RSF approaches were metastatic status, age at diagnosis, and tumor size. The error rate of each RSF approach was in the range 0.29-0.32, and the lowest error rate was obtained by the log-rank splitting rule; the log-rank score, conservation of events, and random splitting rules ranked second, third, and fourth, respectively.
Conclusion: Low survival rate of gastric cancer patients is an indication of absence of a screening program for early diagnosis of the disease. Timely diagnosis in early phases increases survival and decreases mortality. Creative Commons Attribution License

Keywords:  Gastric cancer; random survival forest; Hamadan; Iran

Year:  2017        PMID: 28240020      PMCID: PMC5563089          DOI: 10.22034/APJCP.2017.18.1.129

Source DB:  PubMed          Journal:  Asian Pac J Cancer Prev        ISSN: 1513-7368


Introduction

Cancer is a non-communicable disease that accounted for about 14.1 million new cases and 8.2 million deaths in 2012; it is the second leading cause of death after cardiovascular disease (Ferlay et al., 2013; World Health Organization, 2015). Gastric cancer is the fifth most common cancer and the third leading cause of cancer-related death, with about 1 million new cases and 700,000 deaths in 2012 (Ferlay et al., 2013; Pelucchi et al., 2015). According to global estimates, gastric cancer will be one of the main causes of death in the world by 2030, with about 2.5 million new cases and a minimum of 1.9 million deaths by 2050 (Torre et al., 2015).

Gastric cancer is the third most common cancer in Iran after breast and skin cancers. According to the 2008 national cancer registry report, 6,886 cases of gastric cancer were recorded in Iran, representing about 3.9% of all cancers (Mousavi et al., 2009). Iran has the highest incidence rate of gastric cancer in the Middle East (Mohagheghi et al., 2009). Studies in Iran indicate that the northern and northwestern areas of the country have the highest risk of gastric cancer, while the central and southern areas have moderate and low risk, respectively (Saidi et al., 2002; Sadjadi et al., 2003; Alireza et al., 2005). From a biological point of view, the symptoms of gastric cancer are unknown. The disease is very active and progressive, and it is incurable in most cases (Beaglehole et al., 2011). A decrease in the prevalence of Helicobacter pylori infection and smoking, together with improved diet, has led to a moderate decline in the incidence of gastric cancer over the last three decades; however, the disease remains a major health problem (La Vecchia and Franceschi, 2000; Boccia and La Vecchia, 2013).

Survival analysis is one of the statistical methods most widely used in medical studies in recent decades; it is a set of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs (Kleinbaum and Klein, 2012).
Recently, random survival forests (RSF) have been used to analyze survival data. RSF is an ensemble tree method for the analysis of right-censored survival data, and constructing ensembles from tree structures can significantly improve learning performance (Ishwaran and Kogalur, 2010). Previous studies have shown that the RSF model can identify complex interactions among multiple variables and can outperform traditional Cox proportional hazards (CPH) models (Omurlu et al., 2009; Kälin et al., 2011; Miao et al., 2015). Factors such as age at diagnosis, metastasis, stage of the disease, histological grade, pathological stage, and tumor size are known to be significant prognostic factors for the survival time of patients with gastric cancer (Akhavan et al., 2013; Kakuta et al., 2014; Minami et al., 2015). Given the high prevalence of gastric cancer in the region and the lack of a reliable study of risk factors for the disease based on advanced statistical methods, the aim of our study was to identify important risk factors and their complex effects in gastric cancer patients using RSF.

Materials and methods

In this historical cohort study, data from 182 patients with gastric cancer admitted to the Referral Therapy Center in Hamadan, Iran, from 2007 to 2013 were analyzed. The data were extracted from the medical records, and the survival status of patients was checked by telephone. Survival time was calculated in months from diagnosis to death or to the end of the study. Patients who withdrew or were lost to follow-up for any reason during the study, and patients who were still alive at the end of the study, were considered right censored. The effect on survival was evaluated for demographic variables (gender and age at diagnosis) and for clinical variables: histological type (rivers, diffuse, complex), histopathology type (adenocarcinoma, lymphoma, sarcoma), stage (I, II, III, IV), tumor location (pyloric, body, fundus), metastatic status, number of involved lymph nodes, tumor size, type of treatment (radiotherapy, chemotherapy), and family history of cancer. Staging was based on the tumor node metastasis system (Aronow et al., 2013).

Random survival forests

RSF is a non-parametric machine learning method for analyzing right-censored survival data. The RSF model incorporates all univariate and multivariate effects automatically. Another property of RSF is that it can find influential covariates in highly correlated subsets of covariates, which is particularly useful in high-dimensional covariate selection problems (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010).

The RSF algorithm

B bootstrap samples are randomly drawn from the original dataset. Each bootstrap sample excludes 37% of the data on average; the excluded cases are called out-of-bag (OOB) data (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010). In this study, B = 1,000. A survival tree is grown for each bootstrap sample. At each node of the tree, q = √p candidate variables are randomly selected from all p variables, and the node is split on the candidate that maximizes the survival difference between child nodes according to one of the split criteria (log-rank, conservation of events, log-rank score, and random) described in Ishwaran et al. (2008) and Ishwaran and Kogalur (2010). In this study, 3 candidate variables were randomly selected out of all 10 variables. The tree is grown until the size of each final node reaches a minimum number of events with unique survival times (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010). In this study, the minimum final node size was 3.

The cumulative hazard function (CHF) for each final node in a grown tree is estimated by the Nelson-Aalen estimator (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010). The CHF is calculated for every tree, and the ensemble CHF is obtained by averaging the CHFs over the trees. The out-of-bag (OOB) error rate is calculated from Harrell’s C-statistic for the ensemble CHF (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010). The variable importance (VIMP) for a variable x is the prediction error of the new ensemble obtained by randomizing the assignments of x minus the prediction error of the original ensemble (Ishwaran et al., 2008). Positive values indicate variables with predictive ability (important), whereas zero or negative values identify non-predictive variables (not important) (Ishwaran et al., 2008; Ishwaran and Kogalur, 2010). In this study, all four node-splitting rules were used for the RSF approach.
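Three quantities in the algorithm description can be checked by hand: the share of cases left out of a bootstrap sample, the number of candidate variables per node, and the Nelson-Aalen estimate that is averaged into the ensemble CHF. The following Python sketch is illustrative only and is not the authors' R code. All function names are hypothetical, rounding √p down to an integer is an assumption chosen to match the 3-of-10 figure in the text, and `tree_chfs` stands for the CHFs of the final nodes into which one subject falls in each tree.

```python
import math
from collections import Counter


def oob_fraction(n):
    """Chance that one case is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def candidates_per_node(p):
    """q = sqrt(p), rounded down (assumption: the text reports 3 for p = 10)."""
    return int(math.sqrt(p))


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for right-censored data.

    times  -- follow-up time of each subject
    events -- 1 if death was observed, 0 if right censored
    Returns [(event_time, H)] where H adds d_j / Y_j at each event time:
    d_j deaths at that time, Y_j subjects still at risk just before it.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    chf, running = [], 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        running += deaths[t] / at_risk
        chf.append((t, running))
    return chf


def chf_at(chf, t):
    """Evaluate the step function at time t (0 before the first event)."""
    value = 0.0
    for event_time, h in chf:
        if event_time > t:
            break
        value = h
    return value


def ensemble_chf(tree_chfs, t):
    """Average the per-tree CHFs at time t."""
    return sum(chf_at(c, t) for c in tree_chfs) / len(tree_chfs)
```

For a sample of 182 cases, `oob_fraction(182)` is close to e^-1 ≈ 0.368, which is the 37% quoted above. For the made-up data `times = [2, 3, 3, 5, 8]` and `events = [1, 1, 0, 1, 0]`, the cumulative hazard steps to 1/5 = 0.20 at time 2, 0.20 + 1/4 = 0.45 at time 3, and 0.45 + 1/2 = 0.95 at time 5.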

Harrell’s concordance index

Harrell’s concordance index (C-index) is a measure of survival prediction performance. It does not depend on choosing a fixed time point for evaluating the model, and it explicitly accounts for censoring. The error rate is computed as 1 - C, where C is Harrell’s concordance index. Error rates lie between 0 and 1: a value of 0.5 corresponds to a procedure doing no better than random guessing, and 0 corresponds to perfect accuracy (Ishwaran et al., 2008). The data were analyzed using the randomSurvivalForest package (Ishwaran et al., 2013) in R 3.1.2. RSF drew 1,000 bootstrap samples from the data, grew a tree for each bootstrapped dataset, and split each node on a predictor using a survival splitting rule. Concordance error rates were obtained for each method over 1,000 replications, and the mean of the concordance error rates was recorded.
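The error rate 1 - C can be made concrete with a small pairwise computation. The sketch below is a simplified Python illustration and is not the routine used by the R package: pairs with tied survival times are simply skipped, and `risk` is a hypothetical placeholder for whatever predicted outcome the forest supplies for each patient (higher meaning worse prognosis).

```python
def harrell_c(times, events, risk):
    """Simplified Harrell's concordance index for right-censored data.

    A pair is usable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the
    higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must fail first, and the failure must be observed.
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def error_rate(times, events, risk):
    """Prediction error as defined in the text: 1 - C."""
    return 1.0 - harrell_c(times, events, risk)
```

With the made-up data `times = [5, 10, 15, 20]` and `events = [1, 1, 0, 1]`, there are five usable pairs. A risk score that ranks the patients in exact order of failure gives C = 1 and an error rate of 0, while a constant risk score gives C = 0.5, the random-guessing benchmark mentioned above.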

Results

Explorative Data Analyses

The mean age at diagnosis was 63 ± 12.6 years, and the mean and median survival times of the patients were estimated at 15.1 (95% CI: 13.31, 16.99) and 12.3 (95% CI: 11, 13.4) months, respectively. The one-year, two-year, and three-year survival rates of the patients were 51%, 13%, and 5%, respectively (Figure 1). During the study, 146 patients died and 36 (19.8%) survived; the survivors were treated as right-censored observations. One hundred and twelve patients (61.5%) were male and 70 (38.5%) were female. The characteristics of the patients are listed in Table 1.
Figure 1

Kaplan-Meier Cumulative Survival
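
Survival rates at fixed time points, such as the one-year rate, are read off a Kaplan-Meier curve like the one in Figure 1. The Python sketch below shows the product-limit calculation on made-up data, assuming that is how the reported rates were obtained; the function names and the toy follow-up times are hypothetical, and the patient-level data of the study are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    Returns [(event_time, S)] where S is multiplied by (1 - d_j / n_j)
    at each event time: d_j deaths at that time, n_j subjects at risk.
    Censored subjects (event = 0) leave the risk set without a drop in S.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Estimated probability of surviving beyond time t."""
    s = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        s = value
    return s
```

For six hypothetical patients with follow-up times `[6, 10, 12, 14, 20, 30]` months and event indicators `[1, 1, 0, 1, 1, 0]`, the estimated one-year survival is `survival_at(curve, 12)` = 5/6 × 4/5 = 2/3, and the two-year survival is 2/3 × 2/3 × 1/2 = 2/9.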

Table 1

Characteristics of the Patients with Gastric Cancer and Univariate Analysis of Risk Factors

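| Variable | Number | Percent | Median survival time (months) | Log-rank test | P-value |
|---|---|---|---|---|---|
| Gender | | | | 3.7 | 0.055 |
| Male | 112 | 61.5 | 11.3 | | |
| Female | 70 | 38.5 | 14.1 | | |
| Family history | | | | 0.4 | 0.544 |
| Yes | 17 | 9.4 | 14.1 | | |
| No | 165 | 90.6 | 12.2 | | |
| Age at diagnosis (yr) | | | | 8.0 | 0.018 |
| <60 | 73 | 40.1 | 14.1 | | |
| 61-75 | 75 | 41.2 | 10.7 | | |
| >75 | 34 | 18.7 | 12.3 | | |
| Tumor location | | | | 5.7 | 0.057 |
| Pyloric | 100 | 56.8 | 12.4 | | |
| Body | 39 | 22.1 | 10.7 | | |
| Fundus | 37 | 21.1 | 14.1 | | |
| Metastatic status | | | | 82.4 | <0.001 |
| No | 77 | 42.3 | 14.1 | | |
| Yes | 48 | 26.4 | 7.3 | | |
| Unknown | 57 | 31.3 | 16.2 | | |
| Number of involved lymph nodes | | | | 15.6 | <0.001 |
| 1-6 | 102 | 75.0 | 12.3 | | |
| 7-15 | 34 | 25.0 | 8.3 | | |
| Histopathology type | | | | 2.5 | 0.279 |
| Adenocarcinoma | 125 | 69.8 | 12.3 | | |
| Lymphoma | 29 | 16.3 | 13.6 | | |
| Sarcoma | 25 | 13.9 | 10.2 | | |
| Tumor size | | | | 26.3 | <0.001 |
| T1 (1 cm) | 21 | 15.2 | 22.0 | | |
| T2 (2 cm) | 48 | 34.8 | 12.2 | | |
| T3 (3 cm) | 45 | 31.9 | 11.3 | | |
| T4 (> 4 cm) | 25 | 18.1 | 10.3 | | |
| Stage | | | | 22.4 | <0.001 |
| I | 9 | 5.0 | 22.1 | | |
| II | 31 | 17.1 | 17.6 | | |
| III | 36 | 19.9 | 10.7 | | |
| IV | 105 | 58.0 | 11.0 | | |
| Histological type | | | | 0.1 | 0.956 |
| Rivers | 91 | 53.8 | 12.1 | | |
| Diffuse | 56 | 33.1 | 11.3 | | |
| Complex | 22 | 13.1 | 11.3 | | |
| Type of treatment | | | | 5.4 | 0.021 |
| Radiotherapy | 74 | 40.6 | 14.7 | | |
| Chemotherapy | 108 | 59.4 | 11.2 | | |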

Random Survival Forest Analyses

The informativeness of each predictor was first assessed under the log-rank splitting rule. Figure 2 shows the error rate for the RSF log-rank model as a function of the number of trees, together with the out-of-bag importance values for the predictors. The right panel of Figure 2 depicts the importance values for all 11 predictors. The plot shows that eight prognostic factors (metastatic status, age at diagnosis, tumor size, number of involved lymph nodes, histological type, gender, type of treatment, and tumor location) had positive importance values and therefore an effect on survival time. The other predictors had negative values or no effect on survival time. The concordance error rate of this RSF model was 0.2966 (Table 2).
Figure 2

Out-of-Bag Importance Values of RSF for Log-Rank Splitting Rule

Table 2

Harrell’s Concordance Error Rates for Methods

| Method (RSF splitting rule) | Error rate |
|---|---|
| Log-rank | 0.297 |
| Log-rank score | 0.301 |
| Conservation of events | 0.304 |
| Random | 0.325 |
Figure 3 illustrates the error rate for the RSF model under the conservation of events splitting rule as a function of the number of trees, together with the out-of-bag importance values for the predictors. As shown, six prognostic factors (metastatic status, age at diagnosis, tumor size, gender, type of treatment, and family history) had an effect on survival time. Metastatic status, age at diagnosis, and tumor size were also assigned important values by the RSF log-rank splitting rule. The concordance error rate of this RSF model was 0.304 (Table 2).

Figure 4 illustrates the error rate for the RSF model under the log-rank score splitting rule as a function of the number of trees, together with the out-of-bag importance values for the predictors. The figure shows that four prognostic factors (metastatic status, age at diagnosis, tumor size, and histological type) had positive importance values that were larger than those of all other prognostic factors. Metastatic status, age at diagnosis, and tumor size were also given important values by the RSF log-rank and conservation of events splitting rules. The concordance error rate of this RSF model was 0.301 (Table 2).
Figure 3

Out-of-Bag Importance Values of RSF for Conservation of Events Splitting Rule

Figure 4

Out-of-Bag Importance Values of RSF for Log-Rank Score Splitting Rule

Figure 5 shows the error rate for the RSF model under the random splitting rule as a function of the number of trees, together with the out-of-bag importance values for the predictors. As indicated in the plot, six prognostic factors (metastatic status, age at diagnosis, tumor size, histological type, type of treatment, and family history) had an effect on survival time. Metastatic status, age at diagnosis, and tumor size also had positive values under the log-rank, conservation of events, and log-rank score splitting rules. The concordance error rate of this RSF model was 0.31 (Table 2).
Figure 5

Out-of-Bag Importance Values of RSF for Random Splitting Rule

Random Survival Forest Model Performance

The error rates of the four RSF approaches were very similar. The best error rate (0.297) was obtained by the log-rank splitting rule with 1,000 trees (Table 2). The log-rank score, conservation of events, and random splitting rules ranked second, third, and fourth, respectively.

Discussion

Each RSF approach showed a slightly different ranking order. The most important covariates in nearly all four RSF approaches were metastatic status, age at diagnosis, and tumor size. The only unimportant covariate in nearly all four RSF approaches was histopathology type. The remaining covariates had positive importance values, with somewhat different rankings within each RSF approach.

Age at diagnosis had a significant effect on patients’ survival time, which is consistent with studies carried out in Italy, China, and the north of Iran (Wang et al., 2002; Bucchi et al., 2004; Yazdani-Charati et al., 2014). Metastasis was another factor that had an important value and a significant effect on survival time. This finding has been confirmed in other studies (Wang et al., 2002; Moghimi-Dehkordi et al., 2009; Maroufizadeh et al., 2012; Dixon et al., 2014). Some studies have reported that disease stage strongly influenced patients’ survival time, with median survival in stage I longer than in stage IV (Zeraati et al., 2005; Moghimi-Dehkordi et al., 2009; Dixon et al., 2014; Yazdani-Charati et al., 2014). This is consistent with our results. Consistent with Lin et al. (2013), we found that tumor size was an important factor for survival time: survival time decreases as tumor size increases. The number of involved lymph nodes was another important factor in this study; as the number of involved lymph nodes increased, the risk of death also increased. This is inconsistent with other studies, including Maroufizadeh et al. (2012). Type of treatment was another important factor for survival time. Moghimi-Dehkordi et al. (2009) showed that the survival time of patients who received chemotherapy was longer than that of patients who received radiotherapy. Histological type and family history were other factors with an important effect.

Histopathology type was not an important factor in any of the four RSF approaches and had no significant effect on survival time. However, other studies have shown a significant effect of this variable (Samadi et al., 2007; Moghimi-Dehkordi et al., 2009). The low survival rate of gastric cancer patients indicates the absence of a screening program for early diagnosis of the disease. Timely diagnosis in the early phases of the disease increases the survival rate and decreases the mortality rate caused by the disease.

References (first 10 of 23)

1. Stefania Boccia; Carlo La Vecchia. Dissecting causal components in gastric carcinogenesis. Eur J Cancer Prev. 2013-11.

2. Fatemeh Samadi; Masoud Babaei; Abbas Yazdanbod; Mahdi Fallah; Mehdi Nouraie; Dariush Nasrollahzadeh; Alireza Sadjadi; Mohammad-Hossein Derakhshan; Behrooz Shokuhi; Robab Fuladi; Reza Malekzadeh. Survival rate of gastric and esophageal cancers in Ardabil province, North-West of Iran. Arch Iran Med. 2007-01.

3. Martin Kälin; Igor Cima; Ralph Schiess; Niklaus Fankhauser; Tom Powles; Peter Wild; Arnoud Templeton; Thomas Cerny; Ruedi Aebersold; Wilhelm Krek; Silke Gillessen. Novel prognostic markers in the serum of patients with castration-resistant prostate cancer derived from quantitative analysis of the pten conditional knockout mouse proteome. Eur Urol. 2011-06-29.

4. Lindsey A Torre; Freddie Bray; Rebecca L Siegel; Jacques Ferlay; Joannie Lortet-Tieulent; Ahmedin Jemal. Global cancer statistics, 2012. CA Cancer J Clin. 2015-02-04.

5. B Moghimi-Dehkordi; A Safaee; S Ghiasi; M R Zali. Survival in gastric cancer patients: univariate and multivariate analysis. East Afr J Public Health. 2009-04.

6. F Saidi; R Malekzadeh; M Sotoudeh; M H Derakhshan; M J Farahvash; A Yazdanbod; Sh Merat; J Mikaeli; R Sotoudehmanesh; S Nasseri-Moghadam; A Majidpour; S Arshi; B Abedi-Ardakani; A Yoonessi; F Sadr; A Sepehr; D Fleischer; S Fahimi. Endoscopic esophageal cancer survey in the western part of the Caspian Littoral. Dis Esophagus. 2002.

7. Chia-Siu Wang; Chin-Chuan Hsieh; Tzu-Chieh Chao; Yi-Yin Jan; Long-Bin Jeng; Tsann-Long Hwang; Min-Fu Chen; Pang-Chi Chen; Jen-Shi Chen; Swei Hsueh. Resectable gastric cancer: operative mortality and survival analysis. Chang Gung Med J. 2002-04.

8. Lauro Bucchi; Oriana Nanni; Alessandra Ravaioli; Fabio Falcini; Rosalba Ricci; Eva Buiatti; Dino Amadori. Cancer mortality in a cohort of male agricultural workers from northern Italy. J Occup Environ Med. 2004-03.

9. Tomoyuki Kakuta; Shin-Ichi Kosugi; Tatsuo Kanda; Takashi Ishikawa; Takaaki Hanyu; Tsutomu Suzuki; Toshifumi Wakai. Prognostic factors and causes of death in patients cured of esophageal cancer. Ann Surg Oncol. 2014-02-08.

10. Mohammad-Ali Mohagheghi; Alireza Mosavi-Jarrahi; Reza Malekzadeh; Max Parkin. Cancer incidence in Tehran metropolis: the first report from the Tehran Population-based Cancer Registry, 1998-2001. Arch Iran Med. 2009-01.