Predictive model for survival in patients with gastric cancer.

Ladan Goshayeshi, Benyamin Hoseini, Zahra Yousefli, Alireza Khooie, Kobra Etminani, Abbas Esmaeilzadeh, Amin Golabpour.

Abstract

BACKGROUND AND AIM: Gastric cancer is one of the most prevalent cancers in the world. Characterized by poor prognosis, it is a frequent cause of cancer in Iran. The aim of the study was to design a predictive model of survival time for patients suffering from gastric cancer.
METHODS: This was a historical cohort study conducted between 2011 and 2016. The study population comprised 277 patients suffering from gastric cancer. Data were gathered from the Iranian Cancer Registry and the laboratory of Emam Reza Hospital in Mashhad, Iran. Patients or their relatives were interviewed where needed. Missing values were imputed by data mining techniques. Fifteen factors were analyzed. Survival was treated as the dependent variable. The predictive model was then designed by combining a genetic algorithm with logistic regression. Matlab 2014 software was used to combine them.
RESULTS: Of the 277 patients, survival data were available for only 80, and these data were used to design the predictive model. The mean ± SD of missing values for each patient was 4.43 ± 1.41. The combined predictive model achieved 72.57% accuracy. Sex, birth year, age at diagnosis time, age at diagnosis time of patients' family, family history of gastric cancer, and family history of other gastrointestinal cancers were the six parameters associated with patient survival.
CONCLUSION: The study revealed that imputing missing values by data mining techniques has good accuracy. It also revealed that six parameters extracted by the genetic algorithm affect the survival of patients with gastric cancer. Our combined predictive model, with good accuracy, is appropriate for forecasting the survival of patients suffering from gastric cancer. We therefore suggest that policy makers and specialists apply it to predict patients' survival.

Keywords:  Gastric Cancer; Mashhad; Missing Value; Predictive Model; Survival

Year:  2017        PMID: 29560157      PMCID: PMC5843431          DOI: 10.19082/6035

Source DB:  PubMed          Journal:  Electron Physician        ISSN: 2008-5842


1. Introduction

Gastric cancer is one of the common causes of cancer mortality worldwide (1). Of the 74,067 new cases of cancer in Iran in 2009, 6,886 were gastric cancer (2). The five most common cancers (excluding skin cancer) are stomach, esophagus, colon-rectum, bladder, and leukemia in males, and breast, esophagus, stomach, colon-rectum, and cervix uteri in females (3). Despite a decline in the incidence of gastric cancer in Iran during recent years, according to the cited study, Iran is placed in the group of countries with a moderate risk of gastric cancer (4). High-level clustering of cases was seen in northern, northwestern, western, and northeastern areas for esophageal, gastric, and colorectal cancers (5). According to the latest reports of the Iranian Ministry of Health and Medical Education, the five leading gastrointestinal causes of death in order of frequency were gastric cancer, hepatobiliary cancer, liver cirrhosis, esophageal cancer, and colorectal cancer. Gastric adenocarcinoma has been described as the most fatal cancer in Iran (6, 7). Patients are often diagnosed with advanced disease, and five-year survival rates are poor, usually less than 30% (8). A few studies have already investigated the survival pattern of stomach cancer in different regions of Iran, and prognostic factors are being ascertained (9–12). The likelihood of death was higher in men, and tumors of the corpus and cardia were more likely in patients older than 70 years (13). In Yazd, 295 gastric cancer patients had a 5-year survival rate of 8% (14). In another study based on the national cancer registry file, which covered 3,439 cases of stomach cancer between 2001 and 2005, age, tumor size, and advanced pathologic stage were related to survival, but sex, distant metastasis, histology type, tumor grade, and lymph node metastasis were not prognostic factors (15).
The statistics of gastric cancer should be updated periodically to identify trends in incidence, prevalence, and mortality, all of which have important implications for health policy planning (16). To the best of our knowledge, despite the high prevalence of gastric cancer in the northeastern area, there is no study on the survival of gastric cancer patients and prognostic factors in this area. Therefore, to investigate survival and the factors influencing the survival time of gastric cancer patients, which are the main objectives of this study (14), we aimed to design a model for predicting the survival rate of patients with stomach cancer with regard to histopathology type, anatomical site, and family history between 2010 and 2015 in Mashhad, Iran.

2. Material and Methods

The data in this research were survival data, collected through a historical cohort study conducted between 2011 and 2016. The study population comprised 277 patients suffering from gastric cancer. The available pathology sheets of these patients were collected from the Iranian Cancer Registry for 2011 to 2014 and from the laboratory of Imam Reza Hospital in Mashhad, Iran, for 2015 to 2016. Patients' medical records were assessed and, with their consent, the patients were interviewed. For deceased patients, relatives were interviewed if they consented. Survival prediction modeling requires information on deceased patients, especially their survival after cancer diagnosis. Among the 277 patients, 197 were either alive at the time of this research or had unclear survival status. Therefore, the information of 80 patients was used for modeling. This information included 15 independent variables and one dependent variable, of which 3, 10, and 3 variables were nominal, ordinal, and interval, respectively. The details of these variables are given in Table 1.
Table 1

The features of the independent variables of gastric cancer

| No | Variable name | Scale | Description |
|---|---|---|---|
| 1 | Sex | Nominal | Male: 61, Female: 19 |
| 2 | Birth year | Interval | The year patients were born. Range between 1926 and 1967 |
| 3 | Education | Ordinal | Included the following 2 values: 1) illiterate, 2) under diploma |
| 4 | Race | Ordinal | Included the following 3 values: 1) Persian, 2) Turkish, 3) Kurdish |
| 5 | PMH | Ordinal | Included the following 6 values: 1) hypertension (HTN), 2) coronary artery disease (CAD), 3) diabetes mellitus (DM), 4) DM+HTN, 5) DM+HTN+CAD, 6) HTN+CAD |
| 6 | Age at Diagnosis | Interval | Ranged between 46 and 87 |
| 7 | Family History of Gastric Cancer | Ordinal | Family history of gastric cancer: 1) first degree relative (FDR), 2) second degree relatives (SDR) |
| 8 | Age at dx of Family GC | Interval | Family member's age at the time of diagnosis, with values between 45 and 82 |
| 9 | History of other GI cancer | Ordinal | History of other GI cancer: 1) First Degree Relative, 2) Second Degree Relatives |
| 10 | Type of other GI Cancer | Ordinal | Included the following 4 values: 1) small intestine, 2) liver, 3) esophagus, 4) large intestine |
| 11 | Hx of extra GI Cancer | Ordinal | Familial history of extragastrointestinal cancer. Included the following 2 values: 1) First Degree Relative, 2) Second Degree Relatives |
| 12 | Treatment | Ordinal | Included the following 3 values: 1) Surgery, 2) Surgery + Chemo + radio, 3) Chemo |
| 13 | Cause of Death | Ordinal | Included the following 3 values: 1) Cancer, 2) MI, 3) PTE |
| 14 | Pathology | Ordinal | Included the following 7 values: 1) adenocarcinoma, 2) inflammatory tumor, 3) mucinous adenocarcinoma, 4) neuroendocrine carcinoma, 5) signet ring cell carcinoma, 6) GIST tumor, 7) undifferentiated carcinoma |
| 15 | Addiction | Nominal | 17 patients were addicted and 63 were not |
| 16 | Survival | Nominal | The survival for 33 patients was 1 year and for 47 patients was 2 years |

2.1. Data preprocessing

Of the 277 patients, survival data were available for only 80, and these data were used to build the predictive model. We removed 118 patients because they were alive while the research was in progress, 10 patients because of outlier data, and 10 patients because survival data were unavailable. The preprocessing steps are depicted in Figure 1.
Figure 1

The steps of preprocessing for the recommended model

2.1.1. The missing values

Of the fields in the 80 patient records used for modeling, 354 (29.5%) were missing, which reveals the high amount of missing values in the study. The number and percentage of missing values for each independent variable are given in Table 2; 8 of the 15 independent variables had missing values. The variable “Hx. of extra. GI. Cancer” had the highest percentage of missing data, at 88.75%. The mean and standard deviation of the number of missing variables per patient were 4.43±1.41. The MICE algorithm (6), with some modifications described below, was used to impute missing values.
The algorithm first ranks the variables by their number of missing values, from the fewest to the most. Using a data mining technique, a classification model was built in which the independent variables were the variables without missing data and the dependent variable was the one with the fewest missing values. The missing values of this dependent variable were then imputed with the data mining algorithm, and the variable was added to the collection of independent variables. The variable with the next fewest missing values was then chosen as the dependent variable, and the process was repeated for it. This continued until no variable with missing values remained (17). The whole process was repeated several times until no change occurred in the data. By the end of this process, all empty fields had been imputed and no missing values remained.
Table 2

The percentage of missing values in independent variables

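| No | Variable | Missing (n) | Missing (%) | Valid (n) |
|---|---|---|---|---|
| 1 | Hx of extra GI Cancer | 71 | 88.75% | 9 |
| 2 | Type of other GI Cancer | 64 | 80% | 16 |
| 3 | Hx of other GI Cancer | 64 | 80% | 16 |
| 4 | Age at Dx of Family GC | 58 | 72.5% | 22 |
| 5 | FH of gastric cancer | 57 | 71.25% | 23 |
| 6 | PMH | 35 | 43.75% | 45 |
| 7 | Age at Diagnosis | 4 | 5% | 76 |
| 8 | Birth Year | 1 | 1.25% | 79 |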
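
The ordered imputation procedure of section 2.1.1 can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: the text does not name the data mining technique, so a 1-nearest-neighbour lookup stands in for it here as an assumption, and the function name `impute_in_missingness_order`, the `max_passes` parameter, and the choice to use every other variable as a predictor on later passes are all hypothetical.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def impute_in_missingness_order(data: Dict[str, Column], max_passes: int = 10) -> Dict[str, list]:
    """Fill gaps variable by variable, starting with the variable that has the fewest gaps.

    Assumes every variable has at least one observed value and that values are numeric codes.
    """
    n_rows = len(next(iter(data.values())))
    gaps = {name: [i for i, v in enumerate(col) if v is None] for name, col in data.items()}
    complete = [name for name in data if not gaps[name]]
    # Variables with gaps, fewest missing values first.
    targets = sorted((name for name in data if gaps[name]), key=lambda name: len(gaps[name]))
    filled = {name: list(col) for name, col in data.items()}

    for pass_no in range(max_passes):
        changed = False
        available = list(complete)
        for target in targets:
            # First pass: only complete or already-imputed variables act as predictors.
            # Later passes (assumption): every other variable acts as a predictor.
            predictors = available if pass_no == 0 else [c for c in filled if c != target]
            observed = [i for i in range(n_rows) if i not in gaps[target]]
            for i in gaps[target]:
                # Stand-in classifier: copy the value of the closest observed record.
                nearest = min(
                    observed,
                    key=lambda k: sum((filled[p][i] - filled[p][k]) ** 2 for p in predictors),
                )
                if filled[target][i] != filled[target][nearest]:
                    filled[target][i] = filled[target][nearest]
                    changed = True
            available.append(target)
        if not changed:
            break  # Stop once a full pass leaves the data unchanged.
    return filled
```

With the counts in Table 2, such a procedure would start with Birth Year (1 missing value) and end with Hx of extra GI Cancer (71 missing values).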

2.1.2. Recognition of the out-of-range data

To remove the out-of-range data for each feature, the interquartile range (IQR) method was used (7). Ten records contained out-of-range data, and all of them were removed.
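
A minimal sketch of IQR-based screening is given below. The text does not state the fence multiplier or the quartile convention, so the usual 1.5 × IQR fences and the default method of Python's `statistics.quantiles` are assumptions, and the helper names `iqr_bounds` and `out_of_range_rows` are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def out_of_range_rows(records: List[Dict[str, float]], features: Sequence[str], k: float = 1.5) -> List[int]:
    """Return the indices of records with at least one feature outside its fences."""
    flagged = set()
    for feature in features:
        low, high = iqr_bounds([r[feature] for r in records], k)
        for i, record in enumerate(records):
            if not low <= record[feature] <= high:
                flagged.add(i)
    return sorted(flagged)
```

Records returned by `out_of_range_rows` would then be dropped before modeling, as was done for the 10 records mentioned above.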

3. The recommended model

In this research, the logistic regression algorithm, the genetic algorithm, and the MICE algorithm were applied. Logistic regression is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. A genetic algorithm (GA) is a method for solving both constrained and unconstrained optimization problems based on a natural selection process that mimics biological evolution. The algorithm repeatedly modifies a population of individual solutions. Multivariate imputation by chained equations (MICE) has emerged as a principled method of dealing with missing data. Its properties make MICE particularly useful for large imputation procedures, and advances in data mining now make it accessible to many researchers. The problem was addressed in three forms. In the first form, logistic regression was used with all 15 independent variables. In the second form, the missing values were imputed by the MICE algorithm, and then the dimensions were reduced by the genetic algorithm. In the third form, the missing values were imputed by the optimized algorithm, and then the dimensions were reduced by the genetic algorithm.

3.1. Solve problem by regression algorithm

In this stage, all data and all features were considered. The data were divided into two sections, train and test. The division technique is described in the assessment section. The regression algorithm was applied to the train data and a model was designed. To assess the recommended model, test data was not used (the model assessment technique is described in the assessment section).

3.2. Solve the problem by MICE algorithm and genetic algorithm

In this technique, the missing values were first imputed by the MICE algorithm and the dimensions of the data were then reduced by the genetic algorithm. After the dimension reduction, the data were divided into two sections, train and test. As in the previous stage, the logistic regression algorithm was applied to the train data and the model was then assessed on the test data. Figure 2 shows how the dimensions were reduced by the genetic algorithm and how the regression algorithm was applied. First, the missing values were imputed by the MICE algorithm, and then all 80 records were passed to the genetic algorithm. The genetic algorithm chose a subset of variables, which was assessed by the regression algorithm. This process was repeated until the best subset was chosen as output, and the parameters that affect the survival rate were thereby selected. The data were then divided into test and train sections, the regression algorithm was applied to the train data, and a model was designed. The designed model was assessed on the test data.
Figure 2

Model design by logistic algorithm and genetic algorithm
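
The loop in which a genetic algorithm proposes variable subsets and logistic regression scores them can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: the fitness is taken to be training accuracy, and the population size, mutation rate, number of generations, one-point crossover, and the names `fit_logistic`, `accuracy`, and `select_features` are all hypothetical choices. It needs at least two candidate variables.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # Clamp to avoid overflow in exp.
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Fit logistic regression by stochastic gradient ascent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w


def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)


def select_features(X, y, generations: int = 20, pop_size: int = 12, mutation: float = 0.1, seed: int = 0) -> List[int]:
    """Return the column indices of the best variable subset found by the GA."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask: List[bool]) -> float:
        cols = [j for j, keep in enumerate(mask) if keep]
        if not cols:
            return 0.0  # An empty subset cannot be scored.
        Xs = [[row[j] for j in cols] for row in X]
        return accuracy(fit_logistic(Xs, y), Xs, y)

    # Each individual is a boolean mask over the variables.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # One-point crossover.
            child = [(not g) if rng.random() < mutation else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children  # Parents survive, so the best fitness never decreases.
    best = max(pop, key=fitness)
    return [j for j, keep in enumerate(best) if keep]
```

Scoring subsets on held-out folds rather than on the training data would give a less optimistic fitness; the sketch uses training accuracy only to stay short.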

3.3. Solve the problem by MICE optimized algorithm and genetic algorithm

This model resembles the one in section 3.2; the only difference is that the optimized MICE algorithm was applied instead of the MICE algorithm. The optimization of the MICE algorithm is described in section 2.1.1. The solution is shown in Figure 3.
Figure 3

Model design by MICE optimized algorithm, genetic algorithm and logistic algorithm

4. Results

To assess the algorithms, three measures were used: Sensitivity, Specificity, and Accuracy. To conduct the research, the data were divided into test and train sections using the 10-fold technique. Then, the regression algorithm was run 10,000 times and algorithm accuracy was measured 1,000 times, and its mean was taken as the algorithm’s accuracy. In the model assessment, the problem was solved in three situations. In the first, the regression algorithm was used with all 15 independent variables. In the second, the missing values were imputed by the MICE algorithm and the independent variables were then reduced by the genetic algorithm. In the third, the missing values were imputed by the optimized MICE algorithm and the independent variables were then reduced by the genetic algorithm.
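
The three measures and the 10-fold split can be sketched as follows. The standard definitions are used (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N); which survival class counts as positive is not stated in the text, so coding it as 1 is an assumption, and the names `confusion_metrics` and `k_fold_indices` are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for labels coded 0/1 (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(y_true),
    }


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each record appears in exactly one test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

With the 80 records used here, each of the 10 folds holds 8 test records and 72 training records.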

4.1. Application of logistic algorithm for all variables

In this stage, logistic regression was applied to all variables. The algorithm was run 10,000 times, and Sensitivity, Specificity, and Accuracy were then computed on both the training and test data.

4.2. Application of logistic regression along with dimensions’ reduction with the help of genetic algorithm and MICE algorithm

First, the missing values were imputed by the MICE algorithm, and then the number of independent variables was reduced to nine by the genetic algorithm. Education, race, PMH, age at diagnosis, family history of gastric cancer, age at Dx of family GC, Hx of other GI cancer, type of other GI cancer, and Hx of extra GI cancer were the nine parameters associated with patient survival. After this reduction, the logistic regression algorithm was run on these nine variables and repeated 10,000 times. Sensitivity, Specificity, and Accuracy were then computed on the training data as well as the test data.

4.3. Application of logistic regression with dimensions’ reduction with the help of genetic algorithm and optimized MICE algorithm

As explained in section 2.1.1, the missing values were imputed by data mining algorithms, and the independent variables were reduced to six: sex, birth year, age at diagnosis time, age at diagnosis time of patients’ family, family history of gastric cancer, and family history of other gastrointestinal cancers. After the reduction of the variables, the logistic regression algorithm was run on these six variables 10,000 times. Sensitivity, Specificity, and Accuracy were then computed on the test data. The logistic regression coefficients are given in Table 3. If a regression coefficient is positive, a rise in that variable is associated with a rise in the patient’s survival rate; if it is negative, a rise in that variable is associated with a fall in the patient’s survival rate.
Table 3

The coefficients of the recommended model of logistic regression

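| Coefficient | Variable | β | e^β |
|---|---|---|---|
| β0 | Beta0 | −0.25 | |
| β1 | Sex | 0.25 | 1.29 |
| β2 | Birth Year | 0.19 | 1.21 |
| β3 | Age at Diagnosis | −0.40 | 0.67 |
| β4 | Age of diagnosis time at patients’ family | −0.76 | 0.46 |
| β5 | Family history of gastric cancer | −0.05 | 0.95 |
| β6 | Type of other GI Cancer | −0.25 | 0.78 |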
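
As a consistency check on Table 3, each e^β value can be recomputed from its coefficient. The sketch below assumes that the coefficients pair with the variables in the order β1 = Sex through β6 = Type of other GI Cancer, which is how the table is read here; the dictionary keys and the function name `odds_ratios` are hypothetical.

```python
import math
from typing import Dict

# Coefficients and reported e^beta values as read from Table 3 (pairing is an assumption).
coefficients: Dict[str, float] = {
    "sex": 0.25,
    "birth_year": 0.19,
    "age_at_diagnosis": -0.40,
    "age_at_dx_of_family": -0.76,
    "family_history_gc": -0.05,
    "type_of_other_gi_cancer": -0.25,
}
reported: Dict[str, float] = {
    "sex": 1.29,
    "birth_year": 1.21,
    "age_at_diagnosis": 0.67,
    "age_at_dx_of_family": 0.46,
    "family_history_gc": 0.95,
    "type_of_other_gi_cancer": 0.78,
}


def odds_ratios(coefs: Dict[str, float]) -> Dict[str, float]:
    """Exponentiate each logistic coefficient to obtain the odds ratio per unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


for name, ratio in odds_ratios(coefficients).items():
    print(f"{name}: exp(beta) = {ratio:.3f}, reported = {reported[name]}")
```

Under this pairing every recomputed value agrees with the reported one to within about 0.01, which is the precision expected from coefficients rounded to two decimals. An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient.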

4.4. A comparison among the three techniques for problem solution

In this stage, the three different techniques for problem solution were compared by considering the three assessment parameters. The details of this comparison are demonstrated in Figure 4. As seen in Figure 4, the optimized MICE technique has worked better on all three parameters of Sensitivity, Specificity and Accuracy.
Figure 4

A comparison among the three assessment parameters with the help of three techniques for problem solution

5. Discussion

In this research, a model was designed to predict the chances of survival in gastric cancer patients. Of the 15 independent variables, 6 were chosen for modeling. The dependent variable recorded the survival of the patients in two categories, one year and two years. Each variable is described here according to its logistic regression coefficient. For sex, if the value is one (male), the chance of survival falls. Based on the value of e^β1 in Table 3, the possibility of a fall in survival is 29% higher in women than in men. Two articles, by Yang et al. in 2011 (19) and Zeraati and Amiri in 2016 (20), demonstrated that gender directly influences the survival rate in gastric cancer. The 2nd, 3rd, and 4th variables were the patient’s birth year, the age at diagnosis, and the age at diagnosis time of the patient’s family, respectively. As age rises, the period of the patient’s survival falls, and vice versa. The studies of De Angelis et al. in 2014 (21) and Ferro et al. in 2014 (22) reveal that as the age of the patient and the age at diagnosis rise, the survival period falls. Because the coefficient for the variable “Family history of gastric cancer” is negative, if there has been a case of gastric cancer in the family, the possibility of survival decreases, and in the case of gastric cancer in relatives, the possibility of survival rises. Based on the value of e^β5 in Table 3, the chance of survival in patients with a history of gastric cancer in the family is 5% lower than in patients with a history of gastric cancer in relatives. The paper of Rugge et al. in 2015 (23) showed the influence of a family history of cancer on the possibility of survival. Because the coefficient for the variable “Other type of cancer” is negative, if this variable is small intestine or liver, the survival rate falls, and if it is esophagus or large intestine, the survival rate rises.
The most noticeable limitation of this study is the low number of records used for modeling. It is therefore highly recommended that this study be repeated in future with more records to find a model with a higher level of accuracy, although the model designed with the available records was good enough in accuracy, which is among the advantages of this study. As seen in Figure 4, with the dimension reduction algorithm and the optimized MICE algorithm, the accuracy of the model increases from 63.03% to 72.57%, which shows the appropriateness of the recommended technique for modeling the prediction of gastric cancer patients’ survival rate.

6. Conclusions

In this study, a prediction model is presented for the survival time of patients with gastric cancer. In this model, six variables affect survival time: gender, year of birth, age at diagnosis, age of diagnosis in the family of patients, family history of gastric cancer, and family history of other gastrointestinal cancers. Knowing a patient’s expected survival time can greatly help doctors in determining the treatment method. It is suggested that the proposed model be evaluated on other samples. Further study of methods for extracting the variables that affect gastric cancer would be appropriate for future research.

References

1.  Survival of metastatic gastric cancer: Significance of age, sex and race/ethnicity.

Authors:  Dongyun Yang; Andrew Hendifar; Cosima Lenz; Kayo Togawa; Felicitas Lenz; Georg Lurje; Alexandra Pohl; Thomas Winder; Yan Ning; Susan Groshen; Heinz-Josef Lenz
Journal:  J Gastrointest Oncol       Date:  2011-06

2.  Clinical profile of gastric cancer in Khuzestan, southwest of Iran.

Authors:  Hajiani Eskandar; Sarmast Shoshtari Mohammad Hossein; Masjedizadeh Rahim; Hashemi Jalal; Azmi Mehrdad; Tahereh Rajabi
Journal:  World J Gastroenterol       Date:  2006-08-14       Impact factor: 5.742

3.  Survival rate of gastric and esophageal cancers in Ardabil province, North-West of Iran.

Authors:  Fatemeh Samadi; Masoud Babaei; Abbas Yazdanbod; Mahdi Fallah; Mehdi Nouraie; Dariush Nasrollahzadeh; Alireza Sadjadi; Mohammad-Hossein Derakhshan; Behrooz Shokuhi; Robab Fuladi; Reza Malekzadeh
Journal:  Arch Iran Med       Date:  2007-01       Impact factor: 1.354

4.  Gastric cancer: descriptive epidemiology, risk factors, screening, and prevention.

Authors:  Parisa Karimi; Farhad Islami; Sharmila Anandasabapathy; Neal D Freedman; Farin Kamangar
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2014-03-11       Impact factor: 4.254

5.  Cancer survival in Europe 1999-2007 by country and age: results of EUROCARE--5-a population-based study.

Authors:  Roberta De Angelis; Milena Sant; Michel P Coleman; Silvia Francisci; Paolo Baili; Daniela Pierannunzio; Annalisa Trama; Otto Visser; Hermann Brenner; Eva Ardanaz; Magdalena Bielska-Lasota; Gerda Engholm; Alice Nennecke; Sabine Siesling; Franco Berrino; Riccardo Capocaccia
Journal:  Lancet Oncol       Date:  2013-12-05       Impact factor: 41.316

6.  Spatial analysis of common gastrointestinal tract cancers in counties of Iran.

Authors:  Ali Soleimani; Jafar Hassanzadeh; Ali Ghanbari Motlagh; Hamidreza Tabatabaee; Elham Partovipour; Sareh Keshavarzi; Mohammad Hossein
Journal:  Asian Pac J Cancer Prev       Date:  2015

7.  Gastric carcinoma: 5 year experience of a single institute.

Authors:  S Sadighi; J Raafat; Ma Mohagheghi; F Meemary
Journal:  Asian Pac J Cancer Prev       Date:  2005 Apr-Jun

8.  Worldwide trends in gastric cancer mortality (1980-2011), with predictions to 2015, and incidence by subtype.

Authors:  Ana Ferro; Bárbara Peleteiro; Matteo Malvezzi; Cristina Bosetti; Paola Bertuccio; Fabio Levi; Eva Negri; Carlo La Vecchia; Nuno Lunet
Journal:  Eur J Cancer       Date:  2014-03-17       Impact factor: 9.162

9.  Gastric cancer in Iran: epidemiology and risk factors.

Authors:  Reza Malekzadeh; Mohammad H Derakhshan; Zinab Malekzadeh
Journal:  Arch Iran Med       Date:  2009-11       Impact factor: 1.354

10.  Bayesian analysis for survival of patients with gastric cancer in Iran.

Authors:  Ahmad Reza Baghestani; Ebrahim Hajizadeh; Seyed Reza Fatemi
Journal:  Asian Pac J Cancer Prev       Date:  2009