Combining statistical techniques to predict postsurgical risk of 1-year mortality for patients with colon cancer.

Inmaculada Arostegui1,2,3, Nerea Gonzalez2,4, Nerea Fernández-de-Larrea5,6, Santiago Lázaro-Aramburu7, Marisa Baré2,8, Maximino Redondo2,9, Cristina Sarasqueta2,10, Susana Garcia-Gutierrez2,4, José M Quintana2,4.   

Abstract

INTRODUCTION: Colorectal cancer is one of the most frequently diagnosed malignancies and a common cause of cancer-related mortality. The aim of this study was to develop and validate a clinical predictive model for 1-year mortality among patients with colon cancer who survive for at least 30 days after surgery.
METHODS: Patients diagnosed with colon cancer who had surgery for the first time and who survived 30 days after the surgery were selected prospectively. The outcome was mortality within 1 year. Random forest, genetic algorithms and classification and regression trees were combined in order to identify the variables and partition points that optimally classify patients by risk of mortality. The resulting decision tree was categorized into four risk categories. Split-sample and bootstrap validation were performed. ClinicalTrials.gov Identifier: NCT02488161.
RESULTS: A total of 1945 patients were enrolled in the study. The variables identified as the main predictors of 1-year mortality were presence of residual tumor, American Society of Anesthesiologists Physical Status Classification System risk score, pathologic tumor staging, Charlson Comorbidity Index, intraoperative complications, adjuvant chemotherapy and recurrence of tumor. The model was internally validated; area under the receiver operating characteristic curve (AUC) was 0.896 in the derivation sample and 0.835 in the validation sample. Risk categorization led to AUC values of 0.875 and 0.832 in the derivation and validation samples, respectively. The optimal cut-off point of estimated risk had a sensitivity of 0.889 and a specificity of 0.758.
CONCLUSION: The decision tree was a simple, interpretable, valid and accurate prediction rule of 1-year mortality among colon cancer patients who survived for at least 30 days after surgery.

Keywords:  1-year-mortality; clinical prediction rules; colonic neoplasms; colorectal surgery; prediction model; tree-based methods

Year:  2018        PMID: 29563837      PMCID: PMC5846756          DOI: 10.2147/CLEP.S146729

Source DB:  PubMed          Journal:  Clin Epidemiol        ISSN: 1179-1349            Impact factor:   4.790


Introduction

Currently, colorectal cancer is among the most common cancers1–3 with high incidence and mortality rates, despite improved rates of survival during the last few years.4 Previous scientific work broadly investigated diagnosis and treatment of colorectal cancer. However, clinical prediction rules that predict adverse events and mortality after surgical treatment in patients with colorectal cancer still need to be properly validated and translated into easy-to-use tools for clinical practice.5,6 Ideally, studies should investigate robust clinical outcomes such as mortality and/or complications, in order to identify related factors, as well as patient-reported outcomes, such as health-related quality of life, and their determinants of change.7–9 Some predictive scores for short-term (30-day) outcomes have been developed, such as the various versions of the Physiological and Operative Severity Score for the enUmeration of Mortality and morbidity (POSSUM) scoring,10–12 although most of them are not properly validated in most settings. Furthermore, to our knowledge, there are no validated prediction models for colorectal cancer outcomes at medium-term follow-up (1–2 years), a period during which the majority of adverse outcomes after treatment and/or surgery are observed.13
Classification and regression trees (CART) have been used extensively as an alternative to the classic linear and additive prediction models. Results are presented as a decision rule in tree form, with a hierarchical sequential structure that can be easily understood and applied in clinical practice.
CART models have been used previously for prognosis classification in cancer and other diseases.14–16 Various studies have performed CART analysis in colorectal cancer patients to search for biomarkers highly predictive of response to therapy in order to select patients for treatment17–19 or to select genes for phenotypic classification.20,21 However, to our knowledge, there are no validated prediction models, including CART models, for medium-term mortality among patients with colon cancer.
Other relatively modern modeling techniques, known as machine learning methods, which include random forests (RF), neural networks (NN) and support vector machines (SVM), have received increasing attention in medical research as they may potentially provide more accurate results.22–24 In 2011, Manilich et al developed a prognostic model for colorectal cancer including several outcomes to investigate competing-risk survival for 5 years using RF methods; this model was based on multiple clinical factors with the objective of evaluating the accuracy of patient staging solely based on the tumor–node–metastasis (TNM) classification.25 Moreover, several machine learning techniques have been combined into a single algorithm that provides a prediction rule with the best mean-squared error (MSE).26 However, the use of these complex classifying techniques is not common in clinical research for predictive purposes.
The Results and Health Services Research in Colorectal Cancer (CCR-CARESS) project is a prospective cohort study that recruited incident colorectal cancer patients receiving surgical treatment and followed them for 5 years. It is, therefore, an appropriate study design for developing clinical predictive rules.
In line with one of the main purposes of the CCR-CARESS project, the aim of this study was to combine RF and CART modeling approaches in order to develop and validate a clinical predictive model for 1-year mortality among patients with colon cancer who survive for at least 30 days after receipt of a surgical intervention. The objectives were first to identify clinical factors that most accurately predict 1-year mortality among this group of patients post surgery, and second, to develop and validate a clinically applicable predictive rule.

Methods

Study design

The CCR-CARESS prospective observational cohort is a multicenter study of patients diagnosed with colorectal cancer who had undergone surgical interventions that were carried out between June 2010 and December 2012 in 22 public hospitals in Spain. Those hospitals represented nine provinces from six regions in Spain and all of them operate under the Spanish National Health Service. Patients have a follow-up period of up to 5 years after surgery. The design and purposes of the study have been thoroughly described previously.27

Patient selection

Patients were eligible for the CCR-CARESS study if they were diagnosed with colon cancer (up to 15 cm above the anal margin) or rectum cancer (between the anal margin and 15 cm above it), and received curative or palliative surgery for the first time. Patients were identified from the surgical waiting lists of each hospital and were invited to participate during a clinical visit or by letter. Colorectal cancer diagnosis was mainly based on anatomopathologic diagnosis after a biopsy by colonoscopy.28–30 Exclusion criteria were in situ cancer, inoperable tumor, severe mental or physical pathologies that could prevent patients from responding to the questionnaires, and terminal illness. Patients were informed of the study and they were asked to sign an informed consent before participating. In the current study, patients were considered if they had a diagnosis of colon cancer and survived 30 days after the intervention.

Variable collection

Qualified and trained reviewers collected clinical data from the medical records, employing data collection forms and an instructions manual to ensure consistency among hospitals and reviewers. Baseline data collected upon hospital admission included sociodemographic, clinical (including onset of symptoms, habits, personal and family background, comorbidities, diagnostic tests and preintervention treatments), preoperative (including laboratory parameters, tumor markers, diagnostic tests and preintervention clinical staging) and pathology information; and outpatient anesthesia information on the surgical intervention (American Society of Anesthesiologists Physical Status Classification System [ASA] risk score31). Data related to the hospital admission included information on the surgical intervention, anatomic pathology data, length of stay, presence or degree of complications and data related to the remaining days of admission (including the presence of complications, the need for reintervention or death). The Charlson Comorbidity Index (CCI) was calculated based on general comorbidities.32 TNM classification was assessed according to the 7th Edition of the American Joint Committee on Cancer,29 and focused on the preintervention/clinical TNM (cTNM) and the histopathologic report for TNM (histopathologic tumor–node–metastasis [pTNM]). For the final-stage grouping, pTNM was grouped into three categories: 0–II, III and IV. 
The surgical margins were examined for the presence of residual tumor, which was described using the residual tumor (R) classification: R0 was microscopically free proximal and distal margins; R1 was microscopically involved margins and R2 was macroscopic residual cancer.29 Lymph node ratio (LNR), defined as the ratio of tumor-infiltrated lymph nodes to total number of resected lymph nodes, was calculated and categorized as suggested by Rosenberg et al.33 Data on laboratory results, diagnostic tests, presence of complications, readmissions, reintervention or death were collected up to 30 days after surgery. Finally, information was collected throughout the year, regarding the need for radiation therapy and/or chemotherapy, including treatment schedule, cycles, complications and supportive care required; laboratory results and diagnostic tests performed; presence of complications; tumor recurrence; readmission or reintervention and death. Further information, which includes the full study protocol, has been published elsewhere.27

Outcome measures

The primary outcome was mortality within 1 year of surgery among those who survived for at least 30 days after surgery. Vital status was established by reviewing medical records and examining the hospital databases and National Death Index. Deaths were considered confirmed if the name, gender, and date of birth and identity card on the record matched those of the participant.

Statistical analysis

The study sample was randomly divided into a derivation sample (50%) and a validation sample (50%). Both samples were described using means and SDs for continuous variables and frequencies and percentages for categorical variables. Differences between the derivation and the validation samples were tested for the distribution of each variable using the two-sample Student’s t-test for continuous variables and the chi-square test for categorical variables; nonparametric methods were used when necessary. The same methods were used to test univariate associations between predictors and 1-year mortality. When a symptom or complication was not recorded, the patient was assumed to be asymptomatic or to have had no complication. When pTNM was fully unobserved, it was replaced by the analogous cTNM. Any other unrecorded or unobserved value was considered as a missing value. Frequency and percentage of missing values were reported for each variable. Various tree-based methods were used in order to identify the variables and partition points that optimally classified patients by risk of mortality. First, the best predictors were selected using RF methods for the whole sample; 1000 trees were used in the RF model.
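The missing-data rules and the random 50/50 division described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (`ptnm`, `ctnm`, `complication`), the helper names, the seed and the use of exact halves are hypothetical and are not taken from the study's SAS or R code.

```python
import random

def apply_missing_rules(record):
    """Apply the two stated missing-data rules to one patient record (a dict)."""
    cleaned = dict(record)
    # Rule 1: an unrecorded symptom or complication is treated as absent.
    if cleaned.get("complication") is None:
        cleaned["complication"] = False
    # Rule 2: a fully unobserved pTNM is replaced by the analogous cTNM.
    # If cTNM is also unobserved, the value stays missing (None).
    if cleaned.get("ptnm") is None:
        cleaned["ptnm"] = cleaned.get("ctnm")
    return cleaned

def split_sample(records, seed=0):
    """Randomly divide records into derivation and validation halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical patient records, for illustration only.
patients = [
    {"id": 1, "ptnm": "III", "ctnm": "II", "complication": None},
    {"id": 2, "ptnm": None, "ctnm": "IV", "complication": True},
    {"id": 3, "ptnm": None, "ctnm": None, "complication": False},
    {"id": 4, "ptnm": "I", "ctnm": "I", "complication": None},
]
cleaned = [apply_missing_rules(p) for p in patients]
derivation, validation = split_sample(cleaned, seed=42)
```

Fixing the seed makes the division reproducible, which matters when the validation half is meant to stay untouched during model building.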
Importance for each variable in the model was measured as the mean decrease in accuracy and the mean decrease in node impurity (Gini index).34 In addition, categorization of continuous variables or new encoding of categorical variables was controlled during the modeling phase in order to avoid the overimportance of categorical variables that could occur in tree-based models.34 To optimally categorize continuous predictors, such as hemoglobin or hematocrit levels, cut-off points were selected using genetic algorithms.35 The final decision tree, based on a simple recursive partitioning algorithm, was created in the derivation sample to identify 1-year mortality risk factors with the highest discriminative power, including the predictors identified by the RF as most important. To internally validate the risk of 1-year mortality derived from the decision tree, we used bootstrap resampling with N=2000 repetitions and estimated the 95% confidence interval (CI).36 We report the median of these 2000 repetitions as the parameter estimate and the 2.5 and 97.5 percentiles as the 95% CI. The validation sample was used solely for evaluating the performance of the final tree. The MSE was calculated in the validation sample in order to evaluate the magnitude of the differences between the observed and predicted probabilities of mortality. To make the tree more user friendly, we simplified the resulting algorithm into a manageable number of risk classes based mainly on the estimated risk of 1-year mortality. We applied the risk classification derived from the derivation sample to the validation sample.
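The bootstrap summary described above (median of the repetitions as the estimate, 2.5 and 97.5 percentiles as the 95% CI) can be illustrated with a minimal Python sketch. The function name, the seed and the choice of resampling a single group of patients are assumptions made for the example; the group counts reuse the size of the high-risk class in the derivation sample purely for illustration.

```python
import random
import statistics

def bootstrap_percentile_ci(outcomes, n_boot=2000, seed=0):
    """Median and 2.5/97.5 percentiles of the bootstrapped event rate."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute the mortality rate.
        resample = rng.choices(outcomes, k=n)
        rates.append(sum(resample) / n)
    # quantiles(n=40) returns 39 cut points: the first is the 2.5th
    # percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(rates, n=40, method="inclusive")
    return statistics.median(rates), cuts[0], cuts[-1]

# Illustrative group: 12 deaths among 83 patients (1 = died within 1 year).
group = [1] * 12 + [0] * 71
estimate, lower, upper = bootstrap_percentile_ci(group)
```

The percentile method needs no distributional assumption, which is convenient for node-level risks estimated from few events.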
Model discrimination of the tree and the risk categories was assessed by the area under the receiver operating characteristic (ROC) curve (AUC) and estimated risk dichotomization for optimal sensitivity–specificity combination.37 The Cochran–Armitage trend test was performed to assess whether the classification provided by the tree could differentiate low-risk patients from high-risk patients in a graded fashion based on the level of risk present. Multiple logistic regression (LR) was also fitted to the data in the derivation sample. The same covariates that were previously selected by the recursive partitioning algorithm were included in the LR model. Firth’s penalized maximum likelihood estimation was used when necessary to reduce bias in the parameter estimates when data separation occurred because of the small number of events.38,39 The linear predictor function obtained from the derivation sample was applied to the validation sample. Categorization of the predicted risk of mortality was also performed for the LR model using the same criteria as before. Model discrimination of the LR model was evaluated in the same way as for the decision tree. Comparison of the discrimination ability between the decision tree and the LR model was performed using a bootstrap test with N=2000 repetitions to compare AUCs obtained from two ROC curves.40,41 Effects were considered statistically significant at α=0.05. Statistical analyses were performed using SAS for Windows© version 9.1 and R version 3.4.
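As a rough illustration of the discrimination measures named above, the following Python sketch computes the AUC as the proportion of death/survivor pairs that are ranked correctly and searches for a dichotomization cut-off. The criterion used here (maximizing sensitivity + specificity − 1, the Youden index) is an assumption, since the text only refers to an optimal sensitivity–specificity combination; the data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a death is ranked above a survivor."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied pairs count as half
    return wins / (len(pos) * len(neg))

def best_cutoff(scores, labels):
    """Cut-off maximizing sensitivity + specificity - 1 (assumed criterion)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        # A patient is called high risk when the estimated risk is >= c.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best

# Hypothetical predicted risks and observed 1-year deaths (1 = died).
risk = [0.01, 0.01, 0.02, 0.05, 0.05, 0.15, 0.15, 0.35, 0.35, 0.35]
died = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
area = auc(risk, died)
cutoff, sensitivity, specificity = best_cutoff(risk, died)
```

Because a tree assigns the same risk to every patient in a terminal node, ties are frequent, so counting tied pairs as one half matters for the AUC.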

Ethics approval and consent to participate

Patients were informed of the CCR-CARESS study objectives, invited to voluntarily participate and were included in the study sequentially. All of them signed a written informed consent to participate in the study. The Institutional Review Boards of the participating hospitals approved this project. In particular, the Clinical Research Ethics Committee of the Basque Country (CEIC-E), the Clinical Research Ethics Committee of the Hospital Galdakao-Usansolo, the Clinical Research Ethics Committee of the Hospital Txagorritxu, the Clinical Research Ethics Committee of the Área Sanitaria de Gipuzkoa, the Clinical Research Ethics Committee of the Hospital Basurto, the Clinical Research Ethics Committee of the Hospital Universitario La Paz, the Clinical Research Ethics Committee of the Hospital Universitario Fundación Alcorcón, the Clinical Research Ethics Committee of the Hospital Clínico San Carlos, the Regional Committee of Clinical Trials of Andalucía (Sevilla), the Clinical Research Ethics Committee of the Agencia Sanitaria Costa del Sol, the Clinical Research Ethics Committee of the Parc Taulí Sabadell-University Hospital, the Clinical Research Ethics Committee of the Hospital del Mar and the Clinical Research Ethics Committee Fundació Unio Catalana d’Hospitals approved the study.

Results

A total of 1945 patients were enrolled in the study: 981 (50%) and 964 (50%) were randomly assigned to the derivation and the validation samples, respectively. Differences between the two samples were not statistically significant (P>0.05), except for surgical complications up to 30 days after surgery (P=0.049). Table S1 shows these results in more detail. Figure 1 shows the importance scores for the top 30 predictors used in the RF model for 1-year mortality. Association between predictors and 1-year mortality is shown using univariate analysis in the derivation sample (Table 1). Significant variables from univariate analysis and the top 30 predictors provided by the RF model were included in the splitting process for building the classification tree using CART modeling in order to investigate 1-year mortality (Figure 2). Variables selected using the CART model were the presence of residual tumor (R0, R1 vs R2), ASA risk score categorized into two groups (I–III vs IV), pTNM, CCI, intraoperative complications, chemotherapy treatment after surgery and tumor recurrence during the 1-year period. The mortality rate was <5% in all those patients with residual tumors classified as R0 or R1, ASA below IV, no intraoperative complications and pTNM less than or equal to III, with the exception of those in a pTNM III stage without adjuvant chemotherapy. Generally, mortality rates were >10% among patients with residual tumors classified as R0 or R1 and patients with an ASA risk score of IV, with intraoperative complications or with pTNM stage III or IV. Among patients with residual tumors classified as R2, mortality rates were >35%. MSE of the classification tree in the validation sample was 0.0026.
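The splitting step that a recursive partitioning algorithm repeats at every node can be sketched as below. This is a minimal illustration of a Gini-based search for one partition point on one predictor, not the software used in the study; the data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Threshold on one predictor giving the largest impurity decrease."""
    n = len(labels)
    parent = gini(labels)
    best_threshold, best_gain = None, 0.0
    for t in sorted(set(values))[1:]:
        # Candidate partition: value < t goes left, value >= t goes right.
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain

# Hypothetical data: Charlson Comorbidity Index and 1-year death (1 = died).
cci = [0, 1, 1, 2, 2, 3, 4, 5, 6, 7]
died = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
threshold, gain = best_split(cci, died)
```

In this toy example the search returns a threshold of 3, that is, a split into CCI ≤2 versus >2. A full tree applies the same search to every candidate predictor and then recurses into each child node.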
Figure 1

Variable importance for the top 30 predictors of 1-year mortality selected by the random forest.

Abbreviations: ASA, American Society of Anesthesiologists; CA, carbohydrate antigen; CEA, carcinoembryonic antigen; CRC, colon or rectum cancer; ICU, intensive care unit; pTNM, histopathologic tumor–node–metastasis.

Table 1

Univariate association between explanatory variables and 1-year mortality in the derivation sample

One-year mortality among patients who survived at least 30 days after surgery

Variable | Missing | Yes (n=50) | No (n=931) | P-value [a]

Before surgery
Gender | | | |
  Male | | 30 (4.9) | 582 (95.1) | 0.721
  Female | | 20 (5.4) | 349 (94.6) |
Age [b] | 2 | 73.7 (10.5) | 68.7 (10.7) | 0.001
Smoking habit | 10 | | |
  Smoker | | 26 (5.6) | 442 (94.4) | 0.776
  Former smoker | | 5 (4.0) | 120 (96.0) |
  Nonsmoker | | 19 (5.0) | 359 (95.0) |
Charlson Comorbidity Index [b] | | 3.9 (2.2) | 2.9 (1.3) | 0.002
  ≤2 | | 18 (3.5) | 493 (96.5) | 0.02
  >2 | | 32 (6.8) | 438 (93.2) |
Past history of CRC | | | |
  No | | 46 (5.1) | 855 (94.9) | 0.967
  Yes | | 4 (5.0) | 76 (95.0) |
CEA | 5 | | |
  No | | 14 (6.5) | 204 (93.6) | 0.324
  Yes | | 36 (4.8) | 722 (95.2) |
CA 19-9 | 16 | | |
  No | | 31 (5.5) | 537 (94.5) | 0.643
  Yes | | 19 (4.8) | 378 (95.2) |
Hemoglobin [b] | 16 | 14.4 (19.6) | 14.7 (15.7) | 0.914
Hematocrit [b] | 28 | 34.4 (8.5) | 37.4 (14.4) | 0.029
ASA | 27 | | |
  I, II and III | | 41 (4.5) | 875 (95.5) | <0.001
  IV | | 8 (21.0) | 30 (79.0) |

Hospitalization
Aggravating pathology [c] | | | |
  No | | 41 (4.5) | 875 (95.5) | 0.004
  Yes | | 9 (13.9) | 56 (86.1) |
Surgical approach | 1 | | |
  Open surgery | | 24 (5.8) | 393 (94.2) | 0.631
  Laparoscopy | | 13 (5.6) | 218 (94.4) |
  Both | | 11 (3.7) | 286 (96.3) |
  Others | | 2 (5.7) | 33 (94.3) |
Surgical severity | 1 | | |
  Minor | | 0 | 0 | <0.001
  Moderate | | 3 (42.9) | 4 (57.1) |
  Major | | 35 (4.7) | 708 (95.3) |
  Complex major | | 12 (5.2) | 218 (94.8) |
Tumor site | | | |
  Right-transverse side | | 27 (6.5) | 388 (93.5) | 0.086
  Left | | 23 (4.1) | 543 (95.9) |
Adjacent organ invasion | | | |
  0 | | 36 (4.1) | 844 (95.9) | <0.001
  1 | | 8 (9.6) | 75 (90.4) |
  >1 | | 6 (33.3) | 12 (66.7) |
Lymph node ratio [b] | 79 | 0.24 (0.28) | 0.08 (0.16) | <0.001
  <0.17 | | 24 (3.1) | 748 (96.6) | <0.001
  [0.17–0.41) | | 9 (8.7) | 95 (91.4) |
  [0.41–0.69] | | 7 (15.9) | 37 (84.1) |
  >0.69 | | 3 (17.7) | 14 (82.4) |
Intraoperative complications | | | |
  No | | 35 (3.9) | 865 (96.1) | <0.001
  Yes | | 15 (18.5) | 66 (81.5) |
pTNM stage | 3 | | |
  0, I, II | | 13 (2.3) | 561 (97.7) | <0.001
  III | | 20 (6.4) | 295 (93.6) |
  IV | | 16 (18.2) | 73 (82.0) |
Residual tumor | 40 | | |
  R0 | | 33 (3.8) | 849 (96.3) | <0.001
  R1 | | 3 (9.1) | 30 (90.9) |
  R2 | | 11 (42.3) | 15 (57.7) |
K-ras | | | |
  Not done | | 36 (4.7) | 737 (95.3) | 0.010
  No mutation | | 7 (4.1) | 165 (95.4) |
  Mutation | | 5 (16.7) | 25 (83.3) |
Complications after surgery | | | |
  No | | 22 (3.9) | 548 (96.1) | 0.038
  Yes | | 28 (6.8) | 383 (93.2) |
Reintervention | | | |
  No | | 47 (5.2) | 858 (94.8) | 0.635
  Yes | | 3 (4.0) | 73 (96.1) |
Admission at reanimation/ICU | | | |
  No | | 30 (3.9) | 734 (96.1) | 0.002
  Yes | | 20 (9.2) | 197 (90.8) |
Complications at reanimation/ICU | | | |
  No | | 45 (5.1) | 839 (94.9) | 0.978
  Yes | | 5 (5.2) | 92 (94.8) |

Up to 30 days after surgery
Cancer complications | | | |
  No | | 49 (5.0) | 930 (95.0) | 0.100
  Yes | | 1 (50.0) | 1 (50.0) |
Medical complications | | | |
  No | | 44 (4.6) | 906 (95.4) | 0.004
  Yes | | 6 (19.4) | 25 (80.6) |
Surgical complications | | | |
  No | | 49 (5.2) | 898 (94.8) | 0.561
  Yes | | 1 (2.9) | 33 (97.1) |
Infectious complications | | | |
  No | | 48 (5.3) | 862 (94.7) | 0.573
  Yes | | 2 (2.8) | 69 (97.2) |
Readmission | | | |
  No | | 44 (4.9) | 862 (95.1) | 0.266
  Yes | | 6 (8.0) | 69 (92.0) |

One-year follow-up
Adjuvant chemotherapy | 7 | | |
  No | | 26 (4.8) | 519 (95.2) | 0.801
  Yes | | 19 (4.4) | 410 (95.6) |
Readmission | | | |
  No | | 22 (2.9) | 749 (97.1) | <0.001
  Yes | | 28 (13.3) | 182 (86.7) |
Recurrence of the tumor | | | |
  No | | 31 (3.6) | 838 (96.4) | <0.001
  Yes | | 19 (17.0) | 93 (83.0) |

Notes: Frequency and percentage are shown for all categorical variables.

[a] Result provided by the Student’s t-test for continuous variables and the chi-square test for categorical variables; nonparametric methods were used when necessary.

[b] Mean and SD are shown for continuous variables.

[c] Aggravating pathology is defined as having one of the following diagnoses: occlusion, perforation, fistula, abscess, bleeding and diffuse location peritonitis.

Abbreviations: ASA, American Society of Anesthesiologists; CA, carbohydrate antigen; CEA, carcinoembryonic antigen; CRC, colon or rectum cancer; ICU, intensive care unit; pTNM, histopathologic tumor–node–metastasis.

Figure 2

Results of the CART analysis for 1-year mortality in the derivation sample.

Notes: Each branch shows the classification variable and each node shows the number of subjects and the estimated probability of 1-year mortality on that node. Final nodes are in bold using different line types for stratified risk groups: low (dotted), medium (dashed), high (dotted dash) and very high (solid). Application to the validation sample is shown below each node in light gray-colored boxes.

Abbreviations: ASA, American Society of Anesthesiologists; CART, classification and regression trees; CCI, Charlson Comorbidity Index; Chem, adjuvant chemotherapy; IntraCom, intraoperative complications; pTNM, histopathologic tumor–node–metastasis; R1y, recurrence of the tumor; ResTum, residual tumor.

The ROC curve of predicted 1-year mortality for the CART in the derivation and validation samples is shown in Figure 3 along with the cut-off point of estimated risk dichotomization for the optimal sensitivity–specificity combination for the derivation sample. The AUC of the CART model was 0.896 (95% CI: 0.856, 0.936) and 0.835 (95% CI: 0.776, 0.895) in the derivation and validation samples, respectively. More detailed results on the internal bootstrap validation of the CART analysis are shown in the additional material (Figure S1; Table S2).
Figure 3

ROC curve for predicted 1-year mortality by the CART analyses.

Notes: Solid line applies for derivation sample and dashed line for validation sample. AUC=0.896 and 95% CI is (0.856, 0.936) for derivation sample and AUC=0.835 and 95% CI is (0.776, 0.895) for validation sample. The cut-off point of estimated 1-year mortality risk dichotomization for optimal sensitivity–specificity combination for derivation sample is shown with the corresponding specificity and sensitivity values.

Abbreviations: AUC, area under the receiver operating characteristic curve; CART, classification and regression trees; CI, confidence interval; ROC, receiver operating characteristic.

The LR model provided AUC estimates of 0.883 (95% CI: 0.834, 0.933) and 0.817 (95% CI: 0.752, 0.882) in the derivation and validation samples, respectively. The difference between the AUCs obtained from the CART and the LR was not statistically significant in either of the two subsamples. Using data from the derivation sample, the CART created four 1-year mortality risk classes: low (<0.03), medium (≥0.03 and <0.1), high (≥0.1 and <0.2) and very high (≥0.2). The AUC provided by the stratified risk categories in the derivation sample was 0.875 (95% CI: 0.823, 0.926). This risk classification was validated in the validation sample with AUC=0.832 (95% CI: 0.777, 0.888) (Table 2). The Cochran–Armitage test showed a statistically significant trend in both samples (P<0.0001). The cut-off point for dichotomization of estimated mortality risk that gave the optimal sensitivity–specificity combination in the derivation sample was 0.03, leading to a sensitivity of 0.889 and a specificity of 0.758 for risk of mortality at 1 year.
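The four risk classes and the dichotomization at 0.03 can be written as a small lookup, sketched here in Python. The function names are hypothetical, and treating a risk of exactly 0.03 as high risk in the dichotomization is an assumption that mirrors the class boundaries given above.

```python
def risk_class(p):
    """Map an estimated 1-year mortality risk to one of the four classes."""
    if p < 0.03:
        return "low"
    if p < 0.10:
        return "medium"
    if p < 0.20:
        return "high"
    return "very high"

def high_risk(p, cutoff=0.03):
    """Dichotomize the estimated risk at the reported cut-off point."""
    return p >= cutoff

# Hypothetical node-level risk estimates, for illustration only.
classes = [risk_class(p) for p in (0.006, 0.03, 0.145, 0.2, 0.42)]
```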
Table 2

Distribution of the subjects depending on the estimated risk of 1-year mortality

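Observed 1-year mortality (No/Yes) by risk group; values are n (%)

Risk group | Derivation sample (n=981): No (n=931) | Derivation sample: Yes (n=50) | Validation sample (n=964): No (n=893) | Validation sample: Yes (n=71)
Unclassified | 60 (92.3) | 5 (7.7) | 51 (85.0) | 9 (15.0)
Low | 634 (99.4) | 4 (0.6) | 594 (98.3) | 10 (1.7)
Medium | 115 (95.0) | 6 (5.0) | 130 (95.6) | 6 (4.4)
High | 71 (85.5) | 12 (14.5) | 66 (79.5) | 17 (20.5)
Very high | 51 (68.9) | 23 (31.2) | 52 (64.2) | 29 (35.8)

AUC (95% CI): derivation sample 0.875 (0.823–0.926); validation sample 0.832 (0.777–0.888)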

Notes: Estimated mortality rate (P) was categorized and classified as mortality risk as follows: low (P<0.03), medium (0.03≤P<0.1), high (0.1≤P<0.2) and very high (P≥0.2). Dashed horizontal line shows the cutoff point for dichotomization of estimated 1-year mortality risk looking for optimal sensitivity–specificity combination in the derivation sample, leading to a sensitivity of 0.889 and a specificity of 0.758.

Abbreviation: AUC, area under the receiver operating characteristic curve.
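The trend across ordered risk groups can be checked with a short Python sketch of the Cochran–Armitage test applied to the derivation-sample counts in Table 2. Equally spaced scores (0 to 3), the normal approximation and the exclusion of unclassified patients are assumptions of this sketch; the text does not state how the original test was configured.

```python
import math

def cochran_armitage(deaths, totals, scores=None):
    """Two-sided Cochran-Armitage trend test for ordered groups."""
    k = len(totals)
    scores = list(range(k)) if scores is None else scores
    n = sum(totals)
    p_bar = sum(deaths) / n  # overall mortality rate
    # Score-weighted sum of observed minus expected deaths.
    t_stat = sum(s * (d - m * p_bar) for s, d, m in zip(scores, deaths, totals))
    s1 = sum(s * m for s, m in zip(scores, totals))
    s2 = sum(s * s * m for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Derivation-sample counts from Table 2: low, medium, high, very high.
deaths = [4, 6, 12, 23]
totals = [638, 121, 83, 74]
z, p_value = cochran_armitage(deaths, totals)
```

Under these assumptions the statistic is large and positive, in line with the reported significant trend (P<0.0001).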

Risk of mortality predicted by the LR model was also categorized, using the same criteria as those for the CART. The AUCs for the stratified risk classification obtained from the LR model were 0.869 (95% CI: 0.809, 0.929) in the derivation sample and 0.817 (95% CI: 0.757, 0.878) in the validation sample. Comparison between the AUCs obtained with the stratified risk categories from the CART and the LR model showed no statistically significant differences.

Discussion

In the current study, the combination of different statistical techniques enabled us to obtain a simple and easy-to-use decision tree based on variables that are routinely available in daily clinical practice. The tree was obtained from a large prospective cohort of patients who underwent surgery for colon cancer and were followed for 1 year after surgery. The presence of residual tumor (based on the R classification) was the first splitting variable in the tree associated with mortality within 1 year among patients with colon cancer who survived for at least 30 days after surgery. The following branches included the ASA risk score, intraoperative complications, pTNM stage, adjuvant chemotherapy, recurrence of the tumor within 1 year and CCI score. Regarding comparison with the literature, other studies conclude that the presence of residual tumor after surgery could be interpreted as a surrogate of the severity of the disease, as well as an indicator of surgery effectiveness.42 Other identified predictors of 1-year mortality were related to the general condition of the patient before surgery, such as ASA and comorbidities (based on the CCI), or directly related to the severity of the disease as determined using the pTNM stage.
Comorbidities, in conjunction with age, have been previously reported to have an impact on mortality,13,43,44 as has the ASA score.25,45 Previous studies have determined that the severity of the colon cancer, as measured by the TNM stage, lymph node status, number of lymph nodes positive for tumor, or depth of primary tumor penetration, is a predictor of 1-year mortality.46 With regard to adjuvant chemotherapy, several tree nodes showed that it was a predictor of 1-year mortality; this same finding has been demonstrated in other studies.47 Finally, variables related to the condition of the patient during and after surgery, such as intraoperative complications and recurrence of the tumor within 1 year, are also present in the tree and have been previously identified as predictors in other studies.42,48
Tree-based methods, such as RF and CART, have advantages over linear and additive models, such as regression models. Tree-based methods do not require parametric specification of the relationship between the predictors and the outcome, whereas regression methods do. In other words, while regression models are fitted based on an equation that defines how, in theory, the predictors and the outcome are related, tree-based models do not assume any predefined relationship between the variables. Thanks to this feature, tree-based models naturally incorporate complex interactions and relationships between covariates, beyond what is already known, and various competing and inter-related variables can be explored simultaneously. Generally, the importance of predictors in a univariate regression framework and in a tree-based framework can differ considerably, as occurs in our study, probably due to interaction effects.
The final tree showed, for instance, an interaction effect between adjuvant chemotherapy and pTNM stage, with significant splits depending on adjuvant chemotherapy for pTNM stage III when there were no intraoperative complications and for pTNM stages III and IV in the presence of intraoperative complications, for patients with ASA I, II or III and residual tumor classified as R0 or R1, while the same split was not present for other combinations of the same variables. Moreover, in practice, the main advantage of tree-based methods is that a result presented in decision tree form can be easily interpreted by clinicians and researchers and, to some extent, mimics the decision-making process in clinical practice.
RF is based on an algorithm that uses bootstrapping in conjunction with CART, thereby randomly selecting individuals and predictors from the original sample in an iterative way, and protecting the model from overfitting. RF is computationally more efficient than other tree-based methods, such as simple CART models, and it is robust to a noisy response.34 RF allowed us to select the most important variables to be incorporated into the tree. Encoding of categorical variables is an important issue during the modeling phase in tree-based methods because categorical variables could artificially gain importance over the continuous variables.34 Safeguards against favoring categorical predictors with a large number of categories over continuous or dichotomous predictors were incorporated in the modeling phase.
In addition, categorization of some predictors, for easy interpretation of the results, has been optimized by selecting the optimal number and location of cut-off points using a prediction framework based on genetic algorithms, recently proposed in the literature.35
One recent publication concludes that modern machine learning techniques, such as RF, SVM and NN, showed instability and a high optimism even with >200 events per variable.49 They may need over 10 times as many events per variable as more conventional modeling techniques, such as LR and CART, to achieve a stable validated AUC and a small optimism. These findings imply that such modern techniques should only be used in medical prediction problems if very large data sets are available. When we performed RF with the whole sample (n=1945), high classification error was obtained. However, we relied on results obtained from RF with regard to variable importance, and this information was retained to significantly reduce the number of predictors to be incorporated in order to develop a classification tree using CART. In practice, when we look for higher accuracy, most of the models become more complex and their interpretation becomes more difficult. This is always the tradeoff we make when prediction accuracy is the primary goal.34 However, the simple result obtained from the CART method provided an interpretable tree. Moreover, the split-sample validation and the bootstrap internal validation of the CART showed stability of the results even with only about 5% of events in the derivation sample. Our final tree was developed with 50 events and 7 variables, that is, about 7 events per predictor, which is low compared to the usual recommendation for binary outcomes of 10 events per predictor.50,51 Hence, combining more complex techniques, such as RF and genetic algorithms, with simpler approaches, such as CART, yields results that are not only accurate but also valid and stable, as shown by our final decision tree.
When the CART and LR approaches were compared in terms of discrimination ability, the difference between the two methods was not statistically significant. In terms of interpretability, however, results in tree form are easier for clinical researchers to interpret than the formulae provided by regression approaches.
Other studies have used decision trees based on recursive partitioning techniques to predict prognosis in patients with cancer.15,52 Radespiel-Tröger et al studied factors that predict recurrence of colon cancer after resection using tree-based methods.53 Moreover, Manilich et al developed an RF prognostic model using clinical and histopathologic factors to predict 5-year survival of patients with colorectal cancer.25 Their study was, however, limited to patients with a complete radical resection of the tumor with negative radial or distal margins (R0). The investigators concluded that the main predictor was LNR, which was not a significant predictor in our study. However, other significant variables in that study, such as ASA score, tumor stage and treatment, were similar to those in our study.
The present study was a large prospective cohort study including 22 hospitals; therefore, there was variability in individuals and clinical practice, and the number of variables collected as potential predictors of 1-year mortality was high. The whole sample included 4% of patients who underwent palliative resection, whose prognosis was likely to differ from that of patients undergoing curative resection. However, results excluding these patients were very similar to the presented results in terms of prediction (correlation coefficient r=0.989). Moreover, from our point of view, one value of the present study is that it reflects the type of patients seen in hospitals in real life, preserving a certain natural heterogeneity.
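Discrimination ability, on which CART and LR were compared, is usually summarized by the AUC. A minimal sketch of how it can be computed is given below, using the pairwise definition: the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name and the small data set are hypothetical and are not taken from the study. Ties matter for a tree, because every patient in the same terminal node receives the same predicted risk.

```python
def auc(risks_died, risks_survived):
    """Pairwise (Mann-Whitney) AUC; tied pairs contribute 0.5."""
    concordant = 0.0
    for died in risks_died:
        for survived in risks_survived:
            if died > survived:
                concordant += 1.0
            elif died == survived:
                concordant += 0.5
    return concordant / (len(risks_died) * len(risks_survived))

# Hypothetical node-level predicted risks for a handful of patients.
risks_died = [0.30, 0.13, 0.05]
risks_survived = [0.01, 0.05, 0.006, 0.13]
example_auc = auc(risks_died, risks_survived)
print(round(example_auc, 3))
```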
Furthermore, the predictive tree was developed following the current structured guidelines for the development of prediction models, as detailed in the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.54 By combining statistical techniques, we gained validity, accuracy and interpretability. Although the predictors identified were similar to those in other studies, the CART method provided a simple, interpretable tree with good predictive ability. Moreover, the internal validation of the CART carried out by bootstrap analysis showed stability of the results, even with a low rate of events in the sample. Therefore, this prediction rule may help clinicians to easily classify patients by prognosis and guide them in the follow-up process. Other authors have combined multiple machine learning techniques to develop a prediction algorithm with the best MSE.26 However, the result provided by that single algorithm is not as intuitive and easy to use as our classification tree; therefore, its use in clinical practice is more limited than that of the prediction rule we have proposed.
Given that some of the variables included in the tree, such as pTNM, intraoperative complications or adjuvant chemotherapy, are only available in the postoperative period, the main application of this tool would be in the planning of clinical follow-up. Patients with a higher risk of 1-year mortality could benefit from more intensive surveillance of their oncologic disease and potential comorbidities and complications, while those with a low risk could be scheduled for a less intensive follow-up. For more severe patients (ASA IV or residual tumor R2), our CART provides information about the chances of survival, though the sample size is small. Patients with residual tumor R2 seem to have a low survival expectancy, though chemotherapy increases the likelihood of surviving.
Among the milder cases (stage 0–II), the result of the surgery (residual tumor) marked an important difference, with a high probability of surviving 1 year among those with residual tumor R0. In cases with stage III, the tree helps to show how, depending on the result of the surgery, opting for chemotherapy can significantly increase the chances of survival. This may help guide the oncologist on the use of more or less aggressive treatments. This tree could also be used by health care workers to provide patients with more precise prognostic information than that based solely on the TNM staging system. Given the high negative predictive value of the model, it could be especially useful for informing patients classified in the low-risk groups, for whom a high 1-year survival probability could be reassuring.

Limitations

The limitations that are common in prospective multicenter studies with 1 year of follow-up were also present in our study. One of these limitations is related to losses to follow-up, which could obviously be a source of bias; in our study, 6% of the patients in the sample were not classified by the tree because of missing values in some of the predictive variables included in the model. Another limitation is that no inter- or intraobserver reliability studies were performed to assess the quality of the data collection process. Nevertheless, reviewers were trained at each site and were provided with a common manual for data collection. Our study included only a few tumor biomarkers, such as carcinoembryonic antigen and CA 19-9 levels. The predictive ability of the models could be increased in the future by including genetic and biologic prognostic markers. The influence of variables such as differences in the health care provided to the patients or adherence to treatment was not well documented, and consequently they were not included in the model; all of these factors could have an influence on mortality. The stratified risk results must be interpreted cautiously in terms of screening because the low positive predictive values (15%–17%) could be due to the low mortality rate. This kind of data is likely to be clustered within hospitals, and this clustering was not taken into consideration in the analysis. A mixed-effects approach for RF and CART could probably improve the results. This kind of methodology has been proposed in the literature for continuous outcomes,55 although, to the best of our knowledge, it has not yet been developed for dichotomous responses. However, the clustering of patients within hospitals was checked in a generalized mixed-effects model framework, which showed a negligible effect in these particular data.
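The dependence of the positive predictive value on the mortality rate, mentioned above, follows directly from Bayes' rule. The sketch below is illustrative only: it uses the sensitivity (0.889) and specificity (0.758) reported in the abstract for the optimal cut-off point, and it assumes a 1-year mortality of 5%, chosen because the text states that events were <5% of the sample. The 15%–17% values quoted above refer to the stratified risk results, so the numbers are not expected to match exactly.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Assumed prevalence of 5% (the text only says events were <5%).
ppv_low, npv_low = predictive_values(0.889, 0.758, 0.05)
# Same classifier in a hypothetical population with 30% mortality.
ppv_high, npv_high = predictive_values(0.889, 0.758, 0.30)

print(f"prevalence 5%:  PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
print(f"prevalence 30%: PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```

With sensitivity and specificity held fixed, a rare outcome pushes the PPV down and the NPV up, which is consistent with the low PPV and high NPV discussed in the text.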
Finally, an important limitation of CART is that including higher-order interactions without considering the main effects could lead to spurious relations between predictors and thus to an overestimation of the effect of some predictors. However, the use of combined split-sample and bootstrap validation techniques provided internally validated results to minimize this drawback. Further research is needed to validate this decision tree in other samples and populations. Such studies will provide guidance on the model's applicability in clinical practice and on what modifications might be needed to improve its validity.

Conclusion

This clinical prediction rule, which combined RF and CART, was a simple, interpretable, valid and accurate prediction model for 1-year mortality among colon cancer patients who survived for at least 30 days after surgery. This decision tool could be provided to clinicians in order to assist in clinical decision-making processes.

Results of internal validation of the CART analysis by bootstrap resampling (N=2000). Abbreviation: CART, classification and regression trees.
Table S1

Descriptive statistics for explanatory variables stratified by sample (derivation vs validation)

Variable | Missing | Derivation, n=981 (50.4%) | Validation, n=964 (49.6%) | P-value[a]

Before surgery
Gender | | | |
  Male | | 612 (62.4) | 593 (61.5) | 0.693
  Female | | 369 (37.6) | 371 (38.5) |
Age[b] | 2 | 68.9 (10.7) | 68.9 (10.9) | 0.883
Smoking habit | 20 | | |
  Smoker | | 468 (48.2) | 443 (50.6) | 0.551
  Former smoker | | 125 (12.9) | 114 (12.0) |
  Nonsmoker | | 378 (38.9) | 357 (37.4) |
Charlson Comorbidity Index[b] | | 2.94 (1.37) | 2.84 (1.25) | 0.225
  ≤2 | | 511 (52.1) | 524 (54.4) | 0.316
  >2 | | 470 (47.9) | 440 (45.6) |
Past history of CRC | | | |
  No | | 901 (91.8) | 886 (91.9) | 0.614
  Yes | | 80 (8.2) | 78 (8.1) |
CEA | 10 | | |
  No | | 218 (22.3) | 242 (25.2) | 0.134
  Yes | | 758 (77.7) | 717 (74.8) |
CA 19-9 | 29 | | |
  No | | 568 (58.9) | 578 (60.8) | 0.392
  Yes | | 397 (41.1) | 373 (39.2) |
Hemoglobin[b] | 16 | 14.7 (15.9) | 15.3 (18.2) | 0.413
Hematocrit[b] | 58 | 37.2 (14.1) | 37.3 (19.1) | 0.973
ASA | 51 | | |
  I, II and III | | 916 (96.0) | 901 (95.9) | 0.855
  IV | | 38 (4.0) | |

Hospitalization
Aggravating pathology[c] | | | |
  No | | 916 (93.4) | 879 (91.2) | 0.070
  Yes | | 65 (6.6) | 85 (8.8) |
Surgical approach | 2 | | |
  Open surgery | | 417 (42.6) | 384 (39.9) | 0.245
  Laparoscopy | | 231 (23.6) | 222 (23.1) |
  Both | | 297 (30.3) | 330 (34.3) |
  Others | | 35 (3.6) | 27 (2.8) |
Surgical severity | 3 | | | 0.904
  Minor | | 0 | 0 |
  Moderate | | 7 (0.7) | 6 (0.6) |
  Major | | 743 (75.8) | 723 (75.2) |
  Complex major | | 230 (23.5) | 233 (24.2) |
Laterality of the tumor | | | |
  Right-transverse side | | 415 (42.3) | 399 (41.4) | 0.683
  Left | | 566 (57.7) | 565 (58.6) |
Adjacent organ invasion | | | |
  0 | | 880 (89.7) | 861 (89.3) | 0.955
  1 | | 83 (8.5) | 84 (8.7) |
  >1 | | 18 (1.8) | 19 (2.0) |
Lymph node ratio[b] | 79 | 0.09 (0.17) | 0.10 (0.19) | 0.071
  <0.17 | | 772 (82.4) | 755 (81.3) | 0.447
  [0.17–0.41) | | 104 (11.1) | 107 (11.5) |
  [0.41–0.69] | | 44 (4.7) | 40 (4.3) |
  >0.69 | | 17 (1.8) | 27 (2.9) |
Intraoperative complications | | | |
  No | | 900 (91.7) | 880 (91.3) | 0.718
  Yes | | 81 (8.3) | 84 (8.7) |
pTNM stage | 12 | | |
  0, I and II | | 574 (58.7) | 508 (53.2) | 0.051
  III | | 315 (32.2) | 346 (36.2) |
  IV | | 89 (9.1) | 101 (10.6) |
Residual tumor | 75 | | |
  R0 | | 882 (93.7) | 859 (92.5) | 0.558
  R1 | | 33 (3.5) | 39 (4.2) |
  R2 | | 26 (2.8) | 31 (3.3) |
K-ras | 12 | | |
  Not done | | 773 (79.3) | 749 (78.2) | 0.732
  No mutation | | 172 (17.6) | 174 (18.6) |
  Mutation | | 30 (3.1) | 35 (3.7) |
Complications after surgery | | | |
  No | | 570 (58.1) | 582 (60.4) | 0.309
  Yes | | 411 (41.9) | 382 (39.6) |
Reintervention | | | |
  No | | 905 (92.2) | 882 (91.5) | 0.540
  Yes | | 76 (7.8) | 82 (8.5) |
Admission at reanimation/ICU | | | |
  No | | 764 (77.9) | 753 (78.1) | 0.902
  Yes | | 217 (22.1) | 211 (21.9) |
Complications at reanimation/ICU | | | |
  No | | 884 (90.1) | 862 (89.4) | 0.614
  Yes | | 97 (9.9) | 102 (10.6) |

Up to 30 days after surgery
Cancer complications | | | |
  No | | 979 (99.8) | 961 (99.7) | 0.685
  Yes | | 2 (0.2) | 3 (0.3) |
Medical complications | | | |
  No | | 950 (96.8) | 941 (97.6) | 0.299
  Yes | | 31 (3.2) | 23 (2.4) |
Surgical complications | | | |
  No | | 947 (96.5) | 913 (94.7) | 0.049
  Yes | | 34 (3.5) | 51 (5.3) |
Infectious complications | | | |
  No | | 910 (92.8) | 910 (94.4) | 0.141
  Yes | | 71 (7.2) | 54 (5.6) |
Readmission | | | |
  No | | 906 (92.4) | 888 (92.1) | 0.844
  Yes | | 75 (7.7) | 76 (7.9) |

One year of follow-up
Adjuvant chemotherapy | 14 | | |
  No | | 545 (56.0) | 494 (51.6) | 0.056
  Yes | | 429 (44.0) | 463 (48.4) |
Readmission | 3 | | |
  No | | 771 (78.6) | 740 (77.0) | 0.399
  Yes | | 210 (21.4) | 221 (23.0) |
Recurrence of the tumor | 3 | | |
  No | | 869 (88.6) | 848 (88.2) | 0.814
  Yes | | 112 (11.4) | 113 (11.8) |

Notes: Frequency and percentage are shown for all categorical variables.

[a] Result provided by the two-sample Student’s t-test for continuous variables and the chi-square test for categorical variables; nonparametric methods were used when necessary.

[b] Mean and SD are shown for continuous variables.

[c] Aggravating diagnosis is defined as having one of the following diagnoses: occlusion, perforation, fistula, abscess, bleeding and diffuse location peritonitis.

Abbreviations: ASA, American Society of Anesthesiologists; CA, carbohydrate antigen; CEA, carcinoembryonic antigen; CRC, colon or rectum cancer; ICU, intensive care unit; pTNM, histopathologic tumor–node–metastasis.

Table S2

Internal validation of the CART analysis by bootstrap resampling (N=2000)

Node | N (CART) | Observed mortality risk (CART) | Estimated median mortality risk (bootstrap) | 95% CI (bootstrap) | Risk group
1 | 243 | 0 | | | Low
2 | 229 | 0.0131 | 0.0131 | (0.0040, 0.0300) | Low
3 | 20 | 0.0500 | 0.0625 | (0.0357, 0.1875) | Medium
4 | 166 | 0.0060 | 0.0065 | (0.0054, 0.0222) | Low
5 | 27 | 0.0370 | 0.0455 | (0.0270, 0.1364) | Medium
6 | 40 | 0.0500 | 0.0513 | (0.0208, 0.1395) | Medium
7 | 10 | 0.3000 | 0.3000 | (0.1000, 0.6364) | Very high
8 | 60 | 0.1333 | 0.1321 | (0.0545, 0.2261) | High
9 | 34 | 0.0588 | 0.0606 | (0.0244, 0.1539) | Medium
10 | 23 | 0.1739 | 0.1667 | (0.0455, 0.3333) | High
11 | 6 | 0.5000 | 0.5000 | (0.1667, 1.000) | Very high
12 | 33 | 0.2121 | 0.2143 | (0.0800, 0.3548) | Very high
13 | 17 | 0.3529 | 0.3529 | (0.1333, 0.6111) | Very high
14 | 8 | 0.5000 | 0.5000 | (0.1667, 0.8571) | Very high

Note: Estimated median mortality risk, 95% CIs and stratification of risk are shown by node.

Abbreviations: CART, classification and regression trees; CI, confidence interval.
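
Table S2 reports, for each terminal node, a bootstrap median of the mortality risk and a 95% CI. The resampling procedure is not described in detail here, so the following is only a simplified sketch under stated assumptions: it resamples the patients of a single node and takes percentile limits, whereas the authors may have resampled the whole sample and refitted or reapplied the tree. The event count used for node 8 (8 deaths among 60 patients) is inferred from the N and observed risk in the table (0.1333 × 60 = 8); the function name and the seed are hypothetical.

```python
import random
import statistics

def bootstrap_risk(outcomes, n_resamples, rng):
    """Percentile bootstrap of a proportion.

    outcomes: list of 0/1 values (1 = died within 1 year).
    Returns (median risk, (lower 2.5% limit, upper 97.5% limit)).
    """
    n = len(outcomes)
    # Each resample draws n patients with replacement and records the risk.
    risks = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = risks[int(0.025 * n_resamples)]
    upper = risks[int(0.975 * n_resamples) - 1]
    return statistics.median(risks), (lower, upper)

# Node 8: 60 patients, 8 deaths (inferred from Table S2, see lead-in).
node8 = [1] * 8 + [0] * 52
rng = random.Random(2000)  # arbitrary seed
median_risk, (ci_low, ci_high) = bootstrap_risk(node8, 2000, rng)
print(f"median={median_risk:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

Because of the simplification, the interval from this sketch is not expected to reproduce the tabulated values exactly; it only shows the mechanics of a percentile interval.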

  48 in total

1.  A CART-based approach to discover emerging patterns in microarray data.

Authors:  Anne-Laure Boulesteix; Gerhard Tutz; Korbinian Strimmer
Journal:  Bioinformatics       Date:  2003-12-12       Impact factor: 6.937

2.  A solution to the problem of separation in logistic regression.

Authors:  Georg Heinze; Michael Schemper
Journal:  Stat Med       Date:  2002-08-30       Impact factor: 2.373

3.  Major postoperative complications following elective resection for colorectal cancer decrease long-term survival but not the time to recurrence.

Authors:  M Odermatt; D Miskovic; K Flashman; J Khan; A Senapati; D O'Leary; M Thompson; A Parvaiz
Journal:  Colorectal Dis       Date:  2015-02       Impact factor: 3.788

Review 4.  Risk stratification tools for predicting morbidity and mortality in adult patients undergoing major surgery: qualitative systematic review.

Authors:  Suneetha Ramani Moonesinghe; Michael G Mythen; Priya Das; Kathryn M Rowan; Michael P W Grocott
Journal:  Anesthesiology       Date:  2013-10       Impact factor: 7.892

5.  Prognosis of patients with colorectal cancer is associated with lymph node ratio: a single-center analysis of 3,026 patients over a 25-year time period.

Authors:  Robert Rosenberg; Jan Friederichs; Tibor Schuster; Ralf Gertler; Matthias Maak; Karen Becker; Anne Grebner; Kurt Ulm; Heinz Höfler; Hjalmar Nekarda; Jörg-Rüdiger Siewert
Journal:  Ann Surg       Date:  2008-12       Impact factor: 12.969

6.  Predicting the probability of mortality of gastric cancer patients using decision tree.

Authors:  F Mohammadzadeh; H Noorkojuri; M A Pourhoseingholi; S Saadat; A R Baghestani
Journal:  Ir J Med Sci       Date:  2014-03-14       Impact factor: 1.568

7.  Age-adjusted Charlson comorbidity index scores as predictor of survival in colorectal cancer patients who underwent surgical resection and chemoradiation.

Authors:  Chin-Chia Wu; Ta-Wen Hsu; Chun-Ming Chang; Chia-Hui Yu; Ching-Chih Lee
Journal:  Medicine (Baltimore)       Date:  2015-01       Impact factor: 1.889

8.  Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints.

Authors:  Tjeerd van der Ploeg; Peter C Austin; Ewout W Steyerberg
Journal:  BMC Med Res Methodol       Date:  2014-12-22       Impact factor: 4.615

9.  Healthcare provider perceptions of clinical prediction rules.

Authors:  Safiya Richardson; Sundas Khan; Lauren McCullagh; Myriam Kline; Devin Mann; Thomas McGinn
Journal:  BMJ Open       Date:  2015-09-02       Impact factor: 2.692

10.  A prognostic classifier for patients with colorectal cancer liver metastasis, based on AURKA, PTGS2 and MMP9.

Authors:  Jeroen A C M Goos; Veerle M H Coupé; Mark A van de Wiel; Begoña Diosdado; Pien M Delis-Van Diemen; Annemieke C Hiemstra; Erienne M V de Cuba; Jeroen A M Beliën; C Willemien Menke-van der Houven van Oordt; Albert A Geldof; Gerrit A Meijer; Otto S Hoekstra; Remond J A Fijneman
Journal:  Oncotarget       Date:  2016-01-12