
A prediction model for early death in non-small cell lung cancer patients following curative-intent chemoradiotherapy.

Arthur Jochems¹, Issam El-Naqa², Marc Kessler², Charles S Mayo², Shruti Jolly², Martha Matuszak², Corinne Faivre-Finn³, Gareth Price³, Lois Holloway⁴, Shalini Vinod⁴, Matthew Field⁴, Mohamed Samir Barakat⁴, David Thwaites⁵, Dirk de Ruysscher¹, Andre Dekker¹, Philippe Lambin¹.

Abstract

BACKGROUND: Early death after a treatment can be seen as a therapeutic failure. Accurate prediction of patients at risk for early mortality is crucial to avoid unnecessary harm and reduce costs. The goal of our work is two-fold: first, to evaluate the performance of a previously published model for early death in our cohorts; second, to develop a prognostic model for early death prediction following radiotherapy.
MATERIAL AND METHODS: Patients with NSCLC treated with chemoradiotherapy or radiotherapy alone were included in this study. Four different cohorts from different countries were available for this work (N = 1540). The previous model used age, gender, performance status, tumor stage, income deprivation, no previous treatment given (yes/no) and body mass index to make predictions. A random forest model was developed by learning on the Maastro cohort (N = 698). The new model used performance status, age, gender, T and N stage, total tumor volume (cc), total tumor dose (Gy) and chemotherapy timing (none, sequential, concurrent) to make predictions. Death within 4 months of receiving the first radiotherapy fraction was used as the outcome.
RESULTS: Early death rates ranged from 6 to 11% within the four cohorts. The previous model performed with AUC values ranging from 0.54 to 0.64 on the validation cohorts. Our newly developed model had improved AUC values ranging from 0.62 to 0.71 on the validation cohorts.
CONCLUSIONS: Using advanced machine learning methods and informative variables, prognostic models for early mortality can be developed. Development of accurate prognostic tools for early mortality is important to inform patients about treatment options and optimize care.


Year: 2017 PMID: 29034756 PMCID: PMC6108087 DOI: 10.1080/0284186X.2017.1385842

Source DB: PubMed Journal: Acta Oncol ISSN: 0284-186X Impact factor: 4.089

Introduction

Primary lung cancer is the leading cause of cancer deaths worldwide [1]. Non-small cell lung cancer (NSCLC) accounts for 80–85% of all lung cancers. For non-resectable NSCLC, chemoradiotherapy (CRT) is the standard of care for locally advanced disease. It has been reported that 8% of NSCLC patients die within the first 30 days of systemic anti-cancer treatment (SACT) initiation [2]. In patients who die within a few weeks of radiotherapy initiation, the inconvenience of attending hospital appointments and the side effects may outweigh the benefit of the treatment. Prognostic models that can identify patients at risk for early mortality are therefore vital to reduce patient burden and treatment costs. Several prognostic factors have been identified, such as performance status [3], weight loss [3], presence of comorbidity [4], chemotherapy use in combination with radiotherapy (RT) [3], radiation dose [3], tumor volume [5], genetics [6,7] and image features, the so-called radiomics approach [8]. For other factors, such as age and gender, results are contradictory and therefore inconclusive [9,10]. The current gold standard for survival prediction in NSCLC patients is the TNM staging system. This system was originally developed to determine patients' eligibility for surgery, and was not intended to be used in the context of CRT treatment [11]. Prognostic models of survival for non-resectable NSCLC patients receiving CRT are available [5,12]. However, these models focus on 2-year survival, rather than early death. Further shortcomings of these models are that they are trained on relatively small patient datasets with limited heterogeneity and that the added value of more advanced modeling strategies such as random forest is not explored. Therefore, the development of a model for the prediction of early death in NSCLC patients following curative-intent CRT is highly relevant. In this study, we develop a new model for prediction of early death in locally advanced NSCLC patients.
The model is subsequently validated in four independent patient cohorts. In addition, we compared the performance of our model to a reduced version of a previously developed early-death model in the context of SACT by Wallington et al. [2]. We hypothesized that the reduced version of the model published by Wallington et al. [2] will perform above the chance level in predicting early death in our cohorts. Furthermore, we postulated that the new model developed and validated using four international cohorts will show higher discriminative performance than the model developed by Wallington et al. [2].

Material and methods

Data

Clinical data from 1540 lung cancer patients treated with curative-intent CRT or RT alone were collected and stored in four different medical institutes [698 patients at Maastro Clinic, 147 at University of Michigan (USA), 196 at The Christie Hospital in Manchester (UK) and 499 at Liverpool Hospital (Australia)]. Patients were treated for their primary lung tumor and were not diagnosed with another tumor in the 5 years prior to treatment. The patients' characteristics are shown in Supplementary Table S1. Four-month survival, taken from the first day of radiotherapy, was used as the outcome of this study. Patients with missing data in the validation cohorts were excluded. Patients who did not complete the planned treatment were included in the study. A CONSORT diagram for this study is shown in Supplementary Figure S1.
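The outcome definition above can be sketched as a small labeling function. This is only an illustration: the window of 122 days for "four months", the function name and the handling of patients with short follow-up are assumptions, since the text does not state the exact day count or how incomplete follow-up was treated.

```python
from datetime import date

# Assumption: "4 months" is taken as 122 days; the source gives no exact day count.
EARLY_DEATH_WINDOW_DAYS = 122


def early_death_label(first_rt, death=None, last_follow_up=None):
    """Return True (early death), False (survived the window) or None (unknown).

    first_rt       -- date of the first radiotherapy fraction
    death          -- date of death, or None if no death was recorded
    last_follow_up -- last date the patient was known to be alive, or None
    """
    if death is not None:
        return (death - first_rt).days <= EARLY_DEATH_WINDOW_DAYS
    if last_follow_up is not None:
        if (last_follow_up - first_rt).days >= EARLY_DEATH_WINDOW_DAYS:
            return False
    # Alive at last contact but not followed for the full window (assumed unknown).
    return None


# Hypothetical patient: first fraction on 1 Jan 2010, death on 1 Mar 2010.
label = early_death_label(date(2010, 1, 1), death=date(2010, 3, 1))
```

Returning `None` rather than `False` for short follow-up keeps patients with an unknown outcome from being counted as survivors.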

Maastro cohort

The Maastro data were collected under an institutional review board (IRB) approved and registered clinical trial (NCT01949259). Patients were treated between 2004 and 2014. Three different CRT or RT protocol types were administered to the patients in this study. Two hundred twenty-seven NSCLC patients were treated according to a new protocol for sequential chemoradiation, which was introduced in August 2005. The individualized radiation dose ranged from 54.0 to 79.2 Gy, delivered in fractions of 1.8 Gy, twice daily, until the mean lung dose or maximum dose to the spinal cord was reached. Three hundred fifty-seven NSCLC patients received concurrent CRT. A radiation dose of 45 Gy was delivered in fractions of 1.5 Gy, twice daily. This treatment was followed by an individualized dose ranging from 8 to 24 Gy, delivered in fractions of 2 Gy, once daily, again until reaching the normal tissue dose constraints. Fifty-three NSCLC patients received accelerated high-dose conformal radiotherapy: 66 Gy in 24 fractions (2.75 Gy per fraction). Some of these patients received chemotherapy. The remaining 61 patients received a treatment regime tailored specifically to the patient. For these patients, total doses ranged from 52.25 to 129.6 Gy and dose per fraction ranged from 2.75 to 5.4 Gy.

Michigan cohort

The Michigan data were collected from prospective protocols under IRB approval (UMCC 2006.040 and UMCC 2007.123). All patients were treated with curative intent in the period between May 2003 and July 2014. One hundred two patients received RT to standard doses (60–66 Gy) using once-daily fractions of 2 Gy. Seventy-eight of these patients received concurrent chemotherapy. Forty-five patients received RT with intensified doses to persistent PET-avid target volumes during treatment, with 2.1–2.85 Gy per fraction up to a total dose of 85.5 Gy in 30 fractions. Forty-three of these 45 patients received concurrent chemotherapy.

Manchester cohort

The Manchester cohort consisted of 196 anonymized lung cancer patients with Stage I–IIIB NSCLC. The study was conducted under IRB approval. All patients were treated with curative intent in the period between December 2008 and May 2013. Two different protocols were used for treating patients in this dataset. One hundred twenty-one NSCLC patients received 55 Gy in 20 daily fractions (2.75 Gy per fraction), either without chemotherapy or following induction chemotherapy. Seventy-three NSCLC patients received 60–66 Gy in 30–33 daily fractions (2 Gy per fraction) with concurrent chemotherapy. The remaining two patients received a treatment regime tailored specifically to the patient.

Liverpool cohort

The Liverpool cohort consisted of 499 anonymized lung cancer patients with Stage I–IIIB NSCLC. The study was conducted under IRB approval. All patients were treated with curative intent. Two different protocols were used for treating patients in this dataset. Fifty-three NSCLC patients received 50–55 Gy in 20 daily fractions (2.5–2.75 Gy per fraction) without chemotherapy. Eighty-eight NSCLC patients received 50–55 Gy in 20 or 25 daily fractions (2–2.75 Gy per fraction) without chemotherapy. Forty NSCLC patients received 50–55 Gy in 20 or 25 daily fractions (2–2.75 Gy per fraction) with sequential or concurrent chemotherapy. Three hundred twenty-seven NSCLC patients received 60–66 Gy in 30–33 daily fractions (2 Gy per fraction) either as sole treatment or with concurrent or sequential chemotherapy. Of the remaining 46 patients, 24 received high-fractional dose SABR treatment while 22 patients received a treatment regime tailored specifically to the patient, ranging from 50.4 to 70 Gy with between 15 and 35 fractions.

Variable selection

For variable selection, we have used the largest set of variables that all cohorts had in common and used these to develop a random forest model. The model used performance status, age, gender, T and N stage, total tumor volume (cc), total tumor dose (Gy) and chemo timing (none, sequential, concurrent) to make predictions.
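To make the modeling idea concrete, the following toy sketch builds a forest of bagged one-split trees in plain Python. It is not the authors' implementation (they used the randomForest package in R): real random forests grow deep trees and redraw the candidate features at every split, whereas this simplified stand-in draws one bootstrap sample and one random feature subset per tree. All function names and the example rows are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(X, y, features):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split: predict the sample prevalence on both sides
        p = sum(y) / len(y)
        return (features[0], float("inf"), p, p)
    return best[1:]


def fit_forest(X, y, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample, each on a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, round(d ** 0.5))  # common heuristic: sqrt(d) features per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), m)
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Average the per-tree event probabilities for one patient."""
    return sum(l if row[f] <= t else r for f, t, l, r in forest) / len(forest)


# Hypothetical rows: [age, performance status]; label 1 = early death.
X = [[60, 0], [60, 0], [60, 1], [60, 1], [60, 2], [60, 2], [60, 3], [60, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

Categorical inputs such as chemotherapy timing would need a numeric encoding before being passed to a sketch like this.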

Model comparison

Our newly developed model was compared to the model developed by Wallington et al. [2]. The model by Wallington et al. [2] used age, gender, performance status, tumor stage, income deprivation (ID1, ID2, ID3, ID4, and ID5; based on the English Indices of Deprivation 2010 [13]), no previous treatment given (yes/no) and body mass index (BMI) to make predictions. As income deprivation and BMI were missing in our patient cohorts, we have imputed these with the median to make comparison possible.
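A minimal sketch of median imputation is given below. The text does not say where the median came from when a variable was absent for every patient, so the `fallback` argument (a reference value supplied by the caller) is an assumption, as are the function name and the example values.

```python
from statistics import median


def impute_median(values, fallback=None):
    """Replace None entries with the median of the observed entries.

    If nothing was observed at all, use `fallback` (an assumed external
    reference value) instead.
    """
    observed = [v for v in values if v is not None]
    if observed:
        fill = median(observed)
    elif fallback is not None:
        fill = fallback
    else:
        raise ValueError("no observed values and no fallback supplied")
    return [fill if v is None else v for v in values]


# Hypothetical BMI column with one missing entry.
bmi = impute_median([22.0, None, 26.0, 30.0])
# Hypothetical deprivation column that is missing for every patient.
deprivation = impute_median([None, None], fallback=3)
```

When every patient receives the same imputed value, a logistic regression model adds the same amount to each patient's linear predictor, so that variable cannot change how patients are ranked.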

Analysis

Random forest is an ensemble machine learning technique that combines several decision trees and has been demonstrated to yield an excellent classification performance [14]. Random forest models used are based on the randomForest package in R (version 3.3.1) [14]. Receiver operating characteristic (ROC) curves were made using the pROC package in R [15]. Comparison of ROC curves was done using a method developed by DeLong et al. (1988) [16]. Kaplan–Meier curves were made using the survival package in R [17]. Risk groups were based on partitioning patients in two groups according to survival prediction by the random forest model. Kaplan–Meier curves based on the TNM staging system are included for comparison. Missing values in the training cohort were imputed with a non-parametric missing value imputation method using random forest [18].
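For readers unfamiliar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The analysis in this study used the pROC package in R; this plain Python function and its example values are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of events and non-events."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event ranked above non-event
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for two survivors (0) and two early deaths (1).
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example three of the four event/non-event pairs are ordered correctly, giving an AUC of 0.75.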

Results

The early death rates were 11% for the Maastro clinic cohort, 6% for the Liverpool cohort, 10% for the Manchester cohort and 7% for the Michigan cohort. ROC curves for the Wallington model with imputed BMI and income deprivation variables are shown in Figure 1. AUCs ranged from 0.54 to 0.64. Combining the results for the Manchester, Michigan and Liverpool cohorts gave an overall AUC of 0.55 (95% CI: 0.46–0.63). As the confidence interval included 0.5, performance of the model did not exceed the chance level.

Figure 1.

ROC curves resulting from validation of the model by Wallington et al. [2] on our patient cohorts.

ROC curves for validation of the random forest model are shown in Figure 2. AUCs ranged from 0.62 to 0.71. Combining the results for the Manchester, Michigan and Liverpool cohorts gave an overall AUC of 0.66 (95% CI: 0.54–0.77). As the confidence interval did not include 0.5, performance of the model exceeded the chance level. The discriminative performance of the random forest model, compared to the Wallington model with imputed BMI and income deprivation variables, was significantly higher on the Michigan, Manchester and Liverpool validation cohorts combined (p<.05).
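The chance-level criterion used above (does the 95% confidence interval of the AUC include 0.5?) can be sketched as follows. The text does not state how the intervals were computed, so the percentile bootstrap used here is an assumption swapped in for illustration, and all function names are hypothetical.

```python
import random


def auc(labels, scores):
    """Rank-based AUC; ties between an event and a non-event count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one event and one non-event")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def exceeds_chance(ci):
    """True when the whole interval lies above 0.5."""
    return ci[0] > 0.5


# Intervals reported in the text for the two models.
wallington_above_chance = exceeds_chance((0.46, 0.63))
random_forest_above_chance = exceeds_chance((0.54, 0.77))
```

Applied to the reported intervals, the check is false for the Wallington model and true for the random forest model, matching the conclusions stated in the text.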

Figure 2.

ROC curves resulting from validation of the random forest model trained on the Maastro training cohort.

Splitting the validation cohort into two subgroups resulted in the identification of a high and a low survival chance group (Figure 3). Median survival in the high survival chance group was 620 days (95% CI: 521–788 days), versus 396 days (95% CI: 353–451 days) in the low survival chance group. The 4-month survival rate was 95% (95% CI: 92–99%) for the high survival chance group and 90% (95% CI: 86–94%) for the low survival chance group. A log-rank test on these curves indicated that they were significantly different (p<.001). Figure 4 shows the survival curves for splitting patients based on TNM stage. Median survival was 964 days (95% CI: 780–1708 days) in the Stage 2A group, 638 days (95% CI: 477–856 days) in the Stage 2B group, 583 days (95% CI: 493–737 days) in the Stage 3A group and 510 days (95% CI: 428–574 days) in the Stage 3B group. The 4-month survival rates were 97% (95% CI: 92–100%) for Stage 2A, 94% (95% CI: 89–99%) for Stage 2B, 89% (95% CI: 84–94%) for Stage 3A and 91% (95% CI: 89–95%) for Stage 3B. The Stage 2A group differed significantly from the other groups (p<.01); however, the other groups were not significantly different from one another (p = .78).
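The median survival and 4-month survival rates above come from Kaplan–Meier curves. The sketch below shows the product-limit estimator in plain Python; the study itself used the survival package in R, and the function names and follow-up data here are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct death time.

    times  -- follow-up time in days for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated probability of surviving beyond time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Hypothetical follow-up (days) for five patients; 0 marks a censored patient.
curve = kaplan_meier([30, 90, 200, 400, 600], [1, 0, 1, 1, 0])
```

With these five patients, `survival_at(curve, 122)` gives the estimated 4-month survival under the assumption that four months is 122 days, and `median_survival(curve)` gives the median survival time.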

Figure 3.

Survival curves for low and high-risk patient groups identified by the random forest model. The difference between these groups is significant (p<.001).

Figure 4.

Survival curves for patient groups identified by TNM staging. The difference between these groups is significant (p<.01).

Discussion

In this study, we developed a random forest model for early death prediction in non-resectable NSCLC patients. We have compared the model performance to the existing model from Wallington et al. [2] for early death prediction in cancer patients following SACT. Our model showed improved AUCs over the Wallington model and can be used to identify high- and low-risk groups. The model is available at www.predictcancer.org. There is a paucity of literature on post-RT early mortality relative to the surgical and systemic treatment literature on this topic [2,19]. Models for survival prediction in inoperable NSCLC have been developed previously [12,20]; however, these models focus on longer-term survival. Our work focuses on identifying patients at high risk of early death. The reduced version of the model developed by Wallington et al. [2] performs with AUCs above 0.5 (i.e., above the chance level) on the validation sets, although the confidence intervals include 0.5 (Figure 1). The AUCs of our random forest model are higher than those of the model of Wallington et al. [2]. Several explanations are plausible for this observation. Firstly, the Wallington model was developed for patients receiving SACT, who are therefore more likely to have advanced disease, whereas our model is based on patients with earlier-stage disease receiving RT or CRT. Secondly, different outcome measures are used. In our work, we have used 4-month mortality following the first day of radiotherapy as the outcome measure. We have used this outcome measure because it is close enough to treatment initiation to call the relevance of treatment into question, while at the same time allowing for sufficient events to occur in our cohorts to make a strong model. Wallington et al. [2] used 30-day mortality following the first day of SACT. Thirdly, the Wallington model uses a logistic regression modeling approach to make predictions. Fourthly, Wallington et al. [2] used a number of variables in their model that were not available in our dataset.
Specifically, these were income deprivation level and BMI (underweight, overweight, obese). The odds ratios for income deprivation were relatively small (1.198 for the most income deprived) and did not correlate significantly with the outcome. The odds ratio for BMI underweight was larger (1.752), although it also did not significantly correlate with the outcome (p = .28) [2]. Nonetheless, it is expected that the Wallington model would perform better if these variables were available in our cohorts. In this work, we have used a random forest model. The choice of random forest as the modeling strategy is based on experimental work in which it has been shown that random forests are superior to conventional regression models [21]. The model proposed in this study uses readily available clinical variables to make predictions while performing with high discriminative performance for this outcome. Dehing-Oberije et al. [5] have proposed a survival model that outperforms the TNM staging model by including tumor volume and number of positive lymph node stations to make predictions. In another study, it is shown that comprehensive quantification of the tumor using high numbers of imaging features may add value over clinical parameters in predicting outcome for NSCLC patients [8]. Therefore, inclusion of more tumor-specific biomarkers in the model may further enhance model performance. One of the weaknesses of our study is that we use as our primary outcome 4-month survival from the first day of RT administration. It may be more beneficial to take as an outcome measure 4-month survival from the first day of any treatment. Unfortunately, these dates were not always available in the four available cohorts. Another weakness of this work is that comorbidities were not included as a prognostic variable in the model, as this information was not available in our datasets. There is a significant impact of comorbidities on survival in lung cancer patients [4].
Another weakness of our work is that a rather large portion of the data is missing (for example, T stage is missing for 46% of patients in the Maastro cohort). This is a problem that arises when using data originating from routine clinical practice. As more complete datasets give us a better representation of the patients than imputed values, a more complete dataset is expected to produce better results. Another weakness of our study is that our training cohorts are very heterogeneous in terms of therapy and cancer stages, making it more difficult to capture the diversity of these patients in the model. The model presented in this study has several weaknesses (as mentioned above). Furthermore, the model leaves room for higher discriminative performance (AUCs range between 0.62 and 0.79). Therefore, we believe this model is not a substitute for clinical judgement by a physician. However, it can be used as a supplement to it. In future work, we intend to include variables that could potentially have higher prognostic value, such as radiomics and genomics features, to develop more sophisticated models [8,22,23]. Furthermore, as high volumes of patient data are required to capture the heterogeneity of the disease, we intend to use the distributed learning approach to gain access to further datasets [24-26]. The ultimate aim is to include these models in customized patient decision aids and use them for patient stratification in clinical trials [27,28]. We have developed a prognostic model predicting early mortality in NSCLC patients receiving CRT. The model performs above the chance level in three different validation cohorts (AUCs between 0.62 and 0.79). This model outperforms the previous model.

24 in total

Review 1. American Society of Clinical Oncology treatment of unresectable non-small-cell lung cancer guideline: update 2003.

Authors: David G Pfister; David H Johnson; Christopher G Azzoli; William Sause; Thomas J Smith; Sherman Baker; Jemi Olak; Diane Stover; John R Strawn; Andrew T Turrisi; Mark R Somerfield
Journal: J Clin Oncol Date: 2003-12-22 Impact factor: 44.544

2. MissForest--non-parametric missing value imputation for mixed-type data.

Authors: Daniel J Stekhoven; Peter Bühlmann
Journal: Bioinformatics Date: 2011-10-28 Impact factor: 6.937

3. Modern clinical research: How rapid learning health care and cohort multiple randomised clinical trials complement traditional evidence based medicine.

Authors: Philippe Lambin; Jaap Zindler; Ben Vanneste; Lien van de Voorde; Maria Jacobs; Daniëlle Eekers; Jurgen Peerlings; Bart Reymen; Ruben T H M Larue; Timo M Deist; Evelyn E C de Jong; Aniek J G Even; Adriana J Berlanga; Erik Roelofs; Qing Cheng; Sara Carvalho; Ralph T H Leijenaar; Catharina M L Zegers; Evert van Limbergen; Maaike Berbee; Wouter van Elmpt; Cary Oberije; Ruud Houben; Andre Dekker; Liesbeth Boersma; Frank Verhaegen; Geert Bosmans; Frank Hoebers; Kim Smits; Sean Walsh
Journal: Acta Oncol Date: 2015-09-23 Impact factor: 4.089

4. High-dose proton beam therapy for stage I non-small cell lung cancer: Clinical outcomes and prognostic factors.

Authors: Chiyoko Makita; Tatsuya Nakamura; Akinori Takada; Kanako Takayama; Motohisa Suzuki; Yusuke Azami; Takahiro Kato; Iwao Tsukiyama; Masato Hareyama; Yasuhiro Kikuchi; Takashi Daimon; Masaharu Hata; Tomio Inoue; Nobukazu Fuwa
Journal: Acta Oncol Date: 2014-10-07 Impact factor: 4.089

5. Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital - A real life proof of concept.

Authors: Arthur Jochems; Timo M Deist; Johan van Soest; Michael Eble; Paul Bulens; Philippe Coucke; Wim Dries; Philippe Lambin; Andre Dekker
Journal: Radiother Oncol Date: 2016-10-28 Impact factor: 6.280

6. Comorbidity and outcomes of concurrent chemo- and radiotherapy in limited disease small cell lung cancer.

Authors: Tarje Onsøien Halvorsen; Stein Sundstrøm; Øystein Fløtten; Odd T Brustugun; Paal Brunsvig; Ulf Aasebø; Roy M Bremnes; Stein Kaasa; Bjørn H Grønberg
Journal: Acta Oncol Date: 2016-08-23 Impact factor: 4.089

7. External validation of a prognostic CT-based radiomic signature in oropharyngeal squamous cell carcinoma.

Authors: Ralph T H Leijenaar; Sara Carvalho; Frank J P Hoebers; Hugo J W L Aerts; Wouter J C van Elmpt; Shao Hui Huang; Biu Chan; John N Waldron; Brian O'sullivan; Philippe Lambin
Journal: Acta Oncol Date: 2015-08-12 Impact factor: 4.089

8. 30-day mortality after systemic anticancer treatment for breast and lung cancer in England: a population-based, observational study.

Authors: Michael Wallington; Emma B Saxon; Martine Bomb; Rebecca Smittenaar; Matthew Wickenden; Sean McPhail; Jem Rashbass; David Chao; John Dewar; Denis Talbot; Michael Peake; Timothy Perren; Charles Wilson; David Dodwell
Journal: Lancet Oncol Date: 2016-08-30 Impact factor: 41.316

9. Developing and Validating a Survival Prediction Model for NSCLC Patients Through Distributed Learning Across 3 Countries.

Authors: Arthur Jochems; Timo M Deist; Issam El Naqa; Marc Kessler; Chuck Mayo; Jackson Reeves; Shruti Jolly; Martha Matuszak; Randall Ten Haken; Johan van Soest; Cary Oberije; Corinne Faivre-Finn; Gareth Price; Dirk de Ruysscher; Philippe Lambin; Andre Dekker
Journal: Int J Radiat Oncol Biol Phys Date: 2017-04-24 Impact factor: 7.038

10. Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT.

Authors: Timo M Deist; A Jochems; Johan van Soest; Georgi Nalbantov; Cary Oberije; Seán Walsh; Michael Eble; Paul Bulens; Philippe Coucke; Wim Dries; Andre Dekker; Philippe Lambin
Journal: Clin Transl Radiat Oncol Date: 2017-05-19
