Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data.

Lisa Neums, Richard Meier, Devin C Koestler, Jeffrey A Thompson.

Abstract

The accurate prediction of a cancer patient's risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, and molecular). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper, we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which, in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- and ridge-penalized Cox proportional hazards models, using three publicly available cancer data sets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all data sets, our approach, built on the training set, outperformed clinical data alone on the test set in terms of predictive power, with a concordance index (C-index) of 0.773 vs 0.755 for kidney cancer, 0.695 vs 0.664 for lung cancer and 0.648 vs 0.636 for bladder cancer.
Further, our method showed increased predictive performance compared to lasso-penalized models fit to both gene expression and clinical data, which had a C-index of 0.767, 0.677, and 0.645, and increased or comparable predictive performance compared to ridge models, which had a C-index of 0.773, 0.668 and 0.650, for the kidney, lung, and bladder cancer data sets, respectively. Therefore, our score for clinical independence improves prognostic performance compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
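
The comparisons above are all reported as a concordance index. As a minimal sketch of how such a value can be computed for right-censored survival data, the following pure-Python function implements Harrell's C. It is not the authors' code: the function name `concordance_index`, the handling of ties (tied survival times are skipped, tied risk scores count as half), and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 12, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.3, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(round(toy_c, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported values (for example 0.773 vs 0.755 for kidney cancer) should be read.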

Year:  2020        PMID: 31797615      PMCID: PMC6941850     

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928
