Julia Gilhodes (1), Christophe Zemmour (2), Soufiane Ajana (1), Alejandra Martinez (3), Jean-Pierre Delord (4), Eve Leconte (5), Jean-Marie Boher (2), Thomas Filleron (6)
1. Department of Biostatistics, Institut Claudius Regaud, IUCT-O, Toulouse, France.
2. Department of Clinical Research and Investigation, Biostatistics and Methodology Unit, Institut Paoli-Calmettes, Aix Marseille University, INSERM, IRD, SESSTIM, Marseille, France.
3. Department of Surgery, Institut Claudius Regaud, IUCT-O, Toulouse, France.
4. Department of Medical Oncology, Institut Claudius Regaud, IUCT-O, Toulouse, France.
5. TSE-R, Université de Toulouse, France.
6. Department of Biostatistics, Institut Claudius Regaud, IUCT-O, Toulouse, France. Electronic address: filleron.thomas@iuct-oncopole.fr.
Abstract
BACKGROUND: In the era of personalized medicine, it is essential to identify gene signatures for each event type in the context of competing risks in order to improve risk stratification and treatment strategy. Until recently, little attention was paid to the performance of high-dimensional selection methods in deriving molecular signatures in this context. In this paper, we investigate the performance of two selection methods developed for high-dimensional data with competing risks: random survival forest and a boosting approach for fitting proportional subdistribution hazards models.
METHODS: Using data from bladder cancer patients (GSE5479) and simulated datasets, the stability and prognostic performance of the two methods were evaluated using a resampling strategy. For each sample, the data set was split into 100 pairs of training and validation sets. Molecular signatures were developed in the training sets by the two selection methods and then applied to the corresponding validation sets.
RESULTS: Random survival forest and the boosting approach have comparable performance for the prediction of survival data, with few selected genes in common. Nevertheless, the resampling approach identifies many different sets of genes, and each gene occurs in only a very small fraction of the signatures. Also, the smaller the training sample size, the lower the stability of the signatures.
CONCLUSION: Random survival forest and the boosting approach give good predictive performance, but the gene signatures are very unstable. Further work is needed to propose adequate strategies for the analysis of high-dimensional data in the context of competing risks.
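The resampling strategy described in METHODS can be sketched in a few lines of Python. This is only an illustration of how selection frequencies across repeated training sets expose signature instability, not the authors' implementation. Everything specific here is an assumption: the simulated expression matrix, the continuous outcome (a stand-in for competing-risks time-to-event data), the 2/3 training fraction, the signature size `k`, and above all the selector `select_top_k`, which is a simple correlation filter and not random survival forest or boosting of subdistribution hazards models.

```python
import random
from collections import Counter


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(expr, outcome, rows, k):
    """Hypothetical stand-in selector: the k genes most correlated with the
    outcome among the training rows (NOT one of the paper's two methods)."""
    y = [outcome[i] for i in rows]
    scores = []
    for g in range(len(expr[0])):
        x = [expr[i][g] for i in rows]
        scores.append((abs_correlation(x, y), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}


def selection_frequencies(expr, outcome, n_splits=100, train_fraction=2 / 3,
                          k=5, seed=0):
    """Draw n_splits random training sets, derive a signature in each, and
    return the fraction of signatures in which each gene appears."""
    rng = random.Random(seed)
    n = len(expr)
    n_train = int(round(train_fraction * n))  # assumed split ratio
    counts = Counter()
    for _ in range(n_splits):
        rows = rng.sample(range(n), n_train)  # the rest would be validation
        counts.update(select_top_k(expr, outcome, rows, k))
    return {g: c / n_splits for g, c in counts.items()}


def simulate(n_patients=60, n_genes=50, n_informative=2, seed=1):
    """Simulated data (assumption): the first n_informative genes drive a
    noisy continuous outcome; all other genes are pure noise."""
    rng = random.Random(seed)
    expr = [[rng.gauss(0, 1) for _ in range(n_genes)]
            for _ in range(n_patients)]
    outcome = [sum(row[:n_informative]) + rng.gauss(0, 1) for row in expr]
    return expr, outcome


expr, outcome = simulate()
freq = selection_frequencies(expr, outcome)
for gene, f in sorted(freq.items(), key=lambda kv: -kv[1])[:5]:
    print(f"gene {gene}: selected in {f:.0%} of training sets")
```

Genes whose frequency stays well below 100% are selected in some training sets and dropped in others, which is the kind of instability reported in RESULTS. Lowering `n_patients` in `simulate` is one way to explore how a smaller training sample affects those frequencies in this toy setting.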