MOTIVATION: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having many more gene expression values than individuals, and the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some dimension reduction or parameter shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.

RESULTS: Statistical learning from subsets of the data should be repeated several times to obtain a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values perform much better than the simple variable selection methods. For our data sets, ridge regression has the best overall performance.

AVAILABILITY: Matlab and R code for the prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.
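Ridge regression within Cox's proportional hazards model is the method reported as best overall on these data sets. To illustrate what parameter shrinkage means in this setting, here is a minimal NumPy sketch, not the authors' Matlab or R code. It assumes the penalized objective is the Cox log partial likelihood minus (lam/2)·‖β‖², with the Breslow convention for tied times, maximized by Newton-Raphson with step halving. The function names `cox_ridge`, `score_and_information` and `log_partial_likelihood` and the parameter `lam` are hypothetical. Because the penalty makes the objective strictly concave, the estimate stays unique even when there are more genes than individuals (p > n).

```python
import numpy as np

# Illustrative sketch with hypothetical names; not the authors' implementation.


def log_partial_likelihood(X, time, event, beta):
    """Cox log partial likelihood, Breslow convention for ties (assumption)."""
    eta = X @ beta
    m = eta.max()
    # at_risk[i, j] = 1 when subject j is still at risk at time[i]
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    log_s0 = m + np.log(at_risk @ np.exp(eta - m))  # stable log-sum-exp per risk set
    return float(np.sum(event * (eta - log_s0)))


def score_and_information(X, time, event, beta):
    """Score vector and observed information of the log partial likelihood."""
    eta = X @ beta
    w = np.exp(eta - eta.max())  # rescaling cancels in every ratio below
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    s0 = at_risk @ w  # total weight in each risk set
    xbar = (at_risk @ (w[:, None] * X)) / s0[:, None]  # weighted mean covariates
    score = ((X - xbar) * event[:, None]).sum(axis=0)
    # Weight of subject j in the second-moment term, summed over event times
    c = w * (at_risk.T @ (event / s0))
    info = X.T @ (c[:, None] * X) - xbar.T @ (event[:, None] * xbar)
    return score, info


def cox_ridge(X, time, event, lam, max_iter=100, tol=1e-10):
    """Maximize log partial likelihood - (lam / 2) * ||beta||^2 (hypothetical helper)."""
    p = X.shape[1]
    beta = np.zeros(p)

    def objective(b):
        return log_partial_likelihood(X, time, event, b) - 0.5 * lam * (b @ b)

    for _ in range(max_iter):
        score, info = score_and_information(X, time, event, beta)
        # Newton direction for the penalized objective; info + lam*I is positive definite
        step = np.linalg.solve(info + lam * np.eye(p), score - lam * beta)
        # Step halving: shrink the step until the objective does not decrease
        t, current = 1.0, objective(beta)
        while objective(beta + t * step) < current and t > 1e-8:
            t /= 2
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta
```

With `X` an n × p expression matrix, `time` the observed (possibly censored) times and `event` a 0/1 indicator, `cox_ridge(X, time, event, lam=10.0)` returns the shrunken coefficients, and `X_new @ beta` gives risk scores for new individuals. Here `lam` is fixed by hand; in a real comparison it would typically be tuned on the training subset, for example by cross-validation. Larger values of `lam` shrink all coefficients toward zero without setting any of them exactly to zero, which distinguishes ridge from the lasso.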