Mingguang Shi¹, Bing Zhang. ¹Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.
Abstract
MOTIVATION: Gene expression profiling has shown great potential for outcome prediction in different types of cancer. Nevertheless, small sample size remains a bottleneck in obtaining robust and accurate classifiers. Traditional supervised learning techniques can only work with labeled data. Consequently, a large amount of microarray data that lack sufficient follow-up information is disregarded. To fully leverage all of the valuable data in public databases, we turned to a semi-supervised learning technique, low density separation (LDS).

RESULTS: Using the clinically important problem of predicting recurrence risk in colorectal cancer patients, we demonstrated that (i) semi-supervised classification improved prediction accuracy compared with the state-of-the-art supervised method, the support vector machine (SVM), (ii) the performance gain increased with the number of unlabeled samples, (iii) unlabeled data from different institutions could be employed after appropriate processing and (iv) the LDS method is robust with respect to the number of input features. To test the general applicability of this semi-supervised method, we further applied LDS to human breast cancer datasets and again observed superior performance. Our results demonstrate the great potential of semi-supervised learning for gene expression-based outcome prediction in cancer patients.

CONTACT: bing.zhang@vanderbilt.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
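The intuition behind using unlabeled samples in LDS is that the decision boundary should pass through a region where few samples lie, so unlabeled profiles help locate that region even without follow-up data. The sketch below illustrates only this low-density idea on a single feature; it is a toy, not the LDS method used in the paper, which operates on high-dimensional expression profiles and is considerably more involved. The function name `low_density_threshold` and the label convention (0 = left of the boundary, 1 = right) are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence, Tuple


def low_density_threshold(
    labeled: Sequence[Tuple[float, int]],
    unlabeled: Sequence[float],
) -> Optional[float]:
    """Toy 1-D illustration of the low-density separation idea (hypothetical helper).

    Candidate boundaries are midpoints between consecutive points, with labeled
    and unlabeled values pooled together. Among candidates that put every
    label-0 point on the left and every label-1 point on the right, return the
    one in the widest gap, i.e. the lowest-density region. Returns None if no
    candidate separates the labeled points.
    """
    points = sorted([x for x, _ in labeled] + list(unlabeled))
    best: Optional[float] = None
    best_gap = -1.0
    for left, right in zip(points, points[1:]):
        gap = right - left
        t = (left + right) / 2
        # A candidate is admissible only if it agrees with every known label.
        consistent = all((x > t) == (y == 1) for x, y in labeled)
        if consistent and gap > best_gap:
            best, best_gap = t, gap
    return best


if __name__ == "__main__":
    labeled = [(0.0, 0), (10.0, 1)]
    # Labeled data alone give the midpoint 5.0, which cuts through the
    # unlabeled cluster; the pooled data move the boundary into the gap.
    print(low_density_threshold(labeled, []))                         # 5.0
    print(low_density_threshold(labeled, [1, 2, 3, 4, 5, 6, 9]))      # 7.5
```

With only the two labeled samples, the boundary falls at the midpoint; adding unlabeled samples shifts it into the emptiest region that still respects the labels, which mirrors, in miniature, why performance could improve as more unlabeled samples were added.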