MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the Human Genome Project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. Previous studies have shown that MS data can be used to identify biomarkers that distinguish normal individuals from cancer patients. This progress is especially promising for detecting ovarian cancer at an early stage. Although various statistical methods have been used to identify biomarkers from MS data, these approaches have not been systematically compared in their ability to analyze such data.

RESULTS: We compare the performance of several classes of statistical methods for classifying cancer on the basis of MS spectra: linear discriminant analysis, quadratic discriminant analysis, the k-nearest neighbor classifier, bagging and boosting of classification trees, the support vector machine, and random forest (RF). The methods are applied to serum samples from ovarian cancer patients and controls collected at the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms the other methods in the analysis of MS data.
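The abstract does not describe how the classifiers were evaluated. As an illustrative sketch only, the snippet below estimates a leave-one-out misclassification rate for one of the listed methods, the k-nearest neighbor classifier, using the Python standard library. The toy data generator (`make_synthetic_spectra`), its single elevated "peak", and the use of leave-one-out cross-validation are hypothetical assumptions for illustration, not the study's data or procedure.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training spectra."""
    neighbors = sorted(
        zip((euclidean(x, query) for x in train_x), train_y),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def loocv_error(xs, ys, k=3):
    """Leave-one-out misclassification rate of k-NN on samples (xs, ys)."""
    errors = 0
    for i in range(len(xs)):
        # Hold out sample i, train on the rest, and check the prediction.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
            errors += 1
    return errors / len(xs)


def make_synthetic_spectra(n_per_class, n_peaks, shift, seed=0):
    """Hypothetical toy data: Gaussian noise, with peak 0 raised by `shift` in cases."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label in ("control", "cancer"):
        for _ in range(n_per_class):
            spectrum = [rng.gauss(0.0, 1.0) for _ in range(n_peaks)]
            if label == "cancer":
                spectrum[0] += shift  # assumed single discriminating peak
            xs.append(spectrum)
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_synthetic_spectra(n_per_class=30, n_peaks=5, shift=4.0)
    for k in (1, 3, 5):
        print(f"k={k}: LOOCV error = {loocv_error(xs, ys, k):.3f}")
```

Running the same hold-one-out loop for each of the other classifiers on the same samples would yield error estimates that can be compared side by side; the toy numbers printed here say nothing about the study's actual results.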