Masaru Ushijima, Satoshi Miyata, Shinto Eguchi, Masanori Kawakita, Masataka Yoshimoto, Takuji Iwase, Futoshi Akiyama, Goi Sakamoto, Koichi Nagasaki, Yoshio Miki, Tetsuo Noda, Yutaka Hoshikawa, Masaaki Matsuura.
Abstract
We propose a method for biomarker discovery from mass spectrometry data that improves the common peak approach developed by Fushiki et al. (BMC Bioinformatics, 7:358, 2006). The common peak method is a simple way to select, from all detected peaks, the meaningful peaks shared by many subjects, by combining standard spectrum alignment with kernel density estimates. The key idea of our method is to apply the common peak approach to each class label separately. As a result, the method retains more peaks that are informative for predicting class labels, while correctly removing minor peaks associated only with specific subjects. We used a SELDI-TOF MS data set from laser-microdissected cancer tissues to predict the effect of neoadjuvant therapy with an anticancer drug in breast cancer patients. The AdaBoost algorithm was used for pattern recognition, based on the set of candidate peaks selected by the proposed method. The analysis showed good performance in terms of test error when classifying the class labels from a feature vector of selected peak values.
Year: 2007 PMID: 19455248 PMCID: PMC2675857
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Pathologies of the 65 patients. The effects 0 and 1a are defined as nonresponders; 1b and 2 are defined as responders.
| Set | Effect 0 | Effect 1a | Effect 1b | Effect 2 | Total |
|---|---|---|---|---|---|
| Training | 4 | 28 | 11 | 7 | 50 |
| Test | 2 | 8 | 4 | 1 | 15 |
| Total | 6 | 36 | 15 | 8 | 65 |
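The caption's responder definition fixes the class sizes used for classification. As a small arithmetic check, assuming the four count columns are effects 0, 1a, 1b and 2 in that order (consistent with the totals), the sketch below derives responder and nonresponder counts per split; the variable and function names are illustrative only.

```python
# Class labels as defined in the table caption: effects 0 and 1a are
# nonresponders, effects 1b and 2 are responders.
RESPONDER_EFFECTS = {"1b", "2"}

# Patient counts per pathological effect, copied from the table
# (column order 0, 1a, 1b, 2 is assumed).
counts = {
    "Training": {"0": 4, "1a": 28, "1b": 11, "2": 7},
    "Test": {"0": 2, "1a": 8, "1b": 4, "2": 1},
}

def class_sizes(effect_counts):
    """Return (responders, nonresponders) for one data split."""
    responders = sum(n for effect, n in effect_counts.items() if effect in RESPONDER_EFFECTS)
    return responders, sum(effect_counts.values()) - responders

for split, effect_counts in counts.items():
    print(split, class_sizes(effect_counts))
# Training (18, 32)
# Test (5, 10)
```

The test split gives five responders and ten nonresponders, matching the counts quoted in the Figure 3 caption.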
Figure 1. Average of peaks (nonresponders). The dashed line denotes the threshold value h = 0.5.
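The abstract describes class-wise common-peak selection only at a high level. The sketch below is a simplified, hypothetical rendering of that idea, not the authors' implementation or Fushiki et al.'s procedure: it assumes peaks are already aligned, smooths each subject's peak list with height-1 Gaussian bumps (combined by max, so a subject contributes at most 1), averages these curves over the subjects of one class, and keeps local maxima at or above the threshold h = 0.5 shown in Figure 1. The function names, bandwidth, grid and toy peak lists are all illustrative assumptions.

```python
import math

def presence_curve(peaks, grid, bandwidth):
    """One subject's smoothed peak indicator: a height-1 Gaussian bump at each
    aligned peak, combined by max so the curve never exceeds 1."""
    return [max((math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in peaks), default=0.0)
            for x in grid]

def common_peaks(subject_peaks, grid, bandwidth, h=0.5):
    """Grid positions where the subject-averaged curve has a local maximum >= h,
    i.e. roughly where at least a fraction h of the subjects have a peak."""
    curves = [presence_curve(p, grid, bandwidth) for p in subject_peaks]
    avg = [sum(col) / len(curves) for col in zip(*curves)]
    return [grid[i] for i in range(1, len(grid) - 1)
            if avg[i] >= h and avg[i] >= avg[i - 1] and avg[i] > avg[i + 1]]

def classwise_common_peaks(subject_peaks, labels, grid, bandwidth, h=0.5):
    """Run the common-peak step within each class separately, then take the union."""
    per_class = {}
    for label in sorted(set(labels)):
        members = [p for p, y in zip(subject_peaks, labels) if y == label]
        per_class[label] = common_peaks(members, grid, bandwidth, h)
    union = sorted({x for found in per_class.values() for x in found})
    return per_class, union

# Hypothetical aligned peak lists (m/z); "R" = responder, "N" = nonresponder.
peaks = [[1360, 2842], [1362, 2844],                                    # R
         [1360, 2250], [1361, 2251], [1361, 2250, 2600], [1362, 2249]]  # N
labels = ["R", "R", "N", "N", "N", "N"]
grid = list(range(1000, 3001))

pooled = common_peaks(peaks, grid, bandwidth=2.0)
per_class, union = classwise_common_peaks(peaks, labels, grid, bandwidth=2.0)
print(pooled)  # [1361, 2250]
print(union)   # [1361, 2250, 2843]
```

In this toy data the peak near 2843 appears in both responders but in none of the four nonresponders, so pooling all six subjects drops it (average about 0.29), while the class-wise pass keeps it; the single-subject peak at 2600 is dropped either way. Peaks found in different classes at nearby positions would still need to be merged, which this sketch omits.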
Figure 2. Training error rate (solid line), CV error rate (dashed line), and test error rate (dotted line) by AdaBoost for the discrete and continuous covariates.
(a) discrete covariates
(b) continuous covariates
Figure 3. Prediction scores of the test data for discrete covariates using the six-peak model. Circles (○) indicate the five responders and crosses (×) indicate the ten nonresponders. Only one subject, with response 1b, is misclassified.
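The abstract states only that AdaBoost is used for pattern recognition. As a hedged illustration of how a prediction score like the one in Figure 3 can arise, the sketch below implements textbook discrete AdaBoost with decision stumps on binary (0/1) covariates and scores a sample by F(x) = Σ α_t h_t(x), whose sign gives the class. The stump weak learner, the +1/−1 coding of responders/nonresponders, the number of rounds and the toy data are our assumptions, not details taken from the paper.

```python
import math
from itertools import product

def stump(x, j, s):
    """Decision stump on a 0/1 feature: predict s if x[j] == 1, else -s."""
    return s if x[j] == 1 else -s

def train_adaboost(X, y, n_rounds):
    """Discrete AdaBoost with decision stumps; labels y must be -1 or +1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform initial sample weights
    model = []                             # list of (alpha, feature, polarity)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, j, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, s))
        # Misclassified samples (y * h = -1) are up-weighted, the rest down-weighted.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, s)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Prediction score F(x) = sum_t alpha_t * h_t(x); its sign is the class."""
    return sum(alpha * stump(x, j, s) for alpha, j, s in model)

def error_rate(model, X, y):
    wrong = sum(1 for xi, yi in zip(X, y) if (1 if score(model, xi) > 0 else -1) != yi)
    return wrong / len(y)

# Toy data: label +1 when at least two of three binary "peaks" are present.
X = list(product([0, 1], repeat=3))
y = [1 if sum(x) >= 2 else -1 for x in X]
model = train_adaboost(X, y, n_rounds=3)
print([error_rate(model[:t], X, y) for t in (1, 2, 3)])  # [0.25, 0.25, 0.0]
```

No single stump fits the toy majority rule (training error 0.25), but after three rounds the weighted vote reproduces it exactly. Plotting score(model, x) for held-out samples, with the sign separating the two classes, gives a display of the kind shown in Figure 3.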
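p-values of the single peak analysis. For each peak, the first two count columns split the 18 training responders and the last two split the 32 training nonresponders into two groups (A and B).
| Peak | Responders, A | Responders, B | Nonresponders, A | Nonresponders, B | p-value |
|---|---|---|---|---|---|
| 1361 | 10 | 8 | 32 | 0 | 8.15E-5 |
| 2250 | 17 | 1 | 13 | 19 | 2.00E-4 |
| 2621 | 16 | 2 | 12 | 20 | 8.11E-4 |
| 2843 | 3 | 15 | 29 | 3 | 2.28E-7 |
| 2989 | 17 | 1 | 12 | 20 | 7.37E-5 |
| 6557 | 9 | 9 | 30 | 2 | 6.84E-4 |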