Deukwoo Kwon, Mahlet G Tadesse, Naijun Sha, Ruth M Pfeiffer, Marina Vannucci.
Abstract
In recent years, there has been an increased interest in using protein mass spectroscopy to identify molecular markers that discriminate diseased from healthy individuals. Existing methods are tailored towards classifying observations into nominal categories. Sometimes, however, the outcome of interest may be measured on an ordered scale. Ignoring this natural ordering results in some loss of information. In this paper, we propose a Bayesian model for the analysis of mass spectrometry data with ordered outcome. The method provides a unified approach for identifying relevant markers and predicting class membership. This is accomplished by building a stochastic search variable selection method within an ordinal outcome model. We apply the methodology to mass spectrometry data on ovarian cancer cases and healthy individuals. We also utilize wavelet-based techniques to remove noise from the mass spectra prior to analysis. We identify protein markers associated with being healthy, having low grade ovarian cancer, or being a high grade case. For comparison, we repeated the analysis using conventional classification procedures and found improved predictive accuracy with our method.
Keywords: Markov chain Monte Carlo; mass spectrometry; ordinal outcome; variable selection
Year: 2007 PMID: 19455232 PMCID: PMC2675849
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
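The abstract describes an ordinal outcome model, but this record does not give its form. As a minimal sketch, assuming a latent-variable probit formulation with ordered cutpoints (an assumption, not something stated here), the probabilities of the three ordered classes (healthy, low grade, high grade) could be computed as below. The function names, the cutpoint values, and the linear predictor are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordinal_class_probs(linear_predictor, cutpoints):
    """Return P(y = j) for j = 0..len(cutpoints) under an ordinal probit model.

    Assumed model: a latent z ~ Normal(linear_predictor, 1) is observed only
    through the interval between consecutive ordered cutpoints it falls in.
    """
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in increasing order")
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - linear_predictor) for b in bounds]
    # Probability of each class is the mass between adjacent cutpoints.
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]


# Hypothetical values: two cutpoints give three ordered classes.
probs = ordinal_class_probs(0.0, [0.0, 1.0])
# probs[0] is 0.5 here because the first cutpoint equals the linear predictor.
```

Under this assumption, raising the linear predictor moves probability mass toward the higher categories, which is how the ordering of the outcome enters the model.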
Figure 1. Pre-processing and analysis of mass spectroscopy data.
Figure 2. Profiles of three mass spectra from each class.
Figure 3. Marginal posterior probabilities of inclusion for single peaks in each of the four MCMC chains.
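Figure 3 refers to marginal posterior probabilities of inclusion computed in each of four MCMC chains. A common way to estimate such a probability, sketched here under the assumption that every MCMC draw stores a 0/1 inclusion indicator for each peak, is the fraction of retained draws in which the peak is included. The data layout, the function name, and the toy draws are hypothetical and are not taken from the paper.

```python
def marginal_inclusion_probs(draws, burn_in=0):
    """Estimate per-variable inclusion probabilities from MCMC output.

    draws   : list of draws, each a list of 0/1 indicators (one per peak)
    burn_in : number of initial draws to discard
    """
    kept = draws[burn_in:]
    if not kept:
        raise ValueError("no draws left after burn-in")
    n_vars = len(kept[0])
    # Fraction of retained draws in which each variable is included.
    return [sum(d[j] for d in kept) / len(kept) for j in range(n_vars)]


# Toy output from two hypothetical chains over three peaks.
chain_a = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
chain_b = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0]]

per_chain = [marginal_inclusion_probs(c) for c in (chain_a, chain_b)]
# Pooling simply concatenates the retained draws of all chains.
pooled = marginal_inclusion_probs(chain_a + chain_b)
```

Comparing the per-chain estimates is a simple check that chains started from different points agree on which peaks matter before their output is pooled.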
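List of selected markers with median intensities for each group.
| m/z | Median intensity | Median intensity | Median intensity | | |
| --- | --- | --- | --- | --- | --- |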
| 3271 | 6.3926 | 3.2527 | 7.2641 | 0.4378 | * |
| 5743.5976 | 0.50085 | 0.49787 | 1.0655 | 0.2737 | |
| 6540.7 | 4.2977 | 3.1079 | 3.4107 | 0.3174 | |
| 7056.6 | 2.994 | 2.8814 | 2.6191 | 0.219 | |
| 7661.8 | 2.4026 | 1.7608 | 1.4349 | 1 | * |
| 8151.8 | 5.4292 | 5.6189 | 7.312 | 1 | * |
| 11514.5 | 0.17743 | 0.19802 | 0.85362 | 0.9956 | * |
| 11673.5 | 0.28511 | 0.31944 | 1.2318 | 0.9984 | * |
| 11724.752 | 0.601 | 0.56101 | 1.385 | 0.2497 | |
| 11903 | 0.2833 | 0.26976 | 0.73907 | 0.9998 | * |
| 13324.5 | 1.23 | 1.1709 | 1.2205 | 0.1224 | * |
Figure 4.Surface representation of spectra from patients in the three classes. Arrows at the top of the graph indicate peaks selected by our method.
Cross-validated misclassification rates with leave-one-out spectral data used for training classifiers.
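| Classifier | Overall rate | Errors (class of 10) | Errors (class of 11) | Errors (class of 29) |
| --- | --- | --- | --- | --- |
| MCMC pooled output | | | | |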
| Bayesian prediction | 0.38 | 2/10 | 8/11 | 9/29 |
| LDA | 0.66 | 6/10 | 8/11 | 19/29 |
| QDA (with PCA) | 0.52 | 3/10 | 8/11 | 15/29 |
| KNN (with | 0.48 | 5/10 | 8/11 | 11/29 |
| linear SVM | 0.54 | 2/10 | 10/11 | 15/29 |
| nonlinear SVM | 0.66 | 1/10 | 11/11 | 21/29 |
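The table above can be checked arithmetically. One reading, which the extracted text does not state but which the numbers support, is that the three fractions are per-class error counts over class sizes of 10, 11, and 29, and that the decimal is the pooled rate over all 50 spectra. The sketch below recomputes the pooled rate under that assumption. The dictionary layout and function name are hypothetical, and the KNN label is truncated in the source, so it is shortened here.

```python
# Rows copied from the table: reported overall rate, then (errors, class size).
reported = {
    "Bayesian prediction": (0.38, [(2, 10), (8, 11), (9, 29)]),
    "LDA": (0.66, [(6, 10), (8, 11), (19, 29)]),
    "QDA (with PCA)": (0.52, [(3, 10), (8, 11), (15, 29)]),
    "KNN": (0.48, [(5, 10), (8, 11), (11, 29)]),  # label truncated in source
    "linear SVM": (0.54, [(2, 10), (10, 11), (15, 29)]),
    "nonlinear SVM": (0.66, [(1, 10), (11, 11), (21, 29)]),
}


def pooled_rate(per_class):
    """Total misclassified observations divided by total observations."""
    errors = sum(e for e, _ in per_class)
    total = sum(n for _, n in per_class)
    return errors / total


recomputed = {name: pooled_rate(pc) for name, (_, pc) in reported.items()}
# Largest gap between the reported and the recomputed overall rate.
max_gap = max(abs(recomputed[m] - reported[m][0]) for m in reported)
```

Under this reading every recomputed rate matches the reported one to two decimals, for example (2 + 8 + 9) / 50 = 0.38 for the Bayesian prediction row.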