Upamanyu Banerjee, Ulisses M Braga-Neto.
Abstract
Proteomics promises to revolutionize cancer treatment and prevention by facilitating the discovery of molecular biomarkers. Progress has been impeded, however, by the small-sample, high-dimensional nature of proteomic data. We propose a Bayesian approach to address this issue in the classification of proteomic profiles generated by liquid chromatography-mass spectrometry (LC-MS). Our approach relies on a previously proposed model of the LC-MS experiment, as well as on the theory of the optimal Bayesian classifier (OBC). Computing the OBC requires combining a likelihood-free methodology called approximate Bayesian computation (ABC) with Markov chain Monte Carlo (MCMC) sampling. Numerical experiments using synthetic LC-MS data based on an actual human proteome indicate that the proposed ABC-MCMC classification rule outperforms classical methods such as support vector machines, linear discriminant analysis, and 3-nearest neighbor classification rules when the sample size is small or the number of selected proteins used to classify is large.
Keywords: Markov chain Monte Carlo; approximate Bayesian computation; liquid chromatography-mass spectrometry; optimal Bayesian classifier; proteomics
Year: 2017 PMID: 28096647 PMCID: PMC5224349 DOI: 10.4137/CIN.S30798
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
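The combination of ABC and MCMC named in the abstract can be illustrated with a generic toy sketch. This is not the paper's LC-MS model or its OBC computation: the Gaussian simulator, the sample-mean summary statistic, the tolerance `eps`, and all function and parameter names below are assumptions chosen only to show how a likelihood-free acceptance test can sit inside a Metropolis-style chain.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Toy simulator (assumption): n Gaussian draws with mean theta, unit
    variance; returns their sample mean as the summary statistic."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def abc_mcmc(observed_summary, n, steps, eps, step_size, bounds, rng):
    """Likelihood-free MCMC over theta with a uniform prior on `bounds`.

    A proposal is accepted only if it lies inside the prior support and
    data simulated from it land within `eps` of the observed summary.
    With a uniform prior and a symmetric random-walk proposal, the
    Metropolis-Hastings ratio is 1 inside the support, so no further
    accept/reject draw is needed.
    """
    lo, hi = bounds
    theta = observed_summary  # toy choice: start the chain near the data
    chain = []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        if lo <= proposal <= hi:
            simulated = simulate_summary(proposal, n, rng)
            if abs(simulated - observed_summary) <= eps:
                theta = proposal
        chain.append(theta)
    return chain


rng = random.Random(1)
chain = abc_mcmc(
    observed_summary=2.0, n=20, steps=5000,
    eps=0.2, step_size=0.5, bounds=(-10.0, 10.0), rng=rng,
)
posterior_mean = statistics.fmean(chain)
```

The likelihood is never evaluated; only forward simulation is required, which is why this style of sampler suits models whose likelihood is intractable.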
LC-MS parameters used in the experiment.
| PARAMETER | SYMBOL | VALUE/RANGE |
|---|---|---|
| Instrument response | | 5 |
| Noise severity | | 0.03, 3.6 |
| Peptide efficiency factor | e | [0.1–1] |
| Peptide detection algorithm | | 0, 0.0016, 2 |
Figure 1. Relationship among all parameters of the LC-MS model (see text).
Hyperparameter priors used in the experiment.
| PARAMETER | SYMBOL | RANGE/VALUE |
|---|---|---|
| Shape (gamma distribution) | | Unif(1.6, 2.4) |
| Scale (gamma distribution) | | Unif(800, 1200) |
| Coefficient of variation | φ | Unif(0.3, 0.5) |
| Fold change | | Unif(1.5, 1.6) |
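As a minimal sketch of how the uniform priors in the table above could be sampled, the snippet below draws one value per hyperparameter and then draws a few gamma variates from the sampled shape and scale. The dictionary keys and function names are hypothetical, and what the gamma-distributed quantity represents in the model is not stated in this record, so the second function is illustrative only.

```python
import random

# Prior ranges copied from the hyperparameter table; key names are hypothetical.
PRIORS = {
    "gamma_shape": (1.6, 2.4),
    "gamma_scale": (800.0, 1200.0),
    "coeff_variation": (0.3, 0.5),
    "fold_change": (1.5, 1.6),
}


def sample_hyperparameters(rng):
    """Draw one value per hyperparameter from its uniform prior."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIORS.items()}


def sample_gamma_values(rng, shape, scale, count):
    """Draw `count` gamma variates with the given shape and scale.

    Assumption: the record does not say which model quantity follows this
    gamma distribution, so the output is only a generic illustration.
    """
    return [rng.gammavariate(shape, scale) for _ in range(count)]


rng = random.Random(0)
theta = sample_hyperparameters(rng)
values = sample_gamma_values(rng, theta["gamma_shape"], theta["gamma_scale"], 5)
```

Note that `random.gammavariate(alpha, beta)` takes the shape first and the scale second, matching the two gamma rows of the table.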
Figure 2. Expected classification error rates for varying sample size and fixed number of selected proteins d = 8.
Figure 3. Expected classification error rates for varying number of selected proteins and fixed sample size n = 10 per class.
Figure 4. Expected classification error rates for fixed sample size n = 10 per class, fixed number of selected proteins d = 8, and varying coefficient of variation φ.
Figure 5. Expected classification error rates for fixed sample size n = 10 per class, fixed number of selected proteins d = 8, and varying lower bound a for the peptide efficiency factor e ~ Unif(a, 1).