Changyu Shen, Timothy E Breen, Lacey E Dobrolecki, C Max Schmidt, George W Sledge, Kathy D Miller, Robert J Hickey.
Abstract
INTRODUCTION: As an alternative to DNA microarrays, mass spectrometry-based analysis of proteomic patterns has shown great potential in cancer diagnosis. The eventual application of this technique in clinical settings depends on advances in the technology itself and on the maturity of the computational tools used to analyze the data. A number of computational algorithms built on different principles are available for classifying disease status from proteomic patterns. Nevertheless, few studies have examined how the performance of these approaches differs. In this report, we describe a comparative case study of classification accuracy for hepatocellular carcinoma based on the serum proteomic pattern generated by a Surface Enhanced Laser Desorption/Ionization (SELDI) mass spectrometer.
Keywords: SELDI; classification; hepatic carcinoma; random forest; support vector machine
Year: 2007 PMID: 19455252 PMCID: PMC2675858
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Illustration of baseline subtraction.
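The record does not describe how the baseline was estimated. As an illustration only, the sketch below assumes a moving-minimum baseline: the baseline at each point is taken as the lowest intensity within a fixed window of neighboring points (the `window` parameter and the function name are hypothetical), and that value is subtracted from the raw intensity. SELDI preprocessing software may use a different estimator.

```python
from typing import List, Sequence


def subtract_baseline(intensities: Sequence[float], window: int = 5) -> List[float]:
    """Remove a slowly varying baseline from one spectrum.

    Assumption (illustrative, not the authors' method): the baseline at
    index i is the minimum intensity within +/- `window` points of i.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    n = len(intensities)
    corrected: List[float] = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        baseline = min(intensities[lo:hi])
        # The local minimum never exceeds intensities[i], so the result is >= 0.
        corrected.append(intensities[i] - baseline)
    return corrected


# A hypothetical peak at index 3 sitting on a raised background.
print(subtract_baseline([10, 11, 12, 30, 12, 11, 10], window=2))
# prints [0, 1, 2, 19, 2, 1, 0]
```

A moving minimum is cheap and never produces negative intensities, but it tends to clip broad peaks if the window is too small, so the window width has to be matched to the expected peak width.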
Figure 2. Illustration of the effect of normalization and m/z adjustment. The suffix “_adj” denotes a spectrum after normalization and m/z adjustment.
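The record names normalization and m/z adjustment without giving details. A minimal sketch of the normalization step only, assuming total ion current (TIC) normalization, a common choice for SELDI data: each spectrum is rescaled so that its summed intensity equals the mean summed intensity across all spectra. The function name `tic_normalize` is hypothetical, and the m/z adjustment step is not covered here.

```python
from typing import List, Sequence


def tic_normalize(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each spectrum to the mean total ion current (TIC).

    Assumption: normalization is by TIC; the paper may have used another scheme.
    """
    if not spectra:
        return []
    totals = [sum(s) for s in spectra]
    if any(t <= 0 for t in totals):
        raise ValueError("every spectrum needs a positive total intensity")
    target = sum(totals) / len(totals)
    # Multiply each spectrum by target / its own TIC so all TICs become equal.
    return [[x * (target / t) for x in s] for s, t in zip(spectra, totals)]


# Hypothetical data: the second spectrum is the first at twice the overall signal.
print(tic_normalize([[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]))
# prints [[1.5, 1.5, 3.0], [1.5, 1.5, 3.0]]
```

Because only the overall scale differs between the two example spectra, they become identical after normalization, which is the intended effect: differences in total signal caused by sample loading or instrument drift are removed before peaks are compared.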
The 17 m/z values used for classification. AUC: area under the receiver operating characteristic (ROC) curve; LFDR: local false discovery rate.

| m/z | AUC (confidence interval) | LFDR |
|---|---|---|
| 644.72900 | 0.735 (0.624, 0.846) | 0.305 |
| 941.7701 | 0.717 (0.605, 0.828) | 0.331 |
| 950.5807 | 0.717 (0.604, 0.829) | 0.340 |
| 1384.6386 | 0.720 (0.606, 0.834) | 0.305 |
| 1453.8830 | 0.712 (0.600, 0.824) | 0.335 |
| 1732.8245 | 0.727 (0.619, 0.835) | 0.305 |
| 2458.0293 | 0.726 (0.614, 0.838) | 0.305 |
| 2458.5453 | 0.718 (0.608, 0.829) | 0.305 |
| 2786.9481 | 0.732 (0.629, 0.834) | 0.266 |
| 4054.9402 | 0.746 (0.632, 0.861) | 0.059 |
| 4064.9297 | 0.753 (0.641, 0.866) | 0.059 |
| 4070.9294 | 0.732 (0.610, 0.853) | 0.059 |
| 7549.8190 | 0.721 (0.610, 0.831) | 0.305 |
| 7550.7298 | 0.720 (0.609, 0.831) | 0.305 |
| 8004.6494 | 0.736 (0.621, 0.852) | 0.059 |
| 8061.0327 | 0.724 (0.607, 0.840) | 0.100 |
| 8080.8138 | 0.737 (0.619, 0.855) | 0.061 |
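Each AUC in the table summarizes how well a single m/z intensity separates cases from controls. A minimal sketch of the point estimate, assuming the usual empirical (Mann-Whitney) definition: the fraction of case/control pairs in which the case intensity is higher, with ties counted as one half. The record does not say how the confidence intervals or LFDR values were obtained, so neither is reproduced here; the function name `empirical_auc` is hypothetical.

```python
from typing import Sequence


def empirical_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC of one feature: Mann-Whitney U / (n_cases * n_controls).

    Assumes larger intensities point toward the case (cancer) group.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                score += 1.0  # case ranked above control
            elif x == y:
                score += 0.5  # ties split evenly
    return score / (len(cases) * len(controls))


# Hypothetical intensities at one m/z value.
print(empirical_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
# 8.5 of the 9 pairs favor the case, so the result is about 0.944
```

A value below 0.5 would mean the peak tends to be lower in cases; whether the authors reoriented such peaks before reporting is not stated in the record.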
Figure 3. Prediction errors using 30 features and 4-fold cross-validation (100 runs).
Figure 4. Prediction errors using 17 features and 4-fold cross-validation (100 runs).
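Figures 3 and 4 report errors from 100 repetitions of 4-fold cross-validation. A minimal sketch of that resampling scheme, using a 1-nearest-neighbor rule (the KNN1 classifier in the tables) as a stand-in learner: in each run the samples are shuffled and split into 4 folds, each fold is predicted by a model trained on the other three, and the run's error is the fraction of misclassified samples. Random, unstratified fold assignment and the function names are assumptions; the paper's exact fold construction is not given in this record.

```python
import random
from typing import List, Sequence


def knn1_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                 x: Sequence[float]) -> int:
    """Label of the closest training sample (squared Euclidean distance)."""
    def dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(train_x[i], x))
    return train_y[min(range(len(train_x)), key=dist)]


def repeated_cv_errors(xs: Sequence[Sequence[float]], ys: Sequence[int],
                       k: int = 4, runs: int = 100, seed: int = 0) -> List[float]:
    """Error rate of each of `runs` repetitions of k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(xs)
    errors: List[float] = []
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)                        # new random partition each run
        folds = [order[f::k] for f in range(k)]   # k folds of near-equal size
        wrong = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in order if i not in held_out]
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            wrong += sum(knn1_predict(tx, ty, xs[i]) != ys[i] for i in fold)
        errors.append(wrong / n)
    return errors


# Hypothetical toy data: two well-separated clusters of four samples each.
xs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
      [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
errs = repeated_cv_errors(xs, ys, k=4, runs=100)
print(sum(errs) / len(errs))  # mean error over the 100 runs (0.0 here)
```

Reshuffling before every run is what makes the 100 runs differ; the spread of `errs` across runs is the kind of variability the error plots display.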
Sensitivity and specificity averaged over 100 cross-validation runs.

| Classifier | Sensitivity | Specificity | Sensitivity | Specificity |
|---|---|---|---|---|
| KNN1 | 0.67 | 0.79 | 0.67 | 0.84 |
| KNN3 | 0.61 | 0.86 | 0.63 | 0.86 |
| LDA | 0.69 | 0.79 | 0.63 | 0.82 |
| LogitBoost | 0.62 | 0.87 | 0.63 | 0.85 |
| NNET | 0.57 | 0.86 | 0.46 | 0.86 |
| PAM | 0.64 | 0.89 | 0.57 | 0.92 |
| QDA | NA | NA | 0.18 | 0.98 |
| RF | 0.61 | 0.93 | 0.61 | 0.90 |
| SVM.lin | 0.78 | 0.83 | 0.58 | 0.84 |
| SVM.rad | 0.71 | 0.93 | 0.61 | 0.93 |
| Tree | 0.51 | 0.76 | 0.49 | 0.80 |
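The table averages sensitivity and specificity over 100 cross-validation runs. A minimal sketch of how the two rates can be computed from one run's predictions and then averaged across runs, assuming cancer is coded as the positive class `1`; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def sens_spec(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative samples")
    return tp / (tp + fn), tn / (tn + fp)


def average_sens_spec(
        runs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average per-run (y_true, y_pred) rates, e.g. over 100 cross-validation runs."""
    rates = [sens_spec(t, p) for t, p in runs]
    return mean(r[0] for r in rates), mean(r[1] for r in rates)


# One hypothetical run: 3 cancers (2 detected) and 4 controls (3 correctly cleared).
print(sens_spec([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
# about (0.667, 0.75)
```

Reporting both rates matters here because a classifier can reach a low overall error by favoring the larger class; for example, a high specificity paired with a low sensitivity indicates that most cancers are being missed.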
Figure 5. Estimates of the expected true errors and their standard errors, based on 17 features and 66 training samples.
Mean and standard deviation (SD) of classification error rates under randomly assigned disease status (100 random assignments; 10 runs of 4-fold cross-validation for each assignment).

| Classifier | Mean | SD | Mean | SD |
|---|---|---|---|---|
| KNN1 | 0.45 | 0.11 | 0.45 | 0.10 |
| KNN3 | 0.42 | 0.10 | 0.43 | 0.10 |
| LDA | 0.45 | 0.10 | 0.41 | 0.10 |
| LogitBoost | 0.50 | 0.10 | 0.43 | 0.10 |
| NNET | 0.43 | 0.11 | 0.41 | 0.11 |
| PAM | 0.36 | 0.09 | 0.35 | 0.09 |
| QDA | NA | NA | 0.34 | 0.06 |
| RF | 0.38 | 0.09 | 0.39 | 0.09 |
| SVM.lin | 0.43 | 0.10 | 0.39 | 0.09 |
| SVM.rad | 0.33 | 0.03 | 0.33 | 0.03 |
| Tree | 0.46 | 0.11 | 0.45 | 0.11 |
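The table above measures how each classifier performs when disease status carries no information, which gives a chance-level reference for the real error rates. A minimal sketch, assuming that "random assignment" means permuting the observed labels (so class sizes are preserved) and that the caller supplies an `error_fn` returning, for example, the mean error of 10 runs of 4-fold cross-validation for a given labeling. The permutation scheme, the names `random_label_errors` and `majority_vote_error`, and the toy data are all assumptions.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

ErrorFn = Callable[[Sequence[Sequence[float]], Sequence[int]], float]


def random_label_errors(xs: Sequence[Sequence[float]], ys: Sequence[int],
                        error_fn: ErrorFn, n_assignments: int = 100,
                        seed: int = 0) -> Tuple[float, float]:
    """Mean and SD of error_fn over randomly permuted disease labels."""
    if n_assignments < 2:
        raise ValueError("need at least two assignments to estimate an SD")
    rng = random.Random(seed)
    errors = []
    for _ in range(n_assignments):
        labels = list(ys)
        rng.shuffle(labels)  # assumption: permute labels, keeping class sizes
        errors.append(error_fn(xs, labels))
    return statistics.mean(errors), statistics.stdev(errors)


def majority_vote_error(xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    """Toy error_fn: always predict the most common label (ignores xs)."""
    majority = max(set(ys), key=list(ys).count)
    return sum(y != majority for y in ys) / len(ys)


# Hypothetical data with a 1:2 class ratio.
xs = [[float(i)] for i in range(9)]
ys = [1, 1, 1, 0, 0, 0, 0, 0, 0]
print(random_label_errors(xs, ys, majority_vote_error))
# about (0.333, 0.0): the majority rule scores the same under every permutation
```

The toy `majority_vote_error` shows why the null error need not be 0.5: with unequal class sizes, a rule that ignores the features can already score the minority-class fraction, so a real classifier's error is best judged against this permutation baseline rather than against 0.5.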