Abstract
BACKGROUND: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use has not yet been fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, principal component analysis (PCA) is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This may make high-performance proteomic pattern discovery difficult, because only features that interpret global data behavior are used to train a learning machine.
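The "global" behavior of PCA described above can be illustrated with a minimal sketch. It is not the method of the paper; it only computes ordinary PCA via SVD on synthetic data. The data sizes, the shared intensity trend, the five "local" features, and the helper name `pca_loadings` are all assumptions made for illustration.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Return (loadings, explained_variance_ratio) of ordinary PCA via SVD.

    Hypothetical helper for illustration; rows of the returned loadings
    are the principal axes, ordered by decreasing variance.
    """
    Xc = X - X.mean(axis=0)  # center each feature (m/z bin)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return Vt[:n_components], var[:n_components] / var.sum()

# Synthetic stand-in for a serum profile matrix (assumed toy sizes,
# far smaller than the real spectra).
rng = np.random.default_rng(0)
n_samples, n_features = 112, 500
X = rng.normal(size=(n_samples, n_features))

# Assumed global effect: one intensity shift shared by every feature.
X += rng.normal(size=(n_samples, 1)) * 3.0

# Assumed local effect: only features 10..14 differ between two classes.
y = np.arange(n_samples) % 2
X[:, 10:15] += y[:, None] * 1.0

loadings, ratio = pca_loadings(X, n_components=3)

# Fraction of features with a non-negligible weight in the first component.
dense_fraction = np.mean(np.abs(loadings[0]) > 1e-3)
# Share of the first component's squared loading mass on the local features.
local_share = np.sum(loadings[0, 10:15] ** 2)

print("explained variance ratio:", ratio)
print("fraction of features used by PC1:", dense_fraction)
print("PC1 loading mass on local features:", local_share)
```

In this toy setting the first component spreads its weight over nearly every feature, so the few class-specific features contribute only a small share of it. This is the kind of behavior the abstract refers to as a global feature selection mechanism.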
Year: 2010 PMID: 20122180 PMCID: PMC3009481 DOI: 10.1186/1471-2105-11-S1-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Four mass spectral serum profiles
| Dataset | Technology | #m/z | #Samples |
|---|---|---|---|
| Ovarian | SELDI-TOF low resolution | 15142 | 91 controls + 162 cancers |
| Ovarian-qaqc | SELDI-TOF high resolution | 15000 | 95 controls + 121 cancers |
| Liver | SELDI-QqTOF high resolution | 6107 | 176 controls + 181 cancers |
| Colorectal | MALDI-TOF high resolution | 16331 | 48 controls + 64 cancers |
Comparisons of the seven algorithms
| Average | Average | Average | |
|---|---|---|---|
| 98.94 ± 00.65 | 98.35 ± 01.03 | 99.98 ± 00.24 | |
| 99.79 ± 00.35 | 100.0 ± 00.00 | 99.42 ± 00.99 | |
| 99.50 ± 00.83 | 100.0 ± 00.00 | 98.63 ± 02.21 | |
| 99.96 ± 00.26 | 99.98 ± 00.17 | 99.93 ± 00.51 | |
| 97.41 ± 00.94 | 99.91 ± 00.31 | 92.92 ± 02.50 | |
| 96.53 ± 01.57 | 99.28 ± 01.34 | 91.67 ± 03.67 | |
| 99.67 ± 00.87 | 99.93 ± 00.38 | 99.21 ± 02.00 | |
| 99.99 ± 00.08 | 99.99 ± 00.12 | 100.0 ± 00.00 | |
| 98.70 ± 00.89 | 98.01 ± 01.94 | 99.27 ± 00.90 | |
| 98.91 ± 00.98 | 98.11 ± 02.25 | 99.57 ± 00.82 | |
| 96.57 ± 01.99 | 96.16 ± 03.52 | 96.97 ± 02.19 | |
| 97.12 ± 01.17 | 97.14 ± 02.16 | 97.94 ± 01.57 | |
| 88.69 ± 03.47 | 92.02 ± 05.01 | 86.24 ± 05.67 | |
| 90.87 ± 02.92 | 89.99 ± 04.68 | 91.82 ± 04.43 | |
| 97.69 ± 00.65 | 98.81 ± 01.68 | 96.99 ± 00.03 | |
| 97.56 ± 01.45 | 97.80 ± 02.46 | 97.41 ± 01.77 | |
| 96.02 ± 01.35 | 97.68 ± 01.71 | 94.40 ± 02.22 | |
| 97.25 ± 01.30 | 98.35 ± 01.67 | 96.20 ± 02.01 | |
| 91.78 ± 02.27 | 92.57 ± 03.84 | 91.04 ± 03.76 | |
| 90.21 ± 01.99 | 90.96 ± 03.69 | 89.57 ± 03.56 | |
| 77.76 ± 02.48 | 84.58 ± 05.14 | 71.30 ± 05.12 | |
| 76.48 ± 02.20 | 72.27 ± 04.60 | 80.80 ± 04.57 | |
| 90.08 ± 02.13 | 91.39 ± 03.53 | 88.87 ± 03.95 | |
| 86.61 ± 02.87 | 87.78 ± 04.55 | 86.50 ± 04.86 | |
| 98.14 ± 01.27 | 97.93 ± 02.32 | 98.35 ± 02.00 | |
| 97.15 ± 01.07 | 95.81 ± 02.78 | 98.18 ± 02.22 | |
| 96.55 ± 01.87 | 94.35 ± 03.47 | 98.26 ± 02.16 | |
| 93.21 ± 03.38 | 92.59 ± 04.68 | 93.89 ± 05.56 | |
| 94.73 ± 03.09 | 92.71 ± 06.14 | 96.49 ± 03.45 | |
| 95.05 ± 03.17 | 96.17 ± 02.91 | 94.28 ± 05.33 | |
| 94.05 ± 02.78 | 94.16 ± 03.74 | 94.01 ± 04.12 | |
| 96.04 ± 02.02 | 94.38 ± 03.66 | 97.39 ± 02.97 | |
Figure 1. Comparison of the performance of the five algorithms on four datasets: 'O1' (ovarian), 'O2' (ovarian-qaqc), 'L' (liver), and 'C' (colorectal). The NPCA-SVM algorithm demonstrated leading performance over the other four algorithms.
Biomarkers captured for the colorectal data
| m/z | Bayes factor | npca-coefficient | SVM ratio (%) |
|---|---|---|---|
| 969.1849 | 7.7881e-031 | -1.1205 | 0.9643 |
| 997.5336 | 1.4236e-026 | -1.1571 | 0.9018 |
| 1016.389 | 7.6644e-013 | 1.2773 | 0.8152 |
Figure 2. Visualization of the colorectal samples using three biomarkers. The 48 control and 64 cancer samples are visualized using the three biomarkers. The two types of samples show significantly different means and variations.
Figure 3. Visualization of the ovarian samples using three biomarkers. The 253 ovarian samples are visualized using the three biomarkers. The 91 control and 162 cancer samples are separated into two disjoint clusters.