Theodore Alexandrov, Jens Decker, Bart Mertens, Andre M Deelder, Rob A E M Tollenaar, Peter Maass, Herbert Thiele.
Abstract
MOTIVATION: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test and (iii) building and evaluating a support vector machine classifier by double cross-validation with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of spectra which discriminate cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls.
Year: 2009 PMID: 19244390 PMCID: PMC2647828 DOI: 10.1093/bioinformatics/btn662
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Patient characteristics and distribution across plates
| | Patients | Controls |
|---|---|---|
| Number | 64 | 48 |
| Mean age (range) | 67.2 (37–89) | 52.2 (29–78) |
| Male/female ratio | 35/29 | 21/27 |
| Number on plate 1/2/3 | 25/22/17 | 17/16/15 |
Fig. 1. Mean spectra for the cancer and control group (inverted, gray spectrum) after low-level processing.
Fig. 2. Scheme of calculation of APPDWT and CONVDWT coefficients, the bior3.7 wavelet and its scaling function. A_i (D_i) denote the approximation (detail) coefficients of the i-th level; n is the maximum level (10 in our case). Note that A_n belongs to both the APPDWT and CONVDWT coefficients.
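The two coefficient sets can be illustrated with a minimal sketch. It assumes that APPDWT collects the approximation coefficients of every level and that CONVDWT is the conventional decomposition (approximation of the last level plus the details of all levels); this reading is inferred from the caption and is not confirmed by the record. The Haar wavelet is used only as a simple stand-in for bior3.7, and the function names `haar_step`, `dwt_levels`, `appdwt` and `convdwt` are hypothetical.

```python
import math

def haar_step(signal):
    """One Haar DWT level: returns (approximation, detail).
    Haar is a stand-in here; the source uses the bior3.7 wavelet.
    A trailing odd sample, if any, is ignored."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[k] + signal[k + 1]) * s for k in pairs]
    detail = [(signal[k] - signal[k + 1]) * s for k in pairs]
    return approx, detail

def dwt_levels(signal, n_levels):
    """Return ([A_1, ..., A_n], [D_1, ..., D_n]) for n_levels levels."""
    approxs, details = [], []
    current = list(signal)
    for _ in range(n_levels):
        current, d = haar_step(current)
        approxs.append(current)
        details.append(d)
    return approxs, details

def appdwt(signal, n_levels):
    """Assumed APPDWT: approximation coefficients of all levels, A_1 first."""
    approxs, _ = dwt_levels(signal, n_levels)
    return [c for a in approxs for c in a]

def convdwt(signal, n_levels):
    """Assumed CONVDWT: A_n followed by D_n, ..., D_1."""
    approxs, details = dwt_levels(signal, n_levels)
    coeffs = list(approxs[-1])
    for d in reversed(details):
        coeffs.extend(d)
    return coeffs

# Toy "spectrum" of 16 samples decomposed to 3 levels (the source uses 10).
spectrum = [float(v) for v in range(16)]
app = appdwt(spectrum, 3)
conv = convdwt(spectrum, 3)
shared = app[-2:] == conv[:2]  # A_3 appears in both coefficient sets
```

In this toy case the last-level approximation is the only block common to both sets, which mirrors the remark in the caption.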
Fig. 3. The i-th step of double CV used for simultaneous parameters estimation and prediction assessment, i goes through all the given spectra.
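The double CV scheme can be sketched as follows: in the i-th outer step, spectrum i is held out, the parameters are chosen by an inner CV on the remaining spectra only, and the held-out spectrum is then predicted. This is a minimal sketch under stated assumptions: a k-nearest-neighbour classifier replaces the support vector machine to keep the example dependency-free, both loops are leave-one-out, and the names `knn_predict`, `loo_accuracy`, `double_cv` and the parameter grid are hypothetical.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (labels 0/1).
    k-NN is a stand-in for the SVM used in the source."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_x, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def loo_accuracy(xs, ys, k):
    """Inner leave-one-out accuracy for a given parameter k."""
    hits = 0
    for j in range(len(xs)):
        tr_x = xs[:j] + xs[j + 1:]
        tr_y = ys[:j] + ys[j + 1:]
        hits += int(knn_predict(tr_x, tr_y, xs[j], k) == ys[j])
    return hits / len(xs)

def double_cv(xs, ys, k_grid=(1, 3, 5)):
    """Outer leave-one-out loop; parameters are tuned without sample i."""
    predictions = []
    for i in range(len(xs)):
        tr_x = xs[:i] + xs[i + 1:]
        tr_y = ys[:i] + ys[i + 1:]
        # Inner CV sees only the training part of the outer split.
        best_k = max(k_grid, key=lambda k: loo_accuracy(tr_x, tr_y, k))
        predictions.append(knn_predict(tr_x, tr_y, xs[i], best_k))
    return predictions

# Toy data: two well-separated one-dimensional groups.
xs = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
preds = double_cv(xs, ys)
```

Because the held-out sample never enters parameter selection, the outer predictions give an assessment that is not biased by tuning.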
Double CV classification results for the detection of cancer using the proposed procedure
| DWT type | Test | TRR (BH) | TRR (Bonf) | TRR (BY) | Sensitivity (BH) | Sensitivity (Bonf) | Sensitivity (BY) | Specificity (BH) | Specificity (Bonf) | Specificity (BY) | Number of coefficients (BH) | Number of coefficients (Bonf) | Number of coefficients (BY) | Mean number of SV (BH) | Mean number of SV (Bonf) | Mean number of SV (BY) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| APPDWT | KS | 96.4 | 96.4 | 96.4 | 96.9 | 98.4 | 98.4 | 95.8 | 93.8 | 93.8 | 6219 | 1545 | 3392 | 55 | 44 | 49 |
| APPDWT | MW | 96.4 | 97.3 | 97.3 | 96.9 | 98.4 | 96.9 | 95.8 | 95.8 | 95.8 | 7068 | 1784 | 3920 | 56 | 43 | 52 |
| CONVDWT | KS | 95.5 | 95.5 | 96.4 | 96.9 | 96.9 | 96.9 | 93.8 | 93.8 | 95.8 | 603 | 299 | 419 | 66 | 54 | 49 |
| CONVDWT | MW | 94.6 | 94.6 | 96.4 | 95.3 | 96.9 | 96.9 | 93.8 | 91.7 | 95.8 | 613 | 303 | 438 | 81 | 61 | 55 |
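APPDWT and CONVDWT denote the two ways of calculating the wavelet coefficients. TRR is the total recognition rate; TRR, sensitivity and specificity are given in percent. KS and MW are the Kolmogorov-Smirnov and Mann-Whitney tests; BH, Bonf and BY are the Benjamini-Hochberg, Bonferroni and Benjamini-Yekutieli corrections. The column ‘Number of coefficients’ gives the number of discriminative coefficients selected. The column ‘Mean number of SV’ gives the mean number of support vectors, which characterizes the generalizability of the classifier: a large number indicates overfitting.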
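As a worked check of how the three rates relate, the sketch below recomputes one row of the table from confusion counts. The counts (63 of 64 patients and 46 of 48 controls classified correctly) are not stated in the record; they are inferred from the group sizes in the patient table and the APPDWT/MW/Bonf percentages, and TRR is assumed to be the overall fraction of correctly classified spectra. The function name `recognition_metrics` is hypothetical.

```python
def recognition_metrics(tp, fn, tn, fp):
    """Return (TRR, sensitivity, specificity) as fractions.
    tp/fn: cancer spectra classified correctly/incorrectly;
    tn/fp: control spectra classified correctly/incorrectly."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    trr = (tp + tn) / (tp + fn + tn + fp)  # assumed definition of TRR
    return trr, sensitivity, specificity

# Inferred counts for the APPDWT / MW / Bonf entry (an assumption).
trr, sens, spec = recognition_metrics(tp=63, fn=1, tn=46, fp=2)
rounded = (round(100 * trr, 1), round(100 * sens, 1), round(100 * spec, 1))
print(rounded)  # (97.3, 98.4, 95.8)
```

With 112 spectra in total, a single misclassification changes the TRR by about 0.9 percentage points, which explains the coarse steps between the table entries.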
Fig. 4. (a) The class-discriminating parts of spectra (MW, Bonf, APPDWT) against the mean spectra (control data are shifted in intensity for better viewing) in the interval 960–3500 Da (no discriminative peaks above 3500 Da). (b) Difference between these parts in the interval 1100–2100 Da. Positive (negative) peaks relate to the cancer (control) spectra.
Fig. 5. The P-values (MW, Bonf) for APPDWT coefficients plotted in log10-scale against the difference of the mean spectra for the wavelet scales L1–L9. (a) All P-values. (b) Only the 100 smallest P-values showing the most discriminative parts of the biomarker patterns.
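Selecting coefficients from per-coefficient P-values under the Bonf, BH and BY corrections can be sketched as below, using the standard textbook definitions of the Bonferroni, Benjamini-Hochberg and Benjamini-Yekutieli procedures. The significance level of 0.05, the toy P-values and the function names are assumptions; the record does not state the level used.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices with p <= alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]

def step_up(pvals, alpha, scale):
    """Generic step-up rule: keep the k smallest P-values, where k is the
    largest rank with p_(k) <= k * alpha / (m * scale)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * scale):
            cutoff = rank
    return sorted(order[:cutoff])

def benjamini_hochberg(pvals, alpha=0.05):
    return step_up(pvals, alpha, 1.0)

def benjamini_yekutieli(pvals, alpha=0.05):
    # BY divides the BH thresholds by the harmonic sum 1 + 1/2 + ... + 1/m.
    m = len(pvals)
    return step_up(pvals, alpha, sum(1.0 / j for j in range(1, m + 1)))

# Toy P-values, one per wavelet coefficient (illustrative only).
pvals = [0.001, 0.0064, 0.0065, 0.015, 0.04, 0.2, 0.5, 0.9]
sel_bonf = bonferroni(pvals)
sel_by = benjamini_yekutieli(pvals)
sel_bh = benjamini_hochberg(pvals)
```

On the toy data the selections are nested (Bonf within BY within BH), the same ordering as the numbers of selected coefficients in the results table.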
Indication of whether the peaks (denoted by their m/z-values, in Da) are reconstructed by the most significant APPDWT coefficients
| Number of most significant coefficients | 1208 | 1265 | 1352 | 1467 | 1692 | 1780 | 1867 | 2024 |
|---|---|---|---|---|---|---|---|---|
| 1 | | ✓ | | | | | | |
| 10 | | ✓ | ✓ | ✓ | | | | |
| 20 | | ✓ | ✓ | ✓ | | | | ✓ |
| 50 | ✓ | ✓ | ✓ | ✓ | | | | ✓ |
| 150 | ✓ | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ |
| 300 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Significance order | 5 | 1 | 2.5 | 2.5 | 8 | 6.5 | 6.5 | 4 |
| CPT significance order | 4 | 2 | 3 | 1 | 9 | 7 | 5 | 15 |
The MW test is used. The sign ‘✓’ indicates the presence of the peak in the corresponding features. This table shows in particular that the largest peaks are not the most statistically significant (discriminative) ones. The significance order summarizes the table. The ‘CPT significance order’ is calculated by the CPT software.
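One way to read the significance order is as an average rank of the smallest coefficient count at which each peak first appears, with tied peaks sharing the mean of their ranks (hence values such as 2.5 and 6.5). The sketch below illustrates this reading. The `first_seen` values are an assumption: they are chosen to be consistent with the number of ‘✓’ marks per row and with the reported order, not taken directly from the record, and the function name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order; ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2.0
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks

peaks = [1208, 1265, 1352, 1467, 1692, 1780, 1867, 2024]  # m/z in Da
# Assumed smallest number of top coefficients that reconstructs each peak.
first_seen = [50, 1, 10, 10, 300, 150, 150, 20]
order = dict(zip(peaks, average_ranks(first_seen)))
print(order)
```

Under these assumed values the computed ranks reproduce the ‘Significance order’ row of the table.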
Generalization properties of the most discriminative APPDWT coefficients considering the number of support vectors
| Number of coefficients | TRR | Mean number of SV |
|---|---|---|
| 1784 (Bonf) | 97.3 | 43.4 |
| 100 | 95.5 | 39.5 |
| 50 | 94.6 | 49.5 |
| 20 | 94.6 | 62.8 |
| 10 | 95.5 | 66.8 |
| 5 | 95.5 | 70.0 |
| 1 | 95.5 | 88.2 |
The MW test is used to calculate the P-values and to rank the coefficients. For abbreviations, see Table 2.