Abstract
BACKGROUND: The early detection of ovarian cancer has the potential to dramatically reduce mortality. Recently, the use of mass spectrometry to develop profiles of patient serum proteins, combined with advanced data mining algorithms, has been reported as a promising method to achieve this goal. In this report, we analyze the Ovarian Dataset 8-7-02, downloaded from the Clinical Proteomics Program Databank website, using nonparametric statistics and stepwise discriminant analysis to develop rules to diagnose patients, as well as to understand general patterns in the data that may guide future research.
Year: 2003 PMID: 12795817 PMCID: PMC165662 DOI: 10.1186/1471-2105-4-24
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Training Set Wilcoxon p-values by M/Z value. Wilcoxon p-values between normal and cancer members of the training set were calculated for every M/Z value. The Y axis represents the negative log (base 10) of the p-value. Panel A: The X axis represents M/Z values between 0 and 20,000. Panel B: The X axis represents M/Z values between 0 and 1,000.
The following control spectra were used for the initial training set: daf-0181 daf-0182 daf-0183 daf-0188 daf-0189 daf-0192 daf-0193 daf-0195 daf-0196 daf-0197 daf-0198 daf-0200 daf-0201 daf-0202 daf-0205 daf-0207 daf-0210 daf-0211 daf-0212 daf-0217 daf-0218 daf-0220 daf-0223 daf-0226 daf-0230 daf-0234 daf-0235 daf-0241 daf-0242 daf-0244 daf-0247 daf-0248 daf-0250 daf-0251 daf-0252 daf-0258 daf-0259 daf-0261 daf-0262 daf-0263 daf-0267 daf-0269 daf-0270 daf-0279 daf-0280
The following cancer spectra were used for the initial training set: daf-0601 daf-0602 daf-0606 daf-0608 daf-0609 daf-0612 daf-0617 daf-0618 daf-0619 daf-0620 daf-0621 daf-0625 daf-0627 daf-0632 daf-0633 daf-0634 daf-0635 daf-0636 daf-0643 daf-0644 daf-0651 daf-0654 daf-0655 daf-0656 daf-0657 daf-0661 daf-0662 daf-0663 daf-0664 daf-0666 daf-0667 daf-0669 daf-0673 daf-0675 daf-0682 daf-0683 daf-0687 daf-0688 daf-0691 daf-0692 daf-0697 daf-0698 daf-0701 daf-0702 daf-0703 daf-0705 daf-0706 daf-0707 daf-0708 daf-0709 daf-0716 daf-0718 daf-0719 daf-0726 daf-0727 daf-0729 daf-0731 daf-0733 daf-0735 daf-0737 daf-0740 daf-0744 daf-0751 daf-0752 daf-0753 daf-0754 daf-0755 daf-0756 daf-0757 daf-0758 daf-0760 daf-0761 daf-0762 daf-0764 daf-0768 daf-0770 daf-0773 daf-0776 daf-0778 daf-0780
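The per-M/Z calculation behind a plot of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the two-sided Wilcoxon rank-sum test with the normal approximation (tie-corrected, no continuity correction), and the function names and toy intensities are hypothetical.

```python
import math


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, assumed here)."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_a, n_b, n = len(group_a), len(group_b), len(pooled)
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n_a * (n + 1) / 2.0
    variance = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (rank_sum_a - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def neg_log10_p_by_mz(spectra_a, spectra_b):
    """Return -log10(p) for each M/Z position.

    spectra_a and spectra_b are lists of intensity lists, one per subject,
    all of the same length (one intensity per M/Z position).
    """
    scores = []
    for k in range(len(spectra_a[0])):
        p = rank_sum_p_value([s[k] for s in spectra_a], [s[k] for s in spectra_b])
        # Guard against underflow to 0.0 for extremely small p-values.
        scores.append(-math.log10(max(p, 1e-300)))
    return scores


# Hypothetical toy data: 4 control and 4 cancer subjects, 2 M/Z positions.
control = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [4.0, 5.5]]
cancer = [[5.0, 5.0], [6.0, 4.5], [7.0, 6.0], [8.0, 5.2]]
scores = neg_log10_p_by_mz(control, cancer)
```

In the toy data the first M/Z position separates the groups completely and the second does not, so the first score is the larger of the two.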
Development of Diagnostic Rule 1.
| Consecutive M/Z | M/Z Value | Bin Range | Wilcoxon p-value (Training Set) | Rule 1 | Wilcoxon p-value (Entire Data Set) |
| 6782 | 4003.645 | 6781–6783 | 1.8685E-12 | S | 8.98721E-27 |
| 2311 | 464.3617 | 2308–2314 | 3.6867E-17 | S | 6.76511E-34 |
| 2237 | 435.0751 | 2234–2242 | 6.822E-18 | S | 3.895E-37 |
| 2193 | 418.1136 | 2190–2196 | 5.6991E-18 | S | 3.91174E-34 |
| 2171 | 409.7594 | 2170–2172 | 3.6168E-12 | | 3.28383E-25 |
| 1736 | 261.8864 | 1734–1739 | 1.9206E-18 | S | 1.22566E-35 |
| 1681 | 245.53704 | 1673–1691 | 2.2891E-19 | S | 7.24111E-38 |
| 1600 | 222.4183 | 1598–1608 | 1.8911E-16 | | 2.01896E-33 |
| 1594 | 220.7513 | 1593–1596 | 2.3886E-14 | | 5.52587E-30 |
| 576 | 28.70048 | 562–582 | 6.82E-12 | | 2.60148E-24 |
| 544 | 25.58989 | 541–547 | 1.9179E-14 | | 8.67451E-30 |
| 181 | 2.7921478 | 181–183 | 1.2929E-13 | S | 1.21243E-27 |
Consecutive M/Z is the ordinal position of the M/Z value, from 1 to 15,154. The M/Z values were sorted by p-value and the 100 with the lowest p-values were arbitrarily selected. These M/Z values were then binned as described in the text, and the most significant consecutive M/Z value from each of the 12 bins was selected. M/Z values that were selected by the stepwise discriminant analysis are designated "S" in the Rule 1 column. The Wilcoxon p-values calculated from the training set (used to derive the rule) and from the entire data set are shown in their respective columns.
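The selection and binning step can be sketched as below. The binning rule itself is described in the text of the paper and is not reproduced here, so this sketch assumes a simple rule: selected positions whose consecutive M/Z indices differ by at most `max_gap` share a bin. The function name, `max_gap`, and the toy p-values are all hypothetical.

```python
def bin_and_select(p_by_index, top_n=100, max_gap=1):
    """Pick the top_n lowest p-values, bin nearby indices, keep the best per bin.

    p_by_index maps a consecutive M/Z index to its p-value.
    ASSUMPTION: indices no more than max_gap apart fall in the same bin.
    """
    # Keep the top_n indices with the smallest p-values, then order by index.
    top = sorted(p_by_index, key=p_by_index.get)[:top_n]
    top.sort()

    bins = []
    for idx in top:
        if bins and idx - bins[-1][-1] <= max_gap:
            bins[-1].append(idx)  # close enough to extend the current bin
        else:
            bins.append([idx])  # start a new bin

    # Report each bin's index range and its most significant member.
    return [
        {"range": (b[0], b[-1]), "best_index": min(b, key=p_by_index.get)}
        for b in bins
    ]


# Hypothetical toy p-values keyed by consecutive M/Z index.
toy_p = {10: 1e-5, 11: 1e-9, 12: 1e-7, 40: 1e-6, 41: 1e-8, 90: 0.4, 91: 0.2}
selected = bin_and_select(toy_p, top_n=5)
# selected == [{"range": (10, 12), "best_index": 11},
#              {"range": (40, 41), "best_index": 41}]
```

Keeping one representative per bin avoids handing the discriminant analysis several near-duplicate points from the same peak.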
Development of Diagnostic Rule 2.
| Consecutive M/Z | M/Z | Wilcoxon p-value | Rule 2 |
| 5534 | 2665.397 | 4.06E-09 | S |
| 6372 | 3534.072 | 1.26E-07 | |
| 6753 | 3969.469 | 4E-07 | S |
| 6772 | 3991.844 | 6.87E-09 | S |
| 6782 | 4003.645 | 1.87E-12 | S |
| 6802 | 4027.3 | 6.21E-10 | S |
| 6814 | 4041.526 | 1.86E-07 | |
| 6823 | 4052.213 | 8.33E-07 | |
| 6827 | 4056.967 | 9.38E-07 | S |
| 6836 | 4067.673 | 3.9E-07 | |
| 6852 | 4086.742 | 4.11E-07 | |
| 6934 | 4185.17 | 6.56E-09 | |
| 7383 | 4744.889 | 1.71E-07 | S |
| 7449 | 4830.124 | 2.89E-07 | |
| 7468 | 4854.802 | 8.22E-07 | |
| 7508 | 4906.962 | 5.45E-07 | |
| 7606 | 5035.93 | 1.41E-07 | |
| 8707 | 6599.823 | 4.96E-07 | |
| 8839 | 6801.495 | 6.46E-09 | S |
| 9439 | 7756.437 | 2.66E-07 | |
| 9457 | 7786.054 | 4.58E-07 | S |
| 9483 | 7828.934 | 6.23E-07 | |
| 9607 | 8035.058 | 4.94E-10 | |
| 9793 | 8349.266 | 2.04E-08 | S |
| 12910 | 14511.46 | 6.4E-07 | |
| 13036 | 14796.14 | 3.95E-07 | S |
| 13113 | 14971.48 | 1.76E-07 | |
| 13201 | 15173.13 | 7.88E-09 | |
| 13537 | 15955.47 | 3.13E-07 | S |
| 13987 | 17034.05 | 4.53E-09 | S |
M/Z values greater than 2,000 with p-values less than 10^-6 were selected. Consecutive M/Z is the ordinal position of the M/Z value, from 1 to 15,154. The selected M/Z values were then binned as described in the text, and the most significant consecutive M/Z value from each of the 30 bins was selected. M/Z values that were selected by the stepwise discriminant analysis are designated "S" in the rightmost column.
Classification rules
| Let |
| Let X be the vector that represents the intensities from a subject at the 7 M/Z values: |
| 2.7921478, 245.53704, 261.8864, 418.1136, 435.0751, 464.3617, 4003.645. Classify X into the cancer group if |
| ( |
| Otherwise classify X into the control group. |
| Let |
| Let X be the vector that represents the intensities from a subject at the following 13 M/Z values: 2665.397, 3969.469, 3991.844, 4003.645, 4027.3, 4056.967, 4744.889, 6801.495, 7756.437, 8349.266, 14796.14, 15955.47, 17034.05. |
| Classify X into cancer if |
| ( |
| Otherwise classify X into control. |
| Let |
| Let X be the column vector that represents the intensities from a subject at the following 7 M/Z values: |
| 418.1136, 435.0751, 464.3617, 4003.645, 4906.962, 6599.823, 6801.495. |
| Classify X into cancer if |
| ( |
| Otherwise classify X into control. |
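The general shape of such a rule can be illustrated with a short sketch. The coefficients and the exact form of the inequalities are not shown in this section, so the sketch assumes a linear discriminant score compared against a threshold. The weights, threshold, and function name are hypothetical placeholders, not the published rules.

```python
def classify(intensities, weights, threshold):
    """Classify one subject from intensities at the rule's M/Z values.

    ASSUMPTION: a linear score (weights . intensities) compared to a threshold.
    The weights and threshold are illustrative, not taken from the paper.
    """
    if len(intensities) != len(weights):
        raise ValueError("one weight is required per M/Z intensity")
    score = sum(w * x for w, x in zip(weights, intensities))
    return "cancer" if score > threshold else "control"


# Hypothetical two-feature example.
example_weights = [0.5, -0.25]
label_a = classify([2.0, 1.0], example_weights, threshold=0.0)  # score 0.75
label_b = classify([1.0, 2.0], example_weights, threshold=0.0)  # score 0.0
```

With these placeholder values the first subject is classified as cancer and the second as control, since a score equal to the threshold does not exceed it.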
Figure 2. Wilcoxon p-values by M/Z value for the entire dataset. Wilcoxon p-values between normal and cancer members of the entire dataset were calculated for every M/Z value. The Y axis is the negative log (base 10) of the p-value. Panel A: The X axis represents M/Z values from 0 to 20,000. Panel B: The X axis represents M/Z values from 0 to 1,000.
Clinical Proteomics Program Databank Example Ovarian Rule.
| Consecutive M/Z Bin | M/Z Value | Wilcoxon p-value (Entire Dataset) |
| 5632 | 2760.6685 | 0.239533474 |
| 15020 | 19643.409 | 0.521014657 |
| 2314 | 465.56916 | 2.49791E-28 |
| 8728 | 6631.7043 | 9.00537E-4 |
| 12704 | 14051.976 | 1.79156E-08 |
| 2238 | 435.46452 | 9.07922E-37 |
| 6339 | 3497.5508 | 1.40316E-06 |
Consecutive M/Z values and Wilcoxon p-values, based on the entire dataset, for the rule presented on the Clinical Proteomics Program Databank website.
Figure 3. Diagnostic value of low M/Z values. Scatter plots of the 162 cancer subjects versus the 91 normal subjects. Panel A represents 2 M/Z values from the Clinical Proteomics Program Databank, while Panels B and C are both derived from Rule 1. See text for details.
Normalization and Permutation Analysis of Low M/Z Values
| M/Z | Permuted P-Wilcoxon | Normalized P-Wilcoxon | Actual P-Wilcoxon |
| 2.792148 | 8.30646E-4 | 1.2124E-27 | 1.21243E-27 |
| 25.58989 | 0.210484E-4 | NT | 8.67451E-30 |
| 245.537 | 0.571414E-4 | NT | 7.24111E-38 |
| 418.1136 | 1.99231E-4 | NT | 3.91174E-34 |
| 435.0751 | 3.34994E-4 | 3.895E-37 | 3.895E-37 |
| 464.3617 | 0.263008E-4 | 6.7651E-34 | 6.76511E-34 |
| 4003.645 | 0.292578E-4 | NT | 8.98721E-27 |
| 15526.93 | 2.079E-4 | NT | 0.741858713 |
Normalization and permutation analysis were carried out on selected M/Z points. The permuted value is the smallest p-value among the 10,000 iterations run for each M/Z point tested. See text for details. NT = Not tested.
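A permutation check of this kind can be sketched as follows: the group labels are shuffled repeatedly, the Wilcoxon p-value is recomputed each time, and the smallest value seen is kept. This is an assumption-laden sketch rather than the authors' procedure. It uses the normal approximation to the rank-sum test, and the function names, seed, and toy data are hypothetical.

```python
import math
import random


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_a, n_b, n = len(group_a), len(group_b), len(pooled)
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # shared rank for a run of tied values
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    variance = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0
    z = (rank_sum_a - n_a * (n + 1) / 2.0) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def min_permuted_p(values, labels, n_iter=10000, seed=0):
    """Smallest rank-sum p-value observed over n_iter random label shuffles.

    values: intensities at one M/Z point, one per subject.
    labels: 0 (control) or 1 (cancer) for each subject.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    smallest = 1.0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break any real link between label and intensity
        group_a = [v for v, g in zip(values, shuffled) if g == 0]
        group_b = [v for v, g in zip(values, shuffled) if g == 1]
        smallest = min(smallest, rank_sum_p_value(group_a, group_b))
    return smallest


# Hypothetical toy data: 4 controls followed by 4 cancers.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed_p = rank_sum_p_value(toy_values[:4], toy_values[4:])
permuted_min = min_permuted_p(toy_values, toy_labels, n_iter=200, seed=1)
```

If the smallest permuted p-value is still far larger than the actual p-value, as in the table above for most of the points tested, the observed separation is unlikely to be a chance effect of group assignment.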
Figure 4. P-values and intensities for M/Z values between 410 and 470. The p-values and mean intensities of the cancer and control groups (entire set) for M/Z values between 410 and 470 are shown in panels A and B, respectively. Selected data points are labelled with their M/Z values directly to the right of the points; see text for details.