Kai-Lin Tang, Tong-Hua Li, Wen-Wei Xiong, Kai Chen.
Abstract
BACKGROUND: Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early-stage cancers. However, dimensionality reduction and classification are considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer.
Year: 2010 PMID: 20187963 PMCID: PMC2846906 DOI: 10.1186/1471-2105-11-109
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Mass spectra for set A (top) and set B (bottom). A mass spectrum for serum from a cancer patient, with m/z values on the horizontal axis and intensity values indicating the relative ion abundance on the vertical axis.
Figure 2. Statistical moment transformation for set A. After division of the data into several intervals, the statistical moments were calculated for one of the intervals (for example m/z 7600-8000) and used as new variables to represent the characteristics of the spectrum.
Figure 3. Results for different window widths. Ac/CC results for set A (top) and set B (bottom) for different window widths.
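The transformation described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `moment_features`, the use of population (rather than sample) moments, and the non-excess definition of kurtosis are all assumptions, since the captions do not specify them. The `window_width` parameter corresponds to the window width varied in Figure 3.

```python
import numpy as np

def moment_features(mz, intensity, window_width):
    """Summarise a spectrum by four statistical moments per m/z window.

    Hypothetical helper: returns one row per non-empty window with the
    columns [mean, variance, skewness, kurtosis] of the intensities.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Assign every point to a consecutive window of fixed width,
    # starting at the smallest m/z value.
    bins = np.floor((mz - mz.min()) / window_width).astype(int)

    rows = []
    for b in np.unique(bins):
        x = intensity[bins == b]
        mean = x.mean()
        var = x.var()  # population variance (assumption)
        if var == 0:
            # A flat window has no defined shape; report zeros.
            skew, kurt = 0.0, 0.0
        else:
            z = (x - mean) / np.sqrt(var)
            skew = np.mean(z ** 3)
            kurt = np.mean(z ** 4)  # non-excess kurtosis (assumption)
        rows.append([mean, var, skew, kurt])
    return np.array(rows)

# Toy spectrum with two windows of width 4 (values are illustrative only).
toy_mz = [0, 1, 2, 3, 4, 5, 6, 7]
toy_intensity = [1, 2, 3, 4, 2, 2, 2, 2]
features = moment_features(toy_mz, toy_intensity, window_width=4)
```

Each spectrum is thereby reduced from thousands of intensity values to four numbers per window, so a wider window gives fewer variables at the cost of coarser resolution along the m/z axis.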
Five-fold cross validation of statistical transformation
| | Moments selected (of mean, variance, skewness, kurtosis) | Ac | CC | Sn | Sp |
|---|---|---|---|---|---|
| Average | √ | 0.9917 | 0.9832 | 0.9860 | 0.9989 |
| SD | | 0.0020 | 0.0039 | 0.0040 | 0.0033 |
| Average | √ | 0.9639 | 0.9272 | 0.9587 | 0.9705 |
| SD | | 0.0078 | 0.0156 | 0.0117 | 0.0067 |
| Average | √ | 0.7796 | 0.5718 | 0.7231 | 0.8516 |
| SD | | 0.0195 | 0.0371 | 0.0273 | 0.0201 |
| Average | √ | 0.6986 | 0.3920 | 0.7124 | 0.6811 |
| SD | | 0.0187 | 0.0370 | 0.0258 | 0.0238 |
| Average | √ √ | 0.9907 | 0.9814 | 0.9851 | 0.9979 |
| SD | | 0.0031 | 0.0062 | 0.0052 | 0.0044 |
| Average | √ √ | 0.9921 | 0.9842 | 0.9868 | 0.9989 |
| SD | | 0.0022 | 0.0045 | 0.0043 | 0.0033 |
| Average | √ √ | 0.9898 | 0.9796 | 0.9826 | 0.9989 |
| SD | | 0.0029 | 0.0059 | 0.0047 | 0.0033 |
| Average | √ √ | 0.9625 | 0.9242 | 0.9595 | 0.9663 |
| SD | | 0.0046 | 0.0092 | 0.0082 | 0.0067 |
| Average | √ √ | 0.9569 | 0.9129 | 0.9562 | 0.9579 |
| SD | | 0.0069 | 0.0140 | 0.0088 | 0.0122 |
| Average | √ √ | 0.7722 | 0.5447 | 0.7587 | 0.7895 |
| SD | | 0.0132 | 0.0277 | 0.0145 | 0.0253 |
| Average | √ √ √ | 0.9921 | 0.9842 | 0.9876 | 0.9979 |
| SD | | 0.0022 | 0.0045 | 0.0044 | 0.0044 |
| Average | √ √ √ | 0.9921 | 0.9842 | 0.9876 | 0.9979 |
| SD | | 0.0022 | 0.0045 | 0.0044 | 0.0044 |
| Average | √ √ √ | 0.9917 | 0.9832 | 0.9868 | 0.9979 |
| SD | | 0.0020 | 0.0039 | 0.0043 | 0.0044 |
| Average | √ √ √ | 0.9903 | 0.9804 | 0.9843 | 0.9979 |
| SD | | 0.0034 | 0.0069 | 0.0047 | 0.0044 |
| Average | √ √ √ √ | 0.9935 | 0.9869 | 0.9950 | 0.9916 |
| SD | | 0.0037 | 0.0075 | 0.0055 | 0.0042 |
Ac, accuracy; CC, correlation coefficient; Sn, sensitivity; Sp, specificity.
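For readers who want to reproduce the four reported measures from a confusion matrix, a minimal sketch follows. It assumes that CC denotes the Matthews correlation coefficient, which is the usual meaning in binary classification but is not confirmed by the legend above; the function name `classification_metrics` and the argument names are hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute Ac, CC, Sn and Sp from binary confusion-matrix counts.

    tp/fn count cancer samples classified correctly/incorrectly,
    tn/fp count control samples classified correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    ac = (tp + tn) / total                      # accuracy
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity

    # CC taken to be the Matthews correlation coefficient (assumption).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Ac": ac, "CC": cc, "Sn": sn, "Sp": sp}

# Illustrative counts only; these are not taken from the paper.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, a correlation coefficient of this kind stays near zero for a classifier that guesses the majority class, which is why it is often reported alongside Ac when class sizes differ.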
Validation for set A
| Validation method | Ac | CC | Sn | Sp |
|---|---|---|---|---|
| Leave-one-out validation | 0.9815 | 0.9627 | 0.9752 | 0.9895 |
| Five-fold cross validation | 0.9810 | 0.9618 | 0.9752 | 0.9884 |
| SD | 0.0046 | 0.0092 | 0.0067 | 0.0060 |
| Five-fold proportional validation | 0.9833 | 0.9663 | 0.9715 | 0.9923 |
| SD | 0.0101 | 0.0203 | 0.0172 | 0.0172 |
Validation for set B
| Validation method | Ac | CC | Sn | Sp |
|---|---|---|---|---|
| Leave-one-out validation | 0.9907 | 0.9815 | 0.9833 | 1.0000 |
| Five-fold cross validation | 0.9704 | 0.9403 | 0.9636 | 0.9789 |
| SD | 0.0050 | 0.0100 | 0.0070 | 0.0050 |
| Five-fold proportional validation | 0.9829 | 0.9657 | 0.9787 | 0.9846 |
| SD | 0.0098 | 0.0201 | 0.0307 | 0.0211 |
Validation for set C
| Validation method | Ac | CC | Sn | Sp |
|---|---|---|---|---|
| Leave-one-out validation | 0.9907 | 0.9814 | 1.0000 | 0.9835 |
| Five-fold cross validation | 0.9904 | 0.9811 | 0.9847 | 0.9977 |
| SD | 0.0145 | 0.0284 | 0.0241 | 0.0113 |
| Five-fold proportional validation | 0.9900 | 0.9800 | 0.9835 | 0.9979 |
| SD | 0.0144 | 0.0286 | 0.0254 | 0.0102 |
Validation for set D
| Validation method | Ac | CC | Sn | Sp |
|---|---|---|---|---|
| Leave-one-out validation | 0.9954 | 0.9906 | 1.0000 | 0.9895 |
| Five-fold cross validation | 0.9935 | 0.9869 | 0.9950 | 0.9916 |
| SD | 0.0037 | 0.0075 | 0.0055 | 0.0042 |
| Five-fold proportional validation | 0.9909 | 0.9817 | 0.9937 | 0.9937 |
| SD | 0.0188 | 0.0376 | 0.0193 | 0.0193 |
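The leave-one-out and five-fold schemes in the validation tables above can be sketched as index splits. This is a generic illustration under stated assumptions: the helper names `k_fold_splits` and `summarize` are hypothetical, the SD is computed as a sample standard deviation (the tables do not say which form was used), and the "five-fold proportional validation" scheme is not modelled because its partitioning rule is not given here.

```python
import random
import statistics

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Setting k equal to n_samples gives leave-one-out validation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def summarize(scores):
    """Return the average and sample SD of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Five-fold split of 10 samples, and leave-one-out on 4 samples.
five_fold = list(k_fold_splits(10, k=5))
leave_one_out = list(k_fold_splits(4, k=4))
```

In a repeated five-fold experiment, each repetition would use a different `seed`, and `summarize` would be applied to the per-repetition scores to obtain the "Average" and "SD" rows.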
Results for the premalignant pancreatic cancer data
| | | Ac | CC | Sn | Sp |
|---|---|---|---|---|---|
| Without preprocessing | Average | 0.6917 | 0.3884 | 0.7075 | 0.6791 |
| | SD | 0.0609 | 0.1257 | 0.1137 | 0.0861 |
| After preprocessing | Average | 0.7697 | 0.5485 | 0.7008 | 0.8240 |
| | SD | 0.0307 | 0.0611 | 0.1517 | 0.1152 |