Vincent A Emanuele, Gitika Panicker, Brian M Gurbaxani, Jin-Mann S Lin, Elizabeth R Unger.
Abstract
The SELDI-TOF mass spectrometer's compact size and automated, high-throughput design have been attractive to clinical researchers, and the platform has seen steady use in biomarker studies. New algorithms and preprocessing pipelines have been developed to address reproducibility issues. Despite these, visual inspection of SELDI spectra preprocessed by the best algorithms still shows miscalled peaks and systematic sources of error. This suggests that problems with SELDI preprocessing remain. In this work, we study the preprocessing of SELDI in detail and introduce improvements. Many algorithms, including the vendor-supplied software, can identify peak clusters of specific mass (or m/z) in groups of spectra with high specificity and a low false discovery rate (FDR). However, they tend to underperform at estimating the exact prevalence and intensity of peaks in those clusters. Thus group differences that at first appear very strong turn out not to be significant after careful and laborious hand inspection of the spectra. Here we introduce a wavelet/neural network-based algorithm. It mimics the peak calls that a team of expert human users would make in each of the several hundred spectra of a typical SELDI clinical study. The wavelet denoising part of the algorithm optimally smooths the signal in each spectrum. It does so according to an improved suite of previously reported signal processing algorithms (the LibSELDI toolbox, under development). The neural network part of the algorithm combines those results with the raw signal and a training dataset of expertly called peaks. It then calls peaks in a test set of spectra with approximately 95% accuracy. The new method was applied to data collected from a study of cervical mucus for the early detection of cervical cancer in HPV-infected women. The method shows promise in addressing the ongoing SELDI reproducibility issues.
Year: 2012 PMID: 23152765 PMCID: PMC3495950 DOI: 10.1371/journal.pone.0048103
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
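The abstract describes wavelet denoising as the first stage of the pipeline. LibSELDI's modified Antoniadis-Sapatinas method is not specified in this summary. For illustration, the sketch below therefore swaps in a simpler, standard technique: an orthonormal Haar wavelet transform with soft thresholding at the Donoho-Johnstone universal threshold. The noise level is estimated from the finest-scale coefficients by the median absolute deviation. The function names, the choice of Haar wavelets and the default of five decomposition levels are assumptions. The signal length must be a power of two.

```python
import numpy as np


def haar_dwt(x, levels):
    """Orthonormal Haar DWT: returns (approximation, [detail_1, ..., detail_levels]),
    where detail_1 holds the finest-scale coefficients."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_idwt(approx, details):
    """Exact inverse of haar_dwt."""
    approx = np.asarray(approx, dtype=float)
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


def denoise(signal, levels=5):
    """Soft-threshold Haar wavelet denoising with the universal threshold.

    Stand-in technique for illustration, not LibSELDI's Antoniadis-Sapatinas method.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    if n < 2 ** levels or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2**levels")
    approx, details = haar_dwt(signal, levels)
    # Noise standard deviation from the finest details (median absolute deviation).
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(n))
    # Shrink every detail coefficient toward zero; the coarse approximation is kept.
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_idwt(approx, shrunk)


# Usage on a synthetic spectrum (one Gaussian peak plus white noise).
rng = np.random.default_rng(0)
x = np.arange(2 ** 12)
clean = 50.0 * np.exp(-0.5 * ((x - 2000) / 15.0) ** 2)
noisy = clean + rng.normal(0.0, 2.0, x.size)
smoothed = denoise(noisy)
```

A Haar basis keeps the sketch short. However, it rebuilds peaks from piecewise-constant blocks, which shows how a denoiser can locate a peak while still distorting its height.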
Figure 1. Quadratic detector response curve fit to data using the space between the peaks of QC spectra.
Processing time (in seconds) for denoising a single spectrum using different implementations of the modified Antoniadis-Sapatinas algorithm.
| Implementation | n = 2^10 | n = 2^11 | n = 2^12 | n = 2^13 | n = 2^14 | n = 2^15 |
|---|---|---|---|---|---|---|
| Original | 0.1235s | 0.5066s | 2.0476s | 7.8817s | 32.5274s | 1155.85s |
| Gen-Sparse | 0.0688s | 0.2392s | 0.9250s | 9.1161s | 23.3283s | 97.5260s |
| Offline-Sparse | 0.0089s | 0.0174s | 0.0407s | 0.0999s | 0.2177s | 1.7856s |
Figure 2. Frequency response of a 67th-order FIR filter designed for a flat passband, analogous to a Savitzky-Golay filter but with better high-frequency noise suppression.
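Figure 2 gives only the filter order and its design goal. The sketch below illustrates one way to build such a filter. It designs a lowpass FIR filter of order 67 (68 taps) by the windowed-sinc method with a Hamming window, then evaluates its magnitude response. The window, the 0.05 cycles/sample cutoff and the function names are assumptions, not the authors' design.

```python
import numpy as np


def design_lowpass_fir(num_taps, cutoff):
    """Windowed-sinc lowpass FIR filter (Hamming window), unit gain at DC.

    num_taps is order + 1; cutoff is in cycles/sample, 0 < cutoff < 0.5.
    """
    if not 0.0 < cutoff < 0.5:
        raise ValueError("cutoff must be in (0, 0.5) cycles/sample")
    # Sample positions symmetric about the filter center give linear phase.
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    ideal = 2.0 * cutoff * np.sinc(2.0 * cutoff * n)  # ideal lowpass impulse response
    taps = ideal * np.hamming(num_taps)               # taper to suppress ripple
    return taps / taps.sum()                           # normalize DC gain to 1


def magnitude_response(taps, num_points=1024):
    """|H(f)| on an evenly spaced grid from 0 to 0.5 cycles/sample."""
    freqs = np.linspace(0.0, 0.5, num_points)
    k = np.arange(taps.size)
    response = np.exp(-2j * np.pi * np.outer(freqs, k)) @ taps
    return freqs, np.abs(response)


# Order 67 -> 68 taps; the cutoff here is a hypothetical choice.
taps = design_lowpass_fir(num_taps=68, cutoff=0.05)
freqs, mag = magnitude_response(taps)
```

You can then smooth a spectrum with `np.convolve(spectrum, taps, mode="same")`. The tap count of 68 is even, so the linear-phase delay of 33.5 samples is not an integer and this output is offset by half a sample. An odd tap count avoids that.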
Figure 3. An example of a peak denoised with the FIR filter approach used for quantification.
This is a typical case in which Antoniadis-Sapatinas denoising would find the peak but distort its height.
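The group comparisons in this record are based on peak height and peak area. For a called peak, one simple way to compute both takes the maximum and the trapezoidal integral over a window around the peak's m/z. This assumes a baseline-subtracted, already smoothed spectrum. The 0.3% relative half-width and the function name are hypothetical, because the summary does not say how LibSELDI bounds a peak.

```python
import numpy as np


def quantify_peak(mz, intensity, center, rel_halfwidth=0.003):
    """Return (height, area) of one peak in a baseline-subtracted spectrum.

    The window is center * (1 +/- rel_halfwidth); rel_halfwidth is a
    hypothetical parameter, not a value taken from the paper.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    lo, hi = center * (1 - rel_halfwidth), center * (1 + rel_halfwidth)
    window = (mz >= lo) & (mz <= hi)
    if np.count_nonzero(window) < 2:
        raise ValueError("peak window contains fewer than two points")
    x, y = mz[window], intensity[window]
    height = y.max()
    area = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0  # trapezoidal rule
    return height, area


# Synthetic example: a Gaussian peak of height 2 and width (SD) 4 near m/z 5000.
mz = np.arange(4500.0, 5500.0, 1.0)
spectrum = 2.0 * np.exp(-0.5 * ((mz - 5000.3) / 4.0) ** 2)
height, area = quantify_peak(mz, spectrum, 5000.3)
```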
Figure 4. False-discovery rate and true-positive rate operating points showing various stages of improvement for LibSELDI.
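Figure 4 summarizes peak calling with the false-discovery rate (FDR) and true-positive rate (TPR). One minimal way to compute these against expert-called reference peaks pairs each called m/z with the nearest unused reference peak within a relative tolerance. TPR is then matched references over all references, and FDR is unmatched calls over all calls. The greedy matching rule, the 0.2% tolerance and the example m/z values are assumptions for illustration.

```python
def match_peaks(called, reference, rel_tol=0.002):
    """Greedily pair each called m/z with the nearest unused reference m/z."""
    available = sorted(reference)
    matches = []
    for mz in sorted(called):
        within = [ref for ref in available if abs(mz - ref) <= rel_tol * ref]
        if within:
            best = min(within, key=lambda ref: abs(mz - ref))
            available.remove(best)  # each reference peak can be matched once
            matches.append((mz, best))
    return matches


def fdr_tpr(called, reference, rel_tol=0.002):
    """False-discovery rate and true-positive rate of one peak-calling result."""
    true_positives = len(match_peaks(called, reference, rel_tol))
    fdr = (len(called) - true_positives) / len(called) if called else 0.0
    tpr = true_positives / len(reference) if reference else 0.0
    return fdr, tpr


# Hypothetical calls scored against hypothetical expert peaks.
reference = [3017.1, 3682.3, 5647.0, 8287.7]
called = [3018.0, 3682.0, 4500.0, 8290.0]
fdr, tpr = fdr_tpr(called, reference)
```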
Figure 5. LibSELDI/neural network strategy for analyzing clinical spectra.
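According to the abstract, the neural network combines the denoised results with the raw signal and a training set of expertly called peaks. This summary does not describe the network architecture or its input features, so the sketch below only illustrates the general idea. A one-hidden-layer network, trained by full-batch gradient descent on cross-entropy loss, labels candidate peaks from a small feature vector. The two synthetic features stand in for hypothetical inputs such as raw and denoised signal-to-noise. The layer size, learning rate and epoch count are arbitrary choices.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One hidden tanh layer, sigmoid output, mean binary cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
        g = (p - y) / len(y)                  # d(loss)/d(output logit)
        grad_w2, grad_b2 = h.T @ g, g.sum()
        g_h = np.outer(g, w2) * (1.0 - h ** 2)  # back through tanh
        grad_W1, grad_b1 = X.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return W1, b1, w2, b2


def predict(params, X):
    """Return 1 where the network calls a peak, 0 otherwise."""
    W1, b1, w2, b2 = params
    logits = np.tanh(X @ W1 + b1) @ w2 + b2
    return (logits > 0).astype(int)


rng = np.random.default_rng(1)


def make_features(n):
    """Synthetic stand-in features: n true peaks, then n noise candidates."""
    peaks = rng.normal(3.0, 1.0, (n, 2))
    noise = rng.normal(0.0, 1.0, (n, 2))
    return np.vstack([peaks, noise]), np.concatenate([np.ones(n), np.zeros(n)])


X_train, y_train = make_features(200)
X_test, y_test = make_features(200)
params = train_mlp(X_train, y_train)
accuracy = np.mean(predict(params, X_test) == y_test)
```

The accuracy on this well-separated synthetic data says nothing about the roughly 95% accuracy reported for expert-called SELDI peaks.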
CIN0 vs. CIN3 group tests (t-tests and Mann-Whitney U-tests) based on peak height and peak area measurements.
| t-test, peak area: cluster | CIN0 | CIN3 | t-test, peak height: cluster | CIN0 | CIN3 | U-test, peak area: cluster | CIN0 | CIN3 | U-test, peak height: cluster | CIN0 | CIN3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16054.5 Da | 180 (8.6) | 150 (1.1) | 11821.9 Da | 0.69 (0.12) | 0.27 (0.07) | 6912.8 Da | 52.1 (0.6) | 58.2 (3) | 11821.9 Da | 0.69 (0.1) | 0.27 (0.1) |
| 3017.1 Da | 26.5 (1.6) | 21.9 (0.6) | 16054.5 Da | 0.58 (0.09) | 0.26 (0.003) | 12680.8 Da | 116.8 (3.8) | 136.5 (8.0) | 8287.7 Da | 0.15 (0.03) | 0.07 (0.008) |
| 8287.7 Da | 66.6 (0.9) | 64.0 (0.4) | 3017.1 Da | 0.48 (0.12) | 0.13 (0.03) | 10427.1 Da | 83.3 (1.9) | 88.5 (2.8) | 3682.3 Da | 0.46 (0.06) | 0.89 (0.16) |
| 11821.9 Da | 115.6 (5.1) | 100.4 (3.8) | 3682.3 Da | 0.46 (0.06) | 0.89 (0.16) | 3682.3 Da | 34.2 (1) | 40.3 (2.3) | 12680.8 Da | 0.38 (0.07) | 0.78 (0.15) |
| 2887.8 Da | 24.6 (0.5) | 22.6 (0.6) | 2887.8 Da | 0.32 (0.04) | 0.18 (0.03) | 5647.0 Da | 45.2 (0.6) | 42.7 (0.8) | 6912.8 Da | 0.13 (0.01) | 0.38 (0.13) |
| 5849.7 Da | 49.7 (2.2) | 43.1 (1.0) | 8287.7 Da | 0.15 (0.03) | 0.07 (0.01) | | | | | | |
| 3682.3 Da | 34.2 (1.0) | 40.3 (2.3) | 5647.0 Da | 0.22 (0.02) | 0.14 (0.02) | | | | | | |
| | | | 12680.8 Da | 0.38 (0.07) | 0.78 (0.15) | | | | | | |
Showing quantification (SEM) for clusters with p-values less than 0.05.
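The table above pairs t-tests with Mann-Whitney U-tests. The U statistic can be computed from the midranks of the pooled CIN0 and CIN3 values. The sketch below does that and returns a two-sided p-value from the tie-corrected normal approximation, without continuity correction. This record does not state which variant the authors' software used (exact or approximate, with or without continuity correction). Small-sample p-values from this sketch may therefore differ from the table.

```python
import math


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value by the normal approximation)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))
```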
CIN0 vs. CIN3 prevalence differences scored using Fisher's exact test with mid-P correction.
| Cluster | Prevalence, CIN3 | Prevalence, CIN0 | p-value (mid-P) |
|---|---|---|---|
| 3017.1 Da | 0.375 | 0.938 | 0.007 |
Showing only clusters with p-values less than 0.05.
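The prevalence comparison uses Fisher's exact test with a mid-P correction. With the 2×2 margins fixed, the count in one cell follows a hypergeometric distribution. The mid-P value counts tables less probable than the observed one in full, and tables exactly as probable (including the observed one) at half weight. This is one of several two-sided conventions; another doubles the one-sided mid-P. The group sizes needed to reproduce the p-value above are not given here, so the function name and example counts are hypothetical.

```python
from math import comb


def fisher_mid_p(a, b, c, d):
    """Two-sided mid-P Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups and columns are peak present / absent. With the margins
    fixed, cell a follows a hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [comb(col1, x) * comb(n - col1, row1 - x) / total
             for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    eps = 1e-12 * p_obs  # tolerance for floating-point ties
    less = sum(p for p in probs if p < p_obs - eps)
    equal = sum(p for p in probs if abs(p - p_obs) <= eps)
    return less + 0.5 * equal
```

For example, `fisher_mid_p(3, 0, 0, 3)` gives 0.05. That is half the ordinary two-sided Fisher p-value of 0.1 for the same table.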
CIN0 vs. CIN3 group tests (Mann-Whitney) based on peak height measurements under stringent conditions (S/N 5/3) using Ciphergen Express.
| Cluster | Average peak height (SD), CIN3 | Average peak height (SD), CIN0 |
|---|---|---|
| 21663.4 | 0.843 (0.44) | 0.429 (0.41) |
| 3904.4 | 5.442 (7.75) | 12.201 (10.72) |
| 7910.3 | 1.676 (1.36) | 6.00 (5.63) |
| 17205.5 | 0.121 (0.13) | 0.570 (0.55) |
| 8378.3 | 0.747 (0.35) | 1.571 (1.26) |
| 17341.8 | 0.171 (0.21) | 0.57 (0.60) |
Showing only clusters with p-values less than 0.05.
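The Ciphergen Express comparison uses the setting "S/N 5/3". As an assumption, we read this as two thresholds. A first-pass signal-to-noise threshold of 5 seeds peak clusters, and a second-pass threshold of 3 completes them in the remaining spectra. The sketch below implements that two-pass idea with a robust noise estimate from first differences, a simple local-maximum rule and a hypothetical 0.3% m/z tolerance. It is not Ciphergen's algorithm.

```python
import numpy as np


def noise_level(intensity):
    """Robust noise SD from first differences: MAD / 0.6745 / sqrt(2)."""
    d = np.diff(np.asarray(intensity, dtype=float))
    return np.median(np.abs(d - np.median(d))) / 0.6745 / np.sqrt(2.0)


def local_maxima(intensity, noise, min_snr):
    """Indices of local maxima whose intensity / noise is at least min_snr."""
    y = np.asarray(intensity, dtype=float)
    is_max = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    idx = np.nonzero(is_max)[0] + 1
    return idx[y[idx] / noise >= min_snr]


def two_pass_clusters(mz, spectra, first_snr=5.0, second_snr=3.0, rel_tol=0.003):
    """Seed clusters with peaks at first_snr, then fill them at second_snr.

    Spectra are assumed baseline-subtracted and sampled on the shared axis mz.
    Returns a list of (cluster_center_mz, {spectrum_index: peak_mz}).
    """
    mz = np.asarray(mz, dtype=float)
    spectra = [np.asarray(s, dtype=float) for s in spectra]
    noises = [noise_level(s) for s in spectra]
    # Pass 1: strong peaks from any spectrum; chain neighbors within rel_tol.
    seeds = sorted(float(mz[i]) for s, nz in zip(spectra, noises)
                   for i in local_maxima(s, nz, first_snr))
    groups = []
    for m in seeds:
        if groups and m - groups[-1][-1] <= rel_tol * m:
            groups[-1].append(m)
        else:
            groups.append([m])
    centers = [sum(g) / len(g) for g in groups]
    # Pass 2: weaker peaks (second_snr) near each center complete the cluster.
    weak = [local_maxima(s, nz, second_snr) for s, nz in zip(spectra, noises)]
    clusters = []
    for c in centers:
        members = {}
        for k, (s, idx) in enumerate(zip(spectra, weak)):
            near = idx[np.abs(mz[idx] - c) <= rel_tol * c]
            if near.size:
                # Keep the tallest qualifying peak in this spectrum.
                members[k] = float(mz[near[np.argmax(s[near])]])
        clusters.append((c, members))
    return clusters


# Synthetic spectra: two strong peaks and one weak peak near m/z 3017.5.
rng = np.random.default_rng(0)
mz = np.arange(2000.0, 4000.0, 0.5)


def gaussian(center, amp):
    return amp * np.exp(-0.5 * ((mz - center) / 2.0) ** 2)


spectra = [gaussian(3017.0, 20.0) + rng.normal(0.0, 1.0, mz.size),
           gaussian(3018.0, 20.0) + rng.normal(0.0, 1.0, mz.size),
           gaussian(3017.5, 4.5) + rng.normal(0.0, 1.0, mz.size)]
clusters = two_pass_clusters(mz, spectra)
```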
Figure 6. An example peak cluster output from Ciphergen Express v3.5.