Niclas C Tan, Wayne G Fisher, Kevin P Rosenblatt, Harold R Garner.
Abstract
BACKGROUND: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty of reconciling the lists of discriminatory peaks that different laboratories identify for the same disease. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each method by applying them all to the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.
Year: 2009 PMID: 19442303 PMCID: PMC2688007 DOI: 10.1186/1471-2105-10-144
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Diagnostic accuracy measures from the default and AIC-optimal models in logistic regression
| Logistic Regression Model | Default | AIC-Optimal |
| Number of variables in final model | 2 | 5 |
| Goodness of Fit | 0.669 | 0.882 |
| AIC Statistic | 69.65 | 57.80 |
| Area under ROC curve | 0.793 | 0.910 |
| Sensitivity (%) | 63.16 | 57.89 |
| Specificity (%) | 82.22 | 95.56 |
| PPV (%) | 85.96 | 84.62 |
| NPV (%) | 84.09 | 84.31 |
| Percent accuracy (%) | 76.56 | 84.38 |
AIC = Akaike Information Criterion, ROC = Receiver Operating Characteristic, PPV = Positive Predictive Value, NPV = Negative Predictive Value
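The five diagnostic accuracy measures in this table all derive from the four cells of a confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `diagnostic_measures` and the counts are hypothetical: the source reports only percentages, and the counts below were chosen because they yield percentages matching the AIC-optimal column.

```python
def diagnostic_measures(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV, NPV and accuracy as percentages.

    tp/fn: diseased samples classified correctly/incorrectly
    tn/fp: control samples classified correctly/incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100.0 * tp / (tp + fn),   # true positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true negative rate
        "ppv": 100.0 * tp / (tp + fp),           # precision of positive calls
        "npv": 100.0 * tn / (tn + fn),           # precision of negative calls
        "accuracy": 100.0 * (tp + tn) / total,   # overall fraction correct
    }


# Hypothetical counts (an assumption, not given in the source).
measures = diagnostic_measures(tp=11, fn=8, tn=43, fp=2)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV depend on the ratio of diseased to control samples in the data set, so they do not transfer directly to a population with a different disease prevalence.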
Discriminatory mass peaks from AIC-optimal models in logistic regression analysis on narcolepsy data set
| Logistic Regression Model | 1 | 2 | 3 | 4 | Pooled |
| Number of variables | 5 | 1 | 2 | 2 | 9 |
| Mass peaks (m/z) | 1431.80 | 1809.98 | 1809.98 | 1722.93 | 1431.80 |
| | 1839.98 | | 3826.00 | 1740.94 | 1722.93 |
| | 2225.14 | | | | 1740.94 |
| | 3986.99 | | | | 1809.98 |
| | 5857.74 | | | | 1839.98 |
| | | | | | 2225.14 |
| | | | | | 3826.00 |
| | | | | | 3986.99 |
| | | | | | 5857.74 |
Figure 1. Tree diagram of the best model from CART analysis.
Diagnostic accuracy measures of optimal CART model
| CART | Optimal Model |
| Number of variables in final model | 6 |
| Mass peaks (m/z) | 1014.32, 1690.96, 1809.98, 3043.43, 3826.00, 3986.99 |
| Area under ROC curve | 0.984 |
| Sensitivity (%) | 78.95 |
| Specificity (%) | 88.89 |
| PPV (%) | 75.00 |
| NPV (%) | 90.91 |
| Percent accuracy (%) | 85.94 |
ROC = Receiver Operating Characteristic, PPV = Positive Predictive Value, NPV = Negative Predictive Value
Diagnostic accuracy measures of optimal t-test model
| T-test | Optimal Model |
| Number of variables in final model | 3 |
| Mass peaks (m/z) | 1740.94, 3598.07, 5078.90 |
| Sensitivity (%) | 33.30 |
| Specificity (%) | 84.20 |
| PPV (%) | 50.00 |
| NPV (%) | 72.70 |
| Percent accuracy (%) | 67.90 |
All differential peaks have a p-value less than 0.05. PPV = Positive Predictive Value, NPV = Negative Predictive Value
Statistically differential peaks from UPGMA model
| Mass peak (m/z) | Fold change | p-value |
| 1781.99 | 1.13 | 0.046 |
| 1809.98 | 1.15 | 0.007 |
| 3826.00 | 1.13 | 0.017 |
Peaks are presented with their respective fold change and p-value.
Diagnostic accuracy measures of optimal UPGMA model
| UPGMA | Optimal Model |
| Number of variables in final model | 3 |
| Mass peaks (m/z) | 1781.99, 1809.98, 3826.00 |
| Area under ROC curve | 0.788 |
| Sensitivity (%) | 36.84 |
| Specificity (%) | 95.56 |
| PPV (%) | 77.78 |
| NPV (%) | 78.18 |
| Percent accuracy (%) | 78.13 |
ROC = Receiver Operating Characteristic, PPV = Positive Predictive Value, NPV = Negative Predictive Value
Diagnostic accuracy measures of consensus model
| Consensus Model | |
| Number of variables in final model | 2 |
| Mass peaks (m/z) | 1809.98, 3826.00 |
| Area under ROC curve | 0.793 |
| Sensitivity (%) | 63.16 |
| Specificity (%) | 82.22 |
| PPV (%) | 85.96 |
| NPV (%) | 84.09 |
| Percent accuracy (%) | 76.56 |
Consensus peaks included in this model are peaks selected as statistically differential across three of the four algorithms. ROC = Receiver Operating Characteristic, PPV = Positive Predictive Value, NPV = Negative Predictive Value
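The consensus rule stated above (a peak must be selected by three of the four algorithms) can be sketched as a simple vote count. This is an illustrative sketch, not the authors' implementation. The peak lists are copied from the tables in this record; using the pooled logistic regression list as that method's vote, and the names `peaks_by_method` and `consensus_peaks`, are assumptions.

```python
from collections import Counter

# Peak lists (m/z) as reported in the tables of this record.
peaks_by_method = {
    # Assumption: the pooled list represents logistic regression.
    "logistic_regression": [1431.80, 1722.93, 1740.94, 1809.98, 1839.98,
                            2225.14, 3826.00, 3986.99, 5857.74],
    "cart": [1014.32, 1690.96, 1809.98, 3043.43, 3826.00, 3986.99],
    "t_test": [1740.94, 3598.07, 5078.90],
    "upgma": [1781.99, 1809.98, 3826.00],
}


def consensus_peaks(peaks_by_method, min_methods=3):
    """Return peaks selected by at least `min_methods` of the methods."""
    votes = Counter()
    for peaks in peaks_by_method.values():
        votes.update(set(peaks))  # each method votes once per peak
    return sorted(p for p, n in votes.items() if n >= min_methods)


print(consensus_peaks(peaks_by_method))  # [1809.98, 3826.0]
```

The sketch matches peaks by exact m/z value, which works here because the tables report identical values across methods. Data from different instruments or laboratories would likely need matching within a mass tolerance instead.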
Figure 2. Diagnostic measures comparison of the consensus model with the best model from each of the four statistical approaches. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and percent accuracy are plotted for the best model from each statistical approach. When the five parameters are evaluated collectively, the model with the best diagnostic performance is CART, followed by the Consensus model and Logistic Regression. The T-test and UPGMA models have a lower average diagnostic performance, as is evident from the greater spread of their diagnostic accuracy measures.
Figure 3. Schematic diagram of the multi-statistical workflow to discover consensus biomarker peaks.
Figure 4. Peak alignment across spectra. Representative spectra from the high performance prOTOF mass spectrometer are shown before (top panel, 84 spectra) and after (bottom panel, 68 spectra) data preprocessing. Data preprocessing entailed removal of outlying spectra and normalization of signal intensity to the average TIC. The insets show an enlarged view of the peak at m/z = 2021, which has a mass accuracy of < 10 ppm, rendering spectrum-to-spectrum alignment unnecessary. Each spectrum is plotted in a different, arbitrarily assigned color.
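The normalization step mentioned in this caption can be illustrated with a short sketch. It assumes that "normalization to the average TIC" means rescaling each spectrum so that its total ion current (the sum of its intensities) equals the mean TIC across all retained spectra. The source does not spell out the exact procedure, and the function name and toy intensities below are hypothetical.

```python
def normalize_to_mean_tic(spectra):
    """Rescale each spectrum so its summed intensity equals the mean TIC.

    `spectra` is a list of intensity lists, one list per spectrum.
    """
    tics = [sum(spectrum) for spectrum in spectra]
    if any(tic <= 0 for tic in tics):
        raise ValueError("every spectrum needs a positive total ion current")
    mean_tic = sum(tics) / len(tics)
    return [
        [intensity * mean_tic / tic for intensity in spectrum]
        for spectrum, tic in zip(spectra, tics)
    ]


# Toy intensities (hypothetical): the second spectrum is twice as intense.
raw = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
normalized = normalize_to_mean_tic(raw)
print(normalized)  # both spectra now sum to the mean TIC of 9.0
```

After this rescaling, spectra that differ only in overall signal strength become directly comparable, so intensity differences at a given peak are less likely to reflect sample loading or instrument drift.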