Milos Hauskrecht, Richard Pelikan.
Abstract
High-throughput biological assays such as microarrays and mass spectrometry (MS) have emerged as potential clinical tools for disease detection. Multiple potential biomarkers can be rapidly and cheaply evaluated for a large number of patients. Typical research and evaluation studies in these fields have focused primarily on data that were generated from samples in a single data-generation session. However, in the clinical setting, new patients screened by the technology will arrive at different times, and data will unavoidably come from multiple data-generation sessions. Understanding and assessing multi-session effects on the data generated by the technology is critical for its application to clinical practice. This paper proposes a methodology for measuring and testing the reproducibility of various aspects of high-throughput data across multiple data-generation sessions. We test and demonstrate the framework on mass-spectrometry data obtained from four different data-generation sessions for the same set of samples.
Year: 2008; PMID: 21347125; PMCID: PMC3041518
Source DB: PubMed; Journal: Summit Transl Bioinform; ISSN: 2153-6430
Figure 1. Mean fixed-session and mean inter-session differences for two samples on peaks in the range of 7000 to 10000 Da. The means for fixed-session data are labeled by circles; the means for the inter-session data are labeled by crosses.
Figure 2. Mean same-sample and mean two-sample inter-session differences on peaks in the range of 7000 to 10000 Da. The means for the same-sample data are labeled by circles; the means for the two-sample data are labeled by crosses.
Figure 3. Differential expression for four single-session datasets versus the mean differential expression for mixed-session data. The means for the mixed-session data are labeled by circles; the scores for the single-session data are labeled by crosses.
Figure 4. Average differential expression score across all profile peaks. The distribution of scores (mean shown by solid line) for mixed-session data is shown and compared to scores obtained for fixed single-session datasets (dashed lines).
Gains and losses in classification accuracy obtained using multivariate patterns.
| Chosen sessions | Test 3 classification accuracy |
|---|---|
| Session #1 | 84.33% |
| Session #2 | 85.50% |
| Session #3 | 80.83% |
| Session #4 | 87.67% |
| Mixed sessions average | 81.42% |
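The gap between each single-session accuracy and the mixed-sessions average can be read off the table with simple arithmetic. The sketch below only restates the reported numbers; the variable and function names are hypothetical, and treating the difference as a "gain" or "loss" is an assumption about how the table is meant to be read, not the authors' procedure.

```python
# Accuracies (%) copied from the table above.
single_session = {
    "Session #1": 84.33,
    "Session #2": 85.50,
    "Session #3": 80.83,
    "Session #4": 87.67,
}
mixed_average = 81.42


def gaps_to_mixed(single, mixed):
    """Return single-session accuracy minus the mixed-sessions average, per session."""
    return {name: round(acc - mixed, 2) for name, acc in single.items()}


gaps = gaps_to_mixed(single_session, mixed_average)
for name, gap in gaps.items():
    # Positive values mean the single-session figure is higher than the mixed average.
    print(f"{name}: {gap:+.2f} percentage points vs. mixed sessions")
```

On these numbers, three of the four single-session accuracies are above the mixed-sessions average, and Session #3 is slightly below it.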
Test 4 classification accuracies when training a future-looking model on mixed data sessions versus models trained under the ideal setting.
| Test Session | Mixed Sessions | Ideal Setting |
|---|---|---|
| Session #1 | 79.61% | 82.04% |
| Session #2 | 87.12% | 85.73% |
| Session #3 | 75.93% | 80.92% |
| Session #4 | 85.74% | 88.39% |
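To make the comparison in this table explicit, the following sketch computes, for each test session, the difference between the mixed-sessions accuracy and the ideal-setting accuracy, along with the mean difference. It uses only the values reported above; the names `test4` and `mixed_minus_ideal` are hypothetical, and this is plain arithmetic on the table rather than a reproduction of the authors' analysis.

```python
# (mixed sessions %, ideal setting %) per test session, copied from the table above.
test4 = {
    "Session #1": (79.61, 82.04),
    "Session #2": (87.12, 85.73),
    "Session #3": (75.93, 80.92),
    "Session #4": (85.74, 88.39),
}


def mixed_minus_ideal(results):
    """Return mixed-sessions accuracy minus ideal-setting accuracy, per test session."""
    return {name: round(mixed - ideal, 2) for name, (mixed, ideal) in results.items()}


diffs = mixed_minus_ideal(test4)
mean_diff = sum(diffs.values()) / len(diffs)

for name, diff in diffs.items():
    # Negative values mean the mixed-sessions model scored below the ideal setting.
    print(f"{name}: {diff:+.2f} percentage points")
print(f"Mean difference: {mean_diff:+.2f} percentage points")
```

On the reported values, the mixed-sessions model is lower than the ideal setting for Sessions #1, #3, and #4 and higher for Session #2.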