Anupama Reddy, Honghui Wang, Hua Yu, Tiberius O Bonates, Vimla Gulabani, Joseph Azok, Gerard Hoehn, Peter L Hammer, Alison E Baird, King C Li.
Abstract
BACKGROUND: Stroke is a leading cause of morbidity and the leading cause of adult disability in the United States. Currently, no biomarkers are used clinically to diagnose acute ischemic stroke. A diagnostic test based on a patient's blood sample could be beneficial in treating the disease.
Year: 2008 PMID: 18616825 PMCID: PMC2492849 DOI: 10.1186/1472-6947-8-30
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The distribution of clinical data
| | Training | | Validation | |
| | Stroke patients | Controls | Stroke patients | Controls |
| Number of subjects | 48 | 32 | 35 | 25 |
| Age | 78 ± 13.59 | 76 ± 7.71 | 74.5 ± 14.00 | 75 ± 7.29 |
| Gender (male) | 52% | 34% | 45% | 44% |
| Sampling within 48 hr* | 65% | N/A | 100% | N/A |
* The blood sample was collected after the stroke.
Figure 1. Flowchart for the pre-processing procedure.
Peptides in the support set and their corresponding sources
| Training data | | Validation data | |
| Peak ID (M/Z) | Source | Peak ID (M/Z) | Source |
| C08689_4 (8689 Da) | CM10 chip, Fraction 4, Low noalign Ce3 | C08706_6 (8706 Da) | CM10 chip, Fraction 4, Low noalign Ce3 |
| C043564_ (43564 Da) | IMAC chip, Fraction 6, High noalign Ce3 | C043560_ (43560 Da) | IMAC chip, Fraction 6, High noalign Ce3 |
| C044761_ (44761 Da) | IMAC chip, Fraction 6, High noalign Ce3 | C044684_ (44684 Da) | IMAC chip, Fraction 6, High noalign Ce3 |
Figure 2. SELDI peaks for the three potential biomarkers in the training and validation sets. Eight spectra from each dataset were selected at random to show that the peaks in the training set are the same peaks observed in the validation set, even though the M/Z values are not identical in the two datasets (possibly due to calibration error).
LAD classification model
| Pattern | Degree | Positive Prevalence | Negative Prevalence | Positive Homogeneity | Negative Homogeneity | Hazard Ratio | C08689_4 | C043564_ | C044761_ |
| P1 | 1 | 37 (78.72%) | 3 (9.68%) | 92.50% | 7.50% | 3.52 | ≤ -0.154 | ||
| P2 | 2 | 35 (74.47%) | 4 (12.90%) | 89.74% | 10.26% | 2.92 | ≤ 0.162 | ≤ 0.553 | |
| P3 | 1 | 31 (65.96%) | 2 (6.45%) | 93.94% | 6.06% | 2.64 | ≤ -0.237 | ||
| N1 | 2 | 3 (6.38%) | 20 (64.52%) | 13.04% | 86.96% | 0.16 | > 0.162 | > -0.154 | |
| N2 | 1 | 2 (4.26%) | 11 (35.48%) | 15.38% | 84.62% | 0.22 | > 0.728 |
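The prevalence and homogeneity columns can be recomputed from the raw coverage counts. The sketch below is a minimal illustration, assuming class totals of 47 stroke patients and 31 controls; these totals are inferred from the reported percentages (for example, 37/47 = 78.72%) and are not stated in the table. The function name `pattern_stats` is hypothetical.

```python
def pattern_stats(pos_covered, neg_covered, n_pos, n_neg):
    """Return prevalence and homogeneity (in percent) for one pattern.

    Prevalence: share of a class that the pattern covers.
    Homogeneity: share of the covered observations that belong to a class.
    """
    covered = pos_covered + neg_covered
    return {
        "positive_prevalence": 100.0 * pos_covered / n_pos,
        "negative_prevalence": 100.0 * neg_covered / n_neg,
        "positive_homogeneity": 100.0 * pos_covered / covered,
        "negative_homogeneity": 100.0 * neg_covered / covered,
    }

# Assumed class totals, inferred from the percentages in the table.
N_POS, N_NEG = 47, 31

# Pattern P1 covers 37 stroke patients and 3 controls.
p1 = pattern_stats(37, 3, N_POS, N_NEG)
print({name: round(value, 2) for name, value in p1.items()})
```

With these assumed totals, the P1 row of the table (78.72%, 9.68%, 92.50%, 7.50%) is reproduced.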
Figure 3. A 3-D plot of the discriminant function on the training data. The region colored pink (blue) represents the positively (negatively) classified region. The surface of the discriminant function is colored purple.
Performance of LAD model
| Method | Performance | Training set | Validation set | Cross-validation |
| Logical Analysis of Data Model | Accuracy | 82.6% | 74.8% | 79.8 ± 2.9% |
| | Sensitivity | 89.4% | 77.5% | 85.4 ± 5.4% |
| | Specificity | 74.2% | 72.0% | 70.6 ± 3.2% |
| | Hazard Ratio | 8.1 | 3.2 | 3.0 ± 0.3 |
Performance of other classification methods
| Method | Performance | Training | Validation | Cross-validation |
| C4.5 Decision Trees | Accuracy | 84.5% | 69.3% | 75.9 ± 2.9% |
| | Sensitivity | 90.3% | 76.0% | 80.4 ± 4.8% |
| | Specificity | 78.7% | 62.5% | 71.4 ± 3.5% |
| Logistic Regression | Accuracy | 73.8% | 64.8% | 71.1 ± 1.8% |
| | Sensitivity | 64.5% | 52.0% | 59.4 ± 2.7% |
| | Specificity | 83.0% | 77.5% | 82.8 ± 3.3% |
| Support Vector Machines | Accuracy | 81.8% | 68.5% | 77.6 ± 3.1% |
| | Sensitivity | 74.2% | 52.0% | 87.50 ± 3.3% |
| | Specificity | 89.4% | 85.0% | 67.80 ± 5.2% |
| Multilayer Perceptron | Accuracy | 88.8% | 68.5% | 82.20 ± 2.4% |
| | Sensitivity | 96.8% | 72.0% | 78.00 ± 3.5% |
| | Specificity | 80.9% | 65.0% | 86.40 ± 3.7% |
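The training and validation accuracies in this table agree, to within rounding, with the mean of sensitivity and specificity (often called balanced accuracy). This is an observation from the reported numbers, not a definition given in the text. A minimal check, using the hypothetical helper `balanced_accuracy`, might look like this:

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity, both given in percent."""
    return (sensitivity + specificity) / 2.0

# (method, split): (reported accuracy, sensitivity, specificity),
# copied from the training and validation columns of the table.
reported = {
    ("C4.5 Decision Trees", "training"): (84.5, 90.3, 78.7),
    ("C4.5 Decision Trees", "validation"): (69.3, 76.0, 62.5),
    ("Logistic Regression", "training"): (73.8, 64.5, 83.0),
    ("Logistic Regression", "validation"): (64.8, 52.0, 77.5),
    ("Support Vector Machines", "training"): (81.8, 74.2, 89.4),
    ("Support Vector Machines", "validation"): (68.5, 52.0, 85.0),
    ("Multilayer Perceptron", "training"): (88.8, 96.8, 80.9),
    ("Multilayer Perceptron", "validation"): (68.5, 72.0, 65.0),
}

# Absolute gap between the reported accuracy and the recomputed mean.
gaps = {}
for key, (accuracy, sensitivity, specificity) in reported.items():
    recomputed = balanced_accuracy(sensitivity, specificity)
    gaps[key] = abs(recomputed - accuracy)
    print(key, "reported:", accuracy, "recomputed:", round(recomputed, 2))
```

Every gap stays within the rounding of one decimal place, which suggests, but does not prove, that accuracy was reported as a class-balanced average.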
Figure 4. Visualization of the pattern coverage on the training data (left) and test data (right). Each row indicates an observation, and each column indicates a pattern. All observations above the dashed line are stroke patients, while those below the dashed line are controls. A cell corresponding to an observation j and positive (negative) pattern p is colored red (blue) if j is covered by pattern p.
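A coverage matrix of the kind shown in the figure can be built by testing every observation against every pattern. The sketch below assumes a pattern is a conjunction of threshold conditions on peak intensities; the pattern names, feature names, cutpoints, and observations are toy values chosen for illustration, not the study data.

```python
import operator

# A pattern is a list of (feature, comparison, cutpoint) conditions
# that must all hold. All names and cutpoints here are hypothetical.
patterns = {
    "P_a": [("peak_1", operator.le, -0.2)],
    "N_a": [("peak_1", operator.gt, -0.2), ("peak_2", operator.gt, 0.1)],
}

# Toy observations: one dictionary of feature values per subject.
observations = [
    {"peak_1": -0.5, "peak_2": 0.3},
    {"peak_1": 0.4, "peak_2": 0.6},
    {"peak_1": 0.4, "peak_2": -0.1},
]

def covers(pattern, obs):
    """True if the observation satisfies every condition of the pattern."""
    return all(compare(obs[feature], cut) for feature, compare, cut in pattern)

# coverage[j][k] is True when observation j is covered by the k-th pattern.
names = list(patterns)
coverage = [[covers(patterns[name], obs) for name in names] for obs in observations]

# Text rendering: one row per observation, one column per pattern.
for row in coverage:
    print(" ".join("x" if cell else "." for cell in row))
```

The third toy observation is covered by neither pattern, which corresponds to a row with no colored cells in such a plot.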
Comparison of LAD Regression and other methods
| | Training | | Validation | |
| Method | Root Mean Square Error | Pearson Correlation | Root Mean Square Error | Pearson Correlation |
| Linear Regression | 5.345 | 0.456 | 6.934 | 0.200 |
| Multilayer Perceptron | 5.941 | 0.394 | 6.696 | 0.209 |
| Support Vector Regression | 6.119 | 0.454 | 6.871 | 0.149 |
| LAD Regression | 3.158 | 0.851 | 5.900 | 0.450 |
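The two metrics in this table can be computed as follows. This is a minimal sketch using the standard definitions of root mean square error and the Pearson correlation coefficient; the target variable is not described in this excerpt, so the values below are toy numbers, and the function names `rmse` and `pearson` are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy observed and predicted values (not from the study).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 3.0, 7.0, 7.0]

print("RMSE:", rmse(y_true, y_pred))
print("Pearson r:", round(pearson(y_true, y_pred), 3))
```

A lower RMSE and a higher correlation both indicate a better fit, which is how the rows of the table are meant to be compared.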
Formulas for Q1,..., Q11 in R(c)