Manju R Mamtani, Tushar P Thakre, Mrunal Y Kalkonde, Manik A Amin, Yogeshwar V Kalkonde, Amit P Amin, Hemant Kulkarni.
Abstract
BACKGROUND: Despite the recognized diagnostic potential of biomarkers, the search for ways to suppress noise and extract information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that, treating each molecular biomarker as a diagnostic test, enriches the diagnostic performance of an optimized set of independent biomarkers using established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared i) subjects having cancer with those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with the same cancer but differing time to metastasis.
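To make the idea of treating a single biomarker as a diagnostic test concrete, the following minimal sketch computes sensitivity and specificity for one biomarker at a fixed cutoff. The function name `diagnostic_test_metrics`, the "positive if value >= cutoff" rule, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def diagnostic_test_metrics(values, labels, cutoff):
    """Sensitivity and specificity of one biomarker used as a binary test.

    values : biomarker measurements, one per subject
    labels : True if the subject has the condition, False otherwise
    cutoff : hypothetical decision threshold; the test is called
             positive when value >= cutoff (an assumption of this sketch)
    """
    tp = fp = tn = fn = 0
    for value, diseased in zip(values, labels):
        positive = value >= cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one subject in each class")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy example: three diseased and three healthy subjects.
sens, spec = diagnostic_test_metrics(
    [5, 6, 7, 1, 2, 6], [True, True, True, False, False, False], cutoff=4
)
```

In this toy example all three diseased subjects test positive and one of the three healthy subjects is a false positive.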
Year: 2006 PMID: 17032455 PMCID: PMC1618410 DOI: 10.1186/1471-2105-7-442
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The statistical algorithm used in the present study. Numbers correspond to the steps described in the text. Steps 1–3 were used on a training subset. The training subset was randomly chosen for the OvCa dataset, while for the other two datasets the training sets used by the primary authors were used. Validation was done separately in the training and test subsets within each dataset.
The publicly available datasets used in the current study
| Dataset alias | Authors | Biomarker | # Biomarkers | Diagnostic class | N (Total) | N (Training) | N (Test set) |
| OvCa | Petricoin et al, 2002 [19] | Proteomic mass spectra | 15,154 | Ovarian cancer | 162 | 83 | 79 |
| OvCa | Petricoin et al, 2002 [19] | Proteomic mass spectra | 15,154 | Normal | 91 | 49 | 42 |
| LuMe | Gordon et al, 2002 [18] | Gene expression | 12,533 | Lung adenocarcinoma | 150 | 16 | 134 |
| LuMe | Gordon et al, 2002 [18] | Gene expression | 12,533 | Mesothelioma | 31 | 16 | 15 |
| LLML | Golub et al, 1999 [17] | Gene expression | 7,129 | Acute lymphocytic leukemia | 46 | 27 | 19 |
| LLML | Golub et al, 1999 [17] | Gene expression | 7,129 | Acute myeloid leukemia | 26 | 11 | 15 |
| BrCa | van't Veer et al, 2002 [20] | Gene expression | 24,481 | Metastasis within 5 years | 34 | 17 | 17 |
| BrCa | van't Veer et al, 2002 [20] | Gene expression | 24,481 | Metastasis after 5 years | 44 | 26 | 18 |
Figure 2. The performance index (P. The curves demonstrate that the diagnostic performance of the biomarkers follows Zipf's law. The colors for the four datasets are used consistently in Figure 3 and Supplementary Figures 1 and 2 (see additional file 1).
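Zipf's law says that a quantity falls off as a power of its rank, so ranked values plotted on log-log axes lie close to a straight line. The sketch below shows one way to check this for a list of per-biomarker performance values by fitting log(value) against log(rank) with ordinary least squares. The function name `fit_zipf_exponent` and the synthetic input are assumptions for illustration; the paper's definition of the performance index is not reproduced here.

```python
import math


def fit_zipf_exponent(values):
    """Fit log(value) = intercept - s * log(rank) by least squares.

    values : strictly positive performance values (hypothetical input)
    Returns (s, intercept); s near 1 corresponds to classic Zipf behavior.
    """
    ranked = sorted(values, reverse=True)  # rank 1 = best performer
    if len(ranked) < 2 or ranked[-1] <= 0:
        raise ValueError("need at least two strictly positive values")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic check: values proportional to 1/rank give an exponent of about 1.
s, intercept = fit_zipf_exponent([1.0 / r for r in range(1, 51)])
```

A straight-line fit on log-log axes is only a rough diagnostic; it does not by itself establish that data follow a power law.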
Summary of the discriminant model performance in the training subsets of the datasets used in the present study
| Model R² | 0.9680 | 0.9618 | 0.9170 | |
| Wilks' λ | 0.0320 | 0.0382 | 0.0830 | |
| Mahalanobis D² | 127.69 | 94.38 | 50.91 | |
| χ² | 416.54 | 89.77 | 85.88 | |
| Canonical correlation | 0.9839 | 0.9807 | 0.9576 | |
| Eigenvalue | 30.26 | 25.17 | 11.05 | |
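Several rows of this table are linked by standard identities that hold when there is a single discriminant function, as in a two-class problem: Wilks' λ equals 1 minus the squared canonical correlation, the squared canonical correlation equals the model R², and the eigenvalue equals R² / (1 − R²). The sketch below derives the other quantities from R²; the function name is hypothetical, and the two-class assumption is ours. The tabulated values agree with these identities to within rounding.

```python
import math


def two_group_discriminant_stats(r_squared):
    """Derive related statistics from the model R^2.

    Assumes a single discriminant function (two diagnostic classes).
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    wilks_lambda = 1.0 - r_squared
    return {
        "wilks_lambda": wilks_lambda,
        "canonical_correlation": math.sqrt(r_squared),
        "eigenvalue": r_squared / wilks_lambda,
    }


# First column of the table: R^2 = 0.9680.
stats = two_group_discriminant_stats(0.9680)
```

For R² = 0.9680 this gives λ = 0.0320, a canonical correlation of about 0.9839, and an eigenvalue of about 30.25, compared with the tabulated 30.26.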
Figure 3. The diagnostic performance of the proposed statistical algorithm. (A-D) The probability of the predicted diagnostic class in the training set of each dataset studied. Gradient background indicates a continually increasing or decreasing likelihood of the diagnostic classes. The abscissa indicates the discriminant score generated using the proposed algorithm. (E-H) Evaluation of the diagnostic performance of the proposed algorithm. The plots are ROC curves for the entire dataset (that is, training and test sets combined), since the diagnostic performance of the discriminant score was consistently high in the training and test subsets when assessed separately. Area under the ROC curve (AUC) was estimated non-parametrically using the Wilcoxon method. Insets show the strikingly bimodal distribution of the discriminant scores in the entire datasets (that is, training and test subsets combined). SE, standard error.
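The non-parametric (Wilcoxon or Mann-Whitney) estimate of the AUC is the probability that a randomly chosen subject from one class has a higher score than a randomly chosen subject from the other class, with ties counted as one half. The following minimal sketch computes it by direct pairwise comparison; the function name and the toy scores are illustrative assumptions, and the standard error reported in the figure is not computed here.

```python
def wilcoxon_auc(scores_pos, scores_neg):
    """AUC as P(pos > neg) + 0.5 * P(pos == neg) over all pairs.

    scores_pos : discriminant scores of the class expected to score higher
    scores_neg : discriminant scores of the other class
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Toy example: three of the four pairs are ordered correctly.
auc = wilcoxon_auc([0.8, 0.4], [0.6, 0.2])
```

The pairwise loop is quadratic in the number of subjects, which is acceptable at the sample sizes listed for these datasets; a rank-based formulation gives the same value more efficiently.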
Comparison of the results of the proposed algorithm with other approaches reported previously using the same datasets
| OvCa | Principal components | Lilien et al, 2003 [25] | |
| Wilcoxon test | Sorace et al, 2003 [53] | ||
| Logical analysis of data | Alexe et al, 2004 [35] | ||
| Statgram | Zhu et al, 2003 [41] | ||
| Genetic algorithm | Petricoin et al, 2002 [19] | ||
| Proposed algorithm | Present study | ||
| LuMe | Gene expression ratios | Gordon et al, 2002 [18] | |
| Proposed algorithm | Present study | ||
| LLML | Self-organizing maps | Toronen et al, 1999 [40] | |
| Neural networks | Bicciato et al, 2003 [38] | ||
| ICED | Bijlani et al, 2003 [65] | ||
| Support vector machines | Furey et al, 2000 [66] | ||
| Proposed algorithm | Present study | ||
| BrCa | Correlation | van't Veer et al, 2002 [20] | |
| Proposed algorithm | Present study |
Comparison of the results of the proposed algorithm with other approaches using a simulated dataset (Syn1) of 100 samples and 1000 genes
| KMC | Cleaver 1.0 | [67] | 1000 | |
| kNN | GeneCluster 2.0 | [68] | 127 | |
| WV | GeneCluster 2.0 | [68] | 127 | |
| SAM | SAM for Excel | [69] | 203 | |
| PAM | PAM for Excel | [69] | 224 | |
| SVM | GEPAS | [70] | 250 | |
| Proposed | Stata 7.0 | [71] | 22 |