Yarong Yang, Eric J Kort, Nader Ebrahimi, Zhongfa Zhang, Bin T Teh.
Abstract
BACKGROUND: Gene set enrichment analysis (GSEA) is an analytic approach that simultaneously reduces the dimensionality of microarray data and enables ready inference of the biological meaning of observed gene expression patterns. Here we invert the GSEA process to identify class-specific gene signatures. Because our method uses the Kolmogorov-Smirnov (KS) approach both to define class-specific signatures and to classify samples using those signatures, we have termed this methodology "Dual-KS" (DKS).
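The two uses of the KS approach described in the abstract can be illustrated with a minimal Python sketch. It is not the dualKS package's code or API: the function names (`ks_running_max`, `select_signatures`, `classify`), the unweighted running-sum step sizes, and the dictionary-based data layout are all assumptions made for illustration, and the weighted and rescaled variants are not covered.

```python
from itertools import accumulate


def ks_running_max(in_set_sorted):
    """Maximum of a KS-style running sum over an ordered list of flags.

    Flags are ordered from the highest-ranked item to the lowest; True marks
    an item that belongs to the set of interest. Hits step up by 1/n_hit and
    misses step down by 1/n_miss (an assumed, unweighted scheme).
    """
    hits = list(in_set_sorted)
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    steps = (1.0 / n_hit if h else -1.0 / n_miss for h in hits)
    return max(accumulate(steps))


def select_signatures(expr, labels, n_genes):
    """Pick n_genes upregulated genes per class.

    expr maps gene -> list of expression values (one per sample). For each
    gene, samples are sorted by decreasing expression and the gene is scored
    by how strongly the class's samples concentrate at the top.
    """
    signatures = {}
    for cls in sorted(set(labels)):
        scores = {}
        for gene, values in expr.items():
            order = sorted(range(len(values)), key=lambda i: -values[i])
            scores[gene] = ks_running_max(labels[i] == cls for i in order)
        ranked = sorted(scores, key=scores.get, reverse=True)
        signatures[cls] = set(ranked[:n_genes])
    return signatures


def classify(sample, signatures):
    """Assign a sample (gene -> value) to the class whose signature scores highest.

    Genes are sorted by decreasing expression within the sample, and each
    class signature is scored with the same running-sum statistic.
    """
    ranked_genes = sorted(sample, key=sample.get, reverse=True)
    scores = {
        cls: ks_running_max(g in sig for g in ranked_genes)
        for cls, sig in signatures.items()
    }
    return max(scores, key=scores.get)
```

In this sketch the same statistic is applied in two directions: across samples for each gene when building signatures, and across genes for each sample when classifying, which is the "dual" use the abstract refers to.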
Keywords: DKS algorithm; gene expression; gene set enrichment analysis (GSEA); gene signatures
Year: 2010 PMID: 20148167 PMCID: PMC2816930 DOI: 10.4137/cin.s2892
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Comparison of DKS variants. For each of the 10 test datasets, the error rate is plotted as a function of upregulated gene signature size (number of genes per class), ranging from 5 to 50 in increments of 5. The three variants (default, weighted KS score, and rescaled KS score) are plotted to allow comparison of these methodologies.
Estimated error rates of various classification methods. Comparison of error rates estimated by the 0.632+ bootstrap method. Error rates were estimated for the dualKS method (rescaled variant) and compared to previously published estimates for the other methods, reproduced in the table.
| Leukemia | 2 | 1.4 | 2.9 | 2.0 | 2.5 | 6.2 | 5.6 | 5.1 | 2.2 |
| Breast (2 cl.) | 2 | 32.5 | 33.7 | 33.1 | 32.4 | 32.6 | 33.7 | 34.2 | 29.1 |
| Breast (3 cl.) | 3 | 38.0 | 44.9 | 37.0 | 39.6 | 40.1 | 42.4 | 35.1 | 35.7 |
| NCI 60 | 8 | 25.6 | 31.7 | 28.6 | 25.6 | 24.6 | 23.7 | 25.2 | 21.9 |
| Adenocar. | 2 | 20.3 | 17.4 | 19.4 | 17.7 | 17.9 | 18.1 | 12.5 | 23.9 |
| Brain | 5 | 13.8 | 17.4 | 18.3 | 16.3 | 15.9 | 19.4 | 15.4 | 14.1 |
| Colon | 2 | 14.7 | 15.2 | 13.7 | 12.3 | 12.2 | 15.8 | 12.7 | 14.9 |
| Lymphoma | 3 | 1.0 | 0.8 | 2.1 | 2.8 | 3.3 | 4.0 | 0.9 | 3.4 |
| Prostate | 2 | 6.4 | 10.0 | 14.9 | 8.8 | 8.9 | 8.1 | 7.7 | 15.8 |
| SRBCT | 4 | 1.7 | 2.3 | 1.1 | 1.2 | 2.5 | 3.1 | 2.1 | 2.2 |
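For readers unfamiliar with the 0.632+ bootstrap named in the caption, the following Python sketch shows how such an estimate is commonly combined from three quantities: the resubstitution error, the out-of-bootstrap error, and the no-information error rate. The function names and the clipping details follow the usual textbook statement of the estimator and are assumptions here; the source does not describe how the authors computed it.

```python
from collections import Counter


def no_information_rate(y_true, y_pred):
    """Error rate expected if predictions were independent of the true labels.

    Computed as sum_k p_k * (1 - q_k), where p_k is the observed proportion
    of class k and q_k is the proportion of predictions equal to class k.
    """
    n = len(y_true)
    p = Counter(y_true)
    q = Counter(y_pred)
    return sum((p[k] / n) * (1.0 - q[k] / n) for k in p)


def err_632_plus(err_resub, err_oob, gamma):
    """0.632+ estimate from resubstitution error, out-of-bootstrap error,
    and the no-information rate gamma (all as fractions between 0 and 1)."""
    # Cap the out-of-bootstrap error at the no-information rate.
    err_oob_capped = min(err_oob, gamma)
    # Relative overfitting rate, set to 0 when there is no evidence of overfitting.
    if err_oob_capped > err_resub and gamma > err_resub:
        r = (err_oob_capped - err_resub) / (gamma - err_resub)
    else:
        r = 0.0
    err_632 = 0.368 * err_resub + 0.632 * err_oob
    correction = (err_oob_capped - err_resub) * (0.368 * 0.632 * r) / (1.0 - 0.368 * r)
    return err_632 + correction
```

With no overfitting (`r = 0`) the result reduces to the plain 0.632 estimate; as overfitting grows, more weight shifts toward the out-of-bootstrap error.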
Gene set sizes. Comparison of the size of the gene signature identified by the random forest method and by DKS (rescaled variant). Since DKS identifies an independent signature for each class, both the genes per class and the total number of genes across all classes are listed.
| Dataset | Random forest | DKS (genes per class) | DKS (total genes) |
|---|---|---|---|
| Leukemia | 2 | 5 | 10 |
| Breast (2 cl.) | 14 | 5 | 10 |
| Breast (3 cl.) | 110 | 10 | 30 |
| NCI 60 | 230 | 10 | 80 |
| Adenocar. | 6 | 50 | 100 |
| Brain | 22 | 15 | 75 |
| Colon | 14 | 20 | 40 |
| Lymphoma | 73 | 15 | 45 |
| Prostate | 18 | 10 | 20 |
| SRBCT | 101 | 5 | 20 |
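The relationship between the last two columns can be checked with a short Python sketch: because DKS builds one signature per class, the total should equal the genes per class multiplied by the number of classes. The class counts below are taken from the second column of the error-rate table, on the assumption that this column gives the number of classes; the helper name `inconsistent_rows` is hypothetical.

```python
# (dataset, classes, DKS genes per class, DKS total genes), transcribed from the two tables.
rows = [
    ("Leukemia", 2, 5, 10),
    ("Breast (2 cl.)", 2, 5, 10),
    ("Breast (3 cl.)", 3, 10, 30),
    ("NCI 60", 8, 10, 80),
    ("Adenocar.", 2, 50, 100),
    ("Brain", 5, 15, 75),
    ("Colon", 2, 20, 40),
    ("Lymphoma", 3, 15, 45),
    ("Prostate", 2, 10, 20),
    ("SRBCT", 4, 5, 20),
]


def inconsistent_rows(rows):
    """Return the datasets whose total differs from classes * genes per class."""
    return [name for name, k, per_class, total in rows if k * per_class != total]


print(inconsistent_rows(rows))
```

An empty list means every row is consistent with one signature of the listed size per class.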
Figure 2. Sample output of the dualKS package. Shown is a plot of the analysis of the SRBCT dataset generated by the dualKS package, which implements the DKS algorithm. The data are plotted sorted by each class signature in turn so that the relationship between high- and low-scoring samples on each class may be inspected. The actual and predicted classes of each sample are indicated below the X axis.