Chenwei Wang, Alperen Taciroglu, Stefan R Maetschke, Colleen C Nelson, Mark A Ragan, Melissa J Davis.
Abstract
BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.
Year: 2012 PMID: 23216803 PMCID: PMC3553066 DOI: 10.1186/2043-9113-2-22
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
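The idea of detecting outliers in both directions can be illustrated with a minimal sketch. It assumes the COPA-style scaling as it is commonly described (centre each gene on its median across samples, then divide by the median absolute deviation). The fixed cutoff of 5 and the function names are hypothetical, chosen for illustration only. COPA as usually described ranks genes by upper percentiles of the scaled values rather than applying a fixed cutoff, and the specific refinements in mCOPA are not given in this abstract, so this is not the authors' implementation.

```python
from statistics import median


def copa_transform(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: a gene with no spread is treated as having no outliers.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def flag_outliers(values, threshold=5.0):
    """Return (over, under): indices of samples beyond +/- threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    scaled = copa_transform(values)
    over = [i for i, z in enumerate(scaled) if z > threshold]
    under = [i for i, z in enumerate(scaled) if z < -threshold]
    return over, under


# One gene measured in eight samples: sample 5 is strongly over-expressed,
# sample 7 strongly under-expressed.
expression = [2, 3, 2, 3, 2, 20, 3, -15]
over, under = flag_outliers(expression)
print("over-expressed samples:", over)
print("under-expressed samples:", under)
```

A method that inspects only the upper tail would report sample 5 and miss sample 7, which is the restriction the abstract attributes to the original COPA.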
ARIs of four feature selection methods combined with four clustering methods across 12 datasets
| COPA+CH | 0.12 | 0.04 | 0.69 | 0.20 | 0.64 | 0.15 | 0.23 | 0.38 | 0.06 | 0.05 | 0.16 | 0.45 |
| COPA+KM | 0.30 | 0.16 | 0.53 | 0.62 | 0.33 | 0.31 | 0.25 | 0.54 | 0.09 | 0.23 | 0.12 | 0.41 |
| COPA+PAM | 0.13 | 0.60 | 0.36 | 0.26 | 0.57 | −0.02 | 0.31 | 0.12 | 0.35 | |||
| COPA+SIL | 0.04 | 0.08 | 0.69 | 0.20 | 0.30 | 0.15 | 0.33 | 0.43 | 0.06 | 0.05 | 0.07 | 0.55 |
| DE+CH | 0.17 | 0.15 | 0.21 | 0.24 | 0.53 | 0.36 | 0.38 | 0.28 | 0.21 | 0.52 | 0.12 | 0.44 |
| DE+KM | 0.29 | 0.15 | 0.51 | 0.75 | 0.59 | 0.34 | 0.38 | 0.65 | 0.12 | 0.54 | ||
| DE+PAM | 0.35 | 0.13 | 0.24 | 0.79 | 0.76 | 0.34 | 0.26 | 0.56 | 0.11 | 0.46 | 0.16 | 0.43 |
| DE+SIL | 0.17 | 0.15 | 0.33 | 0.24 | 0.53 | 0.14 | 0.38 | 0.28 | 0.21 | 0.52 | 0.12 | 0.44 |
| mCOPA+CH | 0.29 | 0.15 | 0.60 | 0.55 | 0.40 | 0.35 | 0.38 | 0.39 | 0.06 | 0.52 | 0.11 | 0.30 |
| mCOPA+KM | 0.46 | 0.01 | 0.79 | 0.68 | 0.48 | 0.36 | 0.00 | 0.08 | ||||
| mCOPA+PAM | 0.10 | 0.55 | 0.82 | 0.54 | 0.44 | 0.49 | 0.03 | 0.50 | 0.44 | |||
| mCOPA+SIL | 0.29 | 0.14 | 0.60 | 0.61 | 0.40 | 0.35 | 0.38 | 0.39 | 0.06 | 0.52 | 0.11 | 0.30 |
| VAR+CH | 0.14 | 0.08 | 0.69 | 0.32 | 0.34 | 0.28 | 0.38 | 0.57 | 0.02 | 0.26 | 0.13 | 0.45 |
| VAR+KM | 0.16 | 0.16 | 0.47 | 0.81 | 0.47 | 0.23 | 0.34 | 0.64 | 0.09 | 0.15 | 0.17 | 0.41 |
| VAR+PAM | 0.17 | 0.16 | 0.81 | 0.43 | 0.02 | 0.26 | 0.61 | 0.10 | 0.06 | 0.15 | 0.33 | |
| VAR+SIL | 0.14 | 0.08 | 0.69 | 0.89 | 0.34 | 0.28 | 0.38 | 0.57 | 0.02 | 0.12 | 0.13 | 0.59 |
Note: the ARI scores in italicized bold indicate the best performing method for each dataset. Datasets in bold indicate those in which mCOPA provided the most informative feature selection for the clustering of clinical subtypes.
*Datasets: Pr (Prostate: GSE6099); C (Cervical: GSE7410); Mn (Melanoma: GSE7553); R1 (Renal: GSE11024); R2 (Renal: GSE11151); NPh (Nasopharyngeal: GSE12452); Lm (Lymphoma: GSE12453); R3 (Renal: GSE15641); B (Brain: GSE15824); T (Thyroid: GSE29265); Br (Breast: GSE29431); L (Lung: GSE32036).
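For readers unfamiliar with the score reported in the table, the adjusted Rand index (ARI) compares a clustering against known labels, here the clinical subtypes, and corrects for chance agreement. The sketch below uses the standard pair-counting formula. The function name and the toy labels are illustrative assumptions, not data from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(truth)
    if n < 2:
        return 1.0  # Assumption: trivially identical partitions.
    # Pairs of samples grouped together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs grouped together within each labeling separately.
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # Degenerate case, e.g. a single cluster in both labelings.
    return (sum_cells - expected) / (max_index - expected)


# Toy example: normal (N), tumour (T), and metastasis (M) samples.
subtypes = ["N", "N", "T", "T", "M", "M"]
perfect = [0, 0, 1, 1, 2, 2]   # recovers all three groups
merged = [0, 0, 1, 1, 1, 1]    # tumour and metastasis fall in one cluster

print(adjusted_rand_index(subtypes, perfect))
print(adjusted_rand_index(subtypes, merged))
```

A score of 1 indicates perfect agreement with the known labels, values near 0 indicate agreement no better than chance, and slightly negative values (such as the −0.02 in the table) indicate agreement below chance.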
Figure 1. Statistical evaluation of ARI scores. (A) Boxplots of ARI for four clustering methods across all 12 datasets. K-means achieves the highest ARI. (B) Boxplots of ARI for four feature selection methods based on k-means clustering results across 12 datasets. mCOPA achieves the highest ARI.
Figure 2. The clustering result comparison for mCOPA and differential expression (DE) analysis. (A) mCOPA produces three clusters: normal samples (blue), tumour samples (green), and metastasis samples (red). (B) DE also produces three clusters: normal samples (blue), a small cluster of tumour samples (green) and a large cluster of mixed tumour and metastatic samples. Misclassified samples are highlighted.
Figure 3. Analysis of the number of samples sharing a given outlier. Most outlier features are outliers in only a small number of metastatic samples, with very few outliers shared across more than three samples. Very similar proportions are observed when sample counts for either under- or over-expressed outliers are considered separately. Counts are only shown for those outliers that occur in the metastasis cluster. A further 201 outliers map exclusively to non-metastatic samples.
Figure 4. Details of outlier pathway analysis highlighting the differences between pathways significantly disrupted in individual samples. In the first two pathways, only a single sample shows significant pathway enrichment, whereas the last two pathways are more generally affected, in 15% and 30% of metastatic samples respectively.