Yiyi Liu, Quanquan Gu, Jack P Hou, Jiawei Han, Jian Ma.
Abstract
BACKGROUND: Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods for identifying cancer subtypes have primarily relied on generic clustering algorithms (such as hierarchical clustering) applied to gene expression data. The network-level interactions among genes, which are key to understanding the molecular perturbations in cancer, have rarely been considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification.
Year: 2014 PMID: 24491042 PMCID: PMC3916445 DOI: 10.1186/1471-2105-15-37
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic diagram of our algorithm.
Main notations used in this paper
| Description | Notation |
|---|---|
| Number of samples | |
| Number of genes | |
| Number of sample clusters | |
| Number of gene clusters | |
| Gene expression matrix of size | |
| The | |
| The | |
| Sample partition matrix of size | |
| Feature partition matrix of size | |
| A | |
| A | |
Figure 2. NCIS result for the TCGA breast cancer expression data. The genes listed are the first 50 genes shared between the ordered p-value list (based on an ANOVA test of each gene's expression across the five subtypes) and the ordered gene-weight list.
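The gene list in Figure 2 combines two rankings. Below is a minimal sketch, assuming expression values are grouped by subtype label and that "first 50 shared genes" means the genes that enter both top-k prefixes earliest as k grows; that reading is our interpretation, not necessarily the authors' procedure. The helpers `anova_f`, `rank_by_anova` and `first_shared`, the gene names and the weights are all hypothetical. Because every gene is tested over the same samples and subtype groups, the F distribution's degrees of freedom are identical for all genes, so sorting by F statistic (descending) gives the same order as sorting by ANOVA p-value (ascending).

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Assumes ss_within > 0 (not every group is constant).
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_anova(expr, labels):
    """Hypothetical helper: order genes from most to least subtype-associated.

    expr maps gene -> expression values per sample; labels gives each sample's subtype.
    """
    subtypes = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        groups = [[v for v, s in zip(values, labels) if s == t] for t in subtypes]
        scores[gene] = anova_f(groups)
    # Same group sizes for every gene, so a larger F means a smaller p-value.
    return sorted(scores, key=scores.get, reverse=True)


def first_shared(rank_a, rank_b, n):
    """Hypothetical helper: genes that appear in both top-k prefixes earliest."""
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    common = pos_a.keys() & pos_b.keys()
    # A gene is in both top-k lists exactly when max(pos_a, pos_b) < k;
    # ties are broken by the position in the first (p-value) ranking.
    ordered = sorted(common, key=lambda g: (max(pos_a[g], pos_b[g]), pos_a[g]))
    return ordered[:n]


# Hypothetical toy data: 4 samples, 2 subtypes, 3 genes.
expr = {"G1": [1.0, 1.1, 5.0, 5.2], "G2": [1.0, 5.0, 1.2, 5.1], "G3": [2.0, 2.1, 2.2, 2.0]}
labels = ["LumA", "LumA", "Basal", "Basal"]
weights = {"G1": 0.9, "G2": 0.7, "G3": 0.1}
by_p = rank_by_anova(expr, labels)
by_weight = sorted(weights, key=weights.get, reverse=True)
print(first_shared(by_p, by_weight, 2))  # → ['G1', 'G3']
```

Defining "shared" through the maximum of the two ranks avoids an explicit loop over prefix depths: a gene becomes shared at exactly the depth given by its worse rank.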
P-values of the dependence tests between clinical features and breast cancer subtypes
| 0.0444 | 2.03 × 10⁻³ | 1.68 × 10⁻³ | 2.33 × 10⁻³ | 6.17 × 10⁻³ |
| 0.0442 | 6.22 × 10⁻³ | 3.84 × 10⁻³ | 2.67 × 10⁻³ | 6.24 × 10⁻³ |
| 0.497 | 0.123 | 0.266 | 0.175 | 5.90 × 10⁻³ |
| 0.359 | 3.29 × 10⁻³ | 2.08 × 10⁻⁴ | 0.187 | 8.35 × 10⁻³ |
| 0.831 | 0.396 | 0.337 | 0.999 | 0.0780 |
For survival time, we used the log-rank test; for AJCC neoplasm disease lymph node stage, AJCC neoplasm disease stage, and AJCC tumor stage, we used the chi-squared test; for tumor nuclei percentage, we used ANOVA. Note that the normal-like subtype was not included in this comparison.
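For the categorical AJCC stage features, the chi-squared test asks whether subtype membership and stage category are independent. The following is a minimal pure-Python sketch of Pearson's chi-squared test of independence on a contingency table (rows: subtypes, columns: stage categories). The counts are hypothetical, no continuity correction is applied, and the upper-tail probability comes from the series expansion of the regularized incomplete gamma function. In practice a library routine such as `scipy.stats.chi2_contingency` would normally be used; note that by default it applies Yates' correction when the table has one degree of freedom, so its p-value differs from this sketch for 2 × 2 tables.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with df degrees of freedom.

    Series for the regularized lower incomplete gamma P(df/2, x/2); adequate for
    moderate statistics (the prefactor underflows for very large x).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total  # count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are subtypes, columns are stage categories.
counts = [[10, 20], [20, 10]]
stat, df, p = chi2_independence(counts)
print(round(stat, 4), df, round(p, 4))
```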
Figure 3. Expression patterns of the ABCC8 subnetwork in breast cancer subtypes. Genes directly connected to ABCC8 and genes targeting ABCC8's downstream genes are included. Circle color corresponds to gene expression level; circle size corresponds to gene weight. (a) Subtype Luminal A; (b) Subtype Basal; (c) Subtype Luminal B; (d) Subtype HER2-enriched; (e) Subtype Normal-like.
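The gene set drawn in Figure 3 can be described as a small graph query. A minimal sketch follows, assuming the interaction network is directed with (regulator, target) edges and that "downstream genes" means ABCC8's direct targets; both are assumptions, since the caption does not define them. The function `hub_subnetwork` and every gene name other than ABCC8 are hypothetical.

```python
from collections import defaultdict


def hub_subnetwork(edges, hub):
    """Hypothetical helper: genes around `hub` in a directed regulatory network.

    Keeps the hub, its direct neighbours (regulators and targets), and every
    gene that regulates at least one of the hub's targets.
    """
    targets, regulators = defaultdict(set), defaultdict(set)
    for source, target in edges:
        targets[source].add(target)
        regulators[target].add(source)
    direct = targets[hub] | regulators[hub]
    co_regulators = set()
    for downstream in targets[hub]:
        co_regulators |= regulators[downstream]
    return {hub} | direct | co_regulators


# Hypothetical edges (regulator, target); only ABCC8 is a name from the figure.
edges = [("ABCC8", "T1"), ("R1", "ABCC8"), ("R2", "T1"), ("R3", "R4")]
print(sorted(hub_subnetwork(edges, "ABCC8")))  # → ['ABCC8', 'R1', 'R2', 'T1']
```

Here R2 is included only because it also targets T1, which is downstream of ABCC8, while the unrelated edge R3 → R4 is ignored.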
Figure 4. Accuracies on simulated datasets: NCIS (α = 0.85) vs. NCIS (α = 0) vs. consensus clustering. The height of each solid box is the average accuracy in each setting (over 5 independent datasets simulated under that setting), and the error bar indicates the standard deviation. The p-value of a paired one-sided t-test (25 data points per group) for H0: Accuracy (NCIS (α = 0.85)) ≤ Accuracy (NCIS (α = 0)) is 0.0057. The p-value of a paired one-sided t-test (25 data points per group) for H0: Accuracy (NCIS (α = 0.85)) ≤ Accuracy (Consensus clustering) is 0.0019.
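The p-values in the Figure 4 caption come from paired one-sided t-tests, in which each simulated dataset contributes one pair of accuracies. A minimal pure-Python sketch of that test follows; the accuracies are hypothetical (not the paper's data), the helper names are our own, and the t-distribution tail is obtained by numerical integration (Simpson's rule) rather than a library call. With real data, recent SciPy versions compute the same test with `scipy.stats.ttest_rel(a, b, alternative='greater')`.

```python
import math


def t_sf(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom.

    Simpson's rule on the density between 0 and |t| (steps must be even);
    adequate for moderate |t|.
    """
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    s = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - s * h / 3  # upper tail = 0.5 minus the mass on [0, t]


def paired_one_sided_t(a, b):
    """Test H0: mean(a - b) <= 0 against H1: mean(a - b) > 0 for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)  # assumes the differences are not all equal
    return t, t_sf(t, n - 1)


# Hypothetical paired accuracies (position i = the same simulated dataset).
acc_network = [0.90, 0.80, 0.85, 0.95]
acc_baseline = [0.85, 0.78, 0.80, 0.90]
t_stat, p_value = paired_one_sided_t(acc_network, acc_baseline)
print(round(t_stat, 3), round(p_value, 4))
```

Pairing matters here: both methods are run on the same simulated datasets, so testing the per-dataset differences removes the dataset-to-dataset variation in difficulty.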