Peikai Chen, Yubo Fan, Tsz-kwong Man, Y S Hung, Ching C Lau, Stephen T C Wong.
Abstract
BACKGROUND: Subtypes are widely found in cancer. They are characterized by different clinical and molecular profiles, such as survival rates, gene signatures and copy number aberrations (CNAs). While cancer is generally believed to be caused by genetic aberrations, the number of such events in cancer tissue is tremendous and only a small subset of them may be tumorigenic. On the other hand, the gene expression signature of a subtype represents residuals of the subtype-specific cancer mechanisms. Using high-throughput data to link these factors, in order to define subtype boundaries and identify subtype-specific drivers, is a promising yet largely unexplored topic.
Year: 2013 PMID: 24564171 PMCID: PMC3820164 DOI: 10.1186/1471-2105-14-S18-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A hypothetical diagram depicting the role of CNAs in cancer, subtypes and their relations with signatures. While most CNAs are assumed to affect only the expression of the affected genes and to cause limited harm, a few CNAs, together with other drivers such as mutations, may perturb subtype-specific pathways and trigger subtypes of cancer. This in turn leads to the observation of a set of signature genes.
Figure 2. Approach overview. Gene expression microarray measurements are used for training the class labels and detecting signature genes. The copy number profiles (by SNP arrays) of the corresponding samples undergo candidate pre-selection by GISTIC. A subtype-specific method identifies the top signature-related candidates.
Figure 3. Convergence of the iterative subtyping algorithm. A-C, The convergence properties of the subtyping algorithm on the three datasets, Cho73 (A), Northcott90 (B) and Kool62 (C), each with various subtype-specific FCTs. D-F, The convergence properties on the same three medulloblastoma datasets, each with various initial numbers of clusters ranging from 2 to 20. An FCT of 1.0 is used in all experiments in D-F.
Figure 4. The subtypes and signatures as the algorithm converges. The numbers of cases and signatures (in brackets) in each good (and core) cluster as the algorithm converges (in 10 iterations) on the three datasets, Cho73 (A), Northcott90 (C) and Kool62 (E). The cases are ordered according to the subtyping of the last (converged) iteration. All experiments were started with 20 initial clusters, with FCT set to 1.0 and ΔK = 1. The converged dendrograms of the corresponding datasets are shown in B, D and F, respectively.
Cross-dataset comparisons of converged subtype signatures.
| Datasets and subtypes | Northcott90 A | Northcott90 B | Northcott90 C | Northcott90 (# o.v.g.) | Kool62 A | Kool62 B | Kool62 C | Kool62 (# o.v.g.) | # s.g. (# n.s.g., %) |
|---|---|---|---|---|---|---|---|---|---|
| Cho73 | | | | | | | | | |
| A | 136 | 0 | 1 | | 207 | 2 | 6 | | 345 (128, 37.1) |
| B | 4 | 64 | 3 | | 6 | 105 | 6 | | 222 (56, 25.2) |
| C | 0 | 1 | 37 | | 3 | 5 | 66 | | 110 (60, 54.5) |
| (# o.v.g.) | | | | (237) | | | | (378) | |
| Northcott90 | | | | | | | | | |
| A | | | | | 260 | 1 | 2 | | 377 (97, 25.8) |
| B | | | | | 1 | 110 | 2 | | 175 (45, 25.7) |
| C | | | | | 1 | 0 | 81 | | 150 (77, 51.3) |
| (# o.v.g.) | | | | | | | | (451) | |
| # s.g. | 377 | 175 | 150 | | 757 | 335 | 307 | | |
| (# n.s.g.) | (97) | (45) | (77) | | (219) | (126) | (172) | | |
| (%) | | | | | (28.9) | (37.6) | (56.0) | | |
The datasets have different numbers of probe-sets and were normalized separately, leading to different numbers of signature genes for different datasets of a subtype. o.v.g.=overlapping genes; s.g.=signature genes; n.s.g.=negative signature genes.
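As a quick consistency check on the table above, the bracketed overlap totals appear to equal the sum of the counts where the two datasets agree on the subtype, and each percentage is the share of negative signature genes among all signature genes. The following is a minimal Python sketch using only numbers from the table; the function and variable names are illustrative assumptions, not taken from the paper.

```python
# Overlapping signature genes between Cho73 subtypes (rows A, B, C) and
# Northcott90 subtypes (columns A, B, C), copied from the table above.
cho73_vs_northcott90 = [
    [136, 0, 1],
    [4, 64, 3],
    [0, 1, 37],
]


def diagonal_total(matrix):
    """Sum the counts where the row subtype and column subtype agree."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def negative_share(n_negative, n_signature):
    """Percentage of signature genes that are negative signature genes."""
    return 100.0 * n_negative / n_signature


print(diagonal_total(cho73_vs_northcott90))   # 237
print(round(negative_share(128, 345), 1))     # 37.1 (Cho73, subtype A)
```

The same two functions reproduce the other bracketed totals and percentages when fed the corresponding rows.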
Cross-dataset validations of the subtypes.
| Training sets and self-trained subtypes (rows) vs. testing sets (columns) | Cho73 A | Cho73 B | Cho73 C | Cho73 (accu.) | Northcott90 A | Northcott90 B | Northcott90 C | Northcott90 (accu.) | Kool62 A | Kool62 B | Kool62 C | Kool62 (accu.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cho73 | | | | | | | | (98.9%) | | | | (96.7%) |
| A | | | | | 7 | 0 | 0 | | 9 | 0 | 0 | |
| B | | | | | 0 | 27 | 0 | | 0 | 13 | 0 | |
| C | | | | | 0 | 1 | 55 | | 0 | 2 | 38 | |
| Northcott90 | | | | (97.3%) | | | | | | | | (100%) |
| A | 8 | 0 | 0 | | | | | | 9 | 0 | 0 | |
| B | 0 | 17 | 2 | | | | | | 0 | 15 | 0 | |
| C | 0 | 0 | 46 | | | | | | 0 | 0 | 38 | |
| Kool62 | | | | (100%) | | | | (100%) | | | | |
| A | 8 | 0 | 0 | | 7 | 0 | 0 | | | | | |
| B | 0 | 17 | 0 | | 0 | 28 | 0 | | | | | |
| C | 0 | 0 | 48 | | 0 | 0 | 55 | | | | | |
This table shows the numbers of cases in each testing set predicted to be of a subtype trained by the training set (first column). For example, the "2" in the testing set Kool62 indicates that 2 cases were self-trained to be of subtype B, but predicted to be of subtype C by the Cho73 dataset. accu.=accuracy.
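The accuracies in brackets can be recomputed from the case counts as the fraction of cases whose predicted subtype matches the self-trained subtype. The sketch below is a hedged illustration in Python; it assumes the Cho73 training rows list counts for the Northcott90 and Kool62 testing sets in that order, and the names used are hypothetical.

```python
# Rows: subtype predicted by the Cho73-trained classifier (A, B, C).
# Columns: self-trained subtype within the testing set (A, B, C).
cho73_on_northcott90 = [
    [7, 0, 0],
    [0, 27, 0],
    [0, 1, 55],
]
cho73_on_kool62 = [
    [9, 0, 0],
    [0, 13, 0],
    [0, 2, 38],
]


def accuracy(confusion):
    """Fraction of cases on the diagonal (prediction agrees with self-training)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


print(f"{100 * accuracy(cho73_on_northcott90):.1f}%")  # 98.9% (89 of 90 cases)
print(f"{100 * accuracy(cho73_on_kool62):.1f}%")       # 96.8% (60 of 62 cases)
```

The second value rounds to 96.8%, whereas the table reports 96.7%; the difference is consistent with truncation rather than rounding of 60/62.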
Figure 5. Examples of key signature genes. Examples of gene signatures for individual subtypes in Kool62. A, B, C: example key signature genes in Subtypes A, B and C, respectively. Within each plot, the three boxplots show the expression of a gene in the three subtypes, respectively. Normal refers to normal cases. Also shown are the original data points, with a small amount of jitter on the horizontal axis. FC: fold change. Genes are ranked by LIMMA adjusted p-values.
The subtype signatures of the non-WNT/non-SHH (NWS) subtypes reported by previous studies.
| Datasets and original labels | Subtype WNT (# s.g.) | Subtype SHH (# s.g.) | NWS subtypes (# s.g.) |
|---|---|---|---|
| Cho73 | | | |
| 1 | 313 | 243 | 34 * |
| 2 5 7 | 342 | 237 | 45 *, ‡, ¶ |
| 4 | 366 | 222 | 75 †, ‡, ¶ |
| 5 | 339 | 244 | 93 †, ¶ |
| 7 | 330 | 245 | 6 |
| (Total overlaps) | (203) | (127) | (1) |
| (Selective overlaps) | | | (* 7, † 34, ‡ 25, ¶ 12) |
| Northcott90 | | | |
| Group C | 330 | 179 | 170 |
| Group D | 396 | 181 | 186 |
| (Total overlaps) | (253) | (112) | (76) |
| Kool62 | | | |
| Subtype C | 773 | 287 | 284 *, ‡ |
| Subtype D | 738 | 299 | 237 *, †, ‡ |
| Subtype E | 657 | 268 | 105 † |
| (Total overlaps) | (498) | (167) | (46) |
| (Selective overlaps) | | | (* 158, † 66, ‡ 53) |
Each row gives the numbers of signature genes for the WNT, SHH and one of the NWS subtypes (the label of which is shown in the first column of each row), respectively. The total overlaps refer to the intersection of all signatures of a group in each dataset, e.g., $|S_{\text{Subtype C}} \cap S_{\text{Subtype D}} \cap S_{\text{Subtype E}}| = 46$. Selective overlaps refer to the intersection of the signatures marked with the same symbol. The difference between this table and Table S1 is that in Table S1 all subtypes are trained together, while in this table only one of the reported NWS subtypes is trained at a time, together with the WNT, SHH and normal cases, for detection of signatures. s.g.=signature genes.
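The total and selective overlaps described above are plain set intersections. The following Python sketch illustrates the idea on toy data; the gene identifiers (`g1`, `g2`, ...) and the function names are placeholders chosen for illustration and do not correspond to genes or code from the study.

```python
from functools import reduce


def total_overlap(signatures):
    """Intersect the signature gene sets of every subtype in a group."""
    return reduce(set.intersection, signatures.values())


def selective_overlap(signatures, labels):
    """Intersect only the signatures of the subtypes named in `labels`."""
    return reduce(set.intersection, (signatures[label] for label in labels))


# Hypothetical toy signatures; real signatures contain hundreds of genes.
signatures = {
    "Subtype C": {"g1", "g2", "g3", "g4"},
    "Subtype D": {"g2", "g3", "g4", "g5"},
    "Subtype E": {"g3", "g4", "g6"},
}

print(len(total_overlap(signatures)))                                   # 2
print(len(selective_overlap(signatures, ["Subtype C", "Subtype D"])))   # 3
```

In the table, subtypes sharing a symbol would play the role of `labels` in `selective_overlap`.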
Figure 6. Subtype-specific GISTIC landscapes of the Cho73 dataset. The three subplots correspond to the GISTIC copy number landscapes of the three subtypes of the Cho73 dataset, respectively. A, Subtype A (WNT); B, Subtype B (SHH); C, Subtype C (NWS). In each subplot, the upper panel (red) corresponds to the recurrent copy number gains, while the lower panel (blue) corresponds to the recurrent copy number losses. The numbers to the left of each panel are the G-scores. The numbers to the right of each panel are the −log10 q-values. The green lines mark the q-value threshold of 0.25 (or −log10 q = 0.602). The numbers (1 to 22) between the panels refer to the autosomes.
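The two forms of the threshold quoted in the caption are related by a base-10 logarithm. A short Python check, offered as an illustration only (the helper name `passes_threshold` is an assumption, not part of GISTIC):

```python
import math

Q_THRESHOLD = 0.25  # q-value threshold stated in the caption

# Convert the threshold to the -log10 scale used on the right-hand axis.
neg_log10_threshold = -math.log10(Q_THRESHOLD)
print(round(neg_log10_threshold, 3))  # 0.602


def passes_threshold(q_value, threshold=Q_THRESHOLD):
    """True when a region's q-value is at or below the threshold, i.e. its
    -log10 q lies at or beyond the green line."""
    return q_value <= threshold


print(passes_threshold(0.01))  # True
print(passes_threshold(0.5))   # False
```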
Candidate CNA drivers within each subtype.
| Subtypes | Significant candidate drivers |
|---|---|
| Subtype A (WNT) | |
| Subtype B (SHH) | |
| Subtype C (NWS) | |