Tingting Jiang, Weiwei Shi, René Natowicz, Sophia N Ononye, Vikram B Wali, Yuval Kluger, Lajos Pusztai, Christos Hatzis.
Abstract
BACKGROUND: Molecular heterogeneity of tumors suggests the presence of multiple different subclones that may limit response to targeted therapies and contribute to acquisition of drug resistance, but its quantification has remained challenging.
Year: 2014 PMID: 25294321 PMCID: PMC4197225 DOI: 10.1186/1471-2164-15-876
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Assessment of different transcriptional diversity metrics in simulated datasets. A) Simulated gene expression profiles generated using a hierarchical model to independently control within-sample (σ_w) and between-sample (σ_b) transcriptional variation and the number of latent subgroups. Each profile consists of 50 genes (rows) and 40 samples (columns). Profiles for 1, 2, 4 and 40 latent subgroups are shown for low (σ_b/σ_w = 0.5/1.5) and high (σ_b/σ_w = 0.5/0.5) relative between-to-within sample variation. B) Transcriptional diversity within the simulated profiles assessed using the mean pairwise Pearson distance, the mean pairwise cosine distance, or the mean dispersion distance. The boxplots represent the distributions of these metrics obtained from 500 independent simulations of each dataset (blue: low σ_b/σ_w; green: high σ_b/σ_w). C) Same metrics as in B, assessed in a dataset of 40 samples with two latent subgroups and an increasing proportion of the smaller subgroup. Boxplots represent the distribution over 500 independent simulations.
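The two correlation-based metrics named in panel B can be sketched as follows. This is a minimal illustration, assuming that "Pearson distance" and "cosine distance" mean one minus the corresponding similarity between two sample columns; the generator `toy_profiles` and its `noise_sd` parameter are hypothetical stand-ins and do not reproduce the hierarchical model of panel A. The mean dispersion distance is not defined in this excerpt, so it is left out.

```python
import numpy as np

def toy_profiles(n_genes=50, n_samples=40, n_groups=2, noise_sd=0.5, seed=0):
    """Toy genes x samples matrix: each sample is its subgroup centre plus noise.

    Hypothetical generator for illustration only; not the paper's model.
    """
    rng = np.random.default_rng(seed)
    centres = rng.normal(0.0, 1.0, size=(n_genes, n_groups))
    labels = np.arange(n_samples) % n_groups  # near-equal subgroup sizes
    noise = rng.normal(0.0, noise_sd, size=(n_genes, n_samples))
    return centres[:, labels] + noise

def mean_pairwise_distance(X, metric="pearson"):
    """Mean of (1 - similarity) over all pairs of samples (columns of X)."""
    X = np.asarray(X, dtype=float)
    if metric == "pearson":
        # Pearson correlation is the cosine similarity of centred columns.
        X = X - X.mean(axis=0, keepdims=True)
    elif metric != "cosine":
        raise ValueError("metric must be 'pearson' or 'cosine'")
    unit = X / np.linalg.norm(X, axis=0, keepdims=True)
    sim = unit.T @ unit                        # samples x samples similarity
    upper = np.triu_indices(sim.shape[0], k=1)  # each unordered pair once
    return float(np.mean(1.0 - sim[upper]))

if __name__ == "__main__":
    for k in (1, 2, 4, 40):
        X = toy_profiles(n_groups=k)
        print(k,
              round(mean_pairwise_distance(X, "pearson"), 3),
              round(mean_pairwise_distance(X, "cosine"), 3))
```

Under these assumptions both distances grow as the number of latent subgroups increases, which is the qualitative behaviour the boxplots in panel B are meant to probe.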
Breast cancer TCGA datasets used in this study
| Data | TCGA file link | N |
|---|---|---|
| Tumor Information | | 466 |
| PAM50 Subtype | | 466 |
| Gene Expression-Level 3 | | 547 |
| Somatic Mutations | | 463 |
| Copy Number Alterations | | 466 |
| Gene Models (RefSeq) | UCSC table browser with track “RefSeq Genes” of genome version Feb. 2009 GRCh37/hg19 ( | |
Figure 2. Characterization of transcriptional and genomic heterogeneity of breast cancer subtypes. A) Transcriptional diversity of cancers within each subtype from the TCGA gene expression data [4] captured by the mean dispersion distance metric. B) Distributions of within and between patient standard deviations of gene expression levels for each subtype estimated from the TCGA gene expression data. C) Genomic heterogeneity of DNA copy number within each subtype estimated by the mean pairwise Hamming distance in the DNA copy number profiles of cancers from the TCGA dataset. D) Mutational heterogeneity of subtypes estimated by the mean pairwise Hamming distance between the somatic mutation profiles of cancers from the TCGA dataset. E) Transcriptional diversity of cancers from the Affymetrix U133A datasets assessed using the mean dispersion distance metric. F) Transcriptional diversity based on mean dispersion distance of basal-like tumors that achieved pathological complete response (pCR; n = 96) or had partial or no response (RD; n = 159) to preoperative chemotherapy. G) Estimated distributions of within and between patient standard deviations of gene expression within the pCR and RD basal-like phenotypes. H) Patient-patient pairwise correlation plots clustered to show substructure within the pCR and RD basal-like tumors. The scale for the Pearson correlation coefficient ranges from 0 (white) to 1 (blue). While chemo-sensitive tumors (pCR) show less structure, resistant tumors (RD) show a greater number of subgroups with relatively uniform gene expression. Boxplots represent the distribution of the corresponding metric obtained from 500 bootstrap resampling iterations of 100 cases from each subtype or subgroup.
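The combination of a mean pairwise Hamming distance (panels C and D) with bootstrap resampling of 100 cases over 500 iterations can be sketched as below. Several details are assumptions rather than statements of the authors' procedure: profiles are taken to be discrete vectors (for example 0/1 mutation calls), the Hamming distance is normalised by profile length, and cases are drawn with replacement. The function names are hypothetical.

```python
import numpy as np

def mean_pairwise_hamming(profiles):
    """Mean fraction of differing positions over all pairs of rows.

    `profiles` is a samples x features array of discrete calls.
    """
    P = np.asarray(profiles)
    n = P.shape[0]
    # Fraction of positions at which each pair of samples disagrees.
    diff = (P[:, None, :] != P[None, :, :]).mean(axis=2)
    upper = np.triu_indices(n, k=1)  # each unordered pair once
    return float(diff[upper].mean())

def bootstrap_metric(profiles, metric, n_iter=500, n_cases=100, seed=0):
    """Distribution of `metric` over resampled sets of cases.

    Sampling with replacement is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(profiles)
    out = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.integers(0, P.shape[0], size=n_cases)
        out[i] = metric(P[idx])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    calls = rng.integers(0, 2, size=(150, 30))  # synthetic 0/1 profiles
    dist = bootstrap_metric(calls, mean_pairwise_hamming, n_iter=50)
    print(dist.mean(), dist.std())
```

The resulting array of per-iteration values is what a boxplot of this kind would summarise for each subtype.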
Breast cancer Affymetrix U133A microarray datasets
| GEO dataset | N | Basal | Her2 | LumA | LumB | Normal | pCR (basal-like) | RD (basal-like) |
|---|---|---|---|---|---|---|---|---|
| GSE11121 | 200 | 23 | 18 | 110 | 24 | 25 | 0 | 0 |
| GSE20194 | 91 | 18 | 30 | 16 | 13 | 14 | 7 | 6 |
| GSE20271 | 81 | 19 | 15 | 22 | 9 | 16 | 4 | 15 |
| GSE2034 | 286 | 63 | 35 | 99 | 51 | 38 | 0 | 0 |
| GSE22093 | 96 | 44 | 14 | 16 | 12 | 10 | 18 | 24 |
| GSE25055 | 310 | 120 | 22 | 97 | 46 | 25 | 45 | 73 |
| GSE25065 | 198 | 68 | 15 | 62 | 34 | 19 | 22 | 41 |
| GSE7390 | 198 | 48 | 22 | 93 | 21 | 14 | 0 | 0 |
| Total | 1460 | 403 | 171 | 515 | 210 | 161 | 96 | 159 |

Basal, Her2, LumA, LumB and Normal are PAM50 subtypes; pCR and RD are the response groups among basal-like tumors.
Figure 3. Transcriptional diversity of 50 KEGG biological pathways within breast cancer subtypes from the Affymetrix dataset. A) Heatmap of mean dispersion distance within breast cancer subtypes, considering the genes in each of the 50 KEGG pathways. Pathways (rows) are ranked from the least diverse at the top to the most diverse at the bottom. Blue represents low and red high mean dispersion. B) Detailed expression heat maps for basal-like cancers showing the heterogeneity of gene expression for genes in the least heterogeneous pathway (ribosome metabolism; top) and the most heterogeneous pathway (linoleic acid metabolism; bottom). Blue represents low and red high expression level. C) Distribution of pathway-based transcriptional diversity within each subtype. Pathway-level mean dispersion distances were calculated by bootstrap as described in Supplementary Methods. D) Comparison of pathway-level transcriptional diversity between two clinically distinct phenotypes of basal-like cancers, one extremely chemosensitive (pCR) and one chemoresistant (RD). Points on the plot represent the average pathway-level mean pairwise dispersion obtained by bootstrap within each of the 50 pathways. The dashed red line is the diagonal, indicating equal transcriptional diversity between the two response phenotypes. The regression line (blue solid line) with its 95% pointwise confidence interval (grey area) is consistently below the diagonal, suggesting greater transcriptional diversity for RD cancers throughout the 50 pathways. Pathways that were extreme outliers from the trend described by the regression line were identified by quantile-quantile plots of the standardized residuals. These pathways are indicated with letters as follows: A – sphingolipid metabolism, B – SNARE interactions in vesicular transport, C – basal cell carcinoma, D – dorso-ventral axis formation, E – non-homologous end joining (DNA repair), F – folate biosynthesis.
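The outlier step in panel D can be illustrated with an ordinary least-squares fit and standardized residuals. This is a sketch under stated assumptions: the RD values are taken as the x-axis and the pCR values as the y-axis, the fit is a simple straight line, and a fixed cutoff `z_cut` (a hypothetical parameter) replaces the visual inspection of a quantile-quantile plot used in the caption.

```python
import numpy as np

def standardized_residuals(x, y):
    """Fit y = b0 + b1 * x by least squares; return (coefficients, residuals).

    Residuals are internally studentized: r_i / (s * sqrt(1 - h_i)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])      # design matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages: diagonal of the hat matrix A (A^T A)^-1 A^T.
    h = np.sum((A @ np.linalg.inv(A.T @ A)) * A, axis=1)
    s2 = resid @ resid / (len(x) - 2)              # residual variance
    return beta, resid / np.sqrt(s2 * (1.0 - h))

def flag_outliers(x, y, z_cut=2.5):
    """Indices whose standardized residual exceeds z_cut in absolute value."""
    _, z = standardized_residuals(x, y)
    return np.flatnonzero(np.abs(z) > z_cut)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rd = np.linspace(0.2, 1.0, 50)                 # synthetic pathway values
    pcr = 0.9 * rd - 0.02 + rng.normal(0.0, 0.01, size=50)
    pcr[10] += 0.3                                 # planted outlier
    print(flag_outliers(rd, pcr))
    print(np.mean(pcr < rd))                       # share of points below the diagonal
```

With real data, the flagged indices would map back to pathway names, and the share of points below the diagonal gives a quick numeric counterpart to the visual comparison against the dashed line.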