Bojana Jovanović, Quanhu Sheng, Robert S Seitz, Kasey D Lawrence, Stephan W Morris, Lance R Thomas, David R Hout, Brock L Schweitzer, Yan Guo, Jennifer A Pietenpol, Brian D Lehmann.
Abstract
BACKGROUND: Triple negative breast cancer (TNBC) is a heterogeneous disease that lacks unifying molecular alterations that can guide therapy decisions. We previously identified distinct molecular subtypes of TNBC (TNBCtype) using gene expression data generated on a microarray platform using frozen tumor specimens. Tumors and cell lines representing the identified subtypes have distinct enrichment in biologically relevant transcripts with differing sensitivity to standard chemotherapies and targeted agents. Since our initial discoveries, RNA-sequencing (RNA-seq) has evolved as a sensitive and quantitative tool to measure transcript abundance.
Keywords: Formalin-fixed paraffin embedded; Fresh-frozen; RNA-seq; TNBCtype
Year: 2017 PMID: 28376728 PMCID: PMC5379658 DOI: 10.1186/s12885-017-3237-1
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Fig. 1 TNBC molecular subtype concordance between matched FF and FFPE samples processed on microarray and RNA-seq improves with increased prediction confidence. a Scatterplot shows TNBC subtype accuracy between microarray and RNA-seq as a function of prediction confidence in the TCGA breast (BRCA) cohort. b Plot shows RNA-seq prediction accuracy by confidence score. The vertical line marks the prediction confidence score that yields 95% concordance between platforms. c Scatterplot shows the concordance between microarray and RNA-seq platforms by strength of correlation to a subtype (prediction score)
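The cutoff described in panel b can be illustrated with a minimal Python sketch. It assumes each sample carries one confidence score and two subtype calls (one per platform); the function names, the toy data, and the choice to scan only observed confidence values are illustrative assumptions, not the authors' procedure.

```python
def concordance_at_cutoff(calls, cutoff):
    """Return (n_kept, concordance) for samples with confidence >= cutoff.

    calls: list of (confidence, call_platform_a, call_platform_b) tuples.
    Concordance is None when no sample passes the cutoff.
    """
    kept = [(a, b) for conf, a, b in calls if conf >= cutoff]
    if not kept:
        return 0, None
    agree = sum(1 for a, b in kept if a == b)
    return len(kept), agree / len(kept)


def min_cutoff_for_concordance(calls, target=0.95):
    """Smallest observed confidence whose retained samples reach the target."""
    for cutoff in sorted({conf for conf, _, _ in calls}):
        n_kept, conc = concordance_at_cutoff(calls, cutoff)
        if conc is not None and conc >= target:
            return cutoff, n_kept, conc
    return None


# Hypothetical toy data: (confidence, microarray call, RNA-seq call).
calls = [
    (0.10, "BL1", "M"),
    (0.20, "IM", "IM"),
    (0.30, "BL2", "BL1"),
    (0.50, "M", "M"),
    (0.70, "LAR", "LAR"),
    (0.90, "BL1", "BL1"),
]

print(min_cutoff_for_concordance(calls))
```

On this toy input the two discordant calls sit at low confidence, so raising the cutoff trades retained samples for agreement, which is the pattern the caption describes.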
Fig. 2 MiSeq and HiSeq platform mapped read comparison from FF- and FFPE-derived RNA sequences. a Barplot depicts the percentage of mapped reads that are on-target or off-target (intronic and intergenic) for FF and FFPE samples processed on MiSeq and HiSeq platforms. b Beeswarm box plot shows mapped reads (%) from individual FF (blue) and FFPE (red) samples processed on the HiSeq
Fig. 3 FF and FFPE transcript correlation improves with increased sequencing depth. Density plots show the pairwise Spearman correlation of matched FF and FFPE samples for a all transcripts, b protein-coding transcripts or c TNBC centroid transcripts processed on the HiSeq or MiSeq platform
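For readers unfamiliar with the statistic in this caption, the following is a minimal stdlib-only sketch of a Spearman correlation between one matched FF/FFPE pair. The expression values are hypothetical, and the helper names are assumptions made for illustration; the authors' actual pipeline is not described in this record.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for five transcripts in one matched pair.
ff = [5.0, 3.0, 8.0, 1.0, 9.0]
ffpe = [50.0, 20.0, 90.0, 10.0, 70.0]
print(round(spearman(ff, ffpe), 3))
```

Because the statistic depends only on ranks, it is insensitive to the scale differences that might be expected between FF and FFPE libraries, which may be why a rank-based measure suits this comparison.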
Fig. 4 Removal of differential transcripts improves FF and FFPE gene expression correlation. Heatmaps display unsupervised hierarchical clustering of a sample-wise correlation coefficients, b all transcripts (n = 27,577) or c principal component analysis (PCA) of all transcripts. Following removal of differentially expressed transcripts between FF and FFPE samples, remaining transcripts (n = 15,624) were used to perform d sample-wise correlation coefficients, e hierarchical clustering or f PCA. Underlined samples indicate clustering of paired FF and FFPE samples
Differential transcript analysis (HiSeq)
| Transcript Type | All Transcripts | Differential Transcripts | % of All | FFPE | % of Differential | FF | % of Differential |
|---|---|---|---|---|---|---|---|
| rRNA | 44 | 42 | (95.5%) | 40 | (95.2%) | 2 | (4.8%) |
| Misc RNA | 437 | 412 | (94.3%) | 412 | (100.0%) | 0 | (0.0%) |
| snoRNA | 326 | 307 | (94.2%) | 306 | (99.7%) | 1 | (0.3%) |
| snRNA | 410 | 383 | (93.4%) | 383 | (100.0%) | 0 | (0.0%) |
| Sense intronic | 513 | 449 | (87.5%) | 448 | (99.8%) | 1 | (0.2%) |
| 3′ overlapping ncRNA | 8 | 7 | (87.5%) | 7 | (100.0%) | 0 | (0.0%) |
| miRNA | 232 | 194 | (83.6%) | 192 | (99.0%) | 2 | (1.0%) |
| mt RNA | 10 | 7 | (70.0%) | 4 | (57.1%) | 3 | (42.9%) |
| Pseudogene | 3661 | 2504 | (68.4%) | 2348 | (93.8%) | 156 | (6.2%) |
| Antisense | 2881 | 1949 | (67.7%) | 1906 | (97.8%) | 43 | (2.2%) |
| Sense overlapping | 116 | 63 | (54.3%) | 63 | (100.0%) | 0 | (0.0%) |
| lincRNA | 1992 | 1041 | (52.3%) | 1009 | (96.9%) | 32 | (3.1%) |
| Processed transcript | 300 | 138 | (46.0%) | 129 | (93.5%) | 9 | (6.5%) |
| Polymorphic pseudogene | 17 | 7 | (41.2%) | 3 | (42.9%) | 4 | (57.1%) |
| Protein coding | 16,630 | 4450 | (26.8%) | 2112 | (47.5%) | 2338 | (52.5%) |
| Total | 27,577 | 11,953 | (43.3%) | 9362 | (78.3%) | 2591 | (21.7%) |
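
The percentages in the table above can be reproduced from the counts: the differential percentage is taken relative to all transcripts of that type, while the FFPE and FF percentages are taken relative to the differential transcripts. A short sketch using three rows copied from the table (the function and key names are illustrative assumptions):

```python
# (all transcripts, differential, FFPE, FF) counts copied from the table.
rows = {
    "rRNA": (44, 42, 40, 2),
    "Protein coding": (16630, 4450, 2112, 2338),
    "Total": (27577, 11953, 9362, 2591),
}


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * part / whole, 1)


def summarize(all_n, diff_n, ffpe_n, ff_n):
    return {
        "differential_pct": pct(diff_n, all_n),  # relative to all transcripts
        "ffpe_pct": pct(ffpe_n, diff_n),         # relative to differential
        "ff_pct": pct(ff_n, diff_n),             # relative to differential
        "retained": all_n - diff_n,              # left after filtering
    }


for name, counts in rows.items():
    print(name, summarize(*counts))
```

For the Total row, the retained count is 27,577 − 11,953 = 15,624, which matches the number of transcripts used after filtering in Fig. 4.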
Differential Pathway Enrichment in FF samples
| Gene Set Name | # Genes Overlap | p-value | FDR |
|---|---|---|---|
| Cellular Compartment C5 | |||
| Cytoplasm | 391 | 1.77E-128 | 4.13E-126 |
| Cytoplasmic part | 265 | 8.87E-90 | 1.03E-87 |
| Organelle part | 228 | 2.04E-76 | 1.59E-74 |
| Intracellular organelle part | 227 | 4.69E-76 | 2.73E-74 |
| Nucleus | 243 | 3.58E-71 | 1.67E-69 |
| Macromolecular complex | 193 | 9.82E-70 | 3.81E-68 |
| Mitochondrion | 101 | 3.55E-52 | 1.18E-50 |
| Protein complex | 150 | 3.96E-48 | 1.15E-46 |
| Membrane | 240 | 1.92E-42 | 4.97E-41 |
| Mitochondrial part | 55 | 4.45E-36 | 1.04E-34 |
| Canonical Pathways C2 | |||
| Huntingtons disease | 71 | 7.32E-46 | 6.52E-43 |
| TCA cycle and electron transport | 63 | 9.81E-46 | 6.52E-43 |
| Alzheimers disease | 67 | 1.96E-44 | 6.53E-42 |
| Adaptive immune system | 116 | 1.96E-44 | 6.53E-42 |
| Immune system | 155 | 5.67E-44 | 1.51E-41 |
| Oxidative phosphorylation | 60 | 1.88E-43 | 4.17E-41 |
| Parkinsons disease | 59 | 1.08E-42 | 2.06E-40 |
| Metabolism of proteins | 111 | 2.29E-42 | 3.81E-40 |
| Respiratory electron transport | 50 | 3.40E-40 | 5.02E-38 |
| Metabolism of RNA | 83 | 2.13E-37 | 2.83E-35 |
Differential Pathway Enrichment in FFPE samples
| Gene Set Name | # Genes Overlap | p-value | FDR |
|---|---|---|---|
| Cellular Compartment C5 | |||
| Plasma membrane | 197 | 6.62E-61 | 1.03E-57 |
| Membrane | 234 | 3.30E-59 | 2.58E-56 |
| Plasma membrane part | 173 | 2.36E-58 | 1.23E-55 |
| Membrane part | 202 | 5.71E-53 | 2.23E-50 |
| Intrinsic to membrane | 177 | 2.20E-51 | 6.87E-49 |
| Intrinsic to plasma membrane | 150 | 2.82E-51 | 7.35E-49 |
| Integral to membrane | 173 | 1.24E-49 | 2.78E-47 |
| Integral to plasma membrane | 146 | 3.29E-49 | 6.43E-47 |
| Neuroactive ligand receptor | 55 | 1.12E-25 | 1.95E-23 |
| Canonical Pathways C2 | |||
| Naba matrisome | 97 | 6.19E-18 | 9.67E-16 |
| Neuroactive ligand receptor | 55 | 1.12E-25 | 1.49E-22 |
| Matrisome | 97 | 6.19E-18 | 4.11E-15 |
| Signaling by GPCR | 85 | 2.44E-15 | 9.19E-13 |
| GPCR downstream signaling | 78 | 2.76E-15 | 9.19E-13 |
| Neuronal system | 42 | 4.12E-15 | 1.10E-12 |
| Transmembrane transport | 51 | 1.89E-14 | 4.20E-12 |
| Matrisome associated | 71 | 1.87E-13 | 3.55E-11 |
| Calcium signaling pathway | 31 | 3.00E-13 | 4.98E-11 |
| Transmission chemical synapses | 28 | 1.48E-10 | 2.18E-08 |
| GPCR ligand binding | 43 | 3.61E-10 | 4.80E-08 |
Fig. 5 Differential transcripts are enriched for longer transcripts in FFPE compared to FF samples. a Boxplot shows transcript length (log10 bp) distribution for all protein-coding transcripts (n = 16,630), non-differential transcripts (n = 4450), transcripts enriched in FF (n = 2338) or FFPE (n = 2112). b Beeswarm boxplot shows the distribution of length for individual protein-coding transcripts enriched in FF or FFPE. Line graphs show c TTN and d SYNE1 exon-level expression (count) along the transcript in paired FF and FFPE samples
Fig. 6 Accuracy of TNBC subtype calls between FF and FFPE depends on prediction confidence and sequencing depth. a Table summarizes TNBC subtype correlations, prediction calls, prediction confidence and concordance between matched FF and FFPE samples processed on the Illumina HiSeq and MiSeq. b Scatterplots show concordance (blue) between FF and FFPE samples run on HiSeq and MiSeq as a function of prediction confidence. c Scatterplots show the prediction confidence and prediction accuracy for FFPE (left) or FF (right) samples processed on the HiSeq (top) or MiSeq (bottom). d Scatterplots show prediction confidence and prediction strength for FFPE and FF samples processed on the HiSeq and MiSeq. Samples with concordant subtype calls are indicated in blue and discordant calls in red