Trung Nghia Vu, Setia Pramana, Stefano Calza, Chen Suo, Donghwan Lee, Yudi Pawitan.
Abstract
Molecular classification of breast cancer into clinically relevant subtypes helps improve prognosis and adjuvant-treatment decisions. The aim of this study is to provide a better characterization of the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms including coding, long non-coding RNA and microRNA transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequence data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then search the whole transcriptome systematically for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms specific to single subtypes. A total of 27% of the subtype-specific isoforms have better accuracy in classifying the intrinsic subtypes than that of their corresponding genes. We find three subtype-specific miRNA and 707 subtype-specific long non-coding RNAs. The isoforms from long non-coding RNAs also show high performance for separation between Luminal A and Luminal B subtypes with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both Normal-like and Basal subtypes, which are commonly considered to have distinct ER-receptor status. Finally, analyses at protein level reveal four subtype-specific proteins and two subtype co-expression proteins that successfully validate results from the isoform level.
Keywords: RNA sequencing; breast cancer; non-coding RNAs; subtype co-expression; subtype-specific isoforms
Year: 2016 PMID: 27634900 PMCID: PMC5356595 DOI: 10.18632/oncotarget.11998
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Pipeline of systematic identification of subtype-specific isoforms and subtype co-expression isoforms from breast cancer TCGA RNA-seq data
T1 is a statistic that compares a single subtype against all other subtypes. T2 is a statistic that compares those other subtypes with one another.
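To make the two comparisons concrete, here is a minimal sketch for a single isoform. It is not the authors' robust quasi-Poisson statistic: as a stand-in it uses Welch t statistics on log-scale expression, and the function names `welch_t` and `one_vs_rest_stats`, the toy data, and the choice of "largest pairwise difference" for the second statistic are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two samples (stand-in, not the quasi-Poisson model)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def one_vs_rest_stats(log_expr, labels, target):
    """Return (t1, t2) for one isoform.

    t1: target subtype against all other samples pooled together.
    t2: largest absolute pairwise statistic among the non-target subtypes
        (an assumed way of comparing the other subtypes to each other).
    """
    groups = {}
    for value, label in zip(log_expr, labels):
        groups.setdefault(label, []).append(value)
    target_vals = groups.pop(target)
    rest_vals = [v for vals in groups.values() for v in vals]
    t1 = welch_t(target_vals, rest_vals)
    names = sorted(groups)
    t2 = max(
        (abs(welch_t(groups[a], groups[b]))
         for i, a in enumerate(names) for b in names[i + 1:]),
        default=0.0,
    )
    return t1, t2


# Toy data (hypothetical): high in Basal, flat in the two other subtypes.
labels = ["Basal"] * 3 + ["Her2"] * 3 + ["LumA"] * 3
log_expr = [5.0, 5.2, 4.8, 1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
t1, t2 = one_vs_rest_stats(log_expr, labels, "Basal")
```

Under this reading, an isoform looks specific to the target subtype when the first statistic is large in magnitude while the second stays small, meaning the remaining subtypes do not differ much among themselves.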
Figure 2. Isoform-level expression distribution of the ESR1 gene across 5 molecular subtypes
X-axis labels are transcript IDs. The figure shows that ESR1 expression comes mostly from three isoforms, NM_000125, NM_001122740 and NM_001291241, which have similar expression patterns.
Figure 3. Boxplots of isoforms of gene AGTR1
Three isoforms of this gene are over-expressed primarily in the Luminal A and Normal-like subtypes, but not in the Luminal B subtype.
Figure 4. Isoform NM_024792 of the FAM57A gene is specific for the Basal subtype
Top 5 subtype-specific isoforms for each subtype
| Subtype | mRNA isoforms | lncRNA isoforms |
|---|---|---|
| Basal | | |
| Her2 | | |
| Luminal A | | |
| Luminal B | | |
| Normal-like | | |
Median AUC of the top 5 subtype-specific isoforms in the discovery and validation sets
| Subtype | mRNA, discovery | mRNA, validation | lncRNA, discovery | lncRNA, validation |
|---|---|---|---|---|
| Basal | 0.93 | 0.88 | 0.88 | 0.87 |
| Her2 | 0.87 | 0.81 | 0.78 | 0.65 |
| Luminal A | 0.76 | 0.72 | 0.75 | 0.75 |
| Luminal B | 0.78 | 0.72 | 0.75 | 0.68 |
| Normal-like | 0.96 | 0.92 | 0.96 | 0.90 |
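As an illustration of how a median AUC over a handful of isoforms could be computed, the sketch below scores each isoform by its one-vs-rest AUC and then takes the median. The AUC is computed with the standard pairwise (Mann-Whitney) definition; the function names and the toy expression values are hypothetical and are not taken from the study.

```python
from statistics import median


def auc_one_vs_rest(scores, is_target):
    """AUC as the probability that a target-subtype sample outscores a
    non-target sample, with ties counted as one half."""
    pos = [s for s, t in zip(scores, is_target) if t]
    neg = [s for s, t in zip(scores, is_target) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc(isoform_scores, is_target):
    """Median of the per-isoform AUCs for one subtype."""
    return median(auc_one_vs_rest(s, is_target) for s in isoform_scores)


# Toy example (hypothetical values): 2 target samples, 3 non-target samples.
is_target = [True, True, False, False, False]
isoforms = [
    [9, 8, 1, 2, 3],  # separates the target subtype perfectly
    [5, 1, 2, 3, 4],  # no better than chance
    [4, 3, 1, 2, 5],  # partial separation
]
result = median_auc(isoforms, is_target)
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based formulation gives the same value more efficiently on large cohorts.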
Figure 5. Color-map of the top 125 subtype-specific isoforms (25 from each subtype) from (a) the discovery and (b) validation sets
Red and green indicate expression levels above and below median, respectively. The isoforms in each subtype are ordered by AUC from bottom to top and right to left.
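The color coding can be sketched as a simple median split. This assumes the median is taken per isoform across samples, which the caption does not state; the function name `median_split` and the "neutral" label for values equal to the median are illustrative choices.

```python
from statistics import median


def median_split(values):
    """Label each sample red (above the median) or green (below it).

    Assumption: the median is computed per isoform across all samples.
    Values exactly at the median get a neutral label.
    """
    m = median(values)
    return ["red" if v > m else "green" if v < m else "neutral" for v in values]


# Hypothetical expression values of one isoform across five samples.
colors = median_split([1, 2, 3, 4, 5])
```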
Figure 6. Separating Luminal A and Luminal B subtypes: ROC curves for the top 5 and 74 isoforms in the discovery and validation sets
Figure 7. Expression at protein level (a) and isoform level (b) of the G6PD gene
Isoform NM_001042351 and the protein are specific to the Her2 subtype.