Claudia Cava, Alexandros Armaos, Benjamin Lang, Gian G Tartaglia, Isabella Castiglioni.
Abstract
Breast cancer (BC) is a heterogeneous disease classified into four main subtypes with different clinical outcomes, such as patient survival, prognosis, and relapse. Current genetic tests for the differential diagnosis of BC subtypes show poor reproducibility. Therefore, early and correct diagnosis of molecular subtypes remains one of the challenges in the clinic. In the present study, we identified differentially expressed genes, long non-coding RNAs and RNA-binding proteins (RBPs) for each BC subtype from a public dataset by applying bioinformatics algorithms. In addition, we investigated their interactions and proposed interacting biomarkers as a potential signature specific to each BC subtype. We found a network of only 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and RASSF7) for luminal A, a network of 21 RBPs and 53 genes for luminal B, a HER2-specific network of 14 RBPs and 30 genes, and a network of 54 RBPs and 302 genes for basal BC. We validated the signatures by using their expression levels in an independent dataset and evaluating their ability to classify the different molecular subtypes with a machine learning approach. Overall, we achieved good classification performance, with an accuracy >0.80. In addition, we found some interesting novel prognostic biomarkers, such as RASSF7 for luminal A, DCTPP1 for luminal B, DHRS11, KLC3, NAGS, and TMEM98 for HER2, and ABHD14A and ADSSL1 for basal. The findings could provide preliminary evidence to identify putative new prognostic biomarkers and therapeutic targets for individual breast cancer subtypes.
Year: 2022 PMID: 35027621 PMCID: PMC8758778 DOI: 10.1038/s41598-021-04664-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow of the computational approach. The computational method compares each breast cancer (BC) subtype with normal breast tissue. Differential expression analysis with quantile-adjusted conditional maximum likelihood was performed on The Cancer Genome Atlas (TCGA) data between each BC subtype and normal breast tissue. The analysis identified differentially expressed genes (DEGs), differentially expressed long non-coding RNAs (DE lncRNAs) and differentially expressed RNA-binding proteins (DE RBPs). lncRNAs and RBPs were defined according to LNCipedia and the study of Hentze et al., respectively. Furthermore, we evaluated the interactions among DEGs, DE lncRNAs and DE RBPs using the RNAct tool. The expression levels of interacting DEGs, DE lncRNAs and DE RBPs specific to each BC subtype were considered as biomarkers to classify BC subtypes with Support Vector Machine (SVM) classification, using an independent dataset from the Gene Expression Omnibus (GEO), GSE58212.
Differentially expressed long non-coding (LNC) RNAs in luminal A, luminal B, HER2 and basal breast cancer.
| LumA vs normal | | | LumB vs normal | | | HER2 vs normal | | | Basal vs normal | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LNC | lgFC | FDR | LNC | lgFC | FDR | LNC | lgFC | FDR | LNC | lgFC | FDR |
| HOTAIR | 2.4 | 2.6E-65 | SNHG5 | -1.2 | 2.4E-14 | SNHG8 | -1.2 | 1.5E-09 | XIST | -1.2 | 6.7E-12 |
| HCG11 | -1.1 | 6.4E-19 | PVT1 | 1.6 | 1.3E-25 | XIST | -1.0 | 9.3E-07 | UCA1 | 4.7 | 2E-156 |
| HPYR1 | 2.2 | 1.0E-53 | HCG11 | -1.5 | 2.1E-21 | SNHG5 | -1.1 | 5.0E-08 | HCP5 | 1.1 | 1.2E-10 |
| TSIX | -1.0 | 9.2E-17 | HOTAIR | 1.9 | 3.7E-36 | GAS5 | -1.0 | 4.5E-07 | MYCNOS | 3.0 | 1.5E-72 |
| EMX2OS | -1.7 | 3.9E-45 | HPYR1 | 2.5 | 5.3E-56 | MYCNOS | 4 | 3.4E-112 | HOTAIR | 2.1 | 1.6E-37 |
| MYCNOS | 1.5 | 7.5E-29 | DLEU2 | 1.8 | 3.6E-33 | HOTAIR | 2.7 | 4.3E-55 | SNHG3 | 1.2 | 2.8E-12 |
| DLEU2 | 1.3 | 2.0E-22 | MYCNOS | 1.7 | 2.2E-29 | HCG11 | -1.4 | 2.1E-11 | MIR155HG | 1.2 | 1.0E-13 |
| | | | LOH12CR2 | -1.0 | 3.0E-11 | UCA1 | 2.4 | 2.7E-42 | PART1 | 1.1 | 1.7E-11 |
| | | | EMX2OS | -2.4 | 1.6E-50 | EMX2OS | -2.2 | 6.6E-25 | DLEU2 | 1.7 | 1.2E-23 |
| | | | UCA1 | 1.1 | 1.7E-13 | LOH12CR2 | -1.0 | 4.8E-07 | EMX2OS | -2.0 | 2.3E-29 |
| | | | PART1 | -1.9 | 6.9E-35 | EGOT | -1.6 | 2.1E-14 | EGOT | -2.2 | 1.4E-33 |
lgFC, log-fold change; FDR, false discovery rate.
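The lgFC and FDR columns above are the usual outputs of a differential expression test followed by multiple-testing correction. As a minimal sketch of how such a filter can work, the Python below applies the Benjamini-Hochberg adjustment to raw p-values and keeps the features that pass both cut-offs. The function names, the toy inputs, and the cut-offs (`lfc_cutoff=1.0`, `fdr_cutoff=0.05`) are assumptions for illustration; the record does not state which correction or thresholds the authors applied.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(names, log_fc, pvalues, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep features with |log fold change| >= lfc_cutoff and FDR < fdr_cutoff.

    Both cut-offs are illustrative placeholders, not values from the paper.
    """
    fdr = benjamini_hochberg(pvalues)
    return [
        name
        for name, lfc, q in zip(names, log_fc, fdr)
        if abs(lfc) >= lfc_cutoff and q < fdr_cutoff
    ]


# Toy input: hypothetical feature names, fold changes and raw p-values.
toy_names = ["a", "b", "c", "d"]
toy_lfc = [2.4, -0.5, -1.1, 0.9]
toy_p = [0.01, 0.04, 0.03, 0.005]
selected = select_de(toy_names, toy_lfc, toy_p)
```

With these toy values, only the features that clear both the fold-change and the FDR cut-off are retained, which is why every entry in a table of this kind would have an absolute lgFC at or above the chosen threshold.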
Figure 2. Venn diagram showing (A) the differentially expressed long non-coding RNAs (lncRNAs) and (B) the differentially expressed RNA-binding proteins (RBPs) shared among breast cancer molecular subtypes.
Functional information on lncRNAs as reported in the Cancer LncRNA Census 2.0.
| lncRNA | GENCODE ID | Cancer function | Cancer type |
|---|---|---|---|
| DLEU2 | ENSG00000231607 | T | Blood; head_and_neck; liver |
| EGOT | ENSG00000235947 | T | |
| GAS5 | ENSG00000234741 | Both | |
| HCG11 | ENSG00000228223 | O | Liver |
| HCP5 | ENSG00000206337 | O | Thyroid;bone;cervical |
| HOTAIR | ENSG00000228630 | O | Cervical; ovarian; blood; colon; kidney; nasopharyngeal; lung; head_and_neck; |
| MIR155HG | ENSG00000234883 | N/A | Blood |
| MYCNOS-01 | ENSG00000233718 | O | Neuroepithelial |
| PART1 | ENSG00000152931 | O | Prostate; esophageal |
| PVT1 | ENSG00000249859 | Both | Ovarian; |
| SNHG3 | ENSG00000242125 | O | Colon |
| SNHG5 | ENSG00000203875 | O | Urothelial; colon; stomach |
| SNHG8 | ENSG00000269893 | O | Stomach; lung; liver |
| UCA1 | ENSG00000214049 | O | Urothelial; liver; |
| XIST | ENSG00000229807 | Both | Neuroepithelial; blood; lung; |
T, tumor suppressors; O, oncogenes.
The table shows the number of differentially expressed genes (DEGs), differentially expressed long non-coding RNAs (DE-lncRNAs) and differentially expressed RNA-binding proteins (DE-RBPs) for each comparison.
| Comparison | DEGs | DEGs ∩ DE-lncRNAs | DEGs ∩ DE-RBPs |
|---|---|---|---|
| 233 lumA vs 113 normal | 3199 | 7 | 99 |
| 103 lumB vs 113 normal | 4074 | 11 | 176 |
| 43 HER2 vs 113 normal | 4134 | 11 | 155 |
| 74 Basal vs 113 normal | 4181 | 11 | 179 |
| Tot (unique) | 5980 | 19 | 281 |
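The intersection columns and the "Tot (unique)" row in the table above are set operations: each subtype's DEG list is intersected with an annotation list, and the total counts every feature once even when it is differentially expressed in several subtypes. The sketch below illustrates this with Python sets. The function name `overlap_summary` and the toy identifiers are hypothetical and do not reproduce the TCGA data.

```python
def overlap_summary(degs_by_subtype, annotated):
    """Count annotated features among each subtype's DEGs and overall.

    degs_by_subtype: dict mapping subtype name -> set of DEG identifiers.
    annotated: set of identifiers belonging to one class (e.g. lncRNAs).
    Returns (per-subtype counts, count over the union of all subtypes).
    """
    per_subtype = {
        subtype: len(genes & annotated)
        for subtype, genes in degs_by_subtype.items()
    }
    # Union first, so features shared between subtypes are counted once.
    all_degs = set().union(*degs_by_subtype.values())
    return per_subtype, len(all_degs & annotated)


# Toy example with hypothetical identifiers (G* = genes, L* = lncRNAs).
toy_degs = {
    "lumA": {"G1", "G2", "L1"},
    "lumB": {"G2", "G3", "L1", "L2"},
}
toy_lncrnas = {"L1", "L2", "L3"}
counts, total_unique = overlap_summary(toy_degs, toy_lncrnas)
```

In the toy example the per-subtype counts are 1 and 2 but the unique total is 2, not 3, because `L1` appears in both subtypes. The same effect explains why the "Tot (unique)" row is smaller than the sum of the rows above it.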
Figure 3. The boxplots show the number of RNAs or RNA-binding proteins (RBPs) per protein or RNA, respectively, considering only direct interactions between differentially expressed RBPs and genes present in a single breast cancer subtype (t-test, * p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001).
List of potential prognostic subtype-specific RBPs/genes as obtained from The Human Protein Atlas.
| Subtype | Prognostic RBP | Prognostic gene |
|---|---|---|
| Luminal A | | RASSF7 |
| Luminal B | CSE1L, HOOK1, KIF1C, LSM1, MRPS23, MTDH, UTP18, YBX2, ZFP36L1 | ABHD12, ANAPC11, ASIP, CCDC107, CLYBL, COMMD3, COMMD5, CYC1, DCTPP1, DUSP22, EPB41L4B, ETS1, FDXR, GRINA, HYI, IL1R1, ITM2C, JDP2, LBH, LYPLA1, OLFM2, PHF20L1, PLEKHF1, SDC3, SYNGR2, TSEN54, UBE2L6 |
| HER2 | ALDH18A1, CXorf57, HSPD1, PRDX1, RPL22, RPL26, SBDS, SPATS2 | ARRDC1, CHPT1, CITED4, DHRS11, FBXO2, KLC3, NAGS, PKIG, PPP4R4, RAB40B, SMARCD3, TMEM98 |
| Basal | BTF3, BYSL, CCT6A, CORO1A, CPEB3, DKC1, EIF4B, ENO1, EPRS, FASN, GTPBP4, HDAC2, ILF2, IQCG, LARP4B, MAGOH, MAGOHB, MRPL15, NDRG1, NPM3, NSA2, RPS14, SNRPD1, UCHL5, YARS | ABCD4, ABHD14A, ACOT1, ACOT2, ACSF2, ADRA2C, ADSSL1, APBB2, ASNS, C2CD4D, CAMK2B, CAPG, CCDC103, CCDC28B, CCNG1, CD6, CD7, CD72, CD8B, CDC123, CGREF1, CKS1B, COQ10A, COTL1, CRADD, CROT, CTDSP1, CTSF, DOK3, EMID1, EPHX1, FAM117A, FAM172A, FAM189B, FAM24B, FAM26F, FOXQ1, FRAT1, FSTL3, FXYD7, GHDC, GJB3, GJC1, GNB4, GNMT, GPX4, GRP, GSC, GSTM3, HHEX, HLA-B, HLA-DQB1, HLA-DRB1, HLA-F, HLA-G, HSD17B8, HTATIP2, HYOU1, ICAM1, IFITM2, IL12RB1, IL2RG, IL32, IRAK1, ISG20, KRTCAP3, LCK, LRRC14B, LYPD5, MAP4K1, MDFI, MORN1, MTHFD1L, NDUFB9, NFKBIE, NFKBIZ, OPN3, PCDHB2, PDK3, PEX11G, PGPEP1, PHF7, PLA2G7, PSMB8, PSMG1, PTS, PUSL1, PVR, PXMP4, QPCT, RBM43, RBP1, REEP5, RELB, RHOB, RNF212, RNF8, RPS6KA5, S100A3, SAP30L, SETD7, SH2B2, SHC2, SKP2, SLC26A1, SMOX, SPIB, SPOCK2, SSBP2, ST8SIA6, SULT1A1, SUV39H2, TAPBP, TARBP1, THRA, TMEM135, TMEM25, TMEM52, TOMM5, TOP1MT, TPSAB1, TRIM47, TSC22D3, TSKU, TSPAN33, TTC36, UPK3A, ZDHHC23, ZFYVE21, ZNF582 |
P-value is indicated as obtained from the log-rank test.
Classification performance of Support Vector Machine (SVM) and Random Forest (RF) classifiers using subtype-specific interactions (sensitivity, specificity and accuracy).
| | LumA vs lumB, HER2, basal | | LumB vs lumA, HER2, basal | | HER2 vs lumA, lumB, basal | | Basal vs lumA, lumB, HER2 | |
|---|---|---|---|---|---|---|---|---|
| | SVM | RF | SVM | RF | SVM | RF | SVM | RF |
| Sensitivity | 0.86 | 0.93 | 0.64 | 0.35 | 0.87 | 0.12 | 0.77 | 0.78 |
| Specificity | 0.91 | 0.94 | 0.91 | 0.96 | 0.98 | 1 | 1 | 1 |
| Accuracy | 0.89 | 0.94 | 0.84 | 0.80 | 0.96 | 0.89 | 0.96 | 0.97 |
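Each column pair in the table above treats one subtype as the positive class and the other three as negatives. The sketch below shows how sensitivity, specificity and accuracy follow from the confusion counts in such a one-vs-rest setting. The function name and the toy labels are hypothetical; they are not the GSE58212 predictions.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and accuracy for one class against the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        # Fraction of true positives recovered; NaN if no positives exist.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true negatives recovered; NaN if no negatives exist.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Fraction of all samples assigned to the correct side.
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Toy labels for six hypothetical samples.
toy_true = ["lumA", "lumA", "lumB", "HER2", "basal", "lumA"]
toy_pred = ["lumA", "lumB", "lumB", "HER2", "lumA", "lumA"]
luma_metrics = one_vs_rest_metrics(toy_true, toy_pred, positive="lumA")
```

Because negatives outnumber positives in a one-vs-rest comparison, accuracy can stay high while sensitivity is low, as in the RF column for HER2 (sensitivity 0.12, accuracy 0.89).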