Alaa Zare, Lynne-Marie Postovit, John Maringa Githaka.
Abstract
Inflammatory breast cancer (IBC) is a rare, aggressive cancer found in all the molecular breast cancer subtypes. Despite extensive previous efforts to screen for transcriptional differences between IBC and non-IBC patients, a robust IBC-specific molecular signature has been elusive. We report a novel IBC-specific gene signature (59 genes; G59) that achieves 100% accuracy in discovery and validation samples (45/45 correct classification) and misclassified only one sample (60/61 correct classification) in an independent dataset. G59 is independent of ER/HER2 status and molecular subtype and is specific to untreated IBC samples, with most of the genes being enriched for plasma membrane cellular component proteins and for interleukin (IL) and chemokine signaling pathways. Our finding suggests the existence of an IBC-specific molecular signature, paving the way for the identification and validation of targetable genomic drivers of IBC.
Keywords: Breast cancer; IBC; IBC signature; Machine learning; Random forest
Year: 2021 PMID: 34579745 PMCID: PMC8477487 DOI: 10.1186/s13058-021-01467-y
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Fig. 1 Identification of an IBC-specific gene signature. a Left: List of IBC and non-IBC samples used for gene signature discovery (GSE45581 dataset). Row-wise matched HER2/ER scores are highlighted and sample accession numbers (GSM) from the Gene Expression Omnibus (GEO) database are indicated. Middle: Strategy for signature discovery. Right: Strategy for signature validation. b Unsupervised hierarchical clustering heatmap of all samples (GSE45581 dataset) using the IBC signature genes. c The optimal number of clusters determined by the Caliński–Harabasz criterion. d Principal Component Analysis scatter plot using the first and second principal components. e Waterfall plot of the IBC probability score for all samples (see Additional file 1: Supp. Methods), validating the signature. The dotted line demarcates the minimum probability score required to classify a sample as IBC in the model. PAM50 molecular subtyping and ROR scores are indicated. f Distribution of expected accuracy from models trained using random sets of 59 genes (10,000 iterations) compared with the 100% accuracy observed for the IBC signature (dotted distribution line versus solid vertical line, respectively)
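The caption for panel c names the Caliński–Harabasz criterion without saying how it was computed. As a rough illustration only, the sketch below implements the standard definition of the index, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster and W the within-cluster sum of squares. The function name and the toy two-dimensional points are hypothetical and are not taken from the study or its data.

```python
from statistics import fmean


def calinski_harabasz(points, labels):
    """Standard Calinski-Harabasz index for one clustering of `points`.

    points: list of equal-length numeric tuples (one per sample).
    labels: cluster label for each point. Higher values indicate
    tighter, better-separated clusters.
    """
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index requires 2 <= k < n")

    dim = len(points[0])
    overall = [fmean(p[d] for p in points) for d in range(dim)]

    between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
    within = 0.0   # W: squared distance of each point to its own centroid
    for members in clusters.values():
        centroid = [fmean(p[d] for p in members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )

    if within == 0.0:
        return float("inf")  # all points coincide with their centroids
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ch_two = calinski_harabasz(toy_points, [0, 0, 0, 0, 1, 1, 1, 1])
ch_four = calinski_harabasz(toy_points, [0, 0, 1, 1, 2, 2, 3, 3])
```

On this toy data the two-cluster partition scores higher than the four-cluster one. That is the sense in which the criterion selects an optimal number of clusters: the candidate k with the largest index is preferred.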
Fig. 2 Independent validation of the IBC gene signature and its gene ontology/pathway analysis. a, b Validation in post-treatment samples from the GSE5847 dataset and pre-treatment core biopsy samples from the GSE111477 dataset, respectively. IBC probability plots, PCA scatter plots and unsupervised hierarchical clustering heatmaps are presented as in Fig. 1. c (i) Venn plots of G59 overlap with 5 previous IBC gene signatures (see Additional file 1: Supp. Methods). (ii) Table indicating the accuracy of the signatures in the GSE45581 and GSE111477 datasets (see details in Additional file 1: Supp. Methods). d Kaplan–Meier plot with log-rank test for G59-predicted IBC-like versus non-IBC-like samples in TCGA (see details in Additional file 1: Supp. Methods). The p-value, hazard ratio (HR) and the 95% confidence interval of the ratio are indicated. e Pie chart indicating the proportion of gene types in the signature. ncRNA: non-coding RNA. f Clustergrams of the top 10 cellular component and pathway analysis terms for the signature genes, with overlapping genes highlighted (see Additional file 1: Tables S3 and S4 for the complete list)