
A Machine Learning Approach to Differentiate Two Specific Breast Cancer Subtypes Using Androgen Receptor Pathway Genes.

Taobo Hu, Guiyang Zhao, Yiqiang Liu, Mengping Long.

Abstract

Triple-negative breast cancer is a heterogeneous disease with different molecular and histological subtypes. The androgen receptor is expressed in a portion of triple-negative breast cancer cases, and activation of the androgen receptor pathway is thought to be both a molecular subtyping signature and a therapeutic target for triple-negative breast cancer. Thus, identifying the status of the androgen receptor pathway is important for both molecular characterization and clinical management. In this study, we investigated the expression of the androgen receptor pathway in metaplastic breast cancer and in the luminal androgen receptor subtype of triple-negative breast cancer, and found that the pathway was downregulated in metaplastic breast cancer compared with the luminal androgen receptor subtype. Using random forest, we found that the two subtypes can be molecularly classified by the expression of androgen receptor pathway genes.

Keywords:  AR; LAR; TNBC; metaplastic breast cancer; random forest

Year:  2021        PMID: 34159849      PMCID: PMC8226237          DOI: 10.1177/15330338211027900

Source DB:  PubMed          Journal:  Technol Cancer Res Treat        ISSN: 1533-0338


Introduction

Breast cancer is a heterogeneous disease with different molecular features and prognoses. Among its subtypes, triple-negative breast cancer (TNBC), which is defined by the lack of expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) on immunohistochemical staining, has the most limited therapeutic options and the worst clinical outcome. TNBC can be further classified into subtypes according to histological morphology as well as molecular features. Histologically, TNBC comprises the most common type, invasive ductal carcinoma of no special type (IDC-NST), and special subtypes including metaplastic breast cancer, adenoid cystic carcinoma, medullary carcinoma and secretory carcinoma. Studies have shown that TNBC of special types, taken as a single group, has a worse prognosis than TNBC-NST, indicating the prognostic value of histological subtyping. Metaplastic breast cancer (MBC) is a special subtype accounting for less than 1% of all invasive breast cancers. It is characterized by the presence of metaplastic components in the cancer tissue, most commonly squamous carcinoma, followed by chondroid and sarcomatous components. Most MBC cases are triple-negative, and a study has shown that MBC has a worse prognosis than other TNBC at all clinical stages after treatment. Because MBC cases are rare, their molecular characteristics remain largely unknown.
Molecularly, TNBC can also be classified into various subtypes by different algorithms using gene expression data. All of the currently applied subtyping algorithms consistently distinguish one molecular subtype of TNBC, the luminal androgen receptor (LAR) subtype. LAR accounts for 15%-20% of all TNBC and is characterized by high expression of the AR gene and enrichment in hormonally regulated pathways. The LAR subtype has a relatively low proliferation rate, decreased relapse-free survival and similar distant metastasis-free survival compared with other subtypes, and may benefit from the anti-AR agent enzalutamide. Because immunohistochemical staining shows positive AR expression in 38%-55% of TNBC, using AR as a surrogate marker of the LAR subtype would have low specificity.
Recent studies reported the percentage of AR-positive cases in MBC to be 0%, 8.7% and 11%, respectively, which is significantly lower than that in TNBC-NST and indicates a lack of luminal differentiation in MBC. Genomic characterization of MBC revealed a mutation rate of 57% in the PI3K/AKT/mTOR pathway, which is much higher than the 4% in AR-negative TNBC but closer to the 40% in AR-positive TNBC. Thus, whether the low expression of AR in MBC also indicates downregulation of the AR pathway, and what the exact molecular differences between MBC and LAR are, remain unknown. In this study, we analyzed and compared the expression of AR pathway genes in MBC and LAR using data from TCGA. A machine learning approach was used to differentiate MBC from LAR with AR pathway genes.

Results

Clinicopathological Characteristics of the Studied Cohort

A total of 38 cases of LAR and 14 cases of MBC were selected from the TCGA database. The clinicopathological characteristics, including age at diagnosis, ethnicity, tumor stage, tumor size and lymph node status, were analyzed, and no significant difference was detected between the two groups (Table 1).
Table 1.

Clinical Features of Selected Patients.

Characteristic  Category                LAR, n (%)  MBC, n (%)  P value
Age             <50                     8 (21.1)    2 (14.3)    0.879
                ≥50                     30 (78.9)   12 (85.7)
Ethnicity       Hispanic or Latino      2 (5.3)     1 (7.1)     0.546
                Not Hispanic or Latino  33 (86.8)   13 (92.9)
                Not reported            3 (7.9)     0 (0.0)
Tumor stage     Stage I                 7 (18.4)    2 (21.4)    0.942
                Stage IIa               15 (39.5)   5 (35.7)
                Stage IIb               6 (15.8)    3 (21.4)
                Stage III               1 (2.6)     0 (0.0)
                Stage IIIa              4 (10.5)    2 (14.3)
                Stage IIIb              1 (2.6)     1 (7.1)
                Stage IIIc              3 (7.9)     0 (0.0)
                Not reported            1 (2.6)     0 (0.0)
Tumor           T1                      5 (13.2)    2 (14.3)    0.200
                T2                      17 (44.7)   2 (14.3)
                T3                      1 (2.6)     0 (0.0)
                Tx                      1 (2.6)     0 (0.0)
                Not reported            14 (36.8)   10 (71.4)
Lymph node      N0                      9 (23.7)    4 (28.6)    0.082
                N1                      9 (23.7)    0 (0.0)
                N2                      3 (7.9)     0 (0.0)
                N3                      3 (7.9)     0 (0.0)
                Not reported            14 (36.8)   10 (71.4)

Androgen Receptor Pathway Genes Were Differentially Expressed in MBC and LAR

A total of 166 genes were identified as representative genes of the androgen receptor pathway using the Pathway Commons database (Version 12). In addition, recent research has identified another hormone receptor, the G-protein coupled estrogen receptor (GPER), which is encoded by GPER1 and can be activated by estradiol. Unlike ERalpha and ERbeta, which are mostly known as nuclear receptors, GPER has a seven-transmembrane domain, and many studies have confirmed its membrane localization. It was found to be strongly expressed in triple-negative breast cancer and in patients younger than 49 years. GPER expression is inversely correlated with androgen receptor expression in TNBC, and at the molecular level AR represses GPER by binding to the promoter of the AR genomic region. Thus, GPER1 was also included in our analysis as one of the AR pathway genes, giving 167 genes in total. The mRNA expression of the AR pathway genes was analyzed and compared between the 2 groups, and the differentially expressed genes are summarized in Table 2. In total, 32 of the 167 genes were differentially expressed between MBC and LAR, including RUNX2, AR and GPER1 (Figure 1). The 5 most significant genes were RUNX2, SPDEF, FOXA1, DDC and AR. Except for DDC, which encodes a metabolic enzyme, the other 4 genes all encode transcription factors that have previously been shown to interact closely with one another. Among them, RUNX2 was the only gene upregulated in MBC; it has been reported to inhibit the transcriptional activity of AR by promoting the dissociation of AR from its target genes. SPDEF is downstream of AR, and its expression is induced by AR. FOXA1 is a pioneer factor in the AR pathway that opens AR-binding DNA regions to facilitate the binding of AR.
Table 2.

Differentially Expressed AR Pathway Genes Between MBC and LAR Cancers. (a)

Name      Ensembl ID       log FC  Ave expr  t      P value      B
RUNX2     ENSG00000124813  1.69    15.63     5.84   3.51E-07     6.48
SPDEF     ENSG00000124664  −4.41   18.47     −5.06  5.67E-06     3.86
FOXA1     ENSG00000129514  −3.81   17.56     −4.35  6.44E-05     1.59
DDC       ENSG00000132437  −5.73   11.54     −4.30  7.47E-05     1.45
AR        ENSG00000169083  −2.79   16.30     −3.87  0.000308289  0.14
FKBP4     ENSG00000004478  −0.81   20.09     −3.61  0.00069105   −0.60
SLC25A4   ENSG00000151729  −0.70   16.84     −3.50  0.000974849  −0.92
ETV5      ENSG00000244405  1.25    16.41     3.26   0.001946062  −1.55
FLNA      ENSG00000196924  0.84    21.03     3.20   0.002348338  −1.72
SMAD3     ENSG00000166949  0.86    16.82     3.15   0.002696079  −1.85
SIRT1     ENSG00000096717  −0.70   17.01     −3.00  0.00413892   −2.23
RCHY1     ENSG00000163743  −0.49   16.04     −2.99  0.004228911  −2.25
TGIF1     ENSG00000177426  −0.51   17.64     −2.96  0.004681079  −2.34
TGFB1I1   ENSG00000140682  0.96    16.89     2.84   0.006513312  −2.64
NCOR2     ENSG00000196498  0.48    18.05     2.81   0.006993992  −2.70
NCOA4     ENSG00000266412  −0.47   19.99     −2.80  0.007114918  −2.72
HSP90AA1  ENSG00000080824  −0.61   22.56     −2.76  0.007991213  −2.82
SVIL      ENSG00000197321  0.69    17.68     2.72   0.008962078  −2.92
SF1       ENSG00000168066  0.23    19.67     2.69   0.009689098  −2.99
PRDX1     ENSG00000117450  −0.56   22.56     −2.62  0.011373878  −3.13
HDAC1     ENSG00000116478  0.39    19.47     2.54   0.014042583  −3.32
GPER1     ENSG00000164850  1.05    13.97     2.50   0.015470474  −3.40
GTF2H2    ENSG00000145736  0.88    12.40     2.46   0.017417516  −3.51
CASP8     ENSG00000064012  −0.56   16.41     −2.39  0.020523257  −3.65
CDC25B    ENSG00000101224  0.63    18.83     2.27   0.027069795  −3.89
KAT5      ENSG00000172977  −0.27   17.29     −2.13  0.038277859  −4.18
AHR       ENSG00000106546  0.67    18.12     2.11   0.039491891  −4.21
CDK1      ENSG00000170312  −0.75   18.05     −2.10  0.040386094  −4.23
CAV1      ENSG00000105974  0.75    18.77     2.07   0.043886593  −4.30
NR0B2     ENSG00000131910  −3.18   5.85      −2.03  0.047145316  −4.36
GTF2F2    ENSG00000188342  0.34    17.31     2.02   0.048112999  −4.37
FHL2      ENSG00000115641  0.75    17.56     2.02   0.048381632  −4.38

(a) The columns of the table are the gene name, the Ensembl gene ID, the estimated contrast (log FC), the mean expression over both groups, the contrast t-value, the contrast P value and the estimated log-odds (B) that the gene is differentially expressed.

Figure 1.

AR pathway genes were differentially expressed in MBC and LAR. AR was highly expressed in the LAR group while its expression in MBC was low (left panel). The membrane-bound estrogen receptor GPER1 showed higher expression in MBC than in LAR (middle panel). RUNX2, the gene with the most significant expression difference, was upregulated in MBC and downregulated in LAR (right panel).

Classification of MBC and LAR Using Random Forest

The above results suggested that the AR pathway was regulated differently in MBC and LAR. Next, we tried to directly differentiate the two groups using gene expression data of the AR pathway. However, the expression of any single gene could not separate the two groups completely, as shown in Figure 1. Machine learning approaches have been reported to achieve good predictive performance for sample classification using gene expression data. Thus, we examined how well androgen receptor pathway genes classify the MBC and LAR groups using the random forest algorithm. Random forest is a classification algorithm developed in 2001 that uses an ensemble of classification trees, and it has been widely used for classification with microarray data. In this task, the expression values of the 167 AR pathway genes were used as continuous variables to classify each sample as either MBC or LAR (Figure 2A). The prediction accuracy of the random forest algorithm was 100% (Table 3). The genes that contributed most to the classification are listed in Figure 2B and C, where the contribution was measured by Mean Decrease Accuracy or Mean Decrease Gini. RUNX2, FKBP4 and UXT were ranked as the top 3 genes by both Mean Decrease Accuracy and Mean Decrease Gini. Interestingly, the UXT gene was not among the DEGs between MBC and LAR. Model visualization was performed by displaying the decision trees with the most and the fewest nodes (Figure 3). In the simplest decision tree generated by the random forest algorithm, which has three nodes, RUNX2, the gene with the most significant differential expression between MBC and LAR, was used as the root node and no other internal node was used.
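Mean Decrease Accuracy can be illustrated with a permutation sketch: shuffle the values of one gene across samples and record how much the classification accuracy drops. The Python snippet below is a minimal sketch under stated assumptions, not the authors' R code. The classifier `toy_predict`, its cutoff of 16.0 and all expression values are hypothetical, and the randomForest package computes this measure on the out-of-bag samples of each tree rather than on the full dataset as done here.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild every row with the shuffled value in the chosen column.
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats

# Hypothetical toy model: call a sample "MBC" when feature 0 exceeds 16.0.
def toy_predict(row):
    return "MBC" if row[0] > 16.0 else "LAR"

# Hypothetical log2 expression values for two genes in six samples.
rows = [[14.0, 5.0], [15.0, 7.0], [15.5, 6.0], [17.0, 5.5], [17.5, 6.5], [18.0, 7.5]]
labels = ["LAR", "LAR", "LAR", "MBC", "MBC", "MBC"]

imp_used = permutation_importance(toy_predict, rows, labels, column=0)
imp_unused = permutation_importance(toy_predict, rows, labels, column=1)
print(imp_used, imp_unused)
```

A feature the model never consults (column 1 here) gets an importance of exactly zero, while shuffling the feature that drives the split lowers accuracy.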
Figure 2.

Classifying MBC and LAR using the random forest algorithm. Clustering of MBC samples (blue) and LAR samples (red) using 167 AR pathway genes (A). Genes that contributed most to the classification were listed using 2 different parameters (B and C).

Table 3.

Classification Accuracy of the Random Forest Model.

Actual classification   Predicted classification
                        MBC    LAR
MBC                     38     0
LAR                     0      14
Prediction accuracy     100%
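
The prediction accuracy in Table 3 follows directly from the confusion matrix: the correctly classified samples on the diagonal divided by all samples. A minimal Python sketch, with the counts copied from Table 3 and a helper name (`accuracy`) chosen here for illustration:

```python
# Confusion matrix from Table 3: rows are actual classes, columns are predicted classes.
confusion = [
    [38, 0],
    [0, 14],
]

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

table3_accuracy = accuracy(confusion)
print(f"accuracy = {table3_accuracy:.0%}")
```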
Figure 3.

Visualization of 2 representative trees with the maximum and minimum numbers of nodes generated by random forest. The tree with the maximum number of nodes used the SPDEF gene expression value as the root node and the expression of 9 other genes as internal nodes, for a total of 21 nodes. Each root and internal node made a binary split determined by the expression value of the gene at that node, and the cutoff value for each split was calculated automatically (A). The tree with the minimum number of nodes used the expression of the RUNX2 gene as its single root and internal node, generating 2 leaf nodes (B).
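
The automatic cutoff at each node can be illustrated for a single gene. A classification tree commonly picks the threshold that minimizes the weighted Gini impurity of the two child nodes; the sketch below does this exhaustively in plain Python. The expression values and the helper names `gini` and `best_split` are hypothetical and are not taken from the study or from the randomForest package.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (cutoff, weighted_gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a cutoff
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)  # cutoff halfway between neighbors
    return best

# Hypothetical log2 expression values of one gene (not data from the study).
expr = [14.0, 14.5, 15.0, 15.5, 16.5, 17.0, 17.5]
cls = ["LAR", "LAR", "LAR", "LAR", "MBC", "MBC", "MBC"]

cutoff, impurity = best_split(expr, cls)
print(cutoff, impurity)
```

When one cutoff separates the classes perfectly, as in this toy input, the tree needs only a root node and two leaves, which is the shape of the minimal tree described in the caption.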

In the model construction, 5-fold cross-validation was repeated 100 times to guard against overfitting. The average cross-validation error and its standard deviation are plotted in Figure 4. The cross-validation error reached its minimum when the number of variables was in the range of 5 to 21.
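
The repeated cross-validation scheme can be sketched as an index generator: each repeat reshuffles the 52 samples, cuts them into 5 folds, and holds out each fold once. This is a minimal Python illustration, not the authors' implementation. The function name `repeated_kfold` and the seed are assumptions, and the folds here are not stratified by class, which may matter with only 14 MBC cases.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = sorted(folds[held_out])
            train = sorted(
                idx for f, fold in enumerate(folds) if f != held_out for idx in fold
            )
            yield train, test

splits = list(repeated_kfold(52, k=5, repeats=100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

A model would be fitted on each `train` list and scored on the matching `test` list; averaging the 500 error values gives the kind of mean and standard deviation plotted against the number of variables.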
Figure 4.

Cross-validation of the random forest algorithm for classification of MBC and LAR. A 5-fold cross-validation was repeated 100 times with the number of variables ranging from 1 to 166. The average value and standard deviation of the cross-validation error were plotted.

Discussion

AR is expressed in a proportion of TNBC, and activation of AR is thought to be a signature of the LAR subtype of TNBC that can be used as a therapeutic target. Thus, identifying the AR pathway status in TNBC cases is important for both molecular characterization and clinical management. In this study, we showed that the AR pathway was regulated differently in MBC and the LAR subtype of TNBC. Moreover, using random forest, the 2 groups could be classified by the expression of AR pathway genes with an accuracy of 100%. Although MBC currently shares the same therapeutic options as TNBC-NST, the obvious downregulation of the AR pathway in MBC compared with LAR may contribute to its histologic differentiation and aggressive behavior. Our results also suggest that another hormone receptor, GPER, was upregulated in MBC compared with LAR, possibly due to the suppression of the AR pathway. This in turn indicates that MBC may be activated by estrogen even though it lacks expression of ER, PR and AR. Recent studies revealed that MBC has more tumor-infiltrating lymphocytes and higher PD-L1 expression in both tumor cells and stromal lymphocytes. Thus, whether MBC resembles the immunomodulatory subtype still needs to be elucidated. A more refined classification of TNBC would enable a better understanding of its molecular mechanisms and promote the development of precision medicine. This study was limited by its small sample size, owing to the rarity of MBC. Moreover, MBC was treated as a single group in our study although the included MBC cases had different metaplastic components.

Materials and Methods

TCGA Data Acquisition and Cohort Selection

TCGA RNA sequencing level 3 normalized data were downloaded from the TCGA Data Portal and imported into R (Version 4.0.3) for further analysis using the TCGAbiolinks (Version 2.16.4) functions GDCquery, GDCdownload and GDCprepare. Among the cases with immunostaining data for ER, PR and HER2, 122 TNBC cases were selected, of which 14 were MBC. Samples molecularly classified as LAR were identified in a previous article using the Lehmann classifier and were used in this study. In total, there were 38 cases of the LAR subtype of TNBC.

Analysis of Differentially Expressed Genes

The list of AR pathway genes used in the analysis was retrieved from the Pathway Commons database. The Fragments Per Kilobase of transcript per Million mapped reads Upper Quartile (FPKM-UQ) RNA-seq data were log2-transformed before further processing. FPKM-UQ is implemented at the GDC on gene-level read counts produced by HTSeq and is based on a modified version of the FPKM normalization method. The log2-transformed FPKM-UQ data were analyzed using the limma (Version 3.44.3) functions lmFit, eBayes and topTable to identify DEGs between MBC samples and LAR samples. Student t-test was used to calculate the P values of genes. Genes with P < 0.05 were considered DEGs.
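
The per-gene comparison can be sketched for one gene: log2-transform the expression values, take the difference of group means as the log fold change, and divide by the pooled standard error to get a Student t statistic. The Python snippet below is a simplified illustration, not the limma workflow, which fits a linear model and moderates the variances with empirical Bayes. The pseudocount of 1, the function names and the expression values are all assumptions made for the example.

```python
import math
from statistics import mean, variance

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to avoid log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def student_t(group_a, group_b):
    """Return (log fold change a minus b, pooled-variance t statistic)."""
    na, nb = len(group_a), len(group_b)
    log_fc = mean(group_a) - mean(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = log_fc / math.sqrt(pooled * (1 / na + 1 / nb))
    return log_fc, t

# Hypothetical FPKM-UQ values for one gene (not data from the study).
mbc_raw = [1023.0, 2047.0, 4095.0]
lar_raw = [255.0, 511.0, 1023.0]

mbc = log2_transform(mbc_raw)
lar = log2_transform(lar_raw)
log_fc, t_stat = student_t(mbc, lar)
print(log_fc, round(t_stat, 3))
```

A positive log fold change under this convention means higher expression in the first group, matching the sign convention in Table 2 where RUNX2, upregulated in MBC, has a positive log FC.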

Random Forest Analysis

The log2-transformed FPKM-UQ data of DEGs in the MBC and LAR samples were passed to the randomForest function of the randomForest package (Version 4.6-14). The randomForest function implements Breiman’s random forest algorithm for classification. The algorithm yields an ensemble that can achieve both low bias and low variance and effectively avoids overfitting. The MDSplot function was used to draw the multi-dimensional scaling plot of the proximity matrix from randomForest. The number of trees (ntree) was left at its default of 500. Each tree was grown independently, and the final prediction was obtained by aggregating the predictions of all trees. 70% of the dataset was taken for training and the rest for testing by default.
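
Two steps from this setup, the 70/30 split and the aggregation over trees, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' R code: the function names, the seed and the vote counts are hypothetical, and the aggregation shown is a majority vote, the usual rule for classification forests.

```python
import random
from collections import Counter

def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# 52 samples in total (38 LAR and 14 MBC in the studied cohort).
train_idx, test_idx = split_train_test(52)

# Hypothetical votes from 500 trees for a single sample.
votes = ["MBC"] * 320 + ["LAR"] * 180
forest_call = majority_vote(votes)
print(len(train_idx), len(test_idx), forest_call)
```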