Systematic analysis of breast atypical hyperplasia-associated hub genes and pathways based on text mining.

Wei Ma1, Bei Shi2, Fangkun Zhao3, Yunfei Wu1, Feng Jin1.   

Abstract

The purpose of this study was to describe breast atypical hyperplasia (BAH)-related gene expression and to systematically analyze the functions, pathways, and networks of BAH-related hub genes. On the basis of natural language processing, gene data for BAH were extracted from the PubMed database using text mining. The enriched Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways were obtained using DAVID (http://david.abcc.ncifcrf.gov/). A protein-protein interaction network was constructed using the STRING database. Hub genes were identified as genes that interact with at least 10 other genes within the BAH-related gene network. In total, 138 BAH-associated genes were identified as significant (P < 0.05), and 133 pathways were identified as significant (P < 0.05, false discovery rate < 0.05). A BAH-related protein network that included 81 interactions was constructed. Twenty genes were determined to interact with at least 10 others (P < 0.05, false discovery rate < 0.05) and were identified as the BAH-related hub genes of this protein-protein interaction network. These 20 genes are TP53, PIK3CA, JUN, MYC, EGFR, CCND1, AKT1, ERBB2, CTNNB1, ESR1, IGF-1, VEGFA, HRAS, CDKN1B, CDKN1A, PCNA, HGF, HIF1A, RB1, and STAT5A. This study may help to disclose the molecular mechanisms of BAH development and provide implications for BAH-targeted therapy or even breast cancer prevention. Nevertheless, connections between certain genes and BAH require further exploration.

Year:  2019        PMID: 30394935      PMCID: PMC6784767          DOI: 10.1097/CEJ.0000000000000494

Source DB:  PubMed          Journal:  Eur J Cancer Prev        ISSN: 0959-8278            Impact factor:   2.497


Introduction

Breast cancer has become one of the most common malignant tumors that threaten women’s health and lives. However, the etiology of breast cancer is still unclear. A well-known hypothesis is the ‘multistage development model theory’ (Lakhani, 1999), in which breast cancer develops from normal tissue through general hyperplasia, atypical hyperplasia, and carcinoma in situ to invasive carcinoma. Previous studies have shown that the cumulative risk of breast cancer among women with atypical hyperplasia approaches 30% at 25 years of follow-up (Hartmann et al., 2014, 2015). Therefore, the process may be driven by quantitative changes and qualitative transformation of some factors over an extended time period. Atypical hyperplasia, as a premalignant disease, occupies a transitional position between benign and malignant disease because it possesses some of the requisite features of a malignant tumor and may share a common ancestor with carcinoma on the basis of somatic mutations (Allred ; Santen and Mansel, 2005; Bombonati ; Newburger ; Degnim, 2015). In atypical hyperplasia, there is a proliferation of dysplastic, monotonous epithelial cell populations that include clonal subpopulations (Ellis, 2010). According to the microscopic appearance, two types of breast atypical hyperplasia (BAH) are distinguished: atypical ductal hyperplasia and atypical lobular hyperplasia. These two types of BAH occur with similar frequency and confer equal risks of future breast cancer (Dupont and Page, 1985; Hartmann ; Degnim ; Page ). Although the risk of atypical hyperplasia becoming malignant is elevated, BAH can regress under certain conditions (Visscher ). Because of the high-risk features and high incidence of BAH, studies of the structure of atypical hyperplasia and of BAH-related gene function may be valuable for diagnosis and for developing targeted breast cancer prevention therapies.
Currently, there is a large body of biomedical literature in databases, and rapid growth of the research makes it impossible for researchers to address all of the information manually. Text mining tools are widely used in biomedical research to extract information about disease-related genes, proteins, molecular interactions, and pathways, and these tools allow for the generation of an enormous amount of information and the identification of relationships and structures that would otherwise not be possible. Previous studies have documented the use of these tools in the study of regulation mechanisms for different types of cancers, including breast cancer (Krallinger ). In the present study, we obtained BAH-related texts from PubMed by searching for ‘breast atypical hyperplasia’ or ‘atypical hyperplasia of mammary gland’ and retrieved 1777 publications. We identified sets of genes that were intensively investigated in relation to BAH; furthermore, we established a protein–protein interaction (PPI) network. We also identified enriched pathways and hub genes. These data may help to promote the understanding of BAH and substantially affect the treatment of this disease and may even have an effect on breast cancer prevention.

Materials and methods

The genes and proteins were automatically extracted from abstracts by natural language processing. We used ‘breast atypical hyperplasia’ or ‘atypical hyperplasia of mammary gland’, etc. as search terms (Fig. 1) and extracted literature published before July 2017 from the PubMed database. The genes and proteins mentioned in the abstracts were recognized and tagged by A Biomedical Named Entity Recognizer (ABNER), which is used to tag genes, proteins, and other biological entities (Settles, 2005). The Entrez Global Query Cross-Database Search System is a federated search engine that allows users to search health science databases on the NCBI website. This system was used to map the tagged genes and proteins to unified names and assemble them into a database (Maglott ). The number of hits for each search term in the database was counted. The hypergeometric distribution was used to calculate the co-occurrence probability of each gene name and BAH. If a gene co-occurred with BAH more often than expected by chance (P < 0.05), this gene was considered relevant to BAH.
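The co-occurrence test described above can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration rather than the pipeline used in the study: the function names and the toy counts are hypothetical, and it assumes the test asks how likely it is to see at least the observed number of abstracts mentioning both the gene and BAH if the gene were mentioned independently of BAH.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric random variable X.

    population: total number of abstracts in the database
    successes:  abstracts that mention the gene
    draws:      abstracts retrieved for the disease query
    k:          abstracts that mention both
    """
    lo = max(k, 0)
    hi = min(successes, draws)
    total = comb(population, draws)
    # Exact integer arithmetic; convert to float only at the end.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lo, hi + 1)
    )
    return tail / total


def is_bah_related(co_hits, total_abstracts, gene_hits, bah_hits, alpha=0.05):
    """Flag a gene when its co-occurrence with BAH is unlikely by chance."""
    p = hypergeom_upper_tail(co_hits, total_abstracts, gene_hits, bah_hits)
    return p < alpha, p


# Toy counts for illustration only; they are not taken from the study.
related, p_value = is_bah_related(
    co_hits=4, total_abstracts=1000, gene_hits=20, bah_hits=50
)
print(related, p_value)
```

Exact binomial coefficients keep the sketch dependency-free; for database-scale counts a statistics library would be the more practical choice.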
Fig. 1

Search strategy.

DAVID (http://david.abcc.ncifcrf.gov/) is a free, online bioinformatics resource that provides functional interpretation of large lists of genes derived from genomic studies. Gene Ontology enrichment analysis was performed using DAVID. Selected BAH-related genes from the aforementioned screening process were annotated and classified by biological processes, molecular functions, and cellular components. The Kyoto Encyclopedia of Genes and Genomes Orthology-Based Annotation System, an annotation system based on the Kyoto Encyclopedia of Genes and Genomes, was applied for BAH-related signaling pathway enrichment analysis. The STRING database was used to construct the PPI network of BAH-related genes and select the hub genes. We selected the interactions with integrated scores of 0.9 to construct the PPI network. To select the hub genes from the PPI network, we calculated the number of genes directly interacting with each gene. We defined hub genes in the network as those genes with a degree of at least 10. A threshold of 0.05 was established for P values and the false discovery rate (FDR).
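The hub gene rule (degree of at least 10 in the PPI network) can be sketched as follows. The edge tuples, the function name, and the toy gene labels are hypothetical and do not reflect the STRING export format. The sketch also assumes that 'integrated scores of 0.9' means a cut-off of at least 0.9, which the text does not state explicitly.

```python
from collections import defaultdict


def select_hub_genes(edges, min_score=0.9, min_degree=10):
    """Return (hub genes, degree per gene) for a scored interaction list.

    edges: iterable of (gene_a, gene_b, score) tuples.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b, score in edges:
        # Skip self-loops and interactions below the score cut-off.
        if gene_a != gene_b and score >= min_score:
            neighbors[gene_a].add(gene_b)
            neighbors[gene_b].add(gene_a)
    # Degree = number of distinct direct interaction partners.
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    hubs = sorted(
        (gene for gene, d in degree.items() if d >= min_degree),
        key=lambda gene: (-degree[gene], gene),
    )
    return hubs, degree


# Toy network: one placeholder gene with ten high-confidence partners.
edges = [("HUB", f"P{i}", 0.95) for i in range(10)]
edges.append(("HUB", "LOW", 0.5))   # below the cut-off, ignored
edges.append(("P0", "P1", 0.99))    # partners may also interact
hubs, degree = select_hub_genes(edges)
print(hubs)  # ['HUB']
```

Storing partners in sets means a pair reported twice, or in both directions, is counted once.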

Results

Breast atypical hyperplasia-associated genes and Gene Ontology analysis

We examined 1777 abstracts and obtained 325 genes after the retrieval of contents from PubMed. Through hypergeometric distribution, a total of 138 genes were identified as BAH-related genes (P < 0.05). Among these BAH-related genes, the top 20 most frequently investigated genes are listed in Table 1.
Table 1

Top 20 most significant atypical hyperplasia-related genes based on text mining

ESR1 (ER-α), TP53, ERBB2, CCND1, and TP63 were the most frequently mentioned genes (Table 1). The Gene Ontology classification results by biological process, cellular component, and molecular function are presented in Table 2. Regulation of cell proliferation, apoptosis, programmed cell death, and cell death were the main biological processes associated with BAH-related genes. With respect to molecular function, the major activities of these genes included enzyme binding, structure-specific DNA binding, double-stranded DNA binding, and transmembrane receptor protein tyrosine kinase activity. These genes were related to various cellular components, including the plasma membrane, organelle lumens, and membrane rafts.
Table 2

Classification results for biological process, cellular components, and molecular functions by Gene Ontology analysis


Pathway and protein–protein interaction analyses

Following the pathway analysis, 133 pathways were identified as significant (P < 0.05, FDR < 0.05). Among these pathways, pathways related to cancer, proteoglycans in cancer, and microRNAs in cancer involved the largest number of genes. The 20 most significant BAH-related pathways are presented in Table 3.
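The double threshold applied to the pathways (P < 0.05, FDR < 0.05) can be illustrated with a short sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg adjustment is assumed here purely for illustration, and the function names and P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_indices(p_values, alpha=0.05):
    """Indices that pass both the raw P and the adjusted (FDR) cut-off."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (p, q) in enumerate(zip(p_values, adjusted))
        if p < alpha and q < alpha
    ]


# Made-up enrichment P values for four hypothetical pathways.
toy_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(toy_p))
print(significant_indices(toy_p))
```

Because the adjusted value is never smaller than the raw P value, the raw cut-off is redundant under this procedure; it is kept only to mirror the two thresholds reported in the text.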
Table 3

The 20 most significant pathways associated with atypical hyperplasia-related genes

We also constructed a BAH-related PPI network (Fig. 2). The 20 genes that interact with at least 10 other genes (P < 0.05, FDR < 0.05) were identified as the hub genes of the BAH-related PPI network. These genes are TP53, PIK3CA, JUN, MYC, EGFR, CCND1, AKT1, ERBB2, CTNNB1, ESR1, IGF-1, VEGFA, HRAS, CDKN1B, CDKN1A, PCNA, HGF, HIF1A, RB1, and STAT5A. TP53, which interacts with 28 other genes, exhibited the greatest number of interactions (Fig. 3). The similarities and differences between BAH-related hub genes and the top 20 highest frequency genes were classified using a Venn diagram (Fig. 4).
Fig. 2

Network analysis of breast atypical hyperplasia-related genes.

Fig. 3

Hub genes of breast atypical hyperplasia.

Fig. 4

Similarities and differences between breast atypical hyperplasia-related hub genes and the top 20 highest frequency genes.


Discussion

The remarkable increase in the morbidity and mortality of breast cancer is a major concern worldwide. BAH, as a precancerous disease, has attracted increasing attention. However, its biology is poorly understood. The multistage development model theory does not account for all breast cancer subtypes that stem from BAH on the basis of both genomic and histological observations (Gao ). Thus, a better understanding of BAH will advance not only our understanding of breast carcinogenesis but also our clinical management of these high-risk patients. Taking effective measures for treatment and intervention to reduce the incidence of breast cancer can greatly improve women’s physical and mental health. Text mining can help us derive implicit knowledge that may be hidden in unstructured literature and present the data in an organized form. Our knowledge of the pathophysiology of BAH allows us to propose possible candidate genes that could play a role in the development and progression of breast cancer. We used an integrated approach to enrich the molecular context of BAH by applying text mining of events involving genes (presented as nodes) and pathways (presented as edges that correspond to interactions between nodes). By extracting information from PubMed, we present a comprehensive molecular interaction network for BAH (85 nodes and 291 edges) and discuss its properties using standard network metrics. All of the aforementioned 20 hub genes are known to be closely related to the typical pathological progression of BAH. Atypical hyperplasia is a noncancerous cellular hyperplasia in which cells show some atypia. Therefore, hub genes that affect cell proliferation, apoptosis, and signal transduction, such as RB1, VEGFA, STAT5A, CCND1, TP53, ESR1 (ER-α), and ERBB2, could play an important role in BAH.
These seven genes have been extensively studied in relation to the occurrence and development of BAH. By contrast, PCNA, CDKN1B, CTNNB1, EGFR, AKT1, MYC, JUN, CDKN1A, IGF-1, HIF1A, PIK3CA, HRAS, and HGF have been reported less frequently in the context of BAH and require further research.

PIK3CA and AKT1

PIK3CA, PIK3CB, and PIK3CD each encode a catalytic subunit (p110) of PI3K (Vogt ; Georgescu, 2011; Ersahin ). Activated PI3K catalyzes the formation of the second messenger phosphatidylinositol trisphosphate, which plays a key role by recruiting pleckstrin homology domain-containing proteins, including AKT1 and PDPK1, to the membrane and activating signaling cascades involved in cell growth, survival, proliferation, motility, and morphology (Karakas ; Engelman, 2009; Castaneda ). PIK3CA hotspot point mutations were identified in associated hyperplasia, even in usual ductal hyperplasia and columnar cell change, suggesting that PIK3CA mutations may play a role in breast epithelial proliferation and atypical changes (Kehr ). Interestingly, the rate of PIK3CA mutations in BAH is higher than that in invasive carcinomas (Ang ). This finding provides some insight into the role of activating PIK3CA mutations in breast carcinogenesis and the precursor status of these early breast lesions. Downstream, activated Akt, such as AKT1, regulates cellular metabolism, growth, and survival through CCND1, MYC, NF-κB, and a variety of other downstream factors (Koboldt et al., 2012; Khan ; Deng ). Likewise, the rate of AKT1 mutations is much higher in BAH than in breast cancer (Troxell ), which suggests that AKT1 mutations may play a role in precancerous disease. Together, these findings suggest that mutations of PIK3CA and AKT1 play an important role in the early stage of malignant tumor formation and may provide potential therapeutic targets for preventing malignant tumors and even precancerous lesions.

EGFR and ERBB2 (HER2)

The oncogene ERBB2 (c-erbB2/HER2) is a well-established prognostic and predictive factor for invasive breast cancer and is a major driver of tumor development and progression in a subset of breast cancer following amplification (Popescu ; Krishnamurti and Silverman, 2014). Following transphosphorylation, the dimerized receptor activates several intracellular signaling pathways, such as the Ras/MAPK pathway and the PI3K/Akt pathway, both of which subsequently affect cell proliferation, survival, motility, and adhesion (Moasser, 2007). It was reported that ERBB2 amplification may predict substantially increased risk for subsequent breast cancer in women with benign breast diseases, including BAH (Stark ). The overexpressed ERBB2 receptor may be a valuable therapeutic target not only for breast cancer but also for atypical hyperplasia. EGFR, also known as ERBB1/HER1, is another member of the epidermal growth factor receptor family. After ligand activation, phosphorylated EGFR provides a binding domain for PKC, PI3K/Akt/mTOR, SRC, STAT, and RAS/RAF/MEK1/ERK1/2 activation (Yarden and Sliwkowski, 2001). EGFR overexpression causes hyperplastic, dysplastic, and neoplastic changes in the mammary epithelium of transgenic mice (Brandt ). Approximately 48% of primary human breast cancers exhibit EGFR overexpression (Klijn ), and in women with atypical hyperplasia, fine needle aspiration results showed that EGFR overexpression was 59% (Fabian ). However, the detection of EGFR in human biopsy tissue samples has not yet been reported, which may become a future direction of BAH research.

CTNNB1

CTNNB1 encodes β-catenin, a pivotal protein that not only binds E-cadherin, T-cell factor, and lymphoid enhancer factor but also interacts with the complex composed of glycogen synthase kinase 3β, adenomatous polyposis coli, and axin. β-catenin is part of the protein complex that constitutes adherens junctions; it also plays a central role in transcriptional regulation in the Wnt signaling pathway (Hatsell ). Current evidence supports the contention that the β-catenin/Wnt pathway is activated in a subgroup of breast cancers; however, the mechanisms leading to β-catenin nuclear accumulation in breast cancer remain elusive. There is a hypothesis that CTNNB1-activating gene mutations drive β-catenin nuclear expression (Hayes ; Geyer ). However, β-catenin/Wnt pathway activation in breast cancer is not commonly thought to be driven by CTNNB1 mutations in the triple-negative phenotype. CTNNB1 has been intensively studied in breast cancer, but its role in precancerous lesions requires further investigation.

IGF-1

After insulin-like growth factor 1 (IGF-1) binds to insulin-like growth factor receptor 1 (IGF-1R), the complex activates numerous downstream pathways, such as the PI3K–AKT1–mTOR (Stewart ; Lee ; Rowinsky ; Naing ; Macaulay ; Iams and Lovly, 2015) and MAPK (Yamauchi and Pessin, 1994) pathways. IGF-1 plays a key role in the multistep process that leads from normal breast tissue to hyperplasia and then to malignancy (Kleinberg et al., 2009, 2011). In different mouse models, published data have shown that blockade of IGF-1 action in the mammary gland prevents premalignant mammary lesion development (Hadsell and Bonnette, 2000; Carboni ; Singh ). The increased risk for breast cancer among women with benign breast diseases (including atypia) may be related to an apparent tendency to have lower levels of IGF-1 than those in healthy controls, notably among perimenopausal/postmenopausal women. The expression of IGF-1R is slightly increased in lesions that are hormonally driven (such as atypical ductal hyperplasia and columnar cell changes), whereas it is significantly reduced in estrogen receptor-negative lesions (such as apocrine metaplasia). These observations suggest that IGF-1 plays an important role in hyperplasia, including atypical hyperplasia, and in breast cancer.
Atypical hyperplasia of the breast is the key step in the evolution from benign disease to malignancy. The levels of BAH-related gene expression and mutation are not the same as those in malignant tumors. Our study may provide some insight for choosing future research topics; however, the results of our analyses are affected by some methodological limitations that should be considered. Much work remains for understanding the mechanism of progression from BAH to malignancy.
Although many epidemiology studies have clarified the risk associated with atypical hyperplasia and carcinoma in situ, there are no specific morphological or clinical features that help identify the high risk of developing invasive breast cancer. Further research into the molecular events occurring at the hyperplastic and in-situ stages is essential to understanding and identifying BAH as a high-risk disease for progression to invasive carcinoma.

Acknowledgements

This study was supported by the National Science Foundation of China (Grant no. 81773163).

Conflicts of interest

There are no conflicts of interest.

References (10 of 52 listed)

1.  HER-2/neu amplification in benign breast disease and the risk of subsequent breast cancer.

Authors:  A Stark; B S Hulka; S Joens; D Novotny; A D Thor; L E Wold; M J Schell; L J Melton; E T Liu; K Conway
Journal:  J Clin Oncol       Date:  2000-01       Impact factor: 44.544

2.  ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text.

Authors:  Burr Settles
Journal:  Bioinformatics       Date:  2005-04-28       Impact factor: 6.937

Review 3.  Benign breast disorders.

Authors:  Richard J Santen; Robert Mansel
Journal:  N Engl J Med       Date:  2005-07-21       Impact factor: 91.245

Review 4.  Analysis of biological processes and diseases using text mining approaches.

Authors:  Martin Krallinger; Florian Leitner; Alfonso Valencia
Journal:  Methods Mol Biol       Date:  2010

5.  Role of insulin-like growth factors and the type I insulin-like growth factor receptor in the estrogen-stimulated proliferation of human breast cancer cells.

Authors:  A J Stewart; M D Johnson; F E May; B R Westley
Journal:  J Biol Chem       Date:  1990-12-05       Impact factor: 5.157

6.  Atypical hyperplastic lesions of the female breast. A long-term follow-up study.

Authors:  D L Page; W D Dupont; L W Rogers; M S Rados
Journal:  Cancer       Date:  1985-06-01       Impact factor: 6.860

Review 7.  The transition from hyperplasia to invasive carcinoma of the breast.

Authors:  S R Lakhani
Journal:  J Pathol       Date:  1999-02       Impact factor: 7.996

8.  Mammary gland specific hEGF receptor transgene expression induces neoplasia and inhibits differentiation.

Authors:  R Brandt; R Eisenbrandt; F Leenders; W Zschiesche; B Binas; C Juergensen; F Theuring
Journal:  Oncogene       Date:  2000-04-20       Impact factor: 9.867

Review 9.  IMC-A12, a human IgG1 monoclonal antibody to the insulin-like growth factor I receptor.

Authors:  Eric K Rowinsky; Hagop Youssoufian; James R Tonra; Phillip Solomon; Douglas Burtrum; Dale L Ludwig
Journal:  Clin Cancer Res       Date:  2007-09-15       Impact factor: 12.531

10.  Enhancement of insulin-like growth factor signaling in human breast cancer: estrogen regulation of insulin receptor substrate-1 expression in vitro and in vivo.

Authors:  A V Lee; J G Jackson; J L Gooch; S G Hilsenbeck; E Coronado-Heinsohn; C K Osborne; D Yee
Journal:  Mol Endocrinol       Date:  1999-05