Bioinformatic analysis of the molecular mechanism underlying bronchial pulmonary dysplasia using a text mining approach.

Weitao Zhou, Fei Shao, Jing Li.

Abstract

Bronchopulmonary dysplasia (BPD) is a common disease of premature infants with very low birth weight. Its mechanism remains unclear. The aim of this study was to systematically explore BPD-related genes and characterize their functions. Natural language processing analysis was used to identify BPD-related genes. Gene data were extracted from the PubMed database. Gene ontology, pathway, and network analyses were carried out, and the results were integrated with the corresponding databases. In this study, 216 genes were identified as BPD-related genes with P < .05, and 30 pathways were identified as significant. A network of BPD-related genes was also constructed, with 17 hub genes identified. In particular, the phosphatidyl inositol-3-enzyme-serine/threonine kinase signaling pathway involved the largest number of genes. Insulin was found to be a promising candidate gene related to BPD, suggesting that it may serve as an effective therapeutic target. Our data may help to better understand the molecular mechanisms underlying BPD. However, the mechanisms of BPD remain elusive, and further studies are needed.

Year:  2019        PMID: 31876736      PMCID: PMC6946243          DOI: 10.1097/MD.0000000000018493

Source DB:  PubMed          Journal:  Medicine (Baltimore)        ISSN: 0025-7974            Impact factor:   1.817


Introduction

Bronchopulmonary dysplasia (BPD), also called bronchial pulmonary dysplasia, is a common disease of premature infants with low birth weight, characterized by a continuing need for oxygen therapy for at least 28 days, accompanied by changes on pulmonary imaging. Pulmonary surfactant and low tidal volume ventilation can alleviate lung injury and reduce mortality in premature infants, but they do not decrease the incidence of BPD. Recent studies show that 1 in every 10,000 children in the United States is diagnosed with BPD. The etiology of BPD is mainly attributed to genetic susceptibility, oxygen toxicity, air pressure injury (barotrauma), volume injury (volutrauma), infection, immunity, and other factors, resulting in oxygen toxicity and pulmonary fibrosis. However, the pathogenic factor is not clear. With the development of techniques in molecular biology and genetic engineering, studies on the molecular mechanisms underlying the pathogenesis of BPD have been increasing in recent decades. The pathogenesis of BPD has been shown to result from genetic susceptibility, infection, immature lung development, injury from high oxygen concentrations, nutritional deficiency, and other factors. A large body of literature attributes the pathogenesis of BPD to several important proteins and signaling pathways, such as angiotensin-converting enzyme, mannose-binding lectin, nuclear factor kappa B, the phosphatidyl inositol-3-enzyme-serine/threonine kinase (PI3K-AKT) signaling pathway, interleukin (IL), and vascular endothelial growth factor (VEGF). Although a large number of transcription factors and signaling proteins have been discovered, studies on these factors have tended to be conservative. Furthermore, defining the most meaningful regulatory factors within the information network is an important task. With the development of high-throughput proteomic and transcriptomic approaches, it is now feasible to study tens of thousands of genes and proteins at the same time.
However, the results of high-throughput studies often lack consistency because of different choices of data platform and statistical criteria. Moreover, most valuable information remains hidden in the literature when it is examined in the conventional gene-by-gene way. In recent years, text mining has been used as an effective method to explore the underlying mechanisms of many diseases. It provides a way to retrieve data from published research automatically. As a method for studying molecular mechanisms, text mining has been applied to many diseases and conditions, such as glioblastoma, endometrial cancer, breast cancer, ectopic pregnancy, and decidualization. In an attempt to provide a better understanding of BPD and lay a foundation for the development of novel therapeutic interventions, we performed a text mining analysis to identify genes related to BPD. The filtered gene set was then analyzed systematically to characterize gene functions and to define the signaling pathways and networks involved.

Methods

Natural Language processing (NLP) analysis of BPD

The PubMed database was used as the source of literature for text mining. The search was conducted with the query term “bronchopulmonary dysplasia.” All relevant articles identified up to March 2018 were retrieved and converted into XML format. All genes and proteins associated with the query term were extracted and added to the list, followed by gene mention tagging using biomedical named entity recognition software. Conjunction resolution was also carried out to identify the individual genes named in conjoined descriptions. In this study, gene symbols from the Entrez Gene database of the National Center for Biotechnology Information were used. The processing flow chart of the NLP analysis of bronchopulmonary dysplasia is shown in Figure 1.
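The gene mention step can be illustrated with a minimal Python sketch. It is not the named entity recognizer used in the study (which is not identified in the text); it swaps in simple dictionary matching, and the alias table, function names, and toy abstracts are all hypothetical. It counts, for each official symbol, how many abstracts mention the gene at least once.

```python
import re
from collections import Counter

# Hypothetical alias table: official symbol -> surface forms seen in text.
# A real pipeline would draw these from a curated resource such as Entrez Gene.
GENE_ALIASES = {
    "VEGFA": ["VEGFA", "VEGF", "vascular endothelial growth factor"],
    "IL6": ["IL6", "IL-6", "interleukin 6", "interleukin-6"],
    "TNF": ["TNF", "tumor necrosis factor"],
}


def compile_patterns(aliases):
    """Build one case-insensitive regex per official symbol."""
    patterns = {}
    for symbol, names in aliases.items():
        # Longest alias first so a full name wins over a shorter form.
        ordered = sorted(names, key=len, reverse=True)
        alternation = "|".join(re.escape(name) for name in ordered)
        # The lookarounds stop matches inside longer alphanumeric tokens.
        patterns[symbol] = re.compile(
            r"(?<![A-Za-z0-9])(?:" + alternation + r")(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
    return patterns


def count_gene_documents(abstracts, aliases=GENE_ALIASES):
    """Return, per symbol, the number of abstracts mentioning it at least once."""
    patterns = compile_patterns(aliases)
    counts = Counter()
    for text in abstracts:
        for symbol, pattern in patterns.items():
            if pattern.search(text):
                counts[symbol] += 1
    return counts


# Toy abstracts, invented for illustration only.
toy_abstracts = [
    "VEGF and IL-6 levels were measured in preterm infants with BPD.",
    "Tumor necrosis factor was elevated in tracheal aspirates.",
    "Interleukin-6 correlated with chorioamnionitis severity.",
]
gene_counts = count_gene_documents(toy_abstracts)
```

Case-insensitive dictionary matching is deliberately crude: short symbols can collide with ordinary words, which is one reason manual confirmation of hits remains useful.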
Figure 1

The histogram of GO terms enriched among bronchopulmonary dysplasia candidate genes according to BP, CC, and MF. BP = biological process, CC = cellular component, MF = molecular function.

For each gene, the frequency of its occurrence was recorded. The higher the frequency of a gene, the stronger the association between that gene and BPD. The total number of publications retrieved from the PubMed database was recorded as N. The frequencies of a given gene and of BPD were denoted by m and n, respectively, and k represented the number of publications in which the gene and BPD actually occurred together. We then calculated the probability that the gene and BPD would be cited together more than k times under completely random conditions by using the hypergeometric distribution. BPD-gene relations with P < .05 were retained for further use.
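The co-occurrence test can be sketched in Python as follows. The equation itself is not reproduced in this text, so the sketch assumes the standard upper-tail form, P(X >= k) = sum over i from k to min(m, n) of C(m, i) C(N - m, n - i) / C(N, n), with N, m, n, and k as defined above. The phrase "greater than k" could also be read as strictly greater, in which case the sum would start at k + 1. The function name and the toy numbers are illustrative and do not come from the study.

```python
from math import comb


def cooccurrence_p_value(N, m, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of publications
    m: publications mentioning the gene
    n: publications mentioning the disease term
    k: publications mentioning both
    """
    if not (0 <= m <= N and 0 <= n <= N):
        raise ValueError("m and n must lie between 0 and N")
    upper = min(m, n)
    if k > upper:
        return 0.0  # more co-occurrences than either term allows
    start = max(k, 0)
    # Exact integer arithmetic; math.comb returns 0 when the lower index
    # exceeds the upper one, so impossible terms vanish on their own.
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(start, upper + 1))
    return tail / comb(N, n)


# Toy numbers for illustration: 10 papers, 4 mention the gene,
# 5 mention the disease, 3 mention both.
p_toy = cooccurrence_p_value(N=10, m=4, n=5, k=3)
```

Using exact binomial coefficients avoids floating-point underflow for corpus sizes in the thousands; a relation would be retained when the returned value falls below .05.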

Gene ontology (GO) analysis

GO analysis was performed using the GSEABase package on the R statistical platform (http://www.r-project.org/). A word cloud was generated using the word cloud package in R. The biological process (BP), cellular component (CC), and molecular function (MF) categories were characterized and evaluated in this study.

Pathway analysis

The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to analyze the interactions between genes. Genes were mapped to the database using GenMAPP v2.1 (http://www.genmapp.org/). Statistical tests were then performed to identify enriched pathways, with P < .05 set as the threshold.

Network analysis of BPD genes

We integrated the relationships among the involved genes from 3 different sources: (1) protein interaction, gene regulation, and protein modification relationships listed in the KEGG database; (2) interaction data from existing high-throughput protein interaction experiments, such as protein-protein interactions confirmed by yeast 2-hybrid assays; and (3) gene interactions demonstrated in previous publications. Briefly, the pathway data were downloaded from the KEGG database and then used to analyze the interactions among genes. We also downloaded the KEGGSOAP package for the R statistical analysis platform (http://www.bioconductor.org/packages/2.4/bioc/html/KEGGSOAP.html). Three different types of relationships were analyzed: enzyme-enzyme relations (ECrel, indicating 2 enzymes catalyzing successive reaction steps), protein-protein interactions (PPrel, such as binding and modification), and gene expression interactions (GErel, indicating relations between transcription factors and target gene products). The PPrel data were obtained from the mammalian protein-protein interaction database (http://mips.helmholtz-muenchen.de/proj/ppi). For interactions that had already been published, we used co-citation matrices built from PubMed. Using this algorithm, we identified gene terms and their term variants that co-occurred within the sentences of an abstract. The frequency of the co-cited genes was analyzed as well. Finally, statistical analysis was performed using the hypergeometric distribution as described above. The resulting relationship network was built and displayed with Medusa software. Genes with a large number of connections, which play important roles in network stability, were then identified.
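The sentence-level co-citation step can be sketched as follows. This is an illustrative Python sketch rather than the authors' algorithm: the sentence splitter is a naive regular expression, term variants are ignored, and the function names and toy abstracts are hypothetical. It counts how many sentences mention each pair of gene symbols together, which is the raw material for a co-citation matrix.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(abstract):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [part for part in parts if part]


def cocitation_counts(abstracts, gene_symbols):
    """Count sentences in which each unordered gene pair co-occurs.

    Keys are alphabetically sorted (gene_a, gene_b) tuples.
    """
    patterns = {
        gene: re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(gene) + r"(?![A-Za-z0-9])"
        )
        for gene in gene_symbols
    }
    pair_counts = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            present = sorted(
                gene for gene, pattern in patterns.items()
                if pattern.search(sentence)
            )
            for pair in combinations(present, 2):
                pair_counts[pair] += 1
    return pair_counts


# Toy abstracts, invented for illustration only.
toy_abstracts = [
    "TP53 and AKT1 were both induced by hyperoxia. VEGFA was reduced.",
    "AKT1 phosphorylation increased VEGFA expression. TP53 interacted with AKT1.",
]
pairs = cocitation_counts(toy_abstracts, ["TP53", "AKT1", "VEGFA"])
```

Each pair count could then be tested against chance with the hypergeometric distribution, and the significant pairs kept as edges of the network.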

Results

NLP analysis of BPD

After the retrieval of documents from PubMed, a total of 5812 primary studies were identified, and 397 were obtained. Eventually, 216 genes were identified as BPD-related genes with P < .05. Among these genes, vascular endothelial growth factor A (VEGFA), IL-6, tumor necrosis factor (TNF), and interleukin 1, beta were mentioned most frequently. The top 20 most frequently cited genes are listed in Table 1.
Table 1

The top 20 frequently cited genes related to bronchopulmonary dysplasia.

GO analysis

The 216 genes mentioned above were categorized by GO analysis. Enriched GO terms were classified according to BP, CC, and MF. In the BP category, 12 GO terms, namely protein metabolism, cell organization and biogenesis, death, signal transduction, other metabolic processes, cell cycle and proliferation, DNA metabolism, developmental processes, cell adhesion, stress response, cell-cell signaling, and other BP, were found to be significantly enriched. GO terms related to nonstructural extracellular, extracellular matrix, and plasma membrane regions were significantly enriched under the CC category. Enriched GO terms in the MF category included kinase activity, transcription regulatory activity, and signal transduction activity (Fig. 1).

Pathway analysis and gene network

To better understand the gene function related to BPD, we also performed pathway analysis by using DAVID online tools. All BPD-related genes were mapped to the pathways found in KEGG. As shown in Figure 2, 30 signaling pathways were identified as significant. Pathway assignment for the top 10 genes was recommended by the TM-rank algorithm and generated by Cytoscape software (Fig. 3).
Figure 2

The significantly enriched pathways identified by using DAVID online tools.

Figure 3

Pathway assignment for the top 10 genes recommended by the TM-rank algorithm. This graph was made by Cytoscape software. Ellipse nodes in red represent genes and rectangle nodes in blue represent pathways.

We constructed a network of BPD-related genes consisting of 365 nodes connected via 6403 edges (Fig. 4A). Topological analysis showed that the network followed a power-law distribution (Fig. 4B). In this network, we identified 17 hub genes (Fig. 4C): tumor protein p53 (TP53); v-akt murine thymoma viral oncogene homolog 1; jun oncogene; IL6; insulin (INS); B-cell CLL/lymphoma 2; VEGFA; epidermal growth factor receptor; TNF; transforming growth factor, beta 1; matrix metallopeptidase 9; matrix metallopeptidase 2; fibroblast growth factor 2; intercellular adhesion molecule 1; nitric oxide synthase 3; mitogen-activated protein kinase 8; and cyclin D1. Among these genes, TP53 exhibited the greatest number of interactions.
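The topological analysis can be illustrated with a short Python sketch. The text does not state how the power-law fit was done or how hubs were defined, so the sketch makes two loudly labeled assumptions: the power-law check is a plain least-squares slope on the log-log degree distribution (a crude diagnostic, not a rigorous fit), and hubs are simply the highest-degree nodes. The function names and the toy edge list are hypothetical.

```python
from collections import Counter
from math import log


def node_degrees(edges):
    """Degree of each node in an undirected graph, ignoring self-loops
    and duplicate edges."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def degree_distribution(degrees):
    """Map each degree value to the number of nodes having that degree."""
    return dict(sorted(Counter(degrees.values()).items()))


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree).

    Assumption: a roughly linear log-log plot with negative slope is taken
    as consistent with a power law.
    """
    xs = [log(degree) for degree in distribution]
    ys = [log(count) for count in distribution.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct degree values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def top_hubs(degrees, top_n):
    """Highest-degree nodes, with ties broken alphabetically."""
    return sorted(degrees, key=lambda gene: (-degrees[gene], gene))[:top_n]


# Toy edge list, invented for illustration only.
toy_edges = [
    ("TP53", "AKT1"), ("TP53", "JUN"), ("TP53", "IL6"),
    ("TP53", "INS"), ("AKT1", "INS"), ("JUN", "IL6"),
]
degrees = node_degrees(toy_edges)
slope = loglog_slope(degree_distribution(degrees))
hubs = top_hubs(degrees, 1)
```

On a real network of several hundred nodes, the same functions would give the degree distribution plotted in a figure such as Fig. 4B and a ranked list from which hub genes could be chosen.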
Figure 4

The gene network underlying all bronchopulmonary dysplasia candidate genes. (A) The structure of the gene network generated by using the MIPS database. (B) Degree distribution of the gene network. (C) The seventeen hub genes identified in the network. MIPS = mammalian protein-protein interaction.

Discussion

A total of 216 BPD-related genes were identified from the literature search of 5812 publications. Among the BPD-related genes identified, VEGFA and IL6 were the most frequently mentioned. Studies have already shown an association between BPD development and VEGFA activity in preterm infants. VEGFA was shown to play vital roles in the repair of the lung vasculature, and a lack of VEGFA activity could impair lung microvascular development in the fetus. IL6 expression was also found to increase with the histologic severity of chorioamnionitis. In theory, the roles of BPD-related genes could be validated through wet-lab experiments; however, because of the large number of genes involved, experimental work is not a feasible way for us to gain a comprehensive understanding of the gene set. Thus, we performed pathway enrichment analysis on the complete list of 216 candidate genes using the DAVID tools. A total of 30 enriched pathways were identified. Of particular interest was the PI3K-Akt signaling pathway: notably, 67 BPD-related genes were enriched in this pathway. PI3K is activated by INS and is responsible for most of the metabolic actions mediated by INS. PI3K plays essential roles in many metabolic processes, such as cell growth, differentiation, survival, and protein synthesis. Inhibition of the PI3K-Akt pathway disrupts normal lung development, whereas activation of the PI3K-Akt pathway preserves alveolar development. The PI3K-Akt signaling pathway could exert a cytoprotective role in cell survival during hyperoxia. Text mining results are usually based on the frequency calculated for each gene in publications. However, the most popular genes are not always the most important ones. It is known that, rather than working alone, gene products such as proteins usually form complexes to exert their functions. Hence, the functional importance of a gene depends on its interactions with other partners.
The hub genes are likely more important because of their key positions in the ECrel, PPrel, and GErel network. In this study, we constructed a large network of BPD-related genes consisting of 365 nodes connected via 6403 edges. Seventeen hub genes were identified in this network. Although the noisiness and incompleteness of the interaction data may introduce inaccuracy into our results, the network still provides a comprehensive and reasonable way to understand the gene set. Text mining is the process of extracting meaningful information from large amounts of unstructured text using computational methods. However, articles reporting high-throughput experiments generate a large amount of gene information that sometimes cannot be fully recognized and refined. For these articles, we downloaded the full texts (as well as supplementary files if needed) and extracted gene mentions by hand. To reduce false-positive results, the full texts of extracted papers that contained hub genes were downloaded, followed by manual confirmation of the information. For most of the hub genes, the relationships with BPD were well studied and easily understood. However, in the case of INS, most extracted papers are actually about insulin-like growth factor-I rather than INS. Only 2 true-positive studies were found. One study reported that serum INS is significantly increased in BPD infants after systemic corticosteroid treatment. Postnatal application of glucocorticoids can prevent BPD in preterm infants. However, the treatment also has adverse effects such as hyperglycemia, hypertension, and intestinal perforation. The serum INS level is elevated as a result of hyperglycemia. However, the role of INS in BPD is not clear. The other study found that INS could increase cell function and NO production in normal fetal pulmonary artery endothelial cells (PAECs).
PAECs from intrauterine growth restriction fetuses were less sensitive to INS, which caused significantly decreased cell motility, growth, tube formation, and NO production. Impaired PAEC function may contribute to the increased rate of BPD. Furthermore, a related paper showing that metformin reverses established lung fibrosis in a bleomycin model was published in 2018; it implicated INS in alleviating the later pathological changes of BPD. Usually, the limitation of text mining–based strategies is that there is no chance to discover new genes. Interestingly, in this study we found INS as a novel gene related to BPD by accident. This could lay a foundation for further investigation of the molecular mechanisms linking INS and BPD. In fact, the purpose of this work is to provide a foundation for identifying new molecules that are worth pursuing in the future, rather than to summarize previous work. Genes that are not in a central position in the network, or that do not interact directly with principal molecules, could still be fertile ground for novel findings. In conclusion, we systematically analyzed BPD-related genes using a text mining approach. These genes were further characterized by GO, pathway, and network analysis. Our research provides a basis for a better understanding of the molecular mechanisms underlying BPD and lays a foundation for further studies.

Acknowledgment

The authors wish to express their gratitude to the Shanghai Boyun Biotechnology Co. Ltd for bioinformatics analysis.

Author contributions

Conceptualization: Fei Shao, Jing Li. Data curation: Weitao Zhou, Fei Shao. Formal analysis: Weitao Zhou, Fei Shao. Funding acquisition: Jing Li. Methodology: Weitao Zhou, Fei Shao. Supervision: Jing Li. Writing – original draft: Weitao Zhou, Fei Shao. Writing – review and editing: Weitao Zhou, Fei Shao. Weitao Zhou orcid: 0000-0003-3618-5193.
