BACKGROUND: We identified the hub genes and pathways dysregulated in acute myeloid leukemia and the potential molecular mechanisms involved. METHODS: We downloaded the GSE15061 gene expression dataset from the Gene Expression Omnibus database and used weighted gene co-expression network analysis to identify hub genes. Differential expression of the genes was evaluated using the limma package in R software. Subsequently, we built a protein-protein interaction network followed by functional enrichment analysis. Then, the prognostic significance of gene expression was explored in terms of overall survival. Finally, transcription factor-mRNA (ribonucleic acid) and microRNA-mRNA interaction analysis was also explored. RESULTS: We identified 100 differentially expressed hub genes. Functional enrichment analysis indicated that the genes were principally involved in immune system regulation, host defense, and negative regulation of apoptosis and myeloid cell differentiation. We identified 4 hub genes, the expression of which was significantly correlated with overall survival. Finally, 26 key regulators for hub genes and 38 microRNA-mRNA interactions were identified. CONCLUSION: We performed a comprehensive bioinformatics analysis of hub genes potentially involved in acute myeloid leukemia development. Further molecular biological experiments are required to confirm the roles played by these genes.
BACKGROUND: We identified the hub genes and pathways dysregulated in acute myeloid leukemia and the potential molecular mechanisms involved. METHODS: We downloaded the GSE15061 gene expression dataset from the Gene Expression Omnibus database and used weighted gene co-expression network analysis to identify hub genes. Differential expression of the genes was evaluated using the limma package in R software. Subsequently, we built a protein-protein interaction network followed by functional enrichment analysis. Then, the prognostic significance of gene expression was explored in terms of overall survival. Finally, transcription factor-mRNA (ribonucleic acid) and microRNA-mRNA interaction analysis was also explored. RESULTS: We identified 100 differentially expressed hub genes. Functional enrichment analysis indicated that the genes were principally involved in immune system regulation, host defense, and negative regulation of apoptosis and myeloid cell differentiation. We identified 4 hub genes, the expression of which was significantly correlated with overall survival. Finally, 26 key regulators for hub genes and 38 microRNA-mRNA interactions were identified. CONCLUSION: We performed a comprehensive bioinformatics analysis of hub genes potentially involved in acute myeloid leukemia development. Further molecular biological experiments are required to confirm the roles played by these genes.
Acute myeloid leukemia (AML) is a heterogeneous hematological malignancy characterized by the clonal expansion of myeloid blasts in peripheral blood (PB), bone marrow (BM), and/or other tissues. AML is the most common form of adult acute leukemia, with an annual incidence of 2 to 4 cases per 100,000 adults. The median age at diagnosis is 64 years. As populations age, the incidence of AML increases.[ Despite advances in our understanding of AML molecular heterogeneity and pathogenesis, the standard therapy has not greatly changed over the past few decades. The 5-year median overall survival (OS) is roughly 40% in patients younger than 60 years, but the disease is more serious in older individuals, with a 5-year survival rate of only 10% to 20% in those older than 60 years.[ Hence, it is essential to identify biomarkers predicting AML progression and prognosis to facilitate the development of new therapeutic approaches.High-throughput platforms (such as microarrays) allowing analysis of gene expression can be used to explore the complex biological network involved in AML development. Many microarray studies on AML have appeared,[ substantially unraveling AML pathogenesis and identifying new molecular markers. Microarray-mediated gene expression profiling (GEP) can be used to diagnose leukemia with high accuracy.[ Gene expression signatures associated with distinct clinical subtypes of leukemia have been identified by microarray studies.[ Moreover, GEP analysis is also useful for predicting AML prognosis.[In recent years, many GEP microarrays have been used to identify differentially expressed genes (DEGs) in AML; a few such genes are promising biomarkers in terms of AML diagnosis and prognosis.[ However, most previous studies were focused on DEG screening, not on the interactions among genes. Weighted gene co-expression network analysis (WGCNA) can be used to identify correlations among DEGs using microarray or RNA (ribonucleic acid) sequencing data. WGCNA identifies modules of highly correlated genes and links the modules to certain clinical phenotypes. Correlation networks facilitate network-based gene screening of candidate biomarkers or therapeutic targets.[In the present study, we compared the GEPs of primary AML and non-leukemia BM samples; we collected data from the National Center of Biotechnology Information Gene Expression Omnibus database. The WGCNA algorithm was used to construct a scale-free weighted genetic interaction network featuring specific gene modules that play common biological roles during AML development. Furthermore, we identified certain hub genes as potential candidate biomarkers or novel molecular therapeutic targets for AML.
Methods
GEP
This study as approved by the Institutional Review Board of the Guangdong Second Provincial General Hospital. The GSE15061 dataset (Platforms GPL570) was downloaded from the Gene Expression Omnibus database. GSE15061 data were derived using Affymetrix Human Genome U133 Plus 2.0 GeneChips (Thermo Fisher Scientific, Waltham, MA, USA). The dataset is part of the MILE Study (Microarray Innovations in Leukemia) program, headed by the European Leukemia Network (ELN), contains information on 164 myelodysplastic syndrome, 202 AML, and 69 non-leukemia BM samples. The AML samples include different subtypes, such as AML with t(15; 17), t(8; 21), inv(16), or t(11q23)/MLL. We analyzed only the AML and non-leukemia samples.
Data preprocessing and probe re-annotation
All probes and samples were first checked to ensure that no data were missing. The data were then quantile-normalized and log-transformed, and the probe sequences were mapped to the RefSeq transcript database (www.ncbi.nlm.nih.gov/refseq/) using seqMap software (Wong Lab, Stanford University, Stanford, CA, USA).[ Only probes exhibiting perfect matches to RefSeq transcripts were retained, and these probes were then annotated at the gene level.
Selection of genes exhibiting the most variation
We ranked the probes according to the coefficient of variation (CV) of their expression profiles and selected the top 5000 genes. The CV is commonly used to measure relative variability among subjects of interest. The CV is simply the standard deviation divided by the mean. We selected probes in descending order of CV, while ignoring repeat genes, until the number of selected genes attained reached 5000.
Co-Expression network construction and module detection
The R package WGCNA was used to build a weighted gene co-expression network for the gene expression matrix, as described elsewhere.[ First, we cleaned up the data by removing obvious outliers as well as genes and samples with excessive missing entries. We next constructed pairwise Pearson correlation matrices. The weighted gene co-expression network was constructed using the β function; this is a soft thresholding parameter that retains strong correlations between genes but removes weak correlations. The adjacency data were transformed into a topological overlap matrix (TOM) measuring not only between-gene correlations but also the extent of their shared correlations across the weighted network. We next performed average linkage hierarchical clustering using the TOM-based dissimilarity measure, with a minimum module size of 30 and a medium sensitivity of 2 (the other parameters were set to the default values). A dynamic TreeCut algorithm was used to cut the hierarchical clustering dendrogram and the branches were defined as modules. We associated modules with clinical traits and calculated the associated gene significance (GS) and module membership (MM) values. Finally, by setting minimum thresholds for these 2 parameters, we selected the hub genes with the highest GSs and MMs for each module.
DEG screening
The R package limma was used to identify DEGs between AML and non-leukemia samples. Screening was performed using an adjusted P-value < .05 and a log2-fold change (FC) ≥ 2. The DEG screening results were validated using the Cancer Genome Atlas (TCGA)-AML gene expression profiles identified by GEPIA (http://gepia.cancer-pku.cn/) at a P-value < .01 and a log2 FC value ≥ 1. GEPIA is a newly developed interactive web server used to analyze RNA expression data from TCGA and Genotype-Tissue Expression projects, using a standard processing pipeline.
Functional annotation and protein–protein interaction (PPI) network construction
We used the ClueGO plugin in Cytoscape to perform gene ontology (GO) (http://www.geneontology.org/), Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.kegg.jp/kegg/), and REACTOME (https://reactome.org/) pathway enrichment analyses of the selected genes. PPI information was also assessed employing the GeneMANIA plugin of Cytoscape. The plugin MCODE was used to screen the PPI network identified by Cytoscape using a degree cutoff of 2, a node score cutoff of 0.2, a K-core of 2, and a maximum depth of 100.
Survival analysis
The association between hub DEGs and OS was examined using GEPIA. OS curves obtained using the Kaplan–Meier method were generated by reference to the median gene expression levels. The data were validated employing the Leukemia Gene Atlas (LGA) (http://www.leukemia-gene-atlas.org/) consisting of AML gene expression profiles created by Verhaak et al.[ We used the 50% quantiles of gene expression as cutoffs for the Kaplan–Meier curves generated by the LGA public platform supporting analysis of molecular leukemia data.
Transcription factor (TF)-mRNA and microRNA-mRNA interaction analysis
We used TRRUST (https://www.grnpedia.org/trrust/), a manually curated database of human and mouse transcriptional regulatory networks for TF-mRNA interaction analysis within hub DEGs. We also used miRWalk to perform microRNA-mRNA interaction analysis. MiRWalk (http://mirwalk.umm.uni-heidelberg.de/) is an online resource for prediction of microRNA binding sites, which stores predicted data obtained with a machine learning algorithm including experimentally verified microRNA-target interactions. We chose intersection of 2 database Targetscan (http://www.targetscan.org/vert_72/) and miRDB (http://mirdb.org/index.html) to present the results.
Results
Data analysis pipeline is shown in Figure 1. All sample and probe data were complete. The dataset contained 54,675 probes, and after re-annotation 41,373 remained, which mapped to 20,604 unique genes. We selected the top 5000 genes according to CV.
Figure 1
Flow chart diagram of data analysis.
Flow chart diagram of data analysis.The 5000 genes mentioned above were used in subsequent analyses. Seven outliers were excluded. We chose a soft threshold β of 6 when constructing the co-expression network. TOM- based hierarchical clustering yielded 14 gene modules, and module–trait associations and their P-values were evaluated. We selected modules with absolute correlations > .5, thereby obtaining 4 modules (blue, 430 genes, P = 3e−12, correlation = −0.64; brown, 386 genes, P = 6e−56, correlation = 0.78; green/yellow, 113 genes, P = 3e−23, correlation = −0.56; yellow, 248 genes, P = 1e−27, correlation = 0.6). We set minimum GS and MM thresholds and selected those hub genes with the highest GS and MM values in each module. Initially, we set a minimum GS of 0.3 and a minimum MM of 0.6. Because of the relatively small number of genes obtained in the green/yellow module, we reduced the GS and MM thresholds for this module to 0.2 and 0.4, respectively. The selected hub genes were further analyzed (blue, 151; brown, 131; green/yellow, 94; yellow, 119; Supplementary Tables S1–S4).
Identification of DEGs
Using the limma package of R software, and employing P < .05 and log2 FC ≥ 2 as cut-offs, we identified 197 DEGs, of which 63 were upregulated and 134 downregulated (Supplementary Table 5 and Fig. S1). Using cutoffs of P < .01 and log2 FC ≥ 1, 133 DEGs were validated by GEPIA using the RNA sequencing data from TCGA-AML.
Functional enrichment analysis
To obtain further insights into the biological relevance of hub DEGs (Intersection of hub genes and DEGs, n = 100) associated with AML, we performed GO, KEGG, and REACTOME pathway enrichment analyses using the ClueGO plugin of Cytoscape. As shown in Supplementary Figure S2 and Table S6, the following 28 pathways were significantly enriched among the hub DEGs (P < .01): neutrophil activation, cellular defense, regulation of interleukin-8 production, negative regulation of myeloid cell differentiation, embryonic skeletal system morphogenesis, hematopoietic cell lineage regulation, transcriptional dysregulation in cancer, Staphylococcus aureus infection, T cell-mediated immunity, and postsynaptic density organization (Table 1).
Table 1
Functional enrichment analysis of 100 differentially expressed hub genes with representative pathways for each group.
Functional enrichment analysis of 100 differentially expressed hub genes with representative pathways for each group.
PPI network analysis
GeneMANIA showed that the PPI network of the hub DEGs featured 117 nodes and 1464 edges. The MCODE plugin of Cytoscape revealed 4 clusters (using the default settings). The top cluster, featuring 38 nodes and 611 edges, was selected for further analysis (Fig. 2). Gene functional enrichment analysis revealed significant enrichment of the following 34 pathways (P < .01): neutrophil activation, icosanoid biosynthesis, neutrophil chemotaxis, cellular defense, mast cell activation, positive regulation of cytokine secretion, regulation of interleukin-8 production, superoxide anion generation, response to fungi, complement receptor-mediated signaling, Dectin-2 family activity, and negative regulation of myeloid cell apoptosis (Supplementary Fig. S3 and Table S7).
Figure 2
The top cluster of the protein–protein interaction (PPI) network involving the differentially expressed hub genes. Gray circles indicate the top 18 genes.
The top cluster of the protein–protein interaction (PPI) network involving the differentially expressed hub genes. Gray circles indicate the top 18 genes.The associations among the hub DEGs in the module that were most associated with AML and OS were examined using GEPIA to data-mine TCGA-AML information. Log-rank OS curves showed that FLT3, CEACAM21, SUSD3, RETN, MME, FGF13, HOXA9, HOXB6, FAM30A, CFD, GNLY, CLEC5A, S100P, PRG3, SCHIP1, SPINK2, STAR, CCNA1, and CRNDE expression levels were significantly associated with 100-month OS (Supplementary Fig. S4). These results were validated using LGA; high SUSD3, SCHIP1, and SPINK2, and low STAR expression were all significantly associated with poor 200-month OS (P = .0329, .0046, .0206, and .0032, respectively) (Figs. 3–4).
Figure 3
High SUSD3, SCHIP1, and SPINK2 expression and low STAR expression were correlated with poor survival as revealed.
Figure 4
Survival afforded by acute myeloid leukemia hub genes as revealed by the Leukemia Gene Atlas. High SUSD3, SCHIP1, and SPINK2 expression and low STAR expression were correlated with poor.
High SUSD3, SCHIP1, and SPINK2 expression and low STAR expression were correlated with poor survival as revealed.Survival afforded by acute myeloid leukemia hub genes as revealed by the Leukemia Gene Atlas. High SUSD3, SCHIP1, and SPINK2 expression and low STAR expression were correlated with poor.
TF-mRNA and microRNA-mRNA interaction analysis
Using the TRRUST, we identified 26 key regulators for our hub DEGs (Fig. 5, Supplementary Table S8). Of note, we found 6 key TFs (CEBPA, CEBPB, NR4A1, CREM, SP1 and JUN) for gene STAR, the expression of which was significantly correlated with OS. 49 microRNA-mRNA interactions were identified by miRWalk analysis. However, we chose 38 microRNA-mRNA interactions from the intersection of 2 database Targetscan and miRDB to present the results (Fig. 6), 4 genes (COL24A1, TRIM71, HOXA9, and MYCN) were regulated by more than 4 microRNAs.
Figure 5
26 transcription factors identified by TRRUST for hub differentially expressed genes.
Figure 6
38 micro ribonucleic acid -mRNA interactions were identified by miRWalk analysis based on Targetscan and miRDB database.
26 transcription factors identified by TRRUST for hub differentially expressed genes.38 micro ribonucleic acid -mRNA interactions were identified by miRWalk analysis based on Targetscan and miRDB database.
Discussion
AML is a complex heterogeneous malignancy caused by malignant transformation of hematopoietic stem cells and accumulation of immature myeloid progenitors in the PB and BM. Over the past few decades, AML treatments have remained largely unchanged; chemotherapy, and stem cell transplantation are commonly employed.[ In recent years, considerable progress has been made in our understanding of disease pathogenesis and identification of biomarkers used to diagnose AML and aid treatment. Survival outcomes have improved, especially in younger patients. However, AML has a very poor prognosis in older patients, most of whom die from disease relapse. Therefore, the mechanisms of leukemogenesis and progression require further attention.GEP uses cDNA microarrays or oligonucleotide probes to measure the levels of many mRNA transcripts simultaneously,[ allowing abnormalities and variations to be assessed on a genome-wide basis. In recent years, many studies have explored AML gene expression using microarrays,[ identifying new clinical subgroups[ and providing valuable diagnostic and prognostic information.[ Previously, AML progression-related genes were identified using basic methods such as DEM screening.[ WGCNA is a new algorithm identifying correlation patterns among genes across multiple microarray samples, creating modules of highly co-expressed genes and relating these modules to clinical phenotypes.[Here, we identified 100 hub DEGs in AML and non-leukemia BM samples. Functional enrichment indicated that the hub genes were significantly enriched in 28 pathways divided into 10 groups, including regulation of the immune system, defense, and negative regulation of apoptosis and myeloid cell differentiation. Many of these pathways have already been reported to be active in AML. We found that as AML developed, negative regulation of myeloid cell apoptosis, and differentiation triggered accumulation of immature myeloid progenitors in PB and BM, accompanied by immune system activation and dysregulation. We constructed a hub gene PPI network and identified the genes with significant interactions; the most significant cluster was evaluated further. We found 4 hub genes, the expression of which was correlated significantly with OS.Of these hub DEGs, some have been reported previously to be expressed in AML patients. FLT3 is a receptor tyrosine kinase expressed by immature hematopoietic cells and is involved in the normal development of stem cells and the immune system. FLT3 mutations have been detected in many AML patients with poor prognoses.[ CEBPE is a member of the CCAAT/enhancer binding protein family that plays important roles in normal myelopoiesis. Loss of CEBPE activity may play a role in AML pathogenesis. Increased CEBPE activity suppresses the leukemia phenotype of acute promyelocytic leukemia, suggesting that targeted modulation of CEBPE activity may constitute a new form of acute promyelocytic leukemia therapy.[ HOX gene dysregulation is a common feature of AML. Dysregulation of HOXA9 or HOXA10 triggers transformation to leukemia by altering hematopoietic stem cell self-renewal and differentiation properties. HOX overexpression is a poor prognostic marker in AML patients.[ S100A8, a member of the damage-associated molecular pattern family, is differentially expressed in many cell types but abundant in myeloid cells and involved in the progression of various cancers including leukemia. The S100A8 expression level is correlated with poor AML outcomes. S100A8 also plays an important role in drug-resistance development in leukemia cells by promoting autophagy and inhibition of apoptosis.[ CCNA1 is a cell cycle protein overexpressed in many myeloid leukemia cell lines and in AML patients; its overexpression plays a role in the growth and apoptotic suppression of leukemia cells via repression of WT1 synthesis.[ Other hub genes (ATP1B1, CDA, MEIS1, MN1, HK3, CAMP, SIGLEC5, MNDA, FCER1G, ALOX5, and CSF3R) have been associated with AML previously.Of the various hub genes, some are also active in non-AML hematological or solid tumors. CLEC5A is a type II membrane protein critical for myeloid differentiation. CLEC5A dysregulation plays a role in the pathogenesis of myelodysplastic syndromes.[ NFE2 is overexpressed in most patients with myeloproliferative neoplasms, and its expression is regulated by epigenetic mechanisms that are perturbed in myeloproliferative neoplasms.[ TAL1 (SCL/TAL1, T-cell acute leukemia protein 1) is a TF involved in hematopoiesis and leukemogenesis. PADI4 is a promoter-dependent epigenetic cofactor of Tal1, which creates important epigenetic histone markers, and is a potential target of molecular therapy in some leukemias.[ MMP9 is a proteolytic enzyme that degrades components of the extracellular matrix and basement membrane; its overexpression is correlated with poor survival in ovarian cancer patients and promotes cancer cell invasion.[ C5AR1 (the receptor for the complement anaphylatoxin C5a) is a potent immune system mediator that plays a major role in malignant growth and dissemination of non-small cell lung cancer cells. Disruption of C5AR1 signaling in lung cancer cells abrogates tumor-associated osteoclastogenic activity, in turn impairing osseous colonization.[Of the 4 hub genes significantly correlated with OS in AML, SUSD3 is a cell-surface protein with unknown function that may be involved in the growth and development of breast cancer via the estrogen receptor pathway. Some evidence indicates that the gene acts downstream of the estrogen receptor and that estrogen enhances its expression. Lack of SUSD3 expression in breast cancer tissue was the most significant predictor of non-responsiveness to an aromatase inhibitor.[ Hippo signaling is an evolutionarily conserved pathway modulating organ growth, and SCHIP1 is a newly discovered member of the Hippo pathway. Loss of the SCHIP1 protein triggers overproliferation via upregulation of genes targeted by Yorkie (YAP in mammals), indicating that this protein serves as a tumor suppressor regulating the YAP oncogene.[ SPINK2 is a member of the family of Kazal-type serine proteinase inhibitors that inhibit trypsin/acrosin; SPINK2 is synthesized principally in the testes and seminal vesicles, promoting fertility. Dysregulation of SPINK2 is closely associated with cancers such as lymphomas; high levels of SPINK2 transcription in patients with primary cutaneous follicular center cell lymphomas are associated with improved prognosis and reduced mortality.[ Recently, SPINK2 expression was shown to predict poor outcomes in pediatric AML patients.[ Steroid hormones are synthesized by steroidogenic cells in various tissues and influence many developmental and physiological processes. STAR mediates the rate-limiting step in steroid biosynthesis, and a great deal of evidence supports the crucial role played by STAR in the regulation of steroid biosynthesis. Appropriate regulation of steroid hormone expression is essential for appropriate biological functioning, aiding geriatric populations to live longer and healthier.[ However, whether such genes play roles in AML pathogenesis requires further investigation.
Conclusion
We performed a comprehensive bioinformatics analysis of hub genes that may be involved in AML development. The genes and signaling pathways identified may be of clinical utility. However, further in vitro and in vivo molecular biological experiments are required to confirm the functions of the identified genes.
Acknowledgments
The authors would like to thank other colleagues of the department of hematology for supporting this project.
Author contributions
ZYM designed the study. TYP, ZLL, and DYY downloaded and analyzed the data. TYP and ZYM interpreted the data and drafted the manuscript. LZ, LS, and ZQ consulted with others and helped to revise the manuscript.
Authors: H A Drabkin; C Parsy; K Ferguson; F Guilhot; L Lacotte; L Roy; C Zeng; A Baron; S P Hunger; M Varella-Garcia; R Gemmill; F Brizard; A Brizard; J Roche Journal: Leukemia Date: 2002-02 Impact factor: 11.528
Authors: Daniel Ajona; Carolina Zandueta; Leticia Corrales; Haritz Moreno; María J Pajares; Sergio Ortiz-Espinosa; Elena Martínez-Terroba; Naiara Perurena; Fernando J de Miguel; Eloisa Jantus-Lewintre; Carlos Camps; Silvestre Vicent; Jackeline Agorreta; Luis M Montuenga; Ruben Pio; Fernando Lecanda Journal: Am J Respir Crit Care Med Date: 2018-05-01 Impact factor: 21.405
Authors: Stephan Kolodziej; Olga N Kuvardina; Thomas Oellerich; Julia Herglotz; Ingo Backert; Nicole Kohrs; Estel la Buscató; Sandra K Wittmann; Gabriela Salinas-Riester; Halvard Bonig; Michael Karas; Hubert Serve; Ewgenij Proschak; Jörn Lausen Journal: Nat Commun Date: 2014-05-29 Impact factor: 14.919