Integrated bioinformatic analysis of microarray data reveals shared gene signature between MDS and AML.

Zhen Zhang1, Lin Zhao1, Xijin Wei2, Qiang Guo1, Xiaoxiao Zhu1, Ran Wei1, Xunqiang Yin1,3, Yunhong Zhang1,3, Bin Wang2, Xia Li1.   

Abstract

Myeloid disorders, especially myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML), cause significant morbidity and high mortality worldwide. Despite numerous attempts, the common molecular events underlying the development of MDS and AML remain to be established. In the present study, 18 microarray datasets were selected, and a meta-analysis was conducted to identify gene signatures and biological processes shared between MDS and AML. Using NetworkAnalyst, 191 upregulated and 139 downregulated genes were identified in MDS and AML; PTH2R, TEC, and GPX1 were the most upregulated genes, while MME, RAG1, and CD79B were the most downregulated. Comprehensive functional enrichment analyses revealed that oncogenic signaling-related pathways, such as fibroblast growth factor receptor (FGFR) signaling, were the most upregulated biological processes, whereas immune response-related events, such as the interleukin-6/interferon signaling pathway and the B cell receptor signaling pathway, were the most downregulated. Network-based meta-analysis identified HSP90AA1 and CUL1 as the most important hub genes. Overall, this study clarifies the link between MDS and AML in terms of potential pathways and genetic markers, sheds light on the molecular mechanisms underlying the development and transition of MDS and AML, and may facilitate the identification of novel diagnostic, therapeutic and prognostic biomarkers.

Keywords:  acute myeloid leukemia; gene expression profile; meta analysis; microarray; myelodysplastic syndrome

Year:  2018        PMID: 30214614      PMCID: PMC6126153          DOI: 10.3892/ol.2018.9237

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Hematopoietic stem cells (HSCs) reside in bone marrow (BM), and the BM microenvironment provides essential extrinsic signals to maintain the repopulation and differentiation of HSCs (1). In addition, lineage-specific transcription factors provide fundamental intrinsic control of HSCs (2). When normal differentiation is hampered, immature HSCs accumulate and malignant neoplastic proliferation occurs. Myeloid disorders, especially myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML), are the most frequently reported malignancies of this type and cause high mortality in adults (3). MDS comprises clonal HSC malignancies characterized by ineffective hematopoiesis and dysplasia, with clinical manifestations of peripheral cytopenias, hypercellular BM, and variable degrees of increased blasts (4). AML normally originates from a small number of leukemic stem cells (LSCs) in BM; their self-renewal and differentiation generate leukemic progenitors, which produce considerable amounts of immature clonogenic leukemic blasts and interfere with normal hematopoiesis (5). Heterogeneous subsets of MDS and AML patients have been classified following World Health Organization (WHO) or French-American-British criteria (6), mainly based on subjective clinical findings (number of cytopenias and percentage of marrow blasts) and biological properties (specific cytogenetic and molecular lesions) (4). In addition, approximately 30% of patients with MDS progress to AML (7). Thus, a number of approaches have been developed to compare MDS and AML in their cytogenetic and molecular aspects, and some hallmark genes or phenotypes have been established (8). However, the molecular mechanisms and biological events underlying MDS and AML development and their transition remain to be addressed.
Gene expression profile analysis is a powerful research strategy that integrates data from genetics, molecular transcription, and functional genomics to reveal genes dysregulated between patients and healthy donors. Microarray studies have provided an increasing body of genome-wide transcriptional data regarding MDS and AML. However, results vary between studies due to differences in cohort selection, specimen source, and experimental design. Meta-analysis is therefore advantageous, as combining different publicly available datasets enhances the statistical power to detect dysregulated genes and biological pathways. Microarray data integration-based meta-analyses rely on efficient in silico tools, and advances in statistical theory and bioinformatics now make it possible to combine multiple microarray datasets efficiently regardless of differences in populations, experimental designs, and diseases (9). NetworkAnalyst is a powerful web-based tool that supports robust and reliable gene expression analysis through preliminary data processing, sample annotation, batch effect adjustment, dataset integration, and results visualization (10). To minimize the impact of differences in study design and platform usage among datasets, ‘Combining Effect Size (ES)’ analysis and random effects modeling were applied, which achieve more consistent and accurate results by taking into account both the direction and the magnitude of gene expression changes. In the present study, we selected 8 and 10 eligible microarray datasets for MDS and AML, respectively, from publicly available dataset repositories. To our knowledge, this is the first time that a common transcriptional signature of MDS and AML has been characterized in patients vs. healthy individuals by meta-analysis.
Our results may shed light on the mechanistic foundations of MDS progression into AML, and suggest novel targets for preventing the development of both MDS and AML (11).

Materials and methods

Search strategy

Microarray-based gene expression profile studies of MDS and AML were identified in the PubMed database (http://www.pubmed.com), the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/gds/) and the ArrayExpress database of the European Molecular Biology Laboratory-European Bioinformatics Institute (http://www.ebi.ac.uk/arrayexpress/). The following key words were used for MDS and AML, respectively: (myelodysplastic syndrome or dysmyelopoietic syndrome or hematopoetic myelodysplasia or MDS) and (microarray or gene expression profile or gene expression profiling); and (acute myeloid leukemia or acute myeloblastic leukemia or acute myelocytic leukemia or acute nonlymphoblastic leukemia or acute nonlymphocytic leukemia or acute myelogenous leukemia or AML or ANLL) and (microarray or gene expression profile or gene expression profiling).

Inclusion and exclusion criteria

Eligible studies and datasets had to meet the following inclusion criteria: i) human studies comparing patients with healthy controls; ii) analysis of gene expression profiling; iii) comparable experimental conditions with untreated samples; and iv) availability of complete raw and processed microarray data. Studies were excluded if they: i) were letters, abstracts, meta-analyses, review articles or case reports; ii) used cell lines in the experimental design; iii) used RT-PCR only for profiling; or iv) lacked healthy controls. All datasets and references that conformed to the above criteria were carefully screened. The latest search was performed on May 30, 2017.

Data extraction and processing

The full text and supplementary materials of the selected articles were screened and the following key items were extracted: GEO series accession number, type of disease, number of patients and healthy donors, specimen source and microarray platform (Table I). The series matrix files were downloaded from GEO for all studies, with the exception of GSE983, for which CEL files were obtained and processed in R to generate a preliminary series matrix file. All probe identifiers were converted to Entrez gene IDs according to the corresponding microarray platform. Before the integrative meta-analysis, each dataset was normalized by log2 transformation, followed by mean and quantile normalization in R. Expression data of healthy donors and of patients with MDS or AML were defined as class 1 and class 2, respectively, according to the guidelines of NetworkAnalyst (10).
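The log2 transformation and quantile normalization steps can be illustrated with a minimal Python sketch. This is not the R code used in the study, which is not given in the text; the function names, the pseudocount of 1 and the row-order tie-breaking are assumptions made for illustration only.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Return log2(x + pseudocount) for a genes x samples matrix.

    The pseudocount is an assumption to avoid log2(0); the source does not state one.
    """
    return [[math.log2(value + pseudocount) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    Ties are broken by row order, which is a simplification of what
    dedicated packages do.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_columns = [sorted(column) for column in columns]
    reference = [
        sum(column[k] for column in sorted_columns) / n_samples
        for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, column in enumerate(columns):
        # Rank the genes within this sample, then substitute the reference value.
        order = sorted(range(n_genes), key=lambda g: column[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized


# Toy 4-gene x 3-sample matrix (made-up values, not study data).
toy = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
toy_normalized = quantile_normalize(toy)
```

After normalization, the sorted values of every column are identical, which is the property that makes samples from the same platform comparable before they are merged.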
Table I.

Summary of individual studies included in the meta-analysis.

| Author, year | GEO accession no. | Disease | Sample source | Platform | (Refs.) |
|---|---|---|---|---|---|
| Del Rey et al, 2013 | GSE41130 | MDS | Bone marrow mononuclear cells | Affymetrix human genome U133 plus 2.0 array | (8) |
| Pellagatti et al, 2010 | GSE19429 | MDS | Bone marrow CD34+ cells | Affymetrix human genome U133 plus 2.0 array | (13) |
| Sternberg et al, 2005 | GSE2779 | MDS | Bone marrow CD34+ cells | Affymetrix human genome U133A array | (14) |
| Graubert et al, 2011 | GSE30195 | MDS | Bone marrow CD34+ cells | Affymetrix human genome U133 plus 2.0 array | (15) |
| Pellagatti et al, 2006 | GSE4619 | MDS | Bone marrow CD34+ cells | Affymetrix human genome U133 plus 2.0 array | (16) |
| Wang et al, 2013 | GSE51757 | MDS | Bone marrow | Agilent-028004 surePrint G3 human GE 8×60K microarray | Unpublished |
| Gerstung et al, 2015 | GSE58831 | MDS | Bone marrow CD34+ cells | Affymetrix human genome U133 plus 2.0 array | (17) |
| Xu et al, 2016 | GSE81173 | MDS | Bone marrow CD34+ cells | Affymetrix human gene expression array | Unpublished |
| Kikushige et al, 2010 | GSE24395 | AML | Bone marrow CD34+CD38- cells | Sentrix human-6 v2 expression beadchip | (18) |
| de Jonge et al, 2011 | GSE30029 | AML | Bone marrow CD34+ cells | Illumina human HT-12 V3.0 expression beadchip | (19) |
| Bacher et al, 2012 | GSE33223 | AML | Bone marrow CD34+ cells | Affymetrix human genome U133 plus 2.0 array | (20) |
| Stirewalt et al, 2012 | GSE37307 | AML | Bone marrow CD34+ and peripheral blood cells | Affymetrix human genome U133A array | Unpublished |
| Schneider et al, 2015 | GSE68172 | AML | Bone marrow | Affymetrix human genome U133 plus 2.0 array | (21) |
| Virtaneva et al, 2001 | GSE70284 | AML | Bone marrow | Affymetrix human full length HuGeneFL array | (22) |
| Zheng et al, 2016 | GSE79605 | AML | Bone marrow mononuclear cells | Agilent-014850 whole human genome microarray | Unpublished |
| von der Heide et al, 2016 | GSE84881 | AML | Bone marrow mesenchymal stromal cells | Affymetrix human genome U133 plus 2.0 array | (23) |
| Stirewalt et al, 2008 | GSE9476 | AML | Bone marrow CD34+ and peripheral blood cells | Affymetrix human genome U133A array | (24) |
| Stegmaier et al, 2004 | GSE983 | AML | Primary patient AML cells | Affymetrix human full length HuGeneFL array | (25) |

MDS, myelodysplastic syndrome; AML, acute myeloid leukemia; NA, not available.

Batch effect adjustment

NetworkAnalyst can integrate large numbers of datasets provided that batch effects are adjusted for. The processed and normalized datasets were uploaded and subjected to the well-established ComBat procedure to remove study-specific batch effects. ComBat uses an empirical Bayes method to adjust extreme expression ratios and stabilize gene variances by pooling information across genes, thereby removing batch-related interference without compromising the biological covariates (12). The sample clustering patterns with and without batch effect adjustment were visualized and compared by principal component analysis (PCA) to assess the efficiency of batch effect removal.

Meta-analysis

We conducted the meta-analysis using NetworkAnalyst, a web-based tool for integrative statistical analysis and visualization. All datasets were uploaded to the ‘multiple gene expression data’ input area of the web interface and analyzed in a streamlined manner, including conversion to Entrez IDs, annotation check, inspection by box plot and PCA plot, confirmation of normalization, individual DEG analysis and data summary. For the identification of differentially expressed genes (DEGs), a moderated t-test based on the Limma algorithm was used, and the cut-off for the adjusted P-value was set to 0.05, using the false discovery rate (FDR) based on the Benjamini-Hochberg procedure. Furthermore, all datasets were subjected to an integrity check to ensure that the merged data were suitable for ES combination, which generates more biologically consistent meta-analysis DEGs by incorporating both the magnitude and the direction of gene expression changes. Of the two popular approaches, the fixed effects model (FEM) and the random effects model (REM), the REM was used in the current study. The REM assumes that each study contains a random ES that incorporates unknown cross-study heterogeneity, as indicated by Cochran's Q test. The ‘Define custom signature’ tool of NetworkAnalyst was used to produce the heatmap visualization of the top 25 up- and downregulated genes.
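The idea behind combining effect sizes under a random effects model can be sketched for a single gene as follows. The text does not state which between-study variance estimator NetworkAnalyst uses; the DerSimonian-Laird estimator is assumed here because it is a common choice, and the function name and input values are hypothetical.

```python
import math


def random_effects_meta(effects, variances):
    """Combine per-study effect sizes for one gene with a random effects model.

    Uses the DerSimonian-Laird estimator of the between-study variance
    (an assumption; the source does not name the estimator).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    combined = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    z = combined / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return {"combined_es": combined, "se": se, "q": q, "tau2": tau2, "z": z, "p": p}


# Hypothetical effect sizes and variances for one gene in three studies.
result = random_effects_meta([0.9, 1.4, 0.6], [0.04, 0.09, 0.05])
```

When the studies agree, Q is small, the between-study variance is estimated as zero, and the result reduces to the fixed effects estimate; when they disagree, the extra variance term widens the standard error and evens out the study weights.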

PPI Network analysis

PPI network-based analysis was performed using NetworkAnalyst (10). In PPI networks, nodes stand for proteins, while edges represent known interactions between the linked proteins. Topology and subnetwork analyses were performed in this study to characterize the overall structural properties of the network and to highlight the parts of the network that showed significant changes. In short, the subnetwork analysis was executed in three steps: i) obtaining the list of DEGs by meta-analysis; ii) submitting the list to the IMEx interactome-based PPI network analysis; and iii) choosing the zero-order and minimum networks to avoid the ‘hairball effect’ while still presenting the overall structure. In addition, NetworkAnalyst provides two complementary measures to reveal the most important nodes, also called hub genes: degree and betweenness centrality. Degree centrality is the number of connections that a node has to other nodes, whereas betweenness centrality corresponds to the number of shortest paths passing through the node. From the parent network, the most significant modules of hub genes were extracted for both up- and downregulated DEGs using the ‘module explorer’ tool, which is based on the random walk-based Walktrap algorithm.
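The two centrality measures can be made concrete with a small, self-contained Python sketch. It is not NetworkAnalyst's implementation: betweenness is computed with Brandes' algorithm for an unweighted, undirected graph and left unnormalized, and the node labels are placeholders rather than real proteins.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Number of direct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    betweenness = {node: 0.0 for node in adjacency}
    for source in adjacency:
        visited_order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            visited_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        dependency = {node: 0.0 for node in adjacency}
        while visited_order:  # accumulate dependencies from the farthest nodes back
            w = visited_order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in betweenness.items()}


# Placeholder star-shaped network: one hub with four partners.
toy_network = build_adjacency([("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")])
toy_degree = degree_centrality(toy_network)
toy_betweenness = betweenness_centrality(toy_network)
```

In the toy star network every shortest path between two outer nodes runs through the central node, so it scores highest on both measures; in real interactomes the two rankings can differ, which is why they are treated as complementary.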

Functional gene set enrichment analysis of shared DEGs

To examine the functional implications of the shared DEGs in MDS and AML, we conducted an enriched pathway group analysis using ClueGO, a Cytoscape plug-in with substantially expanded annotated gene sets from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway databases. The following settings and parameters were used: enrichment (right-sided) hypergeometric distribution tests with a P-value ≤0.05; Benjamini-Hochberg adjustment for the terms and the groups; a Kappa-statistics score threshold of 0.4; and selection of leading group terms based on the highest significance. The gene ontology (GO) term enrichment analysis was conducted with topGO (Bioconductor version 2.14, R 3.1.1) and summarized by REVIGO. Enriched GO terms or pathways with an adjusted P-value <0.05 were considered significant.
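A right-sided hypergeometric enrichment test of the kind named above can be written in a few lines of Python. This is a generic sketch, not ClueGO's code; the function name, argument names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_right_tail(overlap, n_degs, pathway_size, background_size):
    """P(X >= overlap) when n_degs genes are drawn from a background of
    background_size genes, of which pathway_size belong to the pathway."""
    upper = min(n_degs, pathway_size)
    total = comb(background_size, n_degs)
    favorable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical example: 8 of 191 DEGs fall in a 60-gene pathway,
# with 20,000 annotated background genes (all numbers made up).
p_enrichment = hypergeom_right_tail(8, 191, 60, 20000)
```

The test is right-sided because only over-representation is of interest: the P-value is the probability of seeing at least the observed overlap by chance.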

Statistical analysis

The NetworkAnalyst web server was used to analyze the data and to perform the meta-analysis by REM-based combined ES statistics. DEGs were selected with an adjusted P-value <0.05, with the FDR controlled by the Benjamini-Hochberg procedure. A right-sided hypergeometric test with Benjamini-Hochberg FDR correction was used to identify significantly enriched biological pathways among the DEGs with the ClueGO plug-in of Cytoscape. Significantly enriched biological processes were identified using topGO and REVIGO analysis. P<0.05 was considered to indicate a statistically significant difference.
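For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal Python sketch is given below. The function name and the example P-values are illustrative assumptions; the study itself relied on the implementations built into NetworkAnalyst and ClueGO.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw P-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a gene with a smaller raw P-value never ends up with a larger adjusted one.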

Results

Selection of eligible microarray datasets

A total of 2,012 and 476 publications were retrieved from PubMed for AML and MDS, respectively, according to our search strategy. After screening against the inclusion and exclusion criteria, 8 and 10 microarray datasets were selected for MDS (GSE19429, GSE2779, GSE30195, GSE41130, GSE4619, GSE51757, GSE58831 and GSE81173) (8,13–17) and AML (GSE24395, GSE30029, GSE33223, GSE37307, GSE68172, GSE70284, GSE79605, GSE84881, GSE9476 and GSE983) (18–25), respectively, as described in Materials and methods (Fig. 1A). All 18 datasets were carefully screened to meet the inclusion and exclusion criteria for meta-analysis. Collectively, 106/281 and 75/465 (healthy control/patient) samples were included for MDS and AML, respectively. Eight different microarray platforms were used in the selected datasets: Affymetrix Human Genome U133 Plus 2.0 Array, Affymetrix Human Genome U133A Array, Affymetrix Human Full Length HuGeneFL Array, Affymetrix Human Gene Expression Array, Illumina HumanHT-12 v.3.0 expression beadchip, Sentrix Human-6 v2 Expression BeadChip, Agilent-014850 Whole Human Genome Microarray, and Agilent-028004 SurePrint G3 Human GE 8×60 K Microarray. In addition, various sample types, such as total BM cells, FACS-sorted CD34+ hematopoietic progenitors, and peripheral blood cells, were used for the microarray analysis. During data integration, MDS and AML patient samples were not distinguished, in order to uncover the common gene signature shared between MDS and AML. Table I presents the detailed information for each dataset, including GEO accession number, type of disease, sample composition, sample source, and the corresponding references.
Figure 1.

Flowcharts for microarray dataset selection and meta-analysis. (A) Selection process of microarray datasets for meta-analysis of the gene expression signature shared between MDS and AML. (B) Process of meta-analysis based data exploration. MDS, myelodysplastic syndrome; AML, acute myeloid leukemia.

Identification of common differentially expressed genes (DEGs) between MDS and AML

The meta-analysis workflow used in the present study is illustrated in Fig. 1B. To identify the transcriptional signatures shared between MDS and AML, all 18 datasets were analyzed simultaneously by NetworkAnalyst. When the 18 microarray datasets were analyzed individually, a total of 7579 DEGs were revealed. For the meta-analysis, we uploaded the 18 datasets in succession, and each dataset was processed by Entrez ID matching, sample (control/patient) annotation and individual DEG identification. To remove batch effects among the datasets, we performed ‘ComBat’-based batch effect adjustment (12), and the sample clustering patterns with and without batch effect adjustment were visualized by PCA plots (Fig. 2A and B). All 18 gene expression microarray datasets were then integrated and merged. The meta-analysis was conducted using Cochran's Q test, the REM and ES statistical methods, which helped to reveal the DEGs between healthy donors and patients across different microarray datasets by permitting the true ES to vary and incorporating unknown cross-study heterogeneity. Finally, we found a total of 330 DEGs, including 191 up- and 139 downregulated genes, across the 18 datasets at a significance threshold of adjusted P-value <0.05. Comparison of the individually identified DEGs with the meta-analysis DEGs showed that 211 DEGs were revealed by both analyses. Of note, 119 DEGs were uniquely discovered by meta-analysis and were referred to as gained genes, while 7378 DEGs were only found in the individual analyses and were defined as lost genes (Fig. 2C). The expression profile of the top 25 up- and downregulated genes among the 330 identified DEGs was visualized by heatmap (Fig. 2D). Due to the large number of healthy donor and patient samples, the heatmap was divided into 3 parts by GEO dataset.
The expression profiles of the top 25 up- and downregulated genes were visibly consistent across the different datasets, except for those with small sample numbers, including GSE33223, GSE79605, GSE41130 and GSE84881. Parathyroid hormone 2 receptor (PTH2R), Tec protein tyrosine kinase (TEC), and Glutathione peroxidase 1 (GPX1) were among the most significantly upregulated genes; PTH2R had the highest combined ES of 1.0994 and was consistently upregulated, with ES ranging from 0.47068 to 2.8508 across all 18 datasets (data not shown). Membrane metalloendopeptidase (MME), Recombination activating 1 (RAG1), and CD79b molecule (CD79B) were among the most significantly downregulated genes; MME had the largest negative combined ES of −1.29 and was consistently downregulated in 15 out of 18 datasets, with ES ranging from −3.7844 to −0.33961 (data not shown). The top 10 up- and downregulated DEGs are listed in Table II.
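The classification of DEGs into shared, gained and lost genes described above amounts to simple set operations, as the following Python sketch shows. The function name and the toy gene identifiers are hypothetical and do not come from the study data.

```python
def compare_deg_lists(individual_degs, meta_degs):
    """Split two DEG lists into shared, gained and lost genes.

    gained: found only by meta-analysis; lost: found only in individual analyses.
    """
    individual = set(individual_degs)
    meta = set(meta_degs)
    return {
        "shared": individual & meta,
        "gained": meta - individual,
        "lost": individual - meta,
    }


# Toy identifiers (made up): four individual DEGs, three meta-analysis DEGs.
toy_result = compare_deg_lists([1, 2, 3, 4], [3, 4, 5])
```

By construction, the shared and gained sets together make up the full meta-analysis list, mirroring how the 211 shared and 119 gained genes add up to the 330 meta-analysis DEGs.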
Figure 2.

Meta-analysis based DEGs and gene expression profiles. (A) PCA-3D plot for sample clustering of microarray datasets without batch effect adjustment. (B) PCA-3D plot for sample clustering of microarray datasets with batch effect adjustment. (C) Venn diagram of DEGs identified by meta-analysis (meta DEGs) and by individual microarray dataset analysis (individual DEGs). (D) Heatmap visualization of expression profiles for the top 25 up- and downregulated DEGs identified by meta-analysis. Genes were ranked by combined ES value. DEGs, differentially expressed genes; Var1, variate 1, represents different datasets by colors; Var2, variate 2, represents control and patient samples by colors.

Table II.

Top 20 DEGs shared by MDS and AML.

A, Top 10 upregulated genes

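| Entrez ID | Gene symbol | Combined ES | Adjusted P-value |
|---|---|---|---|
| 5746 | PTH2R | 1.0994 | <0.001 |
| 7006 | TEC | 0.8975 | 1.3×10^-6 |
| 2876 | GPX1 | 0.8564 | 3.2×10^-3 |
| 445 | ASS1 | 0.8543 | 1.2×10^-6 |
| 59 | ACTA2 | 0.7831 | 7.8×10^-4 |
| 6565 | SLC15A2 | 0.7372 | 2.2×10^-3 |
| 9124 | PDLIM1 | 0.7348 | 1.4×10^-4 |
| 3297 | HSF1 | 0.7343 | 8.0×10^-4 |
| 7490 | WT1 | 0.7320 | 3.3×10^-3 |
| 5476 | CTSA | 0.7271 | 1.3×10^-4 |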

B, Top 10 downregulated genes

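| Entrez ID | Gene symbol | Combined ES | Adjusted P-value |
|---|---|---|---|
| 4311 | MME | −1.2867 | 2.9×10^-6 |
| 5896 | RAG1 | −1.1663 | 2.2×10^-3 |
| 974 | CD79B | −1.1597 | 2.0×10^-5 |
| 9590 | AKAP12 | −1.1201 | 1.9×10^-3 |
| 4318 | MMP9 | −1.1189 | 2.9×10^-2 |
| 4050 | LTB | −1.0662 | 1.9×10^-6 |
| 2308 | FOXO1 | −1.0603 | 6.1×10^-12 |
| 7441 | VPREB1 | −1.0197 | 1.7×10^-2 |
| 753 | C18orf1 | −1.0169 | 1.5×10^-3 |
| 216 | ALDH1A1 | −0.9986 | 3.7×10^-4 |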

DEGs, differentially expressed genes; ES, effect size.

Hub genes identification by network based meta-analysis

Many studies have shown that ‘hub’ proteins are more likely to be encoded by pleiotropic genes or genes that are related to certain diseases (26). Hub nodes are potentially key molecules in signaling, as they are highly interconnected with dysregulated genes and often receive and integrate multiple signals and pass them on to downstream nodes (27). To identify the key hub genes among the common DEGs shared by MDS and AML, we performed a network-based meta-analysis. NetworkAnalyst provides a protein-protein interaction (PPI) network analysis tool that integrates the IMEx (International Molecular Exchange Consortium) interactome with the original seed list of 330 DEGs. The NetworkAnalyst-based PPI network analysis generated first-order, minimum-order and zero-order networks with 4507 nodes and 9421 edges, 873 nodes and 3510 edges, and 123 nodes and 157 edges, respectively. The minimum-order network shows the overall structure (Fig. 3A), while the zero-order network helps to visualize the detailed interactions between different seeds (Fig. 3B). HSP90AA1 (combined ES: 0.36197, adjusted P-value: 0.0026418) and CUL1 (combined ES: −0.54806, adjusted P-value: 0.002118) were the top ranked hub genes by degree and betweenness centrality analysis among the up- and downregulated DEGs, respectively. The top 10 ranked hub genes are listed with detailed information in Table III. Additionally, we used the ‘module explorer’ tool to highlight the two most significant modules as two sub-networks centered on HSP90AA1 (19 nodes and 23 edges) and CUL1 (20 nodes and 25 edges), respectively (Fig. 3C and D).
Figure 3.

PPI network based hub gene analysis. (A) Minimum-order PPI network structure of DEGs identified by meta-analysis, drawn with the Fruchterman-Reingold layout. Red nodes represent upregulated and green nodes represent downregulated DEGs. (B) Zero-order PPI network of shared DEGs identified by meta-analysis. (C and D) PPI subnetworks representative of up- and downregulated DEGs. The modules were extracted with the Walktrap algorithm-based ‘module explorer’ of NetworkAnalyst. PPI, protein-protein interaction; DEGs, differentially expressed genes.

Table III.

Top 10 shared hub genes identified by network based meta-analysis.

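| Gene symbol | Regulation | Degree | Betweenness | Combined ES |
|---|---|---|---|---|
| HSP90AA1 | Up | 722 | 2416884.66 | 0.3620 |
| CUL1 | Down | 603 | 1592425.89 | −0.5481 |
| CUL5 | Up | 366 | 700674.50 | 0.3516 |
| IL7R | Down | 199 | 349830.53 | −0.7247 |
| MAP3K3 | Down | 176 | 304739.58 | −0.4870 |
| XRCC5 | Down | 169 | 372212.88 | −0.4065 |
| CDKN2A | Up | 148 | 316047.11 | 0.5783 |
| RPL11 | Up | 134 | 157024.97 | 0.4898 |
| SP3 | Down | 132 | 376542.88 | −0.5418 |
| TLE1 | Down | 108 | 302176.33 | −0.6311 |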

ES, effect size.

Gene set enrichment analysis for identification of overrepresented biological pathways

To understand the pathways involved in MDS and AML development, the enriched biological pathways for the up- and downregulated DEGs were functionally grouped at a threshold of P-value <0.05 with the ClueGO plug-in of Cytoscape v.3.4.0, using only the KEGG and Reactome pathway databases. For the upregulated DEGs, we found 88 enriched biological pathways, which can be divided into 16 groups, mainly involved in ‘Signaling by fibroblast growth factor receptor (FGFR) in disease’, ‘EPH-ephrin mediated repulsion of cells’, ‘Histidine, lysine, phenylalanine, tyrosine, proline and tryptophan catabolism’, ‘Positive epigenetic regulation of rRNA expression’, and ‘Neurophilin interactions with VEGF and VEGFR’ (Fig. 4A). For the downregulated DEGs, we found 37 enriched biological pathways, which can be divided into 17 groups, mainly involved in ‘Interleukin-6 signaling’, ‘Integration of provirus’, ‘Graft-versus-host disease’, ‘B cell receptor signaling pathway’, and ‘Effects of PIP2 hydrolysis’ (Fig. 4B).
Figure 4.

Overrepresentation of enriched pathways for DEGs. (A) Enriched pathway groups were generated with the Cytoscape plug-in ClueGO by integrating the upregulated genes with KEGG and Reactome pathways. (B) Enriched pathway groups were generated by integrating the downregulated genes with KEGG and Reactome pathways. A larger node size indicates greater significance of enrichment, and the colors represent different groups. The pathways with adjusted P-value <0.05 are shown in the network. DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Gene set enrichment analysis for identification of overrepresented GO terms

Gene set enrichment analyses were conducted to identify overrepresented gene ontology (GO) terms, using the topGO package of R for the 191 up- and 139 downregulated DEGs at a threshold of P-value <0.05. REVIGO is a web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures (28). Using REVIGO, the top 200 enriched GO terms for the upregulated DEGs were grouped into 20 subsets, including ‘positive regulation of tyrosine phosphorylation of Stat4 protein’ (comprising 58 terms), ‘adrenal cortex formation’ (comprising 33 terms), ‘establishment or maintenance of transmembrane electrochemical gradient’ (comprising 20 terms), and ‘heart contraction’ (comprising 10 terms), which were visualized as a treemap generated in R (Fig. 5A). Similarly, 13 subsets were obtained for the downregulated DEGs, including ‘B cell receptor signaling pathway’ (comprising 75 terms), ‘embryo implantation’ (comprising 24 terms), ‘fructose metabolism’ (comprising 23 terms), and ‘response to lipopolysaccharide’ (comprising 18 terms), which were also visualized as a treemap (Fig. 5B).
Figure 5.

Overrepresentation of enriched biological processes for DEGs. (A) REVIGO gene ontology treemap for upregulated DEGs identified by meta-analysis. (B) REVIGO gene ontology treemap for downregulated DEGs identified by meta-analysis. DEGs, differentially expressed genes.

Discussion

Myeloid disorders, including MDS and AML, represent a group of hematopoietic malignancies with monoclonal expansion of immature myeloid lineages. Both MDS and AML are characterized by abnormal accumulation of defective or immature blasts in the BM, and MDS patients with 10–19% blasts are considered to be at high risk of progressing to AML (>20% blasts) (29). The transformed immature cells in both diseases are biologically, genetically, and molecularly similar; thus, identification of common molecular markers may help to indicate the appropriate risk factors and to guide preventive, diagnostic and therapeutic decisions for these patients. Although many studies have used microarray-based technology to identify molecular markers in MDS and AML, inconsistent results have been reported due to differences in patient selection, tissue source and study design. Therefore, in the present study, we attempted to identify the common gene signature underlying MDS and AML by a comprehensive meta-analysis of 18 publicly available microarray datasets. We found a total of 330 DEGs (adjusted P-value <0.05) shared by MDS and AML, of which 191 were upregulated and 139 were downregulated. Importantly, 119 of the 330 DEGs were identified only by meta-analysis and not by the analyses of individual studies. By uncovering shared gene expression profiles, this study highlights potential diagnostic and prognostic biomarkers in MDS and AML, and may aid in understanding the molecular mechanisms of their development and progression. Among the top ten upregulated DEGs, PTH2R encodes a receptor for parathyroid hormone (PTH) that belongs to G-protein coupled receptor family 2 and has previously been suggested as a novel marker for AML (30). TEC, a non-receptor protein-tyrosine kinase, was found to be highly expressed in MDS patients (31).
GPX1, which encodes a glutathione peroxidase that helps to reduce organic hydroperoxides and hydrogen peroxide (H2O2) using glutathione, has been reported to be dramatically upregulated in AML and to be related to MDS by SNP assays and DNA methylation analysis (32,33). The ASS1 (Argininosuccinate synthetase 1) gene product is responsible for arginine biosynthesis; however, Miraki-Moud et al (34) reported that most AML lacked ASS1 expression, a discrepancy that may be due to small sample size and different detection methods. Other genes, such as HSF1 (Heat-shock transcription factor 1), WT1 (Wilms tumor 1), ACTA2 (Actin, alpha 2), SLC15A2 (Solute carrier family 15 member 2), PDLIM1 (C terminal LIM domain protein 1), and CTSA (Cathepsin A), were also revealed by our meta-analysis. Although some of them have previously been reported to show increased expression in AML and MDS (35,36), their potential as diagnostic or prognostic markers in MDS and AML needs further exploration. Among the top ten downregulated DEGs, MME encodes membrane metalloendopeptidase (also known as CD10), which is believed to be related to the diagnosis of acute lymphocytic leukemia (ALL) (37), but its role in MDS and AML remains to be demonstrated. AKAP12 encodes an A-kinase anchoring protein, which associates with the protein kinase A (PKA) holoenzyme and serves as a scaffold protein for signal transduction. Low expression of AKAP12 has been reported in both MDS and AML (38), which is consistent with our results. MMP9 encodes a matrix metalloproteinase and has been shown to be expressed at reduced levels in AML. LTB (Lymphotoxin beta) expression has been reported to be decreased in malignant myeloid cells and is potentially involved in AML (39). FOXO1 (Forkhead box O1) is a tumor suppressor gene, and its low expression was shown to correlate with AML carrying the FLT3 internal tandem duplication mutation (FLT3-ITD) (40). ALDH1A1 (Aldehyde dehydrogenase 1 family member A1) was also found to be minimally expressed or undetectable in about 25% of AML patients (41).
In addition, our meta-analysis revealed a number of novel genes, such as RAG1 (Recombination activating 1), CD79B, VPREB1 (V-set pre-B cell surrogate light chain 1) and C18orf1 (Low density lipoprotein receptor class A domain containing 4), which have not previously been reported in MDS and AML. These candidate genes require further study to evaluate their biological functions and biomarker potential in both MDS and AML. Network biology analysis is an efficient way to systematically investigate the molecular complexity of a particular disease and facilitates the discovery of biomarkers and drug targets (42). By integrating the list of meta-analysis DEGs with the IMEx interactome-based PPI network, we identified HSP90AA1 and CUL1 as the most important hub genes among the up- and downregulated DEGs, respectively, based on network centrality scoring across the 18 datasets. HSP90AA1, also known as LAP2/HSPC1, encodes a protein chaperone that functions as a homodimer in protein folding and stabilization; by regulating a number of cancer-related proteins such as AKT, CDK4, HIF-1, VEGFR, ERBB2 and MMPs, HSP90AA1 strongly influences tumor proliferation, survival, invasion, metastasis and angiogenesis (43). Consistent with our meta-analysis, increased expression of HSP90AA1 has been well documented in both MDS and AML (44,45). The protein level of HSP90AA1 was higher in patients with higher-grade MDS, which is associated with shorter survival and increased risk of progression to AML (44). Similarly, AML patients with higher HSP90AA1 levels showed lower remission rates and poorer prognosis (46). The network-based hub analysis also revealed that HSP90AA1 served as the central upregulated protein, which may coordinate a cohort of signaling pathways mediated by LTK, MAP3K3, MAP2K, etc., and promote MDS and AML formation.
Our results validate and support the importance of targeting HSP90AA1 for MDS and AML therapy in the clinic (47,48).
The downregulated hub gene, CUL1, encodes an essential component of the SCF (SKP1-CUL1-F-box protein) E3 ubiquitin ligase complex, in which it serves as the scaffold protein that organizes the SKP1-F-box protein and RBX1 subunits (49). The complex mediates the ubiquitination of proteins involved in cell cycle progression, signal transduction and transcription (50). Thus, loss of CUL1 expression may impair cascades involved in the cell cycle, signaling and protein expression, which may lead to the development of MDS and AML.
To decipher the enriched biological pathways participating in MDS and AML, we submitted the up- and downregulated DEGs separately to the ClueGO plugin of Cytoscape to group functionally related pathways. Interestingly, we found that the ‘Signaling by FGFR in disease’ pathway group was the most enriched. The FGFR family is composed of four receptor kinase members, FGFR1-4, which are ubiquitously expressed, and activation of FGFR signaling has been demonstrated to promote the survival and migration of AML cells and their resistance to chemotherapy. Among the DEGs of our meta-analysis, FGF7 and FGFR3 were both upregulated. FGF7 is a high-affinity ligand of FGFR2b and has recently been reported to serve as a potential niche factor that supports hematopoietic stem and progenitor cells (HSPCs) and leukemic growth by activating the FGFR2b signaling pathway (51). Moreover, palifermin, a truncated recombinant human FGF7 that competes for binding to FGFR2b, has been FDA-approved for the treatment of patients with oral mucositis (52). Based on our meta-analysis, it would be meaningful to explore the potential application of palifermin in MDS and AML intervention or even therapy.
FGFR3 belongs to a family of receptor tyrosine kinases (RTKs) that respond to FGF and stimulate downstream signaling modules, including the phosphatidylinositol 3-kinase (PI3K)/AKT and phospholipase C-γ (PLC-γ) pathways (53,54). In a murine model of leukemia, FGFR3 activity was shown to be important for hematopoietic transformation (55).
For the downregulated genes, ‘Interleukin-6 signaling’ related pathways were the most enriched and were well represented by the downregulation of the immune-related genes IL12RB2, IRF4 and IRF8. IL12RB2 encodes a type I transmembrane protein identified as a subunit of the interleukin 12 receptor, which interacts with IL12RB1 to form the high-affinity binding site for IL12 and reconstitute IL12-dependent signaling. Although IL12RB2 downregulation has not yet been reported in MDS or AML, its silencing in malignant B cell tumors showed that neoplastic B cells escape IL-12-mediated apoptosis and growth inhibition in the absence of IL12RB2 (56). From this point of view, the downregulation of IL12RB2 in myeloid cells might help them evade the anti-tumor activity of IL-12. In addition, the IRF4 and IRF8 gene products belong to the IRF (interferon regulatory factor) family of transcription factors. It has been reported that Irf4−/− Irf8−/− double-knockout mice develop aggressive myeloid disorders resembling CML (57). Furthermore, induction of IRF4 by the long non-coding RNA linc-223 inhibits cell proliferation and stimulates AML cell differentiation (58). The promoter of IRF8 was found to be hypermethylated in MDS and AML patients (59), which might be associated with the transcriptional repression revealed by our meta-analysis. In addition, the IRF8 expression level was a potential prognostic biomarker for adult patients with AML (60).
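Pathway enrichment of the kind discussed above is commonly assessed with an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test; it is not a reproduction of the ClueGO settings used in the study, which are not given here. The function name is hypothetical, and apart from the 191 upregulated genes, the numbers in the usage example are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, deg_count, overlap):
    """One-sided over-representation p-value, P(X >= overlap).

    X counts pathway genes in a random draw of `deg_count` genes from a
    universe of `universe_size` genes, of which `pathway_size` belong to
    the pathway (hypergeometric distribution).
    """
    max_overlap = min(pathway_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 191 upregulated DEGs, an assumed universe of
    # 20000 genes, an assumed pathway of 150 genes, 8 of them among the DEGs.
    p = hypergeom_enrichment_p(20000, 150, 191, 8)
    print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested at once, the resulting p-values would normally be corrected for multiple testing before any pathway is called enriched.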
Similar results were obtained by REVIGO analysis: the ‘FGFR signaling pathway’ was highly enriched, whereas a series of immune-related pathways were downregulated, including ‘B cell receptor signaling pathway’, ‘positive regulation of interferon-gamma production’, ‘adaptive immune response’, etc. This further highlights the significance of downregulated immune responses in MDS and AML development.
Our meta-analysis is based on gene expression microarrays, which have become a principal technology for transcriptome analysis in support of drug screening and health evaluation. With the development of genetic detection technology, next-generation sequencing offers new ways to analyze genetic mutations, especially through whole genome sequencing. Currently, there is no comparable research focusing on the common gene expression profiles of MDS and AML by transcriptome or genomic sequencing; therefore, we compared the DEGs of MDS or AML uncovered by our meta-analysis with the results of whole genome sequencing for each disease individually. We found that 4 (ZFHX2, PTPRD, STAG2 and ALAS2) out of 105 somatically mutated genes for MDS, and 2 (FLT3 and WT1) out of 23 significantly mutated genes for AML, were covered by the DEGs of our meta-analysis (data not shown) (61,62). This finding indicates that the potential diagnostic or prognostic biomarkers obtained by our meta-analysis are more likely to undergo transcriptional regulation than genetic mutation.
In summary, our meta-analysis revealed 330 DEGs, some of which have been proven critical for MDS or AML progression, while others deserve further exploration for their potential as biomarkers for both MDS and AML.
Functional enrichment analysis demonstrated that tumor-related processes or pathways, such as ‘Signaling by FGFR in disease’, were upregulated, whereas immune response-related pathways, such as ‘Interleukin-6 signaling’, were downregulated in both MDS and AML, which may point to common events during MDS or AML development.
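The comparison between DEGs and sequencing-derived mutated gene lists described above amounts to a set intersection. A minimal sketch follows, assuming gene symbols are matched case-insensitively. The function name is hypothetical, and the lists in the usage example are abbreviated stand-ins with placeholder entries, not the full 330-gene and 23-gene lists.

```python
def overlap_report(deg_genes, mutated_genes):
    """Return (sorted shared symbols, fraction of mutated genes that are DEGs)."""
    deg = {g.upper() for g in deg_genes}
    mutated = {g.upper() for g in mutated_genes}
    shared = sorted(deg & mutated)
    fraction = len(shared) / len(mutated) if mutated else 0.0
    return shared, fraction


if __name__ == "__main__":
    # Abbreviated, illustrative lists; the placeholder symbols stand in for
    # the remaining entries of the real lists, which are not reproduced here.
    degs = ["FLT3", "WT1", "HSP90AA1", "GPX1"]
    aml_mutated = ["FLT3", "WT1", "MUT_PLACEHOLDER_1", "MUT_PLACEHOLDER_2"]
    shared, fraction = overlap_report(degs, aml_mutated)
    print(shared, fraction)
```

With the full lists, the counts reported in the text correspond to fractions of 4/105 for MDS and 2/23 for AML.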
  60 in total

1.  High expression of heat shock protein 90 alpha and its significance in human acute leukemia cells.

Authors:  Wen-Liang Tian; Fei He; Xue Fu; Jun-Tang Lin; Ping Tang; Yu-Min Huang; Rong Guo; Ling Sun
Journal:  Gene       Date:  2014-03-25       Impact factor: 3.688

2.  Gene expression profiling in the leukemic stem cell-enriched CD34+ fraction identifies target genes that predict prognosis in normal karyotype AML.

Authors:  H J M de Jonge; C M Woolthuis; A Z Vos; A Mulder; E van den Berg; P M Kluin; K van der Weide; E S J M de Bont; G Huls; E Vellenga; J J Schuringa
Journal:  Leukemia       Date:  2011-07-15       Impact factor: 11.528

3.  Azacitidine has limited activity in 'real life' patients with MDS and AML: a single centre experience.

Authors:  Murat Ozbalak; Mustafa Cetiner; Huseyin Bekoz; Elif Birtas Atesoglu; Cem Ar; Ayse Salihoglu; Nukhet Tuzuner; Burhan Ferhanoglu
Journal:  Hematol Oncol       Date:  2011-03-08       Impact factor: 5.271

4.  Structure of the Cul1-Rbx1-Skp1-F boxSkp2 SCF ubiquitin ligase complex.

Authors:  Ning Zheng; Brenda A Schulman; Langzhou Song; Julie J Miller; Philip D Jeffrey; Ping Wang; Claire Chu; Deanna M Koepp; Stephen J Elledge; Michele Pagano; Ronald C Conaway; Joan W Conaway; J Wade Harper; Nikola P Pavletich
Journal:  Nature       Date:  2002-04-18       Impact factor: 49.962

Review 5.  The World Health Organization (WHO) classification of the myeloid neoplasms.

Authors:  James W Vardiman; Nancy Lee Harris; Richard D Brunning
Journal:  Blood       Date:  2002-10-01       Impact factor: 22.113

6.  Evidence for reduced B-cell progenitors in early (low-risk) myelodysplastic syndrome.

Authors:  Alexander Sternberg; Sally Killick; Tim Littlewood; Chris Hatton; Andy Peniket; Thomas Seidl; Shamit Soneji; Joanne Leach; David Bowen; Claire Chapman; Graham Standen; Edwin Massey; Lisa Robinson; Bipin Vadher; Richard Kaczmarski; Riaz Janmohammed; Kim Clipsham; Andrew Carr; Paresh Vyas
Journal:  Blood       Date:  2005-08-02       Impact factor: 22.113

7.  Low expression of the putative tumour suppressor gene gravin in chronic myeloid leukaemia, myelodysplastic syndromes and acute myeloid leukaemia.

Authors:  Jacqueline Boultwood; Andrea Pellagatti; Fiona Watkins; Lisa J Campbell; Noor Esoof; Nicholas C P Cross; Helen Eagleton; Tim J Littlewood; Carrie Fidler; James S Wainscoat
Journal:  Br J Haematol       Date:  2004-08       Impact factor: 6.998

8.  Constitutive activation of fibroblast growth factor receptor 3 by the transmembrane domain point mutation found in achondroplasia.

Authors:  M K Webster; D J Donoghue
Journal:  EMBO J       Date:  1996-02-01       Impact factor: 11.598

9.  Expression of the Wilms' tumor gene (WT1) in human leukemias.

Authors:  H Miwa; M Beran; G F Saunders
Journal:  Leukemia       Date:  1992-05       Impact factor: 11.528

Review 10.  Data integration in the era of omics: current and future challenges.

Authors:  David Gomez-Cabrero; Imad Abugessaisa; Dieter Maier; Andrew Teschendorff; Matthias Merkenschlager; Andreas Gisel; Esteban Ballestar; Erik Bongcam-Rudloff; Ana Conesa; Jesper Tegnér
Journal:  BMC Syst Biol       Date:  2014-03-13
  3 in total

1.  Deciphering molecular heterogeneity in pediatric AML using a cancer vs. normal transcriptomic approach.

Authors:  Barbara Depreter; Barbara De Moerloose; Karl Vandepoele; Anne Uyttebroeck; An Van Damme; Eva Terras; Barbara Denys; Laurence Dedeken; Marie-Françoise Dresse; Jutte Van der Werff Ten Bosch; Mattias Hofmans; Jan Philippé; Tim Lammens
Journal:  Pediatr Res       Date:  2020-10-17       Impact factor: 3.756

2.  IGF‑IR promotes clonal cell proliferation in myelodysplastic syndromes via inhibition of the MAPK pathway.

Authors:  Qi He; Qingqing Zheng; Feng Xu; Wenhui Shi; Juan Guo; Zheng Zhang; Sida Zhao; Xiao Li; Chunkang Chang
Journal:  Oncol Rep       Date:  2020-06-19       Impact factor: 3.906

3.  Circulating Small Noncoding RNAs Have Specific Expression Patterns in Plasma and Extracellular Vesicles in Myelodysplastic Syndromes and Are Predictive of Patient Outcome.

Authors:  Andrea Hrustincova; Zdenek Krejcik; David Kundrat; Katarina Szikszai; Monika Belickova; Pavla Pecherkova; Jiri Klema; Jitka Vesela; Monika Hruba; Jaroslav Cermak; Tereza Hrdinova; Matyas Krijt; Jan Valka; Anna Jonasova; Michaela Dostalova Merkerova
Journal:  Cells       Date:  2020-03-26       Impact factor: 6.600
