Screening diagnostic markers for acute myeloid leukemia based on bioinformatics analysis.

Wenting Chen1, Dan Liu1, Guyun Wang1, Yanping Pan1, Shuwen Wang1, Ruimei Tang1.   

Abstract

Background: An in-depth understanding of the key molecules and associated mechanisms involved in acute myeloid leukemia (AML) carcinogenesis, proliferation, and relapse is critical. Such understanding provides a basis for disease screening, early diagnosis, development of effective treatment strategies, and prognostic assessment.
Methods: We downloaded AML transcription data sets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Differentially expressed genes (DEGs) were screened using R software and the limma package. Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed on the DEGs using public databases. Within the DEG set, a random forest algorithm was used to identify characteristic genes of AML. The receiver operator characteristic (ROC) curve was used to evaluate the diagnostic efficacy of the selected characteristic genes, which provided clues for the discovery of early diagnostic markers. The Estimate score was calculated using the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm. Spearman's correlation test was used to explore the correlation between the characteristic genes and the Estimate score, which provided clues for clarifying the potential pathogenic mechanism of the key genes.
Results: A total of 1,494 DEGs were identified between AML samples and normal samples, of which 1,181 genes were upregulated and 313 genes were downregulated in AML. Two genes had a mean decrease Gini >2: CDC20 and ESM1. The ROC curve showed that the area under the curve (AUC) of CDC20 was 0.966 (95% confidence interval [CI]: 0.939 to 0.987; P<0.001). The AUC of ESM1 was 0.905 (95% CI: 0.849 to 0.953; P<0.001). Correlation analysis showed that CDC20 expression was negatively correlated with the Estimate score (R=-0.21, P=0.0036) in AML. The expression of ESM1 was also negatively correlated with the Estimate score (R=-0.57, P<0.001).
Conclusions: The genes CDC20 and ESM1 were identified as AML characteristic genes by the random forest algorithm. Both CDC20 and ESM1 have good diagnostic efficacy for AML and are potential biological markers. They may play a carcinogenic role by promoting tumor cell proliferation and inhibiting immune cell chemotaxis.
2022 Translational Cancer Research. All rights reserved.

Keywords:  Acute myeloid leukemia (AML); biological marker; random forest algorithm; receiver operator characteristic curve

Year:  2022        PMID: 35836534      PMCID: PMC9273715          DOI: 10.21037/tcr-22-1257

Source DB:  PubMed          Journal:  Transl Cancer Res        ISSN: 2218-676X            Impact factor:   0.496


Introduction

Acute myeloid leukemia (AML) is a common acute leukemia that occurs in all age groups (1). It is characterized by the accumulation of acquired genetic changes in hematopoietic progenitor cells, which alter the mechanisms of self-renewal, proliferation, and differentiation (2). In the diagnosis of AML, there is a lack of markers with both sensitivity and specificity (3). To date, most patients have been diagnosed in the middle and late stages of AML (3). The treatment methods for AML are limited and prone to drug resistance (4). Even after treatment, the recurrence rate of AML patients remains high (5), and the survival rate is low: the five-year survival rate of AML patients is less than 43% (6).

It is crucial to understand the key molecules and mechanisms involved in the carcinogenesis, proliferation, and recurrence of AML to provide a basis for disease screening, early diagnosis, effective treatment strategies, and prognosis judgment. Some previous studies have identified genes associated with AML prognosis, such as nucleophosmin-1 (NPM1), CCAAT enhancer binding protein alpha (CEBPA), and fms-like tyrosine kinase 3 (FLT3) (7-9). However, AML lacks specific diagnostic markers (3).

In the past few decades, transcriptome sequencing technology and bioinformatics analysis have been widely used to screen the mechanistic pathways of tumor genome changes and gene interactions. The advantage of bioinformatics analysis of whole transcriptome sequencing lies in detecting gene expression broadly and comprehensively and in quickly identifying genes that may be affected by disease as biomarkers for early diagnosis. The results help to identify the key pathogenic genes of tumors and find new therapeutic targets. However, reliance on a single microarray data set and simple statistical methods can compromise the accuracy of the results. Joint analysis of multiple databases, with differential genes screened by the false discovery rate combined with fold change, can mitigate this problem. Therefore, AML transcription data sets in The Cancer Genome Atlas (TCGA) database and Gene Expression Omnibus (GEO) were jointly analyzed in this study. Our study may provide clues for the discovery of potential diagnostic markers and therapeutic targets for AML and for the elucidation of oncogenic mechanisms. We present the following article in accordance with the STARD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-22-1257/rc).

Methods

This study combined TCGA and GEO data to screen for genes differentially expressed between AML samples and normal samples. Within the AML differentially expressed gene set, the random forest algorithm was used to screen the AML characteristic genes, and the receiver operator characteristic (ROC) curve was used to evaluate the diagnostic performance of the screened characteristic genes. We then explored the relationship between the characteristic genes and immune cell chemotaxis by analyzing the correlation between the characteristic genes and the Estimate score.

Data download

The AML RNA sequencing (RNA-seq) data set was downloaded from TCGA database containing 151 AML samples. The AML whole blood RNA-seq data set (GSE24395, GSE30029) was downloaded from the GEO database. The GSE24395 data set contains 12 AML samples and 5 normal samples; GSE30029 contains 46 AML samples and 31 normal samples. All data sets were combined into a matrix and batch-corrected and normalized. All data in this study are public and thus do not need the approval of the local hospital ethics committee. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Screening of differentially expressed genes

Differentially expressed genes (DEGs) were screened using R software (v3.5.1) (The R Foundation for Statistical Computing, Vienna, Austria) and the limma package. The fold change (FC) was calculated as follows: FC = gene expression of AML sample/gene expression of normal sample. The screening criteria were |log2FC| >2 and a false discovery rate (FDR) <0.01.
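
The screening rule above can be illustrated with a minimal sketch. It is not the authors' limma workflow: it assumes that per-gene P values are already available, that the FDR is obtained with the Benjamini-Hochberg procedure (the source does not name the adjustment method), and that the function names and toy expression values are purely hypothetical.

```python
import math


def log2_fold_change(aml_mean, normal_mean):
    """log2 of FC, where FC = AML expression / normal expression."""
    return math.log2(aml_mean / normal_mean)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(genes, lfc_cutoff=2.0, fdr_cutoff=0.01):
    """genes: list of (name, aml_mean, normal_mean, p_value) tuples.

    Keeps genes with |log2FC| > lfc_cutoff and FDR < fdr_cutoff and
    labels each as "up" or "down" in AML.
    """
    fdr = benjamini_hochberg([g[3] for g in genes])
    hits = {}
    for (name, aml, normal, _), q in zip(genes, fdr):
        lfc = log2_fold_change(aml, normal)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff:
            hits[name] = "up" if lfc > 0 else "down"
    return hits


# Hypothetical toy values, for illustration only (not from the study).
toy = [
    ("geneA", 80.0, 10.0, 0.0001),  # large increase, significant
    ("geneB", 5.0, 40.0, 0.0002),   # large decrease, significant
    ("geneC", 12.0, 10.0, 0.5),     # small change
    ("geneD", 90.0, 10.0, 0.2),     # large change, not significant
]
result = screen_degs(toy)
print(result)
```

Requiring both thresholds means a gene with a large fold change but weak statistical support (geneD in the toy data) is excluded, as is a gene that barely changes.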

Enrichment analysis of DEGs

The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used for Gene Ontology (GO) function enrichment analysis of the DEGs. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was also performed on the DEGs. An FDR <0.05 was used as the screening condition in both analyses. The results of enrichment analysis were visualized by the ggplot package in R.
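
As an illustration of the idea behind enrichment analysis, the sketch below computes a plain hypergeometric over-representation P value for one term. This is an assumption-laden simplification: DAVID reports a modified variant of this test, the resulting P values would still need multiple-testing correction before applying the FDR <0.05 threshold, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_in_term, n_degs).

    n_background: genes in the background set
    n_in_term:    background genes annotated to the term or pathway
    n_degs:       DEGs submitted for analysis
    n_overlap:    DEGs annotated to the term or pathway
    """
    total = comb(n_background, n_degs)
    upper = min(n_in_term, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the term,
# 4 DEGs submitted, 3 of which fall in the term.
p = enrichment_p_value(20, 5, 4, 3)
print(p)
```

A small P value indicates that the term contains more DEGs than would be expected if the DEGs were drawn at random from the background.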

Characteristic gene screening and diagnostic efficacy test

In the DEG set, the random forest algorithm was used to screen the characteristic genes of AML. An ROC curve was used to evaluate the diagnostic efficacy of the selected characteristic genes.
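
The two quantities used in this step can be sketched as follows. The first function computes the AUC for a single gene as the probability that a randomly chosen AML sample has higher expression than a randomly chosen normal sample. The second computes the decrease in Gini impurity produced by one split on one gene; a random forest accumulates such decreases over all splits on a gene and averages them over trees to obtain the mean decrease Gini. The function names, labels, and expression values are hypothetical, and this sketch does not train a forest.

```python
def auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(values, labels, threshold):
    """Decrease in Gini impurity from splitting samples at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    child = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - child


# Hypothetical expression values for one gene (not from the study).
aml_expr = [5.1, 6.3, 4.8, 7.0]
normal_expr = [2.0, 3.1, 4.9]
gene_auc = auc(aml_expr, normal_expr)

values = aml_expr + normal_expr
labels = ["AML"] * len(aml_expr) + ["normal"] * len(normal_expr)
decrease = gini_decrease(values, labels, threshold=4.85)
print(gene_auc, decrease)
```

An AUC close to 1 and a large impurity decrease both indicate that the gene separates AML samples from normal samples well, which is why the two criteria tend to agree.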

Tumor purity calculation

The stromal score and immune score (IS) of AML samples were calculated from gene expression data using the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm. These scores represent the content of stromal cells and immune cells in tumor samples, respectively. Their sum is the Estimate score, which reflects tumor purity: a higher Estimate score indicates lower purity. Spearman’s correlation test was used to investigate the relationship between the characteristic genes and the Estimate score.
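
The correlation step can be sketched as below, assuming the stromal and immune scores have already been produced by ESTIMATE. The sketch sums the two scores per sample and computes Spearman's coefficient as the Pearson correlation of average ranks. The function names and all numbers are hypothetical, and no P value is computed.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample scores and expression (not from the study).
stromal = [120.0, -50.0, 300.0, 10.0, 80.0]
immune = [400.0, 150.0, 900.0, 200.0, 350.0]
estimate = [s + i for s, i in zip(stromal, immune)]  # Estimate score
expression = [3.0, 7.5, 1.2, 6.0, 4.1]
rho = spearman(expression, estimate)
print(rho)
```

A negative coefficient, as in this toy data, means that samples with higher expression of the gene tend to have lower Estimate scores and therefore higher tumor purity.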

Statistical analysis

This study used R software (V3.5.1) and related R packages for statistical analysis. Two-sided P-value <0.05 indicated statistical significance.

Results

DEG screening

In all, 1,494 DEGs were screened between AML and normal samples in this study. A total of 1,181 genes were upregulated (log2FC >2, FDR <0.01) and 313 genes were downregulated (log2FC <−2, FDR <0.01). The heat map of DEGs is shown in Figure 1.
Figure 1

Heat map of DEGs between AML samples and normal samples. T represents the tumor sample, and N represents the normal sample. Red indicates upregulation in AML samples, and green indicates downregulation in AML samples. The horizontal axis indicates different samples, and the vertical axis indicates DEGs. AML, acute myeloid leukemia; DEGs, differentially expressed genes.

GO enrichment analysis

The GO enrichment analysis showed that, in terms of molecular function (MF), AML DEGs were significantly enriched in functional items such as sequence-specific double-stranded DNA binding, receptor binding, and serine type dependent activity. In terms of cellular component (CC), AML DEGs were significantly enriched in functional items such as chromatin, extracellular matrix, and endoplasmic reticulum lumen. In terms of biological process (BP), AML DEGs were significantly enriched in functional items such as positive regulation of cell proliferation, proteolysis, and immune response, as shown in Figure 2.
Figure 2

GO enrichment of AML DEGs. The horizontal axis represents the number of enriched genes, and the vertical axis represents the GO term. BP, biological process; CC, cellular component; MF, molecular function; AML, acute myeloid leukemia; DEGs, differentially expressed genes; GO, Gene Ontology.

KEGG enrichment analysis

The KEGG analysis indicated that the DEGs were significantly enriched in the p53 signaling pathway, the tumor necrosis factor (TNF) signaling pathway, and the HIF-1 signaling pathway, as shown in Figure 3.
Figure 3

KEGG pathway enrichment analysis of AML differentially expressed genes. The horizontal axis represents the number of enriched genes, and the vertical axis represents the KEGG pathway. KEGG, Kyoto Encyclopedia of Genes and Genomes; AML, acute myeloid leukemia; ECM, extracellular matrix; TNF, tumor necrosis factor; FDR, false discovery rate; PPAR, peroxisome proliferator-activated receptor.

Characteristic gene screening

When the minimum average error rate of normal samples and AML samples was 0.01, the total number of trees was 48 (as shown in Figure 4). Two genes had a mean decrease Gini greater than 2, namely CDC20 and ESM1, as shown in Figure 5.
Figure 4

Random forest tree. The abscissa represents trees and the ordinate represents the error rate. Red represents AML samples, green represents normal samples, and black represents the overall sample. AML, acute myeloid leukemia.

Figure 5

Characteristic gene Gini index. The horizontal axis represents mean decrease Gini, and the vertical axis represents characteristic genes.

Efficiency evaluation of characteristic gene diagnosis

The ROC curve showed that the area under the curve (AUC) of CDC20 was 0.966, and the 95% confidence interval (CI) was 0.939 to 0.987 (P<0.001). The AUC of ESM1 was 0.905, and the 95% CI was 0.849 to 0.953 (P<0.001) (Figure 6).
Figure 6

ROC curves of CDC20 and ESM1. The horizontal axis represents 1-specificity, and the vertical axis represents sensitivity. ROC, receiver operator characteristic; AUC, area under the curve; CI, confidence interval.

Correlation between characteristic genes and tumor purity

Correlation analysis showed that the expression of CDC20 was negatively correlated with the Estimate score in AML (R =−0.21, P=0.0036); ESM1 expression was also negatively correlated with the Estimate score (R =−0.57, P<0.001) (Figure 7).
Figure 7

Correlation of CDC20 and ESM1 with the Estimate score. The horizontal axis is gene expression, and the vertical axis is the Estimate score.

Discussion

This study screened DEGs using the transcriptome sequencing results of AML samples. The GO enrichment analysis illustrated that the DEGs of AML were significantly enriched in functional items such as positive regulation of cell proliferation, proteolysis, and immune response. These items are closely related to the occurrence and function of tumors. Positive regulation of cell proliferation is a key process in malignant tumor proliferation, and proteolysis and immune response inhibition are involved in tumor proliferation and invasion. The KEGG pathway enrichment analysis indicated that AML DEGs were significantly enriched in the p53 signaling pathway, the TNF signaling pathway, the HIF-1 signaling pathway, and other signaling pathways. These pathways are common in the progression of malignant tumors and are involved in regulating gastric cancer, bladder cancer, lung cancer, liver cancer, leukemia, and other malignant tumors. The results of the GO and KEGG enrichment analyses suggested that the AML DEGs screened in this study were representative and may include key pathogenic genes of AML that participate in the regulation of tumor cell proliferation and invasion as well as in the occurrence and progression of AML.

The random forest algorithm was used to screen the characteristic genes in the DEG set. We identified CDC20 and ESM1 as characteristic genes of AML based on a mean decrease Gini greater than 2. Both CDC20 and ESM1 were shown to have good diagnostic efficacy for AML, with AUCs greater than 0.9. We also found that CDC20 and ESM1 were negatively correlated with the Estimate score. The results indicated that the higher the expression of CDC20 and ESM1, the more tumor cells and the lower the infiltration of stromal cells and immune cells. This suggests that, in AML, CDC20 and ESM1 may play a cancer-promoting role by promoting the proliferation of tumor cells and inhibiting the infiltration of immune cells and stromal cells.

The CDC20 gene is an activator of the mitotic spindle assembly checkpoint. Its main biological role is to regulate the cell cycle and promote apoptosis (10,11). CDC20 activates the anaphase-promoting complex (APC) by forming a complex with it, which ubiquitinates the downstream cell cycle regulators securin and cyclin B and thereby targets them for degradation. The complex plays an important role in the transition from metaphase to anaphase of mitosis (12). The process of apoptosis is closely related to anti-apoptotic factors and pro-apoptotic factors. CDC20 is known to regulate apoptosis by targeting Mcl-1 and Bim (13), and it is generally considered a cancer-promoting factor. In AML cell lines, a previous study (14) found that overexpression of CDC20 in myeloid cells could accelerate apoptosis and inhibit granulocyte differentiation. The CDC20 protein was expressed in the late G1 phase of the cell cycle, and its expression peaked in the G2 phase. The induced expression of CDC20 can lead to the early transition of cells from the G1 phase to the S phase. In addition to AML, CDC20 is highly expressed in other malignant tumors, including gastric cancer, bladder cancer, liver cancer, lung cancer, and breast cancer. It promotes cancer cell proliferation, invasion, and migration (15-18). Other studies (19) have pointed out that CDC20 is related to the stemness of tumor cells and promotes the invasion and renewal of tumor stem cells by regulating the activity of its downstream pluripotency-related transcription factor Sox2. Drugs targeting CDC20 have already been developed. For example, Apcin is a specific inhibitor of CDC20 (20), which may have broad application prospects in AML.

Fewer studies have investigated the correlation between ESM1 and AML. However, ESM1 is highly expressed in tumors such as lung cancer, uterine cancer, renal cell carcinoma, liver cancer, glioblastoma, and breast cancer. Evidence has suggested that ESM1 is directly involved in tumor progression and significantly affects the proliferation and migration of head and neck cancer, gastric cancer, nasopharyngeal carcinoma, colorectal cancer, and liver cancer cells (21-25). Studies have shown that ESM1 can be used as a prognostic marker in triple-negative breast cancer (21,26). ESM1 may promote tumor invasion and migration by regulating tumor angiogenesis (27).

Our study has some limitations. First, it lacks external data to verify the diagnostic efficacy of the characteristic genes. Second, the suggestion that CDC20 and ESM1 may promote the proliferation of tumor cells and inhibit the infiltration of immune cells and stromal cells needs to be confirmed by in vitro and in vivo experiments.

In conclusion, 1,494 AML DEGs were identified through public databases. The genes CDC20 and ESM1 were identified as AML characteristic genes by a random forest algorithm. Both CDC20 and ESM1 have good diagnostic efficacy for AML and are potential biological markers.
  27 in total

1.  Bioinformatics analysis identified CDC20 as a potential drug target for cholangiocarcinoma.

Authors:  Prin Sungwan; Worachart Lert-Itthiporn; Atit Silsirivanit; Nathakan Klinhom-On; Seiji Okada; Sopit Wongkham; Wunchana Seubwai
Journal:  PeerJ       Date:  2021-03-17       Impact factor: 2.984

2.  A comprehensive analysis of CDC20 overexpression in common malignant tumors from multiple organs: its correlation with tumor grade and stage.

Authors:  Mariana F Gayyed; Nehad M R Abd El-Maqsoud; Ehab Rifat Tawfiek; Saad Abdelnaby A El Gelany; Mohamed Fathy Abdel Rahman
Journal:  Tumour Biol       Date:  2015-08-06

3.  Acute Myeloid Leukemia: A Review.

Authors:  Ari Pelcovits; Rabin Niroula
Journal:  R I Med J (2013)       Date:  2020-04-01

4.  ESM1 promotes triple-negative breast cancer cell proliferation through activating AKT/NF-κB/Cyclin D1 pathway.

Authors:  Wentong Liu; Yang Yang; Bincan He; Fengjun Ma; Fengzeng Sun; Min Guo; Min Zhang; Zhiqiang Dong
Journal:  Ann Transl Med       Date:  2021-04

5.  CDC20 promotes the progression of hepatocellular carcinoma by regulating epithelial‑mesenchymal transition.

Authors:  Gang Yang; Guan Wang; Yongfu Xiong; Ji Sun; Weinan Li; Tao Tang; Jingdong Li
Journal:  Mol Med Rep       Date:  2021-04-28       Impact factor: 2.952

6.  Silenced lncRNA SNHG14 restrains the biological behaviors of bladder cancer cells via regulating microRNA-211-3p/ESM1 axis.

Authors:  Rui Feng; Zhongxing Li; Xing Wang; Guangcheng Ge; Yuejun Jia; Dan Wu; Yali Ji; Chenghao Wang
Journal:  Cancer Cell Int       Date:  2021-01-22       Impact factor: 5.722

7.  Emerging Immunotherapy for Acute Myeloid Leukemia.

Authors:  Rikako Tabata; SungGi Chi; Junichiro Yuda; Yosuke Minami
Journal:  Int J Mol Sci       Date:  2021-02-16       Impact factor: 5.923

8.  Pan-cancer noncoding genomic analysis identifies functional CDC20 promoter mutation hotspots.

Authors:  Zaoke He; Tao Wu; Shixiang Wang; Jing Zhang; Xiaoqin Sun; Ziyu Tao; Xiangyu Zhao; Huimin Li; Kai Wu; Xue-Song Liu
Journal:  iScience       Date:  2021-03-09

9.  Pan-cancer analysis identifies ESM1 as a novel oncogene for esophageal cancer.

Authors:  Yuanbo Cui; Wenna Guo; Ya Li; Jijing Shi; Shanshan Ma; Fangxia Guan
Journal:  Esophagus       Date:  2020-11-11       Impact factor: 4.230

10.  Molecular landscape and prognostic impact of FLT3-ITD insertion site in acute myeloid leukemia: RATIFY study results.

Authors:  Frank G Rücker; Ling Du; Tamara J Luck; Axel Benner; Julia Krzykalla; Insa Gathmann; Maria Teresa Voso; Sergio Amadori; Thomas W Prior; Joseph M Brandwein; Frederick R Appelbaum; Bruno C Medeiros; Martin S Tallman; Lynn Savoie; Jorge Sierra; Celine Pallaud; Miguel A Sanz; Joop H Jansen; Dietger Niederwieser; Thomas Fischer; Gerhard Ehninger; Michael Heuser; Arnold Ganser; Lars Bullinger; Richard A Larson; Clara D Bloomfield; Richard M Stone; Hartmut Döhner; Christian Thiede; Konstanze Döhner
Journal:  Leukemia       Date:  2021-07-28       Impact factor: 11.528
