Raeuf Roushangar1,2, George I Mias3,4.
Abstract
In 2019, it is estimated that more than 21,000 new acute myeloid leukemia (AML) patients will be diagnosed in the United States, and nearly 11,000 are expected to die from the disease. AML is primarily diagnosed among the elderly (median 68 years old at diagnosis). Prognoses have significantly improved for younger patients, but as many as 70% of patients over 60 years old will die within a year of diagnosis. In this study, we conducted a reanalysis of 2,213 AML patients compared to 548 healthy individuals, using curated publicly available microarray gene expression data. We carried out an analysis of normalized, batch-corrected data, using a linear model that included considerations for disease, age, sex, and tissue. We identified 974 differentially expressed probe sets (DEPS) and 4 significant pathways associated with AML. Additionally, we identified 375 age- and 70 sex-related probe set expression signatures relevant to AML. Finally, we trained a k nearest neighbors model to classify AML and healthy subjects with 90.9% accuracy. Our findings provide a new reanalysis of public datasets that enabled the identification of new gene sets relevant to AML, which can potentially be used in future experiments and possible stratified disease diagnostics.
Year: 2019 PMID: 31455838 PMCID: PMC6712049 DOI: 10.1038/s41598-019-48872-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
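The abstract reports a k nearest neighbors model that separates AML from healthy subjects. The following is a minimal pure-Python sketch of that kind of classifier, not the study's pipeline: the two-feature toy values, the choice of `k=3`, and the function name `knn_predict` are all illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy expression values for two hypothetical probe sets (not real data).
train_X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
           (0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]
train_y = ["AML", "AML", "AML", "healthy", "healthy", "healthy"]

# Held-out toy samples with known labels, used to compute accuracy.
test_X = [(0.7, 0.3), (0.25, 0.75)]
test_y = ["AML", "healthy"]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
```

Accuracy here is simply the fraction of held-out samples whose predicted label matches the known label, which is the kind of figure the abstract quotes.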
Figure 1. General approach, data curation, and analysis workflow summary. The flowchart shows (a) the five main steps that summarize our approach to the study, and (b) the curation and screening criteria for raw gene expression and annotation data files, data pre-processing, supervised machine learning for missing metadata prediction, and batch effects correction. (c) The analysis included a linear model analysis of variance (ANOVA) coupled with Tukey’s Honestly Significant Difference (HSD) post-hoc tests, and KEGG pathway and GO enrichment. Finally, we performed a machine learning classification of AML based on our findings.
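Panel (c) of the workflow pairs an ANOVA with post-hoc mean comparisons. As a rough illustration for a single probe set, the sketch below computes a one-way ANOVA F statistic and the pairwise mean difference (AML - healthy). It is a deliberate simplification: the study's linear model also accounts for age, sex, and tissue, the expression values are invented, and the p-values (which need the F and studentized range distributions) are not computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a dict of label -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of samples around their own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def mean_difference(groups, a, b):
    """Difference of group means, a minus b."""
    return mean(groups[a]) - mean(groups[b])

# Toy normalized expression values for one hypothetical probe set.
expression = {
    "AML": [0.62, 0.58, 0.66, 0.60],
    "healthy": [0.35, 0.41, 0.38, 0.34],
}

f_stat = one_way_anova_f(expression)
diff = mean_difference(expression, "AML", "healthy")
```

A positive `diff` corresponds to higher expression in AML, matching the sign convention of the mean difference columns in the tables.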
Summary table of gene expression datasets used in this study.
| Author | GEO accession | Disease status* | Affymetrix platform ID: number of samples used & sample source* | Refs* |
|---|---|---|---|---|
| Zatkova | GSE10258 | AML | GPL570: 8 BM | [ |
| Tomasson | GSE10358 | AML | GPL570: 300 BM | [ |
| Metzeler | GSE12417 | AML | GPL570: 73 BM & 5 PB; GPL96/97: 160 BM & 2 PB | [ |
| Wouters | GSE14468 | AML | GPL570: 482 BM & 43 PB | [ |
| Figueroa | GSE14479 | AML | GPL570: 16 BM | [ |
| Klein | GSE15434 | AML | GPL570: 231 BM & 20 PB | [ |
| Lück | GSE29883 | AML | GPL570: 10 BM & 2 PB | [ |
| Li, Herold, Janke, Jiang | GSE37642 | AML | GPL570: 140 BM; GPL96/97: 422 BM | [ |
| Bullinger | GSE39363 | AML | GPL570: 11 BM & 2 PB | NYP |
| Opel | GSE46819 | AML | GPL570: 8 BM & 4 PB | [ |
| TCGA | GSE68833 | AML | GPL570: 183 BM | NYP |
| Cao | GSE69565 | AML | GPL570: 12 PB | [ |
| Bohl | GSE84334 | AML | GPL570: 25 BM & 20 PB | NYP |
| Li | GSE23025 | AML | GPL570: 21 BM & 13 PB | [ |
| Warren | GSE11375 | Healthy | GPL570: 26 PB | [ |
| Green | GSE14845 | Healthy | GPL570: 1 PB | NYP |
| Wu | GSE15932 | Healthy | GPL570: 8 PB | NYP |
| Karlovich | GSE16028 | Healthy | GPL570: 22 PB | [ |
| Krug | GSE17114 | Healthy | GPL570: 14 PB | NYP |
| Kong | GSE18123 | Healthy | GPL570: 17 PB | [ |
| Sharma | GSE18781 | Healthy | GPL570: 25 PB | [ |
| Rosell | GSE25414 | Healthy | GPL570: 12 PB | [ |
| Schmidt | GSE2842 | Healthy | GPL570: 2 PB | [ |
| Meng | GSE71226 | Healthy | GPL570: 3 PB | NYP |
| Tasaki | GSE84844 | Healthy | GPL570: 30 PB | [ |
| Leday | GSE98793 | Healthy | GPL570: 64 PB | [ |
| Shamir | GSE99039 | Healthy | GPL570: 121 PB | [ |
| Tasaki | GSE93272 | Healthy | GPL570: 35 PB | [ |
| Clelland | GSE46449 | Healthy | GPL570: 24 PB | [ |
| Lauwerys, Ducreux | GSE39088 | Healthy | GPL570: 46 PB | [ |
| Xiao | GSE36809 | Healthy | GPL570: 35 PB | [ |
| Zhou | GSE19743 | Healthy | GPL570: 63 PB | [ |
| | GSE107968* | 2 AML; 1 Healthy | GPL570: 3 BM | NYP |
| | GSE68172* | 20 AML; 5 Healthy | GPL570: 25 PB | [ |
| | GSE17054* | 9 AML; 4 Healthy | GPL570: 13 BM | [ |
| | GSE33223* | 20 AML; 10 Healthy | GPL570: 30 PB | [ |
| | GSE15061* | 404 AML; 138 Healthy | GPL570: 542 BM | [ |

| AML samples | Healthy samples | BM samples | PB samples | GPL570 samples | GPL96/97 samples | GPL570 probe sets | GPL96/97 probe sets |
|---|---|---|---|---|---|---|---|
| 2,213 | 548 | 2,090 | 671 | 2,177 | 584 | 54,675 | 44,760 |
Summary of datasets used in our analysis and disease classification. *GEO, Gene Expression Omnibus; AML, acute myeloid leukemia; Refs., references; NYP, not yet published; GPL570, Affymetrix Human Genome U133 Plus 2.0 Array; GPL96, Affymetrix Human Genome U133A Array; GPL97, Affymetrix Human Genome U133B Array; BM, Bone Marrow; PB, Peripheral Blood.
Figure 2. Functional classification of DEPS from AML analysis and associated KEGG and GO enrichment analysis. For all panels, normalized values are represented in blue for down-regulation and red for up-regulation, while light red/gray represents no reported specific direction. (a) Heatmap of 974 DEPS (rows) on 2,761 arrays (columns), including 2,213 AML patients and 548 healthy individuals from the AML analysis, using unsupervised hierarchical clustering with Euclidean distance. The age of each individual is illustrated in the color bar at the top (dark green for old and light blue for young). The disease state (AML vs healthy), sex of each subject, and age groups are also represented in color bars at the top. (b) Horizontal bar plot of the top 10 DEPS (gene symbols on vertical axis) from the AML analysis with mean difference values between AML and healthy (horizontal axis). Enrichment analysis identified 4 KEGG signaling pathways (c) for our AML DEPS, also visualized as a heatmap (d) of mean difference values between AML and healthy for the DEPS (rows) identified in these 4 KEGG signaling pathways (columns). The GO enrichment analysis results are summarized in (e).
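The heatmaps are ordered by unsupervised hierarchical clustering with Euclidean distance. A small agglomerative sketch of that idea follows. The caption does not state the linkage method, so average linkage is an assumption here, as are the toy two-dimensional profiles and the function name `average_linkage_clusters`.

```python
import math

def average_linkage_clusters(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members of a and b
        # (average linkage is an assumption; the source names only the metric).
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Toy expression profiles (one tuple per sample; values are invented).
profiles = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.85, 0.9), (0.12, 0.18)]
clusters = average_linkage_clusters(profiles, n_clusters=2)
```

Samples 0, 1, and 4 have similar profiles and end up in one cluster, while samples 2 and 3 form the other, which is the grouping a heatmap dendrogram would place side by side.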
Top 10 up- and down-regulated DEPS in AML by disease state.
| DEG name | DEPS gene symbol | Tukey’s HSD mean difference | Adjusted p-value (HSD test in R) |
|---|---|---|---|
| Up-regulated* | | | |
| Wilms tumor 1 | WT1 | 0.255353 | <4.11E-11 |
| MAM domain containing 2 | MAMDC2 | 0.248983 | 5.47E-09 |
| X inactive specific transcript (non-protein coding) | XIST | 0.230331 | <4.11E-11 |
| homeobox A3 | HOXA3 | 0.195790 | 1.1E-06 |
| fms-related tyrosine kinase 3 | FLT3 | 0.193420 | <4.11E-11 |
| cyclin A1 | CCNA1 | 0.185050 | 1.35E-07 |
| mex-3 RNA binding family member B | MEX3B | 0.181068 | <4.11E-11 |
| collagen, type IV, alpha 5 | COL4A5 | 0.177721 | 1.7E-05 |
| neurexin 2 | NRXN2 | 0.166598 | <4.11E-11 |
| ATPase, Na+/K+ transporting, beta 1 polypeptide | ATP1B1 | 0.165197 | 5.47E-09 |
| Down-regulated* | | | |
| cysteine-rich secretory protein 3 | CRISP3 | −0.51965625 | <4.11E-11 |
| olfactomedin 4 | OLFM4 | −0.489845396 | <4.11E-11 |
| orosomucoid 1 | ORM1 | −0.465232864 | <4.11E-11 |
| cytochrome P450, family 4, subfamily F, polypeptide 3 | CYP4F3 | −0.453467442 | <4.11E-11 |
| chitinase 3-like 1 (cartilage glycoprotein-39) | CHI3L1 | −0.421520435 | <4.11E-11 |
| annexin A3 | ANXA3 | −0.390688999 | <4.11E-11 |
| oxidized low density lipoprotein (lectin-like) receptor 1 | OLR1 | −0.35525472 | <4.11E-11 |
| carcinoembryonic antigen-related cell adhesion molecule 8 | CEACAM8 | −0.351181264 | <4.11E-11 |
| orosomucoid 1 | ORM1 | −0.336303304 | <4.11E-11 |
| tumor-associated calcium signal transducer 2 | TACSTD2 | −0.323939961 | <4.11E-11 |
From the post-hoc Tukey’s test, genes whose mean expression difference between AML and healthy (AML - healthy) was <5% or >95% were selected for biological effect from among the statistically significant differentially expressed genes for disease state, based on the analysis of variance of all 2,761 cases (2,213 AML patients and 548 healthy controls). *Significant DEPS (gene symbols) are listed in descending order of the mean difference value for the disease state comparison.
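The biological-effect filter described in the note above can be sketched as follows, under the assumption that the <5% and >95% cutoffs refer to quantiles of the mean-difference distribution (the note does not say so explicitly). The probe names, the toy mean differences, and the linear-interpolation quantile rule are illustrative choices.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def tail_filter(mean_diffs, low_q=0.05, high_q=0.95):
    """Keep entries whose mean difference lies in either tail of the distribution."""
    ordered = sorted(mean_diffs.values())
    low_cut = quantile(ordered, low_q)
    high_cut = quantile(ordered, high_q)
    return {g: d for g, d in mean_diffs.items() if d < low_cut or d > high_cut}

# Hypothetical mean differences (AML - healthy) for 21 probe sets: -0.10 ... 0.10.
mean_diffs = {f"probe_{i}": (i - 10) / 100 for i in range(21)}
selected = tail_filter(mean_diffs)
```

With these toy values only the most negative and most positive probe sets survive, which mirrors how the table keeps strongly down- and up-regulated DEPS.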
KEGG pathway analysis of DEPS from meta-analysis of 34 gene expression datasets.
| Pathway | No. of genes (down, up)* | Down-regulated | Up-regulated | p-value | Benjamini-adjusted p-value |
|---|---|---|---|---|---|
| AML vs healthy DEPS and associated signaling pathways | | | | | |
| Hematopoietic cell lineage | 11, 6 | IL1R2, CD59, GYPA, MS4A1, EPOR, CD24, CD14, EPOR, IL1R1, MME, CR1 | ITGA4, FLT3, CD34, IL3RA, ITGA5, CD44 | 2.3E-5 | 5.8E-3 |
| Cell cycle | 12, 6 | CDC7, CDC6, CCNB1, CDC20, CCNA2, CCNE2, TTK, CDC14B, CDK1, BUB1, CCNB2, BUB1B | RB1, CCNA1, CDK6, ATM, TFDP2, CDKN2A | 1.4E-4 | 1.2E-2 |
| p53 signaling pathway | 6, 7 | THBS1, CCNB1, CCNE2, CDK1, RRM2, CCNB2 | SIAH1, CDK6, ATM, SERPINE1, CDKN2A, PMAIP1, ZMAT3 | 1.0E-4 | 1.3E-2 |
| Transcriptional misregulation in cancer | 7, 13 | IL1R2, GZMB, CD14, ELANE, MMP9, CEBPE, PBX1 | WT1, RUNX2, ETV5, MEIS1, JUP, EWSR1, ATM, HOXA10, MLF1, FLT3, CCNT2, MEF2C, SLC45A3 | 6.5E-4 | 4.1E-2 |
| Overlap with sex-specific analysis | | | | | |
| Hematopoietic cell lineage | 1, 2 | - | FLT3, CD34 | | |
| p53 signaling pathway | −, 1 | - | PMAIP1 | | |
| Transcriptional misregulation in cancer | −, 1 | MS4A1 | FLT3 | | |
Enrichment analysis was done using the 974 DEPS. KEGG enrichment analysis identified 4 statistically significant pathways from the AML vs healthy analysis, shown here with their overlaps with the sex-specific analysis.
*Numbers of down- and up-regulated genes, respectively.
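The table reports Benjamini-adjusted p-values alongside the raw ones. A minimal sketch of the Benjamini-Hochberg step-up adjustment is given below. It uses the four raw p-values from the table only as sample input: the published adjusted values were corrected across every pathway tested by the enrichment tool, a list that is not given here, so the output is not expected to reproduce the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Raw p-values copied from the table; correcting over only these four
# pathways is an illustrative assumption.
raw = {
    "Hematopoietic cell lineage": 2.3e-5,
    "Cell cycle": 1.4e-4,
    "p53 signaling pathway": 1.0e-4,
    "Transcriptional misregulation in cancer": 6.5e-4,
}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, then capped so that the adjusted values never decrease as the raw p-values increase.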
Figure 3. Sex-related gene expression analysis in AML. (a) Heatmap comparing mean difference values for the 70 DE genes that overlap between Analysis 1 and Analysis 2a. (b) Heatmap of the expression of the 70 DEPS (rows) on 2,761 arrays (columns), including 2,213 AML patients and 548 healthy individuals from Analysis 2a of sex relevance in AML (using unsupervised hierarchical clustering with Euclidean distance). The disease state (AML vs healthy) and sex of each subject are indicated in color bars at the top. (c) Horizontal bar plot of the top 10 DEPS (gene symbols on vertical axis), with the mean difference values between male and female (horizontal axis). (d) Enrichment analysis for statistically significant overrepresented biological GO terms among the 70 DE genes.
Figure 4. Age-related gene expression analysis in AML. (a) The top 10 up- and down-regulated DEPS overlapping AML and age-related analyses. (b) 75 DEPS specific to a single age-group comparison. (c) Overlaps over KEGG pathways of 17 DE genes identified in 4 KEGG pathways according to age groups. (d) The mean differences of 25 DEPS with respect to the 0–19 baseline across all other groups are plotted to illustrate changes with aging. The mean difference values between AML and healthy cohorts are shown in the right-most column of panels (a, b, and d) for reference.
KEGG pathway analysis of DEPS from analysis of 34 gene expression datasets, overlapped with age-specific findings.
| Pathway | No. of genes (down, up)* | Down-regulated | Up-regulated |
|---|---|---|---|
| AML age-dependent (AML - healthy) DEPS & associated signaling pathways | | | |
| Hematopoietic cell lineage | 4, 1 | CD14 (30 to 39)–(0 to 19) | FLT3 (20 to 29)–(0 to 19), (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19), (80 to 100)–(0 to 19) |
| | | MME (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19) | |
| | | CD24 (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19) | |
| | | MS4A1 (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19), (80 to 100)–(0 to 19) | |
| Cell cycle | 3, 2 | CCNA2 (50 to 59)–(0 to 19) | CCNA1 (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19) |
| | | CDK6 (60 to 69)–(30 to 39) | |
| | | CDC14B (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19) | CDKN2A (40 to 49)–(0 to 19) |
| p53 signaling pathway | 1, 1 | CDK6 (60 to 69)–(30 to 39) | CDKN2A (40 to 49)–(0 to 19) |
| Transcriptional misregulation in cancer | 5, 4 | CD14 (30 to 39)–(0 to 19) | MEIS1 (50 to 59)–(0 to 19), (50 to 59)–(20 to 29), (60 to 69)–(0 to 19), (60 to 69)–(20 to 29), (70 to 79)–(0 to 19) |
| | | MMP9 (20 to 29)–(0 to 19), (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19) | |
| | | EWSR1 (60 to 69)–(50 to 59), (70 to 79)–(50 to 59) | WT1 (20 to 29)–(0 to 19), (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19) |
| | | CEBPE (20 to 29)–(0 to 19), (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (50 to 59)–(20 to 29), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19), (70 to 79)–(20 to 29), (80 to 100)–(0 to 19) | FLT3 (20 to 29)–(0 to 19), (30 to 39)–(0 to 19), (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (60 to 69)–(0 to 19), (70 to 79)–(0 to 19), (80 to 100)–(0 to 19) |
| | | CCNT2 (60 to 69)–(30 to 39), (70 to 79)–(30 to 39), (60 to 69)–(50 to 59) | HOXA10 (40 to 49)–(0 to 19), (50 to 59)–(0 to 19), (50 to 59)–(20 to 29), (60 to 69)–(0 to 19), (60 to 69)–(20 to 29), (70 to 79)–(0 to 19) |
Enrichment analysis was done using the 974 DEPS overlapped with the age-specific analysis.
*Numbers of down- and up-regulated genes, respectively.
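The age comparisons in the table are differences between age groups, most often against the 0 to 19 baseline. The sketch below bins subjects into the age groups named in the table and computes each group's mean expression minus the baseline mean. Integer ages, the toy sample values, and the helper names `age_group` and `baseline_differences` are assumptions for illustration; the study derives these differences from its linear model and post-hoc tests rather than from raw group means.

```python
from statistics import mean

# Age groups as they appear in the table (inclusive bounds, integer ages assumed).
AGE_GROUPS = [(0, 19), (20, 29), (30, 39), (40, 49),
              (50, 59), (60, 69), (70, 79), (80, 100)]

def age_group(age):
    """Map an integer age to its group label, e.g. 68 -> '60 to 69'."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low} to {high}"
    raise ValueError(f"age {age} is outside the supported range")

def baseline_differences(samples, baseline="0 to 19"):
    """Mean expression per age group minus the mean of the baseline group."""
    by_group = {}
    for age, value in samples:
        by_group.setdefault(age_group(age), []).append(value)
    base = mean(by_group[baseline])
    return {g: mean(v) - base for g, v in by_group.items() if g != baseline}

# Toy (age, expression) pairs for one hypothetical probe set.
samples = [(5, 0.2), (15, 0.4), (25, 0.5), (34, 0.6), (36, 0.8), (85, 0.9)]
differences = baseline_differences(samples)
```

A positive entry such as the one for "30 to 39" would correspond to a comparison written as (30 to 39)–(0 to 19) in the up-regulated column.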