Xinan Holly Yang, Meiyi Li, Bin Wang, Wanqi Zhu, Aurelie Desgardin, Kenan Onel, Jill de Jong, Jianjun Chen, Luonan Chen, John M Cunningham.
Abstract
BACKGROUND: Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-set signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses the complex disease acute myeloid leukemia (AML) as a research platform.
Year: 2015 PMID: 25887548 PMCID: PMC4376348 DOI: 10.1186/s12859-015-0510-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of normalization and analysis of pooled molecular function profiles. Panel A) Collection of gene expression profiles. Datasets (D) of Acute Myeloid Leukemia (AML) positive or negative (1 and 2 respectively) for Leukemia Stem Cell (LSC), and of normal Hematopoietic Stem Cells (HSC) (3). Panel B) Calculation of gene-set profiles. Panel C) Identification of functional gene-sets associated with LSCs using the dynamic network mechanism analysis. A cartoon shows that different features characterize the dramatic systems stage changes in different ways. Panel D) Evaluation of clinical relevance in primary AML samples. Panel E) Evaluation of biological relevance using independent data resources.
Figure 2. Outline of the FAIME algorithm with a new parameter α. Panel A) The input for the FAIME algorithm is a gene expression matrix, in the form of either log2 microarray expression values or RNA-seq counts, and a database of gene-sets. Panel B) The mechanism (or gene-set) score is defined as the difference between the scored expression of genes inside and outside a previously defined gene-set. B-1) Applying an increasingly larger α to the FAIME method. The weight (y-axis) is an exponential function of gene expression ranks (x-axis) adjusted by the parameter α. B-2) Weight-dependent qualitative scores sharply increase with gene rank. The score (y-axis) is the product of gene expression ranks (x-axis) and the rank’s weight adjusted by the parameter α. In each panel, the more highly expressed genes are ranked higher on the x-axis. The dashed line represents the score obtained with no weighting (i.e., ranking only). Panel C) The output of the algorithm is a matrix containing mechanism scores for each gene-set and sample.
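The scoring steps described in the Figure 2 caption can be sketched for a single sample as follows. This is a minimal illustration, not the published FAIME implementation: the caption only says the weight is an exponential function of rank adjusted by α, so the specific form `exp(-alpha * (n - rank) / n)` is an assumption, as are the function name `faime_like_scores` and the use of a mean to summarize genes inside and outside each gene-set.

```python
import math


def faime_like_scores(expression, gene_sets, alpha=1.0):
    """Score gene-sets for one sample.

    expression: dict mapping gene -> expression value (one sample).
    gene_sets:  dict mapping gene-set name -> set of member genes.
    alpha:      weighting parameter; alpha = 0 reduces to plain ranks.
    """
    # Rank genes so that more highly expressed genes get higher ranks.
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)

    scored = {}
    for rank, gene in enumerate(ordered, start=1):
        # ASSUMED weight: exponential in rank, steeper for larger alpha.
        weight = math.exp(-alpha * (n - rank) / n)
        # Score is the product of the rank and its weight.
        scored[gene] = rank * weight

    results = {}
    for name, members in gene_sets.items():
        inside = [s for g, s in scored.items() if g in members]
        outside = [s for g, s in scored.items() if g not in members]
        if not inside or not outside:
            continue  # score is undefined without both groups
        # Mechanism score: scored genes inside minus those outside the set.
        results[name] = sum(inside) / len(inside) - sum(outside) / len(outside)
    return results


sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
print(faime_like_scores(sample, {"top": {"g3", "g4"}}, alpha=0.0))
```

Running the function over every sample column and stacking the results would give the gene-set by sample matrix that Panel C describes.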
Summary of collected transcriptional and clinical data for AML LSC+, AML LSC-, and normal HSC+ samples
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 1st author | | | | | | | | |
| Journal | Leukemia | JAMA | JAMA leukemia | Nature Med | PNAS | Cancer Cell | Genome Res | |
| Year | 2006 | 2010 | 2011 | 2011 | 2009 | 2011 | 2011 | |
| PMID | 17039238 | 17952057 | 21177505 | 21873988 | 19218430 | 21251617 | 21795385 | |
| | N/A | (NOD/SCID/IL)2r gamma (null) (NSG) | N/A | NOD/ShiLtSz-SCID (NODSCID) | N/A | NOD/SCID or NSG | N/A | |
| AML LSC+ (CD34+CD38-) | 8 | 13 | LSC+ (n = 77) | |||||
| AML LSC+ (CD34-CD38-) | 3 | |||||||
| AML LSC+ (CD34-CD38+) | 1 | |||||||
| AML LSC+ (CD34+CD38+) | 8 | |||||||
| AML LSC+ (Lin-CD34+CD38-CD90-) | ||||||||
| AML LSC+ (Lin-CD34+CD38-CD90-CD45RA+) | 22 | |||||||
| GMP-like AML LSC+ (Lin-CD34+CD38+CD123+/loCD110-CD45RA+CD45RA+) | 22 | |||||||
| AML leukemia progenitor cell + (LPC+) (hCD34+hCD38+) | 5 | 8 | 5 | LSC- (n = 59) | ||||
| AML CMP+ (Lin-CD34+CD38+) | 7 | 1 | 1 | |||||
| AML MPP+ (Lin-CD34+CD38-CD90-CD45RA-) | 2 | |||||||
| AML Blasts (Lin-CD34-) | 7 | 23 | ||||||
| normal HSC+ (CD34+CD133+) | 1 | normal HSC+ (n = 23) | ||||||
| normal HSC+ (Lin-CD34+CD38-) | ||||||||
| normal HSC+ (Lin-CD34+CD38-) | ||||||||
| normal HSC+ (Lin-CD34+CD38loCD36-) | ||||||||
| normal HSC+ (Lin-CD34+CD38-CD90+) | 4 | |||||||
| normal HSC+ (Lin-CD34+CD38-CD90+CD45RA-) | 7 | 5 |
Identified 30-gene and 25-gene signatures
| LSC- 30 genes | | | | LSC+ 25 genes | | | |
|---|---|---|---|---|---|---|---|
| ANLN | anillin, actin binding protein | other | | APPBP2 | amyloid beta precursor protein (cytoplasmic tail) binding protein 2 | other | |
| AURKA | aurora kinase A | kinase | E | ATXN3 | ataxin 3 | peptidase | |
| CCNA1 | cyclin A1 | other | | CCND2 | cyclin D2 | other | D, E |
| CCL5 | chemokine (C-C motif) ligand 5 | cytokine | D, E, U | DYNLL2 | dynein, light chain, LC8-type 2 | other | |
| CD38 | CD38 molecule | enzyme | E, P, U | ERC1 | ELKS/RAB6-interacting/CAST family member 1 | other | |
| CDC25B | cell division cycle 25B | phosphatase | | ETV6 | ets variant 6 | transcription regulator | |
| CDK1 | cyclin-dependent kinase 1 | kinase | | GGNBP2 | gametogenetin binding protein 2 | other | |
| CENPA | centromere protein A | other | | GIMAP6 | GTPase, IMAP family member 6 | other | |
| CLC | Charcot-Leyden crystal galectin | enzyme | | KIF1B | kinesin family member 1B | transporter | |
| CPA3 | carboxypeptidase A3 (mast cell) | peptidase | | KIF1C | kinesin family member 1C | other | |
| CSTA | cystatin A (stefin A) | other | | KRAS | Kirsten rat sarcoma viral oncogene homolog | enzyme | D, E, P, R, U |
| DDX53 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 53 | other | | LOC728392 | uncharacterized LOC728392 | other | |
| DLGAP5 | discs, large (Drosophila) homolog-associated protein 5 | phosphatase | | MAF | v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog | transcription regulator | |
| HGF | hepatocyte growth factor (hepapoietin A; scatter factor) | growth factor | D, DP, E, P, U | MPO | myeloperoxidase | enzyme | D, E, U |
| IL36B | interleukin 36, beta | cytokine | | MTERFD2 | MTERF domain containing 2 | other | |
| KIAA0101 | KIAA0101 | other | | NAV1 | neuron navigator 1 | enzyme | |
| LATS2 | large tumor suppressor kinase 2 | kinase | | PIAS1 | protein inhibitor of activated STAT, 1 | transcription regulator | U |
| MBD3 | methyl-CpG binding domain protein 3 | other | | PSMB6 | proteasome (prosome, macropain) subunit, beta type, 6 | peptidase | |
| MND1 | meiotic nuclear divisions 1 homolog (S. cerevisiae) | other | | SESN1 | sestrin 1 | other | |
| MPO | myeloperoxidase | enzyme | D, E, U | SLC30A7 | solute carrier family 30 (zinc transporter), member 7 | transporter | |
| MS4A3 | membrane-spanning 4-domains, subfamily A, member 3 (hematopoietic cell-specific) | other | | STK38 | serine/threonine kinase 38 | kinase | |
| NDC80 | NDC80 kinetochore complex component | other | | TMIE | transmembrane inner ear | other | |
| OLFM4 | olfactomedin 4 | other | | YARS2 | tyrosyl-tRNA synthetase 2, mitochondrial | enzyme | |
| RNASE2 | ribonuclease, RNase A family, 2 (liver, eosinophil-derived neurotoxin) | enzyme | D | ZBTB10 | zinc finger and BTB domain containing 10 | other | |
| RNASE3 | ribonuclease, RNase A family, 3 | enzyme | E | ZNF384 | zinc finger protein 384 | transcription regulator | |
| SKA3 | spindle and kinetochore associated complex subunit 3 | other | | | | | |
| SPC25 | SPC25, NDC80 kinetochore complex component | other | | | | | |
| STAR | steroidogenic acute regulatory protein | transporter | | | | | |
| TOP2A | topoisomerase (DNA) II alpha 170 kDa | enzyme | D, E, P, RT | | | | |
| ZWINT | ZW10 interacting kinetochore protein | other | | | | | |
#: D = diagnosis; DP = disease progression; P = prognosis; E = efficacy; RT = response to therapy; U = unspecified application.
^: Data resource: 2000–2014 Ingenuity Systems, Inc.
Figure 3. Dynamic Network Mechanism (DNM) analysis on functional gene-sets. Panel A) Network resulting from FAIME.5 profiles; Panel B) Network resulting from GSVA profiles. Each panel visually illustrates the dynamics of the identified DNM gene-sets in each of the three sorted cell groups (1: AML LSC+, 2: AML LSC-, and 3: normal HSC+). Node color codes the standard deviation of a gene-set in the corresponding sample group, while line color codes the Pearson’s correlation coefficient between any two gene-sets. DNM gene-sets are represented as labeled squares and control gene-sets are represented as circles. The identified critical sample group for each analytical method is boxed in red (A1, B2). Line weight increases with correlation (>0.5 in Panel A and >0.4 in Panel B). Panel C) The colored groups of gene-members (a, b, c, d) or (e, f, g) correspond to the red-boxed DNM gene-sets in Panels A and B, respectively. The grey lines represent their pair-wise associations according to the Ingenuity knowledge database. Additional genes (black) that interacted with two or more identified genes in the Ingenuity database are also displayed.
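The two quantities drawn in Figure 3, a per-gene-set standard deviation for each node and a Pearson correlation for each line, can be computed for one sample group as in the sketch below. It covers only these plotted statistics, not the DNM identification procedure itself. The function names, the dictionary layout, and the choice to threshold on the absolute correlation are assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / (sx * sy)


def group_network(profiles, threshold=0.5):
    """Build node and edge attributes for one sample group.

    profiles: dict mapping gene-set name -> list of scores, one per sample.
    Returns (nodes, edges): nodes maps name -> sample standard deviation,
    edges maps (name_a, name_b) -> Pearson r for pairs with |r| > threshold.
    """
    nodes = {name: stdev(scores) for name, scores in profiles.items()}
    edges = {}
    for a, b in combinations(profiles, 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) > threshold:  # ASSUMED: threshold applies to |r|
            edges[(a, b)] = r
    return nodes, edges


profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
nodes, edges = group_network(profiles, threshold=0.5)
print(nodes, edges)
```

Calling `group_network` once per sample group (AML LSC+, AML LSC-, normal HSC+) gives one network per group, which can then be compared side by side as in Panels A and B.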
Univariate and multivariate analyses of overall survival in patients with all types of AML, for the LSC- DNM gene-sets
| | | | | | |
|---|---|---|---|---|---|
| | | | | | |
| ELN_Risk Favorable vs. Adverse | 0.3 | 0.20-0.40 | 2.15E-13*** | | |
| complex vs. others | 2.2 | 1.50-3.16 | 0.000024*** | | |
| 7q vs. others | 2.1 | 1.45-2.97 | 0.000051*** | | |
| ELN_Risk Intermediate-II vs. Adverse | 0.5 | 0.36-0.70 | 0.000053*** | | |
| Age group, years (≥60 vs. <60) | 1.7 | 1.29-2.28 | 0.00016*** | | |
| 3q vs. others | 2.1 | 1.32-3.42 | 0.0015** | | |
| ELN_Risk Intermediate-I vs. Adverse | 0.6 | 0.46-0.86 | 0.0035** | | |
| inv16 vs. others | 0.5 | 0.32-0.83 | 0.0057** | | |
| CEBPA mutation vs. others | 0.6 | 0.67-0.95 | 0.028* | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 1.7 | 1.26-2.29 | 0.00046*** | | |
| ELN_Risk Favorable vs. Adverse | 0.4 | 0.26-0.76 | 0.0034** | | |
| complex vs. others | 1.6 | 0.96-2.58 | 0.07. | | |
| ELN_Risk Intermediate-II vs. Adverse | 0.7 | 0.43-1.23 | 0.23 | | |
| 7q vs. others | 1.3 | 0.82-2.09 | 0.26 | | |
| 3q vs. others | 1.3 | 0.71-2.27 | 0.42 | | |
| inv16 vs. others | 0.8 | 0.47-1.39 | 0.44 | | |
| CEBPA mutation vs. others | 0.9 | 0.53-1.45 | 0.61 | | |
| ELN_Risk Intermediate-I vs. Adverse | 0.9 | 0.53-1.48 | 0.65 | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 3.02 | 2.16-4.21 | 9.94E-12*** | | |
| Gender | 0.88 | 0.64-1.22 | 0.44 | | |
| normal_karyotype vs. others | 1.12 | 0.81-1.55 | 0.50 | | |
| BM Blast (>50 vs. <=50) | 0.88 | 0.60-1.30 | 0.53 | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 2.59 | 1.83-3.67 | 8.8E-08*** | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 1.63 | 1.18-2.26 | 0.0029** | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 1.49 | 1.07-2.07 | 0.018* | | |
Significance code: ‘.’: p < .1; ‘*’: p < .05; ‘**’: p < .01; ‘***’: p < .001.
Factors significant in the univariate test (p < .05) are used in the multivariate test. Boldface highlights the results of DNM functional gene-sets (fGSs).
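As a small illustration of the footnote's significance codes, the mapping from a p-value to its marker can be written as below. The function name `significance_code` is hypothetical, and the strict inequalities follow the footnote as written.

```python
def significance_code(p):
    """Return the marker used in the tables for a given p-value."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""


# A few p-values taken from the table rows above.
for p in (2.15e-13, 0.0029, 0.018, 0.07, 0.23):
    print(p, significance_code(p))
```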
Figure 4. Prognostic analysis of patients with cytogenetically normal AML, based on DNM-identified gene-set pairs. Kaplan–Meier plots of survival analysis on stratified samples with better outcome (green) or worse outcome (red). The Relative Effect Analysis with Gene-Set-Group Pairs (RXA-GSP) calculated a prognostic indicator by comparing three LSC- representative gene-sets (30 genes) with three normal control gene-sets (166 genes, Additional file 4: Table S2). The normalized FAIME.5 profiles are used. In each sub-panel, the top bars mark the simulated p-values used to estimate the empirical p-value of the observed log-rank p-value, which is shown as the vertical line marked with an arrow. An RXA-GSP indicator I of less than 1 significantly indicates worse prognosis in the training cohort (GSE12417, Panel A) of cytogenetically normal AML patients and in two validation cohorts (GSE14468 and TCGA, Panels B and C) of cytogenetically normal AML patients.
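The caption describes estimating an empirical p-value by comparing the observed log-rank p-value with a set of simulated p-values. A minimal sketch of one common way to do this is given below. The caption does not state the exact estimator, so the add-one correction used here is an assumption, as are the function name and the example numbers.

```python
def empirical_p_value(observed_p, simulated_p):
    """Fraction of simulated p-values at least as extreme as the observed one.

    observed_p:  log-rank p-value from the real gene-set signature.
    simulated_p: log-rank p-values from simulated (e.g. random) signatures.
    ASSUMED estimator: (hits + 1) / (n + 1), which avoids reporting zero.
    """
    hits = sum(1 for p in simulated_p if p <= observed_p)
    return (hits + 1) / (len(simulated_p) + 1)


# Hypothetical numbers, for illustration only.
print(empirical_p_value(0.01, [0.5, 0.2, 0.005, 0.9]))
```

With this estimator the smallest value that can be reported is 1 / (n + 1), so the number of simulations bounds the achievable resolution.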
Univariate and multivariate analyses of overall survival in patients with cytogenetically normal AML, for the LSC- DNM gene-sets
| | | | | | |
|---|---|---|---|---|---|
| | | | | | |
| Age group, years (≥60 vs. <60) | 1.63 | 1.18-2.26 | 0.0029** | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 1.6 | 1.12-2.15 | 0.0083** | | |
| | | | | | |
| KRAS mutation vs. others | 70.2 | 7.30-674.5 | 8.60E-13*** | | |
| ELN_risk (Intermediate-I vs. Favorable) | 1.8 | 1.27-2.62 | 0.00095*** | | |
| Age group, years (≥60 vs. <60) | 1.4 | 0.89-2.18 | 0.15 | | |
| NPM1 mutation vs. others | 0.8 | 0.57-1.12 | 0.19 | | |
| CEBPA mutation vs. others | 0.7 | 0.42-1.22 | 0.22 | | |
| Gender | 0.9 | 0.62-1.21 | 0.39 | | |
| BM Blast (>50 vs. <=50) | 1.1 | 0.76-1.53 | 0.68 | | |
| NRAS mutation vs. others | 1.0 | 0.60-1.81 | 0.90 | | |
| | | | | | |
| KRAS mutation vs. others | 90.7 | 9.27-888.79 | 0.00011*** | | |
| ELN_risk (Intermediate-I vs. Favorable) | 1.7 | 1.19-2.48 | 0.0039** | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 2.5 | 1.58-4.09 | 6.53E-05*** | | |
| BM Blast (>50 vs. <=50) | 0.6 | 0.33-1.09 | 0.09. | | |
| Gender | 0.7 | 0.46-1.16 | 0.18 | | |
| | | | | | |
| Age group, years (≥60 vs. <60) | 2.1 | 1.32-3.47 | 0.0022** | | |
Significance code: ‘.’: p < .1; ‘*’: p < .05; ‘**’: p < .01; ‘***’: p < .001.
Factors significant in the univariate test (p < .05) are used in the multivariate test. Boldface highlights the results of DNM functional gene-sets (fGSs).
Figure 5. The computationally evaluated biological relevance of genes in the identified gene-sets. Panel A) Genes in the identified cluster of gene-sets share common functions. Sub-panel 1 represents the 30 LSC- representative genes and sub-panel 2 represents the 25 LSC+ representative genes. Gene Ontology semantic similarity analysis reveals that 2/3 of the genes share the same function (Lin distance = 1). Line color codes the molecular function and biological process respectively. Panel B) Volcano plot (sub-panel 1) of pairwise correlation tests between any two LSC+ 25-gene members (TCGA data). There are 31 significant co-expressions across 10 AML patients who show positive RAS activity (dark red). This co-expression disappears among the other 187 patients (grey), the 7 patients who carry somatic mutations of KRAS genes (blue), and the 16 patients carrying positive RAS activity or somatic RAS mutation (orange). Sub-panel 2 illustrates the RAS activity-dependent co-expressed gene-pairs in a network. Comparing patients showing active molecular activity of RAS with patients showing normal RAS status, 31 pairs of genes gain significant co-expression, including 20 out of the 25 LSC+ representative genes (solid lines), whereas 1 gene-pair loses its co-expression in normal RAS status (the dashed line). Red lines represent positive correlations and blue lines represent negative correlations. Line widths correspond to Spearman correlation coefficients.
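The pairwise co-expression analysis in Panel B rests on Spearman correlation coefficients between gene members. The sketch below shows how such coefficients could be computed for every gene pair within one patient subgroup. It omits the significance test, and the function names and input layout are assumptions rather than the authors' code.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # undefined when one gene is constant across patients
    return cov / (sx * sy)


def pairwise_spearman(expression):
    """expression: dict gene -> list of values, one per patient in a subgroup."""
    return {
        (a, b): spearman(expression[a], expression[b])
        for a, b in combinations(expression, 2)
    }
```

Applying `pairwise_spearman` separately to each patient subgroup and comparing the resulting coefficients is one way to see which gene pairs gain or lose co-expression between subgroups.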