Hans-Ulrich Klein, Christian Ruckert, Alexander Kohlmann, Lars Bullinger, Christian Thiede, Torsten Haferlach, Martin Dugas.
Abstract
BACKGROUND: Multiple gene expression signatures derived from microarray experiments have been published in the field of leukemia research. A comparison of these signatures with results from new experiments is useful for verification as well as for interpretation of the results obtained. Currently, the percentage of overlapping genes is frequently used to compare published gene signatures against a signature derived from a new experiment. However, it has been shown that the percentage of overlapping genes is of limited use for comparing two experiments due to the variability of gene signatures caused by different array platforms or assay-specific influencing parameters. Here, we present a robust approach for a systematic and quantitative comparison of published gene expression signatures with an exemplary query dataset.
Year: 2009 PMID: 20003504 PMCID: PMC2803858 DOI: 10.1186/1471-2105-10-422
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the analysis process. The proposed method relies on a manually curated database of leukemia-related published gene signatures annotated with terms from a predefined taxonomy. A new microarray dataset is analyzed in two steps. First, each signature is assessed by the global test method to establish a ranking among the signatures. Second, the results from the first step are used to assess terms from the leukemia taxonomy that represent leukemia-related genetic aberrations and molecular mutations.
Content of the database for leukemia gene signatures.
| Signature type | Number of signatures | Number of entries |
|---|---|---|
| Diagnostic | 112 | 16748 |
| Prognostic | 8 | 646 |
| Other | 18 | 867 |
| Overall | 138 | 18261 |
138 gene signatures are stored in the database. The size of each signature varies between 10 and a few hundred accession numbers.
Figure 2. NPM1 gene signature from Verhaak et al. Verhaak et al. [57] published an NPM1 signature of 68 accession numbers that correspond to 40 genes. These genes were measured by 89 probe sets in the query dataset. A bar is plotted for each probe set, representing the value contributed by that probe set to the global test statistic. The expectation of these values under the null hypothesis of no correlation between NPM1 status and gene expression in the query dataset is indicated by the vertical black line. Overall, most genes reported by Verhaak et al. were also highly correlated with the NPM1 mutation status in our dataset. The colors indicate the direction of regulation. For example, CD200 and BAALC were downregulated in NPM1-mutated samples, while most of the HOXA@ and HOXB@ genes showed increased expression in NPM1-mutated AML samples with a normal karyotype.
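The per-probe-set bars described in this caption can be read as the individual terms of a global test statistic. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a binary outcome (for example, mutated versus wild type), centers each probe set, and replaces the asymptotic null distribution with a simple permutation p-value. The function name `global_test_sketch` and all parameter names are hypothetical.

```python
import numpy as np

def global_test_sketch(X, y, n_perm=1000, seed=0):
    """Simplified global-test-style statistic for one gene signature.

    X : (n_samples, n_probe_sets) expression values of the signature's probe sets
    y : (n_samples,) binary outcome, e.g. 1 = mutated, 0 = wild type
    Returns (q, contributions, p_value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    Xc = X - X.mean(axis=0)            # center each probe set
    resid = y - y.mean()               # outcome residuals under the null model
    mu2 = y.mean() * (1.0 - y.mean())  # null variance of a binary outcome

    # One value per probe set: squared covariance with the outcome, scaled.
    contributions = (Xc.T @ resid) ** 2 / mu2
    q = contributions.mean()           # signature-level statistic

    # Permutation null: shuffle the outcome and recompute the statistic.
    rng = np.random.default_rng(seed)
    perm_q = np.empty(n_perm)
    for b in range(n_perm):
        perm_resid = rng.permutation(resid)
        perm_q[b] = np.mean((Xc.T @ perm_resid) ** 2 / mu2)

    p_value = (1 + np.sum(perm_q >= q)) / (n_perm + 1)
    return q, contributions, p_value
```

In this sketch, probe sets whose expression tracks the outcome produce large contributions, which is what a long bar in such a plot would correspond to; the sign of `Xc.T @ resid` (before squaring) would give the direction of regulation.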
Figure 3. t(11q23)/MLL gene signature from Ross et al. Gene-wise test statistics are shown for a subset of 85 probe sets allocated to genes as reported by Ross et al. [59] to be associated with translocation t(11q23)/MLL. The full plot with all 185 probe sets that could be mapped to the signature from Ross et al. (100 accession numbers) is provided online [Additional file 1: Supplemental Figure S1]. The high correlation of the expression pattern of the Ross et al. signature with the NPM1 status in the query dataset was mainly caused by the TALE genes (MEIS1 and PBX3) and by some HOXA@ family genes. This was characteristic of the t(11q23)/MLL signatures in our database and is consistent with results reported in [61].
Ranking of gene signatures.
| Rank | Gene signature | Taxonomy terms |
|---|---|---|
| 1 | Verhaak et al., Haematologica, 2009, AML, | |
| 2 | Verhaak et al., Haematologica, 2009, AML, | - |
| 3 | Verhaak et al., Haematologica, 2009, AML, | |
| 4 | Verhaak et al., Haematologica, 2009, AML, | |
| 5 | Alcalay et al., Blood, 2005, AML, | |
| 6 | Verhaak et al., Haematologica, 2009, AML, | |
| 7 | Alcalay et al., Blood, 2005, AML, | |
| 8 | Valk et al., N Engl J Med, 2004, Classification of AML subtypes | - |
| 9 | Ross et al., Blood, 2004, AML and ALL, t(11q23)/ | t(11q23)/ |
| 10 | Mullighan et al., Leukemia, 2007, AML, | |
| 11 | Mullighan et al., Leukemia, 2007, AML, | |
| 12 | Verhaak et al., Haematologica, 2009, AML, del(7q) | del(7q) |
| 13 | Mullighan et al., Leukemia, 2007, AML, | |
| 14 | Verhaak et al., Haematologica, 2009, AML, t(15;17) | t(15;17), Chrom. aberration |
| 15 | Marcucci et al., J Clin Oncol, 2008, AML, | |
| 16 | Stirewalt et al., Genes Chromosomes Cancer, 2008, AML | AML, Leukemia |
| 17 | Valk et al., N Engl J Med, 2004, AML, | |
| 18 | Ross et al., Blood, 2003, B-ALL, t(11q23)/ | - |
| 19 | van Delft et al., Br J Haematol, 2005, AML, t(11q23)/ | t(11q23)/ |
| 20 | Valk et al., N Engl J Med, 2004, AML, cluster without predominant characteristics | - |
| 21 | Verhaak et al., Blood, 2005, AML, | |
| 22 | Langer et al., Blood, 2008, AML, | - |
| 23 | van Delft et al., Br J Haematol, 2005, AML, t(11q23)/ | t(11q23)/ |
| 24 | Armstrong et al., Nat Genet, 2002, ALL, t(11q23)/ | t(11q23)/ |
| 25 | Valk et al., N Engl J Med, 2004, AML, mostly | - |
| ⋮ | ⋮ | ⋮ |
Each of the 138 gene signatures was tested for differential expression between NPM1-mutated and NPM1 wild-type cases in the query dataset and ranked according to its p-value. All 8 of the 138 signatures associated with the taxonomy term "NPM1 mutated" ranked among the first 21 positions. The complete ranking of all signatures is available in the supplement [Additional file 1: Supplemental Table S4].
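The ranking step can be illustrated with a short sketch. It assumes the per-signature p-values are already available in a dictionary and that each signature carries a set of taxonomy terms; the function names, signature identifiers, and values used here are hypothetical and do not come from the study.

```python
def rank_signatures(p_values):
    """Map each signature id to its rank (1 = smallest p-value)."""
    ordered = sorted(p_values, key=p_values.get)
    return {sig: position + 1 for position, sig in enumerate(ordered)}

def ranks_for_term(ranks, annotations, term):
    """Sorted ranks of all signatures annotated with the given taxonomy term."""
    return sorted(
        rank for sig, rank in ranks.items()
        if term in annotations.get(sig, ())
    )

# Hypothetical toy input, for illustration only.
p_values = {"sig_a": 0.20, "sig_b": 0.001, "sig_c": 0.03, "sig_d": 0.50}
annotations = {
    "sig_a": {"term_x"},
    "sig_b": {"term_y"},
    "sig_c": {"term_x", "term_y"},
    "sig_d": set(),
}
ranks = rank_signatures(p_values)
term_y_ranks = ranks_for_term(ranks, annotations, "term_y")
```

Collecting the ranks per term in this way is what makes a statement such as "all signatures of a term ranked among the first 21 positions" directly checkable.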
Ranking of taxonomy terms.
| Rank | Unadjusted p-value | Term | Number of signatures | Number of articles |
|---|---|---|---|---|
| 1 | < 0.001 | | 8 | 4 |
| 2 | 0.028 | t(11q23)/ | 9 | 6 |
| 3 | 0.071 | | 7 | 5 |
| 4 | 0.087 | del(7q) | 1 | 1 |
| 5 | 0.113 | | 6 | 3 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Taxonomy terms were assessed based on the ranking of the gene signatures associated with those terms. In the example dataset examining NPM1 mutations, the 8 NPM1 signatures, which were extracted from 4 different articles, occupied significantly low ranks. The low p-value of translocation t(11q23)/MLL indicates a putative relation between this translocation and the studied NPM1 mutation. The full ranking of all taxonomy terms is provided in the supplement [Additional file 1: Supplemental Table S5]. The ranking remained reasonably stable (i) when half of the arrays were excluded from the analysis [Additional file 1: Supplemental Figure S2] and (ii) when half of the gene signatures were excluded from the analysis [Additional file 1: Supplemental Figure S3].
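This excerpt does not state which test was used to assess a term from the ranks of its signatures. As one plausible illustration only, the sketch below uses a one-sided rank-sum permutation test: it asks how often a random set of signatures of the same size would have a rank sum at least as small as the observed one. The function name `term_rank_p_value`, its parameters, and the example ranks are hypothetical.

```python
import numpy as np

def term_rank_p_value(term_ranks, n_signatures, n_perm=5000, seed=0):
    """One-sided permutation p-value for 'these ranks are unusually low'.

    term_ranks   : ranks (1 = best) of the signatures annotated with the term
    n_signatures : total number of ranked signatures
    """
    term_ranks = np.asarray(term_ranks)
    k = term_ranks.size
    observed = term_ranks.sum()

    rng = np.random.default_rng(seed)
    all_ranks = np.arange(1, n_signatures + 1)

    hits = 0
    for _ in range(n_perm):
        # Draw k distinct ranks at random and compare their sum.
        random_sum = rng.choice(all_ranks, size=k, replace=False).sum()
        if random_sum <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical ranks: one term concentrated at the top, one near the middle.
p_top = term_rank_p_value([1, 2, 3, 4, 5, 6, 7, 8], n_signatures=138)
p_mid = term_rank_p_value([60, 62, 65, 68, 70, 72, 75, 80], n_signatures=138)
```

A stability check of the kind mentioned in the caption could reuse the same function after recomputing the ranking on a random half of the arrays or of the signatures.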