Frederik Otzen Bagger, Damir Sasivarevic, Sina Hadi Sohi, Linea Gøricke Laursen, Sachin Pundhir, Casper Kaae Sønderby, Ole Winther, Nicolas Rapin, Bo T Porse.
Abstract
Research on human and murine haematopoiesis has resulted in a vast number of gene expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS-sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan-Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.
Year: 2015 PMID: 26507857 PMCID: PMC4702803 DOI: 10.1093/nar/gkv1101
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data sets for normal hematopoiesis
| Data set | Organism | Source | Sample numbers | Cell types | Reference |
|---|---|---|---|---|---|
| Normal hematopoiesis with AMLs | Human | GSE42519 | 34 | HSC, MPP, CMP, MEP, GMP, early PM, late PM, MY, MM, BC, PMN | Rapin |
| Normal hematopoiesis (HemaExplorer) | Human | GSE17054 | 2 | HSC | Majeti |
| Normal hematopoiesis (HemaExplorer) | Human | GSE19599 | 4 | GMP, MEP | Andersson |
| Normal hematopoiesis (HemaExplorer) | Human | GSE11864 | 2 | Monocytes | Hu |
| Normal hematopoiesis (HemaExplorer) | Human | E-MEXP-1242 | 2 | Monocytes | Wildenberg |
| Normal hematopoiesis (DMAP) | Human | GSE24759 | 211 | Normal Hematopoiesis | Novershtern |
| Mouse normal hematopoietic system | Mouse | GSE14833, GSE6506 | 67 | Normal Hematopoiesis | Di Tullio |
| ImmGen data sets | Mouse | GSE15907 | >700 | Normal Hematopoiesis | Ref ( |
Data set overview
| Data set | Features | Samples | Normalisation method |
|---|---|---|---|
| Leukemia MILE study | 67191 | 2095 | 1 |
| Normal human hematopoiesis with AMLs | 67191 | 296 | 1,7 |
| Immgen Key populations | 47273 | 256 | 2 |
| AML versus normal | 67191 | 252 | 3 |
| AML TCGA data set | 67191 | 244 | 1 |
| AML TCGA data set versus normal | 67191 | 244 | 3 |
| AML Normal Karyotype | 54675 | 234 | 1 |
| AML Normal Karyotype versus normal | 67191 | 234 | 3 |
| Normal human hematopoiesis (DMAP) | 35459 | 211 | 4 |
| Immgen abT cells | 47273 | 190 | 2 |
| Immgen Dendritic cells | 47273 | 151 | 2 |
| Immgen MFs Monocytes Neutrophils | 47273 | 114 | 2 |
| Immgen B cells | 47273 | 103 | 2 |
| Normal human hematopoiesis (HemaExplorer) | 57270 | 77 | 5 |
| Immgen gdT cells | 47273 | 76 | 2 |
| Immgen Stem and progenitor cells | 47273 | 76 | 2 |
| Mouse normal hematopoietic system | 57613 | 67 | 4 |
| Immgen Activated T cells | 47273 | 55 | 2 |
| Immgen NK cells | 47273 | 47 | 2 |
| Immgen Stromal cells | 47273 | 39 | 2 |
| Mouse normal (RNA seq) | 45426 | 52 | 6 |
| BloodPool | 67191 | 2120 | 1,7 |
| BloodPool versus normal | 67191 | 2076 | 3,7 |
Normalisation method legend:
1 Each cancer sample is normalised together with a set of samples from sorted normal myeloid populations. All samples were normalised using RMA. Comparison of gene expression values is not possible with other data sets in BloodSpot.
2 All samples from the ImmGen data sets were normalised together with RMA. Samples were subsequently attributed to the different data sets in BloodSpot. This means that comparison of gene expression values is possible across all ImmGen data sets.
3 The data are normalised according to Rapin et al. Briefly, each cancer sample is normalised together with a set of samples from sorted normal myeloid populations. Next, using a PCA-based method, the 5 normal samples closest to the cancer sample are averaged, and this computed normal sample is then compared to the cancer sample, allowing computation of gene expression fold changes. See Supplementary Methods and Rapin et al. (10).
4 All samples were normalised using RMA. Comparison of gene expression values is not possible with other data sets in BloodSpot.
5 See our previous work (Bagger et al. (3)).
6 The data were processed using the bcbio nextgen RNA-seq pipeline. Count data were subsequently processed with DESeq2's variance stabilising transformation method.
7 The data were batch corrected using ComBat, taking study number as batch.
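The nearest-normal comparison described under method 3 can be illustrated with a minimal sketch. It assumes that samples have already been projected onto principal components and that expression values are on a log2 scale (as RMA output is), so a difference equals a log2 fold change. The function name, argument names and toy values are hypothetical and are not taken from the BloodSpot code base or from Rapin et al.

```python
import math


def nearest_normal_fold_change(cancer_pc, cancer_expr, normal_pcs, normal_exprs, k=5):
    """Log2 fold change of one cancer sample versus the average of its
    k nearest normal samples in PCA space (illustrative sketch only)."""
    # Rank normal samples by Euclidean distance to the cancer sample.
    order = sorted(
        range(len(normal_pcs)),
        key=lambda i: math.dist(cancer_pc, normal_pcs[i]),
    )
    nearest = order[:k]
    n_genes = len(cancer_expr)
    # Average the expression of the k nearest normals, gene by gene.
    reference = [
        sum(normal_exprs[i][g] for i in nearest) / len(nearest)
        for g in range(n_genes)
    ]
    # On a log2 scale, subtraction gives the log2 fold change.
    return [cancer_expr[g] - reference[g] for g in range(n_genes)]


# Toy data: six normal samples in a 2-D PCA space, two genes each.
normal_pcs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (10, 10)]
normal_exprs = [[5.0, 8.0]] * 5 + [[12.0, 2.0]]
fold_change = nearest_normal_fold_change((0.5, 0.5), [7.0, 8.0], normal_pcs, normal_exprs)
print(fold_change)
```

In this toy example the distant sixth normal sample is excluded from the reference, so only the five nearby normals contribute to the computed normal profile.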
Figure 1. Principal component analysis (PCA) plot of BloodPool samples. (A) Before batch correction, (B) after batch correction. Batches are coloured by study of origin.
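To give an intuition for what batch correction by study does, the sketch below shifts each batch so that its per-gene mean matches the overall mean. This is a deliberately simplified location-only adjustment and is not ComBat, which additionally adjusts scale and uses empirical Bayes shrinkage. The function name and the toy values are assumptions for illustration.

```python
from collections import defaultdict


def center_by_batch(samples, batches):
    """Shift every batch so its per-gene mean equals the overall mean.
    Simplified stand-in for batch correction; NOT the ComBat algorithm."""
    n_genes = len(samples[0])
    # Overall mean of each gene across all samples.
    grand = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    # Group samples by batch label (here: study of origin).
    groups = defaultdict(list)
    for sample, batch in zip(samples, batches):
        groups[batch].append(sample)
    # Per-batch mean of each gene.
    batch_mean = {
        batch: [sum(s[g] for s in rows) / len(rows) for g in range(n_genes)]
        for batch, rows in groups.items()
    }
    # Remove the batch offset and restore the overall level.
    return [
        [s[g] - batch_mean[b][g] + grand[g] for g in range(n_genes)]
        for s, b in zip(samples, batches)
    ]


# Toy data: two studies with a constant offset between them.
samples = [[1.0, 10.0], [3.0, 12.0], [5.0, 20.0], [7.0, 22.0]]
batches = ["study_A", "study_A", "study_B", "study_B"]
corrected = center_by_batch(samples, batches)
print(corrected)
```

After centring, samples from the two toy studies overlap, which is the kind of effect a before/after PCA plot is meant to reveal.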
Figure 2. BloodSpot interface details. After a gene alias is submitted to display its expression pattern, any of the top three panels can be clicked to magnify content. The three panels show, from left to right, a survival plot based on a high-quality AML data set displaying a full Kaplan–Meier analysis for any query gene or gene signature, an improved jitter strip chart of gene expression that draws from bar plots and violin plots, and an interactive hierarchical tree that shows the relationship between the samples displayed and allows changing the focus of the display. The Select Population button allows the user to select which populations to display. The Gene Correlations button shows in a table how strongly other genes or gene signatures correlate with the displayed gene. It is possible to click on the genes in the table to display their expression profile. The Print as PDF button allows the user to export the current plot in PDF format. The T-Test button allows the user to perform a significance test between pairs of populations (legend is as follows: NS: non significant; *P < 0.05; **P < 0.01; ***P < 0.001). The Export Data as Text button allows the user to export the raw data as text (CSV format). The Upload your own sample button allows for the upload of an Affymetrix HU133 plus 2.0 .CEL file and for viewing it in the context of normal haematopoiesis. The drop down menu in the upper right corner of the main plot can be used to select a probe representing the gene of interest; by default, the probe with the highest intensity is chosen. At the bottom of the main plot, a list of abbreviations is available that includes immunophenotypes when applicable.
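Two conventions from this caption are easy to express in code: the significance legend of the T-Test button and the default probe choice. The sketch below is illustrative only. The function names and probe identifiers are hypothetical, and summarising a probe's intensity by its mean across samples is an assumption, since the caption does not state how "highest intensity" is computed.

```python
def significance_label(p_value):
    """Map a P value to the legend used in the caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "NS"


def default_probe(probe_values):
    """Return the probe with the highest mean intensity across samples.
    Using the mean is an assumption made for this sketch."""
    return max(
        probe_values,
        key=lambda probe: sum(probe_values[probe]) / len(probe_values[probe]),
    )


# Hypothetical probe identifiers and intensities for one gene.
probes = {"probe_a": [6.1, 6.4, 5.9], "probe_b": [9.2, 8.8, 9.5]}
print(default_probe(probes))
print([significance_label(p) for p in (0.2, 0.03, 0.004, 0.0002)])
```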
Figure 3. Main plots from BloodSpot for MEIS1. (A) Default view in BloodSpot. The plot is a novel improved jitter strip chart of gene expression that draws from bar plots and violin plots, where the jitter is controlled by the density of samples and normalised over all the columns in the chart. (B) Survival plot based on a high-quality AML data set from The Cancer Genome Atlas (TCGA). It displays a full Kaplan–Meier analysis of survival. The survival plots are only available for human data sets that share probes with the microarray platform used by the TCGA. (C) Interactive hierarchical tree that shows the relationship between the samples displayed. Hovering over the nodes provides the full names of cell populations. Nodes can be clicked to collapse a branch of the tree; this also updates the default plot in the middle and removes the same populations there. The colour in the nodes represents the median expression of the queried gene. To accentuate the display in the trees, node size is also proportional to gene expression. Trees are based on literature (hierarchical differentiation), or overall sample correlation (correlation of samples). (D) Example table of genes and gene signatures correlating with MEIS1 expression in the default data set. This table appears when the user clicks on the ‘correlation’ button.
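For readers unfamiliar with the Kaplan–Meier analysis shown in panel (B), the following minimal sketch computes the product-limit survival estimate for a single group of patients. It is a generic textbook estimator, not BloodSpot's implementation. How BloodSpot splits patients by expression of the query gene is not described here, so no grouping step is shown. The function name and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for one group.
    times: follow-up time per patient; events: True if death was observed,
    False if the patient was censored. Returns [(time, survival), ...]
    at each time where at least one death occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Toy data: five patients, two of them censored.
curve = kaplan_meier([2, 4, 4, 6, 8], [True, True, False, True, False])
for time, surv in curve:
    print(time, round(surv, 3))
```

Censored patients lower the number at risk without producing a step in the curve, which is why the estimate drops only at times 2, 4 and 6 in this toy example.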
Figure 4. MEIS1 expression relative to the nearest normal counterpart in different AML subtypes, including MLL-rearranged AML.
Data sets for leukemic patients
| Data set | Organism | Source | Patient numbers | Cell types | Reference |
|---|---|---|---|---|---|
| AML Normal Karyotype data sets | Human AML | GSE15434 | 251 | NK-AML, WBM | Kohlman |
| AML TCGA data sets | Human AML | TCGA | 183 | Various genetic aberrations, including t(8;21), inv(16), t(15;17), t(11q23), complex karyotype, WBM | TCGA ( |
| Leukemia MILE study | Human AML, ALL, CML, CLL and MDS | GSE13159 | 2096 | AML, ALL and preleukemic stages. | Haferlach |
| AML versus normal | Human AML | GSE6891, GSE13159 | 91, 251 | NK-AML, WBM | de Jonge |
| BloodPool | Human AML | GSE13159, GSE15434, TCGA, GSE61804, GSE14468 | 2076 | Mainly AML, ALL and preleukemic stages. | all references above |