Gabriel M Altschuler, Oliver Hofmann, Irina Kalatskaya, Rebecca Payne, Shannan J Ho Sui, Uma Saxena, Andrei V Krivtsov, Scott A Armstrong, Tianxi Cai, Lincoln Stein, Winston A Hide.
Abstract
New strategies to combat complex human disease require systems approaches to biology that integrate experiments from cell lines, primary tissues and model organisms. We have developed Pathprint, a functional approach that compares gene expression profiles in a set of pathways, networks and transcriptionally regulated targets. It can be applied universally to gene expression profiles across species. Integration of large-scale profiling methods and curation of the public repository overcomes platform, species and batch effects to yield a standard measure of functional distance between experiments. We show that pathprints combine mouse and human blood developmental lineage, and can be used to identify new prognostic indicators in acute myeloid leukemia. The code and resources are available at http://compbio.sph.harvard.edu/hidelab/pathprint.
Year: 2013 PMID: 23890051 PMCID: PMC3971351 DOI: 10.1186/gm472
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. The Pathprint pipeline. Rank-normalized gene expression is mapped to pathway expression. A distribution of expression scores across the Gene Expression Omnibus (GEO) is used to produce a probability of expression (POE) for each pathway. A pathprint vector is derived by transformation of the signed POE distribution into a ternary score, representing pathway activity as significantly underexpressed (-1), intermediately expressed (0), or overexpressed (+1).
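The steps in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the pathway score is assumed to be a plain mean of member-gene ranks (the caption only says expression is "mapped" to pathways), and the threshold of 0.5 is a placeholder. The signed POE is taken as an input, because deriving it needs the GEO-wide background distribution, which is not reproduced here.

```python
from typing import Dict, List


def rank_normalize(expression: Dict[str, float]) -> Dict[str, float]:
    """Replace each gene's expression value with its rank scaled to (0, 1]."""
    ordered = sorted(expression, key=lambda gene: expression[gene])
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def pathway_score(ranks: Dict[str, float], members: List[str]) -> float:
    """Assumed aggregation: mean rank of the pathway genes present on the array."""
    present = [ranks[g] for g in members if g in ranks]
    if not present:
        raise ValueError("no pathway genes measured on this array")
    return sum(present) / len(present)


def ternary_score(signed_poe: float, threshold: float = 0.5) -> int:
    """Map a signed POE in [-1, 1] to -1, 0 or +1 (threshold is a placeholder)."""
    if signed_poe > threshold:
        return 1
    if signed_poe < -threshold:
        return -1
    return 0


def pathprint_vector(signed_poes: List[float], threshold: float = 0.5) -> List[int]:
    """Apply the ternary transformation to every pathway."""
    return [ternary_score(p, threshold) for p in signed_poes]
```

Because the final vector holds only -1, 0 and +1, it no longer depends on the intensity scale of the array platform, which is consistent with the cross-platform comparisons described in the abstract.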
Summary of gene sets used in Pathprint
| Source | Pathways, n | Mean size, n | Median size, n | Minimum size, n | Maximum size, n | Total genes, n |
|---|---|---|---|---|---|---|
| Reactome | 53 | 154 | 108 | 11 | 932 | 4,874 |
| Wikipathways | 173 | 50 | 33 | 6 | 260 | 3,918 |
| Netpath | 36 | 170 | 83 | 8 | 816 | 3,811 |
| KEGG | 227 | 76 | 55 | 6 | 1,138 | 5,990 |
| Static modules | 144 | 45 | 21 | 9 | 733 | 6,458 |
| All | 633 | 74 | 41 | 6 | 1,138 | 10,903 |
Abbreviations: KEGG, Kyoto Encyclopedia of Genes and Genomes.
Figure 2. Cross-species integration. (a) Precision-recall within the tissue training dataset for the pathprint (red; mean average precision (MAP) = 0.90), unthresholded POE (dashed; MAP = 0.88), random gene sets (black; MAP = 0.83), Gene Expression Barcode (blue; MAP = 0.73), and Spearman gene expression correlation (green; MAP = 0.71). (b) Comparison of distance metrics; precision-recall curves for aggregated mouse to human tissue data based on a thresholded pathprint build produced using Euclidean (blue), Manhattan (green), and Mahalanobis (red) distances. (c,d) Tissue-dominated versus platform/species-dominated clustering, showing plots of the two most significant principal components (PCs) for (c) the pathprint and (d) the Gene Expression Barcode (red, brain; yellow, kidney; green, liver; light blue, lung; dark blue, muscle; pink, spleen; circles, Mouse 430A2; diamonds, Human 133plus2; crosses, Human 133A). (e) Functional classification of tissues and blood cell types. Hierarchical clustering of consensus pathprints for human and mouse tissues on three platforms based on the Wikipathway and Reactome pathways that significantly contributed to clustering. Colors indicate scores: red, 1; white, 0; blue, -1.
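To make the quantities in the Figure 2 caption concrete, the sketch below computes a Manhattan distance between two ternary pathprint vectors and the average precision of a ranked retrieval list; MAP is the mean of this value over queries. The function names and the toy vectors are illustrative assumptions, not part of the Pathprint package.

```python
from typing import List, Sequence, Tuple


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute per-pathway differences between two pathprints."""
    if len(a) != len(b):
        raise ValueError("pathprints must cover the same pathways")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(
    query: Sequence[int], library: List[Tuple[str, Sequence[int]]]
) -> List[Tuple[str, Sequence[int]]]:
    """Order (label, pathprint) pairs from nearest to farthest from the query."""
    return sorted(library, key=lambda item: manhattan(query, item[1]))


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(per_query: Sequence[Sequence[bool]]) -> float:
    """MAP: average of the per-query average precision values."""
    return sum(average_precision(r) for r in per_query) / len(per_query)


# Toy example: a "liver" query against three hypothetical arrays.
query = [1, 0, -1]
library = [("liver", [1, 0, -1]), ("brain", [-1, 1, 1]), ("liver", [1, 1, -1])]
ranked = rank_by_distance(query, library)
relevance = [label == "liver" for label, _ in ranked]
```

In the toy example both liver arrays rank ahead of the brain array, so the average precision for this query is 1.0.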
Pathprint-based retrieval of data from the Gene Expression Omnibus (GEO); arrays retrieved from GEO from consensus tissue pathprints at 95% precision
| Tissue | Seed arrays, n | Correct retrievals, n | Platforms, n | Species, n |
|---|---|---|---|---|
| Brain | 50 | 8,691 | 25 | 4 |
| Kidney | 81 | 1,156 | 14 | 3 |
| Liver | 196 | 4,797 | 22 | 4 |
| Lung | 142 | 1,735 | 13 | 3 |
| Skeletal muscle | 29 | 2,919 | 18 | 3 |
| Spleen | 33 | 179 | 5 | 2 |
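The retrieval summarized in the table can be sketched in two steps: build a consensus pathprint from the seed arrays, then walk down the distance-ranked list and keep the deepest cutoff at which precision is still at least 95%. Both functions are hypothetical illustrations. In particular, the consensus rule (thresholding the per-pathway mean score, with 0.75 as a placeholder) is an assumption rather than the documented procedure.

```python
from typing import List, Sequence, Tuple


def consensus_pathprint(
    seeds: Sequence[Sequence[int]], threshold: float = 0.75
) -> List[int]:
    """Assumed rule: threshold the mean ternary score of each pathway."""
    consensus = []
    for column in zip(*seeds):  # one column per pathway
        mean = sum(column) / len(column)
        if mean >= threshold:
            consensus.append(1)
        elif mean <= -threshold:
            consensus.append(-1)
        else:
            consensus.append(0)
    return consensus


def retrievals_at_precision(
    ranked_correct: Sequence[bool], min_precision: float = 0.95
) -> Tuple[int, int]:
    """Return (k, hits) for the largest top-k list with precision >= min_precision.

    ranked_correct[i] is True when the i-th nearest array has the right tissue.
    """
    best_k, best_hits, hits = 0, 0, 0
    for k, correct in enumerate(ranked_correct, start=1):
        hits += int(correct)
        if hits / k >= min_precision:
            best_k, best_hits = k, hits
    return best_k, best_hits
```

Under this reading, the "Correct retrievals" column would correspond to the second value returned for each tissue.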
Figure 3. Functional classification of blood cell types. (a,b) Maximum-parsimony phylogenetic reconstruction of the hematopoietic lineage using pathprints calculated from (a) human [40] and (b) mouse [41] gene expression experiments. (c) Combined human-mouse tree based on shared informative pathways that resolve trees (a) and (b) and the pathway heat-map. The myeloid (yellow) and lymphoid (purple) branches are indicated, and dark branches represent agreement with the canonical lineage. See Additional file 10 for pathway annotations.
Figure 4. Clinically important self-renewal-associated signature (SRAS) in acute myeloid leukemia (AML). (a) Pathways differentially expressed in stem and non-stem cell profiles in leukemic and normal samples were found in human and mouse experiments. Four common SRAS pathways were identified. (b) The SRAS pathprint scores of patients with AML were significantly associated with survival. (c) A single pathway of interest is highlighted: the overall PGCL2 (α 2u globulin) module is upregulated in normal and cancer stem cells, but individual genes differ between species. This pathway is strongly associated with survival (see Additional file 13).