David Lopez, Dennis Montoya, Michael Ambrose, Larry Lam, Leah Briscoe, Claire Adams, Robert L Modlin, Matteo Pellegrini.
Abstract
BACKGROUND: Molecular signatures are collections of genes characteristic of a particular cell type, tissue, disease, or perturbation. Signatures can also be used to interpret expression profiles generated from heterogeneous samples. Large collections of gene signatures have previously been developed and catalogued in the MSigDB database. In addition, several consortia and large-scale projects have systematically profiled broad collections of purified primary cells, molecular perturbations of cell types, and tissues from specific diseases, and the specificity and breadth of these datasets can be leveraged to create additional molecular signatures. However, to date there are few tools that allow the visualization of individual signatures across large numbers of expression profiles. Signature visualization of individual samples allows, for example, the identification of patient subcategories a priori on the basis of well-defined molecular signatures.
RESULTS: Here, we generate and compile 10,985 signatures (636 newly generated and 10,349 previously available from MSigDB) and provide a web-based Signature Visualization Tool (SaVanT; http://newpathways.mcdb.ucla.edu/savant) to visualize these signatures in user-generated expression data. We show that, using SaVanT, immune activation signatures can distinguish patients with different types of acute infections (influenza A and bacterial pneumonia). Furthermore, SaVanT is able to identify the prominent signatures within each patient group, and to identify the primary cell types underlying different leukemias (acute myeloid and acute lymphoblastic) and skin disorders.
Keywords: Heterogeneous samples; Molecular signatures; Tissue-specific expression; Transcriptomic analysis; Visualization tools
Year: 2017 PMID: 29070035 PMCID: PMC5657101 DOI: 10.1186/s12864-017-4167-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Data sources for SaVanT signatures
| Expression Data Source | Reference | Platform | Normalization | # Signatures Generated |
|---|---|---|---|---|
| Human U133A/GNF1H Gene Atlas (BioGPS) | Su AI et al. (2004) | Affymetrix U133A/GNF1H | fRMA | 84 |
| Mouse MOE430 Gene Atlas (BioGPS) | Lattin JE et al. (2008) | Affymetrix 430 2.0 Array | fRMA | 94 |
| Immunological Genome Project (ImmGen) | Heng TS et al. (2008) | Affymetrix Gene 1.0 ST | Pre-processed | 214 |
| Human Cell Types (Swindell) | Swindell WR et al. (2013) | Affymetrix Genome Plus 2.0 | fRMA | 24 |
| Macrophage Activation | Xue J et al. (2014) | Illumina HumanHT-12 V3.0 | Pre-processed | 80 |
| Primary Cell Atlas | Mabbott NA (2013) | Affymetrix U133 Plus 2.0 | fRMA | 26 |
| Skin Diseases (“DermDB”) | Inkeles MS et al. (2015) | Mixed | fRMA | 23 |
Fig. 1. Constructing ‘Signature-Sample’ Matrix From Expression Data. The SaVanT pipeline converts user-submitted expression data into a signature-sample matrix whose columns are the submitted samples and whose rows are the user-selected molecular signatures. By default (shown above), every cell in this matrix contains the average value of the signature genes for a particular signature-sample combination. The breakdown for an example cell in the signature-sample matrix is shown in red. The matrix value is computed by looking up the genes of a given user-selected signature in the SaVanT database (middle panel) and then averaging the values of these genes in a particular sample of the user-submitted data (left and right panels). Above, samples are designated with numbers, genes with letters, and signatures with Roman numerals.
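The averaging step described in the Fig. 1 caption can be sketched in Python as follows. This is a minimal illustration, not the SaVanT implementation: the function name `signature_sample_matrix`, the dictionary layout, the toy values, and the choice to skip signature genes that are missing from the submitted data are all assumptions made for the example.

```python
from statistics import mean


def signature_sample_matrix(expression, signatures):
    """Average the expression of each signature's genes within each sample.

    expression: {gene: [value in sample 1, value in sample 2, ...]}
    signatures: {signature name: [gene, gene, ...]}
    Returns {signature name: [mean in sample 1, mean in sample 2, ...]}.
    """
    n_samples = len(next(iter(expression.values())))
    matrix = {}
    for name, genes in signatures.items():
        # Assumption: signature genes absent from the user data are skipped.
        present = [g for g in genes if g in expression]
        if not present:
            continue  # nothing to average for this signature
        matrix[name] = [
            mean(expression[g][s] for g in present) for s in range(n_samples)
        ]
    return matrix


# Toy data following the caption's convention: genes are letters,
# samples are positions 1..3, signatures are Roman numerals.
expression = {
    "A": [2.0, 4.0, 6.0],
    "B": [4.0, 4.0, 0.0],
    "C": [9.0, 1.0, 3.0],
    "D": [1.0, 7.0, 5.0],
}
signatures = {"I": ["A", "B"], "II": ["B", "C", "D"], "III": ["A", "Z"]}

result = signature_sample_matrix(expression, signatures)
for name, row in result.items():
    print(name, row)
```

Each row of `result` corresponds to one signature and each position in the row to one sample, mirroring the layout of the signature-sample matrix in the figure.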
Fig. 2. SaVanT Pipeline. In the first step, an expression matrix containing values for genes in several samples is optionally converted to ranked lists of genes in samples or log-transformed. The expression matrix is then converted into a signature-sample matrix as described in Fig. 1 using the selected signatures. Optionally, the signature-sample matrix is converted to differences from mean values, converted to z-scores, and/or clustered to produce a final heatmap.
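The optional transformations named in the Fig. 2 caption can be sketched as small Python functions. This is a hedged illustration only: the function names are hypothetical, and several details are assumptions the source does not specify, namely the log base and pseudocount, the rank direction and tie handling, and the use of the population standard deviation. Differences are taken per signature and z-scores across the whole matrix, following the Fig. 3 caption. Clustering is omitted.

```python
import math
from statistics import mean, pstdev


def log_transform(expression):
    # Assumption: log2 with a pseudocount of 1 (base not stated in the source).
    return {g: [math.log2(v + 1.0) for v in vals] for g, vals in expression.items()}


def rank_transform(expression):
    # Replace each value by the gene's rank within its sample (1 = lowest).
    # Assumption: ties are broken by the order in which genes were supplied.
    genes = list(expression)
    n_samples = len(expression[genes[0]])
    ranked = {g: [0] * n_samples for g in genes}
    for s in range(n_samples):
        ordered = sorted(genes, key=lambda g: expression[g][s])
        for rank, g in enumerate(ordered, start=1):
            ranked[g][s] = rank
    return ranked


def difference_from_mean(matrix):
    # Subtract each signature's mean across samples from its own row.
    out = {}
    for name, row in matrix.items():
        row_mean = mean(row)
        out[name] = [v - row_mean for v in row]
    return out


def zscore_matrix(matrix):
    # Z-scores computed across the entire signature-sample matrix.
    values = [v for row in matrix.values() for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {name: [0.0] * len(row) for name, row in matrix.items()}
    return {name: [(v - mu) / sigma for v in row] for name, row in matrix.items()}


# Toy signature-sample matrix: two signatures, three samples.
matrix = {"I": [1.0, 2.0, 3.0], "II": [4.0, 4.0, 4.0]}
diffs = difference_from_mean(matrix)
zscores = zscore_matrix(matrix)
ranks = rank_transform({"A": [5.0, 1.0], "B": [3.0, 2.0], "C": [8.0, 0.0]})
logged = log_transform({"A": [3.0]})
print(diffs)
print(zscores)
print(ranks)
```

In this sketch the rank or log transform would be applied to the gene-level expression matrix before averaging, and the difference or z-score step to the resulting signature-sample matrix, matching the order given in the caption.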
Fig. 3. SaVanT Distinguishes Between Patients, Cell Types, and Underlying Biology. (a) SaVanT output for expression data from acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) patients. ‘Signature value’ refers to the average of gene expression values in a signature. Z-scores (across the entire signature value matrix) are shown for both heatmaps. (b) SaVanT output for expression data from 99 patients with acute infections (either Influenza A or bacterial pneumonia). The infection type for each patient is represented by a hatched circle (Influenza A) or filled triangle (bacterial pneumonia). The numbers below each cluster quantify the proportion of infection types. The difference between the signature values and the average signature value per signature is shown. (c) SaVanT output for expression data from different skin diseases.
Table 2. Previously published tools for the analysis of gene signatures
| Tool/Resource | Analysis Objective | Number of Signatures or Datasets | Number of Samples Analyzed | Input | Output | Interface and Requirements | Runtime |
|---|---|---|---|---|---|---|---|
| SaVanT | Visualization of molecular signatures across samples | 10,985 signatures | 1–150 samples | Gene expression matrix (gene symbols and values) | Interactive heatmap | Website/Browser | 75 s (50 samples, 25,219 genes, 4,729 signatures, with ANOVA) |
| GSEA | Identification of significant or differential gene sets and signatures | User-defined; MSigDB supported (up to 18,026 gene sets) | Two or more biological states (with replicates) | Expression dataset and phenotype data | Enrichment plots and lists | Java Archive Download | 4 min (50 samples, 4,729 signatures, 9,096 genes, 1,000 permutations) |
| BubbleGUM | Extraction and visualization of molecular signatures and gene sets | User-defined; MSigDB supported (up to 18,026 gene sets) | 2+ samples | GCT file (expression dataset) and phenotype data | Graphical plots | Java Archive Download | 5 min (13 samples, 75 signatures, 1,000 permutations) |
| GSVA | Estimation of variation in pathway and signature genes across samples | User-defined; MSigDB supported (up to 18,026 gene sets) | 2+ samples | Gene expression matrix and gene set data | Score matrix of enrichment scores | R package (Bioconductor) | 3 min (30 samples, 100 gene sets, 20,000 genes) |
| PLAGE | Quantification of pathway activity across samples | 400 pathways from KEGG | 2+ samples | Gene expression matrix | Heatmap of pathway activity levels | Website/Browser | N/A (Could not access website) |
| ssGSEA | Determine enrichment of a gene set within dataset | User-defined; MSigDB supported (up to 18,026 gene sets) | 2+ samples | GCT file with expression estimates | Matrix of enrichment projections | R package (or via browser using GenePattern) | 2 min (50 samples, 326 gene sets, 10,100 genes) |