Michael P Schroeder, Abel Gonzalez-Perez, Nuria Lopez-Bigas.
Abstract
Cancer genomics projects employ high-throughput technologies to identify the complete catalog of somatic alterations that characterize the genome, transcriptome and epigenome of cohorts of tumor samples. Examples include projects carried out by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). A crucial step in the extraction of knowledge from the data is the exploration by experts of the different alterations, as well as the multiple relationships between them. To that end, the use of intuitive visualization tools that can integrate different types of alterations with clinical data is essential to the field of cancer genomics. Here, we review effective and common visualization techniques for exploring oncogenomics data and discuss a selection of tools that allow researchers to effectively visualize multidimensional oncogenomics datasets. The review covers visualization methods employed by tools such as Circos, Gitools, the Integrative Genomics Viewer, Cytoscape, Savant Genome Browser, StratomeX and platforms such as cBio Cancer Genomics Portal, IntOGen, the UCSC Cancer Genomics Browser, the Regulome Explorer and the Cancer Genome Workbench.
Year: 2013 PMID: 23363777 PMCID: PMC3706894 DOI: 10.1186/gm413
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Tools and resources for visualizing multidimensional cancer genomics data
| Name | Description | Visualization type | Tool type | Data that can be visualized |
|---|---|---|---|---|
| cBio Cancer Genomics Portal | Resource for visualizing TCGA and other data sets with many features, of which the network viewer and OncoPrint are of special interest. In the network viewer, the portal overlays multidimensional genomics data onto the nodes that represent genes. This provides the frequency of mutations and copy number alterations (and optionally, mRNA up-/downregulation). OncoPrint shows the same alteration data in a matrix heatmap | Networks | Web tool | Pre-calculated TCGA and other data sets |
| CircleMap | Tool that produces heatmaps with a circular layout. Different data sets coming from the same samples can be plotted as different layered circles that form a node. The data layers are plotted maintaining the sample order, which can be adjusted by the user | Circular heatmaps | Command line application | Any user-prepared data |
| Circos | Tool for visualizing data and information in a circular layout. It allows intuitive exploration of the relationships between genomic positions, which are depicted as ribbons. Different genomic data types can be represented in different layers of the circle. To a great extent, the color code and plot style for each layer (or data set) can be adjusted by the user | Circular genomic coordinates | Command line application | Any user-prepared data |
| StratomeX | Tool prepared for the visualization of interdependencies between multiple datasets. It allows exploration of relationships between multiple groupings and different datasets. It can cluster genomics data of different alterations and represents them as matrix heatmaps. The different groupings are connected by ribbons whose width corresponds to the number of samples shared by the connected clusters. Clinical data and pathway maps can be integrated to characterize the clusters | Matrix heatmap with option to visualize pathway maps | Desktop application (Java) | Any user-prepared data (matrices, clusterings). Prepared TCGA data available at |
| Cytoscape | Software for visualizing complex networks and integrating these with any type of attribute data such as genomics data and clinical patient information. An extensive library of community-developed plugins is available, some of which (for example, Reactome FIs) focus on cancer data analysis | Networks | Desktop application (Java) | The stand-alone application supports any user-prepared network or attribute data. Additional data are available via various plugins (for example, GeneMANIA) |
| Genomica | Tool that can be used to analyze and visualize genomic data. Data can be visualized as heatmaps or along genomic coordinates. Module maps and module networks can be created from expression data and can integrate gene expression data, DNA sequence data, and gene and experiment annotations | Matrix heatmap; genomic coordinates | Desktop application (Java) | User-prepared data |
| Gitools | Tool for analysis and visualization of genomic data using interactive heatmaps. It allows loading of multidimensional matrices (with several values per cell), and thus is very well suited for the visualization and exploration of multidimensional cancer genomics data. It contains several analyses and options that are specifically designed for the exploration of cancer genomics data | Matrix heatmap with interactive features | Desktop application (Java) | Any user-prepared data and data imported from IntOGen |
| Integrative Genomics Viewer (IGV) | Visualization tool for interactive exploration of integrated genomics datasets, with a focus on good performance when working with large data sets. All tracks can be annotated with color-coded sample and clinical information; genomic regions can be annotated with text labels. All of the common genomic file formats are supported, including array-based data, next-generation sequence data formats and genomic annotations | Genomic coordinates | Desktop application (Java) | User-prepared data and data from the IGV server, including some TCGA data. In addition, IGV can be accessed from external tools such as GenePattern |
| IntOGen | Resource that is used to analyze and visualize cancer genomics data, including expression, copy number variation and somatic mutation data from cancer genomic projects. Various visualization options are offered, including web-interactive heatmaps (using jheatmap) | Matrix heatmaps with interactive features | Web tool | Pre-calculated data from more than 300 cancer genomic experiments and user-prepared data for somatic mutations in tumors |
| NAViGaTOR | Tool for visualizing and analyzing protein-protein interaction networks (Network Analysis, Visualization and Graphing TORonto). The network visualization options can be customized to represent genomic data properties by automatically mapping attribute values to visual properties | Networks | Desktop application (Java) | User-prepared data. Data can also be loaded via plugins from multiple portals (such as Reactome) |
| Regulome Explorer | Tool for the integrative exploration of associations between clinical and molecular features of data from the TCGA project. The visualization is interactive and the displayed data can be filtered according to different criteria. Visualization options include circular and linear genomic coordinates and networks | Circular and linear genomic coordinates | Web tool | Pre-calculated TCGA data |
| Savant Genome Browser | Desktop visualization and analysis browser for genomics data. This tool was primarily developed for the effective visualization of large sets of high-throughput sequencing data, similar to IGV. Multiple visualization modes enable the exploration of genome-based sequence, points, intervals, or continuous datasets. Plugins are available, including one for WikiPathways | Genomic coordinates | Desktop application (Java) | User-prepared data or data that can be downloaded through plugins such as the UCSC Explorer plugin |
| Cancer Genome Workbench (CGWB) | Host for mutation, copy number, expression, and methylation data from a number of projects. It has tools for visualizing sample-level genomic and transcription alterations in various cancers. The main viewers in CGWB are Integrated tracks view, Heatmap view and Bambino, an alignment viewer. The interface of CGWB is based on the UCSC Genome Browser | Genomic coordinates | Web tool | Pre-calculated data from various resources (such as Cosmic, NCI-60 and TCGA) |
| UCSC Cancer Genomics Browser | Tool for hosting, visualizing, and analyzing cancer genomics datasets. The browser can display genome-wide experimental measurements for multiple samples, which can originate from multiple data sets alongside their associated color-coded clinical information. The browser provides interactive views of data from genomic regions to annotated biological pathways and user-contributed collections of genes. Integrated statistical tools provide quantitative analysis within all available datasets | Genomic coordinates | Web tool | TCGA data and data from independent publications available from the UCSC server. In addition to open access to public datasets, the browser provides controlled access to private project data |
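The table describes groupings connected by ribbons whose width corresponds to the number of samples shared by the connected clusters. A minimal sketch of that count is given here, assuming each grouping is available as a mapping from sample ID to cluster label. The function name `ribbon_widths` and the toy sample and cluster labels are hypothetical and are not taken from any of the tools listed.

```python
from collections import Counter


def ribbon_widths(grouping_a, grouping_b):
    """Count the samples shared by each pair of clusters from two groupings.

    Each grouping maps a sample ID to a cluster label. Only samples that
    appear in both groupings contribute to a ribbon.
    """
    shared_samples = grouping_a.keys() & grouping_b.keys()
    return Counter((grouping_a[s], grouping_b[s]) for s in shared_samples)


# Hypothetical toy data: the same five samples clustered on two data types.
expression_clusters = {"S1": "E1", "S2": "E1", "S3": "E2", "S4": "E2", "S5": "E2"}
copy_number_clusters = {"S1": "C1", "S2": "C2", "S3": "C2", "S4": "C2", "S5": "C1"}

widths = ribbon_widths(expression_clusters, copy_number_clusters)
for (cluster_a, cluster_b), n_shared in sorted(widths.items()):
    print(f"{cluster_a} -- {cluster_b}: {n_shared} shared sample(s)")
```

Each key of `widths` is one ribbon between a cluster of the first grouping and a cluster of the second; pairs of clusters with no samples in common simply do not appear.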
Figure 1. Cancer genomics projects generate multidimensional data for a cohort of patients. Different technological platforms will screen for different genomic and epigenomic changes in each patient, generating multidimensional data sets. The data are usually represented by clinical data along with one or more of the three main types of visualization tools: genomic coordinates, matrix heatmaps and networks.
Figure 2. Screenshots of tools that are frequently used in cancer genomics research, distributed according to their visualization principles. Each of the three visualization methods (matrix heatmaps, genomic coordinates and networks) is associated with a point of the triangle. Tools that are placed close to one of these points mainly use the visualization method associated with that point; those placed in between use a mixed-model visualization method.
Figure 3. Four case studies are represented using one or several of the major visualization methods applied in oncogenomics. (a) Heatmap of oncogenomic alterations ordered by mutual exclusivity, plotted with Gitools. In the upper half of the image, colors indicate the type of alteration: mutations (green), CNA gain (red) and CNA loss (blue). The heatmap below shows expression data (high expression in red and low expression in green) for the same samples and genes, allowing the visual observation that genomic regions whose copy number is amplified tend to have higher expression values. (b) The same data as in (a), with the same color code for alterations, represented as a network of functional interactions between the genes, extracted from the cBio Cancer Genomics Portal. The halo around the four selected nodes is divided into three sectors. Changes in the proportion of samples with altered copy number are indicated in red (gain) or blue (loss) in the top sector, whereas changes in the proportion of samples with mutations are indicated in green in the lower-right sector. Expression changes are shown in light red (increase) and light blue (decrease) in the lower-left sector of the halo. Panels (c-e) include clinical information. Each tumor sample is assigned to one of four subtypes of glioblastoma, color-coded as dark green (classical), light green (mesenchymal), orange (neural) and red (proneural). (c) Heatmap of pathway expression levels plotted with Gitools. Each column is a tumor sample. The subtype is represented in colors in the top row and each row represents a biological pathway. The color of each cell indicates the Z-score of the sample level enrichment analysis (SLEA) of the pathway in the sample. Clear differences in the expression values in different pathways can be observed for different cell subtypes. (d) Same data as in (c) represented in the form of a network, drawn using CircleMap. Each node is a pathway and its edges indicate functional interactions between pathways as extracted from KEGG. The two halos around each node indicate the Z-score of the pathway in each sample and the clinical subtype. (e) CNA and expression data for the EGFR gene region of glioblastoma samples as shown by IGV. The top part of the plot indicates the genomic position being displayed. Each sample is shown as a horizontal track, ordered by clinical subtype. Within each clinical subtype, the tracks in the upper half illustrate CNA whereas those below show expression. This visualization reveals clear differences in the CNA and expression of the EGFR locus in different clinical subtypes. (f) Adaptations of Circos plots of three breast tumors with three very different alteration landscapes. The four circles in each plot, from outermost inwards, represent the human chromosomes, mutations, copy number alterations, and structural rearrangements.
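Panel (a) orders a heatmap of alterations by mutual exclusivity. One common way to obtain such an ordering is sketched below, under the assumption that alterations are reduced to a binary gene-by-sample matrix: genes are ranked by alteration frequency and samples are then sorted lexicographically on their alteration pattern. This is an illustrative approach and is not claimed to be the procedure Gitools implements; the function name, gene labels and sample labels are hypothetical toy values.

```python
def sort_by_mutual_exclusivity(matrix, genes, samples):
    """Order genes by alteration frequency, then samples lexicographically.

    matrix[gene][sample] is 1 if the gene is altered in the sample, else 0.
    Returns (ordered_genes, ordered_samples).
    """
    # Most frequently altered genes go to the top rows.
    ordered_genes = sorted(
        genes, key=lambda g: sum(matrix[g].values()), reverse=True
    )
    # Samples altered in the top gene come first, then samples altered
    # in the second gene, and so on, which yields a staircase pattern.
    ordered_samples = sorted(
        samples,
        key=lambda s: tuple(matrix[g][s] for g in ordered_genes),
        reverse=True,
    )
    return ordered_genes, ordered_samples


# Hypothetical toy data (not taken from the figure).
samples = ["S1", "S2", "S3", "S4", "S5", "S6"]
genes = ["G1", "G2", "G3"]
alterations = {
    "G1": {"S1": 0, "S2": 1, "S3": 0, "S4": 1, "S5": 0, "S6": 0},
    "G2": {"S1": 1, "S2": 0, "S3": 1, "S4": 0, "S5": 1, "S6": 0},
    "G3": {"S1": 0, "S2": 0, "S3": 0, "S4": 0, "S5": 0, "S6": 1},
}

ordered_genes, ordered_samples = sort_by_mutual_exclusivity(
    alterations, genes, samples
)
for gene in ordered_genes:
    row = " ".join(str(alterations[gene][s]) for s in ordered_samples)
    print(f"{gene}: {row}")
```

When alterations are largely mutually exclusive, the sorted matrix shows a diagonal staircase of altered cells, which is the visual cue a reader looks for in this kind of heatmap.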