William Dunn, Anita Burgun, Marie-Odile Krebs, Bastien Rance.
Abstract
The unprecedented advances in technology and scientific research over the past few years have provided the scientific community with new and more complex forms of data. Large data sets collected by single groups or cross-institution consortia, containing hundreds of omic and clinical variables for thousands of patients, are becoming increasingly commonplace in the research setting. Before any core analyses are performed, visualization often plays a key role in the initial phases of research, especially in projects where no initial hypothesis is dominant. Proper high-level visualization of data helps researchers find trends, identify outliers and perform quality checks. In addition, research has shown that visualization plays an important role in data analysis, with benefits for our understanding of disease and, ultimately, for patient care. In this work, we review the current landscape of tools designed to facilitate the visualization of multidimensional data in translational research platforms. Specifically, we searched the biomedical literature for translational platforms that allow the visualization and exploration of clinical and omics data, and identified 11 platforms: cBioPortal, the interactive genomics patient stratification explorer (iGPSe), Igloo-Plot, the Georgetown Database of Cancer Plus (G-DOC Plus), tranSMART, an unnamed data-cube-based model supporting heterogeneous data, Papilio, Caleydo Domino, Qlucore Omics Explorer, Oracle Health Sciences Translational Research Center and OmicsOffice® powered by TIBCO Spotfire. In a health sector that is continuously witnessing an increase in data from multifarious sources, visualization tools that help researchers grasp these data will grow in importance, and we believe our work will be useful in guiding investigators in similar situations.
Keywords: data analytics; high-dimensional data; omics; translational research; visualization
Year: 2017 PMID: 27585944 PMCID: PMC5862238 DOI: 10.1093/bib/bbw080
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. A sampling of commonly used visualization techniques for multidimensional data, using a subset of our data set, which compiles data from three groups of patients. Var1, Var2 and Var3 are neurocognitive dimensions, Var4 and Var5 are psychopathological dimensions and Var6 is a global genetic index. The visualizations shown are (A) a dynamic pivot table (R package ‘rpivotTable’), (B) a correlation matrix (R package ‘PerformanceAnalytics’), (C) a heatmap clustered by rows and columns (R package ‘gplots’), (D) a 3D scatterplot using color and size (R package ‘scatterplot3d’) and (E) parallel coordinates showing all data (D3 JavaScript library ‘d3.parcoords.js’ [21]). A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
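Panels (B) and (E) rest on simple per-variable computations. The sketch below is a minimal Python illustration, not the R or D3 code used to draw the figure: it computes a Pearson correlation matrix, the kind of summary shown in panel (B), and the min-max scaling that parallel-coordinates plots commonly apply so that every axis spans [0, 1]. The five-patient values for Var1 to Var3 are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations between named columns (dict: name -> values)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def min_max_scale(values):
    """Rescale one variable to [0, 1], as parallel-coordinates axes commonly do."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # constant column: place every value mid-axis
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values for five patients; not data from the study.
data = {
    "Var1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Var2": [2.0, 4.1, 5.9, 8.2, 9.9],
    "Var3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

corr = correlation_matrix(data)
scaled = {name: min_max_scale(vals) for name, vals in data.items()}
print(round(corr["Var1"]["Var2"], 3), round(corr["Var1"]["Var3"], 3))
print(scaled["Var3"])
```

Scaling each axis independently is what lets variables with very different units (for example a cognitive score and a genetic index) share one parallel-coordinates plot.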
Summary of visualization programs for multidimensional data that can be applied to user-provided datasets. For each tool reviewed, we evaluated a number of features, organized into the following categories: General information and Licensing. PoC = Proof of Concept.
| Category | Item | cBioPortal | iGPSe | Igloo-Plot | tranSMART | G-DOC Plus | Data-cube-based model supporting heterogeneous data | Papilio | Caleydo Domino | Qlucore Omics Explorer | Oracle Health Sciences Translational Research Center | OmicsOffice® powered by TIBCO Spotfire |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| General information | PMID or article reference | 22588877 | 25000928 | 24444495 | 25717408 | 27130330 | 25248201 | Steenwijk et al. 2010 | 26356916 | NA | NA | NA |
| | Initial release year | 2012 | 2014 | 2014 | 2012 | 2016 | 2014 | 2010 | 2014 | 2007 | 2011 | 1996 |
| | URL | cbioportal.org | osumo.org/#process | metagenomics.atc.tcs.com/IglooPlot | transmartfoundation.org | gdoc.georgetown.edu | NA | NA | caleydo.org/tools/domino | qlucore.com | oracle.com/us/industries/health-sciences/hs-cohort-explorerds-1672120.pdf | cambridgesoft.com/ensemble/spotfire/OmicsOffice/ |
| | Reference | github.com/cBioPortal/cbioportal | osumo.org | metagenomics.atc.tcs.com/IglooPlot/walkthrough.html | wiki.transmartfoundation.org | NA | NA | NA | github.com/Caleydo/org.caleydo.view.domino | qlucore.com/documentation | oracle.com/us/industries/health-sciences/hs-cohort-explorerds-1672120.pdf | scistore.cambridgesoft.com/ScistoreProductPage.aspx?ItemID=8541 |
| | Data housing | MySQL | Apache server | Internal memory from loaded data | Any relational database management system (e.g. Oracle, PostgreSQL) | Oracle | Internal C++ data structures from data | SQLite | Internal memory from loaded data | Internal memory from loaded data | SAS Cloud or on premises (MySQL) | Cloud or on premises (Oracle or SQL) |
| | Principal front-end and/or back-end programming languages | Java and Spring in the back end; JavaScript with libraries such as D3 and jQuery in the front end | JavaScript, d3.js, R | Perl/Tk | Grails, Java | Groovy & Grails, Adobe Flex, JavaScript | C++, using a framework based on OpenGL and Qt4 | C++ | Java, OpenGL/JOGL | C++ | Oracle ADF/Java EE on the front end, with hooks into Oracle BI. The back end is the Oracle stack for the data and middle tiers: Oracle DB, Oracle BIFS and Oracle WebLogic in a Java EE environment | .NET/C# with code in IronPython, R and, in some cases, C/C++ |
| | Current status | In use | PoC | In use | In use | In use | PoC | PoC | In use | In use | In use | In use |
| | Dedicated domain | Exploration of large-scale cancer genomics data sets | Integrative genomics-based cancer patient stratification | General visualization of multidimensional datasets | Hypothesis generation, hypothesis validation and cohort discovery in translational research | Integrative analysis of various data types to uncover disease mechanisms | Exploration of heterogeneous data in clinical cohorts | Exploration of heterogeneous data in clinical cohorts | General visualization of multidimensional datasets | Visualization, exploration and analysis of bioinformatics data | Data aggregation, integration and data cleaning for clinical cohort studies | Start-to-end genomics data analysis |
| Licensing | Software availability | Open source (GNU Affero General Public License, version 3) | Open source | Open source | Open source | Open source | NA | NA | Open source (BSD License) | Fee-based | Fee-based | Fee-based |
| | Client-side interface | Web browser | Web browser | Stand-alone for Linux or Windows | Web browser | Web browser | Stand-alone stand-alone (Trolltech Qt interface) | Web browser or stand-alone | Stand-alone | Web browser | Stand-alone | |
| | User mailing list or support | Yes | No | No | Yes | Yes | No | No | No | Yes | Yes | Yes |
Figure 2. Overview of tranSMART. In a typical workflow, users define subsets of patients by dragging and dropping variables from the right column into the appropriate boxes (A). In this example, the summary statistics view (B) shows the difference in age between patients with two genotypes (subsets 1 and 2, respectively) of a candidate gene. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
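The comparison in panel (B) amounts to defining two patient subsets by a criterion and summarizing one variable within each. A minimal Python sketch, assuming hypothetical patient records with made-up `genotype` and `age` fields; it is not tranSMART's API, and Welch's t statistic is included only as one common way to quantify such a difference in means.

```python
import math
from statistics import mean, stdev

# Hypothetical patient records; field names and genotype labels are illustrative only.
patients = [
    {"id": 1, "genotype": "AA", "age": 34},
    {"id": 2, "genotype": "AA", "age": 41},
    {"id": 3, "genotype": "AG", "age": 29},
    {"id": 4, "genotype": "AA", "age": 38},
    {"id": 5, "genotype": "AG", "age": 25},
    {"id": 6, "genotype": "AG", "age": 31},
]

def subset(records, predicate):
    """Select the patients matching a criterion, like one subset box in a cohort explorer."""
    return [r for r in records if predicate(r)]

def summarize(values):
    """Basic summary statistics for one subset."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}

def welch_t(a, b):
    """Welch's t statistic for a difference in means, allowing unequal variances."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(va + vb)

# Subset 1 and subset 2, defined by genotype.
ages1 = [r["age"] for r in subset(patients, lambda r: r["genotype"] == "AA")]
ages2 = [r["age"] for r in subset(patients, lambda r: r["genotype"] == "AG")]
print(summarize(ages1))
print(summarize(ages2))
print(round(welch_t(ages1, ages2), 2))
```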
Figure 3. A demonstration of Caleydo Domino exploring a set of multiple tabular data sets from a music data set containing song and musician information. This figure displays the main user interface of the program, where users can drag and position data subsets and choose which calculations or visualizations to use to explore the data and the relationships between them [63]. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
Figure 4. A demonstration of StratomeX exploring a set of multiple tabular data sets from the TCGA clear cell renal carcinoma data set. This figure displays the main user interface of the program, where users can drag and position data subsets and choose which calculations or visualizations to use to explore the data and the relationships between them. Here, users can visualize the relationship between patient subtypes derived from two different genomic clustering experiments [65]. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
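In StratomeX, each clustering result is drawn as a column of blocks, and the bands linking blocks in neighboring columns reflect how many patients the two clusters share. A minimal Python sketch of that underlying count, assuming invented patient IDs and cluster labels; it computes a plain contingency table and is not StratomeX code.

```python
from collections import Counter

# Hypothetical subtype assignments for the same eight patients from two clustering runs.
clustering_a = {"P1": "A1", "P2": "A1", "P3": "A2", "P4": "A2",
                "P5": "A2", "P6": "A3", "P7": "A3", "P8": "A1"}
clustering_b = {"P1": "B1", "P2": "B1", "P3": "B2", "P4": "B2",
                "P5": "B1", "P6": "B2", "P7": "B2", "P8": "B1"}

def cross_tabulate(a, b):
    """Count the patients shared by each pair of clusters from two clusterings."""
    shared = a.keys() & b.keys()  # only patients present in both experiments
    return Counter((a[p], b[p]) for p in shared)

table = cross_tabulate(clustering_a, clustering_b)
for (ca, cb), n in sorted(table.items()):
    print(f"{ca} -> {cb}: {n} patients")
```

A band width proportional to each count makes it easy to see, for instance, that cluster A2 splits across B1 and B2 while A1 maps entirely onto B1.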
Summary of visualization programs for multidimensional data that can be applied to user-provided datasets. For each tool reviewed, we evaluated a number of features, organized into the following categories: Information content supported, Visualization and Data exploration. PoC = Proof of Concept.
| Category | Subcategory | Item | cBioPortal | iGPSe | Igloo-Plot | tranSMART | G-DOC Plus | Data-cube-based model supporting heterogeneous data | Papilio | Caleydo Domino | Qlucore Omics Explorer | Oracle Health Sciences Translational Research Center | OmicsOffice® powered by TIBCO Spotfire |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Information content supported | Clinical | Demographics | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| | | Diagnosis | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| | | Biology | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| | | Survival | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| | | Imaging | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes |
| | Omics | Gene mutation | Yes | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| | | mRNA | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| | | Other | Methylation, protein and phosphoprotein data | miRNA | NA | NA | NA | NA | NA | NA | Methylation, protein expression, flow cytometry | Methylation | RNA sequencing, chromatin immunoprecipitation sequencing, qPCR |
| | Other | Any type of raw or processed data that corresponds one-to-one to a sample | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
| Visualization | High dimensional | Heatmap | Yes (through IGV) | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes |
| | | Correlation matrix | No | No | Yes | Yes | No | Yes | No | Yes | No | No | Yes |
| | | Parallel coordinates | No | No | No | No | No | No | Yes | Yes | No | No | Yes |
| | | Other | OncoPrinter | Parallel sets, silhouette plot, Sankey plot, force-directed graphs | NA | Waterfall plot, PCA plot, Haploview, Manhattan plot, Forest plot, Frequency plot for aCGH | Biological network and pathway viewers (Reactome, Cytoscape), integrated genome browser (JBrowse) | NA | Scatterplots color-coded by patient type overlaid with PCA ellipses | Parallel sets, Sankey diagrams and more novel graphics | Sample PCA, variable PCA | Requires business intelligence layer for visualization | Pathway viewer, 3D scatterplot, map chart, treemap |
| | Low dimensional | Timeline/line chart | No | No | No | Yes | No | No | Yes | No | Yes | Yes | Yes |
| | | Histograms | Yes | No | No | Yes | No | Yes | No | Yes | Yes | Yes | Yes |
| | | Scatterplots | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | No | Yes |
| | | Kaplan–Meier survival plot | Yes | Yes | No | Yes | Yes | No | No | Yes | Yes | No | No |
| | | Bar charts/box and whisker | Yes | No | No | Yes | Yes | No | No | Yes | Yes | Yes | Yes |
| | | Pie charts | No | Yes | No | Yes | No | No | No | Yes | No | Yes | Yes |
| | | Other | MutationMapper, volcano plot | NA | Novel semi-circle plotting approach based on correlation and Hooke's law | NA | Interactive 3D molecular viewer, chromosome and CNV visualizations, Venn diagram | Atlas view representing areas of the brain implicated in analyses | NA | NA | NA | NA | Volcano plot |
| | Coordination | Linked views | No | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Data exploration | Statistics and data mining | Statistics | Survival log-rank test, Cytoscape graph viewer for genetic networks | Log-rank test, | Class discovery within data | Logistic regression, correlation, | PCA, differential expression analysis, hierarchical clustering, group comparisons | Correlation statistics between radiology results and cognitive testing, multivariate statistics, multilinear regression, as well as any type of statistics calculated by R in future versions | Basic statistics such as finding differences in measures between two groups; confidence-weighted principal component ellipses | NA | Integrated with programming languages such as R for statistics beyond simple group counts | Line similarity, regression modeling, wide range of parametric and nonparametric statistical tests, functional gene analysis, data classification | |
ANOVA = analysis of variance.
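Kaplan–Meier survival plots, offered by several of the platforms in the table above, draw the estimator S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i the number of patients still at risk just before it. A minimal Python sketch with invented follow-up data; the source does not describe the platforms' own implementations, which may handle ties, confidence intervals or the accompanying log-rank test differently.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every patient sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Censored patients at time t still count as at risk for events at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censorings leave the risk set
    return curve

# Hypothetical follow-up times (e.g. months) and event indicators for six patients.
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Here the curve steps down at each of the four observed event times (2, 3, 5 and 9); the censored observations at times 3 and 8 add no step and only shrink the risk set.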