SUMMARY: In translational research, efficient knowledge exchange between the different fields of expertise is crucial. An open platform that can store a multitude of data types, such as clinical, pre-clinical or OMICS data, combined with strong visual analytical capabilities will significantly accelerate scientific progress by making data more accessible and hypothesis generation easier. The open data warehouse tranSMART can store a variety of data types and has a growing user community that includes both academic institutions and pharmaceutical companies. tranSMART, however, currently lacks interactive and dynamic visual analytics and does not permit any post-processing interaction or exploration. For this reason, we developed SmartR, a plugin for tranSMART that not only equips the platform with several dynamic visual analytical workflows, but also provides its own framework for the addition of new custom workflows. Modern web technologies such as D3.js and AngularJS were used to build a set of standard visualizations that were substantially enhanced with dynamic elements. AVAILABILITY AND IMPLEMENTATION: The source code is licensed under the Apache 2.0 License and is freely available on GitHub: https://github.com/transmart/SmartR. CONTACT: reinhard.schneider@uni.lu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Translational research can be described as an ‘interdisciplinary branch of the biomedical field supported by three main pillars: benchside, bedside and community’ (Cohrs ). One of the most difficult, yet most important, tasks in this field is proper communication and knowledge exchange between the different fields of expertise. An information system that integrates all levels of data (pre-clinical, clinical, OMICS, etc.) generated in research and that provides an interactive interface to explore, visualize and analyze those data will substantially increase the efficiency of knowledge exchange and hypothesis generation.
In the context of the eTRIKS (European Translational Information & Knowledge Management Services) consortium (https://www.etriks.org/, 2017), academia and pharma seek to combine their interdisciplinary knowledge to provide secure data environments and open source tools that help to answer important biological questions and enable the discovery of new scientific facts within existing studies. The tranSMART platform (Athey ) addresses these requirements, supports a multitude of data types, has a well-established community and provides APIs that make connections with a variety of other services possible. Detailed reasons for this choice and comparisons to other existing analytics platforms are described elsewhere (Satagopam ). A major problem currently present in tranSMART is the lack of interactive visual-analytical functionality, which is essential for a collaborative knowledge management platform. In its present state, analytical workflows are restricted to displaying static images generated by the statistical programming language R. The static nature of this approach makes it very difficult to apply any post-processing analysis or to explore further, for example by selecting a certain feature for further investigation.
When confronted with this problem, one might first attempt to use existing web-visualization libraries like Highcharts (http://www.highcharts.com/, 2017), Plotly.js (Plotly Technologies Inc., 2015), or one of the many BioJS (Gómez ) components. Browsing existing heat map implementations or other basic visualizations, one can see that, although visually appealing, they lack deep integration of more than the most basic statistics. Because of the analytical limitations of the web-browser and the missing analytical engine, they cannot re-compute initial input values and therefore often choose not to display many statistics in the first place, in order to preserve their dynamism. Another approach is to combine custom visualizations with an analytical engine, as shown in DIVE (Rysavy ), BRAVIZ (Angulo ), HitWalker2 (Bottomly ), or Shiny (https://shiny.rstudio.com/, 2017).
Using custom visualizations with a supporting analysis component enables the researcher to iteratively explore the data with each analysis step, in contrast to hypothesis-driven research. To make this methodology available to translational researchers, we developed SmartR, a new, highly modular analytical framework for tranSMART that equips the platform with interactive and dynamic visualization capabilities, built using recent web technologies.
2 Materials and methods
The tranSMART platform uses Grails (https://grails.org/, 2017) as a web-framework, which provides a plugin architecture of its own. Therefore, it was a natural choice to use Grails for the back-end of our plugin as well. This has the benefit of direct access to internal services and APIs, which ensures consistent database access across the different Oracle and Postgres versions of the platform and helps to keep maintenance low.
To support the user graphics with non-trivial statistics, for instance clustering information, it was necessary to properly integrate a language for statistical computation, such as R (R Development Core Team, 2011), into Grails. Because Grails uses Groovy, which integrates well with most standard Java libraries, we could use the Java client for Rserve, ‘a TCP/IP server which allows other programs to use facilities of R’ (Urbanek, 2003). In other words, this allows direct read and write access to the RSession via the back-end of our application.
As a base for the front-end, we decided to use the framework AngularJS (https://angularjs.org/, 2017) to enforce an MVC (Model-View-Controller) structure for each workflow. Besides the usual advantages with regard to maintenance and testability, enforcing a specific structure helps to keep the workflows similar to each other, even when multiple developers with different levels of experience contribute. This was a high-priority goal from the outset, because it would allow the formation of a small community that could contribute its own ideas and requirements to the plugin. Another reason for building upon an almost completely decoupled framework, rather than integrating SmartR directly into tranSMART, is the combination of regular changes to the platform’s code base and its long list of partially outdated dependencies.
The visualizations are implemented as AngularJS directives, which enables arbitrary placement of the plots in HTML. Technically, most JavaScript visualization libraries can be used within such a directive, but we focused on the low-level library D3.js, ‘a JavaScript library for manipulating documents based on data’ (Bostock ). While the coding effort to create even basic visualizations is quite high, D3.js gives the developer a high level of freedom for customization and creativity. This allowed us to implement features that we found useful but that were not provided by other visualization libraries. An example of this is the dynamic heat map that we created.
3 Results
The framework itself equips the tranSMART platform with a new analytical engine that is testable, maintainable, and expandable. We also provide a series of prebuilt, commonly used visual-analytical workflows. In the following, we focus on one of these workflows, namely the interactive heat map, as an example to illustrate the interactive and dynamic nature of the platform. Videos, screenshots and links to public test servers for all created visual analytics can be found in the Supplementary Material.
For several decades, heat maps have been a common tool for analyzing gene expression data, but displaying non-static heat maps with the limited resources of a web-browser is a challenge. The SmartR heat map provides a solution by implementing a lazy-loading approach, in which initially only the 100 most significant genes according to user-defined ranking criteria are displayed. This reduction of displayed data lets us treat the single fields of the heat map as movable dynamic elements, rather than as a static image. Doing so leads to several useful features, such as the possibility to change the clustering on-the-fly, to select various color sets for different data types and for accessibility (color blindness), or to sort rows and columns of the heat map. Another feature is the possibility to ‘expand’ the heat map by overlaying non-array, one-dimensional data types (see Fig. 1), e.g. phenotypic data like ‘Age’ (numerical) or ‘Tumor Type—T0’ (categorical). In this way one can directly relate clusters or single samples to certain user-defined groups within the selected cohort(s), leading to a much better understanding of the data across different data types. All displayed genes can be directly linked to external annotation databases like Gene Cards and the EMBL-EBI database. This function allows the user to link the findings to much broader knowledge bases with a single click.
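The lazy-loading step described above, in which only the highest-ranked genes are shown at first, can be illustrated with a minimal sketch. It is not the SmartR implementation: the function name `top_genes`, the toy data and the choice of variance as the ranking criterion are all assumptions made for illustration, since the text only states that the ranking criteria are user defined.

```python
from statistics import pvariance


def top_genes(expression, score=pvariance, limit=100):
    """Return the names of the `limit` highest-scoring genes, best first.

    `expression` maps a gene name to its list of expression values.
    `score` is the ranking criterion; variance is only an assumed default.
    """
    ranked = sorted(expression, key=lambda gene: score(expression[gene]), reverse=True)
    return ranked[:limit]


# Hypothetical toy data: three genes measured in four samples.
expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],
    "GENE_B": [0.2, 3.5, 0.1, 4.0],
    "GENE_C": [2.0, 2.5, 1.5, 2.0],
}

# Only this subset would be handed to the front-end for the initial display.
visible = top_genes(expression, limit=2)
print(visible)
```

Because only the selected subset reaches the browser, each cell can be kept as an individual element that is re-sorted or re-colored without redrawing a full image.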
Similar functionality becomes available when we apply a clustering to the heat map and click on one of the resulting dendrogram nodes. This gathers all genes in the respective sub-tree and triggers a KEGG pathway enrichment analysis via external tools like BioCompendium (http://biocompendium.embl.de/, 2017), which allows us to link a cluster directly to a possibly related KEGG pathway. Besides the interactive heat map, we also provide a few other commonly used analyses: correlation analysis, where the user can select regions on the correlation plot and instantly get an updated analysis, box plots, volcano plots, and line graphs for visualizing longitudinal-like data. We could not exhaust all possible analyses during our implementation, but we would like to emphasize that SmartR not only provides a list of pre-built analyses but also provides a framework for easy implementation of customized workflows. To truly grasp the dynamic nature of this approach, we highly recommend watching the related videos available in the Supplementary Material.
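Gathering the genes below a clicked dendrogram node, as described above, amounts to collecting the leaves of a sub-tree. The following is a minimal sketch under assumed names: the `Node` class, the function `genes_below` and the example gene symbols are hypothetical and do not reflect SmartR's internal data structures. The call to an external enrichment service is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A dendrogram node: leaves carry a gene name, inner nodes carry children."""
    gene: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def genes_below(node):
    """Collect the gene names of all leaves in the sub-tree rooted at `node`."""
    if not node.children:
        return [node.gene]
    genes = []
    for child in node.children:
        genes.extend(genes_below(child))
    return genes


# Hypothetical clustering result with three genes.
root = Node(children=[
    Node(gene="TP53"),
    Node(children=[Node(gene="BRCA1"), Node(gene="ESR1")]),
])

# Simulate a click on the inner node that joins the last two genes.
clicked = root.children[1]
selected = genes_below(clicked)
print(selected)
```

The resulting list is what would be passed on to a pathway enrichment tool.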
Fig. 1.
The SmartR Heat Map. Shown is the interactive heat map in tranSMART based on the breast cancer mRNA data of the GEO study GSE4382 (Sorlie )
Authors: Diego A Angulo; Cyril Schneider; James H Oliver; Nathalie Charpak; Jose T Hernandez Journal: Front Neuroinform Date: 2016-08-23 Impact factor: 4.081
Authors: Therese Sorlie; Robert Tibshirani; Joel Parker; Trevor Hastie; J S Marron; Andrew Nobel; Shibing Deng; Hilde Johnsen; Robert Pesich; Stephanie Geisler; Janos Demeter; Charles M Perou; Per E Lønning; Patrick O Brown; Anne-Lise Børresen-Dale; David Botstein Journal: Proc Natl Acad Sci U S A Date: 2003-06-26 Impact factor: 12.779
Authors: J Christoph; C Knell; A Bosserhoff; E Naschberger; M Stürzl; M Rübner; H Seuss; M Ruh; H-U Prokosch; B Sedlmayr Journal: Appl Clin Inform Date: 2017-12-21 Impact factor: 2.342
Authors: Andrew P Cope; Michael R Barnes; Alexandra Belson; Michael Binks; Sarah Brockbank; Francisco Bonachela-Capdevila; Claudio Carini; Benjamin A Fisher; Carl S Goodyear; Paul Emery; Michael R Ehrenstein; Neil Gozzard; Ray Harris; Sally Hollis; Sarah Keidel; Marc Levesque; Catharina Lindholm; Michael F McDermott; Iain B McInnes; Christopher M Mela; Gerry Parker; Simon Read; Ayako Wakatsuki Pedersen; Frederique Ponchel; Duncan Porter; Ravi Rao; Anthony Rowe; Peter Schulz-Knappe; Matthew A Sleeman; Deborah Symmons; Peter C Taylor; Brian Tom; Wayne Tsuji; Denny Verbeeck; John D Isaacs Journal: Nat Rev Rheumatol Date: 2017-12-07 Impact factor: 32.286