Axel Schumacher, Tamas Rujan, Jens Hoefkens.
Abstract
The integration and analysis of large datasets in translational research have become increasingly challenging. We propose a collaborative approach that integrates established data management platforms with existing analytical systems to close the gap in the value chain between data collection and data exploitation. In particular, our proposal ensures data security and supports widely distributed teams of researchers. As a successful example of such an approach, we describe the implementation of a single unified platform that combines the capabilities of the knowledge management platform tranSMART and the data analysis system Genedata Analyst™. The combined end-to-end platform helps researchers quickly find, enter, integrate, analyze, extract, and share patient- and drug-related data in the context of translational R&D projects.
Keywords: Data analytics; Data sharing; Integration; Omics; Scalability; Translational research
Year: 2014 PMID: 27294023 PMCID: PMC4888831 DOI: 10.1016/j.atg.2014.09.010
Source DB: PubMed Journal: Appl Transl Genom ISSN: 2212-0661
Fig. 1. Example of a data-sharing and big-data analytics value chain in translational medicine. The collection of large volumes of structured phenotypic data and its integration with abundant omics data add new dimensions and challenges for the management, analysis, and visualization of this information. Clinical electronic data capture systems (EDCs, such as OpenClinica or REDCap) may feed patient data into tranSMART for data integration. An in-depth analysis of the data can then be performed in Genedata Analyst, an established system for the integrated analysis of high-dimensional omics data in the context of low-dimensional (clinical) sample information that is often used in translational research projects. It enables scientists to analyze experiments efficiently by applying rigorous statistical algorithms combined with intuitive, interactive data visualization tools. Using a built-in scripting engine, Analyst standardizes and automates complex and time-consuming data analysis processes. Via a flexible application programming interface (API), the Analyst platform also makes it possible to use popular open-source tools such as the R/Bioconductor environment for downstream analyses (Gentleman et al., 2004). Overall, the platform can reduce the time needed to import, export, integrate, and analyze complex data from days to minutes. (ETL = Extract, Transform, and Load process for loading raw source data into tranSMART.) The APIs have been integrated into tranSMART and are freely available to the research community. The statistical analysis software itself is commercial software available for licensing.
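The ETL step mentioned in the caption above can be illustrated with a minimal, generic sketch. It is not the tranSMART loader: the CSV columns, the `patient` table, and the function names `extract`, `transform`, and `load` are hypothetical and chosen only for illustration, and an in-memory SQLite database stands in for the target data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an EDC system; column names are illustrative only.
RAW_CSV = """patient_id,age,sex,weight_lb
P001,54,f,150
P002,,M,198
P003,61,m,176
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows missing age, normalize sex codes, convert lb to kg."""
    cleaned = []
    for row in rows:
        if not row["age"]:
            continue  # skip incomplete records rather than guessing values
        cleaned.append((
            row["patient_id"],
            int(row["age"]),
            row["sex"].upper(),
            round(float(row["weight_lb"]) * 0.45359237, 1),
        ))
    return cleaned


def load(conn, records):
    """Load: write cleaned records into a toy table (NOT the tranSMART schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient "
        "(patient_id TEXT PRIMARY KEY, age INTEGER, sex TEXT, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT patient_id, age, sex, weight_kg FROM patient ORDER BY patient_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions means each can be checked or replaced on its own, for example by swapping the CSV reader for an export from a different capture system.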
Fig. 2. Example of a bidirectional data exploration workflow using the tranSMART/Analyst platform. The platform is designed to improve the understanding of large amounts of raw (unstructured) data by combining and comparing them with structured data sets (which fit readily into a relational database and can be searched with simple search algorithms). Left: The advanced workflow tab in the tranSMART dataset explorer contains a button (red rectangle) that makes clinical data available for statistical analyses in Analyst through drag and drop. 1: The patient cohorts to be examined are selected, and further inclusion criteria can be defined. 2: Multi-omics, high-dimensional data (e.g., microarray or NGS data) are selected and transferred into the Analyst GUI with a single mouse click. 3: For curation purposes, imported data can be run through preprocessing and quality control steps. 4: Filtered data are selected for in-depth analyses such as PCA, correlation, network analyses, logistic regression, ANOVA, time-series analysis, clustering, annotation analyses, pathway mapping, and partial least squares (PLS) analysis. Third-party tools such as R scripts can also be integrated. 5: The results of the analysis can be saved directly back into tranSMART via a menu button. At any point, more clinical or annotation data can be pulled from tranSMART (and other sources such as GEO or ArrayExpress) into Analyst, and vice versa. 6: Analyzed and curated data are available for further storage, sharing, and analyses in tranSMART.
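Steps 1 to 4 of the workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use the tranSMART or Analyst APIs, and the `clinical` and `expression` tables, their values, and the helper names `select_cohort`, `quality_filter`, and `pearson` are all hypothetical. Of the analyses listed in the caption, the sketch shows only a simple Pearson correlation.

```python
from math import sqrt

# Hypothetical clinical table: patient -> attributes (illustrative values only).
clinical = {
    "P1": {"diagnosis": "case", "age": 50},
    "P2": {"diagnosis": "case", "age": 60},
    "P3": {"diagnosis": "control", "age": 55},
    "P4": {"diagnosis": "case", "age": 70},
}

# Hypothetical expression matrix: gene -> {patient: value}; None marks a missing value.
expression = {
    "GENE_A": {"P1": 1.0, "P2": 2.0, "P3": 9.0, "P4": 3.0},
    "GENE_B": {"P1": 4.0, "P2": None, "P3": 1.0, "P4": 2.0},
}


def select_cohort(table, diagnosis):
    """Step 1: keep the patients that match an inclusion criterion."""
    return sorted(p for p, attrs in table.items() if attrs["diagnosis"] == diagnosis)


def quality_filter(matrix, patients):
    """Steps 2-3: pull cohort values per gene; drop genes with missing values."""
    return {
        gene: [values[p] for p in patients]
        for gene, values in matrix.items()
        if all(values.get(p) is not None for p in patients)
    }


def pearson(xs, ys):
    """Step 4: Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


cohort = select_cohort(clinical, "case")
filtered = quality_filter(expression, cohort)
ages = [clinical[p]["age"] for p in cohort]
r = pearson(filtered["GENE_A"], ages)
print(cohort, sorted(filtered), r)
```

In this toy data set, GENE_B is removed at the quality control step because one cohort member has no measurement, so only GENE_A reaches the correlation step.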