diXa: a data infrastructure for chemical safety assessment.

Diana M Hendrickx, Hugo J W L Aerts, Florian Caiment, Dominic Clark, Timothy M D Ebbels, Chris T Evelo, Hans Gmuender, Dennie G A J Hebels, Ralf Herwig, Jürgen Hescheler, Danyel G J Jennen, Marlon J A Jetten, Stathis Kanterakis, Hector C Keun, Vera Matser, John P Overington, Ekaterina Pilicheva, Ugis Sarkans, Marcelo P Segura-Lepe, Isaia Sotiriadou, Timo Wittenberger, Clemens Wittwehr, Antonella Zanzi, Jos C S Kleinjans.

Abstract

MOTIVATION: The field of toxicogenomics (the application of '-omics' technologies to risk assessment of compound toxicities) has expanded in the last decade, partly driven by new legislation aimed at reducing animal testing in chemical risk assessment, but mainly as a result of a paradigm change in toxicology towards the use and integration of genome-wide data. Many research groups worldwide have generated large amounts of such toxicogenomics data. However, there is no centralized repository that archives these data and makes them, together with tools for their analysis, easily available.
RESULTS: The Data Infrastructure for Chemical Safety Assessment (diXa) is a robust and sustainable infrastructure storing toxicogenomics data. A central data warehouse is connected to a portal with links to chemical information and molecular and phenotype data. diXa is publicly available through a user-friendly web interface. New data can be readily deposited into diXa using guidelines and templates available online. Analysis descriptions and tools for interrogating the data are available via the diXa portal.
AVAILABILITY AND IMPLEMENTATION: http://www.dixa-fp7.eu
CONTACT: d.hendrickx@maastrichtuniversity.nl; info@dixa-fp7.eu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2014 PMID: 25505093 PMCID: PMC4410652 DOI: 10.1093/bioinformatics/btu827

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

During the last decade, technological developments, new legislation, ethical considerations, and concerns about the reliability and relevance of traditional animal experiments for toxicity testing have led to the expansion of the field of toxicogenomics (Hartung, 2009; Sycheva, ). Many projects worldwide have generated large amounts of toxicogenomics data, but so far there is no centralized repository that collects, curates and maintains these data. To ensure that the data remain easily accessible and do not disappear over time, we developed the Data Infrastructure for Chemical Safety Assessment (diXa), a database and web interface providing access to toxicogenomics datasets and analyses. While several toxicogenomics projects have already made their data available via public databases (e.g. ArrayExpress, GEO, Expression Atlas), data from other projects are more difficult to access. Moreover, toxicogenomics data are generally deposited in isolation rather than as structured sets, for several reasons, including non-comparable experimental designs, different technology platforms and different data (pre)processing steps. Furthermore, the metadata available in public data sources are often insufficient for data reuse. diXa aims to overcome these drawbacks by defining standard workflows for data (pre)processing and standard formats for metadata annotation. These standards are applied to the data in diXa as part of its data servicing. Moreover, diXa integrates information from toxicology, chemistry and human disease databases alongside the original data, which helps to interpret data analysis results and increases their relevance for toxicity evaluation. Combining data sets from different sources in a central repository can provide important information on experimental design and support mechanistic interpretation.
Once all relevant data for a study are available in a public repository, a remaining challenge is to integrate them in order to gain a better understanding of the entire biological system (Gomez-Cabrero, ; Schumacher, ). Data from different platforms and technologies are highly heterogeneous in terms of experimental conditions, species, noise levels, time scales and linearity of response (Steinfath, ). As a consequence, integrating data from different sources requires new data analysis methodologies (Gomez-Cabrero, ). Here we describe diXa, a database that provides access to toxicogenomics data from different sources together with tools for their analysis.

2 Data infrastructure and access

diXa consists of a central warehouse containing data from toxicogenomics projects and other public repositories. The data warehouse is linked to a chemical portal as well as to a human disease database. An overview of diXa is presented in Figure 1.

Fig. 1. Overview of the diXa data infrastructure.

2.1 Data sources

Currently, 34 studies involving 469 compounds are deposited in diXa, originating from various toxicogenomics projects (see Supplementary Table S1). The data were generated in in vitro and in vivo experiments in rat and human, covering transcriptomics, metabolomics and proteomics. Additionally, diXa contains more recently measured copy number variation and epigenetics data. Data in diXa are described in ISA-Tab format (Rocca-Serra, ; see Supplementary data, section ‘Uploading data’).

Understanding the chemical, toxicity and bioactivity properties of the compounds under investigation is crucial in studying adverse outcomes (Stokstad, 2009). To provide direct access to curated public chemical databases, diXa is connected to the ChEMBL bioactivity database (www.ebi.ac.uk/chembl/) and to the JRC ChemAgora portal (chemagora.jrc.ec.europa.eu/). For each compound in the diXa data warehouse, ChemAgora provides direct access to chemical information available in third-party resources (see Supplementary Table S2): through an on-the-fly search, the portal reports whether a compound has data in each of the external resources and offers links to the exact third-party web pages where information about the compound can be found. Some third-party resources contain regulatory chemical information, typically identified by the CAS Registry Number; this complements the diXa data warehouse, which uses the standard InChIKey as its core chemical structure identifier. ChemAgora therefore also searches these third-party repositories, after mapping the InChIKey received from the diXa data warehouse to the corresponding CAS Registry Number.
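The InChIKey to CAS Registry Number mapping step can be illustrated with a minimal Python sketch. It assumes a small local lookup table (the hypothetical `INCHIKEY_TO_CAS`, standing in for whatever resolver ChemAgora actually queries) and adds two format checks: the InChIKey layout (blocks of 14, 10 and 1 upper-case letters separated by hyphens) and the CAS Registry Number check digit. All function names are illustrative and are not part of the diXa or ChemAgora interfaces.

```python
import re
from typing import Optional

# InChIKey layout: 14-letter skeleton block, 10-letter block, 1-letter protonation flag.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

# Hypothetical lookup table standing in for the resolver ChemAgora actually uses.
INCHIKEY_TO_CAS = {
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N": "7732-18-5",  # water
    "RZVAJINKPMORJF-UHFFFAOYSA-N": "103-90-2",   # acetaminophen
}


def is_valid_inchikey(key: str) -> bool:
    """Check the InChIKey layout only; the hashes cannot be verified without the InChI."""
    return INCHIKEY_RE.fullmatch(key) is not None


def is_valid_cas(cas: str) -> bool:
    """Verify the CAS check digit: digits weighted 1, 2, 3, ... from the right, summed mod 10."""
    m = CAS_RE.fullmatch(cas)
    if m is None:
        return False
    body = m.group(1) + m.group(2)
    total = sum(weight * int(digit) for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


def inchikey_to_cas(key: str) -> Optional[str]:
    """Map a warehouse InChIKey to a CAS Registry Number, or None if no mapping is known."""
    if not is_valid_inchikey(key):
        raise ValueError(f"malformed InChIKey: {key!r}")
    cas = INCHIKEY_TO_CAS.get(key)
    if cas is not None and not is_valid_cas(cas):
        raise ValueError(f"invalid CAS check digit: {cas!r}")
    return cas
```

For example, `inchikey_to_cas("XLYOFNOQVPJJNP-UHFFFAOYSA-N")` returns `"7732-18-5"`, while a well-formed but unmapped key returns `None`. Checking the CAS check digit before querying regulatory resources catches transcription errors cheaply; the InChIKey hashes themselves cannot be verified without the underlying InChI, so only their layout is checked.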

2.2 Web interface

The diXa homepage (see Supplementary Fig. S1) provides ‘search’ and ‘browse’ sections for querying and browsing by study, sample, compound, analysis or disease (see Supplementary Figs. S2–S11). The Experimental Factor Ontology (Malone, ) is used so that contents can also be searched by synonyms and child terms. The ‘links’ section provides relevant information about diXa, including information on submitting data, training, and novel analytical tools developed within diXa (Tools Catalogue). To link studies to relevant chemical information, the ChemAgora portal supports searches for chemicals by InChIKey (www.iupac.org), CAS Registry Number (www.cas.org), trivial name (including partial names) and structure.
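Ontology-based query expansion of the kind described for the Experimental Factor Ontology can be sketched as follows. The toy term hierarchy and synonyms below are hypothetical and do not reproduce actual EFO content or identifiers; a real deployment would load EFO (for example from its OWL release) rather than hard-code terms. The sketch expands a query term to all of its descendants, adds their synonyms, and matches the result against sample annotations.

```python
from collections import deque

# Hypothetical toy hierarchy standing in for EFO (not real EFO terms or IDs).
SYNONYMS = {
    "liver cancer": {"hepatic cancer", "liver neoplasm"},
    "hepatocellular carcinoma": {"HCC"},
}
CHILDREN = {
    "liver cancer": {"hepatocellular carcinoma", "cholangiocarcinoma"},
    "hepatocellular carcinoma": {"fibrolamellar carcinoma"},
}


def expand_query(term: str) -> set[str]:
    """Return the term, all of its descendants, and the synonyms of each."""
    expanded: set[str] = set()
    queue = deque([term])
    while queue:  # breadth-first walk down the child relations
        current = queue.popleft()
        if current in expanded:
            continue
        expanded.add(current)
        queue.extend(CHILDREN.get(current, ()))
    synonyms: set[str] = set()
    for t in expanded:
        synonyms |= SYNONYMS.get(t, set())
    return expanded | synonyms


def search(annotations: dict[str, str], term: str) -> list[str]:
    """Return sorted record IDs whose annotation matches any expanded term, ignoring case."""
    wanted = {t.lower() for t in expand_query(term)}
    return sorted(rid for rid, label in annotations.items() if label.lower() in wanted)
```

With records annotated as `"HCC"`, `"cholangiocarcinoma"` and `"Liver Neoplasm"`, a search for `"liver cancer"` returns all three, even though none of them uses the query string itself; a search for `"hepatocellular carcinoma"` returns only the `"HCC"` record, because expansion runs down the hierarchy, not up.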

2.3 Quality control, pre-processing and data analysis

Data deposited in diXa have been subjected to quality control (QC), pre-processing and initial analyses (log2 ratios, differentially expressed genes) using pipelines implemented in Genedata Expressionist® (Hoefkens, ). Researchers submitting data to diXa are asked to follow the submission guidelines available online. Furthermore, data completeness will be checked and metadata will be standardized using ISA-Tab tools. The algorithms used are described individually for each analysis and are published on the diXa homepage under ‘Analysis’. An overview of the currently available analysis descriptions, together with their location on the diXa website, is presented in Supplementary Table S4.
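The initial analyses mentioned above (log2 ratios and differentially expressed genes) can be illustrated with a generic Python sketch. This is not the Genedata Expressionist® pipeline used for diXa, whose algorithms are described on the diXa website; it simply computes a per-gene log2 ratio of treated versus control samples and flags genes that pass both a fold-change cut-off and a Welch t-statistic cut-off. The thresholds (`min_abs_lr=1.0`, i.e. a two-fold change, and `min_abs_t=3.0`) and the pseudocount are illustrative assumptions.

```python
import math
from statistics import mean, variance


def to_log2(values, pseudocount=1.0):
    """log2-transform raw intensities; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(a, b):
    """Welch's t statistic (unequal variances); 0.0 when both groups have zero variance."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def differentially_expressed(expr, treated, control, min_abs_lr=1.0, min_abs_t=3.0):
    """expr maps gene -> {sample_id: raw intensity}.

    Returns {gene: log2 ratio} for genes passing both the fold-change and t cut-offs.
    """
    hits = {}
    for gene, by_sample in expr.items():
        lt = to_log2(by_sample[s] for s in treated)
        lc = to_log2(by_sample[s] for s in control)
        log2_ratio = mean(lt) - mean(lc)  # treated vs control on the log2 scale
        if abs(log2_ratio) >= min_abs_lr and abs(welch_t(lt, lc)) >= min_abs_t:
            hits[gene] = log2_ratio
    return hits
```

A production pipeline would typically use a moderated test and correct for multiple testing across genes; the fixed t cut-off here only keeps the sketch dependency-free. Note that a gene with a large but inconsistent change across replicates can pass the fold-change cut-off and still be rejected by the t cut-off.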

2.4 Applications

The accurate prediction of compound toxicity remains a significant challenge. A centralized data warehouse makes it possible to combine data from different sources, including for cross-omics analyses. Within diXa, it has been shown that combining data from in vitro studies on liver carcinogens with gene expression data from human liver cancers improved the prediction of carcinogenicity (Caiment, ). This also formed the basis of a promising approach to biomarker discovery for liver toxicity (Hebels, ), in which gene sets derived from different text-mining and human liver ‘omics’ databases were compared to determine the most promising gene lists for biomarker discovery. Furthermore, both studies showed that compound classifications based on in vivo data outperform classifications based on gene sets from the literature (‘expert knowledge’).
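The idea of comparing compound-induced expression changes with a human disease signature can be illustrated with a simplified sign-agreement score. This is a stand-in, not the connectivity mapping method used by Caiment et al., which typically works on ranked gene lists: here each compound signature (sets of up- and down-regulated genes) is scored by how many of its genes change in the same direction as in a reference signature, such as one derived from human liver cancers, minus how many change in the opposite direction. Gene identifiers and compound names in the usage example are placeholders.

```python
def signature_score(query_up, query_down, ref_up, ref_down):
    """Simplified sign-agreement score in [-1, 1], assuming each gene is listed only once.

    +1: every query gene changes in the same direction as in the reference;
    -1: every query gene changes in the opposite direction; 0: no net agreement.
    """
    same = len(query_up & ref_up) + len(query_down & ref_down)
    opposite = len(query_up & ref_down) + len(query_down & ref_up)
    n = len(query_up | query_down)
    return (same - opposite) / n if n else 0.0


def rank_compounds(signatures, ref_up, ref_down):
    """signatures maps compound -> (up_genes, down_genes); returns (compound, score), best first."""
    scores = [(name, signature_score(up, down, ref_up, ref_down))
              for name, (up, down) in signatures.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For instance, with a reference signature of up-regulated `{"G1", "G2", "G3"}` and down-regulated `{"G4", "G5"}`, a compound that induces `G1` and `G2` and represses `G4` scores 1.0, whereas one that induces `G4` and represses `G1` scores -1.0. Set-based scoring ignores the magnitude and rank of each change, which is precisely what rank-based connectivity statistics add.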

3 Current developments

diXa is a sustainable data infrastructure. It will be extended to store additional data types and classes, including next-generation sequencing and methylation data. Furthermore, new tools for integrated statistical analysis will be developed and added to diXa. diXa has already been adopted as the informatics framework for the EU FP7 HeCaTos project (http://www.hecatos.eu/). The ChemAgora portal is also a long-term strategic development, to which the European Commission’s Joint Research Centre is fully committed. ChemAgora has already attracted the attention of other initiatives, e.g. IPCheM (http://ipchem.jrc.ec.europa.eu/), a European Commission project that will make use of the search service provided by ChemAgora.

4 Conclusion

diXa is a stable, long-term data repository providing free public access to toxicogenomics data. A web interface with several query tools has been implemented, allowing users to search and browse diXa. We expect the extensive use of structured metadata to have a large impact on implementation, in particular by allowing flexible application in future use cases.

References

1. Matthias Steinfath; Dirk Repsilber; Matthias Scholz; Dirk Walther; Joachim Selbig. Integrated data analysis for genome-wide research. EXS, 2007.

2. Florian Caiment; Maria Tsamou; Danyel Jennen; Jos Kleinjans. Assessing compound carcinogenicity in vitro using connectivity mapping. Carcinogenesis, 2013.

3. Thomas Hartung. Toxicology for the twenty-first century. Nature, 2009.

4. Erik Stokstad. Putting chemicals on a path to better risk assessment. Science, 2009.

5. James Malone; Ele Holloway; Tomasz Adamusiak; Misha Kapushesky; Jie Zheng; Nikolay Kolesnikov; Anna Zhukova; Alvis Brazma; Helen Parkinson. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics, 2010.

6. Dennie G A Hebels; Marlon J A Jetten; Hugo J W Aerts; Ralf Herwig; Daniël H J Theunissen; Stan Gaj; Joost H van Delft; Jos C S Kleinjans. Evaluation of database-derived pathway development for enabling biomarker discovery for hepatotoxicity. Biomark Med, 2014.

7. Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris; Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics, 2010.

8. David Gomez-Cabrero; Imad Abugessaisa; Dieter Maier; Andrew Teschendorff; Matthias Merkenschlager; Andreas Gisel; Esteban Ballestar; Erik Bongcam-Rudloff; Ana Conesa; Jesper Tegnér. Data integration in the era of omics: current and future challenges. BMC Syst Biol, 2014.

9. Axel Schumacher; Tamas Rujan; Jens Hoefkens. A collaborative approach to develop a multi-omics data analytics platform for translational research. Appl Transl Genom, 2014.
