Angela Serra1,2,3, Laura Aliisa Saarimäki1,2,3, Alisa Pavel1,2,3, Giusy Del Giudice1,2,3, Michele Fratello1,2,3, Luca Cattelani1,2,3, Antonio Federico1,2,3, Omar Laurino4, Veer Singh Marwah1,2, Vittorio Fortino5, Giovanni Scala1,2,6, Pia Anneli Sofia Kinaret1,2,3,7, Dario Greco1,2,3,7.
Abstract
The recent advancements in toxicogenomics have led to the availability of large omics data sets, representing the starting point for studying the exposure mechanism of action and identifying candidate biomarkers for toxicity prediction. The current lack of standard methods in data generation and analysis hampers the full exploitation of toxicogenomics-based evidence in regulatory risk assessment. Moreover, the pipelines for the preprocessing and downstream analyses of toxicogenomic data sets can be quite challenging to implement. Over the years, we have developed a number of software packages to address specific questions related to multiple steps of toxicogenomics data analysis and modelling. In this review we present the Nextcast software collection and discuss how its individual tools can be combined into efficient pipelines to answer specific biological questions. Nextcast components support the scientific community in analysing and interpreting large data sets for the toxicity evaluation of compounds in an unbiased, straightforward, and reliable manner. The Nextcast software suite is available at https://github.com/fhaive/nextcast.
Keywords: Computational toxicology; Nextcast; Pipeline; Predictive toxicology; Software suite; Toxicogenomics
Year: 2022 PMID: 35386103 PMCID: PMC8956870 DOI: 10.1016/j.csbj.2022.03.014
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Nextcast is a software suite whose core functionalities allow robust modelling and analysis of bioinformatics (dark blue) and cheminformatics (dark yellow) data as well as read-across analyses (orange). Nextcast components (outer layer in gray) implement methods for omics data analytics such as preprocessing (eUTOPIA), functional annotation (FunMappOne), dose–response analysis (BMDx, TinderMIX), and co-expression network generation and analysis (INfORM, VOLTA). Advanced modelling algorithms are also available (dark green), including a data set simulator (MOSIM), multi-view (MV) clustering (MVDA), and feature selection strategies (FPRF, GARBO). Nextcast also includes methods for quantitative structure–activity relationship (QSAR) modelling, such as MaNGA and hyQSAR.
Nextcast components currently utilised and reviewed in the literature.
| Tool | Data domain | Category | Implementation | Task |
| --- | --- | --- | --- | --- |
| eUTOPIA | Bioinformatics | Analytics | R, Shiny | Preprocessing |
| INfORM | Bioinformatics | Analytics | R, Shiny | Network Analysis |
| VOLTA | Bioinformatics | Analytics | Python | Network Analysis |
| BMDx | Bioinformatics | Analytics | R, Shiny | Dose-Responsive |
| TinderMIX | Bioinformatics | Analytics | R | Dose-Responsive |
| FunMappOne | Bioinformatics | Analytics | R, Shiny | Functional Annotation |
| MOSIM | Bioinformatics | Modelling | R | Simulator |
| MVDA | Bioinformatics | Modelling | R | Multi-view clustering |
| FPRF | Bioinformatics | Modelling | R | Feature Selection |
| GARBO | Bioinformatics | Modelling | Python | Feature Selection |
| INSIdE NANO | | | | Read-Across |
| MaNGA | | | Python | QSAR |
| hyQSAR | | | | QSAR |
Examples of interoperability of the Nextcast data formats with external tools.
| Nextcast tool | Output | External tool | Purpose |
| --- | --- | --- | --- |
| eUTOPIA | Gene expression matrix | MORPHEUS | |
| eUTOPIA | Gene expression matrix | t-SNE | Dimensionality reduction techniques available in R or Python |
| eUTOPIA | Differentially expressed genes | WebGestalt | Pathway enrichment analysis |
| eUTOPIA | Differentially expressed genes | STRING | |
| FunMappOne | Enriched GO terms | REVIGO | Tool for summarising GO terms and studying their interactions (available at |
| INfORM | Co-expression networks | Cytoscape | Tools for network visualisation |
| INfORM | Prioritised genes | WebGestalt | Pathway enrichment analysis |
| INfORM | Prioritised genes | STRING | |
Fig. 2. Nextcast pipeline for the characterisation of the MOA of a compound. Raw omics data are preprocessed with eUTOPIA. The output of the tool includes a matrix with normalised (and batch corrected) expression values and a list of differentially expressed genes. These data can be fed to INfORM to identify a set of responsive gene modules. VOLTA can be further used to analyse networks built with INfORM. Alternatively, differentially expressed genes can be directly provided as the input for the FunMappOne tool to perform enrichment analysis and identify the underlying biological processes. The result is either a list of regulated genes with their enriched pathways, or a list of regulated genes in co-expressed modules with their corresponding pathways. The red box represents the input for the pipeline while the green box describes the outcome of the pipeline. The dark blue boxes correspond to the individual Nextcast components of the “Analytics” category, and the light blue boxes indicate the intermediate outputs/inputs.
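The enrichment step in this pipeline can be illustrated with a minimal over-representation sketch. It uses a one-sided hypergeometric test, which is one common way to score pathway enrichment; it is not taken from FunMappOne, whose actual method and interface are not described here. The function names, the toy gene identifiers, and the pathway dictionary are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_selected genes are drawn without replacement from n_universe genes,
    of which n_pathway are annotated to the pathway.
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def enrich(deg, pathways, universe):
    """Score every pathway for over-representation among the DEGs.

    deg: iterable of differentially expressed gene IDs (hypothetical input).
    pathways: dict mapping pathway name -> iterable of member gene IDs.
    universe: iterable of all gene IDs that were measured.
    Returns (pathway, overlap size, p-value) tuples sorted by p-value.
    """
    universe = set(universe)
    deg = set(deg) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = deg & members
        p = hypergeom_enrichment_p(
            len(universe), len(members), len(deg), len(overlap)
        )
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 10 measured genes, two made-up pathways, three DEGs.
universe = [f"g{i}" for i in range(10)]
pathways = {"A": ["g0", "g1", "g2", "g3"], "B": ["g5", "g6", "g7", "g8", "g9"]}
ranking = enrich(["g0", "g1", "g2"], pathways, universe)
```

In practice the p-values would also need multiple-testing correction, which this sketch omits.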
Fig. 3. Nextcast pipeline for the estimation of relevant doses of chemical exposure. Raw omics data can be preprocessed with eUTOPIA to obtain a matrix with normalised (and batch corrected) expression values and a list of differentially expressed genes. These data can be provided as input to BMDx for a benchmark dose analysis or to TinderMIX to identify dynamic-dose responsive genes. Finally, enrichment analysis can be conducted for the set of dose-dependent genes to identify the affected biological processes. The red box indicates the input for the pipeline, while the green boxes mark the output. The dark blue boxes are the individual Nextcast components of the “Analytics” category, and the light blue box shows the intermediate output/input.
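To make the idea of a benchmark dose (BMD) concrete, the following sketch fits a straight line to one gene's dose-response data and reports the dose at which the fitted response departs from the control level by a benchmark response (BMR). It is a deliberately simplified illustration under stated assumptions: a linear model only, and a BMR defined as a multiple of the standard deviation of the control samples. It does not reproduce BMDx, and all function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def fit_line(doses, responses):
    """Ordinary least-squares fit of response = intercept + slope * dose."""
    d_bar, r_bar = mean(doses), mean(responses)
    sxx = sum((d - d_bar) ** 2 for d in doses)
    sxy = sum((d - d_bar) * (r - r_bar) for d, r in zip(doses, responses))
    slope = sxy / sxx
    return r_bar - slope * d_bar, slope


def linear_bmd(doses, responses, bmr_factor=1.0):
    """Dose at which the fitted line deviates from dose 0 by the BMR.

    Assumption: BMR = bmr_factor * standard deviation of the controls
    (samples with dose == 0). Returns None for a flat fit.
    """
    _, slope = fit_line(doses, responses)
    controls = [r for d, r in zip(doses, responses) if d == 0]
    bmr = bmr_factor * stdev(controls)
    if slope == 0:
        return None
    # For a linear model, |slope| * BMD = BMR.
    return bmr / abs(slope)


# Made-up expression values for one gene at three dose levels.
doses = [0, 0, 0, 1, 1, 1, 2, 2, 2]
responses = [0.9, 1.0, 1.1, 1.9, 2.0, 2.1, 2.9, 3.0, 3.1]
bmd = linear_bmd(doses, responses)
```

A full analysis would compare several model families and estimate confidence bounds around the BMD; the sketch leaves both out.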
Fig. 4. Nextcast pipeline for biomarker identification from toxicogenomics data. Raw omics data can be preprocessed with eUTOPIA. Preprocessed transcriptomics data can be provided as input to INfORM, VOLTA (after INfORM), BMDx, or TinderMIX to identify a set of biomarkers in a univariate way. The whole list of genes or only the prioritised set can be provided to the feature selection algorithm (GARBO or FPRF) to identify the smallest predictive set of biomarkers. The red boxes represent the input for the pipeline. The sample category is the variable of interest for the biomarker discovery phase. The lighter green box marks the output of the pipeline; the dark blue and dark green boxes indicate the individual Nextcast components belonging to the “Analytics” and “modelling” categories, respectively. The light blue boxes represent the intermediate outputs/inputs.
Fig. 5. Nextcast pipeline for biomarker identification and QSAR model development from toxicogenomics and cheminformatics data. Raw omics data can be preprocessed with eUTOPIA. Then, the preprocessed transcriptomics data, chemical representation data, and the outcome variable can be provided to hyQSAR or MaNGA to identify the optimal predictive model. The red boxes indicate the input for the pipeline while the green box is the output. The dark blue and yellow boxes are the individual Nextcast components, and the light blue box represents the intermediate output/input.
Fig. 6. Nextcast pipeline with multi-view clustering for chemical read-across. Raw omics data can be preprocessed with eUTOPIA. The preprocessed multi-view data for the same samples and/or chemical structure data (e.g. molecular descriptors) can be fed to MVDA to obtain the multi-view cluster assignment of each sample and the influence of each view on the clustering. Red boxes indicate the input while the lighter green boxes mark the output of the pipeline. The dark blue and dark green boxes are the individual Nextcast components, and the light blue boxes correspond to the intermediate output/input.
Fig. 7. Example application of the characterisation of the MWCNT MOA employing INfORM. (A) eUTOPIA was used to preprocess input raw data and to perform differential analysis. The normalised expression matrix, as well as the lists of differentially expressed genes, were exported. (B) A custom script was used to select the 1,000 most frequently deregulated genes across the exposures and to produce inputs for INfORM. (C) INfORM was used to infer the gene co-expression networks and to rank the genes according to their topological properties. (D) The first 200 positions of each list were selected and combined in a format compatible with the FunMappOne input. (E) FunMappOne was used to perform enrichment analysis of the KEGG human pathways. (F) The output was interpreted for MOA characterisation of MWCNT exposures at different doses and time points.
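The custom script used in step (B) is not shown in the source. As a hedged illustration of what such a selection could look like, the sketch below counts in how many exposures each gene is differentially expressed and keeps the most frequent ones. The function name, the input layout (one gene list per exposure), the tie-breaking rule, and the example exposure labels are assumptions made for this example only.

```python
from collections import Counter


def most_frequent_genes(deg_lists, top_n=1000):
    """Return the top_n genes deregulated in the largest number of exposures.

    deg_lists: dict mapping an exposure label (hypothetical, e.g. a dose and
    time point) to the list of differentially expressed genes for it.
    """
    counts = Counter()
    for genes in deg_lists.values():
        # Count each gene at most once per exposure.
        counts.update(set(genes))
    # Sort by descending frequency; break ties alphabetically so the
    # selection is reproducible.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Made-up DEG lists for three exposures.
deg_lists = {
    "low_24h": ["TNF", "IL6", "CCL2"],
    "high_24h": ["TNF", "IL6"],
    "high_48h": ["TNF", "CXCL8"],
}
selected = most_frequent_genes(deg_lists, top_n=2)
```

The selected gene set could then be used to subset the normalised expression matrix before network inference.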
Fig. 8. Example application of the characterisation of the dose–response to MWCNT with BMDx. The preprocessed data were downloaded from eUTOPIA in a format compatible with the BMDx input. After completing the benchmark dose analysis, the results can be explored via various visual presentations. For example, (A) the distributions of the computed BMD values were compared between the time points. The BMD values computed at 24 h of exposure exhibit a higher peak at low doses compared to the later time points. (B) The Venn diagram indicates a larger number of dose-dependent genes at 24 h than at 48 and 72 h. (C) The best model for TNF with the computed BMD (blue), BMDL (red), BMDU (green) and IC/EC50 (green) values. (D) Selected pathways from the functional enrichment analysis indicate that the mean BMD values for distinct biological functions increase at later time points. The colour of the cell represents the mean BMD value of the genes enriching the pathway. (E) Line graph representing the genes enriching the TNF signalling pathway at 48 h with their BMD, BMDL and BMDU values plotted.