Angela Serra1,2, Michele Fratello1,2, Luca Cattelani1,2, Irene Liampa3, Georgia Melagraki4, Pekka Kohonen5,6, Penny Nymark5,6, Antonio Federico1,2, Pia Anneli Sofia Kinaret1,2,7, Karolina Jagiello8,9, My Kieu Ha10,11,12, Jang-Sik Choi10,11,12, Natasha Sanabria13, Mary Gulumian13,14, Tomasz Puzyn8,9, Tae-Hyun Yoon10,11,12, Haralambos Sarimveis3, Roland Grafström5,6, Antreas Afantitis4, Dario Greco1,2,7.
Abstract
Transcriptomics data are relevant for addressing a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, publicly available omics datasets are constantly growing, together with a plethora of methods made available to facilitate their analysis and interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
Keywords: QSAR; benchmark dose analysis; data integration; data modelling; deep learning; machine learning; network analysis; read-across; toxicogenomics; transcriptomics
Year: 2020 PMID: 32276469 PMCID: PMC7221955 DOI: 10.3390/nano10040708
Source DB: PubMed Journal: Nanomaterials (Basel) ISSN: 2079-4991 Impact factor: 5.076
Tools available for benchmark dose analysis.
| Feature | BMDS | PROAST | BMDExpress 2 | ISOgene | BMDx |
|---|---|---|---|---|---|
| EPA Models * | X | X | | | |
| Probe id | - | - | X | | |
| Gene id | - | - | X | | |
| BMD/BMDL | X | X | X | X | |
| BMDU | X | X | X | | |
| IC50 | X | | | | |
| EC50 | X | | | | |
| Enrichment Analysis | - | - | X | X | |
| Interactive enriched maps | - | - | X | | |
| Comparisons at different time points | - | - | X | | |
| GUI | X | X | X | X | X |
* Models approved by the US Environmental Protection Agency.
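The BMD/BMDL values listed above are obtained by fitting a parametric dose-response model to the data and inverting it at a chosen benchmark response (BMR). A minimal sketch of this idea, assuming a Hill model and purely illustrative dose-response values (none of these numbers or parameter choices come from the tools in the table):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hill dose-response model: baseline a, maximal change b,
# half-maximal dose k, steepness n.
def hill(dose, a, b, k, n):
    return a + b * dose**n / (k**n + dose**n)

# Hypothetical expression values measured at increasing doses.
doses = np.array([0.0, 0.1, 0.5, 1.0, 5.0, 10.0])
resp = np.array([1.0, 1.01, 1.2, 1.5, 1.96, 1.99])

# Fit the model; bounds keep b, k, n positive so the curve
# stays well defined at dose 0 during optimization.
params, _ = curve_fit(hill, doses, resp, p0=[1.0, 1.0, 1.0, 1.0],
                      bounds=([-np.inf, 1e-6, 1e-6, 0.1], np.inf))
a, b, k, n = params

# BMD: the dose at which the response departs from baseline by a
# BMR of 10% of the fitted dynamic range, found by inverting the
# Hill equation analytically.
bmr_level = a + 0.1 * b
bmd = k * ((bmr_level - a) / (a + b - bmr_level)) ** (1.0 / n)
print(f"BMD at 10% BMR: {bmd:.3f}")
```

The BMDL/BMDU reported by the dedicated tools are lower/upper confidence bounds on this point estimate, typically derived from the profile likelihood or bootstrap rather than from a single fit as above.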
Figure 1. Example of an ML pipeline for TGx data. Data Acquisition and Preprocessing: data are collected and analyzed to ensure the quality of the dataset. During preprocessing, feature selection and/or feature transformations may be applied to improve stability. Training-hyperparameter tuning-validation loop: candidate models are fit to the data in an iterative process in which, for each candidate model, the best hyperparameters are optimized through the validation step. Model Selection and Testing: optimized candidate models are identified and the best ones are tested on a final hold-out dataset to evaluate their generalization capabilities.
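The pipeline described in Figure 1 can be sketched with scikit-learn: a cross-validated grid search plays the role of the training-hyperparameter tuning-validation loop, and a hold-out split provides the final test. The synthetic data, parameter grid, and model choices below are illustrative assumptions, not taken from the review:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a transcriptomics matrix:
# 200 samples x 500 genes with a binary exposure label.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

# Hold-out test set, reserved for the final generalization estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Preprocessing (scaling, feature selection) and the classifier live
# in one pipeline so that tuning cannot leak test-set information.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Training-hyperparameter tuning-validation loop via 5-fold CV.
grid = GridSearchCV(pipe, {
    "select__k": [50, 100],
    "clf__n_estimators": [100, 300],
}, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

# Model selection and testing on the untouched hold-out set.
print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))
```

Keeping feature selection inside the cross-validated pipeline, rather than applying it once to the full dataset, is what makes the hold-out accuracy an honest estimate of generalization.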