Marco Chierici, Margherita Francescatto, Nicole Bussola, Giuseppe Jurman, Cesare Furlanello.
Abstract
BACKGROUND: Drug-induced liver injury (DILI) is a major concern in drug development, as hepatotoxicity may not be apparent at early stages but can lead to life-threatening consequences. The ability to predict DILI from in vitro data would be a crucial advantage. In 2018, the Critical Assessment of Massive Data Analysis (CAMDA) group proposed the CMap Drug Safety challenge, focusing on DILI prediction.
Keywords: CMap; Classification; DILI; Deep learning; Microarray
Year: 2020 PMID: 32054490 PMCID: PMC7020573 DOI: 10.1186/s13062-020-0259-4
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Fig. 1 Experimental design scheme and batch correction. The figure represents schematically the data processing approach adopted in the article
Fig. 2 Deep learning analysis strategies and architectures. a Strategies used for the analysis. “single” indicates that the logFC values or the expression of each compound were considered as input for the models; “end-to-end” indicates that the expression values of each compound are considered along with its corresponding vehicles. b Schematic representation of the DL architectures used for the analysis
Fig. 3 Classification results. a Overall DL results. b Overall SL results. c Random TR/TS splits results. d Overall results obtained testing various strategies to balance classes. MCC CV: MCC in CV; MCC val: MCC in validation
Number of samples belonging to DILI-0 and DILI-1 classes for TR and TS sets
| | DILI-1 | DILI-0 |
|---|---|---|
| TR | 120 | 60 |
| TS | 67 | 19 |
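The class counts above are imbalanced (120 vs. 60 in TR), which is what the balancing strategies in the next table try to address. As a minimal sketch of the simplest such strategy, random oversampling of the minority class, the snippet below duplicates minority-class samples until both classes have the same size. The function name `oversample_minority` and the seed are illustrative assumptions; this is not the authors' implementation, and the SMOTE and ADASYN variants named in the table generate synthetic samples rather than duplicates.

```python
import random


def oversample_minority(labels, seed=0):
    """Return sample indices in which smaller classes are randomly
    duplicated until every class matches the largest class in size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    target = max(len(ix) for ix in by_class.values())
    indices = []
    for ix in by_class.values():
        indices.extend(ix)  # keep every original sample
        indices.extend(rng.choices(ix, k=target - len(ix)))  # add duplicates
    return indices


# TR set counts from the table: 120 DILI-1 and 60 DILI-0 compounds
labels = [1] * 120 + [0] * 60
balanced = oversample_minority(labels)
counts = {c: sum(1 for i in balanced if labels[i] == c) for c in (0, 1)}
```

When oversampling is combined with cross-validation, it is usually applied inside each training fold only, so that duplicated samples do not appear in both the training and the held-out part.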
Results obtained for RF and NBM2 classifiers using different class balancing strategies
| balancing strategy | classifier | MCC CV | MCC val |
|---|---|---|---|
| adasyn | RF | 0.63 (0.60, 0.66) | |
| oversampled_all | RF | | -0.13 |
| oversampled_minority | RF | | -0.13 |
| smote | RF | 0.63 (0.60, 0.66) | 0.02 |
| smote_svm | RF | 0.61 (0.59, 0.65) | -0.09 |
| smote_borderline1 | RF | 0.61 (0.58, 0.64) | -0.04 |
| smote_borderline2 | RF | 0.59 (0.55, 0.63) | -0.07 |
| adasyn | NBM2 | 0.07 (0.03, 0.10) | 0.02 |
| oversampled_all | NBM2 | | -0.02 |
| oversampled_minority | NBM2 | 0.23 (0.19, 0.28) | 0.07 |
| smote | NBM2 | 0.20 (0.15, 0.25) | -0.2 |
| smote_svm | NBM2 | | |
| smote_borderline1 | NBM2 | 0.23 (0.19, 0.29) | -0.11 |
| smote_borderline2 | NBM2 | 0.11 (0.06, 0.16) | -0.01 |
Boldface indicates the best performance of RF or NBM2 models either in cross validation or in validation
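For reference, MCC is the Matthews correlation coefficient, computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name `mcc` and the example confusion counts are illustrative assumptions (they use the TS class sizes of 67 DILI-1 and 19 DILI-0, not results from the article).

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention,
    e.g. when a classifier predicts a single class for every sample).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical outcomes on a set with 67 positives and 19 negatives
perfect = mcc(tp=67, tn=19, fp=0, fn=0)       # every compound classified correctly
all_positive = mcc(tp=67, tn=0, fp=19, fn=0)  # every compound predicted DILI-1
```

Under this convention, a model that labels every compound as DILI-1 scores 0 despite being right on 67 of 86 samples, which is why MCC is often preferred over accuracy on imbalanced data. Values near zero or below, like several validation scores in the table, indicate predictions no better than chance.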
CEL files available in the original CAMDA2018 Drug Safety challenge dataset
| Affymetrix chip | MCF7 | PC3 |
|---|---|---|
| HT_HG-U133A | 588 | 475 |
| HG-U133A | 7 | 25 |
Number of samples available after removing CEL files profiled with the HG-U133A chip
| category | MCF7 | PC3 |
|---|---|---|
| compound train | 180 | 180 |
| compound test | 86 | 86 |
| vehicle | 316 | 209 |
Sample numbers are reported according to three categories: samples treated with a compound assigned to the TR set, samples treated with a compound assigned to the TS set, and samples treated with DMSO vehicle only