Thai-Hoang Pham1, Yue Qiu2, Jucheng Zeng3, Lei Xie2,4,5,6, Ping Zhang1,3.
Abstract
Target-based high-throughput compound screening dominates the conventional one-drug-one-gene drug discovery process. However, the readout from the chemical modulation of a single protein correlates poorly with the phenotypic response of an organism, leading to a high failure rate in drug development. Chemical-induced gene expression profiles provide an attractive solution for phenotype-based screening, but the use of such data is currently limited by their sparseness, unreliability, and relatively low throughput. Several methods have been proposed to impute missing values in gene expression datasets, yet few existing methods can perform de novo chemical compound screening. In this study, we propose a mechanism-driven, neural network-based method named DeepCE (Deep Chemical Expression), which uses a graph convolutional neural network to learn chemical representations and a multi-head attention mechanism to model chemical substructure-gene and gene-gene feature associations. In addition, we propose a novel data augmentation method that extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE outperforms state-of-the-art baselines for the prediction of chemical-induced gene expression, not only in the de novo chemical setting but also in the traditional imputation setting. We further verify the effectiveness of the gene expression profiles generated by DeepCE by comparing them with gene expression profiles in the L1000 dataset on downstream classification tasks, including drug-target and disease prediction. To demonstrate the value of DeepCE, we apply it to patient-specific drug repurposing for COVID-19 for the first time and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling with noisy omics data, as well as for screening novel chemicals for the modulation of the systemic response to disease.
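To make the attention step named in the abstract more concrete, the following is a minimal NumPy sketch of generic multi-head scaled dot-product attention, in which gene feature vectors act as queries over chemical substructure feature vectors. It is not the authors' implementation: the function name `multi_head_attention`, the feature dimensions, and the random projection matrices (standing in for learned weights) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(queries, keys_values, num_heads, rng):
    """Generic multi-head attention (hypothetical helper, not DeepCE code).

    queries:     (n_q, d) array, e.g. one feature vector per gene
    keys_values: (n_kv, d) array, e.g. one feature vector per substructure
    Returns the attended features (n_q, d) and the attention weights
    (num_heads, n_q, n_kv).
    """
    n_q, d = queries.shape
    n_kv = keys_values.shape[0]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads

    # Assumption: random matrices stand in for learned projection weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x, n):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(queries @ w_q, n_q)
    k = split_heads(keys_values @ w_k, n_kv)
    v = split_heads(keys_values @ w_v, n_kv)

    # Scaled dot-product scores: (num_heads, n_q, n_kv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back to (n_q, d)
    context = (weights @ v).transpose(1, 0, 2).reshape(n_q, d)
    return context @ w_o, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gene_features = rng.standard_normal((5, 8))          # toy sizes
    substructure_features = rng.standard_normal((3, 8))  # toy sizes
    out, attn = multi_head_attention(
        gene_features, substructure_features, num_heads=2, rng=rng
    )
    print(out.shape, attn.shape)
```

Passing the same array as both `queries` and `keys_values` gives self-attention, which is one plausible way to read the gene-gene association step; the abstract does not specify this level of detail.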
Year: 2020 PMID: 32743586 PMCID: PMC7386506 DOI: 10.1101/2020.07.19.211235
Source DB: PubMed Journal: bioRxiv
Figure 1. General framework for training computational models for L1000 gene expression profile prediction and using them in downstream applications (i.e., drug repurposing). The objective of the learning process is to minimize the loss between predicted profiles and ground-truth profiles in the L1000 dataset. After training, the models are used to generate profiles for new chemicals in external molecular databases (e.g., DrugBank, ChEMBL). These profiles are then used for in silico screening to find potential drugs for disease treatment.
Figure 2. Overall architecture of DeepCE. (The details of the second layer, which has a similar architecture to the first layer in the interaction network, are omitted to save space.)
Performance (Pearson correlation) on the testing set of vanilla neural network, kNN, and linear models with different chemical features, trained on different training sets
| Training sets | Models | PubChem | ECFP | Drug-target | LTIP | Random |
|---|---|---|---|---|---|---|
| Original | Vanilla neural network | 0.1101 | 0.0705 | 0.1076 | 0.0770 | - |
| Original | kNN | 0.0844 | 0.1469 | 0.1811 | 0.1231 | - |
| High-quality | Vanilla neural network | 0.3929 | 0.4105 | 0.4270 | 0.4259 | 0.3129 |
| High-quality | kNN | 0.3903 | 0.3991 | 0.3907 | 0.3922 | - |
| High-quality | Linear regression | 0.1762 | 0.1770 | 0.1763 | 0.1764 | - |
| High-quality | Lasso | 0.1761 | 0.1770 | 0.1764 | 0.1764 | - |
| High-quality | Ridge regression | 0.1762 | 0.1770 | 0.1764 | 0.1764 | - |
| Augmented | Vanilla neural network | 0.4204 | 0.4177 | 0.4302 | 0.4299 | - |
| Augmented | kNN | 0.3973 | 0.4121 | 0.4023 | 0.4016 | - |
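The scores in this table are Pearson correlations between predicted and ground-truth profiles. As a rough sketch of how such a score could be computed, the snippet below correlates each predicted profile with its ground-truth profile across genes and averages over test samples. The averaging scheme and the function name `mean_pearson` are assumptions; the source does not state exactly how the reported numbers are aggregated.

```python
import numpy as np


def mean_pearson(pred, truth):
    """Average per-sample Pearson correlation (hypothetical helper).

    pred, truth: arrays of shape (n_samples, n_genes).
    Each row is one gene expression profile; the correlation is taken
    across genes within a row, then averaged over rows.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)

    # Center each profile across its genes
    pred_c = pred - pred.mean(axis=1, keepdims=True)
    truth_c = truth - truth.mean(axis=1, keepdims=True)

    # Pearson r per row: covariance divided by the product of norms
    numerator = (pred_c * truth_c).sum(axis=1)
    denominator = np.sqrt(
        (pred_c ** 2).sum(axis=1) * (truth_c ** 2).sum(axis=1)
    )
    return float(np.mean(numerator / denominator))


if __name__ == "__main__":
    truth = np.array([[1.0, 2.0, 3.0, 5.0], [2.0, 4.0, 7.0, 1.0]])
    noisy = truth + np.array([[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.4, 0.2]])
    print(mean_pearson(noisy, truth))
```

A profile with zero variance across genes would produce a division by zero here; a production version would need to handle that case explicitly.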
Performance (Pearson correlation) on the testing set of TT-WOPT and of DeepCE and its simpler variants, trained on different training sets
| Training sets | Models | Performances |
|---|---|---|
| High-quality | TT-WOPT | 0.0133 |
| High-quality | DeepCE w/o interaction component | 0.4418 |
| High-quality | DeepCE w/o chemical substructure-gene attention | 0.4620 |
| High-quality | DeepCE w/o gene-gene attention | 0.4477 |
| High-quality | DeepCE | 0.4907 |
| Augmented | DeepCE | 0.5014 |
Figure 3. Performance of DeepCE, vanilla neural network, and kNN with different distances among chemicals in the training and testing sets
Figure 4. Pearson correlation scores of vanilla neural network and kNN trained on training sets generated by filtering unreliable experiments with different APC thresholds
Figure 5. Improvement of predicted profiles over original profiles in AUC
The chemical structures, status, and known uses of potential drugs for COVID-19 treatment (i.e., drugs that appear in the top 100 drugs for all 8 cell lines when their cell-specific predicted gene expression profiles are compared with the patient profile by Spearman's correlation)
| Drug | Structure | Status | Known Uses |
|---|---|---|---|
| Elbasvir | | Approved | Hepatitis C, NS5A inhibitor |
| Pibrentasvir | | Approved | Hepatitis C, NS5A inhibitor |
| Velpatasvir | | Approved | Hepatitis C, NS5A inhibitor |
| Ruzasvir | | Investigational | Hepatitis C, NS5A inhibitor |
| Samatasvir | | Investigational | Hepatitis C, NS5A inhibitor |
| Odalasvir | | Investigational | Hepatitis C, NS5A inhibitor |
| Coblopasvir | | Investigational | Hepatitis C, NS5A inhibitor |
| Baloxavir Marboxil | | Approved | Influenza A and B |
| Metocurine | | Approved | Muscle relaxant |
| Dactinomycin | | Approved | Cancer |
| Laniquidar | | Investigational | Cancer, P-glycoprotein inhibitor |
| Tadalafil | | Approved | Erectile dysfunction, PDE5 inhibitor |
| GE-2270A | | Experimental | Antibiotic |
| SD146 | | Experimental | Binds HIV-1 protease |
| AMG-487 | | Experimental | CXCR3 antagonist |
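The selection rule in the caption (drugs in the top 100 for every cell line, ranked by Spearman's correlation against the patient profile) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the dictionary layout of `predicted`, and the `most_negative` flag are hypothetical, and the caption does not say whether the top of the ranking means the most negative or the most positive correlation. The rank computation also ignores ties, which is reasonable only for continuous-valued profiles.

```python
import numpy as np


def ranks(x):
    """Ordinal ranks along the last axis (ties are NOT averaged)."""
    return np.argsort(np.argsort(x, axis=-1), axis=-1).astype(float)


def spearman_to_reference(profiles, reference):
    """Spearman correlation of each row of `profiles` with `reference`.

    profiles:  (n_drugs, n_genes) predicted profiles for one cell line
    reference: (n_genes,) patient profile
    """
    pr = ranks(np.asarray(profiles, dtype=float))
    rr = ranks(np.asarray(reference, dtype=float))
    pr -= pr.mean(axis=1, keepdims=True)
    rr -= rr.mean()
    return (pr @ rr) / np.sqrt((pr ** 2).sum(axis=1) * (rr ** 2).sum())


def consensus_hits(predicted, patient, top_k=100, most_negative=True):
    """Indices of drugs that are in the top_k list of every cell line.

    predicted: dict mapping cell line name -> (n_drugs, n_genes) array,
               with drugs in the same row order for every cell line
               (assumed layout).
    most_negative: assumption; if True, rank by most negative correlation
                   (profile reversal), otherwise by most positive.
    """
    hits = None
    for profiles in predicted.values():
        scores = spearman_to_reference(profiles, patient)
        order = np.argsort(scores) if most_negative else np.argsort(-scores)
        top = set(order[:top_k].tolist())
        hits = top if hits is None else hits & top
    return sorted(hits) if hits is not None else []


if __name__ == "__main__":
    patient = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    predicted = {
        "cell_A": np.array([[5, 4, 3, 2, 1, 0],
                            [0, 1, 2, 3, 4, 5],
                            [0, 2, 1, 3, 5, 4]], dtype=float),
        "cell_B": np.array([[10, 8, 6, 4, 2, 0],
                            [1, 0, 2, 3, 4, 5],
                            [0, 1, 2, 3, 5, 4]], dtype=float),
    }
    print(consensus_hits(predicted, patient, top_k=1))
```

Intersecting the per-cell-line lists keeps only drugs that rank highly in every cell line.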