Abstract
BACKGROUND: Although in silico drug discovery is necessary for drug development, the two major strategies, the structure-based and ligand-based approaches, have not been completely successful. A third approach, which infers drug candidates from gene expression profiles of cells treated with the compounds under study, currently requires a training dataset. Here, the aim was to develop a new approach that requires no pre-existing knowledge of drug-protein interactions; instead, these interactions are inferred by integrating gene expression profiles of cells treated with the analysed compounds with existing data describing gene-gene interactions.
Keywords: Feature extraction; Gene expression; Tensor decomposition
Year: 2019 PMID: 30717646 PMCID: PMC7394334 DOI: 10.1186/s12859-018-2395-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 A schematic that illustrates how drug candidates and target genes are identified. Gene expression profiles retrieved from LINCS are processed by TD-based unsupervised FE. Then, ‘inferred genes’ and ‘inferred compounds’ are identified as being associated with dose dependence (this approach is detailed in Fig. 2). Then, ‘inferred genes’ are compared with a single-gene perturbation in Enrichr. Next, ‘target proteins’ are identified (this part is detailed in Fig. 3)
Fig. 2 An overview of the analysis using TD-based unsupervised FE. Top left: the gene expression tensor x, with a dose dependence mode (i), a compound mode (j), and a gene mode (ℓ). Top right: using TD, x was decomposed into the tensor product of a core tensor, a dose dependence matrix, a compound matrix, and a gene matrix. Bottom right: because the second component of the dose dependence mode shows linear dose dependence (Additional file 2), and the cumulative contribution of the core matrix up to the sixth component exceeds 95% of the total contribution, this core matrix is considered for FE. Bottom left: outlier compounds (the ‘inferred compounds’ in Table 1) and outlier genes (the ‘inferred genes’ in Table 1) are identified within the correspondingly restricted compound and gene spaces, respectively
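The workflow in Fig. 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the decomposition is taken to be a higher-order SVD (the caption only says "TD"), outliers are scored with a Gaussian null on a single singular vector followed by Benjamini-Hochberg adjustment, and the tensor is random synthetic data rather than LINCS profiles. All function names and the 0.01 threshold are hypothetical.

```python
import math
import numpy as np


def unfold(tensor, mode):
    """Matricise `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor):
    """Return (core, factors); factors[m] holds the singular vectors of mode m."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, u in enumerate(factors):
        # Project mode m onto its singular vectors (mode-m product with u.T).
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors


def outlier_pvalues(vector):
    """Two-sided Gaussian tail probability of each entry scaled by the vector's SD."""
    z = np.asarray(vector, dtype=float) / np.std(vector)
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(monotone, 1.0)
    return adjusted


# Synthetic stand-in for a dose x compound x gene tensor (not LINCS data).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 10, 50))
core, (dose_u, compound_u, gene_u) = hosvd(x)

# Share of the squared core entries carried by the second dose component.
contribution = core[1] ** 2 / np.sum(core[1] ** 2)

# Genes whose entry in one gene singular vector is an outlier (assumed cut-off).
adjusted = benjamini_hochberg(outlier_pvalues(gene_u[:, 0]))
selected_genes = np.flatnonzero(adjusted < 0.01)
```

The same outlier test applied to a column of `compound_u` would give the compound-side selection; which components to inspect is decided from the dose-dependence singular vectors and the core contributions, as the caption describes.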
Table 1 The numbers of inferred compounds and inferred genes associated with significant dose-dependent activity
| Cell lines | BT20 | HS578T | MCF10A | MCF7 | MDAMB231 | SKBR3 |
|---|---|---|---|---|---|---|
| Tumour | Breast | Breast | Breast | Breast | Breast | Breast |
| Inferred genes | 41 | 57 | 42 | 55 | 41 | 46 |
| Inferred compounds | 4 | 3 | 2 | 6 | 5 | 6 |
| All compounds | 110 | 106 | 106 | 108 | 108 | 106 |
| Predicted targets | 418 | 576 | 476 | 480 | 560 | 423 |
| Cell lines | A549 | HCC515 | HA1E | HEPG2 | HT29 | PC3 |
| Tumour | Lung | Lung | Kidney | Liver | Colon | Prostate |
| Inferred genes | 45 | 46 | 48 | 54 | 50 | 63 |
| Inferred compounds | 8 | 5 | 7 | 2 | 2 | 9 |
| All compounds | 265 | 270 | 262 | 269 | 270 | 270 |
| Predicted targets | 428 | 352 | 423 | 396 | 358 | 439 |
| Cell lines | A375 | |||||
| Tumour | Melanoma | |||||
| Inferred genes | 43 | |||||
| Inferred compounds | 6 | |||||
| All compounds | 269 | |||||
| Predicted targets | 421 |
The target genes predicted by means of the comparison with the data showing upregulation of the expression of individual genes (‘predicted targets’) are also shown. The full list of inferred genes and predicted targets is available in Additional file 7. Inferred compounds are presented in Table 2. ‘All compounds’ rows represent the total number of compounds used for the treatment of each cell line
Fig. 3 How to infer target proteins. By means of TD-based unsupervised FE, a set of genes whose expression levels change following the activity of specific compounds can be inferred (‘inferred compounds’ and ‘inferred genes’ in Table 1), but a compound’s target genes (blue rectangle, ‘predicted targets’ in Table 1) cannot. Nonetheless, the inferred gene sets can be compared with the single-gene perturbation gene sets in Enrichr’s ‘Single Gene Perturbations from GEO up’ category, enabling identification of the compound’s target genes
Table 2 Compound–gene interactions presented in Table 1 that significantly overlap with interactions described in two datasets
| Compounds | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dabrafenib | ○ | ||||||||||||
| ○ | |||||||||||||
| Dinaciclib | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||||||
| ○ | ○ | ○ | ○ | ○ | ○ | ○ | |||||||
| CGP-60474 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |||||
| × | × | × | × | × | × | × | ○ | ||||||
| LDN-193189 | ○ | ○ | ○ | ||||||||||
| ○ | ○ | ○ | |||||||||||
| OTSSP167 | − | − | − | − | − | ||||||||
| ○ | ○ | ○ | ○ | ○ | |||||||||
| WZ-3105 | − | − | − | − | − | − | − | − | |||||
| ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||||||
| AT-7519 | ○ | ○ | ○ | ○ | ○ | ||||||||
| ○ | ○ | ○ | ○ | ○ | |||||||||
| BMS-387032 | ○ | ○ | ○ | ○ | |||||||||
| ○ | ○ | ○ | ○ | ||||||||||
| JNK-9L | ○ | ||||||||||||
| ○ | |||||||||||||
| Alvocidib | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||||||
| − | − | − | − | − | − | − | |||||||
| GSK-2126458 | − | − | |||||||||||
| − | − | ||||||||||||
| NVP-BEZ235 | ○ | ○ | |||||||||||
| × | × | ||||||||||||
| Torin-2 | × | × | |||||||||||
| ○ | ○ | ||||||||||||
| NVP-BGT226 | − | − | − | − | |||||||||
| − | − | − | − | ||||||||||
| QL-XII-47 | − | ||||||||||||
| − | |||||||||||||
| Celastrol | ○ | ||||||||||||
| − | |||||||||||||
| A443654 | ○ | ○ | |||||||||||
| ○ | ○ | ||||||||||||
| NVP-AUY922 | × | ○ | |||||||||||
| − | − | ||||||||||||
| Radicicol | ○ | ||||||||||||
| − |
For each compound, the upper row shows the comparison with the drug2gene.com dataset [69] and the lower row shows the comparison with the DSigDB dataset [70]. Columns represent the cell lines used in the analysis: (1) BT20, (2) HS578T, (3) MCF10A, (4) MCF7, (5) MDAMB231, (6) SKBR3, (7) A549, (8) HCC515, (9) HA1E, (10) HEPG2, (11) HT29, (12) PC3, (13) A375. ○: a significant overlap between the datasets (P<0.05); ×: no significant overlap between the datasets; −: no data; blank: no significant dose–response relation was identified. The confusion matrix and a full list of commonly selected genes are available in Additional file 3
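A significance call such as the ○/× marks above can be illustrated with a small overlap test. The legend does not name the statistical test, so the sketch below assumes a one-sided Fisher's exact test (a hypergeometric tail) on the 2×2 confusion matrix; the function names, the gene identifiers, and the universe size in the usage lines are hypothetical.

```python
from math import comb


def confusion_matrix(predicted, known, universe_size):
    """Return (both, predicted_only, known_only, neither) gene counts."""
    predicted, known = set(predicted), set(known)
    both = len(predicted & known)
    predicted_only = len(predicted) - both
    known_only = len(known) - both
    neither = universe_size - both - predicted_only - known_only
    return both, predicted_only, known_only, neither


def overlap_p_value(predicted, known, universe_size):
    """P(overlap >= observed) if the predicted genes were drawn at random."""
    predicted, known = set(predicted), set(known)
    observed = len(predicted & known)
    n_pred, n_known = len(predicted), len(known)
    # Hypergeometric upper tail: sum over all overlaps at least as large.
    tail = sum(
        comb(n_known, i) * comb(universe_size - n_known, n_pred - i)
        for i in range(observed, min(n_pred, n_known) + 1)
    )
    return tail / comb(universe_size, n_pred)


# Hypothetical example: 2 predicted targets, 3 known targets, 10 genes in total.
p = overlap_p_value({"g1", "g2"}, {"g1", "g2", "g3"}, universe_size=10)
is_significant = p < 0.05
```

Both gene sets must be restricted to the same background universe before the test, otherwise the P-value is not meaningful.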
Genes identified as being targeted by compounds shown to have a dose-dependent activity
| Genes | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDK5RAP1 | ○ | ○ | ○ | ○ | |||||||||
| CDK9 | ○ | ○ | |||||||||||
| CDK4 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| CDKN1B | ○ | ○ | ○ | ○ | ○ | ||||||||
| CDK19 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ||
| CDKN1A | ○ | ○ | ○ | ○ | ○ | ||||||||
| CDK8 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |
| BRD4 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |||
| HSP90B1 | ○ |
Labels (1) to (13) represent the cell lines described in Table 2
A significant overlap demonstrated between compound–target interactions presented in Table 1 and drug2gene.com.
| Compounds | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dinaciclib | ○ | ○ | ○ | ||||||||||
| CGP-60474 | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | |||||
| LDN-193189 | ○ | ||||||||||||
| AT-7519 | ○ | ○ | ○ | ○ | ○ | ||||||||
| BMS-387032 | ○ | ○ | ○ | ○ | |||||||||
| Alvocidib | ○ | ○ | ○ | ○ | ○ | ○ | |||||||
| NVP-BEZ235 | ○ | ||||||||||||
| Celastrol | ○ | ||||||||||||
| A443654 | ○ | ○ | |||||||||||
| NVP-AUY922 | ○ | ○ | |||||||||||
| Radicicol | ○ |
In this case, the ‘PPI Hub Proteins’ category in Enrichr was used. Labels (1) to (13) represent the same cell lines as described in Table 2. The full list of confusion matrices and commonly selected genes is available in Additional file 3
Categories associated with adjusted P-values less than 10⁻⁴ among 100 trials
| Enrichr Categories | Adjusted P-values |
|---|---|
| KEA_2013 | 1.56×10⁻⁵, 1.42×10⁻⁵, 1.38×10⁻⁵, 2.12×10⁻⁵ |
| KEA_2015 | 1.42×10⁻⁵, 1.38×10⁻⁵ |
| LINCS_L1000_Chem_Pert_down | 9.46×10⁻⁶, 1.37×10⁻⁵ |
| LINCS_L1000_Chem_Pert_up | 3.49×10⁻⁷, 3.28×10⁻⁷ |
| WikiPathways_2013 | 4.80×10⁻⁵ |
| WikiPathways_2015 | 3.31×10⁻⁵, 1.30×10⁻⁵ |
| WikiPathways_2016 | 1.30×10⁻⁵ |
| GO_Biological_Process_2013 | 1.68×10⁻⁵ |
| GO_Biological_Process_2017 | 5.35×10⁻⁷ |
| GO_Biological_Process_2017b | 5.89×10⁻⁶ |
| GeneSigDB | 1.16×10⁻⁵ |
| BioCarta_2015 | 9.36×10⁻⁶ |
| BioCarta_2016 | 9.36×10⁻⁶ |
‘Enrichr Libraries Most Popular Genes’ were selected when 50 genes randomly selected from the total of 978 genes analysed in LINCS were uploaded to Enrichr
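The random control described in this note (50 genes drawn from the 978 LINCS genes, repeated 100 times) can be sketched as follows. Only the sampling step is shown; submitting each set to Enrichr is omitted, and the function name, the seed, and the placeholder gene identifiers are assumptions made for illustration.

```python
import random


def random_gene_sets(genes, set_size=50, trials=100, seed=0):
    """Draw `trials` gene sets of `set_size` genes each, without replacement."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    return [rng.sample(genes, set_size) for _ in range(trials)]


# Placeholder identifiers standing in for the 978 genes analysed in LINCS.
all_genes = [f"GENE{i}" for i in range(978)]
control_sets = random_gene_sets(all_genes)
```

Each element of `control_sets` would then be tested in the same way as a real inferred gene set, so that any category reaching a small adjusted P-value by chance can be recognised.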
Fig. 4 A boxplot of ranks of TFs inferred by Enrichr. The numbers are median ranks. TD: TD-based unsupervised FE, DeltaNet: Noh and Gunawan, SSEM: sparse simultaneous equation model, Z-score: Z-score–based ranking. The full list is available in Additional file 6
The numbers of target proteins of individual compounds included in four databases
| Compounds | DrugBank | BindingDB | drug2gene.com | DSigDB |
|---|---|---|---|---|
| Dabrafenib | 5 | 4 | 15 | 125 |
| Dinaciclib | — | 5 | 67 | 40 |
| CGP-60474 | — | 8 | 49 | 16 |
| LDN-193189 | — | 17 | 12 | 19 |
| OTSSP167 | — | — | — | 237 |
| WZ-3105 | — | — | — | 36 |
| AT-7519 | 2 | 8 | 388 | 30 |
| BMS-387032 | — | 3 | 392 | 37 |
| JNK-9L | — | 3 | 16 | 64 |
| Alvocidib | 12 | 31 | 495 | — |
| GSK-2126458 | — | 5 | — | — |
| NVP-BEZ235 | — | 7 | 76 | 6 |
| Torin-2 | — | 10 | 15 | 15 |
| NVP-BGT226 | — | — | — | — |
| QL-XII-47 | — | — | — | — |
| Celastrol | — | 6 | — | 89 |
| A443654 | — | 3 | 177 | 104 |
| NVP-AUY922 | — | 3 | 5 | — |
| Radicicol | 5 | 9 | — | 136 |