Benoît Baillif1, Joerg Wichard2, Oscar Méndez-Lucio1,3, David Rouquié1.
Abstract
Pharmaceutical or phytopharmaceutical molecules rely on the interaction with one or more specific molecular targets to induce their anticipated biological responses. Nonetheless, these compounds are also prone to interact with many other non-intended biological targets, also known as off-targets. Unfortunately, off-target identification is difficult and expensive. Consequently, QSAR models predicting the activity on a target have gained importance in drug discovery and in the de-risking of chemicals. However, only a limited number of targets are well characterized and have enough data to build such in silico models. A good alternative to individual target evaluations is to use integrative evaluations such as transcriptomics obtained from compound-induced gene expression measurements derived from cell cultures. The advantage of these experiments is that they capture the consequences of the interaction of compounds with many possible molecular targets and biological pathways, without any constraints concerning the chemical space. In this work, we assessed the value of a large public dataset of compound-induced transcriptomic data for predicting compound activity on a selection of 69 molecular targets. We compared such descriptors with other QSAR descriptors, namely Morgan fingerprints (similar to extended-connectivity fingerprints). Depending on the target, active compounds could show similar signatures in one or multiple cell lines, whether these active compounds shared similar or different chemical structures. Random forest models using gene expression signatures performed as well as or better than counterpart models built with Morgan fingerprints for 25% of the target prediction tasks. This performance was observed mostly with signatures produced in cell lines in which compounds active toward the considered target showed similar signatures.
We show that compound-induced transcriptomic data could represent a great opportunity for target prediction, making it possible to overcome the chemical space limitation of QSAR models.
Keywords: QSAR; cellular context; compound-induced transcriptomic data; machine learning; target prediction
Year: 2020 PMID: 32391323 PMCID: PMC7191531 DOI: 10.3389/fchem.2020.00296
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
Figure 1. Data analysis pipeline used in the current work. Starting from the CMAP L1000 dataset, signatures produced at 10 μM and 24 h from 8 cell lines were extracted and used in t-SNE and distance plots. One dataset was built per cell line (GES and corresponding compound structure), and each of these datasets was restricted to compounds having known annotations (active or inactive) for the evaluated target. For each target–cell line dataset, a first model was built using the gene expression signatures (GES model). Alongside, a second counterpart model was built using the Morgan fingerprints of the compounds whose signatures were used in the first model (Morgan FP model).
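The dataset-restriction step described in this caption can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `build_paired_dataset`, the dictionary layout, and the toy data are assumptions, and the cutoff of 20 actives is taken from the footnote of the balanced-accuracy table. The point is that the GES model and its Morgan FP counterpart are built on exactly the same compounds.

```python
from typing import Dict, List, Optional, Tuple

MIN_ACTIVES = 20  # datasets with fewer than 20 actives are not modeled (table footnote)


def build_paired_dataset(
    signatures: Dict[str, List[float]],  # compound id -> GES in one cell line
    fingerprints: Dict[str, List[int]],  # compound id -> Morgan fingerprint bits
    labels: Dict[str, int],              # compound id -> 1 (active) or 0 (inactive) for one target
) -> Optional[Tuple[List[List[float]], List[List[int]], List[int]]]:
    """Return (GES rows, fingerprint rows, labels) for the same compounds, or None."""
    # Keep only compounds that have a signature, a structure, and an annotation.
    shared = sorted(c for c in signatures if c in labels and c in fingerprints)
    y = [labels[c] for c in shared]
    if sum(y) < MIN_ACTIVES:
        return None  # too few actives for this target and cell line
    x_ges = [signatures[c] for c in shared]
    x_fp = [fingerprints[c] for c in shared]
    return x_ges, x_fp, y


# Toy data (hypothetical): 40 profiled compounds, 30 of them annotated.
sigs = {f"cpd{i}": [float(i), float(-i)] for i in range(40)}
fps = {f"cpd{i}": [i % 2, 1] for i in range(40)}
labs = {f"cpd{i}": 1 if i < 25 else 0 for i in range(30)}

paired = build_paired_dataset(sigs, fps, labs)
few_actives = build_paired_dataset(
    sigs, fps, {f"cpd{i}": 1 if i < 5 else 0 for i in range(30)}
)
```

Both feature matrices returned by the function share row order with the label list, so either one can be passed to a classifier together with the same labels.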
The 8 core cell lines used in this work, with the corresponding number of GESs for compounds with known structure tested at 10 μM/24 h.
| Cell line | Primary site | Subtype | Number of GESs |
| A375 | Skin | Malignant melanoma | 3,525 |
| A549 | Lung | Non-small cell lung cancer, carcinoma | 5,267 |
| HA1E | Kidney | Normal kidney | 3,646 |
| HCC515 | Lung | Carcinoma | 1,932 |
| HT29 | Large intestine | Colorectal adenocarcinoma | 3,192 |
| MCF7 | Breast | Adenocarcinoma | 7,546 |
| PC3 | Prostate | Adenocarcinoma | 8,071 |
| VCAP | Prostate | Carcinoma | 6,365 |
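Molecular targets used in this work, with the number of inactive and active compounds in total and in each cell line GES dataset.
| Target | Inactives (total) | Actives (total) | Name [HGNC ID] | A375 inactives | A375 actives | A549 inactives | A549 actives | HA1E inactives | HA1E actives | HCC515 inactives | HCC515 actives | HT29 inactives | HT29 actives | MCF7 inactives | MCF7 actives | PC3 inactives | PC3 actives | VCAP inactives | VCAP actives |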
| ABCB1 | 801 | 96 | ATP binding cassette subfamily B member 1 [HGNC:40] | 331 | 42 | 545 | 60 | 424 | 50 | 211 | 32 | 317 | 32 | 770 | 95 | 772 | 95 | 608 | 68 |
| ABHD5 | 2,458 | 57 | Abhydrolase domain containing 5 [HGNC:21396] | 908 | 22 | 1,533 | 47 | – | – | – | – | – | – | 2,030 | 57 | 2,045 | 57 | 1,730 | 52 |
| ALOX15 | 1,136 | 101 | Arachidonate 15-lipoxygenase [HGNC:433] | 508 | 27 | 764 | 92 | 808 | 52 | 496 | 33 | 498 | 27 | 1,062 | 97 | 1,066 | 96 | 804 | 91 |
| AR | 1,085 | 103 | Androgen receptor [HGNC:644] | 580 | 36 | 639 | 64 | 757 | 62 | 374 | 36 | 572 | 37 | 1,038 | 94 | 1,036 | 93 | 682 | 69 |
| ATAD5 | 2,213 | 97 | ATPase family, AAA domain containing 5 [HGNC:25752] | 1,007 | 43 | 1,466 | 70 | 1,288 | 60 | 638 | 41 | 922 | 42 | 2,087 | 90 | 2,090 | 88 | 1,629 | 72 |
| ATXN2 | 1,897 | 143 | Ataxin 2 [HGNC:10555] | 695 | 36 | 1,139 | 104 | 950 | 50 | 501 | 41 | 652 | 41 | 1,556 | 123 | 1,558 | 129 | 1,243 | 102 |
| BAZ2B | 1,252 | 143 | Bromodomain adjacent to zinc finger domain 2B [HGNC:963] | 516 | 63 | 873 | 101 | 653 | 76 | 319 | 37 | 470 | 56 | 1,199 | 135 | 1,197 | 136 | 974 | 112 |
| BRCA1 | 3,008 | 160 | BRCA1, DNA repair associated [HGNC:1100] | 1,117 | 67 | 1,945 | 116 | 1,519 | 63 | 800 | 33 | 1,014 | 50 | 2,537 | 148 | 2,549 | 150 | 2,134 | 131 |
| CBX1 | 1,999 | 80 | Chromobox 1 [HGNC:1551] | 899 | 41 | 1,412 | 64 | 1,120 | 61 | 532 | 38 | 809 | 40 | 1,899 | 75 | 1,904 | 76 | 1,576 | 67 |
| CHRM1 | 2,433 | 86 | Cholinergic receptor muscarinic 1 [HGNC:1950] | 906 | 49 | 1,544 | 62 | 1,055 | 59 | 460 | 39 | 791 | 48 | 2,036 | 84 | 2,051 | 83 | 1,739 | 65 |
| CHRM4 | 2,476 | 70 | Cholinergic receptor muscarinic 4 [HGNC:1953] | 908 | 45 | 1,552 | 54 | 1,057 | 55 | 460 | 40 | 793 | 45 | 2,049 | 68 | 2,064 | 67 | 1,751 | 54 |
| CHRM5 | 2,478 | 62 | Cholinergic receptor muscarinic 5 [HGNC:1954] | 908 | 41 | 1,553 | 47 | 1,057 | 49 | 461 | 36 | 793 | 39 | 2,050 | 62 | 2,065 | 61 | 1,751 | 50 |
| CYP1A2 | 307 | 526 | Cytochrome P450 family 1 subfamily A member 2 [HGNC:2596] | 145 | 265 | 183 | 399 | 219 | 428 | 136 | 277 | 144 | 261 | 299 | 505 | 294 | 505 | 183 | 410 |
| CYP2C19 | 717 | 276 | Cytochrome P450 family 2 subfamily C member 19 [HGNC:2621] | 329 | 151 | 514 | 213 | 486 | 228 | 289 | 138 | 324 | 148 | 688 | 271 | 684 | 272 | 524 | 225 |
| CYP2C9 | 708 | 270 | Cytochrome P450 family 2 subfamily C member 9 [HGNC:2623] | 310 | 157 | 510 | 197 | 476 | 222 | 285 | 126 | 305 | 154 | 679 | 264 | 674 | 263 | 517 | 207 |
| CYP3A4 | 1,153 | 164 | Cytochrome P450 family 3 subfamily A member 4 [HGNC:2637] | 472 | 113 | 780 | 104 | 847 | 133 | 561 | 64 | 467 | 110 | 1,070 | 160 | 1,069 | 161 | 802 | 118 |
| DRD1 | 1,843 | 99 | Dopamine receptor D1 [HGNC:3020] | 807 | 54 | 1,295 | 71 | 1,028 | 78 | 526 | 55 | 725 | 54 | 1,762 | 91 | 1,762 | 91 | 1,450 | 71 |
| DRD2 | 2,262 | 95 | Dopamine receptor D2 [HGNC:3023] | 769 | 58 | 1,371 | 73 | 956 | 84 | 474 | 55 | 683 | 57 | 1,858 | 93 | 1,873 | 93 | 1,541 | 74 |
| DRD3 | 2,446 | 142 | Dopamine receptor D3 [HGNC:3024] | 877 | 76 | 1,432 | 110 | 1,129 | 114 | 551 | 78 | 823 | 75 | 2,004 | 139 | 2,017 | 139 | 1,569 | 111 |
| EPAS1 | 2,443 | 70 | Endothelial PAS domain protein 1 [HGNC:3374] | – | – | 1,524 | 52 | – | – | – | – | – | – | 2,021 | 64 | 2,033 | 67 | 1,723 | 57 |
| FEN1 | 2,100 | 53 | Flap structure-specific endonuclease 1 [HGNC:3650] | 961 | 23 | 1,496 | 29 | 1,213 | 28 | – | – | 866 | 21 | 1,990 | 46 | 1,999 | 46 | 1,669 | 32 |
| GFER | 1,589 | 89 | Growth factor, augmenter of liver regeneration [HGNC:4236] | 679 | 37 | 1,153 | 59 | 813 | 46 | 363 | 21 | 600 | 29 | 1,519 | 81 | 1,519 | 81 | 1,294 | 70 |
| GLS | 2,989 | 66 | Glutaminase [HGNC:4331] | 1,240 | 22 | 1,878 | 46 | 1,515 | 31 | – | – | – | – | 2,560 | 58 | 2,574 | 59 | 2,072 | 53 |
| GMNN | 2,079 | 161 | Geminin, DNA replication inhibitor [HGNC:17493] | 969 | 67 | 1,392 | 121 | 1,224 | 72 | 569 | 43 | 884 | 63 | 1,972 | 146 | 1,974 | 153 | 1,552 | 128 |
| HPGD | 1,464 | 92 | 15-Hydroxyprostaglandin dehydrogenase [HGNC:5154] | 575 | 38 | 1,000 | 74 | 962 | 62 | 618 | 39 | 567 | 37 | 1,363 | 89 | 1,363 | 86 | 1,060 | 71 |
| HSD17B10 | 1,211 | 107 | Hydroxysteroid 17-beta dehydrogenase 10 [HGNC:4800] | 516 | 48 | 827 | 84 | 858 | 81 | 548 | 47 | 506 | 48 | 1,134 | 99 | 1,135 | 95 | 866 | 81 |
| HSP90AA1 | 666 | 56 | Heat shock protein 90 alpha family class A member 1 [HGNC:5253] | 295 | 25 | 453 | 39 | 419 | 38 | – | – | 288 | 25 | 640 | 50 | 637 | 50 | 502 | 40 |
| HSPB1 | 876 | 76 | Heat shock protein family B (small) member 1 [HGNC:5246] | 461 | 40 | 522 | 45 | 600 | 53 | 304 | 26 | 454 | 41 | 837 | 72 | 837 | 71 | 558 | 47 |
| HTR1A | 412 | 60 | 5-Hydroxytryptamine receptor 1A [HGNC:5286] | 186 | 34 | 279 | 49 | 232 | 55 | 122 | 37 | 180 | 34 | 401 | 58 | 400 | 58 | 315 | 50 |
| IL1B | 1,773 | 206 | Interleukin 1 beta [HGNC:5992] | 589 | 54 | 1,005 | 165 | 768 | 78 | 382 | 54 | 541 | 54 | 1,385 | 190 | 1,391 | 196 | 1,122 | 163 |
| JAK2 | 895 | 80 | Janus kinase 2 [HGNC:6192] | 378 | 39 | 663 | 58 | 478 | 43 | 248 | 29 | 364 | 40 | 867 | 71 | 867 | 74 | 723 | 55 |
| JUN | 842 | 97 | Jun proto-oncogene, AP-1 transcription factor subunit [HGNC:6204] | 442 | 49 | 491 | 67 | 570 | 72 | 279 | 41 | 435 | 49 | 801 | 91 | 799 | 91 | 523 | 70 |
| KCNH2 | 363 | 190 | Potassium voltage-gated channel subfamily H member 2 [HGNC:6251] | 174 | 119 | 212 | 136 | 250 | 161 | 128 | 104 | 173 | 119 | 331 | 183 | 329 | 184 | 228 | 139 |
| KDM4A | 1,607 | 192 | Lysine demethylase 4A [HGNC:22978] | 693 | 76 | 1,130 | 125 | 834 | 87 | 379 | 42 | 603 | 69 | 1,529 | 173 | 1,536 | 175 | 1,286 | 140 |
| KDM4E | 1,389 | 124 | Lysine demethylase 4E [HGNC:37098] | 543 | 43 | 999 | 88 | 880 | 70 | 547 | 40 | 530 | 42 | 1,320 | 109 | 1,321 | 110 | 1,057 | 95 |
| MITF | 3,626 | 132 | Melanogenesis associated transcription factor [HGNC:7105] | 1,170 | 42 | 2,238 | 91 | 1,562 | 51 | 858 | 42 | 1,083 | 37 | 2,832 | 116 | 2,871 | 120 | 2,460 | 93 |
| MLLT3 | 14,566 | 101 | MLLT3, super elongation complex subunit [HGNC:7136] | – | – | 2,244 | 26 | – | – | – | – | – | – | 3,002 | 33 | 3,461 | 50 | 3,095 | 46 |
| MPHOSPH8 | 506 | 52 | M-Phase phosphoprotein 8 [HGNC:29810] | 278 | 21 | 365 | 39 | 403 | 41 | 253 | 29 | 278 | 21 | 490 | 48 | 485 | 50 | 382 | 43 |
| MYC | 2,069 | 121 | MYC proto-oncogene, bHLH transcription factor [HGNC:7553] | – | – | 1,067 | 113 | – | – | – | – | – | – | 1,230 | 114 | 1,249 | 117 | 1,151 | 115 |
| NFE2L2 | 2,850 | 226 | Nuclear factor, erythroid 2 like 2 [HGNC:7782] | 1,142 | 94 | 1,816 | 148 | 1,355 | 153 | 620 | 83 | 1,013 | 95 | 2,425 | 204 | 2,439 | 204 | 2,023 | 152 |
| NFKB1 | 2,875 | 107 | Nuclear factor kappa B subunit 1 [HGNC:7794] | 730 | 23 | 1,608 | 91 | 1,237 | 37 | 814 | 29 | 716 | 22 | 1,978 | 100 | 2,000 | 101 | 1,742 | 94 |
| NOD1 | 1,056 | 51 | Nucleotide binding oligomerization domain containing 1 [HGNC:16390] | – | – | 754 | 43 | – | – | – | – | 409 | 21 | 1,010 | 47 | 1,010 | 49 | 844 | 40 |
| NOD2 | 2,578 | 59 | Nucleotide binding oligomerization domain containing 2 [HGNC:5331] | 952 | 23 | 1,629 | 53 | 1,124 | 21 | – | – | 836 | 23 | 2,152 | 57 | 2,165 | 59 | 1,837 | 48 |
| NPSR1 | 1,007 | 55 | Neuropeptide S receptor 1 [HGNC:23631] | – | – | 712 | 44 | 554 | 21 | – | – | – | – | 959 | 52 | 956 | 54 | 777 | 50 |
| NR3C1 | 925 | 54 | Nuclear receptor subfamily 3 group C member 1 [HGNC:7978] | – | – | 636 | 38 | 692 | 34 | 451 | 25 | – | – | 896 | 54 | 890 | 54 | 659 | 47 |
| NR5A1 | 419 | 69 | Nuclear receptor subfamily 5 group A member 1 [HGNC:7983] | 190 | 22 | 285 | 50 | 239 | 29 | – | – | – | – | 408 | 60 | 407 | 64 | 322 | 50 |
| OPRK1 | 1,122 | 51 | Opioid receptor kappa 1 [HGNC:8154] | 455 | 29 | 805 | 41 | 564 | 34 | – | – | 428 | 29 | 1,068 | 51 | 1,070 | 50 | 895 | 41 |
| PIP4K2A | 1,898 | 88 | Phosphatidylinositol-5-phosphate 4-kinase type 2 alpha [HGNC:8997] | 609 | 31 | 1,082 | 66 | 812 | 34 | – | – | 566 | 23 | 1,486 | 84 | 1,501 | 83 | 1,206 | 72 |
| PLA2G7 | 1,907 | 57 | Phospholipase A2 group VII [HGNC:9040] | 828 | 27 | 1,367 | 37 | 995 | 37 | – | – | 737 | 28 | 1,825 | 52 | 1,826 | 51 | 1,534 | 37 |
| PLK1 | 1,935 | 108 | Polo like kinase 1 [HGNC:9077] | 631 | 45 | 1,125 | 82 | 836 | 39 | 431 | 23 | 590 | 36 | 1,531 | 100 | 1,542 | 102 | 1,241 | 88 |
| POLB | 1,166 | 53 | DNA polymerase beta [HGNC:9174] | 480 | 22 | 850 | 37 | 646 | 30 | – | – | 462 | 21 | 1,113 | 50 | 1,113 | 50 | 932 | 39 |
| POLH | 2,202 | 70 | DNA polymerase eta [HGNC:9181] | 858 | 26 | 1,342 | 44 | 1,015 | 37 | – | – | 749 | 23 | 1,827 | 61 | 1,838 | 64 | 1,532 | 48 |
| POLI | 1,726 | 79 | DNA polymerase iota [HGNC:9182] | 775 | 29 | 1,210 | 52 | 945 | 43 | 448 | 24 | 689 | 27 | 1,629 | 71 | 1,637 | 72 | 1,359 | 55 |
| POLK | 2,895 | 79 | DNA polymerase kappa [HGNC:9183] | 1,248 | 30 | 2,048 | 49 | 1,799 | 43 | 986 | 27 | 1,154 | 26 | 2,713 | 69 | 2,722 | 70 | 2,225 | 54 |
| PRMT1 | 2,886 | 80 | Protein arginine methyltransferase 1 [HGNC:5187] | 1,114 | 28 | 1,704 | 59 | 1,073 | 23 | – | – | – | – | 2,394 | 74 | 2,415 | 74 | 2,081 | 68 |
| RAD52 | 14,593 | 132 | RAD52 homolog, DNA repair protein [HGNC:9824] | – | – | 2,291 | 25 | – | – | – | – | – | – | 3,043 | 40 | 3,496 | 54 | 3,121 | 49 |
| SIRT5 | 14,103 | 141 | Sirtuin 5 [HGNC:14933] | – | – | 2,086 | 30 | – | – | – | – | – | – | 2,769 | 40 | 3,211 | 44 | 2,844 | 42 |
| SLC6A3 | 1,006 | 94 | Solute carrier family 6 member 3 [HGNC:11049] | 461 | 49 | 773 | 71 | 584 | 72 | 252 | 54 | 453 | 49 | 976 | 91 | 973 | 90 | 823 | 73 |
| SMN2 | 1,633 | 53 | Survival of motor neuron 2, centromeric [HGNC:11118] | – | – | 1,136 | 44 | 1,059 | 28 | – | – | – | – | 1,520 | 49 | 1,521 | 49 | 1,209 | 45 |
| STK33 | 3,358 | 423 | Serine/threonine kinase 33 [HGNC:14568] | 1,127 | 101 | 2,077 | 304 | 1,458 | 163 | 776 | 131 | 1,034 | 102 | 2,660 | 329 | 2,663 | 360 | 2,268 | 328 |
| TARDBP | 1,802 | 60 | TAR DNA binding protein [HGNC:11571] | – | – | 1,044 | 50 | 748 | 25 | – | – | – | – | 1,409 | 58 | 1,418 | 59 | 1,156 | 52 |
| TNFRSF10B | 2,429 | 80 | TNF receptor superfamily member 10b [HGNC:11905] | – | – | 1,510 | 66 | 1,056 | 26 | – | – | 786 | 27 | 2,008 | 73 | 2,021 | 75 | 1,718 | 58 |
| TP53 | 2,310 | 198 | Tumor protein p53 [HGNC:11998] | 974 | 97 | 1,554 | 137 | 1,380 | 130 | 737 | 84 | 891 | 98 | 2,172 | 179 | 2,174 | 181 | 1,714 | 137 |
| TSHR | 2,259 | 70 | Thyroid stimulating hormone receptor [HGNC:12373] | 968 | 25 | 1,579 | 58 | 1,317 | 43 | 727 | 35 | – | – | 2,133 | 68 | 2,131 | 67 | 1,739 | 64 |
| TUBB | 697 | 51 | Tubulin beta class I [HGNC:20778] | – | – | 503 | 32 | 373 | 32 | – | – | – | – | 692 | 48 | 693 | 49 | 563 | 32 |
| USP1 | 2,356 | 64 | Ubiquitin specific peptidase 1 [HGNC:12607] | 877 | 30 | 1,425 | 46 | 1,260 | 45 | 697 | 29 | 833 | 30 | 1,972 | 55 | 1,985 | 58 | 1,557 | 44 |
| VDR | 2,696 | 140 | Vitamin D receptor [HGNC:12679] | 1,161 | 44 | 1,901 | 101 | 1,673 | 80 | 915 | 50 | 1,076 | 43 | 2,530 | 127 | 2,536 | 128 | 2,059 | 107 |
| YES1 | 138 | 101 | YES proto-oncogene 1, Src family tyrosine kinase [HGNC:12841] | 106 | 89 | 99 | 86 | 117 | 89 | 77 | 76 | 106 | 89 | 132 | 97 | 131 | 99 | 90 | 67 |
Figure 2. Exploration of the 2D chemical space, along with the corresponding 2D biological space formed by all GESs. (A) t-SNE on Morgan fingerprints from the 9,035 compounds in the working dataset, representing the chemical space. Compounds with no known target are shown as gray points (n = 4,163). Compounds with at least one known target are shown in blue (n = 4,872), with darker blue indicating a larger number of targets. (B) t-SNE on all GESs in the working dataset, representing the biological (transcriptomic response) space. Points corresponding to GESs are colored by cell line. (C) Biological space highlighting only PC3 and VCAP signatures, 2 cell lines originating from prostate cancer. (D) Biological space highlighting only A549 and HCC515 signatures, 2 cell lines originating from lung cancer.
Figure 3. NR3C1 active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) t-SNE on all A549 signatures (A549 biological space); (C) t-SNE on all MCF7 signatures (MCF7 biological space); (D) t-SNE on all PC3 signatures (PC3 biological space). Points corresponding to NR3C1 actives (n = 54) are red and NR3C1 inactives (n = 925) are blue; gray points have no available label concerning NR3C1 activity. Orange circles point out clustering of active compounds.
Figure 4. TUBB active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) A549 biological space; (C) MCF7 biological space; (D) PC3 biological space. Points corresponding to TUBB actives (n = 51) are red and TUBB inactives (n = 697) are blue; gray points have no available label concerning TUBB activity. Orange circles point out clustering of active compounds.
Figure 5. DRD1 active and inactive compounds in the chemical space and in the different biological spaces formed by GESs produced in a single cell line. (A) Chemical space; (B) A549 biological space; (C) MCF7 biological space; (D) PC3 biological space. Points corresponding to DRD1 actives (n = 99) are red and DRD1 inactives (n = 1,843) are blue; gray points have no available label concerning DRD1 activity.
Mean balanced accuracies (BAs) of models (mean per condition).
| Target class | Target name | Symbol | A375 GES | A375 Morgan FP | A549 GES | A549 Morgan FP | HA1E GES | HA1E Morgan FP | HCC515 GES | HCC515 Morgan FP | HT29 GES | HT29 Morgan FP | MCF7 GES | MCF7 Morgan FP | PC3 GES | PC3 Morgan FP | VCAP GES | VCAP Morgan FP |
| Enzyme | 15-Hydroxyprostaglandin dehydrogenase | HPGD | 0.5 | 0.65 | 0.51 | 0.7 | 0.52 | 0.67 | 0.5 | 0.72 | 0.51 | 0.68 | 0.5 | 0.71 | 0.52 | 0.67 | 0.52 | 0.64 |
| Enzyme | Arachidonate 15-lipoxygenase | ALOX15 | 0.54 | 0.74 | 0.57 | 0.75 | 0.54 | 0.67 | 0.5 | 0.63 | 0.5 | 0.73 | 0.56 | 0.74 | 0.65 | 0.75 | 0.53 | 0.75 |
| Enzyme | ATP binding cassette B1 | ABCB1 | 0.54 | 0.62 | 0.51 | 0.61 | 0.5 | 0.54 | 0.5 | 0.5 | 0.51 | 0.5 | 0.56 | 0.62 | 0.59 | 0.63 | 0.51 | 0.63 |
| Enzyme | BRCA1, DNA repair associated | BRCA1 | 0.74 | 0.69 | 0.69 | 0.62 | 0.67 | 0.57 | 0.6 | 0.52 | 0.71 | 0.6 | 0.78 | 0.64 | 0.75 | 0.64 | 0.67 | 0.65 |
| Enzyme | Cytochrome P450 1A2 | CYP1A2 | 0.54 | 0.6 | 0.56 | 0.6 | 0.57 | 0.61 | 0.59 | 0.58 | 0.5 | 0.59 | 0.59 | 0.63 | 0.6 | 0.63 | 0.56 | 0.61 |
| Enzyme | Cytochrome P450 2C19 | CYP2C19 | 0.52 | 0.57 | 0.55 | 0.59 | 0.5 | 0.55 | 0.52 | 0.55 | 0.51 | 0.57 | 0.56 | 0.58 | 0.56 | 0.59 | 0.53 | 0.58 |
| Enzyme | Cytochrome P450 2C9 | CYP2C9 | 0.55 | 0.57 | 0.52 | 0.61 | 0.52 | 0.6 | 0.51 | 0.6 | 0.53 | 0.58 | 0.55 | 0.58 | 0.6 | 0.58 | 0.54 | 0.6 |
| Enzyme | Cytochrome P450 3A4 | CYP3A4 | 0.51 | 0.55 | 0.52 | 0.59 | 0.52 | 0.58 | 0.49 | 0.6 | 0.51 | 0.54 | 0.52 | 0.6 | 0.54 | 0.6 | 0.53 | 0.57 |
| Enzyme | DNA polymerase beta | POLB | 0.5 | 0.66 | 0.5 | 0.69 | 0.51 | 0.82 | – | – | 0.5 | 0.69 | 0.5 | 0.69 | 0.53 | 0.73 | 0.5 | 0.69 |
| Enzyme | DNA polymerase eta | POLH | 0.55 | 0.71 | 0.51 | 0.74 | 0.66 | 0.82 | – | – | 0.5 | 0.76 | 0.51 | 0.73 | 0.55 | 0.72 | 0.52 | 0.69 |
| Enzyme | DNA polymerase iota | POLI | 0.5 | 0.7 | 0.5 | 0.71 | 0.54 | 0.71 | 0.5 | 0.6 | 0.51 | 0.65 | 0.51 | 0.72 | 0.51 | 0.73 | 0.5 | 0.68 |
| Enzyme | DNA polymerase kappa | POLK | 0.58 | 0.77 | 0.51 | 0.78 | 0.56 | 0.83 | 0.52 | 0.79 | 0.5 | 0.76 | 0.54 | 0.81 | 0.56 | 0.83 | 0.5 | 0.83 |
| Enzyme | Flap structure-specific endonuclease 1 | FEN1 | 0.5 | 0.72 | 0.5 | 0.68 | 0.5 | 0.77 | – | – | 0.5 | 0.74 | 0.5 | 0.77 | 0.51 | 0.75 | 0.5 | 0.7 |
| Enzyme | Glutaminase | GLS | 0.5 | 0.64 | 0.51 | 0.64 | 0.53 | 0.58 | – | – | – | – | 0.51 | 0.74 | 0.51 | 0.69 | 0.51 | 0.7 |
| Enzyme | Growth factor, augmenter of liver regeneration | GFER | 0.51 | 0.72 | 0.5 | 0.72 | 0.59 | 0.74 | 0.5 | 0.71 | 0.5 | 0.72 | 0.52 | 0.72 | 0.56 | 0.71 | 0.5 | 0.7 |
| Enzyme | Hydroxysteroid 17-beta dehydrogenase 10 | HSD17B10 | 0.51 | 0.57 | 0.54 | 0.62 | 0.55 | 0.63 | 0.51 | 0.63 | 0.49 | 0.58 | 0.54 | 0.65 | 0.53 | 0.64 | 0.5 | 0.62 |
| Enzyme | Janus kinase 2 | JAK2 | 0.71 | 0.57 | 0.73 | 0.63 | 0.69 | 0.56 | 0.63 | 0.56 | 0.68 | 0.56 | 0.8 | 0.63 | 0.73 | 0.59 | 0.65 | 0.58 |
| Enzyme | MDM2 proto-oncogene | MDM2 | 0.77 | 0.69 | 0.71 | 0.61 | 0.7 | 0.62 | 0.76 | 0.53 | 0.67 | 0.59 | 0.83 | 0.65 | 0.81 | 0.64 | 0.76 | 0.66 |
| Enzyme | Phosphatidylinositol-5-phosphate 4-kinase type 2 alpha | PIP4K2A | 0.51 | 0.69 | 0.54 | 0.75 | 0.53 | 0.66 | – | – | 0.5 | 0.58 | 0.53 | 0.76 | 0.5 | 0.76 | 0.52 | 0.74 |
| Enzyme | Phospholipase A2 group VII | PLA2G7 | 0.5 | 0.71 | 0.51 | 0.65 | 0.57 | 0.72 | – | – | 0.5 | 0.69 | 0.54 | 0.69 | 0.57 | 0.71 | 0.51 | 0.69 |
| Enzyme | Polo like kinase 1 | PLK1 | 0.5 | 0.62 | 0.52 | 0.64 | 0.54 | 0.52 | 0.49 | 0.56 | 0.5 | 0.52 | 0.52 | 0.68 | 0.52 | 0.67 | 0.49 | 0.68 |
| Enzyme | Serine/threonine kinase 33 | STK33 | 0.78 | 0.58 | 0.68 | 0.62 | 0.78 | 0.59 | 0.74 | 0.6 | 0.7 | 0.56 | 0.72 | 0.61 | 0.71 | 0.64 | 0.66 | 0.65 |
| Enzyme | Ubiquitin specific peptidase 1 | USP1 | 0.5 | 0.54 | 0.51 | 0.58 | 0.51 | 0.53 | 0.51 | 0.53 | 0.5 | 0.55 | 0.52 | 0.57 | 0.57 | 0.59 | 0.5 | 0.57 |
| Enzyme | YES proto-oncogene 1, Src family tyrosine kinase | YES1 | 0.7 | 0.72 | 0.67 | 0.7 | 0.71 | 0.72 | 0.63 | 0.69 | 0.66 | 0.71 | 0.66 | 0.75 | 0.7 | 0.72 | 0.54 | 0.68 |
| Epigenetic regulator | Bromodomain adjacent to zinc finger domain 2B | BAZ2B | 0.6 | 0.66 | 0.54 | 0.68 | 0.63 | 0.69 | 0.52 | 0.6 | 0.5 | 0.65 | 0.55 | 0.67 | 0.6 | 0.66 | 0.53 | 0.68 |
| Epigenetic regulator | Chromobox 1 | CBX1 | 0.56 | 0.57 | 0.55 | 0.62 | 0.6 | 0.6 | 0.57 | 0.61 | 0.54 | 0.55 | 0.61 | 0.6 | 0.57 | 0.62 | 0.55 | 0.61 |
| Epigenetic regulator | Lysine demethylase 4A | KDM4A | 0.64 | 0.62 | 0.56 | 0.67 | 0.65 | 0.7 | 0.57 | 0.63 | 0.6 | 0.65 | 0.58 | 0.68 | 0.6 | 0.65 | 0.53 | 0.63 |
| Epigenetic regulator | Lysine demethylase 4E | KDM4E | 0.59 | 0.76 | 0.52 | 0.75 | 0.61 | 0.75 | 0.53 | 0.75 | 0.5 | 0.72 | 0.53 | 0.73 | 0.57 | 0.74 | 0.52 | 0.72 |
| Epigenetic regulator | M-phase phosphoprotein 8 | MPHOSPH8 | 0.52 | 0.53 | 0.52 | 0.64 | 0.54 | 0.64 | 0.51 | 0.65 | 0.5 | 0.54 | 0.63 | 0.63 | 0.54 | 0.64 | 0.51 | 0.65 |
| Epigenetic regulator | Protein arginine methyltransferase 1 | PRMT1 | 0.5 | 0.65 | 0.51 | 0.71 | 0.5 | 0.54 | – | – | – | – | 0.52 | 0.7 | 0.51 | 0.71 | 0.51 | 0.72 |
| Epigenetic regulator | Sirtuin 5 | SIRT5 | – | – | 0.51 | 0.65 | – | – | – | – | – | – | 0.51 | 0.62 | 0.5 | 0.61 | 0.5 | 0.6 |
| Epigenetic regulator | Survival of motor neuron 2, centromeric | SMN2 | – | – | 0.5 | 0.62 | 0.51 | 0.56 | – | – | – | – | 0.52 | 0.63 | 0.52 | 0.6 | 0.5 | 0.62 |
| Ion channel | Potassium voltage-gated channel H2 | KCNH2 | 0.72 | 0.82 | 0.64 | 0.8 | 0.66 | 0.79 | 0.65 | 0.76 | 0.74 | 0.8 | 0.65 | 0.84 | 0.65 | 0.8 | 0.67 | 0.8 |
| Membrane receptor | 5-hydroxytryptamine receptor 1A | HTR1A | 0.5 | 0.72 | 0.51 | 0.75 | 0.54 | 0.75 | 0.53 | 0.75 | 0.51 | 0.7 | 0.55 | 0.77 | 0.52 | 0.76 | 0.55 | 0.79 |
| Membrane receptor | Cholinergic receptor muscarinic 1 | CHRM1 | 0.6 | 0.69 | 0.64 | 0.69 | 0.66 | 0.71 | 0.7 | 0.69 | 0.67 | 0.62 | 0.59 | 0.71 | 0.6 | 0.73 | 0.65 | 0.72 |
| Membrane receptor | Cholinergic receptor muscarinic 4 | CHRM4 | 0.63 | 0.71 | 0.67 | 0.75 | 0.64 | 0.72 | 0.7 | 0.68 | 0.63 | 0.66 | 0.66 | 0.7 | 0.62 | 0.73 | 0.62 | 0.72 |
| Membrane receptor | Cholinergic receptor muscarinic 5 | CHRM5 | 0.6 | 0.69 | 0.62 | 0.76 | 0.64 | 0.72 | 0.69 | 0.72 | 0.63 | 0.68 | 0.64 | 0.73 | 0.59 | 0.71 | 0.61 | 0.68 |
| Membrane receptor | Dopamine receptor D1 | DRD1 | 0.64 | 0.68 | 0.6 | 0.73 | 0.63 | 0.7 | 0.58 | 0.72 | 0.62 | 0.69 | 0.62 | 0.74 | 0.61 | 0.72 | 0.59 | 0.74 |
| Membrane receptor | Dopamine receptor D2 | DRD2 | 0.61 | 0.74 | 0.61 | 0.79 | 0.61 | 0.79 | 0.6 | 0.79 | 0.61 | 0.73 | 0.65 | 0.79 | 0.62 | 0.79 | 0.63 | 0.8 |
| Membrane receptor | Dopamine receptor D3 | DRD3 | 0.6 | 0.66 | 0.58 | 0.72 | 0.56 | 0.71 | 0.58 | 0.71 | 0.58 | 0.66 | 0.63 | 0.72 | 0.57 | 0.72 | 0.58 | 0.73 |
| Membrane receptor | Neuropeptide S receptor 1 | NPSR1 | – | – | 0.59 | 0.64 | 0.5 | 0.55 | – | – | – | – | 0.63 | 0.66 | 0.57 | 0.66 | 0.58 | 0.63 |
| Membrane receptor | Opioid receptor kappa 1 | OPRK1 | 0.5 | 0.61 | 0.52 | 0.65 | 0.5 | 0.61 | – | – | 0.54 | 0.63 | 0.57 | 0.65 | 0.53 | 0.64 | 0.55 | 0.68 |
| Membrane receptor | Thyroid stimulating hormone receptor | TSHR | 0.51 | 0.64 | 0.5 | 0.56 | 0.56 | 0.52 | 0.57 | 0.5 | – | – | 0.56 | 0.61 | 0.56 | 0.6 | 0.55 | 0.61 |
| Membrane receptor | TNF receptor superfamily member 10b | TNFRSF10B | – | – | 0.69 | 0.61 | 0.56 | 0.52 | – | – | 0.7 | 0.56 | 0.71 | 0.61 | 0.78 | 0.58 | 0.65 | 0.56 |
| Other cytosolic protein | Heat shock protein 90 alpha A1 | HSP90AA1 | 0.53 | 0.65 | 0.5 | 0.67 | 0.59 | 0.73 | – | – | 0.53 | 0.67 | 0.55 | 0.67 | 0.59 | 0.65 | 0.51 | 0.64 |
| Other cytosolic protein | Heat shock protein family B1 | HSPB1 | 0.58 | 0.58 | 0.54 | 0.53 | 0.63 | 0.58 | 0.54 | 0.51 | 0.5 | 0.55 | 0.66 | 0.59 | 0.66 | 0.61 | 0.55 | 0.57 |
| Secreted protein | Interleukin 1 beta | IL1B | 0.62 | 0.55 | 0.65 | 0.6 | 0.65 | 0.55 | 0.65 | 0.55 | 0.66 | 0.57 | 0.68 | 0.63 | 0.69 | 0.62 | 0.63 | 0.62 |
| Structural protein | Tubulin beta class I | TUBB | – | – | 0.81 | 0.8 | 0.82 | 0.78 | – | – | – | – | 0.88 | 0.82 | 0.84 | 0.8 | 0.82 | 0.8 |
| Transcription factor | Androgen receptor | AR | 0.51 | 0.63 | 0.58 | 0.75 | 0.55 | 0.71 | 0.53 | 0.75 | 0.51 | 0.62 | 0.61 | 0.77 | 0.55 | 0.74 | 0.67 | 0.76 |
| Transcription factor | Jun proto-oncogene, AP-1 transcription factor subunit | JUN | 0.6 | 0.69 | 0.54 | 0.63 | 0.58 | 0.65 | 0.56 | 0.61 | 0.59 | 0.67 | 0.6 | 0.65 | 0.57 | 0.67 | 0.6 | 0.63 |
| Transcription factor | Melanogenesis associated transcription factor | MITF | 0.81 | 0.64 | 0.7 | 0.61 | 0.73 | 0.57 | 0.73 | 0.56 | 0.68 | 0.57 | 0.82 | 0.65 | 0.79 | 0.68 | 0.69 | 0.61 |
| Transcription factor | Nuclear factor kappa B1 | NFKB1 | 0.51 | 0.51 | 0.5 | 0.66 | 0.51 | 0.5 | 0.5 | 0.51 | 0.55 | 0.5 | 0.5 | 0.63 | 0.51 | 0.64 | 0.51 | 0.63 |
| Transcription factor | Nuclear receptor 3C1 | NR3C1 | – | – | 0.77 | 0.96 | 0.67 | 0.94 | 0.76 | 0.98 | – | – | 0.6 | 0.93 | 0.73 | 0.95 | 0.69 | 0.95 |
| Transcription factor | Nuclear receptor 5A1 | NR5A1 | 0.55 | 0.53 | 0.65 | 0.56 | 0.64 | 0.57 | – | – | – | – | 0.72 | 0.62 | 0.73 | 0.62 | 0.65 | 0.6 |
| Transcription factor | Tumor protein p53 | TP53 | 0.72 | 0.57 | 0.62 | 0.55 | 0.65 | 0.55 | 0.7 | 0.57 | 0.62 | 0.58 | 0.71 | 0.58 | 0.7 | 0.57 | 0.6 | 0.56 |
| Transcription factor | Vitamin D receptor | VDR | 0.5 | 0.57 | 0.5 | 0.6 | 0.52 | 0.6 | 0.51 | 0.54 | 0.53 | 0.53 | 0.58 | 0.62 | 0.55 | 0.59 | 0.53 | 0.59 |
| Transporter | Abhydrolase domain containing 5 | ABHD5 | 0.51 | 0.57 | 0.51 | 0.66 | – | – | – | – | – | – | 0.55 | 0.68 | 0.54 | 0.68 | 0.53 | 0.69 |
| Transporter | Solute carrier family 6 member 3 | SLC6A3 | 0.64 | 0.65 | 0.62 | 0.66 | 0.65 | 0.65 | 0.67 | 0.62 | 0.61 | 0.65 | 0.66 | 0.66 | 0.66 | 0.67 | 0.64 | 0.68 |
| Unclassified protein | Ataxin 2 | ATXN2 | 0.78 | 0.5 | 0.7 | 0.62 | 0.74 | 0.52 | 0.7 | 0.53 | 0.69 | 0.58 | 0.72 | 0.62 | 0.72 | 0.61 | 0.7 | 0.61 |
| Unclassified protein | ATPase family, AAA domain containing 5 | ATAD5 | 0.58 | 0.56 | 0.52 | 0.67 | 0.59 | 0.62 | 0.55 | 0.62 | 0.52 | 0.6 | 0.6 | 0.65 | 0.64 | 0.65 | 0.52 | 0.68 |
| Unclassified protein | Endothelial PAS domain protein 1 | EPAS1 | – | – | 0.63 | 0.62 | – | – | – | – | – | – | 0.73 | 0.68 | 0.76 | 0.65 | 0.69 | 0.67 |
| Unclassified protein | Geminin, DNA replication inhibitor | GMNN | 0.7 | 0.59 | 0.71 | 0.58 | 0.69 | 0.55 | 0.65 | 0.59 | 0.67 | 0.55 | 0.76 | 0.62 | 0.75 | 0.59 | 0.68 | 0.6 |
| Unclassified protein | MLLT3, super elongation complex subunit | MLLT3 | – | – | 0.5 | 0.54 | – | – | – | – | – | – | 0.55 | 0.55 | 0.51 | 0.63 | 0.51 | 0.67 |
| Unclassified protein | MYC proto-oncogene, bHLH transcription factor | MYC | – | – | 0.73 | 0.65 | – | – | – | – | – | – | 0.69 | 0.65 | 0.63 | 0.66 | 0.77 | 0.64 |
| Unclassified protein | Nuclear factor, erythroid 2 like 2 | NFE2L2 | 0.57 | 0.51 | 0.57 | 0.6 | 0.55 | 0.58 | 0.57 | 0.59 | 0.58 | 0.53 | 0.56 | 0.61 | 0.61 | 0.6 | 0.59 | 0.6 |
| Unclassified protein | Nucleotide binding oligomerization domain containing 1 | NOD1 | – | – | 0.58 | 0.66 | – | – | – | – | 0.6 | 0.58 | 0.66 | 0.69 | 0.65 | 0.68 | 0.56 | 0.7 |
| Unclassified protein | Nucleotide binding oligomerization domain containing 2 | NOD2 | 0.5 | 0.53 | 0.53 | 0.61 | 0.56 | 0.54 | – | – | 0.52 | 0.53 | 0.58 | 0.68 | 0.61 | 0.68 | 0.56 | 0.64 |
| Unclassified protein | RAD52 homolog, DNA repair protein | RAD52 | – | – | 0.54 | 0.57 | – | – | – | – | – | – | 0.52 | 0.61 | 0.53 | 0.63 | 0.53 | 0.69 |
| Unclassified protein | TAR DNA binding protein | TARDBP | – | – | 0.5 | 0.56 | 0.51 | 0.54 | – | – | – | – | 0.54 | 0.55 | 0.54 | 0.54 | 0.5 | 0.57 |
Targets are in rows; cell lines and descriptors are in columns. GES, model using gene expression signatures. Morgan FP, model using chemical fingerprints from the counterpart GES model dataset. Cells containing “–” correspond to models that were not computed because the dataset contained too few actives (<20) for appropriate classification. Models with a BA between 0.7 and 0.8 are highlighted in orange, whereas those with a BA >0.8 are highlighted in red.
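To make the BA values easier to read, here is a minimal sketch of the metric, assuming BA denotes balanced accuracy in its usual binary form (the mean of sensitivity and specificity). The function name and the toy labels are illustrative assumptions, not taken from the study. With this definition, a model that predicts every compound as inactive scores 0.5 regardless of class imbalance, which is how the many 0.5 entries in the table can be interpreted.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    return (sensitivity + specificity) / 2


# Toy, imbalanced example (hypothetical): 10 actives and 90 inactives.
y_true = [1] * 10 + [0] * 90

# A model that calls everything inactive: sensitivity 0, specificity 1.
ba_all_inactive = balanced_accuracy(y_true, [0] * 100)

# A model that recovers 6 of 10 actives and 81 of 90 inactives.
y_pred = [1] * 6 + [0] * 4 + [0] * 81 + [1] * 9
ba_partial = balanced_accuracy(y_true, y_pred)  # approximately 0.75
```

Plain accuracy for the all-inactive model would be 0.9 on this toy set, which is why a class-balanced metric is the more informative choice for datasets with few actives.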
Figure 6. Morgan fingerprint Dice distance vs. GES cosine distance (distance plots). Panels show pairs of active compounds for NR3C1 (A–C), TUBB (D–F), and DRD1 (G–I) in the A549 (A,D,G), MCF7 (B,E,H), and PC3 (C,F,I) cell lines.
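The two distances plotted in Figure 6 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: fingerprints are represented as sets of on-bit indices (a binary simplification), signatures as lists of floats, and the helper names and toy compounds are hypothetical rather than taken from the study.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def dice_distance(bits_a: Set[int], bits_b: Set[int]) -> float:
    """1 - Dice similarity between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0
    return 1.0 - 2.0 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two gene expression signatures."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_points(
    fps: Dict[str, Set[int]], sigs: Dict[str, List[float]]
) -> List[Tuple[float, float]]:
    """One (Dice distance, cosine distance) point per pair of compounds."""
    names = sorted(fps)
    return [
        (dice_distance(fps[a], fps[b]), cosine_distance(sigs[a], sigs[b]))
        for a, b in combinations(names, 2)
    ]


# Toy active compounds (hypothetical bit indices and 3-gene signatures).
toy_fps = {"a": {1, 5, 9, 12}, "b": {1, 5, 20}, "c": {30, 31}}
toy_sigs = {"a": [1.0, -2.0, 0.5], "b": [2.0, -4.0, 1.0], "c": [-1.0, 2.0, -0.5]}
points = pairwise_points(toy_fps, toy_sigs)
```

In such a plot, a pair with a large Dice distance but a small cosine distance corresponds to structurally different compounds that induce similar signatures, the situation in which GES descriptors would be expected to add information beyond the chemical structure.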