Sera Park, Yeajee Kwon, Hyesoo Jung, Sukyung Jang, Haeseung Lee, Wankyu Kim.
Abstract
Drug discovery typically involves investigating a set of compounds (e.g. drug screening hits) in terms of target, disease, and bioactivity. CSgator is a comprehensive analytic tool for set-wise interpretation of compounds. It has two unique analytic features, Compound Set Enrichment Analysis (CSEA) and Compound Cluster Analysis (CCA), which allow batch analysis of a compound set in terms of (i) target, (ii) bioactivity, (iii) disease, and (iv) structure. CSEA and CCA present enriched profiles of targets and bioactivities in a compound set, leading to novel insights into the underlying drug mode-of-action and potential targets. Notably, we propose a novel concept of 'Hit Enriched Assays', i.e. bioassays whose hits are enriched among a given set of compounds. As an example, we show its utility in revealing drug mode-of-action or identifying hidden targets for anti-lymphangiogenesis screening hits. CSgator is available at http://csgator.ewha.ac.kr, and most analytic results are downloadable.
Keywords: Bioactivity profile; Bioassay; Compound network; Compound set analysis; Drug target
Year: 2019 PMID: 30830479 PMCID: PMC6419788 DOI: 10.1186/s13321-019-0339-6
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Compound-target interaction data from 15 public databases
| Source name | # interactions |
|---|---|
| BindingDB | 1,078,520 |
| Binding MOAD | 15,320 |
| Comparative Toxicogenomics Database | 77,327 |
| ChEMBL v.21 | 512,341 |
| DCDB | 1,902 |
| DGIdb | 16,852 |
| DrugBank | 12,501 |
| IUPHAR | 12,429 |
| KEGG Drug | 9,787 |
| KiDB | 20,610 |
| MATADOR | 1,163 |
| PharmGKB | 3,606 |
| Therapeutic Targets Database | 45,901 |
| GLASS | 460,881 |
| STITCH v.5 (a) | 788,024 |
| Total | 3,057,164 |
(a) STITCH provides scores for protein–chemical interactions; we filtered those interactions on two conditions: experimental score ≥ 700 and database score ≥ 700
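The score filter described in the STITCH footnote can be sketched as follows. This is a minimal illustration rather than CSgator's actual pipeline: the record layout, the field names (`experimental`, `database`), and the sample rows are hypothetical, and the sketch assumes that both conditions must hold at the same time.

```python
from typing import NamedTuple

class Interaction(NamedTuple):
    chemical: str      # hypothetical chemical identifier
    protein: str       # hypothetical protein identifier
    experimental: int  # experimental-evidence score
    database: int      # curated-database score

MIN_SCORE = 700  # threshold quoted in the footnote

def keep_confident(rows):
    """Keep interactions whose experimental AND database scores both reach MIN_SCORE (assumed reading)."""
    return [r for r in rows
            if r.experimental >= MIN_SCORE and r.database >= MIN_SCORE]

# Made-up rows for illustration only.
sample = [
    Interaction("chem_A", "prot_1", 900, 800),
    Interaction("chem_B", "prot_2", 900, 0),
    Interaction("chem_C", "prot_3", 700, 700),
]
filtered = keep_confident(sample)
```

If the two conditions were meant as alternatives, the `and` would become `or`; the footnote does not say which reading applies.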
Data sources and statistics collected in CSgator
| Category | Number of entries | Number of compounds | Sources | Number of relations | Standard ID |
|---|---|---|---|---|---|
| Compound | – | 89,602,599 | PubChem, ChEMBL, ChEBI, DrugBank | – | InChIKey |
| Target | 252,498 | 852,375 | 15 Public DBs | 6,027,120 | Entrez Gene ID & UniProtKB |
| Disease | 5,680 | 10,975 | CTD | 1,575,457 | MeSH & OMIM |
| Bioassay | 1,218,658 | 2,253,835 | PubChem, ChEMBL | 229,842,265 | PubChem AID & ChEMBL |
| Protein family | 575 | 833,590 | ChEMBL 21 | 1,691,879 | ChEMBL protein class |
| GO term | 19,234 | 851,359 | Gene Ontology | 68,331,986 | GO term |
| Disease ontology | 1,824 | 5,429 | Disease Ontology | 46,053 | DO term |
| MeSH disease | 6,351 | 6,909 | NIH | 143,277 | MeSH |
| Approval status | 9 | 3,765 | DrugBank | 12,820 | InChIKey |
Fig. 1 System overview of the CSgator web platform. a The input compound set is generated by the user or selected from the predefined sets. It can also be created by applying various filters and by combining multiple sets with the Set Operator (e.g. union or intersection). b Comprehensive annotations of the input compound set are listed in four categories: target, bioactivity (bioassay), disease, and structure. c CSgator provides two unique analyses, Compound Set Enrichment Analysis (CSEA) and Compound Cluster Analysis (CCA), the details of which are described in the main text
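Combining compound sets by union or intersection, as panel a describes, maps directly onto ordinary set operations. The sketch below is only an illustration under stated assumptions: the function name `combine_sets` and the identifiers are hypothetical placeholders (a real set would hold InChIKeys), and it does not reflect how CSgator implements its Set Operator.

```python
# Placeholder identifiers; a real compound set would hold InChIKeys.
screening_hits = {"CPD-001", "CPD-002", "CPD-003"}
approved_drugs = {"CPD-002", "CPD-003", "CPD-004"}

def combine_sets(left, right, operator):
    """Combine two compound sets with a named set operator."""
    if operator == "union":
        return left | right
    if operator == "intersection":
        return left & right
    raise ValueError(f"unsupported operator: {operator}")

# Hits that are also approved drugs, and the pooled set of all compounds.
approved_hits = combine_sets(screening_hits, approved_drugs, "intersection")
all_compounds = combine_sets(screening_hits, approved_drugs, "union")
```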
HEAs (Hit Enriched Assays) from lymphangiogenesis hits
| Rank | Assay title | Target | Enrichment score of HEA | Number of hit/assayed compounds | FDR-adjusted p-value | PubChem AID | Reference |
|---|---|---|---|---|---|---|---|
| 1 | Inhibitors of regulator of G protein signaling (RGS) 4 | RGS4 | 9.63 | 152/390,220 | 3.40E−16 | 504845 (2011) | [ |
| 2 | Validation screen for inhibitors of Lassa infection | – | 7.18 | 54/1279 | 3.04E−13 | 463096 (2010) | |
| 3 | High content imaging cell-based qHTS for inhibitors of the mTORC1 signaling pathway in MEF (Tsc2-/-, p53-/-) cells | MTOR | 7.12 | 23/1280 | 8.03E−09 | 2666 (2010) | [ |
| 4 | Validation screen for small molecules that induce DNA re-replication in MCF 10A normal breast cells | GMNN | 6.77 | 71/1280 | 4.61E−11 | 463097 (2010) | |
| 5 | High content imaging cell-based qHTS for inhibitors of the mTORC1 signaling pathway in MEF cells | MTOR | 6.35 | 52/1280 | 1.85E−10 | 2667 (2010) | [ |
| 6 | Validation screen for small molecules that inhibit ELG1-dependent DNA repair in human embryonic kidney (HEK293T) cells expressing luciferase-tagged ELG1 | ATAD5 | 6.22 | 79/1280 | 3.78E−10 | 493107 (2011) | |
| 7 | qHTS assay for identification of small molecule antagonists for thrombopoietin (TPO) signaling pathway | THPO | 6.21 | 122/1277 | 1.11E−08 | 918 (2010) | [ |
| 8 | qHTS for inhibitors of ATXN expression: validation | ATXN2 | 5.94 | 73/1280 | 2.67E−07 | 588378 (2011) | |
| 9 | qHTS assay for the inhibitors of human flap endonuclease 1 (FEN1) | FEN1 | 5.87 | 1368/391,275 | 2.04E−07 | 588795 (2011) | |
| 10 | AP1 signaling pathway | AP1 | 5.85 | 55/10,692 | 4.90E−05 | 357 (2006) | [ |
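To make the idea of a hit-enriched assay concrete, the sketch below tests whether a query compound set contains more assay hits than expected by chance. The text here does not state which statistic underlies the HEA enrichment score, so this is an assumption: a one-sided hypergeometric test followed by Benjamini-Hochberg FDR adjustment, a common choice for over-representation analysis. The function names, assay names, and counts are all made up for illustration.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: compounds tested in the assay, n: assay hits among them,
    N: query-set compounds tested in the assay, k: query-set compounds that are hits.
    """
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up counts for two hypothetical assays: (k, M, n, N).
assays = {"assay_x": (10, 1000, 50, 20), "assay_y": (1, 1000, 50, 20)}
names = list(assays)
pvalues = [hypergeom_sf(*assays[name]) for name in names]
fdr = dict(zip(names, benjamini_hochberg(pvalues)))
```

With these made-up counts, `assay_x` (10 of 20 query compounds are hits against a 5% background hit rate) is strongly enriched, while `assay_y` (1 of 20) is not.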
Target enrichment tree results from lymphangiogenesis hits
| Rank | Target family | Target | | Enrichment score | FDR-adjusted p-value |
|---|---|---|---|---|---|
| 1 | CA ACT CL (calcium-activated chloride channel) | ANO1 | 2 | 5.90 | 4.31E−03 |
| 3 | CYP_3A2 (cytochrome P450 3A2) | Cyp3a2 (Tax ID: 10116) | 2 | 5.30 | 8.34E−03 |
| 4 | SLC47 (SLC47 family of multidrug and toxin extrusion transporters) | SLC47A1 | 2 | 4.48 | 2.21E−02 |
| 5 | Structural (structural protein) | COL1A2 | 38 | 4.07 | 6.33E−31 |
| 6 | Ca ATPase (calcium ATPase) | ATP2A2 | 2 | 3.97 | 3.81E−02 |
| 7 | CYP_2E1 (cytochrome P450 2E1) | CYP2E1 | 5 | 3.74 | 4.36E−04 |
| 8 | CYP_2E (cytochrome P450 family 2E) | CYP2E1 | 5 | 3.74 | 4.57E−04 |
| 9 | GLY (glycine receptor) | GLRA1 | 3 | 3.74 | 1.02E−02 |
| 10 | CYP_1B1 (cytochrome P450 1B1) | CYP1B1 | 3 | 3.70 | 1.03E−02 |
| 11 | CYP_1B (cytochrome P450 family 1B) | CYP1B1 | 3 | 3.70 | 1.06E−02 |
Fig. 2 CCA result for the anti-lymphangiogenetic screening hits. CC #1–#3 are clusters of structurally similar input compounds generated by k-means clustering (k = 3), which are linked to the relevant DO (Disease Ontology) terms