Qiaonan Duan1,2, St Patrick Reid3, Neil R Clark1,2, Zichen Wang1,2, Nicolas F Fernandez1,2, Andrew D Rouillard1,2, Ben Readhead2, Sarah R Tritsch3, Rachel Hodos2, Marc Hafner4, Mario Niepel4, Peter K Sorger4, Joel T Dudley2, Sina Bavari3, Rekha G Panchal3, Avi Ma'ayan1,2.
Abstract
The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves the signal-to-noise ratio compared with the moderated Z-score (MODZ) method currently used to compute L1000 signatures. The CD-processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine uses two methods to prioritize thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature. L1000CDS2 also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules predicted to reverse expression in 670 disease signatures, also extracted from GEO, and to prioritize small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2, we identified kenpaullone, a GSK3B/CDK2 inhibitor. Subsequent experiments showed that it inhibits Ebola infection in vitro in a dose-dependent manner without causing cellular toxicity in human cell lines. 
In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, and it improves the extraction of knowledge from the LINCS L1000 resource.
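The abstract describes ranking signatures by cosine similarity in two settings. First, it finds compounds that mimic or reverse a query signature. Second, it matches L1000 small-molecule signatures against GEO single-gene perturbation signatures to predict targets. Below is a minimal Python sketch of cosine-based ranking. It assumes that every signature is a dense vector of differential expression values over the same ordered genes. The function names and toy data are hypothetical, and L1000CDS2's own scoring may differ, particularly for up/down gene-set input.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    return 1.0 - cosine_similarity(a, b)


def rank_signatures(query, library, mode="reverse"):
    """Rank library signatures against a query signature (hypothetical helper).

    mode="reverse": largest cosine distance first (most opposite expression).
    mode="mimic":   smallest cosine distance first (most similar expression).
    """
    if mode not in ("reverse", "mimic"):
        raise ValueError("mode must be 'reverse' or 'mimic'")
    scored = [(name, cosine_distance(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=(mode == "reverse"))


if __name__ == "__main__":
    # Toy signatures over four genes (illustrative values, not L1000 data).
    query = [2.0, -1.0, 0.5, -0.5]
    library = {
        "drug_A": [-2.0, 1.0, -0.5, 0.5],  # exact opposite of the query
        "drug_B": [2.0, -1.0, 0.5, -0.5],  # identical to the query
        "drug_C": [1.0, 1.0, -1.0, 0.0],
    }
    for name, dist in rank_signatures(query, library, mode="reverse"):
        print(f"{name}\t{dist:.3f}")
```

In reverse mode, drug_A ranks first with a distance of about 2, and drug_B ranks last. For target prediction, the same function could be used with GEO single-gene perturbation signatures as the library. Which direction of match should count as evidence for a target is a modeling choice that this sketch leaves open.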
Year: 2016 PMID: 28413689 PMCID: PMC5389891 DOI: 10.1038/npjsba.2016.15
Source DB: PubMed Journal: NPJ Syst Biol Appl ISSN: 2056-7189
Figure 1. Intrinsic benchmarking. Expression signatures for each small molecule are either computed with the Characteristic Direction (CD) algorithm or downloaded from lincscloud.org, where signatures are computed with the Moderated Z-score (MODZ) method. (a, b) Histograms of the significance scores for the 8,301 signatures from the LJP5 and LJP6 batches. (c) Correlation between the signature strength metrics computed by the two methods. (d) Correlation between differential expression significance rank and dose rank for the two methods of computing differential expression. (e) As in d, after excluding non-significant perturbations.
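Figure 1 contrasts CD signatures with the MODZ signatures from lincscloud.org. To make the MODZ side concrete, here is a simplified Python sketch of MODZ-style replicate consolidation. It assumes that each replicate is weighted by its summed Spearman correlation with the other replicates, with weights clipped at a small floor and then normalized. The floor of 0.01 and all function names are assumptions for illustration, not the LINCS pipeline code. The CD method instead finds a direction in gene-expression space that separates treated from control samples; it is not sketched here. The `spearman` helper is also the kind of rank correlation that could be used for the significance-rank versus dose-rank comparison in panels d and e.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def modz(replicates, min_weight=0.01):
    """MODZ-style consolidation of replicate z-score vectors (simplified sketch).

    Each replicate is weighted by its summed Spearman correlation with the
    other replicates, clipped below at min_weight (an assumed floor), and the
    weights are normalized to sum to 1. Returns (consensus_vector, weights).
    """
    k = len(replicates)
    raw = []
    for i in range(k):
        agreement = sum(spearman(replicates[i], replicates[j]) for j in range(k) if j != i)
        raw.append(max(agreement, min_weight))
    total = sum(raw)
    weights = [w / total for w in raw]
    n_genes = len(replicates[0])
    consensus = [sum(w * rep[g] for w, rep in zip(weights, replicates)) for g in range(n_genes)]
    return consensus, weights


if __name__ == "__main__":
    reps = [
        [1.0, 2.0, 3.0, 4.0],
        [1.1, 2.2, 2.9, 4.1],
        [0.9, 2.1, 3.2, 3.8],
        [4.0, 3.0, 2.0, 1.0],  # discordant replicate receives only the floor weight
    ]
    consensus, weights = modz(reps)
    print("weights:", [round(w, 4) for w in weights])
    print("consensus:", [round(v, 4) for v in consensus])
```

The design point is that replicates which disagree with the others contribute little to the consensus signature. In the toy example, the reversed fourth replicate gets a weight near zero.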
Figure 2. Extrinsic benchmarking. (a) ROC curves showing how well gene expression signature similarity recovers structurally similar small-molecule compounds in A549 cells after 24 h treatment with 10 μM of each compound. Signature similarity is measured in three ways: cosine distance between Characteristic Direction (CD) signatures (blue), Euclidean distance between Moderated Z-score (MODZ) signatures (orange) and cosine distance between MODZ signatures (green). Structural similarity is measured with MACCS and ECFP4 chemical fingerprints, plotted as solid and dashed curves, respectively. (b) Deviation from the uniform cumulative distribution for the rankings of drug targets and their direct interactors in gene expression signatures computed using CD (blue) and MODZ (orange) under the same conditions. (c) Recovery of known drug targets from the ranks of gene expression signatures extracted from GEO (n=2,206), in which 917 genes were knocked down, knocked out or overexpressed in mammalian cells. GEO signatures are ranked by cosine distance when queried with the L1000 LINCS data processed by the MODZ or the CD method. The curves show the deviation from the uniform cumulative distribution for the rankings of drug targets, as determined by DrugBank, for signatures computed using CD (blue) and MODZ (orange) under the same conditions. ECFP4, extended-connectivity fingerprints; MACCS, molecular access system; ROC, receiver operating characteristic.
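Panel a treats compound pairs with similar chemical fingerprints as positives and asks how well signature similarity ranks them. Panels b and c compare the distribution of known-target ranks with a uniform distribution. The minimal Python sketch below covers three ingredients: fingerprint similarity (assuming the standard Tanimoto coefficient on bit sets), ROC AUC, and the empirical-CDF-minus-uniform deviation. The caption does not state how positives were defined. The Tanimoto cutoff, grid resolution and all names are therefore assumptions. In practice, MACCS and ECFP4-like fingerprints would come from a cheminformatics toolkit such as RDKit rather than from hand-written bit sets. Distances should be negated or converted to similarities so that a higher score means "more similar".

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit indices."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cdf_deviation(target_ranks, n_total, grid_points=100):
    """Empirical CDF of normalized target ranks minus the uniform CDF.

    target_ranks: 1-based ranks of known targets among n_total candidates.
    Positive values mean targets sit nearer the top than expected by chance.
    """
    normalized = [r / n_total for r in target_ranks]
    deviations = []
    for g in range(1, grid_points + 1):
        x = g / grid_points
        frac = sum(1 for v in normalized if v <= x) / len(normalized)
        deviations.append(frac - x)
    return deviations


if __name__ == "__main__":
    # Hypothetical fingerprints (bit indices) for three compounds.
    fps = {"c1": {1, 2, 3, 4}, "c2": {1, 2, 3, 5}, "c3": {7, 8, 9}}
    pairs = [("c1", "c2"), ("c1", "c3"), ("c2", "c3")]
    # Label a pair "structurally similar" above an assumed Tanimoto cutoff of 0.5.
    labels = [tanimoto(fps[a], fps[b]) >= 0.5 for a, b in pairs]
    # Hypothetical signature similarities (e.g. 1 - cosine distance) for the same pairs.
    sig_similarity = [0.8, 0.1, 0.3]
    print("AUC:", roc_auc(sig_similarity, labels))
    print("max CDF deviation:", max(cdf_deviation([3, 10, 40], n_total=1000)))
```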
Figure 3. Screenshot from the input page of the L1000CDS2 software application. The input text boxes toggle between up and down gene sets and an input vector option. Canned analyses for the 670 disease signatures are available with a few clicks. The Ebola, ligand and cancer cell line signatures are provided as canned examples.
Figure 4. Screenshot from the single drug/small-molecule results page of the L1000CDS2 software application.
Figure 5. Screenshot from the drug pair results page of the L1000CDS2 software application.
Top five predicted drugs at each time point
| Small molecule | Cosine distance | Batch | Cell line | Treatment time (h) | Dose (μM) |
|---|---|---|---|---|---|
| Ebola infection, 30 min | | | | | |
| Kenpaullone | 1.2584 | CPC002 | HA1E | 6 | 10 |
| 0800-0289 | 1.1978 | CPC014 | A549 | 24 | 10 |
| BRD-K37312348 | 1.1965 | CPC016 | HT29 | 6 | 10 |
| SB 225002 | 1.1896 | CPC001 | HA1E | 24 | 10 |
| 10006350 | 1.1856 | CPC012 | MCF7 | 24 | 10 |
| Ebola infection, 60 min | | | | | |
| Kenpaullone | 1.2756 | CPC002 | HA1E | 6 | 10 |
| PD 166793 | 1.2619 | CPC001 | HA1E | 24 | 10 |
| BAPTA-AM | 1.2564 | CPC001 | HA1E | 24 | 10 |
| BRD-U74615290 | 1.2396 | CPC014 | HT29 | 6 | 10 |
| NCGC00229596-01 | 1.2303 | CPC008 | HT29 | 6 | 10 |
| Ebola infection, 120 min | | | | | |
| Kenpaullone | 1.3489 | CPC002 | HA1E | 6 | 10 |
| NSC 23766 | 1.3478 | CPC006 | PC3 | 24 | 160 |
| Rosiglitazone | 1.3386 | CPC006 | HA1E | 24 | 80 |
| 7-hydroxy-2,3,4,5-tetrahydro-1H-[1]benzofuro[2,3-c]azepin-1-one | 1.3351 | CPC007 | A549 | 24 | 10 |
| LY 364947 | 1.3183 | CPC003 | PC3 | 24 | 10 |
Top 10 predicted drugs ranked by rank product, i.e., the product of each drug's ranks at the three time points, with ranks determined by cosine distance
| BRD ID | Small molecule | Rank product |
|---|---|---|
| BRD-K37312348 | Kenpaullone | 1 |
| BRD-K40919711 | BAPTA-AM | 252 |
| BRD-A97437073 | Rosiglitazone | 1,755 |
| BRD-A68009927 | Daunorubicin hydrochloride | 2,880 |
| BRD-A06352508 | SB 218078 | 3,136 |
| BRD-K78126613 | Menadione | 3,328 |
| BRD-K88741031 | Methyl 2,5-dihydroxycinnamate | 4,284 |
| BRD-K00615600 | AG14361 | 5,566 |
| BRD-K31342827 | GF-109203X | 8,385 |
| BRD-A75409952 | Wortmannin | 17,204 |
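The rank product multiplies each compound's three per-time-point ranks. A compound ranked first at every time point therefore scores 1 × 1 × 1 = 1, as kenpaullone does. Smaller products indicate more consistent predicted reversal across the time course. A minimal Python sketch follows, assuming that each time point yields a best-first ordering of compound names. The function name and toy rankings are hypothetical.

```python
from math import prod


def rank_products(rankings):
    """Rank product across several rankings (e.g. one per time point).

    rankings: list of lists; each inner list orders drug names best-first.
    Returns (drug, product of 1-based ranks) pairs, smallest product first.
    Only drugs present in every ranking are scored.
    """
    positions = [{drug: i + 1 for i, drug in enumerate(order)} for order in rankings]
    shared = set(positions[0]).intersection(*positions[1:])
    scores = {drug: prod(pos[drug] for pos in positions) for drug in shared}
    # Sort by product, breaking ties alphabetically for a stable output.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical rankings at three time points (not the paper's data).
    t30 = ["kenpaullone", "drug_x", "drug_y"]
    t60 = ["kenpaullone", "drug_y", "drug_x"]
    t120 = ["kenpaullone", "drug_x", "drug_y"]
    print(rank_products([t30, t60, t120]))
    # prints [('kenpaullone', 1), ('drug_x', 12), ('drug_y', 18)]
```

Here drug_x scores 2 × 3 × 2 = 12, and drug_y scores 3 × 2 × 3 = 18.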
Figure 6. Experimental validation of small-molecule predictions. (a) Initial screen of the top five small molecules predicted to attenuate Ebola infection. HeLa cells were treated with 20 μM of each small molecule and then infected with Ebola virus at a multiplicity of infection (MOI) of five for 48 h. Infected cells were stained for viral antigen and analyzed on a confocal high-content imaging platform. (b) Dose–response experiments. HeLa cells were pretreated with a dose range of kenpaullone (0.3–75 μM) and then infected with Ebola virus at an MOI of five for 48 h. (c) Representative images of cells treated as in b. (d) HeLa cells pretreated with two doses of NCGC00184902-01 and then infected with Ebola virus. NCGC00184902-01 was predicted to reverse expression of the Ebola infection signatures at all three time points.
Figure 7. GO, KEGG, MGI, KEA and X2K enrichment analyses. (a) Gene Ontology, KEGG pathway and mammalian phenotype enrichment analyses of the genes upregulated 30 min after Ebola infection, visualized on three representative canvases. Each tile in a canvas represents a term (gene set), and terms are arranged by the similarity of their gene-set content. The canvas is continuous, so opposite edges wrap around onto each other. Brighter tiles have higher enrichment scores (lower P values), computed with Fisher’s exact test. The most highly enriched terms are highlighted. Complete results are given in Supporting Table 1. (b) Kinase enrichment analysis visualized on a canvas where each tile represents a mammalian kinase and each kinase's gene set consists of its known substrates. Tile brightness represents the enrichment P value computed using Fisher’s exact test. (c) Expression2Kinases analysis of the genes upregulated after 2 h. In this analysis, we first identify transcription factors whose targets, based on prior ChIP-seq experiments, are enriched among the differentially expressed genes. Then, the top ten transcription factors are connected through known protein–protein interactions. Finally, the proteins in the resulting subnetwork are subjected to kinase enrichment analysis with KEA. Node size reflects connectivity, and color distinguishes transcription factors (blue), intermediate proteins (gray) and kinases (green). GO, gene ontology.
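Figure 7 scores gene-set enrichment with Fisher's exact test. Consider a 2×2 table of query genes versus gene-set membership. For this table, the one-sided test for over-representation equals the upper tail of the hypergeometric distribution. The Python sketch below computes that tail with exact integer arithmetic. It assumes a one-sided test and a user-supplied background gene count, since the caption specifies neither. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, query_size, set_size, background):
    """One-sided Fisher's exact test for over-representation.

    overlap:    genes in both the query list and the gene set
    query_size: genes in the query list (e.g. upregulated genes)
    set_size:   genes in the term's gene set (or a kinase's substrates)
    background: total genes considered
    Returns P(X >= overlap) for X ~ Hypergeometric(background, set_size, query_size).
    """
    if not (0 <= overlap <= min(query_size, set_size)) or max(query_size, set_size) > background:
        raise ValueError("inconsistent counts")
    upper = min(query_size, set_size)
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing.
    tail = sum(
        comb(set_size, k) * comb(background - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(background, query_size)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 200 upregulated genes fall in a 150-gene set,
    # against a background of 20,000 genes (about 1.5 expected by chance).
    print(fisher_enrichment_p(12, 200, 150, 20000))
```

Exact integers avoid underflow in the binomial coefficients. Only the final ratio is converted to a float.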