Yuzhu Duan, Daniel S Evans, Richard A Miller, Nicholas J Schork, Steven R Cummings, Thomas Girke.
Abstract
signatureSearch is an R/Bioconductor package that integrates a suite of existing and novel algorithms into an analysis environment for gene expression signature (GES) searching combined with functional enrichment analysis (FEA) and visualization methods to facilitate the interpretation of the search results. In a typical GES search (GESS), a query GES is searched against a database of GESs obtained from large numbers of measurements, such as different genetic backgrounds, disease states and drug perturbations. Database matches sharing correlated signatures with the query indicate related cellular responses frequently governed by connected mechanisms, such as drugs mimicking the expression responses of a disease. To identify which processes are predominantly modulated in the GESS results, we developed specialized FEA methods combined with drug-target network visualization tools. The provided analysis tools are useful for studying the effects of genetic, chemical and environmental perturbations on biological systems, as well as searching single-cell GES databases to identify novel network connections or cell types. The signatureSearch software is unique in that it provides access to an integrated environment for GESS/FEA routines that includes several novel search and enrichment methods, efficient data structures and access to pre-built GES databases, while also allowing users to work with custom databases.
Year: 2020 PMID: 33068417 PMCID: PMC7708038 DOI: 10.1093/nar/gkaa878
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Categories of GESS algorithms by data types
| Category | Method | Query | Database |
|---|---|---|---|
| Set-based | CMAP | GS | Rank (a) |
| | gCMAP | Rank | GS |
| | LINCS | GS | |
| | Fisher exact | GS | GS |
| Correlation | PCC/SCC (d) | LFC or SIG (c) | LFC or SIG |

The table compares the different data types used as queries and databases by the GESS methods implemented in signatureSearch. The specific GEP types used by the methods are: (a) rank transformed profiles, (b) Z-scores, (c) normalized intensities or read counts. (d) Pearson or Spearman correlation coefficient.
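To make the set-based category concrete, the following is a minimal sketch of a Fisher exact search in which both the query and the database entries are gene sets (GS). It is not the signatureSearch implementation: the function name `fisher_enrichment_p`, the gene labels, the drug names and the universe size are all hypothetical, and the one-sided test on the overlap is an assumption about how such a search could be scored.

```python
from math import comb


def fisher_enrichment_p(query, reference, universe_size):
    """One-sided Fisher exact P-value for the overlap of two gene sets.

    Assumes both sets are drawn from the same gene universe of the given size.
    """
    k = len(query & reference)            # observed overlap
    n, m = len(query), len(reference)     # sizes of the two gene sets
    total = comb(universe_size, n)
    # Upper tail of the hypergeometric distribution: P(overlap >= k).
    tail = sum(
        comb(m, i) * comb(universe_size - m, n - i)
        for i in range(k, min(n, m) + 1)
    )
    return tail / total


# Hypothetical query gene set and a toy database of gene sets.
query = {"G1", "G2", "G3", "G4"}
database = {
    "drug_A": {"G1", "G2", "G3", "G9"},   # shares three genes with the query
    "drug_B": {"G5", "G6", "G7", "G8"},   # shares none
}
universe_size = 20                        # assumed number of measured genes

scores = {name: fisher_enrichment_p(query, gs, universe_size)
          for name, gs in database.items()}
ranked = sorted(scores, key=scores.get)   # smallest P-value first

for name in ranked:
    print(f"{name}\t{scores[name]:.4g}")
```

Database entries with smaller P-values share more genes with the query than expected by chance, so they appear at the top of the ranked list.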
Figure 1. Overview of GESS and FEA workflow. GES queries are used to search a drug-based GES reference database for drugs inducing GESs similar to the query. To interpret the results mechanistically, the GESS results are subjected to functional enrichment analysis (FEA) including drug and target set enrichment analyses (DSEA, TSEA). Both identify enriched functional categories (GO terms and/or KEGG pathways) in the GESS results. Subsequently, drug-target networks (DTNs) are reconstructed for visualization and interpretation.
List of important functionalities provided by signatureSearch and signatureSearchData
| Function name | Description | Input (a) | Output and comments |
|---|---|---|---|
| CMAP2 | Affymetrix drug signatures | Raw, normalized and rank-based expression data | GES reference DB stored as HDF5 that can be accessed via […] from […] |
| LINCS | L1000 drug & genetic signatures | Normalized and weighted averaged expression data | |
| Custom | User provided signatures | Many types of expression data | |
| | CMAP method | GS-Q: DEG; GEP-DB: […] | Table with similarity scores for each perturbagen GES in the reference database, the query signature itself, as well as details about the chosen search parameters |
| | LINCS method | GS-Q: DEG; GEP-DB: […] | |
| | gCMAP method | GEP-Q: […] | |
| | Fisher’s exact test | GS-Q: DEG; GS-DB: DEG | |
| | Correlation methods | GEP-Q; GEP-DB: same genes and GEP type | |
| | Modified GSEA algorithm | Score ranked target list | Enrichment results, details about chosen functional annotation system, labels of drugs used for testing, as well as their corresponding target information |
| | Duplication adjusted hyperG test | Target set with duplication | |
| | meanAbs method | Score ranked target list | |
| | Hypergeometric test | Drug set | |
| | GSEA algorithm | Score ranked drug list | |
| | GESS result visualization | | Dot plot of drug similarity scores |
| | FEA result comparison | List of […] | Dot plot comparing result consistency |
| | Drug-target networks | Drug set; pathway ID | Interactive network graph |

The names of functions and libraries are italicized. (a) Only the most common input types are listed. Acronyms are defined in the text.
Figure 2. Design of signatureSearch package. GES reference databases are constructed from expression profile collections (RNA-Seq, Affymetrix chip or other technologies) and stored as HDF5 files. To perform GESSs, all query parameters are defined in a qSig search object where users can choose among over five search algorithms. The results are stored in a gessResult object that can be functionally annotated with different TSEA and DSEA methods. The enrichment results are organized in an feaResult object that can be used for drug-target network analysis and visualization.
Figure 3. Performance testing strategy of GESS methods. (A) The GESs of the drugs in each MOA and SSC category were searched against the LINCS database with each of the six GESS methods. The results were sorted by the corresponding similarity scores, here indicated by boxes with a color gradient. GESs from the same and different MOA/SSC categories (CAT) as the query were indicated in a binary vector with ones and zeros (next to boxes), respectively. After joining the binary vectors for each category group and re-sorting them by the corresponding scores, cumulative TPRs and FPRs were plotted in the form of ROC curves. This was done on the global level (B) and the CAT level (C) for the MOA and SSC classifications separately. (D) The distributions of AUC/pAUC values from each CAT level are depicted by violin plots with mean values and standard deviation (STDEV) bars given in the middle. In addition, the global AUC/pAUC values are indicated by triangles. (E) The statistical significance of the observed differences among the global AUC/pAUC values of the six GESS methods was assessed by a bootstrap test described in the text.
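The ROC construction described in this caption can be sketched as follows, starting from a binary vector that is already sorted by decreasing similarity score. This is an illustration only, not the evaluation code used for the figure: the function names `roc_points` and `auc` and the toy label vector are hypothetical, tied scores are ignored, and the partial AUC is reported without normalization, which is an assumption.

```python
def roc_points(labels):
    """Cumulative (FPR, TPR) points for labels sorted by decreasing score.

    Assumes the vector contains at least one 1 and at least one 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in labels:
        if y:
            tp += 1      # hit from the same category as the query
        else:
            fp += 1      # hit from a different category
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve up to max_fpr (partial AUC if < 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR at the FPR cutoff and stop the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy binary vector: 1 = same category as the query, 0 = different category.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
points = roc_points(labels)

print(f"AUC          = {auc(points):.4f}")
print(f"pAUC(FPR 10%) = {auc(points, max_fpr=0.10):.4f}")
```

Restricting the area to a small FPR range emphasizes the top of the ranked result list, which is the part of a search result that is usually inspected.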
Figure 4. Recall performance of GESS methods on MOA and SSC categories. (A) The distributions of the ROC performance results of the 69 MOA categories are plotted in the form of violin plots for each of the six GESS methods. The corresponding mean values, standard deviation bars and global AUCs are indicated within each violin by dots, vertical lines and triangles, respectively. The GESS methods are ordered by increasing global AUC values. (B) The corresponding distributions of pAUC values are given for FPRs of 1%, 5% and 10%. In this composite plot, the GESS methods are ordered by the mean of the ranks of their global pAUC values. (C–D) The GESS performance results of the 139 SSC categories are plotted the same way as the corresponding MOA results. (E) The performance results under (A)–(D) are summarized in the form of stacked bar plots where the sum of the ranks is used to order the GESS methods from left to right by increasing performance. Each bar is composed of the ranking of the global AUCs and the mean ranking of the corresponding pAUCs for both MOA and SSC categories.
Time and memory performance
| GESS method | Time | Memory |
|---|---|---|
| CMAP | 1.2 min | 3.5 GB |
| LINCS | 1.7 min | 2.3 GB |
| gCMAP | 1 min | 290 MB |
| Fisher | 9 s | 238 MB |
| SPall | 1 min | 838 MB |
| SPsub | 13 s | 238 MB |
Top ranking drugs of vorinostat query
| Rank (a) | Drug name (b) | Cell type (c) | SCC (d) | Targets (e) |
|---|---|---|---|---|
| 1 | Vorinostat | SKB | 1.00 | HDAC1; HDAC10; HDAC11... |
| 2 | Trichostatin-a | SKB | 0.99 | HDAC1; HDAC10; HDAC2... |
| 3 | KM-00927 | SKB | 0.98 | |
| 4 | Scriptaid | SKB | 0.97 | HDAC1; HDAC2; HDAC3... |
| 5 | HC-toxin | SKB | 0.97 | HDAC1 |
| 6 | Belinostat | SKB | 0.97 | HDAC1; HDAC10; HDAC11... |
| 7 | Panobinostat | SKB | 0.96 | HDAC1; HDAC10; HDAC11... |
| 8 | PCI-24781 | ASC | 0.95 | |
| 9 | HC-toxin | ASC | 0.95 | HDAC1 |
| 10 | Vorinostat | ASC | 0.94 | HDAC1; HDAC10; HDAC11... |

The GES of SKB cells treated with vorinostat was used as query to search the LINCS database with the SPsub method. The rows are sorted in decreasing order of the absolute Spearman correlation coefficients (d). The other columns include ranks (a), drug names (b), cell types (c) and the gene symbols of the corresponding target sites (e).
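The ranking principle behind this table, sorting database signatures by the absolute Spearman correlation with the query, can be sketched in a few lines. This is not the package's SPsub implementation, and any gene subsetting that SPsub performs is not shown. The helper names, the drug labels and the five-gene profiles are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1             # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression profiles over the same five genes.
query = [2.0, 1.0, -1.5, 0.5, -2.0]
database = {
    "drug_A": [1.8, 0.9, -1.0, 0.4, -2.5],   # same gene ordering as the query
    "drug_B": [-1.0, -2.0, 1.5, -0.5, 2.0],  # largely reversed ordering
    "drug_C": [0.1, 2.0, 0.3, -1.0, 0.2],    # mostly unrelated
}

scc = {name: spearman(query, profile) for name, profile in database.items()}
ranked = sorted(scc, key=lambda name: abs(scc[name]), reverse=True)

for rank, name in enumerate(ranked, start=1):
    print(rank, name, f"{scc[name]:+.2f}")
```

Sorting by the absolute value places strongly anti-correlated signatures near the top as well, so the sign of the coefficient still needs to be inspected when interpreting a hit.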
Figure 5. Structure-based hierarchical clustering dendrogram for drugs listed in Table 4. Experimental drugs lacking structure information are not included.
Top ranking MF and BP terms obtained with dup_hyperG
| Ontology (a) | GO term (b) | Proteins in GO term (c) | Test set (d) | Intersect (e) | Raw P-value (f) | Adjusted P-value (g) |
|---|---|---|---|---|---|---|
| MF | HDAC activity (H3-K14) (GO:0031078) | 11 | 323 | 97 | 0.00e+00 | 0.00e+00 |
| MF | NAD-dependent HDAC activity (H3-K14, GO:0032041) | 11 | 323 | 97 | 0.00e+00 | 0.00e+00 |
| MF | NAD-dependent HDAC activity (GO:0017136) | 16 | 323 | 98 | 0.00e+00 | 0.00e+00 |
| MF | NAD-dependent PDAC activity (GO:0034979) | 17 | 323 | 99 | 0.00e+00 | 0.00e+00 |
| MF | HDAC activity (GO:0004407) | 44 | 323 | 98 | 0.00e+00 | 0.00e+00 |
| BP | Histone H3 deacetylation (GO:0070932) | 21 | 323 | 98 | 0.00e+00 | 0.00e+00 |
| BP | Histone H4 deacetylation (GO:0070933) | 11 | 323 | 59 | 0.00e+00 | 0.00e+00 |
| BP | Histone deacetylation (GO:0016575) | 86 | 323 | 101 | 0.00e+00 | 0.00e+00 |
| BP | Hair follicle placode formation (GO:0060789) | 5 | 323 | 23 | 0.00e+00 | 0.00e+00 |
| BP | Fungiform papilla morphogenesis (GO:0061197) | 5 | 323 | 23 | 0.00e+00 | 0.00e+00 |

The columns contain: (a) GO ontology; (b) GO term description/ID; number of proteins in (c) GO term, (d) test set and (e) intersect; (f) raw P-value; and (g) adjusted P-value using the BH method for multiple testing correction. To save space, longer GO term descriptions have been shortened.
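The BH (Benjamini-Hochberg) adjustment mentioned for the last column can be sketched as below. The function name `bh_adjust` and the raw P-values are hypothetical and are not taken from the table, and the duplication adjustment of the dup_hyperG test itself is not reproduced here.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Visit the P-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos                    # ascending rank, 1 = smallest P-value
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values from four enrichment tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)

for p, q in zip(raw, adj):
    print(f"raw={p:.3f}  adjusted={q:.3f}")
```

The running minimum keeps the adjusted values monotone in the raw P-values, so a term with a smaller raw P-value never ends up with a larger adjusted one.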
Figure 6. Drug-target network module of Histone Deacetylase Activity (H3-K14 specific; GO MF ID: GO:0031078). Drugs and targets are depicted as boxes and circles, respectively. The color of the circles indicates the number of connections.