Qing Wen1, Chang-Sik Kim1,2, Peter W Hamilton3, Shu-Dong Zhang4,5.
Abstract
BACKGROUND: Gene expression connectivity mapping has gained much popularity recently, with a number of successful applications in biomedical research testifying to its utility and promise. Previous methodological research in connectivity mapping has focused mainly on two key components of the framework, namely the reference gene expression profiles and the connectivity mapping algorithms. The third key component, the query gene signature, has been left to users to construct without much consensus on how this should be done, although it is the issue most relevant to end users. As a key input to the connectivity mapping process, the gene signature is crucial to returning biologically meaningful and relevant results. This paper sets out to formulate a standardized procedure for constructing high-quality gene signatures from a user's perspective.
Keywords: Breast cancer; Connectivity mapping; Differentially expressed genes; Disease inhibitory compounds; Gene signature progression; Lung cancer
Year: 2016 PMID: 27170106 PMCID: PMC4864913 DOI: 10.1186/s12859-016-1066-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flow chart of the two-stage process. The output from stage 1 is a ranked list of significant genes with their regulation status (direction of differential expression in the treatment condition relative to the control). This is used as input to stage 2, gene signature progression. Starting from k=1, the top k genes are fed to sscMap as a query signature to pull out significant drugs. This process is run iteratively with increasing k until a pre-set target FDR is achieved for the returned significant drugs.
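The stage 2 loop described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: `count_significant` is a hypothetical stand-in for a single sscMap run, and the FDR estimate used here (expected false connections divided by the number of significant connections) is an assumption rather than something stated in this record.

```python
from typing import Callable, Sequence, Tuple

# A signed gene: (gene identifier, +1 for up-regulated, -1 for down-regulated).
SignedGene = Tuple[str, int]


def progress_signature(
    ranked_genes: Sequence[SignedGene],
    count_significant: Callable[[Sequence[SignedGene]], int],
    expected_false: float = 1.0,
    target_fdr: float = 0.1,
) -> Tuple[int, int]:
    """Return (k, number of significant drugs) at the first k meeting the target.

    count_significant is a hypothetical placeholder for one sscMap query: it
    takes a query signature and returns how many compounds were significant.
    Raises ValueError if no prefix of the ranked list reaches the target FDR.
    """
    for k in range(1, len(ranked_genes) + 1):
        n_sig = count_significant(ranked_genes[:k])
        # Assumed FDR estimate: expected false connections / significant ones.
        if n_sig > 0 and expected_false / n_sig <= target_fdr:
            return k, n_sig
    raise ValueError("target FDR not reached with the available genes")


# Toy usage: a fake scorer that finds one more drug for every two genes added.
toy_genes = [(f"g{i}", 1 if i % 2 else -1) for i in range(30)]
best_k, n_drugs = progress_signature(toy_genes, lambda sig: len(sig) // 2)
```

In the toy run the loop stops at the first signature length for which ten drugs are significant, since with one expected false connection that is the first point where the assumed FDR estimate reaches 0.1.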
Known HDAC inhibitors in the CMap reference dataset
| Compound | Replicates |
|---|---|
| Vorinostat | 12 |
| Trichostatin A | 182 |
| Valproic acid | 57 |
| HC toxin | 1 |
| Sodium phenylbutyrate | 7 |
| Scriptaid | 3 |
| MS-275 | 2 |
The significant connections returned from sscMap using the HDACi optimal gene signature (k=5) obtained from the “simulated experiment”
| Compound | Replicates | cscore | p value | zscore | SM | PS |
|---|---|---|---|---|---|---|
| Sodium phenylbutyrate | 7 | 0.684 | 5.0E-06 | 7.93 | 1 | 1 |
| Trichostatin A | 182 | 0.822 | 5.0E-06 | 6.18 | 1 | 1 |
| Scriptaid | 3 | 0.928 | 5.0E-06 | 5.39 | 1 | 1 |
| Valproic acid | 57 | 0.323 | 1.0E-05 | 5.31 | 1 | 1 |
| Rifabutin | 3 | 0.880 | 5.0E-06 | 5.21 | 1 | 1 |
| Vorinostat | 12 | 0.890 | 5.0E-06 | 5.19 | 1 | 1 |
| MS-275 | 2 | 0.954 | 5.0E-06 | 4.30 | 1 | 1 |
| Withaferin A | 4 | 0.729 | 2.0E-05 | 3.97 | 1 | 1 |
| HC toxin | 1 | 0.928 | 5.0E-06 | 3.59 | 1 | 1 |
| 2-deoxy-D-glucose | 1 | 0.902 | 1.0E-05 | 3.49 | 1 | 1 |
All 7 known HDAC inhibitors, including those not used in signature generation, were pulled out. SM (significance mark) = 1 indicates that the connection p value is less than the preset threshold E/N = 1/1309 ≈ 7.6×10⁻⁴; PS (perturbation stability) is also shown here.
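The threshold in this footnote can be checked with a few lines of arithmetic. If all N compounds were unconnected to the signature, their p values would be uniformly distributed, so a cutoff of E/N lets through E false connections on average. The sketch below assumes E = 1 and N = 1309, as in the footnote; the function names are illustrative only.

```python
def significance_threshold(n_compounds: int, expected_false: float = 1.0) -> float:
    """Per-compound p-value cutoff that tolerates `expected_false` false hits."""
    return expected_false / n_compounds


def significance_mark(p_value: float, threshold: float) -> int:
    """Return 1 if the connection is significant at the threshold, else 0."""
    return 1 if p_value < threshold else 0


threshold = significance_threshold(1309)  # about 7.6e-4

# Withaferin A in the table above has p = 2.0E-05, well below the cutoff.
withaferin_mark = significance_mark(2.0e-05, threshold)
```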
Fig. 2 Snapshots of the gene signature progression process in the lung cancer case study. Each panel shows the sscMap results with the top k ranked genes as the query signature, with k = 150, 200, 300, and 322 for panels (a)–(d), respectively. The figure illustrates the intermediate results of individual steps in the gene signature progression process. The blue horizontal line marks the position of the threshold p value; any points above this line are considered statistically significant. The numbers of compounds significantly connected to the query gene signature (solid red circles) are 2, 4, 7, and 10 for panels (a)–(d), respectively. k = 322 is the optimal signature length, at which the preset FDR target ≤0.1 was first achieved.
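The counts in this caption are consistent with a simple FDR estimate, sketched below. The assumption, not stated explicitly in this record, is that the FDR is estimated as the expected number of false connections (E = 1) divided by the number of significant connections. The four panels are snapshots only; the actual progression evaluates many more values of k.

```python
# Significant compounds reported for each snapshot, keyed by signature length k.
panels = {150: 2, 200: 4, 300: 7, 322: 10}

expected_false = 1.0  # E, the expected number of false connections per query
target_fdr = 0.1

# Assumed estimate: FDR = E / (number of significant connections).
fdr = {k: expected_false / n_sig for k, n_sig in panels.items()}

# Smallest snapshot length whose estimated FDR meets the target.
first_k = min(k for k, value in fdr.items() if value <= target_fdr)
```

Under this assumption the estimates are 0.5, 0.25, about 0.143, and 0.1 for panels (a) to (d), so only the k = 322 snapshot meets the ≤0.1 target.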
The significant connections returned from sscMap using the lung cancer optimal gene signature obtained from the gene signature progression process in the case study with the GSE19804 dataset
| Compound | Replicates | cscore | p value | zscore | SM | PS |
|---|---|---|---|---|---|---|
| Trichostatin A | 182 | –0.072 | 1.0E-05 | –4.33 | 1 | 1.00 |
| Rofecoxib | 6 | –0.047 | 6.0E-05 | –4.29 | 1 | 1.00 |
| Calmidazolium | 2 | –0.110 | 1.0E-05 | –4.23 | 1 | 1.00 |
| MS-275 | 2 | –0.108 | 7.0E-05 | –3.87 | 1 | 1.00 |
| Rifabutin | 3 | –0.080 | 7.0E-05 | –3.76 | 1 | 1.00 |
| Exemestane | 1 | –0.113 | 2.8E-04 | –3.50 | 1 | 1.00 |
| STOCK1N-35696 | 2 | –0.082 | 3.0E-04 | –3.38 | 1 | 1.00 |
| 1,5-isoquinolinediol | 1 | –0.107 | 4.5E-04 | –3.31 | 1 | 1.00 |
| Pioglitazone | 11 | –0.030 | 3.4E-04 | –3.33 | 1 | 0.99 |
| Gefitinib | 1 | –0.105 | 5.8E-04 | –3.23 | 1 | 0.93 |
SM (significance mark) = 1 indicates that the connection p value is less than the preset threshold E/N = 1/1309 ≈ 7.6×10⁻⁴; PS (perturbation stability) is also shown here.
The significant connections returned from sscMap using the breast cancer optimal gene signature based on the case study with dataset GSE15852, comparing 43 breast tumors with their paired normal tissues
| Compound | Replicates | cscore | p value | zscore | SM | PS |
|---|---|---|---|---|---|---|
| IC-86621 | 4 | –0.079 | 5.0E-06 | –4.84 | 1 | 1.00 |
| Trichostatin A | 182 | –0.080 | 5.0E-06 | –4.07 | 1 | 1.00 |
| Semustine | 4 | –0.082 | 1.0E-04 | –3.82 | 1 | 1.00 |
| W-13 | 2 | –0.091 | 8.0E-05 | –3.77 | 1 | 1.00 |
| Copper sulfate | 4 | –0.066 | 1.5E-04 | –3.63 | 1 | 1.00 |
| Exemestane | 1 | –0.136 | 2.8E-04 | –3.58 | 1 | 1.00 |
| Vorinostat | 12 | –0.087 | 1.9E-04 | –3.47 | 1 | 1.00 |
| Danazol | 4 | –0.053 | 2.5E-04 | –3.43 | 1 | 1.00 |
| Dexverapamil | 1 | –0.127 | 5.8E-04 | –3.34 | 1 | 1.00 |
| 15-delta prostaglandin J2 | 15 | –0.051 | 6.4E-04 | –3.20 | 1 | 0.73 |
Summary of the two datasets analyzed in the case studies
| Dataset | GSE19804 | GSE15852 |
|---|---|---|
| Disease | Lung cancer | Breast cancer |
| Sample size | 120 | 86 |
| Sample details | 60 tumors, 60 normals | 43 tumors, 43 normals |
| Total Genes (Probesets) | 22277 | 22283 |
| Threshold | 1/22277 | 1/22283 |
| Significant Genes | 6316 | 1241 |
| Expression filtered | 6066 | 1229 |
| Differential expression Filtered (Fold Change > 2) | 1779 | 368 |
| Gene Signature Progression optimal length | 322 | 232 |
| Potential drugs | 10 | 10 |
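The successive filters summarized in the table (significance at 1/N, expression filter, fold change > 2) can be sketched as a small pipeline. This is an illustration under several assumptions that this record does not confirm: the `mean_expression` field and `min_expression` cutoff stand in for the unspecified expression filter, down-regulated genes are assumed to pass the fold-change filter when the inverse ratio exceeds 2, and the surviving genes are assumed to be ranked by p value.

```python
from typing import List, NamedTuple, Sequence, Tuple


class GeneResult(NamedTuple):
    probeset: str
    p_value: float          # from the differential expression test
    mean_expression: float  # hypothetical quantity used by the expression filter
    fold_change: float      # treatment / control ratio, must be > 0


def select_candidates(
    results: Sequence[GeneResult],
    min_expression: float,
    min_fold: float = 2.0,
) -> List[Tuple[str, int]]:
    """Return ranked (probeset, regulation sign) pairs that pass all filters."""
    threshold = 1.0 / len(results)  # the 1/N threshold listed in the table
    kept = []
    for gene in results:
        if gene.p_value >= threshold:
            continue  # not significant
        if gene.mean_expression < min_expression:
            continue  # assumed expression filter
        # Treat up- and down-regulation symmetrically (an assumption).
        magnitude = max(gene.fold_change, 1.0 / gene.fold_change)
        if magnitude <= min_fold:
            continue  # fold change not greater than 2
        sign = 1 if gene.fold_change > 1.0 else -1
        kept.append((gene.p_value, gene.probeset, sign))
    kept.sort()  # assumed ranking: smallest p value first
    return [(probeset, sign) for _, probeset, sign in kept]


# Toy usage with four made-up probesets (threshold is 1/4 here).
toy = [
    GeneResult("A", 0.01, 10.0, 4.0),
    GeneResult("B", 0.001, 10.0, 0.25),
    GeneResult("C", 0.5, 10.0, 8.0),
    GeneResult("D", 0.02, 10.0, 1.5),
]
ranked = select_candidates(toy, min_expression=5.0)
```

The ranked, signed list returned here corresponds to the stage 1 output that feeds gene signature progression.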