Darragh G McArt, Philip D Dunne, Jaine K Blayney, Manuel Salto-Tellez, Sandra Van Schaeybroeck, Peter W Hamilton, Shu-Dong Zhang.
Abstract
The advent of next generation sequencing (NGS) technologies has expanded the scope of genomic research, offering higher coverage and increased sensitivity compared with older microarray platforms. Although NGS currently still costs more than microarray approaches, its rapid advances will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs through differential gene expression. It was initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles has been accumulated. In this work, we test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by combining the microarray-based reference profiles with a differentially expressed gene signature constructed from an NGS dataset. This would allow connections to be established between the NGS gene signature and the microarray reference profiles, avoiding the cost of re-creating drug profiles with NGS technology. We applied the connectivity mapping approach to a publicly available NGS dataset of androgen-stimulated LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to assess their potential in a laboratory setting. In addition, we analyzed an independent microarray dataset with a similar experimental setting. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with the potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen-dependent manner. 
Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping.
Year: 2013 PMID: 23840550 PMCID: PMC3694114 DOI: 10.1371/journal.pone.0066902
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flow chart of processing stages involved in establishing signatures from RNA-Seq and Microarray analysis for connectivity mapping.
DESeq top ranking differentially expressed genes.
| EnsemblID | GeneSymbol | Mean-stimu | Mean-unstimu | ratio(stimu/unstimu) | log2Ratio | pvalue | adjustedPvalue | DESeqPosition | affy Mapped ID |
| ENSG00000151503 | NCAPD3 | 3725.06 | 78.04 | 47.73 | 5.58 | 0.00E+00 | 0.00E+00 | 1 | 212789_at |
| ENSG00000096060 | FKBP5 | 1605.85 | 47.32 | 33.94 | 5.08 | 0.00E+00 | 0.00E+00 | 2 | 204560_at |
| ENSG00000116133 | DHCR24 | 1997.42 | 192.42 | 10.38 | 3.38 | 0.00E+00 | 0.00E+00 | 3 | 200862_at |
| ENSG00000156689 | GLYATL2 | 1652.52 | 157.72 | 10.48 | 3.39 | 1.83E-317 | 1.47E-313 | 4 | not Found In Annotation File |
| ENSG00000113594 | LIFR | 1030.64 | 54.44 | 18.93 | 4.24 | 7.33E-311 | 4.72E-307 | 5 | 205876_at |
| ENSG00000166451 | CENPN | 752.57 | 22.15 | 33.98 | 5.09 | 3.96E-299 | 2.12E-295 | 6 | 219555_s_at, 222118_at |
| ENSG00000115648 | MLPH | 2733.96 | 422.61 | 6.47 | 2.69 | 2.81E-286 | 1.29E-282 | 7 | 218211_s_at |
| ENSG00000244324 | RP11-67L3.6 | 884.95 | 73.73 | 12.00 | 3.59 | 7.97E-235 | 3.21E-231 | 8 | not Found In Annotation File |
| ENSG00000116285 | ERRFI1 | 642.06 | 33.44 | 19.20 | 4.26 | 1.44E-226 | 5.14E-223 | 9 | not Found In Annotation File |
| ENSG00000130066 | SAT1 | 1049.43 | 138.12 | 7.60 | 2.93 | 3.58E-197 | 1.15E-193 | 10 | 213988_s_at, 210592_s_at, 203455_s_at |
The top 10 genes that were retrieved by DESeq using the R-Cloud on EBI for the LNCaP dataset. Expression ratio is (stimulated/un-stimulated). See Table S1 for the full list of differentially expressed genes returned by the DESeq analysis.
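The ratio and log2Ratio columns follow directly from the two mean columns. A minimal Python sketch that recomputes them for the first three rows is shown here; the helper name `log2_ratio` is purely illustrative and is not part of DESeq or of the authors' pipeline.

```python
import math

# (gene symbol, mean stimulated, mean unstimulated), copied from the DESeq table
rows = [
    ("NCAPD3", 3725.06, 78.04),
    ("FKBP5", 1605.85, 47.32),
    ("DHCR24", 1997.42, 192.42),
]


def log2_ratio(stimulated, unstimulated):
    """log2 of the expression ratio (stimulated / unstimulated)."""
    return math.log2(stimulated / unstimulated)


for gene, stim, unstim in rows:
    ratio = stim / unstim
    print(f"{gene}: ratio={ratio:.2f}, log2Ratio={log2_ratio(stim, unstim):.2f}")
```

A log2Ratio of about 5.58 therefore corresponds to a roughly 48-fold higher mean count in androgen-stimulated cells.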
EdgeR top ranking differentially expressed genes.
| EnsemblID | GeneSymbol | log2Ratio | adjustedP | EdgeRposition | affy Mapped ID | DESeqPosition |
| ENSG00000151503 | NCAPD3 | 5.58 | 0 | 1 | 212789_at | 1 |
| ENSG00000096060 | FKBP5 | 5.10 | 0 | 2 | 204560_at | 2 |
| ENSG00000166451 | CENPN | 5.09 | 0 | 3 | 219555_s_at, 222118_at | 6 |
| ENSG00000113594 | LIFR | 4.24 | 0 | 4 | 205876_at | 5 |
| ENSG00000244324 | RP11-67L3.6 | 3.60 | 0 | 5 | not Found In Annotation File | 8 |
| ENSG00000156689 | GLYATL2 | 3.40 | 0 | 6 | not Found In Annotation File | 4 |
| ENSG00000116133 | DHCR24 | 3.38 | 0 | 7 | 200862_at | 3 |
| ENSG00000155368 | DBI | 3.01 | 0 | 8 | | 22 |
| ENSG00000130066 | SAT1 | 2.93 | 0 | 9 | 213988_s_at, 210592_s_at, 203455_s_at | 10 |
| ENSG00000115648 | MLPH | 2.72 | 0 | 10 | 218211_s_at | 7 |
The top 10 genes that were retrieved by EdgeR using the R-Cloud on EBI for the LNCaP dataset. Expression ratio is (stimulated/un-stimulated). Here we can see that the same set of identifiers used in the sscMap from the DESeq analysis would have been attained by EdgeR with the exception of ENSG00000155368 which was ranked 22nd in DESeq analysis. Table S2 contains the full list of differentially expressed genes returned by the EdgeR analysis.
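The agreement between the two top-10 lists can be checked with a simple set comparison. The sketch below uses only the gene symbols from the DESeq and EdgeR tables; the variable names are illustrative.

```python
# Top-10 gene symbols in rank order, copied from the two tables
deseq_top10 = ["NCAPD3", "FKBP5", "DHCR24", "GLYATL2", "LIFR",
               "CENPN", "MLPH", "RP11-67L3.6", "ERRFI1", "SAT1"]
edger_top10 = ["NCAPD3", "FKBP5", "CENPN", "LIFR", "RP11-67L3.6",
               "GLYATL2", "DHCR24", "DBI", "SAT1", "MLPH"]

shared = set(deseq_top10) & set(edger_top10)
only_edger = set(edger_top10) - set(deseq_top10)
only_deseq = set(deseq_top10) - set(edger_top10)

print(f"shared: {len(shared)} genes")
print(f"EdgeR only: {sorted(only_edger)}")
print(f"DESeq only: {sorted(only_deseq)}")
```

Nine of the ten genes are shared. The gene found only in the DESeq top 10 (ERRFI1) has no mapped Affymetrix probe set, so it would not contribute to the sscMap query in either case, which is consistent with the caption's statement that DBI is the only difference at the identifier level.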
The gene signature from the NGS dataset using DESeq analysis and their positions in the microarray DEGs by SamR.
| ProbeSetID | GeneSymbol | EnsemblID | log2Ratio | adjustedPvalue | DESeqPosition | SamRposition | SamRlog2FC | SamRq-value(%) |
| 212789_at | NCAPD3 | ENSG00000151503 | 5.58 | 0 | 1 | 96 | 1.23 | 0.00 |
| 204560_at | FKBP5 | ENSG00000096060 | 5.08 | 0 | 2 | 30 | 2.04 | 0.00 |
| 200862_at | DHCR24 | ENSG00000116133 | 3.38 | 0 | 3 | 91 | 1.14 | 0.00 |
| 205876_at | LIFR | ENSG00000113594 | 4.24 | 4.72E-307 | 5 | 374 | 0.68 | 0.89 |
| 219555_s_at | CENPN | ENSG00000166451 | 5.09 | 2.12E-295 | 6 | 29 | 1.93 | 0.00 |
| 222118_at | CENPN | ENSG00000166451 | 5.09 | 2.12E-295 | 6 | 11 | 2.58 | 0.00 |
| 218211_s_at | MLPH | ENSG00000115648 | 2.69 | 1.29E-282 | 7 | 913 | 0.47 | 3.09 |
| 203455_s_at | SAT1 | ENSG00000130066 | 2.93 | 1.15E-193 | 10 | 146 | 0.95 | 0.32 |
| 210592_s_at | SAT1 | ENSG00000130066 | 2.93 | 1.15E-193 | 10 | 161 | 0.91 | 0.32 |
| 213988_s_at | SAT1 | ENSG00000130066 | 2.93 | 1.15E-193 | 10 | 148 | 0.95 | 0.32 |
The list of identifiers and their associated genes extracted from the NGS dataset using DESeq analysis and submitted to sscMap. We established where these genes were located in the full list (Table S3) of statistically differentially expressed genes returned by the SamR analysis on the microarray dataset. All these genes lay within a SamR reported FDR of . Table S4 also contains the signed ranks of these 10 probesetIDs in the 6 instances of reference profiles for cotinine.
Figure 2. sscMap output for the signature from the RNA-Seq dataset.
The figure shows a volcano plot of the distribution of candidate compounds that may enhance (right side) or suppress (left side) the phenotype. Significant candidates lie above the green line.
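The left/right split in the plot reflects the sign of the connection score between the query signature and a compound's reference profile. The sketch below illustrates that idea with a simplified scoring scheme for an unordered signature, assumed here for illustration: each signature gene's sign is multiplied by its signed rank in the reference profile, and the sum is normalised by the maximum attainable value. The gene names and ranks are toy values, the function name is hypothetical, and the sketch does not reproduce the sscMap software or its significance and perturbation-stability testing.

```python
def connection_score(reference_signed_ranks, signature):
    """Normalised connection score between a reference profile and a signature.

    reference_signed_ranks: gene -> signed rank (large |rank| = strongly
        regulated by the compound; the sign gives the direction).
    signature: gene -> +1 (up in the phenotype) or -1 (down in the phenotype).
    """
    n_genes = len(reference_signed_ranks)
    m = len(signature)
    strength = sum(reference_signed_ranks[g] * sign for g, sign in signature.items())
    # Largest possible strength: the m signature genes occupy the top m ranks
    # with matching direction, i.e. N + (N-1) + ... + (N-m+1).
    max_strength = sum(n_genes - i for i in range(m))
    return strength / max_strength


# Toy reference profiles over six hypothetical genes
mimicking_compound = {"A": 6, "B": -5, "C": 4, "D": -3, "E": 2, "F": -1}
reversing_compound = {"A": -6, "B": 3, "C": -5, "D": 4, "E": -2, "F": 1}

# Toy signature: genes A and C are up-regulated in the phenotype
signature = {"A": +1, "C": +1}

score_mimic = connection_score(mimicking_compound, signature)
score_reverse = connection_score(reversing_compound, signature)
print(score_mimic, score_reverse)
```

Under this scheme a positive score marks a compound whose profile resembles the phenotype (right side of the plot), while a negative score marks a compound that may reverse it (left side).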
Figure 3. NGS signature genes explored in Microarray study.
The set of genes used in the NGS gene signature for sscMap, explored in the GEO Dataset Browser with the Wang et al. microarray dataset.
The gene signature from the microarray dataset using SamR analysis.
| ProbeSetID | GeneSymbol | log2FoldChange | SamR-q-value(%) | SamRposition | DESeqPosition | log2Ratio | adjustedPvalue |
| 209854_s_at | KLK2 | 3.84 | 0 | 1 | 15 | 3.18 | 5.77E-160 |
| 210339_s_at | KLK2 | 3.48 | 0 | 2 | 15 | 3.18 | 5.77E-160 |
| 205041_s_at | ORM1/ORM2 | 3.78 | 0 | 3 | NA | NA | NA |
| 211689_s_at | TMPRSS2 | 3.12 | 0 | 4 | 18 | 2.26 | 8.87E-130 |
| 222118_at | CENPN | 2.58 | 0 | 11 | 6 | 5.09 | 2.12E-295 |
| 217875_s_at | PMEPA1 | 2.38 | 0 | 13 | 142 | 2.99 | 1.4E-30 |
| 205862_at | GREB1 | 2.36 | 0 | 14 | 1356 | 1.00 | 0.000149717 |
| 219049_at | CSGALNACT1 | 2.21 | 0 | 15 | 1061 | 1.81 | 0.0000119 |
| 205102_at | TMPRSS2 | 2.07 | 0 | 16 | NA | NA | NA |
| 204583_x_at | KLK3 | 1.96 | 0 | 20 | 31 | 1.81 | 1.14E-102 |
| 204582_s_at | KLK3 | 2.02 | 0 | 21 | 31 | 1.81 | 1.14E-102 |
| 209706_at | NKX3-1 | 1.95 | 0 | 22 | 36 | 2.49 | 2.59E-90 |
| 219555_s_at | CENPN | 1.93 | 0 | 29 | 6 | 5.09 | 2.12E-295 |
| 204560_at | FKBP5 | 2.04 | 0 | 30 | 2 | 5.08 | 0 |
| 203196_at | ABCC4 | 1.69 | 0 | 33 | 23 | 2.85 | 5.94E-117 |
| 221584_s_at | KCNMA1 | 1.57 | 0 | 35 | 85 | 2.16 | 2.31E-43 |
| 204897_at | PTGER4 | 1.62 | 0 | 37 | 671 | 3.53 | 0.000000015 |
| 211548_s_at | HPGD | 1.60 | 0 | 38 | 185 | | 3.19E-25 |
| 220014_at | PRR16 | 1.50 | 0 | 44 | NA | NA | NA |
| 219476_at | C1orf116 | 1.47 | 0 | 46 | 96 | 1.94 | 5.44E-39 |
| 203180_at | ALDH1A3 | 1.46 | 0 | 48 | 668 | 1.29 | 1.44E-08 |
| 210787_s_at | CAMKK2 | 1.42 | 0 | 49 | NA | NA | NA |
| 201110_s_at | THBS1 | 1.38 | 0 | 51 | NA | NA | NA |
The list of identifiers and their associated genes extracted from the microarray dataset using SamR analysis and submitted to sscMap. 18 out of these 23 gene identifiers are also identified as differentially expressed genes (DEGs) by the DESeq analysis on the NGS dataset (Table S1). NA indicates that the corresponding gene was not returned as a DEG by DESeq and hence is not found in Table S1. Expression fold change is defined as the ratio (stimulated/unstimulated). Note that DESeq reported 0 expression for one gene in the unstimulated state (apparently HPGD, the only row with a DESeq position but no log2Ratio), hence the ratio (stimulated/unstimulated) and log2Ratio are not defined for it. Table S5 also contains the signed ranks of these 23 probesetIDs in the 6 instances of reference profiles for cotinine.
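The "18 out of 23" figure can be reproduced by counting the rows that carry a DESeq position. The sketch below encodes each row of the microarray signature table as (gene symbol, SamR position, DESeq position), with `None` standing for NA; the tuple layout is an illustrative choice.

```python
# (gene symbol, SamR position, DESeq position or None for NA), one tuple per probe set
signature_rows = [
    ("KLK2", 1, 15), ("KLK2", 2, 15), ("ORM1/ORM2", 3, None),
    ("TMPRSS2", 4, 18), ("CENPN", 11, 6), ("PMEPA1", 13, 142),
    ("GREB1", 14, 1356), ("CSGALNACT1", 15, 1061), ("TMPRSS2", 16, None),
    ("KLK3", 20, 31), ("KLK3", 21, 31), ("NKX3-1", 22, 36),
    ("CENPN", 29, 6), ("FKBP5", 30, 2), ("ABCC4", 33, 23),
    ("KCNMA1", 35, 85), ("PTGER4", 37, 671), ("HPGD", 38, 185),
    ("PRR16", 44, None), ("C1orf116", 46, 96), ("ALDH1A3", 48, 668),
    ("CAMKK2", 49, None), ("THBS1", 51, None),
]

with_deseq = [row for row in signature_rows if row[2] is not None]
without_deseq = [row for row in signature_rows if row[2] is None]

print(f"{len(with_deseq)} of {len(signature_rows)} identifiers are also DESeq DEGs")
print("not returned by DESeq:", [gene for gene, _, _ in without_deseq])
```

Note that the count is per probe set rather than per gene: one TMPRSS2 probe set carries a DESeq position while the other is listed as NA.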
Figure 4. sscMap output for the signature from the Microarray dataset.
Distribution of candidate compounds that may enhance (right side) or suppress (left side) the phenotype of the Microarray study.
Compounds declared significant between both technologies that had full perturbation stability.
| refsetname | setsize | queryName | queryLength | setscore | sig | Per | refsetname | setsize | queryName | queryLength | setscore | sig | Per |
| cotinine | 6 | RNA-seq | 10 | −0.598 | 1 | 1 | cotinine | 6 | Microarray | 23 | −0.377 | 1 | 1 |
| morantel | 5 | RNA-seq | 10 | −0.557 | 1 | 1 | morantel | 5 | Microarray | 23 | −0.366 | 1 | 1 |
| tobramycin | 4 | RNA-seq | 10 | −0.671 | 1 | 1 | chlorphenesin | 4 | Microarray | 23 | −0.398 | 1 | 1 |
| trioxysalen | 4 | RNA-seq | 10 | −0.658 | 1 | 1 | trioxysalen | 4 | Microarray | 23 | −0.383 | 1 | 1 |
| pentoxyverine | 4 | RNA-seq | 10 | −0.601 | 1 | 1 | trimetazidine | 4 | Microarray | 23 | −0.370 | 1 | 1 |
| levamisole | 4 | RNA-seq | 10 | −0.569 | 1 | 1 | pentoxyverine | 4 | Microarray | 23 | −0.369 | 1 | 1 |
| trimetazidine | 4 | RNA-seq | 10 | −0.552 | 1 | 1 | levamisole | 4 | Microarray | 23 | −0.356 | 1 | 1 |
| chlorphenesin | 4 | RNA-seq | 10 | −0.548 | 1 | 1 | lysergol | 4 | Microarray | 23 | −0.349 | 1 | 1 |
| oxprenolol | 4 | RNA-seq | 10 | −0.535 | 1 | 1 | tobramycin | 4 | Microarray | 23 | −0.348 | 1 | 1 |
| zomepirac | 4 | RNA-seq | 10 | −0.533 | 1 | 1 | oxprenolol | 4 | Microarray | 23 | −0.336 | 1 | 1 |
| lysergol | 4 | RNA-seq | 10 | −0.505 | 1 | 1 | zomepirac | 4 | Microarray | 23 | −0.330 | 1 | 1 |
| fosfosal | 4 | RNA-seq | 10 | −0.428 | 1 | 1 | fosfosal | 4 | Microarray | 23 | −0.280 | 1 | 1 |
| sertaconazole | 4 | RNA-seq | 10 | −0.411 | 1 | 1 | sertaconazole | 4 | Microarray | 23 | −0.256 | 1 | 1 |
| abamectin | 4 | RNA-seq | 10 | −0.392 | 1 | 1 | abamectin | 4 | Microarray | 23 | −0.245 | 1 | 1 |
| saquinavir | 4 | RNA-seq | 10 | −0.359 | 1 | 1 | saquinavir | 4 | Microarray | 23 | −0.243 | 1 | 1 |
| ipratropium bromide | 3 | RNA-seq | 10 | −0.564 | 1 | 1 | ipratropium bromide | 3 | Microarray | 23 | −0.365 | 1 | 1 |
| furazolidone | 4 | RNA-seq | 10 | 0.602 | 1 | 1 | furazolidone | 4 | Microarray | 23 | 0.344 | 1 | 1 |
| 5186223 | 1 | RNA-seq | 10 | 0.701 | 1 | 1 | 5186223 | 1 | Microarray | 23 | 0.504 | 1 | 1 |
The list of compounds that overlapped between the two technologies: 18 out of a possible 64. 16 of the 18 compounds were candidates that would potentially suppress the phenotype (negative setscore). queryLength is the number of genes included in the query gene signature. refsetname is the set of reference profiles for a compound in the cmap database; setsize is the size of the set of reference profiles for that compound in the cmap core database. sig = 1 indicates that the connection score is statistically significant; Per = 1 means that the connection has full perturbation stability.
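Because the two halves of the table list the compounds in different orders, it helps to key the set scores by compound name before comparing them. The sketch below does this and splits the compounds by the sign of their scores; the dictionary names are illustrative and the values are copied from the table.

```python
# setscore per compound, copied from the RNA-seq and Microarray halves of the table
rnaseq = {
    "cotinine": -0.598, "morantel": -0.557, "tobramycin": -0.671,
    "trioxysalen": -0.658, "pentoxyverine": -0.601, "levamisole": -0.569,
    "trimetazidine": -0.552, "chlorphenesin": -0.548, "oxprenolol": -0.535,
    "zomepirac": -0.533, "lysergol": -0.505, "fosfosal": -0.428,
    "sertaconazole": -0.411, "abamectin": -0.392, "saquinavir": -0.359,
    "ipratropium bromide": -0.564, "furazolidone": 0.602, "5186223": 0.701,
}
microarray = {
    "cotinine": -0.377, "morantel": -0.366, "chlorphenesin": -0.398,
    "trioxysalen": -0.383, "trimetazidine": -0.370, "pentoxyverine": -0.369,
    "levamisole": -0.356, "lysergol": -0.349, "tobramycin": -0.348,
    "oxprenolol": -0.336, "zomepirac": -0.330, "fosfosal": -0.280,
    "sertaconazole": -0.256, "abamectin": -0.245, "saquinavir": -0.243,
    "ipratropium bromide": -0.365, "furazolidone": 0.344, "5186223": 0.504,
}

# Both halves should name the same compounds, with the same direction of effect
same_compounds = set(rnaseq) == set(microarray)
same_direction = all((rnaseq[c] < 0) == (microarray[c] < 0) for c in rnaseq)

suppressors = sorted(c for c in rnaseq if rnaseq[c] < 0 and microarray[c] < 0)
enhancers = sorted(c for c in rnaseq if rnaseq[c] > 0 and microarray[c] > 0)

print(same_compounds, same_direction)
print(len(suppressors), "candidate suppressors;", len(enhancers), "candidate enhancers")
```

Every compound has the same sign in both queries, giving 16 candidate suppressors and 2 candidate enhancers.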
GeneCodis analysis with twenty-two sets of processes that were significantly enriched.
| Items | NumGenes | ListSize | Ref-Support | Ref-Size | pvalue | Corrected pvalue | NGS | NGS-Microarray | Microarray |
| GO:0005886: plasma membrane (CC) | 7 | 24 | 3575 | 34208 | 0.009427 | 0.021997 | LIFR | – | C1orf116, PTGER4, KCNMA1, PMEPA1, ABCC4, TMPRSS2 |
| Transcription Factor: V$ER Q6 01 | 3 | 24 | 202 | 34208 | 0.000375 | 0.003279 | LIFR | – | CAMKK2, GREB1 |
| GO:0016020: membrane (CC),GO:0005886: plasma membrane (CC) | 3 | 24 | 603 | 34208 | 0.008374 | 0.020936 | – | – | KCNMA1, ABCC4, TMPRSS2 |
| GO:0003824: catalytic activity (MF) | 3 | 24 | 371 | 34208 | 0.002163 | 0.009461 | – | – | KLK3, KLK2, HPGD |
| GO:0006508: proteolysis (BP),GO:0008233: peptidase activity (MF), (InterPro) IPR001314: Peptidase S1A chymotrypsin-type, (InterPro) IPR001254: Peptidase S1/S6, chymotrypsin/Hap, GO:0004252: serine-type endopeptidase activity (MF) | 3 | 24 | 97 | 34208 | 0.000043 | 0.000500 | – | – | KLK3, KLK2, TMPRSS2 |
| GO:0005576: extracellular region (CC) | 5 | 24 | 1913 | 34208 | 0.009477 | 0.020731 | LIFR | – | KLK3, THBS1, ORM2, TMPRSS2 |
| GO:0016021: integral to membrane (CC) | 7 | 24 | 4400 | 34208 | 0.027478 | 0.043715 | DHCR24 | – | PTGER4, KCNMA1, PMEPA1, ABCC4, GREB1, CSGALNACT1 |
| GO:0007596: blood coagulation (BP),GO:0030168: platelet activation (BP) | 3 | 24 | 218 | 34208 | 0.000468 | 0.003276 | – | – | THBS1, KCNMA1, ABCC4 |
| GO:0005634: nucleus (CC),GO:0005829: cytosol (CC) | 3 | 24 | 863 | 34208 | 0.021829 | 0.036382 | DHCR24 | CENPN | HPGD |
| GO:0005737: cytoplasm (CC) | 9 | 24 | 5302 | 34208 | 0.007291 | 0.023198 | SAT1, DHCR24, MLPH | FKBP5 | C1orf116, ALDH1A3, CAMKK2, HPGD, TMPRSS2 |
| Transcription Factor: V$MYC Q2 | 3 | 24 | 741 | 34208 | 0.014602 | 0.028393 | – | FKBP5 | CAMKK2, ABCC4 |
| GO:0016020: membrane (CC),Transcription Factor: V$FOXO4 01 | 3 | 24 | 306 | 34208 | 0.001248 | 0.006240 | – | FKBP5 | GREB1, TMPRSS2 |
| GO:0005737: cytoplasm (CC); Transcription Factor: V$SP1 Q6 | 3 | 24 | 862 | 34208 | 0.021763 | 0.038085 | MLPH | FKBP5 | HPGD |
| GO:0005634: nucleus (CC),Transcription Factor: V$E12 Q6 | 3 | 24 | 587 | 34208 | 0.007781 | 0.020949 | – | FKBP5 | HPGD, NKX3-1 |
| GO:0016020: membrane (CC) | 7 | 24 | 4065 | 34208 | 0.018458 | 0.034002 | DHCR24 | FKBP5 | KCNMA1, ABCC4, GREB1, TMPRSS2, CSGALNACT1 |
| GO:0016020: membrane (CC), Transcription Factor: V$E12 Q6 | 3 | 24 | 401 | 34208 | 0.002695 | 0.009433 | – | FKBP5 | KCNMA1, GREB1 |
| Transcription Factors: V$E12 Q6,V | 3 | 24 | 65 | 34208 | 0.000013 | 0.000451 | – | FKBP5 | KCNMA1, HPGD |
| Transcription Factor: V$E12 Q6 | 5 | 24 | 1805 | 34208 | 0.007455 | 0.021744 | – | FKBP5 | KCNMA1, HPGD, GREB1, NKX3-1 |
| GO:0005634: nucleus (CC),Transcription Factor: V$NFY Q6 01 | 3 | 24 | 401 | 34208 | 0.002695 | 0.009433 | DHCR24 | FKBP5 | NKX3-1 |
| GO:0005488: binding (MF) | 3 | 24 | 731 | 34208 | 0.014083 | 0.028993 | NCAPD3 | FKBP5 | ORM2 |
| GO:0016020: membrane (CC),Transcription Factor: V$LEF1 Q2,GO:0005737: cytoplasm (CC) | 3 | 24 | 90 | 34208 | 0.000034 | 0.000599 | DHCR24 | FKBP5 | TMPRSS2 |
| GO:0016020: membrane (CC),Transcription Factor: V$NFAT Q4 01 | 3 | 24 | 286 | 34208 | 0.001028 | 0.005995 | – | – | KCNMA1, GREB1, TMPRSS2 |
GeneCodis analysis for both signatures. pvalue and corrected pvalue are the hypergeometric pvalues with Ref standing for reference.
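To make the hypergeometric p-value concrete, the sketch below computes an upper-tail probability from the four count columns (NumGenes, ListSize, Ref-Support, Ref-Size). This is a generic textbook calculation using only the Python standard library. The exact convention GeneCodis uses is not stated here, so the result should be of the same order as the tabulated pvalue but need not match it digit for digit, and the corrected pvalue is not reproduced.

```python
from math import comb


def hypergeom_upper_tail(num_genes, list_size, ref_support, ref_size):
    """P(X >= num_genes) when list_size genes are drawn without replacement
    from ref_size genes, of which ref_support carry the annotation."""
    total = comb(ref_size, list_size)
    upper = min(list_size, ref_support)
    favourable = sum(
        comb(ref_support, i) * comb(ref_size - ref_support, list_size - i)
        for i in range(num_genes, upper + 1)
    )
    # Exact integer arithmetic; converted to float only at the final division
    return favourable / total


# Counts from the proteolysis / peptidase row: 3 of 24 list genes annotated,
# 97 of 34208 reference genes annotated
p_proteolysis = hypergeom_upper_tail(3, 24, 97, 34208)
print(f"{p_proteolysis:.2e}")
```

With only 97 annotated genes among 34208, seeing 3 of them in a 24-gene list is very unlikely by chance, which is why this row carries one of the smallest p-values in the table.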
Figure 5. Cotinine was the top candidate compound to suppress the proliferative phenotype.
Validation of cotinine as the top candidate to suppress cell proliferation phenotype induced via androgen pathway. A, Cells in 6 well plate format were treated with vehicle or various doses of cotinine for 96 hours. Total viable cell numbers were counted by haemocytometer. Cell counts are represented as relative to the untreated control value. B, Cells were treated with indicated doses of cotinine and seeded in xCELLigence 16 well E-plates. Real-time analysis of cell doubling rates was recorded and rate of doubling between 72–96 hours was plotted relative to untreated control using the system software. The experiment was repeated three times with similar results. ** denotes using unpaired two-tailed t-test.