Shao-shan Carol Huang, David C Clarke, Sara J C Gosline, Adam Labadorf, Candace R Chouinard, William Gordon, Douglas A Lauffenburger, Ernest Fraenkel.
Abstract
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118-310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets.
Year: 2013 PMID: 23408876 PMCID: PMC3567149 DOI: 10.1371/journal.pcbi.1002887
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Setting up the PCST problem.
A. Finding a network of interactions that link phosphorylation events and differentially transcribed genes can be formulated as an optimization problem, the prize-collecting Steiner tree (PCST) problem, on a protein interactome. The objective function (equation in box) represents a balance between the penalty for excluding nodes for which there is experimental evidence (phosphorylated proteins as yellow circles and transcription factors as blue triangles) and the cost of including edges, which are weighted by reliability. The light grey rectangle containing edges from transcription factors to target mRNAs indicates that these edges are not directly included in the interactome. Instead, they are used to infer the activity of transcription factor candidates (see Materials and Methods). The optimal solution to the PCST problem connects the phosphoprotein termini and the transcription factor termini by reliable interactions (red lines) that may involve nodes not explicitly observed in the experimental data (Steiner nodes; dark grey circles). TF: transcription factor. DHS: differentially hypersensitive. DE: differential expression. The superscripts a to e correspond to the superscript labels of input data types in B. B. The input datasets from U87MG EGFRvIII-expressing cells used in this study.
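The trade-off in panel A can be illustrated with a small sketch that evaluates a prize-collecting objective for a candidate subnetwork: the summed penalties of evidence-bearing nodes left out, plus the summed costs of the edges included. The boxed equation itself is not reproduced in this text, so the function name, the scaling parameter `beta`, the placeholder node names and all numeric values below are assumptions for illustration only.

```python
def pcst_objective(penalties, edge_costs, tree_nodes, tree_edges, beta=1.0):
    """Prize-collecting objective of a candidate subnetwork (lower is better).

    penalties:  dict node -> non-negative penalty paid if the node is left out
    edge_costs: dict frozenset({u, v}) -> cost (low cost = reliable interaction)
    tree_nodes: set of nodes in the candidate solution
    tree_edges: iterable of (u, v) pairs in the candidate solution
    beta:       assumed knob trading node penalties against edge costs
    """
    excluded = sum(p for node, p in penalties.items() if node not in tree_nodes)
    included = sum(edge_costs[frozenset(edge)] for edge in tree_edges)
    return beta * excluded + included


# Toy interactome: one phosphoprotein terminus, two TF termini, one Steiner node.
penalties = {"pY_1": 2.0, "TF_1": 1.5, "TF_2": 0.2}
edge_costs = {
    frozenset({"pY_1", "S_1"}): 0.25,
    frozenset({"S_1", "TF_1"}): 0.25,
    frozenset({"S_1", "TF_2"}): 1.0,
}

# Candidate 1: include nothing, pay every penalty.
empty = pcst_objective(penalties, edge_costs, set(), [])

# Candidate 2: link the two strongly supported termini through S_1.
linked = pcst_objective(
    penalties, edge_costs,
    {"pY_1", "S_1", "TF_1"},
    [("pY_1", "S_1"), ("S_1", "TF_1")],
)

# Candidate 3: also pull in the weakly supported TF_2 over an unreliable edge.
full = pcst_objective(
    penalties, edge_costs,
    {"pY_1", "S_1", "TF_1", "TF_2"},
    [("pY_1", "S_1"), ("S_1", "TF_1"), ("S_1", "TF_2")],
)
```

In this toy case the second candidate scores best: leaving out the weakly supported terminus costs less than paying for the unreliable edge needed to reach it, while the unobserved node `S_1` is included because it carries no penalty and provides cheap connections.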
Figure 2. PCST constructed from the U87 datasets.
This is a composite network representing the union of the optimal solution to the original PCST problem and 10 suboptimal solutions where 15 percent of the nodes must be different from the optimal solution. TF: transcription factor. Node weight: the log2 fold changes in phosphorylation from the phosphoproteomic data comparing U87H to U87DK cells, or values from the expression regression procedure using the mRNA microarray, DNase-Seq and transcription factor motif data. The absolute value of node weights was used as penalty values for the PCST algorithm.
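A minimal sketch of how such a composite network could be assembled is given below. It takes the union of the optimal edge set with each suboptimal edge set whose nodes differ sufficiently from the optimal solution. Two points are assumptions of this sketch rather than statements about the published procedure: the difference is measured as the fraction of a solution's nodes that are absent from the optimal solution, and the constraint is checked after the fact instead of being imposed while solving.

```python
def nodes_of(edges):
    """Set of all nodes that appear in a collection of (u, v) edges."""
    return {node for edge in edges for node in edge}


def fraction_new_nodes(solution_nodes, optimal_nodes):
    """Fraction of a solution's nodes that are absent from the optimal solution."""
    solution_nodes = set(solution_nodes)
    if not solution_nodes:
        return 0.0
    return len(solution_nodes - set(optimal_nodes)) / len(solution_nodes)


def composite_network(optimal_edges, suboptimal_edge_sets, min_fraction=0.15):
    """Union of the optimal edges with every sufficiently different suboptimal set."""
    optimal_nodes = nodes_of(optimal_edges)
    union = {frozenset(edge) for edge in optimal_edges}
    for edges in suboptimal_edge_sets:
        if fraction_new_nodes(nodes_of(edges), optimal_nodes) >= min_fraction:
            union |= {frozenset(edge) for edge in edges}
    return union


# Hypothetical toy solutions (placeholder node names).
optimal = [("A", "B"), ("B", "C")]
suboptimal = [
    [("A", "B"), ("B", "D")],  # one of three nodes is new: kept
    [("A", "B")],              # no new nodes: ignored
]
composite = composite_network(optimal, suboptimal)
```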
Figure 3. The PCST solution network is compact, relevant to GBM and specific to EGFRvIII.
A. The number of nodes in networks constructed from multiple approaches and their overlap with the PCST solution. NN of pY termini: the proteins containing phosphorylated tyrosine residues reported by mass spectrometry and their direct interactors (nearest neighbors) in the interactome. NN of TF termini: transcription factor candidates selected by the expression regression procedure and their direct interactors in the interactome. NN of all termini: the union of pY termini, TF termini and their direct interactors in the interactome. RN: a network constructed by using the flow-based approach ResponseNet [16] to connect the pY termini to the TF termini.

B. GBM gene ranker scores for nodes included in the PCST solution were significantly higher than those for nodes excluded from the PCST solution (labeled as “Interactome excl. PCST”; p<2.2E-16 by Wilcoxon rank-sum test) and compared favorably to the nearest neighbor networks. Higher GBM scores indicate greater relevance to the disease.

C. Scoring proteins by connectivity to the PCST solution representing a disease network. The score of each protein, whether the protein is inside or outside of the original PCST network, is the sum of the scores of all its interactions with the nodes in the PCST. Thus a node in the interactome (deep red) with many high-confidence interactions to the nodes in the PCST disease network receives a higher score than a node in the interactome (light red) that has fewer or lower-confidence interactions to the nodes in the PCST.

D. Proteins with EGFRvIII-regulated tyrosine phosphorylation in mouse xenografts (red bars) are more closely connected to the PCST solution than the proteins on which the tyrosine phosphorylation levels do not change significantly (green bars). Each protein in the interactome was scored and then ranked by its connectivity to the PCST solution constructed from the U87 cell line data, as described in C and in Materials and Methods. The p-value was computed by Wilcoxon rank-sum test comparing the ranks of EGFRvIII-specific and non-EGFRvIII-specific phosphorylated proteins. The number of proteins in each category is indicated in parentheses.

E. The targets for transcription factors identified in condition-specific DNaseI hypersensitive regions are enriched for genes differentially expressed in response to EGFRvIII. U87H TF: transcription factors that have motif matches in regions with increased DNaseI hypersensitivity in the U87H cells and within 40 kb of transcription start sites. U87DK TF: transcription factors that have motif matches in the regions with higher DNaseI hypersensitivity in the U87DK cells and within 40 kb of transcription start sites. EGFRvIII up- and down-regulated genes: genes that are up- or down-regulated in the TCGA GBM exon array dataset comparing EGFRvIII-positive samples to wild-type EGFR samples. For each TF, we computed a minimum hypergeometric (mHG) p-value that tested for the probability that the set of target genes are differentially expressed in the TCGA GBM samples by chance. Top panel: U87H TF targets are more enriched (smaller mHG values) in EGFRvIII up-regulated genes than in EGFRvIII down-regulated genes. Bottom panel: U87DK TF targets are more enriched in EGFRvIII down-regulated genes than in EGFRvIII up-regulated genes. P-values were computed by Student's t-test comparing the mHG p-values on EGFRvIII up- and down-regulated genes for each set of TF.

F. The transcription factors included in the PCST solution are more enriched in EGFRvIII-induced differential gene expression than the transcription factors excluded from the PCST. Each set of U87H TF and U87DK TF was further divided according to whether the TF were included in the PCST solution, denoted by the “Yes” and “No” categories. First panel: targets of U87H TF included in the PCST solution have stronger enrichment in EGFRvIII up-regulated genes than targets of the TF excluded from the solution. Fourth panel: targets of U87DK TF included in the PCST solution have stronger enrichment in EGFRvIII down-regulated genes than targets of the TF excluded from the PCST. Second and third panels: with respect to the comparison between U87H TF targets and EGFRvIII down-regulated genes, or between U87DK TF targets and EGFRvIII up-regulated genes, the TF included in the PCST do not show significantly stronger enrichment than the TF excluded from the PCST. P-values were computed by Student's t-test comparing the mHG scores of TF included in the PCST and TF excluded from the PCST.
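The connectivity scoring described in panel C can be sketched as follows: each protein's score is the sum of the confidence values of its interactions whose partner lies in the PCST solution, and proteins are then ranked from highest to lowest score. The function names, the toy interactome and the confidence values are hypothetical, and giving tied proteins their average rank is an assumption of this sketch.

```python
def connectivity_scores(interactions, pcst_nodes):
    """Score every protein by the summed confidence of its edges into the PCST.

    interactions: dict (u, v) -> confidence of the undirected interaction
    pcst_nodes:   set of nodes in the PCST solution
    """
    scores = {}
    for (u, v), confidence in interactions.items():
        for protein, partner in ((u, v), (v, u)):
            scores.setdefault(protein, 0.0)
            if partner in pcst_nodes:
                scores[protein] += confidence
    return scores


def average_ranks(scores):
    """Rank proteins by descending score (rank 1 = best); ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


# Hypothetical interactome; N1 and N2 form the PCST solution.
interactions = {
    ("P1", "N1"): 0.9,
    ("P1", "N2"): 0.8,
    ("N1", "N2"): 0.6,
    ("P2", "N1"): 0.3,
    ("P3", "X"): 0.7,
}
scores = connectivity_scores(interactions, {"N1", "N2"})
ranks = average_ranks(scores)
```

Here `P1`, which has two high-confidence interactions with PCST nodes, ranks first even though it lies outside the solution, while `P3` and `X` have no interactions with the solution and share the last rank.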
High-, mid- and lower-ranked nodes by the PCST network and the experiments used to validate their importance.
| Experiment | Small molecule inhibitor | Antibody | Target | Target rank | Target type |
| Viability | Dasatinib | | SRC | 3 | High-ranked target |
| | | | FYN | 12 | |
| ChIP-Seq | | sc-585x | EP300 | 4 | High-ranked target |
| Viability | ICG-001 | | CREBBP | 5 | High-ranked target |
| Viability | 4-hydroxytamoxifen (4-OHT) | | ESR1 | 15 | High-ranked target |
| Viability | suberoylanilide hydroxamic acid (SAHA) | | HDAC1 | 19 | High-ranked target |
| Viability | PKF118-310 | | CTNNB1 | 21 | High-ranked target |
| Viability | ammonium pyrrolidinedithiocarbamate (PDTC) | | NFKB1 | 23 | High-ranked target |
| Viability | 17-N-Allylamino-17-demethoxygeldanamycin (17-AAG) | | HSP90AA1 | 26 | High-ranked target |
| Viability | SB-505124 | | TGFBR1 | 193 | Mid-ranked target |
| Viability | SB-431542 | | TGFBR1 | 193 | Mid-ranked target |
| | | | ACVR1B | 1695 | |
| Viability | Rapamycin | | MTOR | 698 | Lower-ranked target |
| Viability | D4476 | | CSNK1A1 | 875 | Lower-ranked target |
| Viability | Harmine | | DYRK1A | 2232 | Lower-ranked target |
| | | | MAOA | 8508.5 | |
| Viability | Paclitaxel | | TUBB1 | 3582 | Lower-ranked target |
For cell viability assays, the inhibitors used are listed. Note that some inhibitors have multiple targets. For ChIP-Seq, the antibody used is listed.
Figure 4. Validation of targets predicted by network connectivity by cell viability assays.
A. Cell viability after treatment with compounds targeting high-scoring nodes (high-ranked targets), intermediate-scoring nodes (mid-ranked targets) and low-scoring nodes (lower-ranked targets). Compounds were used at 10 µM, except for 17-AAG (0.5 µM) and harmine (5 µM, due to low solubility in DMSO). The color bar at the top of each target corresponds to its relative ranking within the interactome. B. Dose-response curves of compounds targeting high-scoring and lower-scoring nodes, for those that can be fitted to the four-parameter log-logistic model (lack-of-fit test p-value>0.05). P-values between cell lines were computed by comparing the model in which one curve was fitted to the data from each cell line against the null model in which one shared curve was fitted to the data from both cell lines.
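For reference, the four-parameter log-logistic curve mentioned in panel B can be written as a short function. The parameterization below (slope, lower asymptote, upper asymptote, ED50) is one common form and is an assumption here, since the caption does not state which form was fitted; the parameter values in the example are purely illustrative.

```python
import math


def log_logistic4(dose, slope, lower, upper, ed50):
    """Four-parameter log-logistic response at a positive dose.

    With slope > 0 the response falls from `upper` (low dose) towards
    `lower` (high dose) and passes the midpoint at dose == ed50.
    """
    exponent = slope * (math.log(dose) - math.log(ed50))
    return lower + (upper - lower) / (1.0 + math.exp(exponent))


# Illustrative parameters: viability drops from 1.0 to 0.1 with an ED50 of 2 (arbitrary units).
params = dict(slope=1.5, lower=0.1, upper=1.0, ed50=2.0)
doses = [0.1, 0.5, 2.0, 10.0, 50.0]
curve = [log_logistic4(d, **params) for d in doses]
midpoint = log_logistic4(2.0, **params)
```

Under this form, comparing cell lines amounts to asking whether fitting a separate parameter set per cell line explains the data significantly better than a single shared parameter set.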
Figure 5. ChIP-Seq reveals functional role of p300.
A. EMT marker genes bound by p300 in U87H cells. Shown are genome browser tracks for p300-bound regions near several EMT marker genes, where the horizontal axis represents coordinates along the genome and the height of the solid area represents the number of ChIP-Seq reads mapped to a position in the genome. For each region we show the signal from the ChIP sample that used an antibody specific to p300 (bottom track) and the signal from the sample that used an IgG antibody for non-specific binding (top track). Arrow indicates direction of transcription. B. Regions that are more hypersensitive (HS) in the U87H cells were significantly enriched for overlap with p300 binding regions (p<1E-05) compared to a background of all regions called hypersensitive in U87H cells, for a range of peak calling thresholds of hypersensitivity specified on the x-axis tick marks. Enrichment p-values computed by Fisher exact test are indicated immediately below each set of bars.
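The kind of enrichment test used in panel B can be sketched with the standard library alone. A one-sided Fisher exact p-value for over-representation equals the upper tail of a hypergeometric distribution. The sketch assumes the foreground regions are a subset of the background and that the test is one-sided; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_enrichment_p(overlap, foreground, background_hits, background_total):
    """One-sided Fisher exact p-value for enrichment (hypergeometric upper tail).

    overlap:          foreground regions that overlap a binding region
    foreground:       number of foreground regions (subset of the background)
    background_hits:  background regions that overlap a binding region
    background_total: number of background regions
    """
    upper = min(foreground, background_hits)
    denominator = comb(background_total, foreground)
    tail = sum(
        comb(background_hits, k)
        * comb(background_total - background_hits, foreground - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 20 background regions, 8 overlapping a peak;
# 5 foreground regions, 4 of which overlap a peak.
p_value = fisher_enrichment_p(overlap=4, foreground=5, background_hits=8, background_total=20)

# Sanity check: requiring at least zero overlaps covers the whole distribution.
p_all = fisher_enrichment_p(overlap=0, foreground=5, background_hits=8, background_total=20)
```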
Enriched Gene Ontology (GO) categories of p300 target genes in U87H cells.
| GO Term | Description | P-value | % FDR | Official gene symbol |
| GO:0007155 | cell adhesion | 1.15E-03 | 2.05 | AEBP1, NRP1, THRA, MYBPC3, CUZD1, CDH22, WISP1, ROBO1, DGCR6, CNTNAP2, ZYX, LOXL2, DLG1, SPON1, ROCK1, PTPRF, NRXN2, TRPM7, PCDHB1, SDK1, ACTN1, PTPRT, NRXN1, PSEN1, HAS1, GPR56, VCAN, SEMA4D, CD226, PARVA, PKHD1, ITGAE, TNC, ITGA11, ITGB2, PKD1L1, ITGAM, CLDN14, CDH5, ITGBL1, LY6D, SORBS1, PTK2B, COL27A1, TTYH1, ITGB6, BAI1, COL6A2, TSTA3, THBS2, TECTA, COL18A1, MAG, FLRT1, COL5A3, RAPH1, CLDN23, LYVE1, COL14A1, LAMA3, CDH16, ITGA6, ERBB2IP, CD300A, DSG3, PKP4, FCGBP, MUC5AC, CDH11, MUC16 |
| GO:0022610 | biological adhesion | 1.16E-03 | 2.07 | AEBP1, NRP1, THRA, MYBPC3, CUZD1, CDH22, WISP1, ROBO1, DGCR6, CNTNAP2, ZYX, LOXL2, DLG1, SPON1, ROCK1, PTPRF, NRXN2, TRPM7, PCDHB1, SDK1, ACTN1, PTPRT, NRXN1, PSEN1, HAS1, GPR56, VCAN, SEMA4D, CD226, PARVA, PKHD1, ITGAE, TNC, ITGA11, ITGB2, PKD1L1, ITGAM, CLDN14, CDH5, ITGBL1, LY6D, SORBS1, PTK2B, COL27A1, TTYH1, ITGB6, BAI1, COL6A2, TSTA3, THBS2, TECTA, COL18A1, MAG, FLRT1, COL5A3, RAPH1, CLDN23, LYVE1, COL14A1, LAMA3, CDH16, ITGA6, ERBB2IP, CD300A, DSG3, PKP4, FCGBP, MUC5AC, CDH11, MUC16 |
| GO:0032870 | cellular response to hormone stimulus | 1.65E-03 | 2.94 | ADCY3, IRS2, WDTC1, ADCY2, THRA, ADCY5, PRKCI, AP3S1, IGF2, CUZD1, TRH, IRS1, GNG8, GRB10, SORBS1, GNB1, PRKAR1B, GHRL, GNAS, HDAC9 |
| GO:0009755 | hormone-mediated signaling | 2.34E-03 | 4.16 | GNG8, ADCY3, ADCY2, THRA, GNB1, ADCY5, PRKAR1B, GHRL, GNAS, CUZD1, TRH |
1,969 genes within 10 kb of 7,657 high stringency peaks called by MACS (p<1E-07) [112] were input into the DAVID functional annotation tool [120] to identify enriched Biological Process terms with FDR <5%.
Summary of anti-tumor therapies corresponding to the high-ranked targets and compounds that are found to exert significant killing, with emphasis on relevance to GBM.
| Compound | Target | Current status |
| Dasatinib | SRC, FYN | Ongoing phase 1 and 2 trials of mono- and combination-therapy for primary and recurrent GBM |
| ICG-001 | CREBBP | Related compound PRI-724 in phase 1 trial of advanced colorectal cancer and pancreatic cancer (trial identifier NCT01302405); no pre-clinical data for GBM. |
| 17-AAG | HSP90AA1 | Phase 1, 2, and 3 trials in multiple cancer types but not including GBM |
| 4-OHT | ESR1 | A subgroup of GBM patients responded to high dose tamoxifen |
| SAHA | HDAC1 | Modest single agent activity in a phase 2 trial for GBM |
| PKF118-310 | CTNNB1 | Showed efficacy in models of multiple cancer types but not yet tested for GBM. |