Yu Kondrakhin, T Valeev, R Sharipov, I Yevshin, F Kolpakov, A Kel.
Abstract
We compared position weight matrix (PWM)-based methods for predicting transcription factor (TF) binding sites, using a selected fraction of ChIP-seq data and a partial AUC measure limited to a false positive rate of 0.1, the range most relevant for genome-scale TF site search. A comparison of three prediction methods, additive, multiplicative, and information-vector based (MATCH), showed an advantage of the MATCH method for the majority of the transcription factors tested. We demonstrated that TF site identification methods can help connect the proteomics and phosphoproteomics world of signaling networks to the world of gene regulation and transcriptomics.
Keywords: Area under curve; ChIP-Seq; Position weight matrix approach; Protein-DNA interactions; Proteomics versus transcriptomics; The ROC curve; Transcription factor binding site
Year: 2016 PMID: 29900118 PMCID: PMC5988505 DOI: 10.1016/j.euprot.2016.09.001
Source DB: PubMed Journal: EuPA Open Proteom ISSN: 2212-9685
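The abstract names three ways of scoring a candidate site against a PWM but does not spell them out. The sketch below is one plausible reading, not the authors' implementation: the additive score sums the matrix frequencies of the observed bases, the multiplicative score multiplies them, and the information-vector score follows the form commonly given for the MATCH matrix similarity score (frequencies weighted by a per-position information term and rescaled to the range 0 to 1). The function names and the toy matrix are hypothetical.

```python
import math

def additive_score(pwm, site):
    """Assumed additive model: sum of the frequencies of the observed bases."""
    return sum(col[b] for col, b in zip(pwm, site))

def multiplicative_score(pwm, site):
    """Assumed multiplicative model: product of the frequencies of the observed bases."""
    return math.prod(col[b] for col, b in zip(pwm, site))

def information_vector(pwm):
    """Per-position weight I(i) = sum_b f(i,b) * ln(4 * f(i,b)); zero frequencies are skipped."""
    return [sum(f * math.log(4 * f) for f in col.values() if f > 0) for col in pwm]

def match_score(pwm, site):
    """Information-weighted score rescaled between the worst and the best possible site."""
    info = information_vector(pwm)
    current = sum(w * col[b] for w, col, b in zip(info, pwm, site))
    lowest = sum(w * min(col.values()) for w, col in zip(info, pwm))
    highest = sum(w * max(col.values()) for w, col in zip(info, pwm))
    return (current - lowest) / (highest - lowest)

# Toy 3-position matrix (hypothetical, not a TRANSFAC PWM).
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
best = match_score(pwm, "AGC")   # consensus bases at the informative positions
worst = match_score(pwm, "CCT")  # least frequent bases at the informative positions
```

In this sketch the third position is uniform, so its information weight is zero and it does not influence the MATCH-style score, while it still contributes to the additive and multiplicative scores.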
Fig. 1. ROC curves obtained for different values of τ on the YY1-binding regions generated by the MACS peak detection algorithm. Dark blue lines correspond to the additive model, red lines to the multiplicative model, and light blue lines to the information vector based model (MATCH model).
AUCs calculated for different values of τ on the YY1-binding regions generated by the MACS peak detection algorithm.
| Percentage, τ | Information vector based model (AUC) | Multiplicative model (AUC) | Additive model (AUC) | Percentage of regions classified as “empty” |
|---|---|---|---|---|
| 100 | 0.548 | 0.550 | 0.555 | 0 |
| 50 | 0.707 | 0.694 | 0.716 | 37.5 |
| 35 | 0.782 | 0.744 | 0.778 | 51.5 |
| 25 | 0.835 | 0.817 | 0.852 | 65.4 |
| 15 | 0.892 | 0.899 | 0.918 | 78.8 |
| 5 | 0.956 | 0.963 | 0.972 | 92.9 |
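For readers who want to reproduce measures of this kind, the following is a minimal sketch of how a full AUC and a partial AUC limited to a false positive rate of 0.1 can be computed from site scores. It assumes one best score per bound region (positives) and per background sequence (negatives); the function names and the score lists are hypothetical, and the partial AUC is left unnormalized, which may differ from the convention used in the study.

```python
def roc_points(pos_scores, neg_scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold from high to low."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve, truncated at max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores, not taken from the study.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.5, 0.3, 0.2, 0.1]
points = roc_points(pos, neg)
full_auc = auc(points)            # area over the whole FPR range
partial_auc = auc(points, 0.1)    # area restricted to FPR <= 0.1
```

An unnormalized partial AUC at FPR 0.1 can be at most 0.1, so values from different cut-offs are not directly comparable without rescaling.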
Comparison of the three site models using the Friedman test for two peak detection algorithms. The p-value gives the statistical significance of the Friedman test statistic, which reflects the global difference between the distributions of AUCs across 265 (MACS) and 263 (SISSRs) TF-binding ChIP-seq data sets.
| Peak detection algorithm | Percentage τ | Friedman test statistic | p-value |
|---|---|---|---|
| MACS | 100 | 17.556 | 1.541 × 10⁻⁴ |
| MACS | 35 | 108.076 | <10⁻¹² |
| MACS | 25 | 139.908 | <10⁻¹² |
| MACS | 15 | 163.188 | <10⁻¹² |
| MACS | 5 | 218.362 | <10⁻¹² |
| SISSRs | 100 | 15.165 | 5.093 × 10⁻⁴ |
| SISSRs | 35 | 51.732 | 5.843 × 10⁻¹² |
| SISSRs | 25 | 91.103 | <10⁻¹² |
| SISSRs | 15 | 92.104 | <10⁻¹² |
| SISSRs | 5 | 106.150 | <10⁻¹² |
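The Friedman test ranks the three site models within each ChIP-seq data set and asks whether the rank sums differ more than chance would allow. A minimal pure-Python sketch is given below, assuming the usual chi-square approximation without a tie correction; the function names and the input rows are hypothetical. With three models the statistic has 2 degrees of freedom, so the p-value reduces to exp(−statistic/2), which agrees with the tabulated values (for example, a statistic of 17.556 gives about 1.541 × 10⁻⁴).

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def friedman_statistic(rows):
    """Friedman chi-square for rows of per-data-set measurements (no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

def p_value_three_models(statistic):
    """Chi-square survival function with 2 degrees of freedom: exp(-x / 2)."""
    return math.exp(-statistic / 2.0)

# Hypothetical AUCs for four data sets; columns are three site models.
rows = [
    [0.70, 0.69, 0.72],
    [0.78, 0.74, 0.80],
    [0.83, 0.81, 0.85],
    [0.89, 0.88, 0.92],
]
statistic = friedman_statistic(rows)
p_value = p_value_three_models(statistic)
```

In practice a library routine such as `scipy.stats.friedmanchisquare` would normally be preferred, since it also corrects for ties.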
Results of the comparison of the three site models applied to TRANSFAC PWMs on the respective ChIP-seq data sets. Three measures of site recognition were applied: the full AUC and two partial AUCs. For each site model, we computed the number of PWMs for which that model gives the maximal value of the measure (full AUC or partial AUC). The last row gives the number of PWMs for which all three methods produced equal values of the respective measure. Bold indicates the method with the highest number of PWMs by the maximal AUC_FP0.1 criterion.
A) MACS
| Site model method | Number of PWMs with maximal AUC | Number of PWMs with maximal partial AUC_TP0.8 | Number of PWMs with maximal partial AUC_FP0.1 |
|---|---|---|---|
| Additive | 152 | 154 | 40 |
| Multiplicative | 61 | 58 | 92 |
| MATCH | 42 | 43 | |
| All three methods give the same AUC value | 10 | 10 | 20 |
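The counting behind a table of this kind can be sketched as follows: for each PWM, find the site model with the largest value of the chosen measure, and count separately the PWMs for which all three models give the same value. The function name, the PWM labels and the numbers are hypothetical, and the handling of two-way ties (credited to the model listed first) is an assumption, since the caption does not describe it.

```python
from collections import Counter

def tally_best_models(measures):
    """Count, per site model, the PWMs for which that model gives the maximal measure."""
    counts = Counter()
    for per_model in measures.values():
        values = list(per_model.values())
        if max(values) == min(values):
            counts["all equal"] += 1
        else:
            # Assumption: a two-way tie is credited to the model listed first.
            counts[max(per_model, key=per_model.get)] += 1
    return counts

# Hypothetical partial AUC values, not taken from the study.
measures = {
    "PWM_1": {"additive": 0.071, "multiplicative": 0.069, "MATCH": 0.070},
    "PWM_2": {"additive": 0.080, "multiplicative": 0.082, "MATCH": 0.085},
    "PWM_3": {"additive": 0.050, "multiplicative": 0.050, "MATCH": 0.050},
}
counts = tally_best_models(measures)
```

Running the same tally once per measure (full AUC, AUC_TP0.8, AUC_FP0.1) would give one column of counts each.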
Transcription factors found by the combined analysis of transcriptomics and phosphoproteomics data. With the help of the MATCH algorithm we identified overrepresented TF binding sites in promoters of differentially expressed genes (DEG) from the transcriptomics data. TRANSFAC PWM: name of the position weight matrix from the TRANSFAC database that was used by MATCH; Yes-No ratio: the ratio of the TF site frequency in promoters of DEG to that in promoters of non-changed genes; P-value: statistical significance of the Yes-No ratio; Phospho Cytoplasm/Nucleus: detection of phosphorylation of the TF in the cytoplasm or in the nucleus of the cells (p: phosphorylation was detected; p-up: phosphorylation was found increased upon treatment with RA; p-dn: decreased upon treatment with RA).
| Gene symbol | TF name | TRANSFAC PWM | Yes-No ratio | P-value | UniProt ID | Phospho Cytoplasm | Phospho Nucleus | Gene description |
|---|---|---|---|---|---|---|---|---|
| RELA | RelA-p65 | V$RELA_Q6 | 1.22 | 2.78E-04 | Q04206 | p | p | v-rel reticuloendotheliosis viral oncogene homolog A (avian) |
| RXRA | RXR-alpha | V$DR4_Q2 | 1.34 | 8.36E-15 | P19793 | p | p | retinoid X receptor, alpha |
| SP1 | Sp1 | V$SP1_Q6_01 | 2.37 | 1.36E-85 | P08047 | p | p-dn | Sp1 transcription factor |
| CTCF | ctcf | V$CTCF_01 | 1.71 | 1.75E-16 | P49711 | p | p | CCCTC-binding factor (zinc finger protein) |
| RXRB | RXR-beta | V$DR4_Q2 | 1.34 | 8.36E-15 | P28702 | p | p | retinoid X receptor, beta |
| TRIM28 | RNF96 | V$RNF96_01 | 2.54 | 6.71E-43 | Q13263 | p-up | p-dn | tripartite motif containing 28 |
| NFYC | NF-YC | V$NFY_Q3 | 1.67 | 1.16E-04 | Q13952 | p | p | nuclear transcription factor Y, gamma |
| SP3 | Sp3 | V$SP1_Q6_01 | 2.37 | 1.36E-85 | Q02447 | p | p | Sp3 transcription factor |
| RREB1 | RREB-1 | V$RREB1_01 | 1.33 | 1.28E-12 | Q92766 | p | p-dn | ras responsive element binding protein 1 |
| NR2F2 | COUP-TF2 | V$DR4_Q2 | 1.34 | 8.36E-15 | P24468 | p | p | nuclear receptor subfamily 2, group F, member 2 |
| KLF4 | GKLF | V$GKLF_Q4 | 1.63 | 4.06E-135 | O43474 | p | p-dn | Kruppel-like factor 4 (gut) |
| PATZ1 | PATZ | V$MAZR_01 | 2.14 | 1.90E-11 | Q9HBE1 | p | p | POZ (BTB) and AT hook containing zinc finger 1 |
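As an illustration of the Yes-No ratio defined in the caption, the sketch below divides the per-promoter site frequency in the DEG promoters (the Yes set) by the frequency in the promoters of non-changed genes (the No set). The caption does not state which significance test was used, so the one-sided binomial tail shown here is an assumption and only one of several reasonable choices; the function names and the counts are hypothetical.

```python
import math

def yes_no_ratio(sites_yes, promoters_yes, sites_no, promoters_no):
    """Ratio of the per-promoter site frequency in the Yes set to that in the No set."""
    return (sites_yes / promoters_yes) / (sites_no / promoters_no)

def binomial_upper_tail(sites_yes, promoters_yes, sites_no, promoters_no):
    """P(X >= sites_yes) if each site falls into the Yes set in proportion to its size."""
    n = sites_yes + sites_no
    p = promoters_yes / (promoters_yes + promoters_no)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(sites_yes, n + 1))

# Hypothetical counts, not taken from the study:
# 30 sites in 100 DEG promoters, 50 sites in 400 non-changed promoters.
ratio = yes_no_ratio(30, 100, 50, 400)
p_value = binomial_upper_tail(30, 100, 50, 400)
```

A ratio above 1 means the sites are more frequent in the DEG promoters; the tail probability indicates how surprising that excess would be under the stated null assumption.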
Statistically significant common regulators found by the graph algorithm of the geneXplain platform (www.genexplain.com) by searching upstream of the TFs listed in Table 5 in the signal transduction network of the TRANSPATH database [38]. TF reached: number of TFs (out of the 12 from Table 5) that are reached in the network downstream of the respective common regulator; Score: score of the common regulator, calculated on the basis of the number of reached TFs and the topology of the network [28]; FDR and Z-score are calculated by multiple randomization of the input set of TFs [28] (FDR < 0.05 AND Z-Score > 1.0 AND TF-reached > 7).
| TRANSPATH ID | Name of common regulator | TF reached | Score | FDR | Z-Score | Phospho Cytoplasm | Phospho Nucleus |
|---|---|---|---|---|---|---|---|
| MO000056714 | HDAC1 | 8 | 0.623 | 0.036 | 1.031 | p-up | p-up |
| MO000257368 | SUSP1 | 8 | 0.555 | 0.031 | 1.354 | p | p-dn |
| MO000103308 | CKI-gamma1 | 8 | 0.530 | 0.035 | 1.093 | | |
| MO000019363 | RelA-p65 | 7 | 0.484 | 0.030 | 1.679 | p | p |
| MO000132731 | PP4C | 7 | 0.445 | 0.047 | 1.068 | p | p |
| MO000140900 | ing4 | 8 | 0.434 | 0.050 | 1.613 | | |
| MO000272358 | ctcf{sumo} | 7 | 0.390 | 0.035 | 1.455 | p | p |
| MO000284804 | RNF96{p} | 7 | 0.341 | 0.047 | 1.590 | p-up | p-dn |
| MO000107711 | RXR-alpha{sumo} | 8 | 0.337 | 0.033 | 1.549 | p | p |
| MO000272357 | ctcf{sumo} | 7 | 0.275 | 0.049 | 1.564 | p | p |
| MO000284833 | RNF96{pS473}{pS824} | 7 | 0.250 | 0.040 | 1.833 | p-up | p-dn |
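The caption states that the Z-score comes from repeated randomization of the input TF set. One common way to obtain such a value, shown here purely as an assumption about the general idea and not as the geneXplain algorithm, is to compare the regulator's observed score with the scores it receives for random TF sets of the same size. The function name and all numbers are hypothetical.

```python
import statistics

def z_score(observed, random_scores):
    """Standardize the observed score against scores obtained for randomized input sets."""
    mean = statistics.mean(random_scores)
    spread = statistics.stdev(random_scores)
    return (observed - mean) / spread

# Hypothetical scores of one regulator for five random TF sets.
random_scores = [0.31, 0.28, 0.35, 0.30, 0.26]
z = z_score(0.62, random_scores)
```

A real analysis would use many more randomizations than five, so that the mean and the spread of the background scores are estimated reliably.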
Fig. 2. Signal transduction diagram connecting the two most significant common regulators (light red nodes at the top of the diagram) and the TFs (light blue nodes in the middle and at the bottom of the diagram) whose sites were found overrepresented in the promoters of differentially expressed genes. The red, blue and gray decoration of the nodes annotates the phosphorylation of the respective proteins detected in the phosphoproteomics experiment. The left part of the decoration circle corresponds to protein phosphorylation observed in the cytoplasm of the cells, and the right part to protein phosphorylation observed in the nucleus. Red corresponds to an increased level of phosphorylation after treatment of the cells with RA, blue to a decreased level, and gray to an unchanged level of phosphorylation of these proteins after the RA treatment.