Fabio Cumbo1,2,3, Davide Vergni4, Daniele Santoni1.
Abstract
Proteins are the core and the engine of every process in cells; the study of the mechanisms that regulate protein expression is therefore essential. Transcription factors play a central role in this extremely complex task, and they cooperate synergistically to fine-tune protein expression. In the present study, we designed a mathematically well-founded procedure to investigate the mutual positioning of the binding sites of a given couple of transcription factors, in order to evaluate the possible association between them. We obtained a list of highly related transcription factor couples whose binding site occurrences significantly group together in a given set of gene promoters, and we identified the biological contexts in which the couples are involved and the processes they are expected to help regulate.
Keywords: biological process; computational biology; gene regulation; transcription factors
Year: 2018 PMID: 29069301 PMCID: PMC5824945 DOI: 10.1093/dnares/dsx041
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Workflow for the extraction of human TFBSs.
Figure 2. Scheme of the algorithm for computing the similarity score of models.
Figure 3. Each point represents a TF couple whose Z-scores are significant (Z > 5) for at least 20 transcripts. The plane is divided into two regions: the upper-left sector, where the Overlapping Score is higher than the Similarity Score, contains the selected couples; the lower-right sector, where the Overlapping Score is lower than the Similarity Score, contains the discarded couples.
Figure 4. Workflow for the analysis of TF couples. Panel A depicts the procedure for the selection of relevant couples. Panel B describes the analysis of significant couples, in terms of both enrichment tests and sorting of the best couples.
Categories and terms for the enrichment analysis
| Category | Number of terms | Number of genes |
|---|---|---|
| OMIM expanded | 187 | 2,178 |
| Tissue protein expression from proteomicsDB | 207 | 62,307 |
| KEGG 2015 | 179 | 3,800 |
| OMIM disease | 90 | 1,759 |
| GO molecular function | 1,136 | 12,753 |
| GO cellular component | 641 | 13,236 |
| GO biological process | 5,192 | 14,264 |
| Chromosome location | 386 | 32,740 |
Figure 5. Number of couples enriched in at least one term as a function of their smallest P-value. The solid line shows the enrichment of the couples identified by our algorithm; the dashed lines show the enrichment of random couples for comparison.
The PWM model IDs and the corresponding TF names are reported in columns 1–4 for each couple; the category and the term in which the couple is enriched are reported in columns 5–6; column 7 reports the P-value of the hypergeometric test.
| Model 1 | TF 1 | Model 2 | TF 2 | Category | Term | P-value |
|---|---|---|---|---|---|---|
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Molecular function | Ubiquitinyl hydrolase activity (GO:0036459) | 3.63E-19 |
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Molecular function | Cysteine-type peptidase activity (GO:0008234) | 4.56E-17 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Gonadal mesoderm development (GO:0007506) | 6.60E-15 |
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Biological process | Ubiquitin-dependent protein catabolic process (GO:0006511) | 3.59E-14 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Gonadal mesoderm development (GO:0007506) | 4.09E-14 |
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Biological process | Modification-dependent protein catabolic process (GO:0019941) | 4.42E-14 |
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Biological process | Modification-dependent macromolecule catabolic process (GO:0043632) | 4.90E-14 |
| M00106 | CDP CR3+HD | M00967 | HNF4 COUP | GO Biological process | Proteolysis involved in cellular protein catabolic process (GO:0051603) | 8.43E-14 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Nucleosome organization (GO:0034728) | 1.42E-13 |
| M00777 | STAT | M00980 | TBP | Chromosome location | Chr5q13 | 1.54E-13 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Protein-DNA complex subunit organization (GO:0071824) | 4.01E-13 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Nucleosome assembly (GO:0006334) | 2.81E-12 |
| M00457 | STAT5A | M00980 | TBP | Chromosome location | Chr5q13 | 6.40E-12 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Protein-DNA complex assembly (GO:0065004) | 8.69E-12 |
| M00223 | STATx | M00980 | TBP | Chromosome location | Chr5q13 | 4.82E-11 |
| M00799 | Myc | M00927 | AP-4 | GO Biological process | Gonadal mesoderm development (GO:0007506) | 1.12E-10 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Mesoderm development (GO:0007498) | 2.71E-10 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Mesenchyme development (GO:0060485) | 4.02E-10 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Mesoderm development (GO:0007498) | 1.66E-09 |
| M00736 | E2F-1:DP-1 | M00799 | Myc | GO Biological process | Mesenchyme development (GO:0060485) | 2.46E-09 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Nucleosome assembly (GO:0006334) | 4.46E-09 |
| M00462 | GATA-6 | M00921 | GR | GO Molecular function | FK506 binding (GO:0005528) | 7.13E-09 |
| M00462 | GATA-6 | M00921 | GR | GO Molecular function | Macrolide binding (GO:0005527) | 7.13E-09 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Protein-DNA complex assembly (GO:0065004) | 9.90E-09 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Nucleosome organization (GO:0034728) | 1.44E-08 |
| M00655 | PEA3 | M00803 | E2F | Chromosome location | Chr5q13 | 1.50E-08 |
| M00799 | Myc | M00803 | E2F | GO Biological process | Protein autophosphorylation (GO:0046777) | 2.05E-08 |
| M00739 | E2F-4:DP-2 | M00799 | Myc | GO Biological process | Protein-DNA complex subunit organization (GO:0071824) | 2.73E-08 |
| M00059 | YY1 | M00148 | SRY | GO Molecular function | Sodium ion transmembrane transporter activity (GO:0015081) | 2.96E-08 |
| M00415 | AREB6 | M00706 | TFII-I | KEGG | Valine leucine and isoleucine biosynthesis | 4.10E-08 |
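The P-values in the last column of the table above come from a hypergeometric test. The following is a minimal sketch of how such an enrichment P-value can be computed, assuming the usual upper-tail formulation: the probability of finding at least `k` term-annotated genes among `n` target genes drawn from a background of `N` genes, of which `K` carry the term. The function name, parameter names and toy numbers are illustrative assumptions and are not taken from the study.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail hypergeometric probability P(X >= k).

    N: background size (all genes considered)        [assumed name]
    K: genes in the background annotated to the term [assumed name]
    n: genes in the target set of a TF couple        [assumed name]
    k: target genes that are annotated to the term   [assumed name]
    """
    upper = min(K, n)  # X cannot exceed either K or n
    total = comb(N, n)  # number of ways to draw the target set
    # comb() returns 0 for impossible draws, so no extra bounds check is needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example (made-up numbers): 10 background genes, 3 annotated,
# 2 target genes, at least 1 of them annotated.
p_value = hypergeom_sf(1, 10, 3, 2)
print(p_value)  # 24/45, about 0.533
```

With exact integer arithmetic in `math.comb`, very small tails such as those in the table do not underflow until the final division. When many terms and couples are tested, a multiple-testing correction would normally be applied on top of the raw P-values; whether and how that was done here is not stated in this record.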
The PWM model IDs and the corresponding TF names are reported in columns 1–4 for each couple; column 5 reports the number of transcripts whose Z-scores for the couple are higher than 5; column 6 reports the average Z-score over all significant (Z > 5) transcripts; the Overlapping and Similarity Scores are reported in columns 7 and 8, respectively.
| Model 1 | TF 1 | Model 2 | TF 2 | Number of transcripts | Average Z-score | Overlapping score | Similarity score |
|---|---|---|---|---|---|---|---|
| M00083 | MZF1 | M00649 | MAZ | 1,014 | 7.77 | 0.19 | 0.21 |
| M00083 | MZF1 | M00803 | E2F | 456 | 6.77 | 0.09 | 0.21 |
| M00803 | E2F | M00976 | AhR Arnt HIF-1 | 338 | 6.63 | 0.03 | 0.21 |
| M00706 | TFII-I | M00971 | Ets | 317 | 7.71 | 0.16 | 0.24 |
| M00706 | TFII-I | M00803 | E2F | 275 | 6.61 | 0.00 | 0.27 |
| M00148 | SRY | M00747 | IRF-1 | 254 | 7.77 | 0.00 | 0.20 |
| M00148 | SRY | M00471 | TBP | 239 | 6.78 | 0.02 | 0.21 |
| M00148 | SRY | M00980 | TBP | 203 | 6.45 | 0.00 | 0.23 |
| M00698 | HEB | M00803 | E2F | 196 | 6.35 | 0.02 | 0.29 |
| M00649 | MAZ | M00658 | PU.1 | 187 | 7.49 | 0.03 | 0.22 |
| M00649 | MAZ | M00799 | Myc | 184 | 6.85 | 0.00 | 0.33 |
| M00799 | Myc | M00933 | Sp1 | 182 | 7.06 | 0.01 | 0.28 |
| M00462 | GATA-6 | M00471 | TBP | 182 | 6.47 | 0.08 | 0.21 |
| M00799 | Myc | M00931 | Sp1 | 167 | 6.97 | 0.00 | 0.30 |
| M00803 | E2F | M00927 | AP-4 | 160 | 6.29 | 0.01 | 0.24 |
| M00801 | CREB | M00803 | E2F | 153 | 6.43 | 0.00 | 0.28 |
| M00706 | TFII-I | M00931 | Sp1 | 141 | 6.58 | 0.04 | 0.22 |
| M00649 | MAZ | M00971 | Ets | 132 | 7.32 | 0.00 | 0.23 |
| M00933 | Sp1 | M00976 | AhR Arnt HIF-1 | 127 | 6.65 | 0.02 | 0.21 |
| M00803 | E2F | M00981 | CREB ATF | 127 | 6.72 | 0.00 | 0.23 |
| M00799 | Myc | M00932 | Sp1 | 121 | 7.17 | 0.00 | 0.28 |
| M00148 | SRY | M00706 | TFII-I | 120 | 8.58 | 0.00 | 0.31 |
| M00931 | Sp1 | M00976 | AhR Arnt HIF-1 | 117 | 6.47 | 0.01 | 0.20 |
| M00803 | E2F | M00917 | CREB | 115 | 6.81 | 0.00 | 0.25 |
| M00008 | Sp1 | M00706 | TFII-I | 114 | 6.36 | 0.02 | 0.21 |
| M00791 | HNF3 | M00975 | RFX | 107 | 6.58 | 0.00 | 0.20 |
| M00471 | TBP | M00747 | IRF-1 | 107 | 6.32 | 0.02 | 0.25 |
| M00649 | MAZ | M00976 | AhR Arnt HIF-1 | 106 | 6.28 | 0.00 | 0.26 |
| M00148 | SRY | M00962 | AR | 106 | 6.04 | 0.00 | 0.20 |
| M00148 | SRY | M00789 | GATA | 106 | 6.08 | 0.00 | 0.23 |
| M00148 | SRY | M00975 | RFX | 104 | 6.20 | 0.00 | 0.21 |
| M00008 | Sp1 | M00799 | Myc | 102 | 6.77 | 0.00 | 0.29 |
| M00775 | NF-Y | M00803 | E2F | 101 | 6.24 | 0.03 | 0.20 |
The couple Myc, E2F is reported in bold to highlight it is also included in Table 2.
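The number of significant transcripts and the average Z-score reported for each couple follow from a simple thresholding step: count the transcripts with Z > 5, keep couples with at least 20 of them, and average the significant Z-scores. A minimal sketch of that bookkeeping is given below. It assumes the per-transcript Z-scores have already been computed (how they are derived is not described in this record); the function name, the input layout and the toy values are hypothetical.

```python
from statistics import mean

Z_THRESHOLD = 5.0      # significance cut-off stated in the captions (Z > 5)
MIN_TRANSCRIPTS = 20   # minimum number of significant transcripts per couple


def summarize_couples(z_scores_by_couple,
                      z_threshold=Z_THRESHOLD,
                      min_transcripts=MIN_TRANSCRIPTS):
    """Return (couple, n_significant, average_z) for couples passing the filter.

    z_scores_by_couple: dict mapping a (model_1, model_2) tuple to the list of
    per-transcript Z-scores for that couple (hypothetical input layout).
    """
    rows = []
    for couple, z_scores in z_scores_by_couple.items():
        significant = [z for z in z_scores if z > z_threshold]
        if len(significant) >= min_transcripts:
            rows.append((couple, len(significant), mean(significant)))
    # Sort by number of significant transcripts, largest first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Toy data (made-up values, not from the study).
toy = {
    ("M_A", "M_B"): [6.0] * 25 + [1.0] * 5,  # 25 significant transcripts
    ("M_C", "M_D"): [7.0] * 10,              # only 10, filtered out
}
summary = summarize_couples(toy)
print(summary)
```

The descending sort mirrors the ordering of the rows in the table; the Overlapping and Similarity Scores are separate quantities and are not computed by this sketch.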
Frequencies of couples as a function of the shortest path (SP) distance for three classes of protein couples: (i) the 547 selected TF couples (BEST, first row); (ii) all TF couples (ALL, second row); and (iii) 10 random sample sets, each made of 547 protein couples randomly picked from the whole PPI network (RANDOM; mean and standard deviation in the third and fourth rows, respectively).
| | SP 1 | SP 2 | SP 3 | SP 4 | SP 5 | SP 6 | SP 7 |
|---|---|---|---|---|---|---|---|
| BEST | 0.117318 | 0.478585 | 0.284916 | 0.10987 | 0.009311 | 0 | 0 |
| ALL | 0.056789 | 0.463694 | 0.376242 | 0.084080 | 0.014167 | 0.004661 | 0.000368 |
| RANDOM (mean) | 0.003291 | 0.065601 | 0.377048 | 0.385820 | 0.134512 | 0.027972 | 0.005346 |
| RANDOM (stdv) | 0.000943 | 0.006363 | 0.016408 | 0.008298 | 0.009771 | 0.003329 | 0.002414 |
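The frequencies in the table above can be reproduced in principle by computing, for each protein couple, the shortest path distance in the PPI network and then normalizing the counts per distance. The sketch below illustrates this with a breadth-first search on an unweighted, undirected graph stored as an adjacency dictionary. The function names, the graph representation, the toy network and the choice to drop couples with no connecting path are all assumptions made for illustration, not details taken from the study.

```python
from collections import Counter, deque


def shortest_path_length(adjacency, source, target):
    """Breadth-first search distance between two nodes, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def sp_frequencies(adjacency, couples):
    """Fraction of couples at each shortest path distance.

    Couples with no connecting path are dropped (an assumption of this sketch).
    """
    lengths = [shortest_path_length(adjacency, a, b) for a, b in couples]
    reachable = [d for d in lengths if d is not None]
    if not reachable:
        return {}
    counts = Counter(reachable)
    return {d: counts[d] / len(reachable) for d in sorted(counts)}


# Toy network (made up): a simple chain a - b - c - d.
toy_ppi = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_couples = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
frequencies = sp_frequencies(toy_ppi, toy_couples)
print(frequencies)  # {1: 0.5, 2: 0.25, 3: 0.25}
```

For the RANDOM rows, the same function would be applied to each of the 10 random sets of 547 couples, and the mean and standard deviation would then be taken per distance across the sets.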