Cornelia Meckbach, Rebecca Tacke, Xu Hua, Stephan Waack, Edgar Wingender, Mehmet Gültas.
Abstract
BACKGROUND: Transcription factors (TFs) are important regulatory proteins that govern transcriptional regulation. It is now known that, in higher organisms, different TFs must cooperate rather than act individually in order to control complex genetic programs. Identifying these interactions is an important challenge for understanding the molecular mechanisms that regulate biological processes. In this study, we present a new method based on pointwise mutual information, PC-TraFF, which treats the genome as a document, the sequences as sentences, and TF binding sites (TFBSs) as words in order to identify interacting TFs in a set of sequences.
Year: 2015 PMID: 26627005 PMCID: PMC4667426 DOI: 10.1186/s12859-015-0827-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
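The pointwise mutual information idea named in the abstract can be illustrated with a minimal sketch. It is not the PC-TraFF implementation: it assumes that each sequence is reduced to the set of TFBS names it contains and that probabilities are estimated as the fraction of sequences containing a site (or both sites). The function name `pmi_scores` and the toy site names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sequences):
    """Return log2 PMI for every TFBS pair that co-occurs in at least one sequence.

    sequences: list of iterables of TFBS names (one iterable per sequence).
    Assumption: presence/absence per sequence, no distance constraint.
    """
    n = len(sequences)
    single = Counter()  # number of sequences containing each site
    pair = Counter()    # number of sequences containing both sites of a pair
    for sites in sequences:
        present = sorted(set(sites))
        single.update(present)
        pair.update(combinations(present, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    toy = [["siteA", "siteB"], ["siteA", "siteB"], ["siteC"], ["siteC"]]
    print(pmi_scores(toy))
```

A positive score means the two sites co-occur more often than expected if they were independent; a score near zero means they behave as if independent.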
Total number of edges in method-dependent significant collaboration networks
| Sequence sets of RefSeq genes in | | | | | | |
|---|---|---|---|---|---|---|
| Genome-wide analysis | 54 | 86 | 91 | 19 | 17 | 21 |
| Breast cancer analysis | 64 | 82 | 88 | 13 | 6 | 25 |
Total number of edges in two predicted collaboration networks of different methods
| | Common edges: genome-wide analysis | Common edges: breast cancer analysis |
|---|---|---|
| | 43 | 54 |
| | 41 | 43 |
| | 3 | 1 |
| | 6 | 0 |
| | 0 | 0 |
| | 82 | 80 |
| | 4 | 1 |
| | 8 | 1 |
| | 2 | 0 |
| | 4 | 1 |
| | 9 | 0 |
| | 2 | 0 |
| | 1 | 0 |
| | 0 | 1 |
| | 3 | 1 |
Performance comparison between PC-TraFF20, PC-TraFF50, PC-TraFF100, MatrixCatch (MC), CPModule (CPM), and CrmMiner (CrmM)
| | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|
| PC-TraFF20 | 2.3 | 99.5 | 0.088 |
| PC-TraFF50 | 3.1 | 99.3 | 0.10 |
| PC-TraFF100 | 3.2 | 99.3 | 0.102 |
| | 0.5 | 99.9 | 0.053 |
| | 0.5 | 100 | 0.06 |
| | 0.6 | 99.6 | 0.025 |
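For readers unfamiliar with the three measures in the table above, the following sketch shows their standard definitions from a confusion matrix. It uses the textbook formulas; how true and false positives were counted in the study is not stated in this record, so the function name and inputs are illustrative only.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    sens, spec, mcc = classification_metrics(tp=5, fp=0, tn=5, fn=0)
    print(f"sensitivity={sens:.1%} specificity={spec:.1%} MCC={mcc:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when positives are rare, which is why a method can combine very high specificity with low sensitivity and a small MCC.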
The complementary usage of different methods can lead to an improved performance in identifying important pairs in sequences
| | Sensitivity | Specificity | MCC |
|---|---|---|---|
| PC-TraFF20∪ | 2.8 % | 99.5 % | 0.101 |
| PC-TraFF50∪ | 3.6 % | 99.3 % | 0.112 |
| PC-TraFF100∪ | 3.8 % | 99.3 % | 0.114 |
| PC-TraFF20∪ | 2.6 % | 99.5 % | 0.099 |
| PC-TraFF50∪ | 3.4 % | 99.3 % | 0.107 |
| PC-TraFF100∪ | 3.5 % | 99.3 % | 0.109 |
| PC-TraFF20∪ | 3.0 % | 99.2 % | 0.087 |
| PC-TraFF50∪ | 3.8 % | 99 % | 0.10 |
| PC-TraFF100∪ | 3.9 % | 99 % | 0.102 |
| | 1.0 % | 99.9 % | 0.079 |
| | 1.2 % | 99.6 % | 0.050 |
| | 1.2 % | 99.6 % | 0.051 |
| PC-TraFF20∪ | 3.1 % | 99.5 % | 0.11 |
| PC-TraFF50∪ | 3.8 % | 99.3 % | 0.118 |
| PC-TraFF100∪ | 4 % | 99.3 % | 0.12 |
| PC-TraFF20∪ | 3.8 % | 99.2 % | 0.10 |
| PC-TraFF50∪ | 4.5 % | 99 % | 0.116 |
| PC-TraFF100∪ | 4.7 % | 99 % | 0.119 |
| | 1.7 % | 99.6 % | 0.07 |
Significant TFBS pairs found by PC-TraFF in genome-wide promoter analysis of human RefSeq genes. The table shows the top 10 significant TFBS pairs, sorted in descending order of z-score
| Significant pair | Z-score | Reference |
|---|---|---|
| V$PU1_Q6 - V$ETS_Q6 | 9.84 | TRANSCompel, BioGRID, STRING |
| V$CETS1P54_01 - V$ETS_Q6 | 5.76 | TRANSCompel, BioGRID, STRING |
| V$ETS_Q4 - V$ETS_Q6 | 5.49 | TRANSCompel, BioGRID, STRING |
| V$EGR_Q6 - V$SP1_Q2_01 | 5.09 | BioGRID, STRING |
| V$CETS1P54_01 - V$SP1_Q2_01 | 4.94 | TRANSCompel, STRING |
| V$AP1_Q2_01 - V$AP1_Q4_01 | 4.69 | TRANSCompel, BioGRID |
| V$STAT6_01 - V$OCT_Q6 | 4.66 | - |
| V$CEBPB_02 - V$STAT6_01 | 4.58 | TRANSCompel, STRING |
| V$MYCMAX_B - V$SP1_Q2_01 | 4.36 | BioGRID, STRING |
| V$AP1FJ_Q2 - V$AP1_Q2 | 4.09 | TRANSCompel, BioGRID, STRING |
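Ranking pairs by z-score, as in the table above, can be sketched as follows. This record does not spell out how the z-scores are derived, so the sketch assumes they come from standardizing a per-pair score across all pairs. The function name `significant_pairs` and the default cutoff of 3.0 are hypothetical, not values taken from the study.

```python
import statistics


def significant_pairs(scores, threshold=3.0):
    """Standardize pair scores and return (pair, z) sorted by descending z.

    scores: dict mapping a pair label to a raw score (e.g. a PMI value).
    threshold: hypothetical z-score cutoff for calling a pair significant.
    """
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all scores identical: nothing stands out
    z = {pair: (value - mean) / sd for pair, value in scores.items()}
    hits = [(pair, zv) for pair, zv in z.items() if zv >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {f"pair{i}": 0.0 for i in range(16)}
    toy["outlier"] = 1.0
    print(significant_pairs(toy))
```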
Fig. 1 PC-TraFF significant collaborating TFBS pairs based on promoter sequences of human RefSeq genes. Blue lines denote interactions between TFs whose importance is experimentally verified, whereas red lines indicate potential interactions between transcription factors that have not yet been experimentally validated
The hubs and their top three collaboration partners in the predicted collaboration network of significant TFBS pairs for human RefSeq genes
| Hub | Top three collaborating pairs | Z-score | Reference |
|---|---|---|---|
| V$SP1_Q2_01 | V$EGR_Q6 | 5.09 | BioGRID, STRING |
| | V$CETS1P54_01 | 4.94 | TRANSCompel |
| | V$MYCMAX_B | 4.36 | BioGRID, STRING |
| V$STAT6_01 | V$OCT_Q6 | 4.66 | - |
| | V$CEBPB_02 | 4.58 | TRANSCompel |
| | V$CEBP_Q2_01 | 3.74 | TRANSCompel |
| V$CETS1P54_01 | V$ETS_Q6 | 5.76 | TRANSCompel |
| | V$SP1_Q2_01 | 4.94 | TRANSCompel |
| | V$NFKB_Q6 | 3.96 | TRANSCompel |
| V$AP1_Q4_01 | V$AP1_Q2_01 | 4.69 | TRANSCompel |
| | V$STAT6_01 | 3.35 | TRANSCompel |
| | V$AP1_Q6 | 3.35 | TRANSCompel |
Significant TFBS pairs found by PC-TraFF in genome-wide promoter analysis of human miRNA genes. The table shows the top 10 significant TFBS pairs, which are sorted in descending order based on their z-scores
| Significant pair | Z-score | Reference |
|---|---|---|
| V$STAT6_01 - V$HMGIY_Q6 | 13.73 | - |
| V$HMGIY_Q6 - V$LEF_Q2 | 5.89 | - |
| V$HMGIY_Q6 - V$GATA_Q6 | 5.18 | - |
| V$CREB_Q3 - V$AP1_Q4_01 | 5.16 | BioGRID, STRING |
| V$MYCMAX_B - V$AHRIF_Q6 | 5.03 | BioGRID, STRING |
| V$STAT6_01 - V$AP1_Q4_01 | 4.98 | TRANSCompel, BioGRID, STRING |
| V$HMGIY_Q6 - V$AP1_Q4_01 | 4.97 | BioGRID, STRING |
| V$STAT6_01 - V$LEF_Q2 | 4.83 | - |
| V$SF1_Q6 - V$HNF4_Q6 | 4.79 | - |
| V$HMGIY_Q6 - V$CREB_Q3 | 4.79 | BioGRID, STRING |
Fig. 2 PC-TraFF significant collaborating TFBS pairs based on promoter sequences of human miRNA genes. Blue lines denote interactions between TFs whose importance is experimentally verified, whereas red lines indicate potential interactions between transcription factors that have not yet been experimentally validated
The hubs and their top three cooperation pairs in the predicted collaboration network of significant TFBS pairs for human miRNA genes
| Hub | Top three collaborating pairs | Z-score | Reference |
|---|---|---|---|
| V$AP1_Q4_01 | V$CREB_Q3 | 5.16 | BioGRID, STRING |
| | V$STAT6_01 | 4.98 | TRANSCompel |
| | V$HMGIY_Q6 | 4.97 | BioGRID, STRING |
| V$CETS1P54_01 | V$MYCMAX_B | 4.33 | - |
| | V$PU1_Q6 | 3.67 | TRANSCompel |
| | V$EGR_Q6 | 3.64 | - |
| V$STAT6_01 | V$HMGIY_Q6 | 13.73 | - |
| | V$AP1_Q4_01 | 4.98 | TRANSCompel |
| | V$LEF_Q2 | 4.82 | - |
Fig. 3 PC-TraFF significant collaborating TFBS pairs based on breast cancer-associated promoter sequences of human RefSeq genes. Blue lines denote interactions between TFs whose importance is experimentally verified, whereas red lines indicate potential interactions between transcription factors that have not yet been experimentally validated. The binding sites V$NFKB_Q6, V$CETS1P54_01, and V$MYCMAX_B constitute three hubs in the predicted collaboration network of significant TFBS pairs. The hubs and their top three collaboration partners are given in Table 9
The hubs and their top three collaboration partners in the predicted collaboration network of breast cancer-associated significant TFBS pairs for human RefSeq genes
| Hub | Top three collaborating pairs | Z-score | Reference |
|---|---|---|---|
| V$NFKB_Q6 | V$CETS1P54_01 | 5.42 | TRANSCompel |
| | V$ETS_Q6 | 4.80 | BioGRID, TRANSCompel |
| | V$SP1_Q4_01 | 3.43 | BioGRID, TRANSCompel |
| V$CETS1P54_01 | V$ETS_Q6 | 8.01 | BioGRID, TRANSCompel |
| | V$NFKB_Q6 | 5.42 | TRANSCompel |
| | V$MYCMAX_B | 5.21 | - |
| V$MYCMAX_B | V$CETS1P54_01 | 5.16 | - |
| | V$E2F_Q3_01 | 5.21 | TRANSCompel |
| | V$AHRHIF_Q6 | 4.39 | BioGRID, STRING |
Fig. 4 PC-TraFF significant collaborating TFBS pairs based on breast cancer-associated promoter sequences of human miRNA genes. Blue lines denote interactions between TFs whose importance is experimentally verified, whereas red lines indicate potential interactions between transcription factors that have not yet been experimentally validated
The hubs and their top three collaboration partners in the predicted collaboration network of significant TFBS pairs for breast cancer-associated human miRNA genes
| Hub | Top three collaborating pairs | Z-score | Reference |
|---|---|---|---|
| V$STAT6_01 | V$HMGIY_Q6 | 13.28 | - |
| | V$MYB_Q5_01 | 5.77 | - |
| | V$GATA_Q6 | 4.98 | - |
| V$ETS_Q6 | V$PU1_Q6 | 13.49 | TRANSCompel |
| | V$SF1_Q6 | 6.16 | - |
| | V$CETS1P54_01 | 5.00 | TRANSCompel |
| V$AP1_Q4_01 | V$HMGIY_Q6 | 4.85 | BioGRID, STRING |
| | V$LEF1_Q2 | 4.27 | BioGRID |
| | V$STAT6_01 | 4.17 | TRANSCompel, BioGRID, STRING |
| V$HMGIY_Q6 | V$STAT6_01 | 13.28 | - |
| | V$MYB_Q5_01 | 6.17 | - |
| | V$LEF1_Q2 | 6.00 | - |
| V$PU1_Q6 | V$ETS_Q6 | 13.49 | TRANSCompel |
| | V$SF1_Q6 | 5.88 | - |
| | V$CETS168_Q6 | 3.29 | TRANSCompel |
Number of promoter sequences of breast cancer subtype-associated RefSeq genes and corresponding significant pairs found by PC-TraFF
| Subtype | Number of sequences | Number of Pairs |
|---|---|---|
| Luminal A | 86 | 61 |
| Luminal B | 57 | 62 |
| Basal-like | 31 | 72 |
| Normal-like | 27 | 49 |
| ErbB2 over-expressing | 16 | 62 |
Number of pairwise overlapping significant pairs of the RefSeq genes of breast cancer subtypes Luminal A, Luminal B, Basal-like, Normal-like, and ErbB2 over-expressing
| Subtype | Luminal A | Luminal B | Basal-like | Normal-like | ErbB2 over-exp. |
|---|---|---|---|---|---|
| Luminal A | - | 36 | 28 | 26 | 23 |
| Luminal B | | - | 30 | 20 | 19 |
| Basal-like | | | - | 25 | 19 |
| Normal-like | | | | - | 16 |
| ErbB2 over-exp. | | | | | - |
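An overlap matrix of this kind amounts to pairwise set intersections. The sketch below assumes each subtype's significant pairs are available as a set of unordered pairs; the helper names and the toy data are hypothetical and do not reproduce the study's results.

```python
from itertools import combinations


def pairwise_overlaps(pair_sets):
    """Return {(group_a, group_b): number of shared pairs} for every two groups."""
    return {
        (a, b): len(pair_sets[a] & pair_sets[b])
        for a, b in combinations(pair_sets, 2)
    }


def common_to_all(pair_sets):
    """Return the pairs that are significant in every group."""
    return set.intersection(*pair_sets.values())


if __name__ == "__main__":
    # frozenset makes a TFBS pair order-independent: (x, y) equals (y, x)
    toy = {
        "group1": {frozenset(("s1", "s2")), frozenset(("s1", "s3"))},
        "group2": {frozenset(("s2", "s1")), frozenset(("s3", "s4"))},
        "group3": {frozenset(("s1", "s2"))},
    }
    print(pairwise_overlaps(toy))
    print(common_to_all(toy))
```

The same intersection over all groups yields the kind of list reported for pairs found in all five subtypes.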
Six PC-TraFF significant TFBS pairs found in promoter sequences of RefSeq genes of all five breast cancer subtypes
| Significant pair | Reference |
|---|---|
| V$MYCMAX_B - V$E2F_Q3_01 | TRANSCompel |
| V$CETS1P54_01 - V$PEBP_Q6 | TRANSCompel |
| V$CETS1P54_01 - V$NFKB_Q6 | TRANSCompel |
| V$CEBP_Q2 - V$STAT6_01 | TRANSCompel |
| V$AP1_Q2_01 - V$AP1_Q4_01 | TRANSCompel |
| V$CEBPB_02 - V$STAT6_01 | TRANSCompel |
Number of breast cancer subtype-associated miRNA genes and corresponding significant pairs found by PC-TraFF
| Subtype | Number of miRNAs | Number of Pairs |
|---|---|---|
| Luminal A | 186 | 46 |
| Luminal B | 53 | 61 |
| Basal-like | 76 | 45 |
| Normal-like | 23 | 52 |
| ErbB2 over-expressing | 70 | 45 |
Number of pairwise overlapping significant pairs of the miRNA analysis of breast cancer subtypes Luminal A, Luminal B, Basal-like, Normal-like, and ErbB2 over-expressing
| Subtype | Luminal A | Luminal B | Basal-like | Normal-like | ErbB2 over-exp. |
|---|---|---|---|---|---|
| Luminal A | - | 38 | 28 | 31 | 30 |
| Luminal B | | - | 31 | 32 | 33 |
| Basal-like | | | - | 27 | 24 |
| Normal-like | | | | - | 27 |
| ErbB2 over-exp. | | | | | - |
20 PC-TraFF significant TFBS pairs found in promoter sequences of miRNA genes of all five breast cancer subtypes
| Significant pair | Reference |
|---|---|
| V$STAT6_01 - V$HMGIY_Q6 | - |
| V$HMGIY_Q6 - V$LEF1_Q2 | - |
| V$HMGIY_Q6 - V$MYB_Q5_01 | - |
| V$STAT6_01 - V$MYB_Q5_01 | - |
| V$SF1_Q6 - V$CETS168_Q6 | - |
| V$HMGIY_Q6 - V$AP1_Q4_01 | BioGRID, STRING |
| V$STAT6_01 - V$AP1_Q4_01 | TRANSCompel, BioGRID, STRING |
| V$STAT6_01 - V$GATA_Q6 | - |
| V$HMGIY_Q6 - V$GATA_Q6 | - |
| V$GATA_Q6 - V$LEF1_Q2 | - |
| V$MYCMAX_B - V$AHRHIF_Q6 | BioGRID, STRING |
| V$AP1_C - V$AP1_Q4_01 | TRANSCompel, BioGRID, STRING |
| V$SF1_Q6 - V$E2A_Q6 | - |
| V$SF1_Q6 - V$HNF4_Q6 | - |
| V$GATA_Q6 - V$AP1_Q4_01 | TRANSCompel, BioGRID, STRING |
| V$LEF1_Q2 - V$AP1_Q4_01 | BioGRID |
| V$MYCMAX_B - V$E2F_Q3_01 | TRANSCompel |
| V$NFKAPPAB65_01 - V$CREL_01 | BioGRID, STRING |
| V$STAT_01 - V$HMGIY_Q6 | - |
| V$E2F_Q3_01 - V$AHRHIF_Q6 | - |
Computational time (in seconds) / memory usage (in megabytes) of the individual tools
| Tool | Genome-wide analysis | Breast cancer analysis |
|---|---|---|
| PC-TraFF | 4158.4 s / 3229 MB | 4.4 s / 581 MB |
| CPModule | 2213.0 s / 721.6 MB | 5.9 s / 7.8 MB |
| CrmMiner | 34409.6 s / 526 MB | 857.4 s / 90 MB |
| MatrixCatch | 627.2 s / 70.7 MB | 16.9 s / 46.2 MB |
Fig. 5 Filtering procedure of the overlap filter. Overlapping TFBSs of the same type (marked with red circles) are filtered so that only the TFBS closest to the TSS is retained
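The overlap filter described in this caption can be sketched as a single pass over sites sorted by type and position. The sketch makes several assumptions not stated in the caption: sites are given as inclusive `(name, start, end)` coordinates, distance to the TSS is measured from the site midpoint, and chains of overlapping sites are resolved greedily from left to right. `Site` and `overlap_filter` are hypothetical names.

```python
from collections import namedtuple

Site = namedtuple("Site", "name start end")  # inclusive coordinates (assumption)


def overlap_filter(sites, tss):
    """Among overlapping sites of the same type, keep the one closest to the TSS."""

    def distance(site):
        # Assumption: distance is measured from the site midpoint to the TSS
        return abs((site.start + site.end) / 2 - tss)

    kept = []
    # Sorting by (name, start) places same-type sites next to each other
    for site in sorted(sites, key=lambda s: (s.name, s.start)):
        last = kept[-1] if kept else None
        if last and last.name == site.name and site.start <= last.end:
            # Overlap with the previously kept site of the same type
            if distance(site) < distance(last):
                kept[-1] = site
        else:
            kept.append(site)
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    toy = [Site("X", 10, 20), Site("X", 15, 25), Site("Y", 18, 28), Site("X", 60, 70)]
    print(overlap_filter(toy, tss=100))
```

Sites of different types are never filtered against each other here, so an overlapping `X` and `Y` both survive.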
Fig. 6 The problem of homotypic clusters: the TFBSs (t ) form a homotypic cluster within a certain interval on the sequence. The TFBS t is also included in this interval. According to our definition for constructing TFBS pairs, and following the DNA strand in the 5’-3’ direction: i) we consider one t −t pair in this interval, indicating that an individual TFBS can participate in only one count of a specified pair (shown with a black line); ii) if we consider t −t pairs, there are two pairs within this interval (shown with blue lines). The red (dashed) lines show that the remaining t −t and t −t pairs are not taken into account in the calculation of the pointwise mutual information of these pairs