Elena A Ananko, Yury V Kondrakhin, Tatiana I Merkulova, Nikolay A Kolchanov.
Abstract
BACKGROUND: Computational analysis of gene regulatory regions is important for predicting the functions of many uncharacterized genes. With this in mind, the search for target genes of interferon (IFN) induction is of interest. IFNs are multifunctional cytokines with immunomodulatory, antiviral, antibacterial, and antitumor effects. The interaction of IFNs with their cell surface receptors activates several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-κB, are essential for the function of the IFN system. The aim of this work is to develop computational approaches for recognizing the DNA binding sites of these factors and computer programs for predicting IFN-inducible regions.
Year: 2007 PMID: 17309789 PMCID: PMC1810324 DOI: 10.1186/1471-2105-8-56
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Occurrence frequencies of the IRF1, ISGF3, STAT1, and NF-κB binding sites (BSs) in the 5'-flanking regions of the training-ISG set relative to the transcription start site (TSS). The TSS is designated as 0 on the X-axis; Y-axis, the occurrence frequency of a BS within a 200 bp region, normalized by the number of genes in the sample; a) putative binding sites; b) experimentally confirmed sites.
Relative occurrence frequencies of the BSs in the promoter regions of genes grouped according to functional activity
| | Relative occurrence frequency of a site in a sample | | | | | |
| BS (position with respect to the TSS) | training-ISG set | training-ISG set excl.* | control-Gluco set | control-LipM set | control-EPD set | ISGs/EPD ratio |
| IRF1 (-200 to -1) | 0.77 | 0.70 | 0.21 | 0.18 | 0.21 | 3.67 |
| ISGF3 (-100 to -1) | 0.28 | 0.19 | 0.00 | 0.00 | 0.05 | 5.60 |
| ISGF3 (-200 to -1) | 1.00 | 0.85 | 0.08 | 0.17 | 0.10 | 10.00 |
| STAT1 (-200 to +1) | 0.19 | 0.15 | 0.05 | 0.00 | 0.08 | 2.38 |
| NF-κB (-100 to +100) | 0.93 | 0.80 | 0.46 | 0.46 | 0.12 | 2.02 |
*BSs used for drawing training samples for site recognition were omitted
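The last column of the table above can be read as the training-ISG frequency divided by the control-EPD frequency. The following minimal sketch recomputes that ratio for four rows whose values are copied from the table; the names `FREQUENCIES` and `enrichment_ratio` are illustrative assumptions, not part of the original work.

```python
# Relative occurrence frequencies copied from the table above:
# (training-ISG set, control-EPD set) for each binding-site window.
FREQUENCIES = {
    "IRF1 (-200 to -1)": (0.77, 0.21),
    "ISGF3 (-100 to -1)": (0.28, 0.05),
    "ISGF3 (-200 to -1)": (1.00, 0.10),
    "STAT1 (-200 to +1)": (0.19, 0.08),
}


def enrichment_ratio(isg_freq: float, epd_freq: float) -> float:
    """Return the ISG frequency divided by the EPD control frequency."""
    if epd_freq == 0:
        raise ValueError("control frequency is zero; the ratio is undefined")
    return isg_freq / epd_freq


# Hypothetical helper mapping: site window -> ISGs/EPD ratio.
RATIOS = {site: enrichment_ratio(isg, epd) for site, (isg, epd) in FREQUENCIES.items()}

for site, ratio in RATIOS.items():
    print(f"{site}: {ratio:.2f}")
```

A ratio well above 1 indicates that the site occurs more often near the TSS of the training-ISG genes than in the general promoter control.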
Patterns for method 0, recognition of any IFN-inducible DNA region (stimulation by any IFN)
| Site 1 | Site 2 | W = F1/F2 (weight for the pattern) |
| strong ISGF3 (b)* | - | 8.57 |
| strong ISGF3 (+) | - | 10.48 |
| ISGF3 (b) | - | 9.41 |
| ISGF3 (+) | ISGF3 (+) | 9.41 |
| strong ISGF3 (+) | ISGF3 (+) | 3.14 |
| strong STAT1 (-) | - | 9.36 |
| STAT1 (-) | - | 3.66 |
| STAT1 (-) | STAT1 (+) | 3.76 |
| STAT1 (-) | - | 3.22 |
| strong IRF1 (b) | - | 5.0 |
| IRF1 (-) | - | 5.18 |
| IRF1 (-) | IRF1 (-) | 3.50 |
| AP1 (+) | - | 1.20 |
| AP1 (b) | - | 1.13 |
| NF-Y (-) | - | 1.28 |
| OCT1 (+) | - | 1.31 |
| NF-κB (-) | NF-κB (+) | 5.16 |
| ISGF3 (b) | STAT1 (+) | 3.64 |
| STAT1 (b) | TATA (+) | 3.50 |
| STAT1 (b) | TATA (+) | 13.0 |
| STAT1 (-) | TATA (+) | 3.0 |
| AP1 (b) | TATA (+) | 2.0 |
| STAT1 (+) | IRF1 (b) | 3.74 |
| IRF1 (+) | ISGF3 (b) | 5.1 |
| strong ISGF3 (b) | STAT1 (b) | 10.8 |
| ISGF3 (b) | STAT1 (b) | 5.8 |
| strong IRF1 (b) | STAT1 (b) | 10.8 |
| strong IRF1 (b) | strong ISGF3 (b) | 5.8 |
| ISGF3 (b) | strong IRF1 (b) | 3.8 |
| strong ISGF3 (b) | IRF1 (b) | 4.8 |
| ISGF3 (b) | IRF1 (b) | 4.8 |
| NF-κB (b) | strong IRF1 (b) | 8.8 |
| NF-κB (b) | strong IRF1 (b) | 3.8 |
| NF-κB (b) | AP1 (b) | 1.8 |
| IRF1 (b) | NF-κB (b) | 2.0 |
| AP1 (b) | strong IRF1 (b) | 3.8 |
| AP1 (b) | IRF1 (b) | 1.8 |
| ISGF3 (b) | OCT1 (b) | 1.4 |
| STAT1 (+) | OCT1 (b) | 1.3 |
*b (both), either orientation of a site; +, forward DNA strand; -, reverse DNA strand
** Location of a site with respect to the TSS
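To make the pattern table above concrete, here is a minimal sketch of how weighted single-site and paired-site patterns could be matched against predicted sites in one DNA region. Several things are assumptions made only for illustration: that weights of matched patterns are summed into a region score, that a "strong" site also satisfies a pattern asking for an ordinary site, that site location is ignored, and every name used (`Hit`, `SiteSpec`, `Pattern`, `region_score`, `THRESHOLD`). Only four rows with unambiguous weights are copied from the table.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    """A predicted binding site in a region (hypothetical representation)."""
    factor: str
    strand: str          # "+" or "-"
    strong: bool = False


@dataclass(frozen=True)
class SiteSpec:
    """One side of a pattern: factor, strand ("+", "-" or "b"), strength."""
    factor: str
    strand: str
    strong: bool = False

    def matches(self, hit: Hit) -> bool:
        # Assumption: a strong hit also satisfies a non-strong specification.
        return (hit.factor == self.factor
                and self.strand in ("b", hit.strand)
                and (hit.strong or not self.strong))


@dataclass(frozen=True)
class Pattern:
    site1: SiteSpec
    site2: Optional[SiteSpec]   # None for single-site patterns
    weight: float               # W = F1/F2 from the table

    def found_in(self, hits: List[Hit]) -> bool:
        if self.site2 is None:
            return any(self.site1.matches(h) for h in hits)
        # Paired patterns need two distinct hits.
        return any(self.site1.matches(a) and self.site2.matches(b)
                   for a, b in permutations(hits, 2))


# Four rows copied from the method 0 table.
PATTERNS = [
    Pattern(SiteSpec("ISGF3", "+", strong=True), None, 10.48),
    Pattern(SiteSpec("OCT1", "+"), None, 1.31),
    Pattern(SiteSpec("NF-kB", "-"), SiteSpec("NF-kB", "+"), 5.16),
    Pattern(SiteSpec("ISGF3", "b"), SiteSpec("STAT1", "+"), 3.64),
]

THRESHOLD = 10.0  # hypothetical cut-off, not taken from the source


def region_score(hits: List[Hit]) -> float:
    """Assumed scoring rule: sum the weights of all matched patterns."""
    return sum(p.weight for p in PATTERNS if p.found_in(hits))


hits = [Hit("ISGF3", "+", strong=True), Hit("STAT1", "+"), Hit("NF-kB", "-")]
score = region_score(hits)
print(f"score = {score:.2f}, above threshold: {score >= THRESHOLD}")
```

In this toy region the strong ISGF3 pattern and the ISGF3/STAT1 pair match, while the NF-κB pair does not because only the reverse-strand site is present.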
Patterns for method 1, recognition of DNA regions induced by type I IFNs (IFNα, IFNβ)
| Site 1 | Site 2 | W = F1/F2 (weight for the pattern) |
| strong ISGF3 (b)* | - | 3.25 |
| ISGF3 (b) | - | 3.89 |
| ISGF3 (b) | - | 1.83 |
| STAT1 (+) | - | 2.29 |
| STAT1 (b) | - | 1.68 |
| STAT1 (-) | - | 1.67 |
| strong IRF1 (b) | - | 2.76 |
| strong IRF1 (+) | - | 2.0 |
| IRF1 (+) | - | 2.16 |
| strong NF-κB (-) | NF-κB (-) | 1.8 |
| IRF1 (b) | ISGF3 (b) | 7.64 |
| IRF1 (b) | ISGF3 (+) | 4.25 |
| IRF1 (b) | STAT1 (b) | 9.4 |
| ISGF3 (b) | AP1 (b) | 3.74 |
| IRF1 (b) | GATA1 (b) | 3.45 |
| IRF1 (+) | GATA1 (b) | 2.1 |
| ISGF3 (b) | OCT1 (b) | 2.3 |
| ISGF3 (b) | GATA1 (b) | 2.35 |
| ISGF3 (b) | AP1 (b) | 5.31 |
| ISGF3 (b) | OCT1 (b) | 1.92 |
| STAT1 (b) | ISGF3 (b) | 4.53 |
| STAT1 (b) | GATA1 (b) | 1.92 |
| STAT1 (+) | OCT1 (b) | 2.66 |
*b (both), either orientation of a site; +, forward DNA strand; -, reverse DNA strand
** Location of a site with respect to the TSS
Patterns for method 2, recognition of DNA regions stimulated by the type II IFN (IFNγ)
| Site 1 | Site 2 | W = F1/F2 (weight for the pattern) |
| strong STAT1 (-) | STAT1 (-) | 20.00 |
| STAT1 (-) | - | 9.36 |
| STAT1 (-) | - | 4.22 |
| strong IRF1 (b) | IRF1 (b) | 10.0 |
| IRF1 (-) | AP1 (b) | 5.4 |
| IRF1 (b) | GATA1 (b) | 1.17 |
| IRF1 (b) | NF-κB (b) | 6.0 |
| IRF1 (+) | OCT1 (b) | 2.0 |
| IRF1 (+) | TATA (+) | 5.2 |
| STAT1 (-) | strong IRF1 (b) | 10.0 |
| strong STAT1 (-) | STAT1 (-) | 20.00 |
| STAT1 (-) | - | 9.36 |
| STAT1 (-) | - | 4.22 |
| strong IRF1 (b) | IRF1 (b) | 10.00 |
| STAT1 (b) | strong IRF1 (b) | 6.1 |
| STAT1 (b) | ISGF3 (b) | 2.1 |
| STAT1 (+) | AP1 (b) | 3.8 |
| STAT1 (+) | OCT1 (b) | 3.3 |
| STAT1 (b) | NF-κB (b) | 4.1 |
| STAT1 (-) | TATA (+) | 4.0 |
| STAT1 (b) | TATA (+) | 8.0 |
| NF-κB (b) | AP1 (b) | 5.4 |
*b (both), either orientation of a site; +, forward DNA strand; -, reverse DNA strand
** Location of a site with respect to the TSS
Recognition of the IFN-inducible DNA regions in the microarray-derived genes
| Sample | -5000 to +2000* | -1000 to +1000 |
| Microarray-ISG set | 23% (31%)** | 17.5% (27%) |
| Microarray-ISG subset 1 | 49% (54%) | 24% (26%) |
| Microarray-ISG subset 2 | 37% (37%) | 25% (26%) |
* Sequence size in a sample with respect to TSS
** The percentage for the training-ISG set is given in parentheses
Microarray-ISG subset 1, IFN type I-inducible ISGs
Microarray-ISG subset 2, IFN type II-inducible ISGs
Figure 2. Dependence of recognition accuracy on the function threshold. a) method 0; b) method 1; c) method 2.
Recognition of the ISGs for various samples
| Sample | Total number of sequences in a sample | Number of recognized genes | Recognized genes, % |
| control-EPD set | 1664 | 74 | 4.4 |
| control-RefSeq set | 6809 | 79 | 1.2 |
| training-ISG set | 72 | 17 | 23.6 |
| microarray-ISG set | 1004 | 39 | 3.9 |
| control-Gluco set | 70 | 0 | 0 |
| control-LipM set | 58 | 0 | 0 |
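The percentage column of the table above follows directly from the two count columns. As a small check, the sketch below recomputes it from the counts listed in the table; `SAMPLES` and `percent_recognized` are illustrative names introduced here.

```python
# (total sequences, recognized genes) copied from the table above.
SAMPLES = {
    "control-EPD set": (1664, 74),
    "control-RefSeq set": (6809, 79),
    "training-ISG set": (72, 17),
    "microarray-ISG set": (1004, 39),
    "control-Gluco set": (70, 0),
    "control-LipM set": (58, 0),
}


def percent_recognized(total: int, recognized: int) -> float:
    """Share of sequences in a sample that were recognized, in percent."""
    if total <= 0:
        raise ValueError("sample size must be positive")
    return 100.0 * recognized / total


PERCENTAGES = {name: percent_recognized(total, hits)
               for name, (total, hits) in SAMPLES.items()}

for name, pct in PERCENTAGES.items():
    print(f"{name}: {pct:.1f}%")
```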
Accuracy of the BS recognition
| Binding site for the TF | Type I error (α1), underprediction | Type II error (α2), overprediction | Independent control (underestimation value at the given α2) |
| IRF1 | 24% | 9.59E-05 | 31.8% |
| ISGF3 | 25% | 6.84E-04 | 46.2% |
| NF-κB | 42% | 5.32E-04 | 70.8% |
| STAT1 | 43% | 8.82E-05 | 84.6% |