Tzong-Yi Lee, Wen-Chi Chang, Justin Bo-Kai Hsu, Tzu-Hao Chang, Dray-Ming Shien.
Abstract
BACKGROUND: Sequence features in promoter regions are involved in regulating gene transcription initiation. Although numerous computational methods have been developed for predicting transcriptional start sites (TSSs) or transcription factor (TF) binding sites (TFBSs), they do not consider some important regulatory features such as CpG islands, tandem repeats, the TATA box, CCAAT box, GC box, over-represented oligonucleotides, DNA stability, and GC content. Additionally, the combinatorial interaction of TFs regulates groups of genes that share the same expression pattern. To investigate gene transcriptional regulation, an integrated system is needed that annotates regulatory features in a promoter sequence and detects co-regulation of TFs in a group of genes.
Year: 2012 PMID: 22369687 PMCID: PMC3587379 DOI: 10.1186/1471-2164-13-S1-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. System flow of GPMiner.
Figure 2. Analytical flowchart of promoter identification.
Supported regulatory features in GPMiner
| Regulatory features | Integrated database or tools | Descriptions |
|---|---|---|
| Transcriptional start site | NNPP2.2 | Applying a time-delay neural network for promoter annotation |
| | McPromoter | Using a statistical method to identify eukaryotic polymerase II TSS in genomic DNA |
| | Eponine | Predicting the transcription start site for a DNA sequence with prediction specificity > 70% |
| Transcription factor (TF) binding site | TRANSFAC public release 7.0 | Storing the experimentally verified transcription factors, their genomic binding sites and DNA-binding profiles |
| | MATCH | Scanning the transcription factor binding site using the transcription factor binding profiles from TRANSFAC public release 7.0 and JASPAR |
| CpG island | CpGProD | Detecting the CpG island |
| Repeats | TRF | A tandem repeat finder |
| TATA box, CCAAT box, and GC box | MATCH | Scanning the TATA-, CCAAT- and GC-box by the transcription factor binding profiles from TRANSFAC |
| | Narang | Defining the 6-mer pattern of the TATA box, CCAAT box, and GC box with positional density |
| Over-represented pattern | Huang | Defining the statistically significant pattern in the promoter region |
| DNA stability | Aditi Kanhere | Predicting the DNA stability of the promoter region |
| Co-occurrence of TF binding sites | apriori | A method to mine the association rules |
| Conserved regions between homologous gene promoter sequences | Blast | Using the |
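The co-occurrence row above relies on association-rule mining. As a rough illustration only, the sketch below counts how often pairs of TFs have predicted binding sites in the same gene promoters and keeps the pairs whose support passes a threshold. It covers just the first two passes of an Apriori-style search and is not GPMiner's actual implementation; the function name `frequent_tf_pairs`, the `min_support` parameter, and the toy gene-to-TF mapping are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_tf_pairs(gene_to_tfs, min_support):
    """Return {(tf_a, tf_b): support} for TF pairs co-occurring in promoters.

    gene_to_tfs : dict mapping a gene name to the set of TFs with a predicted
                  binding site in its promoter (hypothetical input format).
    min_support : minimum fraction of genes that must contain the pair.
    """
    n_genes = len(gene_to_tfs)
    if n_genes == 0:
        return {}

    # Pass 1: support of individual TFs.
    single_counts = Counter(tf for tfs in gene_to_tfs.values() for tf in set(tfs))
    frequent_single = {
        tf for tf, count in single_counts.items() if count / n_genes >= min_support
    }

    # Pass 2: count only pairs built from frequent single TFs
    # (the Apriori pruning idea: a frequent pair needs frequent members).
    pair_counts = Counter()
    for tfs in gene_to_tfs.values():
        kept = sorted(set(tfs) & frequent_single)
        pair_counts.update(combinations(kept, 2))

    return {
        pair: count / n_genes
        for pair, count in pair_counts.items()
        if count / n_genes >= min_support
    }


# Toy gene group (illustrative data, not taken from the paper).
toy_group = {
    "geneA": {"SP1", "NF-Y", "TBP"},
    "geneB": {"SP1", "NF-Y"},
    "geneC": {"SP1", "TBP"},
    "geneD": {"NF-Y", "E2F1"},
}
pairs = frequent_tf_pairs(toy_group, min_support=0.5)
```

With this toy group, only pairs present in at least half of the genes are retained. A full association-rule miner would go on to larger TF sets and also report rule confidence.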
Prediction performance of SVM models trained on combinations of three kinds of regulatory features: over-represented hexamer nucleotides (OR), nucleotide composition (NC), and DNA stability (DS). Performance is evaluated by cross-validation using a window of -200 to +100 relative to the TSS (+1).
| Training set | Window size | Features | Precision | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|
| All | -200 ~ +100 | OR+NC | 77% | 71% | 79% | 75% |
| | -200 ~ +100 | OR+DS | 76% | 69% | 78% | 74% |
| | -200 ~ +100 | NC+DS | 75% | 74% | 76% | 75% |
| | -200 ~ +100 | OR+NC+DS | 79% | 76% | 79% | 78% |
| With CpG | -200 ~ +100 | OR+NC | 79% | 81% | 79% | 80% |
| | -200 ~ +100 | OR+DS | 77% | 80% | 76% | 78% |
| | -200 ~ +100 | NC+DS | 77% | 82% | 75% | 78% |
| | -200 ~ +100 | OR+NC+DS | 80% | 84% | 79% | 82% |
| Without CpG (1,554) | -200 ~ +100 | OR+NC | 68% | 70% | 67% | 68% |
| | -200 ~ +100 | OR+DS | 68% | 71% | 66% | 68% |
| | -200 ~ +100 | NC+DS | 66% | 67% | 66% | 66% |
| | -200 ~ +100 | OR+NC+DS | 69% | 69% | 71% | 70% |
The number of training sequences used to construct the SVM models is shown in parentheses in the column "Training set".
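The four measures reported in the table are not defined in this excerpt. The sketch below assumes the standard confusion-matrix definitions (true promoters as positives, non-promoter sequences as negatives) and shows how each column would be computed from raw counts. The class `ConfusionCounts`, the function `evaluate`, and the example counts are hypothetical and are not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # promoters predicted as promoters
    fp: int  # non-promoters predicted as promoters
    tn: int  # non-promoters predicted as non-promoters
    fn: int  # promoters predicted as non-promoters


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Compute the four measures under the assumed standard definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Illustrative counts only: 100 positives and 100 negatives in one test fold.
example = ConfusionCounts(tp=75, fp=20, tn=80, fn=25)
scores = evaluate(example)
```

In a k-fold cross-validation these counts would typically be accumulated over all held-out folds before the ratios are taken, although the excerpt does not say which aggregation the authors used.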
Figure 3. The submission and result interface of GPMiner.
Figure 4. Gene group analysis in GPMiner.
Comparison of GPMiner with several representative gene promoter annotation programs
| Transcriptional regulatory features | | | | | GPMiner |
|---|---|---|---|---|---|
| Species supported | Human, mouse, and rat | Human and mouse | Mammalian | Eukaryote | Human, mouse, rat, chimp, and dog |
| Promoter identification | Yes | Yes | Yes | Yes | Yes |
| Map to known gene promoters | Yes | - | - | - | DBTSS, EPD and Ensembl |
| Transcription factor binding site | - | Yes | Yes | - | TRANSFAC public release and JASPAR, MATCH |
| TATA-box | - | Yes | - | Yes | Yes |
| Tandem repeat | Yes | - | - | Yes | Tandem Repeat Finder |
| CpG island | - | - | Yes | - | CpGProD |
| Over-represented pattern | - | - | - | - | Yes |
| DNA stability | - | - | - | - | Yes |
| GC content | - | - | Yes | - | Yes |
| Co-occurrence of TFBSs | - | Yes | - | - | Yes |
| Graphical view | Yes | - | - | Yes | Yes |