Jian-Yi Yang, Yu Zhou, Zu-Guo Yu, Vo Anh, Li-Qian Zhou.
Abstract
BACKGROUND: The promoter region plays an important role in determining where transcription of a particular gene is initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis. Existing promoter prediction methods are still far from satisfactory.
Year: 2008 PMID: 18294399 PMCID: PMC2292139 DOI: 10.1186/1471-2105-9-113
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The discriminant accuracies of various methods with Fisher's discriminant. The method marked "3+6+7" in the 8th row denotes the combination of the methods listed in the 3rd, 6th and 7th rows; the mark in the 9th row has an analogous meaning.
| Order | Training: promoters (%) | Training: non-promoters (%) | Test: promoters (%) | Test: non-promoters (%) | Method | No. of parameters |
|---|---|---|---|---|---|---|
| 1 | 73.05 | 85.63 | 74.73 | 83.33 | | 9 |
| 2 | 79.16 | 75.78 | 76.88 | 62.67 | | 9 |
| 3 | 78.86 | 88.00 | 79.03 | 85.33 | | 12 |
| 4 | 78.62 | 89.33 | 79.57 | 89.33 | | 12 |
| 5 | 80.30 | 90.74 | 80.65 | 90.00 | | 15 |
| 7 | 81.92 | 91.48 | 81.72 | 89.33 | | 48 |
| 10 | 87.31 | 93.19 | 86.02 | 92.00 | All methods | 141 |
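The table compares feature sets classified with Fisher's linear discriminant. The training details are not given in this excerpt, so the following is only a minimal pure-Python sketch of a two-class Fisher discriminant, assuming each sequence has already been converted into a numeric feature vector (a combined row such as "3+6+7" would then plausibly correspond to concatenating the vectors of those methods, which is an assumption). The function names, the small ridge term and the midpoint threshold are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def column_means(rows: Sequence[Sequence[float]]) -> Vector:
    """Mean feature vector of one class."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def project(w: Vector, x: Sequence[float]) -> float:
    """Scalar projection of a feature vector onto the discriminant direction."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fisher_train(pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]],
                 ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Return (w, threshold) for a two-class Fisher linear discriminant."""
    mu_p, mu_n = column_means(pos), column_means(neg)
    d = len(mu_p)
    # Pooled within-class scatter S_w; the small ridge keeps it invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((pos, mu_p), (neg, mu_n)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    # Fisher direction: w = S_w^{-1} (mu_p - mu_n).
    w = solve_linear(sw, [p - q for p, q in zip(mu_p, mu_n)])
    # Threshold halfway between the projected class means.
    threshold = 0.5 * (project(w, mu_p) + project(w, mu_n))
    return w, threshold


def fisher_predict(w: Vector, threshold: float, x: Sequence[float]) -> bool:
    """True means "predicted promoter" (the positive class)."""
    return project(w, x) > threshold
```

With real data, `pos` and `neg` would hold the feature vectors of the training promoters and non-promoters, and the per-class accuracies would be the fractions of each class that `fisher_predict` labels correctly.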
The accuracies of promoter prediction by Fisher's discriminant algorithm. Sn, Sp, Ac and CC are the results for the training set, and Sn', Sp', Ac' and CC' are the results for the test set. Each row corresponds to the row of Table 1 with the same order number.
| Order | Sn | Sp | Ac | CC | Sn' | Sp' | Ac' | CC' |
|---|---|---|---|---|---|---|---|---|
| 1 | 73.05 | 86.28 | 79.67 | 0.58 | 74.73 | 84.76 | 79.74 | 0.58 |
| 2 | 79.16 | 80.17 | 79.67 | 0.55 | 76.88 | 71.86 | 74.37 | 0.40 |
| 3 | 78.86 | 89.05 | 83.95 | 0.66 | 79.03 | 86.98 | 83.01 | 0.64 |
| 4 | 78.62 | 90.12 | 84.37 | 0.68 | 79.57 | 90.24 | 84.91 | 0.69 |
| 5 | 80.30 | 91.47 | 85.89 | 0.71 | 80.65 | 90.91 | 85.78 | 0.70 |
| 7 | 81.92 | 92.25 | 87.08 | 0.73 | 81.72 | 90.48 | 86.10 | 0.71 |
| 10 | 87.31 | 94.06 | 90.68 | 0.80 | 86.02 | 93.02 | 89.52 | 0.78 |
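The captions do not define Sn, Sp, Ac and CC. A minimal sketch, assuming the definitions commonly used in promoter-prediction benchmarks: Sn = TP/(TP+FN), Sp = TP/(TP+FP), Ac = (Sn+Sp)/2, and CC as the Matthews correlation coefficient. Under these assumptions, and with the test set of 186 promoters and 150 non-promoters described in the tool-comparison caption, the test-set percentages of Table 1 reproduce the test-set columns above, as shown for row 1. The helper names are hypothetical.

```python
import math
from typing import Dict, Tuple


def promoter_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sn, Sp, Ac (in %) and CC from confusion-matrix counts."""
    sn = tp / (tp + fn)  # sensitivity: fraction of promoters recovered
    sp = tp / (tp + fp)  # assumed definition: fraction of predicted promoters that are real
    ac = (sn + sp) / 2   # assumed definition: mean of Sn and Sp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": 100 * sn, "Sp": 100 * sp, "Ac": 100 * ac, "CC": cc}


def counts_from_class_accuracies(promoter_pct: float, non_promoter_pct: float,
                                 n_pos: int = 186, n_neg: int = 150) -> Tuple[int, int, int, int]:
    """Recover (tp, fn, tn, fp) from per-class accuracies such as Table 1's test columns."""
    tp = round(promoter_pct / 100 * n_pos)
    tn = round(non_promoter_pct / 100 * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


# Row 1 of Table 1, test set: 74.73% of promoters and 83.33% of non-promoters correct.
m = promoter_metrics(*counts_from_class_accuracies(74.73, 83.33))
print({k: round(v, 2) for k, v in m.items()})  # → {'Sn': 74.73, 'Sp': 84.76, 'Ac': 79.74, 'CC': 0.58}
```

Note that under this reading Sp is what is elsewhere called precision (positive predictive value), not the fraction of non-promoters rejected.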
The promoter prediction accuracies on the test data set of 186 promoter sequences and 150 non-promoter sequences, for five existing tools and our method.
| Tool | Sn | Sp | Ac | CC |
|---|---|---|---|---|
| NNPP(threshold 0.8) | 69.89 | 60.75 | 65.32 | 0.14 |
| Soft Berry(TSSW) | 67.74 | 81.29 | 74.52 | 0.48 |
| Promoter Scan version 1.7 | 67.20 | 88.65 | 77.93 | 0.57 |
| Dragon Promoter Finder version 1.5 | 30.65 | 65.52 | 48.08 | 0.12 |
| Promoter 2.0 Prediction Server | 52.15 | 91.51 | 71.83 | 0.49 |
| Our method ( | 86.02 | 93.57 | 89.79 | 0.78 |
The accuracies of M1, M2 and M3 with Fisher's discriminant algorithm when 50% of the sequences are used for training and the remaining 50% as the test set.
| Method | Sn | Sp | Ac | CC | Sn' | Sp' | Ac' | CC' |
|---|---|---|---|---|---|---|---|---|
| M1 | 81.67 | 89.53 | 87.60 | 0.73 | 91.49 | 85.50 | 88.49 | 0.73 |
| M2 | 87.28 | 93.32 | 90.30 | 0.79 | 90.41 | 89.07 | 89.74 | 0.77 |
| M3 | 88.25 | 93.17 | 90.71 | 0.80 | 90.52 | 89.74 | 90.13 | 0.78 |
The accuracies of M1, M2 and M3 with Fisher's discriminant algorithm when 80% of the sequences are used for training and the remaining 20% as the test set.
| Method | Sn | Sp | Ac | CC | Sn' | Sp' | Ac' | CC' |
|---|---|---|---|---|---|---|---|---|
| M1 | 85.78 | 89.65 | 87.71 | 0.73 | 87.10 | 88.28 | 87.69 | 0.73 |
| M2 | 86.39 | 93.71 | 90.05 | 0.79 | 87.90 | 91.09 | 89.49 | 0.77 |
| M3 | 86.86 | 93.88 | 90.37 | 0.79 | 87.10 | 91.78 | 89.44 | 0.77 |
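The last two tables re-evaluate M1, M2 and M3 under 50/50 and 80/20 splits. How the sequences were divided is not stated in this excerpt; the sketch below assumes a random split carried out separately within the promoter and non-promoter classes, so that both sets keep the class proportions. `stratified_split`, its labels (1 for promoter, 0 for non-promoter) and the fixed seed are illustrative, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Labeled = List[Tuple[T, int]]


def split_class(items: Sequence[T], train_frac: float,
                rng: random.Random) -> Tuple[List[T], List[T]]:
    """Shuffle one class and cut it into training and test parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def stratified_split(promoters: Sequence[T], non_promoters: Sequence[T],
                     train_frac: float, seed: int = 0) -> Tuple[Labeled, Labeled]:
    """Label promoters 1 and non-promoters 0, splitting each class by train_frac."""
    rng = random.Random(seed)  # fixed seed so a split can be reproduced
    p_train, p_test = split_class(promoters, train_frac, rng)
    n_train, n_test = split_class(non_promoters, train_frac, rng)
    train = [(x, 1) for x in p_train] + [(x, 0) for x in n_train]
    test = [(x, 1) for x in p_test] + [(x, 0) for x in n_test]
    return train, test
```

Calling it with `train_frac=0.5` or `train_frac=0.8` gives the two settings; repeating with several seeds would show how much the accuracies depend on the particular split.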
Figure 1. The free energy sequence of one promoter sequence. See the text for a detailed description of how such a numeric sequence is obtained.
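The caption defers the construction of the free energy sequence to the main text, which is not part of this excerpt. As a hedged illustration only, assuming the numeric sequence is built by sliding along the DNA and assigning each overlapping dinucleotide a free-energy value from a nearest-neighbour table, the mapping could look like this. The real 16-entry table must come from the paper or a thermodynamics reference; the numbers in `toy_table` are placeholders, and the function name is hypothetical.

```python
from typing import Dict, List


def free_energy_sequence(dna: str, table: Dict[str, float]) -> List[float]:
    """Map every overlapping dinucleotide of `dna` to its free-energy value."""
    seq = dna.upper()
    values: List[float] = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair not in table:  # e.g. N or other ambiguity codes
            raise KeyError(f"no free-energy value for {pair!r} at position {i}")
        values.append(table[pair])
    return values


# Placeholder numbers for illustration only; they are NOT real thermodynamic parameters.
toy_table = {"AC": -1.0, "CG": -2.0, "GT": -1.5}
print(free_energy_sequence("acgt", toy_table))  # → [-1.0, -2.0, -1.5]
```

A sequence of length n yields n - 1 values, which is the kind of numeric series that fractal and multifractal measures can then be computed on.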
Figure 2. The four kinds of fractal curves for the promoter, exon and intron sequences. The curves show some differences between promoter and non-promoter (exon/intron) sequences, which suggests that values extracted from them can be used to distinguish promoter sequences from non-promoter sequences.
Figure 3. Left: the relationship between ln M(L) and ln L for the free energy sequence of one promoter. Right: the h(q) spectrum of the same promoter calculated by AMFA.
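The left panel reads a scaling exponent off an approximately straight-line relation between ln M(L) and ln L. The definition of M(L) and the AMFA procedure are not in this excerpt, so the sketch below only shows the generic fitting step, assuming M(L) has already been computed at several scales L: an ordinary least-squares fit of ln M(L) against ln L, whose slope estimates the exponent. In fluctuation-based multifractal methods, repeating such a fit for each moment order q is the usual way to obtain exponents like h(q); whether AMFA matches this exactly is an assumption. `loglog_slope` is a hypothetical name.

```python
import math
from typing import Sequence


def loglog_slope(scales: Sequence[float], m_values: Sequence[float]) -> float:
    """Least-squares slope of ln M(L) against ln L (all inputs must be > 0)."""
    if len(scales) != len(m_values) or len(scales) < 2:
        raise ValueError("need at least two matching (L, M(L)) pairs")
    xs = [math.log(L) for L in scales]
    ys = [math.log(m) for m in m_values]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: M(L) = 3 * L**0.7 is an exact power law with exponent 0.7.
scales = [4, 8, 16, 32, 64]
print(round(loglog_slope(scales, [3 * L ** 0.7 for L in scales]), 6))  # → 0.7
```

On real data the fit should be restricted to the range of L over which the log-log plot is actually linear.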