Malik Nadeem Akhtar, Syed Abbas Bukhari, Zeeshan Fazal, Raheel Qamar, Ilham A Shahmuradov.
Abstract
BACKGROUND: mRNA polyadenylation is an essential step of pre-mRNA processing in eukaryotes. Accurate prediction of pre-mRNA 3'-end cleavage/polyadenylation sites is important for defining gene boundaries and understanding gene expression mechanisms.
Year: 2010 PMID: 21092114 PMCID: PMC3053588 DOI: 10.1186/1471-2164-11-646
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schematic presentation of the search regions for PAS-, CS- and GU/U motifs, as well as for upstream (Up)/downstream (DN) pentamers used in the algorithm. [-40:-1], for PAS-strong and PAS-weak poly(A) sites. [+2:+50], for all three classes of poly(A) sites. 60 nt upstream of the PAS-motif's left boundary, for PAS-strong poly(A) sites. [-60:-1], for PAS-weak and PAS-less poly(A) sites. [+2:+60], for PAS-weak poly(A) sites. [+2:+100], for PAS-less poly(A) sites.
Table 1. Characteristics of polyadenylation regions used for recognition of PAS-strong, PAS-weak and PAS-less sites, and the Mahalanobis distance (D²; 38) showing the recognition power of each characteristic
| Characteristics | PAS-strong | PAS-weak | PAS-less |
|---|---|---|---|
| PAS-motif, [-40: -1] | 3.18 | 1.98 | |
| CS-motif* | 1.15 | 1.51 | 1.36 |
| GU/U-motif, [+1: +50] | 1.08 | 0.90 | 0.48 |
| Upstream Pentamer Composition | 0.72 | 0.52 | 0.83 |
| Downstream Pentamer Composition | | 0.84 | 1.32 |
| PAS-CS distance | 1.24 | 0.77 | |
| CS-GT distance | 0.30 | 0.24 | 0.17 |
| Total | 7.67 | 6.75 | 4.16 |
* Location of CS-motif: [-15: +3] for PAS-strong and PAS-weak sites; [-9: +3] for PAS-less sites.
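The D² values in the table measure how well a single characteristic separates real sites from non-sites. The record does not show how the authors computed them, so the following is only a minimal sketch of the univariate case, assuming D² is the squared difference of the class means divided by the pooled variance. The function name and the sample scores are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance

def mahalanobis_d2_1d(positives, negatives):
    """Squared Mahalanobis distance between two classes for ONE feature.

    Assumption: both classes share a common (pooled) variance, so
    D^2 = (mean_pos - mean_neg)^2 / pooled_variance.
    """
    n_pos, n_neg = len(positives), len(negatives)
    # Pooled sample variance, weighted by degrees of freedom.
    pooled = ((n_pos - 1) * variance(positives)
              + (n_neg - 1) * variance(negatives)) / (n_pos + n_neg - 2)
    return (mean(positives) - mean(negatives)) ** 2 / pooled

# Toy feature scores (not taken from the paper).
real_sites = [2.0, 3.0, 4.0]
non_sites = [0.0, 1.0, 2.0]
print(mahalanobis_d2_1d(real_sites, non_sites))
```

A larger D² means the two score distributions overlap less, which is consistent with the PAS-motif being the strongest single characteristic for PAS-strong sites in the table.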
Table 2. Statistics of initial testing of the POLYAR program on three classes of CS/poly(A) sites
| CS Class | Positive samples: TP | Positive samples: FN | Sensitivity (%)* | Negative samples: TN | Negative samples: FP | TN/(TN+FP) (%) | CC |
|---|---|---|---|---|---|---|---|
| PAS-strong | 4158 | 1067 | 79.6 | 5207 | 18 | 99.7 | 0.81 |
| PAS-weak | 478 | 1997 | 19.3 | 2470 | 5 | 99.8 | 0.32 |
| PAS-less | 93 | 468 | 16.5 | 561 | 0 | 100 | 0.30 |
* Sensitivity was calculated by formula [12].
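The percentages and CC values in the table above can be recomputed from the four counts. The sketch below assumes that sensitivity is TP/(TP+FN), that the second percentage is TN/(TN+FP), and that CC is the Matthews correlation coefficient. These formulas are not spelled out in this record; they are assumptions that happen to reproduce the PAS-strong row. The function name is hypothetical.

```python
from math import sqrt

def classification_stats(tp, fn, tn, fp):
    """Return (sensitivity, TN/(TN+FP), Matthews correlation coefficient)."""
    sensitivity = tp / (tp + fn)      # fraction of real sites that were found
    tn_rate = tn / (tn + fp)          # fraction of negatives correctly rejected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Guard against a zero denominator when a margin is empty.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, tn_rate, cc

# Counts from the PAS-strong row.
sn, tnr, cc = classification_stats(tp=4158, fn=1067, tn=5207, fp=18)
print(f"{100 * sn:.1f} {100 * tnr:.1f} {cc:.2f}")  # prints "79.6 99.7 0.81"
```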
Table 3. Comparative testing results of the POLYAR, polya_svm and polyadq programs in [-300:+300] regions around the mapped PAS-strong, PAS-weak and PAS-less CS/poly(A) sites
| Data Set | Program | TP | FN | FP | Predicted CSs, total | Sensitivity (%)* | Specificity (%)* |
|---|---|---|---|---|---|---|---|
| PAS-strong | POLYAR | 4221 | 1004 | 2135 | 6356 | 80.8 | 66.4 |
| PAS-strong | polya_svm | 3431 | 1794 | 3206 | 6637 | 65.7 | 51.7 |
| PAS-strong | polyadq | 2804 | 2421 | 1177 | 3981 | 53.7 | 70.4 |
| PAS-weak | POLYAR | 624 | 1851 | 1571 | 2195 | 25.2 | 28.4 |
| PAS-weak | polya_svm | 645 | 1830 | 1417 | 2185 | 26.1 | 29.5 |
| PAS-less | POLYAR | 76 | 485 | 544 | 518 | 13.5 | 14.7 |
| PAS-less | polya_svm | 42 | 519 | 223 | 292 | 7.5 | 14.4 |
* Sensitivity and specificity were calculated by formulas [12] and [14], respectively.
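In this comparison the specificity column behaves like a precision: the share of predicted sites that are true. As a rough sketch, assuming formula [12] is TP/(TP+FN) and formula [14] is TP divided by the total number of predicted CSs (an assumption that reproduces the tabulated percentages, not a formula quoted from the paper), the two values can be recomputed as follows. The function name is hypothetical.

```python
def site_prediction_stats(tp, fn, predicted_total):
    """Return (sensitivity, share of predicted sites that are true positives)."""
    sensitivity = tp / (tp + fn)              # real sites recovered
    true_share = tp / predicted_total         # predicted sites that are real
    return sensitivity, true_share

# POLYAR on the PAS-strong data set: TP=4221, FN=1004, 6356 predicted CSs.
sn, sp = site_prediction_stats(tp=4221, fn=1004, predicted_total=6356)
print(f"{100 * sn:.1f} {100 * sp:.1f}")  # prints "80.8 66.4"
```

Using the "Predicted CSs, total" column as the denominator, rather than TP+FP, is a deliberate choice here: it is the quantity that matches the reported percentages in every row.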
Figure 2. Location of the nearest CS/poly(A) sites relative to the gene end region (0; ± 12 nt), predicted by POLYAR (red) and polya_svm (blue).
Table 4. Comparative testing results of the POLYAR, polya_svm and polyadq programs on coding sequences of 17600 human genes
| Programs | TN | FP | Predicted sites, total | TN/(TN+FP) |
|---|---|---|---|---|
| POLYAR, All | 13503 | 4097 | 7920 | 76.72% |
| PAS-strong | 14091 | 3509 | 6005 | 80.06% |
| PAS-weak | 15012 | 2588 | 4308 | 85.30% |
| PAS-less | 16699 | 901 | 1351 | 94.88% |
| polya_svm | 13290 | 4310 | 6472 | 75.51% |
| polyadq | 16010 | 1590 | 1794 | 90.97% |
Row "POLYAR, All": search for poly(A) sites of all 3 classes. Rows "PAS-strong", "PAS-weak" and "PAS-less": POLYAR search for only PAS-strong, PAS-weak and PAS-less poly(A) sites, respectively. Third data column: total number of predicted sites. Sensitivity was calculated by formula [13].
Table 5. Comparative testing results of the POLYAR, polya_svm and polyadq programs on 19600 intronic sequences
| Programs | TN | FP | Predicted sites, total | TN/(TN+FP) |
|---|---|---|---|---|
| POLYAR, All | 4581 | 15019 | 31998 | 23.92% |
| PAS-strong | 11837 | 7763 | 9695 | 60.39% |
| PAS-weak | 9172 | 10428 | 14900 | 46.80% |
| PAS-less | 5080 | 14520 | 29870 | 25.92% |
| polya_svm | 13139 | 6461 | 8688 | 67.04% |
| polyadq | 16744 | 2856 | 3291 | 85.43% |
See Table 4.