Sayed-Amir Marashi, Changiz Eslahchi, Hamid Pezeshk, Mehdi Sadeghi.
Abstract
BACKGROUND: Gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step of all gene structure prediction algorithms.
Year: 2006 PMID: 16772025 PMCID: PMC1526458 DOI: 10.1186/1471-2105-7-297
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Average number of structural changes in a 21-nucleotide window around 1000 donor GUs. See the text for details.
Figure 2. Distribution of predicted linear distances of base-paired nucleotides in RNA sequences. See the text for details.
Comparison of eight-letter vs. four-letter predictions of acceptor and donor sites for different datasets. Where the eight-letter alphabet improves prediction over the conventional four-letter alphabet, the data pair is shown in bold.
| 4- | 3' | 0.8540* | 0.8923 | ||||||||
| 8- | 3' | 0.8536* | 0.8911 | ||||||||
| 4- | 5' | 0.8832 | 0.8871 | 0.9565* | |||||||
| 8- | 5' | 0.8817 | 0.8853 | 0.9565* | |||||||
| 4- | 3' | 0.7025 | 0.8763 | 0.8539 | |||||||
| 8- | 3' | 0.7002 | 0.8705 | 0.8495 | |||||||
| 4- | 5' | 0.8029 | 0.8846 | 0.8224 | 0.9311 | ||||||
| 8- | 5' | 0.7986 | 0.8787 | 0.8222 | 0.9250 | ||||||
| 4- | 3' | 0.7848 | 0.8649 | 0.9403 | |||||||
| 8- | 3' | 0.7843 | 0.8635 | 0.9375 | |||||||
| 4- | 5' | 0.8117 | 0.9409 | 0.8449 | 0.8988 | 0.9639 | |||||
| 8- | 5' | 0.8070 | 0.9348 | 0.8446 | 0.8935 | 0.9619 | |||||
| 4- | 3' | 0.9002 | 0.9046 | 0.9276 | 0.9761 | ||||||
| 8- | 3' | 0.8981 | 0.8996 | 0.9137 | 0.9696 | ||||||
| 4- | 5' | 0.8845 | 0.9369 | 0.9644 | |||||||
| 8- | 5' | 0.8841 | 0.9346 | 0.9615 | |||||||
| 4- | 3' | ||||||||||
| 8- | 3' | ||||||||||
| 4- | 5' | 0.9378 | 0.9645 | ||||||||
| 8- | 5' | 0.9095 | 0.9558 | ||||||||
* Data pairs with insignificant differences (Mann-Whitney test).
Figure 3. Log likelihood ratio (LLR, log base 2) of loop-structure formation at different positions around splice sites in the AtGS and HsGS datasets. The sequences are shown in the 5'→3' direction. Asterisks mark positions where the frequency of "loops" differs significantly between real and decoy sites (p < 0.05, test for the difference of two binomial proportions). 3' AtGS (A), 3' HsGS (B), 5' AtGS (C) and 5' HsGS (D).
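The two quantities named in the Figure 3 caption can be sketched in a few lines. This is only an illustration, not the authors' code: it assumes the LLR at a position is the base-2 log of the ratio of "loop" frequencies in real versus decoy sites, and it uses the standard pooled two-proportion z-test with a normal approximation. The function names and the counts in the usage note are hypothetical.

```python
import math


def loop_llr(real_loops, real_total, decoy_loops, decoy_total):
    """Base-2 log ratio of loop frequencies at one position.

    Assumed definition: log2(f_real / f_decoy). Both loop counts must be
    non-zero, otherwise the ratio (or its logarithm) is undefined.
    """
    f_real = real_loops / real_total
    f_decoy = decoy_loops / decoy_total
    return math.log2(f_real / f_decoy)


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test.

    x1/n1 and x2/n2 are the observed proportions (here: loop counts over
    site counts for real and decoy sites). Uses the normal approximation,
    so it is only reasonable for large samples.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))
```

With made-up counts, `loop_llr(600, 1000, 300, 1000)` gives a positive LLR (loops enriched at real sites), and a position would carry an asterisk in this scheme when `two_proportion_p_value(600, 1000, 300, 1000) < 0.05`.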