Ulf Schaefer, Rimantas Kodzius, Chikatoshi Kai, Jun Kawai, Piero Carninci, Yoshihide Hayashizaki, Vladimir B Bajic.
Abstract
BACKGROUND: Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3'UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5' completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation.
Year: 2010 PMID: 21085627 PMCID: PMC2981523 DOI: 10.1371/journal.pone.0013934
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Algorithm layout.
Layout of the daisy-chain algorithm, with performance estimates after each step in parentheses.
Figure 2. Performance curve.
Sensitivity vs. Specificity trade-off curve for human and mouse average CV performance (blue and red), and performance on the whole data sets (black and green).
Sensitivity and Specificity values for mouse and human test and training cases.
| Mouse whole set | | | Mouse CV | | | Human whole set | | | Human CV | | |
| Threshold | Sensitivity | Specificity | Threshold | Sensitivity | Specificity | Threshold | Sensitivity | Specificity | Threshold | Sensitivity | Specificity |
| −2.50 | 100.00% | 45.45% | −2.50 | 96.10% | 45.01% | −2.50 | 100.00% | 44.77% | −2.50 | 96.36% | 44.51% |
| −2.00 | 99.99% | 45.95% | −2.00 | 96.07% | 45.55% | −2.00 | 100.00% | 44.97% | −2.00 | 96.36% | 44.74% |
| −1.50 | 99.91% | 55.03% | −1.50 | 95.72% | 54.82% | −1.50 | 99.98% | 49.47% | −1.50 | 96.20% | 49.34% |
| −1.00 | 99.22% | 81.75% | −1.00 | 94.30% | 78.40% | −1.00 | 99.53% | 78.45% | −1.00 | 94.62% | 73.73% |
| −0.50 | 97.47% | 92.03% | −0.50 | 92.49% | 89.16% | −0.50 | 97.59% | 89.92% | −0.50 | 92.46% | 86.71% |
| −0.25 | 96.50% | 94.05% | −0.25 | 91.41% | 92.10% | −0.25 | 96.42% | 91.96% | −0.25 | 91.47% | 89.69% |
| 0.00 | 95.44% | 95.63% | 0.00 | 90.21% | 94.11% | 0.00 | 95.33% | 93.58% | 0.00 | 90.29% | 91.91% |
| 0.25 | 94.33% | 96.86% | 0.25 | 88.80% | 95.52% | 0.25 | 94.23% | 94.94% | 0.25 | 88.91% | 93.41% |
| 0.50 | 92.98% | 97.81% | 0.50 | 86.98% | 96.68% | 0.50 | 92.98% | 96.17% | 0.50 | 87.12% | 94.74% |
| 0.75 | 91.50% | 98.51% | 0.75 | 84.52% | 97.66% | 0.75 | 91.46% | 97.29% | 0.75 | 83.42% | 95.91% |
| 1.00 | 84.76% | 99.16% | 1.00 | 77.36% | 98.40% | 1.00 | 80.74% | 98.83% | 1.00 | 70.95% | 97.22% |
| 1.25 | 47.32% | 99.65% | 1.25 | 46.08% | 99.09% | 1.25 | 34.57% | 99.76% | 1.25 | 33.19% | 98.91% |
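The trade-off in the table above, where sensitivity falls and specificity rises as the threshold increases, is the usual behaviour of a score-based classifier that predicts the positive class when a score reaches the threshold. The following is a minimal sketch of how such a sweep could be computed. The function name, the toy scores, and the decision rule `score >= threshold` are assumptions for illustration; this record does not state how the authors' scores are produced or which class they treat as positive.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) at one threshold.

    Assumption: a sample is predicted positive when score >= threshold.
    `labels` holds True for samples of the positive class.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical, not taken from the paper).
toy_scores = [-2.0, -0.5, 0.3, 1.1]
toy_labels = [False, False, True, True]

for t in (-1.0, 0.0, 1.0):
    sens, spec = sensitivity_specificity(toy_scores, toy_labels, t)
    print(f"threshold {t:+.2f}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```

Sweeping the threshold over a grid such as the one in the table and plotting the resulting pairs would give a curve of the kind described for Figure 2.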
NTL and TIAR of three showcase human chromosomes.
| | Chromosome 21 | | Chromosome 22 | | Chromosome 4 | |
| Threshold | NTL | TIAR | NTL | TIAR | NTL | TIAR |
| 0.0 | 91.53% | 8.47% | 78.18% | 21.82% | 95.87% | 4.13% |
| −2.5 | 41.1% | 58.9% | 27.2% | 72.8% | 46.84% | 53.16% |
Figure 3. Application example.
Illustration of DDM explaining failed amplification of 5′-RACE; true and false TSSs for mouse gene Oprm1 recognized by DDM.
Statistics on k-mers used in development of the algorithm.
| k-mer length | Number of k-mers | Cumulative number of features used |
| 1 | 4 | 4 |
| 2 | 16 | 20 |
| 3 | 64 | 84 |
| 4 | 256 | 340 |
| 5 | 1024 | 1364 |
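The counts in this table follow from the four-letter nucleotide alphabet: there are 4^k distinct k-mers of length k, and the cumulative column is the running sum. A short sketch that reproduces the table and counts k-mer occurrences in a sequence is given below. The function names are illustrative, and counting overlapping windows is an assumption; the record does not say how occurrences were counted.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_feature_counts(max_k):
    """Return (k, number of k-mers, cumulative count) for k = 1..max_k."""
    rows = []
    cumulative = 0
    for k in range(1, max_k + 1):
        n_kmers = len(ALPHABET) ** k  # 4^k distinct k-mers
        cumulative += n_kmers
        rows.append((k, n_kmers, cumulative))
    return rows


def count_kmers(sequence, k):
    """Count overlapping occurrences of every k-mer over ACGT (assumed scheme)."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts


rows = kmer_feature_counts(5)
for k, n_kmers, cumulative in rows:
    print(k, n_kmers, cumulative)
```

Concatenating the count dictionaries for k = 1 to 5 would give one feature vector of 1364 entries per sequence, matching the last row of the table.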
Figure 4. Constraining sequence properties (nt C).
Constraining boundaries for occurrences of 1-mer ‘C’.
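One way to read "constraining boundaries" is as a lower and an upper limit on how often a k-mer may occur in a sequence of a given class. The sketch below illustrates that reading only: it takes the minimum and maximum count of 'C' over a set of reference sequences and tests whether a new sequence falls inside that range. How the boundaries in Figure 4 were actually derived is not stated in this record, so the min/max rule, the function names, and the toy sequences are all assumptions.

```python
def count_overlapping(sequence, kmer):
    """Count overlapping occurrences of `kmer` in `sequence` (case-insensitive)."""
    seq = sequence.upper()
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def learn_count_bounds(sequences, kmer):
    """Assumed rule: bounds are the min and max count seen in the reference set."""
    counts = [count_overlapping(s, kmer) for s in sequences]
    return min(counts), max(counts)


def within_bounds(sequence, kmer, bounds):
    """True if the k-mer count of `sequence` lies inside the closed interval."""
    low, high = bounds
    return low <= count_overlapping(sequence, kmer) <= high


# Hypothetical reference sequences, not data from the paper.
reference = ["ACCT", "CCCC", "ACGT"]
c_bounds = learn_count_bounds(reference, "C")
print(c_bounds, within_bounds("AAAA", "C", c_bounds))
```

A sequence whose count falls outside the interval for any monitored k-mer could then be set aside early, which is consistent with a stepwise filtering layout, though that link is an inference rather than something this record confirms.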