Suping Deng, Yixiang Shi, Liyun Yuan, Yixue Li, Guohui Ding.
Abstract
BACKGROUND: Detecting the borders between coding and non-coding regions is an essential step in genome annotation. Information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous methods that find borders by entropy-based segmentation still needs to be improved.
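For reference, the entropy measures involved have standard forms. The block below is a sketch, not the authors' exact formulation. It assumes that a cut at position $i$ splits a symbol sequence of length $L$ into a left part of length $l_1 = i$ and a right part of length $l_2 = L - i$, with empirical symbol distributions $P^{(1)}$ and $P^{(2)}$. It also assumes that the two parts are weighted by their lengths, as in Jensen-Shannon segmentation. This record does not state the weighting or normalization the authors used.

$$
H_\alpha(P) = \frac{1}{1-\alpha}\,\log_2 \sum_{k} p_k^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1
$$

$$
D_\alpha(i) = H_\alpha\!\left(\frac{l_1}{L}P^{(1)} + \frac{l_2}{L}P^{(2)}\right) - \frac{l_1}{L}\,H_\alpha\!\left(P^{(1)}\right) - \frac{l_2}{L}\,H_\alpha\!\left(P^{(2)}\right)
$$

With length weights, the mixture in the first term is simply the symbol distribution of the whole sequence. For $0 < \alpha < 1$, $H_\alpha$ is concave, so $D_\alpha(i) \ge 0$. As $\alpha \to 1$, $H_\alpha$ tends to the Shannon entropy and $D_\alpha$ tends to the Jensen-Shannon divergence. A candidate border is the cut $i$ that maximizes $D_\alpha(i)$.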
Year: 2012 PMID: 23282225 PMCID: PMC3535712 DOI: 10.1186/1471-2164-13-S8-S19
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Di-nucleotide mapping in the 22-symbol alphabet.
| Di-nucleotide | Symbol | Di-nucleotide | Symbol |
|---|---|---|---|
| AA | AA | GA | GA |
| AC | AC | GC | GC |
| AG | AG | GG | GG |
| AT | AT | GT | GT |
| CA | CA | TA | TA |
| CC | CC | TC | TC |
| CG | CG | TG | TG |
| CT | CT | TT | TT |
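The following is a minimal Python sketch of the di-nucleotide part of this mapping. It assumes that di-nucleotides are read with an overlapping sliding window of step 1; this record does not specify the window. The sketch omits the six SCP symbols that complete the 22-symbol alphabet, because their definitions cannot be recovered here. The names `DINUCLEOTIDES` and `to_dinucleotide_symbols` are hypothetical.

```python
from itertools import product

# The 16 di-nucleotides of the table; each maps to a symbol of the same name.
DINUCLEOTIDES = frozenset("".join(p) for p in product("ACGT", repeat=2))


def to_dinucleotide_symbols(seq: str) -> list[str]:
    """Map a DNA string to di-nucleotide symbols (hypothetical helper).

    Assumption: overlapping window of width 2 and step 1, so a sequence of
    length n yields n - 1 symbols. SCP symbols are not handled here.
    """
    seq = seq.upper()
    symbols = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair not in DINUCLEOTIDES:
            # Reject ambiguity codes such as N instead of skipping them,
            # so that symbol positions stay aligned with sequence positions.
            raise ValueError(f"unexpected di-nucleotide {pair!r} at position {i}")
        symbols.append(pair)
    return symbols


print(to_dinucleotide_symbols("ATGCA"))  # ['AT', 'TG', 'GC', 'CA']
```

Raising an error on non-ACGT characters keeps symbol index `i` equal to the sequence position of the first base. Any border found on the symbol sequence can therefore be mapped back to a base position directly.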
SCP mapping in the 22-symbol alphabet.
| Codons | Phase | Symbol |
|---|---|---|
|  | 1 |  |
|  | 2 |  |
|  | 3 |  |
|  | 1 |  |
|  | 2 |  |
|  | 3 |  |
Figure 1. The flow chart of the procedure for finding borders between coding and non-coding DNA regions.
Figure 2. Jensen-Rényi divergence versus cutting position for a DNA sequence that contains a coding region followed by a non-coding region. The maximum divergence values are circled. (a) The analyzed DNA segment is from the bacterium Rickettsia prowazekii (AJ235269, 3757-6226 bp). (b) The analyzed DNA segment is from the bacterium Rickettsia prowazekii (AJ235269, 10683-11820 bp).
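A divergence-versus-cutting-position curve of the kind shown in Figure 2 can be sketched in Python. The code slides the cut across a symbol sequence, evaluates the Jensen-Rényi divergence at each position, and reports the maximum. It assumes empirical symbol frequencies, length-proportional weights, and base-2 logarithms. The function names and the `min_len` parameter are hypothetical. The sketch works on any symbol list and does not reproduce the A12 or A22 encodings.

```python
import math
from collections import Counter


def renyi_entropy(counts: Counter, alpha: float) -> float:
    """Rényi entropy in bits of the empirical distribution in `counts`."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    if alpha == 1.0:  # the alpha -> 1 limit is the Shannon entropy
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def jensen_renyi_profile(symbols, alpha=0.5, min_len=1):
    """Return [(cut, divergence), ...] for every admissible cutting position.

    A cut at position i splits `symbols` into symbols[:i] and symbols[i:].
    """
    n = len(symbols)
    whole = renyi_entropy(Counter(symbols), alpha)  # entropy of the mixture
    left, right = Counter(), Counter(symbols)
    profile = []
    for i in range(1, n):
        s = symbols[i - 1]  # symbol moving from the right part to the left part
        left[s] += 1
        right[s] -= 1
        if i < min_len or n - i < min_len:
            continue
        w = i / n  # length-proportional weight of the left part
        d = whole - w * renyi_entropy(left, alpha) - (1 - w) * renyi_entropy(right, alpha)
        profile.append((i, d))
    return profile


# Toy stand-in for "coding then non-coding": two blocks of different composition.
toy = list("A" * 50 + "C" * 50)
cut, div = max(jensen_renyi_profile(toy), key=lambda t: t[1])
print(cut, round(div, 6))  # 50 1.0
```

The code updates counts incrementally as the cut moves. Each position therefore costs time proportional to the alphabet size (at most 22 symbols here) rather than to the segment length.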
The maximum accuracy of different methods applied to different data sets.
| Organism | GenBank ID | 1-CBC (×100%), A12-JR | 1-CBC (×100%), A22-JR |
|---|---|---|---|
|  |  | 62.50 | 63.85 |
|  |  | 69.18 | 70.57 |
|  |  | 70.48 | 73.18 |
|  |  | 72.26 | 75.44 |
|  |  | 73.39 | 77.70 |
|  |  | 71.45 | 75.68 |
Figure 3. Comparison between the known coding regions and the predicted borders of a DNA sequence. Known coding regions are shown in gray, with solid lines as borders. The predicted borders (vertical dotted lines) are obtained through recursive segmentation using A22-JR (α = 0.5). The DNA sequence is from the bacterium Rickettsia prowazekii. Coding regions drawn downward are on the opposite DNA strand.
Figure 4. Accuracies of recursive segmentation for different thresholds of segmentation strength. The DNA sequences are from the genomes of Rickettsia prowazekii and Borrelia burgdorferi (the first 30,000 bp).
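Recursive segmentation with a strength threshold, the parameter varied in Figure 4, can be sketched in Python. The sketch assumes that "segmentation strength" is the maximal Jensen-Rényi divergence itself, compared against a fixed threshold. The authors' actual strength measure and stopping rule may differ, for example a normalized or significance-based statistic. The helper names and the `min_len` parameter are hypothetical.

```python
import math
from collections import Counter


def renyi_entropy(counts: Counter, alpha: float) -> float:
    """Rényi entropy in bits of the empirical distribution in `counts`."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    if alpha == 1.0:  # Shannon limit
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def best_cut(symbols, alpha, min_len):
    """Return (cut, divergence) maximizing the Jensen-Rényi divergence, or None."""
    n = len(symbols)
    if n < 2 * min_len:
        return None  # no cut leaves both parts at least min_len long
    whole = renyi_entropy(Counter(symbols), alpha)
    left, right = Counter(), Counter(symbols)
    best = None
    for i in range(1, n):
        left[symbols[i - 1]] += 1
        right[symbols[i - 1]] -= 1
        if i < min_len or n - i < min_len:
            continue
        w = i / n
        d = whole - w * renyi_entropy(left, alpha) - (1 - w) * renyi_entropy(right, alpha)
        if best is None or d > best[1]:
            best = (i, d)
    return best


def recursive_segmentation(symbols, threshold, alpha=0.5, min_len=2, offset=0):
    """Split recursively while the strongest cut exceeds `threshold`.

    Returns border positions as absolute indices into the original list,
    in ascending order.
    """
    cut = best_cut(symbols, alpha, min_len)
    if cut is None or cut[1] <= threshold:
        return []  # segment treated as homogeneous at this threshold
    i = cut[0]
    return (
        recursive_segmentation(symbols[:i], threshold, alpha, min_len, offset)
        + [offset + i]
        + recursive_segmentation(symbols[i:], threshold, alpha, min_len, offset + i)
    )


# Toy sequence with two composition changes.
toy = list("A" * 40 + "C" * 40 + "A" * 40)
print(recursive_segmentation(toy, threshold=0.2))  # [40, 80]
```

A higher `threshold` stops the recursion earlier and yields fewer borders. A lower one splits more finely and risks spurious borders. Figure 4 reports accuracy across this kind of trade-off. On the toy sequence, for example, a threshold of 0.5 already suppresses both borders, because the strongest first cut scores only about 0.29.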