Abstract
This paper presents a new method for exon detection in DNA sequences based on multi-scale parametric spectral analysis. A forward-backward linear prediction algorithm with singular value decomposition (FBLP-SVD) is applied to the double-base curves (DB-curves) of a DNA sequence, using moving windows of variable size to estimate the signal spectrum at multiple scales. Simulations on short human genes in the range of 11bp to 2032bp show that the proposed method outperforms the classical Fourier transform method. The multi-scale approach is also shown to be more effective than a single scale with a fixed window size. In addition, the method is flexible because it requires no training data.
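The abstract does not give the prediction order, the SVD rank, the window sizes, or the DB-curve mapping, so the following is only a minimal sketch of the general FBLP-SVD idea under stated assumptions. It takes an already-numeric signal, fits forward and backward linear-prediction equations with a rank-truncated SVD, and evaluates an unnormalised autoregressive spectrum at the period-3 frequency 2π/3, which is the frequency commonly associated with coding regions. The function names and the default values of `order`, `rank`, and `step` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fblp_svd_coeffs(x, order, rank):
    """Estimate coefficients c so that x[t] ~ sum_k c[k-1] * x[t-k],
    using forward and backward prediction equations solved with a
    rank-truncated SVD pseudo-inverse. `order` and `rank` are assumptions."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= order:
        raise ValueError("window must be longer than the prediction order")
    # Forward rows: [x[t-1], ..., x[t-order]] predicts x[t].
    fwd = np.array([x[t - order:t][::-1] for t in range(order, n)])
    # Backward rows: [x[t+1], ..., x[t+order]] predicts x[t].
    bwd = np.array([x[t + 1:t + order + 1] for t in range(n - order)])
    A = np.vstack([fwd, bwd])
    b = np.concatenate([x[order:], x[:n - order]])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep at most `rank` singular values and drop numerically zero ones.
    keep = min(rank, int(np.sum(s > 1e-10 * s[0]))) if s[0] > 0 else 0
    return Vt[:keep].T @ ((U[:, :keep].T @ b) / s[:keep])

def ar_power(coeffs, omega):
    """Unnormalised AR spectrum 1 / |1 - sum_k c_k e^{-j omega k}|^2."""
    k = np.arange(1, len(coeffs) + 1)
    denom = 1.0 - np.sum(coeffs * np.exp(-1j * omega * k))
    return 1.0 / max(abs(denom) ** 2, 1e-12)  # floor avoids division by zero

def sliding_period3_power(signal, window, step=1, order=6, rank=2):
    """Period-3 power in each moving window of a numeric signal."""
    signal = np.asarray(signal, dtype=float)
    out = []
    for start in range(0, len(signal) - window + 1, step):
        seg = signal[start:start + window]
        seg = seg - seg.mean()  # remove the window mean before fitting
        out.append(ar_power(fblp_svd_coeffs(seg, order, rank), 2 * np.pi / 3))
    return np.array(out)
```

A multi-scale analysis could call `sliding_period3_power` once per window size. How the paper combines the resulting scales is not stated in the abstract, so that step is left out here.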
Keywords: DNA sequence analysis; autoregressive model; double-base curve; gene identification; spectral estimation
Year: 2008 PMID: 18478079 PMCID: PMC2374370 DOI: 10.6026/97320630002273
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1: (a) ROC curves for the multiple-size moving window approach: FBLP-SVD on the DB-curves (solid line), FBLP-SVD on the binary representation (dotted line), and the DFT on the DB-curves (dashed line). (b) Comparison of the FBLP-SVD and DFT algorithms on gene Z20656. The exons shown are 90bp-389bp long and are embedded in non-coding regions 80bp-2113bp long. The actual exon locations are marked with straight vertical lines. The upper diagram is the spectrum generated using FBLP-SVD, whereas the lower diagram is generated using the DFT. (c) Graph of P along the gene X62654. The exons are 77bp long starting at position 1316, 189bp at position 2328, 75bp at position 2701, 96bp at position 2870, 141bp at position 3403, 83bp at position 3768, and 153bp at position 4038. The dotted line represents the original P signal along the gene. The solid line shows the Gaussian-smoothed signal. The actual exon locations are marked with straight vertical lines. The horizontal dashed line is the threshold value of 0.06214.
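The smoothing and thresholding described for the P signal of gene X62654 can be illustrated with a short sketch. It is an illustration only: the threshold 0.06214 comes from the caption, while the Gaussian width `sigma`, the edge padding, and the function names are assumptions, since the caption does not specify them.

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Smooth a 1-D signal with a truncated, normalised Gaussian kernel.
    `sigma` (in bases) is a hypothetical parameter."""
    signal = np.asarray(signal, dtype=float)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Edge padding keeps the output the same length as the input.
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def regions_above(signal, threshold):
    """Return half-open (start, end) index pairs where signal > threshold."""
    above = np.asarray(signal) > threshold
    padded = np.concatenate([[False], above, [False]])
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))

THRESHOLD = 0.06214  # value quoted in the figure caption
```

Given a P signal as a NumPy array `p`, candidate exon regions would then be `regions_above(gaussian_smooth(p, sigma), THRESHOLD)` for some chosen `sigma`. The returned indices are 0-based array offsets, which need not match the position convention used in the caption.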