
A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Trees-Juen Chuang, Wen-Chang Lin, Hurng-Chun Lee, Chi-Wei Wang, Keh-Lin Hsiao, Zi-Hao Wang, Danny Shieh, Simon C Lin, Lan-Yang Ch'ang.

Abstract

DNA is a universal language encoding the biological instructions for life. In higher organisms, genetic information is organized predominantly in an exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript used for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. The method uses a progressive data structure organized in hierarchical orders to support a fast and efficient search mechanism. The CRASA implementation was tested on already-annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By using layered noise filters, the complexity of the CRASA-matched data was reduced exponentially. In the benchmark tests, CRASA annotation excelled in both sensitivity and specificity. When CRASA was applied to human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can serve as a robust, highly accurate genome annotation tool by matching EST sequences precisely to genomic sequences.
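The benchmark results above are reported as sensitivity and specificity. In gene-prediction benchmarks these are commonly measured at the nucleotide level as Sn = TP/(TP+FN) and Sp = TP/(TP+FP) (the convention of Burset and Guigó, 1996), so "specificity" there means the fraction of predicted bases that fall inside annotated exons. The sketch below illustrates that convention only. The half-open interval representation, the function names `covered` and `nucleotide_sn_sp`, and the toy coordinates are assumptions for illustration; this is not the paper's evaluation code, and the coordinates are not data from the study.

```python
from typing import Iterable, Set, Tuple

# Assumed convention: an exon is a half-open genomic interval [start, end).
Interval = Tuple[int, int]


def covered(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of nucleotide positions covered by a list of exon intervals."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_sn_sp(predicted: Iterable[Interval],
                     annotated: Iterable[Interval]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity of a prediction.

    Sn = TP / (TP + FN): fraction of annotated exon bases that were predicted.
    Sp = TP / (TP + FP): fraction of predicted exon bases that are annotated.
    Returns 0.0 for a measure whose denominator is zero.
    """
    p = covered(predicted)
    r = covered(annotated)
    tp = len(p & r)   # bases both predicted and annotated
    fp = len(p - r)   # predicted but not annotated
    fn = len(r - p)   # annotated but missed
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Hypothetical toy example (made-up coordinates).
annotated = [(100, 200), (300, 350)]   # 100 + 50 = 150 annotated bases
predicted = [(120, 200), (300, 380)]   # 80 + 80 = 160 predicted bases
sn, sp = nucleotide_sn_sp(predicted, annotated)
# TP = 80 + 50 = 130, FN = 150 - 130 = 20, FP = 160 - 130 = 30
print(f"Sn={sn:.3f} Sp={sp:.3f}")
```

Counting bases as sets keeps overlapping predicted exons from being double-counted; for chromosome-scale inputs an interval-sweep implementation would avoid materializing every position.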


Year: 2003; PMID: 12566410; PMCID: PMC420370; DOI: 10.1101/gr.313703

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043
