
RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure.

Qi Liu, Yu Yang, Chun Chen, Jiajun Bu, Yin Zhang, Xiuzi Ye.

Abstract

BACKGROUND: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them is suitable for compressing RNA sequences together with their secondary structures. Such compression not only facilitates the maintenance of RNA data but also provides a novel way to measure the informational complexity of RNA structural data. This makes it possible to use compression to study the relationship between the functional activities of RNA structures and their complexity, as well as various other structural properties of RNA.
RESULTS: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The algorithm has two main goals: (1) to provide a robust and effective way to compress RNA structural data; and (2) to design a suitable model for representing RNA secondary structure and for deriving the informational complexity of the structural data from its compression. Our extensive tests show that RNACompress achieves a universally better compression ratio than other sequence-specific or general-purpose text compression algorithms, such as GenCompress, WinRAR and gzip. Moreover, comparing the activities of distinct GTP-binding RNAs (aptamers) with their structural complexity shows that the informational complexity we define can describe how complexity varies with activity. These results provide an objective means of comparing the functional properties of heteropolymers from an information perspective.
CONCLUSION: This paper presents a universal algorithm for compressing RNA secondary structure and evaluating its informational complexity. We have developed RNACompress as a tool for academic users. Extensive tests show that RNACompress is a universally efficient algorithm for compressing RNA sequences together with their secondary structures. RNACompress also provides a good measure of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.
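The abstract does not specify the grammar model RNACompress uses, so the following is only a minimal sketch of the general idea behind the RESULTS paragraph: compressing a sequence and its secondary structure with a grammar and treating the grammar's size as a complexity score. It assumes a generic grammar-compression algorithm, Re-Pair, swapped in for illustration, and assumes the structure is given in dot-bracket notation. The function names (merge_sequence_structure, repair, expand, grammar_complexity) and the score (grammar size divided by input length) are hypothetical choices, not the paper's definitions.

```python
from collections import Counter

# Hypothetical helpers illustrating generic grammar-based compression (Re-Pair).
# This is NOT the grammar model used by RNACompress; it is an assumed stand-in.


def merge_sequence_structure(seq: str, structure: str) -> list[str]:
    """Pair each nucleotide with its dot-bracket symbol, e.g. 'G' + '(' -> 'G('."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [base + bracket for base, bracket in zip(seq, structure)]


def count_pairs(symbols: list) -> Counter:
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_pos = {}
    for i in range(len(symbols) - 1):
        pair = (symbols[i], symbols[i + 1])
        # In a run such as x x x, the pair at i overlaps the one counted at i - 1.
        if last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def repair(symbols: list) -> tuple[list, dict]:
    """Replace the most frequent repeated pair with a new nonterminal until no pair repeats."""
    rules = {}  # nonterminal (int) -> the pair of symbols it expands to
    current = list(symbols)
    while True:
        counts = count_pairs(current)
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:
            break
        nonterminal = len(rules)  # ints never collide with the string terminals
        rules[nonterminal] = pair
        rewritten, i = [], 0
        while i < len(current):
            if i + 1 < len(current) and (current[i], current[i + 1]) == pair:
                rewritten.append(nonterminal)
                i += 2
            else:
                rewritten.append(current[i])
                i += 1
        current = rewritten
    return current, rules


def expand(symbols, rules: dict) -> list:
    """Decompress by recursively expanding nonterminals back into terminals."""
    out = []
    for s in symbols:
        if isinstance(s, int):
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out


def grammar_complexity(seq: str, structure: str) -> float:
    """Hypothetical complexity score: grammar size divided by input length."""
    symbols = merge_sequence_structure(seq, structure)
    if not symbols:
        raise ValueError("empty input")
    start, rules = repair(symbols)
    # Each binary rule costs two symbols on its right-hand side.
    return (len(start) + 2 * len(rules)) / len(symbols)
```

For example, grammar_complexity("GGGAAACCC" * 4, "(((...)))" * 4) returns a value below 1.0 because the repeated hairpin is factored into shared rules, whereas an input with no repeated adjacent pair, such as "ACGU" with "(..)", scores exactly 1.0. Merging each base with its bracket symbol lets one grammar capture repeats in sequence and structure jointly. Whether RNACompress encodes the two streams this way is not stated in the abstract.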

Year:  2008        PMID: 18373878      PMCID: PMC2335284          DOI: 10.1186/1471-2105-9-176

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169
