
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

Daniel T Yehdego, Boyu Zhang, Vikram K R Kodimala, Kyle L Johnson, Michela Taufer, Ming-Ying Leung.

Abstract

Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structure of each chunk individually with existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered method and the optimized method. Each step (searching for inversions, chunking, and prediction) can be performed in parallel. In this paper we use a MapReduce framework, namely Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and to identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the Rfam database whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, whenever the latter predictions are computationally feasible. We also show that, once sequences exceed certain lengths, some programs can no longer predict pseudoknots computationally, while our chunking methods still can. Overall, our predicted structures retain the accuracy level of the original prediction programs when compared with known experimental secondary structures.
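The inversion search described above can be illustrated with a naive scan. This is a minimal sketch, not the authors' implementation: the function names, the Watson-Crick-only pairing rule (G-U wobble pairs ignored), and the `min_gap`/`max_gap` bounds are assumptions introduced for illustration, with `stem_len` and the gap bounds standing in for the "inversion stem lengths and gap sizes" explored in the paper. Because each sequence is searched independently, the work can be split across mappers in a MapReduce job; the hypothetical `search_many` helper mimics that map-then-collect pattern in a single process.

```python
"""Naive inversion (inverted-repeat) search: a hypothetical sketch."""
from typing import Dict, Iterable, List, Tuple

# Watson-Crick complements for RNA (G-U wobble pairs deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

Inversion = Tuple[int, int, int]  # (stem_start, gap, partner_start)


def reverse_complement(s: str) -> str:
    """Reverse complement of an RNA string; non-ACGU symbols map to '*'
    so that windows containing them do not produce matches."""
    return "".join(COMPLEMENT.get(b, "*") for b in reversed(s))


def find_inversions(seq: str, stem_len: int, min_gap: int,
                    max_gap: int) -> List[Inversion]:
    """Find stems of length `stem_len` followed, after a gap of
    `min_gap`..`max_gap` nucleotides, by their exact reverse complement.

    Returns 0-based (stem_start, gap, partner_start) triples.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input too
    n = len(seq)
    hits: List[Inversion] = []
    # Last start that still leaves room for stem + minimum gap + partner stem.
    for i in range(n - 2 * stem_len - min_gap + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for gap in range(min_gap, max_gap + 1):
            j = i + stem_len + gap
            if j + stem_len > n:
                break  # partner stem would run past the end of the sequence
            if seq[j:j + stem_len] == target:
                hits.append((i, gap, j))
    return hits


def search_many(records: Iterable[Tuple[str, str]], stem_len: int,
                min_gap: int, max_gap: int) -> Dict[str, List[Inversion]]:
    """Map/collect pattern in one process: each (id, sequence) record is
    searched independently (the 'map'), then results are gathered by id."""
    mapped = map(
        lambda rec: (rec[0], find_inversions(rec[1], stem_len, min_gap, max_gap)),
        records,
    )
    return dict(mapped)
```

For example, `find_inversions("GGGAAAACCC", 3, 3, 5)` returns `[(0, 4, 7)]`: the stem `GGG` at position 0 pairs with `CCC` at position 7 across a 4-nucleotide gap. The brute-force scan costs roughly O(n · gap range · stem length); it is meant only to make the definition of an inversion concrete.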


Keywords:  Hadoop; Performance analysis; Prediction accuracy; Pseudoknots; RNA segmentation

Year:  2013        PMID: 26023357      PMCID: PMC4444072          DOI: 10.1109/IPDPSW.2013.109

Source DB:  PubMed          Journal:  IEEE Int Symp Parallel Distrib Process Workshops Phd Forum        ISSN: 2164-7062


Related article:

1.  Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce.

Authors:  Boyu Zhang; Daniel T Yehdego; Kyle L Johnson; Ming-Ying Leung; Michela Taufer
Journal:  BMC Struct Biol       Date:  2013-11-08