William H Majoros, Niel Lebeck, Uwe Ohler, Song Li.
Affiliations: Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA; Institute for Genome Sciences and Policy, Duke University, Durham, NC 27705, USA; Department of Computer Science, Duke University, Durham, NC 27708, USA; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA; Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin 13125, Germany; and Department of Biology, Humboldt University of Berlin, Berlin 10115, Germany.
Abstract
MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putative isoforms from deep sequencing data, these generally take as their substrate the collective alignment structure of RNA-seq reads and ignore the biological signals present in the actual nucleotide sequence. The majority of these solutions are graph-theoretic, relying on a splice graph representing the splicing patterns and exon expression levels indicated by the spliced-alignment process. RESULTS: We show how to augment splice graphs with additional information reflecting the biology of transcription, splicing and translation, to produce what we call an ORF (open reading frame) graph. We then show how ORF graphs can be used to produce isoform predictions with higher accuracy than current state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: RSVP is available as C++ source code under an open-source licence: http://ohlerlab.mdc-berlin.de/software/RSVP/.