BACKGROUND: Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms), depending on the individual, on the tissue the cell is in, or in response to some stimuli. RNA-Seq technology enables high-throughput isoform identification and quantification based on short reads, and various methods have been proposed for this non-trivial problem.
RESULTS: In this paper we propose a novel, radically different method based on minimum-cost network flows. This has a two-fold advantage: on the one hand, it reduces the problem to an established one in the field of network flows, which can be solved in polynomial time with various existing solvers; on the other hand, it is general enough to encompass many of the previous proposals under the least sum of squares model. Our method works as follows: to find the transcripts that best explain, under a given fitness model, a splicing graph resulting from an RNA-Seq experiment, we find a min-cost flow in an offset flow network, under an equivalent cost model. Under very weak assumptions on the fitness model, the optimal flow can be computed in polynomial time. Parsimoniously splitting the flow back into a few path transcripts can be done with any of the heuristics and approximations available from the theory of network flows. In the present implementation, we choose the simple strategy of repeatedly removing the heaviest path.
CONCLUSIONS: We proposed a new, very general method based on network flows for a multiassembly problem arising from isoform identification and quantification with RNA-Seq. Experimental results on prediction accuracy show that our method is very competitive with popular tools such as Cufflinks and IsoLasso. Our tool, called Traph (Transcripts in gRAPHs), is available at: http://www.cs.helsinki.fi/gsa/traph/.
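The strategy of repeatedly removing the heaviest path can be illustrated with a minimal sketch. It is not taken from the Traph source code and rests on several assumptions: the flow has already been computed and is given as a dictionary mapping edges to flow values, the splicing graph is a DAG with a single source and a single sink, and "heaviest" is read as the path whose smallest edge flow is largest. The function names `topological_order`, `widest_path`, and `decompose` are hypothetical.

```python
from collections import defaultdict


def topological_order(edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def widest_path(flow, source, sink, order):
    """Find the source-to-sink path whose smallest edge flow is largest.

    Assumption: this is what "heaviest path" means; other readings exist.
    Returns (path, width), or (None, 0) if no positive-flow path remains.
    """
    out = defaultdict(list)
    for (u, v), f in flow.items():
        if f > 0:  # ignore edges whose flow is already used up
            out[u].append((v, f))
    best = {source: float("inf")}  # best bottleneck found for each node
    parent = {}
    for u in order:
        if u not in best:
            continue
        for v, f in out[u]:
            width = min(best[u], f)
            if width > best.get(v, 0):
                best[v] = width
                parent[v] = u
    if sink not in best:
        return None, 0
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, best[sink]


def decompose(flow, source, sink):
    """Greedily split a flow into weighted paths (candidate transcripts)."""
    flow = dict(flow)  # work on a copy, leave the caller's flow intact
    order = topological_order(flow.keys())
    result = []
    while True:
        path, width = widest_path(flow, source, sink, order)
        if path is None:
            break
        for u, v in zip(path, path[1:]):
            flow[(u, v)] -= width  # remove the path's share of the flow
        result.append((path, width))
    return result


if __name__ == "__main__":
    # Toy flow of total value 8 on a hypothetical four-node graph.
    toy = {("s", "a"): 5, ("s", "b"): 3, ("a", "t"): 4,
           ("a", "b"): 1, ("b", "t"): 4}
    for path, weight in decompose(toy, "s", "t"):
        print(weight, "-".join(path))
```

Each round sets the flow of at least one edge to zero, so the loop ends after at most as many rounds as there are edges. The greedy choice is a heuristic: it does not guarantee the smallest possible number of paths.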