MOTIVATION: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.

RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.

AVAILABILITY: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM).

CONTACT: ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu

SUPPLEMENTARY INFORMATION: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.