Abhinav Nellore (1,2,3), Leonardo Collado-Torres (2,3,4), Andrew E Jaffe (2,3,4,5), José Alquicira-Hernández (2,6), Christopher Wilks (1,3), Jacob Pritt (1,3), James Morton (7), Jeffrey T Leek (2,3), Ben Langmead (1,2,3).
1. Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.
2. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
3. Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.
4. Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA.
5. Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
6. Undergraduate Program on Genomic Sciences, National Autonomous University of Mexico, 04510 Mexico City, D.F., Mexico.
7. Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
Abstract
MOTIVATION: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and extra work is required to obtain analysis products that incorporate data from across samples.
RESULTS: We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 h for US$0.91 per sample. Rail-RNA outputs alignments in SAM/BAM format, but it also outputs (i) base-level coverage bigWigs for each sample; (ii) coverage bigWigs encoding normalized mean and median coverages at each base across the samples analyzed; and (iii) exon-exon splice junctions and indels (features) in columnar formats that juxtapose coverages in the samples in which a given feature is found. These supplementary outputs are ready for use with downstream packages for reproducible statistical analysis. We use Rail-RNA to identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounding variables.
AVAILABILITY AND IMPLEMENTATION: Rail-RNA is open-source software available at http://rail.bio.
CONTACTS: anellore@gmail.com or langmea@cs.jhu.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
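To make output (ii) concrete, the sketch below shows one way a normalized mean and median coverage at each base could be computed across samples. It is an illustration only, not Rail-RNA's implementation: the abstract does not state the normalization method, so scaling each sample to a common total coverage is an assumption, and the function name `normalized_coverage_summary` and its `target_total` parameter are hypothetical.

```python
from statistics import mean, median


def normalized_coverage_summary(coverage, target_total=None):
    """Summarize per-base coverage across samples.

    coverage: dict mapping sample name -> list of per-base depths,
              all lists the same length (a toy stand-in for bigWig data).
    target_total: total coverage every sample is scaled to; defaults to
                  the mean total across samples (ASSUMED normalization).
    Returns (mean_per_base, median_per_base) as two lists.
    """
    lengths = {len(depths) for depths in coverage.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must cover the same number of bases")

    totals = {name: sum(depths) for name, depths in coverage.items()}
    if target_total is None:
        target_total = mean(totals.values())

    # Scale each sample so that its total coverage equals target_total.
    scaled = []
    for name, depths in coverage.items():
        factor = target_total / totals[name] if totals[name] else 0.0
        scaled.append([depth * factor for depth in depths])

    # Transpose to one tuple per base, then summarize across samples.
    per_base = list(zip(*scaled))
    return [mean(col) for col in per_base], [median(col) for col in per_base]


if __name__ == "__main__":
    toy = {"sample_a": [1, 1, 2], "sample_b": [2, 2, 4], "sample_c": [0, 4, 4]}
    mean_cov, median_cov = normalized_coverage_summary(toy, target_total=8)
    print(mean_cov, median_cov)
```

The median is less sensitive than the mean to a few samples with extreme depth at a base, which is one plausible reason to report both summaries.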