Marius A Wenzel1, Berndt Müller2, Jonathan Pettitt2. 1. School of Biological Sciences, University of Aberdeen, Zoology Building, Tillydrone Avenue, Aberdeen, AB24 2TZ, UK. marius.wenzel@abdn.ac.uk. 2. School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen, AB25 2ZD, UK.
Abstract
BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5' end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5' read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5' SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.
BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5' end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5' read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5' SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.
Authors: Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden Journal: BMC Bioinformatics Date: 2009-12-15 Impact factor: 3.169
Authors: Thomas Blumenthal; Donald Evans; Christopher D Link; Alessandro Guffanti; Daniel Lawson; Jean Thierry-Mieg; Danielle Thierry-Mieg; Wei Lu Chiu; Kyle Duke; Moni Kiraly; Stuart K Kim Journal: Nature Date: 2002-06-20 Impact factor: 49.962
Authors: Lincoln D Stein; Zhirong Bao; Darin Blasiar; Thomas Blumenthal; Michael R Brent; Nansheng Chen; Asif Chinwalla; Laura Clarke; Chris Clee; Avril Coghlan; Alan Coulson; Peter D'Eustachio; David H A Fitch; Lucinda A Fulton; Robert E Fulton; Sam Griffiths-Jones; Todd W Harris; LaDeana W Hillier; Ravi Kamath; Patricia E Kuwabara; Elaine R Mardis; Marco A Marra; Tracie L Miner; Patrick Minx; James C Mullikin; Robert W Plumb; Jane Rogers; Jacqueline E Schein; Marc Sohrmann; John Spieth; Jason E Stajich; C Wei; David Willey; Richard K Wilson; Richard Durbin; Robert H Waterston Journal: PLoS Biol Date: 2003-11-17 Impact factor: 8.029
Authors: Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908
Authors: George Cherian Pandarakalam; Michael Speake; Stuart McElroy; Ammar Alturkistani; Lucas Philippe; Jonathan Pettitt; Berndt Müller; Bernadette Connolly Journal: Int J Parasitol Drugs Drug Resist Date: 2019-04-05 Impact factor: 4.077
Authors: Peter D Olson; Alan Tracey; Andrew Baillie; Katherine James; Stephen R Doyle; Sarah K Buddenborg; Faye H Rodgers; Nancy Holroyd; Matt Berriman Journal: BMC Biol Date: 2020-11-09 Impact factor: 7.431
Authors: Jonathan Pettitt; Lucas Philippe; Debjani Sarkar; Christopher Johnston; Henrike Johanna Gothe; Diane Massie; Bernadette Connolly; Berndt Müller Journal: Genetics Date: 2014-06-14 Impact factor: 4.562