SLaP mapper: a webserver for identifying and quantifying spliced-leader addition and polyadenylation site usage in kinetoplastid genomes.

Michael Fiebig¹, Eva Gluenz¹, Mark Carrington², Steven Kelly³.

Abstract

The Kinetoplastida are a diverse and globally distributed class of free-living and parasitic single-celled eukaryotes that collectively impose a significant burden on human health and welfare. In kinetoplastids, individual genes do not have promoters; instead, all genes are arranged downstream of a small number of RNA polymerase II transcription initiation sites and are thus transcribed in polycistronic gene clusters. Production of individual mRNAs from this continuous transcript occurs co-transcriptionally by trans-splicing of a ∼39 nucleotide capped RNA and subsequent polyadenylation of the upstream mRNA. SLaP mapper (Spliced-Leader and Polyadenylation mapper) is a fully automated web-service for identification, quantitation and gene-assignment of both spliced-leader and polyadenylation addition sites in kinetoplastid genomes. SLaP mapper requires only raw read data from paired-end Illumina RNAseq and performs all read processing, mapping, quality control, quantification and analysis in a fully automated pipeline. To provide usage examples and estimates of the quantity of sequence data required, we use RNAseq obtained from two different library preparations from both Trypanosoma brucei and Leishmania mexicana to show the number of informative reads expected from each preparation type. SLaP mapper is an easy-to-use, platform-independent webserver that is freely available at http://www.stevekellylab.com/software/slap. Example files are provided on the website.

Keywords: Kinetoplastida; Leishmania; Polyadenylation; RNA splicing; RNA-Seq; Trypanosoma

Substances:
RNA, Spliced Leader

Year: 2014 PMID: 25111964 PMCID: PMC4222701 DOI: 10.1016/j.molbiopara.2014.07.012

Source DB: PubMed Journal: Mol Biochem Parasitol ISSN: 0166-6851 Impact factor: 1.759

Introduction

The Kinetoplastida are a diverse and globally distributed group of free-living and parasitic single-celled eukaryotes. In kinetoplastids, messenger RNAs are produced by co-transcriptional processing of continuously transcribed polycistronic gene clusters [1]. Co-transcriptional processing occurs via trans-splicing of a ∼39 nucleotide 5′-capped spliced-leader sequence and 3′ polyadenylation of the upstream gene [2-5]. Trans-splicing occurs predominantly at AG dinucleotides; however, no canonical nucleotide motif has been identified for polyadenylation sites. Moreover, the AAUAAA motif found at polyadenylation sites in most other eukaryotes [6] is not present in kinetoplastids [7]. Several tools have been developed that predict trans-splice acceptor and polyadenylation sites [8-10]; however, these tools do not predict relative site usage. Current sequencing technology makes it possible to determine these sites empirically on a genome-wide scale and to quantify the extent to which different trans-splice and polyadenylation sites are used. RNA-sequencing studies of Trypanosoma brucei and Leishmania major have already begun to reveal the large extent to which individual genes can harbour multiple trans-splice and polyadenylation sites [11,12]. To capitalise on this technological advance, and to enable widespread analysis of trans-splice and polyadenylation sites within the community, we have developed a fully automated web-service called SLaP mapper. This server requires only raw read data obtained by paired-end Illumina RNAseq, and uses these data to identify and quantify trans-splice acceptor and polyadenylation sites genome-wide.

Materials and methods

Differences in library construction

Typical libraries generated from random hexamer primed cDNA are suitable for identification of splice acceptor sites (Table 1). However, the extent to which polyadenylation sites are discovered depends on the library preparation protocol that is used (Table 1). To generate a library enriched for poly(A) containing reads it is recommended that the first strand cDNA synthesis reaction is primed with a 5′-T15VN-3′ oligonucleotide (V = A, G or C; N = T, A, G or C) [11], followed by second strand synthesis with random hexamer primers.
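As a small illustration of the primer notation, the degenerate 5′-T15VN-3′ oligonucleotide can be expanded into the concrete sequences it represents. Because V excludes T, the base immediately 3′ of the T run cannot pair with another A of the tail, which is the usual rationale for anchored oligo(dT) primers annealing at the junction between the transcript body and the poly(A) tail. The function name below is hypothetical and the sketch only enumerates sequences; it is not part of SLaP mapper.

```python
from itertools import product

# IUPAC degenerate codes as defined in the text: V = A, G or C; N = any base.
IUPAC = {"V": "ACG", "N": "ACGT"}


def expand_anchored_oligo_dt(t_length=15, anchor="VN"):
    """Return every concrete sequence encoded by 5'-T(t_length)-anchor-3'."""
    choices = [IUPAC.get(base, base) for base in anchor]
    return ["T" * t_length + "".join(combo) for combo in product(*choices)]


primers = expand_anchored_oligo_dt()
print(len(primers))  # 3 choices for V times 4 choices for N
for primer in primers:
    print(primer)
```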

Table 1

The number of reads observed using different library preparation methods in two different species. Values for L. mexicana are based on three independent biological replicates and values for T. brucei on two independent biological replicates. Numbers in brackets indicate one standard deviation.

Species	Library type	Poly(A) reads per million reads	Trans-splice reads per million reads
L. mexicana	T15VN	34 784 (2100)	4343 (300)
L. mexicana	Random primed	5673 (730)	79 773 (10 000)
T. brucei	T15VN	167 864 (8000)	4782 (250)
T. brucei	Random primed	701 (50)	76 563 (4800)
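The per-million values in Table 1 can be scaled to a planned sequencing depth to estimate how many informative reads a run might yield. The sketch below assumes simple linear scaling of the mean values and ignores the standard deviations and any reads lost during mapping or filtering; the function and dictionary names are hypothetical.

```python
# Mean informative reads per million reads, copied from Table 1.
TABLE_1 = {
    ("L. mexicana", "T15VN"): {"polyA": 34784, "trans_splice": 4343},
    ("L. mexicana", "Random primed"): {"polyA": 5673, "trans_splice": 79773},
    ("T. brucei", "T15VN"): {"polyA": 167864, "trans_splice": 4782},
    ("T. brucei", "Random primed"): {"polyA": 701, "trans_splice": 76563},
}


def expected_informative_reads(species, library, total_reads):
    """Scale the Table 1 means linearly to `total_reads` (an assumption)."""
    rates = TABLE_1[(species, library)]
    scale = total_reads / 1_000_000
    return {kind: round(per_million * scale) for kind, per_million in rates.items()}


# Ten million reads from a random-primed T. brucei library.
print(expected_informative_reads("T. brucei", "Random primed", 10_000_000))
```

Under these assumptions a random-primed library is rich in trans-splice reads but poor in poly(A) reads, and the reverse holds for a T15VN-primed library, so the choice of preparation depends on which site type is of interest.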

Algorithm overview

SLaP mapper uses pre-built indices for the currently available kinetoplastid genomes. The user must supply raw sequence reads in gzipped fastq format (the phred encoding offset is detected automatically). Due to FTP limitations, SLaP mapper can only accept individual files smaller than 2 GB. Read files larger than 2 GB can be analysed by splitting them into pieces, each smaller than 2 GB, and then combining the results files. Once read files have been uploaded, reads containing a putative poly(A) tail or spliced-leader sequence are identified, the spliced-leader or poly(A) tail is removed from the read, and the rest of the read is mapped to the user-specified genome. Each putative splice-acceptor site is checked to confirm that the genomic location does not contain the spliced-leader sequence, and bona fide splice-acceptor sites are assigned to their nearest directionally appropriate coding sequence (CDS). Similarly, putative poly(A) addition sites are checked to confirm that the genome does not encode a run of A residues at that location, and bona fide sites are assigned to their nearest directionally appropriate CDS. When the analysis is complete, the results are emailed to the user in tab-delimited text, BED and GFF file formats so that they can be viewed in spreadsheet editors or in commonly used genome browsers such as the Integrative Genomics Viewer [13] (Fig. 1A). The results files contain the position of the observed site, its dinucleotide (for trans-splice sites only), its strand, its occurrence (i.e. the number of mapped reads) and the nearest directionally appropriate CDS (Fig. 1B). A summary of the mapping and filtration processes, together with the settings used to perform these steps, is also provided in the results package that is emailed to the user.
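For read files above the 2 GB limit, one possible way to prepare the pieces is sketched below. It is not part of SLaP mapper: the function name and output naming scheme are hypothetical, and it assumes standard four-line fastq records. Pieces are cut by record count rather than by byte size, so the count has to be chosen such that each compressed piece stays under 2 GB.

```python
import gzip


def split_fastq_gz(path, out_prefix, records_per_chunk):
    """Split a gzipped fastq file into gzipped pieces of whole records.

    Assumes four lines per record. Returns the list of files written.
    """
    lines_per_chunk = 4 * records_per_chunk
    out_paths = []
    out = None
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Start a new piece at every chunk boundary.
            if line_number % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                out_path = f"{out_prefix}.part{len(out_paths) + 1:03d}.fastq.gz"
                out = gzip.open(out_path, "wt")
                out_paths.append(out_path)
            out.write(line)
    if out is not None:
        out.close()
    return out_paths


# Example call (hypothetical file names):
# split_fastq_gz("sample_R1.fastq.gz", "sample_R1", records_per_chunk=10_000_000)
```

For paired-end data, both files of a pair would need to be split with the same `records_per_chunk` so that piece n of the first file still matches piece n of the second. When the results files are combined afterwards, occurrence counts for the same site presumably need to be summed across pieces.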

Fig. 1

(A) Screenshot of SLaP mapper results as visualised in the IGV genome browser. Four data tracks are shown. CDS are the gene models from V6 of the L. mexicana genome. Coverage is derived from raw RNAseq reads mapped to the L. mexicana V6 genome (the coloured lines in the coverage plot indicate single nucleotide polymorphisms between the genome reference and the strain used for RNAseq). SAS are the splice acceptor addition sites identified by SLaP mapper. PAS are the polyadenylation addition sites identified by SLaP mapper. (B) The corresponding entries in the SLaP mapper results file for all SAS sites shown in A. The poly(A) results for the 55 sites shown in part A are not listed for space reasons. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

SLaP mapper is an analysis pipeline that uses a number of freely available programs and custom-written Perl scripts. The algorithm proceeds in five phases.

1. Read quality control. This step uses the Trimmomatic read processing tool [14] to remove known Illumina adaptor sequences and to trim reads based on quality scores. At this step read-pairs are also assessed for overlapping segments, and reads that overlap in the centre are joined using the fastq-join utility [15].

2. Identification, preparation and mapping of spliced-leader and poly(A) containing reads. Here all reads are treated individually and scanned for the presence of poly(A) tails or the appropriate species-specific spliced-leader sequence. A spliced-leader containing read is defined as a read containing at least 12 nucleotides of the 3′ end of the spliced-leader sequence; this minimum length can be changed by the user. A poly(A) containing read is defined as a read that ends in 5 or more A residues (reads are treated as un-stranded and scanned on both strands). The minimum poly(A) length can also be specified by the user. Spliced-leader reads are split at the splice junction and the non-spliced-leader part of the read is mapped to the selected reference genome. Similarly, poly(A) reads are split at the run of A residues and the non-poly(A) part of the read is mapped to the selected reference genome. Read mapping is performed using bowtie2 [16].

3. Filtering of putative splice-acceptor and poly(A) reads. Mapped putative splice-junction reads are checked to ensure that the location in the genome does not encode the 12 bases of the spliced leader. Similarly, mapped putative poly(A) tail reads are checked to ensure that the location in the genome does not contain a run of A residues analogous to that present in the read. Only bona fide splice-junction and polyadenylation addition sites are retained for further analysis. The option to disable this poly(A) site filter is provided on the webserver.

4. Assigning sites to genes. Once reads have been mapped, the location of each site is recorded and sites are assigned to CDS according to the following rules. Trans-splice sites are assigned to a CDS if they occur on the same strand as the CDS, downstream of the stop codon of the preceding gene and upstream of the stop codon of the gene in question. Polyadenylation sites are assigned to a CDS if they lie on the same strand as the CDS, downstream of the start codon of the CDS and upstream of the start codon of the next downstream CDS. Spliced-leader addition sites and poly(A) sites that occur within a CDS are assigned to the CDS in which they reside. It should be noted that trypanosomatid genomes contain a number of stable transcripts lacking CDSs that occur between true mRNAs. It is therefore possible that some sites belonging to non-coding transcripts are incorrectly assigned to CDSs by this method. For this reason a separate file is also provided that lists the identified sites without assigning them to CDS.

5. Quantification. Sites are quantified as the number of reads that map uniquely to each site location.
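The read identification and site assignment rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SLaP mapper implementation (which uses Perl scripts and bowtie2): all function names are hypothetical, the spliced-leader sequence shown is the 39-nt T. brucei sequence as commonly quoted and should be verified before use, scanning both orientations for the spliced leader is an assumption, and CDS records are represented as hypothetical (name, start, end, strand) tuples. The mapping and genome-filtering steps are omitted.

```python
import re

# Assumption: the 39-nt T. brucei spliced leader as commonly quoted in the
# literature; substitute and verify the sequence for your own species.
SL_TBRUCEI = "AACTAACGCTATTATTAGAACAGTTTCTGTACTATATTG"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def trim_spliced_leader(read, spliced_leader, min_sl=12):
    """Return the sequence downstream of the spliced leader, or None.

    A read qualifies if it contains the last `min_sl` nucleotides of the
    spliced leader; both orientations are tried (an assumption here).
    """
    tag = spliced_leader[-min_sl:]
    for oriented in (read, reverse_complement(read)):
        hit = oriented.find(tag)
        if hit != -1 and oriented[hit + min_sl:]:
            return oriented[hit + min_sl:]
    return None


def trim_poly_a(read, min_a=5):
    """Return the read without its terminal poly(A) run, or None.

    Reads are unstranded, so a leading poly(T) run is handled by checking
    the reverse complement as well.
    """
    pattern = re.compile(r"A{%d,}$" % min_a)
    for oriented in (read, reverse_complement(read)):
        match = pattern.search(oriented)
        if match and match.start() > 0:
            return oriented[:match.start()]
    return None


def assign_site(position, strand, kind, cds_list):
    """Assign a site to a same-strand CDS, or return None.

    kind is "SAS" (trans-splice site) or "PAS" (poly(A) site). cds_list holds
    hypothetical (name, start, end, strand) tuples with start <= end.
    """
    same_strand = [cds for cds in cds_list if cds[3] == strand]
    # The deciding CDS boundary is the stop codon for SAS and the start codon
    # for PAS. That boundary is `end` for SAS on "+" and PAS on "-", and
    # `start` for the other two combinations.
    if (kind == "SAS") == (strand == "+"):
        candidates = [cds for cds in same_strand if cds[2] >= position]
        return min(candidates, key=lambda cds: cds[2])[0] if candidates else None
    candidates = [cds for cds in same_strand if cds[1] <= position]
    return max(candidates, key=lambda cds: cds[1])[0] if candidates else None


genes = [("geneA", 100, 400, "+"), ("geneB", 700, 1000, "+")]
print(assign_site(650, "+", "SAS", genes))  # geneB: upstream of its stop codon
print(assign_site(450, "+", "PAS", genes))  # geneA: downstream of its start codon
```

Overlapping genes on the same strand are not handled specially in this sketch; the nearest qualifying boundary simply wins.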

Discussion and conclusions

SLaP mapper is a simple-to-use resource that enables users to identify and quantify trans-splice and polyadenylation sites in kinetoplastid genomes. It is the only software of its kind, and it requires only a web-browser and no specialised knowledge of any programming environment. The user need only select the appropriate species and upload unprocessed read files. We describe the expected number of informative reads per million reads obtained using two different library preparation protocols in two different species (Table 1). This shows that relatively little sequence data (<10 million reads) is required to provide a comprehensive genome-wide analysis of site usage. Recent RNA-sequencing studies of T. brucei and L. major have revealed that many genes can harbour multiple alternative processing sites [11,12]. While the functional significance of these sites has yet to be determined on a genome-wide scale, it is likely that some of these alternative sites are important to the regulation and/or function of the final transcript. For example, alternative use of two different spliced-leader addition sites in T. brucei facilitates the dual localisation of an isoleucyl-tRNA synthetase [17]. In this case the alternative processing sites either include or exclude a mitochondrial localisation signal at the N-terminus of the final polypeptide. SLaP mapper can readily be used to detect such alternative processing sites for transcripts (for example, see Fig. 1). In addition to providing a resource that will facilitate the annotation of novel kinetoplastid genomes, this server can also be used to quantify differences in splice-acceptor and polyadenylation site usage across a range of species. This is useful for comparative gene expression studies and for the analysis of post-transcriptional processing of kinetoplastid mRNA. Future releases of SLaP mapper will include more kinetoplastid genomes as they become available.

16 in total

Review 1. How the messenger got its tail: addition of poly(A) in the nucleus.

Authors: M Wickens
Journal: Trends Biochem Sci Date: 1990-07 Impact factor: 13.807

2. Temporal order of RNA-processing reactions in trypanosomes: rapid trans splicing precedes polyadenylation of newly synthesized tubulin transcripts.

Authors: E Ullu; K R Matthews; C Tschudi
Journal: Mol Cell Biol Date: 1993-01 Impact factor: 4.272

3. Coupling of poly(A) site selection and trans-splicing in Leishmania.

Authors: J H LeBowitz; H Q Smith; L Rusche; S M Beverley
Journal: Genes Dev Date: 1993-06 Impact factor: 11.361

4. Accurate polyadenylation of procyclin mRNAs in Trypanosoma brucei is determined by pyrimidine-rich elements in the intergenic regions.

Authors: N Schürch; A Hehl; E Vassella; R Braun; I Roditi
Journal: Mol Cell Biol Date: 1994-06 Impact factor: 4.272

5. The genome of the kinetoplastid parasite, Leishmania major.

Authors: Alasdair C Ivens; Christopher S Peacock; Elizabeth A Worthey; Lee Murphy; Gautam Aggarwal; Matthew Berriman; Ellen Sisk; Marie-Adele Rajandream; Ellen Adlem; Rita Aert; Atashi Anupama; Zina Apostolou; Philip Attipoe; Nathalie Bason; Christopher Bauser; Alfred Beck; Stephen M Beverley; Gabriella Bianchettin; Katja Borzym; Gordana Bothe; Carlo V Bruschi; Matt Collins; Eithon Cadag; Laura Ciarloni; Christine Clayton; Richard M R Coulson; Ann Cronin; Angela K Cruz; Robert M Davies; Javier De Gaudenzi; Deborah E Dobson; Andreas Duesterhoeft; Gholam Fazelina; Nigel Fosker; Alberto Carlos Frasch; Audrey Fraser; Monika Fuchs; Claudia Gabel; Arlette Goble; André Goffeau; David Harris; Christiane Hertz-Fowler; Helmut Hilbert; David Horn; Yiting Huang; Sven Klages; Andrew Knights; Michael Kube; Natasha Larke; Lyudmila Litvin; Angela Lord; Tin Louie; Marco Marra; David Masuy; Keith Matthews; Shulamit Michaeli; Jeremy C Mottram; Silke Müller-Auer; Heather Munden; Siri Nelson; Halina Norbertczak; Karen Oliver; Susan O'neil; Martin Pentony; Thomas M Pohl; Claire Price; Bénédicte Purnelle; Michael A Quail; Ester Rabbinowitsch; Richard Reinhardt; Michael Rieger; Joel Rinta; Johan Robben; Laura Robertson; Jeronimo C Ruiz; Simon Rutter; David Saunders; Melanie Schäfer; Jacquie Schein; David C Schwartz; Kathy Seeger; Amber Seyler; Sarah Sharp; Heesun Shin; Dhileep Sivam; Rob Squares; Steve Squares; Valentina Tosato; Christy Vogt; Guido Volckaert; Rolf Wambutt; Tim Warren; Holger Wedler; John Woodward; Shiguo Zhou; Wolfgang Zimmermann; Deborah F Smith; Jenefer M Blackwell; Kenneth D Stuart; Bart Barrell; Peter J Myler
Journal: Science Date: 2005-07-15 Impact factor: 47.728

6. Systematic study of sequence motifs for RNA trans splicing in Trypanosoma brucei.

Authors: T Nicolai Siegel; Kevin S W Tan; George A M Cross
Journal: Mol Cell Biol Date: 2005-11 Impact factor: 4.272

7. The transcriptome of the human pathogen Trypanosoma brucei at single-nucleotide resolution.

Authors: Nikolay G Kolev; Joseph B Franklin; Shai Carmi; Huafang Shi; Shulamit Michaeli; Christian Tschudi
Journal: PLoS Pathog Date: 2010-09-09 Impact factor: 6.823

8. A computational investigation of kinetoplastid trans-splicing.

Authors: Shuba Gopal; Saria Awadalla; Terry Gaasterland; George A M Cross
Journal: Genome Biol Date: 2005-10-17 Impact factor: 13.583

9. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937

10. Trypanosome mRNAs share a common 5' spliced leader sequence.

Authors: M Parsons; R G Nelson; K P Watkins; N Agabian
Journal: Cell Date: 1984-08 Impact factor: 41.582
