
SL-quant: a fast and flexible pipeline to quantify spliced leader trans-splicing events from RNA-seq data.

Abstract

Background: The spliceosomal transfer of a short spliced leader (SL) RNA to an independent pre-mRNA molecule is called SL trans-splicing and is widespread in the nematode Caenorhabditis elegans. While RNA-sequencing (RNA-seq) data contain information on such events, properly documented methods to extract them are lacking. Findings: To address this, we developed SL-quant, a fast and flexible pipeline that adapts to paired-end and single-end RNA-seq data and accurately quantifies SL trans-splicing events. It is designed to work downstream of read mapping and uses the reads left unmapped as primary input. Briefly, the SL sequences are identified with high specificity and are trimmed from the input reads, which are then remapped on the reference genome and quantified at the nucleotide position level (SL trans-splice sites) or at the gene level. Conclusions: SL-quant completes within 10 minutes on a basic desktop computer for typical C. elegans RNA-seq datasets and can be applied to other species as well. Validating the method, the SL trans-splice sites identified display the expected consensus sequence, and the results of the gene-level quantification are predictive of the gene position within operons. We also compared SL-quant to a recently published SL-containing read identification strategy that was found to be more sensitive but less specific than SL-quant. Both methods are implemented as a bash script available under the MIT license [1]. Full instructions for its installation, usage, and adaptation to other organisms are provided.

Year: 2018 PMID: 30010768 PMCID: PMC6055573 DOI: 10.1093/gigascience/giy084

Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524

Background

The capping, splicing, and polyadenylation of eukaryotic pre-mRNAs are well-studied maturation processes that are essential for proper gene expression [2]. Much less is known about spliced leader (SL) trans-splicing, a process by which a capped small nuclear RNA called the spliced leader is spliced onto the 5’ end of a pre-mRNA molecule, substituting for canonical capping [3] (Fig. 1A). SL trans-splicing has a patchy phylogenetic distribution ranging from protists [4] to bilaterian metazoans, including nematodes, rotifers [5], and even chordates [6]. It does not appear to be conserved in mammals, although “non-SL” trans-splicing events, in which exons from two different RNA transcripts are spliced together, have been detected at low frequency [7]. In contrast, SL trans-splicing is widespread in the nematode Caenorhabditis elegans, where there are two classes of SL, SL1 and SL2, which trans-splice about 70% of the mRNA transcripts. Strikingly, SL2 trans-splicing is highly specific for genes in the second and subsequent positions within operons, which comprise two to eight genes expressed from a single promoter [8].

Figure 1:

Trans-splicing and RNA-seq. (A) The trans-splicing process. Spliced leader RNA precursors (SL RNA) are small nuclear RNAs capped with a trimethyl-guanosine (TMG). The 5’ region of the SL RNA, including the TMG cap, is spliced onto the first exon of the pre-mRNAs. (B) Reads originating from trans-spliced RNA fragments do not map end-to-end to the reference genome. (C) The left-most read (R2) of a read pair does not map end-to-end to the reference. (D) Special case in which the paired-end reads “dovetail” and neither read maps end-to-end to the reference due to the SL sequence.

While the function of SL trans-splicing is beginning to be elucidated [9], its regulation remains unclear. To study this question, two main strategies have been proposed to exploit RNA-sequencing (RNA-seq) data in order to quantify SL trans-splicing. The first one involves mapping the reads to a complex database containing all the possible trans-spliced gene models [10, 11]. The creation of such a database requires the in silico trans-splicing of every SL sequence isoform (12 in C. elegans) to all the putative trans-splice sites predicted for a gene. In contrast, the second strategy does not rely on trans-splice site annotation or prediction. Instead, the SL sequences are directly identified in reads partially mapped to the genome or transcriptome [12-14]. However, no implementation of these methods is directly available, which prompted us to develop, test, and optimize SL-quant, a ready-to-use pipeline that applies the second strategy to rapidly quantify SL trans-splicing events from RNA-seq data.

Pipeline overview

In order to search for SL sequences in a limited number of reads, only unmapped reads are used as input for SL-quant, assuming that reads containing the SL sequence (or the 3’ end of it) would not map on the reference genome or transcriptome (Fig. 1B). This implies that a first round of mapping must precede the use of SL-quant. It must be performed end-to-end in order to guarantee that reads originating from trans-spliced RNA fragments do not map. Apart from this requirement, any bam file containing unmapped reads can be fed into SL-quant, making it particularly well suited for subsequent analyses of previously generated data. When paired-end reads are available, only the unmapped reads originating from the left-most ends of the fragments are considered. In addition, we developed an optimized paired-end mode (-p --paired option) that further limits the search for SL-containing reads by filtering out the unmapped reads whose mates are also unmapped. This assumes that only the left-most read of a pair originating from a trans-spliced fragment would fail to map due to the SL sequence, while the other one would map (Fig. 1C). This is generally true unless the fragment is so small that the mates significantly overlap with each other (Fig. 1D).

To identify SL trans-splicing events, the input reads are aligned locally to the SL sequences with Basic Local Alignment Search Tool (BLAST) [15]. Reads whose 5’ end belongs to a significant alignment (e-value <5%) that covers the 3’ end of the SL sequence (Fig. 2A, left panel) are considered SL-containing reads. Then, the SL-containing reads are trimmed of the SL sequence (based on the length of the BLAST alignment) and mapped back on the C. elegans genome with HISAT2 [16]. Finally, the remapped reads are counted at the gene level with featureCounts [17] to obtain a quantification of the SL1 and SL2 trans-splicing events per gene.
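The selection and trimming rule described above can be illustrated with a minimal Python sketch. It is not the pipeline's own code (SL-quant is a bash script): the `Hit` record, the function names, and the reading of "e-value <5%" as a 0.05 cutoff are assumptions made for illustration, and the SL1 sequence shown is the commonly cited one and should be checked against the FASTA file shipped with the pipeline.

```python
from typing import NamedTuple

# Commonly cited C. elegans SL1 sequence (22 nt); used here for illustration only.
SL1 = "GGTTTAATTACCCAAGTTTGAG"


class Hit(NamedTuple):
    """Hypothetical subset of BLAST tabular fields (1-based, inclusive)."""
    qstart: int    # first aligned position on the read
    qend: int      # last aligned position on the read
    send: int      # last aligned position on the SL sequence
    evalue: float


def is_sl_containing(hit: Hit, sl_len: int, max_evalue: float = 0.05) -> bool:
    """True if the hit is significant, starts at the read's 5' end,
    and reaches the 3' end of the SL sequence."""
    return hit.evalue < max_evalue and hit.qstart == 1 and hit.send == sl_len


def trim_sl(read: str, hit: Hit) -> str:
    """Drop the aligned SL portion; the remainder is what gets remapped."""
    return read[hit.qend:]


# Toy read: last 10 nt of SL1 followed by an arbitrary transcript sequence.
read = SL1[-10:] + "ATGGCTAGCTAGGATCCA"
hit = Hit(qstart=1, qend=10, send=len(SL1), evalue=1e-3)
print(is_sl_containing(hit, len(SL1)), trim_sl(read, hit))
```

An alignment that starts inside the read (for example `qstart=5`) would be rejected by the same test, which is the property later used to estimate specificity.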

Figure 2:

Configuration of the BLAST alignments. (A) In SL-quant, the BLAST alignments are considered properly configured if they start from the 5’ end of the unmapped read and end at the 3’ end of the SL sequence. (B) Proportion of properly configured alignments among the significant alignments identified by SL-quant in single and paired-end (-p) mode on the SRR1585277 dataset, or on 10⁶ random reads in single-end mode. (C) Number of properly configured significant alignments found by SL-quant on the SRR1585277 dataset (single-end mode) by alignment length on the SL1 or SL2 sequences.

SL-containing reads identification

We tested SL-quant on the single-end modENCODE_4594 [18] dataset (2.5 × 10⁶ unmapped reads) and the paired-end SRR1585277 [19] dataset (1.3 × 10⁶ unmapped left reads) using a desktop computer with basic specifications. Every run was completed within 10 minutes using four threads, with a processing rate of about 10⁶ unmapped reads per 5 minutes. In order to assess the specificity of the BLAST alignments, we reasoned that reads originating from a trans-spliced RNA would align to the 3’ end of the SL sequence from their 5’ end, while random alignments would start anywhere (Fig. 2A). The fact that 94% of significant alignments were in that specific configuration indicates good specificity (Table 1 and Fig. 2B). In contrast, we obtained less than 0.3% with randomly generated reads. In paired-end mode, fewer alignments were found, but a slightly higher proportion of them (95%) were in proper configuration and considered SL-containing reads. This was expected given the more stringent prefiltering implemented in that mode. When considering only the nonsignificant alignments, we obtained intermediate proportions of proper configuration (15%–20%), suggesting that most, but not all, of those nonsignificant alignments were spurious.

Table 1:

Identification of SL-containing reads by SL-quant

				Significant alignments		Nonsignificant alignments
Dataset	Method	Total reads	Input reads	Total	Properly configured	Total	Properly configured
SRR1585277	SL-quant	40 × 10⁶	1.3 × 10⁶	71,512	67,021 (94%)	70,211	10,359 (15%)
	SL-quant -p	40 × 10⁶	0.9 × 10⁶	67,463	64,010 (95%)	47,596	9,849 (21%)
modENCODE_4594	SL-quant	30 × 10⁶	2.5 × 10⁶	168,351	158,529 (94%)	100,139	20,417 (20%)
random	SL-quant	1 × 10⁶	1.0 × 10⁶	12,788	36 (0.3%)	43,501	83 (0.2%)

SL-containing reads are defined as reads with significant and properly configured alignment to the SL sequences (sixth column).
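As a quick check of the arithmetic, the proportions of properly configured significant alignments can be recomputed from the counts in Table 1. The short Python sketch below only restates the table; the dictionary keys are labels chosen here for readability.

```python
# (total significant alignments, properly configured ones), copied from Table 1
table1 = {
    "SRR1585277 SL-quant": (71512, 67021),
    "SRR1585277 SL-quant -p": (67463, 64010),
    "modENCODE_4594 SL-quant": (168351, 158529),
    "random SL-quant": (12788, 36),
}


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


proportions = {name: percent(ok, total) for name, (total, ok) in table1.items()}
for name, value in proportions.items():
    print(f"{name}: {value:.1f}%")
```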

Despite the C. elegans SL sequences being 22 nucleotides (nt) long, most alignments cover only 10–11 nt of them (Fig. 2C), with a preference for 10-nt alignments for SL1-containing reads and 11-nt alignments for SL2-containing reads. This could be caused by reverse transcriptase drop-off during the library preparation due to secondary structure and the proximity of the hypermethylated cap at the 5’ end of the SL. Moreover, in classic RNA-seq library preparation protocols, the second-strand synthesis is primed by RNA oligonucleotides generated by the digestion of the RNA-DNA duplex obtained after the first-strand synthesis. This results in truncated dsDNA fragments that do not preserve the 5’ end of the original RNA fragments [20].

SL trans-splice sites identification

While we designed SL-quant with the idea of quantifying SL trans-splicing events by gene, it is also possible to use it to identify the 3’ trans-splice sites at single-nucleotide resolution. SL trans-splice sites are known to display the same UUUCAG consensus as cis-splice sites [21], which we could verify with our method (Fig. 3A, 3B). Previous work described a significant switch from A to G after the consensus sequence (position +1) for the SL1 trans-splice sites compared to SL2 trans-splice sites [21]. At that position, we observed a decreased preference for A for the SL1 trans-splice sites, but no significant enrichment in G. This discrepancy could be due to the fact that we identified (and included in the consensus) about 20 times more SL1 trans-splice sites than previously reported.

Figure 3:

SL-sites consensus sequence. (A) Sequence logo of the sequence environment surrounding SL1 or (B) SL2 trans-splice sites determined by SL-quant on the SRR1585277 dataset in single-end mode. (C) Proportion of AG sequences in SL trans-splice sites identified on the SRR1585277 dataset with the method used in Tourasse et al. 2017 [14] and with SL-quant in single-end mode with or without the sensitive option (-s).

As SL trans-splice sites (and splice sites in general) contain an almost invariant AG sequence, we reasoned that non-AG splice sites were potential “spurious” trans-splice sites. In order to assess the performance of our method, we considered identified sites bearing the “AG” consensus as true positives (TPs). Conversely, we considered any other identified site a false positive (FP), although we cannot completely exclude the existence of nonconsensus splice sites. These reasonable approximations allow us to characterize our method despite not knowing the ground truth. Indicating excellent specificity (ability to exclude FPs), 98% of the sites identified by SL-quant display the AG consensus, regardless of the mode used (single or paired) and the dataset studied (Table 2).
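The AG-based proxy for true and false positives can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline's implementation: the genome is a plain dictionary of sequences, a site is given as the 0-based coordinate of the first nucleotide downstream of the SL (the 5’ end of the trimmed, remapped read), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def upstream_dinucleotide(genome: dict, chrom: str, pos: int, strand: str) -> str:
    """Return the 2 nt preceding a site, read in transcript orientation.

    `pos` is assumed to be at least 2 nt away from either end of the sequence.
    """
    seq = genome[chrom]
    if strand == "+":
        return seq[pos - 2:pos]
    # Minus strand: upstream bases lie to the right on the reference,
    # so take them and reverse-complement.
    return seq[pos + 1:pos + 3].translate(COMPLEMENT)[::-1]


def ag_proportion(genome: dict, sites: list) -> float:
    """Fraction of sites preceded by AG (counted as true positives in the text)."""
    n_ag = sum(upstream_dinucleotide(genome, *site) == "AG" for site in sites)
    return n_ag / len(sites)


# Toy example: two AG sites (one per strand) and one non-AG site.
genome = {"toy": "TTTCAGATGGCCATCTGAAA"}
sites = [("toy", 6, "+"), ("toy", 13, "-"), ("toy", 9, "+")]
print(ag_proportion(genome, sites))
```

Under this proxy, the share of non-AG sites is an upper bound on the false-positive share only if genuine nonconsensus sites are negligible, which is the caveat stated in the text.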

Table 2:

Performance of SL-quant with various parameters.

Dataset	Method	Run time	Mapped SL-containing reads	Trans-splice sites	Site is “AG” consensus (%)
SRR1585277	SL-quant	4 minutes 02 seconds	65,126	6,301	6,149 (98)
	SL-quant -p	5 minutes 14 seconds	61,451	6,539	6,402 (98)
	SL-quant -s	2 minutes 45 seconds	120,542	8,770	8,254 (94)
	SL-quant -s -p	6 minutes 58 seconds	114,948	8,436	7,957 (94)
	Tourasse	4 minutes 45 seconds	120,710	8,932	8,260 (92)
modENCODE_4594	SL-quant	9 minutes 51 seconds	146,358	8,247	8,081 (98)
	SL-quant -s	3 minutes 10 seconds	258,706	10,735	9,948 (93)
	Tourasse	5 minutes 08 seconds	259,284	11,155	9,953 (89)
random	SL-quant	3 minutes 20 seconds	53	52	34 (65)
	SL-quant -s	1 minute 23 seconds	5,757	5,692	5,612 (99ᵃ)
	Tourasse	2 minutes 24 seconds	8,890	8,777	5,612 (64)

ᵃ The very high proportion of “AG” sites for the random dataset is an artifact caused by the fact that the reads were generated by randomly sampling the genome and that all the C. elegans SL sequences end with AG. -p: paired-end mode; -s: sensitive mode.


Comparison with a previous method

We also compared our method with a re-implementation of the SL-containing read identification strategy previously reported [14]. Briefly, the unmapped reads whose 5’ end aligns to the SL sequences (or their reverse complement) on at least 5 nt with at most 10% mismatch are considered SL-containing reads. The alignment is performed with cutadapt [22], which directly trims the SL sequences from the unmapped reads so they can be remapped to the genome. Compared to SL-quant, this conceptually similar method was faster and identified almost twice the number of SL-containing reads from the real datasets and 150 times the number of SL-containing reads from random reads (Table 2). More splice sites were identified, but the proportion of spurious (nonconsensus) trans-splice sites increased almost 5-fold (Fig. 3C). The method developed in [14] has a higher detection power but appears less specific than SL-quant. Nevertheless, we consider it an interesting option for applications requiring more sensitivity (ability to detect TPs) than specificity. Therefore, we decided to re-implement it within SL-quant as an [-s --sensitive] option with the following enhancements: (i) the input reads, if strand specific, are aligned to the SL sequences only (not their reverse complement); (ii) with paired-end data in single-end mode, only the left-most unmapped reads are considered as input; and (iii) with paired-end data in paired-end mode, only the left-most unmapped reads whose mates are mapped are considered as input. These modifications significantly improved the specificity of the method (although not to the level of SL-quant), with almost no compromise on sensitivity regarding SL trans-splice site detection (Fig. 3C) or SL-containing read identification (Table 2).
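The matching criterion of the sensitive strategy (at least 5 nt of overlap at the read's 5’ end, at most 10% mismatch) can be approximated with the Python sketch below. It is a simplification and not cutadapt's algorithm: only mismatches are counted (no insertions or deletions), the allowed number of mismatches is taken as the error rate times the overlap length rounded down, and the function name and defaults are hypothetical. The SL1 sequence is the commonly cited one and is used for illustration only.

```python
# Commonly cited C. elegans SL1 sequence (22 nt); illustration only.
SL1 = "GGTTTAATTACCCAAGTTTGAG"


def sl_overlap(read: str, sl: str, min_overlap: int = 5, max_error: float = 0.10) -> int:
    """Length of the longest 3' suffix of `sl` that matches the 5' prefix of `read`.

    Mismatches only (no indels). Returns 0 when no overlap of at least
    `min_overlap` nt satisfies the error-rate limit.
    """
    for k in range(min(len(read), len(sl)), min_overlap - 1, -1):
        suffix, prefix = sl[-k:], read[:k]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= int(max_error * k):
            return k
    return 0


read = SL1[-10:] + "ATGGCTAGC"      # 10 nt of SL1 followed by transcript sequence
k = sl_overlap(read, SL1)
trimmed = read[k:]                   # sequence that would be remapped
print(k, trimmed)
```

With overlaps shorter than 10 nt, the 10% limit rounds down to zero mismatches, so a 5-nt exact match is enough to call a read SL-containing. This helps explain why the strategy recovers many more reads from random input than the BLAST-based criterion.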

Gene-level quantification

Finally, we tested SL-quant for its ability to predict gene position within operons, as SL2 trans-splicing is the best predictor of transcription initiated upstream of another gene [11] (Fig. 4A). Using the ratio of SL2/(SL1 + SL2) from the SL-quant output as a predictor of gene positions in operons, receiver operating characteristic curve analysis reveals a high TP rate (>90%) at a 5% false discovery rate threshold, regardless of SL-quant options (Fig. 4B). However, when tolerating more FPs, SL-quant in sensitive mode is a superior predictor.
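To make the predictor concrete, the Python sketch below computes the SL2/(SL1 + SL2) ratio per gene and one point of the receiver operating characteristic curve for a chosen threshold. The gene counts are invented toy values, and the function names and the tuple layout are assumptions for illustration; the published analysis was carried out in R.

```python
def sl2_ratio(sl1: int, sl2: int) -> float:
    """SL2 / (SL1 + SL2) for a gene with at least one trans-splicing event."""
    return sl2 / (sl1 + sl2)


def roc_point(genes, threshold: float):
    """Return (TPR, FPR) when genes with ratio >= threshold are called downstream.

    `genes` is an iterable of (sl1_count, sl2_count, is_downstream) tuples.
    """
    tp = fp = positives = negatives = 0
    for sl1, sl2, is_downstream in genes:
        if sl1 + sl2 == 0:
            continue  # no trans-splicing event detected: gene not evaluated
        predicted = sl2_ratio(sl1, sl2) >= threshold
        if is_downstream:
            positives += 1
            tp += predicted
        else:
            negatives += 1
            fp += predicted
    return tp / positives, fp / negatives


# Toy data: (SL1 reads, SL2 reads, annotated as downstream in an operon)
genes = [(50, 2, False), (3, 40, True), (0, 12, True),
         (20, 18, False), (10, 1, True), (0, 0, False)]
tpr, fpr = roc_point(genes, threshold=0.5)
print(tpr, fpr)
```

Sweeping the threshold from 1 down to 0 and collecting the resulting (FPR, TPR) pairs traces the full curve.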

Figure 4:

Prediction of gene position in operons. (A) Number of SL1 and SL2 trans-splicing events per gene as calculated using SL-quant. Genes annotated as downstream in the operons are represented as red dots. (B) Receiver operating characteristic curve analysis using the SL2/(SL1 + SL2) ratio as a predictor of downstream position in operons for the 5,521 genes with at least one trans-splicing event detected. The number of SL1 and SL2 trans-splicing events per gene was calculated using SL-quant in single or paired (-p) mode, with or without the sensitive (-s) option. TPR: true-positive rate, FPR: false-positive rate.

Conclusion

In summary, SL-quant is able to rapidly and accurately quantify trans-splicing events from RNA-seq data. It comes as a well-documented and ready-to-use pipeline in which two main options were implemented to fit the type of input data and the intended usage of the quantification (Fig. 5). Importantly, this work provides a way to test and validate SL trans-splicing quantification methods that might serve as a baseline for future development of such methods.

Figure 5:

Recommendations on SL-quant usage. [-s --sensitive]: it provides increased detection power at the cost of some specificity and it is significantly faster. It is not recommended for applications that are very sensitive to FPs (e.g., trans-splice site detection) but is an interesting option otherwise (e.g., gene-level quantification of SL trans-splicing events). [-p --paired]: a more stringent prefiltering reduces the number of reads aligned to the SL sequences. It can only be used with paired-end reads. It is not recommended when the average fragment size is small (many “dovetail” reads). It can be used in combination with the [-s --sensitive] option.

Recently, the hypothesis that the SL trans-splicing mechanism originates from the last eukaryotic common ancestor has been proposed to explain its broad phylogenetic distribution [23]. Given the number of applicable species, the continuously decreasing cost of RNA-seq experiments, and the thinner line between model and nonmodel organisms, it is likely that SL trans-splicing will be studied in a growing number of species. Therefore, a procedure to adapt SL-quant to species other than C. elegans, requiring only a few steps, is detailed online. As a proof of concept, we successfully applied SL-quant to six additional RNA-seq libraries from five species (Table 3). In the near future, we anticipate that the application of SL-quant to various datasets might become instrumental in unveiling trans-splicing regulation in the model organism C. elegans and other organisms.

Table 3:

SL-quant can be applied to a wide range of datasets from various species, with varying read length and made with various library preparation protocols.

Organism	Dataset	Read length (nt)	Total reads	Input reads	Mapped SL-containing reads	Trans-splice sites (% AG)
Caenorhabditis elegans	SRR1585277	76	40 × 10⁶	1.3 × 10⁶	120,542	8,770 (94)
	modENCODE_4594	76	30 × 10⁶	2.5 × 10⁶	258,706	10,735 (93)
	SRR2832497 (*)	41	4 × 10⁶	1.8 × 10⁶	16,307	4,882 (87)
Caenorhabditis briggsae	SRR440441	42	11 × 10⁶	5.7 × 10⁶	117,738	8,382 (93)
	SRR440557	42	12 × 10⁶	4.8 × 10⁶	176,205	11,495 (92)
Caenorhabditis brenneri	modENCODE_4705	76	4 × 10⁶	0.4 × 10⁶	74,689	8,891 (97)
Caenorhabditis remanei	modENCODE_4206	76	9 × 10⁶	1.8 × 10⁶	248,335	11,223 (92)
Trypanosoma brucei	SRR038724	35	8 × 10⁶	2.2 × 10⁶	40,320	6,703 (89)

The datasets modENCODE_4594, SRR2832497, and SRR038724 are single end; the others are paired. The asterisk (*) for SRR2832497 denotes that the second-strand synthesis was made using a ligation-based protocol instead of the classic random priming protocol. All datasets were analyzed with the same SL-quant parameters: single-end mode with the -s --sensitive option.


Methods

We ran SL-quant with four threads (default) on the modENCODE_4594, modENCODE_4705, modENCODE_4206 [18], SRR2832497 [24], SRR440441, SRR440557 [25], SRR038724 [26], and SRR1585277 [19] poly-A+ datasets using a desktop computer with a 2.8-GHz processor and 8 GB random access memory. The C. elegans, C. briggsae, C. brenneri, and C. remanei reference genomes and annotations (WS262) were downloaded from WormBase [27]. The T. brucei reference genome and annotation (Apr_2005 version) were downloaded from Ensembl [28]. The read mapping steps prior to using SL-quant and at the end of the pipeline were performed using HISAT2 [16] (v 2.0.5) with parameters --no-softclip --no-discordant --min-intronlen 20 --max-intronlen 5000. As we noticed adaptor contamination in the modENCODE_4594 dataset, trimmomatic [29] (v 0.36) was used to trim the adaptors prior to the mapping. Samtools [30] (v 1.5), picard [31] (v 2.9), and bedtools [32] (v 2.26) were used to convert and/or filter the reads at various stages of the pipeline. BLAST+ (v 2.6) [15] was used to align the reads locally to the relevant SL sequences [33, 34] with parameters -task blastn -word_size 8 -max_target_seqs 1. Alternatively, cutadapt (v 1.14) [22] was used to directly trim the SL sequences from the reads with parameters -O 5 -m 15 --discard-untrimmed. FeatureCounts [17] was used to summarize remapped SL-containing reads at the gene level. Bedtools [32] was used to summarize mapped SL-containing reads at the genomic position level and to generate random reads by randomly sampling the C. elegans genome for 50-nt segments. Sequence logos were made with weblogo [35]. Finally, R [36] (v 3.4) was used for analyzing and visualizing the data.
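For readers who want to reproduce the mapping and alignment calls outside the provided bash script, the parameters listed above could be assembled as in the Python sketch below. Only the tool-specific parameters quoted in this section come from the article; the input/output flags (`-x`, `-U`, `-S`, `-p`, `-query`, `-db`, `-outfmt 6`, `-g`, `-o`), the file names, and the function names are assumptions added to make the commands complete, and the actual script may wire the tools differently.

```python
import shlex


def hisat2_cmd(index: str, reads: str, out_sam: str, threads: int = 4) -> list:
    """End-to-end HISAT2 mapping with the parameters quoted in Methods."""
    return ["hisat2", "-p", str(threads),
            "--no-softclip", "--no-discordant",
            "--min-intronlen", "20", "--max-intronlen", "5000",
            "-x", index, "-U", reads, "-S", out_sam]


def blastn_cmd(reads_fasta: str, sl_db: str) -> list:
    """Local alignment of unmapped reads against an SL sequence database."""
    return ["blastn", "-task", "blastn", "-word_size", "8",
            "-max_target_seqs", "1",
            "-query", reads_fasta, "-db", sl_db,
            "-outfmt", "6"]  # tabular output (assumption)


def cutadapt_cmd(sl_seq: str, reads: str, out_reads: str) -> list:
    """Sensitive mode: trim the SL directly from the 5' end of the reads."""
    return ["cutadapt", "-g", sl_seq,  # -g (5' adapter) is an assumption
            "-O", "5", "-m", "15", "--discard-untrimmed",
            "-o", out_reads, reads]


# Hypothetical file names, printed as copy-pastable shell commands.
print(shlex.join(hisat2_cmd("genome_index", "reads.fq", "mapped.sam")))
print(shlex.join(blastn_cmd("unmapped.fa", "SL_db")))
```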

Availability of source code and requirements

Project name: SL-quant
Project home page: https://github.com/cyaguesa/SL-quant
Operating system(s): UNIX-based systems (tested on macOS 10.12.6, macOS 10.11.6, Ubuntu 14.04)
Programming language: Shell, R
Other requirements: The BLAST+ suite (2.6.0 or higher), samtools (1.5 or higher), picard-tools (2.9.0 or higher), featureCounts from the subread package (1.5.0 or higher), bedtools (2.26.0 or higher), cutadapt (1.14 or higher), hisat2 (2.0.5 or higher). Installation instructions for those requirements are provided online.
License: MIT
RRID:SCR_016205

32 in total

1. mRNA 5'-leader trans-splicing in the chordates.

Authors: A E Vandenberghe; T H Meedel; K E Hastings
Journal: Genes Dev Date: 2001-02-01 Impact factor: 11.361

Review 2. Trans-splicing and operons in C. elegans.

Authors: Thomas Blumenthal
Journal: WormBook Date: 2012-11-20

3. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

Authors: Mark B Gerstein; Zhi John Lu; Eric L Van Nostrand; Chao Cheng; Bradley I Arshinoff; Tao Liu; Kevin Y Yip; Rebecca Robilotto; Andreas Rechtsteiner; Kohta Ikegami; Pedro Alves; Aurelien Chateigner; Marc Perry; Mitzi Morris; Raymond K Auerbach; Xin Feng; Jing Leng; Anne Vielle; Wei Niu; Kahn Rhrissorrakrai; Ashish Agarwal; Roger P Alexander; Galt Barber; Cathleen M Brdlik; Jennifer Brennan; Jeremy Jean Brouillet; Adrian Carr; Ming-Sin Cheung; Hiram Clawson; Sergio Contrino; Luke O Dannenberg; Abby F Dernburg; Arshad Desai; Lindsay Dick; Andréa C Dosé; Jiang Du; Thea Egelhofer; Sevinc Ercan; Ghia Euskirchen; Brent Ewing; Elise A Feingold; Reto Gassmann; Peter J Good; Phil Green; Francois Gullier; Michelle Gutwein; Mark S Guyer; Lukas Habegger; Ting Han; Jorja G Henikoff; Stefan R Henz; Angie Hinrichs; Heather Holster; Tony Hyman; A Leo Iniguez; Judith Janette; Morten Jensen; Masaomi Kato; W James Kent; Ellen Kephart; Vishal Khivansara; Ekta Khurana; John K Kim; Paulina Kolasinska-Zwierz; Eric C Lai; Isabel Latorre; Amber Leahey; Suzanna Lewis; Paul Lloyd; Lucas Lochovsky; Rebecca F Lowdon; Yaniv Lubling; Rachel Lyne; Michael MacCoss; Sebastian D Mackowiak; Marco Mangone; Sheldon McKay; Desirea Mecenas; Gennifer Merrihew; David M Miller; Andrew Muroyama; John I Murray; Siew-Loon Ooi; Hoang Pham; Taryn Phippen; Elicia A Preston; Nikolaus Rajewsky; Gunnar Rätsch; Heidi Rosenbaum; Joel Rozowsky; Kim Rutherford; Peter Ruzanov; Mihail Sarov; Rajkumar Sasidharan; Andrea Sboner; Paul Scheid; Eran Segal; Hyunjin Shin; Chong Shou; Frank J Slack; Cindie Slightam; Richard Smith; William C Spencer; E O Stinson; Scott Taing; Teruaki Takasaki; Dionne Vafeados; Ksenia Voronina; Guilin Wang; Nicole L Washington; Christina M Whittle; Beijing Wu; Koon-Kiu Yan; Georg Zeller; Zheng Zha; Mei Zhong; Xingliang Zhou; Julie Ahringer; Susan Strome; Kristin C Gunsalus; Gos Micklem; X Shirley Liu; Valerie Reinke; Stuart K Kim; LaDeana W Hillier; Steven Henikoff; Fabio Piano; Michael Snyder; Lincoln Stein; Jason D Lieb; Robert H Waterston
Journal: Science Date: 2010-12-22 Impact factor: 47.728

4. Sequencing of first-strand cDNA library reveals full-length transcriptomes.

Authors: Saurabh Agarwal; Todd S Macfarlan; Maureen A Sartor; Shigeki Iwase
Journal: Nat Commun Date: 2015-01-21 Impact factor: 14.919

5. The transcriptome of the human pathogen Trypanosoma brucei at single-nucleotide resolution.

Authors: Nikolay G Kolev; Joseph B Franklin; Shai Carmi; Huafang Shi; Shulamit Michaeli; Christian Tschudi
Journal: PLoS Pathog Date: 2010-09-09 Impact factor: 6.823

6. A global analysis of Caenorhabditis elegans operons.

Authors: Thomas Blumenthal; Donald Evans; Christopher D Link; Alessandro Guffanti; Daniel Lawson; Jean Thierry-Mieg; Danielle Thierry-Mieg; Wei Lu Chiu; Kyle Duke; Moni Kiraly; Stuart K Kim
Journal: Nature Date: 2002-06-20 Impact factor: 49.962

7. C. elegans sequences that control trans-splicing and operon pre-mRNA processing.

Authors: Joel H Graber; Jesse Salisbury; Lucie N Hutchins; Thomas Blumenthal
Journal: RNA Date: 2007-07-13 Impact factor: 4.942

8. RNA-seq analysis of the C. briggsae transcriptome.

Authors: Bora Uyar; Jeffrey S C Chu; Ismael A Vergara; Shu Yi Chua; Martin R Jones; Tammy Wong; David L Baillie; Nansheng Chen
Journal: Genome Res Date: 2012-07-06 Impact factor: 9.043

9. The time-resolved transcriptome of C. elegans.

Authors: Max E Boeck; Chau Huynh; Lou Gevirtzman; Owen A Thompson; Guilin Wang; Dionna M Kasper; Valerie Reinke; LaDeana W Hillier; Robert H Waterston
Journal: Genome Res Date: 2016-08-16 Impact factor: 9.043

10. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937
