Martin Převorovský, Martina Hálová, Kateřina Abrhámová, Jiří Libus, Petr Folk.
Abstract
Pre-mRNA splicing represents an important regulatory layer of eukaryotic gene expression. In the simple budding yeast Saccharomyces cerevisiae, about one-third of all mRNA molecules undergo splicing, and splicing efficiency is tightly regulated, for example, during meiotic differentiation. S. cerevisiae features a streamlined, evolutionarily highly conserved splicing machinery and serves as a favourite model for studies of various aspects of splicing. RNA-seq represents a robust, versatile, and affordable technique for transcriptome interrogation, which can also be used to study splicing efficiency. However, convenient bioinformatics tools for the analysis of splicing efficiency from yeast RNA-seq data are lacking. We present a complete workflow for the calculation of genome-wide splicing efficiency in S. cerevisiae using strand-specific RNA-seq data. Our pipeline takes sequencing reads in the FASTQ format and provides splicing efficiency values for the 5' and 3' splice junctions of each intron. The pipeline is based on up-to-date open-source software tools and requires very limited input from the user. We provide all relevant scripts in a ready-to-use form. We demonstrate the functionality of the workflow using RNA-seq datasets from three spliceosome mutants. The workflow should prove useful for studies of yeast splicing mutants or of regulated splicing, for example, under specific growth conditions.
Year: 2016 PMID: 28050562 PMCID: PMC5168555 DOI: 10.1155/2016/4783841
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
RNA-seq datasets used in this study.
| Genotype | ArrayExpress acc. number (a) | ENA acc. number (b) | Read length (nt) | Total reads | Reads with MAPQ ≥ 10 | % reads with MAPQ ≥ 10 |
|---|---|---|---|---|---|---|
| WT | E-MTAB-5149 | ERR1709739 | 100 | 27 789 829 | 25 329 092 | 91.2% |
| WT | E-MTAB-5149 | ERR1709740 | 100 | 22 000 062 | 20 402 556 | 92.7% |
| | E-MTAB-5149 | ERR1709737 | 100 | 27 842 215 | 25 491 566 | 91.6% |
| | E-MTAB-5149 | ERR1709738 | 100 | 25 156 639 | 23 359 541 | 92.9% |
| WT | E-GEOD-44219 | SRX233529 | 100 | 21 012 048 | 17 127 536 | 81.5% |
| | E-GEOD-44219 | SRX233535 | 100 | 17 142 559 | 14 457 015 | 84.3% |
| WT | E-GEOD-49966 | SRR953535 | 101 | 35 203 753 | 7 655 225 | 21.8% |
| | E-GEOD-49966 | SRR953537 | 101 | 17 326 529 | 3 596 304 | 20.8% |

(a) Accession number for the ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/).
(b) Accession number for the European Nucleotide Archive (http://www.ebi.ac.uk/ena).
Figure 1. Workflow for calculating splicing efficiency from RNA-seq data. Files and datasets are represented by blue parallelograms (file formats given in parentheses), and processing steps are represented by orange rectangles (tool names given in parentheses). Some files/datasets are used repeatedly in several steps of the workflow, as signified by multiple flow lines going from these files/datasets. The diagram was created using draw.io (https://www.draw.io/).
Figure 2. Splicing efficiency in the prp45(1-169) mutant. Splicing efficiencies for all known introns (for 5′ and 3′ splice sites separately) were calculated using the pipeline described in Figure 1. (a, b) Results for two pooled biological replicates of the prp45(1-169) mutant and its corresponding wild-type strain. Higher values correspond to more efficient splicing. Full circles represent values for introns with sufficient coverage (≥5 transreads and ≥5 reads covering the intron end base); open circles represent low-confidence values for introns with low sequencing read coverage. (c, d) Relative splicing efficiencies (prp45(1-169) normalized to wild type) at the 5′ and 3′ splice sites were calculated for each biological replicate separately. Only introns with sufficient read coverage were considered. Pearson's r values for the two replicates are indicated. (e) Comparison of relative splicing efficiencies at the 5′ and 3′ splice sites of selected genes calculated from the pooled RNA-seq data with relative splicing efficiencies determined by RT-qPCR (means of 4–6 independent RT-qPCR experiments ± SD).
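The quantities named in the Figure 2 caption (coverage filter, relative splicing efficiency, Pearson's r) can be illustrated with a minimal Python sketch. It assumes that splicing efficiency at a splice site is the ratio of transreads to reads covering the intron end base; this excerpt does not state the formula, so treat that definition, along with all names and counts below, as hypothetical. Only the threshold of 5 reads is taken from the caption.

```python
from dataclasses import dataclass
from math import sqrt

MIN_READS = 5  # coverage threshold quoted in the figure caption


@dataclass
class JunctionCounts:
    transreads: int        # reads spanning the spliced junction
    intron_end_reads: int  # reads covering the first or last intron base


def efficiency(counts):
    """ASSUMED definition: transreads / intron-end reads (None if undefined)."""
    if counts.intron_end_reads == 0:
        return None
    return counts.transreads / counts.intron_end_reads


def sufficient_coverage(counts):
    """True when both read counts reach the caption's threshold."""
    return (counts.transreads >= MIN_READS
            and counts.intron_end_reads >= MIN_READS)


def relative_efficiency(mutant, wild_type):
    """Mutant efficiency normalized to wild type.

    Returns None when either sample lacks sufficient coverage, mirroring
    the caption's restriction to introns with sufficient read coverage.
    """
    if not (sufficient_coverage(mutant) and sufficient_coverage(wild_type)):
        return None
    return efficiency(mutant) / efficiency(wild_type)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up counts for one splice site, for illustration only.
wt = JunctionCounts(transreads=200, intron_end_reads=10)
mut = JunctionCounts(transreads=100, intron_end_reads=25)
low = JunctionCounts(transreads=3, intron_end_reads=40)

rel = relative_efficiency(mut, wt)       # (100/25) / (200/10)
rel_low = relative_efficiency(low, wt)   # None: fewer than 5 transreads
```

Under the assumed definition, a relative value below 1 means the splice site is processed less efficiently in the mutant than in the wild type. Replicate agreement, as in panels (c, d), could then be checked by passing the per-intron relative values of the two replicates to `pearson_r`.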
Figure 3. Splicing efficiency in the prp4-1 and prp40-1 mutants. Splicing efficiencies for all known introns (for 5′ and 3′ splice sites separately) were calculated using the pipeline described in Figure 1. (a, b) Results for the prp4-1 mutant and its corresponding wild-type strain [17]. (c, d) Results for the prp40-1 mutant and its corresponding wild-type strain [24]. Higher values correspond to more efficient splicing. Full circles represent values for introns with sufficient coverage (≥5 transreads and ≥5 reads covering the intron end base); open circles represent low-confidence values for introns with low sequencing read coverage.