Lucile Broseus, William Ritchie.
Abstract
Intron retention (IR) occurs when an intron is transcribed into pre-mRNA and remains in the final mRNA. An increasing body of literature has demonstrated a major role for IR in numerous biological functions and in disease. Here we give an overview of the different computational approaches for detecting IR events from sequencing data. We show that these are based on different biological and computational assumptions that may lead to dramatically different results. We describe the various approaches for mitigating errors in detecting intron retention and for discovering IR signatures between different conditions.
Keywords: Alternative splicing (AS); Bioinformatics; Gene expression; Intron retention (IR); RNA sequencing (RNA-seq); mRNA splicing
Year: 2020 PMID: 32206209 PMCID: PMC7078297 DOI: 10.1016/j.csbj.2020.02.010
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Defining intronic intervals to be analyzed. Comprehensiveness of transcript annotation and the selection of reference intronic sequences have a major impact on IR detection. In the example, we consider a gene having three possible isoforms (A, B and C). Exons are represented as plain rectangles and introns as thick black lines. If only isoforms B and C were annotated, the starred interval (*) would not be defined as an intron and would most likely not be detected as retained. Colored boxes indicate whether the annotated introns match the “all introns” or “independent/measurable intron” criteria used by current algorithms.
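The difference between the two intron-definition criteria can be made concrete with a small sketch. It assumes that an "independent" intron is one that overlaps no annotated exon of any isoform of the gene; the coordinates, isoform layout and function names below are hypothetical and are not taken from the figure or from any of the tools discussed.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def introns_of(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of a single isoform."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_introns(isoforms: Dict[str, List[Interval]]):
    """Return (all annotated introns, introns overlapping no annotated exon)."""
    all_exons = [e for exons in isoforms.values() for e in exons]
    all_introns = sorted({i for exons in isoforms.values() for i in introns_of(exons)})
    # Assumption: "independent" means no overlap with any exon of any isoform.
    independent = [
        i for i in all_introns if not any(overlaps(i, e) for e in all_exons)
    ]
    return all_introns, independent


# Hypothetical three-isoform gene (toy coordinates).
toy_gene = {
    "A": [(0, 100), (200, 300), (400, 500)],
    "B": [(0, 100), (400, 500)],   # skips the middle exon of A
    "C": [(0, 300), (400, 500)],   # first exon spans A's first intron
}
all_introns, independent = classify_introns(toy_gene)
print(all_introns)
print(independent)
```

In this toy gene, three distinct introns are annotated, but only (300, 400) qualifies as independent, because the other two overlap an exon of another isoform. Dropping an isoform from `toy_gene` changes both lists, which illustrates how annotation completeness shapes the set of introns that can be tested at all.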
Fig. 2. Potential sources of bias and confusion: a very unfortunate gene. Only intron 3 is retained in this example. In intron 1, expression of an overlapping feature causes a peak in alignments, which can artificially inflate the estimate of IR. Intronic alignments in intron 2 originate from an unannotated exon. Intron 3 is retained, but its detection is hampered by multiple biases. First, the presence of a low-mappability region (repeated A sequence, in red) would result either in a gap or in high uncertainty in read alignments in that region. Secondly, high GC content in the 5′ exon explains the lack of exon-exon junctions and 3′ exon-intron reads and may affect filtering and IR metrics based on them. Thirdly, because the intron is long, it tends to be more sparsely covered. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
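One way to flag the intron 1 situation, where a localized peak rather than uniform coverage drives the intronic signal, is to score how evenly reads are spread along the intron. The sketch below uses a normalized Shannon entropy of per-base depth for this purpose. It is an illustration only: the function name, the normalization and any threshold one might apply to the score are assumptions, not the implementation of any published tool.

```python
import math
from typing import Sequence


def coverage_evenness(depth: Sequence[int]) -> float:
    """Normalized Shannon entropy of per-base read depth, in [0, 1].

    Values near 1 indicate coverage spread evenly along the intron;
    values near 0 indicate that reads pile up in a small region.
    """
    total = sum(depth)
    n = len(depth)
    if total == 0 or n < 2:
        return 0.0  # no reads, or too short to assess evenness
    entropy = -sum((d / total) * math.log(d / total) for d in depth if d > 0)
    # Divide by the maximum possible entropy (uniform coverage over n bases).
    return max(0.0, entropy / math.log(n))


# Hypothetical per-base depths for two introns.
evenly_covered = [5, 6, 5, 4, 5, 6, 5, 5]
single_peak = [0, 0, 0, 40, 0, 0, 0, 0]
print(coverage_evenness(evenly_covered))
print(coverage_evenness(single_peak))
```

A score of this kind does not address low mappability or GC bias, which need separate corrections.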
Fig. 3. Standard implementation of computational detection of IR events.
Computational tools available to perform IR detection and their main features.
| Tool | Year | Publication | Language | Intron definition | IR measure | Low mappability correction | Unknown overlapping events detection |
|---|---|---|---|---|---|---|---|
| MISO | 2010 | Nature Methods | Python | Independent introns | PSI | No | No |
| KMA | 2015 | arXiv | Python and R | Measurable introns | PSI | No | Coverage analysis (probabilistic test) |
| iRead | 2017 | bioRxiv | Python | Independent introns | FPKM | No | Coverage analysis (Shannon entropy) |
| IRFinder | 2017 | Genome Biology | C++ | All introns | IRratio | Yes | Coverage analysis (detection of outlier regions) |
| IntEREst | 2018 | BMC Bioinformatics | R | Independent introns | PSI or FPKM | Optional | No |
| ASDT | 2018 | ATM | Perl | No (Reference-free) | No | No | Yes |
| JUM | 2018 | PNAS | Perl | No (Reference-free) | No | No | Yes |
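The three IR measures named in the table can be sketched in a few lines. The definitions below follow the commonly used forms: PSI as the fraction of reads supporting inclusion, FPKM as fragments per kilobase per million mapped fragments, and an IRratio-style score as intronic abundance over the sum of intronic abundance and spliced-junction abundance. Each tool counts reads and estimates abundance in its own way, so the function names and inputs here are assumptions for illustration, not any tool's API.

```python
import math


def psi(inclusion_reads: float, exclusion_reads: float) -> float:
    """Percent spliced in: fraction of reads supporting intron inclusion."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total > 0 else math.nan


def fpkm(fragments: float, length_bp: int, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def ir_ratio(intron_abundance: float, spliced_abundance: float) -> float:
    """IRratio-style score: intronic signal relative to intronic plus spliced signal.

    Assumption: intron_abundance is a summary of intronic depth and
    spliced_abundance is the number of reads spanning the flanking
    exon-exon junction; how each is estimated is left to the caller.
    """
    total = intron_abundance + spliced_abundance
    return intron_abundance / total if total > 0 else math.nan


# Hypothetical values for a single intron.
print(psi(inclusion_reads=30, exclusion_reads=70))
print(fpkm(fragments=500, length_bp=2000, total_mapped_fragments=25_000_000))
print(ir_ratio(intron_abundance=5.0, spliced_abundance=15.0))
```

PSI and the IRratio-style score are bounded between 0 and 1 and are relative to the host gene's spliced output, whereas FPKM is an absolute abundance that also rises and falls with overall gene expression. This is one reason the tools in the table can rank the same intron very differently.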
Available computational methods to perform IR differential analysis.
| Method | Year | Language | IR-specific | IR measure | Normalization for library size | Control for gene expression | Modeling of biological variability | Statistical framework |
|---|---|---|---|---|---|---|---|---|
| edgeR-IR | 2010 | R | No/Yes | Intron bin count | TMM (ref) | No/Yes | Yes | Generalized Linear Model |
| DESeq2-IR | 2014 | R | No/Yes | Intron bin count | Variance estimation and rescaling (ref) | No/Yes | Yes | Generalized Linear Model |
| DEXSeq-IR | 2012 | R | No/Yes | Intron bin count | Variance estimation and rescaling (ref) | Yes | Yes | Generalized Linear Model |
| iDiffIR | 2018 | Python | Yes | Average per base read coverage | TMM (ref) | Yes | Yes | LogFC statistic and Z-test |
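Notes: The edgeR-IR, DESeq2-IR and DEXSeq-IR entries refer to IR-tuned versions of existing software, and may require custom pre-processing. Where a cell reads "No/Yes", the second value applies after IR-specific tuning.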