Wei Jiang, Liang Chen.
Abstract
Alternative splicing accounts for most of the protein diversity in higher eukaryotes by allowing one gene to generate multiple distinct protein isoforms, and it adds another layer of regulation to gene expression. Up to 95% of human multi-exon genes undergo alternative splicing to encode proteins with different functions. Moreover, around 15% of human hereditary diseases and cancers are associated with alternative splicing. Alternative splicing is regulated by a set of delicate, interacting machineries that support important biological processes such as cell development and differentiation. Given the importance of alternative splicing events, mapping and quantifying them accurately is paramount for downstream analysis, especially for associating disease with alternative splicing. However, deriving accurate isoform expression from high-throughput RNA-seq data remains challenging. In this mini-review, we aim to illustrate (I) the mechanisms and regulation of alternative splicing, (II) human diseases associated with alternative splicing, and (III) computational tools for quantifying isoforms and alternative splicing from RNA-seq.
Keywords: Alternative splicing; Human disease; Isoform quantification; RNA-Seq
Year: 2020 PMID: 33425250 PMCID: PMC7772363 DOI: 10.1016/j.csbj.2020.12.009
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Constitutive splicing and five major types of alternative splicing. a: constitutive splicing; b: exon skipping (cassette exons); c: mutually exclusive exons; d: alternative 5′ splice sites (alternative donors); e: alternative 3′ splice sites (alternative acceptors); f: intron retention.
Fig. 2. Stepwise schematic presentation of general pre-mRNA splicing. Abbreviations: BP: branch point; SS: splice site.
Methods for isoform or splicing analysis from RNA-seq. We summarized a collection of benchmark characteristics from simulation studies, literature reviews [142], [143], [144], and software documentation. Abbreviations: ICA: isoform-centric approaches; ECA: exon-centric approaches; EM: expectation maximization algorithm; VB: variational Bayes inference algorithm; MCMC: Markov chain Monte Carlo.
| Method | Approach | Speed | Alignment-free | Novel transcript/splicing event discovery | Major algorithm | Input format | Memory usage | Multi-threading |
|---|---|---|---|---|---|---|---|---|
| Cufflinks | ICA | Relatively slow | No | Yes | EM | SAM/BAM | Medium | Yes |
| StringTie | ICA | Fast | No | Yes | Network flow | SAM/BAM | Medium | No |
| RSEM | ICA | Relatively slow | No | No | EM | SAM/BAM/FASTQ | Medium | No |
| WemIQ | ICA | Fast | No | No | EM | SAM | Medium | No |
| eXpress | ICA | Fast | No | No | EM | SAM/BAM | Small | Yes |
| Sailfish | ICA | Extremely fast | Yes | No | EM, VB | FASTA/FASTQ | Small | Yes |
| Kallisto | ICA | Very fast | Yes | No | EM | FASTA/FASTQ | Small | Yes |
| Salmon | ICA | Very fast | Yes | No | EM, VB | SAM/BAM/FASTQ | Small | Yes |
| MISO | ECA | Fast | No | No | MCMC | SAM | Small | Yes |
| SUPPA | ECA | Fast | No | Yes (with de novo assembler) | Density-based clustering algorithm | Expression in TPM | Small | Yes |
| SplAdder | ECA | Fast | No | Yes | Splicing graph | BAM, GTF/GFF | Small | No |
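
Several of the isoform-centric tools in the table list EM as their major algorithm. The sketch below illustrates the general idea only, under simplifying assumptions: each read is reduced to the set of isoforms it is compatible with, and every compatible position on an isoform is equally likely. It is not the implementation of any tool listed above, and the function name `em_isoform_abundance` and its parameters are hypothetical.

```python
def em_isoform_abundance(compat, eff_len, n_iter=500, tol=1e-9):
    """Toy EM for isoform abundance (illustrative sketch, not any tool's code).

    compat  : list of lists; compat[r] holds the indices of the isoforms
              that read r is compatible with (assumed non-empty).
    eff_len : effective length of each isoform.
    Returns (alpha, tpm): alpha[i] is the estimated fraction of reads
    drawn from isoform i; tpm[i] is the length-normalized abundance
    scaled to one million.
    """
    k = len(eff_len)
    n = len(compat)
    alpha = [1.0 / k] * k  # start from a uniform guess

    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each read across its compatible isoforms in
        # proportion to alpha[i] / eff_len[i].
        for iso_set in compat:
            weights = [alpha[i] / eff_len[i] for i in iso_set]
            total = sum(weights)
            if total == 0.0:
                continue  # no isoform with support can explain this read
            for i, w in zip(iso_set, weights):
                expected[i] += w / total
        # M-step: new read fractions are the normalized expected counts.
        new_alpha = [c / n for c in expected]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break

    # Convert read fractions to length-normalized abundances (TPM-like).
    rate = [a / l for a, l in zip(alpha, eff_len)]
    norm = sum(rate)
    tpm = [1e6 * r / norm for r in rate]
    return alpha, tpm


if __name__ == "__main__":
    # 30 reads map only to isoform 0; 10 reads are shared by both isoforms.
    reads = [[0]] * 30 + [[0, 1]] * 10
    alpha, tpm = em_isoform_abundance(reads, [1000.0, 1000.0])
    print(alpha, tpm)
```

In this toy input, isoform 1 has no uniquely assigned reads, so the shared reads are progressively reassigned to isoform 0 as the iterations proceed. Real tools add further modeling (fragment length distributions, sequence bias, alignment quality) that this sketch leaves out.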