Bo Wang, Vivek Kumar, Andrew Olson, Doreen Ware.
Abstract
Advances in transcriptomics have provided an exceptional opportunity to study the functional implications of genetic variability. Technologies such as RNA-Seq have emerged as state-of-the-art techniques for transcriptome analysis that take advantage of high-throughput next-generation sequencing. However, like their predecessors, these approaches still face major challenges in identifying full-length transcript structures, primarily because of the inherent limitations of read length. With the development of single-molecule sequencing (SMS) from PacBio, a growing number of transcriptome studies in different organisms have been reported. SMS has proved advantageous for comprehensive genome annotation, including the identification of novel genes and isoforms, long non-coding RNAs, and fusion transcripts. This approach can be used across a broad spectrum of species to better interpret the coding information of the genome and to facilitate the study of biological function. We provide an overview of the SMS platform and its diverse applications in various biological studies, and our perspective on the challenges associated with transcriptome studies.
Keywords: Iso-Seq; RNA-Seq; alternative splicing; isoforms; single-molecule transcriptome sequencing; transcriptomics
Year: 2019 PMID: 31105749 PMCID: PMC6498185 DOI: 10.3389/fgene.2019.00384
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Schematic workflow of Iso-Seq.
List of the Iso-Seq tools along with a brief description of their usage and related online links.
| Tool | Usage | Website | Literature |
|---|---|---|---|
| ASTALAVISTA | Detect alternative splicing events | ||
| CASH | Detect alternative splicing events | ||
| CodingQuarry | Gene prediction (HMM-based) using both RNA-Seq data and genome sequence | ||
| GMAP | Spliced alignment to genome | ||
| LoRDEC | Error correction of full-length non-chimeric (FLNC) reads with short-read RNA-Seq | ||
| LoReAn | Comparative analysis and annotation: identify novel isoforms/genes against reference annotation | ||
| LSC | Error correction of FLNC reads with short-read RNA-Seq | ||
| minimap2 | Spliced alignment to genome | ||
| PASA | Detect alternative splicing events | ||
| Proovread | Error correction of FLNC reads with short-read RNA-Seq | ||
| Quiver | Polishing PacBio RS II reads | ||
| SpliceGrapher | Detect alternative splicing events | ||
| SQANTI | Comparative analysis and annotation: identify novel isoforms/genes against reference annotation | ||
| STAR | Spliced alignment to genome | ||
| SUPPA | Detect alternative splicing events | ||
| TAPIS | Alternative splicing, collapsing redundant or degraded transcripts | ||
| ToFU | Preprocessing (collapse to non-redundant isoforms) | ||
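Several tools in the table collapse redundant or degraded transcripts into non-redundant isoforms. The idea can be illustrated with a minimal Python sketch that groups aligned reads sharing an identical intron chain. This is a simplified illustration under stated assumptions, not the algorithm that ToFU or TAPIS actually implements: the function names, the input layout, and the example coordinates are all hypothetical, and real pipelines also handle fuzzy junctions, 5' degradation rules, and single-exon reads more carefully.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the introns between consecutive exons as (start, end) pairs."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def collapse_isoforms(transcripts):
    """Group reads by chromosome, strand, and intron chain.

    transcripts maps a read ID to (chrom, strand, [(exon_start, exon_end), ...]).
    Returns a list of (representative_id, supporting_ids); the read with the
    longest genomic span represents each group.
    """
    groups = defaultdict(list)
    for read_id, (chrom, strand, exons) in transcripts.items():
        chain = intron_chain(exons)
        # Assumption: single-exon reads have no junctions, so each is kept
        # as its own group instead of being merged.
        key = (chrom, strand, chain if chain else ("mono", read_id))
        groups[key].append(read_id)

    def span(read_id):
        exons = transcripts[read_id][2]
        return max(end for _, end in exons) - min(start for start, _ in exons)

    return [(max(ids, key=span), sorted(ids)) for ids in groups.values()]


# Hypothetical aligned reads (coordinates invented for illustration).
reads = {
    "read_a": ("chr1", "+", [(100, 200), (300, 400), (500, 650)]),
    "read_b": ("chr1", "+", [(150, 200), (300, 400), (500, 600)]),  # same junctions, shorter ends
    "read_c": ("chr1", "+", [(100, 200), (500, 650)]),              # middle exon skipped
}

collapsed = collapse_isoforms(reads)
for representative, support in collapsed:
    print(representative, support)
```

Here `read_a` and `read_b` share every splice junction and collapse into one isoform, while `read_c` skips an exon and is kept as a distinct isoform.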
FIGURE 2. Schematic model of alternative splicing utilization.
FIGURE 3. Alignment and ranking of different isoforms. (A) Gene tree multiple sequence alignment, color-coded by InterPro domain. The orange domain is conserved in grass orthologs but is not completely identified in maize, because a retained intron disrupts it and induces a frameshift (shown by the arrow). The resulting longer translation was selected for analysis in the gene tree pipeline. Each color represents a different InterPro domain, black indicates no domain, and lightly shaded areas indicate gaps in the multiple sequence alignment. Thin red lines mark the positions of exon junctions. (B) Ranking of the isoforms of Zm00001d003817 by different criteria. T002 has the longest CDS, but T003 outperforms it in domain length and annotation edit distance.
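The point of panel (B), that the choice of representative isoform depends on the ranking criterion, can be sketched in a few lines of Python. All numeric values below are invented placeholders, not values read from the figure, and `T001` is a hypothetical third isoform added only to make the ranking non-trivial. The sketch assumes that a longer CDS and longer domain coverage are better, and that a lower annotation edit distance (AED) is better.

```python
from dataclasses import dataclass


@dataclass
class Isoform:
    name: str
    cds_length: int     # coding sequence length in nucleotides
    domain_length: int  # residues covered by annotated protein domains
    aed: float          # annotation edit distance: 0 is best, 1 is worst


def rank_by(isoforms, key, higher_is_better):
    """Return isoform names ordered from best to worst for one criterion."""
    ordered = sorted(isoforms, key=key, reverse=higher_is_better)
    return [iso.name for iso in ordered]


# Placeholder values for illustration only (NOT taken from the figure).
isoforms = [
    Isoform("T001", cds_length=900, domain_length=150, aed=0.30),
    Isoform("T002", cds_length=1500, domain_length=180, aed=0.25),
    Isoform("T003", cds_length=1350, domain_length=260, aed=0.10),
]

rankings = {
    "cds_length": rank_by(isoforms, lambda iso: iso.cds_length, True),
    "domain_length": rank_by(isoforms, lambda iso: iso.domain_length, True),
    "aed": rank_by(isoforms, lambda iso: iso.aed, False),
}

for criterion, order in rankings.items():
    print(criterion, order)
```

With these placeholder numbers, selecting by longest CDS picks T002, while selecting by domain length or AED picks T003, mirroring the disagreement between criteria described in the caption.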