Cong Ma, Hongyu Zheng, Carl Kingsford.
Abstract
BACKGROUND: The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (Bioinformatics 30:2447-55, 2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed.
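As a rough illustration of the "every full path is a possible transcript" view, the following Python sketch enumerates the source-to-sink paths of a toy splice graph and sums path abundances into per-edge (junction) flows. The graph, the exon names, the abundances, and the helper names `enumerate_paths` and `edge_flow` are all hypothetical and are not taken from the paper; this is not the authors' inference procedure, only a picture of how transcript abundances induce a flow on the graph.

```python
from collections import defaultdict


def enumerate_paths(edges, source, sink):
    """Return every source-to-sink path of a DAG given as (u, v) edge pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    paths = []

    def walk(node, prefix):
        if node == sink:
            paths.append(tuple(prefix))
            return
        for nxt in adj[node]:
            walk(nxt, prefix + [nxt])

    walk(source, [source])
    return paths


def edge_flow(path_abundance):
    """Sum the abundance of each path onto every edge the path uses."""
    flow = defaultdict(float)
    for path, abundance in path_abundance.items():
        for u, v in zip(path, path[1:]):
            flow[(u, v)] += abundance
    return dict(flow)


# Hypothetical toy splice graph: S and T are the artificial source and sink.
EDGES = [
    ("S", "e1"), ("e1", "e2"), ("e1", "e3"),
    ("e2", "e3"), ("e2", "e4"), ("e3", "e4"), ("e4", "T"),
]

PATHS = enumerate_paths(EDGES, "S", "T")

# Assumed abundances, one per full path (i.e. per candidate transcript).
ABUNDANCE = dict(zip(PATHS, [5.0, 2.0, 3.0]))

FLOW = edge_flow(ABUNDANCE)

for edge, value in FLOW.items():
    print(edge, value)
```

In this toy case the flow entering `e3` (3.0 + 5.0) equals the flow leaving it (8.0). That conservation property is what lets a model work with junction abundances directly, without fixing a transcript list in advance.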
Keywords: Alternative splicing; Network flow; RNA-seq; Splice graph; Transcript quantification
Year: 2021 PMID: 33971903 PMCID: PMC8112020 DOI: 10.1186/s13015-021-00184-7
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.721
Fig. 1 An example construction of the prefix graph. The source and sink of the prefix graph are [S] and [T], respectively. The set of phasing paths is shown in blue in the left panel; singleton paths are omitted for simplicity. We draw only the trie and the fail edges of the Aho-Corasick (A-C) automaton to reduce clutter (the dictionary suffix links can be derived from these two edge sets). The colored nodes in the prefix graph are the vertices (states) in AS(35) and AS(24)
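For readers unfamiliar with the trie and fail edges mentioned in the caption, here is a minimal Python sketch of the standard Aho-Corasick construction over sequences of exon indices. It is not the paper's prefix graph construction. The function name `build_automaton` and the three example paths are hypothetical and do not correspond to the phasing paths drawn in the figure.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie edges and fail edges for a set of symbol sequences.

    Returns (goto, fail): goto[s] maps a symbol to the child state of s,
    and fail[s] is the state for the longest proper suffix of s that is
    also a prefix of some pattern (state 0 is the root).
    """
    goto = [{}]
    fail = [0]

    # Insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]

    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # follow the parent's fail chain until the symbol can be matched.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, child in goto[state].items():
            f = fail[state]
            while f and symbol not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(symbol, 0)
            queue.append(child)

    return goto, fail


# Hypothetical paths, written as tuples of exon indices.
EXAMPLE_PATHS = [(1, 2, 3), (2, 3, 5), (3, 5)]
GOTO, FAIL = build_automaton(EXAMPLE_PATHS)

for state, target in enumerate(FAIL):
    print(f"state {state}: fail -> {target}")
```

Each fail edge points to the state for the longest suffix of the current state that is still a prefix of some path. This lets the automaton track which paths a partial walk could still extend.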
Fig. 2 Graph Salmon and Salmon give different PSI estimates in an example BD RNA-seq sample. (a) Network flow of the BD 1 and control 3 samples estimated by Graph Salmon. The subgraph includes exons 1 and 3 to 7; exons are represented by nodes, and each node label gives the index of the exon. The PSI for inclusion of exon 6 between exons 3 and 7 is computed. Edges involved in the PSI calculation are solid; the rest are dashed. (b) Network flow of the same samples computed by Salmon with reference transcripts
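One way to read a PSI value off edge flows like those in the figure is sketched below in Python. The exact formula used in the paper is not given here. As an assumption, this sketch takes inclusion support to be the mean of the two inclusion-junction flows and exclusion support to be the flow on the skipping junction. The function name `psi_from_flow` and the flow values are hypothetical and are not the values shown in the figure.

```python
def psi_from_flow(flow, upstream, cassette, downstream):
    """Estimate percent spliced in (PSI) of a cassette exon from edge flows.

    flow maps (u, v) exon pairs to non-negative flow values.
    Assumption: inclusion = mean of the two inclusion junctions,
    exclusion = flow on the junction that skips the cassette exon.
    Returns None when no flow supports either form.
    """
    inclusion = 0.5 * (
        flow.get((upstream, cassette), 0.0)
        + flow.get((cassette, downstream), 0.0)
    )
    exclusion = flow.get((upstream, downstream), 0.0)
    total = inclusion + exclusion
    if total == 0:
        return None
    return inclusion / total


# Hypothetical flows around exon 6 between exons 3 and 7.
EXAMPLE_FLOW = {(3, 6): 30.0, (6, 7): 30.0, (3, 7): 10.0}
PSI = psi_from_flow(EXAMPLE_FLOW, 3, 6, 7)
print(PSI)  # 30 / (30 + 10) = 0.75
```

Because the estimate depends only on the flows of the edges involved, two methods that assign different flows to the same junctions can report different PSI values for the same sample.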
Fig. 3 Size increase of the prefix graph under different read lengths and sequencing coverages. (a) Scatter plot of the prefix graph edge count under the base read length (100 bp) against that under an increased read length (300 bp and 500 bp). (b) Scatter plot of the prefix graph edge count under 50X sequencing coverage against that under 200X sequencing coverage