Elsa Bernard, Laurent Jacob, Julien Mairal, Eric Viara, Jean-Philippe Vert.
Abstract
BACKGROUND: Detecting and quantifying isoforms from RNA-seq data is an important but challenging task. The problem is often ill-posed, particularly at low coverage. One promising direction is to exploit several samples simultaneously.
Year: 2015 PMID: 26286719 PMCID: PMC4543468 DOI: 10.1186/s12859-015-0695-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Multi-dimensional splicing graph with three samples. Each candidate isoform is a path from source node s to sink node t. Nodes, shown as grey squares, correspond to ordered sets of exons. Each read is assigned to a unique node, corresponding to the exact set of exons that it overlaps. Note that a node can consist of more than 2 exons, which properly models reads spanning more than 2 exons. A vector of read counts (one component per sample) is then associated with each node of the graph. Note also that some components of a vector can be equal to zero.
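The node construction described in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `overlapped_exons` and `node_count_vectors`, the half-open coordinate convention, and the toy exon and read coordinates are all assumptions made for illustration. Each read is mapped to the ordered set of exons it overlaps, and each such node accumulates one read count per sample.

```python
from collections import defaultdict

def overlapped_exons(read_blocks, exons):
    """Return the ordered tuple of exon indices overlapped by a read.

    read_blocks: aligned segments of one read as (start, end), half-open.
    exons: exon intervals as (start, end), half-open, sorted by start.
    (Hypothetical helper, assumed for this sketch.)
    """
    hit = []
    for i, (es, ee) in enumerate(exons):
        # Two half-open intervals overlap if each starts before the other ends.
        if any(bs < ee and es < be for bs, be in read_blocks):
            hit.append(i)
    return tuple(hit)

def node_count_vectors(reads_per_sample, exons):
    """Map each node (ordered exon set) to a vector of per-sample read counts."""
    n_samples = len(reads_per_sample)
    counts = defaultdict(lambda: [0] * n_samples)
    for s, reads in enumerate(reads_per_sample):
        for blocks in reads:
            node = overlapped_exons(blocks, exons)
            if node:  # ignore reads that overlap no exon
                counts[node][s] += 1
    return dict(counts)

# Toy gene with three exons and three samples (made-up coordinates).
exons = [(0, 100), (200, 300), (400, 500)]
samples = [
    [[(10, 60)], [(80, 100), (200, 230)], [(90, 100), (200, 300), (400, 410)]],
    [[(20, 70)], [(20, 70)]],
    [[(250, 300), (400, 450)]],
]
counts = node_count_vectors(samples, exons)
for node, vec in sorted(counts.items()):
    print(node, vec)
```

In this toy input the read spanning all three exons gets its own node `(0, 1, 2)`, and several count vectors contain zeros for samples in which that node received no reads, matching the two remarks in the caption.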
Fig. 2. Human simulations with increasing coverage and number of samples.
Fig. 3. Human simulations with various read lengths.
Fig. 4. Fscore results on the Flux Simulator simulations.
Fig. 5. Fscore results on the modENCODE data.
Fig. 6. Transcriptome predictions of gene CG15717 from 3 samples of the modENCODE data. Sample names are 0–2 h, 2–4 h and 4–6 h. Each sample track contains the read coverage (light grey) and junction reads (red) as well as FlipFlop predictions (light blue) and Cufflinks predictions (light green). The bottom of the figure displays the RefSeq records (black) and the multi-sample predictions of the group-lasso (dark blue) and of Cufflinks/Cuffmerge (dark green).