Juntao Liu, Ting Yu, Tao Jiang, Guojun Li.
Abstract
Transcriptome assemblers aim to reconstruct full-length transcripts from RNA-seq data. We present TransComb, a genome-guided assembler built on a junction graph that is weighted by a bin-packing strategy and paired-end information. A newly designed extension method based on weighted junction graphs can accurately extract paths representing expressed transcripts, whether they have low or high expression levels. Tested on both simulated and real datasets, TransComb demonstrates significant improvements in both recall and precision over leading assemblers, including StringTie, Cufflinks, Bayesembler, and Traph. In addition, it runs much faster and requires less memory on average. TransComb is available at http://sourceforge.net/projects/transcriptomeassembly/files/.
Keywords: Alternative splicing; Isoform; RNA-seq; Splicing graph; Transcriptome assembly
Year: 2016 PMID: 27760567 PMCID: PMC5069867 DOI: 10.1186/s13059-016-1074-1
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Methodological aspects of TransComb. Each edge in the splicing graph carries two numbers: the number above the edge is the coverage of the edge, and the circled number below is the index of the edge. The number on each node of the junction graph is the weight of that node, which equals the coverage of the corresponding edge in the splicing graph.
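The relationship between the two graphs in Fig. 1 can be illustrated with a minimal sketch. It assumes that each splicing-graph edge becomes one junction-graph node whose weight is the edge coverage (as the caption states), and, as an assumption not confirmed by this record, that two junction nodes are connected when the first edge ends at the exon where the second begins. The function name `build_junction_graph` and the exon labels are hypothetical, and the bin-packing and paired-end weighting described in the abstract is not modelled here.

```python
from collections import defaultdict


def build_junction_graph(splicing_edges):
    """Turn splicing-graph edges into junction-graph nodes.

    splicing_edges: list of (source_exon, target_exon, coverage) tuples.
    The list position of an edge serves as its index.
    Returns (node_weight, adjacency) for the junction graph.
    """
    node_weight = {}
    outgoing = defaultdict(list)  # exon -> indices of edges leaving that exon

    for idx, (src, _dst, coverage) in enumerate(splicing_edges):
        # Node weight = coverage of the corresponding splicing-graph edge.
        node_weight[idx] = coverage
        outgoing[src].append(idx)

    adjacency = {}
    for idx, (_src, dst, _coverage) in enumerate(splicing_edges):
        # Assumed connectivity: edge idx is followed by every edge
        # that starts at the exon where edge idx ends.
        adjacency[idx] = list(outgoing.get(dst, []))

    return node_weight, adjacency


# Hypothetical toy splicing graph with four exons and four edges.
edges = [
    ("e1", "e2", 10.0),
    ("e1", "e3", 4.0),
    ("e2", "e4", 9.0),
    ("e3", "e4", 5.0),
]
weights, adjacency = build_junction_graph(edges)
```

In this toy example, junction node 0 (edge e1→e2) links to node 2 (edge e2→e4), and node 1 links to node 3, so each path through the junction graph corresponds to a chain of consecutive splice junctions.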
Fig. 2 Comparison results on the simulated dataset. a Precision and recall values for each assembler. Solid circles/squares represent the precision and recall values derived using the default settings of the assemblers. Empty circles/squares represent the precision and recall values using non-default settings of the assemblers. The crossed circle represents the precision and recall values of TransComb when filtering its candidates to the same level as Bayesembler. b Recall distributions against transcript expression levels.
Fig. 3 Comparison results on the three real datasets: a human K562 cells, b human H1 cells, and c mouse dendritic cells. Solid circles/squares represent precision and recall values achieved using default settings. Empty circles/squares represent precision and recall values achieved using different filtering parameters. The crossed circle in c represents TransComb’s precision and recall values when filtering its candidates to the same level as StringTie and Cufflinks.
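The precision and recall values plotted in Figs. 2 and 3 can be illustrated with a small sketch. It assumes the common convention that an assembled transcript counts as correct when its intron chain exactly matches a reference transcript; the matching criterion actually used in the paper is not stated in this record, and the function name and coordinates below are hypothetical.

```python
def precision_recall(predicted, reference):
    """Compute (precision, recall) over two collections of transcripts.

    Each transcript is a hashable key, here a tuple of
    (chromosome, strand, intron chain). Exact key equality is the
    assumed matching criterion.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of assembled transcripts that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference transcripts that were recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data: introns are (start, end) pairs.
reference = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (350, 400))),
    ("chr2", "-", ((500, 600),)),
    ("chr2", "-", ((500, 650),)),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),  # matches a reference transcript
    ("chr2", "-", ((500, 600),)),             # matches a reference transcript
    ("chr2", "-", ((510, 600),)),             # no exact match
]
precision, recall = precision_recall(predicted, reference)
```

Here two of the three assembled transcripts match, and two of the four reference transcripts are recovered. Tightening an assembler's filtering parameters typically raises precision at the cost of recall, which is why the figures report points for both default and non-default settings.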
Fig. 4 CPU time and memory usage of the four assemblers on the human K562 cell dataset. a CPU times of the four assemblers. b Memory usage ranges of the four assemblers; the horizontal black lines represent the average memory usage for each assembler.