Xiaolu H Sturgeon, Katheleen J Gardiner.
Abstract
When applied to complex transcript datasets, current tools for automated assembly of mRNA sequences require long run times and produce exponentially increasing numbers of splice variants. Here, we describe RCDA, a genome-based transcript assembly tool comprising RCluster, which recursively clusters transcripts, and DAssemble, which generates composite transcript sequences through path-finding on a directed acyclic graph. Each exon included in a final transcript is associated with an array of all upstream consecutive exon structures obtained from the original transcripts. When a depth-first-search path reaches an exon, the path is retained only if it contains a structure from that exon's array. RCDA assemblies, therefore, include only those transcripts with experimentally supported exon patterns. When applied to >23,000 transcripts from human chromosome 21, using biologically reasonable filters, RCDA execution time was approximately 4 h. RCDA outperformed ECgene in reconstructing RefSeq transcripts and in limiting both the total number of transcripts and the number of transcripts per gene.
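The path-filtering idea described above can be illustrated with a minimal Python sketch. It is not the RCDA or DAssemble implementation: the function names (`build_graph`, `assemble`), the representation of a transcript as a list of exon labels, and the rule that a path "contains a structure" when it ends with one of the exon's recorded upstream structures are all assumptions made for illustration. The sketch also assumes that exon order is consistent across transcripts, so the graph is acyclic.

```python
from collections import defaultdict


def build_graph(transcripts):
    """Build exon adjacency and per-exon upstream structures (hypothetical layout)."""
    edges = defaultdict(set)    # exon -> exons that directly follow it in some transcript
    support = defaultdict(set)  # exon -> upstream exon structures ending at that exon
    for transcript in transcripts:
        for i, exon in enumerate(transcript):
            # Assumption: the "structure" is the transcript prefix up to this exon.
            support[exon].add(tuple(transcript[: i + 1]))
            if i + 1 < len(transcript):
                edges[exon].add(transcript[i + 1])
    return edges, support


def assemble(transcripts):
    """Depth-first search that keeps only paths with observed upstream structures."""
    edges, support = build_graph(transcripts)
    targets = {nxt for successors in edges.values() for nxt in successors}
    sources = sorted(set(support) - targets)  # exons with no upstream neighbour
    results = []

    def supported(path):
        # Keep the path only if it ends with a structure recorded for its last exon.
        return any(path[-len(s):] == s for s in support[path[-1]])

    def dfs(path):
        if not supported(path):
            return False
        extended = False
        for nxt in sorted(edges.get(path[-1], ())):
            if dfs(path + (nxt,)):
                extended = True
        if not extended:
            # No supported extension exists, so the path is emitted as it stands.
            results.append(path)
        return True

    for source in sources:
        dfs((source,))
    return results


# Overlapping fragments merge into one composite transcript.
merged = assemble([["A", "B", "C"], ["B", "C", "D"]])

# D was only ever observed downstream of X-C, so A-B-C-D is not produced.
filtered = assemble([["A", "B", "C"], ["X", "C", "D"]])
```

In the second call, an unfiltered walk of the graph would also emit A-B-C-D, a combination that no input transcript supports. Pruning against each exon's recorded structures is what keeps the number of assembled variants from growing with every possible exon combination.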
Year: 2012 PMID: 22971325 PMCID: PMC5470730 DOI: 10.1016/j.ygeno.2012.08.004
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736