Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms.

Stanley Kimbung Mbandi, Uljana Hesse, Peter van Heusden, Alan Christoffels.

Abstract

BACKGROUND: De novo transcriptome assembly of short transcribed fragments (transfrags) produced by sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common-word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies.
RESULTS: Here, we propose a structured pipeline framework for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) a multiple-domain architecture annotation that complements BLAST and reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is evaluated on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags than the multiple k-mer assemblies. However, Trinity identified a number of predicted coding sequences and gene loci comparable to that of the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of the cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, and this represents only a small fraction of the annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represented sequences from intronic regions, untranslated (5' and 3') regions and non-coding gene loci.
CONCLUSIONS: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.
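
The first IFRAT step named in the abstract, removal of identical subsequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reverse_complement` and `remove_contained`, the toy sequences, and the choice to also check the reverse strand are all assumptions made for illustration. The idea is to keep a transfrag only if it is not an exact substring of a longer (or identical) transfrag that has already been kept.

```python
from __future__ import annotations


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def remove_contained(transfrags: dict[str, str]) -> dict[str, str]:
    """Drop transfrags that are exact subsequences of another transfrag.

    Hypothetical helper: sequences are processed longest first, so each
    candidate only needs to be compared against sequences already kept.
    Checking the reverse strand is an assumption, not something the
    abstract specifies.
    """
    # Longest first; ties broken by name so the result is deterministic.
    ordered = sorted(transfrags.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept: dict[str, str] = {}
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        # Exact containment on either strand marks the transfrag as redundant.
        if any(seq in other or rc in other for other in kept.values()):
            continue
        kept[name] = seq
    return kept


# Toy assembly: t2 is contained in t1, t3 is the reverse complement of t2.
assembly = {
    "t1": "ATGGCCAAGTTTGA",
    "t2": "GCCAAG",
    "t3": "CTTGGC",
    "t4": "ATGCCCGGGTAA",
}
non_redundant = remove_contained(assembly)
print(sorted(non_redundant))
```

The pairwise substring scan is quadratic in the number of transfrags, which is adequate for a small illustration but would need an index-based approach for a real assembly.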

Year: 2015 PMID: 25880035 PMCID: PMC4344733 DOI: 10.1186/s12859-015-0492-5

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
