
CAFU: a Galaxy framework for exploring unmapped RNA-Seq data.

Siyuan Chen, Chengzhi Ren, Jingjing Zhai, Jiantao Yu, Xuyang Zhao, Zelong Li, Ting Zhang, Wenlong Ma, Zhaoxue Han, Chuang Ma.

Abstract

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, because purpose-built analytical systems are lacking, short reads that do not map to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that facilitates the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. Using machine learning techniques, CAFU addresses the problem of accurately identifying the species origin of transcripts assembled from unmapped reads in mixed-species samples. CAFU also provides a comprehensive collection of functions for transcript confidence evaluation, coding potential calculation, sequence and expression characterization, and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU through a user-friendly interface, greatly simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and maize (Zea mays) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.
© The Author(s) 2019. Published by Oxford University Press.

Keywords: Galaxy; RNA-Seq; machine learning; pipeline; unmapped reads; workflow

Year: 2020; PMID: 30815667; PMCID: PMC7299299; DOI: 10.1093/bib/bbz018

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622


