
Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads.

Zechen Chong, Jue Ruan, Chung-I Wu.

Abstract

MOTIVATION: The restriction-site associated DNA sequencing (RAD-seq) method takes full advantage of next-generation sequencing technology. By clustering paired-end short reads into groups, each with its own unique tag, the RAD-seq assembly problem is divided into subproblems. Quickly and accurately clustering and assembling millions of RAD-seq reads in the presence of sequencing errors, varying levels of heterozygosity and repetitive sequences remains a challenging problem.
RESULTS: Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, Rainbow applies a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Next, along a guided tree, it iteratively merges sibling leaves in a bottom-up manner if they are similar enough, where similarity is defined by comparing the second reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble the merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated data and a real guppy RAD-seq dataset, we show that Rainbow is more competent than other tools at handling RAD-seq data.
AVAILABILITY: The source code of Rainbow, written in C, is freely available at http://sourceforge.net/projects/bio-rainbow/files/
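To make the first step in RESULTS more concrete, here is a minimal Python sketch of clustering reads with spaced seeds. It is an illustration under stated assumptions, not Rainbow's actual C implementation. The mask pattern `MASK`, the non-overlapping windowing, the `max_mismatches` threshold, and the function names `spaced_keys`, `hamming` and `cluster_reads` are all hypothetical choices made for this example. The idea it shows is that a read with a few sequencing errors can still share at least one seed key with its cluster, because mismatches either fall on ignored mask positions or leave another window intact.

```python
# Hypothetical mask: '1' positions are sampled into the key, '0' positions are ignored.
MASK = "110110110110"


def spaced_keys(read, mask=MASK):
    """Yield (offset, key) pairs from consecutive non-overlapping windows."""
    span = len(mask)
    for offset in range(0, len(read) - span + 1, span):
        window = read[offset:offset + span]
        key = "".join(base for base, m in zip(window, mask) if m == "1")
        yield offset, key


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches=2, mask=MASK):
    """Greedy clustering: a read joins the first cluster whose representative
    shares a spaced-seed key with it and differs by at most max_mismatches.
    Returns a list of clusters, each a list of indices into `reads`."""
    index = {}             # (offset, key) -> list of cluster ids
    representatives = []   # first read seen in each cluster
    clusters = []          # cluster id -> read indices
    for i, read in enumerate(reads):
        assigned = None
        checked = set()
        for entry in spaced_keys(read, mask):
            for cid in index.get(entry, ()):
                if cid in checked:
                    continue
                checked.add(cid)
                rep = representatives[cid]
                if len(rep) == len(read) and hamming(rep, read) <= max_mismatches:
                    assigned = cid
                    break
            if assigned is not None:
                break
        if assigned is None:
            # No candidate was close enough: start a new cluster and index its seeds.
            assigned = len(clusters)
            representatives.append(read)
            clusters.append([])
            for entry in spaced_keys(read, mask):
                index.setdefault(entry, []).append(assigned)
        clusters[assigned].append(i)
    return clusters
```

Calling `cluster_reads` on a list of equal-length read strings returns groups of read indices. Only reads that share a seed key are compared in full, which is what keeps the candidate search small. The later steps in the abstract (dividing groups into haplotypes, merging sibling leaves, and greedy local assembly) are not covered by this sketch.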


Year:  2012        PMID: 22942077     DOI: 10.1093/bioinformatics/bts482

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  28 in total

1.  DNA fingerprinting in botany: past, present, future.

Authors:  Hilde Nybom; Kurt Weising; Björn Rotter
Journal:  Investig Genet       Date:  2014-01-03

2.  Using Mendelian inheritance to improve high-throughput SNP discovery.

Authors:  Nancy Chen; Cristopher V Van Hout; Srikanth Gottipati; Andrew G Clark
Journal:  Genetics       Date:  2014-09-05       Impact factor: 4.562

3.  De novo clustering of long reads by gene from transcriptomics data.

Authors:  Camille Marchet; Lolita Lecompte; Corinne Da Silva; Corinne Cruaud; Jean-Marc Aury; Jacques Nicolas; Pierre Peterlongo
Journal:  Nucleic Acids Res       Date:  2019-01-10       Impact factor: 16.971

4.  Bartender: a fast and accurate clustering algorithm to count barcode reads.

Authors:  Lu Zhao; Zhimin Liu; Sasha F Levy; Song Wu
Journal:  Bioinformatics       Date:  2018-03-01       Impact factor: 6.937

5.  Deriving genotypes from RAD-seq short-read data using Stacks.

Authors:  Nicolas C Rochette; Julian M Catchen
Journal:  Nat Protoc       Date:  2017-11-30       Impact factor: 13.491

6.  MeShClust: an intelligent tool for clustering DNA sequences.

Authors:  Benjamin T James; Brian B Luczak; Hani Z Girgis
Journal:  Nucleic Acids Res       Date:  2018-08-21       Impact factor: 16.971

7.  De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm.

Authors:  Kristoffer Sahlin; Paul Medvedev
Journal:  J Comput Biol       Date:  2020-03-16       Impact factor: 1.479

8.  Species ecology explains the spatial components of genetic diversity in tropical reef fishes.

Authors:  Giulia Francesca Azzurra Donati; Niklaus Zemp; Stéphanie Manel; Maude Poirier; Thomas Claverie; Franck Ferraton; Théo Gaboriau; Rodney Govinden; Oskar Hagen; Shameel Ibrahim; David Mouillot; Julien Leblond; Pagu Julius; Laure Velez; Irthisham Zareer; Adam Ziyad; Fabien Leprieur; Camille Albouy; Loïc Pellissier
Journal:  Proc Biol Sci       Date:  2021-09-29       Impact factor: 5.530

9.  ezRAD: a simplified method for genomic genotyping in non-model organisms.

Authors:  Robert J Toonen; Jonathan B Puritz; Zac H Forsman; Jonathan L Whitney; Iria Fernandez-Silva; Kimberly R Andrews; Christopher E Bird
Journal:  PeerJ       Date:  2013-11-19       Impact factor: 2.984

10.  dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms.

Authors:  Jonathan B Puritz; Christopher M Hollenbeck; John R Gold
Journal:  PeerJ       Date:  2014-06-10       Impact factor: 2.984
