GSER (a Genome Size Estimator using R): a pipeline for quality assessment of sequenced genome libraries through genome size estimation.

Braulio Valdebenito-Maturana, Gonzalo Riadi.

Abstract

The first step in any genome research project, once the read data have been obtained, is quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and the 'best k-mer', before starting assembly tests with different de novo assembly software and parameters. However, quality control of the sequenced genome libraries as a whole, rather than of the reads alone, is frequently overlooked, and its importance is recognized only when the assembly tests fail to yield the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline that evaluates the relationship between k-mers and genome size as a means of assessing the quality of sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script that runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER.
© 2021 The Author(s).
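
To illustrate the relationship between k-mers and genome size that the abstract refers to, the following is a minimal sketch of a common textbook estimate: genome size is approximated as the total number of k-mers divided by the depth of the main peak of the k-mer frequency histogram. It is not GSER's implementation, which is written in R and is not described in detail here. The function names `kmer_histogram` and `estimate_genome_size` and the `min_depth` error cutoff are assumptions made for illustration only.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k along each read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    Multiplicities below min_depth are assumed to be sequencing errors
    and are ignored (a hypothetical cutoff; real data needs inspection).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the multiplicity shared by the most distinct k-mers.
    peak_depth = max(usable, key=lambda d: usable[d])
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Synthetic histogram: 500 error k-mers seen once,
    # 1000 genomic k-mers seen 20 times each.
    print(estimate_genome_size({1: 500, 20: 1000}))  # prints 1000.0
```

A library whose histogram lacks a clear main peak, or whose estimate shifts strongly as k changes, would be a candidate for closer inspection before assembly.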

Keywords:  genome assembly; genome library; genome size estimation; k-mer; quality control

Year:  2021        PMID: 34123359      PMCID: PMC8193462          DOI: 10.1098/rsfs.2020.0077

Source DB:  PubMed          Journal:  Interface Focus        ISSN: 2042-8898            Impact factor:   4.661

