Braulio Valdebenito-Maturana, Gonzalo Riadi.
Abstract
The first step in any genome research project, after obtaining the read data, is to perform proper quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and the 'best k-mer', before starting assembly tests with different de novo assembly software and parameters. However, quality control of the sequenced genome libraries as a whole, rather than of the reads alone, is frequently overlooked and is recognized as important only when the assembly tests fail to produce the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline that evaluates the relationship between k-mers and genome size as a means of quality assessment for sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script that runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER.
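To illustrate the general relationship between k-mers and genome size that the abstract refers to, here is a minimal sketch of the common k-mer counting approach. It is not GSER's R implementation. It assumes error-free bookkeeping in which multiplicities below a hypothetical `min_depth` cutoff are treated as sequencing errors, and it estimates genome size as G ≈ (total k-mers counted) / (peak k-mer depth). The function names `canonical`, `kmer_counts`, `kmer_histogram` and `estimate_genome_size` are illustrative only.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that both strands of the same locus are counted together."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def kmer_counts(reads, k):
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity (depth) to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as total k-mers / peak depth.

    Depths below min_depth (a hypothetical cutoff) are assumed to be dominated
    by sequencing errors and are ignored. Returns (estimate, peak_depth)."""
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(usable, key=lambda d: usable[d])   # main coverage peak
    total = sum(d * n for d, n in usable.items())  # total k-mer occurrences kept
    return total / peak, peak
```

For example, a histogram with 1000 singleton (likely erroneous) k-mers, 50 distinct k-mers at depth 10 and 5 at depth 20 gives a peak of 10 and an estimate of (10·50 + 20·5) / 10 = 60. Real pipelines usually rely on a dedicated k-mer counter and locate the error/coverage valley automatically rather than using a fixed cutoff.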
Keywords: genome assembly; genome library; genome size estimation; k-mer; quality control
Year: 2021 PMID: 34123359 PMCID: PMC8193462 DOI: 10.1098/rsfs.2020.0077
Source DB: PubMed Journal: Interface Focus ISSN: 2042-8898 Impact factor: 4.661