Abstract
BACKGROUND: Next generation sequencing datasets are stored as FASTQ formatted files. To avoid downstream artefacts, it is critical to implement a robust preprocessing protocol for FASTQ sequences that determines the integrity and quality of the data.
Keywords: FASTQ; NGS; Sequencing
Year: 2017 PMID: 28701181 PMCID: PMC5508660 DOI: 10.1186/s13104-017-2616-7
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 a Performance testing of fastQ_brew. FASTQ formatted files containing different numbers of reads (110 MB [462,664 reads] to 4.5 GB [24,159,698 reads]) were provided as input to fastQ_brew and run using default settings to return summary statistics for each dataset. b Relationship between nucleotide position and Phred quality score. fastQ_brew was used to determine the average Phred quality score from a FASTQ dataset comprising 462,664 reads after the length trimming method was invoked to trim each read from position 1–20. Quality was negatively correlated with nucleotide position. c The quality filter method within fastQ_brew was tested by plotting the Phred scores before (blue bars) and after (red bars) quality filtering. After filtering, the distribution of reads shifted towards higher Phred quality values. d Execution speed of commonly used FASTQ filtering tools was compared with that of fastQ_brew. For all analyses, the same file and trimming task were used. The following tools were compared: fastQ_brew ver 1.0.2, trimmomatic ver 0.36, NGSQCToolkit ver 2.3.3, Prinseq ver 0.20.4, seqtk, Fastxtoolkit ver 0.0.13, ngsShoRT ver 2.2, BBDuk ver 37.22, and Cutadapt ver 1.9.1
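The per-position averaging in panel b and the quality filtering in panel c can be illustrated with a minimal Python sketch. This is not fastQ_brew's implementation. It assumes well-formed four-line FASTQ records and Phred+33 (Sanger/Illumina 1.8+) quality encoding, and it filters on mean read quality, which may differ from the criterion fastQ_brew applies. All function names and the `min_mean` threshold are hypothetical, chosen only for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode an ASCII quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from four-line FASTQ records.

    Assumes each record is exactly four lines; a truncated final record
    is silently dropped.
    """
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines in fours.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def mean_quality_by_position(records):
    """Return the mean Phred score at each nucleotide position."""
    totals, counts = [], []
    for _header, _seq, qual in records:
        for i, q in enumerate(phred_scores(qual)):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def filter_by_mean_quality(records, min_mean=20):
    """Keep reads whose mean Phred score is at least min_mean (hypothetical criterion)."""
    for record in records:
        scores = phred_scores(record[2])
        if scores and sum(scores) / len(scores) >= min_mean:
            yield record
```

In practice, `parse_fastq` could be passed an open file handle, for example `parse_fastq(open("reads.fastq"))`, where `reads.fastq` is a placeholder file name. Plotting the output of `mean_quality_by_position` against position gives a curve of the kind described in panel b.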