Abstract
Many Next Generation Sequencing (NGS) analyses involve basic manipulation of input sequence data before downstream processing (e.g. searching for specific sequences, format conversion or basic file statistics). The rapidly increasing data volumes involved in NGS make any dataset manipulation a time-consuming and error-prone process. I have developed fqtools, a fast and reliable FASTQ file manipulation suite that can process the full set of valid FASTQ files, including those with multi-line sequences, whilst identifying invalid files. Fqtools is faster than similar tools, and is designed for use in automatic processing pipelines.
Year: 2016 PMID: 27153699 PMCID: PMC4908325 DOI: 10.1093/bioinformatics/btw088
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Commands present in the fqtools suite
| Command | Description |
|---|---|
| view | View FASTQ files |
| head | View the first reads in FASTQ files |
| count | Count FASTQ file reads |
| header | View FASTQ file header data |
| sequence | View FASTQ file sequence data |
| quality | View FASTQ file quality data |
| header2 | View FASTQ file secondary header data |
| fasta | Convert FASTQ files to FASTA format |
| basetab | Tabulate FASTQ base frequencies |
| qualtab | Tabulate FASTQ quality character frequencies |
| lengthtab | Tabulate FASTQ read lengths |
| type | Attempt to guess the FASTQ quality encoding type |
| validate | Validate FASTQ files |
| find | Find FASTQ reads containing specific sequences |
| trim | Trim reads in a FASTQ file |
| qualmap | Translate quality values using a mapping file |
The supplementary information contains a full description of each command.
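Several of the commands above (count, basetab, lengthtab, validate) depend on splitting a file into reads correctly, which is harder than it looks when sequence and quality strings wrap over several lines. The Python sketch below illustrates one way to do that; it is not fqtools's implementation, and the names `Read` and `parse_fastq`, as well as the decision to tolerate blank lines between records, are assumptions made for illustration only.

```python
import io
from collections import Counter
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    header: str      # text after '@'
    sequence: str    # sequence lines joined together
    header2: str     # text after '+' (often empty)
    quality: str     # quality lines joined together


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream, allowing multi-line records.

    Raises ValueError when the input is malformed.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    for line in lines:
        if not line:
            continue  # assumption: blank lines between records are ignored
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        header = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        header2 = None
        for line in lines:
            if line.startswith("+"):
                header2 = line[1:]
                break
            seq_parts.append(line)
        if header2 is None:
            raise ValueError(f"read {header!r}: missing '+' separator")
        sequence = "".join(seq_parts)

        # A quality line may itself start with '@' or '+', so the quality
        # block is delimited by length, not by looking for the next marker.
        quality = ""
        while len(quality) < len(sequence):
            chunk = next(lines, None)
            if chunk is None:
                raise ValueError(f"read {header!r}: truncated quality string")
            quality += chunk
        if len(quality) != len(sequence):
            raise ValueError(f"read {header!r}: sequence/quality length mismatch")
        yield Read(header, sequence, header2, quality)


# Two reads; the first wraps over two lines and has a quality line starting with '@'.
SAMPLE = "@r1\nACGT\nTTGA\n+\nIIII\n@III\n@r2\nGG\n+r2\n!!\n"

reads = list(parse_fastq(io.StringIO(SAMPLE)))
read_count = len(reads)                                  # analogous to "count"
base_counts = Counter("".join(r.sequence for r in reads))  # analogous to "basetab"
length_counts = Counter(len(r.sequence) for r in reads)    # analogous to "lengthtab"
```

Delimiting the quality block by length is the key design choice here: a parser that simply looks for the next line beginning with `@` would misread the second quality line of `r1` as a new header.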
FASTQ processing tools overview
| Tool | Valid | Invalid | Process .gz | Plain (reads/s) | Compressed (reads/s) |
|---|---|---|---|---|---|
| fqtools | Y | Y | R+W | 701 375 | 444 648 |
| bash | — | — | R+W | 2 605 421 | 934 331 |
| bioawk | Y | N | R | 434 632 | 312 708 |
| seqtk | Y | N | R | 1 122 355 | 545 865 |
| fast | Y | Y | — | 2984 | — |
| fastx-toolkit | N | N | — | 69 762 | — |
| seqmagick | Y | Y | R+W | 25 325 | 4000 |
Benchmark data for various FASTQ processing tools. All tools were installed locally and run against the complete test set (Cock ). Valid shows whether every valid test file was processed correctly. Invalid shows whether the tool identified every invalid file. Process .gz shows whether the tool can natively read (R) and write (W) gzip-compressed files. The speed columns give throughput in reads per second for plain and gzip-compressed input.
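To make the reads-per-second metric concrete, the following Python sketch times a read count over a plain and a gzip-compressed copy of the same data. It is not the benchmark harness used for the table: the function names, the demo file names and the synthetic records are all hypothetical, and the counter assumes strict four-line records, so it would miscount the multi-line files discussed above.

```python
import gzip
import os
import tempfile
import time


def open_fastq(path):
    """Open a FASTQ file as text, using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_four_line_reads(path):
    """Count reads, assuming every record occupies exactly four lines."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4


def reads_per_second(path):
    """Return (number of reads, reads processed per second of wall-clock time)."""
    start = time.perf_counter()
    n_reads = count_four_line_reads(path)
    elapsed = time.perf_counter() - start
    rate = n_reads / elapsed if elapsed > 0 else float("inf")
    return n_reads, rate


def write_demo_files(directory, n_reads=1000):
    """Write the same synthetic reads as plain and gzip-compressed FASTQ."""
    record = "@read\nACGTACGT\n+\nIIIIIIII\n"
    plain = os.path.join(directory, "demo.fq")
    packed = os.path.join(directory, "demo.fq.gz")
    with open(plain, "w") as out:
        out.write(record * n_reads)
    with gzip.open(packed, "wt") as out:
        out.write(record * n_reads)
    return plain, packed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_demo_files(tmp):
            n, rate = reads_per_second(path)
            print(f"{os.path.basename(path)}: {n} reads, {rate:,.0f} reads/s")
```

Absolute numbers from a sketch like this depend heavily on the machine, the file size and disk caching, so they are not comparable with the figures in the table.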