| Literature DB >> 20015970 |
Peter J A Cock1, Christopher J Fields, Naohisa Goto, Michael L Heuer, Peter M Rice.
Abstract
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.Entities:
Mesh:
Year: 2009 PMID: 20015970 PMCID: PMC2847217 DOI: 10.1093/nar/gkp1137
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The three described FASTQ variants, with columns giving the description, format name used in OBF projects, range of ASCII characters permitted in the quality string (in decimal notation), ASCII encoding offset, type of quality score encoded and the possible range of scores
| Description, OBF name | ASCII characters | Quality score | ||
|---|---|---|---|---|
| Range | Offset | Type | Range | |
| Sanger standard | ||||
| | 33–126 | 33 | PHRED | 0 to 93 |
| Solexa/early Illumina | ||||
| | 59–126 | 64 | Solexa | −5 to 62 |
| Illumina 1.3+ | ||||
| | 64–126 | 64 | PHRED | 0 to 62 |
Figure 1.Visual representation of the mapping between PHRED and Solexa quality scores. Vertical layout represents the probability of error on a log scale, therefore the PHRED points are equally spaced (black circles on left), while the Solexa points are not (white circles on right). Solid black lines are reciprocal mappings between scores, and grey arrows are lossy mappings. Near the top of the figure, the black lines are almost horizontal because the two scores are almost equal. The straightforward mapping of higher scores is omitted due to space.