Gregg W C Thomas, Matthew W Hahn.
Abstract
Genome assemblies from next-generation sequencing technologies are now an integral part of biological research, but many sequencing and assembly processes are still error-prone. Unfortunately, these errors can propagate to downstream analyses and wreak havoc on results and conclusions. Although such errors are recognized when dealing with diploid genotype data, modern reference assemblies (which are represented as haploid sequences) lack any type of succinct quality assessment for every position. Here we present Referee, a program that uses diploid genotype quality information to annotate a haploid assembly with a quality score for every position. Referee aims to provide an assembly with concise quality information on a Phred-like scale in FASTQ format for easy filtering of low-quality sites. Referee also provides output of quality scores in BED format that can be easily visualized as tracks on most genome browsers. Referee is freely available at https://gwct.github.io/referee/.
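The abstract describes turning diploid genotype information into a per-base, Phred-like quality score and writing it out as FASTQ. The Python sketch below illustrates one way such a score could be derived. It is not Referee's published formula. It assumes a flat prior over the ten diploid genotypes, treats the posterior probability of genotypes that lack the reference base as the error probability, caps scores at a hypothetical `cap` of 90, and uses the standard Phred+33 ASCII encoding for FASTQ quality strings. The names `phred_like_score` and `fastq_record` are illustrative.

```python
import math

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = [a + b for i, a in enumerate(BASES) for b in BASES[i:]]


def phred_like_score(log10_likelihoods, ref_base, cap=90):
    """Hypothetical Phred-like quality score for one reference base.

    log10_likelihoods maps each genotype string (e.g. "AG") to log10 P(data | G).
    Assumes a flat genotype prior, so posteriors are normalized likelihoods, and
    treats the posterior mass of genotypes lacking ref_base as the error probability.
    The cap value is an assumption for illustration.
    """
    # Subtract the maximum before exponentiating so the best genotype has weight 1.
    best = max(log10_likelihoods.values())
    weights = {g: 10 ** (ll - best) for g, ll in log10_likelihoods.items()}
    total = sum(weights.values())
    p_error = sum(w for g, w in weights.items() if ref_base not in g) / total
    if p_error == 0:
        return cap
    # Phred scale: Q = -10 * log10(p_error), clamped to [0, cap].
    return max(0, min(cap, round(-10 * math.log10(p_error))))


def fastq_record(name, seq, scores):
    """Format one FASTQ record using the standard Phred+33 quality encoding."""
    qual = "".join(chr(33 + q) for q in scores)
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

For example, if genotype `AA` is 10^10 times more likely than every other genotype, a reference `A` scores at the cap, while a reference `C` scores 0 because nearly all posterior mass sits on a genotype without it.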
Keywords: bioinformatics; genomics; quality scores
Year: 2019 PMID: 31028392 PMCID: PMC6535810 DOI: 10.1093/gbe/evz088
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Referee Score Special Cases

| Scenario | Score |
|---|---|
|  | 91 |
| Reference base called as | −1 |
| No reads mapped to site | −2 |
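Special values like these are easiest to handle as sentinels at the point where a site's score is assigned, before scores are written out for a genome browser. The Python sketch below applies only the fully specified case from the table (-2 when no reads map to a site), since the scenarios for 91 and -1 are incomplete here. It then collapses runs of identical scores into four-column, bedGraph-style intervals (chrom, 0-based start, exclusive end, score). That column layout and the run-length merging are assumptions, not a description of Referee's actual BED output, and `site_score` and `bed_lines` are hypothetical names.

```python
NO_READS = -2  # special score from the table: no reads mapped to the site


def site_score(depth, score):
    """Return the special no-coverage score, otherwise the computed score."""
    return NO_READS if depth == 0 else score


def bed_lines(chrom, scores):
    """Collapse runs of identical per-position scores into BED-style intervals.

    scores[i] is the score of 0-based position i. Each output line holds
    chrom, start (0-based, inclusive), end (exclusive), and score, tab-separated.
    """
    lines = []
    start = 0
    for i in range(1, len(scores) + 1):
        # Close the current run at the end of the list or when the score changes.
        if i == len(scores) or scores[i] != scores[start]:
            lines.append(f"{chrom}\t{start}\t{i}\t{scores[start]}")
            start = i
    return lines
```

Merging runs keeps track files small for long stretches of uniformly high-quality sequence; writing one line per base would also be valid BED but much larger.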
Fig. 1. Reference quality scores visualized for a 100,000-bp stretch of chromosome 19 in the baboon genome (Papio anubis v2.0) on the UCSC Genome Browser (http://genome.ucsc.edu).
Fig. 2. Referee’s run time (A) and maximum memory usage (B) on the Jaltomata sinuosa transcriptome and the baboon genome. Note the lower memory usage for the baboon genome data compared with the J. sinuosa transcriptome data, which results from splitting the input files by chromosome. ANGSD: genotype log-likelihoods were precalculated with the ANGSD software package (Korneliussen et al. 2014) and given to Referee as input; pileup: a pileup was created from the mapped reads, and Referee calculated genotype likelihoods from it using only the reads' base quality scores; pileup (w/mapping quality): the reads' mapping qualities were included in the pileup and incorporated into Referee’s genotype likelihood calculations.
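The caption contrasts genotype likelihoods computed from base qualities alone with likelihoods that also incorporate mapping qualities. The Python sketch below uses a common textbook diploid model to show how that difference could enter the calculation; Referee's exact model may differ. Each read base is assumed to come from either allele with probability 1/2, a mismatch to an allele is split evenly over the three other bases, and, when mapping qualities are supplied, base and mapping error probabilities are combined as 1 - (1 - e_base)(1 - e_map). The function names and the clamp of the error rate at 0.75 (which makes a quality-0 read uninformative instead of producing log10(0)) are illustrative choices.

```python
import math

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = [a + b for i, a in enumerate(BASES) for b in BASES[i:]]
MAX_ERROR = 0.75  # at e = 0.75 every base is equally likely, so the read is uninformative


def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def genotype_log10_likelihoods(bases, base_quals, map_quals=None):
    """Return log10 P(pileup bases | G) for each diploid genotype G."""
    lls = {g: 0.0 for g in GENOTYPES}
    for i, b in enumerate(bases):
        e = phred_to_prob(base_quals[i])
        if map_quals is not None:
            # Assumed model: a base is correct only if the read is both
            # correctly mapped and correctly sequenced at this position.
            e = 1 - (1 - e) * (1 - phred_to_prob(map_quals[i]))
        e = min(e, MAX_ERROR)
        for g in GENOTYPES:
            # Average over the two alleles; a mismatch is split over 3 other bases.
            p = sum((1 - e) if b == a else e / 3 for a in g) / 2
            lls[g] += math.log10(p)
    return lls
```

With four reads all showing `A` at base quality 30, `AA` is the most likely genotype; adding a mapping quality of 10 to each read narrows the gap between `AA` and `AC`, reflecting the reduced confidence that the reads belong at this site.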