Specificity control for read alignments using an artificial reference genome-guided false discovery rate.

Sven H Giese, Franziska Zickmann, Bernhard Y Renard.

Abstract

MOTIVATION: Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset.
RESULTS: We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. Thereby, it can be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications.
AVAILABILITY: The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.
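
As a rough illustration of the idea described under RESULTS, the sketch below treats alignments of real reads to an artificial reference as a proxy for false positives and estimates a false discovery rate at each score cutoff. It is not ARDEN's implementation: the function name `fdr_curve`, the plain lists of alignment scores, and the simple ratio estimator (artificial hits divided by original-reference hits) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fdr_curve(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
) -> List[Tuple[float, int, int, float]]:
    """Estimate an FDR for every distinct score cutoff.

    target_scores: scores of alignments to the original reference (hypothetical input).
    decoy_scores:  scores of alignments to the artificial reference (hypothetical input).
    Returns rows of (cutoff, n_target, n_decoy, estimated_fdr), strictest cutoff first.
    """
    cutoffs = sorted(set(target_scores) | set(decoy_scores), reverse=True)
    rows = []
    for cutoff in cutoffs:
        # Count alignments that would be accepted at this cutoff.
        n_target = sum(score >= cutoff for score in target_scores)
        n_decoy = sum(score >= cutoff for score in decoy_scores)
        # Assumed estimator: artificial-reference hits approximate false positives.
        fdr = min(1.0, n_decoy / n_target) if n_target else 0.0
        rows.append((cutoff, n_target, n_decoy, fdr))
    return rows


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    for row in fdr_curve([40, 40, 30, 20, 10], [30, 10, 10]):
        print(row)
```

Scanning the returned rows for the loosest cutoff whose estimated FDR stays below a chosen limit is one way such a curve could be used to filter alignments or compare mapper settings.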

Year:  2013        PMID: 23685787     DOI: 10.1093/bioinformatics/btt255

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.

Authors:  Franziska Zickmann; Bernhard Y Renard
Journal:  Bioinformatics       Date:  2015-06-15       Impact factor: 6.937

2.  A tandem simulation framework for predicting mapping quality.

Authors:  Ben Langmead
Journal:  Genome Biol       Date:  2017-08-10       Impact factor: 13.583

3.  CADBURE: A generic tool to evaluate the performance of spliced aligners on RNA-Seq data.

Authors:  Praveen Kumar Raj Kumar; Thanh V Hoang; Michael L Robinson; Panagiotis A Tsonis; Chun Liang
Journal:  Sci Rep       Date:  2015-08-25       Impact factor: 4.379

4.  SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Authors:  Jérôme Audoux; Mikaël Salson; Christophe F Grosset; Sacha Beaumeunier; Jean-Marc Holder; Thérèse Commes; Nicolas Philippe
Journal:  BMC Bioinformatics       Date:  2017-09-29       Impact factor: 3.169
