Melissa Y Yan (1), Betsy Ferguson (1, 2, 3), Benjamin N Bimber (1, 4). 1. Division of Genetics, Beaverton, OR, USA. 2. Division of Neuroscience, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR, USA. 3. Molecular and Medical Genetics Department, Oregon Health & Science University, Portland, OR, USA. 4. Division of Pathobiology, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR, USA.
Abstract
SUMMARY: Large-scale genomic studies produce millions of sequence variants, generating datasets far too large for manual inspection. To ensure that variant and genotype data are consistent and accurate, variants must be evaluated with quality control (QC) reports before downstream analysis. Variant call format (VCF) files are the standard format for representing variant data; however, generating summary statistics from these files is not always straightforward. Existing tools that summarize variant data generally produce simple text tables, which still require additional processing and interpretation. VariantQC fills this gap with a user-friendly, interactive visual QC report that generates and concisely summarizes statistics from VCF files. The report aggregates and summarizes variants by dataset, chromosome, sample and filter type. The VariantQC report is useful for high-level dataset summaries and quality control, and it helps flag outliers. Because VariantQC operates on standard VCF files, it can be easily integrated into many existing variant pipelines.

AVAILABILITY AND IMPLEMENTATION: DISCVRSeq's VariantQC tool is freely available as a Java program; the compiled JAR and source code can be downloaded from https://github.com/BimberLab/DISCVRSeq/. Documentation and example reports are available at https://bimberlab.github.io/DISCVRSeq/.
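To make the kind of aggregation described above concrete, the following is a minimal Python sketch that tallies VCF records per chromosome, per FILTER value, and per sample (counting non-missing genotype calls). It is an illustration under stated assumptions, not VariantQC's implementation (VariantQC itself is a Java program): the function name `summarize_vcf` is hypothetical, and the sketch assumes uncompressed VCF text with the standard tab-delimited column layout (CHROM first, FILTER seventh, FORMAT ninth, sample columns after that).

```python
from collections import Counter


def summarize_vcf(lines):
    """Hypothetical helper: tally records by chromosome, FILTER and sample.

    Assumes uncompressed VCF text lines with the standard column layout.
    Returns three Counters: records per CHROM, records per FILTER value,
    and non-missing genotype calls per sample.
    """
    samples = []
    by_chrom, by_filter, called = Counter(), Counter(), Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and "##" meta-information lines.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        # The "#CHROM" header line lists sample names from column 10 onward.
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        by_chrom[fields[0]] += 1
        by_filter[fields[6]] += 1
        # Sites-only VCFs have no FORMAT/sample columns.
        if len(fields) > 9:
            fmt_keys = fields[8].split(":")
            if "GT" in fmt_keys:
                gt_idx = fmt_keys.index("GT")
                for name, value in zip(samples, fields[9:]):
                    parts = value.split(":")
                    gt = parts[gt_idx] if gt_idx < len(parts) else "."
                    # Treat a genotype as a no-call only if every allele is ".".
                    alleles = gt.replace("|", "/").split("/")
                    if any(a != "." for a in alleles):
                        called[name] += 1
    return by_chrom, by_filter, called


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t./.",
        "chr1\t200\t.\tC\tT\t10\tLowQual\t.\tGT:DP\t1/1:12\t0/0:8",
        "chr2\t300\t.\tG\tA\t60\tPASS\t.\tGT\t0|1\t1|1",
    ]
    chrom, filt, called = summarize_vcf(example)
    print(dict(chrom))   # {'chr1': 2, 'chr2': 1}
    print(dict(filt))    # {'PASS': 2, 'LowQual': 1}
    print(dict(called))  # {'S1': 3, 'S2': 2}
```

A real QC report would also read bgzip-compressed input and compute many more statistics (for example variant types and transition/transversion ratios); the sketch only shows the grouping step that turns raw records into per-chromosome, per-filter and per-sample summaries.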