Steven N Hart, Patrick Duffy, Daniel J Quest, Asif Hossain, Mike A Meiners, Jean-Pierre Kocher.
Abstract
Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotation stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. It is not limited to human data, but can also be extended to include non-diploid organisms. It also supports copy number or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.
Keywords: VCF; analysis; bioinformatics; genomics; software; user interface
Year: 2015 PMID: 26210358 PMCID: PMC4793895 DOI: 10.1093/bib/bbv051
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Screenshot of VCF-Miner. The left panel shows a running tabulation of filters applied and the number of variants remaining. A pop-up dialog appears when the user clicks the ‘Add Filter’ button. The right panel consists of a tabular representation of the results. Users can choose which columns to show and hide, and when ready, a tab-delimited file of the selected filtered data and annotations can be exported.
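The stepwise filtering and running tabulation described for the left panel can be illustrated with a minimal Python sketch. It is not VCF-Miner's implementation (the tool runs on a MongoDB engine); the helper names `parse_info` and `apply_filters`, the toy records, and the `MY_LIT_NOTE` annotation key are all hypothetical and only stand in for a user-supplied annotation already present in the VCF INFO column.

```python
from typing import Any, Callable, Dict, List, Tuple

Variant = Dict[str, Any]


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=35;AF=0.5;DB' into a dict.

    Flag entries without '=' (for example 'DB') get the value 'true'.
    """
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "true"
    return fields


def apply_filters(
    variants: List[Variant],
    filters: List[Tuple[str, Callable[[Variant], bool]]],
) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply named filters in order; return survivors and a running tally."""
    tally = [("(all variants)", len(variants))]
    remaining = list(variants)
    for name, keep in filters:
        remaining = [v for v in remaining if keep(v)]
        tally.append((name, len(remaining)))
    return remaining, tally


# Toy records; MY_LIT_NOTE is a hypothetical user-supplied annotation.
raw = [
    ("chr1", 1000, "DP=35;AF=0.50;MY_LIT_NOTE=reported"),
    ("chr1", 2000, "DP=8;AF=0.10"),
    ("chr2", 3000, "DP=60;AF=0.02;MY_LIT_NOTE=reported"),
    ("chr2", 4000, "DP=42;AF=0.30;DB"),
]
variants = [{"chrom": c, "pos": p, "info": parse_info(i)} for c, p, i in raw]

filters = [
    ("DP >= 10", lambda v: float(v["info"].get("DP", 0)) >= 10),
    ("MY_LIT_NOTE present", lambda v: "MY_LIT_NOTE" in v["info"]),
]

kept, tally = apply_filters(variants, filters)
for name, count in tally:
    print(f"{name}: {count}")
```

Each filter narrows the set left by the previous one, so the tally mirrors the idea of a running count of remaining variants; existing annotations are read from the file rather than recomputed.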
Figure 2. Custom logic filtering. This figure demonstrates how to construct filters across groups of samples. Group 1 consists of nine samples. With the default setup, one could restrict variants to those present in Group 1. Changing the genotype option to heterozygous returns only variants that are heterozygous in at least one sample. To return only variants that are heterozygous in all nine samples, the sample status would be changed to ‘In all samples’. The alternate allele depth filter lets the user specify the minimum number of reads supporting a variant, provided the VCF contains an AD field (see text for more details).
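The group logic in this caption (heterozygous in any sample versus in all samples, with an optional minimum alternate allele depth) can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the function names, the dictionary layout of the calls, and the choice to take the largest alternate depth when several alternate alleles are listed in AD are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# One call per sample: "GT" is the genotype string and "AD" the per-allele
# read depths (reference first, then alternates), absent if the VCF has no AD.
SampleCall = Dict[str, object]


def is_heterozygous(gt: str) -> bool:
    """True when a fully called genotype carries at least two distinct alleles."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1


def alt_depth(ad: Optional[Sequence[int]]) -> Optional[int]:
    """Largest alternate-allele depth, or None when AD is missing (assumption)."""
    if not ad or len(ad) < 2:
        return None
    return max(ad[1:])


def passes_group_filter(
    calls: Dict[str, SampleCall],
    group: List[str],
    *,
    in_all_samples: bool,
    min_alt_depth: int = 0,
) -> bool:
    """Check one variant against a heterozygous filter over a sample group."""

    def sample_ok(name: str) -> bool:
        call = calls.get(name)
        if call is None or not is_heterozygous(str(call["GT"])):
            return False
        if min_alt_depth > 0:
            depth = alt_depth(call.get("AD"))  # type: ignore[arg-type]
            return depth is not None and depth >= min_alt_depth
        return True

    combine = all if in_all_samples else any
    return combine(sample_ok(name) for name in group)


# Toy variant with three samples (hypothetical values).
calls = {
    "S1": {"GT": "0/1", "AD": [12, 9]},
    "S2": {"GT": "0|1", "AD": [20, 3]},
    "S3": {"GT": "1/1", "AD": [0, 25]},
}

print(passes_group_filter(calls, ["S1", "S2", "S3"], in_all_samples=False))
print(passes_group_filter(calls, ["S1", "S2", "S3"], in_all_samples=True))
print(passes_group_filter(calls, ["S1", "S2"], in_all_samples=True, min_alt_depth=5))
```

Switching `in_all_samples` corresponds to the ‘In all samples’ setting in the caption; when a depth threshold is requested but AD is absent, this sketch rejects the sample rather than guessing.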
Benchmark results for loading three different VCF files into VCF-Miner
| File | Size (MB) | Variants | Format and info fields | Samples | Load time, PC1 (min) | Load time, PC2 (min) |
|---|---|---|---|---|---|---|
| Genome In a Bottle.vcf.gz | 398 | 3 315 166 | 84 | 1 | 147 | 105 |
| 1KG.chr22.anno.infocol.vcf.gz | 980 | 348 110 | 124 | 629 | 32.5 | 42.4 |
| 1KG.chr22.anno.vcf.gz | 918 | 346 660 | 19 | 629 | 29 | 39.7 |
| HG00098.anno.vcf.gz | 1.9 | 46 311 | 110 | 1 | 0.7 | 0.6 |
| HG00098.vcf.gz | 1.2 | 46 065 | 23 | 1 | 0.3 | 0.3 |
| 1KG.chr22.anno.20kLines.vcf.gz | 57 | 19 876 | 124 | 629 | 1.8 | 2.5 |
| 1KG.chr22.anno.10kLines.vcf.gz | 28 | 9981 | 19 | 629 | 0.8 | 1.4 |
| 1KG.chr22.anno.infocol.10kLines.vcf.gz | 29 | 9876 | 124 | 629 | 0.9 | 1.1 |
Note. PC1 runs Ubuntu Linux v12.04, with an AMD64 CPU at 1400 MHz and 96 GB RAM.
PC2 is a laptop running Windows 7 Professional, with an Intel Core i5-4300 CPU at 1.90 GHz and 8 GB RAM.
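One rough way to compare rows of the benchmark table is to convert each load time into variants loaded per minute. The short Python sketch below does this for three rows copied from the table; the helper name `variants_per_minute` is hypothetical, and the derived rates are simple arithmetic on the reported figures, not additional measurements.

```python
# (file, variants, samples, PC1 load minutes, PC2 load minutes), copied from the table
rows = [
    ("Genome In a Bottle.vcf.gz", 3_315_166, 1, 147.0, 105.0),
    ("1KG.chr22.anno.vcf.gz", 346_660, 629, 29.0, 39.7),
    ("HG00098.vcf.gz", 46_065, 1, 0.3, 0.3),
]


def variants_per_minute(variants: int, minutes: float) -> float:
    """Average loading rate implied by a variant count and a load time."""
    return variants / minutes


for name, variants, samples, pc1, pc2 in rows:
    rate1 = variants_per_minute(variants, pc1)
    rate2 = variants_per_minute(variants, pc2)
    print(f"{name}: samples={samples}, PC1={rate1:,.0f}/min, PC2={rate2:,.0f}/min")
```

Because the files differ in sample count and in the number of format and info fields, these rates are only a coarse summary and should be read alongside those columns.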