Silvia Salatino, Varun Ramraj.
Abstract
Following variant calling and annotation, accurate variant filtering is a crucial step to extract meaningful information from sequencing data and to investigate disease aetiology. However, the variant call format (VCF) used to store this information is not easy to handle for non-bioinformaticians. We present BrowseVCF, a flexible and intuitive software to enable researchers to browse and filter millions of variants in a few seconds. Key features include querying user-defined gene lists, grouping samples for family or tumour/normal studies and exporting results in spreadsheet format. BrowseVCF's significant advantages over most existing tools include the ability to process data from any DNA sequencing experiment (exome, whole-genome and amplicons) and to correctly parse files annotated with Variant Effect Predictor. BrowseVCF can be used either locally on personal computers or as part of automated pipelines. Its user interface has been carefully designed to minimize tunable parameters. BrowseVCF is freely available from https://github.com/BSGOxford/BrowseVCF/releases/latest.
Keywords: VCF; exome sequencing; prioritization; variant analysis; variant filtering; whole-genome sequencing
Year: 2017 PMID: 27373737 PMCID: PMC5862253 DOI: 10.1093/bib/bbw054
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Screenshot of BrowseVCF step 2 (index creation). The drop-down menu lets the user browse through the annotation fields present in the input VCF file and select one or more fields to be used in filtering. A text box allows keyword searching across fields.
Figure 2. Screenshot of BrowseVCF step 3 (filtering). The left panel lists the five available filters, and expands to allow the user to define various filter options and cut-offs. The ‘Filter History’ panel keeps track of all the sequential filters applied to the initial data and shows the output number of variants. The right panel is the output area, which displays the top 100 variants resulting from each consecutive filter. Fields (shown as columns) can be sorted or hidden if desired.
Performance of BrowseVCF and VCF-Miner on exome and whole-genome data. Pre-processing times vary between operating systems due to implementation differences intrinsic to Python
| Tool | VCF file | Size (MB) | Variants | Samples | Step 1: Pre-processing (a), GNU/Linux | Step 1: Pre-processing (a), Windows | Step 1: Pre-processing (a), Mac OS | Step 2: Indexing (b), GNU/Linux | Step 2: Indexing (b), Windows | Step 2: Indexing (b), Mac OS | Step 3: Filtering (c), GNU/Linux | Step 3: Filtering (c), Windows | Step 3: Filtering (c), Mac OS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BrowseVCF | Exome_trio.vcf.gz | 22 | 100 014 | 3 | 0m 18s | 0m 22s | 0m 23s | 0m 35s (d) | 1m 2s (d) | 1m 11s (d) | 0m 14s | 0m 28s | 0m 25s |
| BrowseVCF | GIAB_v2.18.vcf.gz | 313 | 2 915 731 | 1 | 5m 8s | 6m 29s | 6m 45s | 9m 11s (d) | 37m 53s (d) | 39m 16s (d) | 3m 38s | 13m 9s | 11m 48s |
| BrowseVCF | 1000G_chr22_10kVariants.vcf.gz | 39 | 10 000 | 1092 | 0m 48s | 1m 8s | 1m 6s | 1m 18s (d) | 1m 46s (d) | 1m 45s (d) | 1m 5s | 2m 17s | 1m 1s |
| BrowseVCF | 1000G_chr22_20kVariants.vcf.gz | 76 | 20 000 | 1092 | 1m 31s | 2m 3s | 2m 11s | 2m 33s (d) | 3m 16s (d) | 3m 34s (d) | 1m 56s | 2m 27s | 2m 3s |
| VCF-Miner | Exome_trio.vcf.gz | 22 | 100 014 | 3 | 0m 58s | 0m 31s | 0m 53s | 1m 11s | 1m 10s | 1m 11s | 0m 2s | 0m 2s | 0m 3s |
| VCF-Miner | GIAB_v2.18.vcf.gz | 313 | 2 915 731 | 1 | 19m 26s | 14m 56s | 22m 1s | 52m 38s | 193m 11s | ### | 0m 13s | 0m 9s | ### |
| VCF-Miner | 1000G_chr22_10kVariants.vcf.gz | 39 | 10 000 | 1092 | 0m 40s | 0m 37s | 0m 40s | 0m 10s | 0m 8s | 0m 15s | 0m 2s | 0m 2s | 0m 3s |
| VCF-Miner | 1000G_chr22_20kVariants.vcf.gz | 76 | 20 000 | 1092 | 1m 13s | 1m 25s | 1m 9s | 0m 19s | 0m 15s | 1m 20s | 0m 2s | 0m 2s | 0m 4s |
Note: The ‘Windows’ machine had 8 GB RAM and bundled, non-optimized WinPython; the ‘GNU/Linux’ machine had 8 GB RAM and optimized system Python; the ‘Mac’ machine was a MacBook Pro laptop with 16 GB RAM and bundled Python. Other modules and libraries are at identical versions between the two systems. All operations were performed using only 1 CPU.
(a) Step required to convert the input VCF file to a format accepted by Wormtable.
(b) Wormtables generated for the following fields: CHROM + POS, ID, REF + ALT, QUAL, FILTER.
(c) Query executed on the FILTER field, keeping only PASS variants.
(d) These timings can be significantly improved by using multiple cores.
###: Data not available (see text).
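The Step 3 benchmark query (keep only variants whose FILTER field is PASS) can be illustrated with a minimal sketch in plain Python. This is not BrowseVCF's implementation, which queries pre-built Wormtable indexes; it is a simple line-by-line scan using only the standard library. The function names `open_vcf`, `iter_pass_variants` and `count_pass`, and the `DEMO_LINES` records, are hypothetical and exist only for illustration.

```python
import gzip

# Hypothetical demo records (not taken from the benchmark files).
DEMO_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT\t10\tLowQual\t.",
    "2\t300\trs1\tG\tA\t99\tPASS\t.",
]


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_pass_variants(lines):
    """Yield the tab-split fields of each record whose FILTER column is PASS."""
    for line in lines:
        # Skip meta-information lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        if len(fields) >= 8 and fields[6] == "PASS":
            yield fields


def count_pass(path):
    """Count PASS variants in a VCF file on disk (assumed helper)."""
    with open_vcf(path) as handle:
        return sum(1 for _ in iter_pass_variants(handle))


if __name__ == "__main__":
    for record in iter_pass_variants(DEMO_LINES):
        print(record[0], record[1], record[6])
```

A scan like this re-reads the whole file for every query, which is the cost that an indexing step is meant to avoid when several filters are applied in sequence.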