G A Tollefson, J Schuster, F Gelin, A Agudelo, A Ragavendran, I Restrepo, P Stey, J Padbury, A Uzun.
Abstract
High-throughput sequencing produces an extraordinary amount of genomic data that is organized into a number of high-dimensional datasets. Accordingly, visualization of genomic data has become essential for quality control, exploration, and data interpretation. The Variant Call Format (VCF) is a text file format generated during the variant calling process that contains genomic information and locations of variants in a group of sequenced samples. The current workflow for visualization of genomic variant data from VCF files requires a combination of existing tools. Here, we describe VIVA (VIsualization of VAriants), a command line utility and Jupyter Notebook-based tool for evaluating and sharing genomic data for variant analysis and quality control of sequencing experiments from VCF files. VIVA combines the functionality of existing tools into a single command to interactively evaluate and share genomic data, as well as create publication-quality graphics.
Year: 2019 PMID: 31477778 PMCID: PMC6718772 DOI: 10.1038/s41598-019-49114-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparative workflow. We present a general workflow for filtering, extracting, and visualizing variants from VCF files. This comparative workflow shows that while VIVA can perform filtering, extraction, phenotype annotation, and plotting in a single command, other existing tools need additional intermediate steps that require computational skills.
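The filtering and extraction steps in this workflow can be illustrated with a minimal Python sketch. It is not VIVA's implementation (VIVA is written in Julia); the function names `parse_vcf` and `select_records`, the embedded three-variant VCF text, and the sample names `S1` and `S2` are all hypothetical and exist only for illustration. The sketch assumes a well-formed, tab-delimited VCF with a `FORMAT` column.

```python
# Minimal sketch (assumption: well-formed, tab-delimited VCF text).
# parse_vcf / select_records are hypothetical helpers, not part of VIVA.

VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:30
chr1\t250\t.\tC\tT\t10\tLowQual\t.\tGT:DP\t0/0:5\t0/1:8
chr2\t400\t.\tG\tA\t60\tPASS\t.\tGT:DP\t0/0:20\t./.:0
"""


def parse_vcf(text):
    """Return (sample_names, records) parsed from VCF text."""
    samples, records = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        keys = fields[8].split(":")  # e.g. ["GT", "DP"]
        per_sample = {
            name: dict(zip(keys, value.split(":")))
            for name, value in zip(samples, fields[9:])
        }
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "filter": fields[6],
            "samples": per_sample,
        })
    return samples, records


def select_records(records, chrom, start, end, pass_only=True):
    """Keep records inside chrom:start-end, optionally only FILTER == PASS."""
    return [
        r for r in records
        if r["chrom"] == chrom
        and start <= r["pos"] <= end
        and (not pass_only or r["filter"] == "PASS")
    ]


samples, records = parse_vcf(VCF_TEXT)
kept = select_records(records, "chr1", 1, 1000)
for r in kept:
    print(r["chrom"], r["pos"], {s: r["samples"][s]["GT"] for s in samples})
```

Keeping the range filter and the PASS filter in one predicate mirrors the idea of combining several filters in a single step; a real workflow would stream a large file rather than hold it in memory.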
Comparison of features for VCF filtering and visualization tools.
| Categories of Features | Features | VIVA | VCFtools | GEMINI | BrowseVCF | VCF.Filter | VCF-Miner | VCF-Server | vcfR | IGV |
|---|---|---|---|---|---|---|---|---|---|---|
| Technical Details | One-step command | ✓ | ✓ | ✓ | ||||||
| Standalone software | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| | Environment (OS) | Windows, Mac, Linux | Windows, Mac, Linux | Windows, Mac, Linux | Windows, Mac, Linux | Windows, Mac, Linux | Windows | Windows, Mac, Linux | Windows, Mac, Linux | Windows, Mac, Linux |
| | Language | Julia | C++, Perl | Python | Python, JavaScript, CSS, HTML5 | Java | Java | C, PERL-CGI, JavaScript | R | Java |
| | Interface | Command Line, Jupyter Notebook | Command Line | Command Line, Web Browser | GUI, Command Line | GUI | GUI | GUI | R Console | GUI |
| Docker container | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| Filtering | Genomic ranges | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Variant position list | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| | PASS filter | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | Sample selection | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Variant annotations | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Dynamic filtering | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Visualization | Multi-sample heatmaps | ✓ | ✓ | |||||||
| Read depth scatter plots | ✓ | ✓ | ||||||||
| Interactive HTML5 visualization | ✓ | |||||||||
| Group samples by metadata traits | ✓ | |||||||||
| Display genotypic-phenotypic associations | ✓ | |||||||||
| Display multiple genomic regions | ✓ | ✓ | ||||||||
| Output | Filtered results as tabular data | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| Tabular output grouped by phenotype | ✓ | ✓ | ||||||||
| Publication quality graphics | ✓ | ✓ | ✓ | |||||||
| Export filtered VCF file | ✓ | ✓ | ✓ | ✓ | ✓ |
Figure 2. Workflow of VIVA. INPUT: A VCF file is the only required input. Users can apply one or any combination of variant filters, sample selection, and grouping options. DATA PROCESSING: Data processing requires the Julia programming language and depends on several well-maintained Julia packages. Plotting uses the PlotlyJS.jl wrapper for Plotly. VIVA offers two interfaces: users may run the program through a Jupyter Notebook or from the command line. OUTPUT: VIVA's four visualization options are heatmaps of genotype data, heatmaps of read depth data, scatter plots of average sample read depth, and scatter plots of average variant read depth. These visualizations can be saved in HTML, PDF, SVG, or EPS formats. The HTML format lets research groups share and analyze the data interactively, which supports collaborative work.
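The two read depth summaries behind the scatter plots can be sketched as row and column means of a variants-by-samples depth matrix. This is a minimal Python illustration, not VIVA's Julia code. The matrix values are invented, the helper name `mean_ignoring_missing` is hypothetical, and skipping missing depths (`None`) is an assumption, since the source does not state how VIVA treats missing values.

```python
# Minimal sketch: average read depth per variant (rows) and per sample (columns).
# Assumption: missing depth values are encoded as None and are skipped.

def mean_ignoring_missing(values):
    """Mean of the non-missing entries, or None if every entry is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical depth matrix: rows are variants, columns are samples.
depth = [
    [12, 30, None],
    [5, 8, 11],
    [20, None, 40],
]

avg_variant_depth = [mean_ignoring_missing(row) for row in depth]
avg_sample_depth = [mean_ignoring_missing(col) for col in zip(*depth)]

print("per variant:", avg_variant_depth)
print("per sample:", avg_sample_depth)
```

A sample whose average depth sits far from the others, or a variant with consistently low depth, is the kind of outlier these plots are meant to expose during quality control.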
VIVA filtering runtime comparisons.
| Simulations | VCF Filtering Simulations | VCF File Size | Number of Samples | Number of Variants | Number of Filtered Variants | VIVA Runtime (seconds) | BrowseVCF Runtime (seconds) |
|---|---|---|---|---|---|---|---|
| Sim 1 | | 34.1 MB | 24 | 37928 | 17901 | 25.97 | 109.29 |
| Sim 2 | | 34.1 MB | 24 | 37928 | 60 | 27.30 | 96.00 |
| Sim 3 | | 261.5 MB | 100 | 99850 | 99850 | 58.91 | 843.15 |
| Sim 4 | | 261.5 MB | 100 | 99850 | 4498 | 31.50 | 672.28 |
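The runtimes in the table can be turned into speedup ratios (BrowseVCF runtime divided by VIVA runtime) with a short calculation. The Python sketch below uses only the values reported above; the variable names are illustrative.

```python
# Speedup of VIVA relative to BrowseVCF, computed from the reported runtimes.
# Each entry: simulation -> (VIVA seconds, BrowseVCF seconds).
runtimes = {
    "Sim 1": (25.97, 109.29),
    "Sim 2": (27.30, 96.00),
    "Sim 3": (58.91, 843.15),
    "Sim 4": (31.50, 672.28),
}

speedups = {
    name: browsevcf / viva for name, (viva, browsevcf) in runtimes.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

The ratios are noticeably larger for the 261.5 MB file (Sim 3 and Sim 4) than for the 34.1 MB file (Sim 1 and Sim 2), which suggests the runtime gap widens with input size in these simulations.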
Figure 3. VIVA use cases. We present two use cases for VIVA. In both heatmaps, rows are unique variant positions and columns are individual samples. In the first use case (a), a heatmap of genotype values for 100 samples grouped by case and control metadata shows a differential burden of putative disease-associated variants. In the second use case (b), we identify a batch effect among 191 samples sequenced at two separate facilities for a variant analysis study by visualizing read depth and grouping samples by sequencing facility.
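The grouping idea behind both use cases (ordering heatmap columns by a metadata trait, then comparing groups) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not VIVA's method: the genotype encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate), the sample names, the group labels, and the helpers `order_by_group` and `mean_burden_by_group` are all hypothetical, and "burden" here simply means the number of non-reference genotypes per sample.

```python
from collections import defaultdict

# Hypothetical data: sample -> genotype codes across three variants.
# Assumed encoding: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternate.
genotypes = {
    "S1": [0, 1, 2],
    "S2": [0, 0, 0],
    "S3": [1, 1, 0],
    "S4": [0, 1, 0],
}
# Hypothetical metadata trait used for grouping.
groups = {"S1": "case", "S2": "control", "S3": "case", "S4": "control"}


def order_by_group(groups):
    """Column order for a heatmap: samples sorted by group, then by name."""
    return sorted(groups, key=lambda sample: (groups[sample], sample))


def mean_burden_by_group(genotypes, groups):
    """Average count of non-reference genotypes per sample within each group."""
    counts = defaultdict(list)
    for sample, calls in genotypes.items():
        counts[groups[sample]].append(sum(1 for g in calls if g > 0))
    return {group: sum(c) / len(c) for group, c in counts.items()}


column_order = order_by_group(groups)
burden = mean_burden_by_group(genotypes, groups)
print(column_order)
print(burden)
```

The same ordering step applies to the batch-effect case: replacing the case/control labels with sequencing facility labels and the genotype codes with read depths places each facility's samples side by side.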