Wenxi Wang, Zihao Wang, Xintong Li, Zhongfu Ni, Zhaorong Hu, Mingming Xin, Huiru Peng, Yingyin Yao, Qixin Sun, Weilong Guo.
Abstract
BACKGROUND: The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making them difficult to update with new data over time. Currently, there is a lack of a widely used framework for setting up user-friendly web servers to explore new genomic variation data in diverse species.
Keywords: R/Shiny; SNP; database; server-framework; wheat
Year: 2020; PMID: 32501478; PMCID: PMC7274028; DOI: 10.1093/gigascience/giaa060
Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524
Figure 1: Design schema of the SnpHub server. Once the files and information tables are provided as indicated in the “Prepare” step, the SnpHub server instance performs a pre-processing step for building basic database files and then runs through the Shiny framework. Users can query specific genomic regions or genes for either pre-defined or customized sample groups. SnpHub can efficiently load the raw query data from the hard disk to RAM and then perform efficient analysis and visualization interactively.
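The query step in Figure 1 (a genomic region plus a chosen sample group) can be illustrated with a minimal Python sketch. This is not SnpHub's implementation: SnpHub is written in R/Shell and reads indexed BCF files, whereas this sketch filters a small in-memory list. The names `Variant`, `parse_region`, `query_region`, `VARIANTS`, and `GROUPS`, along with the sample names and genotypes, are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int          # 1-based position
    ref: str
    alt: str
    genotypes: dict   # sample name -> genotype string, e.g. "0/1"


def parse_region(region):
    """Parse 'chrom:start-end' (commas in numbers allowed) into a tuple."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)", region.replace(",", ""))
    if match is None:
        raise ValueError(f"malformed region: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end in region: {region!r}")
    return chrom, start, end


def query_region(variants, region, samples):
    """Return (pos, ref, alt, genotypes) rows for variants inside the region,
    keeping only the requested samples; absent samples are reported as './.'."""
    chrom, start, end = parse_region(region)
    rows = []
    for v in variants:
        if v.chrom == chrom and start <= v.pos <= end:
            selected = {s: v.genotypes.get(s, "./.") for s in samples}
            rows.append((v.pos, v.ref, v.alt, selected))
    return rows


# Hypothetical toy data standing in for a variant database.
VARIANTS = [
    Variant("chr1A", 1200, "A", "G", {"S1": "0/0", "S2": "1/1", "S3": "0/1"}),
    Variant("chr1A", 1850, "C", "T", {"S1": "1/1", "S2": "1/1", "S3": "0/0"}),
    Variant("chr1A", 5000, "G", "A", {"S1": "0/1", "S2": "0/0", "S3": "0/0"}),
    Variant("chr2B", 1500, "T", "C", {"S1": "0/0", "S2": "0/1", "S3": "1/1"}),
]

# Hypothetical user-defined sample groups.
GROUPS = {"landrace": ["S1", "S2"], "cultivar": ["S3"]}

rows = query_region(VARIANTS, "chr1A:1,000-2,000", GROUPS["landrace"])
for row in rows:
    print(row)
```

A linear scan like this only works for toy data. With real datasets, an index over the variant file is what makes region lookups fast enough for interactive use.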
Figure 2: Analysis and visualization functions of the SnpHub server. In an SnpHub instance, each function is implemented in an independent web page tab.
Features supported by SnpHub, Gigwa v2, CanvasDB, and JBrowse
| Feature | SnpHub | Gigwa v2 | CanvasDB | JBrowse |
|---|---|---|---|---|
| General design | Specialized | Specialized | Specialized | Generalized |
| Main strengths in querying data | Query with support for export, visualization, and reanalysis | Query with API for external visualization | Query with filtering functions | Track-based query and visualization |
| Database implementation | Indexed BCF | MongoDB | MySQL | Indexed VCF |
| Programming language | R/Shell | Java/JavaScript | R | JavaScript/Perl |
| Supports downstream haplotype analysis | Yes | No | No | No |
| Allow sample selection | Yes | Yes | Yes | No |
| Forms of results | Table and Figure | Table | Table | Track-based plot |
| Support user-defined groups | No limitation in group number | ≤2 groups | No | No |
| Deployment difficulty for bioinformaticians | Easy | Hard | Hard | Moderate |
| Visualizing variations across gene structure | Yes | No | No | Yes |
| Visualizing sample passports geographically | Yes | No | No | No |
| Access to metadata | Yes, user readable | Yes, built-in | No | Yes, user readable |
| Accession name management strategy | Triple-name strategy | Not provided | Not provided | Not provided |
| Retrieval of consensus sequences | Yes | No | No | No |
The SnpHub instances available in Wheat-SnpHub-Portal
| Ploidy | Method | No. of samples | Disk usage | Source |
|---|---|---|---|---|
| Tetraploid | WGS | 35 | 15.0 GB | Wang et al. 2020 |
| Hexa-/Tetra-/Diploid | WGS | 63/25/5 | 39.8 GB | Cheng et al. 2019 |
| Hexa-/Tetra-/Diploid | WEC | 436/38/13 | 192 MB | Pont et al. 2019 |
| Hexaploid | WEC | 1,026 | 1.8 GB | He et al. 2019 |
| Hexaploid | WEC and GBS | 62 | 2.4 GB | Jordan et al. 2015 |
| Tetraploid | WEC | 64 | 645 MB | Avni et al. 2017 |
| Diploid | GBS | 567 | 234 MB | Singh et al. 2019 |
Data are reanalysed from raw sequencing data. GBS: genotyping-by-sequencing; WEC: whole-exome-capture; WGS: whole-genome resequencing. Disk usage refers to the size of BCF files.