Meharji Arumilli1,2, Ryan M Layer3,4, Marjo K Hytönen1,2, Hannes Lohi1,2.
Abstract
SUMMARY: Genotype Query Tools (GQT) were developed to discover disease-causing variations from billions of genotypes and millions of genomes, and they process data substantially faster than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries has limited its applicability among researchers without an IT or bioinformatics background. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT allows for pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions of genotypes.
Keywords: Bigdata; GQT; R package; filtering; shiny server; variant; webGQT
Year: 2020 PMID: 32194629 PMCID: PMC7063093 DOI: 10.3389/fgene.2020.00152
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. (A) An overview of the architecture of the webGQT system. The variant information is stored as GQT index files. The user queries the GQT index files through the GUI provided by the Shiny server, and the results are returned to the user via the GUI. The whole application is secured with an Nginx front-end proxy server that serves HTTPS requests. (B) The three-step workflow for using webGQT: 1) selecting the default data set (e.g., 1000 Genomes) or uploading GQT indexed files, 2) uploading a phenotype file (PED) and creating the sample database, and 3) choosing a module and performing variant filtering.
Figure 2. The workflow of webGQT via the user interface: (1) The upload panel for input variant data. The user can choose the default data set or upload GQT indexed files. (2a) The upload panel for the phenotype file. The user uploads the PED file by clicking the “Browse” button. (2b) After uploading, the phenotype file is rendered as a data table with the sample selection information. The user is then required to create a phenotype sample database by clicking “CreateDB”. (3) The user chooses a filtering module, applies the available parameters of the corresponding module, and finally filters the variants. A dominant analysis module filter is shown in the figure.
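
Step 2 of the workflow (loading a PED phenotype file into a sample database that later filters can select from) can be illustrated with a minimal Python sketch. It is not the webGQT implementation: it assumes a headerless, tab-delimited PED file with the six conventional columns (family, sample, father, mother, sex, phenotype) and the common coding of 1 = unaffected and 2 = affected. The table name `samples`, the helper names, and the example records are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical PED content: tab-delimited, six conventional columns, no header.
PED_TEXT = (
    "FAM1\tdog1\t0\t0\t1\t1\n"
    "FAM1\tdog2\t0\t0\t2\t2\n"
    "FAM1\tdog3\tdog1\tdog2\t1\t2\n"
)


def build_sample_db(ped_text):
    """Load PED records into an in-memory SQLite table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "family_id TEXT, sample_id TEXT, paternal_id TEXT, "
        "maternal_id TEXT, sex INTEGER, phenotype INTEGER)"
    )
    reader = csv.reader(io.StringIO(ped_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def select_samples(conn, phenotype):
    """Return the sample IDs with the given phenotype code, sorted by ID."""
    cursor = conn.execute(
        "SELECT sample_id FROM samples WHERE phenotype = ? ORDER BY sample_id",
        (phenotype,),
    )
    return [row[0] for row in cursor]


conn = build_sample_db(PED_TEXT)
affected = select_samples(conn, 2)    # assumed coding: 2 = affected
unaffected = select_samples(conn, 1)  # assumed coding: 1 = unaffected
print("affected:", affected)
print("unaffected:", unaffected)
```

A filtering module such as the dominant analysis would then combine sample groups like these with genotype conditions. That part depends on the GQT index files and is not sketched here.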
Performance comparison of webGQT with canvasDB on 1000 Genomes Phase 1 and Phase 3 data sets.
| Population | canvasDB (Phase 1) | webGQT (Phase 1) | webGQT (Phase 3) |
|---|---|---|---|
| GBR | 30 m | 1 m 28 s | 3 m 20 s |
| FIN | 40 m | 2 m 55 s | 6 m 25 s |
| CHS | 40 m | 2 m 10 s | 4 m 5 s |
| PUR | 39 m | 2 m 12 s | 3 m 40 s |
| CLM | 43 m | 2 m 15 s | 3 m 40 s |
| IBS | 4 h 17 m | 32 m 40 s | 3 m 20 s |
| CEU | 5 m | 34 m 40 s | 4 m 30 s |
| YRI | 13 h 26 m | 28 m 10 s | 5 m 50 s |
| CHB | 33 m | 2 m 18 s | 4 m 45 s |
| JPT | 59 m | 5 m 45 s | 6 m 58 s |
| LWK | 20 h 25 m | 34 m 40 s | 13 m 58 s |
| ASW | 49 m | 6 m 25 s | 4 m 20 s |
| MXL | 59 m | 2 m 2 s | 4 m 50 s |
| TSI | 29 m | 1 m 12 s | 4 m 40 s |
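
To read the table as speed-up factors, the run times can be converted to seconds and divided. The Python sketch below does this for three Phase 1 rows copied from the table. The parsing helper and the choice of rows are illustrative assumptions, not part of webGQT.

```python
import re

# Phase 1 run times copied from the table above: (canvasDB, webGQT).
TIMINGS = {
    "GBR": ("30 m", "1 m 28 s"),
    "YRI": ("13 h 26 m", "28 m 10 s"),
    "LWK": ("20 h 25 m", "34 m 40 s"),
}

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration such as '4 h 17 m' or '1 m 28 s' to seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


def speedup(slow, fast):
    """Ratio of the slower run time to the faster one."""
    return to_seconds(slow) / to_seconds(fast)


for population, (canvasdb_time, webgqt_time) in TIMINGS.items():
    factor = speedup(canvasdb_time, webgqt_time)
    print(f"{population}: webGQT is about {factor:.1f}x faster")
```

The same helper can be applied to any other row by adding its two Phase 1 values to `TIMINGS`.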