
dbGSRV: A manually curated database of genetic susceptibility to respiratory virus.

Ping Li1, Yan Zhang1, Wenlong Shen1, Shu Shi1, Zhihu Zhao1.   

Abstract

Human genetics has been proposed to play an essential role in inter-individual differences in the occurrence and outcomes of respiratory virus infection. To systematically understand human genetic contributions to respiratory virus infection, we developed dbGSRV, a manually curated database that integrates host genetic susceptibility and severity studies of respiratory viruses scattered across the literature in PubMed. At present, dbGSRV contains 1932 records of genetic association studies relating 1010 unique variants and seven respiratory viruses, manually curated from 168 published articles. Users can access the records by quick searching, batch searching, advanced searching and browsing. Reference information, infection status, population information, mutation information and disease relationship are provided for each record, as well as hyperlinks to public databases so that users can conveniently access more information. In addition, a visual overview of the topological network relationship between respiratory viruses and associated genes is provided. Therefore, dbGSRV offers a convenient resource for researchers to browse and retrieve genetic associations with respiratory viruses, which may inspire future studies and provide new insights into our understanding and treatment of respiratory virus infection. Database URL: http://www.ehbio.com/dbGSRV/front/.

Year:  2022        PMID: 35298480      PMCID: PMC8929643          DOI: 10.1371/journal.pone.0262373

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Respiratory viruses are viruses that enter through the respiratory tract and proliferate in respiratory mucosal epithelial cells, causing local infection in the respiratory tract or lesions in other organs [1]. Common human respiratory viruses include respiratory syncytial virus (RSV), rhinovirus, influenza virus, parainfluenza virus, human metapneumovirus, coronavirus and adenovirus [2]. Respiratory virus infection is one of the leading causes of human mortality and morbidity, posing a constant public health threat and causing significant economic losses [3-5]. Of note, as of 25 August 2021, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel coronavirus that emerged in 2019 [6-8], had spread all over the world and caused over 213 million infections and 4.4 million deaths (https://covid19.who.int/). Human responses to respiratory virus exposure range from no infection through asymptomatic, mild, moderate and severe infection to fatal outcome. The wide variation in susceptibility and severity is attributed not only to the differing transmissibility and virulence of virus strains, but also to host factors such as age, sex, premature birth, pregnancy, obesity and comorbidity [9-11]. Among host factors, host genetic background has attracted increasing attention in recent years [12, 13]. Adoption, twin and heritability studies provided the first line of evidence [14-16], followed more recently by candidate-gene studies, genome-wide association studies (GWAS), whole exome sequencing (WES) and whole genome sequencing (WGS) [17]. These studies revealed that human genetic variants play an important role in susceptibility to and severity of infection by altering the expression or function of genes, especially genes involved in the viral life cycle and in host inflammatory and immune responses [18-21].
These genetic association studies may help dissect the underlying mechanisms of viral pathogenesis and host antiviral defense and may contribute to future clinical risk prediction models, allowing individuals to be stratified by risk so that those at high risk can be prioritized for immunization [22]. Although great progress has been made in this area, there is still no database that systematically collects, formats, annotates, stores and displays studies of human susceptibility to and severity of respiratory virus infection. Searching and reading related papers scattered across PubMed is time-consuming and hinders convenient access to useful information. Therefore, we present the first database of Genetic Susceptibility to Respiratory Virus (dbGSRV), which integrates published genetic studies of susceptibility and severity in respiratory virus infection. It contains 1932 records of genetic association studies relating 1010 unique variants and seven respiratory viruses, manually curated from 168 published articles. Comprehensive information about references, infection, samples, mutations and their relationships is available at http://www.ehbio.com/dbGSRV/front/. We anticipate that this resource will be a useful tool for researchers to query and retrieve genetic association studies of respiratory viruses.

Materials and methods

Publications collection

We searched PubMed for publications describing genetic associations with susceptibility to or severity of respiratory virus infections, using the keywords ‘variant’, ‘polymorphism’ and ‘susceptibility’ combined with the names of specific respiratory viruses such as ‘adenovirus’, ‘bocavirus’, ‘influenza’, ‘measles’, ‘MERS’, ‘metapneumovirus’, ‘mumps’, ‘parainfluenza virus’, ‘respiratory syncytial virus’, ‘rhinovirus’, ‘rubella’, ‘SARS’ and ‘SARS-CoV-2’. We selected these viruses because they are the most common respiratory viruses that infect humans. The search results were manually examined to retain only English-language publications that studied associations between human single nucleotide variants (SNVs), multiple nucleotide variants (MNVs) or indels and susceptibility to or severity of infection with an explicitly identified respiratory virus in case-control studies. Meta-analyses and studies that did not test which virus the cases were infected with were excluded. For each retained publication, we collected the paper title, first and corresponding authors, publication year, journal and PubMed Unique Identifier (PMID).
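As an illustration of the search strategy, the keyword and virus-name combinations can be enumerated programmatically. The sketch below is not the authors' procedure: the function name build_queries and the exact boolean structure (each keyword ANDed with each virus name) are assumptions, since the text only states that keywords were combined with virus names.

```python
from itertools import product

# Keywords and virus names quoted from the search description.
KEYWORDS = ["variant", "polymorphism", "susceptibility"]
VIRUSES = [
    "adenovirus", "bocavirus", "influenza", "measles", "MERS",
    "metapneumovirus", "mumps", "parainfluenza virus",
    "respiratory syncytial virus", "rhinovirus", "rubella",
    "SARS", "SARS-CoV-2",
]


def build_queries(keywords=KEYWORDS, viruses=VIRUSES):
    """Return one query string per (keyword, virus) pair.

    Assumption: each keyword is ANDed with each virus name; the paper
    does not state the exact boolean structure that was used.
    """
    return [f'"{kw}" AND "{virus}"' for kw, virus in product(keywords, viruses)]


queries = build_queries()
print(len(queries))  # number of keyword x virus combinations
print(queries[0])
```

Each string could then be pasted into the PubMed search box; the manual screening step described above still applies to whatever the search returns.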

Data extraction, standardization and annotation

We defined one record of a genetic association study by the combination of virus type, case-control sample and variant. For each record, the respiratory virus type was extracted from the full text of the paper, along with the virus subtype if specified. The number, country, ethnicity and clinical severity of the samples were collected. Based on ethnicity, we determined the 1000 Genomes superpopulation to which the sample population belonged; if the sample belonged to multiple superpopulations, it was marked as ‘Mixed’. All studied variants mentioned in the main text were included, whether or not they were associated with virus susceptibility, to provide a more comprehensive and unbiased view of genetic association studies. For uniformity, the name, reference allele and alternate allele of each variant were based on the dbSNP database. Many early publications did not give the dbSNP rs ID of the variant; we manually annotated the rs IDs of these variants by genomic mapping. The original names of these variants in the publication were also included in the database as the old name. The genomic positions of the variants were annotated based on the hg38 human genome. Annotation of variants relative to genes followed this order: exon, 5’ UTR, 3’ UTR, intron, promoter (within 2 kb upstream), upstream (2-5 kb upstream), downstream (within 2 kb downstream). The alternate allele frequencies of cases and controls, statistical method, odds ratio (OR), 95% confidence interval (CI) and p value for the allele association were extracted from the full text or supplementary materials. If only genotype frequencies were given in the paper, the alternate allele frequency was calculated manually. For the p value, ‘> 0.05%’ was recorded if the paper did not give a specific value but stated that there was no statistically significant difference or association.
The allele, genotype and haplotype association results were each classified into one of the following four categories: ‘severity’, ‘susceptibility’, ‘no association’ and ‘NA’. If at least one of the allele, genotype or haplotype association results reported ‘severity’ or ‘susceptibility’, the overall association status was set to ‘severity’ or ‘susceptibility’ accordingly; otherwise it was set to ‘no association’. Additional noteworthy information about the sample, variant and disease association was included in the notes.
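Two of the curation rules in this paragraph lend themselves to a short worked example: choosing one gene-relative annotation from the stated order, and deriving the alternate allele frequency from genotype counts. The following is a minimal sketch, not the authors' code. It assumes that the stated order acts as a precedence when a variant fits several categories, and that variants are biallelic and diploid; the function names and label strings are hypothetical.

```python
# Order quoted from the text; treating it as a precedence is an assumption.
ANNOTATION_ORDER = [
    "exon", "5' UTR", "3' UTR", "intron",
    "promoter", "upstream", "downstream",
]


def pick_annotation(candidates):
    """Return the first category in ANNOTATION_ORDER present in candidates."""
    for label in ANNOTATION_ORDER:
        if label in candidates:
            return label
    return None  # variant falls in none of the listed categories


def alt_allele_frequency(n_ref_ref, n_ref_alt, n_alt_alt):
    """Alternate allele frequency from diploid genotype counts.

    Each individual carries two alleles: heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (n_ref_ref + n_ref_alt + n_alt_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes given")
    return (n_ref_alt + 2 * n_alt_alt) / total_alleles


# Illustrative inputs only, not values taken from the database.
print(pick_annotation({"intron", "promoter"}))
print(alt_allele_frequency(50, 40, 10))
```

With the illustrative counts above (50, 40 and 10 individuals), the alternate allele count is 40 + 2 x 10 = 60 out of 200 alleles, a frequency of 0.3.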
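The rule for the overall association status can be written as a small function. This is a sketch under one stated assumption: the text does not say what happens when different levels report both ‘severity’ and ‘susceptibility’ for the same record, so the hypothetical function below simply checks ‘severity’ first.

```python
POSITIVE = ("severity", "susceptibility")


def overall_status(allele, genotype, haplotype):
    """Combine the three per-level results into one overall status.

    Each argument is one of 'severity', 'susceptibility',
    'no association' or 'NA'.
    """
    results = (allele, genotype, haplotype)
    # Assumption: 'severity' takes precedence if both positive labels occur.
    for status in POSITIVE:
        if status in results:
            return status
    return "no association"


print(overall_status("no association", "severity", "NA"))
print(overall_status("NA", "NA", "no association"))
```

Under this rule a single positive result at any level is enough to mark the record as positive, which matches the description above.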

Database implementation and data analysis

dbGSRV was implemented as a web application using JavaScript and HTML for front-end development. The core JavaScript libraries used were Vue.js (https://vuejs.org/) as the main front-end framework, vis.js (https://visjs.org) for the network viewer and plotly.js (https://plotly.com/) for lollipop charts. The high-level web framework Django (https://www.djangoproject.com/) was used for back-end data preprocessing and data analysis. The global search function is based on Elasticsearch. The open-source database management system MySQL was used to store and access tabular data. Gene Ontology (GO) and pathway analyses of associated genes were conducted using the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/).

Results

Web interface

The dbGSRV database comprises six pages: the Home page, Browse page, Batch Search page, Advanced Search page, Network page and Help page. On the Home page, users can find a brief introduction, the update log of dbGSRV and a quick search box (Fig 1A). The quick search box allows users to search genetic association records by virus name, variant rs ID, gene name or genomic region. The Batch Search page allows users to search multiple viruses, variants, genes or genomic loci either by entering keywords in the text box or by uploading a txt file (Fig 1B). On the Advanced Search page, users can search by logical combinations of several keywords (Fig 1C). The Browse page permits users to browse all records by virus, annotation or study type (Fig 1D).
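To make the search inputs concrete, the sketch below shows one way a batch of search terms could be sorted into the kinds the search boxes accept. It is purely illustrative and says nothing about how dbGSRV itself parses queries: the function names, the chr:start-end region syntax and the comma or newline separators are all assumptions.

```python
import re

# Hypothetical patterns: an rs ID such as "rs123", and a region written
# as "chr6:100-200" (the "chr" prefix is optional).
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
REGION = re.compile(r"^(chr)?([0-9]{1,2}|X|Y|MT?):(\d+)-(\d+)$", re.IGNORECASE)


def classify_term(term):
    """Label a search term as 'variant', 'region' or 'name'."""
    term = term.strip()
    if RSID.match(term):
        return "variant"
    if REGION.match(term):
        return "region"
    # Virus and gene names cannot be told apart by syntax alone.
    return "name"


def parse_batch(text):
    """Split batch input on commas and newlines and classify each term."""
    terms = [t.strip() for t in re.split(r"[,\n]+", text) if t.strip()]
    return [(t, classify_term(t)) for t in terms]


# Illustrative input only.
print(parse_batch("rs12252\nTNF, chr6:31575000-31580000"))
```

A real implementation would then look up each 'name' term against the lists of viruses and genes held in the database.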
Fig 1

Screen shot of dbGSRV contents.

(A) Home page. (B) Batch Search page. (C) Advanced Search page. (D) Browse page. (E) Detailed information about the record.

The search results are presented as pie charts and a table (Fig 1D). The pie charts display the number and proportion of each subgroup by virus type, variant position relative to genes and study type, while the table contains the basic information of each record, including virus type, variant information (rs ID in the dbSNP database, position in the hg38 genome and position relative to genes), study type, sample size and association status. Clicking a subgroup in the pie charts shows the results for that subgroup, and clicking the same subgroup again returns to the full results. In addition, the table provides several features. First, users can further filter the results by typing terms in the ‘filter’ box at the top right of the table. Second, by clicking the icon to the right of the ‘filter’ box, users can change the columns displayed in the table. Three additional columns can be added: ID (unique ID of each record), Year (the year the paper was published) and PMID (the PMID of the paper in the PubMed database). Third, each column can be sorted in ascending or descending order by clicking the triangle on the right side of the column header. Fourth, for each record, clicking the Variant dbSNP, Position, Gene or PMID column takes users to the corresponding page in the dbSNP, UCSC, GeneCards or PubMed database, respectively. Clicking ‘More’ opens the Details page of the record, which provides more detailed information about the reference, infection, population, mutation and disease relationship (Fig 1E). The Network page provides a visual overview of the topological relationship between respiratory viruses and associated genes (Fig 2). Nodes represent respiratory viruses and genes.
A respiratory virus node and a gene node are linked by an edge if at least one variant in the gene is reported to be associated with susceptibility to or severity of infection with the virus. By default, the network shows all the respiratory viruses in the database. Users can select a set of specific viruses and submit it to generate a new network for these viruses. The attributes of nodes (such as size, shape, background color, label font size and label color) and the overall layout of the network can be edited. The picture can be exported in SVG format for use in publications, and the data used to generate the network can be downloaded as an Excel file.
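The edge rule of the network can be sketched in a few lines. This is an illustration of the stated rule rather than the database's own code; the function name, the tuple layout of the records and the placeholder gene names are hypothetical.

```python
from collections import defaultdict


def build_network(records):
    """Build virus-gene edges from (virus, gene, status) tuples.

    An edge is added when at least one record for the pair reports
    'severity' or 'susceptibility'. Also returns the set of genes linked
    to more than one virus.
    """
    edges = set()
    for virus, gene, status in records:
        if status in ("severity", "susceptibility"):
            edges.add((virus, gene))

    viruses_per_gene = defaultdict(set)
    for virus, gene in edges:
        viruses_per_gene[gene].add(virus)
    shared = {g for g, vs in viruses_per_gene.items() if len(vs) > 1}
    return edges, shared


# Placeholder records, not real associations from the database.
toy_records = [
    ("influenza virus", "GENE_A", "severity"),
    ("RSV", "GENE_A", "susceptibility"),
    ("RSV", "GENE_B", "no association"),
    ("RSV", "GENE_C", "susceptibility"),
]
edges, shared = build_network(toy_records)
print(sorted(edges))
print(shared)
```

In the toy input, GENE_B gets no edge because its only record is negative, and GENE_A is the only gene linked to more than one virus.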
Fig 2

Network page provides a visual overview of the topological relationship between respiratory viruses and associated genes.

Nodes represent respiratory viruses and genes. A respiratory virus node and a gene node are linked by an edge if at least one variant in the gene is reported to be associated with susceptibility to or severity of infection with the virus. Genes associated with more than one virus are highlighted by a larger label size.

Finally, dbGSRV provides a detailed tutorial on using the database on the Help page.

Database statistics

To give a more comprehensive and unbiased view of genetic association studies of respiratory viruses, the database includes not only positive association results but also negative results reported in the main text of the papers. In total, dbGSRV contains 1932 records of genetic association studies relating 1010 unique variants and seven respiratory viruses, manually curated from 168 published articles. The seven respiratory viruses are adenovirus, influenza virus, measles virus, rhinovirus, RSV, SARS-CoV and SARS-CoV-2, of which influenza virus, SARS-CoV-2, RSV and SARS-CoV have the most records, with 718, 486, 433 and 221 records, respectively (Fig 3A). A majority of the records relate to variants residing in introns (35.7%) and exons (34.3%) of genes (Fig 3B). As for study strategy, most of the records are curated from candidate-gene studies (62.4%) (Fig 3C). Notably, 610 records report positive genetic associations between 249 unique variants in 159 genes and respiratory virus infection, mostly based on allele frequency (Fig 3D). Among the positive associations, 149 records are related to susceptibility to infection and 461 to severity.
Fig 3

Statistics of dbGSRV.

(A) The distribution of records in each respiratory virus. (B) The distribution of records in genomic elements. (C) The distribution of records in each study type. (D) The distribution of positive associated records at allele, genotype and haplotype levels.

The Network page in the database provides a visual overview of the topological relationship between respiratory viruses and associated genes, as shown in Fig 2. Influenza virus, RSV and SARS-CoV-2 have the most associated genes, consistent with these three respiratory viruses having the most study records. In addition, some genes are associated with multiple respiratory viruses. In particular, the TNF gene, which encodes a key mediator of the inflammatory response that is critical for host defense against a wide variety of pathogenic microbes [23], is associated with the greatest number of respiratory viruses.

GO and pathway analysis of associated genes

We performed GO and pathway analyses of associated genes using the Database for Annotation, Visualization and Integrated Discovery (DAVID). The top 10 significantly enriched GO terms and pathways are shown in Tables 1 and 2, respectively.
Table 1

The top 10 significant GO terms of gene set analysis using associated genes.

Category    Term                                                  FDR
GO_BP       immune response                                       6.05E-26
GO_BP       inflammatory response                                 6.26E-20
GO_MF       cytokine activity                                     4.40E-15
GO_BP       positive regulation of inflammatory response          1.57E-09
GO_BP       positive regulation of T cell proliferation           2.75E-09
GO_BP       positive regulation of interferon-gamma production    3.30E-09
GO_BP       innate immune response                                2.29E-08
GO_BP       regulation of complement activation                   4.79E-08
GO_BP       type I interferon signaling pathway                   6.58E-08
GO_BP       lipopolysaccharide-mediated signaling pathway         6.58E-08

Table 2

The top 10 significant pathways of gene set analysis using associated genes.

Category        Term                                      FDR
KEGG_PATHWAY    Cytokine-cytokine receptor interaction    3.78E-19
KEGG_PATHWAY    Influenza A                               4.21E-16
KEGG_PATHWAY    Inflammatory bowel disease (IBD)          4.21E-16
KEGG_PATHWAY    Herpes simplex infection                  9.03E-16
KEGG_PATHWAY    Measles                                   9.66E-15
KEGG_PATHWAY    Rheumatoid arthritis                      1.05E-13
KEGG_PATHWAY    Jak-STAT signaling pathway                5.57E-13
KEGG_PATHWAY    Leishmaniasis                             5.57E-13
BIOCARTA        Cytokine Network                          8.17E-12
KEGG_PATHWAY    Toll-like receptor signaling pathway      2.14E-12

GO analysis revealed that ‘immune response’ and ‘inflammatory response’ were the top enriched terms. Other enriched terms, such as ‘positive regulation of T cell proliferation’, ‘positive regulation of interferon-gamma production’, ‘regulation of complement activation’, ‘type I interferon signaling pathway’, ‘cytokine activity’ and ‘positive regulation of inflammatory response’, are also related to the immune and inflammatory responses, highlighting the central role of these processes in defense against respiratory virus infection [24]. Notably, the inflammatory response is a double-edged sword in respiratory virus infection [25]. On the one hand, the inflammatory response promotes the immune response against infection. On the other hand, a ‘cytokine storm’ triggered by the inflammatory response may worsen the severity of respiratory virus infection [26]. Pathway analysis revealed a significant enrichment of pathways directly related to pathogens and autoimmune diseases, such as ‘Influenza A’, ‘Inflammatory bowel disease (IBD)’, ‘Herpes simplex infection’, ‘Measles’, ‘Rheumatoid arthritis’ and ‘Leishmaniasis’. Three enriched pathways were related to cytokines: ‘Cytokine-cytokine receptor interaction’, ‘Jak-STAT signaling pathway’, which is the downstream signaling pathway of the cytokine interferon [27], and ‘Cytokine Network’. In addition, ‘Toll-like receptor signaling pathway’, which is essential for viral sensing and for triggering the downstream immune response [28], was also enriched.
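For readers unfamiliar with enrichment statistics, the sketch below shows the general idea behind an over-representation test followed by a false discovery rate adjustment. It is a generic textbook version and not a reimplementation of DAVID, which computes its own modified Fisher exact statistic (the EASE score) and its own corrected values; the function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers: 2 genes drawn from 10, of which 3 carry the term.
print(hypergeom_upper_tail(1, 2, 3, 10))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

A small adjusted value for a term means that the associated gene set contains more genes with that annotation than expected by chance, which is how the FDR columns in Tables 1 and 2 should be read.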

Discussion

To our knowledge, dbGSRV is the first manually curated database containing comprehensive information on human genetic associations with respiratory viruses. Several of its features are worth noting. First, dbGSRV contains 1932 records of genetic association studies relating 1010 unique variants and seven respiratory viruses, with a user-friendly interface that makes it convenient for researchers to browse and retrieve the data, access more information in public databases through hyperlinks and visualize the network of respiratory viruses and associated genes. Users can use the database to familiarize themselves with respiratory virus genetics, explore genes implicated in virus infection, select variants for further confirmation and check whether significant variants discovered in their own studies have been reported previously. Second, inconsistent results are frequently found in replication studies of genetic association [29, 30], so recording only positive association results may be misleading. To provide a more comprehensive and unbiased view of genetic association studies, we included not only records of positive association results but also records of negative results mentioned in the main text of the references. Additionally, as conflicting genetic association results might arise from factors such as ethnicity, sample size, allele frequency and analysis method [31], we also collected, where available, the number and ethnicity of cases and controls, their alternate allele frequencies, study type, statistical method, OR, 95% CI and p value for the allele association, to help users assess and compare different study results accurately and comprehensively. Third, users should recognize that the database has certain limitations.
Among the common respiratory viruses, only seven have been studied for human genetic susceptibility or severity and are thus included in the database, and a majority of the studies are candidate-gene studies. This might lead to a biased representation of certain variants and associations in the database. More studies, especially GWAS, on a wider range of respiratory viruses are therefore anticipated in the future, and dbGSRV will be updated about once a year as new data become available. In addition, we find that many genetic association studies of respiratory viruses have limited sample size and statistical power, which might be compensated for by meta-analysis [32]. Therefore, we plan to include meta-analysis publications in future updates of the database. In summary, dbGSRV will be a convenient resource for researchers to query and retrieve genetic associations with respiratory viruses, which may inspire future studies and provide new insights into our understanding and treatment of respiratory virus infection.
  31 in total

1.  Genetic and environmental influences on premature death in adult adoptees.

Authors:  T I Sørensen; G G Nielsen; P K Andersen; T W Teasdale
Journal:  N Engl J Med       Date:  1988-03-24       Impact factor: 91.245

2.  Vitamin D Receptor polymorphisms and risk of enveloped virus infection: A meta-analysis.

Authors:  Marina Laplana; José Luis Royo; Joan Fibla
Journal:  Gene       Date:  2018-08-06       Impact factor: 3.688

Review 3.  The inflammatory response triggered by Influenza virus: a two edged sword.

Authors:  Luciana P Tavares; Mauro M Teixeira; Cristiana C Garcia
Journal:  Inflamm Res       Date:  2016-10-15       Impact factor: 4.575

Review 4.  Regulation of type I interferon responses.

Authors:  Lionel B Ivashkiv; Laura T Donlin
Journal:  Nat Rev Immunol       Date:  2014-01       Impact factor: 53.106

5.  Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Authors:  Chaolin Huang; Yeming Wang; Xingwang Li; Lili Ren; Jianping Zhao; Yi Hu; Li Zhang; Guohui Fan; Jiuyang Xu; Xiaoying Gu; Zhenshun Cheng; Ting Yu; Jiaan Xia; Yuan Wei; Wenjuan Wu; Xuelei Xie; Wen Yin; Hui Li; Min Liu; Yan Xiao; Hong Gao; Li Guo; Jungang Xie; Guangfa Wang; Rongmeng Jiang; Zhancheng Gao; Qi Jin; Jianwei Wang; Bin Cao
Journal:  Lancet       Date:  2020-01-24       Impact factor: 79.321

Review 6.  Genetic susceptibility to infectious diseases: Current status and future perspectives from genome-wide approaches.

Authors:  Alessandra Mozzi; Chiara Pontremoli; Manuela Sironi
Journal:  Infect Genet Evol       Date:  2017-09-22       Impact factor: 3.342

7.  A novel coronavirus outbreak of global health concern.

Authors:  Chen Wang; Peter W Horby; Frederick G Hayden; George F Gao
Journal:  Lancet       Date:  2020-01-24       Impact factor: 79.321

Review 8.  Cytokine Storm.

Authors:  David C Fajgenbaum; Carl H June
Journal:  N Engl J Med       Date:  2020-12-03       Impact factor: 91.245

9.  Genetic predisposition models to COVID-19 infection.

Authors:  Farzaneh Darbeheshti; Nima Rezaei
Journal:  Med Hypotheses       Date:  2020-05-06       Impact factor: 1.538

10.  Global burden of acute lower respiratory infection associated with human metapneumovirus in children under 5 years in 2018: a systematic review and modelling study.

Authors:  Xin Wang; You Li; Maria Deloria-Knoll; Shabir A Madhi; Cheryl Cohen; Asad Ali; Sudha Basnet; Quique Bassat; W Abdullah Brooks; Malinee Chittaganpitch; Marcela Echavarria; Rodrigo A Fasce; Doli Goswami; Siddhivinayak Hirve; Nusrat Homaira; Stephen R C Howie; Karen L Kotloff; Najwa Khuri-Bulos; Anand Krishnan; Marilla G Lucero; Socorro Lupisan; Ainara Mira-Iglesias; David P Moore; Cinta Moraleda; Marta Nunes; Histoshi Oshitani; Betty E Owor; Fernando P Polack; Katherine L O'Brien; Zeba A Rasmussen; Barbara A Rath; Vahid Salimi; J Anthony G Scott; Eric A F Simões; Tor A Strand; Donald M Thea; Florette K Treurnicht; Linda C Vaccari; Lay-Myint Yoshida; Heather J Zar; Harry Campbell; Harish Nair
Journal:  Lancet Glob Health       Date:  2020-11-26       Impact factor: 26.763
