rSNPBase 3.0: an updated database of SNP-related regulatory elements, element-gene pairs and SNP-based gene regulatory networks.

Liyuan Guo1,2, Jing Wang1,2.   

Abstract

Here, we present the updated rSNPBase 3.0 database (http://rsnp3.psych.ac.cn), which provides human SNP-related regulatory elements, element-gene pairs and SNP-based regulatory networks. This database is the updated version of the SNP regulatory annotation databases rSNPBase and rVarBase. In comparison to the last two versions, there are both structural and data adjustments in rSNPBase 3.0: (i) The most significant new feature is the expansion of the analysis scope from SNP-related regulatory elements to include regulatory element-target gene pairs (E-G pairs); the database can therefore provide SNP-based gene regulatory networks. (ii) The web functions were modified according to the data content, and a new network search module is provided in rSNPBase 3.0 in addition to the previous regulatory SNP (rSNP) search module. The two search modules support data queries for detailed information (related elements, element-gene pairs, and other extended annotations) on specific SNPs and for SNP-related graphic networks constructed from interacting transcription factors (TFs), miRNAs and genes. (iii) The types of regulatory elements were modified and enriched. To the best of our knowledge, the updated rSNPBase 3.0 is the first data tool that supports SNP functional analysis from a regulatory network perspective; it will provide both a comprehensive understanding and concrete guidance for SNP-related regulatory studies.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2018        PMID: 29140525      PMCID: PMC5753256          DOI: 10.1093/nar/gkx1101

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Biological and clinical studies that start from human genetic data, such as the results of genome-wide association studies (GWAS) or next-generation sequencing (NGS) studies, largely depend on the subsequent functional analysis of the detected SNPs (1,2). Considering the distribution ratio of coding and non-coding regions on human chromosomes, as well as the growing evidence of associations between non-coding regions and a variety of human traits (3,4), the regulatory annotation of trait-associated SNPs has become increasingly important in related studies. Therefore, we developed rSNPBase (5) to provide a reliable regulatory annotation of human SNPs; we further expanded the annotation to include several types of known and novel human variants in the updated rVarBase (6). In the past four years, the two databases have been used in many disease genetic studies (7) and SNP functional studies (8). While advances in human epigenetics enable researchers to better understand the molecular processes of complex diseases, new requirements for data analysis approaches and the corresponding data tools have appeared. An obvious trend in recent years is the use of appropriate methods to gain a systematic view of traits of interest (9). In such fields, gene regulatory network analysis is a useful method for understanding the epigenetic mechanisms underlying disease (10–12). One type of gene regulatory network is the transcription factor (TF) and microRNA (miRNA) co-regulatory network, which utilizes both transcriptional and post-transcriptional regulatory relationships by integrating TF-gene and miRNA-gene interactions (13,14). This kind of analysis has been used in many disease molecular studies and has revealed valuable mechanistic evidence (15–17).
There are several data tools that help users construct TF-miRNA co-regulatory networks by using disease expression data or disease-related elements as input (18–20), but to the best of our knowledge no data tool supports SNP functional analysis from a regulatory network perspective. To help researchers perform gene regulatory network analysis based on genetic results, we made both structural and data adjustments to rSNPBase and present it as the updated version rSNPBase 3.0 (http://rsnp3.psych.ac.cn). In rSNPBase 3.0, human genome SNPs were annotated with both related regulatory elements and regulatory element-target gene pairs (E–G pairs). In addition to the regulatory SNP (rSNP) search module, which provides detailed epigenetic features for queried SNPs and their linkage disequilibrium (LD) proxies, the database also provides a network analysis module, which generates gene regulatory networks that give users a more intuitive understanding of the regulatory elements, processes, and mechanisms related to their queried variants. Nodes in the generated network represent regulatory elements (TFs and miRNAs) and target genes, and edges between nodes show the SNPs that are related to the specific E–G pairs. The combination of the two modules will help users gain an overall logical understanding of their trait of interest as well as more concrete guidance for follow-up studies.

DATA CONTENT AND NEW FEATURES

Beyond adding newly created small variations from the NCBI dbSNP database, new types of regulatory elements and more extended annotations (such as SNP-related diseases), significant adjustments to the database structure and function have been made in this update. The focus of rSNPBase 3.0 has been extended from regulatory elements that overlap SNPs to SNP-related E–G pairs. Based on these changes, the updated database supports SNP-based gene regulatory network analysis. As shown in Figure 1, seven types of regulatory elements were derived from reference databases (most of which provide experimentally supported data), and the relationships between the included regulatory elements and genes from Ensembl (GRCh37) (21) were analysed by genomic proximity or by using reference databases. Genome-wide human SNPs from NCBI dbSNP (build 150) (22) were analysed and filtered against the involved regulatory elements, and were thereby connected to the corresponding E–G pairs at the same time. The analysis results were stored in rSNPBase 3.0 and are presented as rSNP reports and SNP-based regulatory networks. A summary of the data content of rSNPBase 3.0 is shown in Table 1. Detailed information about the regulatory elements and reference databases used in the data processing is shown in the supplementary data and at http://rsnp3.psych.ac.cn/datacontent.do.
Figure 1. Data processing of rSNPBase 3.0.

Table 1. Data content of rSNPBase 3.0 (as of 15 September 2017) and comparison of the current and previous versions

Involved SNPs in three versions | rSNPBase | rVarBase | rSNPBase 3.0
rSNPs | 22 846 898 | 87 345 304 | 117 452 549
LD-proxies of rSNPs (non-rSNPs) | 2 281 874 | 1 626 737 | 2 096 231

SNP-related regulatory elements
Transcription factor binding regions (TFBRs) | 7 562 592
Chromatin interactive regions (CIRs) | 212 837
Topologically associated domains (TADs) | 38 916
Mature microRNA (miRNA) regions | 2 794
Predicted miRNA target sites | 384 284
Long non-coding RNA (lncRNA) regions | 211 749
Circular RNA (circRNA) regions | 312 673

SNP-related regulatory element-target gene (E–G) pairs
TFBR-gene pairs | 9 484 132
CIR-gene pairs | 321 247
TAD-gene pairs | 48 482
miRNA-gene pairs, experimentally supported | 37 781
miRNA-gene pairs, predicted | 384 284
lncRNA-gene pairs | 4 092

Extended annotations
Diseases (SNP-disease pairs) | 97 896
eQTLs (SNP-mRNA pairs) | 4 201 218

Regulatory elements involved in rSNPBase 3.0

rSNPBase 3.0 is consistent with previous versions in annotating SNPs with reliable regulatory elements (mainly identified by experiments). Seven types of regulatory elements are included. Experimentally identified transcription factor binding regions (TFBRs) and chromatin interactive regions (CIRs) from the Encyclopaedia of DNA Elements (ENCODE) project (23), validated mature miRNA regions from miRBase (24), predicted human miRNA target sites in 3′-UTR regions from TargetScan (25,26) and miRanda (27), and validated long non-coding RNA (lncRNA) regions from LNCipedia (28) were included in the two previous versions and remain in this update. One of the greatest advances in human epigenetics in recent years is the progress in understanding 3D chromatin structures (29,30), which expands the research dimensions from DNA accessibility and DNA binding to long-distance element interactions. Thus, in rSNPBase 3.0, we incorporated topologically associated domains (TADs) from ENCODE-processed data. Circular RNAs (circRNAs) from CircNet were also added (31), in line with the accumulating information on this type of non-coding RNA. Although they contain rich and important information, DNA elements that describe epigenetic status on large chromosomal scales, such as active chromatin state regions, histone binding regions, and methylation of CpG islands, may be only weakly affected by the genotype changes of small variations, so they were not included in rSNPBase 3.0.

Regulatory elements-gene pairs (E–G pairs) and gene regulatory network

Except for circRNAs, target gene analyses were performed on the other six types of regulatory elements to get corresponding E–G pairs.

E–G pairs for transcriptional regulatory elements

The chromosomal locations of transcriptional regulatory elements were acquired from the ENCODE Consortium: TFBR information was acquired from processed ChIP-seq peak data, CIR information from chromatin interactions detected by 5C and ChIA-PET, and TAD information from processed Hi-C data. The genomic locations of the three types of regulatory elements were compared with the potential promoter region (from 2 kb upstream to 1 kb downstream of the transcription start site) of all Ensembl-recorded genes in hg19 coordinates. As illustrated in Figure 1, if TFBRs (TFBR1 and TFBR2 in the illustration) were located in the potential promoter region of a gene (Gene A), the E–G pairs TFBR1-Gene A and TFBR2-Gene A were constructed. The connections between the TFs that bind to the TFBRs (TF a and TF b in the illustration) and Gene A were constructed correspondingly to support subsequent TF-gene network analysis. For CIRs, if one fragment (CIR1) of the interacting regions was located in the potential promoter region of Gene A, relationships between both of the interacting DNA fragments (CIR1 and CIR2) and Gene A were constructed. TADs were also connected to genes with overlapping chromosomal locations.
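
The promoter-based pairing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used to build the database: the `Gene` and `Element` records, the coordinates, and the function names are hypothetical, the promoter window is oriented by strand, and "located in" is interpreted as any overlap with the window (the text does not say whether partial overlap counts).

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int      # transcription start site
    strand: str   # "+" or "-"

class Element(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def promoter_region(gene, upstream=2000, downstream=1000):
    """Return (start, end) of the potential promoter, oriented by strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def build_eg_pairs(elements, genes):
    """Pair every element with each gene whose promoter window it overlaps."""
    pairs = []
    for g in genes:
        p_start, p_end = promoter_region(g)
        for e in elements:
            if e.chrom == g.chrom and overlaps(e.start, e.end, p_start, p_end):
                pairs.append((e.name, g.name))
    return pairs

# Hypothetical toy data, for illustration only.
genes = [Gene("GeneA", "chr1", 10_000, "+"), Gene("GeneB", "chr1", 50_000, "-")]
elements = [
    Element("TFBR1", "chr1", 8_500, 8_900),    # inside the GeneA window
    Element("TFBR2", "chr1", 10_800, 11_200),  # overlaps the end of the GeneA window
    Element("TFBR3", "chr1", 52_500, 52_600),  # beyond the GeneB window
    Element("TFBR4", "chr2", 9_000, 9_500),    # different chromosome
]
eg_pairs = build_eg_pairs(elements, genes)
print(eg_pairs)
```

The nested loop is adequate for a toy example; at genome scale an interval index or sorted sweep would be the usual choice.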

E–G pairs for non-coding RNAs

The regulatory targets of miRNAs and lncRNAs have been accumulated using experimental data and prediction methods. The databases miR2Disease (32), miRTarBase (33) and lncRNA2Target (34) include experimentally supported target relationships between miRNAs/lncRNAs and genes that were validated by RT-PCR, microarray or RNA-seq. These data were integrated directly into rSNPBase 3.0 as miRNA-gene pairs and lncRNA-gene pairs. The databases TargetScan (25,26) and miRanda (27) include predicted miRNA target sites in the transcripts of several species. Human transcripts and the predicted targets of human miRNAs were obtained and mapped to the human genome sequence (in hg19 coordinates). As shown in Figure 1, the chromosomal location of the genome sequence corresponding to a predicted target site in a transcript (Predicted miRNA target site 1) and the gene that encodes the target transcript (Gene E) were added to the E–G pair list. The miRNA (miRNA c in the illustration) that was predicted to match the site was also connected with Gene E to support subsequent miRNA-gene network analysis.

SNP-based E–G pairs and gene regulatory network

All six types of E–G pairs were used to annotate SNPs by comparing the chromosomal location of each SNP with that of the element in the E–G pair. SNP-related E–G pairs were stored and can be used as the basic unit for building an SNP-related regulatory network. In addition to allowing researchers to browse E–G pairs as part of the SNP annotation information, rSNPBase 3.0 provides a specific module for displaying graphical gene regulatory networks constructed from SNP-related TFBR-gene pairs, mature miRNA region-gene pairs and predicted miRNA target site-gene pairs. Nodes in the network represent the corresponding TFs, miRNAs, and genes.
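
As a rough illustration of how SNP-related E–G pairs could be assembled into such a network, the sketch below collapses pair records into nodes (TFs, miRNAs, genes) and edges labelled with the related SNPs. The record layout, the rs IDs, and the function name `build_network` are hypothetical and chosen only for this example; they do not reflect the database's internal schema.

```python
from collections import defaultdict

# Hypothetical SNP-related E-G pair records:
# (regulator, regulator type, target gene, rs ID of the related SNP)
snp_eg_pairs = [
    ("TF_a", "TF", "GeneA", "rs0000001"),
    ("TF_b", "TF", "GeneA", "rs0000002"),
    ("TF_a", "TF", "GeneA", "rs0000003"),
    ("miRNA_c", "miRNA", "GeneE", "rs0000004"),
    ("TF_a", "TF", "GeneE", "rs0000005"),
]

def build_network(pairs):
    """Return (nodes, edges): nodes map a name to its type,
    edges map a (regulator, gene) tuple to the set of related SNPs."""
    nodes = {}
    edges = defaultdict(set)
    for regulator, rtype, gene, rs_id in pairs:
        nodes[regulator] = rtype
        nodes[gene] = "gene"
        edges[(regulator, gene)].add(rs_id)
    return nodes, dict(edges)

nodes, edges = build_network(snp_eg_pairs)
for (regulator, gene), snps in sorted(edges.items()):
    print(regulator, "->", gene, sorted(snps))
```

Keeping the SNPs as edge attributes means that several SNPs supporting the same regulator-gene relationship produce a single edge rather than parallel ones.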

Extended information

Three types of extended information were added to the rSNPs. To cover genetically correlated SNPs, LD correlations between human SNPs were analysed according to the 1000 Genomes Project (35) and HapMap (36). Non-rSNPs in strong LD (r² > 0.8) with rSNPs were added as LD-proxies. To provide more functional evidence, expression quantitative trait loci (eQTL) information was annotated for all the included SNPs (both rSNPs and their LD-proxies) with reference to several experimentally supported databases (37,38). To facilitate pathogenicity studies, SNP-associated diseases were also annotated. In addition to the associations between SNPs and complex diseases from the GWAS Catalog (39), relationships between SNPs and inherited diseases from the Human Gene Mutation Database (HGMD) (40) were also included in rSNPBase 3.0.
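
The LD-proxy rule can be expressed as a simple filter. The sketch below assumes that pairwise r² values are already available as (SNP, SNP, r²) tuples; the rs IDs, values, and the function name `select_ld_proxies` are hypothetical, and the way r² is computed from 1000 Genomes or HapMap genotypes is outside its scope.

```python
R2_THRESHOLD = 0.8  # strong-LD cutoff stated in the text (r2 > 0.8)

def select_ld_proxies(ld_pairs, rsnps, threshold=R2_THRESHOLD):
    """Map each non-rSNP to the set of rSNPs it is in strong LD with."""
    proxies = {}
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 <= threshold:
            continue  # the cutoff is strict: r2 must exceed the threshold
        # Check both orientations, since the pair order is arbitrary.
        for candidate, partner in ((snp_a, snp_b), (snp_b, snp_a)):
            if partner in rsnps and candidate not in rsnps:
                proxies.setdefault(candidate, set()).add(partner)
    return proxies

# Hypothetical toy data, for illustration only.
rsnps = {"rs0000001", "rs0000002"}
ld_pairs = [
    ("rs0000001", "rs0000010", 0.95),  # non-rSNP in strong LD with an rSNP
    ("rs0000011", "rs0000002", 0.85),  # same, with the pair order reversed
    ("rs0000001", "rs0000012", 0.80),  # not strictly above the cutoff
    ("rs0000001", "rs0000002", 0.99),  # both are rSNPs, so neither is a proxy
    ("rs0000013", "rs0000014", 0.90),  # no rSNP involved
]
proxies = select_ld_proxies(ld_pairs, rsnps)
print(proxies)
```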

WEB INTERFACE AND DATABASE USAGE

The web interface for rSNPBase 3.0 was constructed with two main function modules: ‘rSNP search’ and ‘Network search’. As shown in Figure 2, the ‘rSNP search’ module contains almost all the functions of the previous version. Users can query SNPs by dbSNP rs ID and then obtain summarized regulatory features and extended information on the queried SNPs in the ‘Search Results’ page. The search results can be downloaded, and a detailed SNP information report can be viewed in the linked ‘SNP Report’ page. The new ‘Network search’ module, also shown in Figure 2, likewise accepts dbSNP rs IDs as input. A query in this module returns a graphic regulatory network and an ‘element-gene-related SNPs’ interaction table, which can be downloaded and edited with network visualization tools such as Cytoscape (41). In addition to the two search modules, an FTP site is provided to facilitate the download of annotations for all SNPs included in the database.
Figure 2. Web interface and data retrieving process of rSNPBase 3.0.

In addition to providing a potential functional interpretation for individual results from genetic studies, the updated rSNPBase 3.0 may support deeper and more comprehensive functional inferences. Here, we illustrate database usage with a demo SNP set. The demo set includes 199 SNPs that are recorded in the GWAS Catalog as rheumatoid arthritis (RA)-associated SNPs (P < 1E–5 in GWAS), as shown in the supplementary data. As in the previous version, the regulatory features of the demo SNPs can be searched directly in the ‘rSNP search’ module. As shown in Figure 2, the search results show that 167 of the 199 SNPs are included in rSNPBase 3.0. In total, 115 of the included SNPs are rSNPs that are related to TFBRs, TADs, CIRs, lncRNAs and circRNAs; the other 52 are LD-proxies of rSNPs. The demo SNPs can also be searched in the new ‘Network search’ module, which generates a network consisting of 81 nodes and 115 edges. The 81 nodes represent 71 TFs and 10 genes, and the NFKBIE gene, the PHF19 gene and the TF POLR2A occupy the most central positions in the network. This network illustrates potential evidence for gene regulation related to RA-associated SNPs, and may provide comprehensive information for subsequent functional analyses and experiments. For a specific TF-gene pair, detailed information can be viewed in the rSNP report page by following the link from the SNP in the ‘element-gene-related SNPs’ interaction table below the graphical network. The combination of a systematic perspective and detailed regulatory information will provide practical references for functional studies based on genetic results.

CONCLUSIONS AND FUTURE PLAN

Here, we present rSNPBase 3.0, an updated SNP regulatory database. It extends the annotation scope from SNP-related regulatory elements to SNP-related regulatory element-target gene pairs and can therefore support SNP-based gene regulatory network analysis. In addition to providing intuitive SNP regulatory annotations that can be used as concrete guidance for follow-up studies, the updated rSNPBase 3.0 also allows researchers to gain a more systematic view of the potential regulatory mechanisms related to their SNPs of interest. Both SNP-related E–G pairs and SNP-based TF/microRNA co-regulatory networks will provide logical evidence that bridges genetic results with specific complex disease studies. In addition to improving the data content, the web interface was also made more concise and user-friendly. rSNPBase 3.0 will be continuously updated. In addition to adding newly reported human genetics and epigenetics data, more comprehensive analysis approaches will be considered. For example, gene regulatory networks are connected to and interplay with other types of interaction networks, such as signal transduction networks (42), protein–protein interaction networks (43), and protein phosphorylation networks (44). We will pay close attention to the frontier data and methods in this field, and aim to integrate this information in an appropriate and timely manner.
  44 in total

Review 1.  Beyond GWASs: illuminating the dark road from association to function.

Authors:  Stacey L Edwards; Jonathan Beesley; Juliet D French; Alison M Dunning
Journal:  Am J Hum Genet       Date:  2013-11-07       Impact factor: 11.025

Review 2.  Regulation of disease-associated gene expression in the 3D genome.

Authors:  Peter Hugo Lodewijk Krijger; Wouter de Laat
Journal:  Nat Rev Mol Cell Biol       Date:  2016-11-09       Impact factor: 94.444

Review 3.  Role of non-coding sequence variants in cancer.

Authors:  Ekta Khurana; Yao Fu; Dimple Chakravarty; Francesca Demichelis; Mark A Rubin; Mark Gerstein
Journal:  Nat Rev Genet       Date:  2016-01-19       Impact factor: 53.242

4.  A novel microRNA and transcription factor mediated regulatory network in schizophrenia.

Authors:  An-Yuan Guo; Jingchun Sun; Peilin Jia; Zhongming Zhao
Journal:  BMC Syst Biol       Date:  2010-02-15

5.  The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution.

Authors:  Peter D Stenson; Edward V Ball; Matthew Mort; Andrew D Phillips; Katy Shaw; David N Cooper
Journal:  Curr Protoc Bioinformatics       Date:  2012-09

6.  LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression.

Authors:  Qinghua Jiang; Jixuan Wang; Xiaoliang Wu; Rui Ma; Tianjiao Zhang; Shuilin Jin; Zhijie Han; Renjie Tan; Jiajie Peng; Guiyou Liu; Yu Li; Yadong Wang
Journal:  Nucleic Acids Res       Date:  2014-11-15       Impact factor: 19.160

7.  An update on LNCipedia: a database for annotated human lncRNA sequences.

Authors:  Pieter-Jan Volders; Kenneth Verheggen; Gerben Menschaert; Klaas Vandepoele; Lennart Martens; Jo Vandesompele; Pieter Mestdagh
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

8.  The microRNA.org resource: targets and expression.

Authors:  Doron Betel; Manda Wilson; Aaron Gabow; Debora S Marks; Chris Sander
Journal:  Nucleic Acids Res       Date:  2007-12-23       Impact factor: 16.971

9.  High-density genotyping of immune-related loci identifies new SLE risk variants in individuals with Asian ancestry.

Authors:  Celi Sun; Julio E Molineros; Loren L Looger; Xu-Jie Zhou; Kwangwoo Kim; Yukinori Okada; Jianyang Ma; Yuan-Yuan Qi; Xana Kim-Howard; Prasenjeet Motghare; Krishna Bhattarai; Adam Adler; So-Young Bang; Hye-Soon Lee; Tae-Hwan Kim; Young Mo Kang; Chang-Hee Suh; Won Tae Chung; Yong-Beom Park; Jung-Yoon Choe; Seung Cheol Shim; Yuta Kochi; Akari Suzuki; Michiaki Kubo; Takayuki Sumida; Kazuhiko Yamamoto; Shin-Seok Lee; Young Jin Kim; Bok-Ghee Han; Mikhail Dozmorov; Kenneth M Kaufman; Jonathan D Wren; John B Harley; Nan Shen; Kek Heng Chua; Hong Zhang; Sang-Cheol Bae; Swapan K Nath
Journal:  Nat Genet       Date:  2016-01-25       Impact factor: 38.330

10.  The Ensembl gene annotation system.

Authors:  Bronwen L Aken; Sarah Ayling; Daniel Barrell; Laura Clarke; Valery Curwen; Susan Fairley; Julio Fernandez Banet; Konstantinos Billis; Carlos García Girón; Thibaut Hourlier; Kevin Howe; Andreas Kähäri; Felix Kokocinski; Fergal J Martin; Daniel N Murphy; Rishi Nag; Magali Ruffier; Michael Schuster; Y Amy Tang; Jan-Hinnerk Vogel; Simon White; Amonida Zadissa; Paul Flicek; Stephen M J Searle
Journal:  Database (Oxford)       Date:  2016-06-23       Impact factor: 3.451

