
rVarBase: an updated database for regulatory features of human variants.

Liyuan Guo, Yang Du, Susu Qu, Jing Wang.

Abstract

We present here the rVarBase database (http://rv.psych.ac.cn), an updated version of the rSNPBase database, to provide reliable and detailed regulatory annotations for known and novel human variants. This update expands the database to include additional types of human variants, such as copy number variations (CNVs) and novel variants, and include additional types of regulatory features. Now rVarBase annotates variants in three dimensions: chromatin states of the surrounding regions, overlapped regulatory elements and variants' potential target genes. Two new types of regulatory elements (lncRNAs and miRNA target sites) have been introduced to provide additional annotation. Detailed information about variants' overlapping transcription factor binding sites (TFBSs) (often less than 15 bp) within experimentally supported TF-binding regions (∼ 150 bp) is provided, along with the binding motifs of matched TF families. Additional types of extended variants and variant-associated phenotypes were also added. In addition to the enrichment in data content, an element-centric search module was added, and the web interface was refined. In summary, rVarBase hosts more types of human variants and includes more types of up-to-date regulatory information to facilitate in-depth functional research and to provide practical clues for experimental design.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26503253      PMCID: PMC4702808          DOI: 10.1093/nar/gkv1107

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The association between non-coding variants and human diseases has attracted increasing attention (1–3), and variants that are associated with gene expression abundance have been rapidly identified and accumulated in recent years. Annotating the regulatory features of human variants has become a practical requirement in clinical and basic research (1,4), and multiple approaches have been developed to allow the functional annotation of non-coding variants (5–8). To provide reliable, comprehensive and user-friendly regulatory annotation of human single nucleotide polymorphisms (SNPs), we developed the rSNPBase database (9). In the past 2 years, burgeoning sequencing techniques have driven the identification of new disease-associated SNPs and additional types of variants, such as copy number variations (CNVs) and novel variants (10). Meanwhile, regulatory research has also advanced. For example, the Roadmap project systematically characterized the epigenomic landscapes of representative primary human tissues and cells and then released the relevant data (11,12); new modes of regulation, such as long non-coding RNA (lncRNA) mediated regulation, have been studied in depth (13–16); and more expression quantitative trait loci (eQTLs) have been identified and analyzed (17). Therefore, there is a growing need to update the database to host more types of human variants and include more types of up-to-date regulatory information. The updated rVarBase hosts known human regulatory variants (SNPs and CNVs) and also annotates novel variants. rVarBase describes a variant's regulatory features in three fields: chromatin states (in different tissues/cells), overlapped regulatory elements and potential target genes. 
rVarBase also provides an optional extended annotation for variants, including linkage disequilibrium (LD) proxies of known regulatory SNPs (rSNPs), SNPs that are located in regulatory CNVs (rCNVs) and traits (diseases and expression quantitative traits) that are associated with variants. A three-module (variant-centric, gene-centric and element-centric) search engine is provided to facilitate data navigation.

New features

rVarBase is consistent with the previous version in using experimentally supported regulatory information to make annotations. As shown in Figure 1, genome-wide human variants were obtained and standardized using information from NCBI dbSNP (build 142) (18), dbVar (GRCh37) (19) and UCSC (20). The regulatory features (chromatin states of the surrounding regions, overlapped and experimentally supported regulatory elements and potential target genes) of each variant were analyzed with reference to experimentally supported information. Known human SNPs and CNVs with regulatory features were stored as rSNPs and rCNVs, on which further extended analyses were performed. The reference data used for the regulatory feature analysis and extended analysis are listed at http://rv.psych.ac.cn/datacontent.do and in Supplementary Tables S1 and S2. A summarized comparison of the current and previous versions is shown in Table 1.
Figure 1.

Data processing and data content of rVarBase.

Table 1.

Data content of rVarBase (as of September 11, 2015) and rSNPBase

Data type                                   rSNPBase      rVarBase
Variants
  rSNPs (a)                                 22 846 898    87 345 304
  rCNVs (b)                                 -             1 368 424
  Annotation for novel variants             No            Yes
Regulatory features
  Chromatin states                          No            Yes
  Regulatory elements
    CpG islands                             Yes           Yes
    TF binding regions                      Yes           Yes
    Matched TFBS and TF-binding matrixes    No            Yes
    Interactive chromatin regions           Yes           Yes
    lncRNAs                                 No            Yes
    miRNAs                                  Yes           Yes
    miRNA binding sites                     No            Yes
  Target genes                              56 869        82 640
Extended variants
  LD-proxies of rSNPs (non-rSNPs)           2 281 874     1 626 737
  Non-rSNPs inside rCNVs                    -             21 797 660
Associated traits
  Diseases (variant-disease pairs)          -             198 928
  eQTLs (SNP-mRNA pairs)                    2 428 727     4 201 218

(a) Known human SNPs that have regulatory features were stored as rSNPs.

(b) Known human CNVs that have regulatory features were stored as rCNVs.


CNVs and novel variants

In addition to accounting for the increased number of SNPs in dbSNP since the publication of rSNPBase 2 years ago, rVarBase provides annotations on more types of human variants, such as known CNVs, novel single-nucleotide variants (SNVs) and regions. Human CNVs were obtained from the dbVar database (19). To focus on regulatory features and to avoid including long CNVs that cover one or more protein-coding gene regions, only CNVs with a length of less than 1 Mb were analyzed. The analytical flow for CNVs and user-requested novel SNVs (with their chromosomal location information) is similar to that of known SNPs; it includes an analysis of the chromatin states of the surrounding regions, a comparison with experimentally supported elements according to their genomic locations and then a map of potential target genes with reference to the genomic proximity of the regulatory elements and transcript start sites (TSSs). For novel regions that are uploaded by users, we provide known regulatory variants that overlap with such regions.
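The analytical flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual rVarBase pipeline: the `Interval` type, the helper names, the zero-based half-open coordinate convention and the "closest TSS on the same chromosome" rule for picking a target gene are all assumptions made for the example. Only the 1 Mb CNV length cutoff is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CNV_LENGTH = 1_000_000  # only CNVs shorter than 1 Mb are analyzed (cutoff from the text)


@dataclass(frozen=True)
class Interval:
    """Zero-based, half-open genomic interval (an assumed convention)."""
    chrom: str
    start: int
    end: int
    name: str = ""

    def length(self) -> int:
        return self.end - self.start

    def overlaps(self, other: "Interval") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def keep_cnv(cnv: Interval) -> bool:
    """Apply the length filter: keep CNVs strictly shorter than 1 Mb."""
    return cnv.length() < MAX_CNV_LENGTH


def overlapping_elements(variant: Interval, elements: List[Interval]) -> List[Interval]:
    """Return the regulatory elements that cover or overlap the variant."""
    return [e for e in elements if variant.overlaps(e)]


def nearest_tss(element: Interval, tss_sites: List[Interval]) -> Optional[Interval]:
    """Hypothetical proximity rule: the closest TSS on the same chromosome."""
    same_chrom = [t for t in tss_sites if t.chrom == element.chrom]
    if not same_chrom:
        return None
    midpoint = (element.start + element.end) // 2
    return min(same_chrom, key=lambda t: abs(t.start - midpoint))
```

A variant would first be matched against the element list with `overlapping_elements`, and each matched element would then be passed to `nearest_tss` to suggest a candidate target gene. A real pipeline would likely use an interval tree or sorted index instead of the linear scans shown here.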

Chromatin states

The Roadmap project provides 111 reference epigenomes and a 15-state model that is trained to generate genome-wide maps of chromatin state using the 111 epigenomes along with 16 epigenomes from the ENCODE project (11). The detailed chromatin state map was downloaded from the project's supplementary data repository web portal (http://egg2.wustl.edu/roadmap/web_portal/index.html). Eight active states (‘Active TSS’, ‘Flanking Active TSS’, ‘Transcr. at gene 5′ and 3′’, ‘Strong transcription’, ‘Weak transcription’, ‘Genic enhancers’, ‘Enhancers’ and ‘ZNF genes & repeats’) and three bivalent states (‘Bivalent/Poised TSS’, ‘Flanking Bivalent TSS/Enhancer’ and ‘Bivalent Enhancer’) from the 15-state model were used to annotate the chromatin state of a variant's surrounding region. Purely repressed states in the 15-state model were not included.
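The state selection described above can be expressed as a simple lookup. The sketch below is illustrative only: the state names are copied from the text (with ASCII apostrophes), while the function name, the input layout (one state label per tissue) and the tissue names in the usage note are assumptions.

```python
# Eight active states and three bivalent states named in the text.
ACTIVE_STATES = {
    "Active TSS",
    "Flanking Active TSS",
    "Transcr. at gene 5' and 3'",
    "Strong transcription",
    "Weak transcription",
    "Genic enhancers",
    "Enhancers",
    "ZNF genes & repeats",
}
BIVALENT_STATES = {
    "Bivalent/Poised TSS",
    "Flanking Bivalent TSS/Enhancer",
    "Bivalent Enhancer",
}
ANNOTATED_STATES = ACTIVE_STATES | BIVALENT_STATES


def annotate_states(states_by_tissue):
    """Keep tissues whose state is active or bivalent; drop all other states.

    states_by_tissue maps a tissue/cell name to the chromatin state label
    of the region surrounding a variant (a hypothetical input layout).
    """
    annotated = {}
    for tissue, state in states_by_tissue.items():
        if state in ACTIVE_STATES:
            annotated[tissue] = (state, "active")
        elif state in BIVALENT_STATES:
            annotated[tissue] = (state, "bivalent")
    return annotated
```

For example, calling `annotate_states` on a mapping with an "Enhancers" state in one tissue and a repressed state in another would return only the first tissue, labeled as active.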

lncRNAs and miRNA target sites

Regulatory elements that cover or overlap with analyzed variants are identified as variant-related elements. In addition to the regulatory elements that are included in rSNPBase (CpG islands, chromatin-interactive regions, TF-binding regions and mature miRNAs), lncRNAs and miRNA target sites were also introduced into the variants’ annotations. lncRNA information was drawn from the LNCipedia database (13); experimentally supported lncRNA target genes were obtained from the LncRNA2Target database (16). Considering the important roles that microRNA target site polymorphisms play in human diseases (21), miRNA target sites in the 3′ UTRs of experimentally supported miRNA target genes were also included for comparison with variants. miRNA target genes were obtained from the miR2Disease (22) and miRTarBase (23) databases, and matched miRNA binding sites were scanned using TargetScan (24,25) and miRanda (26). Detailed information about the regulatory elements used is given in Supplementary Table S1 and at http://rv.psych.ac.cn/datacontent.do.

TF binding sites and TF matrixes

In rSNPBase, experimentally supported TF-binding regions (∼150 bp) that had been generated by the ENCODE project were used to annotate variants. Because exact TF binding sites are often smaller than 15 bp, a more detailed annotation is necessary for functional analysis and experimental design. Using predicted genome-wide TFBS maps from UCSC TFBS conserved (Z score greater than 2.33) (20), JASPAR (27) and ENCODE-motif (28), the potential binding sites of matched TF families inside TF-binding regions were identified and compared with variants. Corresponding TF-binding matrixes from TRANSFAC (29), JASPAR (27) and ENCODE-motif (28) were also included in rVarBase.
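The two-level annotation described above (a short predicted TFBS nested inside a longer experimentally supported binding region) can be sketched as follows. This is a hedged illustration, not the database's implementation: the `Site` type, the function names and the rule that a predicted site "matches" a region when the TF labels are identical are assumptions. The text speaks of matched TF families, which a real implementation would need to resolve through a family mapping.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Zero-based, half-open interval with an optional TF label (assumed layout)."""
    chrom: str
    start: int
    end: int
    tf: str = ""


def overlaps(a: Site, b: Site) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def contains(outer: Site, inner: Site) -> bool:
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


def tfbs_hits(variant: Site, binding_regions: List[Site],
              predicted_sites: List[Site]) -> List[Site]:
    """Predicted TFBSs that overlap the variant and lie inside a binding
    region of the same TF (simplified stand-in for family matching)."""
    hits = []
    for site in predicted_sites:
        if not overlaps(variant, site):
            continue
        if any(r.tf == site.tf and contains(r, site) for r in binding_regions):
            hits.append(site)
    return hits
```

The point of the nesting check is that a variant inside a 150 bp binding region is only weak evidence of a functional effect, whereas a variant inside a matched motif of roughly 15 bp or less gives a much narrower target for experimental follow-up.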

More extended information

As in rSNPBase, an extended information analysis was performed on all rVarBase-hosted variants. In addition to the LD-proxies of rSNPs, extended SNPs that are located in rCNVs were also added. eQTL information from more data sources, including the RTeQTL database (30), BrainEAC (31), the skin eQTL database (32) and the GTEx Portal (17,33), was added to provide eQTL labels. Variants’ associated diseases/traits were integrated from the GWAS Catalog (34) and CNVD (35) databases. Detailed information about the reference data used in the extended analysis is given in Supplementary Table S2 and at http://rv.psych.ac.cn/datacontent.do.

Web interface

The web interface was refined to make data acquisition more convenient. A queried variant may be entered as a dbSNP ID (for a known SNP) or as a genome position in zero-based coordinates (for all types of variants). In addition to ‘Variant search’ and ‘Gene search’, a new search module, ‘Element search’, was added to facilitate searches based on TFs/miRNAs/lncRNAs of interest. As shown in Figure 2A, variants in experimentally supported binding regions or predicted TFBSs, variants in mature miRNAs or predicted miRNA-binding sites and variants in lncRNAs may be queried by entering the element name and the target gene name. An FTP site (ftp://rv.psych.ac.cn/pub/rv/) was added to facilitate the download of the whole database.
Figure 2.

New search module of rVarBase and an example of data retrieving process.

DATABASE USAGE

rVarBase was developed to bridge genetic studies and functional research. The database can provide potential functional interpretations, in terms of gene expression regulation, for the results of genetic studies. rVarBase can also assist researchers in filtering candidate variants by genes of interest or by regulatory mechanisms. Furthermore, for queried variants, rVarBase provides detailed regulatory information, which is practical for the design of experiments that explore biological function. Because rVarBase can perform regulatory feature analysis on novel variants, it can be used not only with disease-associated SNPs generated by traditional genetic association studies, but also with other types of genetic data. We provide a demonstration dataset to show how the database is used with novel variants. This dataset includes nine novel non-coding SNVs that are associated with tumors and were identified by Nils et al. (36) in 2014. The detailed chromosomal locations of the nine SNVs are given in Supplementary Table S3 and at http://rv.psych.ac.cn/tutorial.do. As shown in Figure 2B, these variants can be quickly entered into the ‘Variant search’ module with their chromosomal locations (hg19 genome coordinates). The regulatory features of and extended information about the queried variants are summarized in the ‘Search Results’. One of the nine variants (located at chr5:1295243–1295244) has been included in the NCBI dbSNP database with the ID ‘rs35550267’. All nine novel SNVs have regulatory features. They are located in active chromatin regions and inside TF-binding regions and chromatin-interactive regions; two genes are potentially regulated by the regulatory elements in which they are located. These regulatory variants are appropriate candidates for further validation studies and functional research. 
Detailed information about each regulatory variant, such as the genomic locations of its overlapping active chromatin regions or regulatory elements, specific tissue types, target genes and related regulatory modes, is shown on the ‘Variant report’ page. Since all nine variants overlap TF-binding regions, additional information about matched TFBSs and TF-binding motifs is also provided on this page. These detailed reports, as practical reference data, may directly support experimental design in functional research.
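Because positional queries use zero-based coordinates, a SNV reported at a 1-based position (as in VCF files) has to be shifted before it is entered. The sketch below shows one way to do this. The `chrN:start-end` string layout with an ASCII hyphen mirrors the example location quoted in this section, but the exact input syntax accepted by the web form is an assumption, as are the function names.

```python
import re

# Assumed query layout: chromosome, colon, zero-based start, hyphen, end.
_QUERY = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)$")


def snv_to_zero_based(chrom: str, pos_1based: int) -> str:
    """Convert a 1-based SNV position to a zero-based, half-open interval string."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    return f"{chrom}:{pos_1based - 1}-{pos_1based}"


def parse_position(query: str):
    """Split a 'chrN:start-end' query into (chrom, start, end) and validate it."""
    match = _QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {query!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start >= end:
        raise ValueError("start must be smaller than end")
    return chrom, start, end
```

Under these assumptions, a SNV at 1-based position 1295244 on chr5 is written as `chr5:1295243-1295244`, which has the same form as the example variant location given above.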

CONCLUSION AND FUTURE PLAN

Here, we upgraded the rSNPBase database, which provides reliable regulatory annotation of human SNPs, to the rVarBase database, which now provides more comprehensive regulatory annotation for multiple types of human variants. The updates include the regulatory annotation of short and structural variants with reference to up-to-date epigenetic advancements. The updated rVarBase supports the functional analysis of known and novel variants and will thus assist users in exploring data from new types of research, such as novel results from next-generation sequencing. Integrative, tissue/cell-based chromatin-state data were introduced to annotate the variants; these data will help users gather more biologically meaningful information. New types of regulatory elements, more detailed annotation, additional extended information and a new search module in the updated database will further aid researchers in future functional analyses of genetic studies and will provide more comprehensive reference data for candidate variant selection and for the experimental design of subsequent genetic and functional research. rVarBase will be continuously updated with newly reported human genetic and epigenetic data. In addition to newly reported variants in dbSNP and dbVar, new annotation dimensions and new types of regulatory elements will be considered and tracked. For example, a method for lncRNA target site prediction (37) has appeared and is under development; we hope to add the corresponding data in the future, when the method is mature and validated. The integration of multi-dimensional regulatory features is also being considered.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  37 in total

Review 1.  The role of regulatory variation in complex traits and disease.

Authors:  Frank W Albert; Leonid Kruglyak
Journal:  Nat Rev Genet       Date:  2015-02-24       Impact factor: 53.242

Review 2.  High-throughput functional genomics using CRISPR-Cas9.

Authors:  Ophir Shalem; Neville E Sanjana; Feng Zhang
Journal:  Nat Rev Genet       Date:  2015-04-09       Impact factor: 53.242

3.  Linking disease associations with regulatory information in the human genome.

Authors:  Marc A Schaub; Alan P Boyle; Anshul Kundaje; Serafim Batzoglou; Michael Snyder
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

4.  HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants.

Authors:  Lucas D Ward; Manolis Kellis
Journal:  Nucleic Acids Res       Date:  2011-11-07       Impact factor: 16.971

5.  LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression.

Authors:  Qinghua Jiang; Jixuan Wang; Xiaoliang Wu; Rui Ma; Tianjiao Zhang; Shuilin Jin; Zhijie Han; Renjie Tan; Jiajie Peng; Guiyou Liu; Yu Li; Yadong Wang
Journal:  Nucleic Acids Res       Date:  2014-11-15       Impact factor: 19.160

6.  An update on LNCipedia: a database for annotated human lncRNA sequences.

Authors:  Pieter-Jan Volders; Kenneth Verheggen; Gerben Menschaert; Klaas Vandepoele; Lennart Martens; Jo Vandesompele; Pieter Mestdagh
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

7.  The microRNA.org resource: targets and expression.

Authors:  Doron Betel; Manda Wilson; Aaron Gabow; Debora S Marks; Chris Sander
Journal:  Nucleic Acids Res       Date:  2007-12-23       Impact factor: 16.971

8.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

9.  rSNPBase: a database for curated regulatory SNPs.

Authors:  Liyuan Guo; Yang Du; Suhua Chang; Kunlin Zhang; Jing Wang
Journal:  Nucleic Acids Res       Date:  2013-11-26       Impact factor: 16.971

10.  RTeQTL: Real-Time Online Engine for Expression Quantitative Trait Loci Analyses.

Authors:  Baoshan Ma; Jinyan Huang; Liming Liang
Journal:  Database (Oxford)       Date:  2014-07-18       Impact factor: 3.451
