
LincSNP 2.0: an updated database for linking disease-associated SNPs to human long non-coding RNAs and their TFBSs.

Shangwei Ning1, Ming Yue1, Peng Wang1, Yue Liu1, Hui Zhi1, Yan Zhang1, Jizhou Zhang1, Yue Gao1, Maoni Guo1, Dianshuang Zhou1, Xin Li1, Xia Li2.   

Abstract

We describe LincSNP 2.0 (http://bioinfo.hrbmu.edu.cn/LincSNP), an updated database that is used specifically to store and annotate disease-associated single nucleotide polymorphisms (SNPs) in human long non-coding RNAs (lncRNAs) and their transcription factor binding sites (TFBSs). In LincSNP 2.0, we have updated the database with more data and several new features, including (i) expanding disease-associated SNPs in human lncRNAs; (ii) identifying disease-associated SNPs in lncRNA TFBSs; (iii) updating LD-SNPs from the 1000 Genomes Project; and (iv) collecting more experimentally supported SNP-lncRNA-disease associations. Furthermore, we developed three flexible online tools to retrieve and analyze the data. Linc-Mart is a convenient way for users to customize their own data. Linc-Browse is a tool for all data visualization. Linc-Score predicts the associations between lncRNA and disease. In addition, we provided users a newly designed, user-friendly interface to search and download all the data in LincSNP 2.0 and we also provided an interface to submit novel data into the database. LincSNP 2.0 is a continually updated database and will serve as an important resource for investigating the functions and mechanisms of lncRNAs in human diseases.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2016        PMID: 27924020      PMCID: PMC5210641          DOI: 10.1093/nar/gkw945

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Long non-coding RNAs (lncRNAs), an abundant class of non-coding RNAs defined by a length exceeding 200 nucleotides, have gained widespread attention in recent years (1). LncRNAs are widely encoded by the human genome and perform important functions in a spectrum of biological processes such as genome regulation, cell differentiation and development (2–4). Accumulating evidence indicates that lncRNAs are closely associated with many human diseases (5,6). In the emerging field of lncRNA research, many researchers have focused on the influence of genetic variants on lncRNA function. A number of single nucleotide polymorphisms (SNPs), the most common type of genetic variant, have been identified in human lncRNA regions and have been shown to be associated with various diseases, including cancers (7,8). To facilitate the study of lncRNA-related genetic variants, we reported the first version of the LincSNP database (LincSNP 1.0), which allows users to search all known disease-associated SNPs in human lncRNAs, together with their comprehensive functional annotations (9). Although LincSNP 1.0 has provided useful information for researchers, the database could offer more resources and be more user friendly. For example, LincSNP 1.0 focused only on large intergenic non-coding RNAs (lincRNAs), a subclass of lncRNAs, and did not identify disease-associated SNPs in lncRNA regulatory elements such as transcription factor binding sites (TFBSs). Previous studies have demonstrated that SNPs in lncRNA TFBSs can affect lncRNA expression, thereby potentially affecting disease susceptibility (10). With the increasing amount of lncRNA and SNP data, there is a great need to update LincSNP 1.0 with more resources and improved tools.
To date, many databases have been built to curate lncRNA-related information, such as NONCODE (11), DIANA-LncBase (12), LNCipedia (13), lncRNAdb (14), LncRNAWiki (15), ChIPBase (16), starBase (17), LncRNADisease (18) and Lnc2Cancer (19). These databases have provided valuable resources for lncRNA-related studies. However, very few databases pay special attention to the relationship between SNPs and human lncRNAs. To our knowledge, only the lncRNASNP database stores lncRNA-related SNPs (20). However, that database focuses mainly on exploring the impact of SNPs on lncRNA structure and function, and it identifies only a small number of disease-associated SNPs in human lncRNAs. Until now, no specialized resource has been devoted to collecting, storing and distributing disease-associated SNPs in human lncRNAs. To meet these needs, we have updated LincSNP (9) to version 2.0 (LincSNP 2.0) (Figure 1 and Table 1). In LincSNP 2.0, the numbers of disease-associated SNPs and human lncRNAs have been increased to 809 451 and 244 545, respectively, and the number of lncRNA types has been increased to 9. For the first time, disease-associated SNPs in lncRNA TFBSs were identified and included in LincSNP 2.0. Furthermore, the number of experimentally supported SNP-lncRNA-disease associations has grown from 3 to 58. In addition to the expansion of the core data sets, both the data search and download functions were improved. In particular, three web-based tools have been developed to facilitate data analysis, extraction and visualization. We hope that researchers will benefit from the greater resources in LincSNP 2.0.
Figure 1. Architecture of LincSNP 2.0.

Table 1. Content and entries of LincSNP 2.0

Database content           LincSNP 1.0      LincSNP 2.0             Fold increase
SNP                        128 407 SNPs     809 451 SNPs            6.3
  dbGaP                    Yes              Yes
  GAD                      Yes              Yes
  GWAS Central             Yes              Yes
  Johnson and O'Donnell    Yes              Yes
  NHGRI GWAS Catalog       Yes              Yes
  PharmGKb                 Yes              Yes
  GWASdb                   No               Yes
  GRASP                    No               Yes
lncRNA                     5804 lincRNAs    244 545 lncRNAs         42.1
  Ensembl                  Yes              Yes
  NONCODE                  No               Yes
  LNCipedia                No               Yes
  LncRBase                 No               Yes
  GENCODE                  No               Yes
LD-SNP                     1.5 million      11.6 million            7.7
  Origin                   HapMap           1000 Genomes Project
TFBS                       No               162 human TFs
  Origin                   No               ChIP-Seq data
Validated data             3 entries        58 entries              19.3
  Origin                   PubMed           PubMed
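
The fold increases reported in Table 1 can be checked directly from the entry counts. The short sketch below recomputes them; the dictionary keys and the function name are illustrative, and rounding to one decimal place is an assumption about how the reported values were obtained.

```python
# Entry counts taken from Table 1: (LincSNP 1.0, LincSNP 2.0).
# LD-SNP counts are in millions, as reported in the table.
TABLE_1_COUNTS = {
    "SNP": (128407, 809451),
    "lncRNA": (5804, 244545),
    "LD-SNP": (1.5, 11.6),
    "Validated data": (3, 58),
}


def fold_increase(old, new):
    """Ratio of the new count to the old count, rounded to one decimal place."""
    return round(new / old, 1)


FOLDS = {name: fold_increase(old, new) for name, (old, new) in TABLE_1_COUNTS.items()}

for name, fold in FOLDS.items():
    print(name, fold)
```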

IMPROVED CONTENT AND NEW FEATURES

Expanded entries on disease-associated SNPs in human lncRNAs

Recent advances in high-throughput sequencing technology such as RNA-Seq have led to the identification of large numbers of lncRNAs (21). There has also been a rapid increase in GWAS data in public databases (22). This information provides a great opportunity to identify more disease-associated SNPs in human lncRNAs (Table 1). In LincSNP 2.0, the lncRNA sources have expanded from one to five databases: Ensembl (Version 75), LncRBase (Version 1.0), NONCODE (Version 4), LNCipedia (Version 3.1) and GENCODE (Version 19). To provide a universal lncRNA annotation for users, lncRNA transcripts downloaded from different sources were considered to be the same transcript if they had the same positions. Each lncRNA transcript was then named using a serial number after the ‘LSLNC’ symbol. In total, we obtained 244 545 human lncRNAs and their annotations, and the number of lncRNA types increased to 9 (lincRNA, 3′ overlapping ncRNA, antisense, processed transcript, exonic, retained intron, sense no exonic, sense intronic and sense overlapping). The set of human GWAS databases storing disease (trait)-associated SNPs has been expanded from six to eight sources: dbGaP (23), GAD (24), GWAS Central (25), Johnson and O'Donnell (26), the NHGRI GWAS Catalog (27), PharmGKb (28), GWASdb (Version 2) (22) and GRASP (Version 2) (29). Following the integration strategy used in LincSNP 1.0, disease-associated SNPs were selected from original publications with a moderate threshold (P-value < 1.0 × 10⁻³), and only the most significant record was retained when the same SNP was reported in different publications (9). In total, 809 451 unique disease-associated SNPs were collected. We also extracted SNPs in linkage disequilibrium with disease-associated SNPs (LD-SNPs, r² ≥ 0.8) from the 1000 Genomes Project (Phase I version 3). After LD analysis with VCFtools (30), ∼11.6 million LD-SNPs were collected in LincSNP 2.0.
Finally, we identified 371 647 disease-associated SNPs located in 145 642 human lncRNAs and we identified 1 266 485 LD-SNPs in 168 915 human lncRNAs.
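
The integration steps described in this section (merging transcripts with identical positions, thresholding and de-duplicating SNP records, and locating SNPs in lncRNAs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layouts, the six-digit zero padding of the ‘LSLNC’ serial, the use of chromosome, strand, start and end as the "position" key, and the choice to de-duplicate per SNP identifier are all assumptions.

```python
from collections import namedtuple

# Hypothetical record layouts for illustration; not LincSNP's actual schema.
Transcript = namedtuple("Transcript", "source chrom strand start end")
Assoc = namedtuple("Assoc", "rsid chrom pos disease pvalue")

P_THRESHOLD = 1.0e-3  # significance threshold stated in the text


def merge_transcripts(transcripts):
    """Collapse transcripts with identical coordinates and assign serial names.

    Returns {(chrom, strand, start, end): name}. The zero-padded name format
    is an assumption; the text only says a serial number follows 'LSLNC'.
    """
    names = {}
    for t in transcripts:
        key = (t.chrom, t.strand, t.start, t.end)
        if key not in names:
            names[key] = "LSLNC{:06d}".format(len(names) + 1)
    return names


def select_associations(records, threshold=P_THRESHOLD):
    """Keep records with P-value below the threshold; when a SNP appears in
    several publications, keep only its most significant record."""
    best = {}
    for rec in records:
        if rec.pvalue >= threshold:
            continue
        kept = best.get(rec.rsid)
        if kept is None or rec.pvalue < kept.pvalue:
            best[rec.rsid] = rec
    return best


def snps_in_lncrnas(selected, lncrnas):
    """Return {lncRNA name: [rsids]} for SNPs inside each transcript interval.

    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    hits = {}
    for (chrom, _strand, start, end), name in lncrnas.items():
        for rec in selected.values():
            if rec.chrom == chrom and start <= rec.pos <= end:
                hits.setdefault(name, []).append(rec.rsid)
    return hits
```

The nested loop in `snps_in_lncrnas` is quadratic and is only meant to make the rule explicit; at the scale reported here, an interval index or a sorted sweep would be the more practical choice.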

Newly added data on disease-associated SNPs in lncRNA TFBSs

We recently developed a database named SNP@lincTFBS to identify SNPs in potential TFBSs of human lincRNAs (31). The updated LincSNP 2.0 has integrated SNP@lincTFBS as an important resource for the functional annotation of SNPs in lncRNA TFBSs. Briefly, we downloaded ChIP-Seq data sets for human transcription factors and identified the peaks located in the promoter regions of human lncRNAs (from 5 kb upstream to 1 kb downstream of the transcription start site of each lncRNA) (32). In total, we identified 5 284 709 TFBSs in the defined promoter regions of 211 928 human lncRNAs. We identified 43 672 disease-associated SNPs in 593 492 TFBSs of 86 495 lncRNAs and we identified 1 250 571 LD-SNPs in 168 915 TFBSs of 123 566 lncRNAs.
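
The promoter definition above (5 kb upstream to 1 kb downstream of the transcription start site) depends on the strand of the lncRNA. The sketch below shows one way to compute the window and test whether a ChIP-Seq peak falls in it. The function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and treating any overlap as "located in" the promoter is an assumption, since the text does not say whether peaks must be fully contained.

```python
def promoter_window(tss, strand, upstream=5000, downstream=1000):
    """Return (start, end) of the promoter window around a TSS.

    On the '+' strand, upstream means smaller coordinates; on the '-' strand
    the directions are mirrored. The start is clipped to position 1.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


def peak_overlaps_window(peak, window):
    """True if the peak interval shares at least one base with the window."""
    peak_start, peak_end = peak
    win_start, win_end = window
    return peak_start <= win_end and peak_end >= win_start


def snp_in_peak(pos, peak):
    """True if a SNP position lies inside the peak interval (inclusive)."""
    return peak[0] <= pos <= peak[1]
```

For a lncRNA with its TSS at position 10 000, `promoter_window(10000, "+")` spans 5000 to 11 000, while the same TSS on the minus strand gives 9000 to 15 000.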

Updated entries on experimentally supported SNP-lncRNA-disease associations

To provide a reliable source for the associations between lncRNA-related SNPs and disease, we developed a new page, Linc-Confirm, to store all experimentally supported SNP-lncRNA-disease associations. These associations were manually collected in several steps, as previously described (33–35). First, we retrieved the literature published before July 2016 by searching the PubMed database (36) with a list of keywords, such as ‘lncRNA SNP disease,’ ‘long non-coding RNA SNP disease,’ ‘lncRNA SNP cancer,’ ‘lncRNA SNP tumor’ and ‘long noncoding RNA polymorphism disease.’ Second, experimentally supported SNP-lncRNA-disease associations were manually curated from the published papers by at least two researchers. We retrieved the lncRNA, SNP and disease names, experimental samples and methods, PubMed ID, paper title and a brief description from the original studies. Third, all selected studies were rechecked for the lncRNA, SNP and disease names, and some names were replaced with official or recommended names. In LincSNP 2.0, the number of experimentally supported SNP-lncRNA-disease associations has increased substantially, from 3 to 58 entries.

Linc-Mart tool for data discovery and access

Because of the large increase in the number of data entries, a new data access tool called Linc-Mart was developed to implement a customized data access pipeline for users. There are three options on the Linc-Mart page: selected project (Disease SNP – LncRNA or Disease SNP – LncRNA TFBS), chromosome and lncRNA annotation information. Users can submit an e-mail address, and Linc-Mart will process the data using a series of tunable criteria and filter steps based on the above options.

Linc-Browse tool for customized data views

Compared with LincSNP 1.0, we improved the LincSNP 2.0 architecture by adding the Linc-Browse tool to display important annotation tracks. Linc-Browse is a web-based genome browser that dynamically displays different tracks based on the queried lncRNAs. Linc-Browse provides five tracks (Reference sequence, lncRNA, SNP, LD-SNP and TFBS) that show the elements around the lncRNAs.

Linc-Score page for predicting disease-associated lncRNAs

In LincSNP 2.0, we developed a page called Linc-Score, which predicts potential lncRNA-disease associations based on genetic variants. For each lncRNA and each disease, we counted the disease-associated SNPs located in the lncRNA region and its TFBSs, including both the disease-associated SNPs themselves and their LD-SNPs. We then reported the top three potential lncRNA-disease associations for each lncRNA, ranked by this SNP count. We hope that this direct calculation can capture the lncRNAs most likely to be involved in specific diseases, providing disease lncRNA candidates for researchers.
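
The counting scheme behind Linc-Score can be sketched as below. This is an illustration under stated assumptions rather than the tool's actual implementation: the input layout (one tuple per SNP located in a lncRNA or its TFBSs), counting each SNP once per lncRNA-disease pair, and breaking ties alphabetically by disease name are all choices made here, not details given in the text.

```python
from collections import Counter, defaultdict


def linc_score(hits, top_n=3):
    """Rank diseases for each lncRNA by the number of associated SNPs.

    hits: iterable of (lncRNA, disease, rsid) tuples, one per disease-associated
    SNP or LD-SNP located in the lncRNA region or its TFBSs (hypothetical layout).
    Returns {lncRNA: [(disease, count), ...]} with at most top_n entries each.
    """
    counts = defaultdict(Counter)
    # A set removes repeated tuples so each SNP counts once per pair.
    for lnc, disease, _rsid in set(hits):
        counts[lnc][disease] += 1
    ranked = {}
    for lnc, per_disease in counts.items():
        # Sort by descending count, then by disease name for a stable order.
        ordered = sorted(per_disease.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[lnc] = ordered[:top_n]
    return ranked
```

Because the score is a raw count, longer lncRNAs and well-studied diseases will tend to rank higher; any normalization would be an extension beyond what the text describes.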

DATABASE CONSTRUCTION AND IMPROVED USER INTERFACE

All data in LincSNP 2.0 were stored and managed using MySQL (version 5.5.58). The web interfaces were upgraded using the LAMP stack: Linux, Apache, MySQL and PHP (PHP: Hypertext Preprocessor). The LincSNP 2.0 database is freely available at http://bioinfo.hrbmu.edu.cn/LincSNP or http://210.46.80.146/lincsnp. In addition, for the convenience of users who have used LincSNP 1.0, the old version is still in service. Researchers can enter it by clicking on the gateway on the LincSNP 2.0 homepage or go directly to http://210.46.85.180:8080/LincSNP. LincSNP 2.0 provides a user-friendly web interface that enables users to search, browse and download data in a few easy steps. From the ‘Search’ page, users can search by Disease or Trait name, LncRNA transcript name and alias, SNP rs number or Chromosomal region. Three flexible online tools, Linc-Mart, Linc-Browse and Linc-Score, were established to retrieve and analyze the data in LincSNP 2.0. LincSNP 2.0 is open source, and users can obtain all data from the ‘Download’ page. LincSNP 2.0 also offers a submission page that enables researchers to submit novel experimentally supported SNP-lncRNA-disease associations, and a detailed tutorial showing how to use LincSNP 2.0 is available on the ‘Help’ page.

CONCLUSIONS AND FUTURE DEVELOPMENT

When we developed the first version of the LincSNP database (LincSNP 1.0), only a limited number of disease-associated SNPs had been identified in human lncRNAs. With the rapid growth in the number of identified lncRNAs and disease-associated SNPs, there is a great need to update the LincSNP database. In LincSNP 2.0, more disease-associated SNPs in human lncRNAs were identified and annotated. To improve data processing and database access, three web-based tools, Linc-Mart, Linc-Browse and Linc-Score, were developed. Moreover, we used ChIP-Seq data sets to identify disease-associated SNPs in the TFBSs of lncRNAs. We expect that the number of disease-associated SNPs mapped to lncRNAs and their TFBSs will continue to increase rapidly in future releases. We will continually maintain and update the LincSNP database and integrate more data sets, such as cancer genomics data and clinical information, which will improve our understanding of the function of lncRNAs in human diseases.
  36 in total

1.  The NCBI dbGaP database of genotypes and phenotypes.

Authors:  Matthew D Mailman; Michael Feolo; Yumi Jin; Masato Kimura; Kimberly Tryka; Rinat Bagoutdinov; Luning Hao; Anne Kiang; Justin Paschall; Lon Phan; Natalia Popova; Stephanie Pretel; Lora Ziyabari; Moira Lee; Yu Shao; Zhen Y Wang; Karl Sirotkin; Minghong Ward; Michael Kholodov; Kerry Zbicz; Jeffrey Beck; Michael Kimelman; Sergey Shevelev; Don Preuss; Eugene Yaschenko; Alan Graeff; James Ostell; Stephen T Sherry
Journal:  Nat Genet       Date:  2007-10       Impact factor: 38.330

2.  PharmGKB: a logical home for knowledge relating genotype to drug response phenotype.

Authors:  Russ B Altman
Journal:  Nat Genet       Date:  2007-04       Impact factor: 38.330

3.  Human polymorphisms at long non-coding RNAs (lncRNAs) and association with prostate cancer risk.

Authors:  Guangfu Jin; Jielin Sun; Sarah D Isaacs; Kathleen E Wiley; Seong-Tae Kim; Lisa W Chu; Zheng Zhang; Hui Zhao; Siqun Lilly Zheng; William B Isaacs; Jianfeng Xu
Journal:  Carcinogenesis       Date:  2011-08-19       Impact factor: 4.944

Review 4.  RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts.

Authors:  Sarah Geisler; Jeff Coller
Journal:  Nat Rev Mol Cell Biol       Date:  2013-10-09       Impact factor: 94.444

5.  The polymorphism rs944289 predisposes to papillary thyroid carcinoma through a large intergenic noncoding RNA gene of tumor suppressor type.

Authors:  Jaroslaw Jendrzejewski; Huiling He; Hanna S Radomska; Wei Li; Jerneja Tomsic; Sandya Liyanarachchi; Ramana V Davuluri; Rebecca Nagy; Albert de la Chapelle
Journal:  Proc Natl Acad Sci U S A       Date:  2012-05-14       Impact factor: 11.205

6.  LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs.

Authors:  Lina Ma; Ang Li; Dong Zou; Xingjian Xu; Lin Xia; Jun Yu; Vladimir B Bajic; Zhang Zhang
Journal:  Nucleic Acids Res       Date:  2014-11-15       Impact factor: 16.971

7.  An update on LNCipedia: a database for annotated human lncRNA sequences.

Authors:  Pieter-Jan Volders; Kenneth Verheggen; Gerben Menschaert; Klaas Vandepoele; Lennart Martens; Jo Vandesompele; Pieter Mestdagh
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

8.  DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs.

Authors:  Maria D Paraskevopoulou; Georgios Georgakilas; Nikos Kostoulas; Martin Reczko; Manolis Maragkakis; Theodore M Dalamagas; Artemis G Hatzigeorgiou
Journal:  Nucleic Acids Res       Date:  2012-11-28       Impact factor: 16.971

9.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

10.  HMDD v2.0: a database for experimentally supported human microRNA and disease associations.

Authors:  Yang Li; Chengxiang Qiu; Jian Tu; Bin Geng; Jichun Yang; Tianzi Jiang; Qinghua Cui
Journal:  Nucleic Acids Res       Date:  2013-11-04       Impact factor: 16.971

