
lncRNASNP2: an updated database of functional SNPs and mutations in human and mouse lncRNAs.

Ya-Ru Miao, Wei Liu, Qiong Zhang, An-Yuan Guo

Abstract

Long non-coding RNAs (lncRNAs) are emerging as important regulators of diverse biological processes, acting through a variety of mechanisms. Because the related data, especially mutations in cancers, have increased sharply, we updated lncRNASNP to version 2 (http://bioinfo.life.hust.edu.cn/lncRNASNP2). lncRNASNP2 provides comprehensive information on SNPs and mutations in lncRNAs, as well as their impacts on lncRNA structure and function. lncRNASNP2 contains 7 260 238 SNPs on 141 353 human lncRNA transcripts and 3 921 448 SNPs on 117 405 mouse lncRNA transcripts. In addition to the SNP information in the first version, the following new features were added: (i) noncoding variants from COSMIC cancer data (859 534) in lncRNAs and their effects on lncRNA structure and function; (ii) TCGA cancer mutations (315 234) in lncRNAs and their impacts; (iii) lncRNA expression profiles of 20 cancer types in both tumor and adjacent samples; (iv) expanded lncRNA-associated diseases; (v) improved prediction of lncRNA structure changes induced by variants; (vi) reduced false positives in miRNA-lncRNA interaction results. Furthermore, we developed online tools for users to analyze new variants in lncRNAs. We aim to maintain lncRNASNP as a useful resource for lncRNAs and their variants.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2018        PMID: 29077939      PMCID: PMC5753387          DOI: 10.1093/nar/gkx1004

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcriptional landscape analyses have revealed the complexity of the human transcriptome and shown that the human genome is pervasively transcribed, producing a large number of noncoding transcripts (1). Long non-coding RNAs (lncRNAs) are ncRNAs found extensively in all kinds of eukaryotes, and organismal complexity correlates better with the diversity and size of non-coding RNA expression repertoires than with those of protein-coding genes (2). lncRNAs are involved in the regulation of gene expression through various mechanisms, such as regulation of transcription, translation and protein modification, and interaction with other molecules (3). They thus play critical roles in diverse biological processes affecting cell differentiation, senescence and individual development (4). Moreover, lncRNAs have been shown to act as tumor-suppressive or oncogenic factors in different cancers (5). Genomic variants, including SNPs and mutations, can change lncRNA structure and function and thereby increase susceptibility to cancers and other diseases (6). SNPs have been reported to affect the structure, expression and function of lncRNAs (7); for example, SNP rs920778 in HOTAIR contributes to the risk of gastric cancer (8), and SNP rs11655237 in LINC00673 confers susceptibility to pancreatic cancer by creating a miR-1231 binding site (9). Two high-frequency mutations in lncRNA GAS8-AS1 were associated with papillary thyroid carcinoma (10). A study of the whole-genome mutational landscape of liver cancer discovered recurrent mutations in lncRNAs NEAT1 and MALAT1 (11). Mutations in lncRNA NEAT1 were reported to be associated with increased expression and unfavorable outcome in papillary renal-cell carcinoma (12). These studies indicate that studying lncRNA variants in cancers is necessary for identifying biomarkers of carcinogenesis and prognosis.
To date, several databases describe genomic variants in lncRNA genes, including lincSNP, LncVar and our lncRNASNP. lincSNP (13) focuses on disease-associated SNPs in lncRNAs and their transcription factor binding sites; LncVar (14) identifies SNPs and structural variants in lncRNAs as well as their effects on the biological function of lncRNAs. lncRNASNP (15), the previous version of lncRNASNP2, provides comprehensive information about lncRNA-related SNPs in human and mouse and explores their effects on lncRNA structure and on potential miRNA binding. With the increasing number of identified lncRNAs and SNPs in the human genome, and especially of mutations in cancers, we updated lncRNASNP with the latest data and developed new functions to improve it. In lncRNASNP2, the numbers of lncRNA transcripts and of SNPs on them increased 3–15-fold in human and mouse. In addition, mutations in lncRNAs and lncRNA expression in cancers were newly added, and web-based tools were developed for analyzing new data. With its abundant data and new features, the lncRNASNP2 database will serve as a useful resource for functional studies of lncRNAs, especially in cancer.

DATA SOURCE AND SUMMARY

We obtained 258 758 lncRNA transcripts (141 353 in human and 117 405 in mouse) of 170 002 lncRNA genes (90 062 in human and 79 940 in mouse) from NONCODE2016 (16). Next, we identified 7 260 238 and 3 921 448 SNPs on human and mouse lncRNA transcripts, respectively. Furthermore, resources associated with cancer mutations and with other diseases were integrated. Compared with the previous version, lncRNASNP2 covers more data and more data types (Table 1).
Table 1. Data summary in lncRNASNP2 database

Data content | Version 1.0, Human | Version 1.0, Mouse | Version 2.0, Human | Version 2.0, Mouse
lncRNA genes/transcripts | 17 436/32 108 | 25 512/36 471 | 90 062/141 353 | 79 940/117 405
All SNPs | 495 729 | 777 095 | 7 260 238 | 3 921 149
lncRNA SNP in GWAS (a) | 142/197 827 | NA/NA | 602/2 859 147 | NA/NA
SNP affected MLP (b) | 262 154/280 012 | 366 731/357 246 | 4 524 236/4 559 403 | 2 644 936/1 313 063
All predicted MLP (b) | 6 413 273 | 7 448 200 | 8 842 103 | 8 100 887
Conserved/validated MLP (c) | 69 837/8091 | 13 780/NA | 42 787/18 595 | 13 972/NA
TCGA cancer mutations | NA | NA | 315 234 | NA
TCGA mutations affected MLP (b) | NA | NA | 83 633/80 114 | NA
CosmicNCVs | NA | NA | 859 534 | NA
CosmicNCVs affected MLP (b) | NA | NA | 362 940/350 827 | NA
lncRNA expressions | NA | NA | 11 857 | NA
lncRNA-associated diseases (d) | NA | NA | 122 871/697 | NA

(a) lncRNA SNPs that are GWAS tagSNPs/lncRNA SNPs in GWAS LD regions.

(b) MLP represents miRNA-lncRNA target pair; variants (SNPs, TCGA mutations, CosmicNCVs) in lncRNAs induce the potential MLP loss/gain.

(c) miRNA target sites conserved among human, mouse, rat and dog/miRNA-lncRNA target pairs supported by CLIP experiment results from starBase.

(d) Number of lncRNA transcripts associated with diseases (predicted)/number of experimentally supported lncRNA-associated disease pairs.


IMPROVED CONTENT AND NEW FEATURES

CosmicNCVs in lncRNA transcripts

To provide mutation information on lncRNAs in cancers, we collected the CosmicNCVs (COSMIC noncoding variants) from the COSMIC (Catalogue Of Somatic Mutations In Cancer) database (17). We identified 859 534 CosmicNCVs in human lncRNA transcripts. Furthermore, we integrated the mutation impact scores calculated by FATHMM, as provided by COSMIC, to evaluate their functional effects. Following the COSMIC recommendation, a score >0.7 is considered ‘Pathogenic’ and a score <0.5 is considered ‘Neutral’. For example, COSN4621983 (score 0.9955) in genes was reported to provide a potential treatment target for mantle cell lymphoma (18). In total, we identified 71 410 pathogenic variants (about 8% of all CosmicNCVs) on lncRNA transcripts.
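
The score thresholds above can be expressed as a small rule. The following is a minimal sketch rather than the database's own code: the function name and the ‘Unclassified’ label for scores between 0.5 and 0.7 are assumptions, since the text only defines the ‘Pathogenic’ and ‘Neutral’ ranges.

```python
def classify_fathmm(score: float) -> str:
    """Label a FATHMM impact score using the thresholds quoted in the text.

    The 'Unclassified' label for 0.5 <= score <= 0.7 is an assumption;
    the text does not name that range.
    """
    if score > 0.7:
        return "Pathogenic"
    if score < 0.5:
        return "Neutral"
    return "Unclassified"


# The two scores quoted in this article both fall in the 'Pathogenic' range.
example_labels = {s: classify_fathmm(s) for s in (0.9955, 0.93374)}
```

Under this rule, a variant is counted as pathogenic only when its score is strictly greater than 0.7.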

TCGA Cancer Mutations in lncRNA transcripts

In addition to CosmicNCVs, we collected cancer mutations from the TCGA project (19). Genomic coordinates of these mutations were converted from genome assembly GRCh37 to GRCh38 using CrossMap (20). We identified 315 234 mutations in lncRNA transcripts among 34 cancer types. The three cancer types with the most mutations in lncRNAs were SKCM (skin cutaneous melanoma), COAD (colon adenocarcinoma) and STAD (stomach adenocarcinoma). Following COSMIC, we used FATHMM to assess the impact of mutations on lncRNA transcripts. Because the TCGA cancer mutations we collected had been filtered by MutSig (http://archive.broadinstitute.org/cancer/cga/mutsig), 78.35% of them were predicted to be ‘Pathogenic’.

lncRNA expression in TCGA cancers

In recent years, dysregulation of lncRNAs has been linked to tumor progression and survival. For example, lncRNA GAS5 was downregulated in cervical cancer tissues and significantly associated with advanced tumor progression (21). Upregulation of lncRNA DANCR was associated with aggressive progression and poor prognosis in colorectal cancer (22). Here, we collected expression data of 11 857 lncRNA genes in 20 cancer types from the TANRIC database (23). Expression values of lncRNA genes (RPKM) ranged from 0 to 360.19 in tumor samples and from 0 to 318.71 in tumor-adjacent samples across the 20 cancers, but 93% of lncRNA genes were lowly expressed, with RPKM <10.

miRNA-lncRNA interaction prediction

lncRNAs may act as miRNA sponges to regulate gene expression (24); identifying miRNA target sites on lncRNAs therefore provides clues for lncRNA functional research. We collected mature miRNA sequences from miRBase (release 21) (25). To reduce false positives, we took the intersection of the results of MiRanda, TargetScan and Pita as the final miRNA targets in lncRNASNP2. In total, 8 842 103 and 8 100 887 miRNA-lncRNA interaction pairs were predicted in human and mouse, respectively. We also identified 548 195 target sites conserved among human, mouse, rat and dog. Furthermore, variants on lncRNAs can induce the gain or loss of miRNA target sites. Among the human variants we identified, 80.94% of SNPs (5 876 208), 33.53% of TCGA cancer mutations (105 683) and 54.12% of CosmicNCVs (465 178) were predicted to induce potential gain or loss of miRNA target sites; in mouse, the proportion of SNPs was 79.76% (3 127 650). In addition, the number of experimentally validated miRNA-lncRNA interactions collected from starBase v2.0 (26) increased from 8091 to 18 595.
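
The consensus step and the gain/loss comparison can be illustrated with plain set operations. This is a minimal sketch under stated assumptions: each tool's output is reduced to (miRNA ID, lncRNA transcript ID) pairs, the function names and the example IDs are hypothetical, and the text does not describe how overlapping binding positions from the three tools were actually matched.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (miRNA ID, lncRNA transcript ID); a simplifying assumption


def consensus_pairs(*tool_results: Iterable[Pair]) -> Set[Pair]:
    """Keep only the pairs reported by every tool (set intersection)."""
    sets = [set(result) for result in tool_results]
    if not sets:
        return set()
    return set.intersection(*sets)


def gain_loss(ref_pairs: Set[Pair], alt_pairs: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Compare consensus pairs on the reference and the variant sequence.

    Returns (gained, lost): pairs present only with the variant allele,
    and pairs present only with the reference allele.
    """
    return alt_pairs - ref_pairs, ref_pairs - alt_pairs


# Hypothetical predictions from three tools for one transcript.
miranda = [("miR-a", "LNC_1"), ("miR-b", "LNC_1")]
targetscan = [("miR-a", "LNC_1"), ("miR-c", "LNC_1")]
pita = [("miR-a", "LNC_1"), ("miR-b", "LNC_1")]

final_targets = consensus_pairs(miranda, targetscan, pita)
gained, lost = gain_loss(final_targets, {("miR-d", "LNC_1")})
```

Requiring agreement among all three tools trades sensitivity for specificity, which matches the stated goal of reducing false positives.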

Prediction of lncRNA-associated diseases

Although a large number of lncRNAs are known, the relations of most of them with human diseases remain unknown. Here, we predicted lncRNA-associated diseases using the software TAM (27), which was designed to identify meaningful categories for a given set of miRNAs. We chose the disease-associated miRNA sets from the HMDD database (28) as the disease miRNA sets. For each lncRNA, we took the high-probability targeting miRNAs predicted by the three tools described above as the miRNA set of the lncRNA; possible associated diseases of each lncRNA were then predicted by enrichment analysis of the targeting miRNAs against the disease miRNA sets. Significance was measured by a P-value calculated with the hypergeometric test and adjusted by Bonferroni correction. In addition, we collected 697 experimentally supported lncRNA-disease pairs from LncRNADisease (29).
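
The enrichment statistic described above can be sketched in a few lines. This is an illustrative implementation of a one-sided hypergeometric test with Bonferroni adjustment, not TAM's own code; the function names, the choice of background set and the toy miRNA labels are all assumptions.

```python
from math import comb
from typing import Dict, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def disease_enrichment(
    lnc_mirnas: Set[str],
    disease_sets: Dict[str, Set[str]],
    background: Set[str],
) -> Dict[str, float]:
    """Bonferroni-adjusted P-value per disease for one lncRNA's miRNA set."""
    query = lnc_mirnas & background
    n_tests = len(disease_sets)
    adjusted = {}
    for disease, members in disease_sets.items():
        members = members & background
        overlap = len(query & members)
        p = hypergeom_upper_tail(overlap, len(background), len(members), len(query))
        adjusted[disease] = min(1.0, p * n_tests)  # Bonferroni correction
    return adjusted


# Toy example: 10 background miRNAs, two hypothetical disease sets.
background = {f"m{i}" for i in range(10)}
diseases = {"disease_A": {"m0", "m1", "m2"}, "disease_B": {"m7", "m8"}}
result = disease_enrichment({"m0", "m1", "m2"}, diseases, background)
```

In the toy example, all three miRNAs of the lncRNA belong to disease_A, so its raw P-value is 1/120 and its adjusted value is 1/60, while disease_B has no overlap and its adjusted P-value is capped at 1.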

SNPs in GWAS-trait associated regions

We collected 34 398 GWAS tagSNPs from the NHGRI GWAS Catalog (30) and identified 602 of them in human lncRNA transcripts. In addition, for each GWAS SNP, we obtained the GWAS-trait-associated LD region using SNAP (SNP Annotation and Proxy Search) (31) and identified 2 859 147 lncRNA SNPs (39.4% of all human lncRNA SNPs) in those regions.

lncRNA structure changes induced by variants

In lncRNASNP2, we used RNAsnp (32) to assess the effects of variants on lncRNA secondary structure. Compared with the MFE (minimum free energy) used in the previous version, the P-value that RNAsnp calculates from the Boltzmann ensemble is more stable and reliable (32). RNAsnp has three modes; in our work, mode 1 was used for lncRNAs <1000 nt and mode 2 for lncRNAs >1000 nt. An empirical P-value <0.2 indicates that the variant affects lncRNA structure. In total, 1 425 449 (19.63%) and 395 443 (10.08%) SNPs in human and mouse lncRNA transcripts, respectively, were predicted to affect lncRNA structure. Besides SNPs, we also predicted lncRNA structure changes caused by cancer mutations; 16.67% of TCGA cancer mutations and 17.3% of CosmicNCVs may change lncRNA secondary structure.
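
The two decision rules in this paragraph (which RNAsnp mode to use and when a variant counts as structure-affecting) can be written down directly. This is a sketch of the rules only and does not invoke RNAsnp; the function names are hypothetical, and assigning transcripts of exactly 1000 nt to mode 1 is an assumption because the text leaves that case open.

```python
LENGTH_CUTOFF_NT = 1000   # boundary between mode 1 and mode 2, from the text
P_VALUE_CUTOFF = 0.2      # empirical P-value threshold, from the text


def choose_rnasnp_mode(transcript_length_nt: int) -> int:
    """Return the RNAsnp mode used for a transcript of the given length.

    Transcripts of exactly 1000 nt are sent to mode 1 here; that choice is
    an assumption, since the text only covers <1000 nt and >1000 nt.
    """
    return 1 if transcript_length_nt <= LENGTH_CUTOFF_NT else 2


def affects_structure(empirical_p_value: float) -> bool:
    """A variant is flagged when its empirical P-value is below 0.2."""
    return empirical_p_value < P_VALUE_CUTOFF


short_mode = choose_rnasnp_mode(650)    # a short transcript
long_mode = choose_rnasnp_mode(4200)    # a long transcript
```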

Web-based analysis tools

With the spread of sequencing technologies, new SNPs and lncRNAs are continually being characterized. To make it convenient for users to explore such new data, we developed two web-based tools: one predicts miRNA target sites on lncRNAs and the gain or loss of miRNA target sites caused by variants; the other analyzes the effect of a SNP on lncRNA secondary structure.

DATABASE ORGANIZATION AND WEB INTERFACE

The lncRNASNP2 database was built with the open-source Flask framework (http://flask.pocoo.org/), and all data mentioned above are stored in MongoDB. The database is freely available at http://bioinfo.life.hust.edu.cn/lncRNASNP2. lncRNASNP2 is composed of five sections: lncRNA, SNP, Mutation, miRNA and Tool. A quick search box at the top of the home page supports searching by keywords including SNP ID, lncRNA ID, miRNA ID, abbreviation of TCGA cancer type, CosmicNCV ID and genomic region. Fuzzy queries are supported, so users can get results by searching for part of a keyword. The ‘Tool’ menu in the navigation bar includes search tools and prediction tools: search tools let users query data by specific IDs, and prediction tools help users analyze their own data. In the lncRNA section, lncRNAs can be browsed in two ways: by disease or by chromosome. The detail page of each lncRNA has several tags, including lncRNA detail, variants in lncRNA, lncRNA diseases, miRNA target sites on lncRNA, miRNA target gain, miRNA target loss and expression. Under the ‘Variants in lncRNA’ tag, we integrated the SNPs, TCGA cancer mutations and CosmicNCVs in the lncRNA (Figure 1A). Under the ‘miRNA target sites on lncRNA’ tag, a miRNA-targeting-lncRNA graph displays the miRNA targets on the lncRNA; by clicking a miRNA name, users can browse the details of the interaction, including the binding regions predicted by Pita, TargetScan and MiRanda as well as the binding score, energy and nucleotide pairs.
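
The partial-keyword search mentioned above can be pictured as a substring match over stored identifiers. The sketch below is pure Python over an in-memory list and is not the site's Flask/MongoDB code; the function name, the record layout and the case-insensitive matching are assumptions.

```python
import re
from typing import Dict, List


def fuzzy_search(keyword: str, records: List[Dict[str, str]], field: str = "id") -> List[Dict[str, str]]:
    """Return records whose `field` contains `keyword` as a substring.

    The keyword is escaped so that characters such as '.' match literally;
    case-insensitive matching is an assumption of this sketch.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [record for record in records if pattern.search(record.get(field, ""))]


# Identifiers taken from examples in this article.
records = [
    {"id": "NONHSAT000397.2"},
    {"id": "rs920778"},
    {"id": "COSN4621983"},
]
hits = fuzzy_search("nonhsat", records)
```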
Figure 1. Overview of lncRNASNP2 database. (A) Variants (SNPs, TCGA cancer mutations and CosmicNCVs) in lncRNA and their impacts on lncRNA structure or function. (B) An example of miRNA target gain induced by a variant. (C) An example of lncRNA expression in cancers displayed as a histogram.

The newly added TCGA mutation section is organized by cancer type. Users can browse the mutations on lncRNAs, and their effects, for the selected cancer. Taking BLCA (bladder urothelial carcinoma) as an example, the first line on the page of BLCA mutations in lncRNAs is ‘chr1:2189723 C>T on lncRNA NONHSAT000397.2’. The results under the ‘Mutation in lncRNAs’ tag on the mutation detail page show that the functional impact score of this mutation (predicted by FATHMM) is 0.93374, which indicates a pathogenic mutation. The content under the ‘miRNA target gain’ tag indicates that the mutation creates a target site for miR-630, and the ‘show’ button presents the detailed interaction pair (Figure 1B). The expression of miR-630 was reported to be increased in BLCA, and it may be a novel prognostic factor for bladder urothelial carcinoma (33). lncRNA expression is displayed in two ways: a boxplot showing expression in a single cancer type in the TCGA mutation section, and a histogram showing expression in 20 cancer types in the lncRNA section (Figure 1C). In the CosmicNCV section, users can browse the basic information of each CosmicNCV in lncRNAs and its impacts on lncRNA structure and function.

SUMMARY AND FUTURE PERSPECTIVES

With the increasing number of identified lncRNAs and SNPs/mutations in the human genome, we updated lncRNASNP with the latest data and new features. The disease resources, covering cancers and other diseases, provide a comprehensive reference for lncRNA function, and the web-based tools make it convenient for users to analyze new data. With innovations in RNA-seq technologies and computational biology, lncRNAs are being identified and characterized at a rapid pace, and we believe that more functions of lncRNAs will be revealed in the future. To keep pace with lncRNA research, we will update the lncRNASNP database regularly. We are committed to making lncRNASNP a helpful repository for the functional study of lncRNAs and their variants.

REFERENCES (10 of 33 listed)

1.  Unique features of long non-coding RNA biogenesis and function. (Review)

Authors:  Jeffrey J Quinn; Howard Y Chang
Journal:  Nat Rev Genet       Date:  2016-01       Impact factor: 53.242

2.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

3.  LncVar: a database of genetic variation associated with long non-coding genes.

Authors:  Xiaowei Chen; Yajing Hao; Ya Cui; Zhen Fan; Shunmin He; Jianjun Luo; Runsheng Chen
Journal:  Bioinformatics       Date:  2016-09-06       Impact factor: 6.937

4.  Decreased expression of lncRNA GAS5 predicts a poor prognosis in cervical cancer.

Authors:  Shihong Cao; Weiliang Liu; Feng Li; Weipin Zhao; Chuan Qin
Journal:  Int J Clin Exp Pathol       Date:  2014-09-15

5.  TANRIC: An Interactive Open Platform to Explore the Function of lncRNAs in Cancer.

Authors:  Jun Li; Leng Han; Paul Roebuck; Lixia Diao; Lingxiang Liu; Yuan Yuan; John N Weinstein; Han Liang
Journal:  Cancer Res       Date:  2015-07-24       Impact factor: 12.701

6.  LncDisease: a sequence based bioinformatics tool for predicting lncRNA-disease associations.

Authors:  Junyi Wang; Ruixia Ma; Wei Ma; Ji Chen; Jichun Yang; Yaguang Xi; Qinghua Cui
Journal:  Nucleic Acids Res       Date:  2016-02-16       Impact factor: 16.971

7.  The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. (Review)

Authors:  Katarzyna Tomczak; Patrycja Czerwińska; Maciej Wiznerowicz
Journal:  Contemp Oncol (Pozn)       Date:  2015

8.  Whole-genome analysis of papillary kidney cancer finds significant noncoding alterations.

Authors:  Shantao Li; Brian M Shuch; Mark B Gerstein
Journal:  PLoS Genet       Date:  2017-03-30       Impact factor: 5.917

9.  starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors:  Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

10.  HMDD v2.0: a database for experimentally supported human microRNA and disease associations.

Authors:  Yang Li; Chengxiang Qiu; Jian Tu; Bin Geng; Jichun Yang; Tianzi Jiang; Qinghua Cui
Journal:  Nucleic Acids Res       Date:  2013-11-04       Impact factor: 16.971
