EVLncRNAs: a manually curated database for long non-coding RNAs validated by low-throughput experiments.

Bailing Zhou1,2, Huiying Zhao3, Jiafeng Yu1,2, Chengang Guo1,2, Xianghua Dou1,2, Feng Song1,2, Guodong Hu1,2, Zanxia Cao1,2, Yuanxu Qu4, Yuedong Yang5, Yaoqi Zhou1,5, Jihua Wang1,2.   

Abstract

Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were used to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs were validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (only a few hundred lncRNAs each) and specific in focus (plants, diseases or interactions). Thus, it is highly desirable to have a comprehensive dataset of experimentally validated lncRNAs as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the current largest database of experimentally validated lncRNAs. Seventy-four percent of lncRNA entries are partially or completely new compared with all existing databases of experimentally validated lncRNAs. The database allows users to browse, search and download as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2018        PMID: 28985416      PMCID: PMC5753334          DOI: 10.1093/nar/gkx677

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The ENCODE project found that three-quarters of the human genome is capable of being transcribed (1) but that <3% of it codes for proteins (2). The vast majority of the genome is transcribed into non-coding RNAs (ncRNAs), many of which have been shown to have important biological functions (3). Small non-coding RNAs (<200 nt), such as microRNAs, snoRNAs, siRNAs and piRNAs, have been extensively studied (4,5). The discoveries of two long non-coding RNAs (lncRNAs), H19 and Xist (6,7), inspired the search for additional lncRNAs and the determination of their biological functions (8). LncRNAs have been shown to be involved in regulating gene expression and other cellular processes and have been implicated in disease etiology (8–10), including cancers (9,11–13). As more and more lncRNAs are discovered, a number of databases (14–16) have been established for different purposes. Early databases were used to deposit all lncRNA candidates from high-throughput techniques to facilitate classification, assessment and validation. Examples include LNCipedia for human lncRNAs (17), PNRD (Plant non-coding RNA database) (18) and PlantNATsDB for plants (19), and NONCODE for 16 different species (20). Some databases, such as NPInter 3.0 (21), contain both manually curated entries and entries from high-throughput experiments. Recognizing potential false positives in high-throughput experiments, five recent databases were built to contain only lncRNAs validated by low-throughput experiments. They are lncRNAdb, a manually curated database with 287 functional lncRNAs from eukaryotic organisms (22); LncRNADisease, with 1102 entries of experimentally validated lncRNA–disease associations (23); Lnc2Cancer, with 1057 manually curated associations between 531 lncRNAs and 86 human cancers (24); lncRInter, with 922 lncRNA interaction pairs between 276 lncRNAs and 597 partners in 15 organisms (25); and PLNlncRbase, with 420 manually collected plant lncRNAs in 43 plant species after excluding those from high-throughput experiments (26).
All of these datasets were well designed, but each serves a specific purpose [eukaryotic only, without legible disease associations (lncRNAdb), disease-focused (Lnc2Cancer and LncRNADisease), interaction-focused (lncRInter) and plant-focused (PLNlncRbase)]. Thus, it is highly desirable to have a central repository for all lncRNAs validated by low-throughput experiments along with their sequence, structure, function and related disease information. Here, we manually curated all experimentally validated lncRNAs (EVLncRNAs) published prior to 1 May 2016. We compared these lncRNAs with those collected in several specific databases (lncRNAdb, Lnc2Cancer, LncRNADisease and PLNlncRbase) and curated additional functional and disease-specific information not covered previously. The final dataset contains 1543 lncRNAs from 77 species, 2324 associations between 886 lncRNAs and 338 diseases, 793 entries of functional descriptions of 664 lncRNAs, and 1163 interactions between 445 lncRNAs and their interacting partners. More than 70% of entries are partially or completely new compared with existing specific databases of experimentally validated lncRNAs.

DATABASE CONSTRUCTION

For this database, we kept only lncRNAs confirmed by low-throughput experiments such as quantitative reverse transcription-polymerase chain reaction (qRT-PCR), knockdown, western blot, northern blot and luciferase reporter assays. Western blotting validates interactions between lncRNAs and proteins, often after knockdown (27,28). qRT-PCR and northern blot confirm the expression level of lncRNAs (29,30). Luciferase reporter assays examine the promoter activity of lncRNAs or their interacting partners and confirm the interaction between lncRNAs and their partners (31,32). We excluded lncRNAs supported only by RNA-seq because such RNAs are transcribed but not necessarily functional. Moreover, RNA-seq data suffer from potential problems of partial transcripts, sequencing biases and, in particular, low sequencing depth for lowly expressed transcripts (33). Furthermore, it is still challenging to separate peptide-coding from non-coding RNAs. Given this uncertainty, we excluded lncRNAs if RNA-seq data were the only evidence. For similar reasons, lncRNAs with only microarray evidence were also excluded. All EVLncRNAs entries were curated as follows. We first searched PubMed with keywords matching ‘lncRNA’, ‘long non coding RNA’, ‘long ncRNA’ or ‘long non-code RNA’ along with their plural forms. We then separated the articles covered in previous databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase) from completely new ones, which led to 1267 new publications. From this literature, we manually extracted information on newly validated lncRNAs, their related diseases and interactions, and found 155 lncRNAs not listed in any existing database of experimentally validated lncRNAs. Next, we examined articles both referenced and not referenced by previous databases to locate additional information required by our database. We manually examined all entries in lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase.
We excluded 767 lncRNAs collected by PLNlncRbase that were obtained from high-throughput microarray or RNA-seq experiments. For lncRNAs collected by the above databases but lacking function annotation, disease association or other information, we manually curated the missing information from the literature. In short, we added disease associations to entries in lncRNAdb, function annotations to entries in LncRNADisease, other interaction and disease information to entries in Lnc2Cancer, and interaction information to entries in PLNlncRbase. Only 399 lncRNAs indexed in previous databases were incorporated into our database without new information, whereas 989 incorporated entries were modified with new information. These modified entries, together with the 155 novel lncRNAs, give 1144 (74%, 1144/1543) lncRNAs with new information not covered by existing databases of experimentally validated lncRNAs.
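
The inclusion rule and the bookkeeping described above can be illustrated with a short sketch. The curation itself was manual, so this is not the authors' pipeline: the method labels, the helper `has_low_throughput_support` and the toy records are hypothetical, and only the counts (399, 989, 155 and 1543) are taken from the text.

```python
# Illustrative sketch only; labels, helper name and toy records are hypothetical.
LOW_THROUGHPUT = {
    "qRT-PCR",
    "knockdown",
    "western blot",
    "northern blot",
    "luciferase reporter assay",
}


def has_low_throughput_support(methods):
    """Return True if at least one supporting method is low-throughput."""
    return any(method in LOW_THROUGHPUT for method in methods)


# Toy records: entries backed only by RNA-seq or microarray are dropped.
toy_records = {
    "lnc_A": ["RNA-seq", "qRT-PCR"],
    "lnc_B": ["RNA-seq"],
    "lnc_C": ["microarray"],
    "lnc_D": ["northern blot"],
}
kept = sorted(
    name for name, methods in toy_records.items()
    if has_low_throughput_support(methods)
)

# Counts quoted in the text.
unchanged = 399   # taken over from previous databases without new information
modified = 989    # taken over and extended with new information
novel = 155       # not listed in any previous database

total = unchanged + modified + novel
with_new_info = modified + novel
share_new = with_new_info / total

print(kept, total, with_new_info, f"{share_new:.0%}")
```

The tally reproduces the figures in the text: the three groups sum to the 1543 entries of the database, and the modified plus novel entries account for the quoted 74%.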

DATABASE CONTENT

EVLncRNAs is intended to be a comprehensive database that includes all species and covers functional and disease-specific roles for all lncRNAs validated by one or more low-throughput experiments. In EVLncRNAs, lncRNAs are named following the HUGO Gene Nomenclature Committee (HGNC) (34), with aliases included. Sequence and position information of lncRNAs is annotated according to NCBI, with links to accessions in NCBI (35) and Ensembl (36) provided if known. All data in EVLncRNAs are organized using MySQL. The web interface was implemented in PHP. The web services were built on Apache Tomcat. The EVLncRNAs database is available at http://biophy.dzu.edu.cn/EVLncRNAs/. The database contains a total of 1543 lncRNAs from 77 species along with their annotated functions, interaction partners and relevant diseases, if known. Table 1 provides statistics of experimentally validated datasets for comparison. EVLncRNAs has nearly three times as many lncRNAs as the largest existing dataset and covers 6 new species, 89 new diseases and 405 new interactions. Each entry contains general information, such as lncRNA official name, alias, species, chromosome, start site, end site, chain, exon number, assembly version, lncRNA class (such as lincRNA, antisense, exonic, intronic etc.) and accession numbers for NCBI and Ensembl. Moreover, we have included 14 peptide-coding lncRNAs.
Table 1. Comparison of EVLncRNAs with other experimentally validated lncRNA databases

Database       #LncRNAs  #Species  #Diseases  #Disease associations  #Functions (excl. interactions)  #Interactions
lncRNAdb       287       71        N/A        N/A                    287                              307
LncRNADisease  321       14        221        1102                   N/A                              475
Lnc2Cancer     531       1         86         1057                   N/A                              N/A
PLNlncRbase    420*      43        N/A        N/A                    420*                             N/A
lncRInter      276       15        N/A        N/A                    N/A                              922
EVLncRNAs      1543      77        338        2324                   793                              1163

*After excluding 767 lncRNAs from high-throughput microarray and RNA-seq.
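
The size comparison quoted in the text can be checked with a few lines of arithmetic. This is a minimal sketch: the dictionary simply restates the lncRNA counts for each database as read from Table 1, and the variable names are our own.

```python
# lncRNA counts per database, as read from Table 1.
lncrna_counts = {
    "lncRNAdb": 287,
    "LncRNADisease": 321,
    "Lnc2Cancer": 531,
    "PLNlncRbase": 420,   # after excluding high-throughput entries
    "lncRInter": 276,
}
evlncrnas_total = 1543

# Find the largest previously existing database and compare sizes.
largest_name, largest_count = max(lncrna_counts.items(), key=lambda kv: kv[1])
ratio = evlncrnas_total / largest_count

print(f"{largest_name}: {largest_count} lncRNAs; EVLncRNAs is {ratio:.1f} times larger")
```

With these numbers the largest earlier collection is Lnc2Cancer, and the ratio rounds to the 2.9 stated in the abstract.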

EVLncRNAs contains 2324 associations between 886 lncRNAs and 338 diseases along with information on experimental methods, experimental samples, expression pattern, dysfunction types, detailed description and original articles (PubMed ID and web link). Among these entries, 89 diseases and 346 lncRNA–disease associations were manually curated from recent literature and have never appeared in other databases. The 346 lncRNA–disease associations comprise 76 associations between previously known lncRNAs and diseases, and 270 associations involving new lncRNAs or diseases. Experimentally validated functions of lncRNAs were classified according to expression, mutation, interaction, locus and epigenetics (23). For lncRNAs that are not associated with diseases, we collected functional information such as gene regulation, cell differentiation or embryonic development (interaction information is collected separately, see below). Of the 793 functional entries, 166 are novel entries from recent literature. The database also contains 1163 entries of 445 lncRNAs interacting with other molecules, for which we included the name of the interaction target, level of interaction, type of interaction, description of interaction and original publication (PubMed ID and link). Here, ‘level of interaction’ denotes the level of the interaction partner (DNA, RNA or protein); ‘type of interaction’ is classified as binding, regulation or co-expression; and ‘description of interaction’ is the description of the lncRNA interaction process taken from the literature. Among these entries, 500 were curated from recent literature. In other words, 43% (500/1163) of interaction entries are new.
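
The record structure described above can be sketched as a small data model. This is an assumption-laden illustration, not the actual EVLncRNAs MySQL schema: all class and field names are hypothetical, chosen to mirror the fields named in the text, and the example values are placeholders rather than curated data.

```python
from dataclasses import dataclass, field
from typing import List

# Controlled vocabularies named in the text.
INTERACTION_LEVELS = {"DNA", "RNA", "protein"}
INTERACTION_TYPES = {"binding", "regulation", "co-expression"}


@dataclass
class Interaction:
    target: str
    level: str              # level of the interaction partner
    interaction_type: str
    description: str
    pubmed_id: str

    def __post_init__(self):
        # Reject values outside the vocabularies listed in the text.
        if self.level not in INTERACTION_LEVELS:
            raise ValueError(f"unknown interaction level: {self.level!r}")
        if self.interaction_type not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {self.interaction_type!r}")


@dataclass
class DiseaseAssociation:
    disease: str
    experimental_method: str
    sample: str
    expression_pattern: str
    dysfunction_type: str
    description: str
    pubmed_id: str


@dataclass
class LncRNAEntry:
    name: str
    species: str
    aliases: List[str] = field(default_factory=list)
    diseases: List[DiseaseAssociation] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)


# Placeholder entry; none of these values are real curated data.
entry = LncRNAEntry(name="EXAMPLE-LNC", species="Homo sapiens")
entry.interactions.append(
    Interaction(
        target="EXAMPLE-TARGET",
        level="protein",
        interaction_type="binding",
        description="placeholder description",
        pubmed_id="00000000",
    )
)

# Share of interaction entries that are new, from the counts in the text.
new_interaction_share = 500 / 1163
print(len(entry.interactions), f"{new_interaction_share:.0%}")
```

Keeping diseases, functions and interactions as separate lists on one entry reflects the way the text counts them separately (2324 associations, 793 functional entries and 1163 interactions).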

USER INTERFACE

The interface of EVLncRNAs allows users to browse, search and download. As shown in Figure 1, there are four pull-down menus in the ‘Browse’ menu (labeled position 1): Species, Disease, Interaction and Peptide-related. Each menu contains sub-menus. Users can browse relevant entries in EVLncRNAs by clicking any pull-down menu or its sub-menus. Alternatively, users can click the navigation bar on the left (position 2) to view the corresponding page or entry. Using ‘HOXA11-AS’ as an example (position 3), the page of this entry displays the lncRNA information (position 4), its related disease (position 5), function (position 6) and interaction information (position 7). If the function of an lncRNA involves interacting with other molecules, the interaction is described in more detail in the table that follows. In the ‘Search’ page, users can search by any keyword, such as lncRNA name, alias, disease name, experimental method, associated component and level of interaction. EVLncRNAs offers ‘fuzzy’ searching and returns all matching records. Figure 2 shows the search result for ‘HIF1A-AS1’, with a specific link to the lncRNA found (position 1) and original articles (with PMID, position 2) along with its disease association (position 3), function (position 4) and interaction information (position 5). Users can download the database data from the ‘Download’ page. In addition, EVLncRNAs provides links to tools for lncRNA prediction (37,38) in the ‘Tools’ page. Users can also submit novel experimentally validated lncRNAs and related diseases or associated components in the ‘Submit’ page. A submitted record will be included in the next release after being manually checked by our curators. A detailed user tutorial is available in the ‘Help’ page.
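
The text does not say how the ‘fuzzy’ search is implemented. One simple reading is a case-insensitive substring match over every field of a record, and the sketch below illustrates that reading only. The function `fuzzy_search` and the record layout are hypothetical; the three lncRNA names come from the text.

```python
def fuzzy_search(records, keyword):
    """Return records in which any field contains the keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    if not needle:
        return []
    return [
        record for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]


# Minimal toy records; only name and species are filled in.
records = [
    {"name": "HIF1A-AS1", "species": "Homo sapiens"},
    {"name": "HOXA11-AS", "species": "Homo sapiens"},
    {"name": "Xist", "species": "Mus musculus"},
]

hits = [record["name"] for record in fuzzy_search(records, "hif1a")]
print(hits)
```

Because every field is searched, the same function also covers searching by species, alias or disease name once those fields are present in a record.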
Figure 1. Browse page of EVLncRNAs.

Figure 2. Search result page of EVLncRNAs.

FUTURE EXTENSIONS

The number of experimentally validated lncRNAs is expected to continue to increase. EVLncRNAs will be manually curated and updated on a regular basis. We will also gradually add new tools for the analysis and prediction of lncRNAs as they become available.

DISCUSSION AND CONCLUSION

LncRNAs have been found to play an essential role in gene regulation and diseases (3,11,12). However, the majority of lncRNA databases rely on computational prediction and/or high-throughput experiments with uncertain accuracy. The few experimentally validated lncRNA databases (22–26) offer limited information. The purpose of this work is to build a comprehensive dataset of all lncRNAs validated by low-throughput experiments. The current version of EVLncRNAs (up to 1 May 2016) contains the largest collection of experimentally validated lncRNAs, across 77 species. It is close to three times larger than any other experimentally validated dataset. There are 869 lncRNAs for human, 220 for animal, 428 for plant and 26 for microbe (Supplementary Figure S1). The top 20 most studied species are shown in Figure 3. Human and mouse (the latter used as a model for human disease) are the two species with the most identified lncRNAs. The next species is Arabidopsis thaliana, a model organism for plant research. About 34 lncRNAs have orthologs in human, mouse and/or rat, that is, the same functional lncRNA in different species, such as human XIST and mouse Xist, and 23 lncRNAs have orthologs in plant species. This should provide important information for evolutionary studies.
Figure 3. Top 20 species for the number of experimentally validated lncRNAs.

Disease association is an important component of EVLncRNAs. Diseases were classified into cancers (147 types) and others. The top 15 cancers and top 15 other diseases associated with the highest number of lncRNAs are shown in Supplementary Figures S2 and S3, respectively. The number of cancer-related lncRNAs exceeds the number related to other diseases, most likely because more researchers work on cancers: even in LncRNADisease, which was designed to include lncRNAs relevant to human disease in general, the majority of entries (627/1070 = 59%) are related to neoplastic conditions. That is, the dominance of cancer-related lncRNAs is not caused by the entries contributed by Lnc2Cancer, which collects lncRNAs relevant to cancers. Currently, the disease associated with the largest number of lncRNAs (>100) is hepatocellular carcinoma. Many lncRNAs are associated with more than one disease. The lncRNAs associated with more than 10 diseases are shown in Supplementary Figure S4. H19, for example, is associated with 40 cancers and 24 other diseases. In addition, EVLncRNAs contains 1163 entries of interaction information between 445 lncRNAs and their associated partners. As shown in Supplementary Figure S5, the highest number of lncRNAs is involved in regulation, followed by binding and co-expression.
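
The summary statistics in this section come down to simple counting over (lncRNA, disease) pairs and over organism groups. The sketch below shows one way to compute them. The helper name and the toy pairs are hypothetical placeholders, not database content; only the four organism-group counts are taken from the text.

```python
from collections import defaultdict


def lncrnas_with_many_diseases(pairs, threshold):
    """Map each lncRNA to its number of distinct diseases, keeping those above threshold."""
    diseases_by_lncrna = defaultdict(set)
    for lncrna, disease in pairs:
        diseases_by_lncrna[lncrna].add(disease)
    return {
        name: len(diseases)
        for name, diseases in diseases_by_lncrna.items()
        if len(diseases) > threshold
    }


# Toy association pairs; duplicated pairs are counted once.
toy_pairs = [
    ("lnc_A", "disease_1"),
    ("lnc_A", "disease_2"),
    ("lnc_A", "disease_2"),
    ("lnc_A", "disease_3"),
    ("lnc_B", "disease_1"),
]
result = lncrnas_with_many_diseases(toy_pairs, threshold=2)

# Organism-group counts quoted in the text.
group_counts = {"human": 869, "animal": 220, "plant": 428, "microbe": 26}
group_total = sum(group_counts.values())

print(result, group_total)
```

The four organism groups sum to the 1543 lncRNAs reported for the whole database, which is a quick consistency check on the quoted breakdown.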
  38 in total

1.  PLNlncRbase: A resource for experimentally identified lncRNAs in plants.

Authors:  Hongdong Xuan; Linzhong Zhang; Xueshi Liu; Guomin Han; Juan Li; Xin Li; Aiguo Liu; Mingzhi Liao; Shihua Zhang
Journal:  Gene       Date:  2015-07-23       Impact factor: 3.688

Review 2.  Computational approaches towards understanding human long non-coding RNA biology.

Authors:  Saakshi Jalali; Shruti Kapoor; Ambily Sivadas; Deeksha Bhartiya; Vinod Scaria
Journal:  Bioinformatics       Date:  2015-03-15       Impact factor: 6.937

3.  Long noncoding RNA high expression in hepatocellular carcinoma facilitates tumor growth through enhancer of zeste homolog 2 in humans.

Authors:  Fu Yang; Ling Zhang; Xi-song Huo; Ji-hang Yuan; Dan Xu; Sheng-xian Yuan; Nan Zhu; Wei-ping Zhou; Guang-shun Yang; Yu-zhao Wang; Jing-li Shang; Chun-fang Gao; Feng-rui Zhang; Fang Wang; Shu-han Sun
Journal:  Hepatology       Date:  2011-09-06       Impact factor: 17.425

4.  An update on LNCipedia: a database for annotated human lncRNA sequences.

Authors:  Pieter-Jan Volders; Kenneth Verheggen; Gerben Menschaert; Klaas Vandepoele; Lennart Martens; Jo Vandesompele; Pieter Mestdagh
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

5.  NPInter v3.0: an upgraded database of noncoding RNA-associated interactions.

Authors:  Yajing Hao; Wei Wu; Hui Li; Jiao Yuan; Jianjun Luo; Yi Zhao; Runsheng Chen
Journal:  Database (Oxford)       Date:  2016-04-17       Impact factor: 3.451

6.  Ensembl 2017.

Authors:  Bronwen L Aken; Premanand Achuthan; Wasiu Akanni; M Ridwan Amode; Friederike Bernsdorff; Jyothish Bhai; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Thomas Juettemann; Stephen Keenan; Matthew R Laird; Ilias Lavidas; Thomas Maurel; William McLaren; Benjamin Moore; Daniel N Murphy; Rishi Nag; Victoria Newman; Michael Nuhn; Chuang Kee Ong; Anne Parker; Mateus Patricio; Harpreet Singh Riat; Daniel Sheppard; Helen Sparrow; Kieron Taylor; Anja Thormann; Alessandro Vullo; Brandon Walts; Steven P Wilder; Amonida Zadissa; Myrto Kostadima; Fergal J Martin; Matthieu Muffato; Emily Perry; Magali Ruffier; Daniel M Staines; Stephen J Trevanion; Fiona Cunningham; Andrew Yates; Daniel R Zerbino; Paul Flicek
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

7.  Genenames.org: the HGNC and VGNC resources in 2017.

Authors:  Bethan Yates; Bryony Braschi; Kristian A Gray; Ruth L Seal; Susan Tweedie; Elspeth A Bruford
Journal:  Nucleic Acids Res       Date:  2016-10-30       Impact factor: 16.971

8.  Long noncoding RNA: noncoding and not coded.

Authors:  Debra Toiber; Gabriel Leprivier; Barak Rotblat
Journal:  Cell Death Discov       Date:  2017-01-09

9.  COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features.

Authors:  Long Hu; Zhiyu Xu; Boqin Hu; Zhi John Lu
Journal:  Nucleic Acids Res       Date:  2016-09-07       Impact factor: 16.971

10.  Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus.

Authors:  Jian-Feng Xiang; Qing-Fei Yin; Tian Chen; Yang Zhang; Xiao-Ou Zhang; Zheng Wu; Shaofeng Zhang; Hai-Bin Wang; Junhui Ge; Xuhua Lu; Li Yang; Ling-Ling Chen
Journal:  Cell Res       Date:  2014-03-25       Impact factor: 25.617
