Pieter-Jan Volders, Jasper Anckaert, Kenneth Verheggen, Justine Nuytens, Lennart Martens, Pieter Mestdagh, Jo Vandesompele.
Abstract
While long non-coding RNA (lncRNA) research has in the past focused primarily on the discovery of novel genes, it has now shifted towards functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs are functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources, each with its own quality metrics and definition of a lncRNA. To overcome this issue, large-scale curation and annotation are needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher quality reference lncRNA gene set.
Year: 2019; PMID: 30371849; PMCID: PMC6323963; DOI: 10.1093/nar/gky1031
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
The different sources of lncRNA transcripts used in LNCipedia 5. The number of unique transcripts and genes varies substantially between the different sources.
| Source | Full dataset | High-confidence set | Unique transcripts | Unique genes |
|---|---|---|---|---|
| Broad Institute | 14 043 | 13 277 | 1627 (12%) | 21 (0.3%) |
| Ensembl release 92 - April 2018 | 25 075 | 22 551 | 8631 (34%) | 1334 (10.2%) |
| FANTOM CAT (stringent) | 24 756 | 22 067 | 24 750 (100%) | 2776 (42%) |
| NONCODE v4 | 77 529 | 62 918 | 46 124 (59%) | 24 901 (55.1%) |
| RefSeq - NCBI Annotation Release 106 | 5188 | 4319 | 2718 (52%) | 97 (2.5%) |
| Sun and Gadad | 2124 | 1672 | 2124 (100%) | 546 (34.3%) |
| Nielsen | 7119 | 6775 | 7074 (99%) | 6119 (86.7%) |
| Hangauer | 5296 | 5232 | 357 (7%) | 18 (0.4%) |
| Total number of unique transcripts | 127 802 | 107 039 | | |
| Total number of unique genes | 56 946 | 49 372 | | |
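
The table does not state which count the "Unique transcripts" percentages are taken against. A minimal sketch, assuming the denominator is the full dataset count of each source (an inference from the numbers, not something the source says), reproduces every reported transcript percentage. The names `ROWS` and `unique_share` are hypothetical and used only for illustration.

```python
# Transcript counts copied from the table above:
# source -> (full dataset, unique transcripts, reported percentage)
ROWS = {
    "Broad Institute": (14043, 1627, 12),
    "Ensembl release 92": (25075, 8631, 34),
    "FANTOM CAT (stringent)": (24756, 24750, 100),
    "NONCODE v4": (77529, 46124, 59),
    "RefSeq": (5188, 2718, 52),
    "Sun and Gadad": (2124, 2124, 100),
    "Nielsen": (7119, 7074, 99),
    "Hangauer": (5296, 357, 7),
}


def unique_share(full_count: int, unique_count: int) -> int:
    """Percentage of a source's transcripts that are unique to it.

    Assumption: the denominator is the full dataset count, not the
    high-confidence set.
    """
    return round(100 * unique_count / full_count)


if __name__ == "__main__":
    for source, (full, unique, reported) in ROWS.items():
        computed = unique_share(full, unique)
        print(f"{source}: computed {computed}%, table reports {reported}%")
```

Dividing by the high-confidence set instead gives different values for several sources (for example about 38% rather than 34% for Ensembl), which is why the full dataset is assumed here. The gene-level percentages cannot be checked the same way because the table lists no per-source gene totals.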
Figure 1. (A) The number of publications on lncRNAs has grown rapidly over the past years. Shown here is the number of entries in PubMed with the keyword ‘long non-coding RNA’. (B) LncRNA functional annotation is heavily skewed towards a small number of lncRNAs. (C) Comparison word cloud of the paper abstracts associated with the seven lncRNAs with the most associated papers. Functions and disease associations are immediately clear from this analysis.
Sequence Ontology (SO) terms used to annotate lncRNA subclasses.
| Classification | Gene level SO term | Transcript level SO term |
|---|---|---|
| intergenic | lincRNA_gene (SO:0001641) | lincRNA (SO:0001463) |
| antisense | antisense_lncRNA_gene (SO:0002182) | antisense_lncRNA (SO:0001904) |
| intronic | sense_intronic_ncRNA_gene (SO:0002184) | sense_intronic_ncRNA (SO:0002131) |
| sense-overlapping | sense_overlap_ncRNA_gene (SO:0002183) | sense_overlap_ncRNA (SO:0002132) |
| bidirectional | bidirectional_promoter_lncRNA (SO:0002185) | NA |
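
For readers who want to use this mapping programmatically, the table can be written as a small lookup structure. This is only a sketch: `SO_TERMS` and `so_term` are hypothetical names, not part of LNCipedia or of any Sequence Ontology library, and the `NA` transcript-level entry for bidirectional lncRNAs is represented as `None`.

```python
from typing import Dict, Optional, Tuple

# (term name, accession) pairs copied from the table above.
Term = Tuple[str, str]

SO_TERMS: Dict[str, Dict[str, Optional[Term]]] = {
    "intergenic": {
        "gene": ("lincRNA_gene", "SO:0001641"),
        "transcript": ("lincRNA", "SO:0001463"),
    },
    "antisense": {
        "gene": ("antisense_lncRNA_gene", "SO:0002182"),
        "transcript": ("antisense_lncRNA", "SO:0001904"),
    },
    "intronic": {
        "gene": ("sense_intronic_ncRNA_gene", "SO:0002184"),
        "transcript": ("sense_intronic_ncRNA", "SO:0002131"),
    },
    "sense-overlapping": {
        "gene": ("sense_overlap_ncRNA_gene", "SO:0002183"),
        "transcript": ("sense_overlap_ncRNA", "SO:0002132"),
    },
    "bidirectional": {
        "gene": ("bidirectional_promoter_lncRNA", "SO:0002185"),
        "transcript": None,  # listed as NA in the table
    },
}


def so_term(classification: str, level: str) -> Optional[Term]:
    """Return the (name, accession) pair for a classification.

    `level` is "gene" or "transcript"; the result is None where the
    table lists NA. Unknown keys raise KeyError.
    """
    return SO_TERMS[classification][level]


if __name__ == "__main__":
    print(so_term("antisense", "gene"))
    print(so_term("bidirectional", "transcript"))
```

Returning `None` for the missing transcript-level term keeps the "no term assigned" case explicit, so callers have to handle it rather than receive a placeholder string.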