Fayaz Seifuddin1, Komudi Singh1, Abhilash Suresh1, Jennifer T Judy1, Yun-Ching Chen1, Vijender Chaitankar1, Ilker Tunc1, Xiangbo Ruan2, Ping Li2, Yi Chen2, Haiming Cao2, Richard S Lee3, Fernando S Goes3, Peter P Zandi3, M Saleet Jafri4,5, Mehdi Pirooznia6.
Abstract
Long non-coding RNA Knowledgebase (lncRNAKB) is an integrated resource for exploring lncRNA biology in the context of tissue-specificity and disease association. A systematic integration of annotations from six independent databases resulted in 77,199 human lncRNA (224,286 transcripts). The user-friendly knowledgebase covers a comprehensive breadth and depth of lncRNA annotation. lncRNAKB is a compendium of expression patterns, derived from analysis of RNA-seq data in thousands of samples across 31 solid human normal tissues (GTEx). Thousands of co-expression modules, identified via network analysis and pathway enrichment to delineate lncRNA function, are also accessible. Millions of expression quantitative trait loci (cis-eQTL), computed using whole genome sequence genotype data (GTEx), can be downloaded from lncRNAKB, which also includes tissue-specificity, phylogenetic conservation and coding potential scores. Tissue-specific lncRNA-trait associations encompassing 323 GWAS (UK Biobank) are also provided. lncRNAKB is accessible at http://www.lncrnakb.org/ , and the data are freely available through Open Science Framework ( https://doi.org/10.17605/OSF.IO/RU4D2 ).
Year: 2020 PMID: 33020484 PMCID: PMC7536183 DOI: 10.1038/s41597-020-00659-z
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Resources of human lncRNA annotation.
| Database Name | Reference build | Annotation file name | URL |
|---|---|---|---|
| CHESS | hg38 | chess2.2.gtf | |
| LNCipedia | hg19,hg38 | lncipedia_5_2_hc_hg38.gtf | |
| NONCODE | hg19,hg38 | NONCODEv5_human_hg38_lncRNA.gtf | |
| FANTOM5 | hg19 | FANTOM_CAT.lv3_robust.only_lncRNA.gtf | |
| MiTranscriptome | hg19 | mitranscriptome.hg19.v2.gtf | |
| BIGTranscriptome | hg19 | BIGTranscriptome_lncRNA_catalog.hg19.gtf | |
| deepBase | hg19 | hg19_allLncRNA.rnaFam.bed | |
| lncRNAdb | hg38 | under development | |
| LncRNAWiki | hg19 | RawData.tar.gz | |
| LncBook | hg19,hg38 | LncBook_GENCODE_GRCh38_9.28.gtf.gz | ftp://download.big.ac.cn/lncbook/1-LncRNAs(GRCh37%7C38)/LncBook_GENCODE_GRCh38_9.28.gtf.gz |
| RNAcentral | hg38 | homo_sapiens.GRCh38.gff3.gz | ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/14.0/genome_coordinates/gff3/homo_sapiens.GRCh38.gff3.gz |
| GENCODE | hg19,hg38 | gencode.v33.annotation.gtf.gz | ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz |
| ENSEMBL | hg19,hg38 | Homo_sapiens.GRCh38.99.gtf.gz | ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz |
| RefSeq | hg19,hg38 | GRCh38_latest_genomic.gtf.gz | ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gtf.gz |
Fig. 1 Overview of lncRNAKB.
Fig. 2 Illustration showing the stepwise intersection of two annotations D1 (CHESS) (blue) and D2 (FANTOM-lncRNAs only) (green) at the gene and transcript levels. The genes are shown as solid rectangles and the transcripts are shown with exons and introns. The white arrows show the direction/strand in which the gene is transcribed. The orange bars show the results of the intersection (D1 intersect D2) at the gene level. The red X marks show transcripts and genes that were not incorporated into the merged annotation. D3 (LNCipedia), D4 (NONCODE), D5 (MiTranscriptome) and D6 (BIGTranscriptome) were merged using the same cumulative stepwise intersection method (see Methods: Integration of lncRNA annotations).
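The gene-level intersection step in Fig. 2 can be illustrated with a minimal sketch. It assumes that two genes intersect when they lie on the same chromosome and strand and share at least one base; the actual matching and merging criteria are those given in the paper's Methods and may differ. The `Gene` type and `gene_level_intersection` function are hypothetical names introduced only for this illustration.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (GTF-style 1-based inclusive coordinates)."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Gene, b: Gene) -> bool:
    # Assumption: same chromosome, same strand, and at least one shared base.
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def gene_level_intersection(d1: List[Gene], d2: List[Gene]) -> List[Tuple[str, str]]:
    """Return (D1 gene id, D2 gene id) pairs that intersect at the gene level."""
    return [(g1.gene_id, g2.gene_id) for g1 in d1 for g2 in d2 if overlaps(g1, g2)]


# Toy records, not taken from CHESS or FANTOM.
d1 = [Gene("A", "chr1", 100, 500, "+"), Gene("B", "chr1", 1000, 2000, "-")]
d2 = [
    Gene("X", "chr1", 400, 800, "+"),    # overlaps A on the same strand
    Gene("Y", "chr1", 1500, 1800, "+"),  # overlaps B but on the opposite strand
    Gene("Z", "chr2", 100, 500, "+"),    # different chromosome
]
pairs = gene_level_intersection(d1, d2)
```

The all-pairs comparison is quadratic and only suitable for toy inputs; real annotations of this size would typically be sorted or indexed by chromosome first.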
Fig. 3 Schema of the web/database segment of the lncRNAKB.
Fig. 4 Upset plot showing the overlap of all six lncRNA annotations at the gene level, after applying the cumulative stepwise intersection method across all of them. The orange bars indicate the total number of genes in each source before merging. The black bars indicate the total number of genes present within an annotation or shared between annotations, as indicated by the black dots below the x-axis of the plot. Genes uniquely contributed by a single annotation are represented as a single dot that horizontally aligns with the respective annotation. Black dots connected by lines indicate the annotations that share the genes represented in the bar plot.
Fig. 5 Distribution of tissue-specificity scores, computed from RNA-seq data for 31 solid human normal tissues from GTEx, across protein-coding genes (PCGs) and lncRNAs in the lncRNAKB as a comparison. The tissue-specificity score varies from 0 to 1, where 0 means broadly expressed and 1 means tissue-specific. Graph created with the density function from R, which computes kernel density estimates. (a) Average Tau score across all tissues. (b) Maximum and normalized specificity value of PEM among all tissues.
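For orientation, the Tau index referred to in Fig. 5 is commonly defined as the sum over tissues of (1 - x_i / x_max), divided by (n - 1), where x_i is the expression in tissue i and n is the number of tissues. The sketch below implements that standard definition; whether the paper applied it to raw or transformed expression values is not stated here, so the inputs are simply assumed to be non-negative per-tissue expression values.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tau tissue-specificity index: 0 = broadly expressed, 1 = tissue-specific.

    `expression` holds one non-negative value per tissue (assumed already
    summarised per tissue, e.g. a mean or median).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        # Tau is undefined for a gene with no expression in any tissue.
        raise ValueError("Tau is undefined when the maximum expression is zero")
    return sum(1 - x / x_max for x in expression) / (n - 1)


broad = tau([5.0, 5.0, 5.0, 5.0])      # equal expression everywhere
specific = tau([0.0, 0.0, 10.0, 0.0])  # expressed in a single tissue
graded = tau([10.0, 5.0, 0.0])         # intermediate case
```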
Summary of the cis-eQTL results available from lncRNAKB. Tissues with <80 samples are shown here but were excluded from the analysis.
| Tissue | RNA-seq samples with WGS | Males | Females | SNPs with MAF > 0.05 | Genes passing filter | PCGs | lncRNAs | Total SNP-gene pairs (eQTLs) | SNP-gene pairs with permutation p-value < 0.05 |
|---|---|---|---|---|---|---|---|---|---|
| Adipose_Tissue | 363 | 220 | 143 | 5,952,169 | 27,029 | 15,175 | 11,854 | 54,871,184 | 5,766 |
| Adrenal_Gland | 146 | 82 | 64 | 5,886,806 | 25,943 | 14,973 | 10,970 | 51,879,876 | 4,077 |
| Bladder | 9 | 4 | 5 | 5,462,615 | 28,695 | 15,597 | 13,098 | * | * |
| Blood | 356 | 226 | 130 | 5,953,536 | 18,412 | 11,788 | 6,624 | 37,414,178 | 2,877 |
| Blood_Vessel | 378 | 241 | 137 | 5,963,536 | 25,614 | 14,770 | 10,844 | 51,947,442 | 5,854 |
| Bone_Marrow | * | * | * | * | 22,571 | 12,612 | 9,959 | * | * |
| Brain | 170 | 116 | 54 | 5,857,467 | 31,339 | 16,148 | 15,191 | 62,844,553 | 3,488 |
| Breast | 184 | 102 | 82 | 5,901,708 | 28,839 | 15,680 | 13,159 | 58,130,064 | 4,267 |
| Cervix_Uteri | 8 | 0 | 8 | 5,522,234 | 28,706 | 15,649 | 13,057 | * | * |
| Colon | 250 | 148 | 102 | 5,907,992 | 28,297 | 15,781 | 12,516 | 57,063,773 | 4,767 |
| Esophagus | 353 | 221 | 132 | 5,941,386 | 26,803 | 15,439 | 11,364 | 54,314,052 | 4,815 |
| Fallopian_Tube | 7 | 0 | 7 | * | 18,492 | 16,552 | 1,940 | * | * |
| Heart | 251 | 163 | 88 | 5,913,705 | 24,959 | 14,788 | 10,171 | 50,153,256 | 4,375 |
| Kidney | 29 | 23 | 6 | 5,742,588 | 28,917 | 15,726 | 13,191 | * | * |
| Liver | 118 | 77 | 41 | 5,871,833 | 23,846 | 14,204 | 9,642 | 47,689,780 | 2,759 |
| Lung | 274 | 182 | 92 | 5,926,605 | 29,045 | 15,744 | 13,301 | 58,884,074 | 5,461 |
| Muscle | 359 | 220 | 139 | 5,962,131 | 22,042 | 13,558 | 8,484 | 44,548,539 | 4,454 |
| Nerve | 268 | 174 | 94 | 5,941,274 | 29,326 | 15,472 | 13,854 | 59,363,204 | 7,416 |
| Ovary | 99 | 0 | 99 | 5,873,449 | 27,292 | 14,845 | 12,447 | 54,588,663 | 3,466 |
| Pancreas | 167 | 98 | 69 | 5,905,087 | 23,569 | 14,210 | 9,359 | 47,408,959 | * |
| Pituitary | 108 | 76 | 32 | 5,814,865 | 30,586 | 15,848 | 14,738 | 60,707,019 | 3,949 |
| Prostate | 101 | 0 | 101 | 5,810,666 | 30,373 | 15,931 | 14,442 | 60,377,553 | * |
| Salivary_Gland | 63 | 43 | 20 | 5,771,591 | 28,409 | 15,679 | 12,730 | * | * |
| Skin | 442 | 278 | 164 | 5,966,760 | 27,316 | 15,442 | 11,874 | 55,698,051 | 6,210 |
| Small_Intestine | 90 | 54 | 36 | 5,777,092 | 30,046 | 15,950 | 14,096 | 59,426,622 | 2,987 |
| Spleen | 108 | 62 | 46 | 5,874,443 | 28,284 | 14,969 | 13,315 | 56,914,604 | 4,743 |
| Stomach | 182 | 104 | 78 | 5,890,077 | 26,974 | 15,530 | 11,444 | 54,242,450 | 3,804 |
| Testis | 171 | 0 | 171 | 5,875,543 | 47,909 | 17,777 | 30,132 | 98,376,057 | 8,951 |
| Thyroid | 286 | 183 | 103 | 5,941,584 | 29,715 | 15,604 | 14,111 | 60,217,108 | 7,611 |
| Uterus | 82 | 0 | 82 | 5,795,583 | 28,175 | 15,166 | 13,009 | 55,748,102 | 3,037 |
| Vagina | 87 | 0 | 87 | 5,837,620 | 28,423 | 15,629 | 12,794 | 56,861,978 | 2,865 |
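The sample-size rule stated in the table caption (tissues with fewer than 80 samples were excluded) amounts to a simple threshold filter. The sketch below applies it to a handful of sample counts copied from the table; the function name and the dictionary layout are illustrative assumptions, not part of the lncRNAKB pipeline.

```python
from typing import Dict, List

MIN_SAMPLES = 80  # threshold taken from the table caption

# A few "RNA-seq samples with WGS" counts copied from the table above.
sample_counts: Dict[str, int] = {
    "Adipose_Tissue": 363,
    "Bladder": 9,
    "Kidney": 29,
    "Liver": 118,
    "Salivary_Gland": 63,
    "Uterus": 82,
}


def tissues_for_eqtl(counts: Dict[str, int], min_samples: int = MIN_SAMPLES) -> List[str]:
    """Return, in alphabetical order, the tissues that meet the sample threshold."""
    return sorted(tissue for tissue, n in counts.items() if n >= min_samples)


kept = tissues_for_eqtl(sample_counts)
```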
Summary of classification of lncRNA transcripts with respect to their localization, overlap and orientation relative to transcription of proximal protein-coding RNA transcripts.
| 1GENIC | 1aOverlapping | 1bContaining | 1cNested | Total |
|---|---|---|---|---|
| Antisense Exonic | 9,326 | 1,816 | 3,552 | 14,694 |
| Antisense Intronic | 1,302 | 1,284 | 8,330 | 10,916 |
| Sense Exonic | 29,942 | 42,160 | 29,087 | 101,189 |
| Sense Intronic | 327 | 994 | 13,274 | 14,595 |
| Total | 40,897 | 46,254 | 54,243 | 141,394 |
| 2INTERGENIC | | | | |
| Upstream | — | 14,930 | 13,408 | 26,470 |
| Downstream | 11,540 | — | 10,662 | 24,070 |
| Total | 11,540 | 14,930 | 24,070 | 50,540 |
The legend below explains the categories in detail:
1GENIC: the lncRNA gene overlaps an RNA gene from the reference annotation file.
2INTERGENIC (lincRNA): otherwise.
GENIC type (each subtype is further split by exonic or intronic location):
1aOverlapping subtype: the lncRNA partially overlaps the RNA partner transcript.
1bContaining subtype: the lncRNA contains the RNA partner transcript.
1cNested subtype: the lncRNA is contained in the RNA partner transcript.
INTERGENIC type:
2aDivergent subtype: the lncRNA is transcribed in head-to-head orientation with the RNA partner transcript (upstream or downstream).
2bConvergent subtype: the lncRNA is transcribed in tail-to-tail orientation with the RNA partner transcript (upstream or downstream).
2cSame_strand subtype: the lncRNA is transcribed in the same orientation as the RNA partner transcript (upstream or downstream).
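The categories in the legend can be expressed as a small decision procedure. The following is a simplified sketch, assuming each transcript is reduced to a single interval with a strand; it does not reproduce the tool used in the paper, it ignores exon structure (so the exonic/intronic split is omitted), and it treats "upstream" as the 5' side of the partner transcript. `Transcript` and `classify` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal transcript record (1-based inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify(lnc: Transcript, partner: Transcript) -> Tuple[str, str, str]:
    """Return (type, subtype, direction-or-location) for an lncRNA and its partner."""
    if lnc.chrom != partner.chrom:
        raise ValueError("transcripts must be on the same chromosome")
    same_strand = lnc.strand == partner.strand

    # GENIC: the two intervals share at least one base.
    if lnc.start <= partner.end and partner.start <= lnc.end:
        direction = "sense" if same_strand else "antisense"
        if lnc.start <= partner.start and lnc.end >= partner.end:
            subtype = "containing"   # identical intervals also land here
        elif partner.start <= lnc.start and lnc.end <= partner.end:
            subtype = "nested"
        else:
            subtype = "overlapping"
        return ("genic", subtype, direction)

    # INTERGENIC: locate the lncRNA relative to the partner's 5' end.
    lnc_is_left = lnc.end < partner.start
    upstream = lnc_is_left if partner.strand == "+" else not lnc_is_left
    location = "upstream" if upstream else "downstream"
    if same_strand:
        subtype = "same_strand"
    else:
        # Opposite strands: head to head when upstream, tail to tail when downstream.
        subtype = "divergent" if upstream else "convergent"
    return ("intergenic", subtype, location)


partner = Transcript("chr1", 1000, 2000, "+")
nested = classify(Transcript("chr1", 1200, 1500, "-"), partner)
divergent = classify(Transcript("chr1", 100, 500, "-"), partner)
convergent = classify(Transcript("chr1", 3000, 3500, "-"), partner)
```

Under these definitions an opposite-strand intergenic lncRNA is divergent exactly when it is upstream of its partner and convergent exactly when it is downstream, so some subtype/location combinations cannot occur.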
Fig. 6 Distribution of mean PhastCons exon sequence conservation scores across lncRNA and protein-coding genes in the lncRNAKB. Graph created with the density function from R, which computes kernel density estimates.
Fig. 7 Box plots of gene expression distributions for (a) MALAT1 (Metastasis Associated Lung Adenocarcinoma Transcript 1) and (b) NPPB (natriuretic peptide B). The x-axis represents the 31 solid human normal tissues from GTEx and the y-axis is the TPM expression.
Fig. 8 Principal Component Analysis (PCA) of GTEx samples using log2(TPM)-transformed gene expression of (a) protein-coding genes and (b) lncRNAs. Expression of lncRNAs alone also recapitulates tissue types.
Fig. 9 Cytoscape network for lncRNA-mRNA co-expression Module 2 (M2) in the heart identified using WGCNA. The network was filtered for heart development genes (n = 148) and correlations >0.20. Orange triangles and green circles (nodes) represent lncRNAs and PCGs, respectively. The density of gray lines (edges) represents the strength of the connection between genes.
| Measurement(s) | regulation of gene expression • sequence feature annotation • lnc_RNA • tissue-specific expression of lncRNA • Expression Quantitative Trait Locus |
| Technology Type(s) | digital curation • computational modeling technique |
| Sample Characteristic - Organism | Homo sapiens |