Yuansheng Zhang, Dong Zou, Tongtong Zhu, Tianyi Xu, Ming Chen, Guangyi Niu, Wenting Zong, Rong Pan, Wei Jing, Jian Sang, Chang Liu, Yujia Xiong, Yubin Sun, Shuang Zhai, Huanxin Chen, Wenming Zhao, Jingfa Xiao, Yiming Bao, Lili Hao, Zhang Zhang.
Abstract
Transcriptomic profiling is critical to uncovering functional elements from transcriptional and post-transcriptional aspects. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets by using standardized data processing pipelines and a structured curation model. Currently, GEN houses a large number of gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50 500 samples and 15 540 169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing and alternative splicing for 10 bulk datasets, providing opportunities for users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.
Year: 2022 PMID: 34591957 PMCID: PMC8728231 DOI: 10.1093/nar/gkab878
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Database contents and features of Gene Expression Nebulas. Abbreviations used: SC, single-cell; TS, tissue-specific; HS, house-keeping; FPKM, fragments per kilobase of transcript per million mapped fragments; TPM, transcripts per million; SE, skipped exon; A3SS, alternative 3′ splice site; A5SS, alternative 5′ splice site; MXE, mutually exclusive exons; RI, retained intron.
Figure 2. Screenshots of database web interfaces. (A) Curated meta-information of a dataset, including sequencing strategies, tissue, cell type, disease, biological context, quality and quantity, etc. (B) Boxplot of expression levels of multiple genes of interest across samples. (C) Heatmap of differentially expressed genes for bulk RNA-seq datasets. (D) Clustering results of a single-cell RNA-seq dataset on a 3D UMAP plot where cells are color-coded by cluster.
Data statistics in Gene Expression Nebulas (as of August 2021)
| Kingdom | Species | #Datasets (bulk/single-cell) | #Samples | #Tissues | #Cells |
|---|---|---|---|---|---|
| Animalia | | 192 (68/124) | 29 942 | 70 | 6 823 695 |
| | | 11 (3/8) | 914 | 7 | 1 176 003 |
| | | 7 (1/6) | 14 800 | 4 | 3 837 235 |
| | | 4 (1/3) | 329 | 7 | 42 129 |
| | | 4 (1/3) | 326 | 1 | 304 |
| | | 4 (2/2) | 134 | 2 | 122 |
| | | 3 (1/2) | 86 | 3 | 59 |
| | | 3 (1/2) | 367 | 9 | 28 773 |
| | | 2 (1/1) | 142 | 3 | 100 |
| | | 2 (1/1) | 12 | 2 | 130 713 |
| | | 2 (1/1) | 30 | 7 | 657 999 |
| | | 2 (1/1) | 20 | 4 | 22 737 |
| | | 2 (1/1) | 21 | 8 | 11 380 |
| | | 2 (1/1) | 32 | 1 | 32 |
| | | 2 (1/1) | 15 | 2 | 55 930 |
| | | 2 (1/1) | 32 | 1 | 32 |
| | | 2 (1/1) | 115 | 2 | 2 520 906 |
| Plantae | | 32 (31/1) | 1087 | 14 | 27 |
| | | 16 (16/0) | 499 | 8 | - |
| | | 8 (5/3) | 242 | 7 | 220 188 |
| | | 5 (5/0) | 462 | 7 | - |
| | | 3 (3/0) | 78 | 6 | - |
| | | 2 (2/0) | 34 | 6 | - |
| | | 2 (2/0) | 480 | 1 | - |
| | | 1 (1/0) | 44 | 6 | - |
| | | 1 (1/0) | 14 | 1 | - |
| | | 1 (1/0) | 6 | 1 | - |
| Protista | | 2 (1/1) | 208 | 0 | 180 |
| | | 2 (1/1) | 12 | 0 | 4988 |
| Fungi | | 2 (1/1) | 17 | 0 | 6637 |
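As a consistency check, the per-species rows of the statistics table can be summed and compared with the totals quoted in the abstract (323 datasets, 157 bulk and 166 single-cell; 50 500 samples; 15 540 169 cells; 30 species). The following is a minimal sketch, not part of GEN itself: the `ROWS` list and the `totals` helper are illustrative names, the numbers are transcribed by hand from the table, and a cell count shown as "-" is assumed to mean zero.

```python
# Rows transcribed from the statistics table (as of August 2021):
# (kingdom, bulk datasets, single-cell datasets, samples, cells).
# Species names are omitted; "-" in the #Cells column is entered as 0 (assumption).
ROWS = [
    ("Animalia", 68, 124, 29942, 6823695),
    ("Animalia", 3, 8, 914, 1176003),
    ("Animalia", 1, 6, 14800, 3837235),
    ("Animalia", 1, 3, 329, 42129),
    ("Animalia", 1, 3, 326, 304),
    ("Animalia", 2, 2, 134, 122),
    ("Animalia", 1, 2, 86, 59),
    ("Animalia", 1, 2, 367, 28773),
    ("Animalia", 1, 1, 142, 100),
    ("Animalia", 1, 1, 12, 130713),
    ("Animalia", 1, 1, 30, 657999),
    ("Animalia", 1, 1, 20, 22737),
    ("Animalia", 1, 1, 21, 11380),
    ("Animalia", 1, 1, 32, 32),
    ("Animalia", 1, 1, 15, 55930),
    ("Animalia", 1, 1, 32, 32),
    ("Animalia", 1, 1, 115, 2520906),
    ("Plantae", 31, 1, 1087, 27),
    ("Plantae", 16, 0, 499, 0),
    ("Plantae", 5, 3, 242, 220188),
    ("Plantae", 5, 0, 462, 0),
    ("Plantae", 3, 0, 78, 0),
    ("Plantae", 2, 0, 34, 0),
    ("Plantae", 2, 0, 480, 0),
    ("Plantae", 1, 0, 44, 0),
    ("Plantae", 1, 0, 14, 0),
    ("Plantae", 1, 0, 6, 0),
    ("Protista", 1, 1, 208, 180),
    ("Protista", 1, 1, 12, 4988),
    ("Fungi", 1, 1, 17, 6637),
]


def totals(rows):
    """Sum each numeric column over all rows; one row corresponds to one species."""
    bulk = sum(r[1] for r in rows)
    single_cell = sum(r[2] for r in rows)
    return {
        "species": len(rows),
        "datasets": bulk + single_cell,
        "bulk": bulk,
        "single_cell": single_cell,
        "samples": sum(r[3] for r in rows),
        "cells": sum(r[4] for r in rows),
    }


if __name__ == "__main__":
    for name, value in totals(ROWS).items():
        print(f"{name}: {value}")
```

Under these assumptions the column sums agree with the figures stated in the abstract, which suggests the table rows are complete even though the species names are not shown.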