Valerio Bianchi, Alessio Colantoni, Alberto Calderone, Gabriele Ausiello, Fabrizio Ferrè, Manuela Helmer-Citterich.
Abstract
High-throughput RNA sequencing (RNA-seq) allows whole-transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Linking splicing variant-specific expression to functional inference remains an open and difficult problem. To address it, we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository that stores expression values and functional annotations of alternative splicing variants. We processed 13 large RNA-seq panels from healthy human tissues and disease conditions. For each splicing variant, we report expression levels together with functional annotations gathered and integrated from different sources using a variant-specific annotation transfer pipeline. Complex queries that cross-reference different functional annotations retrieve the desired subsets of splicing variant expression values, which can then be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/
Year: 2013 PMID: 23842462 PMCID: PMC5654372 DOI: 10.1093/database/bat050
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Data sets included in DBATE
| GEO GSE identifier | Samples | Number of reads (×10⁶) | Read length (bp) | Description | Reference |
|---|---|---|---|---|---|
| GSE12946 | Adipose, brain, breast, colon, heart, liver, lymph node, skeletal muscle, testes, BT474, HME, MB435, MCF-7, T47D | 224 | 32 | The Wang data set, from which we selected 14 samples, 9 in normal condition and 5 in tumoral condition | 31 |
| GSE17274 | Three female (HSF1, HSF2, HSF3) and three male liver samples (HSM1, HSM2, HSM3) | 72 | 35 | Sex-specific gene expression in liver in three males and three females | 32 |
| GSE29119 | Breast cancer (HCC1954) and normal breast cells (HMEC) | 97 | 36 | Gene expression analysis of breast cancer | 33 |
| GSE29155 | Prostate epithelial (PrEC) and prostate adenocarcinoma (LNCaP) cell lines | 9 | 36 | Transcription profiling of human prostate epithelial and adenocarcinoma cell lines | 34 |
| GSE29580 | Normal and tumor samples from two colorectal cancer patients | 40 | 36 | Whole transcriptome sequencing of colorectal cancer | NA |
| GSE29968 | Matched esophageal squamous cells from three carcinoma patients | 118 | 38 | Transcriptome analysis of human esophageal squamous cell carcinoma in three pairs of matched patient-derived tumor samples and their adjacent non-tumorous tissues | 35 |
| GSE30611 | Adipose, adrenal, brain, breast, colon, heart, kidney, liver, lung, lymph node, ovary, prostate, skeletal muscle, testes, thyroid, white blood cells | 2000 | 50 | The Illumina BodyMap 2.0 Project, comprising transcription profiling of individual and mixtures of 16 human tissues | NA |
| GSE30772 | Mitochondrion, mitoplasm | 45 | 35 | Examination of the mitochondrial transcriptome | 36 |
| GSE32689 | Pooled oocytes, pooled sister polar bodies, single oocyte, single sister polar body | 120 | 42 | Transcriptome of the human polar body, providing four conditions: pooled oocytes and their sister polar bodies and a single oocyte and its sister polar body | 37 |
| GSE33328 | Peripheral brain tissue, tumor brain tissue | 49 | 75 | Transcriptomic profiling of a glioblastoma multiforme patient with control peripheral brain tissue | 38 |
| GSE37769 | THP1 cells | 287 | 100 | Expression analysis of the THP1 (human monocytic leukemia) cell line | 39 |
| GSE38685 | Prostate epithelial (PrEC) and prostate adenocarcinoma (LNCaP) cell lines | 35 | 75 | Transcription profiling of human prostate epithelial and adenocarcinoma cell lines | 40 |
| GSE43925 | THP1 high glucose, THP1 normal glucose | 60 | 42 | Expression analysis of human THP-1 monocytes in normal conditions and treated with high glucose | 41 |
a Platform: Genome Analyzer
b Platform: Genome Analyzer IIx
c Platform: HiSeq 2000
d Platform: Genome Analyzer II
e Single-end reads
f Paired-end reads
The current DBATE release includes 13 data sets retrieved from the Gene Expression Omnibus (GEO). For each data set, the table reports its GEO GSE identifier, the samples it contains, the total number of reads (in millions), the read length, a brief description of the data set content, and the literature reference when available (NA indicates that the data were deposited in GEO but the study is still unpublished). Superscripts indicate the sequencing platform used (GA, GAII, GAIIx or HiSeq 2000) and whether the reads were sequenced as single or paired ends.
Figure 1. Example of a combination of complex queries in DBATE. The heatmap reports expression values, in the BodyMap panel of human tissues, of splicing variants that encode protein products containing the Pfam KH domain (PF00013), are phosphorylated and contain repetitive units. This combination of criteria can be specified through the DBATE web interface, which in this case returns 10 different splicing variants belonging to the genes ANKRD17, KHSRP, HNRNPK and ANKHD1. Their expression patterns show that splicing variants of these proteins can have tissue-specific behaviors. The heatmap image is generated by an automated procedure using the heatmap.2 function of the statistical software R, and is then loaded on the web interface as part of the results page. The color code of the heatmap ranges from red (lower FPKM values) through black (medium expression values) to green (higher expression values).
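The kind of combined query described in this caption can be thought of as requiring every selected annotation to be present on a variant at once. The following is only a minimal sketch of that idea in Python, not DBATE's implementation: the variant IDs, gene names, tag strings and the `query` helper are all hypothetical placeholders, and the toy records are not taken from the database.

```python
# Hypothetical annotation table: variant ID -> (gene, set of annotation tags).
# All IDs, gene names and tags are illustrative placeholders, not DBATE records.
ANNOTATIONS = {
    "variant_1": ("GENE_A", {"pfam:PF00013", "phosphorylated", "repeat"}),
    "variant_2": ("GENE_A", {"pfam:PF00013", "phosphorylated"}),
    "variant_3": ("GENE_B", {"phosphorylated", "repeat"}),
    "variant_4": ("GENE_C", {"pfam:PF00013", "phosphorylated", "repeat"}),
}


def query(annotations, required_tags):
    """Return the sorted IDs of variants that carry every required tag."""
    required = set(required_tags)
    return sorted(
        variant_id
        for variant_id, (_gene, tags) in annotations.items()
        if required <= tags  # subset test: all required tags are present
    )


# Variants with the KH domain that are also phosphorylated and contain repeats.
hits = query(ANNOTATIONS, ["pfam:PF00013", "phosphorylated", "repeat"])
genes = sorted({ANNOTATIONS[v][0] for v in hits})
print(hits, genes)
```

The subset test makes the criteria conjunctive, so adding a tag can only narrow the result set; the expression values of the returned variants would then be looked up separately for plotting.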
Figure 2. Protein interaction network for the ANKRD17, KHSRP, HNRNPK and ANKHD1 genes, retrieved from the mentha database and plotted by the mentha browser applet. The mentha database stores manually curated protein-protein interactions (PPIs) from five different PPI databases and has been integrated into the DBATE web interface. These four genes were selected through a complex query on DBATE for all splicing variants that encode protein products containing the KH (K Homology) domain, are phosphorylated and contain repeated units. The network includes all primary binding partners of the four genes. Nodes represent genes, and arcs join genes whose protein products are known to physically interact. Nodes corresponding to the query proteins are larger and highlighted with blue circles. Each node is colored according to the expression level of its most expressed splicing variant. Colors range from red (lower FPKM values) through black (medium expression values) to green (higher expression values). White nodes represent genes for which no splicing variant is expressed in the selected tissue. Protein interaction networks generated by the mentha browser can also be manually expanded and pruned.
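The node-coloring rule in this caption can be illustrated with a short Python sketch. It rests on assumptions that the caption does not state: the color scale is taken to be linear between a lower and an upper FPKM bound, and a variant is treated as expressed when its FPKM is above zero. The function names `red_black_green` and `node_color` are hypothetical and do not come from DBATE or mentha.

```python
WHITE = (255, 255, 255)


def red_black_green(value, lo, hi):
    """Map value in [lo, hi] to an (R, G, B) tuple: red -> black -> green.

    Assumes a linear scale; values outside [lo, hi] are clipped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    if t < 0.5:
        # Lower half: fade from full red down to black.
        return (round(255 * (1 - 2 * t)), 0, 0)
    # Upper half: fade from black up to full green.
    return (0, round(255 * (2 * t - 1)), 0)


def node_color(variant_fpkm, lo, hi):
    """Color a gene node by its most expressed splicing variant.

    Returns white when no variant is expressed (assumed here: FPKM > 0).
    """
    expressed = [v for v in variant_fpkm if v > 0]
    if not expressed:
        return WHITE
    return red_black_green(max(expressed), lo, hi)


print(node_color([2.0, 10.0], lo=0.0, hi=10.0))  # most expressed variant sets the color
print(node_color([0.0, 0.0], lo=0.0, hi=10.0))   # no expressed variant
```

Taking the maximum over variants means a gene looks highly expressed as soon as one of its variants is, which matches the caption but hides differences between variants of the same gene.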