Hans Ienasescu1,2, Kang Li1,2,3, Robin Andersson1, Morana Vitezic1,2, Sarah Rennie1, Yun Chen1,2, Kristoffer Vitting-Seerup1,2, Emil Lagoni1,2, Mette Boyd1,2, Jette Bornholdt1,2, Michiel J L de Hoon4, Hideya Kawaji4,5, Timo Lassmann4,6, Yoshihide Hayashizaki4,5, Alistair R R Forrest4,7, Piero Carninci4, Albin Sandelin8,2.
Abstract
Genomics consortia have produced large datasets profiling the expression of genes, microRNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on interactive sliders. Using the sliders, SlideBase lets users define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. that satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS.
Database URL: http://slidebase.binf.ku.dk
Year: 2016 PMID: 28025337 PMCID: PMC5199134 DOI: 10.1093/database/baw144
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Selecting genes based on expression levels using sliders. The figure shows a simple artificial example based on three genes and expression data across four tissues (shown in panel A), and three example selections based on sliders shown beneath each bar plot (panels B–D), where the bar plots show the results of each slider configuration. (A) Expression set for selection: The bar plot shows the expression levels of three example genes across four samples (blood, brain, heart and liver) from CAGE data, as indicated by colour. The underlying expression values come from CAGE data and are measured in TPM (tags per million; values shown at each bar). These values are normalized so that the contributions from all tissues sum to 100% per gene (shown on the x axis). (B) Selection of brain-specific genes using sliders: For each of the four tissues, we create a percentage ‘slider’, shown beneath the bar plot, with colours corresponding to the respective tissue. Each slider is interactive and defines a percentage interval of expression thresholds. Having a slider for each sample allows the user to specify how much of the expression originates from one or more samples. The results then contain only genes that satisfy these constraints. In this example, to select gene(s) that are predominantly expressed in brain, the left handle of the brain slider is moved to 80% while the right handle remains at 100%. This defines the percentage interval [80%, 100%] for brain, which is equivalent to requiring that at least 80% and at most 100% of gene expression comes from brain. The selected constraints are shown as dotted lines and a grey area within the bar plot, and are identified by a callout. Only gene 1 satisfies these constraints (genes not satisfying the constraints are translucent). (C) Selection of blood- and heart-specific genes: Bar plots and sliders are organized as in panel B. 
In addition to setting constraints with a single slider, multiple sliders can be combined for a more refined search. In this example, we require that at least 30% and at most 45% of the expression comes from blood, and likewise that at least 30% and at most 45% of the expression comes from heart. We achieve this by moving the left handles of the blood and heart sliders to 30% and their right handles to 45%. Only gene 2 satisfies these constraints. (D) Selection of liver-specific genes not expressed in brain: Bar plots and sliders are organized as in panel B. In this example, we wish to select genes that have no expression in brain but at least 50% of their expression from liver. To require that no expression originates from brain, both brain slider handles are moved to 0% (at least 0% and at most 0%). Finally, we move the left handle of the liver slider to 50%. Only gene 3 satisfies these constraints.
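The selection logic in Figure 1 can be sketched in a few lines: normalize each gene's TPM values to percentages, then keep genes whose percentage for every constrained tissue falls inside that slider's [left, right] interval. This is a minimal illustration, not SlideBase's implementation; the TPM values, the `DATA` dictionary and the `to_percent`/`select` helpers are hypothetical and chosen so that the three queries mirror panels B–D.

```python
from typing import Dict, List, Tuple

# Toy TPM values (made up for illustration; not the values plotted in Figure 1)
DATA: Dict[str, Dict[str, float]] = {
    "gene1": {"blood": 10, "brain": 180, "heart": 6, "liver": 4},
    "gene2": {"blood": 20, "brain": 5, "heart": 20, "liver": 5},
    "gene3": {"blood": 3, "brain": 0, "heart": 6, "liver": 21},
}

Interval = Tuple[float, float]  # (left handle, right handle), in percent


def to_percent(tpm: Dict[str, float]) -> Dict[str, float]:
    """Rescale one gene's TPM values so the tissue contributions sum to 100%."""
    total = sum(tpm.values())
    if total == 0:
        # A gene with no expression anywhere gets 0% in every tissue
        return {tissue: 0.0 for tissue in tpm}
    return {tissue: 100.0 * value / total for tissue, value in tpm.items()}


def select(data: Dict[str, Dict[str, float]], sliders: Dict[str, Interval]) -> List[str]:
    """Return genes whose percentage in every constrained tissue lies inside that
    tissue's [left, right] interval. Tissues absent from `sliders` are
    unconstrained, which is the same as an interval of [0, 100]."""
    selected = []
    for gene, tpm in data.items():
        pct = to_percent(tpm)
        if all(low <= pct.get(tissue, 0.0) <= high
               for tissue, (low, high) in sliders.items()):
            selected.append(gene)
    return selected


print(select(DATA, {"brain": (80, 100)}))                    # prints ['gene1']
print(select(DATA, {"blood": (30, 45), "heart": (30, 45)}))  # prints ['gene2']
print(select(DATA, {"brain": (0, 0), "liver": (50, 100)}))   # prints ['gene3']
```

Working on percentages rather than raw TPM is what makes the thresholds comparable across genes with very different absolute expression levels.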
Figure 2. Example of selection of tissue-specific enhancers. The selection of genes, proteins or miRNAs based on expression or abundance works similarly. (A) Selection of cell/tissue-specific enhancers based on sliders: Sliders for neutrophils, reticulocytes and T cells as well as whole blood are shown (out of a total of 69 cell + 41 tissue sliders; sliders that are not shown are set to the default, i.e. no constraints). The number of enhancers selected after each slider move is shown in blue boxes. Requiring a minimum expression contribution of 20% from both neutrophils and reticulocytes results in 15 enhancers. Additionally requiring ≥15% expression from blood decreases the number of enhancers to 12. Sliders also allow negative selection, i.e. permitting at most a certain amount of expression from a cell type: this is done with the right slider handle, exemplified here by permitting at most 5% T-cell expression. The 11 enhancers resulting from the overall selection are shown in panel B. (B) Detailed expression of selected enhancers. Middle: Overview of the enhancers selected in panel A. The highlighted enhancer serves as an example for the data in the left and right panels. Left: Detailed expression data across all FANTOM5 samples for each enhancer, for tissues/organs and primary cells. Right: SNP overlap and predicted promoter–enhancer associations of the selected enhancers. Note that not all data present in the web tool are shown. (C) UCSC browser views of the enhancer region highlighted in panel B. The upper panel shows the larger gene landscape, including the MAPK14 gene linked to the enhancer highlighted in panel B. The lower panel shows a zoom-in.
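The incremental narrowing in Figure 2A (each slider move shrinks the selected set, and the count is refreshed immediately) can be modelled as a panel whose sliders all start at the default interval (0, 100). This is a hypothetical sketch: the `SliderPanel` class, the sample names and the five toy enhancer profiles (already expressed as percentages) are assumptions, so the counts it prints are not the 15/12/11 reported in the figure.

```python
from typing import Dict, List, Tuple

# Toy percentage profiles (feature -> sample -> % of expression); made up.
PROFILES: Dict[str, Dict[str, float]] = {
    "e1": {"neutrophil": 40, "reticulocyte": 30, "t_cell": 2, "other": 28},
    "e2": {"neutrophil": 25, "reticulocyte": 25, "t_cell": 10, "other": 40},
    "e3": {"neutrophil": 50, "reticulocyte": 10, "t_cell": 0, "other": 40},
    "e4": {"neutrophil": 5, "reticulocyte": 60, "t_cell": 5, "other": 30},
    "e5": {"neutrophil": 30, "reticulocyte": 20, "t_cell": 5, "other": 45},
}


class SliderPanel:
    """Hypothetical model of a slider panel. Every sample starts at the
    default interval (0, 100), which imposes no constraint."""

    def __init__(self, profiles: Dict[str, Dict[str, float]]) -> None:
        self.profiles = profiles
        samples = {s for profile in profiles.values() for s in profile}
        self.sliders: Dict[str, Tuple[float, float]] = {s: (0.0, 100.0) for s in samples}

    def set_slider(self, sample: str, low: float = 0.0, high: float = 100.0) -> int:
        """Move one slider's handles and return the updated selection size."""
        if not 0.0 <= low <= high <= 100.0:
            raise ValueError("handles must satisfy 0 <= low <= high <= 100")
        self.sliders[sample] = (low, high)
        return len(self.selected())

    def selected(self) -> List[str]:
        """Features that satisfy every slider interval simultaneously."""
        return [
            feature
            for feature, pct in self.profiles.items()
            if all(low <= pct.get(s, 0.0) <= high for s, (low, high) in self.sliders.items())
        ]


panel = SliderPanel(PROFILES)
print(len(panel.selected()))                     # 5: defaults select everything
print(panel.set_slider("neutrophil", low=20))    # 4: positive selection (left handle)
print(panel.set_slider("reticulocyte", low=20))  # 3
print(panel.set_slider("t_cell", high=5))        # 2: negative selection (right handle)
print(panel.selected())                          # ['e1', 'e5']
```

Positive and negative selection are the same operation here: raising the left handle sets a minimum share, lowering the right handle sets a maximum share.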
Current datasets within SlideBase
| Expression features | # Features for selection | # Cell or tissue types | Experimental technique | Underlying database/resource |
|---|---|---|---|---|
| Gene expression | 24 602 genes | 84 tissues and primary cells | Microarray | BioGPS |
| Gene expression | 19 692 genes | 32 tissues | RNA-seq | Human Protein Atlas |
| Gene expression | 41 991 genes | 53 tissues | RNA-seq (median expression across large human cohorts) | GTEx |
| Transcription start site expression | 184 476 TSSs | 69 primary cell groups, 41 tissue groups | CAGE | FANTOM promoter atlas |
| Enhancer RNA expression | 32 693 enhancers | 69 primary cell groups, 41 tissue groups | CAGE | Human enhancer atlas |
| miRNA expression | 1857 miRNAs | 67 primary cells | sRNA-seq | Data described in de Rie |
| Protein expression | 14 578 proteins | 45 tissues | Immunohistochemical staining | Human Protein Atlas |
Figure 3. Location- and expression-based promoter search. (A) Interface for selecting CAGE-defined promoters located in a given genomic region. (B) Interface for selecting promoters located around a given gene. This interface allows selection of the set of CAGE-defined promoters that lie within a certain window around an annotated gene TSS. (C) Example of gene-based selection of promoters. Left panel: Using the gene-based search interface in panel B, the NEUROD1 gene was selected; the dropdown menu suggests official gene names matching the user input. By default, all CAGE promoters 100 kbp around the UCSC-annotated TSS of NEUROD1 are selected, which results in 20 promoters (right panel). (D) Example of combined gene-based and tissue/cell constraint-based selection. The upper left panel shows a more focused selection of CAGE promoters, obtained by constraining the analysed genomic region to ±1000 bp of the annotated NEUROD1 TSS. This results in the selection of 11 promoters (upper right panel). On top of this, we add an expression constraint using a slider, requiring that at least 75% of expression comes from neuron samples (lower left panel). This results in a subset of 6 promoters (lower right panel), compared with the 11 selected above.
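The combined search in Figure 3D is a location filter followed by a slider constraint. The sketch below is illustrative only: the `Promoter` record, the `near_tss` and `min_share` helpers, and all coordinates and percentages are made up (they are not NEUROD1 data), and treating the gene window as symmetric (±`window` bp around a single TSS coordinate) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Promoter:
    name: str
    chrom: str
    position: int  # representative TSS coordinate (hypothetical field)
    percent: Dict[str, float] = field(default_factory=dict)  # sample -> % of expression


# Made-up promoters around a made-up gene TSS at chr2:1,000,000
PROMOTERS = [
    Promoter("p1", "chr2", 1_000_200, {"neuron": 90}),
    Promoter("p2", "chr2", 999_500, {"neuron": 40}),
    Promoter("p3", "chr2", 1_050_000, {"neuron": 80}),
    Promoter("p4", "chr2", 1_000_900, {"neuron": 76}),
    Promoter("p5", "chr3", 1_000_100, {"neuron": 95}),  # different chromosome
]


def near_tss(promoters: List[Promoter], chrom: str, tss: int, window: int) -> List[Promoter]:
    """Promoters on `chrom` lying within +/- `window` bp of the gene TSS."""
    return [p for p in promoters if p.chrom == chrom and abs(p.position - tss) <= window]


def min_share(promoters: List[Promoter], sample: str, minimum: float) -> List[Promoter]:
    """Slider-style constraint: at least `minimum` % of expression from `sample`."""
    return [p for p in promoters if p.percent.get(sample, 0.0) >= minimum]


wide = near_tss(PROMOTERS, "chr2", 1_000_000, 100_000)
narrow = near_tss(PROMOTERS, "chr2", 1_000_000, 1_000)
neuronal = min_share(narrow, "neuron", 75)
print([p.name for p in wide])      # ['p1', 'p2', 'p3', 'p4']
print([p.name for p in narrow])    # ['p1', 'p2', 'p4']
print([p.name for p in neuronal])  # ['p1', 'p4']
```

Because each step only removes candidates, the expression constraint can be applied to the output of the location filter without re-scanning the whole promoter set.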
Figure 4. Dual selection of RNA and protein levels using sliders. (A) Slider-based selection of RNA and protein levels. Using data from the Human Protein Atlas, SlideBase provides pairs of sliders for RNA levels (left) and protein levels (right). Constraints can be applied simultaneously to RNA and protein levels in matched tissues, and the resulting number of genes is updated in real time. The sliders work as in Figure 1, but because of the nature of the underlying data, protein levels are divided into four fixed categories. (B) Example of output from the search constraints in panel A. RNA expression and protein abundance are shown in the left and right panels, respectively. Note that not all output is shown due to space constraints (in total, data from 32 and 45 tissues are available for RNA and protein levels, respectively).
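One way to picture the paired sliders in Figure 4A is a continuous percentage interval for RNA combined with an ordinal interval over the protein categories. This is a hypothetical sketch: the caption does not name the four categories, so the labels below ("not detected", "low", "medium", "high", as used in Human Protein Atlas annotations) are an assumption, as are the `passes`/`dual_select` helpers and the toy gene data.

```python
from typing import Dict, List, Tuple

# Assumed ordinal protein categories, lowest to highest
LEVELS = ["not detected", "low", "medium", "high"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# gene -> tissue -> (RNA % contribution, protein category); made-up values
GENES: Dict[str, Dict[str, Tuple[float, str]]] = {
    "geneA": {"liver": (60.0, "high"), "kidney": (5.0, "low")},
    "geneB": {"liver": (70.0, "low"), "kidney": (2.0, "not detected")},
    "geneC": {"liver": (10.0, "medium"), "kidney": (50.0, "high")},
}

Constraint = Tuple[Tuple[float, float], Tuple[str, str]]  # (RNA range, protein range)


def passes(rna: float, level: str, rna_range: Tuple[float, float],
           protein_range: Tuple[str, str]) -> bool:
    """True if the RNA share and the protein category both lie in their intervals."""
    rna_low, rna_high = rna_range
    p_min, p_max = protein_range
    return rna_low <= rna <= rna_high and RANK[p_min] <= RANK[level] <= RANK[p_max]


def dual_select(genes: Dict[str, Dict[str, Tuple[float, str]]],
                constraints: Dict[str, Constraint]) -> List[str]:
    """Keep a gene only if it has data for, and satisfies, every constrained tissue."""
    return [
        gene
        for gene, by_tissue in genes.items()
        if all(
            tissue in by_tissue and passes(*by_tissue[tissue], rna_range, protein_range)
            for tissue, (rna_range, protein_range) in constraints.items()
        )
    ]


print(dual_select(GENES, {"liver": ((50, 100), ("medium", "high"))}))        # ['geneA']
print(dual_select(GENES, {"liver": ((50, 100), ("not detected", "high"))}))  # ['geneA', 'geneB']
```

Ranking the categories lets a protein slider be handled exactly like an RNA slider, with handles that move over category positions instead of percentages.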