A Human "eFP" Browser for Generating Gene Expression Anatograms.

Rohan V Patel^1,2, Erin T Hamanishi^3, Nicholas J Provart^1,2.

Abstract

Transcriptomic studies help to further our understanding of gene function. Human transcriptomic studies tend to focus on a particular subset of tissue types or a particular disease state; however, it is possible to collate into a compendium multiple studies that have been profiled using the same expression analysis platform to provide an overview of gene expression levels in many different tissues or under different conditions. In order to increase the knowledge and understanding we gain from such studies, intuitive visualization of gene expression data in such a compendium can be useful. The Human eFP ("electronic Fluorescent Pictograph") Browser presented here is a tool for intuitive visualization of large human gene expression data sets on pictographic representations of the human body as gene expression "anatograms". Pictographic representations for new data sets may be generated easily. The Human eFP Browser can also serve as a portal to other gene-specific information through link-outs to various online resources.

Year: 2016 PMID: 26954504 PMCID: PMC4783024 DOI: 10.1371/journal.pone.0150982

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Global gene expression profiling studies offer an unparalleled opportunity to further our understanding of gene function. In particular, the ability to decipher when a given gene is expressed, and to what level in certain tissues and developmental stages can prove useful for human biomedical studies. It has been estimated that the human genome contains ~21,000 protein-coding genes [1], with more recent estimates putting this number even lower at ~19,000 [2]. Experimental protein-level evidence for at least 30% of the ~21,000 genes is lacking [3], leaving a sizeable void in our understanding of gene function. Gene expression profiling can help bridge this gap, by generating experimental evidence that a given gene is at least transcribed. Expression levels of human genes vary across a multitude of tissue types, developmental stages and disease states. Typically, studies have focused on a particular subset of these conditions, but “atlas”-type resources such as the Genomics Institute of the Novartis Research Foundation (GNF) Gene Expression Atlas (Su et al., 2004) that encompasses a wide variety of tissue types and disease states have also been generated. Integration of a number of independent microarray studies covering a wide variety of biological conditions is challenging but possible as long as they have been sampled using the same platform [4]. We have integrated several such studies found both in the Gene Expression Omnibus (GEO, [5]) and ArrayExpress [6]. This includes samples from the GNF Gene Expression Atlas as well as the following series: GSE475, GSE2361 [7], GSE3526 [8], GSE8961 [9], GSE4567 [10], GSE7307 [11], GSE19650 [12], E-MTAB-47 [13], E-GEOD-6257 [14], and E-MEXP-2219 [15]. In total, 774 samples from 11 different data sets have been collated. 
In addition to this, the RNA-Seq Illumina Human BodyMap 2.0 data set ([16]; Ensembl Release 70) containing 16 different samples has been added to the Human eFP Browser, showing the flexibility of this tool to enable viewing of data from different platforms (expression levels for a given gene and tissue combination are not directly comparable if generated by different platforms—a message at the top of the Illumina Body Map 2 view alerts users to this fact). Ultimately, in order to maximize the potential that gene expression studies offer, the ability to rapidly and easily interrogate these data sets is necessary. The interpretation of the gene expression level values should also occur in a coherent and user-friendly manner. Many online resources exist that enable a user to visualize gene expression levels in a data set for a given gene. Such tools include BioGPS [17], EBI Expression Atlas [18], GeneCards [19], Human Protein Atlas [20], GEO Profiles [5], TiGER [21], and Genevestigator [22]. However, these tools don’t provide biological context: outputs are bar graph or heatmap visualizations, with the name of the sample being the only, often somewhat cryptic, indication as to what kind of tissue or cell type that sample was generated from. A more informative way to visualize such data would be to show the level of expression in an anatomical sense, thus lending some context to the data. While the Expression Atlas tool at the EBI [23] does provide a representation of the human body for the Illumina Human Body Map 2.0 data set [16], where the corresponding body part is highlighted if a user moves his/her mouse over the gene expression value of interest, eye saccades and top-down processes [24] are required to actually determine to which part of the body a given expression value belongs. This user interface also fails to provide anatomical context for smaller structures within tissues. 
Here, we present a tool that enables the user to visualize large-scale human gene expression data sets directly on representations of the human body—the Human eFP Browser at http://bar.utoronto.ca/efp_human/, which is based on an open source framework developed by Winter et al. (2007). Current data sets in the Human eFP Browser were sampled on the HG-U133A and HG-U133 Plus 2 arrays (Affymetrix Inc., Santa Clara, USA), and by RNA-seq in the case of the Illumina Body Map 2 view. The user is shown diagrammatic anatomical representations that correspond to those areas of the body that were used to generate the RNA samples described above (currently categorized into five different views). The normalized gene expression data are stored on the Bio-Analytic Resource (BAR) server [25]. The user enters an Entrez gene identifier, a gene symbol, or a probe set identifier, and then chooses the mode of interpretation (absolute, relative, or compare). After clicking “Go”, the representations of human samples are coloured based on the expression level of the gene of interest, generating expression “anatograms” for rapidly determining where a given gene is most strongly expressed. A yellow-red scale is used in the “Absolute” mode to depict expression levels, with yellow denoting no expression in a given depiction of a tissue and red denoting maximal expression. “Relative” mode displays the ratio of the expression level of a given gene relative to a control level (the median expression level for that gene across all samples in a particular view). The colour scale used in this instance is yellow-red for values above the control level, and yellow-blue for values below the control level. In “Compare” mode the primary gene expression level is compared to that of the secondary gene expression level, and the colour scheme is the same as in the “Relative” mode. 
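The three display modes can be illustrated with a short Python sketch. This is not the eFP Browser's own code: the function names, the log2 scaling of ratios, the saturation threshold `max_abs_log2`, and the expression values are assumptions made for illustration. Only the colour conventions (yellow for no expression or for the control level, red for high values, blue for values below the control) and the use of the median as the control come from the description above.

```python
import math
from statistics import median

YELLOW = (255, 255, 0)


def absolute_colour(value, max_value):
    """Map an expression value onto a yellow (none) to red (maximal) scale."""
    if max_value <= 0:
        return YELLOW
    frac = min(max(value / max_value, 0.0), 1.0)
    # Yellow is (255, 255, 0) and red is (255, 0, 0): only green fades out.
    return (255, round(255 * (1.0 - frac)), 0)


def relative_colour(value, control, max_abs_log2=2.0):
    """Colour a ratio to a control: yellow at the control level,
    red above it, blue below it. The log2 scale is an assumption."""
    if value <= 0 or control <= 0:
        raise ValueError("expression values must be positive to form a log ratio")
    log_ratio = math.log2(value / control)
    frac = min(abs(log_ratio) / max_abs_log2, 1.0)
    if log_ratio >= 0:
        return (255, round(255 * (1.0 - frac)), 0)
    # Yellow (255, 255, 0) fades towards blue (0, 0, 255).
    fade = round(255 * (1.0 - frac))
    return (fade, fade, round(255 * frac))


# Hypothetical expression values for one gene in one view.
expression = {"islet cells": 3200.0, "pancreas": 800.0, "liver": 200.0}

# "Absolute" mode: scale against the highest value in the view.
top = max(expression.values())
absolute = {t: absolute_colour(v, top) for t, v in expression.items()}

# "Relative" mode: the control is the median across all samples in the view.
control = median(expression.values())
relative = {t: relative_colour(v, control) for t, v in expression.items()}

# "Compare" mode: the secondary gene's level plays the role of the control.
secondary = {"islet cells": 800.0, "pancreas": 800.0, "liver": 800.0}
compare = {t: relative_colour(expression[t], secondary[t]) for t in expression}
```

In this sketch a tissue at the control level stays yellow, and the colour saturates once the ratio reaches a four-fold change in either direction; a different threshold could be chosen without changing the idea.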
Information regarding the view with the highest level of gene expression is given near the top of the view, and information regarding probe set/gene identifiers as well as functional annotation attributed to the query gene is given at the bottom. Since gene expression data are given anatomical context, further interpretation is allowed and data become more accessible to users who may not be completely familiar with all parts of human anatomy. The Human eFP Browser is intended as a rapid and easy means for visualizing gene expression data sets to identify gene expression patterns of interest and facilitate hypothesis generation. Gene-specific link-outs are also provided to corresponding gene records in BioGPS [17], the Gene database at NCBI [26], UniProt [27], EBI, and GeneMANIA [28]. Thus the Human eFP Browser can also serve as a portal to gene-specific information. We have also worked with the curators at NCBI such that link-outs to the Human eFP Browser are available from the human Gene pages at NCBI.

Results

In order to demonstrate the utility of the Human eFP Browser, we present examples of genes whose expression patterns have been published. The first example output, shown in Fig 1, is for the insulin (INS) gene, which is expressed most highly in the pancreatic β islet cells [29]. Here, the gene symbol (“INS”) was entered, and the “Absolute” mode and the “Skeletal Immune Digestive” data source were selected. The output for this gene shows expression exclusively in the pancreas / islet cells. Any functional annotation attributed to the gene is also given (not shown). Direct links to the records for the INS gene in BioGPS, NCBI, UniProt, and EBI are provided at the top of the output.

Fig 1

Human eFP Browser output showing expression patterns of the INS gene in the “Skeletal Immune Digestive” compendium.

Strong expression of INS, as denoted by the red colour, is observed in islet cell cultures, and to a lesser extent in RNA samples generated from the whole pancreas, where these specialized cells are found.

A second example output is shown in Fig 2 and is for the SIX homeobox 3 (SIX3) gene, which is associated with developmental abnormalities in the forebrain [30]. The highest levels of gene expression are found in the putamen and nucleus accumbens. Again, additional information related to this gene as well as link-outs to other resources are provided.

Fig 2

Human eFP Browser output for the SIX homeobox 3 (SIX3) gene using the “Nervous” Data Source, showing strong levels of expression in the putamen and nucleus accumbens, as denoted by the red colouring.

The calcium/calmodulin-dependent protein kinase II beta (CAMK2B) gene is the final output example and its expression patterns are shown in Fig 3. It is involved in neuronal plasticity and synapse formation [31]. In the RNA-Seq Human eFP Browser view, highest expression levels are found in the brain and to a lesser extent in the skeletal muscle. In this view, it is also possible to view related information and link-outs to other resources.

Fig 3

Human eFP Browser output for the calcium/calmodulin-dependent protein kinase II beta (CAMK2B) gene using the “Human Body Map 2 Illumina” Data Source.

Highest expression levels are found in the brain and to a lesser extent in the skeletal muscle, as denoted by the red colouring.

Discussion

When considering global microarray or RNA-seq gene expression profiling studies, a gene’s expression levels are a useful guide to its biology. The Human eFP Browser provides users with the ability to easily visualize and rapidly interpret the results of gene expression studies in humans. While many human gene expression studies focus on a particular area of the human body, this tool enables the user to interpret gene expression levels across multiple tissue types. Moreover, for users who are less familiar with human anatomy, such expression data sets become more accessible because the data are given anatomical context, as opposed to being shown as a bar graph. To illustrate the utility of the Human eFP Browser, we chose three genes that are expected to show high levels of expression in specific tissues. INS shows highest expression in the islet cells (Fig 1), SIX3 shows highest expression in the putamen and nucleus accumbens (Fig 2), and CAMK2B shows highest expression in the brain (Fig 3). These examples show the utility of this tool for visualizing gene expression data sets (both microarray- and RNA-seq-based). At present, link-outs are provided to several common repositories for gene information in order to provide further details at the click of a mouse. Users can also access the relevant experiment records in GEO by clicking on individual tissues on the image. Additionally, on mouse-over, the tissue name and expression value (absolute, or relative with fold-change or standard deviation) are displayed. Underneath the main image, a link is provided to a table listing all sample names, expression values, fold-changes, and standard deviations, as well as a chart showing the same information. Gene-specific link-outs to entries in other databases can be found above the main image.
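The kind of summary table described above (sample name, expression value, fold-change, standard deviation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarise`, the replicate values, and the choice to report the mean of replicates are hypothetical, and the fold-change is taken relative to the median of all sample values in the view, following the definition of the control level given for the “Relative” mode.

```python
from statistics import mean, median, stdev


def summarise(samples):
    """Build table rows from a mapping of tissue name to replicate values.

    Fold-change is the tissue mean divided by the median of all sample
    values in the view (the control level used in "Relative" mode).
    """
    all_values = [v for replicates in samples.values() for v in replicates]
    control = median(all_values)
    rows = []
    for tissue, replicates in samples.items():
        average = mean(replicates)
        # A standard deviation needs at least two replicates.
        spread = stdev(replicates) if len(replicates) > 1 else 0.0
        rows.append({
            "tissue": tissue,
            "expression": average,
            "sd": spread,
            "fold_change": average / control,
        })
    # Strongest expression first, as a reader scanning the table would want.
    return sorted(rows, key=lambda row: row["expression"], reverse=True)


# Hypothetical replicate values for one gene.
samples = {
    "liver": [90.0, 110.0],
    "islet cells": [3000.0, 3400.0],
    "pancreas": [700.0, 900.0],
}
rows = summarise(samples)
for row in rows:
    print(f"{row['tissue']:12s} {row['expression']:8.1f} "
          f"{row['fold_change']:6.3f} {row['sd']:8.2f}")
```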
In the future, as more human gene expression experiments are conducted, we envisage adding further data sets and views to this tool, including those that have been profiled on other platforms. Current and future activities involve adding further developmental data sets, as well as disease data sets (e.g. cancer gene expression studies), to the Human eFP Browser. In this way, the Human eFP Browser can become a comprehensive resource for visualization and interpretation of human gene expression data and an aggregator of link-outs to various other resources. We encourage any researcher to contact us with ideas for specific views.

Materials and Methods

A number of human microarray data sets are represented within the Human eFP Browser. From GEO, the following data sets are represented: GSE1133, GSE475, GSE2361, GSE3526, GSE8961, GSE4567, GSE7307, and GSE19650. Other data sets are from ArrayExpress: E-MTAB-47, E-GEOD-6257, and E-MEXP-2219. All microarray data sets were normalized in R/Bioconductor using the MAS 5 method with a target value of 100 with the following commands:

    # Load affy package
    library(affy)
    # Set working directory to directory containing the data you wish to normalize
    setwd("[FULL PATH TO DIRECTORY CONTAINING THE DATA]")
    # Invoke ReadAffy to define specific cdf
    GSE35261 <- ReadAffy(cdfname = "hgu133acdf")
    # MAS 5 normalize the data with a tgt value of 100, and the defined cdf file
    GSE35261Norm <- mas5(GSE35261, sc = 100)
    # Write the data to a csv file
    write.exprs(GSE35261Norm, file = "GSE35261Norm_tgt100.csv")

The RNA-Seq FPKM processed data set was processed by Eric Minikel of cureFFI.org (http://www.cureffi.org/2013/07/11/tissue-specific-gene-expression-data-based-on-human-bodymap-2-0/). The processing by Eric Minikel prior to our download was as follows: Ensembl BAM files were downloaded. Cufflinks was used to summarize expression levels as FPKM values. Only known transcripts were called.

The Human eFP Browser is implemented in Python, and inputs include a Targa-based image, XML control file, gene identifier to microarray probe set lookup and annotation databases, and a gene expression database for the given samples. These components work together to produce an output image, as described in Winter et al. (2007). The eFP Browser open source code is available at http://sourceforge.net/projects/efpbrowser/ and original expression data may be downloaded from GEO or ArrayExpress using the accession numbers listed above. Processed data are at https://github.com/asherpasha/eFP_Human_Databases under the DOI of 10.5281/zenodo.45940.
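The effect of the target value of 100 in the MAS 5 normalization can be illustrated with a minimal Python sketch. It covers only the final scaling step, not the probe-level signal calculation, and it rests on an assumption about how the scaling works: each array is multiplied by a single factor so that its trimmed mean (2% removed from each end) equals the target. The function names and the input values are hypothetical.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping a fraction `trim` of values from each end."""
    ordered = sorted(values)
    drop = int(len(ordered) * trim)  # number of values removed per end
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


def scale_to_target(signals, target=100.0, trim=0.02):
    """Rescale one array so that its trimmed mean equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [signal * factor for signal in signals]


# Hypothetical signal values for one array: 1, 2, ..., 100.
raw = [float(i) for i in range(1, 101)]
scaled = scale_to_target(raw, target=100.0)

print(trimmed_mean(raw))     # trimmed mean before scaling
print(trimmed_mean(scaled))  # trimmed mean after scaling, close to the target
```

Because every array ends up with the same trimmed mean, expression values from different arrays processed this way share a common scale, which is what makes a compendium of independent studies on the same platform usable.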

27 in total

1. Towards a knowledge-based Human Protein Atlas.

Authors: Mathias Uhlen; Per Oksvold; Linn Fagerberg; Emma Lundberg; Kalle Jonasson; Mattias Forsberg; Martin Zwahlen; Caroline Kampf; Kenneth Wester; Sophia Hober; Henrik Wernerus; Lisa Björling; Fredrik Ponten
Journal: Nat Biotechnol Date: 2010-12 Impact factor: 54.908

2. Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues.

Authors: Xijin Ge; Shogo Yamamoto; Shuichi Tsutsumi; Yutaka Midorikawa; Sigeo Ihara; San Ming Wang; Hiroyuki Aburatani
Journal: Genomics Date: 2005-08 Impact factor: 5.736

3. Genome-wide profile of pleural mesothelioma versus parietal and visceral pleura: the emerging gene portrait of the mesothelioma phenotype.

Authors: Oluf Dimitri Røe; Endre Anderssen; Eli Helge; Caroline Hild Pettersen; Karina Standahl Olsen; Helmut Sandeck; Rune Haaverstad; Steinar Lundgren; Erik Larsson
Journal: PLoS One Date: 2009-08-07 Impact factor: 3.240

4. GeneCards Version 3: the human gene integrator.

Authors: Marilyn Safran; Irina Dalah; Justin Alexander; Naomi Rosen; Tsippi Iny Stein; Michael Shmoish; Noam Nativ; Iris Bahir; Tirza Doniger; Hagit Krug; Alexandra Sirota-Madi; Tsviya Olender; Yaron Golan; Gil Stelzer; Arye Harel; Doron Lancet
Journal: Database (Oxford) Date: 2010-08-05 Impact factor: 3.451

5. Identification of human metapneumovirus-induced gene networks in airway epithelial cells by microarray analysis.

Authors: X Bao; M Sinha; T Liu; C Hong; B A Luxon; R P Garofalo; A Casola
Journal: Virology Date: 2008-01-29 Impact factor: 3.616

6. The human insulin gene is part of a large open chromatin domain specific for human islets.

Authors: Vesco Mutskov; Gary Felsenfeld
Journal: Proc Natl Acad Sci U S A Date: 2009-09-28 Impact factor: 11.205

7. BioGPS and MyGene.info: organizing online, gene-centric information.

Authors: Chunlei Wu; Ian Macleod; Andrew I Su
Journal: Nucleic Acids Res Date: 2012-11-21 Impact factor: 16.971

8. TiGER: a database for tissue-specific gene expression and regulation.

Authors: Xiong Liu; Xueping Yu; Donald J Zack; Heng Zhu; Jiang Qian
Journal: BMC Bioinformatics Date: 2008-06-09 Impact factor: 3.169

9. Expression Atlas update--a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments.

Authors: Robert Petryszak; Tony Burdett; Benedetto Fiorelli; Nuno A Fonseca; Mar Gonzalez-Porta; Emma Hastings; Wolfgang Huber; Simon Jupp; Maria Keays; Nataliya Kryvych; Julie McMurry; John C Marioni; James Malone; Karine Megy; Gabriella Rustici; Amy Y Tang; Jan Taubert; Eleanor Williams; Oliver Mannion; Helen E Parkinson; Alvis Brazma
Journal: Nucleic Acids Res Date: 2013-12-04 Impact factor: 16.971

10. Transcriptional landscape of the prenatal human brain.

Authors: Jeremy A Miller; Song-Lin Ding; Susan M Sunkin; Kimberly A Smith; Lydia Ng; Aaron Szafer; Amanda Ebbert; Zackery L Riley; Joshua J Royall; Kaylynn Aiona; James M Arnold; Crissa Bennet; Darren Bertagnolli; Krissy Brouner; Stephanie Butler; Shiella Caldejon; Anita Carey; Christine Cuhaciyan; Rachel A Dalley; Nick Dee; Tim A Dolbeare; Benjamin A C Facer; David Feng; Tim P Fliss; Garrett Gee; Jeff Goldy; Lindsey Gourley; Benjamin W Gregor; Guangyu Gu; Robert E Howard; Jayson M Jochim; Chihchau L Kuan; Christopher Lau; Chang-Kyu Lee; Felix Lee; Tracy A Lemon; Phil Lesnar; Bergen McMurray; Naveed Mastan; Nerick Mosqueda; Theresa Naluai-Cecchini; Nhan-Kiet Ngo; Julie Nyhus; Aaron Oldre; Eric Olson; Jody Parente; Patrick D Parker; Sheana E Parry; Allison Stevens; Mihovil Pletikos; Melissa Reding; Kate Roll; David Sandman; Melaine Sarreal; Sheila Shapouri; Nadiya V Shapovalova; Elaine H Shen; Nathan Sjoquist; Clifford R Slaughterbeck; Michael Smith; Andy J Sodt; Derric Williams; Lilla Zöllei; Bruce Fischl; Mark B Gerstein; Daniel H Geschwind; Ian A Glass; Michael J Hawrylycz; Robert F Hevner; Hao Huang; Allan R Jones; James A Knowles; Pat Levitt; John W Phillips; Nenad Sestan; Paul Wohnoutka; Chinh Dang; Amy Bernard; John G Hohmann; Ed S Lein
Journal: Nature Date: 2014-04-02 Impact factor: 49.962
