
g:Profiler--a web server for functional interpretation of gene lists (2011 update).

Abstract

Functional interpretation of candidate gene lists is an essential task in modern biomedical research. Here, we present the 2011 update of g:Profiler (http://biit.cs.ut.ee/gprofiler/), a popular collection of web tools for functional analysis. g:GOSt and g:Cocoa combine comprehensive methods for interpreting gene lists, ordered lists and list collections in the context of biomedical ontologies, pathways, transcription factor and microRNA regulatory motifs and protein-protein interactions. Additional tools, namely the biomolecule ID mapping service (g:Convert), gene expression similarity searcher (g:Sorter) and gene homology searcher (g:Orth) provide numerous ways for further analysis and interpretation. In this update, we have implemented several features of interest to the community: (i) functional analysis of single nucleotide polymorphisms and other DNA polymorphisms is supported by chromosomal queries; (ii) network analysis identifies enriched protein-protein interaction modules in gene lists; (iii) functional analysis covers human disease genes; and (iv) improved statistics and filtering provide more concise results. g:Profiler is a regularly updated resource that is available for a wide range of species, including mammals, plants, fungi and insects.


Substances:
Multiprotein Complexes

Year: 2011 PMID: 21646343 PMCID: PMC3125778 DOI: 10.1093/nar/gkr378

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

High-throughput technologies continue to transform biological and medical research. A decade after finalizing the human genome sequence (1), orchestrated efforts focus on exploring human variation and history (2,3), exhaustive identification of functional DNA elements (4), deciphering complex disease traits (5) and cancer (6), among others. Modern researchers routinely apply high-throughput tools like next-generation sequencing to unravel secrets of genomics, transcriptomics and proteomics and explain life, disease and death at the molecular level. As biotechnologies become exponentially more efficient and cost-effective, promising perspectives of personalized genomics and medicine emerge. A common denominator of current high-throughput techniques is the abundance of data resulting from any single experiment. Thousands or millions of reads from genome or proteome-wide assays often highlight many hundreds of genes and proteins that are particularly informative of the experiment. Meaningful biological interpretation of these results is a crucial step. Functional profiling combines computational and statistical methods to link collections of biomolecules (genes, transcripts, proteins) with known, statistically significant functional features. Functional profiling is an active area of research and numerous bioinformatics tools allow researchers to interpret their gene lists [e.g. (7–13), see Huang et al. (14) for a review]. Here, we are proud to present an update note of g:Profiler (15), a web server for functional interpretation and integration of gene lists in the context of versatile biological evidence. We combine rapid algorithms and rich visualization for interpreting gene lists, ranked lists and gene list collections. Our automated interpretative analysis takes advantage of systematized knowledge about gene and protein function, including Gene Ontology (GO) (16), Human Phenotype Ontology (HPO) (17) and biological pathway databases KEGG (18) and Reactome (19). 
We also cover experimental data like protein–protein interactions from BioGrid (20) and target sites of microRNAs [MicroCosm database (21)] and transcription factors [Transfac (22)]. Even more information can be retrieved from a large collection of microarrays from ArrayExpress (23) via gene coexpression searches, and from dozens of related organisms via gene homology searches. Frequent data updates from multispecies databases of Ensembl (24,25) and GOA (26) guarantee stable and consistent gene annotations. g:Profiler is available for 85 organisms, including common model organisms, mammals, plants, insects and fungi.

g:Profiler WEB SERVER

The g:Profiler web server comprises a collection of five closely integrated tools that perform functional profiling (g:GOSt, g:Cocoa), biomolecule integration tasks (g:Convert) and genomic data mining (g:Sorter, g:Orth). Each tool is briefly described below. A summary of existing and novel g:Profiler features can be found in Table 1.

Table 1.

Summary of g:Profiler features in 2007 and novel additions in 2011

Feature	g:Profiler 2007	Added in g:Profiler 2011
Supported species	31 species (Ensembl)	85 species (Ensembl and Ensembl Genomes): mammals, fungi, insects, plants, etc.
Biological evidence considered	Gene Ontology; pathway databases (KEGG, Reactome); transcription factor motifs (Transfac); gene expression similarity search (GEO)	microRNA target sites (MicroCosm); disease genes (HPO); protein–protein interactions (BioGrid); gene expression similarity search (ArrayExpress)
Input for enrichment analysis	Gene sets; ordered gene lists; functional groups	Chromosomal regions; multiple gene lists or regions
Methods	Hypergeometric test; multiple testing correction (g:SCS, FDR, Bonferroni)	Customized background set; enrichment of interaction network modules
Tools	g:GOSt: gene group functional profiling; g:Convert: gene ID converter; g:Sorter: expression similarity search; g:Orth: orthology search	g:Cocoa: compact comparison of gene annotations


g:GOSt—gene group functional profiling

g:GOSt, the core of the g:Profiler, performs statistical enrichment analysis to provide interpretation to user-provided gene lists (Figure 1A). We study multiple sources of functional evidence, including Gene Ontology terms, biological pathways and regulatory motifs for transcription factors. In the 2011 update of g:Profiler, we added support for microRNA motifs, human disease annotations and protein–protein interactions (see below).

Figure 1.

g:Profiler tools (A–E) and data streams (arrows) for constructing analysis pipelines. (A) g:GOSt: functional profiling of gene lists, with sources of input shown on the left [also see (F)]. Output: genes are shown horizontally and annotations vertically, colours denote classes of functional evidence. (B) g:Cocoa: functional profiling of multiple gene lists. Output: gene lists are shown horizontally and annotations vertically, colours (intensity of red) denote strength of enrichment. (C) g:Convert: ID mapping service for many kinds of molecules and databases. (D) g:Sorter: gene-based expression similarity search from microarrays. (E) g:Orth: mapping homologous genes of related species. (F) Identification of enriched protein–protein interaction modules in gene lists. A fragment of g:GOSt output includes a module of interacting query genes (core, red) and non-query genes interacting with the core module (neighbourhood, black).

Analysis of gene lists and significance estimation is carried out in our rapid C++ library. g:Profiler uses the widely applied hypergeometric distribution for significance estimation, and combines it with our custom multiple testing correction procedure g:SCS described earlier. This method is designed to eliminate false positives while considering the unevenly distributed structure of functional annotations, as confirmed in our simulations (15). The standard alternatives, false discovery rate (FDR) and Bonferroni correction, are also available. The main input of the tool is a mixed list of genes or proteins, and the user may choose to treat the list as unordered or ordered. The user may also insert a GO or pathway ID to retrieve its annotated genes, or specify chromosomal coordinates for genes in a region (see below). In case of an unordered gene list, a single enrichment analysis identifies all over-represented functional terms.
The ordered option is useful when the genes are placed in some biologically meaningful order, for instance according to the differential expression in a given microarray experiment. g:Profiler then performs incremental enrichment analysis with increasingly larger numbers of genes from the top of the list (15). This optimization procedure identifies specific functional terms that associate to most dramatic changes in gene expression, as well as broader terms that characterize the gene set as a whole. The main output of g:GOSt is a tabular graphical representation of detected functional enrichments in rows, input genes in columns and gene annotations in table cells. The visualization includes a broad spectrum of information for further consideration, including enrichment P-values, gene-to-term mappings, colour-coded evidence codes, hierarchy of enriched functional terms and peak enrichments in ordered gene lists. As a new feature, we visualize protein–protein interaction modules that are enriched in input gene lists (see below). Direct links with other g:Profiler tools, simple programmable interfaces and alternative output formats (textual, spreadsheet) allow smooth creation of analysis pipelines.
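
The hypergeometric test and the incremental analysis of ordered lists can be sketched as follows. This is only a minimal illustration in Python, not the C++ library used by g:Profiler: the function names `hypergeom_tail` and `best_prefix` are hypothetical, the example genes are invented, and no multiple testing correction (g:SCS, FDR or Bonferroni) is applied to the prefix P-values.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a query of
    n genes, when K of the N background genes carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def best_prefix(ordered_genes, term_genes, background_size):
    """Test every prefix of a ranked list and return (p_value, prefix_length)
    for the prefix with the strongest enrichment of one functional term."""
    term_genes = set(term_genes)
    best_p, best_n, found = 1.0, 0, 0
    for n, gene in enumerate(ordered_genes, start=1):
        found += gene in term_genes  # True counts as 1
        p = hypergeom_tail(found, n, len(term_genes), background_size)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Invented example: three annotated genes lead a ranked list of six,
# drawn from a background of 20 genes.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
p_value, cutoff = best_prefix(ranked, {"g1", "g2", "g3"}, background_size=20)
print(f"peak enrichment at top {cutoff} genes, P = {p_value:.2e}")
```

Treating the list as unordered corresponds to a single call of `hypergeom_tail` on the whole list; the ordered mode scans prefixes and reports the point of peak enrichment.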

g:Cocoa—compact comparison of annotations (new in 2011)

In the 2011 g:Profiler update, we have added a new tool to the web server. g:Cocoa stands for ‘COmpact COmparison of gene Annotations’ and performs comparative analysis of multiple gene lists (Figure 1B). The input of g:Cocoa is presented in a multiline format that resembles FASTA, allowing users to construct and name multiqueries easily. Its output is a tabular graphic similar to g:GOSt, except that each column stands for a gene list rather than a single gene, and cell colours highlight significant enrichments. The compact output of g:Cocoa allows a condensed and minimal comparison of functional enrichments in dozens of gene lists. In addition to enrichment analysis, g:Cocoa performs two further tasks for gene list comparison. First, one may use the tool to identify most meaningful gene lists in a multilist query. In that case, g:Cocoa evaluates the total statistical significance across all functional categories and orders the results accordingly. Second, one may use the tool to evaluate the functional similarity between gene lists. In that case, g:Cocoa compares the first list and all others, and orders the lists according to similarity. Total significance is quantified with log-sums of P-values, and the Euclidean distance over functional enrichments is used for similarity estimation. Such analyses have useful applications, for instance in comparing the performance of several bioinformatics algorithms in retrieving biologically meaningful results (27).
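
The two ranking modes can be illustrated with a short Python sketch. The text only states that log-sums of P-values and a Euclidean distance are used, so the details here are assumptions: P-values are transformed with -log10, a term missing from a list is treated as P = 1, and the names `total_significance` and `enrichment_distance` as well as all term names and P-values are hypothetical.

```python
from math import log10, sqrt


def total_significance(enrichments):
    """Sum of -log10(P) over all enriched terms of one gene list."""
    return sum(-log10(p) for p in enrichments.values())


def enrichment_distance(a, b):
    """Euclidean distance between two lists in -log10(P) space.
    Terms absent from a list are assumed to have P = 1 (no enrichment)."""
    terms = set(a) | set(b)
    return sqrt(sum((log10(a.get(t, 1.0)) - log10(b.get(t, 1.0))) ** 2
                    for t in terms))


# Invented enrichment results: term -> P-value, one dict per gene list.
gene_lists = {
    "list_A": {"term1": 1e-5, "term2": 1e-2},
    "list_B": {"term1": 1e-3},
    "list_C": {"term2": 1e-2, "term3": 1e-1},
}

# Mode 1: order lists by total significance across all categories.
by_significance = sorted(
    gene_lists, key=lambda name: total_significance(gene_lists[name]), reverse=True
)

# Mode 2: order the remaining lists by similarity to the first list.
reference = "list_A"
by_similarity = sorted(
    (name for name in gene_lists if name != reference),
    key=lambda name: enrichment_distance(gene_lists[reference], gene_lists[name]),
)
print(by_significance, by_similarity)
```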

g:Convert—gene ID converter

g:Convert is a gene identifier conversion tool (Figure 1C). It uses information from Ensembl databases (24,25) to handle hundreds of types of IDs for genes, proteins, transcripts, microarray probesets for many species, experimental platforms and biological databases. The most important feature of g:Convert is flexibility—it accepts a mixed list of genes (identifiers, names, accessions) and recognizes their types automatically. Besides providing a front-end service for scientists through tabulated, textual and spreadsheet outputs, g:Convert is a unified interface for all g:Profiler tools and several others. g:Convert accepts a gene list as input, and provides a table with converted identifiers, gene names and short descriptions as output. Our ID mapping strategy describes all biomolecules as triplets of Ensembl genes, transcripts and proteins, and stores these in a bidirectional hashtable-based index for fast performance. As a new feature in 2011, g:Convert and other g:Profiler tools unambiguously support several classes of numeric identifiers, notably EntrezGene accessions and Affymetrix Exon Array probesets. The relevance of g:Convert to the community is illustrated by the fact that ∼30% of all web queries to g:Profiler involve g:Convert.
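
A bidirectional, hashtable-based index over gene, transcript and protein triplets might look roughly like the sketch below. It is an illustration only: the class `IdIndex`, its methods, the case-insensitive matching and all identifiers in the example are assumptions, not the actual g:Convert implementation or real database accessions.

```python
from collections import defaultdict


class IdIndex:
    """Maps any known identifier to (gene, transcript, protein) triplets
    and each triplet back to all of its identifiers."""

    def __init__(self):
        self._by_alias = defaultdict(set)    # identifier -> triplets
        self._by_triplet = defaultdict(set)  # triplet -> identifiers

    def add(self, gene, transcript, protein, aliases=()):
        triplet = (gene, transcript, protein)
        for alias in (gene, transcript, protein, *aliases):
            # Assumption: identifiers are matched case-insensitively.
            self._by_alias[alias.upper()].add(triplet)
            self._by_triplet[triplet].add(alias)

    def genes(self, query):
        """Gene IDs reachable from an identifier of any type."""
        triplets = self._by_alias.get(query.upper(), ())
        return sorted({gene for gene, _, _ in triplets})

    def aliases(self, query):
        """All identifiers that share a triplet with the query."""
        triplets = self._by_alias.get(query.upper(), ())
        return sorted({a for t in triplets for a in self._by_triplet[t]})


# Invented identifiers, including a purely numeric one.
index = IdIndex()
index.add("GENE1", "TX1", "PROT1", aliases=["abc1", "1234"])
index.add("GENE1", "TX2", "PROT2", aliases=["abc1"])
index.add("GENE2", "TX3", "PROT3", aliases=["xyz9"])

mixed_query = ["abc1", "PROT3", "1234", "unknown"]
converted = {q: index.genes(q) for q in mixed_query}
print(converted)
```

Because every identifier type lives in the same table, a mixed input list needs no type declaration; each entry is simply looked up.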

g:Sorter—expression similarity search

g:Sorter is a search tool for gene expression profiles (Figure 1D). It allows users to find similar gene expression profiles to their gene of interest in a large collection of public microarray datasets from ArrayExpress (23). g:Sorter is conceptually similar to our Multi Experiment Matrix (MEM) software that performs global coexpression search on large microarray collections (28). g:Sorter supports several commonly used distance measures and directions for expression similarity assessment on collections of microarrays from a single experiment. For instance, one may use positive Pearson correlation to find coexpressed genes or anticorrelation to discover genes with reverse regulatory modulation. g:Sorter analysis has been applied successfully in a number of occasions, for instance in uncovering the regulatory network of lysosomal biogenesis (29). The main input of g:Sorter is a gene or a protein of interest (the query gene) and a microarray dataset ID from a large collection. g:Sorter links the query gene to microarray probesets and performs similarity searches. Its output features one or more gene lists that correspond to the identified coexpression or antiexpression patterns. In the 2011 g:Profiler update, we have added a convenient keyword-based search for finding relevant microarray datasets. Our search database now contains >4000 microarray experiments for 16 species.
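
As a minimal sketch of a correlation-based similarity search, the Python example below ranks genes by Pearson correlation with a query profile, in either direction. The function names and the toy expression values are hypothetical, and g:Sorter's other distance measures are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def rank_by_correlation(query, profiles, anticorrelated=False):
    """Rank all other genes by correlation with the query gene.
    With anticorrelated=True, the most negatively correlated come first."""
    scores = {
        gene: pearson(profiles[query], values)
        for gene, values in profiles.items()
        if gene != query
    }
    return sorted(scores.items(), key=lambda item: item[1],
                  reverse=not anticorrelated)


# Invented profiles over four arrays of a single experiment.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
coexpressed = rank_by_correlation("query", profiles)
reversed_pattern = rank_by_correlation("query", profiles, anticorrelated=True)
print(coexpressed[0], reversed_pattern[0])
```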

g:Orth—orthology search

g:Orth is a tool for mapping homologous genes across related organisms (Figure 1E). The input of the tool is a list of genes of an organism. Given a selected target organism, g:Orth retrieves the genes of the target that are similar in sequence to the initial genes. g:Orth handles one-to-one and more complex orthology mappings using data from Ensembl and Ensembl Genomes. The orthology search in g:Orth has several applications in functional analysis. For instance, researchers of model organisms may retrieve human homologs of genes of interest, to study their hypotheses in the context of human health (30).

NEW DEVELOPMENTS IN g:Profiler IN 2011

Enriched protein–protein interaction modules in gene lists

g:Profiler now allows interpretation of gene lists in the context of protein–protein interaction networks (Figure 1F). We take advantage of a novel statistical strategy for identifying significant modules of interactions (subnetworks) in gene lists. Our strategy comprises a set of ‘core’ genes and a set of ‘neighbourhood’ genes. The core set includes all genes in the original input list that are connected by an interaction. The latter set includes all immediate interaction partners of the core set that do not belong to the original input. We use the hypergeometric test to decide if the input list contains significantly more interacting genes than expected, in contrast to the surrounding network neighbourhood. Intuitively, this strategy captures interaction modules (cores) in which a significant number of genes is present in the input gene list. g:Profiler also includes an extension of the above strategy for ordered gene lists. In this case, multiple cores and neighbourhoods are considered similarly to our incremental enrichment analysis, and the core with the lowest P-value is reported as the final result. g:Profiler performs subnetwork enrichment analysis on single gene lists (g:GOSt) as well as multiple gene lists (g:Cocoa). We currently cover >300 000 protein–protein interactions from the Biogrid database (20), for seven species including human, mouse, fly and yeast. We provide a visual representation of the enriched subnetwork and its neighbourhood, and show related genes as text for further processing. Such analysis has several useful applications. For instance, we have studied yeast protein modules and physical complexes that are dysregulated in transcription factor knockout mutants (31). An example of such analysis is shown below (Case Study 2, Supplementary Figure S1).
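
The core and neighbourhood sets can be derived from an interaction list as in the following Python sketch. It assumes that a query gene belongs to the core when it interacts with at least one other query gene and that self-interactions are ignored; the function name and the toy network are hypothetical. The exact parameters of the subsequent hypergeometric test are not specified in the text above, so the test itself is left out.

```python
def core_and_neighbourhood(query_genes, interactions):
    """Split an interaction network around a query list into
    core: query genes interacting with another query gene, and
    neighbourhood: non-query genes directly interacting with the core."""
    query = set(query_genes)

    # Build an undirected adjacency map from the interaction pairs.
    partners = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    core = {
        gene for gene in query
        if (partners.get(gene, set()) - {gene}) & query
    }
    neighbourhood = set()
    for gene in core:
        neighbourhood |= partners[gene] - query
    return core, neighbourhood


# Invented network: A, B and C form a module; D only touches a non-query gene.
interactions = [("A", "B"), ("B", "C"), ("C", "X"),
                ("A", "Y"), ("D", "Z"), ("X", "W")]
core, neighbourhood = core_and_neighbourhood(["A", "B", "C", "D"], interactions)
print(sorted(core), sorted(neighbourhood))
```

In this example D is excluded from the core because none of its partners is in the query, and W is excluded from the neighbourhood because it is two steps away from the core.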

Enrichment analysis of disease genes in human and model organisms

The g:Profiler 2011 update provides means to interpret gene lists in the context of human disease. We have imported gene annotations from the HPO, a standardized vocabulary of phenotypic abnormalities encountered in human disease (17). HPO contains information from several important sources, notably including the Online Mendelian Inheritance in Man (OMIM) (32). Below we present an example analysis that takes advantage of these data for interpreting evolution in contemporary humans. Due to ethical constraints, a significant portion of research on human disease is conducted in model organisms like mouse and rat. As HPO currently describes only human genes, we have extended these annotations to model organisms using sequence homology information from Ensembl. Sequence similarity alone is not sufficient for reliable inference of translational disease models. However, we believe that the detailed resources of HPO, combined with stronger sources of evidence such as curated pathways, provide useful pointers for interpreting gene lists of the many organisms where detailed knowledge of disease and function is still lacking.

Functional analysis of single nucleotide polymorphisms and chromosomal regions

The 2011 update of g:Profiler allows direct functional analysis of single nucleotides and chromosomal regions. Users may submit entire collections of chromosomal coordinates or ranges, and g:Profiler automatically retrieves the genes that are located in the corresponding regions. Depending on the tool, the genes will be subjected to functional profiling (g:GOSt), orthology search (g:Orth) or gene ID mapping (g:Convert). Furthermore, g:Cocoa provides a special format to insert and study multiple collections of chromosomal regions. Such an analysis has numerous useful applications. One may explore sequence variations like single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and chromosomal rearrangements that are responsible for human variation and history, complex disease traits and cancer, among others. These data are rapidly accumulated by genome sequencing efforts (2,3), genome-wide association studies (5) and cancer genomics projects (6). Below we present a case study to investigate the functional significance of genomic regions that have undergone positive selection in contemporary humans (Case Study 1, Figure 2).
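
Mapping chromosomal coordinates to genes amounts to an interval overlap query, sketched below in Python. The function name, the coordinate representation and all gene positions are invented for illustration; a single nucleotide is simply a region whose start equals its end. How g:Profiler stores and queries gene coordinates internally is not described here.

```python
def genes_in_regions(regions, gene_coords):
    """Return the names of genes that overlap any query region.

    regions:     list of (chromosome, start, end), inclusive coordinates
    gene_coords: dict of gene name -> (chromosome, start, end)
    """
    found = []
    for gene, (chrom, start, end) in gene_coords.items():
        for r_chrom, r_start, r_end in regions:
            # Two inclusive intervals overlap if each starts before the other ends.
            if chrom == r_chrom and start <= r_end and r_start <= end:
                found.append(gene)
                break
    return found


# Invented gene positions on two chromosomes.
gene_coords = {
    "geneA": ("1", 100, 500),
    "geneB": ("1", 900, 1500),
    "geneC": ("2", 100, 500),
}

# A single-nucleotide position and a longer range, both on chromosome 1.
query_regions = [("1", 450, 450), ("1", 1400, 2000)]
region_genes = genes_in_regions(query_regions, gene_coords)
print(region_genes)
```

The resulting gene list can then be treated like any other input list for enrichment analysis.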

Figure 2.

Functional analysis of positively selected regions in the modern human genome versus the Neanderthal genome. Input genomic loci were grouped by chromosome and analysed with g:Cocoa. R was used for visualization. Enriched annotations are shown vertically and chromosomes horizontally. Coloured cells indicate statistically significant enrichments of corresponding functions (P < 0.05, red tones represent greater significance). Annotation axis labels are grouped and coloured according to data source. To avoid redundant annotations, only the most significant function of any hierarchically related group is shown.

Custom statistical background for enrichment detection

In this g:Profiler update, we have improved our statistical methods and added a feature for customizing the gene background in g:GOSt and g:Cocoa. Enrichment analysis involves a background set that determines the expected proportions of function-associated and non-associated genes in the input list. The default background usually corresponds to all known genes of a genome. However, such an approach may give misleading results when the user is initially focusing on a narrower set of genes. Relevant examples include microarrays with tissue-specific genes, or studies involving specific genes like transcription factors. While a default analysis of such genes would inevitably recover annotations like liver development or transcriptional regulation, a customized background set of initial genes would help reduce global biases and uncover more specific annotations. An example illustrating customized backgrounds is shown below (Case Study 3, Figure 3).

Figure 3.

Comparison of standard, global background gene list (left plot) and customized list (right plot) in functional enrichment analysis of nine core cell cycle transcription factors in yeast. Top 10 enriched categories are shown vertically and log-scale P-values horizontally. R was used for visualization. The global analysis reveals general functional enrichments of transcriptional regulation (black bars), while focused analysis with the custom background of all yeast TFs shows specific cell cycle-related terms (grey bars).

g:Profiler users may submit their backgrounds as mixed lists of gene IDs. Besides user-defined backgrounds, a few predefined backgrounds are available for precise analysis of most popular microarray platforms. Analysis with a custom background automatically discards unrelated genes and annotations to provide unbiased statistics.
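
The effect of the background on the hypergeometric P-value can be seen in a small worked example in Python. The background sizes (about 6000 yeast genes, about 300 transcription factors) and the query size of nine follow the case study in this article; the annotation counts are invented for illustration (a generic term carried by all 300 TFs and a specific term carried by 12 TFs, 8 of them in the query), and `hypergeom_tail` is a hypothetical helper, not g:Profiler code.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) for a query of n genes from a background of N genes,
    K of which carry the annotation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


QUERY_SIZE = 9  # nine transcription factors in the query

# Generic term: assumed to annotate all 300 TFs, so all 9 query genes carry it.
generic_global = hypergeom_tail(9, QUERY_SIZE, 300, 6000)
generic_custom = hypergeom_tail(9, QUERY_SIZE, 300, 300)

# Specific term: assumed to annotate 12 TFs, 8 of which are in the query.
specific_global = hypergeom_tail(8, QUERY_SIZE, 12, 6000)
specific_custom = hypergeom_tail(8, QUERY_SIZE, 12, 300)

print(f"generic term:  global P = {generic_global:.1e}, custom P = {generic_custom:.1e}")
print(f"specific term: global P = {specific_global:.1e}, custom P = {specific_custom:.1e}")
```

Against the TF-only background the generic term is no longer surprising (every background gene carries it), while the specific term remains strongly enriched under these assumed counts.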

Filtering of GO electronic annotations

Since the 2011 update of g:Profiler, we support filtering of GO annotations according to the quality of supporting evidence. A significant proportion of GO annotations result from automated analysis and are assigned the IEA evidence code (Inferred from Electronic Annotation). This concerns ∼36% of all functional annotations in current g:Profiler, and this proportion is even more dramatic in human data (77% of 2.7 million annotations). While IEA annotations are valuable in mapping gene functions, manual curation of experimental and computational data is generally of higher confidence. Therefore, it is sometimes advisable to exclude electronically inferred annotations from enrichment analysis and focus on stronger evidence. Excluding IEA annotations may also help reduce bias towards abundant and ubiquitous housekeeping genes such as those encoding ribosomal proteins. The IEA filter is enabled in g:GOSt and g:Cocoa via a single checkbox, and corresponding enrichment analyses account for the altered structure of GO annotations.
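
A filter of this kind can be sketched in a few lines of Python. The annotation triples, gene names and term names below are invented, and the functions `filter_iea` and `term_sizes` are hypothetical; the example only shows why term sizes, and therefore the inputs to the enrichment test, change once IEA annotations are removed.

```python
def filter_iea(annotations, keep_iea=False):
    """Drop annotations whose evidence code is IEA unless keep_iea is set.
    Each annotation is a (gene, term, evidence_code) tuple."""
    if keep_iea:
        return list(annotations)
    return [a for a in annotations if a[2] != "IEA"]


def term_sizes(annotations):
    """Number of distinct genes annotated to each term."""
    genes_per_term = {}
    for gene, term, _code in annotations:
        genes_per_term.setdefault(term, set()).add(gene)
    return {term: len(genes) for term, genes in genes_per_term.items()}


# Invented annotations mixing electronic (IEA) and curated evidence codes.
annotations = [
    ("g1", "term_x", "IDA"),
    ("g2", "term_x", "IEA"),
    ("g3", "term_x", "IEA"),
    ("g1", "term_y", "IEA"),
    ("g2", "term_y", "TAS"),
]

sizes_all = term_sizes(annotations)
sizes_curated = term_sizes(filter_iea(annotations))
print(sizes_all, sizes_curated)
```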

g:Profiler for 85 species

The 2011 version of g:Profiler contains data for 85 species and this number is increasing with nearly every data reload. Besides mammalian and other species from Ensembl (24), we have expanded our repertoire to representatives of metazoa, plants and fungi from the Ensembl Genomes initiative (25). To our knowledge, g:Profiler is the only functional profiling software that supports such a broad collection of species. The current version of g:Profiler includes 50 million gene-to-function annotations, and provides functional profiling tools to different research communities, e.g. those of agricultural and plant genomics (33,34).

Advanced methods for automated analysis

In addition to an improved experience of interactive analysis, the 2011 g:Profiler update includes developments in the area of automated analysis and application programming interfaces. All g:Profiler tools include text-based, tabulated and spreadsheet (Excel) output formats. A minimal output format with no HTML headers allows the creation of software that interacts with g:Profiler. As an example of g:Profiler integration with programming languages, we have now implemented a simple package for GNU R that accesses the g:Profiler web server to mediate the functionality of g:GOSt, g:Cocoa and g:Convert. The R package is available for download on our web site. We have also created a web service that is compatible with ENFIN Encore workflows (35). Since the 2011 update, we maintain archived versions of g:Profiler software and data to provide comparability with earlier studies.

CASE STUDIES

Functional analysis of positively selected regions in the human genome

To demonstrate the new features of g:Profiler, we present a case study with genomic regions that have undergone significant evolution during early human history. We retrieved 212 loci of positive selection from the recent analysis of the Neanderthal genome (3), grouped these into chromosome-specific lists and studied the functional enrichments of localized genes with the g:Cocoa chromosomal multiquery feature. These results were retrieved through the g:Profiler R interface, filtered for non-redundant terms and visualized using a custom script. The results of this case study include several interesting functional features that may provide insights into the evolution of modern humans (Figure 2). For instance, GO functions like ‘defense response to bacterium’ (chr. 6, P = 10^-8) and ‘viral reproduction’ (chr. 18, P = 10^-7) as well as KEGG pathway for ‘systemic lupus erythematosus’ (P = 10^-12) may indicate accumulated disease resistance. A number of enriched HPO disease gene annotations such as ‘malaligned carpal bone’ (chr. 20, P = 10^-3) as well as the GO process ‘appendage morphogenesis’ (chr. 2, P = 10^-6) may associate to the development of modern human hands. In addition, this analysis provides pointers to the evolution of gene regulation at multiple levels of organization, including epigenetic (GO cell component nucleosome, chr. 1, P = 10^-15), transcriptional (six enrichments of transcription factor binding sites, P < 10^-3) and post-transcriptional regulation (four enrichments of microRNA target sites, P < 0.02). In summary, these results provide interesting hypotheses for further study and show the usefulness of new g:Profiler features.

Identification of protein complexes in regulator perturbation data

We also provide an example of gene list interpretation in the context of protein–protein interactions. We previously studied differentially expressed genes in yeast transcription factor perturbations (ΔTF) and identified affected protein–protein interaction modules (31), using the method described above. Here, we investigated a collection of high-confidence protein complexes from a recent analysis (36) to interpret our ΔTF data, using the g:Profiler R interface and a custom script for filtering and visualization. In particular, we identified protein complexes that were fully listed in g:Profiler analysis of ΔTF data. This case study highlights the presence of multiple important protein complexes among our enriched modules (Supplementary Figure S1). For instance, the global TFIIA complex is required for all transcription by RNA polymerase II and is therefore important in all processes involving transcription. In our data, all subunits of TFIIA are affected in multiple perturbation strains, including Δgal11, a subunit of the mediator complex that interacts with RNA Polymerase II. TFIIA is also dysregulated in the induced deletions of Rap1 and Gcr1. The latter are essential transcription factors with numerous global roles. In summary, the modules identified by our methods frequently overlap with actual protein complexes and may provide useful pointers for biologically meaningful gene list interpretation.

Application of background gene lists for improving specificity

The usefulness of custom background gene lists is shown in the following example. We studied the functional enrichments of nine core cell cycle transcription factors (TFs) in yeast (Mcm1, Mbp1, Swi4, Swi6, Fkh1, Fkh2, Ace2, Swi5, Ndd1), using the standard background (all yeast genes, ∼6000), as well as a user-defined background list (yeast transcription factors, ∼300). The former approach is reasonable when genes of interest are sampled from the genome-wide collection of genes, while the latter approach is more meaningful when focusing a priori on a narrow, predefined gene group such as transcription factors. The comparison of the top 10 results from the two queries highlights the differences between the two approaches (Figure 3, custom R graphic). The first, global analysis of the nine TFs results in general functional annotations such as ‘positive regulation of transcription’ (P = 10^-11) that would be expected from any such group of TFs. Annotations this general are informative mainly when the focus is the whole genome. When the analysis is restricted to the TF set beforehand, a customized background set allows one to bypass the a priori knowledge of general TF function and retrieve more specific annotations. In our example, the second analysis results in specific annotations like ‘interphase of mitotic cell cycle’ (P = 10^-8). In summary, utilizing a biologically meaningful custom background can improve specificity in enrichment analysis.

DISCUSSION

With the ongoing development and maintenance of g:Profiler, we aim to provide a comprehensive functional profiling service that combines powerful visualization, sophisticated algorithms and up-to-date genomic and functional data. For this update, we implemented several improvements to address the needs of new and experienced g:Profiler users alike. Improved filtering and statistics produce results of greater confidence and more customization power. Multilist enrichment analysis in g:Cocoa delivers results and visualization in a condensed overview. Our special focus in this g:Profiler update relates to developments in biomedical and clinical research that are likely to lead to personalized genomics and medicine in the future. First, we include disease gene annotations in our enrichment tests to enable and promote discovery of associations to human health. Second, we incorporate algorithms for interpreting gene lists in the context of molecular interaction networks, as the concepts of biological systems and networks have become ever more important in the past decade. Last but not least, we provide direct means to study the functional enrichments of user-defined DNA regions and even single nucleotides through corresponding genes. Functional analysis of these data is likely to become increasingly important, as genome and transcriptome sequencing, genome-wide association studies and other biotechnologies reveal more DNA-based evidence to reason about human variation, history, complex disease traits and cancer. Since the publication of g:Profiler in 2007, the tool has found extensive use in a number of projects in stem cell research (37–39), cancer genomics (40,41), machine learning and yeast genomics (42,43), microRNA regulatory networks (44), among others. 
g:Profiler is related to a whole family of bioinformatics tools that apply its functional profiling and biomolecule mapping methods for various tasks, like global coexpression analysis in large microarray collections [MEM (28)], module identification and interpretation in biological networks [GraphWeb (45)], transcriptome analysis in pathways [KEGGanim (46)] and optimal gene expression clustering with functional profiling [VisHiC (47)]. Future developments of g:Profiler will further elaborate on new and emerging sources of biomedical evidence. Functional interpretation of gene lists will greatly benefit from the comprehensive regulatory maps from ENCODE and modENCODE. Another important area of development involves deeper integration with network-based sources of evidence. Novel methods need to explore massive amounts of data from the next-generation sequencing of tumour catalogues, human variation studies and personalized genomics.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

EU FP6 COBRED; ENFIN; ETF7437 MEM; EITSA; ERDF through EXCS funding projects; M.Curie Biostar, U.Agur and A.Lind fellowships (to J.R.). Funding for open access charge: University of Tartu, Center of Excellence in Computer Science (EXCS).

Conflict of interest statement. None declared.
