Jüri Reimand, Meelis Kull, Hedi Peterson, Jaanus Hansen, Jaak Vilo.
Abstract
g:Profiler (http://biit.cs.ut.ee/gprofiler/) is a public web server for characterising and manipulating gene lists resulting from mining high-throughput genomic data. g:Profiler has a simple, user-friendly web interface with powerful visualisation for capturing Gene Ontology (GO), pathway or transcription factor binding site enrichments down to the level of individual genes. Besides standard multiple testing corrections, it introduces a new, improved method for estimating the true effect of multiple testing over complex structures such as GO. Ranked gene lists can be interpreted from the same interface using very efficient algorithms. Such ordered lists may arise when studying the most significantly affected genes from high-throughput data, or genes co-expressed with a query gene. Other important aspects of practical data analysis are supported by modules tightly integrated with g:Profiler: g:Convert for converting between different database identifiers; g:Orth for finding orthologous genes in other species; and g:Sorter for searching a large body of public gene expression data for co-expression. g:Profiler supports 31 different species, and the underlying data are updated regularly from sources such as the Ensembl database. Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple textual outputs.
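To make "enrichment" concrete, the sketch below scores one functional category against a gene list. It assumes a one-sided hypergeometric test (a common choice for over-representation analysis; the abstract itself does not name the test), and the helper names `hypergeom_sf` and `enrichment_p` are hypothetical, not part of g:Profiler. The P-value is the probability of seeing at least the observed overlap if the query had been drawn at random from the annotated universe.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K annotated, n in query)."""
    # Sum the probabilities of seeing k, k+1, ... annotated genes in the query.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def enrichment_p(query: set, term_genes: set, universe: set) -> float:
    """One-sided over-representation P-value of one functional category."""
    # Only genes present in the annotated universe take part in the test.
    query = query & universe
    term_genes = term_genes & universe
    overlap = len(query & term_genes)
    return hypergeom_sf(overlap, len(universe), len(term_genes), len(query))


universe = {f"g{i}" for i in range(20)}
term = {f"g{i}" for i in range(5)}          # hypothetical 5-gene GO term
query = {"g0", "g1", "g2", "g3"}            # all 4 query genes hit the term
print(enrichment_p(query, term, universe))  # 5 / 4845, about 0.00103
```

In a full analysis this test would be repeated for every category, which is exactly why the multiple testing corrections mentioned in the abstract are needed.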
Year: 2007; PMID: 17478515; PMCID: PMC1933153; DOI: 10.1093/nar/gkm226
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. (A) A typical user input and output scenario in g:Profiler. The user inserts a set of genes in the main text window and optionally adjusts the query parameters. Results are provided either graphically or in textual format. Genes are presented in columns and significant functional categories in rows. For an ordered list, the analysis shows the length of the most significant query head. GO annotation evidence codes are coloured as a heat map, showing the strength of evidence linking a gene to a GO term. The legend is provided at the top of the page and is displayed when the user clicks on the tree icon on the results page. The g:Orth, g:Convert and g:Sorter tools are directly linked to the relevant genes of the current query. Additional examples are available in the Supplementary Data. (B) Hierarchical relations between the resulting GO categories can be browsed by clicking on the corresponding icons.
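The "most significant query head" of an ordered list can be illustrated with a naive sketch: test every prefix of the ranked list as if it were an unordered query and keep the prefix with the smallest P-value. This assumes the same one-sided hypergeometric test as an unordered query; the function `best_query_head` is hypothetical, and its exhaustive O(list length × term size) scan is not the efficient algorithm the abstract refers to. Note that the minimum over many prefixes is optimistic and would itself need multiple testing control.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def best_query_head(ranked: list, term_genes: set, universe_size: int) -> tuple:
    """Return (smallest P-value, head length) over all prefixes of a ranked list."""
    K = len(term_genes)
    best_p, best_len = 1.0, 0
    overlap = 0
    for length, gene in enumerate(ranked, start=1):
        if gene in term_genes:
            overlap += 1
        # Test the head ranked[:length] as if it were an unordered query.
        p = hypergeom_sf(overlap, universe_size, K, length)
        if p < best_p:
            best_p, best_len = p, length
    return best_p, best_len


term = {f"g{i}" for i in range(5)}  # hypothetical 5-gene term, 20-gene universe
ranked = ["g0", "g1", "g2", "x1", "x2", "g3"]
print(best_query_head(ranked, term, 20))  # best head is the first 3 genes
```

Here the three top-ranked genes all belong to the term, so the head of length 3 beats longer heads that dilute the signal with unannotated genes.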
Overview of the functionality and data sources for different organisms in g:Profiler. Entries marked (1) have fewer than 10 000 GO associations.
Figure 2. Comparison of multiple testing corrections for H. sapiens GO annotations, using 2000 randomly generated queries for each query size from 1 to 150. The x-axis shows the input query size, and the y-axis shows the P-value significance threshold for a single set comparison. Lines represent the estimated significance cut-offs. A frequency histogram of GO annotation set sizes is shown on a logarithmic scale. The SCS threshold (white line) closely follows the 95% quantile of the empirically observed simulated values (black line). For comparison, the more conservative Bonferroni (blue line) and FDR-based (purple line) thresholds are also shown. In addition to the empirical estimate of the 95% significance level, the background colour coding shows how frequently at least one P-value of a given magnitude was achieved by the randomly generated queries.
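One way to read the empirical (black) curve is as a simulation: draw many random queries of a fixed size, record the smallest P-value each query reaches over all categories, and take the P-value below which only 5% of random queries fall. The sketch below does this on a hypothetical, randomly generated toy annotation and compares it with the Bonferroni cut-off α/m. It is not g:SCS, whose details are not given here; `empirical_threshold` and `bonferroni_threshold` are illustrative names. Real GO categories are nested and correlated, which is what makes Bonferroni conservative there; independent random toy terms will not reproduce that effect faithfully.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def empirical_threshold(term_sizes, universe_size, query_size,
                        n_queries=2000, alpha=0.05, seed=0):
    """P-value cut-off that only a fraction alpha of random queries falls below."""
    rng = random.Random(seed)
    genes = range(universe_size)
    # Hypothetical toy annotation: each term is a random gene set of the given size.
    terms = [set(rng.sample(genes, size)) for size in term_sizes]
    min_ps = []
    for _ in range(n_queries):
        query = set(rng.sample(genes, query_size))
        # Best (smallest) P-value this random query reaches over all terms.
        min_ps.append(min(hypergeom_sf(len(query & t), universe_size, len(t), query_size)
                          for t in terms))
    min_ps.sort()
    # Approximate alpha-quantile of the per-query minimum P-value.
    return min_ps[int(alpha * n_queries)]


def bonferroni_threshold(n_terms, alpha=0.05):
    """Per-term cut-off that keeps the family-wise error rate at or below alpha."""
    return alpha / n_terms


sizes = [5, 10, 20, 40] * 25  # 100 hypothetical terms
print(empirical_threshold(sizes, 1000, 20, n_queries=500))
print(bonferroni_threshold(len(sizes)))  # 0.05 / 100
```

Repeating the call for each query size would trace a curve comparable in spirit to the black line in Figure 2.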