Christina Backes, Andreas Keller, Jan Kuentzer, Benny Kneissl, Nicole Comtesse, Yasser A Elnakady, Rolf Müller, Eckart Meese, Hans-Peter Lenhof.
Abstract
We present a comprehensive and efficient gene set analysis tool, called 'GeneTrail', that offers rich functionality and is easy to use. Our web-based application facilitates the statistical evaluation of high-throughput genomic or proteomic data sets with respect to enrichment of functional categories. GeneTrail covers a wide variety of biological categories and pathways, including KEGG, TRANSPATH, TRANSFAC, and GO. Our web server provides two common statistical approaches: 'Over-Representation Analysis' (ORA), which compares a reference set of genes to a test set, and 'Gene Set Enrichment Analysis' (GSEA), which scores sorted lists of genes. Besides other newly developed features, GeneTrail's statistics module includes a novel dynamic-programming algorithm that considerably improves the P-value computation of GSEA methods. GeneTrail is freely accessible at http://genetrail.bioinf.uni-sb.de.
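To make the ORA idea concrete, the following Python sketch computes an upper-tail hypergeometric P-value for a single category. It is only an illustration under stated assumptions: the test set is taken to be a subset of the reference set, the abstract does not say which statistical test GeneTrail applies, and the function name `ora_p_value` and its parameters are hypothetical.

```python
from math import comb


def ora_p_value(n_ref, k_ref, n_test, k_test):
    """Upper-tail hypergeometric P-value for one category (illustrative only).

    n_ref  : number of genes in the reference set
    k_ref  : number of reference genes annotated to the category
    n_test : number of genes in the test set (assumed subset of the reference)
    k_test : number of test genes annotated to the category

    Returns the probability of observing at least k_test category genes
    when n_test genes are drawn without replacement from the reference.
    """
    total = comb(n_ref, n_test)
    upper = min(k_ref, n_test)
    # Sum the probabilities of all outcomes at least as extreme as observed.
    tail = sum(
        comb(k_ref, i) * comb(n_ref - k_ref, n_test - i)
        for i in range(k_test, upper + 1)
    )
    return tail / total
```

A call such as `ora_p_value(20000, 100, 200, 8)` would then estimate how surprising it is to find at least 8 category genes in a 200-gene test set. In practice the resulting P-values would also need a multiple-testing correction across all categories tested.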
Year: 2007 PMID: 17526521 PMCID: PMC1933132 DOI: 10.1093/nar/gkm323
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of the identifier types currently supported by GeneTrail
| Identifier type | Example identifiers |
| NCBI GeneID | 5894, 11186, 11848 |
| NCBI NP/XP number (Protein RefSeq) | NP_006261, XP_941900, NP_872606 |
| NCBI Protein GI | 28201876, 113431221, 121114292 |
| NCBI NM/XM number (RNA RefSeq) | NM_018993, NM_008284, NM_021168 |
| NCBI RNA GI | 54792783, 51093847, 91105420 |
| SwissProt/UniProt | Q9NZD4, P55008, O15155 |
| UniGene | Hs.652097, Hs.652094, Hs.652089 |
| Ensembl Gene ID | ENSG00000003147, ENSG00000005801 |
| SGD yeast ORF ID | YCR024C-B, YCR108C, YLR157W-E |
| Amersham Human Whole Genome | GE200018, GE897528, GE519380 |
| Affymetrix HG-U133A | 1487_at, 1320_at, 1316_at |
| Affymetrix HG-U95A | 1014_at, 1015_s_at, 1017_at |
| Affymetrix HG-U133 Plus 2.0 | 1552258_at, 1487_at, 1438_at |
| Affymetrix HG-U133B | 200017_at, 200018_at, 200013_at |
Figure 1. Visualization of different running sum statistics when applying a ‘Gene Set Enrichment Analysis’. The running sum (y-axis) is shown as a function of the index in the sorted list (x-axis). Parts A and B of the figure illustrate a ‘mountain-like shape’ for top-ranked genes. Part C shows a ‘valley-like shape’ for bottom-ranked genes. Part D illustrates a ‘zigzag’ shape that is not statistically significant; the genes are randomly distributed.
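The shapes described for Figure 1 can be reproduced with a small Python sketch. It assumes an unweighted, Kolmogorov-Smirnov-style running sum in which each category gene adds `n - l` and every other gene subtracts `l` (with `n` genes in the list and `l` of them in the category), so the sum always returns to zero; this scoring scheme is an assumption, not something stated in this record. The second function counts orderings by dynamic programming to obtain an exact two-sided P-value for the maximal deviation. It is one possible way to do such a computation and is not claimed to be GeneTrail's own algorithm; all function names are hypothetical.

```python
from math import comb


def running_sum(sorted_genes, category):
    """Running sum over a sorted gene list (assumed unweighted scheme)."""
    n = len(sorted_genes)
    l = sum(1 for g in sorted_genes if g in category)
    values, current = [], 0
    for g in sorted_genes:
        # Category genes push the sum up, all other genes pull it down.
        current += (n - l) if g in category else -l
        values.append(current)
    return values


def max_deviation(values):
    """Signed value with the largest absolute deviation from zero."""
    return max(values, key=abs)


def exact_p_value(n, l, d):
    """Two-sided P-value: fraction of orderings whose running sum
    reaches an absolute deviation of at least d (d > 0).

    ways[j] counts prefixes of the current length that contain j category
    genes and whose running sum has stayed strictly inside (-d, d).
    """
    ways = {0: 1}
    for i in range(1, n + 1):
        nxt = {}
        for j, count in ways.items():
            for jj in (j, j + 1):  # next gene is outside / inside the category
                if jj > l or i - jj > n - l:
                    continue
                value = jj * (n - l) - (i - jj) * l
                if abs(value) < d:
                    nxt[jj] = nxt.get(jj, 0) + count
        ways = nxt
    return 1 - ways.get(l, 0) / comb(n, l)
```

With a category concentrated at the top of the list, `running_sum` rises first and then falls (the 'mountain' of parts A and B); with a category concentrated at the bottom it falls first (the 'valley' of part C). The dynamic program visits on the order of `n * l` states instead of enumerating all `comb(n, l)` orderings.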
Figure 2. This figure exemplifies the workflow of the GeneTrail server. The five steps needed to perform an ‘Over-Representation Analysis’ are shown in consecutive order. First, the organism and the identifier type have to be selected. Afterwards, a test set should be uploaded and a reference set can be uploaded or selected from a pre-defined list. Finally, the user can specify the desired analysis methods and the required parameters. For each step, we show small screenshots in the background taken from the GeneTrail user interface.
Figure 3. HTML view excerpt of the output of the ORA performed on the example set provided on the GeneTrail web server homepage. The illustration shows the two significant KEGG pathway categories with the highest P-value. The red arrows denote the over-representation of these two categories. Where available, the categories and the genes are linked to their external data sources.
Figure 4. Graph visualization of the output of the ORA performed on the example set provided on the GeneTrail web server homepage. The left-hand side shows an excerpt of the complete overview graph presented on the upper right. There are two types of nodes: oval nodes representing the genes in the example set and logos representing the categories. Blue edges connect the genes to the categories they are found in; black edges denote interactions of gene products.