Aristotelis A Chatziioannou, Panagiotis Moulos.
Abstract
StRAnGER is a web application for the automated statistical analysis of annotated gene profiling experiments that exploits controlled biological vocabularies such as the Gene Ontology or KEGG pathway terms. Starting from annotated lists of differentially expressed genes and the gene enrichment scores for the terms of each vocabulary, StRAnGER repartitions and reorders the initial distribution of terms to define a new distribution of elements. Each element pools the terms that hold the same enrichment score. The new distribution is sorted so that the observation score of the elements decreases from left to right, and elements with the same observation score are further sorted in decreasing order of their enrichment scores. Bootstrapping techniques then yield a corrected measure of the statistical significance of these elements, which enables the selection of the terms mapped to them that are unambiguously associated with the respective significant gene sets. The selected terms are protected against the bias that infiltrates statistical enrichment analyses, where the finite nature of the data population can produce technically very high statistical scores. Besides a high statistical score, a further selection criterion for the terms is the number of their members, which introduces a biological prioritization in line with a Systems Biology context. The output is a detailed ranked list of significant terms, which constitutes a starting point for further functional analysis.
Keywords: gene enrichment analysis; ontological analysis; pathway analysis
Year: 2011 PMID: 21293737 PMCID: PMC3032379 DOI: 10.3389/fnins.2011.00008
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
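The repartitioning and reordering step described in the abstract can be illustrated with a minimal sketch. It is not the StRAnGER implementation: the function name `build_elements`, the term names, and the scores are hypothetical, and the enrichment score is treated simply as a number attached to each term. The bootstrap-based cutoff is not covered.

```python
from collections import defaultdict


def build_elements(term_scores):
    """Pool terms that share an enrichment score into elements, then reorder.

    term_scores maps a term identifier to its enrichment score.
    Returns a list of elements sorted by decreasing number of observations;
    ties are broken by decreasing enrichment score.
    """
    pooled = defaultdict(list)
    for term, score in term_scores.items():
        # Exact equality is used for grouping; real scores may need rounding.
        pooled[score].append(term)

    elements = [
        {"score": score, "terms": sorted(terms), "observations": len(terms)}
        for score, terms in pooled.items()
    ]
    elements.sort(key=lambda e: (-e["observations"], -e["score"]))
    return elements


# Hypothetical input: seven terms carrying four distinct scores.
term_scores = {
    "term_a": 0.5, "term_b": 0.5, "term_c": 0.5,
    "term_d": 0.25, "term_e": 0.25,
    "term_f": 0.75,
    "term_g": 0.125,
}

elements = build_elements(term_scores)
for element in elements:
    print(element["observations"], element["score"], element["terms"])
```

Each printed row is one element: the number of terms pooled into it, their shared score, and the terms themselves. A significance cutoff would then be placed along this ordered list.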
Figure 1. (A) Frequency of sorted elements according to the StRAnGER algorithm: the figure presents the number of observations for each element. The dash-dotted line marks the cutoff element obtained by applying only a threshold on the statistical p-value, while the dashed line marks the corrected threshold obtained by applying the bootstrap. (B) The left bar, labeled “Before”, depicts the ratio of enriched terms to all terms to which the significant genes are annotated, after applying only an enrichment score p-value cutoff. The finite nature of the enrichment statistical tests renders the analysis extremely sensitive to false positives, and thus highly error prone through bias infiltration, because many terms can technically obtain a high statistical score. The right bar, labeled “After”, depicts the same ratio after applying the StRAnGER algorithm. The graphs are based on the data presented in Moulos et al. (2009).
Figure 2. Two graphical outputs from StRAnGER applications. (A) A tree view presenting the relationships among the top 10 GO terms from Table 1. The ancestor depth has been set to 2. The significance of each GO term is depicted by the intensity of the fill color of the respective node (here red), a color selected to map each dataset. (B) Illustration of the KEGG pathway “Fatty acid metabolism” from Table 2, with colored components.
Figure 3. Instance of the StRAnGER application web interface depicting various parameters of the StRAnGER algorithm and the tree visualization.
Table 1. A list of significantly enriched GO terms derived from the data in (Moulos et al., .
| GO term | Definition | Set | p-value | Enrichment |
|---|---|---|---|---|
| GO:0005634 | Nucleus | C | 1.55E–10 | 194/6236 |
| GO:0016740 | Transferase activity | F | 6.62E–08 | 86/2383 |
| GO:0000166 | Nucleotide binding | F | 8.81E–08 | 99/2891 |
| GO:0016787 | Hydrolase activity | F | 1.16E–06 | 78/2246 |
| GO:0019900 | Kinase binding | F | 1.34E–06 | 4/11 |
| GO:0004402 | Histone acetyltransferase activity | F | 1.74E–06 | 6/30 |
| GO:0003676 | Nucleic acid binding | F | 1.79E–06 | 72/2045 |
| GO:0003723 | RNA binding | F | 1.89E–06 | 42/978 |
| GO:0046872 | Metal ion binding | F | 2.92E–06 | 109/3528 |
| GO:0005622 | Intracellular | C | 3.59E–06 | 88/2704 |
| GO:0005515 | Protein binding | F | 5.37E–06 | 198/7381 |
| GO:0004364 | Glutathione transferase activity | F | 6.41E–06 | 6/36 |
| GO:0030509 | BMP signaling pathway | P | 8.16E–06 | 4/15 |
| GO:0003824 | Catalytic activity | F | 8.86E–06 | 35/805 |
| GO:0043433 | Negative regulation of transcription factor activity | P | 1.06E–05 | 3/8 |
| GO:0005524 | ATP binding | F | 1.07E–05 | 77/2348 |
| GO:0016491 | Oxidoreductase activity | F | 1.18E–05 | 41/1019 |
| GO:0006270 | DNA replication initiation | P | 2.95E–05 | 4/19 |
| GO:0008415 | Acyltransferase activity | F | 5.38E–05 | 14/224 |
| GO:0008270 | Zinc ion binding | F | 5.47E–05 | 93/3114 |
| GO:0000287 | Magnesium ion binding | F | 6.96E–05 | 25/554 |
| GO:0005739 | Mitochondrion | C | 0.00011592 | 45/1275 |
| GO:0016874 | Ligase activity | F | 0.00015613 | 21/454 |
| GO:0003954 | NADH dehydrogenase activity | F | 0.000236 | 5/44 |
| GO:0008285 | Negative regulation of cell proliferation | P | 0.00026588 | 10/152 |
| GO:0030855 | Epithelial cell differentiation | P | 0.00030166 | 4/30 |
| GO:0008137 | NADH dehydrogenase (ubiquinone) activity | F | 0.00030269 | 5/46 |
| GO:0030529 | Ribonucleoprotein complex | C | 0.00034927 | 20/449 |
| GO:0006412 | Translation | P | 0.00037692 | 21/484 |
| GO:0007049 | Cell cycle | P | 0.00045196 | 27/694 |
| GO:0005762 | Mitochondrial large ribosomal subunit | C | 0.00047806 | 4/33 |
| GO:0042157 | Lipoprotein metabolic process | P | 0.00049001 | 3/19 |
| GO:0006464 | Protein modification process | P | 0.00052038 | 14/276 |
| GO:0008152 | Metabolic process | P | 0.00054429 | 32/881 |
| GO:0008134 | Transcription factor binding | F | 0.000618 | 9/142 |
| GO:0016301 | Kinase activity | F | 0.00062479 | 45/1378 |
| GO:0003735 | Structural constituent of ribosome | F | 0.00069785 | 14/284 |
| GO:0006869 | Lipid transport | P | 0.00071252 | 7/96 |
| GO:0006260 | DNA replication | P | 0.00071381 | 11/198 |
| GO:0006749 | Glutathione metabolic process | P | 0.0010504 | 3/23 |
| GO:0006631 | Fatty acid metabolic process | P | 0.0011307 | 7/103 |
| GO:0008092 | Cytoskeletal protein binding | F | 0.001226 | 6/81 |
| GO:0006470 | Protein amino acid dephosphorylation | P | 0.0012779 | 12/241 |
| GO:0045177 | Apical part of cell | C | 0.0016441 | 4/43 |
| GO:0006446 | Regulation of translational initiation | P | 0.0016908 | 3/26 |
| GO:0006350 | Transcription | P | 0.0018383 | 64/2230 |
| GO:0005737 | Cytoplasm | C | 0.0018487 | 66/2314 |
| GO:0043123 | Positive regulation of I-kappaB kinase/NF-kappaB cascade | P | 0.0019357 | 5/65 |
| GO:0004721 | Phosphoprotein phosphatase activity | F | 0.0021059 | 12/255 |
| GO:0005840 | Ribosome | C | 0.0021789 | 12/256 |
| GO:0000184 | mRNA catabolic process, nonsense-mediated decay | P | 0.0022434 | 3/28 |
| GO:0006917 | Induction of apoptosis | P | 0.0022511 | 8/141 |
| GO:0003677 | DNA binding | F | 0.0022689 | 82/3014 |
| GO:0051301 | Cell division | P | 0.0023318 | 14/321 |
| GO:0045944 | Positive regulation of transcription from RNA polymerase II promoter | P | 0.0024536 | 11/229 |
| GO:0005783 | Endoplasmic reticulum | C | 0.0025159 | 34/1045 |
| GO:0030528 | Transcription regulator activity | F | 0.0026217 | 14/325 |
| GO:0009117 | Nucleotide metabolic process | P | 0.0029513 | 4/49 |
| GO:0006118 | Electron transport | P | 0.0031794 | 23/645 |
| GO:0003713 | Transcription coactivator activity | F | 0.0032915 | 7/122 |
| GO:0004842 | Ubiquitin–protein ligase activity | F | 0.0034684 | 11/239 |
| GO:0005802 | Trans-Golgi network | C | 0.0034872 | 5/73 |
| GO:0016481 | Negative regulation of transcription | P | 0.0035622 | 8/151 |
The list of 925 significantly modulated genes from Moulos et al. (.
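This text does not derive the p-values in the table or name the statistical test behind them. As a hedged illustration only, a hypergeometric test is one common way to score a ratio such as 4/11. The sketch assumes the ratio means genes from the 925-gene list annotated to the term, over all background genes annotated to it. The function name `hypergeom_upper_tail` and the background size of 20,000 are hypothetical placeholders, so the result is not expected to reproduce the table's values.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Probability of at least k annotated genes in a list of n genes.

    N: background size, K: background genes annotated to the term,
    n: size of the gene list, k: list genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Exact integer arithmetic in the numerator, one float division at the end.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Ratio 4/11 with a 925-gene list and an ASSUMED background of 20,000 genes.
p_value = hypergeom_upper_tail(k=4, K=11, n=925, N=20000)
print(f"{p_value:.3e}")
```

A small term such as 4/11 can reach a very low p-value with only a handful of genes, which is the kind of technically high score the abstract warns about.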
Table 2. A list of significantly enriched KEGG pathways derived from the data in Moulos et al. (.
| KEGG ID | KEGG pathway | Class | p-value | Enrichment |
|---|---|---|---|---|
| 00603 | Glycosphingolipid biosynthesis – globo series | Metabolism; glycan biosynthesis and metabolism | 2.01E–11 | 1/1 |
| 00670 | One carbon pool by folate | Metabolism; metabolism of cofactors and vitamins | 1.32E–05 | 5/27 |
| 00480 | Glutathione metabolism | Metabolism; metabolism of other amino acids | 2.28E–05 | 7/59 |
| 00920 | Sulfur metabolism | Metabolism; energy metabolism | 3.07E–05 | 3/10 |
| 00450 | Selenoamino acid metabolism | Metabolism; metabolism of other amino acids | 3.17E–05 | 2/4 |
| 00230 | Purine metabolism | Metabolism; nucleotide metabolism | 3.59E–05 | 12/167 |
| 00240 | Pyrimidine metabolism | Metabolism; nucleotide metabolism | 8.76E–05 | 5/37 |
| 04130 | SNARE interactions in vesicular transport | Genetic information processing; folding, sorting and degradation | 0.000294 | 6/64 |
| 00780 | Biotin metabolism | Metabolism; metabolism of cofactors and vitamins | 0.000402 | 1/2 |
| 00450 | Selenoamino acid metabolism | Metabolism; metabolism of other amino acids | 0.000429 | 5/49 |
| 00071 | Fatty acid metabolism | Metabolism; lipid metabolism | 0.000468 | 6/69 |
| 00240 | Pyrimidine metabolism | Metabolism; nucleotide metabolism | 0.000535 | 8/115 |
| 00624 | 1- and 2-Methylnaphthalene degradation | Metabolism; xenobiotics biodegradation and metabolism | 0.000822 | 4/37 |
| 00362 | Benzoate degradation via hydroxylation | Metabolism; xenobiotics biodegradation and metabolism | 0.000868 | 2/10 |
| 04720 | Long-term potentiation | – | 0.000869 | 8/123 |
| 00980 | Metabolism of xenobiotics by cytochrome P450 | Metabolism; xenobiotics biodegradation and metabolism | 0.000933 | 7/100 |
| 00643 | Styrene degradation | Metabolism; xenobiotics biodegradation and metabolism | 0.001189 | 1/3 |
The list of 925 significantly modulated genes from Soong (.
Figure 4. Observations of hierarchically low GO terms in the lists of significantly enriched GO terms produced by StRAnGER and by two widely used software packages, GOEAST and GOstat. In both panels, the horizontal axis gives the number of times that a GO term appears in the background list (in this case, all annotated probes on the microarray). Low values correspond to GO terms connected to only one or very few genes, which describe an action that is very specific and limited from the pathway perspective. The vertical axis gives a measure of how often these low frequency GO terms are observed in the list of significantly over-represented terms. In the left panel, it is a simple count of the low frequency GO terms in the significant list returned by each of the three packages (how many times these terms infiltrate the significant list). In the right panel, this count is normalized to the total number of over-represented GO terms returned by each package. In both cases, the StRAnGER curve shows that it performs better than the others, or at the same level (as GOEAST for very low frequency terms). This implies that its prioritization algorithm filters out the noise caused by very specific functions low in the GO hierarchy without applying any multiple-testing correction, a strategy reported as controversial (see main text).
Figure A1. (A) The number of statistically significant GO terms after (light bars) or before (dark bars) the application of multiple testing correction, as derived with the three software packages described in the main text. StRAnGER's main algorithm does not necessarily require a multiple testing procedure, as the applied bootstrapping estimates the GO term distribution cutoffs. (B) The number of resulting statistically significant GO terms that represent very general biological functions (e.g., “protein binding”), as yielded by the three software packages described in the main text. StRAnGER's main algorithm manages to filter out many of them.