Rainer Breitling, Anna Amtmann, Pawel Herzyk.
Abstract
BACKGROUND: One of the most time-consuming tasks after performing a gene expression experiment is the biological interpretation of the results by identifying physiologically important associations between the differentially expressed genes. A large part of the relevant functional evidence can be represented in the form of graphs, e.g. metabolic and signaling pathways, protein interaction maps, shared GeneOntology annotations, or literature co-citation relations. Such graphs are easily constructed from available genome annotation data. The problem of biological interpretation can then be described as identifying the subgraphs showing the most significant patterns of gene expression. We applied a graph-based extension of our iterative Group Analysis (iGA) approach to obtain a statistically rigorous identification of the subgraphs of interest in any evidence graph.
Year: 2004 PMID: 15272936 PMCID: PMC509016 DOI: 10.1186/1471-2105-5-100
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Principle of Graph-based Iterative Group Analysis. A Evidence network. Genes are associated with their annotation in the form of a bigraph (two types of nodes). B The same evidence represented as a simple network. Genes that share an annotation are connected. C-H Example of a GiGA analysis using fictitious microarray results. C Genes are assigned their ranks based on observed expression changes. D Local minima are found, i.e. genes that have no connection to genes with a better rank. E-H Iterative expansion of subgraphs from one of the local minima, gene 2 (rank 1). E The neighboring node with the smallest rank is included (gene 4, rank 4), which leads to the additional inclusion of genes 5 (rank 3) and 6 (rank 2). F Gene 3 (rank 5) is included. G Gene 7 (rank 7) is included, leading to the inclusion of gene 8 (rank 6). H The last gene reachable from this local minimum, gene 1 (rank 8), is included and the process terminates. For each of the subgraphs a p-value can be calculated (see text) and the subgraph with the smallest p-value is declared the "regulated neighborhood" of the local minimum. In the example, genes 2, 4, 5, and 6 form a regulated neighborhood (p = 0.014). The graph expansion process would then be repeated for the remaining two local minima.
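The expansion described in panels C-H can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Two things are assumptions: the edge list is hypothetical (the figure itself is not available here, so the edges were chosen only to be consistent with the caption), and the p-value formula is not stated in this excerpt ("see text"). The product used below was picked because it reproduces the p = 0.014 quoted for genes 2, 4, 5, and 6.

```python
from math import prod

# Ranks as given in the Figure 1 caption (gene -> rank).
RANKS = {1: 8, 2: 1, 3: 5, 4: 4, 5: 3, 6: 2, 7: 7, 8: 6}

# HYPOTHETICAL edges, chosen only so that the walk-through in the caption works.
EDGES = [(2, 4), (2, 3), (4, 5), (4, 6), (5, 6), (3, 7), (7, 8), (1, 7)]


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_minima(ranks, adj):
    """Genes with no neighbor of better (smaller) rank."""
    return sorted(g for g in ranks
                  if all(ranks[n] > ranks[g] for n in adj.get(g, ())))


def subgraph_p(n_members, max_rank, n_total):
    """ASSUMED formula: chance that n_members genes all have rank <= max_rank."""
    return prod((max_rank - i) / (n_total - i) for i in range(n_members))


def expand(seed, ranks, adj):
    """Return the successive subgraphs grown from seed, each with its p-value."""
    members, steps = {seed}, []
    while True:
        frontier = {n for g in members for n in adj.get(g, ())} - members
        if not frontier:
            return steps
        # Admit the best-ranked neighbor, then everything reachable
        # without exceeding that rank.
        threshold = min(ranks[n] for n in frontier)
        while True:
            new = {n for g in members for n in adj.get(g, ())
                   if n not in members and ranks[n] <= threshold}
            if not new:
                break
            members |= new
        p = subgraph_p(len(members), max(ranks[g] for g in members), len(ranks))
        steps.append((frozenset(members), p))


def regulated_neighborhood(seed, ranks, adj):
    """Subgraph with the smallest p-value (seed must have at least one neighbor)."""
    return min(expand(seed, ranks, adj), key=lambda step: step[1])


ADJ = adjacency(EDGES)
print(local_minima(RANKS, ADJ))
members, p = regulated_neighborhood(2, RANKS, ADJ)
print(sorted(members), round(p, 3))
```

With these assumed edges the local minima are genes 2, 6, and 8, and the expansion from gene 2 passes through the same four subgraphs as panels E-H before selecting genes 2, 4, 5, and 6.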
Table 1. Iterative Group Analysis of gene expression during the yeast diauxic shift. Up-regulated groups.
| 6144 – purine base metabolism | 3773 – heat shock protein activity | |||||
| 9277 – cell wall (sensu Fungi) | 3773 – heat shock protein activity | 3773 – heat shock protein activity | ||||
| 297 – spermine transporter activity | 6950 – response to stress | |||||
| 15846 – polyamine transport | 297 – spermine transporter activity | 6950 – response to stress | ||||
| 3773 – heat shock protein activity | ||||||
| 15846 – polyamine transport | 6537 – glutamate biosynthesis | |||||
| 5353 – fructose transporter activity | 7039 – vacuolar protein catabolism | |||||
| 15578 – mannose transporter activity | 6950 – response to stress | |||||
| 7039 – vacuolar protein catabolism | ||||||
| 8645 – hexose transport | ||||||
| 4396 – hexokinase activity | 4396 – hexokinase activity | 30162 – regulation of proteolysis and peptidolysis | ||||
| 5215 – transporter activity | 297 – spermine transporter activity | 4364 – glutathione transferase activity | 16491 – oxidoreductase activity | |||
For this analysis, genes were assigned to groups based on GeneOntology annotations obtained from Affymetrix. All groups that are changed with a minimal p-value smaller than 1/[number of annotated genes] (1/4087 = 2.4E-4) are shown, sorted by significance. Numbers and names are the standardized GeneOntology identifiers. Groups shown in bold were also reported as changed in the original publication (DeRisi et al., 1997).
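The inclusion threshold quoted in the footnote is simple arithmetic and can be checked directly. The snippet below is only an illustration. The function names are hypothetical, and the comparison uses a strict "smaller than", as the footnote states.

```python
def significance_cutoff(n_annotated_genes: int) -> float:
    """Threshold used in the table footnotes: 1 / number of annotated genes."""
    return 1.0 / n_annotated_genes


def passes_cutoff(min_p_value: float, n_annotated_genes: int) -> bool:
    """True if a group's minimal p-value is strictly below the threshold."""
    return min_p_value < significance_cutoff(n_annotated_genes)


cutoff = significance_cutoff(4087)
print(f"{cutoff:.1E}")  # 2.4E-04
```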
Table 2. Iterative Group Analysis of gene expression during the yeast diauxic shift.
| 7152 – spore wall assembly (sensu Saccharomyces) | ||||||
| 3938 – IMP dehydrogenase activity | ||||||
| 6183 – GTP biosynthesis | ||||||
Down-regulated groups. See Table 1 for details.
Table 3. Graph-based iterative Group Analysis of gene expression during the yeast diauxic shift.
| Anchor gene (local minimum) | Description | p-value | E-value | N | |
|---|---|---|---|---|---|
| YHL015W | | 5.87E-86 | <0.01 | 39 | 48 |
| YMR217W | amino acid and nucleotide biosynthesis | 3.38E-13 | 2.7 | 9 | 172 |
| YDR144C | cell wall biogenesis | 4.06E-08 | 4.5 | 6 | 242 |
| YNL065W | membrane transporter | 4.02E-05 | 9.3 | 3 | 141 |
| YLR062C | bud site selection | 6.41E-05 | 9.9 | 4 | 367 |
| YGL225W | protein glycosylation in Golgi | 1.12E-04 | 10.8 | 4 | 422 |
| YPR074C | pentose phosphate pathway | 1.44E-04 | 11.2 | 4 | 449 |
| YNL141W | | 4.67E-59 | <0.01 | 39 | 45 |
| YOR224C | RNA polymerases | 2.59E-13 | 1.1 | 23 | 219 |
| YER065C | | 8.57E-77 | <0.01 | 39 | 66 |
| YKL217W | membrane transporters (sugar, amino acids) | 1.76E-15 | 2.3 | 8 | 62 |
| YAL017W | protein kinases | 1.07E-07 | 4.8 | 6 | 284 |
| YBL043W | cell wall biogenesis | 3.81E-07 | 5.4 | 4 | 103 |
| YGR248W | carbohydrate metabolism | 5.66E-07 | 5.5 | 5 | 232 |
| YEL011W | | 1.01E-06 | 5.8 | 3 | 42 |
| YER037W | protein phosphatases | 1.07E-06 | 5.8 | 8 | 736 |
| YJL137C | | 7.46E-06 | 7.3 | 4 | 215 |
| YDL085W | disulfide oxidoreductases | 1.05E-05 | 7.6 | 4 | 234 |
| YNL173C | mating signal transduction | 1.65E-05 | 8.2 | 4 | 262 |
| YNL134C | alcohol dehydrogenase | 1.34E-04 | 11.1 | 3 | 210 |
| YBL038W | | 1.99E-04 | 11.9 | 4 | 487 |
| YER065C | | 4.96E-53 | 0.11 | 39 | 54 |
| YGR088W | | 3.09E-10 | 1.2 | 11 | 106 |
| YFR015C | | 2.08E-04 | 3.6 | 3 | 45 |
| YJR073C | methyltransferases | 3.85E-04 | 4.0 | 5 | 156 |
| YDR001C | | 5.01E-04 | 4.2 | 3 | 60 |
| YCR014C | DNA and RNA polymerases | 5.44E-04 | 4.2 | 17 | 481 |
| YIR038C | glyoxalases | 8.64E-04 | 4.5 | 5 | 183 |
The evidence network was constructed either from GeneOntology information (nodes are connected if they share a GeneOntology annotation) or from enzyme activity information obtained from Swissprot. In the latter case, genes are connected if their encoded proteins convert the same substrate (as product or educt, i.e. the direction of the reaction is not taken into account here). This type of network is much smaller (only 744 genes), as only genes coding for enzymes are included. All groups that are changed with a minimal p-value smaller than 1/[number of annotated genes] are shown, sorted by significance. The corresponding E-value, as estimated by the analysis of 100 random permutations of the data, is also shown. The employed threshold for inclusion in the table is very generous and does not guarantee that all subgraphs shown are statistically significant. The local minimum anchoring each regulated neighborhood is indicated by its genetic locus name (for overlapping neighborhoods, only the best-ranking minimum is shown). Descriptive group names were added manually. Groups that correspond to processes discussed in the original paper are highlighted in italics. It can be seen that the highest-ranking group in each case is the largest and contains the central biological processes detected by DeRisi et al. (1997) and by iGA (see Tables 1 and 2). N, number of genes in each subgraph.
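The network construction described in this footnote (genes are connected if they share an annotation) amounts to projecting a gene-annotation bigraph onto the genes. The sketch below shows one way this could be done. It is not the authors' code, and the gene names and the annotation assignments in the example are made up for illustration. The same function would serve for the Swissprot-based network if substrate names were passed in place of GeneOntology identifiers.

```python
from collections import defaultdict
from itertools import combinations


def evidence_network(annotations):
    """Connect every pair of genes that share at least one annotation.

    annotations: dict mapping gene -> set of annotation identifiers.
    Returns an undirected adjacency dict mapping gene -> set of neighbor genes.
    """
    # Invert the mapping: annotation -> genes carrying it.
    genes_by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            genes_by_term[term].add(gene)

    # Every gene appears in the result, even if it ends up unconnected.
    adj = {gene: set() for gene in annotations}
    for genes in genes_by_term.values():
        for a, b in combinations(sorted(genes), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# HYPOTHETICAL genes and annotation assignments, for illustration only.
example = {
    "geneA": {"3773", "6950"},
    "geneB": {"6950"},
    "geneC": {"3773"},
    "geneD": {"8645"},
}
network = evidence_network(example)
print(network["geneA"])
```

Note that a broadly used annotation connects all of its genes to each other, so the number of edges grows quadratically with the size of the annotation group.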
Figure 2. Visualization of the most significant "down-regulated neighborhood" identified by GiGA using a GeneOntology-based network. The expression data are taken from the 20.5 h timepoint of the yeast diauxic shift (DeRisi et al., 1997). The layout was generated from the output of GiGA by the free software aiSee using a force-directed algorithm with default parameters. The same software can also be used for the versatile real-time navigation of the network. Colored boxes show the regulated genes (darker shading indicates stronger regulation), white boxes show the evidence linking the genes (in this case GeneOntology numbers and terms). Several important components of this regulatory neighborhood are indicated (small and large ribosomal subunit proteins, rRNA processing/snRNP, nucleolar proteins, translation elongation factors). These components were also identified in the original publication after manual analysis. GiGA finds them automatically, and also detects the biologically obvious connections between them. As all the evidence is included in the same picture, the biologist can then use her expertise to assess the relevance of each link, without having to make the connections ad hoc by tedious literature studies.
Figure 3. Visualization of the two most significant "up-regulated neighborhoods" identified by GiGA using a metabolic network derived from Swissprot annotations. The expression data are taken from the 20.5 h timepoint of the yeast diauxic shift (DeRisi et al., 1997). The layout was generated as in Figure 2. Colored boxes show the regulated genes (darker shading indicates stronger regulation), white boxes show the substrates that are in common between genes. Important components of this regulatory neighborhood are indicated (TCA cycle and glyoxylate cycle enzymes, and the various respiratory chain complexes). Here it can be seen that GiGA not only detects protein complexes (such as ribosomes or the respiratory chain complexes), but also "linear" metabolic pathways such as TCA cycle and glyoxylate cycle (and potentially signal transduction pathways or regulatory cascades etc.). Almost all the enzymes discussed by DeRisi et al. (1997) are included in these two subgraphs, plus the relevant enzymatic information necessary to assess the relevance of each link, without the danger of missing some genes (unless the annotation is incomplete).