Petri Törönen, Pauli J Ojala, Pekka Marttinen, Liisa Holm.
Abstract
BACKGROUND: A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. In the first, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. In the second, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined.
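The two common analysis types described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of a hypergeometric test for the tail analysis, and the toy gene identifiers are all assumptions made for illustration.

```python
from math import comb
from statistics import mean


def tail_overrepresentation_p(ranked_genes, class_genes, tail_size):
    """Hypergeometric P(X >= k) for class members among the top `tail_size` genes.

    Assumption: the 'tail area' is simply the first `tail_size` genes of the
    ordered list; real tools choose or scan this threshold differently.
    """
    n_total = len(ranked_genes)
    n_class = sum(g in class_genes for g in ranked_genes)
    k = sum(g in class_genes for g in ranked_genes[:tail_size])
    upper = min(n_class, tail_size)
    favourable = sum(
        comb(n_class, i) * comb(n_total - n_class, tail_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, tail_size)


def class_mean_score(scores, class_genes):
    """Average differential expression value of the genes in one class."""
    return mean(scores[g] for g in class_genes if g in scores)


# Hypothetical toy data: four genes, two of which belong to the class.
ranked = ["g1", "g2", "g3", "g4"]
scores = {"g1": 2.0, "g2": 4.0, "g3": -1.0, "g4": -3.0}
members = {"g1", "g2"}
print(tail_overrepresentation_p(ranked, members, tail_size=2))
print(class_mean_score(scores, members))
```

The first function looks only at which genes fall in the tail, while the second uses the expression values of all class members; these are the two views the abstract contrasts.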
Year: 2009 PMID: 19775443 PMCID: PMC2761411 DOI: 10.1186/1471-2105-10-307
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of method performances from artificial data analysis
| parameters (if any) | fixed size, up-regulation | fixed size, up and down-regulation | varying size, up-regulation | Varying size, up and down-regulation | included to fig. 1 and 2 | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| GSZ-score | 0 | 0.1 | -0.0002 | 0.0867 | 0.184 | ||||||
| GSZ-score | 0.1 | 0.1 | 0.003 | -0.0896 | 0.0883 | ||||||
| GSZ-score | 0.2 | 0.1 | 0.0048 | 0.0119 | 0.0891 | 0.0065 | -0.0085 | ||||
| GSZ-score | 0.5 | 0.1 | 0.0837 | -0.0515 | |||||||
| GSZ-score | 0 | 0.3 | |||||||||
| GSZ-score | 0.1 | 0.3 | |||||||||
| GSZ-score | 0.2 | 0.3 | |||||||||
| GSZ-score | 0.5 | 0.3 | 0.0041 | -0.1618 | 0.0895 | 0.0725 | |||||
| GSZ-score | 0 | 0.5 | X | ||||||||
| GSZ-score | 0.1 | 0.5 | X | ||||||||
| GSZ-score | 0.2 | 0.5 | X | ||||||||
| GSZ-score | 0.5 | 0.5 | 0.2115 | 0.0077 | -0.019 | X | |||||
| t-test | 0 | ||||||||||
| t-test | 0.1 | 0.0074 | -0.0679 | ||||||||
| t-test | 0.3 | 0.0087 | 0.005 | ||||||||
| t-test | 1 | X | |||||||||
| t-test | 3 | -0.0079 | -0.1024 | X | |||||||
| KS | -0.0001 | -0.0268 | 0.0462 | 0.0027 | -0.0439 | -0.3352 | X | ||||
| modKS | -0.1558 | -0.0231 | -0.3216 | -0.076 | -0.4214 | X | |||||
| iGA | 0.2485 | 0.0827 | -0.0325 | X | |||||||
Table shows average performance of various methods and parameters with artificial datasets. Scores A and B are explained in the main text. The five best methods are highlighted with bold and underlined in each column. The next five methods are underlined. Five weakest scores are represented in italics. Scores obtained with fixed class size represent the performance when the results are normalized with class specific permutation, whereas the results with varying class size show the performance without any normalization.
Figure 1. Method performances for different artificial signals, when only up-regulation occurs. Method performances for each proportion of the signal representing part (Y axis) and signal magnitude (X axis) as measured by average AUC score. Score is shown by the radius of each sector. Methods represented are (starting from 11 o'clock, anticlockwise): 4 versions of GSZ-score, 2 versions of t-test, KS test, modKS test and iGA. Selected methods/parameters are shown in detail in table 1. Colouring (blue, cyan, green, yellow) highlights the four best methods, with equally well performing methods represented with the same colour. Dotted lines separate areas where different methods show the best performance. The GSZ-score selected for later analysis is at the 9 o'clock position.
Figure 2. Method performances for different artificial signals, when a class shows simultaneous up and down-regulation. Method performances for each proportion of the signal representing part (Y axis) and signal magnitude (X axis) as measured by average AUC score. Score is again shown by the radius of each sector. Methods represented are the same as in the previous figure. Colouring (blue, cyan, green, yellow) again highlights the top methods. Dotted lines highlight the signal levels where the best method changes.
Rank correlations for each method with a gold standard obtained from the other half of the dataset
| | 1st split | | 2nd split | | 3rd split | | 4th split | |
|---|---|---|---|---|---|---|---|---|
| GSZ-score | ||||||||
| t-test | 0.5373 | 0.5276 | 0.5691 | 0.5748 | 0.5722 | 0.5827 | 0.5727 | |
| KS test | 0.4470 | 0.5089 | 0.4981 | 0.5140 | 0.5340 | 0.5054 | 0.4928 | 0.5388 |
| modKS | 0.5048 | 0.5772 | 0.5339 | 0.6035 | 0.5957 | 0.5336 | 0.5756 | 0.5873 |
| iGA | 0.5976 | |||||||
Table shows the rank correlation between the results for each half of a dataset and the gold standard ranking, obtained with all the methods from the other half (case ii evaluation). Results are highlighted as in the previous table. Two results for each split are obtained by testing the first half with a gold standard from the second half and testing the second half with a gold standard from the first half. Notice that GSZ-score clearly shows the best correlation. iGA is usually the second best method. The only exception is the first half of the 4th split, where t-test shows the best performance and GSZ-score is a close second.
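The rank correlations in this table compare a method's class scores from one half of the data with a gold standard ranking from the other half. A minimal sketch of a Spearman-type rank correlation follows. The source does not state which rank correlation variant or tie handling was used, so the average-rank treatment of ties and all names here are assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical class scores from one half and gold standard scores from the other half.
half_scores = [3.1, 0.4, 2.2, 1.0]
gold_scores = [2.9, 0.1, 1.5, 1.7]
print(spearman(half_scores, gold_scores))
```

Applying `pearson` directly to the raw scores gives the Pearson correlation that is reported alongside the rank correlation elsewhere in the evaluation.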
Figure 3. Average results from case i. Figure represents the AUC score for each evaluated method as the rank limit of the positive GO classes is increased. The set of positive classes used for AUC grows as the rank threshold increases. Methods represented are GSZ-score: blue line with circles, t-test: green line with cross, KS test: red line with box, modKS test: cyan line with diamond, iGA: magenta line with x. Lower part zooms into the smallest ranks. Here GSZ-score shows the best performance and t-test performs equally well at the top ranks, while other methods show weaker performance.
Figure 4. Average results from case ii. Figure represents the AUC score for each evaluated method as the rank limit of the positive GO classes is increased. Methods are coloured identically to the earlier figure. Here, the GSZ-score shows the best performance and iGA is the second best method.
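The evaluation behind Figures 3 and 4, an AUC score recomputed as the rank limit for positive classes grows, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the AUC is computed as the probability that a positive class outscores a negative one, and the function names and toy numbers are hypothetical.

```python
def auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_rank_limit(scores, gold_scores, limits):
    """For each limit k, treat the k best gold standard classes as positives."""
    by_gold = sorted(range(len(gold_scores)), key=lambda i: gold_scores[i], reverse=True)
    result = {}
    for k in limits:
        positives = set(by_gold[:k])
        labels = [i in positives for i in range(len(scores))]
        result[k] = auc(scores, labels)
    return result


# Hypothetical scores for four classes from one method and from a gold standard.
method_scores = [0.9, 0.8, 0.1, 0.2]
gold_scores = [5.0, 4.0, 3.0, 1.0]
print(auc_by_rank_limit(method_scores, gold_scores, limits=[1, 2, 3]))
```

Plotting the returned values against the limit would give one curve per method, which is the kind of curve the two figures describe.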
Rank and Pearson correlations for each method's results between the split parts of the dataset
| | 1st split | | 2nd split | | 3rd split | | 4th split | |
|---|---|---|---|---|---|---|---|---|
| GSZ-score | ||||||||
| t-test | ||||||||
| KS test | 0.5402 | 0.585 | 0.5768 | 0.6197 | 0.5824 | 0.6282 | 0.5845 | 0.6145 |
| modKS | 0.6183 | 0.6397 | 0.6443 | 0.6658 | 0.6343 | 0.6554 | 0.6952 | 0.7083 |
| iGA | 0.6134 | 0.7261 | 0.6424 | 0.7401 | 0.6358 | 0.7562 | 0.6239 | 0.7243 |
Rank and Pearson correlation between the results from two halves of the dataset for each method (case i). The best result is highlighted with bold font and the second best is underlined. Notice that GSZ-score has the highest correlation with both correlation measures. t-test shows the second best performance. The only deviation is the 4th split where t-test shows the best rank correlation, although even in that case GSZ-score still shows the best Pearson correlation.
Rank correlations between results obtained by different methods
| | GSZ | t-test | KS | modKS | iGA |
|---|---|---|---|---|---|
| GSZ | 1 | 0.754 | 0.708 | 0.665 | |
| t-test | 0.754 | 1 | 0.685 | 0.643 | 0.708 |
| KS | 0.708 | 0.685 | 1 | 0.502 | 0.809 |
| modKS | 0.665 | 0.643 | 0.502 | 1 | 0.627 |
| iGA | | 0.708 | 0.809 | 0.627 | 1 |
Rank correlations between all the method pairs obtained using the whole ALL dataset and normalized results. Strongest correlation (excluding the diagonal) is highlighted on each row with a bold font. Notice the strong correlation between t-test and GSZ-score and the very strong correlation between iGA and GSZ-score.
Figure 5. Visualization of the empirical log-p-values from the ALL dataset for the top-100 classes of each scoring function. Part A shows the obtained p-values when all class randomizations are normalized and pooled. Part B shows the results when each class is analyzed separately. The largest value in the lower plot refers to p-value = 0 (see main text for details). Lines are: blue with circles = GSZ-score; green with cross = t-test; red with squares = KS; magenta with x = iGA; cyan with diamonds = modKS. Notice that GSZ-score shows a very clear separation from the other methods in both plots. Any reasonable threshold would result in a larger number of significant classes with GSZ-score than with any other method.
Figure 6. Visualization of the empirical log-p-values from the p53 dataset for the top-40 classes of each scoring function. Part A shows the results when all class randomizations are normalized and pooled. Part B shows the results when each class is analyzed separately. The functions are marked as in the earlier figure. Notice that GSZ-score shows a clear separation from the other methods in the upper plot. In the lower plot it is the best performing method at the top ranks and then, as the signal level drops, it falls to second best.
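Parts A and B of Figures 5 and 6 correspond to two ways of turning permutation scores into empirical p-values. The sketch below is a hedged illustration: the z-score normalization used before pooling, the one-sided comparison, and all names are assumptions, since the captions only say that randomizations are "normalized and pooled" or analyzed per class.

```python
from statistics import mean, pstdev


def class_specific_p(observed, null_scores):
    """Fraction of one class's permutation scores at least as large as the observed score.

    No pseudo-count is added, so the result can be exactly 0.
    """
    return sum(s >= observed for s in null_scores) / len(null_scores)


def pooled_p(observed, null_by_class):
    """Standardize each class by its own permutations, pool all null scores, then compare.

    Assumes every class has a non-zero permutation standard deviation.
    """
    pooled = []
    z_observed = {}
    for name, null_scores in null_by_class.items():
        mu, sd = mean(null_scores), pstdev(null_scores)
        pooled.extend((s - mu) / sd for s in null_scores)
        z_observed[name] = (observed[name] - mu) / sd
    return {
        name: sum(z >= z0 for z in pooled) / len(pooled)
        for name, z0 in z_observed.items()
    }


# Hypothetical permutation scores for two classes.
null_by_class = {"a": [-1.0, 0.0, 1.0], "b": [-2.0, 0.0, 2.0]}
observed = {"a": 0.5, "b": 4.0}
print(pooled_p(observed, null_by_class))
print(class_specific_p(observed["a"], null_by_class["a"]))
```

Pooling gives every class a larger null sample and therefore finer p-value resolution, while the class-specific variant is limited by the number of permutations per class and can return exactly 0, which is the case the Figure 5 caption mentions.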
Comparison of scoring functions on p53 dataset
| Rank | GSZ | iGA | t-test | modKS | KS |
|---|---|---|---|---|---|
| 1 | CC:0009434: microtubule-based flagellum | BP:0050962: detection of light... | |||
| 2 | BP:0051668: localization within membrane | CC:0031903: micro-body membrane | MF:0033558: protein deacetylase activity | BP:0050953: sensory perception of light stimulus | BP:0050908: detection of light... |
| 3 | CC:0031903: micro-body membrane | CC:0005778: peroxisomal membrane | MF:0004407: histone deacetylase activity | BP:0007601: visual perception | BP:0007602: phototransduction |
| 4 | CC:0005778: peroxisomal membrane | CC:0044438: micro-body part | CC:0031903: micro-body membrane | CC:0000118: histone deacetylase complex | BP:0009584: detection of visible light |
| 5 | BP:0042787: protein ubiquitination... catabolic process | CC:0044439: peroxisomal part | CC:0005778: peroxisomal membrane | MF:0001664: G-protein-coupled receptor binding | BP:0009583: detection of light... |
| 6 | BP:0042787: protein ubiquitination... catabolic process | CC:0019861: flagellum | BP:0015674: di-, tri-valent inorganic cation transport | MF:0016018: cyclosporin A binding | |
| 7 | CC:0044438: micro-body part | BP:0050953: sensory perception of light stimulus | CC:0044438: micro-body part | MF:0033558: protein deacetylase activity | |
| 8 | CC:0044439: peroxisomal part | BP:0007601: visual perception | CC:0044439: peroxisomal part | MF:0004407: histone deacetylase activity | |
| 9 | CC:0009434: microtubule-based flagellum | MF:0033558: protein deacetylase activity | BP:0030890: positive regulation of B cell proliferation | BP:0006816: calcium ion transport | |
| 10 | BP:0051205: protein insertion into membrane | MF:0004407: histone deacetylase activity | BP:0050962: detection of light... | MF:0019237: centromeric DNA binding | CC:0031903: micro-body membrane |
| 11 | BP:0046504: glycerol ether biosynthetic process | CC:0009434: microtubule-based flagellum | BP:0050908: detection of light... | MF:0030170: pyridoxal phosphate binding | CC:0005778: peroxisomal membrane |
| 12 | BP:0045017: glycerolipid biosynth... | MF:0016018: cyclosporin A binding | BP:0007602: phototransduction | BP:0008015: circulation | MF:0033558: protein deacetylase activity |
| 13 | BP:0008643: carbohydrate transport | MF:0019237: centromeric DNA... | BP:0009584: detection of visible light | BP:0035136: forelimb morphogenesis | MF:0004407: histone deacetylase activity |
| 14 | BP:0051205: protein insertion into membrane | CC:0000118: histone deacetylase complex | BP:0015918: sterol transport | CC:0031594: neuromuscular junction | |
| 15 | BP:0046504: glycerol ether biosynthetic process | BP:0018298: protein-chromophore linkage | BP:0030301: cholesterol transport | MF:0005048: signal sequence binding |
Biological classes reported by each scoring function from the p53 dataset. Apoptosis related classes are highlighted with an asterisk before and after the class name. Clear apoptosis classes are shown with bold font and border cases with underlined font. Although many methods report the same apoptosis GO class as the top class, only GSZ is able to report three other apoptosis GO classes. Furthermore, our supplementary table S1 [see additional file 7] shows that GSZ reports the strongest p-values for the reported apoptosis gene sets.
Figure 7. Visualization of the cumulative sum of biologically positive classes among the top-80 classes for each scoring function. Figure shows how many biologically positive classes each method discovers across its top ranks from the ALL dataset. Different scoring functions are marked as in the earlier figures. Notice that although GSZ, iGA and t-test first show equal performance, GSZ outperforms the other methods across the later ranks. A more detailed view is provided in the supplementary table S2 [see additional file 8].
Figure 8. Visualization of the cumulative sum of biologically positive classes among the top-70 classes for GSZ and the compared programs. Figure shows how many biologically positive classes each method discovers across its top ranks. Blue line with circles denotes GSZ. Green line with downward triangles denotes SP. Black line with upward triangles denotes GSA. Cyan line with left-pointing triangles denotes GSEA. Notice that although GSZ, GSA and SP first show equal performance, GSZ outperforms the other methods across the later ranks. A more detailed view is provided in the supplementary table S5 [see additional file 11].
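The curves in Figures 7 and 8 are cumulative counts of biologically positive classes along each method's ranked list. A minimal sketch, assuming the ranked list and the set of positive classes are already available (the function name and the example identifiers are hypothetical):

```python
from itertools import accumulate


def cumulative_positives(ranked_classes, positive_classes, top_n):
    """Running count of positive classes among the first `top_n` ranked classes."""
    hits = (1 if c in positive_classes else 0 for c in ranked_classes[:top_n])
    return list(accumulate(hits))


# Hypothetical ranking of four classes, two of which are labelled positive.
ranking = ["class_a", "class_b", "class_c", "class_d"]
positives = {"class_a", "class_c"}
print(cumulative_positives(ranking, positives, top_n=3))
```

A method whose curve rises faster places more positive classes near the top of its list, which is how the figures compare the methods.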
Comparison of GSZ and program packages using p53 dataset
| Rank | GSZ-score | GSA | SP | GSEA |
|---|---|---|---|---|
| 1 | CC:0031903: micro-body membrane | CC:0009434: microtubule-based flagellum | CC:0000118: histone deacetylase complex | |
| 2 | BP:0051668: localization within membrane | CC:0005778: peroxisomal membrane | MF:0005125: cytokine activity | BP:0009314: response to radiation |
| 3 | CC:0031903: micro-body membrane | CC:0009434: microtubule-based flagellum | CC:0019861: flagellum | BP:0050962: detection of light... |
| 4 | CC:0005778: peroxisomal membrane | CC:0044438: micro-body part | BP:0050962: detection of light... | BP:0050908: detection of light... |
| 5 | BP:0042787: protein ubiquitination... | CC:0044439: peroxisomal part | BP:0050908: detection of light... | BP:0007602: phototransduction |
| 6 | BP:0007602: phototransduction | BP:0009584: detection of visible light | ||
| 7 | CC:0044438: micro-body part | CC:0005626: insoluble fraction | BP:0009584: detection of visible light | BP:0009583: detection of light... |
| 8 | CC:0044439: peroxisomal part | CC:0019861: flagellum | MF:0030170: pyridoxal phosphate binding | |
| 9 | CC:0009434: microtubule-based flagellum | MF:0015103: inorganic anion transmembrane... | BP:0009583: detection of light... | |
| 10 | BP:0051205: protein insertion into membrane | BP:0051668: localization within membrane | BP:0006955: immune response | MF:0005487: nucleocytoplasmic transporter activity |
| 11 | BP:0046504: glycerol ether biosynthetic process | BP:0050962: detection of light... | BP:0030890: positive regulation of B cell proliferation | BP:0006654: phosphatidic acid biosynthetic process |
| 12 | BP:0045017: glycerolipid biosynthetic process | BP:0050908: detection of light... | MF:0001664: G-protein-coupled receptor binding | BP:0046473: phosphatidic acid metabolic process |
| 13 | BP:0008643: carbohydrate transport | BP:0007602: phototransduction | BP:0006572: tyrosine catabolic process | BP:0035137: hindlimb morphogenesis |
| 14 | BP:0009584: detection of visible light | MF:0008009: chemokine activity | BP:0051716: cellular response to stimulus | |
| 15 | CC:0031594: neuromuscular junction | MF:0042379: chemokine receptor binding | BP:0008217: blood pressure regulation |
Table compares GSZ analysis with three popular software packages. Apoptosis related positive classes are highlighted as in table 5. GSZ, GSA and SP reported classes with a strong signal, whereas GSEA did not report any significant classes. GSZ, GSA and SP reported a strong p-value for GO class 1836, but only GSZ ranks it as the top class. GSZ also reports 3 other apoptosis related classes.