Maria K Jaakkola, Laura L Elo.
Abstract
Multiple methods have been proposed to estimate pathway activities from expression profiles, yet little information is available about how well those methods perform. This makes it difficult to select a suitable tool for pathway analysis. Although methods based on simple gene lists remain the most common approach, various methods that also consider pathway structure have emerged. To provide practical insight into the performance of both list-based and structure-based methods, we tested six approaches to estimating pathway activities in two case study settings with different characteristics. The first setting involved six renal cell cancer data sets, in which the differences between the expression profiles of case and control samples were relatively large. The second setting involved four type 1 diabetes data sets, in which the profiles of case and control samples were more similar to each other. In general, there were marked differences in the outcomes of the different pathway tools even with the same input data. In the cancer studies, the results of a given method were typically consistent across the data sets but differed between methods. In the more challenging diabetes studies, almost all the tested methods detected only a few significant pathways, if any.
Keywords: comparison; functional genomics; pathway analysis; pathway structure; statistical analysis
Year: 2015 PMID: 26197809 PMCID: PMC4793894 DOI: 10.1093/bib/bbv049
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
General information about the tested pathway analysis methods
| Software | Input from user | Output for each pathway | Version | Reference |
|---|---|---|---|---|
| Methods using pathway structure | ||||
| SPIA | DE genes with values; background genes; pathway files from KEGG [ | FDR | 2.18.0 | [ |
| CePa | DE genes; background genes | Multiple | 0.5 | [ |
| NetGSA | Gene expression matrix; sample labels; pathway structure | | 1.0 | [ |
| Methods not using pathway structure | ||||
| DAVID | DE genes; background genes | FDR | 6.7 | [ |
| GSEA | Gene expression matrix; sample labels; gene sets | FDR | build 0039 | [ |
| Pathifier | Gene expression matrix; sample labels; gene sets | FDR for each sample | 1.4.0 | [ |
All information in the table is about those versions of the methods used in this study. Most of the methods can use different types of input data, and the output might include additional information not listed here.
Information about the data sets used for comparing the pathway methods
| Data set id | Data base | Platform | Array | Number of samples case + control | Number of genes |
|---|---|---|---|---|---|
| Clear cell renal cell carcinoma data sets | |||||
| GSE781 | GEO | Affymetrix | HG-U133A | 9 + 8 | 12 752 |
| GSE6344 | GEO | Affymetrix | HG-U133A | 10 + 10 | 12 752 |
| GSE15641 | GEO | Affymetrix | HG-U133A | 32 + 23 | 12 752 |
| GSE14994 | GEO | Affymetrix | HG-U133A | 22 + 8 | 12 743 |
| GSE11024 | GEO | Affymetrix | HG-U133 Plus 2.0 | 10 + 12 | 17 699 |
| GSE14762 | GEO | Affymetrix | HG-U133 Plus 2.0 | 10 + 12 | 17 232 |
| Type 1 diabetes data sets | |||||
| GSE9006 | GEO | Affymetrix | HG-U133A | 19 + 24 | 12 752 |
| GSE30211 | GEO | Affymetrix | HG-U219 | 13 + 12 | 19 040 |
| GSE51058 | GEO | Illumina | HumanHT-12 | 21 + 15 | 17 981 |
| TABM666 | ArrayExpress | Affymetrix | HG-U133 Plus 2.0 | 3 + 3 | 20 156 |
Figure 1. Consistency of the significant pathways (rows) identified using the six different tools (columns) in (A) six ccRCC data sets, (B) four T1D data sets and (C) 10 artificial data sets. The heatmap illustrates the percentages of data sets in which each pathway (row) is detected as significant. In artificial data sets, no consistent findings are expected. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
Figure 2. Number of overlapping pathways detected by different methods in ccRCC data set GSE14994. NetGSA results were excluded because NetGSA did not detect any significant pathways in that data set. The total numbers of detected significant pathways are indicated after the method names. In total, 86 pathways were tested. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
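The overlap counts summarized in Figure 2 amount to set intersections between the pathway lists returned by each method. The following is a minimal sketch of that idea, not the authors' code; the function name `pairwise_overlaps` and the example detections are hypothetical and do not reproduce the reported results.

```python
from itertools import combinations


def pairwise_overlaps(detected):
    """Count shared pathways for every pair of methods.

    detected: dict mapping a method name to the set of pathways
    that the method reported as significant in one data set.
    """
    return {
        (a, b): len(detected[a] & detected[b])
        for a, b in combinations(sorted(detected), 2)
    }


# Hypothetical detections for illustration only (not results from the study).
detected = {
    "SPIA": {"PPAR signaling pathway", "Focal adhesion"},
    "CePa": {"PPAR signaling pathway"},
    "DAVID": {"Focal adhesion", "Allograft rejection"},
}
overlaps = pairwise_overlaps(detected)
```

A method that detects nothing, as NetGSA did for GSE14994, would contribute only zero overlaps, which is presumably why it is left out of the figure.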
Pathways found as significant from at least four ccRCC data sets by at least one method, excluding Pathifier
| Pathway | SPIA | CePa | NetGSA | DAVID | GSEA | Pathifier |
|---|---|---|---|---|---|---|
| PPAR signaling pathway | 0.17 (0.00) | 0.50 (0.00) | 0.17 (0.00) | 1.00 (0.50) | ||
| Cytokine–cytokine receptor interaction | 0.00 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| ECM–receptor interaction | 0.00 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| Complement and coagulation cascades | 0.17 (0.00) | 0.17 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| Leukocyte transendothelial migration | 0.33 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| Intestinal immune network for IgA production | 0.00 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.17 (0.00) | 1.00 (0.20) | |
| Allograft rejection | 0.00 (0.00) | 0.17 (0.00) | 0.17 (0.00) | 0.33 (0.00) | 1.00 (0.20) | |
| Viral myocarditis | 0.00 (0.00) | 0.17 (0.00) | 0.17 (0.00) | 0.17 (0.00) | 1.00 (0.50) | |
| Systemic lupus erythematosus | 0.00 (0.00) | 0.17 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 1.00 (0.70) | |
| Natural killer cell-mediated cytotoxicity | 0.17 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| Antigen processing and presentation | 0.17 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.60) | |
| Focal adhesion | 0.00 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) | |
| Chemokine signaling pathway | 0.00 (0.00) | 0.17 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.50) |
Columns are methods, and each cell gives the proportion of data sets in which the method found the pathway significant. The proportion greater than or equal to 0.67 that brought the pathway to this table is underlined. The corresponding proportions from the artificial data sets are shown in parentheses.
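The proportions in this table, and the rule that a pathway is listed when some method other than Pathifier detects it in at least four of the six data sets (4/6 ≈ 0.67), can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the placeholder method names (`method_A`, `method_B`), the data set labels and the pathway labels are all hypothetical.

```python
def detection_counts(calls):
    """Count, per method, in how many data sets each pathway was significant.

    calls: dict mapping method -> {data set id -> set of significant pathways}.
    """
    counts = {}
    for method, per_dataset in calls.items():
        method_counts = {}
        for pathways in per_dataset.values():
            for pathway in pathways:
                method_counts[pathway] = method_counts.get(pathway, 0) + 1
        counts[method] = method_counts
    return counts


def select_pathways(counts, min_datasets, exclude=()):
    """Pathways found in at least `min_datasets` data sets by a non-excluded method."""
    selected = set()
    for method, method_counts in counts.items():
        if method in exclude:
            continue
        selected |= {p for p, c in method_counts.items() if c >= min_datasets}
    return selected


# Hypothetical significance calls across six data sets (illustration only).
calls = {
    "method_A": {"d1": {"P1", "P2"}, "d2": {"P1"}, "d3": {"P1"},
                 "d4": {"P1"}, "d5": set(), "d6": {"P2"}},
    "method_B": {d: {"P3"} for d in ("d1", "d2", "d3", "d4", "d5", "d6")},
}
counts = detection_counts(calls)
n_datasets = 6
# Proportions as shown in the table cells: count divided by number of data sets.
proportions = {m: {p: c / n_datasets for p, c in mc.items()} for m, mc in counts.items()}
# Table rule: at least four data sets, ignoring one method (here method_B).
selected = select_pathways(counts, min_datasets=4, exclude=("method_B",))
```

Filtering on the integer count rather than on the rounded proportion avoids a pitfall: 4/6 is 0.666..., which would fail a literal `>= 0.67` comparison.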
Figure 3. PPAR signaling pathway from KEGG. The nodes correspond to genes and other functional units, and edges represent interactions between those units. Nodes are colored based on their differential expression between case and control samples detected with ROTS in the ccRCC data set GSE14994. Solid borders indicate that the gene is highly expressed in ccRCC patient samples, and dashed borders correspondingly indicate low expression compared with control samples. If the FDR value of a node is ≤0.05, the node color is strong. The color is light if the FDR value is between 0.05 and 0.1. Genes with a white node color have an FDR value >0.1. Gray nodes without borders have no measurements available. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
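The node styling rule described in this caption can be written as a small decision function. The sketch below is only an illustration of the rule as worded in the caption; the function name `node_style`, the returned labels, and the assumption that every measured node (including white ones) carries a border are not taken from the study.

```python
from typing import Optional


def node_style(fdr: Optional[float], higher_in_case: Optional[bool]) -> dict:
    """Map a node's FDR value and expression direction to a fill and border style.

    fdr: FDR value of the node, or None when no measurement is available.
    higher_in_case: True if expression is higher in ccRCC samples than in controls.
    """
    if fdr is None:
        # No measurement available: gray node without a border.
        return {"fill": "gray", "border": None}
    if fdr <= 0.05:
        fill = "strong"
    elif fdr <= 0.1:
        fill = "light"
    else:
        fill = "white"
    # Assumption: all measured nodes have a border showing the direction.
    border = "solid" if higher_in_case else "dashed"
    return {"fill": fill, "border": border}
```

For example, `node_style(0.08, False)` describes a lightly colored node with a dashed border, that is, a gene with lower expression in ccRCC samples and an FDR value between 0.05 and 0.1.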
Pathways found as significant from at least two T1D data sets by at least one method, excluding Pathifier
| Pathway | SPIA | CePa | NetGSA | DAVID | GSEA | Pathifier |
|---|---|---|---|---|---|---|
| Antigen processing and presentation | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.50 (0.60) | |
| Allograft rejection | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.50 (0.20) | |
| Viral myocarditis | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.50 (0.50) |
Columns are methods, and each cell gives the proportion of data sets in which the method found the pathway significant. The proportion greater than or equal to 0.50 that brought the pathway to this table is underlined. The corresponding proportions from the artificial data sets are shown in parentheses.
Scores of the methods describing the amount and consistency of the detections across six clear cell renal cell carcinoma (ccRCC) or four type 1 diabetes (T1D) data sets
| Method | Scaled T1D | Original T1D | Scaled ccRCC | Original ccRCC |
|---|---|---|---|---|
| Scores with ROTS preprocessing | ||||
| SPIA | 3.20 | 0.00 | 79.88 | 49.42 |
| CePa | 0.00 | 0.00 | 9.08 | 0.00 |
| NetGSA | 0.00 | – | 0.00 | – |
| DAVID | 0.00 | 0.00 | 0.73 | 3.63 |
| GSEA | – | 0.00 | – | 0.73 |
| Pathifier | 0.04 | – | 0.66 | – |
| Scores with Limma preprocessing | ||||
| SPIA | 0.00 | 0.00 | 47.24 | 22.22 |
| CePa | 0.00 | 0.00 | 0.00 | 0.00 |
| NetGSA | 0.00 | – | 0.00 | – |
| DAVID | 0.00 | 0.00 | 0.00 | 0.00 |
| GSEA | – | 0.00 | – | 0.73 |
| Pathifier | 0.01 | – | 1.26 | – |
The higher the score, the better the method. For NetGSA and GSEA, the method used to detect DE genes is not determined by the user; therefore, the NetGSA and GSEA rows are identical in the upper (ROTS) and lower (Limma) parts of the table. The dash (–) indicates that the method (row) was not tested in a data set (column).