Enrico Glaab, Anaïs Baudot, Natalio Krasnogor, Reinhard Schneider, Alfonso Valencia.
Abstract
MOTIVATION: Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of interest; and (iv) tissue-specific gene/protein set associations cannot be recognized.
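Drawback (i) can be made concrete with a minimal sketch of an over-representation test, written here as the upper tail of the hypergeometric distribution (the one-sided Fisher's exact test). The gene identifiers, set sizes, and background size of 100 below are hypothetical and chosen only for illustration; this is not the implementation evaluated in the paper.

```python
from math import comb


def ora_pvalue(target, reference, universe_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= k), where X is the overlap between a random gene set
    of size len(target) and the reference set, drawn from a background
    of `universe_size` genes, and k is the observed overlap.
    """
    target, reference = set(target), set(reference)
    k = len(target & reference)           # observed overlap
    n, K, N = len(target), len(reference), universe_size
    total = comb(N, n)                    # all possible draws of n genes
    tail = sum(
        comb(K, i) * comb(N - K, n - i)   # draws with exactly i shared genes
        for i in range(k, min(n, K) + 1)
    )
    return tail / total


# Hypothetical gene sets (placeholder identifiers, not real data).
target_set = {"G1", "G2", "G3", "G4"}
overlapping_pathway = {"G1", "G2", "G3", "G9"}
disjoint_pathway = {"G5", "G6", "G7", "G8"}

p_overlap = ora_pvalue(target_set, overlapping_pathway, universe_size=100)
p_disjoint = ora_pvalue(target_set, disjoint_pathway, universe_size=100)
print(p_overlap, p_disjoint)
```

With no shared genes the tail sum covers every possible outcome, so the P-value is 1 regardless of how closely the two sets might be connected in an interaction network. An overlap-only score cannot rank such pairs.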
Year: 2012 PMID: 22962466 PMCID: PMC3436816 DOI: 10.1093/bioinformatics/bts389
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Table 1. Xd-score ranking table for the top 20 functional associations between genes mutated in gastric cancer and pathways in the BioCarta database (see also the correlation plot for the same dataset in Fig. 1).
Pathways with the same Fisher Q-value 0.01 but different Xd-scores are highlighted in blue colour.
Fig. 1. Regression plot: Xd-scores versus significance-of-overlap scores (Fisher's test, q-values), computed for the comparison of gastric cancer mutated genes against gene sets from the BioCarta database (absolute Pearson correlation: 0.93). Non-overlapping dataset pairs, for which a meaningful scoring is only possible with the XD-distance, are highlighted on the right. See also Table 1 for a list of the 20 top-ranked pathways in this plot.
Fig. 2. Protein–protein interaction sub-networks (largest connected components) for target and reference set pairs with small overlap, predicted to be functionally associated by EnrichNet: (a) gastric cancer mutated genes (blue) and genes/proteins from the BioCarta pathway ‘Role of Erk5 in Neuronal Survival’ (magenta; the shared genes are shown in green); (b) bladder cancer mutated genes (blue) and genes/proteins from Gene Ontology term ‘Tyrosine phosphorylation of Stat3’ (GO:0042503, magenta; the only shared gene NF2 is shown in green). An over-representation analysis approach would have missed these associations, since only a few of the cancer mutated genes are members of the corresponding processes.
Fig. 3. Protein–protein interaction sub-network (largest connected component) for the PD gene set (blue) and genes/proteins from GO term ‘Regulation of interleukin-6 biosynthetic process’ (GO:0045408, magenta; the only shared gene IL1B is shown in green).
Enrichment scores and P-value estimates for the comparative validation of EnrichNet and ORA using Fisher's exact test across all combinations of five microarray gene expression datasets and two gene set collections
| Microarray dataset | Gene set collection | Fisher's exact test: enrichment score (…) | EnrichNet: enrichment score (…) |
|---|---|---|---|
| p53 | C1 | 13.5 (…) | 36.9 (…) |
| p53 | C2 | 45.6 (…) | 65.2 (…) |
| Lung (Boston) | C1 | 2.6 (…) | 40.0 (…) |
| Lung (Boston) | C2 | 15.0 (…) | 43.7 (…) |
| Lung (Michigan) | C1 | 21.2 (…) | 40.8 (…) |
| Lung (Michigan) | C2 | 9.1 (…) | 40.5 (…) |
| Colon | C1 | 6.85 (…) | 70.1 (…) |
| Colon | C2 | 22.8 (…) | 94.9 (…) |
| Lymphoma | C1 | 8.0 (…) | 65.2 (…) |
| Lymphoma | C2 | 0.94 (…) | 69.8 (…) |
EnrichNet provides higher enrichment scores and lower or equivalent P-value estimates in all cases.