Axel Skarman, Mohammad Shariati, Luc Jans, Li Jiang, Peter Sørensen.
Abstract
BACKGROUND: Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows.
Year: 2012 PMID: 22554182 PMCID: PMC3434019 DOI: 10.1186/1471-2105-13-73
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Dendrogram of the relative overlaps among all KEGG pathways, calculated as the number of overlapping genes divided by the size of the smaller of the two sets. The black bars on the left show the pathways with a posterior probability larger than 0.99. The names of the KEGG pathways are not shown.
Figure 2. Dendrogram showing the relative overlaps among the KEGG pathways, calculated as the number of overlapping genes divided by the size of the smaller of the two sets. Only the KEGG pathways with a p-value less than 0.000001, adjusted for multiple testing by the method of Benjamini and Yekutieli [12], are shown. The black bars on the left-hand side show the pathways that had a Bayes factor larger than 100.
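The relative overlap used in the two dendrograms can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `relative_overlap`, the gene identifiers, and the conversion to a distance (`1 - overlap`) are assumptions made for the example.

```python
def relative_overlap(set_a, set_b):
    """Number of shared genes divided by the size of the smaller set."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


# Hypothetical gene sets, for illustration only.
pathway_x = {"g1", "g2", "g3", "g4"}
pathway_y = {"g3", "g4", "g5", "g6", "g7", "g8"}

# Two shared genes, four genes in the smaller set.
overlap = relative_overlap(pathway_x, pathway_y)

# Assumption: one simple way to turn the overlap into a dissimilarity
# that a hierarchical clustering routine could use.
distance = 1.0 - overlap
```

With this definition a small pathway that is fully contained in a larger one has a relative overlap of 1, regardless of the size of the larger pathway.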
Figure 3. Heat map showing the dendrogram of the overlaps among all KEGG pathways; the points represent posterior Pearson correlations between the indicator variables, generated by Gibbs sampling, corresponding to all the KEGG pathways. Each indicator (latent) variable is either one or zero, indicating that a particular KEGG pathway is included in or excluded from the model, respectively. A positive posterior correlation indicates pathways that tend to be selected together in the model; a negative correlation indicates pathways that tend to be included alternately, that is, the model tends to select one of them at a time but not both.
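The posterior correlation described in this caption can be illustrated with a small sketch. It assumes the Gibbs sampler output is available as one list of 0/1 draws per pathway; the draws, the variable names, and the helper `pearson` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences of draws."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # An indicator that never changes (always in or always out of the
        # model) has no variance, so the correlation is undefined.
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Hypothetical indicator draws (1 = pathway in the model, 0 = excluded).
draws_a = [1, 0, 1, 0, 1, 0, 1, 0]
draws_b = [0, 1, 0, 1, 0, 1, 0, 1]  # included exactly when A is excluded
draws_c = [1, 0, 1, 0, 1, 0, 1, 0]  # always selected together with A

# Posterior inclusion probability is the mean of the indicator draws.
inclusion_a = sum(draws_a) / len(draws_a)

corr_ab = pearson(draws_a, draws_b)  # negative: selected alternately
corr_ac = pearson(draws_a, draws_c)  # positive: selected together
```

Note that a pathway whose indicator is constant across all draws, such as one with a posterior probability of exactly 1, yields an undefined correlation in this sketch.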
Figure 4. Two histograms showing the number of pathways at different posterior probabilities of being included in the model, for prior probabilities of 0.05 (left) and 0.40 (right).
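30 top-ranked KEGG pathways using a prior probability of 0.05

| KEGG pathway | Number of genes | Posterior probability | Odds ratio | Variance per gene |
| --- | --- | --- | --- | --- |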
| ABC transporters | 24 | 1 | Infinity | 4.24 |
| Lysosome | 90 | 1 | Infinity | 3.56 |
| Proteasome | 40 | 1 | Infinity | 3.53 |
| Complement and coagulation cascades | 53 | 1 | Infinity | 3.13 |
| RIG-I-like receptor signaling pathway | 44 | 1 | Infinity | 2.80 |
| ECM-receptor interaction | 63 | 1 | Infinity | 2.59 |
| Cell adhesion molecules (CAMs) | 95 | 1 | Infinity | 2.50 |
| Axon guidance | 79 | 1 | Infinity | 2.46 |
| SNARE interactions in vesicular transport | 31 | 1 | Infinity | 2.39 |
| RNA degradation | 49 | 1 | Infinity | 2.39 |
| Ubiquitin mediated proteolysis | 105 | 1 | Infinity | 2.34 |
| Neuroactive ligand-receptor interaction | 125 | 1 | Infinity | 2.27 |
| PPAR signaling pathway | 58 | 1 | Infinity | 2.16 |
| Ribosome | 76 | 1 | Infinity | 2.15 |
| MAPK signaling pathway | 179 | 1 | Infinity | 2.11 |
| Aminoacyl-tRNA biosynthesis | 34 | 1 | Infinity | 2.08 |
| Endocytosis | 139 | 1 | Infinity | 2.08 |
| Fc gamma R-mediated phagocytosis | 70 | 1 | Infinity | 2.00 |
| Insulin signaling pathway | 97 | 1 | Infinity | 1.88 |
| Cell cycle | 96 | 1 | Infinity | 1.85 |
| Notch signaling pathway | 31 | 1 | Infinity | 1.73 |
| Cytokine-cytokine receptor interaction | 115 | 1 | Infinity | 1.69 |
| Chemokine signaling pathway | 138 | 1 | Infinity | 1.58 |
| Metabolic pathways | 793 | 1 | Infinity | 1.47 |
| Tight junction | 97 | 0.938 | 288 | 1.11 |
| Purine metabolism | 117 | 0.927 | 240 | 1.16 |
| Chronic myeloid leukemia | 58 | 0.919 | 215 | 1.27 |
| Pathways in cancer | 227 | 0.847 | 105 | 0.945 |
| Basal transcription factors | 23 | 0.826 | 90.3 | 1.28 |
| Circadian rhythm - mammal | 5 | 0.590 | 27.4 | 1.77 |
“Odds ratio” is the ratio of the posterior odds to the prior odds of a pathway being included in the model; it is reported as “Infinity” when the posterior probability is 1. “Variance per gene” is the estimated variance of the t-statistic per gene. The gene sets are ranked primarily by posterior probability and secondarily by variance per gene.
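The odds ratios in the table can be checked with a short calculation. The sketch below assumes the odds ratio is the posterior odds divided by the prior odds, with the prior probability of 0.05 given in the table title; the function names are hypothetical. Because the tabulated posterior probabilities are rounded to three digits, the results only approximate the tabulated odds ratios.

```python
import math


def odds(p):
    """Convert a probability into odds; a probability of 1 gives infinity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return math.inf if p == 1.0 else p / (1.0 - p)


def inclusion_odds_ratio(posterior, prior):
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return odds(posterior) / odds(prior)


PRIOR = 0.05  # prior inclusion probability from the table title

# Posterior probabilities taken from three rows of the table.
tight_junction = inclusion_odds_ratio(0.938, PRIOR)   # tabulated: 288
circadian = inclusion_odds_ratio(0.590, PRIOR)        # tabulated: 27.4
abc_transporters = inclusion_odds_ratio(1.0, PRIOR)   # tabulated: Infinity
```

A prior of 0.05 corresponds to prior odds of 1/19, so each odds ratio is 19 times the posterior odds. For Tight junction this gives roughly 287, which agrees with the tabulated 288 up to rounding of the posterior probability.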