Hesen Peng, Junjie Ma, Yun Bai, Jianwei Lu, Tianwei Yu.
Abstract
Probabilistic association discovery aims to identify associations between random vectors, regardless of the number of variables involved or whether the functional form is linear or nonlinear. Recently, applications in high-dimensional data have generated rising interest in probabilistic association discovery. We developed a framework based on functions on the observation graph, named MeDiA (Mean Distance Association). We generalize its properties to a group of functions on the observation graph. This group of functions encapsulates major existing methods in association discovery, e.g. mutual information and Brownian covariance, and can be expanded to more complicated forms. We conducted a numerical comparison of the statistical power of related methods under multiple scenarios. We further demonstrated the application of MeDiA as a method of gene set analysis that captures a broader range of responses than traditional gene set analysis methods.
Year: 2015 PMID: 25915206 PMCID: PMC4411044 DOI: 10.1371/journal.pone.0124620
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Random samples generated from an independent bivariate normal distribution (left) and a mixture bivariate normal distribution with ±0.8 covariance (right).
Dashed lines connect two observations if they are nearest neighbors.
Comparison between the independent bivariate normal distribution and mixture normal distribution in Fig 1.
| Metric | Left (Independent) | Right (Mixed Normal) |
|---|---|---|
| mean distance (MeDiA) | 1.81 | 1.70 |
| mean nearest neighbor distance (MeDiANN) | 0.14 | 0.12 |
| mean log(nearest neighbor distance) | -2.24 | -2.43 |
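
The three quantities compared above are all summaries of distances between observations. The following is a minimal sketch of how such summaries could be computed, not the authors' implementation. The sample size, the seed, unit marginal variances, equal mixture weights, and all function names are assumptions made for illustration, so the printed values will not reproduce the table.

```python
import math
import random
from itertools import combinations


def mean_distance(points):
    """Mean Euclidean distance over all unordered pairs of observations."""
    d = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(d) / len(d)


def nearest_neighbor_distances(points):
    """Distance from each observation to its nearest other observation."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def mean_nn_distance(points):
    """Mean nearest neighbor distance."""
    nn = nearest_neighbor_distances(points)
    return sum(nn) / len(nn)


def mean_log_nn_distance(points):
    """Mean of the log nearest neighbor distances."""
    nn = nearest_neighbor_distances(points)
    return sum(math.log(d) for d in nn) / len(nn)


def sample_independent(n, rng):
    """Independent bivariate standard normal sample (assumed unit variances)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def sample_mixture(n, rng, rho=0.8):
    """Equal-weight mixture of bivariate normals with correlation +rho and -rho."""
    out = []
    for _ in range(n):
        r = rho if rng.random() < 0.5 else -rho
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        out.append((x, y))
    return out


rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
for name, pts in [("independent", sample_independent(300, rng)),
                  ("mixture", sample_mixture(300, rng))]:
    print(name,
          round(mean_distance(pts), 3),
          round(mean_nn_distance(pts), 3),
          round(mean_log_nn_distance(pts), 3))
```

The brute-force nearest neighbor search is quadratic in the number of observations, which is acceptable for a sample of this size; a spatial index would be the natural replacement for larger data.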
Fig 2. Comparison of statistical power under different scenarios.
Gene sets associated with the two-dimensional clinical outcome based on MeDiA.
| Ref.* | GO term | FDR | Name |
|---|---|---|---|
| 4 | GO:0019827 | 1.65E-06 | stem cell maintenance |
| 1 | GO:0050852 | 1.47E-05 | T cell receptor signaling pathway |
| 3 | GO:0006693 | 0.00042 | prostaglandin metabolic process |
| 5 | GO:0033627 | 0.00047 | cell adhesion mediated by integrin |
| 1 | GO:0030183 | 0.00051 | B cell differentiation |
| 1 | GO:0045058 | 0.00072 | T cell selection |
| 3 | GO:0009225 | 0.0027 | nucleotide-sugar metabolic process |
| 1 | GO:0045730 | 0.0027 | respiratory burst |
|  | GO:0000122 | 0.0031 | negative regulation of transcription from RNA polymerase II promoter |
| 2 | GO:0007229 | 0.0031 | integrin-mediated signaling pathway |
| 6 | GO:0051668 | 0.0031 | localization within membrane |
| 3 | GO:0006633 | 0.0038 | fatty acid biosynthetic process |
| 6 | GO:0008105 | 0.0038 | asymmetric protein localization |
| 1 | GO:0019882 | 0.0038 | antigen processing and presentation |
| 2 | GO:0043123 | 0.0038 | positive regulation of I-kappaB kinase/NF-kappaB cascade |
| 2 | GO:0043627 | 0.0038 | response to estrogen stimulus |
|  | GO:0001837 | 0.0047 | epithelial to mesenchymal transition |
| 1 | GO:0006959 | 0.0047 | humoral immune response |
|  | GO:0044419 | 0.0047 | interspecies interaction between organisms |
| 2 | GO:0006469 | 0.0064 | negative regulation of protein kinase activity |
| 2 | GO:0019221 | 0.0064 | cytokine-mediated signaling pathway |
| 2 | GO:0019722 | 0.0064 | calcium-mediated signaling |
| 7 | GO:0015012 | 0.0066 | heparan sulfate proteoglycan biosynthetic process |
| 3 | GO:0042632 | 0.0079 | cholesterol homeostasis |
| 1 | GO:0050869 | 0.0079 | negative regulation of B cell activation |
| 5 | GO:0022407 | 0.0079 | regulation of cell-cell adhesion |
|  | GO:0046677 | 0.0087 | response to antibiotic |
| 8 | GO:0006919 | 0.0094 | activation of caspase activity |
|  | GO:0006997 | 0.0099 | nucleus organization |
* Reference numbers appear as superscripts by the GO terms in the source and are for easy reference from the main text.
Fig 3. Network interaction for celiac disease pathways.
A red edge indicates that the interaction between the connected pathways is amplified in disease individuals. A blue edge indicates that the interaction is suppressed in disease individuals.
Fig 4. Network interaction for lung cancer pathways.
A red edge indicates that the interaction between the connected pathways is amplified in disease individuals. A blue edge indicates that the interaction is suppressed in disease individuals.
Summary of methods for probabilistic association discovery discussed in this paper.
| Method | Statistic | Significance test |
|---|---|---|
| MeDiA | Mean distance | Permutation Test |
| MeDiANN | Mean nearest neighbor distance | Permutation Test |
| Mutual Information | Mean log nearest neighbor distance | Permutation Test |
| Brownian Cov | Distance covariance | Permutation Test |
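
Every method in the summary above relies on a permutation test. The sketch below shows one way such a test could be set up for the mean distance statistic; it is an illustration under stated assumptions, not the authors' procedure. The function names, the number of permutations, the seeds, and the choice of the lower tail (a smaller mean distance between joint observations counted as evidence of association, in line with the smaller values reported for the dependent sample in the Fig 1 comparison) are all assumptions.

```python
import math
import random
from itertools import combinations


def mean_joint_distance(xs, ys):
    """Mean Euclidean distance between joint observations (x_i, y_i)."""
    pts = [tuple(x) + tuple(y) for x, y in zip(xs, ys)]
    d = [math.dist(p, q) for p, q in combinations(pts, 2)]
    return sum(d) / len(d)


def permutation_pvalue(xs, ys, statistic, n_perm=200, seed=0):
    """One-sided permutation p-value.

    Shuffling ys breaks any pairing between xs and ys while keeping both
    marginal samples intact. Assumption: small values of the statistic
    indicate association, so the lower tail is counted.
    """
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        if statistic(xs, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: y depends linearly on x, plus noise.
rng = random.Random(1)
xs = [(rng.gauss(0, 1),) for _ in range(40)]
ys = [(x[0] + 0.3 * rng.gauss(0, 1),) for x in xs]
print(permutation_pvalue(xs, ys, mean_joint_distance))
```

Passing the statistic as a function means the same permutation loop could be reused with a nearest neighbor based statistic, provided the tail direction is chosen to match.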