Ståle Nygård, Trond Reitan, Trevor Clancy, Vegard Nygaard, Johannes Bjørnstad, Biljana Skrbic, Theis Tønnessen, Geir Christensen, Eivind Hovig.
Abstract
BACKGROUND: Identifying the molecular processes and pathways involved in disease etiology is of great importance. Although various high-throughput methods have been used extensively for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens shows poor overlap with canonical disease pathways. These findings are difficult to interpret, yet they are crucial for improving the understanding of the molecular processes underlying disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules that share a common cellular function and involve signaling and regulatory events. Specifically, our method uses Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping.
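The idea of combining expression data with pairwise priors in an MCMC search can be illustrated with a minimal Python sketch. This is not the authors' model: the Gaussian score with a fixed `sigma`, the additive `prior_bonus` for each prior pair placed in the same group, the fixed number of groups, and the function names `log_score` and `mcmc_cluster` are all assumptions made for illustration.

```python
import math
import random


def log_score(profiles, labels, prior_pairs, sigma=1.0, prior_bonus=2.0):
    """Unnormalized log posterior of a grouping under a toy model (an assumption)."""
    clusters = {}
    for gene, k in enumerate(labels):
        clusters.setdefault(k, []).append(gene)

    score = 0.0
    n_dims = len(profiles[0])
    for members in clusters.values():
        for d in range(n_dims):
            mean = sum(profiles[g][d] for g in members) / len(members)
            sse = sum((profiles[g][d] - mean) ** 2 for g in members)
            score -= sse / (2.0 * sigma ** 2)  # Gaussian log-likelihood term

    # Informative prior: reward each known pair that lands in the same group.
    score += prior_bonus * sum(1 for i, j in prior_pairs if labels[i] == labels[j])
    return score


def mcmc_cluster(profiles, prior_pairs, n_clusters=2, n_iter=3000, seed=0):
    """Metropolis sampler over group assignments; returns the best grouping visited."""
    rng = random.Random(seed)
    n_genes = len(profiles)
    labels = [rng.randrange(n_clusters) for _ in range(n_genes)]
    current = log_score(profiles, labels, prior_pairs)
    best_labels, best_score = list(labels), current

    for _ in range(n_iter):
        gene = rng.randrange(n_genes)
        old_label = labels[gene]
        labels[gene] = rng.randrange(n_clusters)  # symmetric proposal
        proposed = log_score(profiles, labels, prior_pairs)
        delta = proposed - current
        if delta >= 0 or rng.random() < math.exp(delta):
            current = proposed
            if current > best_score:
                best_labels, best_score = list(labels), current
        else:
            labels[gene] = old_label  # reject: restore the previous assignment
    return best_labels


# Six genes in two clear expression groups, plus gene 6 lying between them.
profiles = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
            [5.0, 5.1], [4.9, 5.2], [5.1, 4.8],
            [2.3, 2.3]]
with_prior = mcmc_cluster(profiles, prior_pairs=[(6, 3)])
without_prior = mcmc_cluster(profiles, prior_pairs=[])
print("gene 6 grouped with gene 3 (prior used):", with_prior[6] == with_prior[3])
print("gene 6 grouped with gene 3 (no prior):  ", without_prior[6] == without_prior[3])
```

In this toy data set the ambiguous gene sits slightly closer to the first group, so the expression data alone place it there, while a single prior pair linking it to gene 3 is enough to move it to the second group. The sketch fixes the number of groups in advance, which is a simplification.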
Year: 2014 PMID: 24758699 PMCID: PMC4006456 DOI: 10.1186/1471-2105-15-115
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Results on simulated data. We used the simulation scheme proposed by [26], sampling approximately 100 genes per data set. Sample sizes (N) were varied using N = 10, and 0, as was an extra variation (SD) added to each element in the expression matrix, using SD = 10, and 0. hclust = hierarchical clustering, kmeans = k-means clustering, PAM = Partitioning Around Medoids, Mclust = model-based clustering, tight = tight clustering. MCIP-A is our method (MCMC Clustering using Informative Priors) with no priors used, MCIP-B is our method with 20% of the priors mis-specified, and MCIP-C is our method with all prior pairs correctly specified.
Summarizing the inferred heart failure clusters
| Cluster | Size | Upreg. (%) | Edges | Priors (%) | Top three GO terms |
|---|---|---|---|---|---|
| 1 | 315 | 89.3 | 98910 | 0.4 | Extracellular region, basement membrane, proteinaceous extracellular matrix |
| 2 | 85 | 28.4 | 7140 | 0.1 | Positive regulation of cell cycle, polysaccharide metabolic process, carbohydrate biosynthetic process |
Size = number of genes in each cluster, Upreg. (%) = percentage of upregulated genes, Edges = number of edges, Priors (%) = percentage of edges that were prior pairs, and Top three GO terms = the three most significant Gene Ontology terms found by GOstats [33].
Figure 2. Heart failure result networks. Network comprised of prior pairs within the main module. Red node color means upregulated in aorta banding vs. sham, green color downregulated. Red edges depict known protein-protein interactions, green edges transcription factor bindings, and blue edges protein sequence homologies. Text on protein-protein interaction edges denotes type of interaction (MI), lowest pmid reuse (lpr) and number of publications (np); text on transcription factor binding edges denotes prediction score; and text on protein homology edges denotes sequence similarity.
Summarizing the inferred melanoma cancer clusters
| Cluster | Size | Upreg. (%) | Edges | Priors (%) | Top three GO terms |
|---|---|---|---|---|---|
| 1 | 130 | 2.3 | 16770 | 1.3 | Epidermis development, cornified envelope, keratinization |
| 2 | 25 | 4.0 | 600 | 0.0 | Androgen biosynthetic process, extracellular region, desmosome |
| 3 | 28 | 3.6 | 756 | 1.7 | Forebrain morphogenesis, response to vitamin A, negative regulation of neuron maturation |
| 4 | 25 | 0.0 | 600 | 1.2 | Keratinization, peptide cross-linking, desmosome |
| 5 | 17 | 52.9 | 272 | 1.8 | Testosterone 16-alpha-hydroxylase activity, negative regulation of Rho GTPase activity, negative regulation of epidermal growth factor-activated receptor activity |
| 6 | 13 | 61.5 | 156 | 1.9 | Alcohol sulfotransferase activity, CDP-diacylglycerol biosynthetic process, embryonic hindgut morphogenesis |
| 7 | 13 | 53.8 | 156 | 1.3 | Regulation of mesonephros development, negative regulation of cell proliferation involved in mesonephros development, negative regulation of fibroblast growth factor receptor signaling pathway involved in ureteric bud formation |
| 8 | 12 | 0.0 | 132 | 0.0 | Glutamate dehydrogenase (NAD+) activity, glutamate dehydrogenase [NAD(P)+] activity, negative regulation of myelination |
| 9 | 12 | 16.7 | 132 | 1.5 | Serine-type endopeptidase activity, serine hydrolase activity, peptidase activity, acting on L-amino acid peptides |
| 10 | 10 | 50.0 | 90 | 0.0 | Adherens junction organization, interleukin-1 Type I receptor binding, dihydrotestosterone 17-beta-dehydrogenase activity |
| 11 | 10 | 0.0 | 90 | 1.1 | N-acylglucosamine 2-epimerase activity, osteoclast proliferation, alkaloid catabolic process |
| 12 | 10 | 30.0 | 90 | 0.0 | Middle ear morphogenesis, vagus nerve morphogenesis, activation of phospholipase D activity by G-protein coupled receptor protein signaling pathway |
| 13 | 11 | 9.1 | 110 | 0.9 | Scavenger receptor activity, negative regulation of collateral sprouting of intact axon in response to injury, regulation of cell adhesion |
| 14 | 11 | 0.0 | 110 | 0.9 | Positive regulation of cell development, regulation of cell morphogenesis involved in differentiation, argininosuccinate synthase activity |
| 15 | 12 | 16.7 | 132 | 0.0 | Pharynx development, cellular response to external biotic stimulus, nucleus |
| 16 | 9 | 0.0 | 72 | 1.4 | Alpha-dystroglycan binding, isopeptide cross-linking via N6-(L-isoglutamyl)-L-lysine, positive regulation of arachidonic acid secretion |
| 17 | 7 | 100.0 | 42 | 14.3 | Vascular transport, interleukin-1 Type II blocking receptor activity, Golgi cis cisterna |
| 18 | 6 | 66.7 | 30 | 3.3 | NLRP1 inflammasome complex, 17-alpha,20-alpha-dihydroxypregn-4-en-3-one dehydrogenase activity, androsterone dehydrogenase (B-specific) activity |
Size = number of genes in each cluster, Upreg. (%) = percentage of upregulated genes, Edges = number of edges, Priors (%) = percentage of edges that were prior pairs, and Top three GO terms = the three most significant Gene Ontology terms found by GOstats [33].
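The Edges values in both cluster tables are consistent with counting every ordered pair of genes within a cluster, that is Size × (Size − 1); for example, 315 × 314 = 98910. The following Python sketch reproduces that arithmetic. The edge-counting rule is inferred from the tabulated numbers rather than stated in the text, and counting each prior pair once in the Priors (%) numerator is likewise an assumption; `summarize_cluster` is a hypothetical helper name.

```python
def summarize_cluster(genes, prior_pairs):
    """Summarize one cluster: size, edge count, and percentage of prior pairs.

    Assumptions (inferred from the tables, not stated by the authors):
    edges = size * (size - 1), and each prior pair is counted once.
    """
    members = set(genes)
    n_edges = len(members) * (len(members) - 1)
    within = {
        frozenset(pair)
        for pair in prior_pairs
        if pair[0] != pair[1] and pair[0] in members and pair[1] in members
    }
    priors_pct = 100.0 * len(within) / n_edges if n_edges else 0.0
    return {"size": len(members), "edges": n_edges, "priors_pct": priors_pct}


# A six-gene cluster containing one prior pair (gene names are placeholders).
example = summarize_cluster(
    ["g1", "g2", "g3", "g4", "g5", "g6"],
    prior_pairs=[("g1", "g2"), ("g1", "outside_gene")],
)
print(example["edges"], round(example["priors_pct"], 1))
```

Prior pairs with a member outside the cluster are ignored, since only within-cluster edges enter the percentage.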
Figure 3. Melanoma cancer result networks. Prior pairs within the largest module. Red node color means upregulated in metastatic melanoma, green color downregulated. Red edges illustrate known protein-protein interactions, green edges transcription factor bindings, and blue edges protein sequence homologies. Text on protein-protein interaction edges denotes type of interaction (MI), lowest pmid reuse (lpr) and number of publications (np); text on transcription factor binding edges denotes prediction score; and text on protein homology edges denotes sequence similarity.
Comparing performance of our method (MCIP) with and without the use of priors to k-means clustering
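| Method | Heart failure: Sens. | Heart failure: Spec. | Heart failure: PPV | Heart failure: AUC | Heart failure: K | Melanoma: Sens. | Melanoma: Spec. | Melanoma: PPV | Melanoma: AUC | Melanoma: K |
|---|---|---|---|---|---|---|---|---|---|---|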
| MCIP with priors | 0.62 | 0.68 | 0.023 | 0.65 | 2 | 0.67 | 0.6 | 0.030 | 0.64 | 18 |
| MCIP without priors | 0.61 | 0.64 | 0.020 | 0.62 | 3 | 0.68 | 0.53 | 0.024 | 0.60 | 20 |
| Kmeans | 0.52 | 0.65 | 0.019 | 0.59 | 3 | 0.70 | 0.48 | 0.021 | 0.58 | 10 |
The performances are obtained by evaluating the resulting clusters against a literature-based reference network comprising pairs of genes co-cited in PubMed articles with the Medical Subject Headings (MeSH) left ventricular hypertrophy (for the heart failure data) and melanoma cancer (for the melanoma cancer data). Sens. = sensitivity, Spec. = specificity, PPV = positive predictive value, AUC = area under the receiver operating characteristic curve, and K = number of clusters, which is found automatically by our method and by using the Gap index for k-means clustering.
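One plausible way to score a clustering against such a reference network is to treat every gene pair as a binary prediction: "same cluster" is a positive call, and "co-cited in the reference" is the ground truth. The Python sketch below follows that reading. It is an assumption about the evaluation, not a description of the authors' code, and `pairwise_performance` is a hypothetical name. The tabulated AUC values lie close to the mean of sensitivity and specificity, which is what a single operating point would give, so the sketch reports that quantity as well; whether the authors computed AUC this way is not stated here.

```python
from itertools import combinations


def pairwise_performance(clusters, reference_pairs):
    """Score a clustering against a reference network of gene pairs.

    clusters: dict mapping gene -> cluster id.
    reference_pairs: iterable of (gene, gene) pairs, treated as undirected.
    """
    reference = {frozenset(pair) for pair in reference_pairs}
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(clusters), 2):
        same_cluster = clusters[a] == clusters[b]
        linked = frozenset((a, b)) in reference
        if same_cluster and linked:
            tp += 1
        elif same_cluster:
            fp += 1
        elif linked:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "ppv": ratio(tp, tp + fp),
        "single_point_auc": (sens + spec) / 2.0,  # assumption, see lead-in
    }


# Toy example with placeholder gene names.
toy = pairwise_performance(
    {"a": 0, "b": 0, "c": 1, "d": 1},
    reference_pairs=[("a", "b"), ("a", "c")],
)
print(toy)
```

Because reference pairs are usually a small fraction of all gene pairs, PPV computed this way tends to be low even when sensitivity and specificity are moderate, which matches the pattern in the table.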