Anthony M L Liekens, Jeroen De Knijf, Walter Daelemans, Bart Goethals, Peter De Rijk, Jurgen Del-Favero.
Abstract
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be.
Year: 2011 PMID: 21696594 PMCID: PMC3218845 DOI: 10.1186/gb-2011-12-6-r57
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Schematic representation of the data integration and data mining methodology. (a) Public databases with heterogeneous biomedical relations are integrated into a common network. (b) Illustratively, genes (green circles), diseases (red boxes) and protein domains (blue diamonds) are related through gene-disease associations, gene-gene interactions and gene-domain annotations and integrated into a unified graph. (c) The a priori accessibility of each concept is computed by performing stochastic random walks to detect highly connected hubs in the network (area of a node scales with its rank score). (d) The a posteriori rank of each concept with respect to a source concept, in this case disease A, is computed by performing random walks with restarts in the source. (e) The posterior probabilities are adjusted using the prior probabilities to score the importance of each concept, specific to the source target (area of node scales with log of rank score). Genes (green circles) are ranked according to this score, gene 1 being most specific to disease A and gene 8 least specific.
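The scoring steps in panels (c) to (e) can be illustrated with a minimal sketch. It assumes an unweighted, undirected toy graph, a restart probability of 0.15, and a posterior-to-prior ratio as the adjustment; none of these choices is stated in the caption, and the node names and the functions `build_graph`, `random_walk_scores` and `specificity` are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def random_walk_scores(graph, restart=None, restart_prob=0.15, iterations=200):
    """Approximate stationary visit probabilities by power iteration.

    With restart=None the walker jumps to a uniformly random node (prior);
    otherwise it always jumps back to the given source node (posterior).
    """
    nodes = list(graph)
    if restart is None:
        jump = {n: 1.0 / len(nodes) for n in nodes}
    else:
        jump = {n: 1.0 if n == restart else 0.0 for n in nodes}
    scores = dict(jump)
    for _ in range(iterations):
        nxt = {n: restart_prob * jump[n] for n in nodes}
        for n in nodes:
            # Spread the remaining probability mass evenly over neighbors.
            share = (1.0 - restart_prob) * scores[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        scores = nxt
    return scores


def specificity(graph, source, restart_prob=0.15):
    """Posterior divided by prior: an assumed form of the adjustment."""
    prior = random_walk_scores(graph, None, restart_prob)
    posterior = random_walk_scores(graph, source, restart_prob)
    return {n: posterior[n] / prior[n] for n in graph}


# Hypothetical toy network: a chain leading away from the source disease.
edges = [
    ("disease_A", "gene_1"),
    ("gene_1", "gene_2"),
    ("gene_2", "gene_3"),
    ("gene_3", "gene_4"),
]
graph = build_graph(edges)
scores = specificity(graph, "disease_A")
ranked_genes = sorted(
    (n for n in graph if n.startswith("gene_")), key=scores.get, reverse=True
)
print(ranked_genes)
```

Dividing by the prior is one simple way to discount nodes that are visited often regardless of the source; other adjustments would fit the caption equally well.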
Integrated databases
| Database | Concept 1 | Relation | Concept 2 | Literature references | Number of relations |
|---|---|---|---|---|---|
| BioGRID | Gene/protein | PPI | Gene/protein | Yes | 29,566 |
| CTD | Compound | Association | Gene/protein | Yes | 62,336 |
| CTD | Compound | Association | Disease | Yes | 5,438 |
| CTD | Gene/protein | Association | Disease | Yes | 8,123 |
| DIP | Gene/protein | PPI | Gene/protein | Yes | 1,524 |
| GOA | Gene/protein | Annotation | Gene Ontology term | No | 26,949 |
| HPRD | Gene/protein | PPI | Gene/protein | Yes | 149,036 |
| IntAct | Gene/protein | PPI | Gene/protein | Yes | 37,258 |
| InterPro | Gene/protein | Contains | Protein domain/repeat/region | No | 26,652 |
| InterPro | Gene/protein | Is member of | Gene family | No | 22,988 |
| InterPro | Gene/gene family/protein domain/repeat/region | Annotation | Gene Ontology term | No | 18,446 |
| KEGG | Gene/protein | Is part of | Pathway | No | 14,100 |
| KEGG | Gene/protein | Has metabolite | Compound | No | 19,073 |
| MeSH | Disease | Belongs to | Disease (family) | No | 21,282 |
| MINT | Gene/protein | PPI | Gene/protein | Yes | 11,389 |
| miR2Disease | MicroRNA | Targets | Gene | Yes | 2,615 |
| miR2Disease | MicroRNA | Association | Disease | Yes | 344 |
| NetworKIN | Gene/protein | Phosphorylates | Gene/protein | No | 2,811 |
| OMIM Morbid Map | Gene/protein | Association | Disease | Yes | 6,199 |
| OMIM | Disease | Is related to | Disease | Yes | 2,467 |
| TarBase | MicroRNA | Targets | Gene | No | 858 |
Overview of the 21 publicly available curated databases used to create BioGraph's heterogeneous knowledge base. Specific concept types were extracted from the various databases and integrated into a central graph. Note that these represent relations selected for Homo sapiens only. OMIM's disease-disease relations have been added after the data freeze of March 2010. CTD, Comparative Toxicogenomics Database; DIP, Database of Interacting Proteins; GOA, Gene Ontology Annotations; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MeSH, Medical Subject Headings; MINT, Molecular Interactions Database; OMIM, Online Mendelian Inheritance in Man; PPI, protein-protein interaction.
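As a rough illustration of how relations from several databases might be merged into one graph, the sketch below keeps typed concepts as nodes and records the relation type and originating database on each edge. The record layout, the `integrate` function and the placeholder identifiers (`gene_1`, `disease_A`) are assumptions for illustration only, not BioGraph's actual data model.

```python
from collections import defaultdict


def integrate(relations):
    """Merge (database, concept1, relation, concept2) records into one graph.

    Each concept is a (type, identifier) pair. Every edge keeps a list of
    (relation, database) annotations so its provenance is not lost.
    """
    graph = defaultdict(dict)
    for source_db, concept1, relation, concept2 in relations:
        graph[concept1].setdefault(concept2, []).append((relation, source_db))
        graph[concept2].setdefault(concept1, []).append((relation, source_db))
    return dict(graph)


# Hypothetical records; identifiers are placeholders, not real database entries.
relations = [
    ("BioGRID", ("gene", "gene_1"), "PPI", ("gene", "gene_2")),
    ("IntAct", ("gene", "gene_1"), "PPI", ("gene", "gene_2")),
    ("OMIM Morbid Map", ("gene", "gene_1"), "Association", ("disease", "disease_A")),
]
network = integrate(relations)
for neighbor, annotations in network[("gene", "gene_1")].items():
    print(neighbor, annotations)
```

Storing annotations per edge means that the same interaction reported by two databases stays a single link with two sources rather than two parallel links.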
Figure 2. Schematic representation of the backtracking heuristic to find most probable paths from a source concept s to a target concept t. (a) Assume a network with source s and target t concepts. For clarity, the nodes are ordered by their accessibility from s (leftmost nodes are most accessible, rightmost nodes least accessible). (b) As a first step in the backtracking process, we find the neighbors of the target t, leading in the direction of the source, that is, the neighbors of t with highest accessibility with respect to s. (c) The paths from the target are repeatedly expanded to include highly accessible nodes leading toward the source concept. Pruning of least probable paths keeps the growing set of paths to a workable size (not shown). (d) Most probable paths that arrive in the source (continuous lines) are considered as functional hypotheses linking the target to the source concept. Unfinished paths (dashed paths) continue being expanded until k paths between s and t have been found.
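The backtracking idea in this caption resembles a beam search that starts at the target and grows paths toward the source. The following is a sketch under stated assumptions: a path is scored by the product of the accessibility values of its nodes, pruning keeps a fixed number of partial paths (`beam_width`), and the graph and accessibility values are hypothetical. The caption does not specify the actual scoring or pruning rule.

```python
import heapq


def backtrack_paths(graph, accessibility, source, target, k=2,
                    beam_width=10, max_steps=20):
    """Grow paths from target toward source, preferring accessible nodes.

    Returns up to k paths, each listed from source to target.
    """
    finished = []
    frontier = [(accessibility[target], [target])]
    for _ in range(max_steps):
        if len(finished) >= k or not frontier:
            break
        candidates = []
        for score, path in frontier:
            for nbr in graph[path[-1]]:
                if nbr in path:  # never revisit a node on the same path
                    continue
                new_path = path + [nbr]
                new_score = score * accessibility[nbr]
                if nbr == source:
                    finished.append((new_score, new_path))
                else:
                    candidates.append((new_score, new_path))
        # Pruning step: keep only the most probable unfinished paths.
        frontier = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    best = heapq.nlargest(k, finished, key=lambda c: c[0])
    return [list(reversed(path)) for _, path in best]


# Hypothetical network and accessibility values with respect to source "s".
graph = {
    "s": ["a", "b"],
    "a": ["s", "t"],
    "b": ["s", "c"],
    "c": ["b", "t"],
    "t": ["a", "c"],
}
accessibility = {"s": 0.4, "a": 0.3, "b": 0.15, "c": 0.1, "t": 0.05}
for path in backtrack_paths(graph, accessibility, "s", "t", k=2):
    print(" -> ".join(path))
```

Because every accessibility value is below 1, the product score also penalizes long paths, so short chains through well-connected intermediates tend to surface first.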
Top inferred genes for schizophrenia
| Number | Gene | Prioritization hypothesis | SZ association studies |
|---|---|---|---|
| 1 | | Affected by the antipsychotics aripiprazole and risperidone, neuroactive ligand-receptor interaction, associated with autistic disorder | No association studies. Associated with autistic disorder |
| 2 | | Target of mir-20b | No association studies |
| 3 | | Related to | Positive association |
| 4 | | Related to | Positive association |
| 5 | | Target of mir-29*, related to | Positive association |
| 6 | | Target of mir-29*, related to | No association studies |
| 7 | | Target of mir-206 | No association studies |
| 8 | | Related to | No association found |
| 9 | | Target of mir-20b, involved in CNS development | No association studies |
| 10 | | Target of mir-346 | No association studies |
| 11 | | Interacts with | No association studies |
| 12 | | Myelin sheath, interacts with | Weak positive association |
| 13 | | Target of mir-29*, Alzheimer's disease | No association studies. Schizophrenia-like phenotypes in |
| 14 | | Target of mir-20b | No association studies |
| 15 | | Target of mir-206, axonal and synaptic transmission | No association studies. Down-regulated in psychosis |
| 16 | | Interacts with | Positive association |
| 17 | | Related to | No association studies. Associated with epilepsy |
| 18 | | Interacts with | No association studies |
| 19 | | Interacts with | No association studies |
| 20 | | Interacts with | No association studies. Associated with essential tremor and Parkinson's disease |
BioGraph top inferred genes for schizophrenia that are not known as direct relations in the integrated network. Prioritizations are based on a data freeze of September 2009 to retrospectively verify predictions in more recent literature. CNS, central nervous system.
Figure 3. ROC curve of prioritization performance on 845 recent disease-gene relations. The performance of BioGraph prioritizations is 86.14%, confirming the relations recently added to the resource databases but not present in the integrated database. The diagonal dashed line represents a theoretical random algorithm.
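A single percentage for a ROC curve is commonly the area under the curve (AUC); the caption does not say so explicitly, so the sketch below treats that reading as an assumption. It computes the AUC as the fraction of positive/negative pairs in which the positive item receives the higher score, with ties counting half. The function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive item is scored
    above a randomly chosen negative item (ties count as one half).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prioritization scores; label 1 marks a confirmed relation.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(example_scores, example_labels))
```

Under this definition a scorer that cannot separate the two classes gets 0.5, which corresponds to the diagonal dashed line in the figure.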
Figure 4. Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of . Solid, dashed and dotted line styles represent the importance of the link in descending order, that is, the probability to visit the relation to reach the target gene concepts while performing random walks from the source schizophrenia concept. All links are grounded in their originating integrated curated knowledge bases, annotated with their semantic meanings and enriched by their references to the literature (not shown).
Figure 5. Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of .