Tsutomu Matsunaga, Chikara Yonemori, Etsuji Tomita, Masaaki Muramatsu.
Abstract
BACKGROUND: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely connected subgraphs, which are modeled as cliques, in a biomedical relational graph.
Year: 2009 PMID: 19566964 PMCID: PMC2721841 DOI: 10.1186/1471-2105-10-205
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of a biomedical relational graph. Hypertension and hypertension-related genes are represented by nodes, and the associations between them are represented by edges.
All maximal cliques obtained from the graph in Figure 1
| Maximal clique | Size |
| {CMA1, AGT, ACE} | 3 |
| {Hypertension, AGT, ACE, AGTR1} | 4 |
| {Hypertension, AGT, ACE, REN} | 4 |
| {Hypertension, AGT, PRCP, REN} | 4 |
| {Hypertension, CYP11B2, CYP11B1, HSD11B2} | 4 |
| {Hypertension, GNB3} | 2 |
| {Hypertension, PNMT} | 2 |
| {Hypertension, SCNN1B, SCNN1G} | 3 |
| {SAH, SCNN1G} | 2 |
| {CYP17, CYP11B2, CYP11B1, HSD11B2} | 4 |
| {AGTR2, AGT, AGTR1} | 3 |
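The table above can be reproduced with a textbook maximal-clique enumeration. The sketch below uses Bron-Kerbosch with pivoting, which is not necessarily the authors' own implementation. The edge set of Figure 1 is not given in this record, so it is reconstructed here as the union of the edges implied by the listed cliques (every edge of a graph lies in at least one maximal clique, so this recovers the graph apart from any isolated nodes). The names `LISTED`, `build_adjacency`, and `maximal_cliques` are illustrative.

```python
from itertools import combinations

# Maximal cliques exactly as listed in the table above.
LISTED = [
    {"CMA1", "AGT", "ACE"},
    {"Hypertension", "AGT", "ACE", "AGTR1"},
    {"Hypertension", "AGT", "ACE", "REN"},
    {"Hypertension", "AGT", "PRCP", "REN"},
    {"Hypertension", "CYP11B2", "CYP11B1", "HSD11B2"},
    {"Hypertension", "GNB3"},
    {"Hypertension", "PNMT"},
    {"Hypertension", "SCNN1B", "SCNN1G"},
    {"SAH", "SCNN1G"},
    {"CYP17", "CYP11B2", "CYP11B1", "HSD11B2"},
    {"AGTR2", "AGT", "AGTR1"},
]


def build_adjacency(cliques):
    """Assumption: the graph's edges are the union of all clique edges."""
    adj = {}
    for clique in cliques:
        for u in clique:
            adj.setdefault(u, set())
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; returns a list of frozensets."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices.
        if not p and not x:
            found.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


adj = build_adjacency(LISTED)
cliques = maximal_cliques(adj)
for c in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
    print(len(c), sorted(c))
```

Run on the reconstructed graph, the enumeration returns the same 11 maximal cliques as the table, which also confirms that the listed cliques are mutually consistent.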
Structural properties of the graph used in the experiment
| Number of nodes | 13,722 |
| Number of edges | 35,749 |
| Average degree | 5.23 |
| Characteristic path length | 4.99 |
| Clustering coefficient | 0.27 |
The 20 most frequent genes in the maximal cliques.
| Rank | Gene | Degree | Total | Size 2 | Size 3 | Size 4 | Size 5 | Size 6 | Size 7 | Size 8–12 |
| 1 | TNF | 176 | 244 | 19 | 89 | 64 | 44 | 20 | 8 | 0 |
| 2 | TP53 | 182 | 220 | 31 | 118 | 52 | 17 | 1 | 1 | 0 |
| 3 | NFKB1 | 153 | 217 | 15 | 57 | 53 | 53 | 25 | 9 | 5 |
| 4 | TGFB1 | 123 | 136 | 35 | 48 | 38 | 8 | 7 | 0 | 0 |
| 5 | IFNG | 128 | 115 | 19 | 34 | 33 | 21 | 7 | 1 | 0 |
| 6 | TNFRSF1A | 70 | 115 | 3 | 20 | 39 | 37 | 12 | 4 | 0 |
| 7 | RB1 | 103 | 112 | 17 | 48 | 32 | 11 | 4 | 0 | 0 |
| 8 | HD | 102 | 105 | 20 | 57 | 19 | 3 | 6 | 0 | 0 |
| 9 | MYC | 111 | 104 | 26 | 62 | 9 | 6 | 1 | 0 | 0 |
| 10 | HRAS | 103 | 103 | 26 | 50 | 26 | 0 | 1 | 0 | 0 |
| 11 | BRCA1 | 98 | 97 | 20 | 37 | 29 | 8 | 2 | 0 | 1 |
| 12 | APC | 84 | 94 | 13 | 53 | 24 | 4 | 0 | 0 | 0 |
| 13 | MAPK8 | 67 | 91 | 3 | 25 | 41 | 16 | 5 | 1 | 0 |
| 14 | EGF | 97 | 85 | 32 | 38 | 12 | 3 | 0 | 0 | 0 |
| 15 | TNFRSF6 | 66 | 81 | 7 | 24 | 28 | 22 | 0 | 0 | 0 |
| 16 | MEN1 | 69 | 74 | 21 | 34 | 17 | 2 | 0 | 0 | 0 |
| 17 | GH1 | 73 | 73 | 8 | 25 | 29 | 9 | 2 | 0 | 0 |
| 18 | NF1 | 76 | 73 | 19 | 39 | 13 | 1 | 1 | 0 | 0 |
| 19 | TAF1 | 46 | 71 | 4 | 9 | 6 | 27 | 13 | 11 | 1 |
| 20 | IL1B | 72 | 70 | 13 | 32 | 18 | 7 | 0 | 0 | 0 |
Also listed for each gene are its degree and the number of times it was found in cliques of each size.
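A tally of this kind can be sketched as follows. The full set of cliques from the experiment is not part of this record, so the example runs on the small set of maximal cliques listed for Figure 1; `FIGURE1_CLIQUES` and `clique_membership` are illustrative names, not the authors' code.

```python
from collections import Counter, defaultdict

# Maximal cliques from the Figure 1 table (used here only as sample data).
FIGURE1_CLIQUES = [
    {"CMA1", "AGT", "ACE"},
    {"Hypertension", "AGT", "ACE", "AGTR1"},
    {"Hypertension", "AGT", "ACE", "REN"},
    {"Hypertension", "AGT", "PRCP", "REN"},
    {"Hypertension", "CYP11B2", "CYP11B1", "HSD11B2"},
    {"Hypertension", "GNB3"},
    {"Hypertension", "PNMT"},
    {"Hypertension", "SCNN1B", "SCNN1G"},
    {"SAH", "SCNN1G"},
    {"CYP17", "CYP11B2", "CYP11B1", "HSD11B2"},
    {"AGTR2", "AGT", "AGTR1"},
]


def clique_membership(cliques):
    """Map each node to a Counter {clique size: number of cliques}."""
    by_size = defaultdict(Counter)
    for clique in cliques:
        for node in clique:
            by_size[node][len(clique)] += 1
    return by_size


counts = clique_membership(FIGURE1_CLIQUES)

# Rank nodes by total number of cliques, breaking ties alphabetically.
ranked = sorted(counts, key=lambda n: (-sum(counts[n].values()), n))
for node in ranked[:3]:
    total = sum(counts[node].values())
    print(node, total, dict(sorted(counts[node].items())))
```

On this sample the three most frequent nodes are Hypertension (7 cliques), AGT (5), and ACE (3).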
Number of cliques corresponding to θ values
| Size \ θ | 1.0 (complete) | 0.96 | 0.92 | 0.88 |
| 14 | - | - | - | 1 |
| 13 | - | - | 5 | 27 |
| 12 | 2 | 2 | 2 | 26 |
| 11 | 1 | 11 | 6 | 31 |
| 10 | 6 | 2 | 1 | 75 |
| 9 | 4 | 11 | 17 | 394 |
| 8 | 11 | 66 | 447 | 452 |
| 7 | 49 | 16 | 53 | 347 |
| 6 | 188 | 188 | 1597 | 227 |
| 5 | 712 | 712 | 168 | 6352 |
| 4 | 2188 | 2188 | 2188 | 385 |
| 3 | 6330 | 6330 | 6330 | 6330 |
| 2 | 10995 | 10995 | 10995 | 10995 |
| Total | 20486 | 20521 | 21809 | 25642 |
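The role of θ can be illustrated with a small density check. This record does not define θ, so the sketch assumes it is a minimum edge density, that is, the fraction of node pairs in a module that are connected, with 1.0 corresponding to a complete clique. The function name `edge_density` and the example module are illustrative; the authors' actual pseudo-clique procedure is not shown here.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of node pairs that are connected (assumed meaning of theta)."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 1.0
    present = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return present / len(pairs)


# Two overlapping maximal cliques from the Figure 1 table.
clique_a = {"Hypertension", "AGT", "ACE", "AGTR1"}
clique_b = {"Hypertension", "AGT", "ACE", "REN"}

# Edges implied by the two cliques; AGTR1-REN is not among them.
edges = {
    frozenset(pair)
    for clique in (clique_a, clique_b)
    for pair in combinations(sorted(clique), 2)
}

merged = clique_a | clique_b          # 5 nodes, 10 possible pairs
density = edge_density(merged, edges)  # 9 of the 10 pairs are present
print(density)

for theta in (1.0, 0.96, 0.92, 0.88):
    print(theta, density >= theta)
```

Under this assumption, the merged five-node set has density 0.9, so it would qualify as a pseudo-clique at θ = 0.88 but not at 0.92 or above.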
The 15 KEGG pathways most relevant to the extracted gene modules
| Rank | KEGG pathway [number of genes] | Gene module size | |
| 1 | hsa00601 Blood Group Glycolipid Biosynthesis – Lact Series [6] | 4 | 0.667 (4/6) |
| 2 | hsa00920 Sulfur Metabolism [7] | 4 | 0.571 (4/7) |
| 3 | hsa00532 Chondroitin/Heparan Sulfate Biosynthesis [8] | 6 | 0.556 (5/9) |
| 4 | hsa00140 C21-Steroid Hormone Metabolism [11] | 7 | 0.500 (6/12) |
| 5 | hsa00040 Pentose and Glucuronate Interconversions [10] | 5 | 0.500 (5/10) |
| 6 | hsa00400 Phenylalanine, Tyrosine and Tryptophan Biosynthesis [8] | 3 | 0.375 (3/8) |
| 7 | hsa00511 Glycoprotein Degradation [8] | 3 | 0.375 (3/8) |
| 8 | hsa03050 Proteasome [14] | 5 | 0.357 (5/14) |
| 9 | hsa03020 RNA Polymerase [15] | 5 | 0.333 (5/15) |
| 10 | hsa00530 Aminosugars Metabolism [9] | 3 | 0.333 (3/9) |
| 11 | hsa00580 Phospholipid Degradation [9] | 3 | 0.333 (3/9) |
| 12 | hsa00062 Fatty Acid Biosynthesis (path 2) [6] | 2 | 0.333 (2/6) |
| 13 | hsa00602 Blood Group Glycolipid Biosynthesis – Neolact Series [6] | 2 | 0.333 (2/6) |
| 14 | hsa00271 Methionine Metabolism [10] | 3 | 0.300 (3/10) |
| 15 | hsa00790 Folate Biosynthesis [10] | 3 | 0.300 (3/10) |
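The header of the last column is missing in this record, but its values are consistent with the number of genes shared by the module and the pathway divided by the number of genes in their union (a Jaccard coefficient). For rank 3, for example, a pathway of 8 genes and a module of 6 genes sharing 5 genes gives 5/9. The sketch below is an inference from the numbers, not a definition taken from the source, and the gene identifiers are hypothetical placeholders.

```python
def jaccard(module, pathway):
    """Shared genes divided by genes in the union of the two sets."""
    module, pathway = set(module), set(pathway)
    union = module | pathway
    return len(module & pathway) / len(union) if union else 0.0


# Hypothetical placeholder IDs sized like rank 3 in the table:
# pathway of 8 genes, module of 6 genes, 5 genes in common.
pathway = {f"p{i}" for i in range(8)}
module = {f"p{i}" for i in range(5)} | {"m0"}

score = jaccard(module, pathway)
shared = len(module & pathway)
union_size = len(module | pathway)
print(f"{score:.3f} ({shared}/{union_size})")
```

With these sizes the sketch prints `0.556 (5/9)`, matching the entry for rank 3.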
The captured gene pairs in the protein-protein interaction database
| Rank | Number of times | Gene pair |
| 1 | 70 | AKT1 - CHUK |
| 2 | 38 | ACVRL1 - TGFB1 |
| 3 | 24 | AGRP - MC3R |
| 4 | 21 | BAK1 - BAX |
| 5 | 17 | CASP10 - FADD |
| 6 | 15 | BAX - BCL2L1 |
| 6 | 15 | BMPR1A - BMPR1B |
| 8 | 14 | ACVRL1 - TGFBR2 |
| 8 | 14 | CASP10 - CFLAR |
| 8 | 14 | CASP8 - FADD |
| 11 | 13 | APC - AXIN2 |
| 12 | 12 | BAK1 - BCL2L1 |
| 13 | 11 | A2M - APOE |
| 13 | 11 | ABL1 - BCR |
| 13 | 11 | APAF1 - CASP9 |
| 13 | 11 | ARNTL - CLOCK |
| 13 | 11 | CASP8 - CASP10 |
Typical large gene modules computationally extracted as pseudo-cliques
| Gene module | Attribute |
| {PPBP, SCYB6, GRO2, GRO3, IL8, SCYB10, IFNG, GRO1, PF4, SCYB5, MIG, SCYB11} | Family |
| {NFKBIA, NFKB1, NFKB2, RELA, REL, CHUK, MAP3K7, IKBKB, NFKBIB, MAP3K14, RELB} | Family & Complex |
| {RFC4, RFC1, BRCA1, MSH2, MLH1, APC, RFC2, MSH6, MRE11A, BLM} | Complex |
| {POLR2A, GTF2E1, GTF2B, GTF2F1, GTF2H1, TAF1, TAF10, GTF2A2, GTF2A1} | Complex |
| {TNFRSF5, NFKB1, TNF, TNFRSF1A, TNFRSF1B, CHUK, TRAF2, MAP3K14} | Pathway |
The 10 most frequent genes in the 188 extracted modules associated with the metabolic syndrome
| Rank | Gene | Total | Size 2 | Size 3 | Size 4 | Size 5 | Size 6 | Size 7 |
| 1 | INS | 29 | 2 | 6 | 2 | 18 | 1 | 0 |
| 2 | LEP | 27 | 2 | 6 | 4 | 12 | 3 | 0 |
| 3 | POMC | 16 | 1 | 0 | 3 | 10 | 2 | 0 |
| 4 | PCSK1 | 13 | 0 | 1 | 2 | 9 | 1 | 0 |
| 5 | IRS2 | 12 | 0 | 0 | 0 | 11 | 1 | 0 |
| 5 | IGF1 | 12 | 0 | 1 | 0 | 10 | 1 | 0 |
| 5 | INSR | 12 | 3 | 4 | 1 | 4 | 0 | 0 |
| 8 | IRS1 | 11 | 0 | 0 | 1 | 9 | 1 | 0 |
| 9 | MC4R | 10 | 0 | 0 | 1 | 7 | 2 | 0 |
| 10 | FGF1 | 9 | 0 | 0 | 1 | 4 | 2 | 2 |
Figure 2. 188 modules associated with obesity, diabetes, hyperlipidemia, and hypertension. The vertical bars indicate the genes/diseases in the modules; the columns represent modules, and the rows represent the genes/diseases. The rows and columns are sorted in ascending order of the score calculated by correspondence analysis (see Methods). Red, grey, blue, and green bars indicate cliques that contain the hypertension, hyperlipidemia, obesity, and diabetes nodes, respectively. The letters 't,' 'l,' 'o,' and 'd' on the right show that, in the literature, the genes are related to hypertension, hyperlipidemia, obesity, and diabetes, respectively (see text for the literature references).