Abstract
BACKGROUND: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To identify the true disease genes among a set of candidates, computational methods have been proposed that operate on various types of genomic data. Because any single genomic data source is prone to bias, incompleteness, and noise, integrating different genomic data sources is needed for reliable disease gene identification.
Year: 2012 PMID: 23282070 PMCID: PMC3521411 DOI: 10.1186/1471-2164-13-S7-S27
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The procedure of generating a KNN graph with K = 2. Each column of the similarity matrix holds the similarities between one data point and the others; the two shaded units in a column mark the 2 nearest neighbors of that data point. The adjacency matrix is derived from the similarity matrix.
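The construction described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `knn_adjacency`, the toy similarity values, and the optional symmetrization step (an edge is kept if either endpoint selects the other) are assumptions made for the example.

```python
def knn_adjacency(sim, k=2, symmetric=True):
    """Derive a 0/1 adjacency matrix from a square similarity matrix.

    Column j of `sim` holds the similarity of point j to every other point.
    The k most similar points in that column become neighbours of j.
    """
    n = len(sim)
    adj = [[0] * n for _ in range(n)]
    for j in range(n):
        # Rank every other point by its similarity to point j (self excluded).
        others = [i for i in range(n) if i != j]
        others.sort(key=lambda i: sim[i][j], reverse=True)
        for i in others[:k]:
            adj[i][j] = 1
    if symmetric:
        # Assumption: treat the graph as undirected by keeping an edge
        # whenever either endpoint picked the other as a neighbour.
        for i in range(n):
            for j in range(n):
                if adj[i][j] or adj[j][i]:
                    adj[i][j] = adj[j][i] = 1
    return adj


# Toy 4-point similarity matrix (illustrative values only).
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
toy_adj = knn_adjacency(toy_sim, k=2)
```

Passing `symmetric=False` keeps the directed, column-wise selection, which may be closer to what the figure depicts.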
Figure 2. Construction of a multigraph by merging. Three kinds of links denote three kinds of relationships obtained from different data sources.
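One way to picture the merge in Figure 2 is to keep, for every gene pair, the set of data sources that link it, so that parallel links from different sources are preserved rather than collapsed. The sketch below assumes this representation; the function name `merge_into_multigraph` and the gene labels `g1` to `g3` are hypothetical. The assumption that parallel links are retained is consistent with the overview table, where the merged network's edge count (307,823) equals the sum of the four source networks' edge counts.

```python
from collections import defaultdict


def merge_into_multigraph(networks):
    """Merge per-source edge lists into one multigraph.

    `networks` maps a source label to a list of (gene, gene) pairs.
    The result maps each unordered gene pair to the set of sources
    that contribute a link between the two genes.
    """
    multigraph = defaultdict(set)
    for source, edges in networks.items():
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops in this sketch
            multigraph[frozenset((u, v))].add(source)
    return dict(multigraph)


# Hypothetical toy input: two sources, three placeholder genes.
toy_networks = {
    "PPI": [("g1", "g2"), ("g2", "g3")],
    "GO-BP": [("g2", "g1")],
}
toy_multigraph = merge_into_multigraph(toy_networks)

# Count every (pair, source) combination as one link of the multigraph.
toy_link_count = sum(len(sources) for sources in toy_multigraph.values())
toy_nodes = set().union(*toy_multigraph.keys())
```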
Figure 3. A Complex Heterogeneous Network (CHN) is constructed by connecting a phenotype network to a multigraph gene network. Nodes that connect the two networks are called bridging nodes; all other nodes are called internal nodes.
Overview of Four Gene Networks and the Merged Gene Network
| Data Source | Nodes | Edges |
|---|---|---|
| PPI Network | 9,474 | 36,619 |
| GO-BP | 9,740 | 74,184 |
| GO-CC | 7,400 | 64,709 |
| GO-MF | 10,661 | 132,311 |
| Merged Network | 14,529 | 307,823 |
Figure 4. The procedure of leave-one-out cross-validation.
Figure 5. ROC curves of RWRM and four RABI models on the benchmark dataset of 36 diseases.
Performance of five different integration models on 36 diseases in terms of overall AUC values (%)
| Integration Models | Overall AUC (%) |
|---|---|
| RWRM | 89.4 |
| RWR+DRS | 87.7 |
| RWR+NDOS | 85.1 |
| 1CSVM+DRS | 85.4 |
| 1CSVM+NDOS | 84.4 |
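For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen true disease gene is scored above a randomly chosen non-disease candidate. The sketch below computes it by direct pairwise comparison. It is a generic illustration with made-up scores, not the evaluation protocol used to produce the table above, and the function name `auc` is an assumption.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores: two true genes, three other candidates.
toy_auc = auc([0.9, 0.4], [0.5, 0.3, 0.1])
```

Multiplying the result by 100 gives a percentage comparable in form to the values reported in the table.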
Figure 6. The impact of K. This figure shows the performance of the single networks from BP, CC, MF and PPI, constructed with various K values, and the performance of three integration models.
Top five ranked candidate genes for each phenotype of IDDM
| SN | MIM ID | Locus | Number of Candidate Genes | Top 5 Candidate Genes |
|---|---|---|---|---|
| 1 | 600318 | 15q26 | 72 | BLM, IGF1R, FURIN, RHCG, |
| 2 | 600319 | 11q13 | 263 | PPME1, RELA, STIP1, GSTP1, MEN1 |
| 3 | 601941 | 18q21 | 88 | BCL2, TXNL1, MYO5B, SMAD4, TNFRSF11A |
| 4 | 600321 | 2q31 | 81 | |
| 5 | 600883 | 6q25-q27 | 38 | ESR1, PLG, TBP, VIL2, IGF2R |
| 6 | 601208 | 14q24.3-q31 | 85 | TSHR, FOS, EIF2B2, NEK9, PTPN21 |
| 7 | 601318 | 2q34 | 19 | IDH1, ERBB4, PIP5K3, LANCL1, PTHR2 |
| 8 | 601666 | 6q21 | 51 | FYN, CDC40, ATG5, CD164, TUBE1 |
| 9 | 603266 | 10q25 | 55 | |
| 10 | 605598 | 5q31.1-q33.1 | 241 | SPINK1, |
| 11 | 612520 | 12q24 | 216 | SH2B3, TCF1, OAS1, PTPN11, OAS2 |
| 12 | 612521 | 6q25 | 70 | ESR1, SYNE1, VIL2, LATS1, OPRM1 |
| 13 | 612622 | 4q27 | 17 | CCNA2, IL2, MAD2L1, FGF2, TRPC3 |
| 14 | 613006 | 10q23.31 | 26 | PTEN, ACTA2, PANK1, ANKRD1, MPHOSPH1 |