Daniela Nitsch, Joana P Gonçalves, Fabian Ojeda, Bart de Moor, Yves Moreau.
Abstract
BACKGROUND: Discovering novel disease genes is still challenging for diseases for which no prior knowledge, such as known disease genes or disease-related pathways, is available. Genetic studies frequently produce large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.
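The idea of a candidate gene being "surrounded by highly differentially expressed genes" can be illustrated with a minimal sketch. This is not the authors' method (the abstract refers to several machine learning approaches whose details are not given here); it simply scores each candidate by the mean absolute differential expression of its direct network neighbors. The gene names, network, and expression values are hypothetical.

```python
from statistics import mean


def neighborhood_scores(network, diff_expr, candidates):
    """Score each candidate by the mean |differential expression| of its direct neighbors.

    network:    dict mapping a gene to an iterable of neighboring genes (assumed format)
    diff_expr:  dict mapping a gene to a differential expression value, e.g. a log2 ratio
    candidates: genes to be prioritized
    """
    scores = {}
    for gene in candidates:
        neighbors = network.get(gene, ())
        values = [abs(diff_expr[n]) for n in neighbors if n in diff_expr]
        # A candidate with no measured neighbors gets the lowest possible score.
        scores[gene] = mean(values) if values else 0.0
    return scores


def rank_candidates(scores):
    """Return candidates ordered from most to least promising."""
    return sorted(scores, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data: geneA sits in a strongly disrupted neighborhood.
network = {
    "geneA": ["g1", "g2", "g3"],
    "geneB": ["g4", "g5"],
    "geneC": ["g2", "g6"],
}
diff_expr = {"g1": 2.1, "g2": -1.8, "g3": 1.5, "g4": 0.1, "g5": -0.2, "g6": 0.3}

scores = neighborhood_scores(network, diff_expr, ["geneA", "geneB", "geneC"])
ranking = rank_candidates(scores)
print(ranking)
```

Note that the candidate's own expression value is never used, which reflects the premise that the disease gene itself need not be differentially expressed while its network neighborhood is.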
Year: 2010 PMID: 20840752 PMCID: PMC2945940 DOI: 10.1186/1471-2105-11-460
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of prioritization results.
| Expression measure | STRING v. 7.1: top 10 | STRING v. 7.1: top 20 | STRING v. 7.1: AUC | STRING v. 8.2: top 10 | STRING v. 8.2: top 20 | STRING v. 8.2: AUC | | |
|---|---|---|---|---|---|---|---|---|
| log2 ratio | 27 | 31 | 0.859 | 12 | 23 | 0.747 | ||
| sign. log2 ratio | 28 | 31 | 0.880 | 12 | 24 | 0.760 | ||
| test statistic | 29 | 30 | 0.856 | 13 | 22 | 0.738 | ||
| log2 ratio | 23 | 29 | 0.809 | 14 | 26 | 0.759 | ||
| sign. log2 ratio | 27 | 32 | 0.868 | 17 | 25 | 0.817 | ||
| test statistic | 20 | 26 | 0.771 | 15 | 20 | 0.691 | ||
| log2 ratio | 34 | 0.900 | 33 | 0.913 | ||||
| all expression values for | sign. log2 ratio | 31 | 34 | 31 | 35 | |||
| test statistic | 34 | 0.901 | 34 | 0.911 | ||||
| log2 ratio | 27 | 31 | 0.857 | 27 | 29 | 0.851 | ||
| sign. log2 ratio | 28 | 31 | 0.885 | 28 | 31 | 0.873 | ||
| test statistic | 28 | 30 | 0.855 | 28 | 29 | 0.844 | ||
| log2 ratio | 27 | 31 | 0.874 | 12 | 23 | 0.761 | ||
| sign. log2 ratio | 25 | 28 | 0.855 | 11 | 22 | 0.736 | ||
| test statistic | 27 | 31 | 0.863 | 12 | 24 | 0.750 | ||
| log2 ratio | 21 | 28 | 0.769 | 17 | 24 | 0.756 | ||
| sign. log2 ratio | 27 | 31 | 0.835 | 20 | 26 | 0.796 | ||
| test statistic | 19 | 22 | 0.745 | 16 | 23 | 0.744 | ||
| log2 ratio | 31 | 33 | 34 | |||||
| all expression values for | sign. log2 ratio | 28 | 33 | 0.889 | 29 | 34 | 0.907 | |
| test statistic | 33 | 0.895 | 35 | 0.913 | ||||
| log2 ratio | 27 | 32 | 0.875 | 26 | 31 | 0.874 | ||
| sign. log2 ratio | 25 | 29 | 0.860 | 25 | 28 | 0.852 | ||
| test statistic | 27 | 31 | 0.865 | 28 | 31 | 0.862 | ||
| log2 ratio | 23 | 27 | 0.846 | 9 | 20 | 0.743 | ||
| sign. log2 ratio | 25 | 28 | 0.844 | 11 | 22 | 0.729 | ||
| test statistic | 27 | 30 | 0.855 | 12 | 24 | 0.745 | ||
| log2 ratio | 18 | 24 | 0.736 | 17 | 24 | 0.766 | ||
| sign. log2 ratio | 23 | 29 | 0.834 | 17 | 22 | 0.755 | ||
| test statistic | 13 | 18 | 0.790 | 16 | 24 | 0.790 | ||
| log2 ratio | 26 | 32 | 0.877 | 28 | 34 | 0.890 | ||
| all expression values for | sign. log2 ratio | 26 | 31 | 0.877 | 26 | 34 | 0.899 | |
| test statistic | 32 | 34 | ||||||
| log2 ratio | 25 | 27 | 0.849 | 24 | 27 | 0.851 | ||
| sign. log2 ratio | 25 | 29 | 0.853 | 25 | 27 | 0.847 | ||
| test statistic | 26 | 30 | 0.858 | 27 | 29 | 0.850 | ||
The results are based on optimized parameter settings for all presented strategies using different STRING networks. Note that the results of the Simple Expression Ranking do not depend on the network, because this strategy uses only differential expression levels to rank the candidate genes.
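The three columns reported per network (top 10, top 20, AUC) can be derived from the rank that each benchmark gene obtains among its candidates. The sketch below is an assumption about how such summaries are typically computed, not a description of the authors' evaluation code: the AUC is taken as the mean, over data sets, of the fraction of other candidates ranked below the true gene. The ranks and the candidate-list size are hypothetical.

```python
def top_k_count(ranks, k):
    """Number of data sets in which the true gene is ranked within the top k (rank 1 = best)."""
    return sum(1 for r in ranks if r <= k)


def mean_rank_auc(ranks, n_candidates):
    """Rank-based AUC with a single true gene per data set.

    For one data set, the AUC equals the fraction of the other candidates
    that are ranked below the true gene: (n - rank) / (n - 1).
    """
    return sum((n_candidates - r) / (n_candidates - 1) for r in ranks) / len(ranks)


# Hypothetical ranks of the knockout gene in five data sets, each with 101 candidates.
ranks = [1, 3, 12, 25, 101]
n_candidates = 101

top10 = top_k_count(ranks, 10)
top20 = top_k_count(ranks, 20)
auc = mean_rank_auc(ranks, n_candidates)
print(top10, top20, round(auc, 3))
```

Under this definition a strategy that always ranks the true gene first reaches an AUC of 1.0, and one that always ranks it last reaches 0.0.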
Figure 1. Performance comparison. Comparison of the performance between STRING network version 7.1 and version 8.2 applying the four presented strategies using RMA preprocessed data and the significant log2 ratio as the expression measure in comparison to the Simple Expression Ranking using MAS5 preprocessed data.
Overview of the global network properties of the underlying networks.
| Database (mouse) | Number of Genes | Number of Interactions | Average Node Degree |
|---|---|---|---|
| STRING v7.1 | 16,566 | 820,177 | 49.5 |
| STRING v8.2 | 24,442 | 1,405,375 | 57.5 |
| BioGRID v2.0.61 | 1,417 | 2,026 | 2.5 |
| I2D v1.72 | 10,867 | 79,088 | 10.6 |
The benchmark data.
| No. | Gene Name | GEO accession number | No. | Gene Name | GEO accession number |
|---|---|---|---|---|---|
| 1 | Abca1 | GSE5496 | 21 | Mbnl1 | GSE14691 |
| 2 | Btk | GSE2826 | 22 | Mst1r, Ron | GSE16629 |
| 3 | Cav1 | GSE10849 | 23 | MyD88 | GSE6688 |
| 4 | Cav3 | GSE10848 | 24 | Nos3, eNos | GSE1988 |
| 5 | Cftr | GSE5715 | 25 | Phgdh | GSE8555 |
| 6 | Clcn1 | GSE14691 | 26 | Pmp22 | GSE1947 |
| 7 | Cnr1 | GSE7694 | 27 | PPAR | GSE6864 |
| 8 | Emd | GSE5304 | 28 | Prkag3, AMPK G3 | GSE4065 |
| 9 | Epas1, Hif-2 | GSE16067 | 29 | Pthlh, Pthrp | GSE17654 |
| 10 | Esrra | GSE7196 | 30 | Rab3a | GSE6527 |
| 11 | Gap43 | GSE12687 | 31 | RasGrf1 | GSE8425 |
| 12 | Gnmt | GSE9809 | 32 | Rbm15 | GSE12628 |
| 13 | Hdac1 | GSE5583 | 33 | Runx | GSE4911 |
| 14 | Hdac2 | GSE6770 | 34 | Scd1 | GSE2926 |
| 15 | Hsf4 | GSE12415 | 35 | Slc26a4 | GSE10587 |
| 16 | Hspa1A, Hsp70.1 | GSE11120 | 36 | Srf | GSE13333 |
| 17 | Il6 | GSE411 | 37 | Tgm2 | GSE10285 |
| 18 | Lhx1, Lim1 | GSE4230 | 38 | Zc3h12a | GSE14891 |
| 19 | Lhx8 | GSE11897 | 39 | Zfp36, Tpp | GSE5324 |
| 20 | Lmna | GSE5304 | 40 | Zfx | GSE7069 |
The benchmark consists of 40 publicly available data sets, generated on Affymetrix chips, in which mice with (simple) gene knockouts were tested against controls.
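Two of the expression measures named in the prioritization results, the log2 ratio and the test statistic, compare knockout samples against controls for each gene. The sketch below shows one plausible way to compute them, assuming the expression values are already on a log2 scale and using Welch's t statistic as the test statistic; the paper's exact definitions (including that of the significant log2 ratio) are not given in this section, and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def log2_ratio(knockout, control):
    """Difference of mean log2 expression (knockout minus control) for one gene."""
    return mean(knockout) - mean(control)


def welch_t(knockout, control):
    """Welch's t statistic for one gene; needs at least two samples per group."""
    standard_error = sqrt(
        variance(knockout) / len(knockout) + variance(control) / len(control)
    )
    return (mean(knockout) - mean(control)) / standard_error


# Hypothetical log2 expression values of one gene in three knockout and three control mice.
knockout = [5.0, 5.2, 4.8]
control = [7.0, 7.1, 6.9]

ratio = log2_ratio(knockout, control)
t_stat = welch_t(knockout, control)
print(ratio, t_stat)
```

A negative log2 ratio indicates lower expression in the knockout mice; the t statistic additionally accounts for the variability within each group.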