Peng Yang, Xiao-Li Li, Jian-Ping Mei, Chee-Keong Kwoh, See-Kiong Ng.
Abstract
BACKGROUND: Identifying disease genes in the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a confirmed non-disease gene set does not exist) to build classifiers that identify new disease genes among the unknown genes. However, such classifiers are actually built from a noisy negative set N, because N itself can contain undiscovered disease genes. As a result, the classifiers do not perform as well as they could.

RESULTS: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm, PUDI (PU learning for disease gene identification), to build a classifier using P and U. We first partition U into four sets, namely the reliable negative set RN, the likely positive set LP, the likely negative set LN and the weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on these four training sets and the positive training set P to identify disease genes. Our experimental results demonstrate that the proposed PUDI algorithm significantly outperformed the existing methods.
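The partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the PUDI procedure itself: the abstract does not say how U is split or which weights are used, so the distance-to-centroid ranking, the rank cutoffs, the ordering of the four sets, the per-set weights, and all function names below are assumptions made purely for illustration.

```python
from math import sqrt

# Hypothetical (label, weight) per partition; the abstract gives no values.
SET_LABEL_WEIGHT = {
    "LP": (+1, 0.5),  # likely positive: positive label, reduced confidence
    "WN": (-1, 0.3),  # weak negative: negative label, low confidence
    "LN": (-1, 0.7),  # likely negative
    "RN": (-1, 1.0),  # reliable negative: full confidence
}

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition_unlabeled(P, U, cutoffs=(0.25, 0.5, 0.75)):
    """Rank unlabeled genes by distance to the positive centroid and cut
    the ranking into LP, WN, LN, RN (nearest to farthest). The cutoffs are
    assumed rank fractions, not values from the paper."""
    pr = centroid(P.values())
    ranked = sorted(U, key=lambda g: euclidean(U[g], pr))
    b1, b2, b3 = (int(round(c * len(ranked))) for c in cutoffs)
    return {
        "LP": ranked[:b1],
        "WN": ranked[b1:b2],
        "LN": ranked[b2:b3],
        "RN": ranked[b3:],
    }

def weighted_examples(P, parts):
    """Build (gene, label, weight) rows for a weighted classifier."""
    rows = [(g, +1, 1.0) for g in P]  # known disease genes: full weight
    for name, genes in parts.items():
        label, weight = SET_LABEL_WEIGHT[name]
        rows.extend((g, label, weight) for g in genes)
    return rows

# Toy feature vectors (made up for the example).
P = {"p1": [1.0, 1.0], "p2": [1.0, 0.8]}
U = {"u1": [0.9, 0.9], "u2": [0.6, 0.6], "u3": [0.3, 0.3], "u4": [0.0, 0.0]}
parts = partition_unlabeled(P, U)
print(parts)
print(weighted_examples(P, parts))
```

The resulting per-example weights could then be handed to any classifier that accepts instance weights, for example through the `sample_weight` argument of `fit` in scikit-learn's `SVC`.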
Year: 2012 PMID: 22923290 PMCID: PMC3467748 DOI: 10.1093/bioinformatics/bts504
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Extract reliable negative gene set (RN) from U
Overall comparison among different techniques
| Techniques | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| PUDI | 72.3 | 81.0 | 76.5 |
| ProDiGe | 72.4 | 75.9 | 74.5 |
| Smalter’s method | 62.9 | 61.5 | 62.2 |
| Xu’s method (1) | 65.0 | 55.6 | 59.9 |
| Xu’s method (5) | 66.3 | 57.1 | 61.3 |
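For reference, the F-measure is conventionally the harmonic mean of precision and recall. The short sketch below assumes that standard definition (the record itself does not state the formula) and uses a hypothetical helper name. Recomputing from the rounded table entries gives values close to, but not always identical with, the reported ones, possibly because the reported figures were averaged over evaluation runs.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall (%) taken from the PUDI and ProDiGe rows above.
for name, p, r in [("PUDI", 72.3, 81.0), ("ProDiGe", 72.4, 75.9)]:
    print(name, round(f_measure(p, r), 1))
```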
Results for individual features and combinations of features
| Category | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| BP | 63.4 | 81.3 | 71.3 |
| MF | 50.3 | 99.6 | 68.6 |
| CC | 54.5 | 93.5 | 67.8 |
| D | 56.2 | 86.5 | 68.1 |
| PPI | 55.1 | 88.2 | 67.8 |
| ALL-BP | 65.3 | 83.3 | 73.2 |
| ALL-MF | 66.0 | 84.7 | 74.2 |
| ALL-CC | 67.4 | 85.7 | 75.4 |
| ALL-D | 62.3 | 86.9 | 72.6 |
| ALL-PPI | 67.9 | 86.7 | 76.1 |
Cardiovascular and endocrine disease gene classification
| Disease class | Techniques | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|---|
| Cardiovascular diseases | PUDI | 82.0 | 80.6 | 80.4 |
| | ProDiGe | 54.3 | 96.3 | 69.3 |
| | Smalter’s method | 75.4 | 67.6 | 70.6 |
| | Xu’s method (1) | 72.1 | 60.0 | 65.4 |
| | Xu’s method (5) | 73.6 | 63.0 | 67.9 |
| Endocrine diseases | PUDI | 83.6 | 75.3 | 79.2 |
| | ProDiGe | 57.3 | 87.7 | 69.3 |
| | Smalter’s method | 76.4 | 58.8 | 66.5 |
| | Xu’s method (1) | 75.4 | 62.0 | 68.0 |
| | Xu’s method (5) | 72.5 | 62.2 | 67.0 |
Predicted novel disease genes using all confirmed genes
| Genes | Prob (%) | Relevant disease | References |
|---|---|---|---|
| GP5 | 99.2 | Bernard–Soulier syndrome | (Roth |
| | | Gray platelet syndrome | (Berger |
| | | Platelet disorder | (Shi |
| | | Autoimmune thrombocytopenia | (Mayer |
| | | Coagulopathy | (Modderman |
| | | Thrombocytopenia | (Acar |
| | | Thrombosis | (Ravanat |
| ALG13 | 97.9 | | |
| ADPRHL1 | 96.7 | | |
| PARVA | 96.6 | Tumors | (Attwell |
| | | Cancer | (Sepulveda |
| ODAM | 96.4 | | |
| ANGPTL1 | 96.3 | Melanoma | (Smagur |
| | | Tumors | (Xu |
| PTK7 | 96.1 | Panic | (Eser |
| | | Panic attacks | (van Megen |
| | | Panic disorder | (Bradwejn |
| | | Premenstrual dysphoric disorder | (Le Mellédo |
| | | Effects cardiovascular | (Bradwejn |
| | | Agoraphobia | (Koszycki |
| | | Anxiety disorders | (Bradwejn |
| | | Colon carcinoma | (Mossie |
| WSB1 | 95.7 | Neuroblastoma | (Chen, 2006) |
| AFF1 | 95.0 | Lymphoblastic leukemia acute | (Bertrand |
| | | Acute leukemia | (Chen |
| | | Leukemogenesis | (Yamamoto |
| | | Leukemia | (Li |
| | | Chromosomal aberrations | (Nakamura |
| INHBB | 94.7 | Tumors | (Peschon |
| MAPK12 | 94.4 | Shock | (Cuenda |
| PHLDA1 | 94.3 | Tumors | (Nagai |
| CABLES2 | 94.0 | | |
| BDH2 | 94.0 | | |
| CD97 | 94.0 | Thyroid carcinoma | (Hoang-Vu |
| | | Thyroid carcinoma anaplastic | (Hoang-Vu |
| | | Arthritis reactive | (Hamann |
| | | Colorectal tumors | (Steinert |
| | | Colorectal carcinoma | (Steinert |
| SLC29A4 | 93.9 | | |
| FAIM | 93.8 | Leukemia, lymphocytic, Acute | (Ross |
| EIF2AK2 | 93.8 | Virus infection | (Gil |
| | | Vesicular stomatitis | (Lee |
| | | Hepatitis C | (Hiasa |
| | | Influenza | (Min |
| | | Herpes simplex | (Smith |
| KRT20 | 93.7 | Carcinoma merkel cell | (Cheuk |
| | | Carcinoma mucinous | (Ji |
| | | Adenocarcinoma | (Chen |
| ITGB1BP2 | 93.7 | Cardiac hypertrophy | (Brancaccio |
| | | Hypertrophy | (Palumbo et al., 2009) |