
C-PUGP: A cluster-based positive unlabeled learning method for disease gene prediction and prioritization.

Akram Vasighizaker, Saeed Jalili.

Abstract

Disease gene detection is an important stage in understanding disease processes and treatment. Many machine learning methods have been used to identify candidate disease genes. Although these methods differ in the feature vectors used for genes, in how reliable negative data (non-disease genes) are selected, and in the classification method, the lack of negative data is the most significant challenge for all of them. Recently, candidate disease genes have been identified by semi-supervised learning methods based on positive and unlabeled data. These methods are reasonably accurate and achieve better results than preceding methods. In this article, we propose a novel Positive Unlabeled (PU) learning technique based on clustering and a One-Class classification algorithm. Unlike existing methods, we build a more Reliable Negative (RN) set in three steps: (1) clustering the positive data, (2) learning One-Class classifier models using the clusters, and (3) selecting the intersection set of negative data as the Reliable Negative set. Next, we identify and rank the candidate disease genes using a binary classifier based on the support vector machine (SVM) algorithm. Experimental results indicate that the proposed method yields the best results: 92.8, 93.6, and 93.1 in terms of precision, recall, and F-measure, respectively. Compared with existing methods, the proposed method improves on the best of them by 11.7 percent in terms of F-measure. The results also show an increase of about 6% in prioritization performance.
Copyright © 2018 Elsevier Ltd. All rights reserved.
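
The three-step construction of the Reliable Negative set described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the One-Class classifier, so the sketch assumes a plain k-means and substitutes a simple centroid-and-radius boundary for the One-Class model. All function names, the `slack` parameter, and the toy 2-D feature vectors are hypothetical, and the final SVM ranking step is omitted.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20, seed=0):
    """Step 1 (assumed algorithm): cluster the positive data with k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def fit_one_class(cluster, slack=1.5):
    """Step 2 (stand-in, not a real One-Class classifier): describe one
    cluster by its centroid and a radius slightly larger than its spread."""
    center = mean(cluster)
    radius = slack * max(dist(p, center) for p in cluster)
    return center, radius


def is_rejected(model, point):
    """True if the model considers the point outside the positive cluster."""
    center, radius = model
    return dist(point, center) > radius


def reliable_negatives(positives, unlabeled, k=2):
    """Step 3: keep only the unlabeled points rejected by every model,
    i.e. the intersection of the per-cluster negative sets."""
    models = [fit_one_class(c) for c in kmeans(positives, k)]
    return [u for u in unlabeled if all(is_rejected(m, u) for m in models)]


# Toy data: two groups of known disease genes (positives) and four
# unlabeled genes, two of which lie close to a positive group.
positives = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3),
             (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
unlabeled = [(0.1, 0.1), (5.1, 5.0), (10.0, -10.0), (-8.0, 9.0)]

rn = reliable_negatives(positives, unlabeled, k=2)
print(rn)  # [(10.0, -10.0), (-8.0, 9.0)]
```

Taking the intersection is the conservative choice here: an unlabeled gene enters the Reliable Negative set only if no cluster-level model accepts it, so unlabeled genes that resemble any group of known disease genes stay out of the negative training data for the later binary classifier.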

Keywords:  Candidate disease genes; Classification; Clustering; Identification; Pul; Semi-supervised learning


Year:  2018        PMID: 29890338     DOI: 10.1016/j.compbiolchem.2018.05.022

Source DB:  PubMed          Journal:  Comput Biol Chem        ISSN: 1476-9271            Impact factor:   2.877


  3 in total

1.  Factor graph-aggregated heterogeneous network embedding for disease-gene association prediction.

Authors:  Ming He; Chen Huang; Bo Liu; Yadong Wang; Junyi Li
Journal:  BMC Bioinformatics       Date:  2021-03-29       Impact factor: 3.169

2.  A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning.

Authors:  Saeid Azadifar; Ali Ahmadi
Journal:  BMC Bioinformatics       Date:  2022-10-14       Impact factor: 3.307

3.  A novel one-class classification approach to accurately predict disease-gene association in acute myeloid leukemia cancer.

Authors:  Akram Vasighizaker; Alok Sharma; Abdollah Dehzangi
Journal:  PLoS One       Date:  2019-12-11       Impact factor: 3.240

