
AdaSampling for Positive-Unlabeled and Label Noise Learning With Bioinformatics Applications.

Pengyi Yang, John T Ormerod, Wei Liu, Chendong Ma, Albert Y Zomaya, Jean Y H Yang.   

Abstract

Class labels are required for supervised learning but may be corrupted or missing in various applications. In binary classification, for example, when only a subset of positive instances is labeled and the remaining instances are unlabeled, positive-unlabeled (PU) learning is required to learn a model from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utility of the proposed methods using simulation and benchmark data, and compare them to alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications where AdaSampling is used to: 1) identify kinase substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data.
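The iterative procedure described in the abstract can be illustrated with a minimal sketch for the PU case. This is not the authors' implementation: the function names (`fit_logistic`, `predict_proba`, `ada_sampling_pu`), the gradient-descent logistic regression used as a base classifier, the number of iterations, and the synthetic data are all assumptions chosen for illustration. The idea shown is that unlabeled instances are resampled as negatives in proportion to their current estimated probability of being true negatives, so likely hidden positives are drawn less often in later rounds.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Gradient-descent logistic regression (stand-in base classifier, an assumption)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Estimated probability that each row of X belongs to the positive class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def ada_sampling_pu(X_pos, X_unl, n_iter=5, seed=0):
    """Hypothetical sketch of adaptive sampling for PU learning."""
    rng = np.random.default_rng(seed)
    n_unl = len(X_unl)
    # Estimated probability that each unlabeled instance is a true negative.
    p_neg = np.full(n_unl, 0.5)
    for _ in range(n_iter):
        # Draw unlabeled instances as negatives, favoring likely true negatives.
        idx = rng.choice(n_unl, size=n_unl, replace=True, p=p_neg / p_neg.sum())
        X = np.vstack([X_pos, X_unl[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        w = fit_logistic(X, y)
        # Re-estimate the negative-class probability for every unlabeled instance.
        p_neg = np.clip(1.0 - predict_proba(w, X_unl), 1e-6, 1.0)
    return w, p_neg

# Synthetic data (assumed): 50 labeled positives, and an unlabeled pool made of
# 50 hidden positives followed by 100 true negatives.
rng = np.random.default_rng(1)
X_pos = rng.normal(2.0, 1.0, size=(50, 2))
hidden_pos = rng.normal(2.0, 1.0, size=(50, 2))
true_neg = rng.normal(-2.0, 1.0, size=(100, 2))
X_unl = np.vstack([hidden_pos, true_neg])

w, p_neg = ada_sampling_pu(X_pos, X_unl)
print("mean P(negative), hidden positives:", p_neg[:50].mean())
print("mean P(negative), true negatives:  ", p_neg[50:].mean())
```

In this toy setting the hidden positives should receive a lower estimated negative-class probability than the true negatives, which is the signal the sampling step relies on. The label-noise case would require sampling from both classes, which this sketch does not cover.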

Year:  2018        PMID: 29993676     DOI: 10.1109/TCYB.2018.2816984

Source DB:  PubMed          Journal:  IEEE Trans Cybern        ISSN: 2168-2267            Impact factor:   11.448


  5 in total

1.  Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning.

Authors:  Hani Jieun Kim; Pierre Osteil; Sean J Humphrey; Senthilkumar Cinghu; Andrew J Oldfield; Ellis Patrick; Emilie E Wilkie; Guangdun Peng; Shengbao Suo; Raja Jothi; Patrick P L Tam; Pengyi Yang
Journal:  Nucleic Acids Res       Date:  2020-02-28       Impact factor: 16.971

2.  Co-evolution based machine-learning for predicting functional interactions between human genes.

Authors:  Doron Stupp; Elad Sharon; Idit Bloch; Marinka Zitnik; Or Zuk; Yuval Tabach
Journal:  Nat Commun       Date:  2021-11-09       Impact factor: 14.919

3.  PLUS: Predicting cancer metastasis potential based on positive and unlabeled learning.

Authors:  Junyi Zhou; Xiaoyu Lu; Wennan Chang; Changlin Wan; Xiongbin Lu; Chi Zhang; Sha Cao
Journal:  PLoS Comput Biol       Date:  2022-03-29       Impact factor: 4.475

4.  Protocol for the processing and downstream analysis of phosphoproteomic data with PhosR.

Authors:  Hani Jieun Kim; Taiyun Kim; Di Xiao; Pengyi Yang
Journal:  STAR Protoc       Date:  2021-06-05

5.  Probing lncRNA-Protein Interactions: Data Repositories, Models, and Algorithms. (Review)

Authors:  Lihong Peng; Fuxing Liu; Jialiang Yang; Xiaojun Liu; Yajie Meng; Xiaojun Deng; Cheng Peng; Geng Tian; Liqian Zhou
Journal:  Front Genet       Date:  2020-01-31       Impact factor: 4.599
