
On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics.

Chao Lan, Sai Nivedita Chandrasekaran, Jun Huan.   

Abstract

In cheminformatics, compound-target binding profiles have been a main source of data for research. For data repositories that only provide positive profiles, a popular assumption is that unreported profiles are all negative. In this paper, we caution the audience not to take this assumption for granted, and present empirical evidence of its ineffectiveness from a machine learning perspective. Our examination is based on a setting where binding profiles are used as features to train predictive models; we show that (1) prediction performance degrades when the assumption fails and (2) explicit recovery of unreported profiles improves prediction performance. In particular, we propose a framework that jointly recovers profiles and learns a predictive model, and show that it achieves further performance improvement. The presented study not only suggests applying matrix recovery methods to recover unreported profiles, but also initiates a new missing-feature problem, which we call Learning with Positive and Unknown Features.
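
The contrast between zero-filling and recovery can be illustrated with a minimal sketch. It is not the framework proposed in the paper: the toy matrix, the function names `zero_fill` and `low_rank_recover`, and the choice of a truncated SVD as the recovery step are all assumptions made for illustration. Rows stand for compounds, columns for targets, and `True` marks a reported positive profile.

```python
import numpy as np

def zero_fill(reported):
    """Unreported-is-negative assumption: every unreported entry becomes 0."""
    return reported.astype(float)

def low_rank_recover(reported, rank=1):
    """Illustrative recovery step (an assumption, not the paper's method).

    Takes a rank-`rank` truncated SVD of the zero-filled matrix, clips the
    scores to [0, 1], and keeps every reported positive fixed at 1.
    """
    X = reported.astype(float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    recovered = np.clip(approx, 0.0, 1.0)
    recovered[reported] = 1.0
    return recovered

# Hypothetical toy data: 3 compounds x 3 targets, one unreported profile at (1, 2).
reported = np.array([
    [True, True, True],
    [True, True, False],
    [True, True, True],
])

baseline = zero_fill(reported)          # unreported entry is forced to 0
recovered = low_rank_recover(reported)  # unreported entry receives a positive score

print("zero-filled:\n", baseline)
print("recovered:\n", recovered)
```

Either matrix could then serve as the feature matrix for a downstream predictive model. In the recovered version the unreported entry carries a score borrowed from similar rows and columns instead of a hard 0.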


Year:  2019        PMID: 31056508     DOI: 10.1109/TCBB.2019.2913855

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  1 in total

1.  SeEn: Sequential enriched datasets for sequence-aware recommendations.

Authors:  Marcia Barros; André Moitinho; Francisco M Couto
Journal:  Sci Data       Date:  2022-08-04       Impact factor: 8.501

