PUlasso: High-Dimensional Variable Selection With Presence-Only Data.

Hyebin Song, Garvesh Raskutti.

Abstract

In various real-world problems, we are presented with classification problems with positive and unlabeled data, referred to as presence-only responses. In this article, we study variable selection in the context of presence-only responses where the number of features or covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. We develop the PUlasso algorithm for variable selection and classification with positive and unlabeled responses. Our algorithm uses the majorization-minimization framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee in which we first show that our algorithm converges to a stationary point, and then prove that any stationary point within a local neighborhood of the true parameter achieves the minimax optimal mean-squared error under both strict sparsity and group sparsity assumptions. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in moderate-p settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example. Supplementary materials for this article are available online.
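
The majorization-minimization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the PUlasso algorithm: it assumes fully labeled data, a plain l1 penalty, and a simple quadratic majorizer of the logistic loss, and every name in it (`soft_threshold`, `objective`, `mm_l1_logistic`, the simulated data) is hypothetical. It only shows the generic pattern of repeatedly minimizing an upper bound of a penalized objective, which guarantees that the objective never increases.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(X, y, beta, lam):
    """Mean logistic loss plus lam * ||beta||_1, for labels y in {0, 1}."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * np.abs(beta).sum()


def mm_l1_logistic(X, y, lam, n_iter=200):
    """Illustrative MM loop (an assumption, not the paper's method)."""
    n, p = X.shape
    # The Hessian of the mean logistic loss is bounded by X^T X / (4n),
    # so this constant yields a global quadratic upper bound at any iterate.
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    beta = np.zeros(p)
    history = [objective(X, y, beta, lam)]
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n
        # Minimizing the majorizer plus the l1 term has a closed form:
        # a gradient step followed by soft-thresholding.
        beta = soft_threshold(beta - grad / L, lam / L)
        history.append(objective(X, y, beta, lam))
    return beta, history


# Simulated, fully labeled data (hypothetical; for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -2.0, 1.5]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_hat, history = mm_l1_logistic(X, y, lam=0.05)
```

Here `history` records the penalized objective after each iteration; because each step minimizes an upper bound that touches the objective at the current iterate, the recorded values should be non-increasing.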

Keywords:  Majorization-minimization; Nonconvexity; PU-learning; Regularization

Year:  2019        PMID: 32255883      PMCID: PMC7133715          DOI: 10.1080/01621459.2018.1546587

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


  8 in total

1.  Sparse multinomial logistic regression: fast algorithms and generalization bounds.

Authors:  Balaji Krishnapuram; Lawrence Carin; Mário A T Figueiredo; Alexander J Hartemink
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2005-06       Impact factor: 6.226

2.  Presence-only data and the em algorithm.

Authors:  Gill Ward; Trevor Hastie; Simon Barry; Jane Elith; John R Leathwick
Journal:  Biometrics       Date:  2009-06       Impact factor: 2.571

3.  Experimental illumination of a fitness landscape.

Authors:  Ryan T Hietpas; Jeffrey D Jensen; Daniel N A Bolon
Journal:  Proc Natl Acad Sci U S A       Date:  2011-04-04       Impact factor: 11.205

4.  Dissecting enzyme function with microfluidic-based deep mutational scanning.

Authors:  Philip A Romero; Tuan M Tran; Adam R Abate
Journal:  Proc Natl Acad Sci U S A       Date:  2015-05-26       Impact factor: 11.205

5.  Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors.

Authors:  Patrick Breheny; Jian Huang
Journal:  Stat Comput       Date:  2015-03       Impact factor: 2.559

6.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

7.  Strong rules for discarding predictors in lasso-type problems.

Authors:  Robert Tibshirani; Jacob Bien; Jerome Friedman; Trevor Hastie; Noah Simon; Jonathan Taylor; Ryan J Tibshirani
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2012-03       Impact factor: 4.488

8.  Deep mutational scanning: a new style of protein science.

Authors:  Douglas M Fowler; Stanley Fields
Journal:  Nat Methods       Date:  2014-08       Impact factor: 28.547

  4 in total

1.  Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Authors:  Hyebin Song; Bennett J Bremer; Emily C Hinds; Garvesh Raskutti; Philip A Romero
Journal:  Cell Syst       Date:  2020-11-18       Impact factor: 10.304

2.  A semi-supervised model to predict regulatory effects of genetic variants at single nucleotide resolution using massively parallel reporter assays.

Authors:  Zikun Yang; Chen Wang; Stephanie Erjavec; Lynn Petukhova; Angela Christiano; Iuliana Ionita-Laza
Journal:  Bioinformatics       Date:  2021-01-30       Impact factor: 6.937

3.  Microfluidic deep mutational scanning of the human executioner caspases reveals differences in structure and regulation.

Authors:  Hridindu Roychowdhury; Philip A Romero
Journal:  Cell Death Discov       Date:  2022-01-10

4.  PLUS: Predicting cancer metastasis potential based on positive and unlabeled learning.

Authors:  Junyi Zhou; Xiaoyu Lu; Wennan Chang; Changlin Wan; Xiongbin Lu; Chi Zhang; Sha Cao
Journal:  PLoS Comput Biol       Date:  2022-03-29       Impact factor: 4.475

