Snuba: Automating Weak Supervision to Label Training Data.

Paroma Varma, Christopher Ré.

Abstract

As deep learning models are applied to increasingly diverse problems, a key bottleneck is gathering enough high-quality training labels tailored to each task. Users therefore turn to weak supervision, relying on imperfect sources of labels like pattern matching and user-defined heuristics. Unfortunately, users have to design these sources for each task. This process can be time-consuming and expensive: domain experts often perform repetitive steps like guessing optimal numerical thresholds and developing informative text patterns. To address these challenges, we present Snuba, a system that automatically generates heuristics from a small labeled dataset and uses them to assign training labels to a large, unlabeled dataset in the weak supervision setting. Snuba generates heuristics that each label only the subset of the data they are accurate for, and iteratively repeats this process until the heuristics together label a large portion of the unlabeled data. We develop a statistical measure that guarantees the iterative process will automatically terminate before it degrades training label quality. Snuba automatically generates heuristics in under five minutes and performs up to 9.74 F1 points better than the best known user-defined heuristics developed over many days. In collaborations with users at research labs and Stanford Hospital, and on open-source datasets, Snuba outperforms other automated approaches like semi-supervised learning by up to 14.35 F1 points.
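
The loop described in the abstract (generate heuristics from a small labeled set, let each one abstain where it is unreliable, and repeat until most of the unlabeled data is covered) can be illustrated with a minimal Python sketch. This is not Snuba's implementation. Every name here (make_stump, generate_heuristics, weak_label, margin, min_accuracy, target_coverage) is hypothetical, the heuristics are assumed to be single-feature threshold rules, the fixed accuracy cutoff stands in for the statistical termination measure the abstract mentions, and the final majority vote is only one simple way to combine votes.

```python
ABSTAIN = 0  # a heuristic returns +1, -1, or ABSTAIN


def make_stump(feature, threshold, margin):
    """Threshold rule that votes away from the threshold and abstains near it."""
    def heuristic(x):
        if x[feature] >= threshold + margin:
            return 1
        if x[feature] <= threshold - margin:
            return -1
        return ABSTAIN
    return heuristic


def accuracy(heuristic, labeled):
    """Accuracy over the labeled points the heuristic does not abstain on."""
    voted = [(heuristic(x), y) for x, y in labeled if heuristic(x) != ABSTAIN]
    if not voted:
        return 0.0
    return sum(v == y for v, y in voted) / len(voted)


def coverage(heuristics, points):
    """Fraction of points that receive at least one non-abstain vote."""
    covered = sum(any(h(x) != ABSTAIN for h in heuristics) for x in points)
    return covered / len(points)


def generate_heuristics(labeled, unlabeled, margin=0.5,
                        min_accuracy=0.9, target_coverage=0.9):
    """Greedily keep accurate heuristics until the unlabeled set is covered."""
    n_features = len(labeled[0][0])
    candidates = []
    for j in range(n_features):
        for t in sorted({x[j] for x, _ in labeled}):
            h = make_stump(j, t, margin)
            candidates.append((accuracy(h, labeled), h))
    candidates.sort(key=lambda c: c[0], reverse=True)

    chosen = []
    for acc, h in candidates:
        # Assumed stopping rule: a fixed accuracy cutoff on the labeled set.
        if acc < min_accuracy:
            break
        # Keep a heuristic only if it labels points no earlier one covered.
        if coverage(chosen + [h], unlabeled) > coverage(chosen, unlabeled):
            chosen.append(h)
        if coverage(chosen, unlabeled) >= target_coverage:
            break
    return chosen


def weak_label(heuristics, x):
    """Combine votes by unweighted majority; returns +1, -1, or ABSTAIN."""
    total = sum(h(x) for h in heuristics)
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    labeled = [((0.0,), -1), ((1.0,), -1), ((4.0,), 1), ((5.0,), 1)]
    unlabeled = [(0.5,), (4.5,), (2.5,), (-1.0,), (6.0,)]
    heuristics = generate_heuristics(labeled, unlabeled)
    print([weak_label(heuristics, x) for x in unlabeled])
```

The abstain margin is what lets each rule restrict itself to the region where it is reliable, and the coverage check is what drives the iteration; a faithful version would replace the accuracy cutoff with the paper's termination criterion.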

Year: 2018    PMID: 31777681    PMCID: PMC6879381    DOI: 10.14778/3291264.3291268

Source DB: PubMed    Journal: Proceedings VLDB Endowment    ISSN: 2150-8097


