
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale.

Stephen H Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alex Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Chris Ré, Rob Malkin.

Abstract

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources to servable models for an average 52% performance improvement, and executes over millions of data points in tens of minutes.
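As a rough illustration of the weak supervision idea summarized in the abstract, the sketch below combines several noisy heuristic sources ("labeling functions") into a single training label. It is not taken from the paper: the task, the three labeling functions, and the `majority_vote` helper are all hypothetical, and a plain majority vote is swapped in for the learned label model that the Snorkel framework uses to weight sources by estimated accuracy.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function may decline to vote on an example

# Hypothetical labeling functions for a toy binary task
# (1 = "about sports", 0 = "not about sports"). Each encodes one
# noisy piece of existing knowledge, such as a keyword list.
def lf_keyword(text: str) -> Optional[int]:
    return 1 if "match" in text.lower() else ABSTAIN

def lf_blocklist(text: str) -> Optional[int]:
    return 0 if "invoice" in text.lower() else ABSTAIN

def lf_short_text(text: str) -> Optional[int]:
    return 0 if len(text.split()) < 3 else ABSTAIN

LabelingFunction = Callable[[str], Optional[int]]
LFS: List[LabelingFunction] = [lf_keyword, lf_blocklist, lf_short_text]

def majority_vote(text: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Combine labeling-function votes; abstain when no vote or on a tie.

    Assumption: simple majority vote stands in for a learned label model.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tied vote: no confident label
    return ranked[0][0]

if __name__ == "__main__":
    for example in ["The match ended in a draw", "Invoice attached", "Hello there everyone"]:
        print(repr(example), "->", majority_vote(example, LFS))
```

Examples on which every source abstains, or on which the votes tie, receive no label and would simply be left out of the training set for the downstream classifier.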


Keywords:  Systems for machine learning; weak supervision

Year:  2019   PMID: 31777414   PMCID: PMC6879379   DOI: 10.1145/3299869.3314036

Source DB:  PubMed   Journal:  Proc ACM SIGMOD Int Conf Manag Data   ISSN: 0730-8078


  3 in total

1.  Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.

Authors:  Vincent S Chen; Sen Wu; Zhenzhen Weng; Alexander Ratner; Christopher Ré
Journal:  Adv Neural Inf Process Syst       Date:  2019-12

2.  Crowdsourcing pneumothorax annotations using machine learning annotations on the NIH chest X-ray dataset.

Authors:  Ross W Filice; Anouk Stein; Carol C Wu; Veronica A Arteaga; Stephen Borstelmann; Ramya Gaddikeri; Maya Galperin-Aizenberg; Ritu R Gill; Myrna C Godoy; Stephen B Hobbs; Jean Jeudy; Paras C Lakhani; Archana Laroia; Sundeep M Nayak; Maansi R Parekh; Prasanth Prasanna; Palmi Shah; Dharshan Vummidi; Kavitha Yaddanapudi; George Shih
Journal:  J Digit Imaging       Date:  2020-04       Impact factor: 4.056

3.  Cross-Modal Data Programming Enables Rapid Medical Machine Learning.

Authors:  Jared A Dunnmon; Alexander J Ratner; Khaled Saab; Nishith Khandwala; Matthew Markert; Hersh Sagreiya; Roger Goldman; Christopher Lee-Messer; Matthew P Lungren; Daniel L Rubin; Christopher Ré
Journal:  Patterns (N Y)       Date:  2020-04-28
