Snorkel: Rapid Training Data Creation with Weak Supervision.

Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré.

Abstract

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
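
To make the idea of labeling functions concrete, the following is a minimal sketch in plain Python. It does not use the Snorkel library or its API. The three heuristics, the example sentences, and all names (`lf_contains_causes`, `majority_vote`, and so on) are hypothetical and chosen only for illustration. As a loudly simplified stand-in for the denoising step, the sketch combines votes with an unweighted majority vote; the system described in the abstract instead estimates the unknown accuracies and correlations of the labeling functions.

```python
from collections import Counter

# Label values; ABSTAIN lets a labeling function decline to vote.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_causes(sentence: str) -> int:
    """Hypothetical heuristic: the word 'causes' hints at a positive relation."""
    return POSITIVE if "causes" in sentence.lower() else ABSTAIN


def lf_contains_negation(sentence: str) -> int:
    """Hypothetical heuristic: a standalone 'not' hints at a negative example."""
    return NEGATIVE if " not " in f" {sentence.lower()} " else ABSTAIN


def lf_contains_induced(sentence: str) -> int:
    """Hypothetical heuristic: 'induced' hints at a positive relation."""
    return POSITIVE if "induced" in sentence.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_contains_negation, lf_contains_induced]


def apply_lfs(sentences):
    """Build a label matrix: one row per sentence, one column per labeling function."""
    return [[lf(s) for lf in LABELING_FUNCTIONS] for s in sentences]


def majority_vote(votes):
    """Unweighted majority vote over non-abstaining votes (a simplification,
    not the learned model from the paper). Ties and all-abstain rows abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


sentences = [
    "Aspirin causes stomach bleeding in some patients.",
    "The drug did not affect blood pressure.",
    "Cisplatin-induced nephrotoxicity was observed.",
    "Patients were enrolled in 2015.",
]
label_matrix = apply_lfs(sentences)
labels = [majority_vote(row) for row in label_matrix]
print(labels)
```

Each labeling function is cheap to write and may be wrong or silent on any given example. The point of the approach in the abstract is that many such noisy votes can be combined into training labels without ground truth; a weighted combination that accounts for each function's estimated accuracy would replace `majority_vote` in a more faithful version.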

Year: 2017 PMID: 29770249 PMCID: PMC5951191 DOI: 10.14778/3157794.3157797

Source DB: PubMed Journal: Proceedings VLDB Endowment ISSN: 2150-8097

7 in total

1. Training products of experts by minimizing contrastive divergence.

Authors: Geoffrey E Hinton
Journal: Neural Comput Date: 2002-08 Impact factor: 2.026

2. Framewise phoneme classification with bidirectional LSTM and other neural network architectures.

Authors: Alex Graves; Jürgen Schmidhuber
Journal: Neural Netw Date: 2005 Jun-Jul

3. Ranking and combining multiple predictors without labeled data.

Authors: Fabio Parisi; Francesco Strino; Boaz Nadler; Yuval Kluger
Journal: Proc Natl Acad Sci U S A Date: 2014-01-13 Impact factor: 11.205

4. The mobilize center: an NIH big data to knowledge center to advance human movement research and improve mobility.

Authors: Joy P Ku; Jennifer L Hicks; Trevor Hastie; Jure Leskovec; Christopher Ré; Scott L Delp
Journal: J Am Med Inform Assoc Date: 2015-08-13 Impact factor: 4.497

5. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

Authors: Allan Peter Davis; Thomas C Wiegers; Phoebe M Roberts; Benjamin L King; Jean M Lay; Kelley Lennon-Hopkins; Daniela Sciaky; Robin Johnson; Heather Keating; Nigel Greene; Robert Hernandez; Kevin J McConnell; Ahmed E Enayetallah; Carolyn J Mattingly
Journal: Database (Oxford) Date: 2013-11-28 Impact factor: 3.451

6. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

Authors: Ron Caspi; Richard Billington; Luciana Ferrer; Hartmut Foerster; Carol A Fulcher; Ingrid M Keseler; Anamika Kothari; Markus Krummenacker; Mario Latendresse; Lukas A Mueller; Quang Ong; Suzanne Paley; Pallavi Subhraveti; Daniel S Weaver; Peter D Karp
Journal: Nucleic Acids Res Date: 2015-11-02 Impact factor: 16.971

7. Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task.

Authors: Chih-Hsuan Wei; Yifan Peng; Robert Leaman; Allan Peter Davis; Carolyn J Mattingly; Jiao Li; Thomas C Wiegers; Zhiyong Lu
Journal: Database (Oxford) Date: 2016-03-19 Impact factor: 3.451

30 in total

1. Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.

Authors: Vincent S Chen; Sen Wu; Zhenzhen Weng; Alexander Ratner; Christopher Ré
Journal: Adv Neural Inf Process Syst Date: 2019-12

2. Training Complex Models with Multi-Task Weak Supervision.

Authors: Alexander Ratner; Braden Hancock; Jared Dunnmon; Frederic Sala; Shreyash Pandey; Christopher Ré
Journal: Proc Conf AAAI Artif Intell Date: 2019 Jan-Feb

3. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale.

Authors: Stephen H Bach; Daniel Rodriguez; Yintao Liu; Chong Luo; Haidong Shao; Cassandra Xia; Souvik Sen; Alex Ratner; Braden Hancock; Houman Alborzi; Rahul Kuchhal; Chris Ré; Rob Malkin
Journal: Proc ACM SIGMOD Int Conf Manag Data Date: 2019 Jun-Jul

4. Snuba: Automating Weak Supervision to Label Training Data.

Authors: Paroma Varma; Christopher Ré
Journal: Proceedings VLDB Endowment Date: 2018-11

5. TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Authors: Shorabuddin Syed; Adam Jackson Angel; Hafsa Bareen Syeda; Carole Franc Jennings; Joseph VanScoy; Mahanazuddin Syed; Melody Greer; Sudeepa Bhattacharyya; Shaymaa Al-Shukri; Meredith Zozus; Fred Prior; Benjamin Tharian
Journal: Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap Date: 2022-02