David Kartchner, Davi Nakajima An, Wendi Ren, Chao Zhang, Cassie S Mitchell.
Abstract
A major bottleneck preventing the extension of deep learning systems to new domains is the prohibitive cost of acquiring sufficient training labels. Alternatives such as weak supervision, active learning, and fine-tuning of pretrained models reduce this burden but require substantial human input to select a highly informative subset of instances or to curate labeling functions. REGAL (Rule-Enhanced Generative Active Learning) is an improved framework for weakly supervised text classification that performs active learning over labeling functions rather than individual instances. REGAL interactively creates high-quality labeling patterns from raw text, enabling a single annotator to accurately label an entire dataset after initialization with three keywords for each class. Experiments demonstrate that REGAL extracts up to 3 times as many high-accuracy labeling functions from text as current state-of-the-art methods for interactive weak supervision, dramatically reducing the annotation burden of writing labeling functions. Statistical analysis reveals that REGAL performs equally well as or significantly better than interactive weak supervision on five of six commonly used natural language processing (NLP) baseline datasets.
Keywords: active learning; data labeling; natural language processing; text classification; text mining; weak supervision
Year: 2022 PMID: 35845102 PMCID: PMC9281613 DOI: 10.3390/ai3010013
Source DB: PubMed Journal: Artif Intell ISSN: 0004-3702 Impact factor: 14.050
Figure 1. REGAL model setup. REGAL takes unlabeled documents and seed rules as input. It then iteratively proposes new labeling functions by extracting high-quality patterns from the training data and soliciting user feedback about which to keep.
Figure 2. Labeling structure for traditional active learning, weak supervision, and REGAL. In traditional active learning, high-value instances are selected and sent to human annotators for labeling. In traditional weak supervision, annotators write rules based on patterns they observe in the data. REGAL synthesizes these two approaches by extracting high-value candidate LFs, which are then filtered by human annotators.
Figure 3. Model architecture for REGAL.
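In code, the propose-and-filter loop of Figures 1 and 3 looks roughly like the following. This is a minimal illustrative sketch, not the paper's implementation: labeling functions are reduced to (keyword, class) pairs, and simple co-occurrence counting stands in for REGAL's generative scoring model; all helper names are our own.

```python
from collections import Counter

# Illustrative sketch of the Figure 1 loop. LFs are (keyword, class)
# pairs; co-occurrence counting stands in for REGAL's generative model.

def majority_vote(tokens, lfs):
    """Majority vote of keyword LFs on one tokenized document (-1 = abstain)."""
    votes = Counter(label for kw, label in lfs if kw in tokens)
    return votes.most_common(1)[0][0] if votes else -1

def propose_lfs(docs, lfs, top_k=6, min_count=5):
    """Rank unseen (token, class) candidates by how often the token
    co-occurs with documents the current LF set labels as that class."""
    seen = {kw for kw, _ in lfs}
    counts = Counter()
    for tokens in docs:
        y = majority_vote(tokens, lfs)
        if y == -1:
            continue
        for tok in set(tokens) - seen:
            counts[(tok, y)] += 1
    return [cand for cand, n in counts.most_common(top_k) if n >= min_count]

def regal_loop(docs, seed_lfs, n_iter=5):
    """Interactive loop: propose candidate LFs, keep those a user approves."""
    lfs = list(seed_lfs)  # e.g., three seed keywords per class
    for _ in range(n_iter):
        for kw, y in propose_lfs(docs, lfs):
            if input(f"Keep LF '{kw}' -> class {y}? [y/n] ").strip() == "y":
                lfs.append((kw, y))
    return lfs
```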
Summary of REGAL data and rule-generation parameters. The table below gives the sizes of the traditional train, validation, and test sets, though REGAL extracts rules only from the train set. Coverage denotes the total coverage of the initial set of seed rules, whereas Bal. Coverage denotes the coverage after downsampling to balance class-wise labeling propensities (a minimal coverage computation is sketched after the table).
| Dataset | # Train | # Valid | # Test | # Classes | Coverage | Bal. Coverage |
|---|---|---|---|---|---|---|
| Yelp | 30,400 | 3800 | 3800 | 2 | 0.2239 | 0.1042 |
| IMDB | 24,500 | 500 | 25,000 | 2 | 0.1798 | 0.1663 |
| AG News | 96,000 | 12,000 | 12,000 | 4 | 0.0963 | 0.0144 |
| Journalist/Photographer | 15,629 | 500 | 16,129 | 2 | 0.3211 | 0.2364 |
| Professor/Physician | 26,738 | 500 | 27,238 | 2 | 0.5149 | 0.3772 |
| Professor/Teacher | 11,794 | 500 | 12,294 | 2 | 0.5195 | 0.3574 |
| Painter/Architect | 5618 | 500 | 6118 | 2 | 0.4516 | 0.2650 |
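For reference, the Coverage column can be computed directly from a Snorkel-style labeling-function vote matrix; the snippet below is a minimal sketch assuming abstentions are encoded as −1.

```python
import numpy as np

def coverage(L):
    """Fraction of documents receiving at least one non-abstain vote,
    given an (n_docs, n_lfs) vote matrix with -1 encoding abstention."""
    return float(np.mean((L != -1).any(axis=1)))

# Example: two of three documents are covered -> 0.667
L = np.array([[0, -1], [-1, -1], [1, 0]])
print(round(coverage(L), 3))
```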
Performance comparison of LF extraction methods. LF accuracy and coverage are averaged over all LFs produced by the model.
| Dataset | Model | # LFs | LF Acc | Coverage | LM Acc | LM AUC |
|---|---|---|---|---|---|---|
| AG News | IWS | - | - | - | - | - |
| | REEF/Snuba | - | - | - | - | - |
| | REGAL | 280 | 0.912 | 0.007 | 0.856 | - |
| | FS BERT | - | - | - | 0.952 | - |
| IMDB | IWS | 35 | | 0.065 | | |
| | REEF/Snuba | 50 | 0.729 | | 0.722 | 0.787 |
| | REGAL | | 0.787 | 0.017 | 0.510 | 0.757 |
| | FS BERT | - | - | - | 0.914 | 0.974 |
| Journalist/Photographer | IWS | 110 | 0.877 | 0.033 | 0.898 | |
| | REEF/Snuba | 23 | | | | 0.944 |
| | REGAL | | 0.840 | 0.030 | 0.733 | 0.890 |
| | FS BERT | - | - | - | 0.954 | 0.990 |
| Painter/Architect | IWS | 157 | 0.883 | 0.032 | 0.893 | 0.966 |
| | REEF/Snuba | 23 | | | 0.874 | 0.947 |
| | REGAL | | 0.876 | 0.034 | | |
| | FS BERT | - | - | - | 0.968 | 0.995 |
| Professor/Physician | IWS | 238 | 0.860 | 0.042 | | |
| | REEF/Snuba | 26 | | | 0.882 | 0.935 |
| | REGAL | | 0.876 | 0.041 | 0.794 | 0.871 |
| | FS BERT | - | - | - | 0.951 | 0.994 |
| Professor/Teacher | IWS | 218 | 0.785 | 0.030 | 0.760 | |
| | REEF/Snuba | 12 | 0.562 | | 0.782 | 0.839 |
| | REGAL | 211 | | 0.029 | | 0.877 |
| | FS BERT | - | - | - | 0.938 | 0.982 |
| Yelp | IWS | 87 | 0.799 | 0.047 | 0.747 | 0.830 |
| | REEF/Snuba | 38 | | | | |
| | REGAL | | 0.803 | 0.018 | 0.770 | 0.837 |
| | FS BERT | - | - | - | 0.960 | 0.992 |
| macro-average | IWS | 140.833 | | 0.041 | | |
| | REEF/Snuba | 28.667 | 0.805 | | | 0.890 |
| | REGAL | | 0.834 | 0.028 | 0.753 | 0.868 |
| | FS BERT | - | - | - | 0.9475 | 0.988 |
# LFs denotes the total number of LFs selected/predicted by the model, not the number proposed. LM Acc and LM AUC represent the accuracy and area under the ROC curve, respectively, of the probabilistic labels produced by a Snorkel label model. For fully supervised BERT models (denoted FS BERT), accuracy and AUC are not computed with a label model.
FS BERT results for AG News taken from [30].
For fair comparison with IWS and REEF/Snuba, REGAL and FS BERT macro averages exclude AG News.
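The LM Acc and LM AUC values above come from fitting Snorkel's label model to the LF vote matrix. A minimal sketch of that step using the snorkel 0.9 API, with a made-up toy vote matrix for illustration:

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Toy (n_docs x n_lfs) vote matrix: classes {0, 1}, -1 = abstain.
L_train = np.array([
    [0, -1, 0],
    [1, 1, -1],
    [-1, 0, 0],
    [1, -1, 1],
])

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)
probs = label_model.predict_proba(L_train)  # probabilistic labels behind LM Acc / LM AUC
preds = probs.argmax(axis=1)                # hard labels
```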
Statistical comparison of REGAL and IWS using the Mann–Whitney–Wilcoxon (MWW) test (sketched in code after the table). The methods show no significant difference except on the Journalist/Photographer and Professor/Physician datasets. After Bonferroni correction, MWW shows that REGAL outperforms IWS on Professor/Physician and IWS outperforms REGAL on Journalist/Photographer.
| Dataset | Higher Med. Acc. | MWW |
|---|---|---|
| Yelp | REGAL | 0.3438 |
| IMDB | IWS | 0.1926 |
| Journalist/Photographer | IWS | |
| Professor/Teacher | REGAL | 0.2086 |
| Professor/Physician | REGAL | |
| Painter/Architect | IWS | 0.1438 |
Significant at p < 0.05 after Bonferroni correction;
significant at p < 0.01 after Bonferroni correction.
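The MWW comparison above can be reproduced with SciPy plus a Bonferroni adjustment over the six datasets; the per-LF accuracy samples in the sketch are placeholders, not values from the paper.

```python
from scipy.stats import mannwhitneyu

# Placeholder per-LF accuracy samples for one dataset (not real data).
regal_lf_acc = [0.91, 0.84, 0.88, 0.79, 0.93, 0.86]
iws_lf_acc = [0.85, 0.80, 0.77, 0.90, 0.82, 0.78]

stat, p = mannwhitneyu(regal_lf_acc, iws_lf_acc, alternative="two-sided")
alpha = 0.05 / 6  # Bonferroni correction across the six datasets
print(f"U = {stat}, p = {p:.4f}, significant: {p < alpha}")
```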
Effects of balancing data on label model performance. We balanced data by computing the total number of noisy label votes for each class and randomly replacing votes for dominant classes until the label distribution was approximately balanced (this step is sketched in code after the table). We measure the change in total coverage as well as accuracy and AUC for both Snorkel label models and a simple majority-voting LF aggregator (denoted "MV"). Imbalance Ratio is the ratio of votes for the most-labeled class to those for the least-labeled class. Note that datasets with a higher imbalance ratio tend to see larger improvements in accuracy after balancing.
| Dataset | Model | Δ Accuracy | Δ AUC | Δ MV Acc | Δ MV AUC | Δ Coverage | Imbalance Ratio |
|---|---|---|---|---|---|---|---|
| AG News | REGAL | 0.011 | – | −0.034 | – | −0.154 | 2.245 |
| IMDB | IWS | −0.002 | −0.014 | 0.008 | 0.001 | −0.107 | 1.896 |
| | REEF/Snuba | 0.002 | 0.000 | 0.000 | 0.000 | −0.002 | 1.053 |
| | REGAL | 0.066 | −0.068 | 0.083 | −0.008 | −0.165 | 3.573 |
| Journalist/Photographer | IWS | −0.001 | −0.013 | −0.012 | 0.001 | −0.112 | 2.492 |
| | REEF/Snuba | −0.003 | −0.004 | −0.004 | 0.000 | −0.006 | 1.493 |
| | REGAL | −0.014 | 0.004 | 0.025 | −0.012 | −0.001 | 1.319 |
| Painter/Architect | IWS | 0.033 | −0.014 | 0.022 | 0.007 | −0.136 | 3.969 |
| | REEF/Snuba | 0.001 | 0.000 | −0.003 | −0.003 | −0.004 | 1.340 |
| | REGAL | −0.011 | −0.006 | 0.015 | −0.004 | −0.001 | 1.238 |
| Professor/Physician | IWS | −0.010 | −0.008 | 0.006 | −0.001 | −0.002 | 1.170 |
| | REEF/Snuba | −0.004 | 0.001 | −0.007 | −0.002 | 0.000 | 1.499 |
| | REGAL | −0.026 | −0.024 | 0.024 | −0.009 | 0.000 | 1.380 |
| Professor/Teacher | IWS | 0.120 | −0.033 | 0.146 | 0.075 | −0.253 | 7.109 |
| | REEF/Snuba | 0.008 | 0.000 | 0.000 | −0.008 | 0.000 | 1.012 |
| | REGAL | −0.001 | −0.013 | 0.000 | −0.003 | 0.000 | 1.121 |
| Yelp | IWS | 0.085 | 0.061 | 0.060 | −0.007 | −0.140 | 3.285 |
| | REEF/Snuba | 0.003 | 0.002 | 0.001 | 0.000 | −0.008 | 1.226 |
| | REGAL | 0.010 | 0.012 | 0.021 | −0.019 | −0.036 | 1.642 |
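A minimal sketch of the balancing step described in the caption, assuming a vote matrix with −1 abstentions. Reading "replacing votes" as flipping surplus votes for dominant classes to abstain is our interpretation, which would also explain the negative Δ Coverage values above.

```python
import numpy as np

def balance_votes(L, seed=0):
    """Randomly flip surplus votes for over-represented classes to abstain (-1)
    until every class has roughly as many votes as the rarest class."""
    rng = np.random.default_rng(seed)
    L = L.copy()
    classes, counts = np.unique(L[L != -1], return_counts=True)
    target = counts.min()
    for y, n in zip(classes, counts):
        if n > target:
            rows, cols = np.where(L == y)           # positions of votes for class y
            drop = rng.choice(n, size=n - target, replace=False)
            L[rows[drop], cols[drop]] = -1          # flip surplus votes to abstain
    return L
```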
Top 6 unigram labeling functions from the first three iterations of REGAL (the sketch after the table shows how one such unigram becomes an executable labeling function). In some cases, REGAL did not identify LFs for particular classes at some iterations, denoted by "-".
| Dataset | Class | Iter. 1 | Iter. 2 | Iter. 3 |
|---|---|---|---|---|
| AG News | Sports | ‘ioc’, ‘olympic’, ‘knicks’, ‘nba’, ‘ncaa’, ‘medal’ | ‘mls’, ‘mvp’, ‘fc’, ‘sport’, ‘cowboys’, ‘golf’ | ‘102’, ‘35th’, ‘vs’, ‘2012’, ‘700th’, ‘ruud’ |
| | Sci/Tech | ‘microprocessors’, ‘microprocessor’, ‘antivirus’, ‘workstations’, ‘passwords’, ‘mainframe’ | ‘xp’, ‘os’, ‘x86’, ‘sp2’, ‘worms’, ‘worm’ | ‘hd’, ‘666666’, ‘src’, ‘sd’, ‘br’, ‘200301151450’ |
| | World | ‘allawi’, ‘prime’, ‘ayad’, ‘iyad’, ‘kofi’, ‘sadr’ | ‘plo’, ‘holy’, ‘roh’, ‘troops’, ‘troop’, ‘mp’ | - |
| | Business | ‘futures’, ‘indexes’, ‘trading’, ‘investors’, ‘traders’, ‘shares’ | ‘http’, ‘www’, ‘output’, ‘bp’, ‘dow’, ‘bhp’ | ‘ob’ |
| IMDB | Positive | ‘enchanting’, ‘errol’, ‘astaire’, ‘matthau’, ‘witherspoon’, ‘mclaglen’ | ‘garcia’, ‘ruby’, ‘1939’, ‘emily’, ‘myrna’, ‘poem’ | ‘delight’, ‘stellar’, ‘vivid’, ‘voight’, ‘burns’, ‘dandy’ |
| | Negative | ‘dumbest’, ‘manos’, ‘lame’, ‘whiny’, ‘laughable’, ‘camcorder’ | ‘pointless’, ‘inept’, ‘inane’, ‘implausible’, ‘abysmal’, ‘cheap’ | ‘vomit’, ‘joke’, ‘morons’, ‘ugh’, ‘snakes’, ‘avoid’ |
| Journalist/Photographer | Photographer | ‘35mm’, ‘shoots’, ‘polaroid’, ‘headshots’, ‘captures’, ‘portraiture’ | ‘exposures’, ‘kodak’, ‘nudes’, ‘viewer’, ‘imagery’, ‘colors’ | ‘shadows’, ‘macro’, ‘canvas’, ‘skill’, ‘poses’, ‘hobby’ |
| | Journalist | ‘corruption’, ‘government’, ‘cnn’, ‘previously’, ‘policy’, ‘stints’ | ‘governance’, ‘anchor’, ‘pbs’, ‘npr’, ‘democracy’, ‘bureau’ | ‘arabic’, ‘programme’, ‘elsewhere’, ‘economy’, ‘crisis’, ‘prior’ |
| Painter/Architect | Painter | ‘galleries’, ‘collections’, ‘residencies’, ‘acrylic’, ‘plein’, ‘pastels’ | ‘impressionist’, ‘textures’, ‘strokes’, ‘flowers’, ‘figurative’, ‘brush’ | ‘palette’, ‘feelings’, ‘realism’, ‘emotion’, ‘realistic’, ‘filled’ |
| | Architect | ‘soa’, ‘enterprise’, ‘bim’, ‘server’, ‘scalable’, ‘solutions’ | ‘infrastructure’, ‘methodologies’, ‘certifications’, ‘intelligence’, ‘teams’, ‘developer’ | ‘automation’, ‘computing’, ‘delivery’, ‘healthcare’, ‘initiatives’, ‘processing’ |
| Professor/Physician | Professor | ‘banking’, ‘democratization’, ‘verification’, ‘cooperation’, ‘governance’, ‘b’ | ‘security’, ‘finance’, ‘macroeconomics’, ‘microeconomics’, ‘political’, ‘law’ | ‘acm’, ‘optimization’, ‘mechanical’, ‘metaphysics’, ‘computational’, ‘visualization’ |
| | Physician | ‘specializes’, ‘alaska’, ‘takes’, ‘accepts’, ‘norfolk’, ‘ky’ | ‘speaks’, ‘aurora’, ‘carolinas’, ‘menorah’, ‘novant’, ‘affiliated’ | ‘vidant’, ‘anthonys’, ‘southside’, ‘fluent’, ‘hindi’, ‘osf’ |
| Professor/Teacher | Teacher | ‘grades’, ‘ages’, ‘eighth’, ‘aged’, ‘graders’, ‘grade’ | ‘ratings’, ‘sixth’, ‘fifth’, ‘fun’, ‘fourth’, ‘tutoring’ | ‘pupils’, ‘favorite’, ‘cooking’, ‘volunteering’, ‘comparing’, ‘friends’ |
| | Professor | ‘governance’, ‘constitutional’, ‘cooperation’, ‘regulation’, ‘democracy’, ‘finance’ | ‘econometrics’, ‘banking’, ‘economy’, ‘markets’, ‘entrepreneurship’, ‘economic’ | ‘globalization’, ‘optimization’, ‘firms’, ‘statistical’, ‘conflict’, ‘tax’ |
| Yelp | Positive | ‘phenomenal’, ‘yummy’, ‘delectable’, ‘favorite’, ‘amazing’, ‘atmosphere’ | ‘terrific’, ‘heavenly’, ‘notch’, ‘hearty’, ‘chic’, ‘stylish’ | ‘handmade’, ‘kale’, ‘cozy’, ‘carpaccio’, ‘tender’, ‘fave’ |
| | Negative | ‘refund’, ‘pharmacy’, ‘disrespectful’, ‘refunded’, ‘warranty’, ‘rudest’ | ‘cancel’, ‘scam’, ‘confirmed’, ‘dealership’, ‘driver’, ‘appt’ | ‘receipt’, ‘confirm’, ‘reply’, ‘cox’, ‘clerk’, ‘policy’ |
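As a concrete illustration, a selected unigram such as ‘phenomenal’ (Yelp, positive class) maps directly onto a standard Snorkel labeling function; the x.text attribute assumes documents are rows of a DataFrame with a text column.

```python
from snorkel.labeling import labeling_function

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_phenomenal(x):
    # Vote POSITIVE when the selected keyword appears, otherwise abstain.
    return POSITIVE if "phenomenal" in x.text.lower() else ABSTAIN
```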
Top 6 bigram labeling functions from the first iteration of REGAL. Cases where REGAL did not select any bigram LFs are denoted by “-”.
| Dataset | Class | Length 2 Rules |
|---|---|---|
| AG News | Sports | ‘2006 world’, ‘93 -’, ‘- star’, ‘half goals’, ‘world short’, ‘1 draw’ |
| | Sci/Tech | ‘worm that’, ‘os x’, ‘/ l’, ‘data -’, ‘a flaw’, ‘chart)’ |
| | World | ‘labour party’, ‘labor party’, ‘s party’, ‘al -’, ‘bush "’, ‘pro -’ |
| | Business | ‘- wall’, ‘$ 46’, ‘up 0’, ‘$ 85’, ‘$ 43’, ‘a &’ |
| IMDB | Positive | ‘kelly and’, ‘claire danes’, ‘george burns’, ‘jack lemmon’, ‘michael jackson’, ‘hong kong’ |
| | Negative | ‘just really’, ‘plain stupid’, ‘maybe if’, ‘avoid it’, ‘so stupid’, ‘stupid the’ |
| Journalist/Photographer | Photographer | - |
| | Journalist | ‘see less’, ‘twitter :’ |
| Painter/Architect | Painter | ‘attended the’, ‘public collections’, ‘collections including’ |
| | Architect | - |
| Professor/Physician | Professor | ‘of financial’, ‘see less’, ‘film and’, ‘and society’, ‘fiction and’, ‘_b’ |
| | Physician | ‘oh and’, ‘va and’, ‘la and’, ‘tn and’, ‘ca and’, ‘ok and’ |
| Professor/Teacher | Teacher | ‘childhood education’, ‘early childhood’, ‘primary school’, ‘of 4’, ‘special education’, ‘rating at’ |
| | Professor | ‘modeling and’, ‘and computational’, ‘climate change’, ‘and organizational’, ‘of government’, ‘nsf career’ |
| Yelp | Positive | ‘affordable and’, ‘food good’, ‘highly recommended’, ‘highly recommend’, ‘top notch’, ‘definitely recommend’ |
| | Negative | ‘never again’, ‘never recommend’, ‘ever again’, ‘very bad’, ‘never going’, ‘my card’ |