Hyeong-Ryeol Baek1, Yong-Suk Choi2.
Abstract
Sentence-level relation extraction (RE) has a highly imbalanced data distribution: about 80% of the data are labeled as negative, i.e., no relation; minority classes (MC) exist among the positive labels; and some MC instances are incorrectly labeled. Because of these challenges, i.e., label noise and low source availability, most models fail to learn MCs and obtain zero or very low F1 scores on them. Previous studies, however, have focused on micro F1 scores and have not adequately addressed MCs. To reduce the high misclassification error on MCs, we introduce (1) a minority class attention module (MCAM) and (2) effective augmentation methods specialized for RE. MCAM calculates confidence scores on MC instances to select reliable ones for augmentation, and aggregates MC information while training a model. Our experiments show that our methods achieve state-of-the-art F1 scores on TACRED and dramatically improve the minority class F1 score.
Keywords: data augmentation; minority class; relation extraction
Year: 2022 PMID: 35808404 PMCID: PMC9269806 DOI: 10.3390/s22134911
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Top seven classes in the TACRED training dataset, ordered by the level of label noise in descending order (a) and by the number of correct instances in ascending order (b). per and org abbreviate person and organization, Noise denotes the level of label noise for each class, which is calculated by , and Correct denotes the number of correct labels for each class. Noisy labels, i.e., wrong labels, are determined by the refined annotation [9]. The four classes marked in bold suffer from both label noise and low source availability, i.e., they are MCs. MC instances total 227 out of 68,124 training instances (0.33%), and the positive class with the most instances, 2443, is per:title (3.6%).
| (a) Relation | Noise |
|---|---|
| per:countries_of_residence | 80.7% |
| per:other_family | 68.7% |
| per:cities_of_residence | 65.8% |

| (b) Relation | Correct |
|---|---|
| per:country_of_birth | 15 |
| per:stateorprovince_of_birth | 29 |
| per:stateorprovince_of_death | 33 |
Examples from the TACRED training dataset. The relation between [Entity1] and [Entity2] is annotated as shown in the TACRED Label column.
| Sentence | TACRED Label | Correct? |
|---|---|---|
| Kaiser’s parents had emigrated in 1905 from Ukraine, then part of | per:country_of_death | No |
| The president told | org:dissolved | No |
| | org:dissolved | Yes |
Figure 1. Overall architecture of our model: (left) aggregation of the main vector and the weighted sum of the value vectors and (right) incorporation of MC information into the value vector of the corresponding MC. Following [31,32], special tokens (@, #) are used as entity indicators and added before and after the [Entity1] and [Entity2] tokens, respectively. We also trained a model to predict an MC using its value vector alone and induced the model to align the MC and its Ref vector. The representation vectors of Refs are denoted as .
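The entity-indicator convention described in the Figure 1 caption can be sketched as below. This is only an illustration under stated assumptions: token-level, inclusive, non-overlapping spans; the function name `add_entity_markers` and the example sentence are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def add_entity_markers(
    tokens: List[str],
    e1_span: Tuple[int, int],
    e2_span: Tuple[int, int],
    e1_marker: str = "@",
    e2_marker: str = "#",
) -> List[str]:
    """Wrap two non-overlapping entity spans (inclusive indices) in marker tokens."""
    # Each entry is (insert position, kind, marker); kind 0 = closing, 1 = opening.
    inserts = [
        (e1_span[0], 1, e1_marker),
        (e1_span[1] + 1, 0, e1_marker),
        (e2_span[0], 1, e2_marker),
        (e2_span[1] + 1, 0, e2_marker),
    ]
    out = list(tokens)
    # Insert from right to left so earlier positions stay valid. At equal
    # positions the opening marker is inserted first, so the closing marker
    # of the preceding entity ends up in front of it.
    for pos, _kind, marker in sorted(inserts, key=lambda x: (x[0], x[1]), reverse=True):
        out.insert(pos, marker)
    return out


# Hypothetical example sentence (not from TACRED).
marked = add_entity_markers(["Bill", "Gates", "founded", "Microsoft"], (0, 1), (3, 3))
print(" ".join(marked))
```

The marked token sequence would then be passed to the tokenizer of the encoder in place of the raw sentence.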
Figure 2. Workflow for the selective augmentation of MC.
Training dataset statistics. We list the number of relations (# Rel), MC instances (# MC), and no relation instances (# N/A) with the percentage.
| Datasets | # Rel | # MC (%) | # N/A (%) | # Total |
|---|---|---|---|---|
| TACRED | 42 | 227 (0.33) | 55,112 (81) | 68,124 |
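The percentages in the table above follow directly from the raw counts. The short check below recomputes them; the per:title count (2443) is taken from the class-noise table caption, and the variable names are illustrative only.

```python
# Counts reported for the TACRED training set.
total = 68_124       # all training instances
mc = 227             # minority-class (MC) instances
no_relation = 55_112  # negative (no_relation) instances
per_title = 2_443    # largest positive class, from the class-noise table caption

mc_pct = 100 * mc / total
na_pct = 100 * no_relation / total
per_title_pct = 100 * per_title / total

print(f"MC: {mc_pct:.2f}%  N/A: {na_pct:.0f}%  per:title: {per_title_pct:.1f}%")
```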
The average number of models that correctly predict each class.
| Relation Type | Average Number of Models |
|---|---|
| per:country_of_death | 0 |
| org:member_of | 0.1 |
| org:dissolved | 0.5 |
| org:shareholders | 1.3 |
| per:country_of_birth | 1.4 |
| org:members | 1.7 |
| per:alternate_names | 2.5 |
| per:other_family | 4.7 |
| org:parents | 10.6 |
| per:stateorprovince_of_death | 12.1 |
| org:subsidiaries | 13.5 |
| per:city_of_death | 14.9 |
| per:cause_of_death | 16.3 |
| org:founded_by | 17.8 |
| per:date_of_death | 18.3 |
| per:city_of_birth | 19.6 |
| org:country_of_headquarters | 20.3 |
| per:children | 20.4 |
| org:political/religious_affiliation | 21 |
| per:parents | 21.2 |
| per:countries_of_residence | 22.2 |
| per:religion | 23.6 |
| per:siblings | 23.9 |
| org:number_of_employees/members | 25.1 |
| per:stateorprovinces_of_residence | 25.2 |
| per:stateorprovince_of_birth | 25.3 |
| per:cities_of_residence | 25.3 |
| per:schools_attended | 27.4 |
| per:origin | 29.2 |
| per:spouse | 30 |
| per:employee_of | 30.9 |
| org:stateorprovince_of_headquarters | 34.7 |
| org:city_of_headquarters | 35.2 |
| per:date_of_birth | 38.1 |
| per:charges | 38.4 |
| org:website | 41.5 |
| org:alternate_names | 41.7 |
| org:founded | 42.1 |
| org:top_members/employees | 42.4 |
| per:title | 42.5 |
| per:age | 44.9 |
| no_relation | 48.4 |
The test scores on TACRED and Rev-TACRED. Results with * are from [6].
| Data | Model | F1 | Ma. F1 | MC F1 | W. MC F1 |
|---|---|---|---|---|---|
| TACRED | C-GCN | 67.3 | 49.5 | 17.4 | - |
| | SpanBERT * | 70.8 | 56.1 | 19.2 | - |
| | KnowBERT * | 71.5 | 57.6 | 12.5 | - |
| | LUKE | 72.7 | 58.9 | 3.8 | - |
| | RE-marker | 74.5 | 62 | 12.2 | - |
| | RE-MC ( | 75.1 | 62.1 | 24.1 | - |
| | RE-MC ( | | | | - |
| | RE-MC ( | 74.6 | 62.5 | 26.9 | - |
| Rev-TACRED | C-GCN | 74.8 | 55.5 | 0 | 0 |
| | SpanBERT * | 78 | 63.7 | 21.4 | 16.6 |
| | KnowBERT * | 79.3 | 63.4 | 0 | 0 |
| | LUKE | 81.5 | 67 | 14.3 | 11 |
| | RE-marker | 82.9 | 70.8 | 24 | 24.9 |
| | RE-MC ( | | 71.8 | 47.1 | 53.3 |
| | RE-MC ( | 84.7 | | 44 | 51.8 |
| | RE-MC ( | 83.3 | 70 | | |
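The gap between the F1 and MC F1 columns above reflects how micro-averaging lets frequent classes dominate. The sketch below contrasts micro and macro F1 on a toy example. It assumes that no_relation is excluded from both averages (the usual TACRED convention for micro F1; whether the paper's Ma. F1 does the same is an assumption), and the toy labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def f1_scores(
    gold: List[str], pred: List[str], negative: str = "no_relation"
) -> Tuple[float, float, Dict[str, float]]:
    """Return (micro F1, macro F1, per-class F1) over positive classes only."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != negative:
                tp[g] += 1
        else:
            if p != negative:
                fp[p] += 1
            if g != negative:
                fn[g] += 1

    def f1(t: int, f_p: int, f_n: int) -> float:
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    classes = sorted({c for c in gold + pred if c != negative})
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return micro, macro, per_class


# Toy data: the model is perfect on a frequent class and misses a rare one.
gold = ["per:title"] * 8 + ["org:dissolved"] * 2 + ["no_relation"] * 10
pred = ["per:title"] * 8 + ["no_relation"] * 2 + ["no_relation"] * 10

micro, macro, per_class = f1_scores(gold, pred)
print(f"micro={micro:.3f} macro={macro:.3f} per_class={per_class}")
```

In this toy case the micro score stays high even though the rare class is never predicted, while the macro score and the per-class score expose the failure.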
The test scores on CRE. A model with a higher Acc− score, and a smaller gap (Diff.) between Acc+ and Acc− is considered more robust to heuristic methods, i.e., spurious association. Results with are from [29].
| Model | Acc | Acc+ | Acc− | Diff. |
|---|---|---|---|---|
| SpanBERT | 63.5 | | 42.5 | 47.2 |
| KnowBERT | 72.4 | 84.2 | 62.9 | 21.3 |
| LUKE | | 87.3 | 75.5 | 11.8 |
| RE-marker | 78.6 | 87.5 | 71.4 | 16.1 |
| RE-MC( | 80.2 | 84.8 | | |
90% confidence intervals of the differences between the MC F1 scores of models. L.B., U.B. and M denote the lower bound, upper bound and median value, respectively.
| | L.B. | U.B. | M |
|---|---|---|---|
| Ours—LUKE | 0 | 42.9 | 21.2 |
| Ours—RE-Marker | 0 | 41.9 | 21.2 |
Figure 3. Distribution of the bootstrapping results. We calculated the difference between the MC F1 scores of our model and those of LUKE (a) and RE-marker (b). The X-axis represents the difference between MC F1 scores and the Y-axis represents the frequency. The lower and upper bounds (solid lines) and the median (dotted line) at the 90% confidence level are marked in the figures.
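A bootstrap interval of this kind can be sketched as follows. The paper's exact resampling procedure is not given here, so this is a generic paired percentile bootstrap under assumptions: test instances are resampled with replacement, MC F1 is computed as micro F1 restricted to the four MC classes, and the function names, resample count, and toy data are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Tuple

# The four minority classes listed in the MC distribution table.
MC_CLASSES = frozenset(
    {"per:country_of_death", "org:member_of", "org:dissolved", "org:shareholders"}
)


def mc_f1(gold: List[str], pred: List[str]) -> float:
    """Micro F1 restricted to MC classes (an assumed definition of MC F1)."""
    tp = sum(g == p and g in MC_CLASSES for g, p in zip(gold, pred))
    fp = sum(p in MC_CLASSES and g != p for g, p in zip(gold, pred))
    fn = sum(g in MC_CLASSES and g != p for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_diff_ci(
    gold: List[str],
    pred_a: List[str],
    pred_b: List[str],
    metric: Callable[[List[str], List[str]], float],
    n_resamples: int = 1000,
    level: float = 0.90,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap (lower, upper, median) of metric(A) - metric(B)."""
    rng = random.Random(seed)
    n = len(gold)
    diffs = []
    for _ in range(n_resamples):
        # Resample test instances with replacement; both models share the sample.
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(metric(g, a) - metric(g, b))
    diffs.sort()
    alpha = (1.0 - level) / 2.0
    lower = diffs[int(alpha * (n_resamples - 1))]
    upper = diffs[int((1.0 - alpha) * (n_resamples - 1))]
    return lower, upper, statistics.median(diffs)


# Toy data: model A finds every MC instance, model B finds none.
gold = ["org:dissolved"] * 3 + ["no_relation"] * 17
model_a = list(gold)
model_b = ["no_relation"] * 20

lower, upper, median = bootstrap_diff_ci(gold, model_a, model_b, mc_f1)
print(lower, upper, median)
```

With very few MC test instances, some resamples contain no MC instance at all, which is one way a lower bound of 0 can arise.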
Performance comparison for the ablation study. w/o Aug denotes the removal of augmentation; w/o Add denotes the removal of additional training; and w/o LSR denotes the removal of LSR during additional training.
| Model | F1 | Ma. F1 | MC F1 |
|---|---|---|---|
| RE-MC ( | | | |
| | 84.6 | 70.9 | 9.1 |
| | 83.3 | 68 | 0 |
| | 84.2 | 70 | 27.6 |
Figure 4. Augmentation settings and F1 scores on the Rev-TACRED test and dev datasets. The Y-axis is F1; the X-axis is the scale factor N; legend S is the proportion of token replacements; and MC boot. F1 in plot (3, 2) denotes the bootstrap mean of the MC F1 score.
Hyper parameters. CE denotes and AG denotes .
| Name | Value |
|---|---|
| Maximum word length | 512 |
| Mini batch size | 4 |
| Learning rate | 5 × 10 |
| Optimizer | AdamW |
| Warmup steps | the first 10% of steps of the first epoch |
| Weight decay | 1 × 10 |
| Initial training epochs | 5 |
| Additional training epochs | 6 |
| Label smoothing | 0.3 |
| Label smoothing | 0.3 |
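The label smoothing value of 0.3 in the table can be illustrated with a standard label-smoothed cross-entropy. This sketch assumes the common formulation in which the smoothing mass is spread uniformly over all classes; which of the paper's losses each smoothing row applies to is not recoverable from the table, and the function name is hypothetical.

```python
import math
from typing import List


def label_smoothed_ce(logits: List[float], target: int, epsilon: float = 0.3) -> float:
    """Cross-entropy against a target smoothed as (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    # Smoothed target distribution: uniform mass eps / K, extra 1 - eps on the gold class.
    smooth = [epsilon / k] * k
    smooth[target] += 1.0 - epsilon
    return -sum(q * lp for q, lp in zip(smooth, log_probs))


# A confident, correct prediction is penalized more under smoothing than without it.
logits = [6.0, 0.0, 0.0, 0.0]
plain = label_smoothed_ce(logits, target=0, epsilon=0.0)
smoothed = label_smoothed_ce(logits, target=0, epsilon=0.3)
print(plain, smoothed)
```

Smoothing discourages over-confident predictions, which is one reason it is often used when some training labels are noisy.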
Selected model implementation details for TACRED.
| | ( | ( | ( |
|---|---|---|---|
| # Aug. | 429 | 901 | 1814 |
| | 114 | 228 | 454 |
| | 106 | 212 | 462 |
| | 99 | 231 | 442 |
| | 110 | 230 | 456 |
| # Total | 68,424 | 68,896 | 69,789 |
MC distribution. Train and Test refer to TACRED, and R- indicates Revised TACRED. Aug. indicates the number of augmented instances added to the original training dataset for our final model (RE-MC).
| | Train | Test | R-Test | R-Dev |
|---|---|---|---|---|
| per:country_of_death | 6 | 9 | 10 | 47 |
| org:member_of | 122 | 18 | 4 | 7 |
| org:dissolved | 23 | 2 | 1 | 1 |
| org:shareholders | 76 | 13 | 3 | 35 |