Olga Majewska, Charlotte Collins, Simon Baker, Jari Björne, Susan Windisch Brown, Anna Korhonen, Martha Palmer.
Abstract
BACKGROUND: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in a range of Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The cost and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames.
Keywords: Text classification; Verb lexicon; VerbNet
Year: 2021 PMID: 34266499 PMCID: PMC8280585 DOI: 10.1186/s13326-021-00247-z
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1. A visual representation of the BioVerbNet semantic classes. The shaded boxes represent the top-level classes, while the unshaded boxes represent the subclasses
Assignment of verbs into classes and sub-classes
| | Top-level classes | Sub-classes | Verbs |
|---|---|---|---|
| Automatically assigned verbs | 16 | 48 | 283 (29.4%) |
| Manually assigned verbs | - | - | 410 (42.7%) |
| Re-assigned within original sub-classes | - | - | 104 |
| Assigned to new sub-classes created within original top-level classes | - | 30 | 93 |
| Assigned to new top-level classes | 6 | 39 | 213 |
| Total assigned verbs | 22 | 117 | 693 (72.1%) |
| Non-assigned verbs | - | - | 268 (27.9%) |
| Total verbs | - | - | 961 (100%) |
Examples of common verbs with senses specific to the biological sciences
| Sense | Example |
|---|---|
| To inactivate expression of a gene. | Eukaryotic cells express small noncoding RNAs to silence target genes |
| To suppress the immune response. | Propathogenic cells dampen the early T cell response |
| To combine with and remove reactive oxygen species. | Antioxidant properties of plants scavenge reactive oxygen species |
| To present antigen to naïve lymphocytes, causing them to differentiate. | These antigens may prime an immune response |
| To transform one cell type into a different cell type. | Mash1 and Brn2 reprogram fibroblasts into neurons |
| To inactivate expression of a gene through methylation. | A period of stimulation could imprint on a T cell a “biochemical memory” |
| To undergo cell division into two or more daughter cells. | Cultures of Tetrahymena pyriformis were induced to divide synchronously |
| To extract a cell population or substance in a pure form. | We used soft agar to isolate phototrophic bacteria |
Examples of differences in semantic roles of the arguments of the same verb (underlined) in VerbNet and BioVerbNet
| Source | Example sentence | Verb Frame |
|---|---|---|
| VerbNet | The couple | Agent V Location |
| BioVerbNet | Most parasites | Patient V {in} Location |
| VerbNet | The gardener | Agent V Patient {into} Product |
| BioVerbNet | Alga-free paramecia and symbiotic algae can | Agent {and} Co-Agent {can} V ADV |
| VerbNet | He | Agent V Theme |
| BioVerbNet | Plants | Agent V Source |
| VerbNet | The secretary | Agent V Theme |
| BioVerbNet | These viruses | Bio-Agent V Patient {in} Location |
Examples of syntactic annotation (verb class members underlined)
| Verb sub-class | Example sentence | Syntactic annotation |
|---|---|---|
| 1.1.2 Suppress | Amine groups | NP V NP |
| 1.3.0 Increase/Decrease | Hormonal stimuli | NP V |
| | Antibody levels | NP V ADVP |
| 2.2.2 Cleave | The activated caspases | NP V NP |
| | The conjugated salts | NP V NP PP |
| 2.3.0 Interact | Both drug classes | NP V |
| | Estrogen may | NP V PP |
| 4.1.1 Wash | Subepithelial mucous gland secretions | NP V NP |
| 4.2.0 Precipitate | Specific antisera | NP V NP |
| | VITF-A and the viral capping enzyme | NP V PP |
| 8.1.2 Chemically combine | Nonfunctional receptors could not | NP V |
| | Curcumin can | NP V NP |
| | Lomefloxacin can | NP V PP |
| 9.5.0 Decipher | We | NP V |
| | We | NP V NP |
| 10.2.0 Score | We | NP V NP PP |
| | Clinicians | NP V NP ADVP |
| 17.2.2 Downregulate gene expression | HDAC4 and MEF2C | NP V NP |
| | MicroRNAs | NP V NP PP ADVP |
| 20.1.3 Repair | Hematopoietic stem cells can | NP V NP |
| | Adult zebra fish | NP V NP PP |
Summary statistics of the Hallmarks of Cancer (HOC) and the Chemical Exposure Assessment (CEA) datasets
| Split | HOC documents | HOC sentences | CEA documents | CEA sentences |
|---|---|---|---|---|
| Train | 1,303 | 12,279 | 2,555 | 25,307 |
| Dev | 183 | 1,775 | 384 | 3,770 |
| Test | 366 | 3,410 | 722 | 7,100 |
| Total | 1,852 | 17,464 | 3,661 | 36,177 |
Hyper-parameters used in our convolutional neural network
| Parameters | Values |
|---|---|
| Vector dimension | 200 |
| Filter sizes | 3, 4, and 5 |
| Number of filters | 300 |
| Dropout probability | 0.5 |
| Minibatch size | 50 |
| Input size (in tokens) | 500 (documents), 100 (sentences) |
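The hyper-parameters above describe a Kim-style convolutional text classifier: token embeddings are convolved with filters of three widths, max-pooled over time, and the pooled features are concatenated. The following is a minimal NumPy sketch consistent with that table (randomly initialised, untrained weights; the paper's actual implementation may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, FILTER_SIZES, N_FILTERS = 200, (3, 4, 5), 300  # values from the table

# Hypothetical random parameters; a real model would learn these.
filters = {w: rng.normal(0, 0.1, size=(N_FILTERS, w, EMB_DIM)) for w in FILTER_SIZES}

def encode(tokens_emb, train=False, p_drop=0.5):
    """tokens_emb: (seq_len, 200) matrix of token embeddings.
    Returns a 900-d representation (300 filters x 3 widths)."""
    pooled = []
    for w, F in filters.items():
        seq_len = tokens_emb.shape[0]
        # All windows of width w, flattened to rows of length w*200.
        windows = np.stack([tokens_emb[i:i + w].ravel()
                            for i in range(seq_len - w + 1)])
        # Convolution as a matrix product, followed by ReLU.
        acts = np.maximum(windows @ F.reshape(N_FILTERS, -1).T, 0.0)
        pooled.append(acts.max(axis=0))  # max-over-time pooling -> (300,)
    feat = np.concatenate(pooled)  # (900,)
    if train:
        # Inverted dropout with the table's keep probability.
        feat = feat * (rng.random(feat.shape) > p_drop) / (1 - p_drop)
    return feat

sent = rng.normal(size=(100, EMB_DIM))  # 100 tokens: the sentence-level input size
vec = encode(sent)                      # vec.shape == (900,)
```

A classification head (one sigmoid output per label for multi-label tasks such as HOC and CEA) would sit on top of this 900-dimensional feature vector.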
Evaluation results for the Hallmarks of Cancer (HOC) text classification task
| Model | Doc. precision | Doc. recall | Doc. F1 | Sent. precision | Sent. recall | Sent. F1 |
|---|---|---|---|---|---|---|
| Baseline (no retrofitting) | 77.8 | 51.7 | 62.1 | 56.8 | 30.7 | 39.9 |
| 22-classes retrofitted | 74.4 | 62.1 | 67.7* | 49.1 | 35.8 | |
| 117-subclasses retrofitted | 74.8 | 62.5 | | 48.6 | 35.2 | 40.8* |
The Baseline model is a skip-gram model without any retrofitting. All figures are micro-averages expressed as percentages (Bold denotes the best F1-score, * denotes statistically significant scores with respect to the baseline)
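All figures in the evaluation tables are micro-averages: true positives, false positives, and false negatives are pooled across all labels and instances before the ratios are taken. A minimal sketch of that computation (the label names are illustrative, not taken from the datasets):

```python
def micro_prf(gold, pred):
    """gold, pred: lists of label sets, one per document or sentence.
    Returns micro-averaged (precision, recall, F1)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # pooled true positives
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # pooled false positives
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # pooled false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy multi-label predictions (hypothetical hallmark-style labels).
gold = [{"angiogenesis"}, {"apoptosis", "metastasis"}, {"apoptosis"}]
pred = [{"angiogenesis"}, {"apoptosis"}, {"metastasis"}]
p, r, f = micro_prf(gold, pred)  # p = 2/3, r = 1/2, f = 4/7
```

Micro-averaging weights every instance-label decision equally, so frequent labels dominate the score, which suits the imbalanced label distributions in HOC and CEA.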
Evaluation results for the Chemical Exposure Assessment (CEA) text classification task
| Model | Doc. precision | Doc. recall | Doc. F1 | Sent. precision | Sent. recall | Sent. F1 |
|---|---|---|---|---|---|---|
| Baseline (no retrofitting) | 89.5 | 87.1 | 88.3 | 66.2 | 62.8 | 64.5 |
| 22-classes retrofitted | 89.9 | 87.5 | 88.7* | 67.3 | 62.1 | |
| 117-subclasses retrofitted | 89.2 | 88.6 | | 66.3 | 60.3 | 63.2* |
The Baseline model is a skip-gram model without any retrofitting. All figures are micro-averages expressed as percentages (Bold denotes the best F1-score, * denotes statistically significant scores with respect to the baseline)
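The retrofitted models in the tables above inject BioVerbNet class knowledge into skip-gram vectors by pulling each verb's embedding toward the verbs in its class while keeping it close to its original position, in the spirit of standard retrofitting. A minimal sketch under toy assumptions (the class memberships and vectors are illustrative, and the paper's exact update weights may differ):

```python
import numpy as np

def retrofit(vectors, neighbours, n_iter=10, alpha=1.0, beta=1.0):
    """vectors: {word: np.array}; neighbours: {word: list of words
    sharing a verb class}. Returns retrofitted copies of the vectors."""
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iter):
        for w, nbrs in neighbours.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue  # words outside the lexicon keep their vectors
            # Weighted average of the original vector and class neighbours.
            num = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
            new[w] = num / (alpha + beta * len(nbrs))
    return new

# Toy 2-d "skip-gram" vectors; inhibit/suppress share a class, express does not.
vecs = {"inhibit": np.array([1.0, 0.0]),
        "suppress": np.array([0.0, 1.0]),
        "express": np.array([-1.0, 0.0])}
classes = {"inhibit": ["suppress"], "suppress": ["inhibit"]}
rf = retrofit(vecs, classes)
```

After retrofitting, `inhibit` and `suppress` move closer together while `express`, which belongs to no shared class here, is untouched. This is the mechanism by which the 22-class and 117-subclass structures produce the retrofitted embeddings evaluated above.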