| Literature DB >> 35104909 |
Areej Jaber1,2, Paloma Martínez2.
Abstract
BACKGROUND: Abbreviations are considered an essential part of the clinical narrative; they are used not only to save time and space but also to conceal serious or incurable illnesses. Misinterpreting a clinical abbreviation can affect the patients themselves as well as downstream services such as clinical decision support systems. There is no consensus in the scientific community on how new abbreviations are coined, which makes them difficult to understand. Disambiguating a clinical abbreviation means predicting its exact meaning from context, a crucial step in understanding clinical notes.
Entities:
MeSH:
Year: 2022 PMID: 35104909 PMCID: PMC9246508 DOI: 10.1055/s-0042-1742388
Source DB: PubMed Journal: Methods Inf Med ISSN: 0026-1270 Impact factor: 1.800
Some examples of how abbreviations are formed
| Rule | Abbreviation | Sense |
|---|---|---|
| Truncating the end of long form | DIP | |
| First-letter initialization of each word | VBG | |
| Syllabic initialization | US | |
| Combination of the beginnings of some of the words of the long form | Ad lib | |
| Symbols/synonyms substitution or initialization | T3 | Triiodothyronine |
Sample of University of Minnesota (UMN) data set abbreviations with their senses, sense counts, and distributions
| Abbreviation | Sentences | Tokens | Sense | No. | % |
|---|---|---|---|---|---|
| AMA | 2,881 | 37,887 | Against medical advice | 444 | 88.8 |
| | | | Advanced maternal age | 31 | 6.2 |
| | | | Antimitochondrial antibody | 25 | 5.0 |
| BAL | 3,267 | 38,483 | Bronchoalveolar lavage | 457 | 91.4 |
| | | | Blood alcohol level | 43 | 8.6 |
| OTC | 6,173 | 37,356 | Over the counter | 469 | 93.8 |
| | | | Ornithine transcarbamoylase | 31 | 6.2 |
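To make the disambiguation task concrete, here is a toy illustration (not the paper's method): choosing an abbreviation's sense by overlap between the surrounding context and per-sense cue words. The cue lists below are invented for illustration only; the paper instead fine-tunes pretrained BERT models.

```python
# Toy sense inventory for AMA (senses from the UMN table above);
# the cue words are hypothetical, chosen only for this sketch.
SENSE_CUES = {
    "AMA": {
        "Against medical advice": {"left", "discharged", "refused"},
        "Advanced maternal age": {"pregnancy", "gestation", "maternal"},
        "Antimitochondrial antibody": {"titer", "serology", "cirrhosis"},
    }
}

def disambiguate(abbrev, context):
    """Pick the sense whose cue set overlaps the context the most."""
    words = set(context.lower().split())
    cues = SENSE_CUES[abbrev]
    return max(cues, key=lambda sense: len(cues[sense] & words))

print(disambiguate("AMA", "patient left the hospital and refused treatment"))
# → Against medical advice
```

A real system must cope with contexts that contain no obvious cue words, which is why contextual representations such as BERT outperform keyword heuristics.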
Fig. 1 Pie chart of the relation between senses and number of examples, showing the frequency and percentage of each sense in the University of Minnesota (UMN) data set.
Architecture of the pretrained models used in this study
| Characteristic | No. |
|---|---|
| Layers | 12 |
| Hidden units | 768 |
| Self-attention heads | 12 |
| Total trainable parameters | 110M |
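The table matches the standard BERT-base configuration. As a sanity check, the parameter count can be estimated from the sizes above plus the usual BERT-base vocabulary (30,522), intermediate size (3,072), and position limit (512), which are assumptions here, not values stated in the text:

```python
# Rough trainable-parameter estimate for a BERT-base-style encoder.
# hidden/layers values come from the table; vocab, intermediate, and
# max_pos are standard BERT-base assumptions.
def bert_base_params(vocab=30522, hidden=768, layers=12,
                     intermediate=3072, max_pos=512, segments=2):
    emb = (vocab + max_pos + segments) * hidden + 2 * hidden   # embeddings + LayerNorm
    attn = 4 * (hidden * hidden + hidden)                      # Q, K, V, output projections
    ffn = (hidden * intermediate + intermediate
           + intermediate * hidden + hidden)                   # two dense layers
    norms = 2 * 2 * hidden                                     # two LayerNorms per layer
    pooler = hidden * hidden + hidden
    return emb + layers * (attn + ffn + norms) + pooler

print(round(bert_base_params() / 1e6), "M parameters")
```

The estimate lands near the ~110M figure in the table (the exact published count also includes head parameters not modeled here).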
Fig. 2 An example of input representation for one sequence including [CLS], [SEP], and [PAD] tokens, in addition to the added segment, attention mask, and embedding layers.
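The input representation in Fig. 2 can be sketched as follows. Special-token ids follow the common BERT convention ([PAD]=0, [CLS]=101, [SEP]=102); all other ids are toy values, not a real vocabulary:

```python
def encode(tokens, max_len=10):
    # Wrap the sequence in [CLS]/[SEP], pad to a fixed length, and
    # produce the companion attention mask and segment ids.
    vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102}
    seq = ["[CLS]"] + list(tokens) + ["[SEP]"]
    n_pad = max_len - len(seq)
    seq += ["[PAD]"] * n_pad
    ids = [vocab.setdefault(t, 1000 + len(vocab)) for t in seq]  # toy ids
    attention_mask = [1] * (max_len - n_pad) + [0] * n_pad       # 0 = ignore [PAD]
    segment_ids = [0] * max_len                                  # single sequence: one segment
    return ids, attention_mask, segment_ids

ids, mask, segs = encode(["pt", "d/c", "home", "ama"])
```

The attention mask tells the model to ignore [PAD] positions, and a single segment id suffices because each training example is one sentence, not a pair.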
Fig. 3 The architecture of the proposed model.
Accuracy results for the University of Minnesota (UMN) data set
| Pretrained model | Training accuracy (%) | Validation accuracy (%) | Test accuracy (%) |
|---|---|---|---|
| Bio_Clinical | 98.85 | 98.97 | 98.99 |
| BlueBERT | 98.46 | 98.73 | 98.75 |
| MS_BERT | 98.98 | 99.11 | 99.13 |
Note: Accuracies differ only slightly across the three pretrained models.
Accuracy results for the University of Minnesota (UMN) data set compared with several previous works
| Methods | Accuracy (%) |
|---|---|
| Multiclassifier | |
| Convolutional Neural Network (CNN) | 77.83 |
| CLASS-GATOR | 76.92 |
| One-fits-all classifier | |
| ELMo + Topic | 70.41 |
| Latent Meaning Cells (LMC) | 71.00 |
| Candidate Classification | 98.39 |
| MS_BERT (our approach) | 99.13 |
Note: Our model achieves state-of-the-art accuracy with the MS_BERT pretrained model.