Ying Xiong1, Shuai Chen1, Haoming Qin1, He Cao1, Yedan Shen1, Xiaolong Wang1, Qingcai Chen1,2, Jun Yan3, Buzhou Tang4,5.
Abstract
BACKGROUND: Semantic textual similarity (STS) is a fundamental natural language processing (NLP) task that is widely used in NLP applications such as question answering (QA) and information retrieval (IR). It is typically framed as a regression problem, and almost all STS systems use either distributed representations or one-hot representations to model sentence pairs.
Keywords: Clinical semantic textual similarity; Distributed representation; Gated network; One-hot representation
Year: 2020 PMID: 32349764 PMCID: PMC7191689 DOI: 10.1186/s12911-020-1045-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Distribution of fractional similarity intervals in the training, development, and test sets
Annotated examples
| Score | Example |
|---|---|
| 0 | |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
Fig. 2 Overall architecture of our system, which fuses distributed and one-hot representations through a gated network
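This record does not reproduce the equations behind the gated network, so the following is only a minimal sketch of one common gating formulation, not the authors' implementation. It assumes both representations are first projected to a shared hidden size and then mixed element-wise by a sigmoid gate. The function `gated_fusion`, the parameter names `W_d`, `W_o`, `W_g`, `b_g`, and all dimensions are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(dist_vec, onehot_vec, params):
    """Fuse a distributed and a one-hot sentence-pair representation.

    Assumed formulation (not taken from the paper):
        d = tanh(W_d @ dist_vec)
        o = tanh(W_o @ onehot_vec)
        g = sigmoid(W_g @ [d; o] + b_g)
        fused = g * d + (1 - g) * o
    """
    # Project both inputs into the same hidden space.
    d = np.tanh(params["W_d"] @ dist_vec)
    o = np.tanh(params["W_o"] @ onehot_vec)
    # The gate decides, per dimension, how much of each source to keep.
    gate = sigmoid(params["W_g"] @ np.concatenate([d, o]) + params["b_g"])
    return gate * d + (1.0 - gate) * o


# Toy example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
hidden, dist_dim, onehot_dim = 8, 16, 32
params = {
    "W_d": rng.normal(size=(hidden, dist_dim)),
    "W_o": rng.normal(size=(hidden, onehot_dim)),
    "W_g": rng.normal(size=(hidden, 2 * hidden)),
    "b_g": np.zeros(hidden),
}
dist_vec = rng.normal(size=dist_dim)                         # e.g. an encoder output
onehot_vec = (rng.random(onehot_dim) < 0.2).astype(float)    # sparse indicator features
fused = gated_fusion(dist_vec, onehot_vec, params)
```

In a full system the fused vector would feed a regression layer that predicts the similarity score; simple concatenation, by contrast, would pass `np.concatenate([d, o])` forward without the gate.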
Performance of systems on the clinical STS corpus of the BioCreative/OHNLP shared task in 2018
| Method | Score interval [0, 1] | [1, 2] | [2, 3] | [3, 4] | [4, 5] | Overall |
|---|---|---|---|---|---|---|
| Baseline | | | | | | |
| One-hot | 0.5567 | 0.2311 | 0.0998 | 0.2409 | 0.1167 | 0.7939 |
| CNN | 0.3960 | −0.0850 | −0.0090 | 0.0370 | −0.0654 | 0.4444 |
| LSTM | 0.3920 | −0.2945 | 0.2088 | −0.0538 | −0.0303 | 0.4275 |
| BERT | 0.7613 | 0.1206 | 0.2635 | 0.2530 | 0.1210 | 0.8461 |
| Concatenation | | | | | | |
| CNN + one-hot | 0.5406 | 0.6917 | 0.1352 | 0.2539 | 0.0744 | 0.8083 |
| LSTM + one-hot | 0.5850 | 0.3415 | 0.2269 | 0.2173 | 0.2155 | 0.8030 |
| BERT + one-hot | 0.6684 | 0.3038 | 0.2309 | 0.2425 | 0.2203 | 0.8525 |
| Fusion (gated network) | | | | | | |
| CNN + one-hot | 0.6973 | 0.2324 | 0.1675 | 0.2336 | 0.0864 | 0.8442 |
| LSTM + one-hot | 0.6253 | 0.3583 | 0.1869 | 0.2550 | 0.1018 | 0.8379 |
| BERT + one-hot | 0.6872 | 0.1605 | 0.3238 | 0.2822 | 0.1666 | |
Fig. 3 Mean squared error (MSE) on fractional similarity intervals
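A per-interval MSE of the kind plotted in this figure could be computed roughly as follows. This is a sketch under stated assumptions: sentence pairs are grouped by their gold score, and because the intervals share endpoints, a boundary score is assigned to the higher interval except for 5, which stays in [4, 5]. The record does not say how the authors handled boundaries, and the function name `mse_by_interval` is hypothetical.

```python
import numpy as np


def mse_by_interval(gold, pred, edges=(0, 1, 2, 3, 4, 5)):
    """Return {(low, high): MSE} with pairs grouped by gold score.

    Intervals are half-open [low, high), except the last one, which
    also includes its upper edge (an assumption, see the lead-in).
    Intervals with no pairs map to NaN.
    """
    gold = np.asarray(gold, dtype=float)
    pred = np.asarray(pred, dtype=float)
    results = {}
    n_intervals = len(edges) - 1
    for i in range(n_intervals):
        low, high = edges[i], edges[i + 1]
        if i == n_intervals - 1:
            mask = (gold >= low) & (gold <= high)
        else:
            mask = (gold >= low) & (gold < high)
        if mask.any():
            results[(low, high)] = float(np.mean((gold[mask] - pred[mask]) ** 2))
        else:
            results[(low, high)] = float("nan")
    return results


# Made-up scores for illustration only, not data from the paper.
example = mse_by_interval(gold=[0.5, 1.5, 1.5, 5.0], pred=[1.0, 1.5, 2.5, 4.0])
```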