Xuan Gu, Zhengya Sun, Wensheng Zhang.
Abstract
BACKGROUND: Symptom phrase recognition is essential for making better use of unstructured medical consultation corpora when developing automated question answering systems. Most previous approaches require sufficient manually annotated training data or a symptom dictionary that is as complete as possible. When applied to real scenarios, however, they face a dilemma: annotated text resources are scarce, and spoken-language expressions are diverse.
Keywords: Composition driven; Medical consultation; Named entity recognition; Symptom phrase recognition
Year: 2021 PMID: 34961490 PMCID: PMC8714445 DOI: 10.1186/s12911-021-01716-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The architecture of the medical QA systems
Examples of extraction results obtained by word matching
Fig. 2 The workflow of ComD
Some basic composition forms of Chinese symptom phrases
Fig. 3 Description of the skip-gram model, the model used in Word2Vec to learn the word representation that best predicts the surrounding context of a target word. Consider a standard symptom phrase from the symptom dictionary (ENG: “supraclavicular lymph nodes were not palpable and enlarged”). The example highlights the context window around (ENG: “lymph node”); lymph nodes are organs that produce immune cells for fighting infections. The target word (ENG: “lymph node”) is paired with each of its neighboring words, and the pairs are fed into the network. Training maximizes the probability of predicting the context words of (ENG: “lymph node”).
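The pairing step described in the Fig. 3 caption can be sketched as follows. This is a minimal illustration of how skip-gram builds (target, context) training pairs from a context window; the function name `skipgram_pairs` and the English tokenization of the dictionary phrase are hypothetical, since the original Chinese tokens are not reproduced here.

```python
def skipgram_pairs(tokens: list[str], window: int) -> list[tuple[str, str]]:
    """Pair every target token with each neighbor at most `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        # Clip the context window at the start and end of the phrase.
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a target is never its own context word
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical tokenization of the English gloss of the dictionary phrase.
phrase = ["supraclavicular", "lymph node", "not", "palpable", "enlarged"]
pairs = skipgram_pairs(phrase, window=1)
print([p for p in pairs if p[0] == "lymph node"])
# [('lymph node', 'supraclavicular'), ('lymph node', 'not')]
```

In practice a library such as gensim (`gensim.models.Word2Vec` with `sg=1` and a `window` argument) generates these pairs internally and trains the network that scores them; the sketch only makes the windowing explicit.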
An example of interaction scores between the internals and the boundaries
Ablation study results on curated data from MedConSult
| Parameter | Method | Macro Precision/% | Macro Recall/% | Macro F1/% | Micro Precision/% | Micro Recall/% | Micro F1/% |
|---|---|---|---|---|---|---|---|
| 0.6 | ComD | 77.82 | 78.17 | 77.99 | 77.62 | 78.17 | 77.89 |
| | ComD-NoInt | 61.62 | 61.97 | 61.80 | 61.54 | 61.97 | 61.75 |
| | ComD-NoPos | 16.20 | 16.20 | 16.20 | 16.20 | 16.20 | 16.20 |
| 0.7 | ComD | 75.00 | 75.35 | 75.18 | 74.83 | 75.35 | 75.09 |
| | ComD-NoInt | 51.06 | 51.41 | 51.23 | 51.05 | 51.41 | 51.23 |
| | ComD-NoPos | 9.15 | 9.15 | 9.15 | 9.15 | 9.15 | 9.15 |
| 0.8 | ComD | 61.62 | 61.97 | 61.80 | 61.54 | 61.97 | 61.75 |
| | ComD-NoInt | 26.41 | 26.76 | 26.58 | 26.57 | 26.76 | 26.67 |
| | ComD-NoPos | 5.63 | 5.63 | 5.63 | 5.63 | 5.63 | 5.63 |
| 1.0 | ComD | 60.92 | 61.27 | 61.09 | 60.84 | 61.27 | 61.05 |
| | ComD-NoInt | 15.14 | 15.49 | 15.31 | 15.38 | 15.49 | 15.44 |
| | ComD-NoPos | 2.11 | 2.11 | 2.11 | 2.11 | 2.11 | 2.11 |
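The ablation results are reported as precision, recall, and F1 under macro- and micro-averaging. The F1 columns are consistent with the harmonic mean of the neighboring precision and recall, e.g. 2 × 77.82 × 78.17 / (77.82 + 78.17) ≈ 77.99 for ComD at 0.6. The sketch below assumes the usual definitions: micro-averaging pools match counts across groups before scoring, while macro-averaging averages per-group precision and recall and takes their harmonic mean. The grouping unit (for example, one consultation) is not stated in this section, and `Counts`, `prf`, `micro_scores`, and `macro_scores` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-group match counts (a group might be one consultation)."""
    tp: int  # predicted phrases that match a gold phrase
    fp: int  # predicted phrases with no matching gold phrase
    fn: int  # gold phrases that no prediction matched


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def prf(c: Counts) -> tuple[float, float, float]:
    """Precision, recall, and F1 for a single set of counts."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return p, r, f1(p, r)


def micro_scores(groups: list[Counts]) -> tuple[float, float, float]:
    """Pool the counts of all groups, then score once."""
    pooled = Counts(
        tp=sum(g.tp for g in groups),
        fp=sum(g.fp for g in groups),
        fn=sum(g.fn for g in groups),
    )
    return prf(pooled)


def macro_scores(groups: list[Counts]) -> tuple[float, float, float]:
    """Average per-group precision and recall; F1 is the harmonic mean of the averages."""
    per_group = [prf(g) for g in groups]
    p = sum(s[0] for s in per_group) / len(per_group)
    r = sum(s[1] for s in per_group) / len(per_group)
    return p, r, f1(p, r)


print(round(f1(77.82, 78.17), 2))  # 77.99, the ComD macro F1 at 0.6
```

Note that some toolkits define macro F1 instead as the mean of per-group F1 scores, which generally gives a different number; the harmonic-mean-of-averages variant is used here only because it reproduces the table's arithmetic.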
Performance comparison of the proposed and baseline methods on MedConSult
| Parameter | Method | Macro Precision/% | Macro Recall/% | Macro F1/% | Micro Precision/% | Micro Recall/% | Micro F1/% |
|---|---|---|---|---|---|---|---|
| 0.6 | ComD | n/a | 41.02 | n/a | n/a | 39.01 | n/a |
| | ComD-Character | 36.23 | 33.37 | 34.74 | 34.13 | 13.62 | 19.47 |
| | BERT-CRF | 16.57 | n/a | 24.52 | 15.32 | n/a | 23.13 |
| | BiLSTM-CRF | 36.50 | 29.58 | 32.68 | 34.01 | 8.43 | 13.51 |
| | BDMM-based | 8.23 | 12.15 | 9.81 | 5.92 | 8.58 | 7.00 |
| | Dictionary-based | 13.52 | 7.72 | 9.83 | 13.76 | 7.72 | 9.89 |
| 0.7 | ComD | n/a | 33.53 | n/a | n/a | 31.11 | n/a |
| | ComD-Character | 28.36 | 25.76 | 27.00 | 26.30 | 10.49 | 15.00 |
| | BERT-CRF | 13.05 | n/a | 19.38 | 12.03 | n/a | 18.21 |
| | BiLSTM-CRF | 32.00 | 25.80 | 28.57 | 26.18 | 7.51 | 11.67 |
| | BDMM-based | 7.14 | 10.46 | 8.49 | 5.10 | 7.37 | 6.02 |
| | Dictionary-based | 9.51 | 5.69 | 7.12 | 9.48 | 5.38 | 6.86 |
| 0.8 | ComD | n/a | 27.45 | n/a | n/a | 25.23 | n/a |
| | ComD-Character | 21.81 | 19.44 | 20.55 | 19.78 | 7.89 | 11.28 |
| | BERT-CRF | 9.48 | n/a | 14.10 | 8.82 | n/a | 13.35 |
| | BiLSTM-CRF | 19.25 | 14.94 | 16.82 | 14.99 | 3.61 | 5.82 |
| | BDMM-based | 6.17 | 8.77 | 7.25 | 4.22 | 6.15 | 5.00 |
| | Dictionary-based | 8.65 | 5.14 | 6.44 | 8.72 | 4.94 | 6.31 |
| 1.0 | ComD | n/a | 25.05 | n/a | n/a | 23.37 | n/a |
| | ComD-Character | 19.42 | 17.26 | 18.28 | 17.61 | 7.03 | 10.04 |
| | BERT-CRF | 9.11 | n/a | 13.57 | 8.53 | n/a | 12.91 |
| | BiLSTM-CRF | 15.00 | 12.52 | 13.65 | 12.59 | 3.03 | 4.89 |
| | BDMM-based | 5.94 | 8.54 | 7.01 | 4.10 | 5.98 | 4.86 |
| | Dictionary-based | 8.65 | 5.13 | 6.44 | 8.72 | 4.94 | 6.31 |
The best result for each parameter/model/characteristic was shown in bold font; those values are not available in this copy and are marked n/a.
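The BDMM-based baseline is not described in this section. BDMM most likely refers to bidirectional maximum matching, a common dictionary-driven heuristic for Chinese word segmentation, so the sketch below shows a generic version of that heuristic under this assumption: run forward and backward longest-match segmentation, prefer the result with fewer words, then the one with fewer single-character words, and otherwise fall back to the backward result. The function names and the toy vocabulary are hypothetical; this is not the authors' baseline implementation. The example string 研究生命起源 (ENG: "study the origin of life") is a textbook segmentation ambiguity.

```python
def forward_mm(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Greedy left-to-right longest match; unknown characters become single tokens."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Greedy right-to-left longest match; unknown characters become single tokens."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # words were collected right to left


def single_chars(words: list[str]) -> int:
    """Number of one-character tokens in a segmentation."""
    return sum(len(w) == 1 for w in words)


def bidirectional_mm(text: str, vocab: set[str]) -> list[str]:
    """Pick between forward and backward results with the usual tie-breaking rules."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_mm(text, vocab, max_len)
    bwd = backward_mm(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    return fwd if single_chars(fwd) < single_chars(bwd) else bwd


vocab = {"研究", "研究生", "生命", "起源"}
print(bidirectional_mm("研究生命起源", vocab))  # ['研究', '生命', '起源']
```

The forward pass alone yields 研究生 / 命 / 起源 (ENG: "graduate student / life / origin"), which illustrates why pure word matching struggles with the varied wording of spoken consultations.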
Fig. 4 Performance comparison between JSD-based and other distance measures
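Fig. 4 compares a JSD-based measure with other distance measures. Assuming JSD denotes the Jensen-Shannon divergence, as is standard, the sketch below shows only its textbook definition, JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M) with M = ½ (P + Q). Which distributions ComD actually compares is not described in this section. Base-2 logarithms are used so the value lies in [0, 1]; the function names are hypothetical.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    if len(p) != len(q):
        raise ValueError("distributions must be defined over the same outcomes")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so neither KL term divides by zero.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0, disjoint supports
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0, identical distributions
```

Unlike KL divergence, JSD is symmetric and stays finite when one distribution assigns zero probability where the other does not. SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the Jensen-Shannon distance), so results must be squared to compare with the sketch.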
Fig. 5 Sensitivity comparison
Typical Case