Yang An, Jianlin Wang, Liang Zhang, Hanyu Zhao, Zhan Gao, Haitao Huang, Zhenguang Du, Zengtao Jiao, Jun Yan, Xiaopeng Wei, Bo Jin.
Abstract
BACKGROUND: Knowledge discovery from breast cancer treatment records has promoted downstream clinical studies such as careflow mining and therapy analysis. However, the clinical treatment text in electronic health data may be recorded by different doctors following their own hospital's guidelines, which leaves the final data rich in author- and domain-specific idiosyncrasies. Breast cancer treatment entity normalization is therefore an essential task for these downstream clinical studies. Recent studies have demonstrated the superiority of deep learning methods in named entity normalization tasks. However, most existing approaches adopt pipeline implementations that treat normalization as an independent step after named entity recognition, which can propagate errors to later tasks. In addition, despite its importance in clinical and translational research, few studies deal directly with the normalization task in Chinese clinical text, due to the complexity of composition forms.
Keywords: Breast cancer; Cascade learning; Chinese clinical text mining; Treatment entity normalization
Year: 2020 PMID: 32859189 PMCID: PMC7456389 DOI: 10.1186/s12911-020-01216-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Illustration of clinical text, normalization examples and possible applications. a EHR data; b Clinical text from EHRs: an example of real clinical text and its translated version; c Real-world data and standard entity; d Applied scenarios
An illustrative example of treatment entity normalization (TEN) in a clinical text
| Clinical text | Real text: |
| | Translated text: After admission, the relevant examination was improved, without chemotherapy contraindications, and ... reported nausea, vomiting and dysphagia, and provided Dolasetron mesylate injection. After chemotherapy, there was no obvious side effect, and life sign was stable; |
| Characters sequence | ... | E | C | T | H | ... | |||||
| Standard entity | ... | O | EC-TH | EC-TH | EC-TH | EC-TH | EC-TH | EC-TH | O | O | ... |
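The table above frames TEN as character-level labeling: every character inside a treatment mention carries the standard entity name, and every other character carries `O`. A minimal sketch of that labeling scheme follows. The function name `label_characters`, the span format, and the example string `"EC->TH"` are hypothetical choices for illustration; the real clinical text is in Chinese and its exact surface form is not shown on this page.

```python
def label_characters(text, mentions):
    """Assign one standard-entity label per character.

    mentions: list of (start, end, standard_entity) tuples, end exclusive.
    Characters outside every mention get the label "O".
    """
    labels = ["O"] * len(text)
    for start, end, standard in mentions:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        for i in range(start, end):
            labels[i] = standard
    return labels


# Hypothetical surface form of the regimen (an assumption, not the source text).
text = "got EC->TH ok"
surface = "EC->TH"
start = text.index(surface)
labels = label_characters(text, [(start, start + len(surface), "EC-TH")])

for ch, lab in zip(text, labels):
    print(repr(ch), lab)
```

Each of the six characters of the hypothetical mention receives the label `EC-TH`, mirroring the six `EC-TH` cells in the "Standard entity" row.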
Fig. 2 Main architecture of the PASCAL model. PASCAL consists of four modules: character embedding module, encoder module (containing a gated convolutional neural network to learn the shared representation with temporal relationship), pseudo cascade structure module (including the enhanced primary task TEN and an auxiliary task TER)
Fig. 3 Detailed structure of the encoder module: gated convolutional neural network (GCNN). GCNN consists of three key parts: convolutional block, gating block and residual connection
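The three parts named in the caption (convolution, gating, residual connection) can be sketched in plain Python as follows. This is a minimal illustration, not the PASCAL implementation: the sigmoid gate, the zero padding, the point at which the residual is added, and all names (`conv1d_same`, `gated_conv_block`, `w_conv`, `w_gate`) are assumptions made for the example.

```python
import math


def conv1d_same(seq, kernels, bias):
    """1-D convolution over time with zero padding (output length == input length).

    seq:     list of T vectors, each of size d_in
    kernels: kernels[k][i][o] with k = kernel offset, i = input dim, o = output dim
    bias:    list of d_out floats
    """
    width = len(kernels)
    pad = width // 2
    d_in, d_out = len(seq[0]), len(bias)
    out = []
    for t in range(len(seq)):
        vec = list(bias)
        for k in range(width):
            src = t + k - pad
            if 0 <= src < len(seq):  # positions outside the sequence act as zeros
                for i in range(d_in):
                    for o in range(d_out):
                        vec[o] += seq[src][i] * kernels[k][i][o]
        out.append(vec)
    return out


def sigmoid(x):
    # Numerically stable for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_conv_block(seq, w_conv, b_conv, w_gate, b_gate):
    """Convolutional block, gating block, then residual connection.

    Assumes d_out == d_in so the input can be added back to the output.
    """
    content = conv1d_same(seq, w_conv, b_conv)   # convolutional block
    gate = conv1d_same(seq, w_gate, b_gate)      # gating block (pre-activation)
    return [
        [x + c * sigmoid(g) for x, c, g in zip(x_t, c_t, g_t)]  # residual add
        for x_t, c_t, g_t in zip(seq, content, gate)
    ]


# Toy check: identity content kernel and an all-zero gate kernel.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [zero, [[1.0, 0.0], [0.0, 1.0]], zero]
zeros = [zero, zero, zero]
out = gated_conv_block(seq, identity, [0.0, 0.0], zeros, [0.0, 0.0])
print(out)
```

In the toy check the gate pre-activation is zero everywhere, so the gate lets half of the convolved signal through and the residual adds the input back, giving 1.5 times the input at every position.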
Performance comparison on a real-world breast cancer dataset
| Model | Precision | Recall | F1 |
|---|---|---|---|
| Softmax | | | |
| Bi-LSTM | 0.8171±0.0143 | 0.8796±0.0264 | 0.8472±0.0221 |
| Bi-OnLSTM | 0.8316±0.0205 | 0.8978±0.0139 | 0.8635±0.0152 |
| TCN | 0.7135±0.0129 | 0.8218±0.0231 | 0.7638±0.0245 |
| GCNN | 0.8817±0.0117 | 0.9016±0.0210 | 0.8921±0.0124 |
| CRF | | | |
| Bi-LSTM | 0.8682±0.0125 | 0.8905±0.0238 | 0.8792±0.0201 |
| Bi-OnLSTM | 0.8678±0.0187 | 0.8952±0.0145 | 0.8813±0.0168 |
| TCN | 0.8486±0.0089 | 0.9076±0.0214 | 0.8771±0.0179 |
| GCNN | 0.9628±0.0181 | 0.9535±0.0094 | |
| PASCAL (Softmax + CRF) | | | |
| Bi-LSTM | 0.8931±0.0153 | 0.9121±0.0183 | 0.9025±0.0168 |
| Bi-OnLSTM | 0.9078±0.0149 | 0.9348±0.0156 | 0.9211±0.0175 |
| TCN | 0.8744±0.0102 | 0.9342±0.0192 | 0.9033±0.0149 |
| GCNN | 0.9413±0.0156 | | |
Fig. 4 Accuracy comparison between PASCAL and Feedback [17]
Fig. 5 Computational efficiency comparison of PASCAL with different encoders
Performance comparison with regard to different bias values
| Bias value | Precision | Recall | F1 |
|---|---|---|---|
| 0.5 | 0.9390 | 0.9769 | 0.9576 |
| 0.6 | 0.9347 | 0.9773 | 0.9555 |
| 0.7 | 0.9413 | 0.9770 | 0.9589 |
| 0.8 | 0.9402 | 0.9779 | 0.9587 |
| 0.9 | | | |
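The F1 column of the bias table can be cross-checked from the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the page itself does not state the definition, and the function name `f1_score` is chosen here only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (bias value, precision, recall, reported F1) copied from the complete rows of the table
rows = [
    (0.5, 0.9390, 0.9769, 0.9576),
    (0.6, 0.9347, 0.9773, 0.9555),
    (0.7, 0.9413, 0.9770, 0.9589),
    (0.8, 0.9402, 0.9779, 0.9587),
]

for bias, p, r, reported in rows:
    print(f"bias={bias}: recomputed F1={f1_score(p, r):.4f}, reported={reported:.4f}")
```

Under that assumption the recomputed values agree with the reported column to within about 1e-4, which is the precision expected when the inputs are themselves rounded to four decimals.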
Error cases in breast cancer treatment normalization
Notes: The treatments are extracted from the clinical context that describes the patient's treatment process. Treatments in red indicate error cases in both the name and the position of the treatment