| Literature DB >> 31881967 |
Xue Shi1, Dehuan Jiang1, Yuanhang Huang1, Xiaolong Wang1, Qingcai Chen1, Jun Yan2, Buzhou Tang3.
Abstract
BACKGROUND: Family history (FH) information, including family members, the side of family of family members (i.e., maternal or paternal), the living status of family members, and the observations (diseases) of family members, is very important in the decision-making process of disorder diagnosis and treatment. However, FH information cannot be used directly by computers, as it is always embedded in unstructured text in electronic health records (EHRs). Extracting FH information from clinical text therefore requires natural language processing (NLP). The BioCreative/OHNLP2018 challenge includes a task on FH extraction (task 1) with two subtasks: (1) entity identification, which identifies family members and their observations (diseases) mentioned in clinical text; and (2) family history extraction, which extracts the side of family, living status, and observations of family members. For this task, we propose a system based on deep joint learning methods to extract FH information. Our system achieves the highest F1-scores of 0.8901 on subtask 1 and 0.6359 on subtask 2, respectively.
Keywords: Deep joint learning; Entity identification; Family history extraction; Family history information
Year: 2019 PMID: 31881967 PMCID: PMC6933634 DOI: 10.1186/s12911-019-0995-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1Overview architecture of our deep joint learning model
Rules used to determine the living status (LS) score of a family member (FM)
| Alive | Healthy | LS score |
|---|---|---|
| No | * | 0 |
| Yes | NA | 2 |
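The scoring rules above can be read as a simple lookup: a deceased family member scores 0 regardless of health status, and a living member whose health status is unavailable scores 2. A minimal sketch, encoding only the two rules shown in the table (the function name `ls_score` and the handling of unlisted cases are assumptions, not taken from the paper):

```python
def ls_score(alive: str, healthy: str) -> int:
    """Score the living status (LS) of a family member (FM).

    Implements only the two rules listed in the table above; any
    Alive/Healthy combination not covered there raises an error
    rather than guessing.
    """
    if alive == "No":
        # Deceased: score 0, whatever the Healthy field says ("*")
        return 0
    if alive == "Yes" and healthy == "NA":
        # Alive, health status unavailable
        return 2
    raise ValueError(f"combination not listed in the table: {alive!r}/{healthy!r}")
```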
Hyperparameters used in our experiments
| Hyperparameters | Value |
|---|---|
| Dimension of word embeddings | 50 |
| Dimension of POS embeddings | 20 |
| Dimension of label embeddings | 10 |
| Number of LSTM hidden states | 100 |
| Optimizer | SGD |
| Learning rate | 0.005 |
| Dropout rate in entity recognition | 0.5 |
| Dropout rate in relation extraction | 0.3 |
| Epoch number | 20/25 |
| Combination coefficient (α) | 0.4/0.5/0.6 |
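The combination coefficient (α) is tuned over 0.4/0.5/0.6. A common way such a coefficient is used in joint learning is to weight the two subtask losses during training; the sketch below illustrates that idea only, assuming a convex combination (the exact formulation in the paper may differ, and `joint_loss` is a hypothetical name):

```python
def joint_loss(entity_loss: float, relation_loss: float, alpha: float = 0.5) -> float:
    """Weighted combination of the entity-recognition and
    relation-extraction losses, balanced by alpha in [0, 1].

    Assumption: a convex combination alpha * L_entity + (1 - alpha) * L_relation;
    this is an illustrative sketch, not the paper's stated objective.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * entity_loss + (1.0 - alpha) * relation_loss
```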
Performance of the pipeline method and the joint method
| Subtask | Method | P (3 types) | R (3 types) | F1 (3 types) | P (5 types) | R (5 types) | F1 (5 types) |
|---|---|---|---|---|---|---|---|
| FM information extraction | Pipeline | 0.8566 | 0.8825 | | 0.8457 | 0.8805 | |
| | Joint | 0.9030 | 0.9058 | | | | |
| Relation extraction | Pipeline | 0.5556 | 0.5773 | 0.5662 | 0.5976 | 0.6247 | 0.6109 |
| | Joint | | | | | | |
All highest values are highlighted in bold
Effect of the combination coefficient (α) on the deep joint learning method (F1-score)
| Combination coefficient (α) | FM extraction: validation, 3 types | Validation, 5 types | Test, 3 types | Test, 5 types | Relation extraction: validation, 3 types | Validation, 5 types | Test, 3 types | Test, 5 types |
|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.8743 | 0.8693 | 0.8825 | 0.8828 | 0.5580 | | 0.4484 | |
| 0.5 | 0.8753 | 0.8718 | 0.8852 | 0.8883 | 0.6316 | 0.6897 | 0.4534 | 0.5372 |
| 0.6 | 0.8747 | | 0.8839 | | 0.5543 | 0.6769 | 0.4356 | 0.5132 |
All highest values are highlighted in bold
Performance of the deep joint learning method on each type of FM information and relation
| Subtask | Type | P | R | F1 |
|---|---|---|---|---|
| FM information recognition | FM (Maternal) | 0.9412 | 0.9552 | 0.9481 |
| | FM (Paternal) | 0.9286 | 0.7800 | 0.8478 |
| | FM (NA) | 0.8452 | 0.8875 | 0.8659 |
| | Observation | 0.8753 | 0.9146 | 0.8945 |
| | LSᵃ | 0.8418 | 0.9116 | 0.8753 |
| | Overall | 0.8775 | 0.9030 | 0.8901 |
| Relation extraction | FM–LS | 0.6084 | 0.6273 | 0.6177 |
| | FM–Observation | 0.6451 | 0.6451 | 0.6451 |
| | Overall | 0.6327 | 0.6392 | 0.6359 |
ᵃThe results are obtained using the gold LS mentions, not the gold-standard LS values used in the final evaluation, which were not provided. The overall performance on FM information recognition therefore does not cover LS