Dongfang Xu, Manoj Gopale, Jiacheng Zhang, Kris Brown, Edmon Begoli, Steven Bethard.
Abstract
OBJECTIVE: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks, including relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization.
Keywords: concept normalization; deep learning; generate-and-rank; natural language processing; unified medical language system
Year: 2020 PMID: 32719838 PMCID: PMC7566510 DOI: 10.1093/jamia/ocaa080
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Architecture of our proposed framework for concept normalization. The edges out of a search process indicate the number of matches necessary to follow the edge. Outlined nodes are terminal states that represent the predictions of the model. Candidate generator (a–e). Candidate reranker (f). BERT: Bidirectional Encoder Representations from Transformers; CUI: concept unique identifier; UMLS: Unified Medical Language System.
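The caption's routing rule, where the number of matches returned by a search process decides which edge to follow, can be sketched roughly as below. The thresholds used here (no match falls through to the next search, exactly one match is a terminal prediction, several matches go to the reranker) are an assumption, since the caption does not spell them out. `searchers`, `rerank`, the toy lookup tables, and the placeholder concept IDs are all hypothetical stand-ins, not the authors' Lucene indexes, BERT model, or real CUIs.

```python
from typing import Callable, List, Optional, Tuple

Searcher = Callable[[str], List[str]]             # mention -> candidate concept IDs
Reranker = Callable[[str, List[str]], List[str]]  # mention, candidates -> sorted candidates


def normalize(mention: str,
              searchers: List[Tuple[str, Searcher]],
              rerank: Reranker,
              max_candidates: int = 30) -> Optional[str]:
    """Route a mention through an ordered list of searches (assumed thresholds)."""
    for _name, search in searchers:
        matches = search(mention)[:max_candidates]
        if len(matches) == 0:
            continue                        # assumed edge: no match, try the next search
        if len(matches) == 1:
            return matches[0]               # assumed edge: a single match is terminal
        return rerank(mention, matches)[0]  # assumed edge: several matches get reranked
    return None                             # nothing matched in any search


# Toy stand-ins: an exact-match table, then a looser lowercase table.
exact = {"fever": ["C_FEVER"]}
loose = {"right calf": ["C_CALF_STRUCTURE", "C_RIGHT_CALF"]}
searchers = [
    ("exact", lambda m: exact.get(m, [])),
    ("loose", lambda m: loose.get(m.lower(), [])),
]
# Hypothetical reranker: prefer placeholder IDs that mention laterality.
prefer_right = lambda m, cands: sorted(cands, key=lambda c: "RIGHT" not in c)
```

With these toy tables, `normalize("fever", searchers, prefer_right)` stops at the first search, while `normalize("Right calf", searchers, prefer_right)` falls through to the second search and is decided by the reranker.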
Dataset statistics of the medical concept normalization corpus
| Statistics | Train | Dev | Test |
|---|---|---|---|
| # of documents | 40 | 10 | 50 |
| # of mentions | 5334 | 1350 | 6925 |
| # of unique concepts | 1981 | 755 | 2579 |
| Unseen mentions (%) | — | 53.48 | 50.76 |
| Unseen concepts (%) | — | 32.37 | 29.85 |
|  | 2.32 | 2.00 | 3.13 |
|  | 2.69 | 1.79 | 2.69 |
|  | 3.30 | 1.04 | 2.77 |
The # of unseen mentions for dev indicates the # of mentions that do not appear in the training set but do appear in the dev set. The # of unseen mentions for test indicates the # of mentions that do not appear in the training or dev set but do appear in the test set. The # of unseen concepts for dev indicates the # of mentions whose normalized concepts do not appear in the training set but do appear in the dev set. The # of unseen concepts for test indicates the # of mentions whose normalized concepts do not appear in the training or dev set but do appear in the test set. The # of CUI-less indicates the # of mentions that could not be mapped to any concept in the ontology. The # of ambiguous mentions indicates the # of mentions that could be mapped to more than 1 concept in the dataset.
CUI: concept unique identifier.
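As a minimal sketch of the unseen-mention and unseen-concept definitions above, the function below computes both as percentages of the held-out mentions. It assumes each annotation is a `(mention_text, concept_id)` pair and that "appears in the training set" means an exact string match; the function name, the matching rule, and the placeholder concept IDs are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Annotation = Tuple[str, str]  # (mention text, concept ID)


def unseen_rates(seen: List[Annotation], heldout: List[Annotation]) -> Tuple[float, float]:
    """Return (% unseen mentions, % mentions with unseen concepts) in `heldout`."""
    seen_mentions = {mention for mention, _ in seen}
    seen_concepts = {concept for _, concept in seen}
    n = len(heldout)
    unseen_mentions = sum(1 for mention, _ in heldout if mention not in seen_mentions)
    unseen_concepts = sum(1 for _, concept in heldout if concept not in seen_concepts)
    return 100.0 * unseen_mentions / n, 100.0 * unseen_concepts / n


# Toy data with placeholder concept IDs (not real CUIs).
train = [("heart attack", "C1"), ("fever", "C2")]
dev = [("heart attack", "C1"), ("MI", "C1"), ("cough", "C3"), ("fever", "C2")]
rates = unseen_rates(train, dev)
```

For the test split, `seen` would be the concatenation of the training and dev annotations, following the definition in the footnote.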
Figure 2. Outputs of Lucene(b-e) in the candidate generator are fed as inputs into the candidate ranker (f). BERT: Bidirectional Encoder Representations from Transformers.
Accuracy of different systems on the dev and test sets
| Systems | Dev Recall@30 (%) | Dev Overall Accuracy (%) | Dev Seen Accuracy (%) | Dev Unseen Accuracy (%) | Test Recall@30 (%) | Test Overall Accuracy (%) | Test Seen Accuracy (%) | Test Unseen Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
|
| — | — | — | 76.35 | — | — | |||
|
| Submitted run #1 | — | — | — | — | — | — | — | |
| Submitted run #2 | — | — | — | — | — | — | — | ||
| Submitted run #3 | — | — | — | — | — | — | — | ||
| Submitted run #3 (after fixing row-alignment bug) | — | — | — | — | — | — | — | ||
|
| Lucene(a + b) | 52.30 | 52.07 | 77.00 | 0 | 55.08 | 54.66 | 77.91 | 0 |
| Lucene(c) | 32.59 | 32.59 | 36.36 | 24.71 | 34.56 | 34.35 | 38.43 | 24.77 | |
| Lucene(d) | 62.22 | 53.48 | 58.49 | 43.02 | 62.86 | 53.26 | 57.20 | 43.98 | |
| Lucene(e) | 86.15 | 58.07 | 58.93 | 56.29 | 85.73 | 57.50 | 57.76 | 56.89 | |
| Lucene(c + d) | 58.74 | 55.56 | 60.35 | 45.54 | 60.06 | 56.43 | 60.95 | 45.82 | |
| Lucene(d + e) | 81.78 | 63.04 | 64.95 | 59.04 | 82.20 | 62.11 | 63.15 | 59.65 | |
| Lucene(c + d + e) | 78.30 | 65.11 | 66.81 | 61.56 | 79.21 | 65.29 | 66.90 | 61.49 | |
| Lucene(a + b + e) | 88.15 | 76.67 | 86.64 | 55.84 | 88.38 | 77.43 | 86.23 | 56.75 | |
| Lucene(a + b + c + d + e) | 87.41 | 78.96 | 87.62 | 60.87 | 87.87 | 79.25 | 86.89 | 61.30 | |
|
| Lucene(e) + BERT(f-e) | 86.15 | 78.96 | 84.88 | 66.59 | 85.73 | 77.36 | 83.78 | 62.26 |
| Lucene(a + b + e) + BERT(f-e) | 88.15 | 83.13 | 91.05 | 66.59 | 88.38 | 82.06 | 90.41 | 62.46 | |
| Lucene(a + b + c + d + e) + BERT(f-e) | 87.41 | 83.56 | 91.02 | 67.96 | 87.87 | 82.75 | 90.30 | 64.99 | |
| Lucene(e) + BERT(f-e + ST) | 86.15 | 79.85 | 85.10 | 68.88 | 85.73 | 77.98 | 83.92 | 64.01 | |
| Lucene(c + d + e) + BERT(f-e + ST) | 78.30 | 75.41 | 77.77 | 70.48 | 79.21 | 75.00 | 78.59 | 66.57 | |
| Lucene(a + b + e) + BERT(f-e + ST) | 88.15 | 84.05 | 91.71 | 68.05 | 88.38 | 82.90 | 90.90 | 64.10 | |
| Lucene(a + b + c + d + e) + BERT(f-e + ST) | 87.41 | 84.44ᵃ | 91.57 | 69.57 | 87.87 | 83.56ᵃ | 90.80 | 66.52 | |
Accuracy indicates how often the top prediction is correct (for the Lucene-only rows, we take the first matched concept as the top prediction). Recall@30 indicates how often the correct candidate is within the first 30 matched candidate concepts.
BERT: Bidirectional Encoder Representations from Transformers; ST: semantic type regularizer.
ᵃ The best performance.
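The two metrics defined in the footnote can be computed as in the sketch below. It assumes each mention comes with a ranked list of candidate concept IDs (best first) and a single gold concept; the function name and the placeholder IDs are illustrative, not the authors' evaluation script.

```python
from typing import List, Tuple


def accuracy_and_recall_at_k(ranked: List[List[str]],
                             gold: List[str],
                             k: int = 30) -> Tuple[float, float]:
    """Return (top-1 accuracy %, recall@k %) over parallel lists of predictions and gold IDs."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    n = len(gold)
    # Accuracy: the first candidate is the gold concept.
    top1 = sum(1 for cands, g in zip(ranked, gold) if cands and cands[0] == g)
    # Recall@k: the gold concept is anywhere in the first k candidates.
    in_top_k = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return 100.0 * top1 / n, 100.0 * in_top_k / n


# Toy predictions with placeholder concept IDs; the third mention has no candidates.
ranked = [["A", "B"], ["B", "A"], [], ["C"]]
gold = ["A", "A", "D", "C"]
scores = accuracy_and_recall_at_k(ranked, gold)
```

Recall@30 is an upper bound on what any reranker can reach over the same candidate lists, which is why the table reports it next to accuracy.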
Accuracy for each component of the candidate generator in our best complete system Lucene(a + b + c + d + e) + BERT(f-e + ST) on dev set
| Components | Overall: # of mentions | Overall: Accuracy (%) | Overall: Recall@30 (%) | 1 candidate: # of mentions | 1 candidate: Accuracy (%) | > 1 candidate: # of mentions | > 1 candidate: Accuracy (%) | > 1 candidate: Recall@30 (%) |
|---|---|---|---|---|---|---|---|---|
| Lucene(a + b) | 705 | 97.16 | 97.59 | 681 | 97.65 | 24 | 83.33 | 95.83 |
| Lucene(c) | 165 | 84.42 | 84.42 | 161 | 84.47 | 4 | 75 | 75 |
| Lucene(d) | 164 | 73.78 | 81.10 | 127 | 82.68 | 37 | 43.24 | 75.68 |
| Lucene(e) | 315 | 38.41 | 69.84 | 6 | 16.67 | 309 | 38.83 | 70.87 |
| Lucene(a + b + c + d + e) | 1350 | 78.96 | 87.41 | 976 | 92.93 | 374 | 42.51 | 72.99 |
Accuracy indicates how often the first matched candidate concept is correct. Recall@30 indicates how often the correct candidate is within the first 30 matched candidate concepts. The last two column groups split the mentions by the size of the candidate concept list (1 candidate vs more than 1). # of mentions is the number of mentions predicted by each component.
BERT: Bidirectional Encoder Representations from Transformers; ST: semantic type regularizer.
Accuracies of our proposed architectures and their oracle versions on dev set
| System | Overall: Accuracy (%) | > 1 candidate: # of mentions | > 1 candidate: Accuracy (%) | > 1 candidate: Recall@30 (%) |
|---|---|---|---|---|
| Lucene(a + b + c + d + e) + BERT(f-e + ST) | 84.44 | 374 | 62.23 | 72.99 |
| Lucene(a + b + c + d + e) + BERT(f-e + ST) (Oracle CandGen) | 88.07 | 374 | 75.40 | 100 |
| Lucene(e) + BERT(f-e + ST) | 79.85 | 1327 | 80.11 | 86.51 |
| Lucene(e) + BERT(f-e + ST) (Oracle CandGen) | 89.19 | 1327 | 89.60 | 100 |
Accuracy indicates how often the first matched candidate concept is correct. Recall@30 indicates how often the correct candidate is within the first 30 matched candidate concepts. Oracle CandGen indicates that we artificially inject the correct concept into the candidate generator’s list if it was not already there when the list contains more than 1 candidate.
BERT: Bidirectional Encoder Representations from Transformers; ST: semantic type regularizer.
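The oracle setting described in the footnote can be illustrated with the sketch below, which forces the gold concept into a candidate list before reranking. The footnote does not say where the concept is inserted or what happens when the list is already full, so appending it in the last slot of a 30-item list is an assumption, as are the function name and the placeholder concept IDs.

```python
from typing import List


def inject_gold(candidates: List[str], gold: str, k: int = 30) -> List[str]:
    """Return at most k candidates that are guaranteed to contain `gold`."""
    top_k = list(candidates[:k])
    if gold in top_k:
        return top_k                 # already present, leave the list unchanged
    # Assumed policy: keep the first k - 1 candidates and put gold in the last slot.
    return top_k[:k - 1] + [gold]


# Toy candidate lists with placeholder concept IDs.
unchanged = inject_gold(["A", "B"], "A")
appended = inject_gold(["A", "B"], "C")
replaced_last = inject_gold(["A", "B", "C"], "D", k=2)
```

Because the gold concept is always present after injection, Recall@30 becomes 100 by construction, and the remaining errors can be attributed to the ranker alone.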
Figure 3. Predictions for mention “a right above-knee amputation” and their rankings from the candidate generator (CG), candidate generator + BERT-based ranker + ST (+f-e +ST), and candidate generator + BERT-based ranker (+f-e). BERT: Bidirectional Encoder Representations from Transformers; CUI: concept unique identifier; CG: candidate generator; ST: semantic type regularizer.
Figure 4. Predictions for mention “right calf” and their rankings. CG: candidate generator; CUI: concept unique identifier; MRI: magnetic resonance imaging; ST: semantic type regularizer.