Benedikt F H Becker1, Paul Avillach1,2, Silvana Romio1,3, Erik M van Mulligen1, Daniel Weibel1, Miriam C J M Sturkenboom1,4, Jan A Kors1.
Abstract
BACKGROUND: Assessing drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts to harmonize codes, because different vocabularies are used across countries. In this paper, we present a web application called CodeMapper, which assists in mapping case definitions to codes from different vocabularies while keeping a transparent record of the complete mapping process.
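The "transparent record of the complete mapping process" mentioned in the abstract can be pictured as an append-only history kept alongside the concept set. The sketch below is a minimal illustration only: the `MappingRecord` and `Operation` classes, their fields, and the operation names are hypothetical and are not CodeMapper's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Operation:
    """One revision step (hypothetical fields)."""
    kind: str          # e.g. "identify", "add", "remove"
    concepts: tuple    # concept identifiers affected by this step
    user: str
    timestamp: str


@dataclass
class MappingRecord:
    """A concept set plus an append-only log of how it was built (hypothetical)."""
    concepts: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def _log(self, kind, concepts, user):
        # Entries are only ever appended, never edited, so the full process stays visible.
        self.history.append(Operation(kind, tuple(sorted(concepts)), user,
                                      datetime.now(timezone.utc).isoformat()))

    def add(self, concepts, user, kind="add"):
        new = set(concepts) - self.concepts
        self.concepts |= new
        self._log(kind, new, user)

    def remove(self, concepts, user):
        gone = set(concepts) & self.concepts
        self.concepts -= gone
        self._log("remove", gone, user)
```

For example, `record.add({"C0010200"}, user="reviewer", kind="identify")` followed by a `remove` call leaves two log entries that together explain the final concept set.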
Keywords: UMLS; concept identification; database extraction; multiple medical vocabularies; semantic operations
Year: 2017 PMID: 28657162 PMCID: PMC5575526 DOI: 10.1002/pds.4245
Source DB: PubMed Journal: Pharmacoepidemiol Drug Saf ISSN: 1053-8569 Impact factor: 2.890
Figure 1. Key phases of CodeMapper (top) and the usage of information from the UMLS Metathesaurus, exemplified by the concept for Cough with CUI C0010200 (bottom). Terms from the Metathesaurus drive the automatic identification of concepts in the free‐text case definition. Hierarchical information about concepts in the Metathesaurus is used to retrieve related concepts during revision of the mapping. Information in the Metathesaurus is used to project the selected concepts to codes from the targeted vocabularies.
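The first and last phases in Figure 1 (identifying concepts from terms in free text, then projecting the selected concepts to codes) can be sketched with two small lookup tables. This is a toy illustration under stated assumptions: a greedy longest-match dictionary lookup stands in for concept identification and is not necessarily the method CodeMapper uses; only the CUI C0010200 (Cough) comes from the caption, while `C_PERTUSSIS` and all code values are hypothetical placeholders.

```python
import re

# Toy term index: lowercase term -> concept. Only C0010200 (Cough) is from the caption;
# "C_PERTUSSIS" is a hypothetical placeholder identifier.
TERM_TO_CUI = {
    "cough": "C0010200",
    "persistent cough": "C0010200",
    "whooping cough": "C_PERTUSSIS",
}

# Toy projection table: concept -> {vocabulary: codes}. All codes are placeholders.
CUI_TO_CODES = {
    "C0010200": {"ICD-10": ["ICD10-COUGH"], "ICD-9": ["ICD9-COUGH"]},
    "C_PERTUSSIS": {"ICD-10": ["ICD10-PERTUSSIS"]},
}

MAX_TERM_LEN = max(len(term.split()) for term in TERM_TO_CUI)


def identify_concepts(text):
    """Greedy longest-match lookup of dictionary terms in free text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "whooping cough" beats "cough".
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            cui = TERM_TO_CUI.get(" ".join(tokens[i:i + n]))
            if cui:
                found.append(cui)
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


def project(cuis, vocabularies):
    """Collect the codes of the selected concepts for each target vocabulary."""
    return {
        v: sorted({code for cui in cuis for code in CUI_TO_CODES.get(cui, {}).get(v, [])})
        for v in vocabularies
    }
```

With this toy data, `identify_concepts("Persistent cough or whooping cough")` finds both concepts, and `project` turns them into one code list per vocabulary.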
Figure 2. (a) The second screen of the CodeMapper application provides operations to revise the concepts of a mapping. The mapping is displayed as a table: each cell shows the names of the codes, from the vocabulary of that column, that correspond to the concept of that row. Individual codes are shown when hovering over the terms. The balloons in the last column indicate the number of comments attached to a concept. (b) Example of the concept‐expansion operation: a list of concepts that are more specific than pertussis is displayed for selection and insertion into the mapping.
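The concept‐expansion operation in panel (b) and the row/column layout in panel (a) can be sketched as follows. The hierarchy, identifiers and function names are hypothetical placeholders; the sketch only assumes, as the caption says, that expansion offers more specific concepts for insertion and that the table has one row per concept and one column per vocabulary.

```python
# Hypothetical "more specific than" relations between concept identifiers.
# The identifiers are placeholders, not real UMLS CUIs.
NARROWER = {
    "C_PERTUSSIS": ["C_PERTUSSIS_CHILD_A", "C_PERTUSSIS_CHILD_B"],
    "C_PERTUSSIS_CHILD_A": ["C_PERTUSSIS_GRANDCHILD"],
}


def more_specific(cui, mapping):
    """Direct children of `cui` that are not yet in the mapping (candidates to insert)."""
    return [child for child in NARROWER.get(cui, []) if child not in mapping]


def table_rows(mapping, code_names, vocabularies):
    """One row per concept and one cell per vocabulary, as in the mapping table.

    `code_names` maps concept -> {vocabulary: [code names]}; empty cells show "-".
    """
    return [
        [cui] + [", ".join(code_names.get(cui, {}).get(v, [])) or "-" for v in vocabularies]
        for cui in mapping
    ]
```

A user who selects all offered candidates would run `mapping += more_specific("C_PERTUSSIS", mapping)`; concepts already in the mapping are not offered again.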
Number of words in the case definitions and number of codes in the reference set for each event. Numbers of exclusion codes are given in parentheses.
| Event | Case definition (word count) | ICD‐9 codes | ICD‐10 codes | ICPC‐2 codesᵃ | Read‐2 codes |
|---|---|---|---|---|---|
| Acute pancreatitis | 49 | 1 (0) | 6 (0) | 1 (0) | 7 (0) |
| Bladder cancer | 87 | 12 (0) | 12 (0) | 1 (3) | 91 (0) |
| Hemorrhagic stroke | 48 | 3 (2) | 22 (2) | 1 (2) | 36 (0) |
| Ischemic stroke | 53 | 10 (0) | 11 (0) | 2 (1) | 20 (0) |
| Acute myocardial infarction | 39 | 11 (1) | 7 (0) | 1 (6) | –ᵇ |
| Pancreatic cancer | 19 | 8 (0) | 9 (0) | 1 (1) | 109 (0) |
| Ventricular arrhythmia | 234 | 5 (0) | 5 (0) | 1 (1) | 27 (0) |
| Sum | 529 | 50 (3) | 72 (2) | 8 (14) | 290 (0) |
| Average | 75.57 | 7.14 (0.43) | 10.29 (0.29) | 1.14 (2.0) | 48.33 (0.0) |
ᵃ Additional text‐based queries for the IPCI database.
ᵇ Text‐based query only for the GePaRD database.
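The Sum and Average rows of the table above can be recomputed from the per‐event inclusion counts (exclusion codes in parentheses are left out). The one subtlety is the Read‐2 column: acute myocardial infarction has no Read‐2 reference set, so its average is taken over six events rather than seven. The sketch below transcribes the table values; the names `ROWS` and `sum_and_average` are illustrative.

```python
# Inclusion counts transcribed from the table: (word count, ICD-9, ICD-10, ICPC-2, Read-2).
# None marks the missing Read-2 reference set for acute myocardial infarction.
ROWS = {
    "Acute pancreatitis": (49, 1, 6, 1, 7),
    "Bladder cancer": (87, 12, 12, 1, 91),
    "Hemorrhagic stroke": (48, 3, 22, 1, 36),
    "Ischemic stroke": (53, 10, 11, 2, 20),
    "Acute myocardial infarction": (39, 11, 7, 1, None),
    "Pancreatic cancer": (19, 8, 9, 1, 109),
    "Ventricular arrhythmia": (234, 5, 5, 1, 27),
}
COLUMNS = ("words", "ICD-9", "ICD-10", "ICPC-2", "Read-2")


def sum_and_average(rows):
    """Per column: (sum, mean rounded to 2 decimals) over events that have a value."""
    out = {}
    for j, name in enumerate(COLUMNS):
        values = [r[j] for r in rows.values() if r[j] is not None]
        out[name] = (sum(values), round(sum(values) / len(values), 2))
    return out
```

Dropping the missing value, rather than counting it as zero, is what reproduces the reported Read‐2 average of 48.33 (290 / 6).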
Figure 3. Automatic evaluation of CodeMapper. Reference code sets were created manually for each targeted vocabulary from the free‐text case definition of an event. The baseline mappings and expansion steps were generated automatically from the same case definition using the operations available in CodeMapper.
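Comparing a generated code set with a manually created reference set, as in Figure 3, reduces to counting true positives. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), applied to sets of codes for one event and one vocabulary. The function names and the choice to return NaN for an empty set are assumptions made for illustration.

```python
def sensitivity_ppv(generated, reference):
    """Code-level sensitivity (recall) and positive predictive value (precision)."""
    generated, reference = set(generated), set(reference)
    tp = len(generated & reference)
    # |reference| = TP + FN and |generated| = TP + FP.
    sensitivity = tp / len(reference) if reference else float("nan")
    ppv = tp / len(generated) if generated else float("nan")
    return sensitivity, ppv


def evaluate_steps(step_code_sets, reference):
    """Score the baseline and each subsequent expansion step against one reference set."""
    return [sensitivity_ppv(codes, reference) for codes in step_code_sets]
```

With toy sets such as `[{"A"}, {"A", "B", "X"}, {"A", "B", "E", "X", "Y"}]` against the reference `{"A", "B", "E"}`, sensitivity rises with each step while PPV falls, the same trade‐off visible in the evaluation table.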
Figure 4. Categories of false negatives and false positives in the error analysis. Two codes are siblings if they are associated with the same concept.
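The error categories in Figure 4 can be derived mechanically once each code is linked to its concepts. The sketch follows the caption's definition (two codes are siblings if they share a concept). Two points are assumptions: a reference code that no concept in the toy concept table maps to is labeled "Not in UMLS", and the remaining false negatives are split by whether a sibling appears in the reference set. All names and data are hypothetical.

```python
from collections import defaultdict


def siblings(code, code_to_cuis, cui_to_codes):
    """Codes that share at least one concept with `code`, excluding `code` itself."""
    return {s for cui in code_to_cuis.get(code, ()) for s in cui_to_codes[cui]} - {code}


def categorize(generated, reference, cui_to_codes):
    """Label each false positive and false negative with an error category."""
    code_to_cuis = defaultdict(set)  # invert concept -> codes into code -> concepts
    for cui, codes in cui_to_codes.items():
        for c in codes:
            code_to_cuis[c].add(cui)
    generated, reference = set(generated), set(reference)
    tp = generated & reference

    fp_cat = {
        c: "With TP sibling" if siblings(c, code_to_cuis, cui_to_codes) & tp else "No TP sibling"
        for c in generated - reference
    }
    fn_cat = {}
    for c in reference - generated:
        if c not in code_to_cuis:
            fn_cat[c] = "Not in UMLS"  # assumption: code unknown to the concept table
        elif siblings(c, code_to_cuis, cui_to_codes) & reference:
            fn_cat[c] = "Sibling in reference"
        else:
            fn_cat[c] = "No sibling in reference"
    return fp_cat, fn_cat
```

A false positive "with TP sibling" typically means the right concept was selected but it also carries codes that the reference set left out.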
Number of concepts and performance measures of the mappings in the evaluation. Values per vocabulary are macro‐averages over all events.
| Revision (number of concepts) | Measure | ICD‐9 | ICD‐10 | ICPC‐2 | Read‐2 | Average |
|---|---|---|---|---|---|---|
| Baseline (46) | Sensitivity | 0.300 | 0.195 | 0.357 | 0.131 | 0.246 |
| | PPV | 0.387 | 0.380 | 0.500 | 0.411 | 0.420 |
| Expansion step 1 (183) | Sensitivity | 0.858 | 0.848 | 1.000 | 0.568 | 0.818 |
| | PPV | 0.483 | 0.558 | 0.762 | 0.729 | 0.633 |
| Expansion step 2 (297) | Sensitivity | 0.914 | 1.000 | 1.000 | 0.846 | 0.940 |
| | PPV | 0.463 | 0.509 | 0.762 | 0.749 | 0.621 |
| Expansion step 3 (335) | Sensitivity | 0.929 | 1.000 | 1.000 | 0.882 | 0.953 |
| | PPV | 0.462 | 0.498 | 0.762 | 0.742 | 0.616 |
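A macro‐average is an unweighted mean: each event counts equally, however many reference codes it has. The Average column appears to apply the same unweighted mean across the four vocabularies. The sketch below recomputes it from the transcribed per‐vocabulary values and flags any row that disagrees by more than 0.001, which allows for rounding of the published three‐decimal values. The helper names are illustrative.

```python
from statistics import mean

# (revision, measure) -> ([ICD-9, ICD-10, ICPC-2, Read-2], reported Average),
# transcribed from the table above.
TABLE = {
    ("Baseline", "Sensitivity"): ([0.300, 0.195, 0.357, 0.131], 0.246),
    ("Baseline", "PPV"): ([0.387, 0.380, 0.500, 0.411], 0.420),
    ("Step 1", "Sensitivity"): ([0.858, 0.848, 1.000, 0.568], 0.818),
    ("Step 1", "PPV"): ([0.483, 0.558, 0.762, 0.729], 0.633),
    ("Step 2", "Sensitivity"): ([0.914, 1.000, 1.000, 0.846], 0.940),
    ("Step 2", "PPV"): ([0.463, 0.509, 0.762, 0.749], 0.621),
    ("Step 3", "Sensitivity"): ([0.929, 1.000, 1.000, 0.882], 0.953),
    ("Step 3", "PPV"): ([0.462, 0.498, 0.762, 0.742], 0.616),
}


def macro_average(scores):
    """Unweighted mean: every item contributes equally, whatever its size."""
    return mean(list(scores))


def check_average_column(table, tol=1e-3):
    """Rows whose reported Average differs from the recomputed mean by more than tol."""
    return [key for key, (per_vocab, reported) in table.items()
            if abs(macro_average(per_vocab) - reported) > tol]
```

An empty result from `check_average_column(TABLE)` means every reported Average is consistent with a plain mean of the four vocabulary columns.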
Number of false‐positive (FP) codes after three expansion steps, by vocabulary and error category, and their percentage of all false‐positive codes. TP, true positive.
| Vocabulary | FP category | Count | Percentage |
|---|---|---|---|
| ICD‐9‐CM | With TP sibling | 52 | 22.2 |
| | No TP sibling | 22 | 9.4 |
| ICD‐10 | With TP sibling | 66 | 28.2 |
| | No TP sibling | 30 | 12.8 |
| ICPC‐2 | With TP sibling | 3 | 1.3 |
| | No TP sibling | 1 | 0.4 |
| Read‐2 | With TP sibling | 43 | 18.4 |
| | No TP sibling | 17 | 7.3 |
| Overall | With TP sibling | 164 | 70.1 |
| | No TP sibling | 70 | 29.9 |
Number of false‐negative (FN) codes after three expansion steps, by vocabulary and error category, and their percentage of all false‐negative codes
| Vocabulary | FN category | Count | Percentage |
|---|---|---|---|
| Read‐2 | No sibling in reference | 19 | 54.3 |
| | Not in UMLS | 11 | 31.4 |
| ICD‐9‐CM | No sibling in reference | 5 | 14.3 |
| Overall | No sibling in reference | 24 | 68.6 |
| | Not in UMLS | 11 | 31.4 |
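In both error tables, each percentage is relative to the grand total of that error type across all vocabularies (234 false positives, 35 false negatives), and the Overall rows collapse the vocabulary dimension. The sketch below reproduces both from the transcribed counts; the names `FP_COUNTS`, `FN_COUNTS`, `shares` and `overall` are illustrative.

```python
from collections import Counter

# Counts transcribed from the two tables above: (vocabulary, category) -> count.
FP_COUNTS = {
    ("ICD-9-CM", "With TP sibling"): 52, ("ICD-9-CM", "No TP sibling"): 22,
    ("ICD-10", "With TP sibling"): 66, ("ICD-10", "No TP sibling"): 30,
    ("ICPC-2", "With TP sibling"): 3, ("ICPC-2", "No TP sibling"): 1,
    ("Read-2", "With TP sibling"): 43, ("Read-2", "No TP sibling"): 17,
}
FN_COUNTS = {
    ("Read-2", "No sibling in reference"): 19,
    ("Read-2", "Not in UMLS"): 11,
    ("ICD-9-CM", "No sibling in reference"): 5,
}


def shares(counts):
    """Percentage of each entry relative to the grand total, to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


def overall(counts):
    """Collapse the vocabulary dimension: total count per error category."""
    per_category = Counter()
    for (_vocabulary, category), n in counts.items():
        per_category[category] += n
    return dict(per_category)
```

Applying `shares(overall(FP_COUNTS))` yields the Overall percentages directly, so the per‐vocabulary and Overall rows can be checked against the same totals.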